Confusion over AWS EBS Snapshot - amazon-ebs

I'm confused with the incremental backup mechanism of EBS snapshot. Suppose If i start taking EBS volume backup say at 8 AM. During backing up if there is any activity/changes being performed on the volume, does it also gets backed up or it will backup only the changes till 8AM.
Thumb rule is that the volume should have less activity or no activity while taking backup. Any suggestions on this?

in-progress snapshot is not affected by ongoing reads and writes to the volume.It will contain changes only till 8AM.
Refer http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-snapshot.html

Related

Two Oracle RMAN Backup between an EOB

I need to take two backups before and after the day end process. If the EOD process starts at 10.00 p.m. The backup should contain all the data right at 10.00 p.m. before starting the EOD and the backup process should not impact the EOD process as well. Is there a way to achieve this?
Please note that I need to retrieve RMAN backups for disk and then tape.
A "snapshot" type of backup would only be necessary if you are running in NOARCHIVELOG mode, and you'd have to shutdown the entire database to do it as a "cold" backup (you can't get a logically consistent backup without transaction logs while the database is open for read/write activity). This would presumably impact your end-of-day process.
Assuming that the database is in ARCHIVELOG mode, and that you can run your backup as a "hot" backup while the database is up and running, you do not need to worry about the timing of your backup at all.
Run a backup whenever it makes sense based on system load or activity (being sure to backup the archive logs too), and if you need to recover from a backup later then recover to the exact point in time that you need - before or after your end-of-day process. See the documentation for Point in Time Recovery options: https://docs.oracle.com/en/database/oracle/oracle-database/19/bradv/rman-performing-flashback-dbpitr.html
The restore and recovery operation will restore from the backup and then re-apply all transactions to bring the database back to the desired point in time. The only thing the timing of the backup job would affect would be the number of transactions that might need to be re-applied after the data files are restored.

On a google compute engine (GCE), where are snapshots stored?

I've made two snapshots using the GCE console. I can see them there on the console but cannot find them on my disks. Where are they stored? If something should corrupt one of my persistent disk, will the snapshots still be available? If they're not stored on the persistent disk, will I be charged extra for snapshot storage?
GCE has added a new level of abstraction. The disks were separated from the VM instance. This allows you to attach a disk to several instances or restore snapshots to another VMs.
In case your VM or disk become corrupt, the snapshots are safely stored elsewhere. As for additional costs - keep in mind that snapshots store only files that changed since the last snapshot. Therefore the space needed for 7 snapshots is often not more than 30% more space than one snapshot. You will be charged for the space they use, but the costs are quite low from what i observed (i was charged 0.09$ for 3.5 GB snapshot during one month).
The snapshots are stored separately on Google's servers, but are not attached to or part of your VM. You can create a new disk from an existing snapshot, but Google manages the internal storage and format of the snapshots.

SQL Server Snapshot on live DB

I am in need of some help.I want to reinitialize one of my subscribers with a new snapshot.my previous snapshot i generated was when activity was low on the production database.It took under 2 minutes.
My question is,can i generate a new snapshot during the day when the applications are using the database live? would it lock my tables badly to where transaction wont be followed to the database?
Yes, you can create initial snapshot during the day.
With transactional replication Snapshot Agent takes locks only during the initial phase of snapshot generation.
Performance impact of this operation depends on current workload of your system. So, if previous snapshot generation took 2 minutes, I think I will take not much more time.

Query about EBS Backed Instances + Backup on S3 + Snapshots

I've spent a number of days looking into putting up two Windows Servers on Amazon, a domain controller and a remote desktop services server but there are a few questions that I can't find detailed or any answers for:
1) When you have an EBS backed instance I assume this means that all files (OS/Applications/Pagefile) etc are all stored on EBS? Physically in the datacentre, lets assume I have 50 gig of OS files/application data etc, are these all stored on just one SAN type device? What happens if that device blows up or say that particular data centre gets destroyed. Is the data elsewhere? What is the probability that your entire EBS volume can just disappear?
2) As I understand it you can backup your EBS instance to S3 with snapshotting. I assume you can choose how often to snapshot (say daily?). In my above scenario if I have 50 gig of files, and snapshot once a day. Over 7 days will my S3 storage be 350 gig or will it be 50 gig + incremental changes I have made over the week?
3) I remember reading somewhere that the instance has to go offline to snapshot. If that is the case does it do this by shutting down the guest OS, snapshotting then booting up or does it just detach the data, prevent you from connecting while it snapshots, then bring it back to the exact moment before it went for a snapshot.
4) I understand the concept of paying per month per gig of space but how I am concerned about the $0.11 per 1 million I/O requests. How does that work when I am running a windows server? I have no idea how many I/O requests a server makes to its disks. I am assuming a lot of the entire VM is being stored on an EBS volume. Is running a server on the standard EBS going to slow it down radically?
5) Are people using the snapshot to S3 as their main backup are are people running other types of backup for Data?
Sorry for the noob questions - I'd appreciate any partial answers, answers or advice anyone could offer me. Thanks in advance!
1) amazon is fuzzy on this. They say that data is replicated within the AZ it belongs to and that if you have less than 20GB of data changed since the last snapshot your annual failure rate is ~ 0.1-0.4%
2) snapshots are triggered manually, and are done incrementally
3) Depends on your filesystem. For example on a linux box with an xfs volume you can freeze IO to the volume, do your snapshot (takes only a second or so) and then unfreeze. If you take a snapshot without doing something similar you run the risk of the data being in an inconsistent state. This will depend on your filesystem
4) I run all my instances on EBS. You probably wouldn't want your pagefile on EBS, it would make more sense to use instance storage for that. The amount of IOs you use will be very dependant on the workload. The IO count depends heavily on your workload - an application server does a lot less IOPs than a database server for example. You're unlikely to use more than a few dollars a month per volume if you're running particularly IO heavy operations
5) Personally I don't care about the installed software/configuration (I have AMIs with that all setup so I can restore that in minutes), I only care about the data. I back that data up separately (S3 & Glacier). Partly that's because I was bitten by a bug EBS had about a year ago or so where they lost some snapshots
You also use multiple strategies, as Fantius commented. For example on the mongodb servers I run the boot volume is small (and never snapshotted or backed up since it can be restored automatically from an AMI), with a separate data volume containing the actual mongodb data. The mongodb volume is snapshotted as well as storing dumps on S3. Snapshots are an efficient way of creating backups (since you're only storing incremental changes) however you can't transfer them out of your EC2 region, whereas a tarball on S3 can easily be copied anywhere.

How to back up a huge file with Bacula?

It currently is 700MB but it's conceivable that it'll grow beyond the 1GB. Normally I just copy this file to another location (for the curious, it's the database of a Zope instance,
a ZODB file).
This file changes little from day to day, but I understand Bacula can't do inside-the-file subdivision for incremental backups. Anyway, it doesn't matter. What I want to do is a full backup daily and keep two of them and a full backup weekly and also keep two of them. So
at any given time I can get yesterday, the day before yesterday, a week
ago and two weeks ago. Would you think that's a good idea?
I suppose I should make two schedules, daily and weekly. But which numbers should I have on the volumes and the pools to achieve this? Two volumes of 1.5GB? Any hints or guidance is welcome, I'm not a sysadmin and my experience with Bacula is very limited.
Online backup of a large database file is risky business, as the file might change while you are reading it, rendering the backup inconsistent and possibly useless. I believe you should not be making backups of the ZODB file itself, but rather of diffs created daily by the repozo tool. This way, you also outsource the job of handling the inside-the-file subdivisions that you say Bacula is incapable of dealing with.
In my experience with bacula and backup to disk, it is best to keep one volume per backup job. That way there is no dead space in the files as jobs expire. Bacula can reuse the whole volume and it cuts down on disk utilization. Use the "Set Maximum Volume Jobs = 1" directive in the pool resource.
I would set up two pools, a daily and weekly. Set the volume retention to two days in the daily and two weeks in the weekly. Schedule the daily on say, mon-sat, and the weekly on sunday.
Depending on your infrastructure, I would recommend taking a snapshot of the volume you are backing up to "freeze" it and make the backup from there.
For some of our backups we are using LVM snapshots (http://tldp.org/HOWTO/LVM-HOWTO/snapshots_backup.html), to avoid locking any of our databases (we have terabytes of data to backup and a lock would have a huge impact on the service)
Then, as you said that the database is not moving too much, I would go and have a 6 days retention period, 6 volumes for Dailies and 2 volumes for weeklies. Your Dailies should hit the Incremental backup pool, and the Weeklies should be the fulls.
For example, make the Weeklies (Fulls) run on Monday and then an incremental every day (Tue-Sun). This will allow you to come back any day of the week if you realize your data is corrupted, without taking too much space or time during the backup.
EDIT: And... I should check the post dates before answering. Haha. 3 Years late.
For the open source bacula (bacula.org) the best idea is to use the "Set Maximum Volume Jobs = 1" directive indeed.
If you want the "inside-the-file subdivision for incremental backups", please consider the Delta Plugin from Bacula Systems - https://www.baculasystems.com/products/bacula-enterprise-plugins/delta.