On a google compute engine (GCE), where are snapshots stored? - backup

I've made two snapshots using the GCE console. I can see them there on the console but cannot find them on my disks. Where are they stored? If something should corrupt one of my persistent disk, will the snapshots still be available? If they're not stored on the persistent disk, will I be charged extra for snapshot storage?

GCE has added a new level of abstraction. The disks were separated from the VM instance. This allows you to attach a disk to several instances or restore snapshots to another VMs.
In case your VM or disk become corrupt, the snapshots are safely stored elsewhere. As for additional costs - keep in mind that snapshots store only files that changed since the last snapshot. Therefore the space needed for 7 snapshots is often not more than 30% more space than one snapshot. You will be charged for the space they use, but the costs are quite low from what i observed (i was charged 0.09$ for 3.5 GB snapshot during one month).

The snapshots are stored separately on Google's servers, but are not attached to or part of your VM. You can create a new disk from an existing snapshot, but Google manages the internal storage and format of the snapshots.

Related

Foundry Data Storage Optimization

Hi I have a general question about pipelines optimization in order to lower storage space.
Does deleting trashed datasets help alleviate disk storage? Ex. Remove obsolete datasets: a.) based on business knowledge and utilization and b.) datasets in the trash.
Also, We'd like to manage the copies of datasets that are stored when a schedule runs. We believe that if we ever had to fall back to a previous version, we only need to reference the latest one, as opposed to keeping multiple copies.
Does this affect storage? And is there a way to manage configuration on this?
Deleting trashed datasets (in typical setups) will result in their underlying files being deleted, but typically a larger driver of storage consumption is the set of previous dataset views kept by default.
You can control the length of time these files and views are kept using the Foundry Retention service. I'd recommend you consult with platform documentation and your support team for configuration of this service.
Retention will compute and mark files matching your configuration for deletion and periodically delete them, thus reducing your storage consumption.

GCE snapshot - no system state saved?

Being relatively new to GCE, but not to other virtualization tools like VmWare or VirtuaBox, I'm not able to find in GCE a concrete way to get a full snapshot of a live machine.
I'm guessing it's my fault or poor knowledge, but really GCE doesn't saves the "system state", or else dumps memory to snapshot?
I'd found many scripts and examples on how to flush buffers to disks before I create the snapshot, but no way to obtain a complete state of the machine, including what the machine itself is running at THAT point.
Let me say that, if this is correct, the GCE snapshot IS NOT a snapshot.
Thanks in advance for your help.
That's a VM image, not a snapshot, and it does not include the contents of RAM or the processor state. A snapshot is a point-in-time copy of a persistent disk.
[link] (http://vcloud.vmware.com/uk/using-vcloud-air/tutorials/working-with-snapshots)
Here's an example of a cloud platform saving true snapshots, portraits of a specific second of a working machine.
Let me add a thought:
I don't know if VCloud is considering a particular state, gains privileged access to disks for a limited time, avoiding contingency, or else does a temporary duplication of the working disk in another volume.
I'm still reading around, trying to get INTO the problem.
BUT... it dumps memory to snapshot.
This is the point, and I'm wondering why this seems to be not possible in GCE.

Query about EBS Backed Instances + Backup on S3 + Snapshots

I've spent a number of days looking into putting up two Windows Servers on Amazon, a domain controller and a remote desktop services server but there are a few questions that I can't find detailed or any answers for:
1) When you have an EBS backed instance I assume this means that all files (OS/Applications/Pagefile) etc are all stored on EBS? Physically in the datacentre, lets assume I have 50 gig of OS files/application data etc, are these all stored on just one SAN type device? What happens if that device blows up or say that particular data centre gets destroyed. Is the data elsewhere? What is the probability that your entire EBS volume can just disappear?
2) As I understand it you can backup your EBS instance to S3 with snapshotting. I assume you can choose how often to snapshot (say daily?). In my above scenario if I have 50 gig of files, and snapshot once a day. Over 7 days will my S3 storage be 350 gig or will it be 50 gig + incremental changes I have made over the week?
3) I remember reading somewhere that the instance has to go offline to snapshot. If that is the case does it do this by shutting down the guest OS, snapshotting then booting up or does it just detach the data, prevent you from connecting while it snapshots, then bring it back to the exact moment before it went for a snapshot.
4) I understand the concept of paying per month per gig of space but how I am concerned about the $0.11 per 1 million I/O requests. How does that work when I am running a windows server? I have no idea how many I/O requests a server makes to its disks. I am assuming a lot of the entire VM is being stored on an EBS volume. Is running a server on the standard EBS going to slow it down radically?
5) Are people using the snapshot to S3 as their main backup are are people running other types of backup for Data?
Sorry for the noob questions - I'd appreciate any partial answers, answers or advice anyone could offer me. Thanks in advance!
1) amazon is fuzzy on this. They say that data is replicated within the AZ it belongs to and that if you have less than 20GB of data changed since the last snapshot your annual failure rate is ~ 0.1-0.4%
2) snapshots are triggered manually, and are done incrementally
3) Depends on your filesystem. For example on a linux box with an xfs volume you can freeze IO to the volume, do your snapshot (takes only a second or so) and then unfreeze. If you take a snapshot without doing something similar you run the risk of the data being in an inconsistent state. This will depend on your filesystem
4) I run all my instances on EBS. You probably wouldn't want your pagefile on EBS, it would make more sense to use instance storage for that. The amount of IOs you use will be very dependant on the workload. The IO count depends heavily on your workload - an application server does a lot less IOPs than a database server for example. You're unlikely to use more than a few dollars a month per volume if you're running particularly IO heavy operations
5) Personally I don't care about the installed software/configuration (I have AMIs with that all setup so I can restore that in minutes), I only care about the data. I back that data up separately (S3 & Glacier). Partly that's because I was bitten by a bug EBS had about a year ago or so where they lost some snapshots
You also use multiple strategies, as Fantius commented. For example on the mongodb servers I run the boot volume is small (and never snapshotted or backed up since it can be restored automatically from an AMI), with a separate data volume containing the actual mongodb data. The mongodb volume is snapshotted as well as storing dumps on S3. Snapshots are an efficient way of creating backups (since you're only storing incremental changes) however you can't transfer them out of your EC2 region, whereas a tarball on S3 can easily be copied anywhere.

EBS Volume from Ubuntu to RedHat

I would like to use an EBS volume with data on it that I've been working with in an Ubuntu AMI in a RedHat 6 AMI. The issue I'm having is that RedHat says that the volume does not have a valid partition table. This is the fdisk output for the unmounted volume.
Disk /dev/xvdk: 901.9 GB, 901875499008 bytes
255 heads, 63 sectors/track, 109646 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/xvdk doesn't contain a valid partition table
Interestingly, the volume isn't actually 901.9 GB but 300 GB.. I don't know if that means anything. I am very concerned about possibly erasing the data in the volume by accident. Can anyone give me some pointers for formatting the volume for RedHat without deleting its contents?
I also just checked that the volume works in my Ubuntu instance and it definitely does.
I'm not able to advise on the partition issue as such, other than stating that you definitely neither need nor want to format it, because formatting is indeed a (potentially) destructive operation. My best guess would be that RedHat isn't able to identify the file system currently in use on the EBS volume, which must be advertized by some means accordingly.
However, to ease with experimenting and gain some peace of mind, you should get acquainted with one of the major Amazon EBS features, namely to create point-in-time snapshots of volumes, which are persisted to Amazon S3:
These snapshots can be used as the starting point for new Amazon EBS
volumes, and protect data for long-term durability. The same snapshot
can be used to instantiate as many volumes as you wish.
This is detailed further down in section Amazon EBS Snapshots:
Snapshots can also be used to instantiate multiple new volumes, expand
the size of a volume or move volumes across Availability Zones. When a
new volume is created, there is the option to create it based on an
existing Amazon S3 snapshot. In that scenario, the new volume begins
as an exact replica of the original volume. [...] [emphasis mine]
Therefore you can (and actually should) always start experiments or configuration changes like the one you are about to perform by at least snapshotting the volume (which will allow you to create a new one from that point in time in case things go bad) or creating a new volume from that snapshot immediately for the specific task at hand.
You can create snapshots and new volumes from snapshots via the AWS Management Console, as usual there are respective APIs available as well for automation purposes (see API and Command Overview) - see Creating an Amazon EBS Snapshot for details.
Good luck!.

EC2 snapshots vs. bundled instances

It's my understanding that EC2 snapshots are incremental in nature, so snapshot B contains only the difference between itself and snapshot A. Then if you delete snapshot A, the difference is allocated to snapshot B in Amazon S3 so that you have a complete snapshot. This also leads me to believe that it isn't prohibitively expensive to have daily snapshots A-Z for example, that it in storage cost it is basically the same as one snapshot.
What I really want is to back up my snapshots to a bucket in Amazon S3, so that if an entire EC2 region is having some problems --ahem cough, cough-- the snapshot can be moved into another region and launched as a backup instance in a new region.
However, it seems you can only bundle an instance and then upload a bundled instance to S3, not a snapshot.
The bundle is the entire instance correct? If this is the case then are historical bundled instances significantly more costly in practice than snapshots?
I use an instance store AMI and store my changing data on EBS volumes using the XFS filesystem. This means I can freeze the filesystems, create a snapshot and unfreeze them.
My volumes are 1GB (although mostly empty) and the storage cost is minuscule.
I don't know how an EBS backed AMI would work with this but I can't see why it would be any different. Note, however,that you need to bundle an instance in order to start it. Perhaps you could just snapshot everything as a backup and only bundle them when required.