How to create a mirror of an instance?

Using Google Compute Engine, how can I create a mirror of an instance? The instance is already created, but I need to create an identical mirror as a backup. Ideally, if something goes wrong in the original instance, the backup should automatically take over.

Take a look at snapshots. You can take a snapshot of your instance's disk and use it to create a new disk to spin up another instance.

And if you're looking for a way to use snapshots as backup, you should take a look at Compute Engine Persistent Disk Backups using Snapshots.
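The snapshot-to-disk-to-instance flow can be sketched with the gcloud CLI; disk, snapshot, instance, and zone names below are placeholders. Note that this only gives you a manual mirror: for the "automatically take over" part you would additionally need something like a managed instance group with autohealing behind a load balancer.

```shell
# Snapshot the source instance's disk (names and zone are hypothetical).
gcloud compute disks snapshot my-disk --zone=us-central1-a \
    --snapshot-names=my-disk-snap

# Create a fresh disk from that snapshot...
gcloud compute disks create my-disk-copy --zone=us-central1-a \
    --source-snapshot=my-disk-snap

# ...and boot a standby instance from it.
gcloud compute instances create my-mirror --zone=us-central1-a \
    --disk=name=my-disk-copy,boot=yes
```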

Related

DynamoDB backup and restore using Data Pipeline. How long does it take to back up and recover?

I'm planning to use Data Pipeline as a backup and recovery tool for our DynamoDB. We will be using Amazon's prebuilt pipelines to back up to S3, and use the prebuilt recovery pipeline to recover to a new table in case of a disaster.
This will also serve a dual purpose of data archival for legal and compliance reasons. We have explored snapshots, but this can get quite expensive compared to S3. Does anyone have an estimate of how long it takes to back up a 1 TB database? And how long it takes to recover a 1 TB database?
I've read Amazon's docs, which say it can take up to 20 minutes to restore from a snapshot, but there's no mention of how long a Data Pipeline job takes. Does anyone have any clues?
Does the newly released feature of exporting from DynamoDB to S3 do what you want for your use case? To use this feature, you must have continuous backups enabled though. Perhaps that will give you the short term backup you need?
It would be interesting to know why you're not planning to use the built-in backup mechanism. It offers point-in-time recovery and is highly predictable in terms of cost and performance.
The Data Pipeline backup is unpredictable, will very likely cost more, and is operationally much less reliable. Plus, getting a consistent (i.e. point-in-time) snapshot requires stopping the world. Speaking from experience, I don't recommend using Data Pipeline for backing up DynamoDB tables!
Regarding how long it takes to take a backup, that depends on a number of factors but mostly on the size of the table and the provisioned capacity you're willing to throw at it, as well as the size of the EMR cluster you're willing to work with. So, it could take anywhere from a minute to several hours.
Restoring time also depends on pretty much the same variables: provisioned capacity and total size. And it can also take anywhere from a minute to many hours.
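To make "anywhere from a minute to several hours" concrete, here is a back-of-envelope, throughput-bound estimate for scanning a 1 TB table. The numbers (table size, provisioned capacity) are illustrative assumptions, and it idealizes 1 RCU as one strongly consistent 4 KB read per second with no other traffic on the table:

```shell
# Back-of-envelope scan time for a full-table backup, throughput-bound.
TABLE_GB=1024          # assumed table size: 1 TB
RCU=10000              # assumed read capacity dedicated to the backup job
KB_PER_SEC=$(( RCU * 4 ))                              # 1 RCU ~ 4 KB/s
SECONDS_NEEDED=$(( TABLE_GB * 1024 * 1024 / KB_PER_SEC ))
echo "~$(( SECONDS_NEEDED / 3600 ))h $(( SECONDS_NEEDED % 3600 / 60 ))m at ${RCU} RCU"
```

So at 10,000 RCU the scan alone is on the order of 7.5 hours; halve the capacity and it doubles, which is exactly why the duration is so variable.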
Point-in-time backups offer consistent, predictable and, most importantly, reliable performance regardless of the size of the table: use that!
And if you're just interested in dumping the data from the table (i.e. not necessarily the restore part), use the new export to S3.
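For reference, the S3 export mentioned above is a single CLI call; the table ARN and bucket name here are placeholders, and the table must have point-in-time recovery (continuous backups) enabled:

```shell
# Export a table to S3 without consuming the table's read capacity.
aws dynamodb export-table-to-point-in-time \
    --table-arn arn:aws:dynamodb:us-east-1:123456789012:table/MyTable \
    --s3-bucket my-backup-bucket \
    --export-format DYNAMODB_JSON
```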

On a google compute engine (GCE), where are snapshots stored?

I've made two snapshots using the GCE console. I can see them there on the console but cannot find them on my disks. Where are they stored? If something should corrupt one of my persistent disks, will the snapshots still be available? If they're not stored on the persistent disks, will I be charged extra for snapshot storage?
GCE has added a new level of abstraction: disks are separate from the VM instance. This allows you to attach a disk to several instances (read-only) or restore snapshots to other VMs.
In case your VM or disk becomes corrupt, the snapshots are safely stored elsewhere. As for additional costs: keep in mind that snapshots store only data that changed since the last snapshot, so the space needed for 7 snapshots is often not more than 30% more than for a single snapshot. You will be charged for the space they use, but the costs are quite low from what I observed (I was charged $0.09 for a 3.5 GB snapshot over one month).
The snapshots are stored separately on Google's servers, but are not attached to or part of your VM. You can create a new disk from an existing snapshot, but Google manages the internal storage and format of the snapshots.

Jackrabbit repository incremental backup

I'm using Jackrabbit v2.2.x. I want to know whether there is a way to take an incremental backup of a Jackrabbit repository, i.e. just the delta (difference) based on date or something else. The problem is that the repository size is in terabytes, and every time we have to take production data it takes a lot of time to copy the full repository.
If the storage backend supports incremental backups, an incremental low-level backup might be the easiest solution.
If not, you could possibly use the EventJournal to iterate over the changes since the last backup, and back up just those changes. This will most likely require more work, however.
Another solution is to do an incremental backup of the data store (if this is what uses most of the disk space), and do a full backup of the node data (persistence managers).

Cloning an Amazon Linux Instance

I currently have an Amazon EC2 instance (High-CPU Medium) running off the instance store, with most of my data and code sitting in /mnt mounted to sda2. The instance is just the way I need it to work. How can I clone this instance and make an exact copy (data and all) on another (preferably cheaper, micro) instance for testing my new code changes? Also, what backup approaches are recommended for this setup?
Thanks
Be careful with the instance store: if your instance is terminated, you will lose your data. I suggest you put the important data on an EBS volume.
Please see my post http://www.capsunlock.net/2009/12/create-ebs-boot-ami.html
It's possible to clone the current instance and make an EBS-backed AMI.
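Once the instance is EBS-backed (direct AMI creation does not work for instance-store roots, which is another reason to move the data as suggested above), the clone is two CLI calls. Instance and image IDs here are placeholders:

```shell
# Create an AMI from a running EBS-backed instance; --no-reboot avoids
# downtime at the cost of filesystem consistency.
aws ec2 create-image --instance-id i-0123456789abcdef0 \
    --name "my-clone" --no-reboot

# Launch a cheaper test copy from the resulting AMI.
aws ec2 run-instances --image-id ami-0123456789abcdef0 \
    --instance-type t3.micro --count 1
```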

How to transfer an image to an Amazon EBS volume for EC2 usage?

I have a local filesystem image that I want to transfer to an Amazon EBS volume and boot as an EC2 micro instance. The instance should have the EBS volume as its root filesystem, and I will be booting the instance with the Amazon PV-GRUB "kernels".
I have used ec2-bundle-image to create a bundle from the image, and ec2-upload-bundle to upload the bundle to Amazon S3. However, now that I'd like to use ec2-register to register the image for use, I can't seem to find a way to make the uploaded bundle the EBS root image. It seems that an EBS snapshot is required to make the root device, and I have no idea how I would convert the bundle into an EBS snapshot.
I do realize, that I could probably do this by starting a "common" instance, attaching an EBS volume to it and then just using 'scp' or something to transfer the image directly to the EBS volume - but is this really the only way? Also, I have no desire to use EBS snapshots as such, I'd rather have none - can I create a micro instance with just the EBS volume as root, without an EBS snapshot?
Did not find any way to do this :(
So, I created a new instance, created a new EBS volume, attached it to the instance, and transferred the data via ssh.
Then, to be able to boot the volume, I still need to create a snapshot of it and then create an AMI that uses the snapshot - and as a result, I get another EBS volume that is created from the snapshot and is the running instance's root volume.
Now, if I want to minimize expenses, I can remove the created snapshot and the original EBS volume.
NOTE: If the only copy of the EBS volume is the root volume of an instance, it may be deleted when the instance is terminated. This setting can be changed with the command-line tools; alternatively, the instance may simply be "stopped" instead of "terminated", and then a snapshot can be generated from the EBS volume. After taking the snapshot, the instance can of course be terminated.
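The snapshot-and-register steps, plus the delete-on-termination tweak mentioned in the note, look roughly like this with the modern aws CLI; all IDs and the device name are placeholders:

```shell
# Snapshot the prepared volume.
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
    --description "root filesystem image"

# Register an AMI whose root device is backed by that snapshot
# (PV-GRUB setups would also pass --kernel-id with the pv-grub AKI).
aws ec2 register-image --name "my-ebs-ami" --root-device-name /dev/sda1 \
    --block-device-mappings \
    '[{"DeviceName":"/dev/sda1","Ebs":{"SnapshotId":"snap-0123456789abcdef0"}}]'

# Or keep the root volume around after termination instead of snapshotting.
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 \
    --block-device-mappings \
    '[{"DeviceName":"/dev/sda1","Ebs":{"DeleteOnTermination":false}}]'
```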
Yes, there is no way to upload an EBS image via S3, and using an instance to which you attach an additional volume is the best way. If you attach that volume after the instance has started, it will also not be deleted on termination.
Note: don't worry too much about the volume-to-snapshot-to-volume chain, as those share the same data blocks (as long as you don't modify them). The storage cost is not tripled; it's only about 1.1 times that of one volume. EBS snapshots and image creation are quite handy in that regard. Don't hesitate to use multiple snapshots. The less you "work" in a volume, the smaller its block usage later on if you start it as an AMI.