How to make a daily backup of my EC2 instance? [closed]

I have a community-AMI-based Linux EC2 instance in AWS. Now I want to take a daily backup of my instance and upload that image to S3.
Is that the correct way to back up an EC2 instance? Can anybody point me to the correct method for taking a backup of my EC2 instance?

Hopefully your instance is EBS backed.
If so, you can back up your instance by taking an EBS snapshot. That can be done through aws.amazon.com (manually), using the AWS Command Line Tools (which can be automated and scheduled via cron or Windows Task Scheduler, as appropriate), or through the AWS API.
You want to ensure that no changes are made to the state of the database backup files during the snapshot process. When I used this strategy for MySQL running on Ubuntu, I used a script to ensure a consistent snapshot. That script uses a feature of the XFS file system to freeze the filesystem during the snapshot. In that deployment, the snapshot only took 2-3 seconds and was performed at a very off-peak time, so any website visitors would experience only a 2-3 second lag. For Windows, if the machine cannot be rebooted for the snapshot (you have no maintenance window at night), I would instead create a separate EBS device (e.g. an "S:\" drive for snapshots), use SQL Server backup tools to create a .bak file on that device, then create an EBS snapshot of that separate EBS device.
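For illustration, here is a minimal sketch of that freeze-then-snapshot flow. This is not the original script; the volume ID, mount point, and the use of Python/boto3 are assumptions for the example. A cron entry pointing at such a script gives you the daily cadence the question asks for.
```python
# Sketch: take a consistent EBS snapshot of an XFS data volume.
# Assumes AWS credentials are configured; IDs/paths are placeholders.
import subprocess
import boto3

VOLUME_ID = "vol-0123456789abcdef0"  # hypothetical EBS volume to back up
MOUNT_POINT = "/data"                # hypothetical XFS mount on that volume

ec2 = boto3.client("ec2")

# Freeze the filesystem so no writes land while the snapshot starts.
subprocess.run(["xfs_freeze", "-f", MOUNT_POINT], check=True)
try:
    # create_snapshot returns quickly; EBS finishes copying in the
    # background, so the freeze only needs to cover this call.
    snap = ec2.create_snapshot(VolumeId=VOLUME_ID,
                               Description="daily backup")
finally:
    # Always unfreeze, even if the snapshot call fails.
    subprocess.run(["xfs_freeze", "-u", MOUNT_POINT], check=True)

print("Snapshot started:", snap["SnapshotId"])
```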
For details on scripting the backup, see this related question:
Automating Amazon EBS snapshots anyone have a good script or solution for this on linux
If you have separate storage mounted e.g. for your database, be sure you back that up too!
UPDATE
To create a snapshot manually:
Browse to https://console.aws.amazon.com/ec2/home?#s=Volumes
Right-click on the volume you want to backup (the instance the volume is attached to is in the column named 'Attachment Information')
Select Create Snapshot
To create an AMI image from the instance and launch other instances just like it (on instance types with more resources, to balance load, etc.; an API equivalent is sketched after these steps):
Browse to https://console.aws.amazon.com/ec2/home?#s=Instances
Right-click on the instance you want to create the AMI from
Select Create Image (EBS AMI)
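If you would rather script that than click through the console, here is a minimal sketch of the same step via the API. The instance ID and image name are placeholders, and boto3 is my choice for the example, not something prescribed above.
```python
# Sketch: create an AMI from a running EBS-backed instance.
import boto3

ec2 = boto3.client("ec2")

image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",  # hypothetical instance ID
    Name="daily-ami-backup",
    # NoReboot=True avoids a stop/start, at the cost of a potentially
    # inconsistent filesystem; omit it if a brief reboot is acceptable.
    NoReboot=True,
)
print("Image requested:", image["ImageId"])
```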

Related

Best practice for storing "permanent" variables consumed by applications

I have an application that needs to store a "last_updated_at" variable from a dataset that it obtains from an API, so that the next job can take that "last_updated_at" and look only at data after it as it retrieves data from other APIs. At the end of its execution, it refreshes "last_updated_at" and saves it. A job will then come in tomorrow and start all over again with that stored "last_updated_at" value.
The question is: how is it best to save that variable, and what's the best practice on where to save it (and retrieve it next time)?
This application comes from a GitHub repo; I built a container from it and have the container at AWS, and on every push to the repo a new container is built. We often update that repo -> build the new container -> pull the image on machines.
So with that context, where's the best place to save that "last_updated_at" value that needs to be consumed and updated on every execution? There will only be one machine with the container running it; no more machines will have the container. So what's best, considering we constantly update that repo and this is a prod environment?
In a CSV or TXT file on the machine running the job?
In some cloud storage like S3? (See the sketch after this list.)
As an OS environment variable on the machine?
As an environment variable in the container running this?
In a file in a parameters folder of the GitHub repo?
In a CSV or TXT file inside the container on the machine running the job?
Any other way?
Lastly, should the answer depend on whether there's only one machine installing the container, or more than one with only one running at a given time?
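For the S3 option above, here is a minimal sketch of what reading and updating such a watermark could look like. The bucket name, key, default value, and the boto3 approach are all assumptions, not something prescribed by the question.
```python
# Sketch: persist "last_updated_at" as a tiny JSON object in S3 so it
# survives container rebuilds and redeploys. Names are placeholders.
import json
import boto3
from botocore.exceptions import ClientError

BUCKET = "my-job-state"             # hypothetical bucket
KEY = "state/last_updated_at.json"  # hypothetical key

s3 = boto3.client("s3")

def load_last_updated_at(default="1970-01-01T00:00:00Z"):
    try:
        body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
        return json.loads(body)["last_updated_at"]
    except ClientError as err:
        if err.response["Error"]["Code"] == "NoSuchKey":
            return default  # first run: nothing saved yet
        raise

def save_last_updated_at(value):
    s3.put_object(
        Bucket=BUCKET,
        Key=KEY,
        Body=json.dumps({"last_updated_at": value}).encode(),
    )
```
Keeping the value outside the image means a new build or a redeploy never loses it, which an environment variable or file baked into the container would not survive; it also keeps working unchanged if the job ever moves to another machine.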

Query about EBS Backed Instances + Backup on S3 + Snapshots

I've spent a number of days looking into putting up two Windows Servers on Amazon, a domain controller and a Remote Desktop Services server, but there are a few questions I can't find detailed answers for (or any answers at all):
1) When you have an EBS-backed instance, I assume this means that all files (OS/applications/pagefile etc.) are stored on EBS? Physically in the data centre, let's assume I have 50 gig of OS files/application data etc.; are these all stored on just one SAN-type device? What happens if that device blows up, or say that particular data centre gets destroyed? Is the data elsewhere? What is the probability that your entire EBS volume can just disappear?
2) As I understand it, you can back up your EBS instance to S3 with snapshotting. I assume you can choose how often to snapshot (say, daily?). In my above scenario, suppose I have 50 gig of files and snapshot once a day: over 7 days, will my S3 storage be 350 gig, or will it be 50 gig plus the incremental changes I have made over the week?
3) I remember reading somewhere that the instance has to go offline to snapshot. If that is the case, does it do this by shutting down the guest OS, snapshotting, then booting up, or does it just detach the data, prevent you from connecting while it snapshots, then bring it back to the exact moment before it went for a snapshot?
4) I understand the concept of paying per month per gig of space, but I am concerned about the $0.11 per 1 million I/O requests. How does that work when I am running a Windows server? I have no idea how many I/O requests a server makes to its disks. I assume the entire VM is being stored on an EBS volume. Is running a server on standard EBS going to slow it down radically?
5) Are people using the snapshot to S3 as their main backup, or are people running other types of backup for data?
Sorry for the noob questions - I'd appreciate any partial answers, answers or advice anyone could offer me. Thanks in advance!
1) Amazon is fuzzy on this. They say that data is replicated within the AZ it belongs to, and that if you have less than 20GB of data changed since the last snapshot, your annual failure rate is ~0.1-0.4%.
2) Snapshots are triggered manually (or by your own scheduled script) and are stored incrementally.
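As a rough illustration for question 2 (the change rate is a made-up figure): with 50 gig of data and about 1 gig of blocks changing per day, a week of daily snapshots stores on the order of 50 + 6 x 1 = 56 gig, not 350 gig.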
3) It depends on your filesystem. For example, on a Linux box with an XFS volume you can freeze IO to the volume, take your snapshot (it only takes a second or so), and then unfreeze. If you take a snapshot without doing something similar, you run the risk of the data being in an inconsistent state.
4) I run all my instances on EBS. You probably wouldn't want your pagefile on EBS, though; it would make more sense to use instance storage for that. The IO count depends heavily on your workload; an application server does far fewer IOPS than a database server, for example. You're unlikely to spend more than a few dollars a month per volume unless you're running particularly IO-heavy operations.
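As a rough worked example at the $0.11-per-million price quoted in the question (the IOPS figure is an assumption): a server averaging 10 IOPS around the clock issues about 10 x 86,400 x 30 ≈ 26 million requests a month, i.e. roughly $2.85.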
5) Personally, I don't care about the installed software/configuration (I have AMIs with all that set up, so I can restore it in minutes); I only care about the data, and I back that data up separately (S3 & Glacier). Partly that's because I was bitten by a bug EBS had about a year or so ago where they lost some snapshots.
You can also use multiple strategies, as Fantius commented. For example, on the MongoDB servers I run, the boot volume is small (and never snapshotted or backed up, since it can be restored automatically from an AMI), with a separate data volume containing the actual MongoDB data. The MongoDB volume is snapshotted, and dumps are also stored on S3. Snapshots are an efficient way of creating backups (since you're only storing incremental changes), however you can't transfer them out of your EC2 region, whereas a tarball on S3 can easily be copied anywhere.
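Shipping such a dump to S3 is a single API call; a minimal sketch (the file path and bucket name are placeholders, and boto3 is my choice for the example):
```python
# Sketch: upload a database dump tarball to S3 as a region-portable backup.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="/backups/mongodb-dump.tar.gz",  # hypothetical local dump
    Bucket="my-db-backups",                   # hypothetical bucket
    Key="mongodb/mongodb-dump.tar.gz",
)
```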

EC2 snapshots vs. bundled instances

It's my understanding that EC2 snapshots are incremental in nature, so snapshot B contains only the difference between itself and snapshot A. Then if you delete snapshot A, the difference is allocated to snapshot B in Amazon S3 so that you still have a complete snapshot. This also leads me to believe that it isn't prohibitively expensive to have daily snapshots A-Z, for example; in storage cost it is basically the same as one snapshot.
What I really want is to back up my snapshots to a bucket in Amazon S3, so that if an entire EC2 region is having some problems --ahem cough, cough-- the snapshot can be moved into another region and launched as a backup instance in a new region.
However, it seems you can only bundle an instance and then upload a bundled instance to S3, not a snapshot.
The bundle is the entire instance correct? If this is the case then are historical bundled instances significantly more costly in practice than snapshots?
I use an instance store AMI and store my changing data on EBS volumes using the XFS filesystem. This means I can freeze the filesystems, create a snapshot and unfreeze them.
My volumes are 1GB (although mostly empty) and the storage cost is minuscule.
I don't know how an EBS-backed AMI would work with this, but I can't see why it would be any different. Note, however, that you need to bundle an instance in order to start it. Perhaps you could just snapshot everything as a backup and only bundle when required.

Backup strategy for user uploaded files on Amazon S3? [closed]

We're switching from storing all user-uploaded files on our servers to using Amazon S3. It's approx. 300 GB of files.
What is the best way to keep a backup of all the files? I've seen a few different suggestions:
Copy bucket to a bucket in a different S3 location
Versioning
Backup to an EBS with EC2
Pros/cons? Best practice?
What is the best way to keep a backup of all files?
In theory, you don't need to. S3 has never lost a single bit in all these years. Your data is already stored in multiple data centers.
If you're really worried about accidentally deleting the files, use IAM keys. For each IAM user, disable the delete operation. And/or turn on versioning and remove the ability for an IAM user to do the real deletes.
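As a sketch of what "disable the delete operation" could look like in practice (the user, policy, and bucket names are placeholders, and an inline user policy is just one way to express it):
```python
# Sketch: attach an inline IAM policy that denies S3 deletes, so a leaked
# or misused key cannot destroy the uploaded files.
import json
import boto3

iam = boto3.client("iam")

deny_deletes = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
        "Resource": "arn:aws:s3:::my-uploads-bucket/*",  # hypothetical bucket
    }],
}

iam.put_user_policy(
    UserName="app-uploader",        # hypothetical IAM user
    PolicyName="deny-s3-deletes",
    PolicyDocument=json.dumps(deny_deletes),
)
```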
If you still want a backup, EBS or S3 is pretty trivial to implement: just run an S3 sync utility to sync between buckets or to the EBS drive. (There are a lot of them, and it's trivial to write your own.) Note that you pay for unused space on your EBS drive, so it's probably more expensive if you're growing. I wouldn't use EBS unless you really had a use for local access to the files.
The upside of the S3 bucket sync is you can quickly switch your app to using the other bucket.
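The core of such a bucket-to-bucket sync is very little code. A naive sketch follows (bucket names are placeholders; unlike a real sync utility, this copies every object instead of skipping unchanged ones):
```python
# Sketch: copy every object from the live bucket into a backup bucket.
import boto3

s3 = boto3.client("s3")
SRC, DST = "my-uploads-bucket", "my-uploads-backup"  # hypothetical names

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SRC):
    for obj in page.get("Contents", []):
        # Server-side copy: the data never leaves S3.
        s3.copy_object(
            Bucket=DST,
            Key=obj["Key"],
            CopySource={"Bucket": SRC, "Key": obj["Key"]},
        )
```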
You could also use Glacier to backup your files, but that has some severe limitations.
IMHO, backup to another S3 bucket in another region (hence another bucket, since buckets are tied to a region) is the best way to go:
You already have the infrastructure to manipulate S3, so there is little to change
This will ensure that, in the event of a catastrophic failure in one S3 region, your backup region won't be affected
Other solutions have drawbacks this doesn't have:
Versioning is not catastrophic-failure-proof
EBS backup requires extra tooling to work with the backed-up files directly on the disk.
I didn't try it myself, but Amazon has a versioning feature that could solve your backup fears - see: http://aws.amazon.com/about-aws/whats-new/2010/02/08/versioning-feature-for-amazon-s3-now-available/
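Turning versioning on is a one-call change; a minimal sketch (the bucket name is a placeholder):
```python
# Sketch: enable versioning so overwritten or deleted objects stay
# recoverable as older versions.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket="my-uploads-bucket",  # hypothetical bucket
    VersioningConfiguration={"Status": "Enabled"},
)
```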
Copy bucket to a bucket in a different S3 location:
This may not be necessary, because S3 is already designed for extremely high durability (eleven 9s) through redundant storage. People who want good data-access performance globally might keep copies of buckets in different data centers, and if you want to guard against some unlikely disaster like "9/11", you can keep a copy in the Tokyo data center of buckets in New York.
However, copying buckets to different buckets within the same data center gives you very little help when disaster strikes that data center.
Versioning
It helps you achieve storage efficiency (only changes are stored, avoiding redundant full copies) and helps you restore faster. It is definitely a good choice.
Backup to an EBS with EC2
You probably will NEVER do this, because EBS is much more expensive (and faster) storage in AWS compared with S3, and its main purpose is backing EC2 images for faster boot-up. EC2 is a compute service that has nothing to do with storage like S3; it is totally irrelevant here, and I cannot see any point in introducing EC2 into your data backup.

How to transfer an image to an Amazon EBS volume for EC2 usage?

I have a local filesystem image that I want to transfer to an Amazon EBS volume and boot as an EC2 micro instance. The instance should have the EBS volume as its root filesystem, and I will be booting the instance with the Amazon PV-GRUB "kernels".
I have used ec2-bundle-image to create a bundle from the image, and I have used ec2-upload-bundle to upload the bundle to Amazon S3. However, now that I'd like to use ec2-register to register the image for usage, I can't seem to find a way to make the uploaded bundle be the EBS root image. It would seem that it requires an EBS snapshot to make the root device, and I have no idea how I would convert the bundle into an EBS snapshot.
I do realize that I could probably do this by starting a "common" instance, attaching an EBS volume to it and then just using 'scp' or something to transfer the image directly to the EBS volume - but is this really the only way? Also, I have no desire to use EBS snapshots as such; I'd rather have none. Can I create a micro instance with just the EBS volume as root, without an EBS snapshot?
Did not find any way to do this :(
So, I created a new instance, attached a newly created EBS volume to it, and transferred the data via ssh.
Then, to be able to boot from the volume, I still needed to create a snapshot of it and then create an AMI that uses the snapshot - and as a result, I get another EBS volume that is created from the snapshot and is the running instance's root volume.
Now, if I want to minimize expenses, I can remove the created snapshot and the original EBS volume.
NOTE: If the only copy of the EBS volume is the root volume of an instance, it may be deleted when the instance is terminated. This setting can be changed with the command-line tools - or the instance may simply be "stopped" instead of "terminated", and then a snapshot can be generated from the EBS volume. After taking a snapshot, the instance can of course be terminated.
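For reference, the snapshot-then-register step described above can be scripted. A minimal sketch follows; the IDs, device names, and the use of boto3 are assumptions, and a PV-GRUB setup would also pass the appropriate KernelId:
```python
# Sketch: snapshot the prepared volume, then register an EBS-backed AMI
# whose root device is that snapshot. All IDs are placeholders.
import boto3

ec2 = boto3.client("ec2")

snap = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",  # hypothetical volume holding the image
    Description="root filesystem image",
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

image = ec2.register_image(
    Name="my-ebs-root-image",
    RootDeviceName="/dev/sda1",
    BlockDeviceMappings=[{
        "DeviceName": "/dev/sda1",
        "Ebs": {"SnapshotId": snap["SnapshotId"]},
    }],
    # KernelId="aki-..." would go here for a PV-GRUB boot (placeholder).
)
print("Registered:", image["ImageId"])
```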
Yes, there is no way to upload an EBS image via S3, and using an instance to which you attach an additional volume is the best way. If you attach that volume after the instance is started, it will also not be deleted automatically.
Note: do not worry too much about the volume -> snapshot -> volume round trip, as those share the same data blocks (as long as you don't modify them). The storage cost is not tripled, only about 1.1 times one volume. EBS snapshots and image creation are quite handy in that regard. Don't hesitate to use multiple snapshots. The less you "work" in a snapshot, the smaller its block usage later on if you start it as an AMI.