What is better to use, EC2 instances for EBS or Amazon S3 for subscription based streaming channel like Netflix.
150GB upload per month, 250GB streaming per month, no peak time, with viewers based around Australia, India, North America, Europe, Brazil
and 80TB of storage that needs to migrate to the cloud?
For scalability and worldwide presence, the definite answer (using only AWS services) is:
Store videos on Amazon S3
Serve videos through Amazon CloudFront
Amazon CloudFront has presence in 70+ locations around the world and will handle the video streaming protocols for you. Mark content as private and have your application determine whether users are entitled to view videos. You can then generate pre-signed URLs that permit access to a given video for a limited period of time. See: Serving Private Content through CloudFront
In comparison, using Amazon EC2 + Amazon EBS is a poor choice because:
You would need to scale-out additional instances based upon your load
You would need to run instances in multiple regions to be closer to your users (hence lower latency)
You would need to replicate all videos to every server rather than storing a single copy of each video
Please note that your largest cost will be Data Transfer (see Amazon CloudFront Pricing. Your quoted figure of "250GB streaming per month" seems extremely low -- my family alone uses that much bandwidth per month!
How fast can we download files from Amazon S3, is there an upper limit (and they distribute it between all the requests from the same user), or does it only depend on my internet connection download speed? I couldn't find it in their SLA.
What other factors does it depend on? Do they throttle the data transfer rate at some level to prevent abuse?
This has been addressed in the recent Amazon S3 team post Amazon S3 Performance Tips & Tricks:
First: for smaller workloads (<50 total requests per second), none of
the below applies, no matter how many total objects one has! S3 has a
bunch of automated agents that work behind the scenes, smoothing out
load all over the system, to ensure the myriad diverse workloads all
share the resources of S3 fairly and snappily. Even workloads that
burst occasionally up over 100 requests per second really don't need
to give us any hints about what's coming...we are designed to just
grow and support these workloads forever. S3 is a true scale-out
design in action.
S3 scales to both short-term and long-term workloads far, far greater
than this. We have customers continuously performing thousands of
requests per second against S3, all day every day. [...] We worked with other
customers through our Premium Developer Support offerings to help them
design a system that would scale basically indefinitely on S3. Today
we’re going to publish that guidance for everyone’s benefit.
[emphasis mine]
You may want to read the entire post to gain more insight into the S3 architecture and resulting challenges for really massive workloads (i.e., as stressed by the S3 team, it won't apply at all for most use cases).
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
We're switching from storing all user uploaded files on our servers, to using Amazon S3. It's approx. 300 GB of files.
What is the best way to keep an backup of all files? I've seen a few different suggestions:
Copy bucket to a bucket in a different S3 location
Versioning
Backup to an EBS with EC2
Pros/cons? Best practice?
What is the best way to keep an backup of all files?
In theory, you don't need to. S3 has never lost a single bit in all these years. Your data is already stored in multiple data centers.
If you're really worried about accidentally deleting the files, use IAM keys. For each IAM user, disable the delete operation. And/or turn on versioning and remove the ability for an IAM user to do the real deletes.
If you still want a backup, EBS or S3 is pretty trivial to implement: Just run an S3 Sync utility to sync between buckets or to the EBS drive. (There are a lot of them, and it's trivial to write.) Note that you pay for unused space on your EBS drive, so it's probably more expensive if you're growing. I wouldn't use EBS unless you really had a use for local access to the files.
The upside of the S3 bucket sync is you can quickly switch your app to using the other bucket.
You could also use Glacier to backup your files, but that has some severe limitations.
IMHO, backup to another S3 bucket in another Availability Zone (hence Bucket) is the best way to go:
You already have the infrastructure to manipulate S3 so there is little change to do
This will ensure that in the event of a catastrophic failure of S3, your backup AZ won't be affected
Other solutions have drawbacks this doesn't have:
Versioning is not catastrophic failure proof
EBS backup requires specific implementation to manipulate these backups directly on the disk.
I didn't try it myself but Amazon have a versioning feature that could solve your backup fears - see: http://aws.amazon.com/about-aws/whats-new/2010/02/08/versioning-feature-for-amazon-s3-now-available/
Copy bucket to a bucket in a different S3 location:
This may not be necessary because S3 already has achieved six "9" reliable by redundancy backup. People who want to achieve data accessing performance globally might make copy of buckets in different data center. So, unless you want to avoid some unlikely disaster like "911", then you can make a copy in Tokyo data center for buckets in New York.
However, within same data center, copying buckets to different buckets gives you very little help when disaster happens to same data center.
Versioning
It helps you achieve storage efficiency by saving redundancy and helps to restore faster. Definitely it is a good choice.
Backup to an EBS with EC2
You probably will NEVER do this because EBS is a much expensive/faster storage in AWS compared with S3. And its main purpose is for backup EC2 image for faster bootup. EC2 is computing instance which has nothing to do with storage or S3. It is totally irrelevant and I cannot see any point that you introduce EC2 to your data backup.
I am migrating my Java,Tomcat, Mysql server to AWS EC2.
I have already attached EBS volume for storing MySql data. In my web application people may upload images. So I should persist them. There are 2 alternatives in my mind:
Save uploaded images to EBS volume.
Use the S3 service.
The followings are my notes, please be skeptic about them, as my expertise is not on servers, but software development.
EBS plus: S3 storage is more expensive. (0.15 $/Gb > 0.1$/Gb)
S3 plus: Serving statics from EBS may influence my web server's performance negatively. Is this true? Does Serving images affect server performance notably? For S3 my server will not be responsible for serving statics.
S3 plus: Serving statics from EBS may result I/O cost, probably it will be minor.
EBS plus: People say EBS is faster.
S3 plus: People say S3 is more safe for persistence.
EBS plus: No need to learn API, it is straight forward to save the images to EBS volume.
Namely I can not decide, will be happy if you guide.
Thanks
The price comparison is not quite right:
S3 charges are $0.14 per GB USED, whereas EBS charges are $0.10 per GB PROVISIONED (the size of your EBS volume), whether you use it or not. As a result, S3 may or may not be cheaper than EBS.
I'm currently using S3 for a project and it's working extremely well.
EBS means you need to manage a volume + machines to attach it to. You need to add space as it's filling up and perform backups (not saying you shouldn't back up your S3 data, just that it's not as critical).
It also makes it harder to scale: when you want to add additional machines, you either need to pull off the images to a separate machine or clone the images across all. This also means you're adding a bottleneck: you'll have to manage your own upload process that will either upload to all machines or have a single machine managing it.
I recommend S3: it's set and forget. Any number of machines can be performing uploads in parallel and you don't really need to notify other machines about the upload.
In addition, you can use Amazon Cloudfront as a cheap CDN in front of the images instead of directly downloading from S3.
I have architected solutions on AWS for Stock photography sites which stores millions of images spanning TB's of data, I would like to share some of the best practice in AWS for your requirement:
P1) Store the Original Image file in S3 Standard option
P2) Store the reproducible images like thumbs etc in the S3 Reduced Redundancy option (RRS) to save costs
P3) Meta data about images including the S3 URL can be stored in Amazon RDS or Amazon DynamoDB depending upon the query complexity. Query the entries from Amazon RDS. If your query is complex it is also common practice to Store the meta data in Amazon CloudSearch or Apache Solr.
P4) Deliver your thumbs to users with low latency using Amazon CloudFront.
P5) Queue your image conversion either thru SQS or RabbitMQ on Amazon EC2
P6) If you are planning to use EBS, then they are not scalable with your EC2. So ideally you can use GlusterFS as your common storage pool for all your images. Multiple Amazon EC2 in Auto Scaled mode can still connect to it and access/write images.
You already outlined the advantages and disadvantages of both.
If you are planning to store terabytes of images, with storage requirements increasing day after day, S3 will probably be your best bet as it is built especially for these kinds of situations. You get unlimited storage space, without having to worry about sharding your data over many EBS volumes.
The recurrent cost of S3 is that it comes 50% more expensive than EBS. You will also have to learn the API and implement it in your application, but that is a one-off expense which I think you should be able to absorb very quickly.
Do you expect the images to last indefinitely?
The Amazon EBS FAQ is pretty clear; the annual failure rate is not "essentially zero"; they quote 0.1% to 0.5%. It's better than the disk under your desk, but it would need some kind of backup.
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
So, I have a dedicated server. I host about dozen or so small sites.
Is there a real benefit in using S3(or Mosso) for my image and static file hosting? My server has more than enough disk space, or am I completely missing the point of S3?
I keep reading about how wonderful and cheap it is, and I ask myself "self, why aren't you using this" and the reply is always "why?"
if you're running within the included storage and bandwidth of your server and your needs are being served well, you are already doing the simplest thing that is working for you and that is where you should always start. Off the top of my head I can think of a couple reasons why you may want to move some storage to S3 in the future:
Your storage or bandwidth needs grow beyond what you have and S3 is cheaper than upgrading your current solution
You move to a multiple-dedicated-server solution for failover/performance reasons and want to be able to store your assets in a single shared location
Your bandwidth needs are highly
variable (so you can avoid a monthly
fee when you're not getting traffic) [Thanks Jim, from the comments]
If you run an entire website off of a single machine, and that machine is more than enough to handle your site, then kudos, images are not a bottleneck that needs solving right now. Forget about S3 for now.
However, as your server gets busier, you will want your server to be spending all of its time doing server things. Transferring static content like flat HTML files and images is an easy, dumb job, and wasting precious active connections, bandwidth, and CPU cycles on them is no good. By switching to S3, your server can concentrate on doing what's important, which is whatever your program actually DOES.
S3 also has benefits of being distributed around and attached to what's probably a fatter pipe than your server, which means the images will show up slightly more quickly on your client's machines, so that's an added bonus.
S3 is also backed up, which means that it makes for a pretty nice place to store pretty much any private data under the sun, in addition to stuff that you want to serve to others (although don't confuse the permissions settings between those two things -- in fact, you may want to use separate accounts entirely).
S3 is also nigh-infinite, which means that if you want to let users upload files to your site (profile images, attachments, etc), S3 is a great choice so that you don't have to constantly worry if your server is going to run out of disk space (obligatory $$$ warning here).
But like I said at the top, if you're a one-server setup with a handful of users, none of this really matters. It's a tool like any other, and it may not be something you need yet.
It's simply a matter of doing the numbers: given a certain amount of traffic for a set of files, you can calculate exactly how much hosting those file on S3 would cost you, and you should be able to do the same for your current provider. If the number is lower for S3, there you have your reason.
An added benefit is that S3 scales pretty much linearly with traffic and you pay only for what you actually use, whereas most providers charge you a flat fee no matter how little traffic you actially have, and some will gouge you badly if you ever exceed the maximum traffic included in the flat fee.
Better speed and availability could be an additional benefit.
Basically, if you have a site that could potentially incur wildly disparate traffic, then using S3 for its images and other static files means that if you're hit by the Slashdot effect, the site has a much better chance of staying reachable, and you have a much better chance of avoiding nasty surprises concerning excess traffic fees.
The advantages of Amazon S3 are reliability, scalability, speed and cost. Here is some info on each.
Reliability: Amazon stores your data in multiple data centers. If there was a disaster and one data center was destroyed your content would continue to be served from the second data center. It’s very unlikely that data you upload to Amazon would ever be lost.
Scalability: If one of your web sites becomes popular and millions of people visit the site, your web server will not be able to handle the load. In comparison when you upload your files to Amazon they are stored in multiple locations. If the load on your content grows your files are automatically replicated to more servers so your files will always be available.
Speed: Amazon has a service called CloudFront that works in conjunction with Amazon S3. When you activate CloudFront on your S3 content your content is moved to edge locations. These are servers that make your content available for high speed transfer.
Cost: With Amazon S3 you only pay for what you use. If you have a few files that get little traffic you will only pay a few cents a month.
SprightlySoft has a blog post which gives even more reasons why Amazon S3 is great. Read it at http://sprightlysoft.com/blog/?p=8
If you're hosting a high-traffic site, the bandwidth cost (and latency issues) of hosting images yourself makes S3 and other services like Akami attractive. For a low-traffic site, it probably isn't an issue.
I'd say that there's no reason if your base hosting plan provides enough space/bandwidth. Where I think it's useful is when your file transfers become enough that you have to look at buying an add-on of storage/bandwidth from the provider -- in that case, S3 may be a viable alternative. But if I'm paying $X/month and not using all of the storage, there's no upside to it.
On the other hand, if your capacity planning calls for you to someday exceed the provider's limits, S3 may be a good solution from the start so you don't have files being served from multiple places.
I would second the mention of "redundancy" -- you can count on any content that's in S3 to be distributed to multiple data centers, and effectively been very much always accessible for anyone with functioning network connection.
Cost may be another factor: data transfer rates for S3 are quite competitive.
And speed is the last one: you can access data VERY fast from S3. But that's more of an issue for data other than browser-viewable images.
For small sites, S3 or Mosso may not be that reasonable for image hosting, but if you have any video files (.wmv, .flv, etc...) or large downloads (app distributions, etc..), I'd still put them on S3 or Mosso to save potential bandwidth spikes if for some odd reason, your content becomes wildly popular.
You write:
My server has more than enough disk space, or am I completely missing the point of S3?
You are not missing the point if what you have on you server is write-once read-less-than-once stuff, such as disaster-recovery backups (which you hope will be read-never), because transfer times will not matter. The point of S3 is delivery speed.
First, S3 distributes your content geographically. End users benefit from shorter paths.
Second, S3 can act as a BitTorrent seed, which not only conserves your bandwidth, it means your most popular content will be distributed faster because it can take advantage of the ad-hoc swarm. There are reports on the AWS Discussion Forums that S3 support of the BitTorrent protocol is "very, very spotty." I have not tested it myself.
Many of you won't have this problem, but if you (and your web server) are located in Australia (read: the 3rd world of the Internet), you run into the issue that S3 does not have geographically close locations, which means there will be a higher latency on your images and other static content. Scalable: yes. Fast: no.
From what I hear, besides low cost, the main advantage is the ease of backup from an EC2 setup.
Link..
http://groups.drupal.org/node/2383
Speed might be the only benefit. If your dedicated server is simply networked through your ISP (which may well throttle upstream speeds even if downstream speeds are high) then you might find that your sites are often slow to load. If so, then S3 or another dedicated server provider can help. Other than that, I can think of absolutely no reason why Amazon's service would be more appropriate for you - especially with simple, static sites.
It's not really directly related to your actual hosting of web sites, but it's certainly an important part of it, especially if the sites don't belong to you alone -- S3 is a great backup solution. There are tools such as duplicity that can automatically and efficiently back things up onto S3 for you, and it's extremely cheap for this purpose. I back up a fairly large amount of data for less than $1/month.
Besides the Fat Pipe and Local Delivery arguments for S3 there is also the manner of a single server does not function optimally when its functioning both as a db server and as a file server. If your running any sort of db I would suggest offloading all your static files to s3. The cost is trivial and you will see pretty big performance gains on page load.