Which is better for streaming and uploading video, Amazon EBS or S3?

Which is better to use, EC2 instances with EBS or Amazon S3, for a subscription-based streaming channel like Netflix?
The workload is 150 GB of uploads per month and 250 GB of streaming per month, with no peak time and viewers based in Australia, India, North America, Europe, and Brazil.
There is also 80 TB of existing storage that needs to migrate to the cloud.

For scalability and worldwide presence, the definite answer (using only AWS services) is:
Store videos on Amazon S3
Serve videos through Amazon CloudFront
Amazon CloudFront has presence in 70+ locations around the world and will handle the video streaming protocols for you. Mark content as private and have your application determine whether users are entitled to view videos. You can then generate pre-signed URLs that permit access to a given video for a limited period of time. See: Serving Private Content through CloudFront
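To make the time-limited URL idea concrete, here is a minimal sketch of how a CloudFront signed URL with a canned policy is put together, as I understand the CloudFront documentation. The function names, the example domain, and the `sign` callable are my own placeholders. The RSA signing step with your CloudFront private key is deliberately left to the caller, since it needs a crypto library that is not shown here.

```python
import base64
import json
import time
from typing import Callable, Optional


def canned_policy(resource_url: str, expires_epoch: int) -> str:
    """Build the compact JSON policy that the signature is computed over."""
    policy = {
        "Statement": [
            {
                "Resource": resource_url,
                "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
            }
        ]
    }
    # The policy is serialized without any extra whitespace.
    return json.dumps(policy, separators=(",", ":"))


def cloudfront_b64(raw: bytes) -> str:
    """Base64-encode, then swap the characters that are unsafe in a URL."""
    encoded = base64.b64encode(raw).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


def signed_url(
    resource_url: str,
    key_pair_id: str,
    sign: Callable[[bytes], bytes],  # hypothetical: signs with your private key
    ttl_seconds: int = 3600,
    now: Optional[float] = None,
) -> str:
    """Return a URL that stops working ttl_seconds after 'now'."""
    issued_at = time.time() if now is None else now
    expires = int(issued_at) + ttl_seconds
    signature = sign(canned_policy(resource_url, expires).encode("utf-8"))
    separator = "&" if "?" in resource_url else "?"
    return (
        f"{resource_url}{separator}Expires={expires}"
        f"&Signature={cloudfront_b64(signature)}&Key-Pair-Id={key_pair_id}"
    )
```

Your application would call something like `signed_url(...)` only after it has checked that the subscriber is entitled to the video, and hand the resulting URL to the player.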
In comparison, using Amazon EC2 + Amazon EBS is a poor choice because:
You would need to scale-out additional instances based upon your load
You would need to run instances in multiple regions to be closer to your users (hence lower latency)
You would need to replicate all videos to every server rather than storing a single copy of each video
Please note that your largest cost will be Data Transfer (see Amazon CloudFront Pricing). Your quoted figure of "250GB streaming per month" seems extremely low -- my family alone uses that much bandwidth per month!

Related

What are the pros and cons of using AWS S3 vs Cassandra as an image store?

Which DB is better for storing images in a photo-sharing application?
We don't recommend storing images directly in Cassandra. Most companies (household names you know very well and whose services you likely use) store images, videos, and other media on an object store like AWS S3 or Google Cloud Storage.
Only the metadata of the media are stored in Cassandra for very fast retrieval -- S3 URL/URI, user info, media info, etc.
The advantage of using Cassandra is that it can be deployed to a hybrid combination of public clouds so you're not tied to one vendor. Being able to distribute your Cassandra nodes across clouds means that you can get as close as possible to your users. Cheers!
AWS S3 is an object storage service that works very well for unstructured data. It offers effectively unlimited storage, with the size of a single object limited to 5 TB. S3 is suitable for storing large objects.
DynamoDB is a NoSQL, low-latency database that is suitable for semi-structured data. DynamoDB use cases are usually those where we want to store a large number of small records with millisecond latency. The DynamoDB record size limit is 400 KB.
For a photo-sharing application, you need both S3 and DynamoDB. S3 acts as the storage, and DynamoDB is your database, which lists all galleries, files, timestamps, captions, users, etc.
You can store photos in Amazon S3, but keep the photos' metadata in some other database.
Amazon S3 is well suited to objects of any size, including large ones.
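As a rough illustration of the split the answers above describe (image bytes in S3, a small metadata record in the database), here is a minimal sketch. The `PhotoRecord` fields and the bucket name are made up for the example, and the JSON-based size check is only a crude approximation of how a database such as DynamoDB measures a record against the 400 KB limit mentioned above.

```python
import json
from dataclasses import asdict, dataclass

# The 400 KB record limit quoted in the answer above.
RECORD_LIMIT_BYTES = 400 * 1024


@dataclass
class PhotoRecord:
    """Metadata only; the image itself lives in the object store."""
    photo_id: str
    owner: str
    gallery: str
    caption: str
    uploaded_at: str  # ISO 8601 timestamp
    s3_uri: str       # pointer to the image bytes, e.g. s3://bucket/key


def approx_record_size(record: PhotoRecord) -> int:
    """Approximate the stored size by the UTF-8 length of the JSON form."""
    return len(json.dumps(asdict(record)).encode("utf-8"))


def fits_in_record_limit(record: PhotoRecord) -> bool:
    """True if the metadata is comfortably a 'small record'."""
    return approx_record_size(record) <= RECORD_LIMIT_BYTES


example = PhotoRecord(
    photo_id="p-0001",
    owner="alice",
    gallery="holiday-2015",
    caption="Sunset over the bay",
    uploaded_at="2015-06-01T18:30:00Z",
    s3_uri="s3://example-photo-bucket/alice/p-0001.jpg",  # hypothetical bucket
)
```

The point of the sketch is that the record stays a few hundred bytes no matter how large the photo is, because only the `s3_uri` pointer is stored alongside the captions and timestamps.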

Upload bandwidth limits on Amazon S3

Are there any (documented or undocumented) upload bandwidth (per account/ip?) restrictions on Amazon S3?
I'm potentially looking at uploading a constant 200 Mbit/s, but I figure Amazon may have an issue with that.
I could not find any documented limits, and tests seem to suggest that the maximum throughput was greater than 8 Gbit/s in 2011.
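For a sense of scale, here is a small back-of-the-envelope calculation of how much data a constant 200 Mbit/s stream adds up to. It assumes decimal units (1 Mbit = 1,000,000 bits, 1 TB = 10^12 bytes) and a 30-day month; the function name is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_volume_tb(rate_mbit_per_s: float, days: float) -> float:
    """Decimal terabytes transferred at a constant rate over 'days' days."""
    bytes_per_second = rate_mbit_per_s * 1_000_000 / 8
    total_bytes = bytes_per_second * SECONDS_PER_DAY * days
    return total_bytes / 1e12


per_day = sustained_volume_tb(200, 1)     # about 2.16 TB per day
per_month = sustained_volume_tb(200, 30)  # about 64.8 TB per 30-day month
share_of_2011_test = 200 / 8000           # 200 Mbit/s is 2.5% of 8 Gbit/s

print(f"{per_day:.2f} TB/day, {per_month:.1f} TB/month")
```

So the throughput itself is small next to the 8 Gbit/s seen in the tests mentioned above; the monthly storage volume is the figure more worth planning around.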

On what factors does the download speed of assets from Amazon S3 depend?

How fast can we download files from Amazon S3? Is there an upper limit (which they distribute between all the requests from the same user), or does it only depend on my internet connection's download speed? I couldn't find it in their SLA.
What other factors does it depend on? Do they throttle the data transfer rate at some level to prevent abuse?
This has been addressed in the recent Amazon S3 team post Amazon S3 Performance Tips & Tricks:
First: for smaller workloads (<50 total requests per second), none of
the below applies, no matter how many total objects one has! S3 has a
bunch of automated agents that work behind the scenes, smoothing out
load all over the system, to ensure the myriad diverse workloads all
share the resources of S3 fairly and snappily. Even workloads that
burst occasionally up over 100 requests per second really don't need
to give us any hints about what's coming...we are designed to just
grow and support these workloads forever. S3 is a true scale-out
design in action.
S3 scales to both short-term and long-term workloads far, far greater
than this. We have customers continuously performing thousands of
requests per second against S3, all day every day. [...] We worked with other
customers through our Premium Developer Support offerings to help them
design a system that would scale basically indefinitely on S3. Today
we’re going to publish that guidance for everyone’s benefit.
[emphasis mine]
You may want to read the entire post to gain more insight into the S3 architecture and resulting challenges for really massive workloads (i.e., as stressed by the S3 team, it won't apply at all for most use cases).

Can I Have Two Regions Under One Amazon S3 Account?

I have two WordPress blogs, and I am planning to use Amazon S3 with one blog and Amazon S3 + CloudFront with the other.
I read that we need to choose a location when we start our AWS account.
However, for one site (the one using CloudFront and Amazon S3) my target market is the US and UK, and for the other site (the one using Amazon S3 alone) my target market is India.
In this case, should I use two separate accounts? Or can I have one single account with two locations (US and Asia)?
The one I am using CloudFront for will have video streaming, and the one that uses S3 alone will be heavy on images.
Thank you in advance
You can have multiple locations within the same account. When creating a bucket, you will be given a choice of which region to create it in. For example, you can have different buckets within the same account located in different regions.
Thanks,
Andy

Should I persist images on EBS or S3?

I am migrating my Java, Tomcat, MySQL server to AWS EC2.
I have already attached an EBS volume for storing the MySQL data. In my web application people may upload images, so I need to persist them. There are 2 alternatives in my mind:
Save uploaded images to EBS volume.
Use the S3 service.
The following are my notes. Please be skeptical about them, as my expertise is in software development, not servers.
EBS plus: S3 storage is more expensive ($0.15/GB > $0.10/GB).
S3 plus: Serving static files from EBS may affect my web server's performance negatively. Is this true? Does serving images affect server performance notably? With S3, my server will not be responsible for serving static files.
S3 plus: Serving static files from EBS may incur I/O costs, though they will probably be minor.
EBS plus: People say EBS is faster.
S3 plus: People say S3 is safer for persistence.
EBS plus: No need to learn an API; it is straightforward to save the images to an EBS volume.
In short, I cannot decide and would be happy for some guidance.
Thanks
The price comparison is not quite right:
S3 charges are $0.14 per GB USED, whereas EBS charges are $0.10 per GB PROVISIONED (the size of your EBS volume), whether you use it or not. As a result, S3 may or may not be cheaper than EBS.
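The used-versus-provisioned point is easy to check with a little arithmetic. This sketch uses the two prices quoted in this answer, which are historical figures and not current AWS pricing, and treats them as monthly per-GB rates (my assumption). The function names and the 1000 GB volume are just for illustration.

```python
S3_PRICE_PER_GB_USED = 0.14          # figure quoted in the answer above
EBS_PRICE_PER_GB_PROVISIONED = 0.10  # figure quoted in the answer above


def monthly_cost_s3(used_gb: float) -> float:
    """S3 bills only for the data actually stored."""
    return used_gb * S3_PRICE_PER_GB_USED


def monthly_cost_ebs(provisioned_gb: float) -> float:
    """EBS bills for the whole volume, full or not."""
    return provisioned_gb * EBS_PRICE_PER_GB_PROVISIONED


def break_even_utilisation() -> float:
    """Fraction of the EBS volume in use at which both cost the same."""
    return EBS_PRICE_PER_GB_PROVISIONED / S3_PRICE_PER_GB_USED


volume_gb = 1000  # hypothetical provisioned EBS volume
for used_gb in (500, 900):
    print(
        f"{used_gb} GB used: S3 ${monthly_cost_s3(used_gb):.2f} "
        f"vs EBS ${monthly_cost_ebs(volume_gb):.2f}"
    )
print(f"break-even at {break_even_utilisation():.1%} of the volume in use")
```

With these figures, S3 is cheaper as long as less than roughly 71% of the provisioned EBS volume is actually in use, and EBS is cheaper above that.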
I'm currently using S3 for a project and it's working extremely well.
EBS means you need to manage a volume + machines to attach it to. You need to add space as it's filling up and perform backups (not saying you shouldn't back up your S3 data, just that it's not as critical).
It also makes it harder to scale: when you want to add additional machines, you either need to pull off the images to a separate machine or clone the images across all. This also means you're adding a bottleneck: you'll have to manage your own upload process that will either upload to all machines or have a single machine managing it.
I recommend S3: it's set and forget. Any number of machines can be performing uploads in parallel and you don't really need to notify other machines about the upload.
In addition, you can use Amazon CloudFront as a cheap CDN in front of the images instead of directly downloading from S3.
I have architected solutions on AWS for stock photography sites that store millions of images spanning terabytes of data. I would like to share some of the best practices in AWS for your requirement:
P1) Store the original image file in the S3 Standard option
P2) Store reproducible images such as thumbnails in the S3 Reduced Redundancy option (RRS) to save costs
P3) Metadata about images, including the S3 URL, can be stored in Amazon RDS or Amazon DynamoDB depending upon the query complexity. Query the entries from Amazon RDS. If your queries are complex, it is also common practice to store the metadata in Amazon CloudSearch or Apache Solr.
P4) Deliver your thumbnails to users with low latency using Amazon CloudFront.
P5) Queue your image conversion through either SQS or RabbitMQ on Amazon EC2
P6) If you are planning to use EBS, note that EBS volumes do not scale along with your EC2 instances. So ideally you can use GlusterFS as a common storage pool for all your images. Multiple Amazon EC2 instances in an Auto Scaling group can still connect to it and read/write images.
You already outlined the advantages and disadvantages of both.
If you are planning to store terabytes of images, with storage requirements increasing day after day, S3 will probably be your best bet as it is built especially for these kinds of situations. You get unlimited storage space, without having to worry about sharding your data over many EBS volumes.
The recurring cost of S3 is about 50% higher than that of EBS. You will also have to learn the API and implement it in your application, but that is a one-off expense which I think you should be able to absorb very quickly.
Do you expect the images to last indefinitely?
The Amazon EBS FAQ is pretty clear; the annual failure rate is not "essentially zero"; they quote 0.1% to 0.5%. It's better than the disk under your desk, but it would need some kind of backup.
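To see why "indefinitely" matters here, this is a quick calculation of the chance of losing a volume at least once over several years, using the 0.1% to 0.5% annual failure rates quoted above. It assumes a constant rate and independent years, which is a simplification on my part.

```python
def loss_probability(annual_failure_rate: float, years: int) -> float:
    """Chance of at least one failure over 'years', assuming independent years."""
    survival = (1 - annual_failure_rate) ** years
    return 1 - survival


for rate in (0.001, 0.005):
    print(f"AFR {rate:.1%}: {loss_probability(rate, 10):.2%} over 10 years")
```

At the upper figure that works out to nearly a 5% chance of a failure within ten years for a single volume, which is why some kind of backup is needed if the images are meant to last.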