Using CloudFront, is it still useful to use different S3 region? - amazon-s3

Up to now, we have been using S3 to store our files, with buckets in different regions so that the data is closest to both our data generators and the people getting the data (many more GETs than POSTs, and the POSTer is typically close to the GETer).
We are moving to CloudFront for many reasons. So now the data is pushed to and fetched from the CloudFront endpoint closest to the user, which acts as a proxy to/from S3.
The question that now arises is whether there is still any reason to choose the bucket's region when storing our data.
GETs will not be faster, as they are served from the CF endpoint, except for the very first GET in a CF area after a "long" period without GETs.
POSTs will not be faster, as they are pushed to the CF endpoint.
The cost of CF does not seem to be affected by the region of the origin S3 bucket.

As you said, the region may not make any significant difference for GETs, since CloudFront serves cached objects from the edge location closest to the user, whatever the region of the origin bucket.
However, if you are writing (PUT) to S3 directly from the user side, it might be better to keep the bucket in a closer region.

Related

Difference between Data Transfer and GET request for Amazon S3

I was looking at my billing and noticed that Data Transfer made up almost 100% of my bill, so I want to be sure I understand exactly what Data Transfer entails, as opposed to a GET request.
Just for context, I host my website on a different server and have it hooked up to S3 to store user-generated files. These files are then made available for download. Does Data Transfer just cover the bandwidth used to download a file, or is it also used when one of the files stored on my S3 is displayed on my site? So for example, if I store an mp3 file on my S3 and display this file on the site to play (excluding the downloading), is that just a GET request that's being sent to get and display the file? To me the definitions are a little ambiguous. Any help!?
The GET per-request charge is the charge for handling the actual request for the file (checking whether it exists, checking permissions, fetching it from storage, and preparing to return it to the requester), each time it is downloaded.
The data transfer charge is for the actual transfer of the file's contents from S3 to the requester, over the Internet, each time it is downloaded.
If you include a link to a file on your site but the user doesn't download it and the browser doesn't load it to automatically play, or pre-load it, or something like that, S3 would not know anything about that, so you wouldn't be billed. That's also true if you are using pre-signed URLs -- those don't result in any billing unless they're actually used, because they're generated on your server.
If you include an image on a page, and the image is in S3, every time the page is viewed, you're billed for the request and the transfer, unless the browser has cached the image.
If you use CloudFront in front of S3, so that your image or download links point to CloudFront, you would pay S3 only the request charge, not the transfer charge, because CloudFront would be billing you the transfer charge instead of S3. CloudFront additionally bills a per-request charge of its own, but since CloudFront's data transfer charges are slightly cheaper than S3's in some regions, it's not necessarily a bad deal, by any means.
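To make the split between the two charges concrete, here is a rough sketch of the arithmetic for direct S3 downloads. The function name and both prices are placeholders I made up for illustration, not current AWS rates, so plug in the numbers from the pricing page for your region.

```python
def monthly_s3_cost(downloads, file_size_gb, price_per_1000_gets, price_per_gb_out):
    """Split a month's S3 download bill into its request and transfer parts."""
    # Per-request charge: billed per GET, independent of file size.
    request_charge = downloads / 1000 * price_per_1000_gets
    # Data transfer charge: billed per GB actually sent out to requesters.
    transfer_charge = downloads * file_size_gb * price_per_gb_out
    return request_charge, transfer_charge


# Assumed, illustrative inputs only: 100,000 downloads of a 5 MB mp3.
requests, transfer = monthly_s3_cost(
    downloads=100_000,
    file_size_gb=0.005,
    price_per_1000_gets=0.0004,  # placeholder price, not a real quote
    price_per_gb_out=0.09,       # placeholder price, not a real quote
)
print(f"requests: ${requests:.2f}, transfer: ${transfer:.2f}")
```

With inputs like these the transfer part dwarfs the request part, which matches a bill that is almost entirely Data Transfer.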

Proper way to name objects in mass storage service

As one of my personal projects moves forward, I wonder how I should organize the files (images, videos, audio files) uploaded by users onto AWS S3 / GCE Cloud Storage. I'm used to seeing these kinds of URLs:
Facebook fbcdn-sphotos-g-a.akamaihd.net/hphotos-ak-xft1/v/t1.0-9/11873531_1015...750483_5263546700711467249_n.jpg?oh=b3f06f7e...b7ebf7&oe=56392950&__gda__=1446569890_628...c7765669456
Tumblr 36.media.tumblr.com/686b47...e93fa09c2478/tumblr_nt7lnyP3ld1rqbl96o1_500.png
Twitter pbs.twimg.com/media/CMimixsV...AcZeM.jpg
Do these random characters carry some kind of meaning, or are they just "UUIDs"? Is there a performance/organization issue in using, for instance, the kind of URL below?
content.socialnetworkX.com/userY/post/customName_dinosaurs.jpg
EDIT: Let me be clear that I'm considering millions of files.
For S3, see the Performance Considerations page where it talks about object naming. Specifically, if you plan to upload objects at a high rate, you should avoid sequentially named objects, as they can be a bottleneck.
Google Cloud Storage does not have this performance bottleneck. See this answer.
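One common way to avoid sequential names on S3 is to put a short hash in front of the readable key, so that uploads spread across many prefixes. A minimal sketch, assuming a layout like the `userY/post/...` URL from the question; the function name `hashed_key` and the 4-character prefix length are my own choices, not anything S3 requires.

```python
import hashlib


def hashed_key(user_id: str, filename: str, prefix_len: int = 4) -> str:
    """Return an object key whose leading characters are pseudo-random."""
    natural_key = f"{user_id}/post/{filename}"
    # The digest is deterministic, so the same file always maps to the same key.
    digest = hashlib.sha256(natural_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{natural_key}"


key = hashed_key("userY", "customName_dinosaurs.jpg")
print(key)
```

The readable part of the name is kept after the prefix, so you can still tell what the object is when browsing the bucket.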

Unique challenge of s3 Bucket Policy for 'Grant/Restrictions' access

I read the directions for posting, so I will be as specific as possible.
I have an S3 bucket with numerous FLV files that I will be allowing customers to stream on THEIR domains.
What I am trying to accomplish is
Setting a bucket policy that 'GRANTS' access to specific domains (a list) to stream my bucket files from their domains.
A bucket policy that restricts a user to 'one stream' per domain. In other words, for each domain listed in the above policy, they can only stream one file at a time on their site.
The premise is a video site where customers will be streaming videos specific to their niche. I host and deliver the videos, but need some control over their delivery.
All files are in ONE bucket. There aren't any weird things going on with the files. It's very straightforward.
I just need the bucket policy control that would Grant and also Restrict the ability of my customers to stream my content from their domains.
I PRAY I have been clear enough, but please don't hesitate to ask if I have confused you...
Thanks VERY much
A
I don't think you can achieve what you want by simply setting access permissions to the bucket.
I checked in AccessControlList and CannedAccessControlList.
Your best bet will be to write a web service wrapper to access the bucket data.
You will have better control over the data you serve, and you might also explore keeping a cached copy of the data for better performance.
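As a rough idea of what such a wrapper would have to track, here is a sketch of the allow-list plus the "one stream per domain" rule. Everything in it is hypothetical (the `StreamGate` class, identifying the customer by the Referer header), and a Referer can be spoofed, so treat it as an outline of the bookkeeping rather than real access control.

```python
from urllib.parse import urlparse


class StreamGate:
    """Tracks at most one active stream per allowed customer domain."""

    def __init__(self, allowed_domains):
        self.allowed = set(allowed_domains)
        self.active = {}  # domain -> object key currently being streamed

    def start(self, referer, object_key):
        """Return True if this domain may start streaming object_key now."""
        domain = urlparse(referer).hostname
        if domain not in self.allowed:
            return False  # domain is not on the granted list
        if domain in self.active:
            return False  # this domain already has a stream running
        self.active[domain] = object_key
        return True

    def stop(self, referer):
        """Mark the domain's stream as finished so a new one can start."""
        self.active.pop(urlparse(referer).hostname, None)


gate = StreamGate(["customer.example"])
print(gate.start("http://customer.example/videos", "intro.flv"))
print(gate.start("http://customer.example/videos", "other.flv"))
```

When `start` returns True, the web service would then stream the file itself or hand back a short-lived link to the object.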

How to store pointer to S3 objects in Amazon SimpleDB?

I'm trying to figure out how to store a database consisting of metadata in Amazon SimpleDB, with the actual content the metadata refers to (videos) in S3. As I understand it, I should place a pointer in SimpleDB that refers to the videos in S3. What is this pointer, exactly? Is it the URL of the video located in S3?
Also, are there any code samples that would pertain to this?
Thanks!
You're right, just store the URL in SimpleDB and you're done.
What you're trying to do is described as a use case: http://aws.amazon.com/en/simpledb/usecases_metadata_indexing/
Taking a look at the code library, you can filter by S3 or SimpleDB and you'll find examples like SimpleDB PHP Sample Program Set and Travel Log - Sample Java Web Application.
Regards.
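To illustrate what the "pointer" could look like, here is a small sketch that builds the attributes for one video item. The attribute names, the `video_item` helper and the bucket name are made up for the example, and it does not call SimpleDB itself; you would pass the resulting attributes to whichever SimpleDB client you use.

```python
from urllib.parse import quote


def video_item(bucket, key, title, duration_s):
    """Build metadata attributes that point at a video stored in S3."""
    return {
        "title": title,
        "duration_s": str(duration_s),  # SimpleDB attribute values are strings
        "s3_bucket": bucket,
        "s3_key": key,
        # Virtual-hosted-style URL; the key is percent-encoded, slashes kept.
        "s3_url": f"https://{bucket}.s3.amazonaws.com/{quote(key)}",
    }


item = video_item("my-videos", "2011/intro clip.flv", "Intro", 93)
print(item["s3_url"])
```

Storing the bucket and key alongside the URL is optional, but it saves you from parsing the URL if you later need to sign requests or move buckets.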

Possible to get image from Amazon S3 but create it if it doesn't exist

I'm not sure how to word the question but here is what I am looking to do.
I have a site that uses custom map tile overlays on a google map.
The javascript calls a php file on my server that checks to see if an existing map tile exists for the given x, y, and zoom level.
If it exists, it displays that image using file_get_contents.
If it doesn't exist, it creates the new tile then displays it.
I would like to use Amazon S3 to store and serve the images, since there could end up being a lot of them and my server is slow. If I have my script check whether the image exists on Amazon and then display it, I am guessing I am not getting the benefits of the speed and Amazon's CDN. Is there a way to do this?
Or is there a way to try to pull the file from Amazon first, then set up something on Amazon to redirect to my script if the file's not there?
Maybe host the script on another of Amazon's services? The tile generation is also quite slow in some cases.
Thanks
Ideas:
1 - Use CloudFront, but point it to a cluster of tile generation machines. This way, you can generate the tiles on demand, and any future requests are served right from CloudFront.
2 - Use CloudFront, but back it with an S3 store of generated tiles. Turn on logging for the S3 bucket, so you can detect failed requests. Consume those logs on a schedule, and generate the missing tiles. This results in a cheaper way of generating tiles, but means that when a tile fails the user gets nothing.
3 - Just pre-generate all the tiles. Throw tasks in an SQS queue, then spin up a collection of EC2 instances to generate the tiles. This will cost the most up front, but all users get a fast experience.
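For idea 1, the origin behind CloudFront boils down to a "get or create" step. A minimal sketch of that logic, where a plain dict stands in for whatever tile store you use and `render` stands in for your slow tile generator (both, like the key layout, are assumptions for the example):

```python
def tile_key(x, y, zoom):
    """Build a storage key for a tile; the layout here is just an example."""
    return f"tiles/{zoom}/{x}/{y}.png"


def get_or_create_tile(store, x, y, zoom, render):
    """Return the tile bytes, rendering and saving them only on a miss."""
    key = tile_key(x, y, zoom)
    if key not in store:
        # Slow path: runs once per tile; later requests hit the store, and
        # with CloudFront in front most of them never reach the origin at all.
        store[key] = render(x, y, zoom)
    return store[key]


store = {}
tile = get_or_create_tile(store, 10, 20, 5, lambda x, y, z: b"fake-png-bytes")
print(sorted(store))
```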
I've written a blog post with a strategy for dealing with this. It's designed to make intelligent and thrifty use of CloudFront, maximize caching and deal with new versions of existing images. You may find the technique described there helpful. The example code shows how to handle different dimensions (i.e. thumbnails) of images. You could modify it to handle different zoom levels.
I need to update that post to support CloudFront custom origins, and I think that for your application you might be better off skipping S3 and using a custom origin. The advantage of a custom origin is simply that it's probably going to be easier to manage all of your images on your local filesystem compared to managing them on S3.