Amazon CloudFront sometimes ignoring HTTP headers from S3

I have a CloudFront distribution that serves files from an AWS S3 bucket.
Check the following files:
http://s3.amazonaws.com/kekanto_static/css/site.v70.css
http://s3.amazonaws.com/kekanto_static/css/site.v71.css
If you take a look at the headers, you'll see that BOTH files contain the entry:
Content-Encoding: gzip
Both are properly gzipped in the S3 bucket, as they should be.
But when served from CloudFront, the Content-Encoding header is missing:
http://img.kekanto.com/css/site.v70.css
http://img.kekanto.com/css/site.v71.css
Any idea why this is happening?
I also checked the CloudFront endpoints:
http://d1l2emnu9r9dtc.cloudfront.net/css/site.v70.css
http://d1l2emnu9r9dtc.cloudfront.net/css/site.v71.css
And the problem remains the same.
EDIT:
It looks like everything is working properly again after an invalidation, so you won't be able to reproduce the problem with the URIs I've given.
All I can think of is that S3 makes a newly uploaded file available before its headers become available, so CloudFront sometimes fetches and caches the object without them.
How can I prevent this, other than making my upload process sleep for a while before letting my web servers know there's a new version?
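One way to rule out a metadata race is to attach the Content-Encoding metadata in the same PUT that uploads the object, rather than setting it in a follow-up copy or metadata update. A minimal sketch using boto3 (the bucket and key are taken from the URLs above; treat them as placeholders):

    import gzip
    import boto3

    s3 = boto3.client("s3")

    # Compress the file and upload it in a single PUT, with the
    # Content-Encoding and Content-Type metadata attached atomically.
    with open("site.v71.css", "rb") as f:
        body = gzip.compress(f.read())

    s3.put_object(
        Bucket="kekanto_static",
        Key="css/site.v71.css",
        Body=body,
        ContentType="text/css",
        ContentEncoding="gzip",
    )

Because put_object writes the object and its metadata together, there is no window in which the file exists without its headers; if the headers were previously added in a second step, that could explain CloudFront occasionally caching a headerless response.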

Related

How best to serve versioned S3 files from CloudFront?

I have a CloudFront distribution that has two origins in it: one origin that is a static S3 bucket (which is cached by CloudFront), and another origin that is a dynamic server (which is not cached). When users log into my application, the dynamic origin redirects users to the static, cached S3 bucket origin.
Right now, I'm handling the versioning of my S3 bucket by doing the following: I prepend the version of my code to the path of the S3 bucket on every release (so if my S3 bucket's path is normally /static/ui, it now becomes /v1.2/static/ui). The S3 bucket's cache behavior in CloudFront has the path pattern /static/ui, BUT the origin settings for the S3 bucket has the origin path /v1.2. Unfortunately, because the origin path isn't included in my cache behavior, whenever I have to change it to point to a new version, I have to invalidate my cache so that CloudFront will check the new origin path.
So, the release process goes like this:
I create a new version of my UI code and add it to S3, and prepend the version to my S3 bucket's path (creating a path that looks like this /v1.2/static/ui).
I change my "Origin Path" value in CloudFront associated with my S3 origin to have the new version in it (so it becomes /v1.2). This makes it so that all requests to my CloudFront distribution get forwarded to my origin with /v1.2 prepended to the origin path.
I invalidate my CloudFront cache.
This method of versioning my UI works, but is there a better way to do it? I'd like to be able to handle versioning my S3 bucket without having to invalidate the cache every time I change versions.
I ended up versioning my S3 bucket and handling my CloudFront cache busting by doing the following:
I changed the way that Webpack built my static assets by adding a hash to all of the files Webpack builds except for my index.html, which now points to all of my hashed filenames. So, for example, previously, a given JavaScript chunk that Webpack builds was called 12.js, and now it might be called 0b2fb3071ad992833dd4.js, and the index.html now references that new hashed filename. The file hash is generated by the content within the file, and will change if the content in the file changes.
I make sure that when my static files are uploaded to S3, the index.html file has the header Cache-Control: no-cache sent out with every request to the index.html file in my S3 bucket (see the upload sketch after these steps). This makes it so that CloudFront never caches the index.html file, but will still cache the hashed filenames that the index.html points to.
I prepend the version of my static files to the S3 bucket's path where I upload my static assets (so for example, my S3 bucket's path might look like /v1.2/static/ui/index.html when I make a request to my index.html file).
Upon every new release, I update my CloudFront origin path for my S3 bucket origin to point to the new version of my UI (so it might change from /v1.2 to /v1.3).
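To illustrate the Cache-Control and versioned-path steps above, uploading a build might look roughly like this with boto3 (the bucket name, version prefix, and build directory are placeholders):

    import os
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-ui-bucket"        # placeholder bucket name
    PREFIX = "v1.2/static/ui"      # version-prefixed path, as described above

    for name in os.listdir("dist"):
        # index.html must never be cached; hashed assets can live for a week.
        cache = "no-cache" if name == "index.html" else "max-age=604800"
        s3.upload_file(
            os.path.join("dist", name),
            BUCKET,
            f"{PREFIX}/{name}",
            ExtraArgs={"CacheControl": cache},
        )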
Because my index.html isn't cached by CloudFront, and because all the other static files that the index.html points to are hashed and will have different file names upon each release (if the file changed at all), I no longer need to manually send a CloudFront cache invalidation to my S3 bucket's path upon every release: the index.html file will point to entirely new files, which won't be in the CloudFront cache until users first request them.
This also means that switching my CloudFront origin path will now instantaneously switch users to different versions of my UI, and I don't have to wait the estimated 5 minutes for a manual CloudFront cache invalidation to take effect.
Lastly, this method of cache busting also works with browser-side caching, so I was able to have files saved in my S3 bucket send a Cache-Control: max-age=604800 header to users for all files except for my index.html file and enable browser-side caching for my users. The browser-side cache is invalidated when the index.html points to a new hashed filename for its static assets. This greatly improved the performance of my application.
This all came at the cost of not being able to cache my index.html file in CloudFront or the user's browser, but I think the benefits of caching this way outweigh the cost.
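For the release step that flips the origin path, the CloudFront API follows a read-modify-write pattern. A hedged sketch, assuming a single-origin distribution and a placeholder distribution ID:

    import boto3

    cf = boto3.client("cloudfront")
    DIST_ID = "E1ABCDEF234567"  # placeholder distribution ID

    # Fetch the current config and its ETag, change the origin path,
    # then write the config back with IfMatch set to that ETag.
    resp = cf.get_distribution_config(Id=DIST_ID)
    config = resp["DistributionConfig"]
    config["Origins"]["Items"][0]["OriginPath"] = "/v1.3"  # new release
    cf.update_distribution(
        Id=DIST_ID,
        DistributionConfig=config,
        IfMatch=resp["ETag"],
    )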

My CloudFront/S3 cache averages 60% misses even using Managed-CachingOptimized settings – why?

I have a Cloudfront distribution which has a single S3 origin serving static files. These files all have cache-control: public,max-age=31536000 (one year) as metadata, but when I view the distribution statistics, I see a consistent 60% Miss rate.
I can see that the objects with the lowest Hit rates (usually around 50%) are all from a specific folder, which my Django app uploads thumbnails to. These files still have the same headers, though, so I can't figure out why they're different.
For example, when I load this file (S3 origin, CloudFront link) in my browser, I see age: 1169380 and x-cache: Hit from cloudfront headers. But if I curl the same URL, I see x-cache: Miss from cloudfront and no age header; if I curl again, the age begins incrementing from 0 (and I see a Hit).
This feels wrong to me: the cache policy I'm using is a CloudFront default (Managed-CachingOptimized), which doesn't forward any headers or query strings, so why does my curl command trigger a call to origin when I just loaded the same file via my browser and got a cached response?
It's possible I've misunderstood how CloudFront is supposed to cache these files, so I'd appreciate any pointers.
(If it helps, this page will give a bunch of URLs which show the issue, eg. any image under https://static.scenepointblank.com/thumbs/*)
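One thing worth checking (an assumption, not a confirmed diagnosis): the Managed-CachingOptimized policy enables gzip/brotli compression support, which makes CloudFront fold a normalized Accept-Encoding into the cache key even though the header isn't "forwarded". Browsers send Accept-Encoding: gzip while a bare curl sends none, so the two requests can land on different cache entries. A quick comparison sketch using requests (the URL is a placeholder under the path from the question):

    import requests

    URL = "https://static.scenepointblank.com/thumbs/example.jpg"  # placeholder

    # Passing None as a header value makes requests omit its default
    # Accept-Encoding header entirely, mimicking a bare curl request.
    for enc in (None, "gzip"):
        r = requests.head(URL, headers={"Accept-Encoding": enc})
        print(enc, r.headers.get("x-cache"), r.headers.get("age"))

If the two variants report independent Hit/Miss and age values, the "extra" misses are separate cache entries per encoding rather than a broken policy.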

S3 never-ending pending audio requests

I have an mp3 file on S3 (and have experienced this with many other mp3 files) that is not playing in Chrome (or other browsers: Firefox, Safari, etc.). The network dialog in Chrome shows a pending request that S3 seemingly never responds to, yet if I wget the URL, I get an immediate response.
Additionally, if I serve the exact same file off of a server running nginx, I can access the URL in Chrome instantaneously. I know that S3 supports byte-range requests, so there should be no issue with Chrome's byte-range queries. I've also verified that the file is accessible and that its content type is audio/mpeg.
Here is the file in question:
http://s3.amazonaws.com/josh-tmdbucket/23/talks/ffcc525a0761cd9e7023ab51c81edb781077377d.mp3
Here is a screenshot of the network requests in chrome for that URL:
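As a quick sanity check of the byte-range claim above, S3 should answer a ranged GET with 206 Partial Content; a small probe, using the URL from the question:

    import requests

    URL = ("http://s3.amazonaws.com/josh-tmdbucket/23/talks/"
           "ffcc525a0761cd9e7023ab51c81edb781077377d.mp3")

    # Browsers fetch audio with Range headers; S3 should reply 206
    # and include a Content-Range header if range requests work.
    r = requests.get(URL, headers={"Range": "bytes=0-99"})
    print(r.status_code)                   # expect 206
    print(r.headers.get("Content-Range"))  # e.g. bytes 0-99/4096000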
I solved this by creating a CloudFront distribution for the bucket. For example, if you have a bucket named example-bucket, go to CloudFront and click Create Distribution; the bucket will appear under Origin Domain Name as example-bucket.s3.amazonaws.com.
Once the distribution is deployed, you load the content through the distribution's domain name rather than the raw S3 URL.
This worked for me, but I am not sure if it will work for others.
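For reference, roughly the API equivalent of those console steps, as a hedged boto3 sketch (all names are placeholders; this uses the older required config fields rather than a cache policy):

    import time
    import boto3

    cf = boto3.client("cloudfront")
    cf.create_distribution(DistributionConfig={
        "CallerReference": str(time.time()),   # must be unique per distribution
        "Comment": "serve example-bucket via CloudFront",
        "Enabled": True,
        "Origins": {"Quantity": 1, "Items": [{
            "Id": "example-bucket-origin",
            "DomainName": "example-bucket.s3.amazonaws.com",
            "S3OriginConfig": {"OriginAccessIdentity": ""},
        }]},
        "DefaultCacheBehavior": {
            "TargetOriginId": "example-bucket-origin",
            "ViewerProtocolPolicy": "allow-all",
            "ForwardedValues": {"QueryString": False,
                                "Cookies": {"Forward": "none"}},
            "TrustedSigners": {"Enabled": False, "Quantity": 0},
            "MinTTL": 0,
        },
    })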
Had same exact issue with files.
Original url looked like this =>
https://my-bucket-name.s3-eu-west-1.amazonaws.com/EIR.mp4
Added CloudFront distribution and it solved all my issues.
Url changed only a bit:
https://my-bucket-name.s3.amazonaws.com/EIR.mp4
(but you can modify it a little while creating distribution / even setting your own DNS if you wish).

Flushing CloudFront's cache when S3 files are deleted

I have set up S3 with CloudFront as my CDN. As you know, when files from an S3 bucket are requested through CloudFront, they get cached at the edge locations for best performance.
If I delete files from S3, they remain in the CDN's cache and are still served to end users. How can I prevent this behavior? I want CloudFront to serve only the files that are actually available in S3 storage.
Thanks in advance.
You can invalidate objects on CloudFront using the API or the Console. When you do this, the files are removed from CloudFront's edge caches.
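A minimal invalidation sketch with boto3 (the distribution ID and path are placeholders; wildcard paths like /images/* also work):

    import time
    import boto3

    cf = boto3.client("cloudfront")
    cf.create_invalidation(
        DistributionId="E1ABCDEF234567",   # placeholder distribution ID
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/images/deleted-file.png"]},
            "CallerReference": str(time.time()),  # must be unique per request
        },
    )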

Amazon S3 Cloudfront Deployment Best Practice

Our current plan for a site is to use Amazon's CloudFront service as a CDN for asset files such as CSS, JavaScript, images, and any other static files.
We currently have one bucket in S3 that contains all of these static files. The files are separated into different folders depending on what they are: "Scripts" are JS files, "Images" are images, etc.
So, what I didn't realize from the start was that once you point a CloudFront distribution at an S3 bucket, subsequent updates to the bucket don't automatically propagate through that distribution; the cached copies keep being served. So it looks as if you have to redeploy the bucket to another CloudFront distribution every time you have a static file update.
That's fine for images, because we can easily make sure that if there is a change to an image, then we just create a new image. But, that's difficult to do for CSS and JS.
So, that gets me to the Best Practice questions:
Is it best practice to create another CloudFront distribution for every production deployment? The problem there is that it causes trouble with CNAME records.
Is it best practice NOT to keep CSS and JS in CloudFront because of the nature of those files and their need to be easily modified? It seems like the answer should be no, because serving them is exactly what a CDN is for.
Is there some other method with Cloudfront that I don't know about?
You can issue invalidation requests to CloudFront.
http://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html
Instead of an S3 bucket, though, we use our own server as a custom origin. We have an .htaccess rule that aliases style_*.css to style.css, and we inject the file modification time of style.css into the HTML. Since CloudFront sees a totally different URL each time the file changes, it fetches the new version.
(Note: some CDNs let you do this via the query string, but CloudFront ignores all query string data for caching, hence the .htaccess solution.)
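The version-stamping half of that trick might look like this (a sketch with placeholder paths; the server-side alias from style_<mtime>.css back to style.css is assumed to be handled by the .htaccess rule described above):

    import os

    def stamped_url(path, base="https://example.cloudfront.net"):
        """Build a URL whose filename embeds the file's modification time."""
        mtime = int(os.path.getmtime(path))  # changes whenever the file changes
        name, ext = os.path.splitext(os.path.basename(path))
        return f"{base}/{name}_{mtime}{ext}"

    # Inject into the HTML, e.g.:
    # <link rel="stylesheet" href="https://.../style_1700000000.css">
    print(stamped_url("static/style.css"))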
edit: CloudFront can be (optionally) configured to use query strings now.
CloudFront has started supporting query strings, which you can use for cache busting: a new query string value is cached as a separate object, so clients fetch the fresh version.
http://aws.typepad.com/aws/2012/05/amazon-cloudfront-support-for-dynamic-content.html
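With query string forwarding enabled, the same cache-busting idea works without the .htaccess alias; a hypothetical sketch using a short content hash as the version parameter:

    import hashlib

    def versioned_url(path, base="https://example.cloudfront.net"):
        """Append a content-derived query string so updates get new cache keys."""
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()[:8]
        return f"{base}/{path}?v={digest}"

    # e.g. https://example.cloudfront.net/css/style.css?v=3f2a91bc
    print(versioned_url("css/style.css"))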