How does versioning work on Amazon CloudFront?

I've just set up a static website on Amazon S3. I'm also using the CloudFront CDN service.
According to Amazon, there are two methods available for clearing the CloudFront cache: invalidation and versioning. My question is about the latter.
Consider the following example:
I link to an image file (image.jpg) from my index.html file. I then decide to replace the image, so I upload a second image with the filename image_2.jpg and update the link in my index.html file.
Will the changes take effect automatically, or is some further action required?
What triggers the necessary changes, given that the edited and newly uploaded files live in the bucket and not in the cache?

Versioning in CloudFront is nothing more than adding (or prefixing) a version to the name of the object, or to the 'folder' in which objects are stored. There are two common approaches:
Put all objects in a versioned folder such as v1 and use a URL like https://xxx.cloudfront.net/v1/image.png
Give all objects a version in their name, such as image_v1.png, and use a URL like https://xxx.cloudfront.net/image_v1.png
The second option is often a bit more work, but you only re-upload the files that actually changed (cheaper in terms of storage). The first option is often clearer and requires less work.
Using CloudFront versioning requires more S3 storage, but is often cheaper than creating many invalidations.
The other way to refresh the cache is to create invalidations, which can get expensive. If you don't really need invalidations but just want quicker cache refreshes (the default TTL is 24 hours), you can lower the TTL settings on the cache behavior, or set a cache duration for an individual object via its Cache-Control metadata (object level).
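If you do end up needing an invalidation, it can also be created programmatically. Here's a minimal sketch using the AWS SDK for Java v1; the distribution ID is a placeholder:

import com.amazonaws.services.cloudfront.AmazonCloudFront;
import com.amazonaws.services.cloudfront.AmazonCloudFrontClientBuilder;
import com.amazonaws.services.cloudfront.model.CreateInvalidationRequest;
import com.amazonaws.services.cloudfront.model.InvalidationBatch;
import com.amazonaws.services.cloudfront.model.Paths;

AmazonCloudFront cloudFront = AmazonCloudFrontClientBuilder.defaultClient();

// The caller reference must be unique per request; a timestamp works for ad-hoc use
InvalidationBatch batch = new InvalidationBatch(
        new Paths().withItems("/index.html").withQuantity(1),
        String.valueOf(System.currentTimeMillis()));

// "EXXXXXXXXXXXXX" stands in for your distribution ID
cloudFront.createInvalidation(new CreateInvalidationRequest("EXXXXXXXXXXXXX", batch));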

Your CloudFront distribution has a cache TTL, which determines how long a file is served from the cache regardless of when the source changes.
If you need it updated right away, create an invalidation for your index.html file.

I'll chime in on this in case anyone else comes here looking for what I did. You can set up CloudFront with S3 versioning enabled and reference specific S3 object versions if you know which version you need. I put it behind a signed CloudFront URL and ended up with this in the Java SDK:
import java.net.URL;
import com.amazonaws.services.cloudfront.CloudFrontUrlSigner;
import com.amazonaws.services.cloudfront.util.SignerUtils;

S3Properties s3Properties... // Custom properties pulled from a config file

// Target a specific S3 object version through the distribution
String cloudfrontUrl = "https://" + s3Properties.getCloudfrontDomain() + "/" +
        documentS3Key + "?versionId=" + documentS3VersionId;

// Sign the URL with a canned policy; the last argument is the expiration Date
URL cloudfrontSignedUrl = new URL(CloudFrontUrlSigner.getSignedURLWithCannedPolicy(
        cloudfrontUrl,
        s3Properties.getCloudfrontKeypairId(),
        SignerUtils.loadPrivateKey(s3Properties.getCloudfrontKeyfilePath()),
        getPresignedUrlExpiration()));
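One caveat: for the versionId parameter to reach S3, the distribution's cache behavior must forward query strings (or include them in the cache key); otherwise CloudFront strips them before contacting the origin.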

Related

How best to serve versioned S3 files from CloudFront?

I have a CloudFront distribution that has two origins in it: one origin that is a static S3 bucket (which is cached by CloudFront), and another origin that is a dynamic server (which is not cached). When users log into my application, the dynamic origin redirects users to the static, cached S3 bucket origin.
Right now, I'm handling the versioning of my S3 bucket by doing the following: I prepend the version of my code to the path of the S3 bucket on every release (so if my S3 bucket's path is normally /static/ui, it now becomes /v1.2/static/ui). The S3 bucket's cache behavior in CloudFront has the path pattern /static/ui, BUT the origin settings for the S3 bucket has the origin path /v1.2. Unfortunately, because the origin path isn't included in my cache behavior, whenever I have to change it to point to a new version, I have to invalidate my cache so that CloudFront will check the new origin path.
So, the release process goes like this:
I create a new version of my UI code and add it to S3, and prepend the version to my S3 bucket's path (creating a path that looks like this /v1.2/static/ui).
I change my "Origin Path" value in CloudFront associated with my S3 origin to have the new version in it (so it becomes /v1.2). This makes it so that all requests to my CloudFront distribution get forwarded to my origin with /v1.2 prepended to the origin path.
I invalidate my CloudFront cache.
This method of versioning my UI works, but is there a better way to do it? I'd like to be able to handle versioning my S3 bucket without having to invalidate the cache every time I change versions.
I ended up versioning my S3 bucket and handling my CloudFront cache busting by doing the following:
I changed the way Webpack builds my static assets by adding a hash to all of the files it builds, except for my index.html, which now points to the hashed filenames. So, for example, a given JavaScript chunk that Webpack built used to be called 12.js, and now it might be called 0b2fb3071ad992833dd4.js, with index.html referencing that new hashed filename. The hash is generated from the file's content and changes whenever the content changes.
I make sure that when my static files are uploaded to S3, the index.html file is stored with a Cache-Control: no-cache header that is sent with every response for index.html from my S3 bucket. This makes it so that CloudFront never caches the index.html file, but still caches the hashed files that index.html points to (see the upload sketch after these steps).
I prepend the version of my static files to the S3 bucket's path where I upload my static assets (so for example, my S3 bucket's path might look like /v1.2/static/ui/index.html when I make a request to my index.html file).
Upon every new release, I update my CloudFront origin path for my S3 bucket origin to point to the new version of my UI (so it might change from /v1.2 to /v1.3).
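As a rough illustration of the Cache-Control and versioned-path steps above, the uploads might look like this with the AWS SDK for Java v1 (bucket name, keys, and local paths are made up):

import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

// index.html: never cached by CloudFront or the browser
ObjectMetadata noCache = new ObjectMetadata();
noCache.setContentType("text/html");
noCache.setCacheControl("no-cache");
s3.putObject(new PutObjectRequest("my-ui-bucket", "v1.2/static/ui/index.html",
        new File("dist/index.html")).withMetadata(noCache));

// Hashed assets: safe to cache for a week, since a content change renames the file
ObjectMetadata longLived = new ObjectMetadata();
longLived.setContentType("application/javascript");
longLived.setCacheControl("max-age=604800");
s3.putObject(new PutObjectRequest("my-ui-bucket", "v1.2/static/ui/0b2fb3071ad992833dd4.js",
        new File("dist/0b2fb3071ad992833dd4.js")).withMetadata(longLived));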
Because my index.html isn't cached by CloudFront, and because all the other static files it points to are hashed and get new file names on each release (if their content changed at all), I no longer need to manually send a CloudFront cache invalidation on every release: the new index.html points to entirely new files, which aren't in CloudFront's cache until users first request them.
This also means that switching my CloudFront origin path instantaneously moves users to the new version of my UI, and I don't have to wait the estimated 5 minutes for a manual CloudFront cache invalidation to take effect.
Lastly, this method of cache busting also works with browser-side caching, so I was able to have files saved in my S3 bucket send a Cache-Control: max-age=604800 header to users for all files except for my index.html file and enable browser-side caching for my users. The browser-side cache is invalidated when the index.html points to a new hashed filename for its static assets. This greatly improved the performance of my application.
This all came at the cost of not being able to cache my index.html file in CloudFront or the user's browser, but I think the benefits of caching this way outweigh the cost.
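For what it's worth, the origin-path switch in the final step can also be scripted rather than done in the console. A rough sketch with the AWS SDK for Java v1, assuming the S3 origin is the first origin in the distribution and using a placeholder distribution ID:

import com.amazonaws.services.cloudfront.AmazonCloudFront;
import com.amazonaws.services.cloudfront.AmazonCloudFrontClientBuilder;
import com.amazonaws.services.cloudfront.model.DistributionConfig;
import com.amazonaws.services.cloudfront.model.GetDistributionConfigRequest;
import com.amazonaws.services.cloudfront.model.GetDistributionConfigResult;
import com.amazonaws.services.cloudfront.model.UpdateDistributionRequest;

AmazonCloudFront cloudFront = AmazonCloudFrontClientBuilder.defaultClient();

// Fetch the current config; its ETag is required for the conditional update
GetDistributionConfigResult current = cloudFront.getDistributionConfig(
        new GetDistributionConfigRequest().withId("EXXXXXXXXXXXXX"));

DistributionConfig config = current.getDistributionConfig();
config.getOrigins().getItems().get(0).setOriginPath("/v1.3"); // point at the new release

cloudFront.updateDistribution(new UpdateDistributionRequest()
        .withId("EXXXXXXXXXXXXX")
        .withIfMatch(current.getETag())
        .withDistributionConfig(config));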

Is Cache Invalidation on S3 a One Time Event or a Type of Policy?

I have a set of files served by Amazon CloudFront from S3 that I do not want cached. I was able to create an invalidation on a single file, and it seemed to work. However, going forward the file seems to be cached again, even though the invalidation entry is still present. Is the invalidation a one-time event? Does anyone know the exact details of how this works?
I would like a set of files to basically never be cached going forward.
Thanks for any suggestions and best practices advice.
Invalidation removes a cached entry from CloudFront's edge locations, but has no impact on whether or not the invalidated object(s) are cached again in the future. All else held equal: after you issue an invalidation, objects that were previously cached will be cached again on subsequent requests.
Before we explore the options, two definitions that are important to understand:
Cache behaviors are effectively routes with dedicated configurations applying only to requests matching the route (known as a path pattern)
Cache policies are instructions for how CloudFront will cache your responses. Cache policies are attached to one or more cache behaviors. The min and max TTL set a floor and ceiling on the value returned in your Cache-Control/Expires headers. The default TTL determines the length of time to cache a response when you don't provide a Cache-Control/Expires header.
Do you want to prevent caching for all files in your S3 bucket?
Attach the CachingDisabled cache policy (provided by CloudFront) to your default cache behavior.
Do you want to prevent caching for only certain files in your S3 bucket?
If the files you do not want to cache live in the same directory, create a cache behavior to match that path and use the CachingDisabled cache policy (provided by CloudFront) to prevent files in that directory from being cached. This instructs CloudFront to use a cache policy that does not cache responses when processing requests that match a specific path/route.
Set a Cache-Control header as metadata on the objects in S3 to instruct CloudFront not to cache, while caching the other objects.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html (Scroll down to Adding headers to your objects using the Amazon S3 console)
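If you'd rather script that second option than click through the console, the SDK equivalent is a self-copy that replaces the object's metadata. A small sketch with the AWS SDK for Java v1 (bucket and key are placeholders):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CopyObjectRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;

AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

// S3 object metadata is immutable, so copy the object onto itself
// with replacement metadata that carries the Cache-Control header
ObjectMetadata meta = s3.getObjectMetadata("my-bucket", "reports/latest.json").clone();
meta.setCacheControl("no-cache");

s3.copyObject(new CopyObjectRequest("my-bucket", "reports/latest.json",
        "my-bucket", "reports/latest.json").withNewObjectMetadata(meta));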

How to make browser download html when its content changed in s3?

I am using an S3 bucket to host my website. Whenever I release a new version, I want all clients to download it from S3 instead of reading it from their browser cache. I know I can set an expiry time on the objects saved in the S3 bucket, but that is not an ideal solution, since users would keep using the cached content until it expires. Is there a way to force the browser to re-download content when it changes in the S3 bucket?
Irrespective of whether you are using an S3 bucket or any other hosting server, browser caching can be controlled by appending a content hash to the file name.
For example, your JS bundle name should look like bundle.7e2c49a622975ebd9b7e.js.
When you deploy again, it will change to some other hash value, such as bundle.205199ab45963f6a62ec.js.
This way the browser automatically knows that a new file has arrived and downloads it again.
This can be done easily with any popular bundler such as grunt, gulp, or webpack; webpack's caching guide, for example, shows how to do it with a [contenthash] substitution in the output file name.
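To make the mechanism concrete, here is a minimal Java sketch of the same idea, independent of any bundler; the name is derived from the file's bytes, so it changes exactly when the content does (paths are made up, and HexFormat needs Java 17+):

import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

// Fingerprint the bundle contents and bake the hash into the file name
byte[] bytes = Files.readAllBytes(Path.of("dist/bundle.js"));
byte[] digest = MessageDigest.getInstance("SHA-256").digest(bytes);
String hash = HexFormat.of().formatHex(digest).substring(0, 20);

String hashedName = "bundle." + hash + ".js"; // e.g. bundle.7e2c49a622975ebd9b7e.js
Files.copy(Path.of("dist/bundle.js"), Path.of("dist/" + hashedName));
// ...then reference hashedName from index.html, which itself should not be cached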

Are new items added to a S3 bucket automatically served over CDN?

I have a bucket of images on S3. I set up the bucket contents to be delivered over a CDN using CloudFront. Now if I add new images to the bucket, do I need to set up CloudFront for the bucket again, or are new items distributed automatically?
The items aren't automatically distributed, but there's nothing else you need to configure. The CDN nodes look locally for cached objects affiliated with that bucket. If the object doesn't exist but needs to be served from the CDN, the node requests it from S3 and caches it for a predetermined amount of time.
It's actually a lot like "DNS propagation", a common misnomer by which some people think DNS setting changes "propagate" around the world. In reality, the world's servers request the information then cache it locally. You're not waiting for DNS changes to propagate; you're waiting for the cached settings to expire. So it goes with CloudFront: New objects are served and cached when they're first requested. If you replace an object with a new object by the same name, CDN nodes that cached the object will show the old one until it expires.

Amazon S3 Cloudfront Deployment Best Practice

Our current plan for a site is to use Amazon's CloudFront service as a CDN for asset files such as CSS, JavaScript, images, and any other static files.
We currently have one bucket in S3 that contains all of these static files. The files are separated into different folders depending on what they are: 'Scripts' for JS files, 'Images' for images, and so on.
So, what I didn't realize from the start was that once you point a CloudFront distribution at an S3 bucket, subsequent updates to the bucket's files don't show up in that distribution until the cached copies expire. So it looks as if you have to redeploy the bucket to another CloudFront instance every time you have a static file update.
That's fine for images, because we can easily make sure that if an image changes, we create a new image. But that's difficult to do for CSS and JS.
So, that gets me to the Best Practice questions:
Is it best practice to create another CloudFront distribution for every production deployment? The problem there is that it causes trouble with CNAME records.
Is it best practice NOT to warehouse CSS and JS in CloudFront because of the nature of those files and their need to be easily modified? It seems like the answer should be no, because serving them is exactly what a CDN is for.
Is there some other method with CloudFront that I don't know about?
You can issue invalidation requests to CloudFront.
http://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html
Instead of an S3 bucket, though, we use our own server as a custom origin. We have an .htaccess rule that aliases style_*.css to style.css, and we inject the file modification time of style.css into the HTML. Since CloudFront sees a completely different URL, it fetches the new version.
(Note: Some CDNs let you do that via query string, but CloudFront ignores all query string data for caching, hence the .htaccess solution.)
edit: CloudFront can be (optionally) configured to use query strings now.
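With query-string forwarding enabled, the same cache-busting trick works without the .htaccess aliasing; a tiny Java sketch (the domain is a placeholder):

import java.io.File;

// Use the file's modification time as a version parameter; every new value
// is a new cache key for CloudFront, so updated files are fetched fresh
File css = new File("/var/www/static/style.css");
String versionedUrl = "https://d111111abcdef8.cloudfront.net/style.css?v=" + css.lastModified();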
CloudFront now supports query strings, which you can use for cache busting.
http://aws.typepad.com/aws/2012/05/amazon-cloudfront-support-for-dynamic-content.html