CF Distribution status (edge servers)

Is there a way for me to use the AWS CLI/API to verify that my static file(s) have reached the edge servers once I've updated my S3 origin?
The flow I'm imagining:
Update the S3 origin -> wait for CloudFront to report that my update has reached all CF edge servers -> continue with my CI operations.

Unlike with some other providers, you don't push your content to CloudFront. Instead, you place the CDN in front of your resources, and they are cached as they are retrieved from the origin on a per-file basis. You can control the TTL of the cache by setting the right Cache-Control headers. If you want the cache to be flushed and files re-retrieved from the origin before the TTL expires, you can invalidate the cache. This clears the CDN cache, and CloudFront will look up the resources at the origin again for a given URL.
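Since the question asks specifically about the CLI/API, here is a minimal boto3 sketch of that invalidation step (assuming Python and boto3; the distribution ID and path are placeholders). The waiter returns once CloudFront reports the invalidation has completed at every edge location, which is the closest thing to a propagation signal you get: the stale copies are gone, and edges will pull the fresh objects from S3 on the next request.

    # Minimal sketch, assuming boto3 is installed and AWS credentials are configured.
    import time
    import boto3

    DISTRIBUTION_ID = "E1234567890ABC"  # placeholder distribution ID

    cloudfront = boto3.client("cloudfront")

    # Ask CloudFront to drop the cached copies of the files just replaced in S3.
    response = cloudfront.create_invalidation(
        DistributionId=DISTRIBUTION_ID,
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/static/*"]},  # placeholder path
            "CallerReference": str(time.time()),  # must be unique per request
        },
    )
    invalidation_id = response["Invalidation"]["Id"]

    # Block until the invalidation status is "Completed" at all edge locations,
    # then let the CI pipeline continue.
    waiter = cloudfront.get_waiter("invalidation_completed")
    waiter.wait(DistributionId=DISTRIBUTION_ID, Id=invalidation_id)
    print("Invalidation complete - safe to continue the CI pipeline")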

Related

How best to serve versioned S3 files from CloudFront?

I have a CloudFront distribution that has two origins in it: one origin that is a static S3 bucket (which is cached by CloudFront), and another origin that is a dynamic server (which is not cached). When users log into my application, the dynamic origin redirects users to the static, cached S3 bucket origin.
Right now, I'm handling the versioning of my S3 bucket by doing the following: I prepend the version of my code to the path of the S3 bucket on every release (so if my S3 bucket's path is normally /static/ui, it now becomes /v1.2/static/ui). The S3 bucket's cache behavior in CloudFront has the path pattern /static/ui, but the origin settings for the S3 bucket have the origin path /v1.2. Unfortunately, because the origin path isn't included in my cache behavior, whenever I change it to point to a new version, I have to invalidate my cache so that CloudFront will check the new origin path.
So, the release process goes like this:
I create a new version of my UI code and add it to S3, and prepend the version to my S3 bucket's path (creating a path that looks like this /v1.2/static/ui).
I change my "Origin Path" value in CloudFront associated with my S3 origin to have the new version in it (so it becomes /v1.2). This makes it so that all requests to my CloudFront distribution get forwarded to my origin with /v1.2 prepended to the origin path.
I invalidate my CloudFront cache.
This method of versioning my UI works - but is there a better way to do it? I'd like to be able to handle versioning my S3 bucket without having to invalidate the cache every time I change versions.
I ended up versioning my S3 bucket and handling my CloudFront cache busting by doing the following:
I changed the way that Webpack built my static assets by adding a hash to all of the files Webpack builds except for my index.html, which now points to all of my hashed filenames. So, for example, previously, a given JavaScript chunk that Webpack builds was called 12.js, and now it might be called 0b2fb3071ad992833dd4.js, and the index.html now references that new hashed filename. The file hash is generated by the content within the file, and will change if the content in the file changes.
I make sure that when my static files are uploaded to S3, the index.html file is served with the header Cache-Control: no-cache on every request. This makes it so that CloudFront never caches the index.html file, but will still cache the hashed filenames that index.html points to.
I prepend the version of my static files to the S3 bucket's path where I upload my static assets (so for example, my S3 bucket's path might look like /v1.2/static/ui/index.html when I make a request to my index.html file).
Upon every new release, I update my CloudFront origin path for my S3 bucket origin to point to the new version of my UI (so it might change from /v1.2 to /v1.3).
Because my index.html isn't cached by CloudFront, and because all the other static files that index.html points to are hashed and will have different file names upon each release (if their content changed at all), I no longer need to manually send a CloudFront cache invalidation for my S3 bucket's path upon every release: each release's index.html points to entirely new files, which will not be cached in CloudFront until users make new requests for them.
This also means that switching my CloudFront origin path will now instantaneously switch users to different versions of my UI, and I don't have to wait the estimated 5 minutes for a manual CloudFront cache invalidation to take effect.
Lastly, this method of cache busting also works with browser-side caching, so I was able to have my S3 bucket send a Cache-Control: max-age=604800 header for every file except index.html and enable browser-side caching for my users. The browser-side cache is effectively invalidated when index.html points to a new hashed filename for its static assets. This greatly improved the performance of my application.
This all came at the cost of not being able to cache my index.html file in CloudFront or the user's browser, but I think the benefits of caching this way outweigh the cost.
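As an illustration of the upload step described above, here is a minimal boto3 sketch (the bucket name, build directory and version prefix are placeholders, not the author's actual values): hashed assets get a week-long max-age, while index.html gets no-cache.

    # Minimal sketch, assuming a flat Webpack output directory and configured boto3 credentials.
    import mimetypes
    import os
    import boto3

    BUCKET = "my-ui-bucket"             # placeholder bucket name
    VERSION_PREFIX = "v1.3/static/ui"   # version prefix for this release
    BUILD_DIR = "dist"                  # Webpack output directory

    s3 = boto3.client("s3")

    for name in os.listdir(BUILD_DIR):
        content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
        # index.html: no-cache, so CloudFront and browsers revalidate on every request.
        # Hashed assets: cache for a week; a changed file gets a new, uncached URL.
        cache_control = "no-cache" if name == "index.html" else "max-age=604800"
        s3.upload_file(
            os.path.join(BUILD_DIR, name),
            BUCKET,
            f"{VERSION_PREFIX}/{name}",
            ExtraArgs={"CacheControl": cache_control, "ContentType": content_type},
        )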
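And here is a sketch of the release step that re-points the S3 origin at the new version prefix through the CloudFront API (the distribution ID and origin ID are placeholders). The update_distribution call requires the ETag returned by get_distribution_config, which protects against concurrent edits.

    # Minimal sketch of updating the origin path on each release.
    import boto3

    DISTRIBUTION_ID = "E1234567890ABC"  # placeholder distribution ID
    S3_ORIGIN_ID = "my-s3-origin"       # the S3 origin's Id inside the distribution
    NEW_ORIGIN_PATH = "/v1.3"           # version prefix for this release

    cloudfront = boto3.client("cloudfront")

    config_response = cloudfront.get_distribution_config(Id=DISTRIBUTION_ID)
    config = config_response["DistributionConfig"]
    etag = config_response["ETag"]

    # Rewrite the OriginPath of the S3 origin only.
    for origin in config["Origins"]["Items"]:
        if origin["Id"] == S3_ORIGIN_ID:
            origin["OriginPath"] = NEW_ORIGIN_PATH

    cloudfront.update_distribution(
        Id=DISTRIBUTION_ID,
        DistributionConfig=config,
        IfMatch=etag,  # optimistic locking against concurrent edits
    )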

Is Cache Invalidation on S3 a One Time Event or a Type of Policy?

I have a set of files on Amazon S3/CloudFront that I do not want cached. I was able to create an invalidation on a single file, and that seemed to work. However, going forward the file seems to be cached again even though the invalidation entry is still present. Is the invalidation a "one time event"? Does anyone know the exact details of how this works?
I would like a set of files to basically never be cached going forward.
Thanks for any suggestions and best practices advice.
Invalidation removes a cached entry from CloudFront's edge locations, but has no impact on whether or not the invalidated object(s) are cached again in the future. All else held equal: after you issue an invalidation, objects that were previously cached will be cached again on subsequent requests.
Before we explore the options, two definitions that are important to understand:
Cache behaviors are effectively routes with dedicated configurations applying only to requests that match the route (known as a path pattern).
Cache policies are instructions for how CloudFront will cache your responses. A cache policy is attached to one or more cache behaviors. The minimum and maximum TTL set a floor and a ceiling on the cache lifetime derived from your Cache-Control/Expires headers. The default TTL determines how long to cache a response when you don't provide a Cache-Control/Expires header.
Do you want to prevent caching for all files in your S3 bucket?
Attach the CachingDisabled cache policy (provided by CloudFront) to your default cache behavior.
Do you want to prevent caching for only certain files in your S3 bucket?
If the files you do not want to cache live in the same directory, create a cache behavior to match that path and use the CachingDisabled cache policy (provided by CloudFront) to prevent files in that directory from being cached. This instructs CloudFront to use a cache policy that does not cache responses when processing requests that match a specific path/route.
Set a Cache-Control header as metadata on the objects in S3 to instruct CloudFront not to cache them, while still caching the other objects.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html (Scroll down to Adding headers to your objects using the Amazon S3 console)
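A minimal boto3 sketch of that second option (the bucket name, keys, and content type are placeholders): changing metadata on an existing S3 object means copying the object onto itself with MetadataDirective="REPLACE".

    # Minimal sketch: stamp a no-cache Cache-Control header onto existing objects.
    import boto3

    BUCKET = "my-bucket"  # placeholder bucket name
    KEYS = ["reports/latest.json", "reports/summary.json"]  # placeholder keys

    s3 = boto3.client("s3")

    for key in KEYS:
        s3.copy_object(
            Bucket=BUCKET,
            Key=key,
            CopySource={"Bucket": BUCKET, "Key": key},
            MetadataDirective="REPLACE",       # required to change metadata in place
            CacheControl="no-cache, no-store",
            ContentType="application/json",    # REPLACE discards the old content type
        )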

Why do I need Amazon S3 and CloudFront?

I've read a lot of articles stating that I should be using Amazon S3 in conjunction with the CDN CloudFront. I'm currently not doing this. I'm simply using CloudFront with my standard shared hosting package.
Is it OK to use CloudFront on its own with my standard shared hosting package? Surely there is no added benefit to using S3 as well, since the files are already located within CloudFront.
Any enlightenment on this is much appreciated.
Leigh
S3 allows you to do things like static web hosting, with logging and redirection, e.g. www.example.com redirects to example.com. You can then use CloudFront to place your assets as close to the end user as possible ("nearest edge location"). An excellent guide on how to do this is in the AWS docs. Two main points are that S3 supports HTTPS, and changes to files in S3 are reflected instantly. Because CloudFront is a CDN, you have to manually expire files if you change them, otherwise it could take up to 24 hours for your changes to be reflected.
http://docs.aws.amazon.com/gettingstarted/latest/swh/website-hosting-intro.html
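For example, a minimal boto3 sketch of the redirect mentioned above (bucket and host names are placeholders):

    # Minimal sketch: configure the www bucket as a website that redirects
    # every request to the bare domain.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_website(
        Bucket="www.example.com",  # bucket named after the www host
        WebsiteConfiguration={
            "RedirectAllRequestsTo": {
                "HostName": "example.com",
                "Protocol": "https",
            }
        },
    )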
A quick comparison between the two is given here:
http://www.bucketexplorer.com/documentation/cloudfront--amazon-s3-vs-amazon-cloudfront.html
There is no problem with using CloudFront against your own origin server as compared to an S3 origin.
There are some benefits of using S3:
Data transfer is faster between S3 and CloudFront
You don't need to worry about the stability and maintenance of the origin server
Multiple origin regions
There are also benefits if you use your own server:
You save the cost of S3 hosting (this depends on whether you already pay for your own server)
Easier to customize should you need it
Data storage location for company/country regulation
So it all depends on your specific circumstances, such as how much you pay for your hosting package, whether you need low-level configuration of your origin server, and how sensitive your data is.
I would say that for the majority of small/medium projects, S3 is a perfect place to store data.

Flushing CloudFront's cache when S3 files are deleted

I have set up S3 with CloudFront as my CDN. As you know, when you upload files to an S3 bucket, they are pushed to all of CloudFront's edge locations and cached for best performance.
If I delete files from S3, they remain in the CDN's cache and are still being served to end users. How can I prevent this behavior? I want CloudFront to serve only the files that are actually available in S3 storage.
Thanks in advance.
You can invalidate objects on CloudFront using the API or the console. When you do this, the cached copies are removed from the CloudFront edge locations.

Are new items added to an S3 bucket automatically served over CDN?

I have a bucket of images on S3. I set up the bucket contents to be delivered over a CDN using CloudFront. Now, if I add new images to the bucket, do I need to set up CloudFront for the bucket again, or are new items automatically distributed?
The items aren't automatically distributed, but there's nothing else you need to configure. The CDN nodes look locally for cached objects affiliated with that bucket. If the object doesn't exist but needs to be served from the CDN, the node requests it from S3 and caches it for a predetermined amount of time.
It's actually a lot like "DNS propagation", a common misnomer by which some people think DNS setting changes "propagate" around the world. In reality, the world's servers request the information then cache it locally. You're not waiting for DNS changes to propagate; you're waiting for the cached settings to expire. So it goes with CloudFront: New objects are served and cached when they're first requested. If you replace an object with a new object by the same name, CDN nodes that cached the object will show the old one until it expires.
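If you want to observe this behavior yourself, here is a small Python sketch (the URL is a placeholder) that sends a request through the distribution and prints CloudFront's x-cache response header, which reads "Miss from cloudfront" on the first request for an object and "Hit from cloudfront" once that edge location has cached it.

    # Minimal sketch using only the standard library.
    from urllib.request import Request, urlopen

    url = "https://d111111abcdef8.cloudfront.net/images/new-photo.jpg"  # placeholder URL

    with urlopen(Request(url, method="HEAD")) as response:
        print(response.headers.get("x-cache"))       # "Hit from cloudfront" or "Miss from cloudfront"
        print(response.headers.get("x-amz-cf-pop"))  # which edge location answered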