I have a static website hosted on S3 with CloudFront and I can't get TTL to work

I've set the TTL max, min and default all to 0 (on the "Default Cache Behavior Settings" page), thinking this would mean that when I upload a new file called events.html to S3 it would replace the old events.html page, but I'm still seeing the cached version after a few hours.
I am just trying to update the content on some of my webpages.

If you want CloudFront to pick up a new update in S3, you need to invalidate the cache explicitly. You can trigger a Lambda function from the S3 PutObject event and have it invalidate the CloudFront cache.
Here is an example: https://blog.miguelangelnieto.net/posts/Automatic_Cloudfront_invalidation_with_Amazon_Lambda.html
Be aware that with the above approach, if you invalidate more than 1,000 paths in a month, you have to pay extra invalidation fees; refer to the CloudFront pricing page.
You can also rely on the TTL, but the update will only appear after the TTL has elapsed, and you may still have to clear the browser cache to see it.
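For reference, here is a minimal sketch of what such an invalidation Lambda might look like in Node.js with the AWS SDK; the DISTRIBUTION_ID environment variable and the one-path-per-event approach are assumptions, so adapt them to your setup:
'use strict';
// Minimal sketch: triggered by the bucket's PutObject event notification,
// it invalidates the matching path in CloudFront.
// Assumption: DISTRIBUTION_ID is set as an environment variable on the function.
const AWS = require('aws-sdk');
const cloudfront = new AWS.CloudFront();

exports.handler = (event, context, callback) => {
    // keys in S3 event notifications arrive URL-encoded
    const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
    const params = {
        DistributionId: process.env.DISTRIBUTION_ID,
        InvalidationBatch: {
            CallerReference: String(Date.now()), // must be unique per invalidation
            Paths: {
                Quantity: 1,
                Items: ['/' + key]
            }
        }
    };
    cloudfront.createInvalidation(params, (err, data) => callback(err, data));
};
You would wire this handler to the bucket's PutObject event notification, so each upload purges the matching path from CloudFront.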

Related

Lambda script to direct to fallback S3 domain subfolder when not found

As per this question, and this one, the following piece of code allows me to point a subfolder in an S3 bucket to my domain.
However in instances where the subdomain is not found, I get the following error message:
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>2CE9B7837081C817</RequestId>
<HostId>
T3p7mzSYztPhXetUu7GHPiCFN6l6mllZgry+qJWYs+GFOKMjScMmRNUpBQdeqtDcPMN3qSYU/Fk=
</HostId>
</Error>
I would not like it to display this error message; instead, in instances like this, I would like to serve from another S3 bucket endpoint (e.g. example-bucket.s3-website.us-east-2.amazonaws.com/error), where the user will be greeted with a friendly error page. So, in a situation where an S3 bucket subfolder is not found, it should fall back there. How do I accomplish this by changing the Node function below?
'use strict';

// if the end of the incoming Host header matches this string,
// strip this part and prepend the remaining characters onto the request path,
// along with a new leading slash (otherwise, the request will be handled
// with an unmodified path, at the root of the bucket)
const remove_suffix = '.example.com';

// provide the correct origin hostname here so that we send the correct
// Host header to the S3 website endpoint
const origin_hostname = 'example-bucket.s3-website.us-east-2.amazonaws.com'; // see comments, below

exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;
    const headers = request.headers;
    const host_header = headers.host[0].value;

    if (host_header.endsWith(remove_suffix)) {
        // prepend '/' + the subdomain onto the existing request path ("uri")
        request.uri = '/' + host_header.substring(0, host_header.length - remove_suffix.length) + request.uri;
    }

    // fix the host header so that S3 understands the request
    headers.host[0].value = origin_hostname;

    // return control to CloudFront with the modified request
    return callback(null, request);
};
The Lambda@Edge function is an origin request trigger -- it runs after the CloudFront cache is checked and a cache miss has occurred, immediately before the request (as it stands after being modified by the trigger code) is sent to the origin server. By the time the response arrives from the origin, this code has finished and can't be used to modify the response.
There are several solutions, including some that are conceptually valid but extremely inefficient. Still, I'll mention those as well as the cleaner/better solutions, in the interest of thoroughness.
Lambda@Edge has 4 possible trigger points:
viewer-request - when request first arrives at CloudFront, before the cache is checked; fires for every request.
origin-request - after the request is confirmed to be a cache miss, but before the request is sent to the origin server; only fires in cache misses.
origin-response - after a response (whether success or error) is returned from the origin server, but before the response is potentially stored in the cache and returned to the viewer; if this trigger modifies the response, the modified response will be stored in the CloudFront cache if cacheable, and returned to the viewer; only fires on cache misses.
viewer-response - immediately before the response is returned to the viewer, whether from the origin or the cache; fires for every non-error response, unless that response was spontaneously emitted by a viewer-request trigger, or is the result of a custom error document that sets the status code to 200 (a definite anti-pattern, but still possible), or is a CloudFront-generated HTTP to HTTPS redirect.
Any of the trigger points can assume control of the signal flow, generate its own spontaneous response, and thus change what CloudFront would have ordinarily done -- e.g. if you generate a response directly from an origin-request trigger, CloudFront doesn't actually contact the origin... so what you could theoretically do is check S3 in the origin-request trigger to see if the request will succeed, and generate a custom error response instead. The AWS JavaScript SDK is automatically bundled into the Lambda@Edge environment. While technically legitimate, this is probably a terrible idea in almost any case, since it will increase both costs and latency due to extra "look-ahead" requests to S3.
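For illustration only (since, as noted above, this is usually a bad idea), a rough sketch of such a look-ahead check from an origin-request trigger might look like this; the bucket name and the way the key is derived from the URI are assumptions:
'use strict';
// Rough sketch only: check S3 before forwarding, and short-circuit with a generated 404.
// 'example-bucket', the region, and the key derivation are assumptions for illustration.
const AWS = require('aws-sdk');
const s3 = new AWS.S3({ region: 'us-east-2' });

exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;
    const key = request.uri.replace(/^\//, '');

    s3.headObject({ Bucket: 'example-bucket', Key: key }, (err) => {
        if (err) {
            // object missing (or inaccessible): emit a spontaneous 404 instead of contacting the origin
            return callback(null, {
                status: '404',
                statusDescription: 'Not Found',
                headers: { 'content-type': [{ key: 'Content-Type', value: 'text/html' }] },
                body: '<h1>Not Found</h1>'
            });
        }
        // object exists: let the request continue to the origin unchanged
        callback(null, request);
    });
};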
Another option is to write a separate origin-response trigger to check for errors and, if one occurs, replace it with a customized response from the trigger code. But this idea also qualifies as non-viable, since that trigger will fire for all responses to cache misses, whether success or failure, increasing costs and latency and wasting time in the majority of cases.
A better idea (for cost, performance, and ease of use) is CloudFront Custom Error Pages, which allow you to define a specific HTML document that CloudFront will use for every error matching the specified code (e.g. 403 for Access Denied, as in the original question). CloudFront can also change that 403 to a 404 when handling those errors. This requires that you do several things when the source of the error file is a bucket:
create a second CloudFront origin pointing to the bucket
create a new cache behavior that routes exactly that one path (e.g. /shared/errors/not-found.html) over to the new origin (this means you can't use that path on any of the subdomains -- it will always go directly to the error file any time it's requested)
configure a CloudFront custom error response for code 403 to use the path /shared/errors/not-found.html.
set Error Caching Minimum TTL to 0, at least while testing, to avoid some frustration for yourself. See my write-up on this feature but disregard the part where I said "Leave Customize Error Response set to No".
But... that may or may not be needed, since S3's web hosting feature also includes optional Custom Error Document support. You'll need to create a single HTML file in your original bucket, enable the web site hosting feature on the bucket, and change the CloudFront Origin Domain Name to the bucket's web site hosting endpoint, which is shown in the S3 console and takes the form of ${bucket}.s3-website.${region}.amazonaws.com. In some regions, the hostname might have a dash - rather than a dot . after s3-website for legacy reasons, but the dot format should work in any region.
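If you go that route, the website configuration (including the error document) can also be applied programmatically; here's a small sketch with the AWS SDK for JavaScript, where the bucket name and document keys are placeholders:
// Sketch: enable website hosting on the bucket with a custom error document.
// Bucket name and document keys are placeholders.
const AWS = require('aws-sdk');
const s3 = new AWS.S3({ region: 'us-east-2' });

s3.putBucketWebsite({
    Bucket: 'example-bucket',
    WebsiteConfiguration: {
        IndexDocument: { Suffix: 'index.html' },
        ErrorDocument: { Key: 'error.html' }
    }
}, (err, data) => {
    if (err) console.error(err);
    else console.log('Website hosting enabled', data);
});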
I almost hesitate to mention one other option that comes to mind, since it's fairly advanced and I fear the description might seem quite convoluted... but you could also do the following, and it would be pretty slick, since it would allow you to potentially generate a custom HTML page for each erroneous URL requested.
Create a CloudFront Origin Group with your main bucket as the primary and a second, empty, "placeholder" bucket as the secondary. The only purpose served by the second bucket is to give CloudFront a valid name that it plans to connect to, even though we won't actually connect to it, as will become clear below.
When a request to the primary origin fails with one of the configured error status codes, the secondary origin is contacted. This is intended for handling the case when an origin fails, but we can leverage it for our purposes, because before actually contacting the failover origin, the same origin-request trigger fires a second time.
If the primary origin returns an HTTP status code that you’ve configured for failover, the Lambda function is triggered again, when CloudFront re-routes the request to the second origin.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/high_availability_origin_failover.html#concept_origin_groups.lambda
(It would be more accurate to say "...when CloudFront is preparing to re-route the request to the second origin," because the trigger fires first.)
When the trigger fires a second time, the specific reason it fires isn't preserved, but there is a way to identify whether you're running in the first or second invocation: one of these two values will contain the hostname of the origin server CloudFront is preparing to contact:
event.Records[0].cf.request.origin.s3.domainName # S3 rest endpoints
event.Records[0].cf.request.origin.custom.domainName # non-S3 origins and S3 website-hosting endpoints
So we can test the appropriate value (depending on origin type) in the trigger code, looking for the name of the second "placeholder" bucket. If it's there, bypass the current logic and generate the 404 response from inside the Lambda function. This could be dynamic/customized HTML, such as one that includes the requested URI, or one that varies depending on whether / or some other page was requested. As noted above, spontaneously generating a response from an origin-request trigger prevents CloudFront from actually contacting the origin. Generated responses from an origin-request trigger are limited to 1 MB, but that should be more than sufficient for this use case.
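To make that concrete, a sketch of the check inside the existing origin-request handler might look like the following; the 'placeholder-bucket' name is an assumption, and this fragment is meant to sit just after the const request = ... line of the handler shown earlier:
// Sketch only: detect the failover (second) invocation by inspecting the origin hostname.
// 'placeholder-bucket' is an assumed name for the empty secondary bucket.
const origin = event.Records[0].cf.request.origin;
const origin_domain = (origin.s3 && origin.s3.domainName) ||
                      (origin.custom && origin.custom.domainName);

if (origin_domain && origin_domain.startsWith('placeholder-bucket')) {
    // second invocation: the primary origin already failed, so generate the error page here
    return callback(null, {
        status: '404',
        statusDescription: 'Not Found',
        headers: { 'content-type': [{ key: 'Content-Type', value: 'text/html' }] },
        body: '<h1>Sorry, ' + request.uri + ' was not found.</h1>'
    });
}
// first invocation: fall through to the existing host-header rewriting logic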

Specify metadata cache expiry for Simple.Odata.Client?

Is there a way with Simple.Odata.Client to specify the expiry of the metadata cache?
Within ConfigureServices I'm making an initial request to GetMetadataDocumentAsync to preload the metadata document. I would like to expire the cache after n hours/days to ensure that changes to the resource are reflected in my application.
My current thought is to just set a timer to run ODataClient.ClearMetadataCache(); however I just wanted to check that there isn't something I should be setting when instantiating the ODataClient or when retrieving the metadata document.
Many thanks

ImageResizer DiskCache+AzureReader strange behaviour

I'm using ImageResizer + the DiskCache plugin and I'm having issues getting the cache to work properly. Either the images are cached forever (regardless of how many times I upload a new image) or, after changing some settings, I get the old image in some browsers/computers and the new one in others.
This is what I have now in my web.config:
<add name="AzureReader2" connectionString="blahblahblah" endpoint="http://blahblahblah.blob.core.windows.net/" prefix="~/" redirectToBlobIfUnmodified="false" requireImageExtension="false" checkForModifiedFiles="true" cacheMetadata="true"/>
and:
<diskcache dir="~/imagecache" autoclean="true" hashModifiedDate="true" subfolders="8192" asyncWrites="true" asyncBufferSize="10485760" cacheAccessTimeout="15000" logging="true" />
Not sure if this is something I can achieve using the existing parameters. My goal is to invalidate the cache, preferably when the new image has been uploaded, without having to change the query string serving the image to get the new one.
I was thinking:
Maybe having a blob storage trigger that, when a replacement image has been uploaded, fires a webhook that deletes the cache for that image?
Or a web request to my ImageResizer app to preload the new image into the cache so it replaces the old cached image?
I've seen some posts about using IVirtualFileWithModifiedDate, but from what I understand that would have a big performance impact. Probably only 5% of our image requests involve someone uploading an image and expecting to see it right away, since most of the images barely change, but it's really frustrating if the new image still doesn't show up even a day after they have uploaded it!
Could I use IVirtualFileWithModifiedDate to invalidate the cache only when the image has changed, rather than on every image request? Would that be possible?
I get the old image in some browsers/computers and the new one in others.
That different browsers are displaying different versions indicates that either browser caching or proxy/CDN caching is at fault.
ImageResizer's DiskCache hashes the modified date, so it is always as correct as the storage provider.
Regarding your expectations around server-side invalidation:
You're using checkForModifiedFiles="true" cacheMetadata="true", which means that Azure is queried for the latest modified date, but that metadata is cached with a sliding expiration window of 1 hour. I.e., if a URL hasn't been accessed in 1 hour, the next request will cause the modified date to be checked. See StandardMetadataCache.
You can change this behavior by implementing IMetadataCache yourself and assigning that cache to the .MetadataCache member of the storage provider you're using.

How to work around RequestTimeTooSkewed in Fine Uploader

I'm using Fine Uploader with S3 and I have a client whose computer time is off, resulting in an S3 RequestTimeTooSkewed error. Ideally, my client would have the right time, but I'd like to have my app be robust to this situation.
I've seen this post - https://github.com/aws/aws-sdk-js/issues/399 - on how to automatically retry the request. You take the ServerTime from the error response and use that as the time in the retried request. An alternative approach would just be to get the time from a reliable external source every time, avoiding the need for a retry. However, I'm not sure how to hook either approach into Fine Uploader with S3. Does anyone have an idea of how to do this?
A solution was provided in Fine Uploader 5.5 to address this very situation. From the S3 feature documentation:
If the clock on the machine running Fine Uploader is too far off of the current date, S3 may reject any requests sent from this machine. To overcome this situation, you can include a clock drift value, in milliseconds, when creating a new Fine Uploader instance. One way to set this value is to subtract the current time according to the browser from the current unix time according to your server. For example:
var uploader = new qq.s3.FineUploader({
    request: {
        clockDrift: SERVER_UNIX_TIME_IN_MS - Date.now()
    }
})
If this value is non-zero, Fine Uploader S3 will use it to pad the x-amz-date header and the policy expiration date sent to S3.
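One way to obtain SERVER_UNIX_TIME_IN_MS is to expose a tiny endpoint on your own server that returns its current time; the following sketch assumes a hypothetical /server-time endpoint returning { "serverTimeMs": ... } and omits the other required Fine Uploader S3 options for brevity:
// Sketch: the /server-time endpoint and its response shape are assumptions.
// Other required S3 options (endpoint, accessKey, signature, etc.) are omitted here.
fetch('/server-time')
    .then(function (response) { return response.json(); })
    .then(function (data) {
        var uploader = new qq.s3.FineUploader({
            request: {
                clockDrift: data.serverTimeMs - Date.now()
            }
        });
    });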

Get the eventual url for a file uploaded to S3

Working on an app for a client that will asynchronously receive a request, reply immediately, then go out and fetch a set of large files to perform work on, and finally upload the results to S3 minutes or hours later.
Can we know ahead of time what the eventual url of the file on S3 will be? I'm thinking of creating a hash based on the filename and some other metadata that we know at the incoming request initialization and using that as the name of the S3 file. Is there a predictable pattern of S3 host plus bucket plus file name, or is it something that we don't know until the file upload is complete?
I'm entertaining the idea of returning the eventual S3 filename to the initial request, with the expectation that on the client's end they can periodically check the url for the result. In addition, I'm considering requiring the client to pass a callback url in with the request. The app will then hit that url later with the success/fail status of the work.
Thanks.
The URL of a file uploaded to S3 can be entirely determined by you - it's purely dependent on the bucket and key name. Specifically, it's of the form:
http://s3.amazonaws.com/BUCKETNAME/KEYNAME
(Or certain other formats, depending. It's still completely predictable.)
So long as you pick the key name ahead of time, you'll know what the eventual URL will be.
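In other words, a small helper can compute the eventual URL before the upload even starts; this sketch uses the path-style form shown above, with placeholder bucket and key names:
// Sketch: build the eventual URL up front, using the path-style form shown above.
// Bucket and key names are placeholders.
function eventualS3Url(bucket, key) {
    return 'http://s3.amazonaws.com/' + bucket + '/' + encodeURI(key);
}

// e.g. return this from the initial request, then upload to the same bucket/key later
var resultUrl = eventualS3Url('example-bucket', 'results/job-12345/output.json');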