how to determine s3 browser upload policy expiration duration - file-upload

I have moved from server-based S3 uploads to direct browser-to-S3 uploads, and I am signing each upload with a policy that carries an expiration.
$policy = array(
    'expiration' => gmdate('Y-m-d\TG:i:s\Z', strtotime('+10 minutes')),
    'conditions' => array(
        array('bucket' => Configure::read('s3upload.s3bucket')),
        array('acl' => 'private'),
        array('starts-with', '$key', ''),
        array('success_action_redirect' => $success_action_redirect),
        array('starts-with', '$Content-Type', ''),
        array('content-length-range', 0, 16*1024*1024) // 16 MB
    )
);
I need to pass the AWSAccessKey, the encoded policy, and the signature in the form data, and I worry these could be captured and reused for fake uploads (though I have doubts about how realistic that is).
To limit the window for abuse, should I set the expiration as short as possible? The problem is that file size and connection speed cannot be determined on the client side in advance.
To verify the behaviour, I experimented with a 10-second expiry and a large file, and every upload attempt failed with a policy-expiration error:
<Error>
<Code>AccessDenied</Code>
<Message>Invalid according to Policy: Policy expired.</Message>
<RequestId>2178D49CAA739EB3</RequestId>
<HostId>id=</HostId>
</Error>
However, my Wi-Fi connection drops frequently, so I could not determine the correct behaviour that way; over a stable LAN connection, I am able to upload a large file even with a short policy expiration.
How should I determine correct expiration value for policy expiration?

Don't do this.
array('starts-with', '$key', ''),
This allows any file in the bucket to be overwritten by manipulating the form. Put the exact key where the file will be uploaded in the policy. The worst anyone could do if they came into possession of the signature would then be to overwrite the same file.
However, if your site uses SSL and the S3 form post uses SSL, the chance of this information being compromised is very remote.
Note also that if form posts behave like signed GET requests, then the expiration duration should only matter until the upload starts: a request that begins before the expiration should be able to finish after it without causing an error. I assume that S3 authenticates the request when the headers are received, but perhaps that isn't true of form POST uploads. Are you finding that not to be the case?

Related

Lambda script to direct to fallback S3 domain subfolder when not found

As per this question, and this one, the following piece of code allows me to point my domain at a subfolder in an S3 bucket.
However, in instances where the subfolder is not found, I get the following error message:
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>2CE9B7837081C817</RequestId>
<HostId>
T3p7mzSYztPhXetUu7GHPiCFN6l6mllZgry+qJWYs+GFOKMjScMmRNUpBQdeqtDcPMN3qSYU/Fk=
</HostId>
</Error>
I would not like it to display this error message; instead, in instances like this, I would like to serve from another S3 bucket location (e.g. example-bucket.s3-website.us-east-2.amazonaws.com/error), where the user will be greeted with a friendly error page. So, in a situation where an S3 bucket subfolder is not found, it should fall back to there. How do I accomplish this by changing the Node function below?
'use strict';

// if the end of the incoming Host header matches this string,
// strip this part and prepend the remaining characters onto the request path,
// along with a new leading slash (otherwise, the request will be handled
// with an unmodified path, at the root of the bucket)
const remove_suffix = '.example.com';

// provide the correct origin hostname here so that we send the correct
// Host header to the S3 website endpoint
const origin_hostname = 'example-bucket.s3-website.us-east-2.amazonaws.com'; // see comments, below

exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;
    const headers = request.headers;
    const host_header = headers.host[0].value;

    if (host_header.endsWith(remove_suffix)) {
        // prepend '/' + the subdomain onto the existing request path ("uri")
        request.uri = '/' + host_header.substring(0, host_header.length - remove_suffix.length) + request.uri;
    }

    // fix the host header so that S3 understands the request
    headers.host[0].value = origin_hostname;

    // return control to CloudFront with the modified request
    return callback(null, request);
};
The Lambda@Edge function is an origin request trigger -- it runs after the CloudFront cache is checked and a cache miss has occurred, immediately before the request (as it stands after being modified by the trigger code) is sent to the origin server. By the time the response arrives from the origin, this code has finished and can't be used to modify the response.
There are several solutions, including some that are conceptually valid but extremely inefficient. Still, I'll mention those as well as the cleaner/better solutions, in the interest of thoroughness.
Lambda@Edge has 4 possible trigger points:
viewer-request - when request first arrives at CloudFront, before the cache is checked; fires for every request.
origin-request - after the request is confirmed to be a cache miss, but before the request is sent to the origin server; only fires on cache misses.
origin-response - after a response (whether success or error) is returned from the origin server, but before the response is potentially stored in the cache and returned to the viewer; if this trigger modifies the response, the modified response will be stored in the CloudFront cache if cacheable, and returned to the viewer; only fires on cache misses
viewer-response - immediately before the response is returned to the viewer, whether from the origin or the cache; fires for every non-error response, unless that response was spontaneously emitted by a viewer-request trigger, or is the result of a custom error document that sets the status code to 200 (a definite anti-pattern, but still possible), or is a CloudFront-generated HTTP to HTTPS redirect.
Any of the trigger points can assume control of the signal flow, generate its own spontaneous response, and thus change what CloudFront would ordinarily have done -- e.g. if you generate a response directly from an origin-request trigger, CloudFront doesn't actually contact the origin... so what you could theoretically do is check S3 in the origin-request trigger to see whether the request will succeed, and generate a custom error response instead. The AWS JavaScript SDK is automatically bundled into the Lambda@Edge environment. While technically legitimate, this is probably a terrible idea in almost any case, since it will increase both costs and latency due to the extra "look-ahead" requests to S3.
Another option is to write a separate origin-response trigger to check for errors and, if one occurs, replace it with a customized response from the trigger code. But this idea also qualifies as non-viable, since that trigger will fire for all responses to cache misses, whether success or failure, increasing costs and latency and wasting time in the majority of cases.
A better idea (cost, performance, ease-of-use) is CloudFront Custom Error Pages, which allows you to define a specific HTML document that CloudFront will use for every error matching the specified code (e.g. 403 for access denied, as in the original question). CloudFront can also change that 403 to a 404 when handling those errors. This requires that you do several things when the source of the error file is a bucket:
create a second CloudFront origin pointing to the bucket
create a new cache behavior that routes exactly that one path (e.g. /shared/errors/not-found.html) to the new origin (this means you can't use that path on any of the subdomains -- it will always go directly to the error file any time it's requested)
configure a CloudFront custom error response for code 403 to use the path /shared/errors/not-found.html.
set Error Caching Minimum TTL to 0, at least while testing, to avoid some frustration for yourself. See my write-up on this feature but disregard the part where I said "Leave Customize Error Response set to No".
But... that may or may not be needed, since S3's web hosting feature also includes optional Custom Error Document support. You'll need to create a single HTML error file in your original bucket, enable the web site hosting feature on the bucket, and change the CloudFront Origin Domain Name to the bucket's web site hosting endpoint, which is shown in the S3 console and takes the form ${bucket}.s3-website.${region}.amazonaws.com. In some regions, the hostname may have a dash - rather than a dot . after s3-website for legacy reasons, but the dot format should work in any region.
I almost hesitate to mention one other option that comes to mind, since it's fairly advanced and I fear the description might seem quite convoluted... but you could also do the following, and it would be pretty slick, since it would allow you to potentially generate a custom HTML page for each erroneous URL requested.
Create a CloudFront Origin Group with your main bucket as the primary and a second, empty, "placeholder" bucket as secondary. The only purpose served by the second bucket is so that we give CloudFront a valid name that it plans to connect to, even though we won't actually connect to it, as may become clear, below.
When a request to the primary origin fails with one of the configured error status codes, the secondary origin is contacted. This feature is intended for handling the case when an origin fails, but we can leverage it for our purposes, because before CloudFront actually contacts the failover origin, the same origin-request trigger fires a second time.
If the primary origin returns an HTTP status code that you’ve configured for failover, the Lambda function is triggered again, when CloudFront re-routes the request to the second origin.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/high_availability_origin_failover.html#concept_origin_groups.lambda
(It would be more accurate to say "...when CloudFront is preparing to re-route the request to the second origin," because the trigger fires first.)
When the trigger fires a second time, the specific reason it fires isn't preserved, but there is a way to identify whether you're running in the first or second invocation: one of these two values will contain the hostname of the origin server CloudFront is preparing to contact:
event.Records[0].cf.request.origin.s3.domainName # S3 rest endpoints
event.Records[0].cf.request.origin.custom.domainName # non-S3 origins and S3 website-hosting endpoints
So we can test the appropriate value (depending on origin type) in the trigger code, looking for the name of the second "placeholder" bucket. If it's there, bypass the current logic and generate the 404 response from inside the Lambda function. This could be dynamic/customized HTML, such as one that includes the requested URI, or one that varies depending on whether / or some other page was requested. As noted above, spontaneously generating a response from an origin-request trigger prevents CloudFront from actually contacting the origin. Generated responses from an origin-request trigger are limited to 1 MB, but that should be more than sufficient for this use case.

I have a static website hosted on S3 with Cloudfront and I can't get TTL to work

I've set the TTL max, min and default all to 0 (on the "Default Cache Behavior Settings" page), thinking this would mean that when I upload a new file called events.html to S3 it would replace the old events.html page, but I'm still seeing the cached version after a few hours.
I am just trying to update the content on some of my webpages.
If you want CloudFront to pick up a new object version from S3, you need to invalidate the cache explicitly, e.g. via the bucket's PutObject event notification. You can have that event call a Lambda function that invalidates the CloudFront cache.
Here is an example: https://blog.miguelangelnieto.net/posts/Automatic_Cloudfront_invalidation_with_Amazon_Lambda.html
Be aware that with the above approach, if you invalidate more than 1,000 paths in a month, you have to pay extra invalidation fees; refer to the CloudFront pricing page.
Alternatively, you can rely on the TTL, but the new content will only appear after the TTL has elapsed, and you may also have to clear the browser cache to see it.

s3 file downloads are unpredictable

We have uploaded several zip files to S3. All are in the hundreds-of-MB range.
When we download the files, typically via a script, it appears that the file size and type both change. The downloaded file is typically about 300 bytes and turns out to be XML.
The content of the files look similar to this (whitespace added for clarity):
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Key>gpdb-5.0.0.0/greenplum-db-5.0.0.0-rhel5-x86_64.zip</Key>
<RequestId>83D2047BDBA195A6</RequestId>
<HostId>tXKFaiRaNjD26j6fcrTjCk858PGBH2RAjLE1aO4+8hovD6mf+hUzJvCdWKKgrDJGaHXsjWbQP2A=</HostId>
</Error>
Any thoughts as to what might be causing this? It does not happen all of the time. It's somewhat intermittent.
As you will note in the S3 API Reference, this isn't a file -- it's an error message.
Files in S3 are called Objects, and the path + filename of the object is referred to as the object key.
A key is the unique identifier for an object within a bucket. Every object in a bucket has exactly one key. [...] Every object in Amazon S3 can be uniquely addressed through the combination of the web service endpoint, bucket name, key, and optionally, a version. For example, in the URL http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl, "doc" is the name of the bucket and "2006-03-01/AmazonS3.wsdl" is the key.
http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#BasicsKeys
This error message, which should have been accompanied by a 404 Not Found error code when you tried to access the object, indicates that there is no object in the bucket whose key (path + filename) is the one shown in the error -- the one you requested. You should be able to confirm its absence in the S3 console.
If the object should have been uploaded some time in the past, this error means the object wasn't actually uploaded, or has subsequently been deleted.
If the object was very recently uploaded (typically within seconds), you should not get this error, but it is possible for this error to occur under either of two additional conditions:
If you tried to check whether the object exists by sending a GET or HEAD request for it before uploading it. If you do this, a short period of time may elapse after the upload before the object is accessible, because of internal optimizations inside S3: when you try to fetch a non-existent object, S3 may -- for a brief time -- retain an internal notion that the object is not there, even after it has safely been stored. Retry your request.
If you already had an object with the same key, deleted it, and then uploaded a new object under that key, then for a brief time after the new upload you could either get the error above or actually download the old object again.
These conditions are somewhat uncommon, but they can occur, particularly if your bucket has a lot of traffic, because of S3's consistency model -- an engineered tradeoff between performance, reliability, and the immediate availability of uploaded objects when the same key has recently been downloaded (or attempted to be downloaded), deleted, or overwritten.
The <RequestId> and <HostId> codes in the error response are opaque diagnostic codes that you can provide to AWS support, if you need to submit a support request about a specific problem you are experiencing with S3... they can use these to find the specific request and identify the problem. They are not considered sensitive information, since they have no meaning outside of AWS.
In this case, there is no apparent need to contact AWS support, because it appears you are simply trying to download an object that is not present in the bucket you requested it from. If you get alternating success and failure for the exact same file, that is unexpected and a support case might be in order... but typically, an internal error in S3 would result in a very different response.
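Since the symptom here is a tiny XML error body masquerading as the requested file, a download script can defend itself by inspecting small responses before treating them as the real object. This is a hedged sketch; the size threshold is an arbitrary assumption:

```javascript
// Sketch: check whether a "downloaded file" is actually an S3 error document.
// A genuine S3 error body is a small XML document containing <Error> and
// <Code> elements, typically a few hundred bytes.
function looksLikeS3Error(body) {
  // Real objects in the hundreds-of-MB range fail this size check instantly;
  // only tiny bodies are inspected further. 2048 bytes is an assumption.
  if (body.length > 2048) return false;
  return body.indexOf('<Error>') !== -1 && body.indexOf('<Code>') !== -1;
}

function extractS3ErrorCode(body) {
  const match = /<Code>([^<]+)<\/Code>/.exec(body);
  return match ? match[1] : null; // e.g. 'NoSuchKey', 'AccessDenied'
}
```

A script could then retry, alert, or log the `<Code>` value instead of silently saving a 300-byte "zip" to disk. Checking the HTTP status code (404 for NoSuchKey) is an even more direct signal when your download tool exposes it.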

How to work around RequestTimeTooSkewed in Fine Uploader

I'm using Fine Uploader with S3 and I have a client whose computer time is off, resulting in an S3 RequestTimeTooSkewed error. Ideally, my client would have the right time, but I'd like to have my app be robust to this situation.
I've seen this post - https://github.com/aws/aws-sdk-js/issues/399 - on how to automatically retry the request: you take the ServerTime from the error response and use that as the time in the retried request. An alternative approach would be to get the time from a reliable external source every time, avoiding the need for a retry. However, I'm not sure how to hook either approach into Fine Uploader S3. Does anyone have an idea of how to do this?
A solution was provided in Fine Uploader 5.5 to address this very situation. From the S3 feature documentation:
If the clock on the machine running Fine Uploader is too far off of the current date, S3 may reject any requests sent from this machine. To overcome this situation, you can include a clock drift value, in milliseconds, when creating a new Fine Uploader instance. One way to set this value is to subtract the current time according to the browser from the current unix time according to your server. For example:
var uploader = new qq.s3.FineUploader({
    request: {
        clockDrift: SERVER_UNIX_TIME_IN_MS - Date.now()
    }
});
If this value is non-zero, Fine Uploader S3 will use it to pad the x-amz-date header and the policy expiration date sent to S3.
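One way to obtain SERVER_UNIX_TIME_IN_MS is to expose a tiny endpoint on your own backend that returns the server's current unix time in milliseconds. A sketch, where `/server-time` and its `serverTimeMs` field are hypothetical names:

```javascript
// Sketch: compute the clockDrift value from a server-supplied timestamp.
function computeClockDrift(serverUnixTimeMs, localUnixTimeMs) {
  // Positive when the browser's clock is behind the server's, negative when ahead.
  return serverUnixTimeMs - localUnixTimeMs;
}

// Usage (assumes a '/server-time' endpoint returning {"serverTimeMs": ...},
// e.g. Date.now() in Node or round(microtime(true) * 1000) in PHP):
function createUploader() {
  return fetch('/server-time')
    .then(function (res) { return res.json(); })
    .then(function (data) {
      return new qq.s3.FineUploader({
        request: {
          clockDrift: computeClockDrift(data.serverTimeMs, Date.now())
        }
      });
    });
}
```

Fetching the server time immediately before constructing the uploader keeps network latency from inflating the drift by more than a round trip, which is negligible next to a clock that is minutes off.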

Get the eventual url for a file uploaded to S3

I'm working on an app for a client that will asynchronously receive a request, reply immediately, then go out and fetch a set of large files to perform work on, and finally upload the results to S3 minutes or hours later.
Can we know ahead of time what the eventual url of the file on S3 will be? I'm thinking of creating a hash based on the filename and some other metadata that we know at the incoming request initialization and using that as the name of the S3 file. Is there a predictable pattern of S3 host plus bucket plus file name, or is it something that we don't know until the file upload is complete?
I'm entertaining the idea of returning the eventual S3 filename to the initial request, with the expectation that on the client's end they can periodically check the url for the result. In addition, I'm considering requiring the client to pass a callback url in with the request. The app will then hit that url later with the success/fail status of the work.
Thanks.
The URL of a file uploaded to S3 can be entirely determined by you - it's purely dependent on the bucket and key name. Specifically, it's of the form:
http://s3.amazonaws.com/BUCKETNAME/KEYNAME
(Or certain other formats, depending on region and addressing style -- but still completely predictable.)
So long as you pick the key name ahead of time, you'll know what the eventual URL will be.
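A small sketch of constructing that URL ahead of time; the bucket and key values are illustrative, and any non-URL-safe characters in the key need percent-encoding while the slashes separating "directory" levels are preserved:

```javascript
// Sketch: build the eventual object URL from bucket + key, before the
// upload has happened. Path-style addressing, as in the answer above.
function s3ObjectUrl(bucket, key) {
  const encodedKey = key.split('/').map(encodeURIComponent).join('/');
  return 'https://s3.amazonaws.com/' + bucket + '/' + encodedKey;
}
```

So you can hand this URL back in the initial response, upload to exactly that bucket and key later, and the client can poll it (or wait for your callback) until the object appears.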