Allowing multiple content types in HTTP POST Amazon S3 upload policy document - amazon-s3

Does anybody know how to allow multiple content types in an Amazon S3 upload policy when uploading using HTTP POST? I can't seem to find the answer to this anywhere.
I am aware that I can restrict an upload to any file with a MIME type that starts with "image/" as follows:
{"expiration": "2015-02-28T00:00:00Z",
"conditions": [
["starts-with", "$Content-Type", "image/"]
]
}
But how would I go about allowing only a certain few MIME types which might not all start with the same characters?

This isn't supported. A condition is either an exact match or a single prefix match (starts-with), or you have to allow all.
Depending on how the form is being generated (dynamically, one assumes), you might be able to simply tell the application the content-type of the file you intend to upload when requesting the resource that builds the form. The application then knows what content-type value to use both on the form and when generating the policy document.
If the application doesn't find that content-type in its list of acceptable values, it could just refuse to render the form, and refuse to create and sign a matching policy statement.
Depending on the application, there may be little point in worrying too much about the Content-Type field here, because this is not actually restricting the content-types that can be uploaded. It's only restricting the value passed in the value attribute of input type="input" name="Content-Type". That's all this actually restricts.
There's no validation being performed as to whether that value accurately represents the MIME type of the payload that is being uploaded, so the policy document isn't restricting what kind of content you can upload. It's only restricting what kind of content you can claim you are uploading.
It may also be more appropriate to just accept otherwise-unusable uploads and handle the problem on the back-end, after the fact.
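The approach described above (the application checks the requested content-type against its own list before it builds the policy) might look roughly like the following Python sketch. The allow-list, the function name and the bucket name are assumptions made for illustration. Signing and the conditions for the remaining form fields (key and so on) are left out, so this is not a complete policy.

```python
import base64
import json

# Hypothetical allow-list. The application enforces it, not S3.
ALLOWED_CONTENT_TYPES = {"image/png", "image/jpeg", "application/pdf"}


def build_policy(content_type, expiration, bucket):
    """Return a base64-encoded POST policy pinned to one content type.

    Raises ValueError when the requested type is not on the allow-list,
    which is the point where the application would refuse to render the form.
    """
    if content_type not in ALLOWED_CONTENT_TYPES:
        raise ValueError(f"content type not allowed: {content_type}")
    policy = {
        "expiration": expiration,
        "conditions": [
            {"bucket": bucket},
            # Exact match on the Content-Type form field for this one upload.
            ["eq", "$Content-Type", content_type],
        ],
    }
    encoded = json.dumps(policy).encode("utf-8")
    return base64.b64encode(encoded).decode("ascii")
```

The same content_type value would then be written into the Content-Type field of the form. As noted above, this still only constrains what the uploader claims to be sending.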

Related

List of reserved parameter names for AWS S3

It seems that the GET parameter location is a reserved parameter on AWS S3. Say I have a resource on an S3 bucket, accessible via the web:
http://my-bucket.s3.amazonaws.com/index.html
... and I simply append the GET parameter location to it, I get an HTTP 403:
http://my-bucket.s3.amazonaws.com/index.html?location=US
It works so long as I change the parameter name to something else. For example:
http://my-bucket.s3.amazonaws.com/index.html?loc=US
So clearly location is a reserved word in AWS S3. My question is: is there a list of all reserved words I shouldn't try to use as GET parameters with S3?
I searched the docs but couldn't find any such list.
location in the query tells S3 that you're asking for the location of a bucket. It's one of several "subresources" (things that are not objects) in S3 that are accessed via query string parameters.
You could probably compile a nearly complete list by reviewing the entire API reference documentation, but here's a partial list found in some older docs (Signature Version 2):
The subresources that must be included when constructing the CanonicalizedResource Element are acl, lifecycle, location, logging, notification, partNumber, policy, requestPayment, torrent, uploadId, uploads, versionId, versioning, versions, and website.
https://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html
They periodically add new ones, like select and delete and tagging, so an exhaustive list is not future-proof.
Your safest bet is to use parameters beginning with x- (but not beginning with x-amz since these may be reserved or carry other implications). This is mentioned in the logging documentation:
You can include custom information to be stored in the access log record for a request by adding a custom query-string parameter to the URL for the request. Amazon S3 ignores query-string parameters that begin with "x-", but includes those parameters in the access log record for the request, as part of the Request-URI field of the log record.
https://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html
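As a rough illustration of the "x-" advice, here is a small Python sketch that renames custom parameters before appending them to an S3 URL. The helper names are made up for this example, and the subresource set is only the partial list quoted above plus the later additions mentioned, so it is not exhaustive.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Partial list taken from the quoted docs and the additions mentioned above.
KNOWN_SUBRESOURCES = {
    "acl", "lifecycle", "location", "logging", "notification", "partNumber",
    "policy", "requestPayment", "torrent", "uploadId", "uploads", "versionId",
    "versioning", "versions", "website", "select", "delete", "tagging",
}


def is_known_subresource(name):
    """True if the name is on the (incomplete) list of S3 subresources."""
    return name in KNOWN_SUBRESOURCES


def safe_param_name(name):
    """Return a name starting with "x-" (but not "x-amz")."""
    lowered = name.lower()
    if lowered.startswith("x-amz"):
        raise ValueError("x-amz parameters may be reserved by S3")
    if lowered.startswith("x-"):
        return name
    return "x-" + name


def add_custom_params(url, params):
    """Append custom query parameters to url under safe "x-" names."""
    parts = urlsplit(url)
    extra = urlencode({safe_param_name(k): v for k, v in params.items()})
    query = "&".join(q for q in (parts.query, extra) if q)
    return urlunsplit(parts._replace(query=query))
```

With this, the location parameter from the question would be sent as x-location, which S3 should ignore apart from recording it in the access log.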

RESTful - Different URI of same resource to get different forms of same resource

I'm pushed into a peculiar situation where I'm not able to decide what is wrong and what is right.
I have a resource called Invoice. To get a JSON or XML representation, I use the URI below:
somedomain.com/invoices/{InvoiceNumber} - Invoice number is numeric
Accept: application/xml
When I want a PDF of the same resource, I use the URI below:
somedomain.com/invoices/{InvoiceNumber} - Invoice number is numeric
Accept: application/pdf
Both of the above URLs are served for authenticated requests. We also want to support the same resource using a GUID for unauthenticated requests, so we want to use the URL below:
somedomain.com/invoices/{GUID}
Accept: application/pdf
The URL above acts as a permanent URL that anybody can access any number of times. My confusion is whether providing such a URL is RESTful, because one URL uses the numeric invoice number and the permanent URL replaces it with a GUID.
The reason I feel this is wrong is that the same resource is identified by two different URIs (number and GUID), even though both return the same resource. Or is it just my assumption that this is wrong? What I can't work out is whether it violates any REST constraint.
There's no problem at all with having different URIs point at the same resource. It's not only OK, sometimes it's also recommended, if that would add value for the user.
Think of these examples:
GET /api/users/543
GET /api/users/bob-marley
or, as SO does:
GET /questions/16637720/restful-different-uri-of-same-resource-to-get-different-forms-of-same-resource
GET /q/16637720/1118323
There are similar examples everywhere. You may want to add this "unnecessary" information if it's going to help users, or SEO, and still keep the short version available. Or imagine scenarios where you would want to add more ways of accessing resources without breaking existing ones. It sounds quite common to me, and you don't break any rules by having more than one URI for the same resource.
If you are worried that a user might think that the two resources are different because she used a different URI, you can have one URI redirect to the other, to make it explicit that it's exactly the same resource (as SO does when you hit the short link, or the link without the thread heading).
Here is a relevant answer.

REST API with different content types

I have a REST API endpoint to modify a resource i.e. PUT. The resource can have a file associated with it so I support two different content types: application/json and multipart/form-data. The first is for modifying the resource without associating a file and the second for when I want to associate a file with the resource.
What is the best way of representing this? Should I use the same URI for two different content-types e.g. update/:resourceId? Should I have two different endpoints e.g. update/:resourceId and updateWithResource/:resourceId? Or is this completely the wrong way to go and I should do something else?
Content types are just different representations of the same resource. So as long as they represent the same thing, they can, and they should, share the same URI.
The URI should not affect the content type; that's not RESTish. Negotiate the representation only with the Content-Type header.
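To make that concrete, here is a framework-free Python sketch of a single PUT handler that picks its behaviour from the Content-Type header rather than from the URI. The function names and the (status, message) return values are hypothetical stand-ins, not part of any real framework.

```python
def parse_media_type(content_type_header):
    """Return the lower-cased media type without parameters (e.g. boundary)."""
    return content_type_header.split(";", 1)[0].strip().lower()


def update_fields(resource_id, body):
    # Hypothetical stub: update the resource from a JSON body.
    return 200, f"updated fields of {resource_id}"


def update_with_file(resource_id, body):
    # Hypothetical stub: update the resource and attach the uploaded file.
    return 200, f"updated {resource_id} with file"


def handle_put(resource_id, content_type_header, body):
    """One URI, two request representations, chosen by Content-Type."""
    media_type = parse_media_type(content_type_header)
    if media_type == "application/json":
        return update_fields(resource_id, body)
    if media_type == "multipart/form-data":
        return update_with_file(resource_id, body)
    # Anything else is a representation this endpoint does not accept.
    return 415, "Unsupported Media Type"
```

Both kinds of request go to the same update/:resourceId URI. Only the header differs.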

S3 Bucket Types

Just wondering if there is a recommended strategy for storing different types of assets/files in separate S3 buckets or just put them all in one bucket? The different types of assets that I have include: static site images, user's profile images, user-generated content like documents, files, and videos.
How you group files into buckets is really not that critical an issue, unless you want to have different domain names or CNAMEs for different types of content, in which case you would need a separate bucket for each domain name you want to use.
I would tend to group them by functionality. Perhaps static files used in your application that you have full control over you might deploy into a separate bucket from content that is going to be user generated. Or you might want to have video in a different bucket than images, etc.
To add to my earlier comments about S3 metadata: it is going to be a critical part of optimizing how you serve up content from S3/CloudFront.
Basically, S3 metadata consists of key-value pairs. So you could have Content-Type as a key with a value of image/jpeg for example if the file is .jpg. This will automatically send appropriate Content-Type headers corresponding to your values for requests made directly to S3 URL or via Cloudfront. The same is true of Cache-Control metatags. You can also use your own custom metatags. For example, I use a custom metatag named x-amz-meta-md5 to store an md5 hash of the file. It is used for simple bucket comparisons against content stored in a revision control system, so we don't have to make checksums of each file in the bucket on the fly. We use this for pushing differential content updates to the buckets (i.e. only push those that have changed).
As for revision control, I would HIGHLY recommend using versioned file names. In other words, say you have bigimage.jpg and you want to update it: call the new file bigimage1.jpg and change your code to reflect this. Why? Because, optimally, you would like to set long expiration time frames in your Cache-Control headers. Unfortunately, if you then want to deploy a file of the same name and you are using CloudFront, it becomes problematic to invalidate the edge caching locations. Whereas if you have a new file name, CloudFront would just begin to populate the edge nodes and you don't have to worry about invalidating the cache at all.
Similarly for user-produced content, you might want to include an md5 or some other (mostly) unique identifier scheme, so that each video/image can have its own unique filename and place in the cache.
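Here is a minimal Python sketch of the two ideas above: deriving a unique file name from an md5 of the content, and preparing the metadata key-value pairs to store with the object. The naming scheme (eight hex characters before the extension), the one-year max-age and the function names are assumptions for illustration. Only the x-amz-meta-md5 key comes from the description above.

```python
import hashlib
import os


def versioned_name(filename, data):
    """Return (new_name, md5_hex), with part of the md5 inserted before the extension."""
    digest = hashlib.md5(data).hexdigest()
    stem, ext = os.path.splitext(filename)
    return f"{stem}.{digest[:8]}{ext}", digest


def upload_metadata(digest, content_type):
    """Key-value pairs to store with the object."""
    return {
        "Content-Type": content_type,
        # A long lifetime is safe because changed content gets a new name.
        "Cache-Control": "public, max-age=31536000",
        # Custom metadata key used for later bucket comparisons.
        "x-amz-meta-md5": digest,
    }
```

Because the name changes whenever the content changes, a new upload never collides with a copy already cached at the edge.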
For your reference, here is a link to the AWS documentation on setting up streaming in CloudFront:
http://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/CreatingStreamingDistributions.html

http webrequest content type null

Is it necessary to specify the Content-Type in the HTTP header when uploading a file? I tried this in C#: I set it to "image/png" while uploading a PDF file, and when I downloaded the uploaded file, the PDF was perfect. It wasn't corrupted.
So what is the role of specifying the Content-Type in the HTTP header?
Can it be null or some other wrong value?
I ask because in the application I am building, the user will just provide the file and I just need to upload it.
Any help is highly appreciated. Thanks in advance.
Weird that nobody answered this question.
You should always set the content-type; some software (servers) may break when it's omitted.
You could give it any value that the target system can handle.
Since you can easily fake an HTTP request, you can also fake the headers.
So your target system (in this case an upload processor) should only accept content-type values it's able to handle, and you MUST validate that the given content-type actually matches the data that is sent in the body (the uploaded file itself).
You can never trust the content-type value, until you validate it somehow.
As a PHP developer, I always check any file upload against a MIME-type validator to be sure I got what I expected. For example, I use getimagesize() to detect whether it's an image and, if it is, to get its file format (PNG).
Since both PNG and PDF files are binary file formats, your upload succeeded.
This is because you coded it that way, or because the target system falls back on default settings or does some checking of its own.
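For illustration, a check that the claimed content-type matches the body could be sketched in Python by looking at the leading "magic" bytes of the upload. This is a simplified stand-in for a real MIME validator. It only knows PNG and PDF, and the function names are made up for this example.

```python
# Leading byte signatures of the two formats discussed above.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
PDF_MAGIC = b"%PDF-"


def sniff_content_type(data):
    """Guess the type from the first bytes, or return None if unknown."""
    if data.startswith(PNG_MAGIC):
        return "image/png"
    if data.startswith(PDF_MAGIC):
        return "application/pdf"
    return None


def claimed_type_matches(claimed, data):
    """True only if the body is a known type and equals the claimed one."""
    media_type = claimed.split(";", 1)[0].strip().lower()
    return sniff_content_type(data) == media_type
```

In the scenario from the question (a PDF uploaded as "image/png"), claimed_type_matches would return False and the upload processor could reject the file or store it under the detected type.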