Special chars in Amazon S3 keys?

Is it possible to have special characters like åäö in the key? If I URL-encode the key before storing, it works, but I can't really find a way to access the object. If I write åäö in the URL I get Access Denied (the same response as when the object is not found). If I URL-encode the URL I paste into the browser, I get "InvalidURI: Couldn't parse the specified URI".
Is there some way to do this?

Amazon supports key names with Unicode characters. You do not need to URL encode the key name when you upload a file to Amazon. You do need to URL encode the key name when you generate a download URL.
If you upload a file named åäö.txt to a bucket named mybucket, the download URL will be http://mybucket.s3.amazonaws.com/%C3%A5%C3%A4%C3%B6.txt
If you are using .NET, the SprightlySoft S3 Component for .NET has a function to easily generate a download URL, and it fully supports special characters in key names. Give it a try at sprightlysoft.com
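Outside .NET, the same pattern is easy to sketch in Python with boto3 (bucket and file names taken from the example above; a sketch, not production code):

    import boto3
    from urllib.parse import quote

    s3 = boto3.client("s3")

    # Upload: pass the key as plain Unicode; no URL encoding needed.
    s3.put_object(Bucket="mybucket", Key="åäö.txt", Body=b"hello")

    # Download URL: percent-encode the UTF-8 bytes of the key.
    url = "http://mybucket.s3.amazonaws.com/" + quote("åäö.txt")
    print(url)  # http://mybucket.s3.amazonaws.com/%C3%A5%C3%A4%C3%B6.txt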

Related

Can I generate a presigned s3 url that incorporates an s3-select expression?

I'm storing large datasets in S3 and want to create presigned URLs to hand out to clients who want to download selected columns from a dataset. The (Java) SDK does not seem to offer a pre-packaged way to do this.
Has Amazon made any explicit statement about using s3 select with a presigned url? I couldn't find anything by googling or browsing docs.
Flailing about, I sent a request to a presigned URL generated by the SDK with an XML request body (https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectSELECTContent.html) for an S3 Select, but I get back a SignatureDoesNotMatch error response. Maybe I need to modify the authentication parameters because I'm changing the content and content-type. Am I on a wild goose chase here, or could something like this possibly work?
Is there an easier way? Or is it entirely unsupported?
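In case it helps, here is the shape of that experiment in Python with boto3 (bucket, key, and expression are placeholders); the SDK will happily sign the operation, and whether S3 then honors the URL is exactly my open question:

    import boto3

    s3 = boto3.client("s3")

    # Presign the SelectObjectContent operation itself.
    url = s3.generate_presigned_url(
        ClientMethod="select_object_content",
        Params={
            "Bucket": "my-bucket",
            "Key": "dataset.csv",
            "Expression": "SELECT s.col1 FROM S3Object s",
            "ExpressionType": "SQL",
            "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
            "OutputSerialization": {"CSV": {}},
        },
        ExpiresIn=3600,
    )
    print(url)

    # Note: SelectObjectContent is a POST with an XML request body. A presigned
    # URL signs the HTTP method and query string, so issuing a GET against it
    # (or signing one method and sending another) produces SignatureDoesNotMatch,
    # which matches the error described above.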

List of reserved parameter names for AWS S3

It seems that the GET parameter location is a reserved parameter on AWS S3. Say I have a resource in an S3 bucket, accessible via the web:
http://my-bucket.s3.amazonaws.com/index.html
... and I simply append the GET parameter location to it, I get an HTTP 403:
http://my-bucket.s3.amazonaws.com/index.html?location=US
It works so long as I change the parameter name to something else. For example:
http://my-bucket.s3.amazonaws.com/index.html?loc=US
So clearly location is a reserved word in AWS S3. My question is: is there a list of all reserved words I shouldn't try to use as GET parameters with S3?
I searched the docs but couldn't find any such list.
location in the query tells S3 that you're asking for the location of a bucket. It's one of several "subresources" (things that are not objects) in S3 that are accessed via query string parameters.
You could probably compile a nearly complete list by reviewing the entire API reference documentation, but here's a partial list found in some older docs (Signature Version 2):
The subresources that must be included when constructing the CanonicalizedResource Element are acl, lifecycle, location, logging, notification, partNumber, policy, requestPayment, torrent, uploadId, uploads, versionId, versioning, versions, and website.
https://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html
They periodically add new ones, like select, delete, and tagging, so an exhaustive list is not future-proof.
Your safest bet is to use parameters beginning with x- (but not beginning with x-amz, since these may be reserved or carry other implications). This is mentioned in the logging documentation:
You can include custom information to be stored in the access log record for a request by adding a custom query-string parameter to the URL for the request. Amazon S3 ignores query-string parameters that begin with "x-", but includes those parameters in the access log record for the request, as part of the Request-URI field of the log record.
https://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html
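As a quick illustration of the difference (the parameter name x-client-loc is made up; anything starting with x- but not x-amz- fits the convention):

    from urllib.parse import urlencode

    base = "http://my-bucket.s3.amazonaws.com/index.html"

    # "location" is a subresource: S3 interprets it as a GET Bucket location
    # request, hence the 403 when the request isn't authorized for that operation.
    bad_url = base + "?" + urlencode({"location": "US"})

    # "x-" parameters are ignored by S3 but recorded in the access log's
    # Request-URI field, so they are safe to use for your own purposes.
    safe_url = base + "?" + urlencode({"x-client-loc": "US"})
    print(safe_url)  # http://my-bucket.s3.amazonaws.com/index.html?x-client-loc=US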

Multiple browsers with same input file

An input text file from the web browser needs to be processed in AWS Lambda, and the output (JSON) needs to be rendered back to the browser. (Note: AWS Elastic Beanstalk is being used.)
How do I handle the case where 10 users/clients upload a text file with the same name? AWS Lambda should render the output to the respective user/client. How can I do this with S3 or EFS?
(Note: the users cannot be uniquely identified, as there are no login credentials for the users.)
We had a similar problem and solved it in the following way.
Find what makes each upload unique and name the file accordingly:
Filename-TimeStamp.Extension
If there are frequent uploads within the same timestamp, then add a random sequence number:
Filename-TimeStamp-RandomSequence.Extension
If you want to make it completely random, you can use a UUID (hexadecimal) or idgen (alphanumeric).
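For example, a small Python sketch of that scheme (the timestamp format and random length are arbitrary choices):

    import time
    import uuid
    from pathlib import Path

    def unique_key(filename: str) -> str:
        # Filename-TimeStamp-RandomSequence.Extension
        stem, ext = Path(filename).stem, Path(filename).suffix
        timestamp = time.strftime("%Y%m%dT%H%M%S")
        random_part = uuid.uuid4().hex[:8]  # short random sequence
        return f"{stem}-{timestamp}-{random_part}{ext}"

    print(unique_key("report.txt"))  # e.g. report-20240101T120000-9f3a1c2b.txt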
Hope this helps.

Hiding S3 path in AWS CloudFront URL

I am trying to make sure I did not miss anything in the AWS CloudFront documentation or anywhere else ...
I have a (not public) S3 bucket configured as the origin in a CloudFront web distribution (i.e. I don't think it matters, but I am using signed URLs).
Let's say I have a file in an S3 path like
/someRandomString/someCustomerName/someProductName/somevideo.mp4
So, perhaps the url generated by CloudFront would be something like:
https://my.domain.com/someRandomString/someCustomerName/someProductName/somevideo.mp4?Expires=1512062975&Signature=unqsignature&Key-Pair-Id=keyid
Is there a way to obfuscate the path to the actual file in the generated URL? All 3 parts before the filename can change, so I prefer not to use "Origin Path" in the Origin Settings to hide the beginning of the path. With that approach, I would have to create a lot of origins mapped to the same bucket but different paths. If that's the only way, then the limit of 25 origins per distribution would be a problem.
Ideally, I would like to get something like
https://my.domain.com/someRandomObfuscatedPath/somevideo.mp4?Expires=1512062975&Signature=unqsignature&Key-Pair-Id=keyid
Note: I am also using my own domain/CNAME.
Thanks
Cris
One way could be to use a Lambda function that receives the S3 file's path, copies it into an obfuscated directory (perhaps keeping a simple mapping from source to obfuscated destination), and then returns the signed URL of the copied file. This ensures that only the obfuscated path is visible externally.
Of course, this will (potentially) double the data storage so you need some way to clean up the obfuscated folders. That could be done on a time-based manner, so if each signed URL is expected to expire after 24 hours, you could create folders based on date, and each of the obfuscated directories could be deleted every other day.
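A rough sketch of that idea in Python with boto3, using an S3 pre-signed URL as a stand-in for the CloudFront signing step (the bucket name is a placeholder, and the cleanup job is left out):

    import uuid
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-bucket"  # placeholder

    def obfuscated_signed_url(real_key: str, expires: int = 86400) -> str:
        # Copy the object to a random path and sign the copy, so the real
        # path never appears in the URL handed out.
        obfuscated_key = uuid.uuid4().hex + "/" + real_key.rsplit("/", 1)[-1]
        s3.copy_object(
            Bucket=BUCKET,
            Key=obfuscated_key,
            CopySource={"Bucket": BUCKET, "Key": real_key},
        )
        return s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": BUCKET, "Key": obfuscated_key},
            ExpiresIn=expires,
        )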
Alternatively, you could use a service like tinyurl.com or something similar to create a mapping. It would be much easier, save on storage, etc. The only downside would be that it would not reflect your domain name.
If you have the ability to modify the routing of your domain then this is a non-issue, but I presume that's not an option.
Obfuscation is not a form of security.
If you wish to control which objects users can access, you should use pre-signed URLs or signed cookies. This way, you can grant access to private objects via S3 or CloudFront without worrying about people obtaining access to other objects.
See: Serving Private Content through CloudFront
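For completeness, generating a CloudFront pre-signed URL looks roughly like this with botocore's CloudFrontSigner (the key-pair ID, key file, and URL are placeholders):

    import datetime

    import rsa
    from botocore.signers import CloudFrontSigner

    def rsa_signer(message: bytes) -> bytes:
        # CloudFront signed URLs use SHA-1 with your RSA private key.
        with open("private_key.pem", "rb") as f:  # placeholder key file
            private_key = rsa.PrivateKey.load_pkcs1(f.read())
        return rsa.sign(message, private_key, "SHA-1")

    signer = CloudFrontSigner("KEYPAIRID123", rsa_signer)  # placeholder key-pair ID
    url = signer.generate_presigned_url(
        "https://my.domain.com/somevideo.mp4",
        date_less_than=datetime.datetime.utcnow() + datetime.timedelta(hours=24),
    )
    print(url)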

How to implement XML-safe private Amazon S3 URLs?

On my photography website, I am storing photos on Amazon S3. To actually display them on the website, I am using signed URLs. This means that image URLs expire. Only the web application itself is able to generate valid image file URLs.
An example URL would look like this:
http://media.jungledragon.com/images/1849/21346_small.JPG?AWSAccessKeyId=05GMT0V3GWVNE7GGM1R2&Expires=1411603210&Signature=9MMO3zEXECtvB0w%2FuMEN8obt1ow%3D
Note that by the time you read this, that URL may have already expired. That's ok, the question is about the format.
Whilst the above URL format works fine on the website, it breaks XML files. The reason for this is the & character, which should be escaped.
For example, I'm trying to implement Windows 8.1 live tiles for the website, which you can link to an RSS feed. My RSS feed is here:
http://www.jungledragon.com/all/rss/promoted
That feed will work in most RSS readers, however, the Windows 8 tile builder (http://www.buildmypinnedsite.com/en) is particularly strict about the XML being valid. Here you can see the error it throws on said feed:
http://notifications.buildmypinnedsite.com/?feed=http://www.jungledragon.com/all/rss/promoted&id=1
Now, my simple thinking was to escape the ampersands that are part of the signed URLs as &amp; or &#38;. Whilst that may make the XML valid, unfortunately S3 does not accept the ampersands being encoded. When used like that, the image will no longer load.
I'm wondering whether I am in a circular problem that cannot be solved?
I have had many similar problems with RSS feeds. XML documents should always use &amp; (or an equivalent like &#38; or &#x26;). If a reader is not capable of extracting the URL properly, then the reader is the culprit, not you. But I can tell you that reader programmers will disagree with you.
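To illustrate, escaping the signed URL before embedding it in the feed is a one-liner in Python; a conforming XML parser hands the original URL (with plain ampersands) back to the reader:

    from xml.sax.saxutils import escape

    signed_url = ("http://media.jungledragon.com/images/1849/21346_small.JPG"
                  "?AWSAccessKeyId=05GMT0V3GWVNE7GGM1R2&Expires=1411603210"
                  "&Signature=9MMO3zEXECtvB0w%2FuMEN8obt1ow%3D")

    # escape() turns & into &amp; in the XML source; the URL itself is unchanged.
    print(escape(signed_url))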
If you are a programmer, you could fix the problem by having a redirect, but that's a bit of work. So you'd retrieve the URL from S3, save that in your database and create a URL on your website such as http://www.jungledragon.com/images/123 and link the S3 URL with your images/123 page. Now when someone goes to page images/123, you retrieve the URL you saved from your S3 server.
Actually, if the URL http://www.jungledragon.com/images/123 is a reference to your image, you can get the S3 URL at that time and do the redirect on the fly!
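A sketch of that on-the-fly redirect with Flask and boto3 (the route, lookup table, and bucket/key values are made up for illustration):

    import boto3
    from flask import Flask, redirect

    app = Flask(__name__)
    s3 = boto3.client("s3")

    # Stand-in for the database lookup described above.
    IMAGE_TABLE = {123: ("my-photo-bucket", "images/1849/21346_small.JPG")}

    @app.route("/images/<int:image_id>")
    def image_redirect(image_id: int):
        bucket, key = IMAGE_TABLE[image_id]
        # Sign only at redirect time, so the feed carries the stable,
        # XML-safe /images/123 link instead of the expiring S3 URL.
        signed = s3.generate_presigned_url(
            "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=3600
        )
        return redirect(signed, code=302)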