The idea was to generate a random key for every file being uploaded, pass this key to S3 so that it encrypts the file, and store the key in the database. When the user wants to access the file, the key is read from the database and passed to S3 once again.
The first part works. My objects are uploaded and encrypted successfully, but I have issues with retrieving them.
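For context, a hedged sketch of what that first (working) upload step might look like with the AWS SDK for Java v1, called here from Scala; the bucket, key, and file path are placeholders, not necessarily the asker's actual code:

import java.io.File
import java.security.SecureRandom
import java.util.Base64
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.{PutObjectRequest, SSECustomerKey}

val s3 = AmazonS3ClientBuilder.defaultClient()

// Generate a random 256-bit key per file and persist its Base64 form in the database.
val rawKey = new Array[Byte](32)
new SecureRandom().nextBytes(rawKey)
val base64Key = Base64.getEncoder.encodeToString(rawKey)

val put = new PutObjectRequest("bucket-id", "resource-id", new File("/tmp/upload.bin"))
  .withSSECustomerKey(new SSECustomerKey(base64Key)) // S3 encrypts the object with this key (SSE-C)
s3.putObject(put)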
Retrieving files with request headers set:
Setting request headers such as x-amz-server-side-encryption-customer-algorithm when performing the GET request to the resource works, and I am able to access the file. But since I want to use these resources as the src of an <img> tag, I cannot perform GET requests which require headers to be set.
Thus, I thought about:
Pre-signing URLs:
To create a pre-signed URL, I built the HMAC-SHA1 of the required string and used it as the signature. The calculated signature is accepted by S3, but I get the following error when requesting the pre-signed URL:
Requests specifying Server Side Encryption with Customer provided keys must provide an appropriate secret key.
The URL has the form:
https://s3-eu-west-1.amazonaws.com/bucket-id/resource-id?x-amz-server-side-encryption-customer-algorithm=AES256&AWSAccessKeyId=MyAccessKey&Expires=1429939889&Signature=GeneratedSignature
The reason why the error is shown seems pretty clear to me: at no point in the signing process was the encryption key used, so the request cannot work. As a result, I added the Base64-encoded encryption key and its MD5 digest as parameters to the URL. The URL now has the following format:
https://s3-eu-west-1.amazonaws.com/bucket-id/resource-id?x-amz-server-side-encryption-customer-algorithm=AES256&AWSAccessKeyId=MyAccessKey&Expires=1429939889&Signature=GeneratedSignature&x-amz-server-side-encryption-customer-key=Base64_Key&x-amz-server-side-encryption-customer-key-MD5=Md5_Key
Although the key is now present (imho), I still get the same error message.
Question
Does anyone know how I can access my encrypted files with a GET request which does not provide any headers such as x-amz-server-side-encryption-customer-algorithm?
It seems intuitive enough to me that what you are trying should have worked.
Apparently, though, when they say "headers"...
you must provide all the encryption headers in your client application.
— http://docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html#sse-c-how-to-programmatically-intro
... they do indeed actually mean headers: S3 doesn't accept these particular values when delivered as part of the query string, even though you might expect it to, since S3 is sometimes somewhat flexible in that regard.
I've tested this, and that's the conclusion I've come to: doing this isn't supported.
A GET request with x-amz-server-side-encryption-customer-algorithm=AES256 included in the query string (and signature), along with the X-Amz-Server-Side-Encryption-Customer-Key and X-Amz-Server-Side-Encryption-Customer-Key-MD5 headers, does work as expected... as I believe you've discovered... but putting the key and key-MD5 in the query string, with or without including them in the signature, seems to be a dead end.
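For comparison, the header-based retrieval that does work might look roughly like this with the AWS SDK for Java v1 (shown from Scala); the bucket and key are placeholders and loadKeyFromDatabase is a hypothetical helper:

import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.{GetObjectRequest, SSECustomerKey}

val s3 = AmazonS3ClientBuilder.standard().withRegion("eu-west-1").build()

// base64Key is the per-object customer key previously stored in the database.
val base64Key = loadKeyFromDatabase("resource-id") // hypothetical helper
val request = new GetObjectRequest("bucket-id", "resource-id")
  .withSSECustomerKey(new SSECustomerKey(base64Key)) // the SDK sends the three SSE-C headers
val s3Object = s3.getObject(request) // works; a bare <img src="..."> GET cannot do this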
It seemed somewhat strange, at first, that they wouldn't allow this in the query string, since so many other things are allowed there... but then again, if you're going to the trouble of encrypting something, there seems little point in revealing the encryption key in a link... not to mention that the key would then be captured in the S3 access logs, leaving the encryption seeming fairly well pointless all around -- and perhaps that was their motivation for requiring it to actually be sent in the headers and not the query string.
Based on what I've found in testing, though, I don't see a way to use encrypted objects with customer-provided keys in hyperlinks, directly.
Indirectly, of course, a reverse proxy in front of the S3 bucket could do the translation for you, taking the appropriate values from the query string and placing them into the headers, instead... but it's really not clear to me what's to be gained by using customer-provided encryption keys for downloadable objects, compared to letting S3 handle the at-rest encryption with AWS-managed keys. At-rest encryption is all you're getting either way.
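As an illustration only (not a recommendation, for the reasons above), such a translating proxy could be sketched in akka-http along these lines; the parameter names are hypothetical, and note that the customer key still travels in a link:

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, Uri}
import akka.http.scaladsl.model.headers.RawHeader
import akka.http.scaladsl.server.Directives._

implicit val system: ActorSystem = ActorSystem("sse-c-proxy")

val route =
  path("proxy") {
    parameters("url", "key", "keyMD5") { (presignedUrl, base64Key, keyMd5) =>
      // Move the SSE-C values from the query string into headers and forward to S3.
      val upstream = HttpRequest(uri = Uri(presignedUrl)).withHeaders(List(
        RawHeader("x-amz-server-side-encryption-customer-algorithm", "AES256"),
        RawHeader("x-amz-server-side-encryption-customer-key", base64Key),
        RawHeader("x-amz-server-side-encryption-customer-key-MD5", keyMd5)
      ))
      complete(Http().singleRequest(upstream)) // stream S3's response back to the caller
    }
  }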
Related
I'm trying to implement a file storage service with a basic S3-compatible API using akka-http.
I use the S3 Java SDK to test my service API and ran into a problem with the putObject(...) method: I can't consume the file properly on my akka-http backend. I wrote a simple route for test purposes:
import java.io.File
import akka.http.scaladsl.server.Directives._
import akka.stream.scaladsl.FileIO

// An implicit Materializer (e.g. derived from the ActorSystem) must be in scope for runWith.
def putFile(bucket: String, file: String) = put {
  extractRequestEntity { ent =>
    val finishedWriting = ent.dataBytes.runWith(FileIO.toPath(new File(s"/tmp/$file").toPath))
    onComplete(finishedWriting) { ioResult =>
      complete("Finished writing data: " + ioResult)
    }
  }
}
It saves the file, but the file is always corrupted. Looking inside the file, I found lines like this:
"20000;chunk-signature=73c6b865ab5899b5b7596b8c11113a8df439489da42ddb5b8d0c861a0472f8a1".
When I try to PUT a file with any other REST client, it works as expected.
I know S3 uses the "Expect: 100-continue" header, and maybe that is what causes the problem.
I really can't figure out how to deal with that. Any help appreciated.
This isn't exactly corrupted. Your service is not accounting for one of the four¹ ways S3 supports uploads to be sent on the wire, using Content-Encoding: aws-chunked and x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD.
It's a non-standards-based mechanism for streaming an object, and includes chunks that look exactly like this:
string(IntHexBase(chunk-size)) + ";chunk-signature=" + signature + \r\n + chunk-data + \r\n
...where IntHexBase() is pseudocode for a function that formats an integer as a hexadecimal number as a string.
This chunk-based algorithm is similar to, but not compatible with, Transfer-Encoding: chunked, because it embeds checksums in the stream.
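For illustration, here is a simplified, in-memory Scala sketch of stripping this framing; it ignores chunk-signature verification and any trailing metadata, both of which a real S3-compatible service would have to handle, and a production version should be streaming rather than buffering the whole body:

import akka.util.ByteString

def stripAwsChunked(body: ByteString): ByteString = {
  val crlf = ByteString("\r\n")
  @scala.annotation.tailrec
  def loop(rest: ByteString, acc: ByteString): ByteString = {
    val headerEnd = rest.indexOfSlice(crlf)
    if (headerEnd < 0) acc
    else {
      val header = rest.take(headerEnd).utf8String // e.g. "20000;chunk-signature=73c6..."
      val chunkSize = Integer.parseInt(header.takeWhile(_ != ';'), 16)
      if (chunkSize == 0) acc // final zero-length chunk terminates the stream
      else {
        val data = rest.drop(headerEnd + 2).take(chunkSize)
        loop(rest.drop(headerEnd + 2 + chunkSize + 2), acc ++ data) // skip data and trailing CRLF
      }
    }
  }
  loop(body, ByteString.empty)
}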
Why did they make up a new HTTP transfer encoding? It's potentially useful on the client side because it eliminates the need to either "read your payload twice or buffer [the entire object payload] in memory [concurrently]" -- one or the other of which is otherwise necessary if you are going to calculate the x-amz-content-sha256 hash before the upload begins, as you otherwise must, since it's required for integrity checking.
I am not overly familiar with the internals of the Java SDK, but this type of upload might be triggered by using .withInputStream(), or it might be standard behavior for files too, or for files over a certain size.
Your minimum workaround would be to throw an HTTP error if you see x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD in the request headers since you appear not to have implemented this in your API, but this would most likely only serve to prevent storing objects uploaded by this method. The fact that this isn't already what happens automatically suggests that you haven't implemented x-amz-content-sha256 handling at all, so you are not doing the server-side payload integrity checks that you need to be doing.
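A hedged sketch of that minimum workaround in akka-http, rejecting the streaming payload type until it is actually supported; putFile is the route from the question and putFileGuarded is just an illustrative name:

import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route

def putFileGuarded(bucket: String, file: String): Route =
  optionalHeaderValueByName("x-amz-content-sha256") {
    case Some("STREAMING-AWS4-HMAC-SHA256-PAYLOAD") =>
      // The client is using the aws-chunked / streaming SigV4 upload; refuse until implemented.
      complete(StatusCodes.NotImplemented -> "aws-chunked uploads are not supported")
    case _ =>
      putFile(bucket, file) // the existing route from the question
  }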
For full compatibility, you'll need to implement the algorithm supported by S3 and assumed to be available by the SDKs, unless the SDKs specifically support a mechanism for disabling this algorithm -- which seems unlikely, since it serves a useful purpose, particularly (it appears) for streams whose length is known but that aren't seekable.
¹ one of four -- the other three are a standard PUT, a web-based html form POST, and the multipart API that is recommended for large files and mandatory for files larger than 5 GB.
The multipart upload overview documentation has, in the Multipart Upload Listings section, the following warning:
Note
Only use the returned listing for verification. You should not use the result of this listing when sending a complete multipart upload request. Instead, maintain your own list of the part numbers you specified when uploading parts and the corresponding ETag values that Amazon S3 returns.
Why?
Why I ask: Let's say I want to support resuming an upload that is interrupted. Doing so means knowing what remains to be uploaded, and therefore what already was uploaded. Knowing this is simpler if I may disregard the above warning. S3 is persisting the list of already-uploaded parts. I can obtain it from List Parts.
Whereas if I heed that warning, instead I'd need to intercept break or kill signals and persist the uploaded parts list locally. Although that's feasible, it seems silly to do this if S3 already has the list.
Furthermore, the warning says to use List Parts "only for verification". OK. Let's say I persist my own list, and compare it to List Parts. If they do not match, what am I going to do? I'm going to believe List Parts -- if S3 doesn't think it has a part, of course I'm going to upload it again. Therefore if List Parts is the ultimate authority, why not simply use it in the first place, and use it alone?
If they do not match, what am I going to do? I'm going to believe List Parts -- if S3 doesn't think it has a part, of course I'm going to upload it again.
You're missing the point of the warning.
It's not so much about whether parts were received. It's about whether they were received intact.
When you complete a multipart upload, you have to send a list of the parts and their etags. The etags are the hex md5sum of each part.
The lazy and careless way to complete a multipart upload would be to blindly submit the etags of the parts by just reading them from the "list" operation.
That is what they are warning against.
The correct way is to use your locally created list, based on what you think S3 should have received and what you think the ETag of each part should have been, calculated from the local file.
If you are resuming an upload that was interrupted, you should go back and compare the parts already uploaded (by re-reading and re-checksumming the parts of the local file) against the checksums S3 has calculated for the parts already stored (as returned by the list operation)... then either resend any incorrect or missing parts, or abandon the upload, because if one or more parts doesn't match your local calculation, the local file may have changed.
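A sketch of that verification step with the AWS SDK for Java v1 (from Scala); localPart is a hypothetical helper that re-reads part n from the local file, and pagination of the listing is omitted for brevity:

import java.security.MessageDigest
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.ListPartsRequest
import scala.jdk.CollectionConverters._

val s3 = AmazonS3ClientBuilder.defaultClient()

def hexMd5(data: Array[Byte]): String =
  MessageDigest.getInstance("MD5").digest(data).map("%02x".format(_)).mkString

def uploadedPartsMatchLocalFile(bucket: String, key: String, uploadId: String,
                                localPart: Int => Array[Byte]): Boolean = {
  val listed = s3.listParts(new ListPartsRequest(bucket, key, uploadId)).getParts.asScala
  // Each part's ETag is the hex MD5 of that part; recompute locally and compare.
  listed.forall(part => part.getETag == hexMd5(localPart(part.getPartNumber)))
}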
Additionally, in the interest of data integrity, you should be sending the md5 of each part with the individual part uploads, base64-encoded, with a Content-MD5 header, since this will cause S3 to refuse to accept a part that has been corrupted in any way during the upload.
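Putting both points together, a hedged sketch (AWS SDK for Java v1 from Scala) of uploading a part with Content-MD5 and completing the upload from a locally maintained PartETag list; the bucket, key, and tiny demo part are placeholders (real parts other than the last must be at least 5 MB):

import java.io.ByteArrayInputStream
import java.security.MessageDigest
import java.util.Base64
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model._
import scala.jdk.CollectionConverters._

val s3 = AmazonS3ClientBuilder.defaultClient()
val (bucket, key) = ("my-bucket", "big-file.bin") // placeholders
val uploadId = s3.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key)).getUploadId

def uploadPart(partNumber: Int, data: Array[Byte]): PartETag = {
  val md5 = Base64.getEncoder.encodeToString(MessageDigest.getInstance("MD5").digest(data))
  val result = s3.uploadPart(new UploadPartRequest()
    .withBucketName(bucket).withKey(key).withUploadId(uploadId)
    .withPartNumber(partNumber)
    .withInputStream(new ByteArrayInputStream(data))
    .withPartSize(data.length.toLong)
    .withMD5Digest(md5)) // Content-MD5: S3 refuses the part if it was corrupted in transit
  result.getPartETag // record this locally; don't read it back from List Parts later
}

// Upload parts 1..n in order, collecting the PartETags yourself...
val localPartETags: Seq[PartETag] = Seq(uploadPart(1, "hello".getBytes("UTF-8")))

// ...and complete the upload from the locally maintained list.
s3.completeMultipartUpload(
  new CompleteMultipartUploadRequest(bucket, key, uploadId, localPartETags.asJava))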
One of the reasons, and probably the main one, to use UUID is to avoid having a "centralized" point responsible for creating and assigning ids to resources.
That means that, for REST APIs, clients could (and should) be able to generate and provide the UUID for a certain resource when they POST that specific resource for the first time. That would minimize problems related to successfully posting a resource for the first time but not getting the ID back in the response (connectivity problems, for example), which can result in some clients posting again and creating duplicated resources.
My questions are:
Is having UUIDs generated by clients and exposed in the REST API a best practice?
Are there any other alternatives to that?
REST does not really care if the UUID is generated by the server or by the client. It just needs a unique resource-identifier in form of an URI.
What form the URI has is not important to clients and servers - only that it is unique and can be obtained by clients (HATEOAS). Of course, you also need a resource on the server side which is able to create the sub-resource for you and understands that you want to provide the UUID instead of generating its own. Instead of a UUID you could, for example, also use a URL-encoded title of a blog post or, like this question, a combination of a hash value and the question title (31584303/rest-api-and-uuid) to uniquely identify a resource.
Having the client generate the UUID is, in my opinion, not done that often in practice, but I may be wrong on this matter. The actual question is rather: why should a client provide its own UUID instead of letting the server create one? The client is, IMO, only interested in getting the data to the server and having some way to retrieve it at some later point, which will be provided through the Location header returned in the response to a POST request.
If connection issues are an actual concern, you could let the client send an empty POST to create a resource and send back the location within the header. The content is then added via a PUT request.
There still can be some connection issues involved:
request does not reach the server
response does not reach the client
While the former is no issue for either the client or the server, since no operation is executed and the request can simply be resent, the latter will actually create a resource on the server side even though the link never reaches the client. The resource is therefore in an unreferenceable state unless you provide a way to iterate over all resources, including the empty ones.
A server can have a cleanup thread in the background which removes empty resources after a given amount of time. If a client sends a further empty POST request but this time also receives the URI of the created resource, it can update the state of the resource via PUT. PUT is idempotent: if the server did not receive the request, the client can simply resend it. PUT has the semantics of updating an existing resource or creating a new one if it is not yet available, so the server can create the resource in that case with the provided content. If the request did reach the server, a further update does not change the state of the resource.
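A minimal akka-http sketch of that empty-POST-then-PUT flow (all names are hypothetical, and an in-memory map stands in for real storage):

import java.util.UUID
import java.util.concurrent.ConcurrentHashMap
import akka.http.scaladsl.model.{StatusCodes, Uri}
import akka.http.scaladsl.model.headers.Location
import akka.http.scaladsl.server.Directives._

val store = new ConcurrentHashMap[String, String]() // id -> content; "" marks an empty placeholder

val route =
  pathPrefix("items") {
    pathEnd {
      post { // step 1: reserve an id and return it in the Location header
        val id = UUID.randomUUID().toString
        store.put(id, "")
        respondWithHeader(Location(Uri(s"/items/$id"))) {
          complete(StatusCodes.Created)
        }
      }
    } ~
    path(Segment) { id =>
      put { // step 2: idempotent PUT adds (or re-sends) the content; safe to retry
        entity(as[String]) { body =>
          store.put(id, body)
          complete(StatusCodes.OK)
        }
      }
    }
  }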
One advantage of a client-generated UUID is that the client knows the resource key even before creating the resource; there is no need to read the response of the POST/PUT except for errors.
I've been creating Presigned HTTP PUT URLs and everything was working great until I wanted to start using "folders" in S3; I wanted the key to have the character '/'.
Now I get "Signature doesn't match" when I send the HTTP PUT requests, probably because the '/' changes to %2F... If I escape the character before creating the pre-signed URL it works great, but then the Amazon management console doesn't understand it and shows the key as one file instead of subfolders.
Any idea?
P.s.
The HTTP PUT requests are sent using C++ with POCO NET library.
EDIT
I'm using Poco HttpRequest from C++ to call my Java web server, which generates a signed URL (returned in the response).
C++ then uses this URL to PUT a file into S3, again using Poco.
The problem was that the URLs returned from the web server were parsed through Poco URI objects that auto-decoded the S3 object key, thus changing it. With that in mind I was able to fix my problem.
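For reference, a hedged sketch of the Java-side generation with the AWS SDK for Java v1 (shown here from Scala); the bucket and key are placeholders, the key keeps its '/' separators unescaped, and the returned URL should be passed to the C++ client verbatim, without re-parsing or re-decoding it:

import com.amazonaws.HttpMethod
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest

val s3 = AmazonS3ClientBuilder.defaultClient()

// The object key keeps its '/' separators; the SDK handles any encoding needed for signing.
val request = new GeneratePresignedUrlRequest("my-bucket", "folder/subfolder/file.bin", HttpMethod.PUT)
val presignedUrl = s3.generatePresignedUrl(request).toString // return this string unmodified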
Tricky - I'll try to approach this bottom up.
Disclaimer: I got carried away visually inspecting the Poco libraries instead of actually debugging a code sample, which should yield more reliable results much faster, see below ;)
Analysis
If I escape the character before creating the presigned URL it works
great, but then the Amazon console management doesn't understand it
and shows it as one file instead of subfolders.
The latter stems from S3 not having a concept of folders on the storage level actually, see e.g. section Index Documents and Folders within Index Document Support:
Objects stored in Amazon S3 are stored within a flat container, i.e.,
an Amazon S3 bucket, and it does not provide any hierarchical
organization, similar to a file system's. However, you can create a
logical hierarchy using object key names and use these names to infer
logical folders that contain these objects.
That's exactly what the AWS Management Console is doing here as well:
The AWS Management Console also supports the concept of folders, by
using the same key naming convention used in the preceding sample.
However, your test regarding the assumption of / being encoded as %2F proves that this is indeed how Poco::Net is encoding the URL when performing the HTTP PUT request.
(I'm actually a bit surprised that the AWS Java SDK seems to generate different URLs here for / vs. %2F, insofar a recent analysis regarding Why is my S3 pre-signed request invalid when I set a response header override that contains a “+”? seems to indicate respective canonicalization by the AWS .NET SDK, see below for more on this.)
Potential Solution
In order for your scenario to work as desired, you'll need to figure out where the URL is encoded this way - I could think of two components in principle:
Poco::Net
Finding out why Poco::Net is encoding the URL different than S3 (if at all, see below) is best done by debugging your code, here's where I'd start:
Class HTTPRequest uses class URI in turn, which automatically performs a few normalizations on all URIs and URI parts passed to it, in particular percent-encoded characters are decoded. The other way round is handled by method encode(), which is where things get interesting and call for a breakpoint, see URI.cpp:
lines 575 ff. - here encode() does its magic, which indeed seems to be in place, insofar neither the code within the function nor the various chars passed in via the reserved parameter contain the offending / (see lines 47 ff. for the respective constants in use)
consequently you might want to set a breakpoint in this function and backtrace the callstack to find out which code is actually doing the encoding upfront, which might not yield an offender at all, see below.
Java => C++ transition
You haven't specified yet which channel is actually used to communicate the pre-signed URL generated by the AWS Java SDK to C++ in turn. Given that the code review (mind you, visual inspection only, I haven't debugged this myself yet) of the Poco::Net functionality yields the conclusion that no obvious offender can be identified in the library itself, it seems more likely that the URL might already enter your C++ layer encoded (easily verified via debugging, of course) - are you by chance using any kind of web service between these components, for example?
Good luck!
I am implementing a simple license-file system, and would like to know if there are any mistakes I'm making with my current line of implementation.
The message data is smaller than the key. I'm using RSA, with a keysize of 3072bits.
The issuer of the licenses generates the message to be signed and signs it, using a straightforward RSA-based approach, and then applies a similar approach to encrypt the message. The encrypted message and the signature are stored together as the license file.
Sha512 the message.
Sign the hash with the private key.
Sign the message with the private key.
Concatenate and transmit.
On receipt, the verification process is:
Decrypt the message with the public key
Hash the message
Decrypt the hash from the file with the public key, and compare with the local hash.
The implementation is working correctly so far, and appears to be valid.
I'm currently zero-padding the message to match the keysize, which is probably a bad move (I presume I should be using a PKCS padding algorithm, like 1 or 1.5?)
Does this strategy seem valid?
Are there any obvious flaws, or perspectives I'm overlooking?
The major flaw I noticed: you must verify the padding is still there when you decrypt.
(If you know the message length in advance then you might be able to get away with using your own padding scheme, but it would probably still be a good idea to use an existing one as you mentioned).
I am not sure why you're bothering to encrypt the message itself - as you've noted it can be decrypted by anyone with the public key anyway so it is not adding anything other than obfuscation. You might as well just send the message and the encrypted-padded-hash.
I would recommend using a high-level library that provides a "sign message" function, like cryptlib or KeyCzar (if you can). These benefit from a lot more eyeballs than your code is likely to see, and take care of all the niggly padding issues and similar.
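Not cryptlib or KeyCzar, but as an illustration of the same idea, here is a minimal sketch using the standard JCA Signature API (SHA512withRSA applies PKCS#1 v1.5 padding internally); the license payload is hypothetical:

import java.security.{KeyPairGenerator, Signature}

val keyPair = {
  val kpg = KeyPairGenerator.getInstance("RSA")
  kpg.initialize(3072)
  kpg.generateKeyPair()
}

val message = "licensee=ACME;expires=2030-01-01".getBytes("UTF-8") // hypothetical license data

// Issuer side: sign the plaintext license data with the private key.
val signer = Signature.getInstance("SHA512withRSA")
signer.initSign(keyPair.getPrivate)
signer.update(message)
val signatureBytes = signer.sign() // ship message + signature together as the license file

// Client side: verify with the public key embedded in the application.
val verifier = Signature.getInstance("SHA512withRSA")
verifier.initVerify(keyPair.getPublic)
verifier.update(message)
println(verifier.verify(signatureBytes)) // true only if the license is authentic and unmodified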