AWS S3 upload integrity checking

If a client is using AWS request signing (Signature Version 4), is there ever a reason to do separate integrity checking for AWS S3 uploads, or is the integrity checking inherent in the protocol adequate?
I'm referring particularly to multi-part uploads, which are described here:
https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html
https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadComplete.html
but also to single-part uploads.
To briefly summarize:
Each request to upload a part of a file is signed with a SHA-256 hash of the entire request, including headers and data.
In response to each part upload, AWS returns an ETag, which is a proprietary hash of the data in that part of the file. Usually this is an MD5 of the data for that part, but when AWS-KMS encryption is in effect, it's an undocumented algorithm.
After all parts are uploaded, the client sends a request specifying that the individual parts be stitched together into a file/key. The request contains the part numbers and the AWS-generated ETag of each part.
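For context, here is a minimal boto3 sketch of that flow (bucket, key, and file names are hypothetical), showing where the per-part ETags come from and how they're echoed back on completion:

```python
import itertools
import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "big-file.bin"   # hypothetical names
part_size = 8 * 1024 * 1024                 # every part except the last must be >= 5 MiB

upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []
with open("big-file.bin", "rb") as f:
    for number in itertools.count(1):
        chunk = f.read(part_size)
        if not chunk:
            break
        # Each UploadPart request is SigV4-signed; S3 returns the part's ETag
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload["UploadId"],
                              PartNumber=number, Body=chunk)
        parts.append({"PartNumber": number, "ETag": resp["ETag"]})

# CompleteMultipartUpload echoes back the part numbers and ETags
s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload["UploadId"],
                             MultipartUpload={"Parts": parts})
```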
Some clients do extra checking by comparing the key's final AWS-generated ETag against a locally calculated version of the ETag (which has been discussed, for instance, in "What is the algorithm to compute the Amazon-S3 Etag for a file larger than 5GB?"), but is there any point to this?
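For reference, the community-derived local calculation for non-KMS multipart uploads is: take the MD5 digest of each part, concatenate the raw digests, MD5 the result, and append -N where N is the part count. A sketch (it assumes you know the part size the uploader used):

```python
import hashlib

def multipart_etag(path, part_size=8 * 1024 * 1024):
    """Reproduce the non-KMS multipart ETag: MD5 of the concatenated
    per-part MD5 digests, suffixed with the part count."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(part_size)
            if not chunk:
                break
            digests.append(hashlib.md5(chunk).digest())
    if len(digests) == 1:
        return digests[0].hex()  # single-part uploads use a plain MD5
    return hashlib.md5(b"".join(digests)).hexdigest() + f"-{len(digests)}"
```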
One of the reasons I ask is that apparently no one has yet reverse-engineered the ETag algorithm used when server-side AWS-KMS encryption is in effect. However, it appears to me that integrity checking is sufficiently inherent in the protocol that additional checking is unnecessary.
Thanks.

Related

Would there be a compelling reason for implementing integrity check in a file transfer protocol, if the channel uses TLS?

I am developing a client server pair of applications to transfer files by streaming bytes over TCP/IP and the channel would use TLS always.
(Note: Due to certain OS related limitations SFTP or other such secure file transfer protocols cannot be used)
The application-level protocol involves minimal but sufficient features to get the file to the other side.
I need to decide if the application-level protocol needs to implement an integrity check (e.g., MD5).
Since TLS guarantees integrity, would this be redundant?
The use of TLS can provide you with some confidence that the data has not been changed (intentionally or otherwise) in transit, but not necessarily that the file that you intended to send is identical to the one that you receive.
There are plenty of other opportunities for the file to be corrupted/truncated/modified (such as when it's being read from the disk/database by the sender, or when it's written to disk by the receiver). Implementing your own integrity checking would help protect against those cases.
In terms of how you do the checking, if you're worried about malicious tampering then you should be checking a cryptographic signature (using something like GPG), rather than just a hash of the file. If you're going to use a hash then it's generally recommended to use a more modern algorithm such as SHA-256 rather than the (legacy) MD5 algorithm - although most of the issues with MD5 won't affect you if you're only concerned about accidental corruption.
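As an illustration, an end-to-end check independent of TLS can be as simple as both sides computing a digest over the file as stored on disk and comparing the results; a minimal Python sketch:

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file as it exists on disk, independent of the transport."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# The sender transmits sha256_of_file("payload.bin") alongside the file;
# the receiver recomputes it over the bytes it actually wrote to disk.
```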

How do I download an encrypted s3 object without decryption?

I'm using Server-Side Encryption with Customer-Provided Encryption Keys (SSE-C) to store some files. I want to download them but not decrypt them just yet. The use case is something like the Game of Thrones finale: I want cable operators to have the data but only give them the key at the last second. But the decryption headers are mandatory when the file is encrypted. Maybe I can toggle the flag that marks the file as encrypted?
For this application, you wouldn't use any variant of SSE.
SSE prevents your content from being stored on S3's internal disks in a form where accidental or deliberate compromise of those physical disks or their raw bytes -- however unlikely -- would expose your content to unauthorized personnel. That is fundamentally the purpose of all varieties of SSE. The variants center around how the keys are managed.
Server-side encryption is about data encryption at rest—that is, Amazon S3 encrypts your data at the object level as it writes it to disks in its data centers and decrypts it for you when you access it.
https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html
SSE is decrypted by S3 and transiently re-encrypted using TLS for transmission on the network during the download. The final result in the client's hands is unencrypted.
For the application described, you would just upload the encrypted content to S3 without S3 being aware of the (external, already-applied) encryption.
If you also used some kind of SSE, that would be unrelated to the external encryption that you would also apply. Arguably, SSE would be somewhat redundant if the content is already encrypted before upload.
In fact, in the application described, depending on the sensitivity and value of the content, each recipient would potentially get a different key and/or a slightly different source file (and thus a substantially different encrypted file), so that the source of a leak could be traced by identifying which variant was compromised.
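To make that concrete, a minimal sketch of the suggested approach - encrypting the content yourself before upload, so S3 only ever stores opaque bytes (hypothetical names, AES-GCM via the cryptography package):

```python
import os
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # delivered to recipients at the last second
nonce = os.urandom(12)

with open("finale.mp4", "rb") as f:         # hypothetical content
    ciphertext = AESGCM(key).encrypt(nonce, f.read(), None)

# S3 stores opaque bytes; it never sees the key or knows this is encrypted
boto3.client("s3").put_object(Bucket="releases", Key="finale.mp4.enc",
                              Body=nonce + ciphertext)
```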

Amazon S3 Data integrity MD5 vs SSL/TLS

I'm currently working with the Amazon S3 API, and have a general question about the server-side integrity checks that can be done if you provide the MD5 hash when posting an object.
I'm not sure whether the integrity check is needed if you send the data (including, I assume, the object data you're posting) via SSL/TLS, which provides its own support for data integrity in transit.
Should you send the digest regardless if you're posting over SSL/TLS? Isn't it superfluous to do so? Or is there something I'm missing?
Thanks.
Integrity checking provided by TLS provides no guarantees about what happens going into the TLS wrapper at the sender side, or coming out of it and being written to disk at the receiver.
So, no, it is not entirely superfluous because TLS is not completely end-to-end -- the unencrypted data is still processed, however little, on both ends of the connection... and any hardware or software that touches the unencrypted bits can malfunction and mangle them.
S3 gives you an integrity checking mechanism -- two, if you use both Content-MD5 and x-amz-content-sha256 -- and it seems unthinkable to try to justify bypassing them.
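For example, with boto3 you can supply the Content-MD5 header yourself (bucket and key names are hypothetical); S3 rejects the upload if the body it receives doesn't match:

```python
import base64
import hashlib
import boto3

body = open("report.pdf", "rb").read()
md5_b64 = base64.b64encode(hashlib.md5(body).digest()).decode()

# S3 verifies the received bytes against Content-MD5 and fails the
# request with a BadDigest error if they don't match
boto3.client("s3").put_object(Bucket="my-bucket", Key="report.pdf",
                              Body=body, ContentMD5=md5_b64)
```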

Can ImageResizer be exploited if presets are not enabled?

My project uses the Presets plugin with the flag onlyAllowPresets=true.
The reason for this is to close a potential vulnerability where a script might request an image thousands of times, resizing with 1px increment or something like that.
My question is: Is this a real vulnerability? Or does ImageResizer have some kind of protection built-in?
I kind of want to set onlyAllowPresets to false, because it's a pain in the butt to deal with all the presets in such a large project.
I only know of one instance where this kind of attack was performed. If you're that valuable of a target, I'd suggest using a firewall (or CloudFlare) that offers DDOS protection.
An attack that targets cache-misses can certainly eat a lot of CPU, but it doesn't cause paging and destroy your disk queue length (bitmaps are locked to physical ram in the default pipeline). Cached images are still typically served with a reasonable response time, so impact is usually limited.
That said, run a test, fake an attack, and see what happens under your network/storage/cpu conditions. We're always looking to improve attack handling, so feedback from more environments is great.
Most applications or CMSes will have multiple endpoints that are storage- or CPU-intensive (often a wildcard search). Not to say that this is good - it's not - but the most cost-effective place to handle this is often the firewall or CDN. And today, most CMSes include some (often poor) form of dynamic image processing, so remember to test or disable that as well.
Request signing
If your image URLs originate from server-side code, then there's a clean solution: sign the URLs before spitting them out, and validate during the Config.Current.Pipeline.Rewrite event. We'd planned to have a plugin for this shipping in v4, but it was delayed - and we've only had ~3 requests for the functionality in the last 5 years.
The sketch for signing would be:
1. Sort the querystring by key.
2. Concatenate the path and the key/value pairs.
3. HMAC-SHA256 the result with a secret key.
4. Append the HMAC to the end of the querystring.
For verification:
1. Parse the query.
2. Remove the HMAC.
3. Sort the query and concatenate with the path, as before.
4. HMAC-SHA256 the result and compare it to the value we removed.
5. Raise an exception if it's wrong.
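Expressed as a compact Python sketch (illustrative only - ImageResizer is a .NET product, and the secret and parameter name here are hypothetical):

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode, urlsplit

SECRET = b"server-side secret key"  # hypothetical; never exposed to clients

def sign_url(url):
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query))        # sort querystring by key
    payload = parts.path + urlencode(pairs)       # concatenate path and pairs
    mac = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return url + ("&" if parts.query else "?") + "hmac=" + mac

def verify_url(url):
    parts = urlsplit(url)
    pairs = dict(parse_qsl(parts.query))
    claimed = pairs.pop("hmac", "")               # remove the hmac
    payload = parts.path + urlencode(sorted(pairs.items()))
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(claimed, expected):  # constant-time compare
        raise ValueError("invalid image URL signature")
```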
Our planned implementation would permit 'whitelisted' variations - certain values that a signature would allow the client to modify, say for breakpoint-based width values. This would be done by replacing the targeted key/value pairs with a serialized whitelist policy prior to signing. For validation, pairs targeted by a policy would be removed prior to signature verification, and policy enforcement would happen if the signature was otherwise a match.
Perhaps you could add more detail about your workflow and what is possible?

Is authentication required/recommended with a stream cipher?

I want to secure the communication of a TCP-based program using a shared passphrase/key. The easiest way to do that without having to deal with block sizes, padding, and so on is to use a stream cipher directly. Done that way, the amount of data doesn't change between cleartext and ciphertext, and the modification is trivial.
Using only a stream cipher means that there is no authentication and I have always considered/heard that encryption without authentication is not secure enough and should not be used.
If adding authentication to a stream cipher is mandatory, we lose the simplicity the stream cipher bought us, because we must add an HMAC or use an authenticated encryption scheme (like crypto_secretbox from NaCl): there is a minimum message length, we must handle padding, and so on.
What would you recommend? Is it safe to only use stream cipher without authentication in some particular cases?
Using some kind of message authenticator is particularly important with stream ciphers, because the relationship between changes to the ciphertext and changes to the plaintext is so simple.
You can't just blindly go and apply the stream cipher without adding any extra information to the stream, anyway - remember the most important rule of stream ciphers:
NEVER RE-USE THE SAME KEYSTREAM
So unless you are only ever going to encrypt a single connection, and throw the passphrase away afterwards, you will need to generate a session key for each connection from the shared secret. This implies that you will need to send some extra information at the start of the connection, and since you're sending that anyway, sending an HMAC after each message should be no big deal.
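For instance, a per-connection key could be derived from the shared passphrase with a salted KDF, with only the salt sent in the clear (a sketch using Python's standard library; the iteration count is illustrative):

```python
import os
import hashlib

shared_secret = b"the pre-shared passphrase"   # known to both ends

# The connecting side picks a fresh salt and sends it in the clear;
# both ends then derive the same per-connection key, so the keystream
# is never reused across connections.
salt = os.urandom(16)
session_key = hashlib.pbkdf2_hmac("sha256", shared_secret, salt, 200_000)
```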
Using a stream cipher because it seems simpler is usually a mistake, anyway. You mentioned crypto_secretbox from NaCl - I recommend using that, it will take care of the authentication and padding issues for you.
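A minimal PyNaCl example of crypto_secretbox (assuming a per-connection key has already been derived as above):

```python
import nacl.secret
import nacl.utils

key = nacl.utils.random(nacl.secret.SecretBox.KEY_SIZE)  # 32-byte session key
box = nacl.secret.SecretBox(key)

# encrypt() picks a random nonce and appends a Poly1305 authenticator;
# decrypt() raises CryptoError if the message was tampered with
token = box.encrypt(b"hello over TCP")
assert box.decrypt(token) == b"hello over TCP"
```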
You could consider using AES in GCM mode. That will give you a stream cipher with built-in authentication.