Do you need to obtain a key and save it somewhere once Amazon uses SSE-S3 to encrypt a file? - amazon-s3

I think I have a fatal misunderstanding of how SSE-S3 encryption works on an Amazon S3 bucket.
I encrypted some of my files and it says the encryption was successful, but I was never given any key to store.
How does SSE-S3 work? Once I enable it on a file, is accessing that file any different? It seems to be the same. I'm still able to access the file using its URL in my web browser. I guess the key is stored for me by the bucket, and once I access my bucket, any file I want is automatically decrypted? I guess this is to deter people attempting to hack into a bucket and steal all its files?
This is what I'm seeing on a particular file.

Do you need to obtain a key and save it somewhere once Amazon uses SSE-S3 to encrypt a file?
No, the encryption key is fully managed by Amazon S3. The whole encryption and decryption process is taken care of by S3; you don't need to do anything besides flipping the switch.
I encrypted some of my files and it says the encrypting was successful but I was never given any key to store.
Because the key storage is also managed by S3.
How does SSE-S3 work?
You upload a file to S3
S3 generates a plaintext data key 🔑 and encrypts it with the S3 master key, so now there are two blobs corresponding to 🔑 and E(🔑)
S3 encrypts your file using the plaintext data key 🔑
S3 stores your encrypted file and E(🔑) side by side
S3 servers wipe the plaintext data key 🔑 from memory
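For illustration, here is a minimal boto3 sketch of that flow from the client's point of view (bucket and key names are placeholders); notice that no key material ever reaches the caller:

```python
import boto3

s3 = boto3.client("s3")

# Upload with SSE-S3: S3 generates and manages the data key itself,
# so nothing key-related is returned to us.
s3.put_object(
    Bucket="my-bucket",               # placeholder bucket name
    Key="reports/example.pdf",        # placeholder object key
    Body=b"some file contents",
    ServerSideEncryption="AES256",    # this requests SSE-S3
)

# Reading the object back needs no extra parameters;
# S3 decrypts transparently before returning the bytes.
obj = s3.get_object(Bucket="my-bucket", Key="reports/example.pdf")
print(obj["ServerSideEncryption"])    # "AES256"
print(obj["Body"].read())             # b"some file contents"
```

In practice you would more likely turn on default encryption for the whole bucket (put_bucket_encryption) rather than set the header per object, but the effect on reads and writes is the same.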
Once I enable it on a file, is accessing that file any different?
No, S3 does all the hard encryption and decryption work for you. You just access the file as normal.
I guess the key is stored for me by the bucket and once I access my bucket, any file I want is automatically decrypted?
You are right. S3 stores E(🔑) side by side with your file. When you access the file, the underlying data is automatically decrypted.
I guess this is to deter people attempting to hack into a bucket and steal all its files?
This prevents malicious people with physical access to the hard drives that hold your data from gaining access to the raw bytes of your file.

Related

GPG Decrypt using AWS Transfer Family and Preserve Folder Structure

I am trying to decrypt a file uploaded via sFTP to an S3 bucket and preserve the folder structure of the S3 key.
I have a gpg-encrypted file being uploaded via sFTP to an S3 bucket. The customer uploads a file with a certain folder structure (which I am relying on for metadata), so they might upload a file that appears like this:
customer/folder1/file1.xlsx.gpg
or another file that appears like this:
customer/folder2/file2.xlsx.gpg
I want to decrypt these files so that their S3 keys are
customer/folder1/file1.xlsx
and
customer/folder2/file2.xlsx
but I only see the option to use ${Transfer:User Name} when parameterizing the file location of the decrypt step, so I end up with
customer/file1.xlsx
and
customer/file2.xlsx
instead and lose the folder structure.
Is there a way to do this?
For anyone else running into the limitations of AWS Transfer Family, the solution I have come up with is: store the GPG private keys in a secret, process the S3 trigger sent when a .gpg file is placed in the bucket, read the GPG file from the S3 bucket as a stream, decrypt it using a Python GPG client and the stored key (looked up based on the folder structure of the .gpg file), then store the decrypted file in the S3 bucket, preserving the folder structure. A second S3 trigger is sent when this file is created, and my Lambda can then pick up that trigger and process the decrypted file normally.
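A rough sketch of that Lambda in Python, assuming python-gnupg is packaged with the function and the private key sits in Secrets Manager under a per-customer secret name (the secret naming scheme and temp paths here are hypothetical):

```python
import os
from urllib.parse import unquote_plus

import boto3
import gnupg  # python-gnupg, assumed to be bundled with the Lambda

s3 = boto3.client("s3")
secrets = boto3.client("secretsmanager")

def handler(event, context):
    # Triggered by S3 when a *.gpg object is created.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = unquote_plus(record["s3"]["object"]["key"])  # e.g. customer/folder1/file1.xlsx.gpg

    # Look up the private key by the top-level folder (hypothetical naming scheme).
    customer = key.split("/")[0]
    secret = secrets.get_secret_value(SecretId=f"gpg/{customer}")

    os.makedirs("/tmp/gnupg", exist_ok=True)
    gpg = gnupg.GPG(gnupghome="/tmp/gnupg")
    gpg.import_keys(secret["SecretString"])

    # Stream the encrypted object and decrypt it to a temp file.
    encrypted_body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    result = gpg.decrypt_file(encrypted_body, output="/tmp/decrypted")
    if not result.ok:
        raise RuntimeError(f"Decryption failed: {result.status}")

    # Write the plaintext back under the same prefix minus the .gpg suffix,
    # which is what preserves the folder structure.
    s3.upload_file("/tmp/decrypted", bucket, key[: -len(".gpg")])
```

As described above, the creation of the decrypted object fires a second S3 trigger; filtering the triggers by suffix (.gpg for this function, everything else for the downstream one) keeps the two from looping.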
I have discovered that with the Python API for S3 you can store metadata with an object, but I don't believe that is doable when a file is placed via sFTP. So I think I'm stuck relying on folder structure for metadata.

AWS S3 File Integrity and multipart practicality

I am trying to create a file integrity verification using AWS S3 but have been dumbfounded by the multipart gotcha of S3.
As per the documentation:
When using MD5, Amazon S3 calculates the checksum of the entire multipart object after the upload is complete. This checksum is not a checksum of the entire object, but rather a checksum of the checksums for each individual part.
My particular usecase is distribution of binaries, I produce a manifest of the uploaded binaries which contains the SHA256 of each file, and then sign the manifest with my PGP key.
There is no straightforward way for me to either upload a binary with the appropriate whole-file checksum, or check the checksum of a binary while it's still on S3 (without downloading it) using the getObjectAttributes API call.
My current understanding for an integrity implementation based on what is suggested by AWS's documentation is:
Chunk each binary to be uploaded locally, based on AWS S3's multipart specifications.
Produce the SHA256 for each of those chunks, BASE64 encoded.
Hash the concatenation of the chunks' binary SHA256 digests to produce the "Object's checksum" (S3's checksum of checksums)
Store all the chunks' SHA256 values, as well as the "Object's checksum", in my manifest.
So I can then:
Invoke the getObjectAttributes API call to get the "Object's checksum" from AWS and compare it with my manifest.
Use the manifest's stored chunk SHA256's to reliably verify locally by repeating the chunk-and-sha process described above, when I have the binary downloaded.
Is that what AWS really expects us to implement for integrity verification? Am I missing something glaringly obvious about how one implements an end-to-end file integrity check, given this very particular way S3 chooses to checksum large files?
fwiw my stack is Node.js and I am using the AWS SDK
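The asker's stack is Node.js, but as a language-neutral illustration of the chunk-and-hash process described above, here is a short Python sketch, assuming the part size used for the multipart upload is known:

```python
import base64
import hashlib

def composite_sha256(path: str, part_size: int = 8 * 1024 * 1024):
    """Reproduce S3's multipart checksum-of-checksums locally:
    SHA-256 over the concatenated binary SHA-256 digests of each part."""
    part_digests = []
    with open(path, "rb") as f:
        while chunk := f.read(part_size):
            part_digests.append(hashlib.sha256(chunk).digest())
    composite = hashlib.sha256(b"".join(part_digests)).digest()
    # S3 reports the composite value base64-encoded; getObjectAttributes
    # also exposes the per-part checksums and the total part count.
    return base64.b64encode(composite).decode(), len(part_digests)
```

Both the per-part digests (base64-encoded) and the composite value can then go into the manifest, matching the steps above.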

Can Apache Drill handle KMS encrypted files?

I've been experimenting with Apache Drill and can successfully query a CSV file in my S3 bucket that is not KMS-encrypted. But when I try to query the exact same file once it has been KMS-encrypted, I get an error.
Is Drill capable of handling KMS-encrypted files? And if so, how?
Looks like Drill doesn't support it yet. Feel free to create a Jira ticket for it, providing details and use cases.
https://issues.apache.org/jira/projects/DRILL

Can I trust aws-cli to re-upload my data without corrupting when the transfer fails?

I extensively use S3 to store encrypted and compressed backups of my workstations. I use the aws cli to sync them to S3. Sometimes, the transfer might fail when in progress. I usually just retry it and let it finish.
My question is: does S3 have some kind of check to make sure that the previously failed transfer didn't leave corrupted files behind? Does anyone know if syncing again is enough to fix a previously failed transfer?
Thanks!
Individual files uploaded to S3 are never stored partially. Either the entire file is uploaded and S3 stores it as an S3 object, or the upload is aborted and no object is stored.
Even in the multipart upload case, multiple parts can be uploaded, but they never form a complete S3 object unless all of the parts are uploaded and the "Complete Multipart Upload" operation is performed. So there is no need to worry about corruption via partial uploads.
Syncing will certainly be enough to fix the previously failed transfer.
Yes, it looks like the AWS CLI does validate what it uploads and takes care of corruption scenarios by employing an MD5 checksum.
From https://docs.aws.amazon.com/cli/latest/topic/s3-faq.html
The AWS CLI will perform checksum validation for uploading and downloading files in specific scenarios.
The AWS CLI will calculate and auto-populate the Content-MD5 header for both standard and multipart uploads. If the checksum that S3 calculates does not match the Content-MD5 provided, S3 will not store the object and instead will return an error message back to the AWS CLI.
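The same mechanism can be used explicitly in code. This is a minimal boto3 sketch of the Content-MD5 idea the FAQ describes (not the CLI's internals, just an illustration):

```python
import base64
import hashlib

import boto3

s3 = boto3.client("s3")

def put_with_md5(bucket: str, key: str, path: str) -> None:
    """Upload a file with an explicit Content-MD5 header; if the bytes S3
    receives do not hash to this value, S3 rejects the upload instead of
    storing a corrupted object."""
    with open(path, "rb") as f:
        body = f.read()
    md5_b64 = base64.b64encode(hashlib.md5(body).digest()).decode()
    s3.put_object(Bucket=bucket, Key=key, Body=body, ContentMD5=md5_b64)
```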

AWS S3 and AjaXplorer

I'm using AjaXplorer to give my clients access to a shared directory stored in Amazon S3. I installed the SD, configured the plugin (http://ajaxplorer.info/plugins/access/s3/), and could upload and download files, but the upload size is limited by my host's PHP limit, which is 64MB.
Is there a way to upload directly to S3 without going through my host, so that I get better speed and am subject to S3's limits rather than PHP's?
Thanks
I think that is not possible, because the upload first goes through the PHP script on the server, which then transfers it to the bucket. But maybe.
The only way around this is to use some jQuery or JS that can bypass your server/PHP entirely and stream directly into S3. This involves enabling CORS and creating a signed policy on the fly to allow your uploads, but it can be done!
I ran into just this issue with some inordinately large media files for our website users that I no longer wanted to host on the web servers themselves.
The best place to start, IMHO is here:
https://github.com/blueimp/jQuery-File-Upload
A demo is here:
https://blueimp.github.io/jQuery-File-Upload/
This was written to upload+write files to a variety of locations, including S3. The only tricky bits are getting your MIME type correct for each particular upload, and getting your bucket policy the way you need it.
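For the "signed policy on the fly" part, the server-side piece can be quite small. Here is a minimal sketch, shown in Python for consistency with the other sketches here (the AWS SDK for PHP can generate the same kind of policy), assuming CORS is already enabled on the bucket and the names are placeholders:

```python
import boto3

s3 = boto3.client("s3")

def browser_upload_policy(bucket: str, key: str, max_bytes: int = 5 * 1024**3) -> dict:
    """Generate a short-lived signed POST policy so the browser can upload
    directly to S3, bypassing the PHP host (and its 64MB limit) entirely."""
    return s3.generate_presigned_post(
        Bucket=bucket,
        Key=key,
        Conditions=[["content-length-range", 0, max_bytes]],
        ExpiresIn=3600,  # policy is valid for one hour
    )
```

The returned dict contains the form url and fields that the browser-side uploader (e.g. jQuery-File-Upload) posts along with the file.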