I am trying to decrypt a file uploaded via SFTP to an S3 bucket while preserving the folder structure of the S3 key.
I have a gpg-encrypted file being uploaded via SFTP to an S3 bucket. The customer uploads files with a certain folder structure (which I am relying on for metadata), so they might upload a file that looks like this:
customer/folder1/file1.xlsx.gpg
or another file that looks like this:
customer/folder2/file2.xlsx.gpg
I want to decrypt these files so that their S3 keys are
customer/folder1/file1.xlsx
and
customer/folder2/file2.xlsx
but the only option I see for parameterizing the file location of the decrypt step is ${transfer:UserName}, so I end up with
customer/file1.xlsx
and
customer/file2.xlsx
instead and lose the folder structure.
Is there a way to do this?
For anyone else finding limitations with AWS Transfer Family, the solution I have come up with is to store the gpg keys as secrets and handle the decryption in a Lambda. The Lambda processes the S3 trigger sent when a .gpg file is placed in the bucket, reads the gpg file from S3 as a stream, and decrypts it using a Python gpg client and the stored key (which is looked up based on the folder structure of the gpg file). It then writes the decrypted file back to the S3 bucket, preserving the folder structure. A second S3 trigger is sent upon creation of this file, and my Lambda can then process that trigger and handle the decrypted file normally.
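A minimal sketch of that decrypt Lambda, assuming the private keys live in AWS Secrets Manager under a name derived from the top-level customer folder, and that python-gnupg plus a gpg binary are available in the deployment package or a layer (the secret naming and bucket wiring here are assumptions, not the only way to do it):

```python
import os
import urllib.parse

import boto3
import gnupg  # python-gnupg

s3 = boto3.client("s3")
secrets = boto3.client("secretsmanager")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if not key.endswith(".gpg"):
            continue  # only react to the encrypted uploads

        # e.g. "customer/folder1/file1.xlsx.gpg" -> secret "gpg/customer" (assumed naming)
        customer = key.split("/")[0]
        private_key = secrets.get_secret_value(SecretId=f"gpg/{customer}")["SecretString"]

        # /tmp is the only writable path inside Lambda
        os.makedirs("/tmp/gnupg", exist_ok=True)
        gpg = gnupg.GPG(gnupghome="/tmp/gnupg")
        gpg.import_keys(private_key)

        ciphertext = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        result = gpg.decrypt(ciphertext)  # pass passphrase=... if the key needs one
        if not result.ok:
            raise RuntimeError(f"decryption failed for {key}: {result.status}")

        # Strip only the trailing ".gpg" so the folder structure is preserved
        s3.put_object(Bucket=bucket, Key=key[: -len(".gpg")], Body=result.data)
```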
I have discovered that with the Python API for S3 you can store metadata with an object, but I don't believe this is doable when a file is placed via SFTP. So I think I'm stuck relying on the folder structure for metadata.
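For reference, this is what attaching user-defined metadata looks like when you control the upload yourself (the bucket name and metadata keys are just illustrative); S3 returns it as x-amz-meta-* headers on GET/HEAD, but there is no way to set it from the SFTP client:

```python
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-bucket",                      # placeholder bucket name
    Key="customer/folder1/file1.xlsx",
    Body=b"...",
    Metadata={"customer": "customer", "source-folder": "folder1"},
)
```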
Related
We have an Apache Camel app that is supposed to read files in a certain directory structure in S3, process the files (generating some metadata based on the folder the file is in), submit the data in the file (and metadata) to another system and finally put the consumed files into a different bucket, deleting the original from the incoming bucket.
The behaviour I'm seeing is that when I programmatically create the directory structure in S3, those "folders" are being consumed, so the directory structure disappears.
I know S3 technically does not have folders, just empty objects whose keys end in /.
The twist here is that any "folder" created in the S3 Console is NOT consumed; it stays there as we want it to. Any folder created via the AWS CLI or boto3 is immediately consumed.
The problem is that we do need the folders to be created with automation, there are too many to do by hand.
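For reference, the automated folder creation is essentially the following (a minimal boto3 sketch; the bucket and key names are placeholders, not the actual ones in use):

```python
import boto3

s3 = boto3.client("s3")

# A "folder" in S3 is just a zero-byte object whose key ends in "/".
s3.put_object(Bucket="incoming-bucket", Key="customer/folder1/", Body=b"")
```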
I've reached out to AWS Support, and they just tell me that there is no difference between how the Console creates folders and how the CLI does it. Support confirmed that the command I used in the CLI is correct.
I think my issue is similar to Apache Camel deleting AWS S3 bucket's folder, but that has no answer...
How can I get Camel to not "eat" any folders?
I think I have a fatal misunderstanding of how SSE-S3 encryption works on an Amazon S3 bucket.
I encrypted some of my files and it says the encrypting was successful but I was never given any key to store.
How does SSE-S3 work? Once I enable it on a file, is the accessing of that file any different? It seems to be the same. I'm still able to access the file using its URL in my web browser. I guess the key is stored for me by the bucket and once I access my bucket, any file I want is automatically decrypted? I guess this is to deter people attempting to hack into a bucket and steal all its files?
This is what I'm seeing on a particular file.
Do you need to obtain a key and save it somewhere once Amazon uses SSE-S3 to encrypt a file?
No, the encryption key is fully managed by Amazon S3. The whole encryption and decryption process is taken care of by S3, and you don't need to do anything else besides flipping the switch.
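For completeness, "flipping the switch" can also be done from code. A minimal boto3 sketch (bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Make SSE-S3 (AES-256) the default for every new object in the bucket...
s3.put_bucket_encryption(
    Bucket="my-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
    },
)

# ...or request it explicitly for a single upload.
s3.put_object(Bucket="my-bucket", Key="report.pdf", Body=b"...", ServerSideEncryption="AES256")

# The only visible difference is the ServerSideEncryption field on the object.
print(s3.head_object(Bucket="my-bucket", Key="report.pdf").get("ServerSideEncryption"))  # "AES256"
```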
I encrypted some of my files and it says the encrypting was successful but I was never given any key to store.
Because the key storage is also managed by S3.
How does SSE-S3 work?
You upload a file to S3
S3 generates a plain data key 🔑 and encrypts it with the S3 master key, so now there are two blobs which correspond to 🔑 and E(🔑)
S3 encrypts your file using the plain data key 🔑
S3 stores your encrypted file and E(🔑) side by side
S3 servers wipe out the plain data key 🔑 from the memory
Once I enable it on a file, is the accessing of that file any different?
No, S3 does all the hard encryption and decryption work for you. You just access the file as normal.
I guess the key is stored for me by the bucket and once I access my bucket, any file I want is automatically decrypted?
You are right. S3 stores the E(🔑) for you with your file side-by-side. When you access the file, the underlying data is automatically decrypted.
I guess this is to deter people attempting to hack into a bucket and steal all its files?
This prevents malicious people with physical access to the hard drives that hold your data from gaining access to the raw bytes of your file.
I am trying to access a file that has been deleted from an S3 bucket using an AWS Lambda function.
I have set up a trigger for s3:ObjectRemoved:*; however, by the time I have extracted the bucket and file name of the deleted file, the file is already gone from S3, so I do not have access to its contents.
What approach should be taken with AWS Lambda to get the contents of a file after it is deleted from an S3 bucket?
The comment proposed by @keithRozario was useful; however, with versioning enabled, a GET request (without a version ID) on the deleted object returns a not-found error, as per the S3 documentation.
I went with @Ersoy's suggestion of creating a 'bin' bucket or directory/prefix that holds a copy under the same file name, and working with that as per your requirements.
In my case, I copy each object to a bin directory when it is initially created, and then read from that folder when the file is deleted from the main upload directory.
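A minimal sketch of that approach, assuming a single Lambda wired to both s3:ObjectCreated:* and s3:ObjectRemoved:* events and a bin/ prefix in the same bucket (the prefix and names are assumptions):

```python
import urllib.parse

import boto3

s3 = boto3.client("s3")
BIN_PREFIX = "bin/"  # assumed prefix that keeps a shadow copy of every upload

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(BIN_PREFIX):
            continue  # ignore events fired by our own copies

        if record["eventName"].startswith("ObjectCreated"):
            # Keep a copy so the contents survive a later delete.
            s3.copy_object(
                Bucket=bucket,
                Key=BIN_PREFIX + key,
                CopySource={"Bucket": bucket, "Key": key},
            )
        elif record["eventName"].startswith("ObjectRemoved"):
            # The original is gone, but the shadow copy still has the contents.
            body = s3.get_object(Bucket=bucket, Key=BIN_PREFIX + key)["Body"].read()
            # ... process the deleted file's contents here ...
```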
I have an S3 bucket and the URL of a large file. I would like to store the content located at the URL in the S3 bucket.
I could download the file to my local machine and then upload it to S3 with Cloudberry or Jungledisk or whatever. However, if the file is large, this may take a long time because the file must be transferred twice, and my network connection is much slower than Amazon's.
If I have a lot of data to store in S3, I can start an EC2 instance, retrieve the files to the instance with curl or wget, and then push the data from the EC2 instance to S3. This works, but it's a lot of steps if I just want to archive one file.
Any suggestions?
You can stream the file directly from the source to S3.
If you are using Node.js, you can use the streaming-s3 package.
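The same idea works from Python with requests and boto3. A minimal sketch (URL, bucket, and key are placeholders) that streams the response body straight into a multipart upload, so the file never has to land on local disk of whatever machine runs it:

```python
import boto3
import requests

s3 = boto3.client("s3")

url = "https://example.com/big-file.iso"  # placeholder source URL
with requests.get(url, stream=True) as response:
    response.raise_for_status()
    # upload_fileobj reads the body in chunks and performs a multipart upload,
    # so the whole file is never held in memory or written to disk.
    s3.upload_fileobj(response.raw, "my-bucket", "archive/big-file.iso")
```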
Some quick questions:
Does S3 support soft link?
With S3 mounted on EC2, I can't access a directory created from the Linux EC2 instance in the AWS UI; however, the created files are accessible.
Thanks
Amazon S3 is an object store, not a filesystem. It has a specific set of APIs for uploading, listing, downloading, etc but it does not behave like a normal filesystem.
There are some utilities that can mount S3 as a filesystem (eg Expandrive, Cloudberry Drive, s3fs), but in the background these utilities are actually translating requests into API calls. This can cause some issues -- for example, you can modify a 100MB file on a local disk by just writing one byte to disk. If you wish to modify one byte on S3, you must upload the whole object again. This can cause synchronization problems between your computer and S3, so such methods are not recommended for production situations. (However, they're a great way of uploading/downloading initial data.)
A good in-between option is the AWS Command-Line Interface (CLI), which has commands such as aws s3 cp and aws s3 sync; these are reliable ways to upload, download, and sync files with Amazon S3.
To answer your questions...
Amazon S3 does not support a "soft link" (symbolic link). Amazon S3 is an object store, not a file system, so it only contains objects. Objects can also have metadata, which is often used for cache control, redirection, classification, etc.
Amazon S3 does not support directories (sort of). Amazon S3 objects are kept within buckets, and the buckets are 'flat' -- they do not contain directories/sub-folders. However, it does maintain the illusion of directories. For example, if file bar.jpg is stored in the foo directory, then the Key (filename) of the object is foo/bar.jpg. This makes the object 'appear' to be in the foo directory, but that's not how it is stored. The AWS Management Console maintains this illusion by allowing users to create and open Folders, but the actual data is stored 'flat'.
This leads to some interesting behaviours:
You do not need to create a directory to store an object in the directory.
Directories don't exist. Just store a file called images/cat.jpg and the images directory magically appears (even though it doesn't exist).
You cannot rename objects. The Key (filename) is a unique identifier for the object. To 'rename' an object, you must copy it to a new Key and delete the original.
You cannot rename a directory. They don't exist. Instead, rename all the objects within the directory (which really means you have to copy the objects, then delete their old versions).
You might create a directory but not see it. Amazon S3 keeps track of CommonPrefixes to assist in listing objects by path, but it doesn't create traditional directories. So, don't get worried if you create a (pretend) directory and then don't see it. Just store your object with a full-path name and the directory will 'appear'.
The above-mentioned utilities take all this into account when allowing an Amazon S3 bucket to be mounted. They translate 'normal' filesystem commands into Amazon S3 API calls, but they can't do everything (eg they might emulate renaming a file but they typically won't let you rename a directory).
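To make the directory illusion concrete, here is a minimal boto3 sketch (the bucket name is a placeholder): writing a key makes the "folder" appear, listing with a delimiter reports it as a CommonPrefix rather than a real object, and a "rename" is really a copy followed by a delete.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # placeholder

# No mkdir needed: writing this key makes the "images/" directory appear.
s3.put_object(Bucket=bucket, Key="images/cat.jpg", Body=b"...")

# Listing with a delimiter surfaces "images/" as a CommonPrefix, not an object.
resp = s3.list_objects_v2(Bucket=bucket, Delimiter="/")
print([p["Prefix"] for p in resp.get("CommonPrefixes", [])])  # ['images/']

# "Renaming" is copy to the new key, then delete the old one.
s3.copy_object(
    Bucket=bucket,
    Key="images/kitten.jpg",
    CopySource={"Bucket": bucket, "Key": "images/cat.jpg"},
)
s3.delete_object(Bucket=bucket, Key="images/cat.jpg")
```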