I am trying to access a file that has been deleted from an S3 bucket, using AWS Lambda.
I have set up a trigger for s3:ObjectRemoved*; however, after extracting the bucket and key of the deleted file, the object is already gone from S3, so I no longer have access to its contents.
What approach should be taken with AWS Lambda to get the contents of a file after it is deleted from an S3 bucket?
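For reference, the handler looks roughly like this; a minimal sketch, assuming the standard s3:ObjectRemoved* event payload (function and variable names are illustrative, not my exact code):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        try:
            # By the time this event fires, the object is already gone,
            # so this GET comes back with a 404.
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        except ClientError as err:
            print(f"Cannot read {key} from {bucket}: {err}")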
The comment proposed by @keithRozario was useful; however, with versioning, a plain GET request on the deleted object still results in a not-found error, as per the S3 documentation.
@Ersoy suggested creating a 'bin' bucket or directory/prefix, keeping a copy with the same file name there, and working with that copy as per your requirements.
In my case, I copy the object to a bin directory when it is first created, and then access that folder once the file is deleted from the main upload directory.
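A minimal sketch of that approach, assuming an uploads/ prefix for new files and a bin/ prefix for the copies (both prefix names are placeholders):

import boto3

s3 = boto3.client("s3")

def on_object_created(event, context):
    # Fired by s3:ObjectCreated:* scoped to the uploads/ prefix.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]          # e.g. "uploads/report.csv"
        backup_key = "bin/" + key.split("/", 1)[-1]  # e.g. "bin/report.csv"
        # Keep a copy under bin/ so the contents remain readable
        # after the original under uploads/ is deleted.
        s3.copy_object(
            Bucket=bucket,
            Key=backup_key,
            CopySource={"Bucket": bucket, "Key": key},
        )

The s3:ObjectRemoved* handler can then read bin/<filename> to recover the deleted file's contents.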
Related
We have an Apache Camel app that is supposed to read files in a certain directory structure in S3, process the files (generating some metadata based on the folder the file is in), submit the data in the file (and metadata) to another system and finally put the consumed files into a different bucket, deleting the original from the incoming bucket.
The behaviour I'm seeing is that when I programmatically create the directory structure in S3, those "folders" are being consumed, so the dir structure disappears.
I know S3 technically does not have folders, just empty files ending in /.
The twist here is that any "folders" created in the S3 Console are NOT consumed; they stay there as we want them to. Any folders created via the AWS CLI or boto3 are immediately consumed.
The problem is that we do need the folders to be created through automation; there are too many to create by hand.
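For context, this is roughly how the placeholders are created programmatically; a minimal boto3 sketch (the bucket and prefix names are made up):

import boto3

s3 = boto3.client("s3")

# An S3 "folder" is just a zero-byte object whose key ends in "/".
for prefix in ["incoming/customer-a/", "incoming/customer-b/"]:
    s3.put_object(Bucket="my-incoming-bucket", Key=prefix)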
I've reached out to AWS Support, and they just tell me that there is no difference between how the Console creates folders and how the CLI does it. Support confirmed that the command I used in the CLI is correct.
I think my issue is similar to Apache Camel deleting AWS S3 bucket's folder, but that has no answer...
How can I get Camel to not "eat" any folders?
I am looking for a way to keep, when I upload a file to S3, both the original copy and a compressed copy, stored in different folders of the same bucket (a different bucket works too). I tried to do that with the Serverless Application Repository app 'compress'; however, it does not compress images larger than 4 MB.
The structure I want to create is:
I upload the file to S3
The original file (with the 100% file size) goes into one folder
A compressed copy is created that goes into another folder in the same bucket
Is there a way to figure this out? I'm new to AWS.
Yes, you can achieve this by doing the following.
An upload to S3 triggers a bucket event notification whose destination is an AWS Lambda function. The Lambda reads the file from S3, compresses it, and then saves the compressed copy to another folder in the same bucket.
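A minimal sketch of that Lambda, assuming originals land under an originals/ prefix and copies go to a compressed/ prefix, with gzip as a stand-in for whatever compression you need (the prefix names and the compression choice are assumptions, not part of the answer above):

import gzip
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by s3:ObjectCreated:* scoped to the originals/ prefix,
    # so writing to compressed/ does not re-trigger this function.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]              # e.g. "originals/photo.png"
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        compressed_key = "compressed/" + key.split("/", 1)[-1] + ".gz"
        s3.put_object(Bucket=bucket, Key=compressed_key, Body=gzip.compress(body))

Note that gzip does little for already-compressed image formats such as JPEG; for images you would typically swap in an image library (for example Pillow packaged as a Lambda layer), but the event flow stays the same, and because the Lambda does the compression itself, the 4 MB restriction you hit with the 'compress' app should not apply within normal Lambda memory and timeout limits.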
So we uploaded about 2000 files into a bucket using boto3, with no versioning enabled on the bucket.
However, some of those files were already in the bucket under the same name. Is there a way to check which of those files were already in the bucket before being uploaded? The bucket has no versioning enabled, which is the issue here.
We did not initially create a list of the file contents of the bucket.
I have a working config to push files from a directory on my server to an S3 bucket. NiFi is running on a different server, so I use a GetSFTP processor. The source files live in subfolders that my current PutS3Object config does not preserve; it jams all of the files at the root level of the S3 bucket. I know there's a way to get PutS3Object to create directories using defined folders. The Object Key is set to ${filename} by default. If it is set to, say, my/directory/${filename}, it creates two folders, my and the subfolder directory, and puts the files inside. However, I do NOT know what to set the object key to in order to replicate the files' source directories.
Try ${path}/${filename} based on this in the documentation:
Keeping with the example of a file that is picked up from a local file system, the FlowFile would have an attribute called filename that reflected the name of the file on the file system. Additionally, the FlowFile will have a path attribute that reflects the directory on the file system that this file lived in.
When using Redshift Spectrum, it seems you can only provide a location down to a folder, and it imports all the files inside that folder.
Is there a way to import only one file from inside a folder with many files? When I provide the full path including the filename, it seems to treat that file as a manifest file and gives errors such as 'manifest is too large' or 'JSON not supported'.
Is there any other way?
You inadvertently answered your own question: Use a manifest file
From CREATE EXTERNAL TABLE - Amazon Redshift:
LOCATION { 's3://bucket/folder/' | 's3://bucket/manifest_file' }
The path to the Amazon S3 bucket or folder that contains the data files or a manifest file that contains a list of Amazon S3 object paths. The buckets must be in the same AWS Region as the Amazon Redshift cluster.
If the path specifies a manifest file, the s3://bucket/manifest_file argument must explicitly reference a single file, for example 's3://mybucket/manifest.txt'. It can't reference a key prefix.
The manifest is a text file in JSON format that lists the URL of each file that is to be loaded from Amazon S3 and the size of the file, in bytes. The URL includes the bucket name and full object path for the file. The files that are specified in the manifest can be in different buckets, but all the buckets must be in the same AWS Region as the Amazon Redshift cluster.
I'm not sure why it requires the length of each file. It might be used to distribute the workload amongst multiple nodes.
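As a concrete illustration, here is a minimal sketch that builds and uploads a single-file manifest with boto3 (bucket and key names are made up; the entries/meta.content_length layout follows the Spectrum manifest examples in the same documentation page):

import json
import boto3

s3 = boto3.client("s3")

bucket = "mybucket"                       # hypothetical names
data_key = "folder/sales_2020.parquet"

# The manifest lists each file's URL and its size in bytes.
size = s3.head_object(Bucket=bucket, Key=data_key)["ContentLength"]

manifest = {
    "entries": [
        {
            "url": f"s3://{bucket}/{data_key}",
            "mandatory": True,
            "meta": {"content_length": size},
        }
    ]
}

s3.put_object(Bucket=bucket, Key="manifest.txt", Body=json.dumps(manifest))

The external table's LOCATION then points at 's3://mybucket/manifest.txt' instead of the folder.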