AWS S3 bucket notification lambda throws exception (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey) - amazon-s3

We have an AWS Glue DataBrew job which writes its output to a folder in an S3 bucket. A Java Lambda is then triggered by the Put notification for that bucket. But the following sample code throws an exception:
S3EventNotification.S3EventNotificationRecord record = event.getRecords().get(0);
String s3Bucket = record.getS3().getBucket().getName();
String s3Key = record.getS3().getObject().getUrlDecodedKey();
// the following throws an exception -- 404 NoSuchKey
S3Object s3object = s3Client.getObject(s3Bucket, s3Key);
In the logs we see that the key is something like:
input_files/processed_file_22Dec2022_1671678897600/fdg629ae-4f91-4869-891c-79200772fb92/databrew-test-put-object.temp
So is it that the Lambda gets notified for a file which is still being copied into the S3 folder? When we upload the file manually using the console, it works fine, but when the DataBrew job uploads it, we see this issue.
I expect the Lambda function to read the file with the correct key.
Thanks

What is your trigger event type?
So is it that the Lambda gets notified for a file which is still being copied into the S3 folder?
If you have a Put trigger, Lambda will get triggered when the object upload completes. S3 wouldn't create a temporary object and then delete it.
I haven't used AWS Glue DataBrew before, but perhaps it is creating that temporary object? If that is the case, you need to handle it in your code, for example by ignoring keys that end in .temp (see the sketch below).
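A minimal sketch of such a guard, written in Python for brevity (the handler in the question is Java); the .temp suffix check is only an assumption based on the key seen in the logs:
import urllib.parse
import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        # Skip DataBrew's intermediate objects and only read the final output.
        if key.endswith('.temp'):
            continue
        obj = s3_client.get_object(Bucket=bucket, Key=key)
        # ... process obj['Body'] here ...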

Related

Cross-region copy fails for Linode Object Storage

We're using boto3 with Linode Object Storage, which is compatible with AWS S3 according to their documentation.
Everything seems to work well, except the cross-region copy operation.
When I download an object from the source region/bucket and then upload it to the destination region/bucket, everything works well, although I'd like to avoid that unnecessary download/upload step.
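For reference, the working (but wasteful) fallback looks roughly like this, assuming the two clients defined further below and an in-memory transfer via BytesIO:
import io

# Download from the source region, then re-upload to the target region.
buf = io.BytesIO()
s3_client_src.download_fileobj('test-bucket', 'test-object', buf)
buf.seek(0)
s3_client_trg.upload_fileobj(buf, 'test-bucket', 'test-object')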
I have the bucket named test-bucket on both regions. And I'd like to copy the object named test-object from us-east-1 to us-southeast-1 cluster.
Here is the example code I'm using:
from boto3.session import Session

sess = Session(
    aws_access_key_id='***',
    aws_secret_access_key='***'
)

s3_client_src = sess.client(
    service_name='s3',
    region_name='us-east-1',
    endpoint_url='https://us-east-1.linodeobjects.com'
)

# test-bucket and test-object already exist.
s3_client_trg = sess.client(
    service_name='s3',
    region_name='us-southeast-1',
    endpoint_url='https://us-southeast-1.linodeobjects.com'
)

copy_source = {
    'Bucket': 'test-bucket',
    'Key': 'test-object'
}

s3_client_trg.copy(CopySource=copy_source, Bucket='test-bucket', Key='test-object', SourceClient=s3_client_src)
When I call:
s3_client_src.list_objects(Bucket='test-bucket')['Contents']
It shows me that test-object exists. But when I run the copy, it throws the following message:
An error occurred (NoSuchKey) when calling the CopyObject operation: Unknown
Any help is appreciated!

AWS structure of S3 trigger

I am building a Python Lambda in AWS and want to add an S3 trigger to it. Following these instructions I saw how to get the bucket and key that triggered the event using:
import urllib.parse

def func(event):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
There is an example of such an object in the link, but I wasn't able to find a description of the entire event object anywhere in AWS's documentation.
Is there a documentation for this object's structure? Where might I find it?
You can find documentation about the whole object in the S3 documentation:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-content-structure.html
I would also advise iterating over the records, because there can be multiple records in a single event:
for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = record['s3']['object']['key']
    # ...

How to update metadata on an S3 object larger than 5GB?

I am using the boto3 API to update the S3 metadata on an object.
I am making use of How to update metadata of an existing object in AWS S3 using python boto3?
My code looks like this:
s3_object = s3.Object(bucket, key)
new_metadata = {'foo': 'bar'}
s3_object.metadata.update(new_metadata)
s3_object.copy_from(CopySource={'Bucket': bucket, 'Key': key}, Metadata=s3_object.metadata, MetadataDirective='REPLACE')
This code fails when the object is larger than 5GB. I get this error:
botocore.exceptions.ClientError: An error occurred (InvalidRequest) when calling the CopyObject operation: The specified copy source is larger than the maximum allowable size for a copy source: 5368709120
How does one update the metadata on an object larger than 5GB?
Due to the size of your object, try a multipart upload and copy each part with the copy_from method. See the boto3 docs here for more information:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.MultipartUploadPart.copy_from
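A rough sketch of that idea using the equivalent low-level client call upload_part_copy (the linked docs show the resource-level copy_from); the bucket and key names are placeholders, the object is copied onto itself with replaced metadata, and error handling is omitted:
import boto3

s3 = boto3.client('s3')
bucket, key = 'my-bucket', 'big-object'     # hypothetical names
new_metadata = {'foo': 'bar'}

size = s3.head_object(Bucket=bucket, Key=key)['ContentLength']
part_size = 5 * 1024 ** 3                   # 5 GB, the per-part copy limit

# Start a multipart upload that carries the replacement metadata.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key, Metadata=new_metadata)

parts, offset, part_number = [], 0, 1
while offset < size:
    last = min(offset + part_size, size) - 1
    resp = s3.upload_part_copy(
        Bucket=bucket, Key=key, UploadId=mpu['UploadId'], PartNumber=part_number,
        CopySource={'Bucket': bucket, 'Key': key},
        CopySourceRange=f'bytes={offset}-{last}')
    parts.append({'ETag': resp['CopyPartResult']['ETag'], 'PartNumber': part_number})
    offset, part_number = last + 1, part_number + 1

s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu['UploadId'],
                             MultipartUpload={'Parts': parts})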
Apparently, you can't just update the metadata in place: you need to re-copy the object. You can copy it from S3 back to S3, but the extra copy is annoying for objects in the 100-500 GB range.

Activating a Data Pipeline when new files arrive on S3 using SNS

How can I activate a Data Pipeline when new files arrive on S3? The EMR scheduling should be triggered via SNS when new files arrive on S3.
You can activate the Data Pipeline without using SNS when files arrive in the S3 location:
Create an S3 event notification that invokes a Lambda function.
Create the Lambda function (make sure the role you give it has S3, Lambda and Data Pipeline permissions).
Paste the code below into the Lambda function to activate the Data Pipeline (fill in your own pipeline ID).
import boto3

def lambda_handler(event, context):
    try:
        client = boto3.client('datapipeline', region_name='ap-southeast-2')
        s3_client = boto3.client('s3')
        data_pipeline_id = "df-09312983K28XXXXXXXX"
        response_pipeline = client.describe_pipelines(pipelineIds=[data_pipeline_id])
        activate = client.activate_pipeline(pipelineId=data_pipeline_id, parameterValues=[])
    except Exception as e:
        raise Exception("Pipeline is not found or not active")

Add folder in Amazon s3 bucket

I want to add a folder to my Amazon S3 bucket using coding.
Can you please suggest how to achieve this?
There are no folders in Amazon S3. It is just that most S3 browser tools show the parts of the key name separated by slashes as folders.
If you really need that, you can create an empty object with a slash at the end, e.g. "folder/". It will look like a folder if you open it with a GUI tool or the AWS Console.
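For example, a minimal boto3 sketch of that (the bucket name is a placeholder):
import boto3

s3 = boto3.client('s3')
# A zero-byte object whose key ends in "/" is rendered as a folder in the console.
s3.put_object(Bucket='my-bucket', Key='folder/')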
As everyone has told you, there aren't any "folders" in AWS S3; you're thinking of them incorrectly. S3 has "objects", and these objects can look like folders, but they aren't really folders in the fullest sense of the word. If you search for how to create folders in Amazon S3, you won't find many good results.
There is a way to create "folders" in the sense that you can create a simulated folder structure in S3, but again, wrap your head around the fact that you are creating objects in S3, not folders. To do that, you will need the "put-object" command. In order to use this command, you need the AWS CLI tools installed; see the AWS CLI Installation instructions to get them installed.
The command is this:
aws s3api put-object --bucket your-bucket-name --key path/to/file/yourfile.txt --body yourfile.txt
Now, the fun part about this command is that you do not need to have all of the "folders" (objects) created before you run it. This means you can have a "folder" (object) to contain things, and then use this command to create the simulated folder structure within that "folder" (object), as discussed earlier. For example, I have a "folder" (object) named "importer" within my S3 bucket; let's say I want to insert sample.txt within a "folder" (object) structure of the year, the month, and then a sample "folder" (object) within all of that.
If I only have the "importer" object within my bucket, I do not need to go in beforehand to create the year, month, and sample objects ("folders") before running this command. I can run this command like so:
aws s3api put-object --bucket my-bucket-here --key importer/2016/01/sample/sample.txt --body sample.txt
The put-object command will then create the path that I have specified in the --key flag. Here's a bit of a jewel: even if you don't have a file to upload to S3, you can still create objects ("folders") within the S3 bucket. For example, I created a shell script to "create folders" within the bucket by leaving off the --body flag, not specifying a file name, and leaving a slash at the end of the path provided in the --key flag; the system creates the desired simulated folder structure within the S3 bucket without inserting a file in the process.
Hopefully this helps you understand the system a little better.
Note: once you have a "folder" structure created, you can use S3's "sync" command to synchronize the descendant "folder" with a folder on your local machine, or even with another S3 bucket.
Java with AWS SDK:
There are no folders in S3, only key/value pairs. The key can contain slashes (/), which will make it appear as a folder in the Management Console, but programmatically it's not a folder, it is a String value.
If you are trying to structure your S3 bucket, then your naming conventions (the keys you give your files) can simply follow normal directory patterns, i.e. folder/subfolder/file.txt.
When searching (depending on the language you are using), you can search via a prefix with a delimiter. In Java, it would be a listObjects(String storageBucket, String prefix, String delimiter) method call.
The storageBucket is the name of your bucket, the prefix is the key you want to search, and the delimiter is used to filter your search based off the prefix.
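A rough boto3 equivalent of that prefix/delimiter listing (the answer above refers to the Java SDK); the bucket and prefix are placeholders:
import boto3

s3 = boto3.client('s3')
# List keys under "folder/" without descending into deeper "subfolders".
resp = s3.list_objects_v2(Bucket='my-bucket', Prefix='folder/', Delimiter='/')
for obj in resp.get('Contents', []):
    print(obj['Key'])          # objects directly under the prefix
for p in resp.get('CommonPrefixes', []):
    print(p['Prefix'])         # the "subfolders"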
The AWS::S3 Rails gem does this by itself:
AWS::S3::S3Object.store("teaser/images/troll.png", file, AWS_BUCKET)
Will automatically create the teaser and images "folders" if they don't already exist.
With the AWS SDK for .NET it works perfectly; just add "/" at the end of the folder name:
var folderKey = folderName + "/"; //end the folder name with "/"
AmazonS3 client = Amazon.AWSClientFactory.CreateAmazonS3Client(AWSAccessKey, AWSSecretKey);
var request = new PutObjectRequest();
request.WithBucketName(AWSBucket);
request.WithKey(folderKey);
request.WithContentBody(string.Empty);
S3Response response = client.PutObject(request);
Then refresh your AWS console, and you will see the folder
With aws cli, it is possible to copy an entire folder to a bucket.
aws s3 cp /path/to/folder s3://bucket/path/to/folder --recursive
There is also the option to sync a folder using aws s3 sync
This is a divisive topic, so here is the note shown in the 2019 AWS S3 console when adding folders:
When you create a folder, S3 console creates an object with the above name appended by suffix "/" and that object is displayed as a folder in the S3 console.
Then 'using coding' you can simply adjust the object name by prepending a valid folder name string and a forward slash.
For Swift I created a method where you pass in a String for the folder name.
Swift 3:
import AWSS3

func createFolderWith(Name: String!) {
    let folderRequest: AWSS3PutObjectRequest = AWSS3PutObjectRequest()
    folderRequest.key = Name + "/"
    folderRequest.bucket = bucket
    AWSS3.default().putObject(folderRequest).continue({ (task) -> Any? in
        if task.error != nil {
            assertionFailure("* * * error: \(task.error?.localizedDescription)")
        } else {
            print("created \(Name) folder")
        }
        return nil
    })
}
Then just call
createFolderWith(Name:"newFolder")
In iOS (Objective-C), I did it the following way.
You can add the code below to create a folder inside an Amazon S3 bucket programmatically. This is a working code snippet. Any suggestions welcome.
- (void)createFolder {
    AWSS3PutObjectRequest *awsS3PutObjectRequest = [AWSS3PutObjectRequest new];
    awsS3PutObjectRequest.key = [NSString stringWithFormat:@"%@/", @"FolderName"];
    awsS3PutObjectRequest.bucket = @"Bucket_Name";
    AWSS3 *awsS3 = [AWSS3 defaultS3];
    [awsS3 putObject:awsS3PutObjectRequest completionHandler:^(AWSS3PutObjectOutput * _Nullable response, NSError * _Nullable error) {
        if (error) {
            NSLog(@"Error creating folder");
        } else {
            NSLog(@"Folder created successfully");
        }
    }];
}
Here's how you can achieve what you're looking for (from code/cli):
--create/select the file (locally) which you want to move to the folder:
~/Desktop> touch file_to_move
--move the file to s3 folder by executing:
~/Desktop> aws s3 cp file_to_move s3://<path_to_your_bucket>/<new_folder_name>/
A new folder will be created in your S3 bucket and you'll now be able to execute cp, mv, rm ... statements, i.e. manage the folder as usual.
If the new file created above is not required, simply delete it. You now have the folder created in your S3 bucket.
You can select the language of your choice from the available AWS SDKs.
Alternatively, you can try the Minio client libraries, available in Python, Go, .NET, Java and JavaScript for your application development environment; they come with an examples directory covering all the basic operations.
Disclaimer: I work for Minio
In Swift 2.2 you can create a folder using:
func createFolderWith(Name: String!) {
    let folderRequest: AWSS3PutObjectRequest = AWSS3PutObjectRequest()
    folderRequest.key = Name + "/"
    folderRequest.bucket = "Your Bucket Name"
    AWSS3.defaultS3().putObject(folderRequest).continueWithBlock({ (task) -> AnyObject? in
        if task.error != nil {
            assertionFailure("* * * error: \(task.error?.localizedDescription)")
        } else {
            print("created \(Name) folder")
        }
        return nil
    })
}
The code below creates an empty directory called "mydir1".
It is Node.js code; it should be similar for other languages.
The trick is to have a slash (/) at the end of the object name, as in "mydir1/"; otherwise a file named "mydir1" will be created.
let AWS = require('aws-sdk');
AWS.config.loadFromPath(__dirname + '\\my-aws-config.json');
let s3 = new AWS.S3();

var params = {
    Bucket: "mybucket1",
    Key: "mydir1/",
    ServerSideEncryption: "AES256"
};

s3.putObject(params, function (err, data) {
    if (err) {
        console.log(err, err.stack); // an error occurred
        return;
    } else {
        console.log(data); // successful response
        return;
        /*
        data = {
            ETag: "\"6805f2cfc46c0f04559748bb039d69ae\"",
            ServerSideEncryption: "AES256",
            VersionId: "Ri.vC6qVlA4dEnjgRV4ZHsHoFIjqEMNt"
        }
        */
    }
});
Source: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
Creating a directory inside an S3 bucket and copying contents into it is pretty simple.
The S3 command can be used:
aws s3 cp abc/def.txt s3://mybucket/abc/
Note: the trailing / is a must; that is what makes the directory, otherwise it will become a file in S3.
I guess your query is simply about creating a folder inside a folder (a subfolder).
So while copying any directory data into a bucket subfolder, use a command like this:
aws s3 cp mudit s3://mudit-bucket/Projects-folder/mudit-subfolder --recursive
It will create the subfolder and put your directory contents in it. Also, once your subfolder becomes empty, the subfolder will automatically get deleted.
You can use the copy command to create a folder while copying a file:
aws s3 cp test.xml s3://mybucket/myfolder/test.xml