Lambda not invoking when the uploaded files in the S3 bucket are large? - amazon-s3

I have created a Lambda that is invoked by an event on the target source bucket and performs a transformation.
This works fine when I upload a small file to the targeted source bucket.
But when I upload a large file (e.g. a 65 MB file), it looks like the Lambda is not invoked for that event.
I'd appreciate it if anyone can help with this kind of issue.
Thanks

I am guessing the big files are uploaded to S3 via S3 Multipart Upload instead of a regular put-object operation.
Maybe your Lambda function is only subscribed to s3:ObjectCreated:Put events. You need to add the s3:ObjectCreated:CompleteMultipartUpload event to the notification as well.
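As a minimal boto3 sketch (the bucket name and Lambda ARN below are placeholders), the notification could subscribe the function to both event types; note that this call replaces the bucket's existing notification configuration:
import boto3

s3 = boto3.client("s3")
s3.put_bucket_notification_configuration(
    Bucket="my-source-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:eu-central-1:123456789012:function:transform",
                # Cover both single PUT uploads and multipart uploads.
                "Events": [
                    "s3:ObjectCreated:Put",
                    "s3:ObjectCreated:CompleteMultipartUpload",
                ],
            }
        ]
    },
)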

Large files are uploaded to S3 via S3 Multipart Upload instead of a regular PUT or single-part upload.
There can be two problems:
In your Lambda you have probably created the subscription only for s3:ObjectCreated:Put events. You should add s3:ObjectCreated:CompleteMultipartUpload to the Lambda subscription list too.
Your Lambda timeout could be too small; it works for the smaller files but not the larger ones. You might want to increase it (see the sketch below).
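A small boto3 sketch, using a hypothetical function name, for raising the timeout (the hard limit is 900 seconds):
import boto3

lam = boto3.client("lambda")
# Hypothetical function name; Timeout is in seconds, up to a maximum of 900.
lam.update_function_configuration(
    FunctionName="transform",
    Timeout=300,
)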

There could be any of these issues:
Your event only captures the s3:ObjectCreated:Put event, as others have mentioned. Usually, if it's a big file, the event is s3:ObjectCreated:CompleteMultipartUpload instead. You could either (a) add the s3:ObjectCreated:CompleteMultipartUpload event to your capture, or (b) simply use the s3:ObjectCreated:* event - this includes Put, Multipart Upload, Post, Copy, and also other similar events to be added in the future (source: https://aws.amazon.com/blogs/aws/s3-event-notification/)
Your Lambda function might run longer than the timeout you set (the limit is 15 min).
Your Lambda function requires more memory than the limit you set.
Your Lambda function requires more disk space than the limit you set. This may be an issue if your function downloads the data to disk first and performs the transformation there (the default ephemeral storage limit is 512 MB). A configuration sketch for these limits follows below.
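For example, a boto3 sketch (the function name is a placeholder) that raises memory, timeout, and ephemeral /tmp storage; ephemeral storage defaults to 512 MB and can be increased up to 10,240 MB:
import boto3

lam = boto3.client("lambda")
lam.update_function_configuration(
    FunctionName="transform",
    MemorySize=1024,                   # MB
    Timeout=900,                       # seconds (the 15 minute maximum)
    EphemeralStorage={"Size": 2048},   # MB of /tmp space
)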

Related

Get notified if objects are in a particular folder of an S3 bucket for more than 7 hours

I have a Lambda to process the files in a folder of an S3 bucket. I would like to set up an alarm/notification if objects sit in the folder for more than 7 hours without being processed by the Lambda.
You can use object tags in S3: have something like a tag named Processed set to true or false by your Lambda processor.
Then, in another scheduled Lambda, you check whether the object was created more than 7 hours ago and Processed is false (meaning it was not processed by the Lambda); if such objects are found, you publish a notification to SNS. A sketch of this check is given below, after the alternative approaches.
Set object expiration to 7 hours for the S3 bucket. Then have a lambda get triggered by the delete event. The lambda can be one that notifies you and saves the file into another bucket or forwards it to your original lambda. The lambda triggered could be the one that should have been triggered when uploading the object.
Alternatively, you can add tags to the uploaded files. A tag could be ttl: <date-to-delete>. You have a CloudWatch scheduled event that runs a lambda, for instance every hour, and checks for all objects in the S3 bucket whether the ttl-tag's value is older than the current time.
Personally, I would go with the first approach as it's more event driven and less scheduled processing.
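Here is a rough sketch of the scheduled checker for the tag-based approach; the bucket name, prefix, and SNS topic ARN are placeholders:
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")

def handler(event, context):
    cutoff = datetime.now(timezone.utc) - timedelta(hours=7)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="my-bucket", Prefix="incoming/"):
        for obj in page.get("Contents", []):
            if obj["LastModified"] > cutoff:
                continue  # younger than 7 hours, skip
            tags = s3.get_object_tagging(Bucket="my-bucket", Key=obj["Key"])["TagSet"]
            processed = any(t["Key"] == "Processed" and t["Value"] == "true" for t in tags)
            if not processed:
                sns.publish(
                    TopicArn="arn:aws:sns:eu-central-1:123456789012:unprocessed-objects",
                    Subject="Unprocessed S3 object",
                    Message=f"{obj['Key']} has been waiting for more than 7 hours",
                )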
On another note, it's rather strange that the lambda doesn't get triggered for some S3 objects. I don't know how you deploy your Lambda and configure the trigger to S3. If you're using serverless or CDK I don't see how your lambda doesn't get triggered for every uploaded file with a configuration similar to the following:
// serverless example
functions:
  users:
    handler: users.handler
    events:
      - s3:
          bucket: photos
          event: s3:ObjectCreated:*
          rules:
            - prefix: uploads/
            - suffix: .jpg
In this example the users lambda gets triggered for every jpg file that gets created in photos/uploads.
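For comparison, a rough CDK (v2, Python) equivalent; the construct names, runtime, and asset path are assumptions, and the snippet is meant to live inside a Stack's __init__:
from aws_cdk import aws_lambda as _lambda, aws_s3 as s3, aws_s3_notifications as s3n

bucket = s3.Bucket(self, "Photos")
fn = _lambda.Function(
    self, "Users",
    runtime=_lambda.Runtime.PYTHON_3_12,
    handler="users.handler",
    code=_lambda.Code.from_asset("src"),
)
# Fires for every object-created event (Put, Post, Copy, CompleteMultipartUpload)
# that matches the prefix/suffix filter.
bucket.add_event_notification(
    s3.EventType.OBJECT_CREATED,
    s3n.LambdaDestination(fn),
    s3.NotificationKeyFilter(prefix="uploads/", suffix=".jpg"),
)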

AWS Lambda function trigger on object creation in S3 does not work

I am doing an upload like this:
curl -v -X PUT -T "test.xml" -H "Host: my-bucket-upload.s3-eu-central-1.amazonaws.com" -H "Content-Type: application/xml" https://my-bucket-upload.s3-eu-central-1.amazonaws.com/test.xml
The file gets uploaded and I can see it in my S3 bucket.
The trick is, when I try to create a lambda function to be triggered on creation, it never gets invoked. If I upload the file using the S3 web interface, it works fine. What am I doing wrong? Is there any clear recipe on how to do it?
Amazon S3 APIs such as PUT, POST, and COPY can create an object. Using these event types, you can enable notification when an object is created using a specific API, or you can use the s3:ObjectCreated:* event type to request notification regardless of the API that was used to create an object.
Check the notification event setup on the bucket:
Go to the bucket in the AWS Management Console
Click the Properties tab of the bucket
Click Events to check the notification event setup
Case 1:
s3:ObjectCreated:* - the Lambda should be invoked regardless of PUT, POST or COPY
Other case:
If the event is set up for a specific HTTP method, use that method in your curl command to create the object in the S3 bucket. That way it should trigger the Lambda function. You can also inspect the configuration from the SDK, as in the sketch below.
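A quick boto3 sketch for inspecting the bucket's notification setup (the bucket name is the one from the question):
import boto3

s3 = boto3.client("s3")
config = s3.get_bucket_notification_configuration(Bucket="my-bucket-upload")
# Check which object-created events each Lambda configuration listens for.
for cfg in config.get("LambdaFunctionConfigurations", []):
    print(cfg["LambdaFunctionArn"], cfg["Events"])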
Check the prefix in bucket/properties.
If there is a prefix value like foo/, that means only the objects inside the foo folder will trigger the event to Lambda.
Make sure the prefix you're adding contains only safe special characters, as mentioned here. As per the AWS documentation, some characters require special handling. Please be mindful of that.
Also, I noticed that modifying the trigger on the Lambda page doesn't get applied until you delete the trigger and create a new one (even if it is the same). Learned that the hard way; AWS does behave weirdly sometimes.
Faced similar issues and figured out that the folder names should not have spaces.

Create AWS lambda event source to trigger file create by aws cli cp command

I want to create an AWS Lambda event source to catch the action of uploading a file via the aws cli cp command, but it isn't triggered when I upload a file. Here is what I have done:
I configured the event source as follows:
I have tried all four options of the Object Created event type; it just didn't work.
I use the aws cli as follows:
aws s3 cp sample.html s3://ml.hengwei.me/data/
Is there anything I misconfigured?
You are triggering your Lambda from the wrong event type.
Using the awscli to cp files up into S3 does not cause an s3:ObjectCreated:Copy event (which I believe relates to an S3 copy operation, copying an object from one bucket to another). In your case, the object is being uploaded to S3 and I presume that it results in either s3:ObjectCreated:Put or s3:ObjectCreated:CompleteMultipartUpload.
The events include:
s3:ObjectCreated:Put – An object was created by an HTTP PUT operation.
s3:ObjectCreated:Post – An object was created by an HTTP POST operation.
s3:ObjectCreated:Copy – An object was created by an S3 copy operation.
s3:ObjectCreated:CompleteMultipartUpload – An object was created by the completion of an S3 multipart upload.
s3:ObjectCreated:* – An object was created by one of the event types listed above or by a similar object creation event added in the future.
The full list of events is here. Note that the awscli may or may not use multipart upload (it switches to multipart above a size threshold), so you need to handle both situations; see the sketch below.
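For illustration, a boto3 sketch (the bucket and key come from the question; the threshold value is the SDK default) showing how the transfer threshold decides whether an upload is a single PUT or a multipart upload, and therefore which event your notification needs to cover:
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")
# Uploads below multipart_threshold go out as a single PUT (s3:ObjectCreated:Put);
# anything larger becomes a multipart upload (s3:ObjectCreated:CompleteMultipartUpload).
config = TransferConfig(multipart_threshold=8 * 1024 * 1024)  # 8 MB
s3.upload_file("sample.html", "ml.hengwei.me", "data/sample.html", Config=config)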

Replacing bytes of an uploaded file in Amazon S3

I understand that in order to upload a file to Amazon S3 using Multipart, the instructions are here:
http://docs.aws.amazon.com/AmazonS3/latest/dev/llJavaUploadFile.html
How do I go about replacing the bytes (say, between the range 4-1523) of an uploaded file? Do I need to make use of Multipart Upload to achieve this? or do I fire a REST call with the range specified in the HTTP header?
Appreciate any advice.
Objects in S3 are immutable.
If it's a small object, you'll need to upload the entire object again.
If it's an object over 5MB in size, then there is a workaround that allows you to "patch" a file, using a modified approach to the multipart upload API.
Background:
As you know, a multipart upload allows you to upload a file in "parts," with minimum part size 5MB and maximum part count 10,000.
However a multipart "upload" doesn't mean you have to "upload" all the data again, if some or all of it already exists in S3, and you can address it.
PUT part/copy allows you to "upload" the individual parts by specifying octet ranges in an existing object. Or more than one object.
Since uploads are atomic, the "existing object" can be the object you're in the process of overwriting, since it remains unharmed and in place until you actually complete the multipart upload.
But there appears to be nothing stopping you from using the copy capability to provide the data for the parts you want to leave the same, avoiding the actual upload then using a normal PUT part request to upload the parts that you want to have different content.
So, while not a byte-range patch with granularity down to a single octet, this can be useful for emulating an in-place modification of a large file. Examples of valid "parts" would be replacing a minimum 5 MB chunk on a 5 MB boundary for files smaller than 50 GB, or replacing a minimum 500 MB chunk on a 500 MB boundary for objects up to 5 TB, with minimum part sizes varying between those two extremes, because of the requirement that a multipart upload have no more than 10,000 parts. The catch is that a part must start at an appropriate offset, and you need to replace the whole part.
Michael's answer is pretty explanatory on the background of the issue. Just adding the actual steps to be performed to achieve this, in case you're wondering.
List object parts using ListParts
Identify the part that has been modified
Start a multipart upload
Copy the unchanged parts using UploadPartCopy
Upload the modified part
Finish the upload to save the modification
Skip step 2 if you already know which part has to be changed.
Tip: Each part has an ETag, which is the MD5 hash of that part. This can be used to verify whether a particular part has changed. A boto3 sketch of the steps follows below.
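A rough boto3 sketch of those steps; the bucket, key, part size, part number, and replacement bytes are all placeholders, and it assumes the replacement data has exactly the same length as the part it replaces:
import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "big-object.bin"
part_size = 8 * 1024 * 1024                 # must be >= 5 MB (except the last part)
part_to_replace = 3                         # the part whose content changes
new_bytes = b"..."                          # replacement content for that part

size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []
part_number = 1
for offset in range(0, size, part_size):
    end = min(offset + part_size, size) - 1
    if part_number == part_to_replace:
        # Step 5: upload the modified part from local data.
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload["UploadId"],
                              PartNumber=part_number, Body=new_bytes)
    else:
        # Step 4: copy the unchanged byte range straight from the existing object.
        resp = s3.upload_part_copy(Bucket=bucket, Key=key, UploadId=upload["UploadId"],
                                   PartNumber=part_number,
                                   CopySource={"Bucket": bucket, "Key": key},
                                   CopySourceRange=f"bytes={offset}-{end}")["CopyPartResult"]
    parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
    part_number += 1

# Step 6: completing the upload atomically replaces the object.
s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload["UploadId"],
                             MultipartUpload={"Parts": parts})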

I need Multi-Part DOWNLOADS from Amazon S3 for huge files

I know Amazon S3 added the multi-part upload for huge files. That's great. What I also need is a similar functionality on the client side for customers who get part way through downloading a gigabyte plus file and have errors.
I realize browsers have some level of retry and resume built in, but when you're talking about huge files I'd like to be able to pick up where they left off regardless of the type of error.
Any ideas?
Thanks,
Brian
S3 supports the standard HTTP "Range" header if you want to build your own solution.
S3 Getting Objects
I use aria2c. For private content, you can use "GetPreSignedUrlRequest" to generate temporary private URLs that you can pass to aria2c.
S3 has a feature called byte-range fetches. It's kind of the download complement to multipart upload:
Using the Range HTTP header in a GET Object request, you can fetch a byte-range from an object, transferring only the specified portion. You can use concurrent connections to Amazon S3 to fetch different byte ranges from within the same object. This helps you achieve higher aggregate throughput versus a single whole-object request. Fetching smaller ranges of a large object also allows your application to improve retry times when requests are interrupted. For more information, see Getting Objects.
Typical sizes for byte-range requests are 8 MB or 16 MB. If objects are PUT using a multipart upload, it’s a good practice to GET them in the same part sizes (or at least aligned to part boundaries) for best performance. GET requests can directly address individual parts; for example, GET ?partNumber=N.
Source: https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/use-byte-range-fetches.html
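As a minimal sketch of a chunked download with boto3 (the bucket, key, and 16 MB chunk size are placeholders), so an interrupted chunk can be retried without restarting the whole file:
import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "huge-file.bin"
chunk = 16 * 1024 * 1024

size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
with open("huge-file.bin", "wb") as f:
    for start in range(0, size, chunk):
        end = min(start + chunk, size) - 1
        resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
        f.write(resp["Body"].read())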
Just updating for the current situation: S3 natively supports multipart GET as well as PUT. https://youtu.be/uXHw0Xae2ww?t=1459
NOTE: For Ruby users only
Try the aws-sdk gem for Ruby, and download with:
object = Aws::S3::Object.new(...)
object.download_file('path/to/file.rb')
Because it downloads a large file with multipart by default.
Files larger than 5 MB are downloaded using the multipart method.
http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Object.html#download_file-instance_method