AWS Lambda getting executed multiple times - amazon-s3

I have implemented a simple Lambda function which is triggered whenever an object is created in an S3 bucket.
Whenever an object is created in S3, the Lambda is triggered. However, once the Lambda has been triggered, it keeps executing at a certain interval even when nothing new has been uploaded to the S3 bucket.
Any suggestions would be really helpful.

Your function is timing out because you aren't calling the callback or using the context.succeed() method. I believe the retry count for errors is two, with backoff; with a timeout, however, S3 will keep retrying for a period of time that is not guaranteed but is usually quite long (a day?).
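In the Python runtime there is no callback: returning from the handler is what tells Lambda the invocation succeeded, while an uncaught exception or a timeout counts as a failure. A minimal sketch of an S3-triggered handler that finishes promptly, assuming the standard S3 event shape (the returned dictionary is only illustrative):

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Process an S3 ObjectCreated event and return promptly.

    Returning ends the invocation successfully (the Python equivalent of
    calling callback(null, result) in Node.js); an uncaught exception or a
    timeout counts as an error, which makes Lambda retry the event.
    """
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (for example, spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        # ... do the actual (ideally idempotent) work here ...
        processed.append(f"s3://{bucket}/{key}")
    return {"processed": processed}
```

Because retries can deliver the same event more than once, keeping the work idempotent (for example, always writing to the same output key) makes a duplicate invocation harmless.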

Related

Lambda invocations problem with PutObject from S3 bucket

I have created a Lambda function which is triggered whenever there is a PutObject event in an S3 bucket. However, out of 1 million S3 PutObject events, for example, only 500k Lambda invocations happen. Ideally it should be invoked exactly 1 million times.
How can we troubleshoot the issue?
We tried CloudTrail and CloudWatch to check the logs, but no luck.
Note: At present, concurrency is set to 30.
A screenshot showed a sharp decline in the graph.

Monitor S3 - Send an alert if more than 5 minutes have passed since a last file was written

I have a program that uploads files to S3 every 5 minutes.
I need to monitor it, so every 10 minutes I want to check when the last file was uploaded and, if that was more than X minutes ago, send an alert (email).
I understand that I need to use CloudWatch and Lambda. But I don't know how to do it.
Any help, please.
The following AWS products should help you build this:
AWS EventBridge (formerly known as CloudWatch Events)
AWS Lambda
AWS SES
Solution outline:
1. Create your Lambda function.
2. Create a scheduled event rule in EventBridge.
3. When creating the rule, use a rate of 10 minutes.
4. Set the Lambda function from step 1 as the target of your rule.
5. When your Lambda is triggered, run your business logic to check when the last file was uploaded.
6. If you need to send an email, you can use AWS SES to send it to your recipients.
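A minimal sketch of the Lambda from steps 1, 5 and 6, using boto3. The environment variable names, bucket, prefix and email addresses are hypothetical placeholders, and the sender address is assumed to be verified in SES. Listing the whole bucket on every run is fine for modest buckets; for large ones, a narrower prefix keeps each run cheap.

```python
import datetime as dt
import os

# Hypothetical configuration, read from environment variables.
BUCKET = os.environ.get("BUCKET_NAME", "my-upload-bucket")
PREFIX = os.environ.get("KEY_PREFIX", "")
MAX_AGE = dt.timedelta(minutes=int(os.environ.get("MAX_AGE_MINUTES", "5")))
SENDER = os.environ.get("ALERT_SENDER", "alerts@example.com")
RECIPIENT = os.environ.get("ALERT_RECIPIENT", "ops@example.com")


def latest_modified(pages):
    """Return the newest LastModified across list_objects_v2 pages, or None."""
    newest = None
    for page in pages:
        for obj in page.get("Contents", []):
            if newest is None or obj["LastModified"] > newest:
                newest = obj["LastModified"]
    return newest


def is_stale(newest, now, max_age):
    """True if nothing was uploaded, or the newest upload is older than max_age."""
    return newest is None or now - newest > max_age


def handler(event, context):
    import boto3  # imported here so the pure helpers above can be tested without it

    s3 = boto3.client("s3")
    pages = s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX)
    newest = latest_modified(pages)
    now = dt.datetime.now(dt.timezone.utc)  # S3 returns timezone-aware UTC timestamps
    if not is_stale(newest, now, MAX_AGE):
        return {"alert_sent": False}
    boto3.client("ses").send_email(
        Source=SENDER,
        Destination={"ToAddresses": [RECIPIENT]},
        Message={
            "Subject": {"Data": "No recent uploads to S3"},
            "Body": {"Text": {"Data": f"Last upload to s3://{BUCKET}/{PREFIX}: {newest}"}},
        },
    )
    return {"alert_sent": True}
```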
Important:
You need to allow AWS EventBridge to call your Lambda. If you do all of this in the AWS console, the required permissions should be set automatically. If you use CloudFormation, Terraform or SAM, you probably need to add those permissions to your Lambda.
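If you script the setup instead of using the console, steps 2 to 4 plus the invoke permission look roughly like this. It is a sketch that takes boto3 `events` and `lambda` clients as arguments; the rule name and target id are hypothetical.

```python
def schedule_every_10_minutes(events, lambda_client, function_name, function_arn,
                              rule_name="check-last-upload"):
    """Create a rate(10 minutes) EventBridge rule that invokes the function."""
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(10 minutes)",
        State="ENABLED",
    )["RuleArn"]
    result = events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "check-last-upload-lambda", "Arn": function_arn}],
    )
    if result.get("FailedEntryCount", 0):
        raise RuntimeError(f"put_targets failed: {result['FailedEntries']}")
    # The permission the console adds automatically: allow EventBridge,
    # and only this rule, to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

Call it with `boto3.client("events")` and `boto3.client("lambda")`. Running it a second time fails at `add_permission` because the statement id already exists, so treat it as a one-off setup step.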

Should we handle a lambda container crash?

I've been reading a lot about error handling for AWS Lambda, and nothing covers the topic of a running Lambda container simply crashing.
Is this a possibility? It seems like one. I'm building an event-driven system using Lambdas, triggered by a file upload to S3, and I'm uncertain whether I should bother building in logic to pick up processing if a Lambda has died.
e.g. File object is created on S3 -> S3 notifies Lambda of the event -> Lambda instance happens to crash before it can start processing -> Event is now gone forever* (an assumption; I'm unsure if that's true, but I can't find anything to the contrary).
I'm debating building in logic to reconcile what is on S3 with what was processed each day, so I can detect the (albeit rare) scenario where a Lambda died (died and couldn't write a failure to a DLQ) and we need to process those files. Is this worth it? Would S3 somehow know that the Lambda died and put the event on a DLQ of its own?
According to https://docs.aws.amazon.com/fr_fr/lambda/latest/dg/with-s3.html, S3 invokes Lambda asynchronously.
Next, according to https://docs.aws.amazon.com/lambda/latest/dg/invocation-retries.html, asynchronous Lambda invocations are retried twice, with no further queuing beyond that.
I guess if more tries are needed, it is better to set up SNS/SQS queuing.
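The daily reconciliation idea from the question reduces to a set difference. A minimal sketch, assuming the bucket listing comes from boto3's `list_objects_v2` paginator and that the Lambda records each key it finishes in some store you can read back (that store is hypothetical and not shown):

```python
def keys_modified_since(pages, since):
    """Collect object keys from list_objects_v2 result pages with LastModified >= since."""
    return {
        obj["Key"]
        for page in pages
        for obj in page.get("Contents", [])
        if obj["LastModified"] >= since
    }


def find_unprocessed(bucket_keys, processed_keys):
    """Keys that exist in the bucket but were never recorded as processed."""
    return sorted(set(bucket_keys) - set(processed_keys))
```

For example, `pages = s3.get_paginator("list_objects_v2").paginate(Bucket=bucket)` with a `since` of "24 hours ago" gives one day's uploads; any key returned by `find_unprocessed` can then be re-queued for processing.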

Load Control Function in AWS Step Function

An AWS Step Functions state machine has a Lambda function at its core that does heavy writes to an S3 bucket. When the state machine gets a usage spike, the function starts failing due to S3 blocking further requests (com.amazonaws.services.s3.model.AmazonS3Exception: Please reduce your request rate.). This obviously leads to failures of the state machine execution as a whole, and it takes the whole system some minutes to fully recover.
I looked into the AWS Lambda function scaling documentation and found out that when we reduce the reserved concurrency, the function will start to return 429 status codes as soon as it can't handle new events.
So my idea for controlling the load on the function can be summarized as follows:
Set the reserved concurrency to some lower value.
Catch the 429 errors in the step function and retry with a backoff rate.
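The first part of that idea is a single API call. A minimal sketch taking a boto3 Lambda client as an argument (the function name and limit in the usage line are hypothetical); note that a limit of 0 stops the function from running at all:

```python
def limit_concurrency(lambda_client, function_name, limit):
    """Reserve, and thereby cap, the concurrency of a Lambda function.

    Invocations beyond `limit` concurrent executions are throttled with a
    TooManyRequestsException (HTTP 429) instead of running.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    resp = lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )
    return resp["ReservedConcurrentExecutions"]
```

Usage: `limit_concurrency(boto3.client("lambda"), "my-writer-function", 10)`.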
I'd like your feedback on the following aspects:
a. Does my approach make sense, or am I missing some obviously better way? I first thought of managing the load with AWS SQS or some execution-wide locking/semaphore, but couldn't see how to take it further.
b. Is there maybe another way to tackle the issue from the S3 side?
This approach worked well for me:
States:
  MyFunction:
    Type: Task
    End: true
    Resource: "..."
    Retry:
      - ErrorEquals:
          - TooManyRequestsException
        IntervalSeconds: 30
        MaxAttempts: 5
        BackoffRate: 2
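To see how long this retrier can hold back a throttled execution, here is a quick calculation of the waits implied by IntervalSeconds: 30, BackoffRate: 2 and MaxAttempts: 5, assuming no maximum-delay cap or jitter is configured on the retrier:

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate):
    """Wait before each retry: the first wait is IntervalSeconds, and each
    later wait is the previous one multiplied by BackoffRate."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


delays = retry_delays(30, 5, 2)
print(delays)       # [30, 60, 120, 240, 480]
print(sum(delays))  # 930
```

So a throttled execution can spend up to 930 seconds (15.5 minutes) waiting before the error is finally raised, which spreads a spike out considerably.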

How can I get notification about new S3 objects?

I have a scenario where many clients are uploading to S3.
What is the best approach to knowing that there is a new file?
Is it realistic, or a good idea, for me to poll the bucket every few seconds?
UPDATE:
Since November 2014, S3 supports the following event notifications:
s3:ObjectCreated:Put – An object was created by an HTTP PUT operation.
s3:ObjectCreated:Post – An object was created by HTTP POST operation.
s3:ObjectCreated:Copy – An object was created by an S3 copy operation.
s3:ObjectCreated:CompleteMultipartUpload – An object was created by the completion of an S3 multipart upload.
s3:ObjectCreated:* – An object was created by one of the event types listed above or by a similar object creation event added in the future.
s3:ReducedRedundancyLostObject – An S3 object stored with Reduced Redundancy has been lost.
These notifications can be issued to Amazon SNS, SQS or Lambda. Check out the blog post that's linked in Alan's answer for more information on these new notifications.
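The three destinations differ only in the key names of the notification configuration. A minimal sketch of building and applying one with boto3's `put_bucket_notification_configuration` (the ARNs and prefix in the usage are hypothetical). Keep in mind that this call replaces the bucket's entire existing notification configuration, and that the destination must already grant S3 permission to publish (topic policy, queue policy, or Lambda resource policy).

```python
# Key names used by put_bucket_notification_configuration for each destination type.
_TARGETS = {
    "sns": ("TopicConfigurations", "TopicArn"),
    "sqs": ("QueueConfigurations", "QueueArn"),
    "lambda": ("LambdaFunctionConfigurations", "LambdaFunctionArn"),
}


def notification_config(target_type, target_arn, events=("s3:ObjectCreated:*",), prefix=None):
    """Build a NotificationConfiguration sending the given events to one destination."""
    list_key, arn_key = _TARGETS[target_type]
    entry = {arn_key: target_arn, "Events": list(events)}
    if prefix:
        # Only notify for keys under this prefix.
        entry["Filter"] = {"Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}}
    return {list_key: [entry]}


def enable_notifications(s3_client, bucket, config):
    # Overwrites any notification configuration already on the bucket.
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config
    )
```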
Original Answer:
Although Amazon S3 has a bucket notifications system in place it does not support notifications for anything but the s3:ReducedRedundancyLostObject event (see the GET Bucket notification section in their API).
Currently the only way to check for new objects is to poll the bucket at a preset time interval or build your own notification logic in the upload clients (possibly based on Amazon SNS).
Push notifications are now built into S3:
http://aws.amazon.com/blogs/aws/s3-event-notification/
You can send notifications to SQS or SNS when an object is created via PUT or POST, or when a multipart upload is finished.
Your best option nowadays is the AWS Lambda service. You can write a Lambda function in Node.js (JavaScript), Java or Python (more options will probably be added over time).
The Lambda service lets you write functions that respond to events from S3, such as file uploads. It is cost-effective, scalable and easy to use.
You can implement a pub-sub mechanism relatively simply using SNS, SQS, and AWS Lambda, as in the steps below. Whenever a new file is added to the bucket, a notification is raised and acted upon, with everything automated.
Please see the attached diagram explaining the basic pub-sub mechanism.
Step 1
Configure the S3 bucket event notification to notify an SNS topic. You can do this from the S3 console (Properties tab).
Step 2
Subscribe an SQS queue to this topic. Then, whenever an object is uploaded to the S3 bucket, a message will be added to the queue.
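Scripted with boto3, Step 2 needs two calls: a queue policy allowing the topic to send messages, and the subscription itself. A minimal sketch taking `sns` and `sqs` clients as arguments (all ARNs and URLs are hypothetical); note that `set_queue_attributes` replaces any existing queue policy:

```python
import json


def subscribe_queue_to_topic(sns, sqs, topic_arn, queue_url, queue_arn):
    """Allow the SNS topic to write to the SQS queue, then subscribe the queue."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Only this topic may send to the queue.
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }
    sqs.set_queue_attributes(QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)})
    resp = sns.subscribe(
        TopicArn=topic_arn,
        Protocol="sqs",
        Endpoint=queue_arn,
        ReturnSubscriptionArn=True,
    )
    return resp["SubscriptionArn"]
```

With the default (non-raw) delivery, each SQS message body is an SNS envelope whose `Message` field holds the S3 event as a JSON string.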
Step 3
Create an AWS Lambda function to read messages from the SQS queue. AWS Lambda supports SQS events as a trigger, so whenever a message appears in the SQS queue, Lambda is triggered and reads it. Once a message has been processed successfully, it is automatically deleted from the queue. Messages that Lambda can't process (erroneous messages) are not deleted, so they pile up in the queue. To prevent this, using a Dead Letter Queue (DLQ) is a good idea.
In your Lambda function, add the logic for what to do when users upload files to the bucket.
Note: DLQ is nothing more than a normal queue.
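With S3 -> SNS -> SQS -> Lambda, the S3 event arrives wrapped twice, so the handler has to unwrap it. A minimal sketch, assuming raw message delivery is off (the default) and that the per-object work is your own (hypothetical) business logic:

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs_event(event):
    """Yield (bucket, key) pairs from an SQS event carrying SNS-wrapped S3 notifications."""
    for sqs_record in event.get("Records", []):
        sns_envelope = json.loads(sqs_record["body"])   # SQS body = SNS envelope
        s3_event = json.loads(sns_envelope["Message"])  # SNS Message = S3 notification
        # S3's "s3:TestEvent", sent when notifications are configured, has no Records.
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = unquote_plus(s3_record["s3"]["object"]["key"])  # keys are URL-encoded
            yield bucket, key


def handler(event, context):
    processed = []
    for bucket, key in s3_objects_from_sqs_event(event):
        # Hypothetical business logic for each uploaded object goes here.
        processed.append(f"s3://{bucket}/{key}")
    # Returning normally lets Lambda delete the batch from the queue; raising an
    # exception leaves the messages on the queue to be retried (and eventually
    # moved to the DLQ if a redrive policy is configured).
    return {"processed": processed}
```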
Step 4
Debugging and analyzing the process
Use AWS CloudWatch to log details. Each Lambda function writes its logs to its own log group, which is a good place to check if something went wrong.