How can I get notification about new S3 objects? - amazon-s3

I have a scenario where we have many clients uploading to s3.
What is the best approach to knowing that there is a new file?
Is it realistic, or a good idea, for me to poll the bucket every few seconds?

UPDATE:
Since November 2014, S3 supports the following event notifications:
s3:ObjectCreated:Put – An object was created by an HTTP PUT operation.
s3:ObjectCreated:Post – An object was created by HTTP POST operation.
s3:ObjectCreated:Copy – An object was created by an S3 copy operation.
s3:ObjectCreated:CompleteMultipartUpload – An object was created by the completion of an S3 multi-part upload.
s3:ObjectCreated:* – An object was created by one of the event types listed above or by a similar object creation event added in the future.
s3:ReducedRedundancyLostObject – An S3 object stored with Reduced Redundancy has been lost.
These notifications can be issued to Amazon SNS, SQS or Lambda. Check out the blog post that's linked in Alan's answer for more information on these new notifications.
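As a rough sketch of how such a notification could be set up from code rather than the console: the helper below builds the configuration dict that boto3's put_bucket_notification_configuration expects. The function names, the example ARN and the prefix filter are my own assumptions for illustration, not something from the blog post.

```python
def build_notification_config(target_arn, events=("s3:ObjectCreated:*",), prefix=None):
    """Build a NotificationConfiguration dict for one SNS, SQS or Lambda target."""
    # ARN layout: arn:aws:<service>:<region>:<account>:<resource>
    service = target_arn.split(":")[2]
    kinds = {
        "sns": ("TopicConfigurations", "TopicArn"),
        "sqs": ("QueueConfigurations", "QueueArn"),
        "lambda": ("LambdaFunctionConfigurations", "LambdaFunctionArn"),
    }
    if service not in kinds:
        raise ValueError(f"unsupported notification target: {target_arn}")
    list_key, arn_key = kinds[service]
    entry = {arn_key: target_arn, "Events": list(events)}
    if prefix:
        # Optional: only notify for keys that start with this prefix.
        entry["Filter"] = {"Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}}
    return {list_key: [entry]}


def apply_notification_config(bucket, config):
    """Replace the bucket's notification configuration (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config
    )
```

Note that this call replaces the whole notification configuration of the bucket, and the target (topic, queue or function) must already allow S3 to publish to it.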
Original Answer:
Although Amazon S3 has a bucket notifications system in place it does not support notifications for anything but the s3:ReducedRedundancyLostObject event (see the GET Bucket notification section in their API).
Currently the only way to check for new objects is to poll the bucket at a preset time interval or build your own notification logic in the upload clients (possibly based on Amazon SNS).

Push notifications are now built into S3:
http://aws.amazon.com/blogs/aws/s3-event-notification/
You can send notifications to SQS or SNS when an object is created via PUT or POST or a multi-part upload is finished.

Your best option nowadays is using the AWS Lambda service. You can write a Lambda using either node.js javascript, java or Python (probably more options will be added in time).
The lambda service allows you to write functions that respond to events from S3 such as file upload. Cost effective, scalable and easy to use.
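For illustration, a minimal Python handler for such an S3-triggered function might look like the sketch below. It assumes the function is wired directly to the bucket's ObjectCreated events; the helper name extract_created_objects and the print call standing in for real processing are made up for the example.

```python
from urllib.parse import unquote_plus


def extract_created_objects(event):
    """Return (bucket, key) pairs for every ObjectCreated record in an S3 event."""
    created = []
    for record in event.get("Records", []):
        # eventName looks like "ObjectCreated:Put" or "ObjectRemoved:Delete".
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Keys arrive URL-encoded (a space becomes '+'), so decode before use.
        created.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return created


def lambda_handler(event, context):
    created = extract_created_objects(event)
    for bucket, key in created:
        print(f"new object: s3://{bucket}/{key}")  # placeholder for real processing
    return {"processed": len(created)}
```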

You can implement a pub-sub mechanism relatively simply by using SNS, SQS, and AWS Lambda. Please see the steps below. Whenever a new file is added to the bucket, a notification is raised and acted upon automatically.
Please see attached diagram explaining the basic pub-sub mechanism
Step 1
Simply configure the S3 bucket event notification to notify an SNS topic. You can do this from the S3 console (Properties tab)
Step 2
Make an SQS Queue subscribed to this topic. So whenever an object is uploaded to the S3 bucket a message will be added to the queue.
Step 3
Create an AWS Lambda function to read messages from the SQS queue. AWS Lambda supports SQS events as a trigger, so whenever a message appears in the SQS queue, Lambda is triggered and reads the message. Once the message has been processed successfully, it is automatically deleted from the queue. Messages that Lambda can't process (erroneous messages) are not deleted, so they pile up in the queue. To prevent this, using a Dead Letter Queue (DLQ) is a good idea.
In your Lambda function, add your logic to handle what to do when users upload files to the bucket.
Note: DLQ is nothing more than a normal queue.
Step 4
Debugging and analyzing the process
Make use of Amazon CloudWatch to log details. Each Lambda function writes its logs to its own log group. This is a good place to check if something went wrong.
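The Lambda function from Step 3 could be sketched roughly as follows. With this S3 -> SNS -> SQS chain the S3 event is wrapped twice: the SQS message body is the SNS envelope, and the envelope's "Message" field holds the S3 event as a JSON string. The helper name and the print call are placeholders I chose for the example.

```python
import json


def s3_records_from_sqs_event(sqs_event):
    """Unwrap S3 event records from an SQS-triggered Lambda event fed by SNS."""
    records = []
    for message in sqs_event.get("Records", []):
        body = json.loads(message["body"])
        # Default SNS delivery: the S3 event is a JSON string under "Message".
        # With raw message delivery enabled, the body is the S3 event itself.
        s3_event = json.loads(body["Message"]) if "Message" in body else body
        # S3 sends an s3:TestEvent without "Records" when notifications are set up.
        records.extend(s3_event.get("Records", []))
    return records


def lambda_handler(event, context):
    for record in s3_records_from_sqs_event(event):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]  # still URL-encoded at this point
        print(f"uploaded: s3://{bucket}/{key}")  # placeholder for your own logic
    # Returning normally lets Lambda delete the batch from the queue; raising an
    # exception leaves the messages there to be retried (and eventually moved to
    # the DLQ if one is configured on the queue).
```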

Related

Monitor S3 - Send an alert if more than 5 minutes have passed since a last file was written

I have a program that uploads files to S3 every 5 minutes.
I need to monitor it. So I want to check every 10 minutes when the last file was uploaded, and if that was more than X minutes ago, send an alert (email) about it.
I understand that I need to use CloudWatch and Lambda. But I don't know how to do it.
Any help, please.
The following AWS products should help you build this:
AWS EventBridge (formerly known as CloudWatch Events)
AWS Lambda
AWS SES
Solution outline:
1. Create your Lambda function.
2. Create a scheduled event rule in EventBridge. When creating the rule, use a rate of 10 minutes.
3. Set your Lambda from step 1 as the target of your rule.
4. When your Lambda is triggered, run your business logic to check when the last file was uploaded.
5. If you need to send an email, you can use AWS SES to send it to your recipients.
Important:
You need to allow AWS EventBridge to call your Lambda. If you do all of this in the AWS console, the required permissions should be set automatically. If you use CloudFormation, Terraform or SAM, you probably need to add those permissions to your Lambda.
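A minimal sketch of the business logic in the Lambda could look like this. The bucket name, the e-mail addresses and the 10-minute threshold are hypothetical placeholders, and the sender address would have to be verified in SES first.

```python
from datetime import datetime, timedelta, timezone


def latest_upload(objects):
    """Return the newest LastModified among list_objects_v2 entries, or None if empty."""
    return max((obj["LastModified"] for obj in objects), default=None)


def is_stale(last_modified, now, max_age):
    """True when there is no upload at all or the newest one is older than max_age."""
    return last_modified is None or now - last_modified > max_age


def lambda_handler(event, context):
    import boto3  # lazy import: the helpers above run without boto3

    bucket = "my-upload-bucket"      # hypothetical bucket name
    sender = "alerts@example.com"    # hypothetical, must be verified in SES
    recipient = "ops@example.com"    # hypothetical

    objects = []
    pages = boto3.client("s3").get_paginator("list_objects_v2").paginate(Bucket=bucket)
    for page in pages:
        objects.extend(page.get("Contents", []))

    newest = latest_upload(objects)
    if is_stale(newest, datetime.now(timezone.utc), timedelta(minutes=10)):
        boto3.client("ses").send_email(
            Source=sender,
            Destination={"ToAddresses": [recipient]},
            Message={
                "Subject": {"Data": f"No recent upload to {bucket}"},
                "Body": {"Text": {"Data": f"Newest object timestamp: {newest}"}},
            },
        )
```

Listing the whole bucket on every run gets slow for large buckets; passing a Prefix to the paginator (for example a date-based folder, if your keys are laid out that way) would keep the listing small.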

Should we handle a lambda container crash?

I've been reading a lot about error handling for AWS Lambdas, and nothing covers the topic of a running Lambda container just crashing.
Is this a possibility? It seems like one. I'm building an event-driven system using Lambdas, triggered by a file upload to S3, and am uncertain if I should bother building in logic to pick up processing if a Lambda has died.
e.g. File object is created on S3 -> S3 notifies Lambda of the event -> Lambda instance happens to crash before it can start processing -> Event is now gone forever* (assumption here, I'm unsure if that's true, but can't find anything to say the contrary).
I'm debating building in logic to reconcile what is on S3 and what was processed each day so I can detect the (albeit rare) scenario where a Lambda died (died and couldn't write a failure to a DLQ) and we need to process these files. Is this worth it? Would S3 somehow know that the lambda died and it needs to put the event on a DLQ of its own?
According to https://docs.aws.amazon.com/fr_fr/lambda/latest/dg/with-s3.html, S3 invokes Lambda asynchronously.
According to https://docs.aws.amazon.com/lambda/latest/dg/invocation-retries.html, asynchronous Lambda invocations that fail are retried twice.
I guess if more retries are needed, it is better to set up SNS/SQS queuing.
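If you still want the daily reconciliation mentioned in the question as a safety net, the core of it is just a set difference. This is only a sketch: how you obtain the two lists (an S3 listing on one side, your own record of processed keys on the other) is assumed, and the function name is made up.

```python
def find_unprocessed(uploaded_keys, processed_keys):
    """Return the keys that exist on S3 but were never recorded as processed."""
    return sorted(set(uploaded_keys) - set(processed_keys))
```

For example, find_unprocessed(["a.csv", "b.csv", "c.csv"], ["a.csv", "c.csv"]) gives ["b.csv"], which you could then feed back into the normal processing path.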

Download all objects from S3 Bucket and send content to SQS

I am using python boto3 to get all the objects in a bucket but it returns the keys and not the content
I have a service which reads messages from SQS (a duplicate message is also present in s3 bucket) and does some operations. I have lost some sqs messages because of some failures and sqs 14 day policy.
The files have json data, with each file ranging from 4-8kb.
Now I want to re-drive all the objects from s3 to SQS.
Is there a way to get content of all files and then transfer them to SQS ?
Turning John Rotenstein's comment into an answer:
Is there a way to get content of all files and then transfer them to SQS ?
No. You will have to write something yourself, probably in the same way that you stored it in the first place. There is no automated method to move data from S3 to SQS
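A rough sketch of what "write something yourself" could look like with boto3: list the bucket, read each object, and send the contents in batches of 10 (the SQS per-call limit). The function names are my own, and the script assumes the files are UTF-8 text small enough to hold in memory, which should be fine for 4-8 KB JSON files.

```python
def to_batches(bodies, batch_size=10):
    """Group message bodies into send_message_batch entry lists (max 10 per call)."""
    batches = []
    for start in range(0, len(bodies), batch_size):
        chunk = bodies[start:start + batch_size]
        # "Id" only has to be unique within one batch; a running index is enough.
        batches.append(
            [{"Id": str(start + i), "MessageBody": body} for i, body in enumerate(chunk)]
        )
    return batches


def redrive(bucket, queue_url):
    """Read every object in the bucket and send its content to the queue."""
    import boto3  # lazy import: to_batches works without boto3

    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")
    bodies = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            data = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            bodies.append(data.decode("utf-8"))
    for entries in to_batches(bodies):
        response = sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
        for failure in response.get("Failed", []):
            print(f"failed to send entry {failure['Id']}: {failure.get('Message')}")
```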

AWS: Broadcast notifications for multiple worker processes running on multiple instances

I have multiple application instances inside of Amazon EC2, each running several worker processes. What I want is each worker process to be subscribed to some notification(e.g. configuration change). This notification should be basically broadcast message, so that once it is sent - every worker receives it.
I know SQS does not support messages broadcast. Looking through similar questions/threads I see the suggestions to use SNS instead of SQS. I'm not sure this will work for me due to the following reasons:
application instances are part of autoscaling group so they can be dynamically added and removed. In this case I don't see any clear way to unsubscribe every worker(I have multiple workers per instance) once instance gets terminated, which means I'll end up with the mess of dead subscribers after some time.
protocol to use for subscription is also not clear. HTTP endpoint looks like the only option, which means my every worker should run HTTP server on its own port. It also looks I should listen only on instance public IP, which adds one more layer of complexity and insecurity.
At the moment I have a solution based on third party - I'm using 0MQ pub/sub server. But I'm looking for some out-of-box solutions AWS provides.
Thanks,
Vovan
The out-of-the-box AWS solution that comes to mind would be to create one SNS topic, and then for each instance, when the instance boots up, it would create its own SQS queue and subscribe the queue to the SNS topic, so that each individual queue gets a broadcast copy of each message you publish to SNS.
You'd want to unsubscribe and delete these queues on instance termination, which could be done with lifecycle hooks.
If you didn't want to use a server to manage the processing of the lifecycle hooks (which publish the launch or termination events to SNS or SQS) you could create an AWS API Gateway endpoint to fire an AWS Lambda function, then subscribe the API Gateway endpoint to the SNS topic using https, to handle the cleanup tasks in Lambda, with no server needed.
That's several services working together and may sound a little complicated, but would be very inexpensive and require little maintenance or attention.
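To make the queue-per-instance idea a bit more concrete, here is a sketch using boto3. The queue naming scheme and the function names are assumptions for illustration. The queue needs a policy that lets the SNS topic send to it, otherwise the subscription exists but no messages arrive.

```python
import json


def queue_policy(queue_arn, topic_arn):
    """Queue policy allowing exactly one SNS topic to deliver messages to the queue."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })


def create_instance_queue(topic_arn, instance_id):
    """On boot: create this instance's queue and subscribe it to the topic."""
    import boto3  # lazy import: queue_policy works without boto3

    sqs = boto3.client("sqs")
    sns = boto3.client("sns")
    queue_url = sqs.create_queue(QueueName=f"broadcast-{instance_id}")["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]
    sqs.set_queue_attributes(
        QueueUrl=queue_url, Attributes={"Policy": queue_policy(queue_arn, topic_arn)}
    )
    subscription_arn = sns.subscribe(
        TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn, ReturnSubscriptionArn=True
    )["SubscriptionArn"]
    return queue_url, subscription_arn


def delete_instance_queue(queue_url, subscription_arn):
    """On termination: remove the subscription first, then the queue."""
    import boto3

    boto3.client("sns").unsubscribe(SubscriptionArn=subscription_arn)
    boto3.client("sqs").delete_queue(QueueUrl=queue_url)
```

With one queue per instance, the workers on that instance compete for each message. If every worker process has to see every message, the same pattern would work with one queue per worker, at the cost of more queues to clean up.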
One more solution I've figured out is to use Amazon Kinesis. The implication here is that each subscriber has to maintain its own checkpoint to receive only the most recent notifications.
I realize this is an old thread, but I'd like to share my experience with this. Kinesis has a 5 reads/sec throttle. So if you have 10 nodes polling for events in the stream 1/sec, you're going to be in a constant state of throttling.
Kinesis looks to be primarily for massive writes with just a few readers, which doesn't quite fit a broadcast to many nodes use-case.
Redis is a handy solution for broadcasting a message to all subscribers on a topic. It is convenient because it can be run as a Docker container for rapid prototyping, but is also offered by AWS as a managed service for multi-node clusters.

Get notified when user uploads to an S3 bucket? [duplicate]

Possible Duplicate:
Notification of new S3 objects
We've got an app that stores user data on S3. The part of our app that handles the uploads is decoupled from the part that processes the data. In some cases, the user will be able to upload data directly to S3 without going through our app at all (this may happen if they have their own S3 account and supply us with credentials).
Is it possible to get notified whenever the contents of an S3 bucket change? It would be cool if somehow a message could get sent that says "this file was added/updated/deleted: foo".
Short of that, is there some timestamp somewhere I could poll that would tell the last time the bucket was updated?
If I can't do either of these things, then the only alternative is to crawl the entire bucket and look for changes. This will be slow and expensive.
Update 2014-11:
As Alan Illing points out in the comments, AWS now supports notifications from S3 to SNS, which can be forwarded automatically to SQS: http://aws.amazon.com/blogs/aws/s3-event-notification/
S3 can also send notifications to AWS Lambda to run your own code directly.
Original response that predicted S3->SNS notifications:
If Amazon supported this, they would use SNS to send out notifications that an object has been added to a bucket. However, at the moment, the only bucket event supported by S3 and SNS is to notify you when Amazon S3 detects that it has lost all replicas of a Reduced Redundancy Storage (RRS) object and can no longer service requests for that object.
Here's the documentation on the SNS events supported by S3:
http://docs.amazonwebservices.com/AmazonS3/latest/dev/NotificationHowTo.html
Based on the way that the documentation is written, it looks like Amazon has ideas for other notification events to add (like perhaps your idea for finding out when new keys have been added).
Given that it isn't supported directly by Amazon, the S3 client that uploads the object to S3 will need to trigger the notification, or you will need to do some sort of polling.
Custom event notification for uploads to S3 could be done using SNS if you like to get near-real-time updates for processing, or it can be done through SQS if you like to let the notifications pile up and process them out of a queue at your own pace.
If you are polling, you could reduce the number of keys you need to request by having the client upload with a prefix of, say, "unprocessed/..." followed by the unique key. Your polling software can then query just S3 keys starting with that prefix. When it is ready to process, it could change the key to "processing/..." and then later to "processed/..." or whatever. Objects in S3 are currently renamed by copy+delete operations performed by S3.
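The prefix scheme could be sketched like this with boto3. The prefixes follow the ones suggested above; the function names are made up for the example.

```python
def move_prefix(key, old_prefix, new_prefix):
    """Return the key with old_prefix swapped for new_prefix."""
    if not key.startswith(old_prefix):
        raise ValueError(f"{key!r} does not start with {old_prefix!r}")
    return new_prefix + key[len(old_prefix):]


def claim_unprocessed(bucket):
    """Move every object under unprocessed/ to processing/ and return the new keys."""
    import boto3  # lazy import: move_prefix works without boto3

    s3 = boto3.client("s3")
    claimed = []
    pages = s3.get_paginator("list_objects_v2").paginate(
        Bucket=bucket, Prefix="unprocessed/"
    )
    for page in pages:
        for obj in page.get("Contents", []):
            new_key = move_prefix(obj["Key"], "unprocessed/", "processing/")
            # A "rename" in S3 is a copy followed by a delete.
            s3.copy_object(
                Bucket=bucket, Key=new_key,
                CopySource={"Bucket": bucket, "Key": obj["Key"]},
            )
            s3.delete_object(Bucket=bucket, Key=obj["Key"])
            claimed.append(new_key)
    return claimed
```

Keep in mind that copy followed by delete is not atomic, so two pollers running at the same time could both pick up the same object. With a single poller this is not an issue.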