I'm pretty new to S3. I'm trying to create a Bucket and receive notifications on Object Created events using code only (not with the AWS Management UI).
I'm writing in dotnet so I'm using the AWSSDK.Core nuget package.
So far I've managed to create a bucket using the SDK.
It seems like a trivial task, though I couldn't find any references around the web on how to accomplish it.
Also, the object storage is S3-compatible, not actual AWS S3.
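For reference, the bucket creation currently looks roughly like this (I'm using the AWSSDK.S3 package, which pulls in AWSSDK.Core; the endpoint, credentials and bucket name are placeholders for my S3-compatible storage):

```csharp
using Amazon.S3;
using Amazon.S3.Model;

// Point the client at the S3-compatible endpoint instead of the AWS default.
var config = new AmazonS3Config
{
    ServiceURL = "https://my-object-storage.example.com", // placeholder endpoint
    ForcePathStyle = true // many S3-compatible services expect path-style addressing
};

var client = new AmazonS3Client("accessKey", "secretKey", config);

// Create the bucket.
await client.PutBucketAsync(new PutBucketRequest { BucketName = "my-bucket" });
```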
I tried configuring an SNS topic, but it seems that in order to enable notifications, the API only accepts SQS as the queueing service, not RabbitMQ.
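This is roughly the shape of the call I was attempting (the topic ARN is a placeholder; the same request also takes queue and Lambda configurations, but the queues are SQS ARNs):

```csharp
using System.Collections.Generic;
using Amazon.S3;
using Amazon.S3.Model;

var client = new AmazonS3Client("accessKey", "secretKey", new AmazonS3Config
{
    ServiceURL = "https://my-object-storage.example.com", // same placeholder endpoint as above
    ForcePathStyle = true
});

// Ask the bucket to publish ObjectCreated events to an (SNS-style) topic.
await client.PutBucketNotificationAsync(new PutBucketNotificationRequest
{
    BucketName = "my-bucket",
    TopicConfigurations = new List<TopicConfiguration>
    {
        new TopicConfiguration
        {
            Topic = "arn:aws:sns:us-east-1:123456789012:my-topic", // placeholder ARN
            Events = new List<EventType> { EventType.ObjectCreatedAll }
        }
    }
});
```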
I did see another approach - configuring a Lambda function that forwards messages to RabbitMQ - but I couldn't find references or documentation for that either.
Any help is appreciated :)
Do I really need to use Confluent (the CLI, maybe)? Can I write my own custom connector?
How can I write my first Kafka sink? How do I deploy it?
For now, let's assume we have the following details:
Topic: curious.topic
S3 bucket name: curious.s3
Data in the topic: Text/String
My OS: Mac
Start with the documentation for the S3 Sink connector, look over its configuration properties, and understand how to run Connect itself and deploy any connector (use the REST API); no, the confluent CLI is never needed.
You don't need to "write your own sink" because Confluent already has an S3 Sink Connector. Sure, you could fork their open-source repo and compile it yourself, but that doesn't seem to be what you're asking.
You can download the connector using a separate command-line tool, confluent-hub.
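As a rough sketch, deploying the S3 sink through the Connect REST API could look like this from .NET, using the topic and bucket assumed above (the Connect URL, region, format and flush settings here are assumptions to adjust for your setup, and the plugin must already be installed, e.g. via confluent-hub):

```csharp
using System;
using System.Net.Http;
using System.Text;

// Minimal sketch: POST a connector config to a Kafka Connect worker's REST API.
var connectorConfig = """
{
  "name": "curious-s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "1",
    "topics": "curious.topic",
    "s3.bucket.name": "curious.s3",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.bytearray.ByteArrayFormat",
    "flush.size": "100"
  }
}
""";

using var http = new HttpClient();
var response = await http.PostAsync(
    "http://localhost:8083/connectors", // default Connect REST port
    new StringContent(connectorConfig, Encoding.UTF8, "application/json"));

Console.WriteLine(response.StatusCode); // 201 Created if the connector was accepted
```

The same JSON can just as well be posted with any other HTTP client; the point is that Connect is driven entirely over its REST API, so the Confluent CLI never enters the picture.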
Note: pinterest/secor does the same thing, without Kafka Connect.
I want to view the AmazonS3 client's JMX metrics, specifically the number of retries performed by the client. The only problem is that, according to this article, it seems that only CloudWatch is supported, and I want to look at the metrics on the fly using jconsole.
Enabling it without a credentials file doesn't seem to work.
Is there some kind of workaround?
I'm quite new to AWS and also new to Kafka (using the Confluent platform and .NET).
We will receive large files (~1-40+ MB) in our S3 bucket, and the consuming side should process these files. All our messaging will go over Kafka.
I've read that you should not send large files over Kafka, but maybe I'm misinformed here?
If we instead just want to get an event that a new file has arrived in our S3 bucket (and of course some kind of reference to it), how would we go about that?
You can receive notifications about events that happen in your S3 bucket, such as when a new object is created or deleted.
From the S3 documentation (as of writing this), the following destinations are supported:
Simple Notification Service (SNS)
Simple Queue Service (SQS)
AWS Lambda function
For instance, you can choose SQS as your S3 notification destination and use the Kafka Connect SQS Source Connector to stream the events into Kafka.
Then you can write Kafka consumer applications that react to these events.
And yes, it is not recommended to send large files over Kafka. Just send pointers to them and let the consumer application fetch the data using those pointers. If your consumer needs to fetch some S3 objects, configure it to use the S3 SDK.
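As a rough sketch of that pointer-based pattern in .NET (assuming the S3 event notification JSON ends up on a Kafka topic, e.g. via the SQS source connector; the broker address, group id and topic name are placeholders):

```csharp
using System;
using System.Text.Json;
using Amazon.S3;
using Confluent.Kafka;

var consumerConfig = new ConsumerConfig
{
    BootstrapServers = "localhost:9092", // placeholder broker
    GroupId = "s3-event-processor",      // placeholder group id
    AutoOffsetReset = AutoOffsetReset.Earliest
};

using var consumer = new ConsumerBuilder<Ignore, string>(consumerConfig).Build();
consumer.Subscribe("s3-events"); // placeholder topic carrying the notification JSON

var s3 = new AmazonS3Client(); // credentials resolved via the usual AWS credential chain

while (true)
{
    var result = consumer.Consume();

    // S3 event notifications are JSON documents with a "Records" array;
    // each record carries the bucket name and the object key.
    using var doc = JsonDocument.Parse(result.Message.Value);
    foreach (var record in doc.RootElement.GetProperty("Records").EnumerateArray())
    {
        var bucket = record.GetProperty("s3").GetProperty("bucket").GetProperty("name").GetString();
        var key = record.GetProperty("s3").GetProperty("object").GetProperty("key").GetString();

        // Fetch the (potentially large) file directly from S3 rather than through Kafka.
        using var obj = await s3.GetObjectAsync(bucket, key);
        Console.WriteLine($"Fetched s3://{bucket}/{key} ({obj.Headers.ContentLength} bytes)");
        // ... process obj.ResponseStream here ...
    }
}
```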
Useful resources:
Enabling event notifications in S3
S3 Notification Event Structure (JSON) with examples
Kafka SQS Source Connector
I've searched for examples and have not found any.
My intention is to use a Redis Stream as a source for Spring Cloud Dataflow and route messages to AWS Kinesis or S3 data sinks.
Redis is not listed as a Spring Cloud Dataflow source. Will I have to create a custom binder?
Redis only seems to be available as a sink, with PubSub.
There used to be a redis-binder for Spring Cloud Stream, but that has been deprecated for a while now. We have plans to implement a binder for Redis Streams in the future, though.
That said, if you have data in Redis, it'd be good to start building a redis-source as a custom application. We have many suppliers/sources that you can use as a reference.
There's also a blog series currently in the works, which can provide further guidance when building custom applications.
Lastly, feel free to contribute the redis-supplier/source to the applications repo; we can collaborate on a pull request.
We have a large extended network of users that we track using badges. The total traffic is in the neighborhood of 60 million impressions a month. We are currently considering switching from a fairly slow, database-based logging solution (custom-built on PHP, messy...) to a simple log-based alternative that relies on Amazon S3 logs and Splunk.
After using Splunk for some other analysis tasks, I really like it. But it's not clear how to set up a source like S3 with the system. It seems that remote sources require the Universal Forwarder to be installed, which is not an option there.
Any ideas on this?
Very late answer, but I was looking for the same thing and found a Splunk app that does what you want: http://apps.splunk.com/app/1137/. I have not tried it yet, though.
I would suggest logging JSON-preprocessed data to a DocumentDB database, for example using Azure Queues or similar service-bus messaging technologies that fit your scenario, in combination with Azure DocumentDB.
So I'd keep your database-based approach but change it to a schemaless, easy-to-scale, document-based DB.
I use http://www.insight4storage.com/ from the AWS Marketplace to track my AWS S3 storage usage totals by prefix, bucket or storage class over time; it also shows me previous-version storage by prefix and per bucket. In addition to its UI and web-service API, it has a setting to save the S3 data as Splunk-format logs, which might work for your use case.
You can use the Splunk Add-on for AWS.
This is what I understand:
Create a Splunk instance. Use the hosted (website) version, or use the on-premise AMI of Splunk to create an EC2 instance where Splunk is running.
Install the Splunk Add-on for AWS application on that instance.
Based on the type of input logs (e.g. CloudTrail logs, Config logs, generic logs, etc.), configure the Add-on and supply the AWS account ID or IAM role, etc. as parameters.
The Add-on will automatically poll the AWS S3 source and fetch the latest logs after the specified amount of time (defaults to 30 seconds).
For a generic use case (like ours), you can try configuring the Generic S3 input for Splunk.