Kafka S3 Source Connector


I have a requirement where sources outside of our application will drop a file in an S3 bucket, and we have to load that file into a Kafka topic. I am looking at Confluent's S3 Source connector and am currently working on defining the configuration for setting up the connector in our environment. But a couple of posts indicated that one can use the S3 Source connector only if the S3 Sink connector was used to drop the file in S3.
Is the above true? Where, and with what property, do I define the output topic in the configuration? And can the messages be transformed when reading from S3 and putting them in the topic? Both sides will use JSON / Avro formats.
Confluent's Quick Start example also assumes you have used the S3 Sink connector, hence the question.
Thank you

I received a response from Confluent confirming that the Confluent S3 Source connector can only be used with the Confluent S3 Sink connector. It cannot be used independently.

As of 2021-12-15, Confluent has released version 2.0.0 of the connector. This version includes a generalized S3 source mode.
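For anyone trying the generalized mode, here is a rough sketch of what the connector payload might look like, built in Python so it can be printed or posted to Connect. The property names (`mode`, `topic.regex.list`, `format.class`) are how I recall them from Confluent's S3 Source documentation, so treat them as assumptions and check them against the docs for your connector version. The bucket, region, topic, and connector name are made up for illustration.

```python
import json


def s3_source_config(bucket: str, region: str, topic: str, file_regex: str) -> dict:
    """Build a Kafka Connect payload for the S3 Source connector in generic mode.

    Property names are assumptions recalled from Confluent's documentation;
    verify them before relying on this.
    """
    return {
        "name": "s3-source-generic",  # hypothetical connector name
        "config": {
            "connector.class": "io.confluent.connect.s3.source.S3SourceConnector",
            "tasks.max": "1",
            "s3.bucket.name": bucket,
            "s3.region": region,
            # Generic mode: read files that were not written by the S3 Sink connector.
            "mode": "GENERIC",
            # Maps an output topic to a regex over object names.
            "topic.regex.list": f"{topic}:{file_regex}",
            "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        },
    }


# Hypothetical values, for illustration only.
payload = s3_source_config("inbound-files", "us-east-1", "inbound.events", r".*\.json")
print(json.dumps(payload, indent=2))
```

If these names are right, `topic.regex.list` is the place where the output topic is defined.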

Related

Kafka S3 Sink basic doubts

Do I really need to use confluent (CLI maybe)? Can I write my custom connector?
How can I write my first Kafka Sink? How to deploy them?
For now, let's assume we have the following details:
Topic: curious.topic
S3 bucket name: curious.s3
Data in the topic: Text/String
My OS: Mac

You start at the documentation for S3 Sink, looking over the configuration properties, and understand how to run Connect itself and deploy any connector (use the REST API); no, confluent CLI is never needed.
You don't need to "write your own sink" because Confluent already has an S3 Sink Connector. Sure, you could fork their open-source repo, and compile it yourself, but that doesn't seem to be what you're asking.
You can download the connector using a different command, confluent-hub.
Note: pinterest/secor does the same thing, without Kafka Connect.
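To make the REST API route concrete, here is a minimal sketch that builds the request for creating the S3 Sink connector with the topic and bucket from the question. It assumes a Connect worker listening on the default port 8083 on localhost, a region of `us-east-1`, and a connector name I made up. Because the data is plain text, the sketch uses the byte-array format and converter so that values are written unchanged; that choice is my assumption, not something from the question.

```python
import json
import urllib.request


def build_sink_request(connect_url: str, topic: str, bucket: str, region: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an S3 Sink connector."""
    payload = {
        "name": "curious-s3-sink",  # hypothetical connector name
        "config": {
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "tasks.max": "1",
            "topics": topic,
            "s3.bucket.name": bucket,
            "s3.region": region,  # assumed region
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            # Write raw bytes so text values land in S3 unchanged (assumed choice).
            "format.class": "io.confluent.connect.s3.format.bytearray.ByteArrayFormat",
            "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
            # Number of records per S3 object.
            "flush.size": "1000",
        },
    }
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sink_request("http://localhost:8083", "curious.topic", "curious.s3", "us-east-1")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually create the connector on a running worker:
# urllib.request.urlopen(request)
```

The same JSON body works with curl or any other HTTP client; nothing here depends on the confluent CLI.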

Send data from kafka to s3 using python

For my current project, I am working with Kafka (Python) and wanted to know if there is any way to send streaming Kafka data to an AWS S3 bucket (without using Confluent). I am getting my source data from the Reddit API.
I also wanted to know whether Kafka + S3 is a good combination for storing data that will be processed using PySpark, or whether I should skip the S3 step and read data directly from Kafka.

Kafka S3 Connector doesn't require "using Confluent". It's completely free, open source and works with any Apache Kafka cluster.
Otherwise, sure, Spark or a plain Kafka Python consumer can write events to S3, but you haven't clearly explained what happens once the data is in S3, so maybe start by processing the data directly from Kafka.
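If you do go the plain-consumer route, a minimal sketch could look like the following. It assumes the kafka-python and boto3 packages, JSON-encoded message values, a broker on localhost:9092, and a consumer group name I made up; the key layout is just one possible convention. Offsets are committed only after every buffered record has been uploaded, so a crash leads to re-uploads rather than lost data.

```python
import json
from collections import defaultdict
from typing import List


def object_key(topic: str, partition: int, first_offset: int) -> str:
    """Deterministic S3 key, so a retried batch overwrites the same object."""
    return f"topics/{topic}/partition={partition}/{topic}+{partition}+{first_offset:010d}.jsonl"


def encode_batch(values: List[dict]) -> bytes:
    """Serialize records as newline-delimited JSON, which Spark can read directly."""
    return "".join(json.dumps(value) + "\n" for value in values).encode("utf-8")


def run(topic: str, bucket: str, batch_size: int = 500) -> None:
    """Consume forever, uploading one object per partition every batch_size records."""
    # Third-party imports are kept local so the helpers above work without them.
    import boto3
    from kafka import KafkaConsumer

    s3 = boto3.client("s3")
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers="localhost:9092",  # assumed broker address
        group_id="s3-writer",                # hypothetical group name
        enable_auto_commit=False,
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )
    buffered = defaultdict(list)  # partition -> [(offset, value), ...]
    count = 0
    for record in consumer:
        buffered[record.partition].append((record.offset, record.value))
        count += 1
        if count >= batch_size:
            for partition, rows in buffered.items():
                key = object_key(topic, partition, rows[0][0])
                body = encode_batch([value for _, value in rows])
                s3.put_object(Bucket=bucket, Key=key, Body=body)
            # Commit only after all buffered records are safely in S3.
            consumer.commit()
            buffered.clear()
            count = 0


print(object_key("reddit_posts", 0, 42))
```

Calling `run("reddit_posts", "my-bucket")` would start the loop; both names are placeholders. This is essentially a hand-rolled, much less robust version of what the S3 Sink connector already does.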

How to send data to AWS S3 from Kafka using Kafka Connect without Confluent?

I have a local instance of Apache Kafka 2.0.0, and it is running very well. In my test I produce and consume data from Twitter and put it in a specific topic, twitter_tweets, and everything is OK. But now I want to consume the topic twitter_tweets with Kafka Connect using the Kafka Connect S3 connector and store the data in AWS S3 without using the Confluent CLI.
Can I do this without Confluent? Does anyone have an example or something to help me?

without Confluent
S3 Sink is open source; so is Apache Kafka Connect.
Connect framework is not specific to Confluent
You may use a Kafka Connect Docker image, for example, or you may use confluent-hub to install the S3 connector on your own Kafka Connect installation.
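As an example of running this with only the scripts that ship with Apache Kafka, here is a small sketch that renders the two properties files that `bin/connect-standalone.sh` expects. It assumes the S3 connector jars are unpacked under `/opt/connectors`, that the tweets are schemaless JSON, and that the bucket, region, and connector name (all hypothetical) are replaced with your own.

```python
def to_properties(settings: dict) -> str:
    """Render a dict as Java-style key=value properties text."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


# Worker settings for a single standalone Connect process.
worker = {
    "bootstrap.servers": "localhost:9092",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",  # tweets assumed to be plain JSON
    "offset.storage.file.filename": "/tmp/connect.offsets",
    "plugin.path": "/opt/connectors",  # assumed install location of the connector
}

# Connector settings for the S3 sink.
connector = {
    "name": "twitter-s3-sink",  # hypothetical connector name
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "1",
    "topics": "twitter_tweets",
    "s3.bucket.name": "my-tweets-bucket",  # hypothetical bucket
    "s3.region": "us-east-1",              # assumed region
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "100",
}

worker_text = to_properties(worker)
connector_text = to_properties(connector)
print(worker_text)
print(connector_text)
```

Save the two outputs as, say, `worker.properties` and `s3-sink.properties`, then start Connect with `bin/connect-standalone.sh worker.properties s3-sink.properties` from the Kafka directory. AWS credentials still need to be available to the worker process, for example through environment variables.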

Can Amazon S3 act as Source to Kafka Cluster?

Amazon S3 can be used as a sink for a Kafka cluster. I want to check whether Amazon S3 can also be used as a source for a Kafka cluster.

The Kafka Connect FileSystem Connector supports S3 as a Kafka source, but it is not an officially supported connector.
It will consume from S3 and duplicate those records into Kafka's segment files on the brokers' local disks.

S3 connectors to connect with Kafka for streaming data from on-premise to cloud

I want to stream data from on-premise to the cloud (S3) using Kafka. For that I would need to install Kafka on the source machine and also in the cloud, but I don't want to install it in the cloud. I need some S3 connector through which I can connect to Kafka and stream data from on-premise to the cloud.

If your data is in Avro or JSON format (or can be converted to those formats), you can use the S3 connector for Kafka Connect. See Confluent's docs on that.
Should you want to move actual (bigger) files via Kafka, be aware that Kafka is designed for small messages and not for file transfers.

There is a kafka-connect-s3 project from Spredfast, consisting of both a sink and a source connector, which can handle text format. Unfortunately it is not really updated anymore, but it works nevertheless.
