Pentaho Kafka producer example

I am trying to create a transformation using a Kafka producer and consumer in Pentaho Data Integration. Is there any example available in Pentaho for an Apache Kafka producer and consumer, or can you please let me know how to create the transformation?

You need to download a plugin to enable the Apache Kafka steps in PDI. This DOC
shows you how to use Apache Kafka in PDI. Hope it helps :)

Related

Kafka S3 Sink basic doubts

Do I really need to use Confluent (the CLI maybe)? Can I write my own custom connector?
How can I write my first Kafka sink? How do I deploy them?
For now, let's assume we have the following details:
Topic: curious.topic
S3 bucket name: curious.s3
Data in the topic: Text/String
My OS: Mac
Start at the documentation for the S3 sink, look over its configuration properties, and understand how to run Connect itself and deploy any connector (you use the REST API); no, the Confluent CLI is never needed.
You don't need to "write your own sink" because Confluent already has an S3 Sink Connector. Sure, you could fork their open-source repo, and compile it yourself, but that doesn't seem to be what you're asking.
You can download the connector itself with the separate confluent-hub command-line tool.
Note: pinterest/secor does the same thing, without Kafka Connect.
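To make that concrete, here is a rough sketch (not a definitive recipe) of deploying the S3 sink through the Connect REST API from Java. It assumes a Connect worker on localhost:8083 with the Confluent S3 sink installed and Java 15+ for the text block; the connector name, region, format, and flush size are illustrative choices for plain string data.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DeployS3Sink {
    public static void main(String[] args) throws Exception {
        // Illustrative connector config: write raw string/byte records from curious.topic to curious.s3.
        String body = """
            {
              "name": "curious-s3-sink",
              "config": {
                "connector.class": "io.confluent.connect.s3.S3SinkConnector",
                "topics": "curious.topic",
                "s3.bucket.name": "curious.s3",
                "s3.region": "us-east-1",
                "storage.class": "io.confluent.connect.s3.storage.S3Storage",
                "format.class": "io.confluent.connect.s3.format.bytearray.ByteArrayFormat",
                "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
                "flush.size": "100"
              }
            }
            """;

        // POST the config to the Connect worker's REST API to create the connector.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}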

Apache Flume Kafka Producer - Generate partition_id/key dynamically

I have a task to use Apache Flume to send messages to a Kafka topic. The caveat is that I have to specify a partition based on an IP address that will be in the message.
Is there a way to configure Apache Flume to do this dynamically, or do I have to implement a custom Producer plugin?
Thank you.
After some research, it turns out the only real way to customize how messages are partitioned is to roll your own sink.
I extended the AbstractSink class and put in my own hash algorithm to generate the partition to send each message to.
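For reference, here is a rough sketch of that approach (assuming Flume 1.x and kafka-clients on the classpath; the property names, topic, partition count, and the way the IP is pulled out of the event body are all illustrative):

import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IpPartitioningKafkaSink extends AbstractSink implements Configurable {

    private KafkaProducer<String, String> producer;
    private String topic;
    private int numPartitions;

    @Override
    public void configure(Context context) {
        // Hypothetical sink properties; set them in the Flume agent config.
        topic = context.getString("topic", "my.topic");
        numPartitions = context.getInteger("partitions", 3);
        Properties props = new Properties();
        props.put("bootstrap.servers", context.getString("brokerList", "localhost:9092"));
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<>(props);
    }

    @Override
    public Status process() {
        Channel channel = getChannel();
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
            Event event = channel.take();
            if (event == null) {
                tx.commit();
                return Status.BACKOFF;
            }
            String bodyText = new String(event.getBody(), StandardCharsets.UTF_8);
            // Assumes the IP address is the first whitespace-separated token of the message.
            String ip = bodyText.split("\\s+")[0];
            int partition = Math.abs(ip.hashCode()) % numPartitions;
            // Send to the explicitly chosen partition, using the IP as the record key.
            producer.send(new ProducerRecord<>(topic, partition, ip, bodyText));
            tx.commit();
            return Status.READY;
        } catch (Exception e) {
            tx.rollback();
            return Status.BACKOFF;
        } finally {
            tx.close();
        }
    }
}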

Integrating Kafka and SQL Server 2008

I have a SQL Server 2008 R2 database that I would like to integrate with Kafka.
Essentially, I want to use Change Data Capture to capture changes in my table and put them on a Kafka queue so that the front-end devs can read the data off Kafka. Has anyone done this before, or have any tips on how to go about it?
Kafka Connect will solve this problem now, in particular the JDBC connector (a configuration sketch follows below).
The JDBC connector allows you to import data from any relational
database with a JDBC driver into Kafka topics. By using JDBC, this
connector can support a wide variety of databases without requiring
custom code for each one.
Source: http://docs.confluent.io/3.0.0/connect/connect-jdbc/docs/jdbc_connector.html
See also:
Kafka Connect JDBC Connector source code on GitHub
Kafka Connect Documentation
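As a sketch of what that looks like in practice, the connector is only configuration deployed to a running Connect worker over the same REST API as any other connector. The connection URL, table, incrementing column, and connector name below are placeholders, and a SQL Server JDBC driver is assumed to be on the worker's classpath.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DeployJdbcSource {
    public static void main(String[] args) throws Exception {
        // Illustrative JDBC source config: poll my_table and publish new rows to sqlserver-my_table.
        String body = """
            {
              "name": "sqlserver-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:sqlserver://localhost:1433;databaseName=mydb;user=sa;password=secret",
                "mode": "incrementing",
                "incrementing.column.name": "id",
                "table.whitelist": "my_table",
                "topic.prefix": "sqlserver-",
                "poll.interval.ms": "5000"
              }
            }
            """;

        // Create the connector by POSTing its config to the Connect REST API.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}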
There is no way you can do it directly from SQL Server. You have to write your own producer that will pull data from SQL and push it to a Kafka queue. We are currently doing the same thing via background services that push data to Kafka.
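If you do go the hand-rolled route, a rough sketch looks like the following (the JDBC URL, table, and topic are placeholders; the Microsoft JDBC driver and kafka-clients are assumed to be on the classpath, and a real service would track a CDC offset or timestamp instead of re-reading the whole table):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SqlServerToKafka {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             Connection conn = DriverManager.getConnection(
                     "jdbc:sqlserver://localhost:1433;databaseName=mydb;user=sa;password=secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, payload FROM dbo.my_table")) {

            while (rs.next()) {
                // Forward each row as a (key, value) record; key by primary key for stable partitioning.
                String key = rs.getString("id");
                String value = rs.getString("payload");
                producer.send(new ProducerRecord<>("sqlserver.my_table", key, value));
            }
            producer.flush();
        }
    }
}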

How do I use ActiveMQ in Apache Flink?

I am getting my data through ActiveMQ, which I want to process in real time with Apache Flink DataStreams. There is support for many messaging services like RabbitMQ and Kafka, but I can't see any support for ActiveMQ. How can I use it?
Since there is no built-in support for ActiveMQ, I would recommend implementing a custom source.
You basically have to implement the SourceFunction interface.
If you want exactly-once semantics, you can base your implementation on the MultipleIdsMessageAcknowledgingSourceBase class.
I would recommend starting with a plain SourceFunction.
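As a starting point, here is a minimal sketch of such a source (assuming flink-streaming-java and an ActiveMQ client on the classpath; the broker URL and queue name are placeholders, and this simple version auto-acknowledges, so it is not exactly-once):

import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class ActiveMQSource implements SourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        // Connect to the broker and subscribe to the queue (placeholder URL and queue name).
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createQueue("input.queue"));
        try {
            while (running) {
                // Poll with a timeout so cancel() is honoured promptly.
                TextMessage message = (TextMessage) consumer.receive(1000);
                if (message != null) {
                    // Emit under the checkpoint lock so checkpoints see a consistent state.
                    synchronized (ctx.getCheckpointLock()) {
                        ctx.collect(message.getText());
                    }
                }
            }
        } finally {
            consumer.close();
            session.close();
            connection.close();
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}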
Found a JMS connector for Flink:
https://github.com/jkirsch/senser/blob/master/src/main/java/edu/tuberlin/senser/images/flink/io/FlinkJMSStreamSource.java

Implementing a Kafka consumer in Spark 1.0

I need to implement a Kafka consumer in Spark Streaming for Spark 1.0. I have written a Kafka producer. Can anyone please help me with how to write a Spark receiver for pulling messages from Kafka? Also, may I know how to run a Kafka Spark Streaming project in IntelliJ IDEA?
Spark Streaming comes with its own Kafka consumer:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.0.2</version>
</dependency>
The official repository also contains some streaming examples, including a word count for Kafka.
In IntelliJ, just import the spark-streaming and spark-streaming-kafka modules and write a simple main like the example.
https://github.com/dibbhatt/kafka-spark-consumer

This utility will help to pull messages from a Kafka cluster using Spark Streaming. The Kafka consumer is a low-level Kafka consumer (SimpleConsumer) and has better handling of Kafka offsets and of failures.

And according to this blog post, it provides better replaying control in case of downstream failures (e.g. Spark machines died).
use KafkaUtils.createStream();
Here's the API: https://spark.apache.org/docs/1.0.2/api/java/index.html?org/apache/spark/streaming/kafka/KafkaUtils.html
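For example, a minimal sketch of a Java driver against Spark 1.0.x (the ZooKeeper quorum, consumer group id, and topic name are placeholders):

import java.util.Collections;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaSparkConsumer {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("KafkaSparkConsumer").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(2000));

        // Map of topic -> number of consumer threads for that topic.
        Map<String, Integer> topics = Collections.singletonMap("my.topic", 1);

        // Receiver-based Kafka stream: connects through ZooKeeper with the given consumer group.
        JavaPairDStream<String, String> messages =
                KafkaUtils.createStream(jssc, "localhost:2181", "spark-consumer-group", topics);

        // Print a few (key, value) pairs per batch as a sanity check.
        messages.print();

        jssc.start();
        jssc.awaitTermination();
    }
}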