Implementing a Kafka consumer in Spark 1.0 - IntelliJ IDEA

I need to implement a Kafka consumer in Spark Streaming for Spark 1.0. I have already written a Kafka producer. Can anyone help me write a Spark receiver that pulls messages from Kafka? Also, how do I run a Kafka Spark Streaming project in IntelliJ IDEA?

Spark Streaming comes with its own Kafka consumer. Add this dependency:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.0.2</version>
</dependency>
The official repository contains some streaming examples, including a word count for Kafka.
In IntelliJ, just import the spark-streaming and spark-streaming-kafka modules and write a simple main like the example.

https://github.com/dibbhatt/kafka-spark-consumer
..
This utility will help to pull messages from a Kafka cluster using Spark
Streaming. The Kafka consumer is a low-level Kafka consumer
(SimpleConsumer) and has better handling of Kafka offsets and
failures
..
And according to this blog post, it
provides better replay control in case of downstream failures (e.g. Spark machines dying).

Use KafkaUtils.createStream().
Here's the API: https://spark.apache.org/docs/1.0.2/api/java/index.html?org/apache/spark/streaming/kafka/KafkaUtils.html
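A minimal sketch of a Spark 1.0 streaming job built around KafkaUtils.createStream; the ZooKeeper address, consumer group id, and topic name are placeholders for your environment:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaConsumerApp {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf()
            .setAppName("KafkaConsumer")
            .setMaster("local[2]"); // at least 2 threads: one for the receiver, one for processing
        JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(2000));

        // topic -> number of consumer threads (placeholder values)
        Map<String, Integer> topics = new HashMap<String, Integer>();
        topics.put("my-topic", 1);

        // ZooKeeper quorum and group id are placeholders
        JavaPairDStream<String, String> messages =
            KafkaUtils.createStream(jssc, "localhost:2181", "my-group", topics);

        messages.print(); // print a sample of each batch to stdout

        jssc.start();
        jssc.awaitTermination();
    }
}
```

Run it with the spark-streaming and spark-streaming-kafka_2.10 jars on the classpath, e.g. straight from IntelliJ with a local Kafka and ZooKeeper running.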

Related

How can I configure Redis as a Spring Cloud Dataflow Source?

I've searched for examples and I have not found any.
My intention is to use a Redis Stream as a source for Spring Cloud Dataflow and route messages to AWS Kinesis or S3 data sinks.
Redis is not listed as a Spring Cloud Dataflow source. Will I have to create a custom binder?
Redis only seems to be available as a sink with PubSub.
There used to be a redis-binder for Spring Cloud Stream, but that has been deprecated for a while now. We have plans to implement a binder for Redis Streams in the future, though.
That said, if you have data in Redis, it'd be good to start building a redis-source as a custom application. We have many suppliers/sources that you can use as a reference.
There's currently also a blog series in the works that can provide further guidance when building custom applications.
Lastly, feel free to contribute the redis-supplier/source to the applications repo, we can collaborate on a pull request.
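A custom redis-source in Spring Cloud Stream's functional style could be sketched as below. This is a rough sketch, not an official sample: the Lettuce client is one possible Redis driver, and the stream name, connection URL, and supplier bean name are all assumptions.

```java
import java.util.List;
import java.util.function.Supplier;

import io.lettuce.core.RedisClient;
import io.lettuce.core.StreamMessage;
import io.lettuce.core.XReadArgs;
import io.lettuce.core.api.sync.RedisCommands;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class RedisSourceApplication {

    // Lettuce sync API; URL is a placeholder
    private final RedisCommands<String, String> redis =
        RedisClient.create("redis://localhost:6379").connect().sync();

    private String lastId = "0-0"; // start from the beginning of the stream

    // Spring Cloud Stream polls this Supplier and publishes each non-null
    // result to the bound destination (e.g. a Kinesis or S3 sink).
    @Bean
    public Supplier<String> redisStreamSupplier() {
        return () -> {
            // XREAD one entry after the last id we saw; "my-stream" is a placeholder
            List<StreamMessage<String, String>> messages = redis.xread(
                XReadArgs.Builder.count(1),
                XReadArgs.StreamOffset.from("my-stream", lastId));
            if (messages.isEmpty()) {
                return null; // nothing new this poll
            }
            StreamMessage<String, String> m = messages.get(0);
            lastId = m.getId(); // advance past the entry we just read
            return m.getBody().toString();
        };
    }

    public static void main(String[] args) {
        SpringApplication.run(RedisSourceApplication.class, args);
    }
}
```

A production version would track the last-read id durably (or use consumer groups with XREADGROUP/XACK) so the source survives restarts without replaying or dropping entries.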

Is Redis a good idea for Spring Cloud Stream? Should I use Kafka or RabbitMQ?

I'm deploying a small Spring Cloud Stream project,
using only HTTP sources and JDBC sinks (3 instances each). The estimated load is 10 hits/second.
I was thinking of using Redis because I feel more comfortable with it, but in the latest documentation almost all the references are to Kafka and RabbitMQ, so I am wondering whether Redis is not going to be supported in the future or whether there is any issue using Redis.
Regards
Redis is not recommended for production with Spring Cloud Stream - the binder is not fully functional and message loss is possible.

How do I use ActiveMQ in Apache Flink?

I am getting my data through ActiveMQ which I want to process in real time with Apache Flink DataStreams. There is support for many messaging services like RabbitMQ and Kafka but I can't see any support for ActiveMQ. How can I use it?
Since there is no support for ActiveMQ, I would recommend implementing a custom source.
You basically have to implement the SourceFunction interface.
If you want exactly-once semantics, you can base your implementation on the MultipleIdsMessageAcknowledgingSourceBase class, but I would recommend starting with a plain SourceFunction.
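A minimal custom source along those lines might look like the following sketch. It forwards ActiveMQ text messages into a Flink stream with at-most-once semantics (auto-acknowledge, no checkpointed acknowledgements); the broker URL and queue name are placeholders:

```java
import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Emits each ActiveMQ text message as a String element of the DataStream.
public class ActiveMQSource implements SourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        // Broker URL is a placeholder for your environment
        ActiveMQConnectionFactory factory =
            new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer =
            session.createConsumer(session.createQueue("my-queue")); // placeholder queue
        try {
            while (running) {
                // Poll with a timeout so cancel() is observed promptly
                TextMessage msg = (TextMessage) consumer.receive(1000);
                if (msg != null) {
                    ctx.collect(msg.getText()); // emit downstream
                }
            }
        } finally {
            connection.close();
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```

Attach it with env.addSource(new ActiveMQSource()). For exactly-once you would instead extend MultipleIdsMessageAcknowledgingSourceBase, use CLIENT_ACKNOWLEDGE, and acknowledge JMS message ids only when a checkpoint completes.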
Found a JMS connector for Flink:
https://github.com/jkirsch/senser/blob/master/src/main/java/edu/tuberlin/senser/images/flink/io/FlinkJMSStreamSource.java

What's the difference between kafka.javaapi.* and org.apache.kafka.*?

I am a new learner of Kafka. What confuses me is that there seem to be two packages of Kafka clients.
One is kafka.javaapi.* like
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
the other is org.apache.kafka.*, like
import org.apache.kafka.clients.producer.KafkaProducer;
which is shown on the page http://kafka.apache.org/082/javadoc/index.html?org/apache/kafka/clients/producer
I don't know what the differences are.
Before Kafka 0.8.2, kafka.javaapi.producer.Producer was the only official Java client (producer), and it is implemented in Scala.
As of Kafka 0.8.2, there is a new Java producer API, org.apache.kafka.clients.producer.KafkaProducer, which is fully implemented in Java.
The Kafka 0.8.2 documentation says:
We are in the process of rewriting the JVM clients for Kafka. As of 0.8.2 Kafka includes a newly rewritten Java producer. The next release will include an equivalent Java consumer. These new clients are meant to supplant the existing Scala clients, but for compatibility they will co-exist for some time. These clients are available in a separate jar with minimal dependencies, while the old Scala clients remain packaged with the server.
If you are interested in kafka.javaapi.producer.Producer, refer to 2.1 Producer API in Kafka 0.8.1 Documentation.
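The practical difference shows up in configuration: the two clients take different property keys. A small comparison sketch (broker address and serializer classes are illustrative values):

```java
import java.util.Properties;

public class ProducerConfigs {

    // Keys expected by the old Scala client (kafka.javaapi.producer.Producer)
    static Properties oldClientProps() {
        Properties p = new Properties();
        p.put("metadata.broker.list", "localhost:9092");          // broker list, old-style key
        p.put("serializer.class", "kafka.serializer.StringEncoder"); // one encoder for the message
        return p;
    }

    // Keys expected by the new Java client (org.apache.kafka.clients.producer.KafkaProducer)
    static Properties newClientProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");             // replaces metadata.broker.list
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return p;
    }

    public static void main(String[] args) {
        // The key sets do not overlap, which is why configs are not portable
        // between the two clients.
        System.out.println(oldClientProps().containsKey("metadata.broker.list"));
        System.out.println(newClientProps().containsKey("bootstrap.servers"));
    }
}
```

You would pass oldClientProps() into kafka.producer.ProducerConfig for the old client, and newClientProps() straight into the KafkaProducer constructor for the new one.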

Pentaho Kafka producer example

I am trying to create a transformation using Kafka producer and consumer in Pentaho Data Integration. Is there any example available in Pentaho for Apache Kafka Producer and Consumer? or can you please let me know how to create the transformation?
You need to download a plugin to enable Apache Kafka in PDI. This doc
shows you how to use Apache Kafka in PDI. Hope it helps :)