Pros and Cons of Kafka vs RabbitMQ

Kafka and RabbitMQ are well-known message brokers. I want to build a microservice with Spring Boot, and Spring Cloud seems to provide out-of-the-box solutions for both as the de facto choices. I know a bit about the trajectory of RabbitMQ, which has a lot of support. Kafka belongs to Apache, so it should be good. So what is the main difference in goals between RabbitMQ and Kafka? Take into consideration that this will be used with Spring Cloud. Please share your experiences and criteria. Thanks in advance.

I certainly wouldn't consider Kafka lightweight. Kafka has traditionally relied on ZooKeeper, so you'd need to add ZooKeeper to your stack as well (newer Kafka releases can run without it in KRaft mode).
Kafka is pub/sub, but you can re-read messages. If you need to process large volumes of data, Kafka performs much better, and its synergy with other big-data tools is a lot better. It specifically targets big data.

Three application-level differences:
Kafka supports re-reading consumed messages, while RabbitMQ does not.
Kafka preserves message ordering within a partition, while RabbitMQ preserves it only under constraints such as one exchange routing to one queue, with one consumer on that queue.
Kafka is faster than RabbitMQ at publishing data to a partition.
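The first difference (re-reading) comes down to the consumption model, which can be illustrated with a toy in-memory sketch. This does not use either broker's real API; the class names `ToyLog` and `ToyQueue` are made up purely for illustration.

```python
from collections import deque


class ToyLog:
    """Append-only log: reading never removes anything (Kafka-like model)."""

    def __init__(self):
        self._records = []

    def append(self, message):
        self._records.append(message)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset):
        # The consumer picks the offset, so old records can be read again.
        return list(self._records[offset:])


class ToyQueue:
    """Queue: once a message is consumed it is gone (classic queue model)."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        return self._messages.popleft() if self._messages else None


log = ToyLog()
work_queue = ToyQueue()
for message in ["a", "b", "c"]:
    log.append(message)
    work_queue.publish(message)

first_pass = log.read_from(0)
second_pass = log.read_from(0)            # the same records again
drained = [work_queue.consume() for _ in range(3)]
after_drain = work_queue.consume()        # nothing left to re-read
```

In the log model the consumer owns its position, which is what makes replay cheap; in the queue model the broker forgets a message once it has been consumed.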

Kafka is more than just a pub/sub messaging platform. It also includes APIs for data integration (Kafka Connect) and stream processing (Kafka Streams). These higher-level APIs make developers more productive than using only lower-level pub/sub messaging APIs.
Kafka also added exactly-once semantics in June 2017, which is another differentiator.

To start with, Kafka does more than RabbitMQ does. Acting as a message broker is just one part of Kafka; it can also act as message storage and as a stream-processing platform. Comparing just the message broker part, Kafka is again more robust than RabbitMQ, as it supports replication (for availability), partitioning (for scalability), and replay of messages (if you need to reprocess them), and it is pull-based. RabbitMQ can scale by using multiple consumers for a given queue, but it is push-based and you lose ordering among multiple consumers.
It all depends on the use case, and your question doesn't give the use case or performance requirements needed to suggest one over the other.
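To make the partitioning and ordering point concrete, here is a minimal sketch of key-based partitioning. It is an illustration only: `NUM_PARTITIONS`, `partition_for`, and the sample events are hypothetical, and `zlib.crc32` is a stand-in, since Kafka's default partitioner uses a different hash function.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count


def partition_for(key: str) -> int:
    # Stand-in hash; any deterministic hash gives the same guarantee.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


partitions = [[] for _ in range(NUM_PARTITIONS)]
events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "cancelled"),
    ("order-1", "shipped"),
]

# Each event is appended to the partition chosen by its key.
for key, value in events:
    partitions[partition_for(key)].append((key, value))


def history(key: str):
    # All events for a key live in one partition, in publish order.
    return [v for k, v in partitions[partition_for(key)] if k == key]
```

Because a key always maps to the same partition, one consumer per partition sees each key's events in order, even though there is no ordering across partitions.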

I found a nice answer in this youtube video Apache Kafka Explained (Comprehensive Overview).
It basically says that the difference between Kafka and standard JMS systems like RabbitMQ or ActiveMQ is that:
Kafka consumers pull messages from the brokers, which allows messages to be buffered for as long as the retention period holds. In most JMS systems messages are pushed to the consumers, which makes strategies like back-pressure harder to achieve.
Kafka also eases the replay of events by storing them on disk, so they can be replayed at any time.
Kafka guarantees the ordering of messages within a partition.
Kafka overall provides an easy way to build scalable and fault-tolerant systems.
Kafka is more complex and harder to understand than JMS systems.
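The pull versus push point can be sketched as follows. This is a toy model, not the Kafka client API: `ToyBroker`, `poll`, and `consume_all` are hypothetical names, and `max_records` only plays a role similar to the real consumer setting `max.poll.records`.

```python
class ToyBroker:
    """Holds messages until a consumer asks for them."""

    def __init__(self, messages):
        self._log = list(messages)

    def poll(self, offset, max_records):
        # The broker returns at most what the consumer asked for.
        return self._log[offset:offset + max_records]


def consume_all(broker, max_records):
    offset = 0
    batches = []
    while True:
        batch = broker.poll(offset, max_records)
        if not batch:
            break
        batches.append(batch)   # process at the consumer's own pace
        offset += len(batch)    # only then move the position forward
    return batches


batches = consume_all(ToyBroker(range(7)), max_records=3)
```

A slow consumer simply polls less often or asks for smaller batches; the broker never has to guess how fast it may push.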

Related

Using both, request-reply and pub-sub for microservices communication

We are planning to introduce both pub-sub and request-reply communication models to our microservices architecture. Both communication models are needed.
One solution could be RabbitMQ, as it supports both models and provides HA, clustering, and other interesting features.
The RabbitMQ request-reply model requires queues for both input and output messages. Only one service can read from the input queue, and this increases coupling.
Is there any other recommended solution for using both request-reply and pub-sub communication models in the same system?
Could a service mesh be a better option?
It must be supported by Node.js, Python, and .NET Core.
Thank you for your help.
There are multiple HA-capable options that support the pub-sub and request-reply communication models:
1. Kafka
Kafka relies heavily on the filesystem for storing and caching messages. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel’s pagecache.
Kafka is designed with failure in mind. At some point in time, web communications or storage resources fail. When a broker goes offline, one of the replicas becomes the new leader for the partition. When the broker comes back online, it has no leader partitions. Kafka keeps track of which machine is configured to be the leader. Once the original broker is back up and in a good state, Kafka restores the information it missed in the interim and makes it the partition leader once more.
See :
https://kafka.apache.org/
https://docs.cloudera.com/documentation/kafka/latest/topics/kafka_ha.html
https://docs.confluent.io/4.1.2/installation/docker/docs/tutorials/clustered-deployment.html
2. Redis
Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.
See :
https://redis.io/
https://redislabs.com/redis-enterprise/technology/highly-available-redis/
https://redis.io/topics/sentinel
3. ZeroMQ
ZeroMQ (also known as ØMQ, 0MQ, or zmq) looks like an embeddable networking library but acts like a concurrency framework. It gives you sockets that carry atomic messages across various transports like in-process, inter-process, TCP, and multicast. You can connect sockets N-to-N with patterns like fan-out, pub-sub, task distribution, and request-reply. It's fast enough to be the fabric for clustered products. Its asynchronous I/O model gives you scalable multicore applications, built as asynchronous message-processing tasks. It has a score of language APIs and runs on most operating systems.
See :
https://zeromq.org/
http://zguide.zeromq.org/pdf-c:chapter3
http://zguide.zeromq.org/pdf-c:chapter4
4. RabbitMQ
RabbitMQ is lightweight and easy to deploy on premises and in the cloud. It supports multiple messaging protocols. RabbitMQ can be deployed in distributed and federated configurations to meet high-scale, high-availability requirements.
My preference would be a REST API for the request-reply pattern. This is especially applicable to internal microservices, where you are in control of the communication mechanism. I don't understand your comment about why they are not scalable: if you define them properly, you can scale the number of service instances out and in based on demand. Be it Kafka, RabbitMQ, or any other broker, I don't think any of them was developed with request-reply as the primary use case. And don't forget that, whatever broker you use, a chain that is A->B->C in REST becomes A->broker->B->broker->C->broker->A, and the broker needs to do its housekeeping.
Then for pub-sub, I would use Kafka, as it is a unified model that supports pub-sub as well as point-to-point.
But if you still want to use a broker for request-reply, I would look at Kafka, as it can scale massively via partitions and a lot of near-real-time streaming applications are built on it. So it could come close to the low-latency requirement of the request-reply pattern. But then I would want a framework on top of it to associate requests with replies, so I would consider using Spring Kafka for that.
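Whatever broker is used, associating requests with replies usually comes down to a correlation ID carried in both messages. The sketch below shows the idea with two in-process queues standing in for broker queues or topics; the function names and the upper-casing "service" are hypothetical, and this is not how any particular framework implements it.

```python
import queue
import uuid

request_queue = queue.Queue()   # stand-in for the request queue/topic
reply_queue = queue.Queue()     # stand-in for the reply queue/topic


def send_request(payload):
    correlation_id = str(uuid.uuid4())
    request_queue.put({"correlation_id": correlation_id, "payload": payload})
    return correlation_id


def serve_one():
    # Hypothetical service: upper-cases the payload and copies the
    # correlation id from the request into the reply.
    request = request_queue.get_nowait()
    reply_queue.put({
        "correlation_id": request["correlation_id"],
        "payload": request["payload"].upper(),
    })


def collect_replies():
    replies = {}
    while not reply_queue.empty():
        reply = reply_queue.get_nowait()
        replies[reply["correlation_id"]] = reply["payload"]
    return replies


first = send_request("ping")
second = send_request("status")
serve_one()
serve_one()
replies = collect_replies()
```

The caller looks up each reply by the ID it generated, so replies may arrive in any order and from any service instance.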

What's the difference between RabbitMQ and Kafka?

Which will fare better under different scenarios?
I know that RabbitMQ is based on AMQP protocol, and has visualization for the developer.
RabbitMQ, as you noted, is an implementation of AMQP. AMQP is a messaging protocol which originated in the financial industry. Its core metaphors are Messages, Exchanges, and Queues.
Kafka was designed as a log-structured distributed datastore. It has features that make it suitable for use as part of a messaging system, but it also can accommodate other use cases, like stream processing. It originated at LinkedIn and its core metaphors are Messages, Topics, and Partitions.
Subjectively, I find RabbitMQ more pleasant to work with: its web-based management tool is nice, and there is little friction in creating, deleting, and configuring queues and exchanges. Library support is good in many languages. Because Rabbit in its default configuration keeps messages only in memory, latencies are low.
Kafka is younger, the tooling feels clunkier, and it has had relatively poor support in non-JVM languages, though this is getting better. On the other hand, it has stronger guarantees in the face of network partitions and broker loss, and since it is designed to move messages to disk as soon as possible, it can accommodate a larger data set on typical deployments. (Rabbit can page messages to disk, but it sometimes doesn't perform well when it does.)
In either, you can design for direct (one:one), fanout (one:many), and pub-sub (many:many) communication patterns.
If I were building a system that needed to buffer massive amounts of incoming data with strong durability guarantees, I'd choose Kafka for sure. If I were working with a JVM language or needed to do some stream processing over the data, that would only reinforce the choice.
If, on the other hand, I had a use case in which I valued latency over throughput and could handle loss of transient messages, I'd choose Rabbit.
Kafka:
Messages are always there. You can manage this by specifying a message retention policy.
It is a distributed event streaming platform.
We can use it as a log.
With Kafka Streams, you can transform and process messages automatically.
We cannot set message priority.
Order is retained only inside a partition. Within a partition, Kafka guarantees that a whole batch of messages either fails or passes.
Fewer mature client platforms than RMQ (mainly Scala and Java).
RabbitMQ:
RabbitMQ is a queuing system, so messages are deleted right after they are consumed.
It is a distributed message broker.
It cannot be used as a log.
Streaming is not supported in RMQ.
We can set the priority of a message and consume on that basis.
It does not guarantee atomicity, even for transactions involving a single queue.
Mature platform (has the best compatibility across languages).
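The message-priority point in the lists above can be sketched with a small in-memory priority queue. This only illustrates the consumption order; it is not RabbitMQ's implementation (there, priorities are enabled per queue, for example through the `x-max-priority` queue argument), and `publish`/`consume` are hypothetical helpers.

```python
import heapq
import itertools

_counter = itertools.count()  # tie-breaker keeps FIFO order within a priority
_heap = []


def publish(message, priority=0):
    # heapq is a min-heap, so the priority is negated to pop the highest first.
    heapq.heappush(_heap, (-priority, next(_counter), message))


def consume():
    return heapq.heappop(_heap)[2] if _heap else None


publish("nightly report", priority=1)
publish("password reset", priority=9)
publish("newsletter", priority=1)
publish("payment failed", priority=9)

delivered = [consume() for _ in range(4)]
```

A log with offsets has no natural equivalent, because consumers read records strictly in the order they were appended.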

What are advantages of Kafka over RabbitMQ?

I'm looking for the pros and cons of using Apache Kafka over RabbitMQ. I also need to decide whether I should move my existing infrastructure over to Kafka.
They are very different. Some points you might consider to begin with:
a) RabbitMQ is a FIFO queue.
Kafka is a log: your writes are appended to the tail, but you read from wherever you want.
b) Kafka is truly distributed: data is sharded and replicated, and both durability guarantees and availability can be tuned.
RabbitMQ has limited support for the above.
c) Kafka also comes out of the box with consumer frameworks that allow reliable distributed processing of the log. Kafka Streams also has stream-processing semantics built in.
With RabbitMQ the consumer is just FIFO-based, reading from the head and processing messages one by one.
d) Kafka is extensible in the consumer model and lets you build exactly-once, at-most-once, or at-least-once delivery.
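Point d) largely depends on when the consumer commits its offset relative to processing. The simulation below is a sketch under simplified assumptions (a single consumer, a single simulated crash, and a hypothetical `consume_with_crash` helper); it does not use the Kafka client.

```python
def consume_with_crash(records, commit_before_processing, crash_at):
    """Run a consumer that crashes once, then restarts from its committed offset."""
    committed = 0      # next offset to read after a restart
    processed = []

    def run(start, crash_offset):
        nonlocal committed
        for offset in range(start, len(records)):
            if commit_before_processing:
                committed = offset + 1
                if offset == crash_offset:
                    return  # crashed after the commit, before processing
                processed.append(records[offset])
            else:
                processed.append(records[offset])
                if offset == crash_offset:
                    return  # crashed after processing, before the commit
                committed = offset + 1

    run(0, crash_at)       # first run crashes between the two steps
    run(committed, None)   # the restart resumes from the committed offset
    return processed


at_most_once = consume_with_crash(["a", "b", "c"], True, crash_at=1)
at_least_once = consume_with_crash(["a", "b", "c"], False, crash_at=1)
```

Committing first can lose the record in flight (at-most-once); processing first can repeat it (at-least-once). Exactly-once needs more than this, such as idempotent processing or transactions.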

Need advice on suitable message queue for Storm spout

I'm developing a prototype Lambda system and my data is streaming in via Flume to HDFS. I also need to get the data into Storm. Flume is a push system and Storm is more pull so I don't believe it's wise to try to connect a spout to Flume, but rather I think there should be a message queue between the two. Again this is a prototype, so I'm looking for best practices, not perfection. I'm thinking of putting an AMQP compliant queue as a Flume sink and then pulling the messages from a spout.
Is this a good approach? If so, I want to use a message queue that has relatively robust support in both the Flume world (as a sink) and the Storm world (as a spout). If I go AMQP then I assume that gives me the option to use whatever AMQP-compliant queue I want to use, correct? Thanks.
If you're going to use AMQP, I'd recommend sticking to the finalized 1.0 version of the AMQP spec. Otherwise, you're going to feel some pain when you try to upgrade to it from previous versions.
Your approach makes a lot of sense, but for us AMQP compliance looked a little less important. I will try to explain why.
We are using Kafka to get data into Storm. The main reasons are performance and usability. AMQP-compliant queues do not seem to be designed for holding information for a considerable time, while with Kafka this is just a configuration setting. This lets us keep messages for a long time and "play back" them easily (because the consumer always controls which message it consumes next, we can consume the same messages again and again without having to set up an entire system for that purpose). Also, Kafka's performance is incomparable to anything else I have seen.
Storm has a very useful KafkaSpout, in which the main things to pay attention to are:
Error reporting - there is some improvement to be done there. Messages are not as clear as one would have hoped.
It depends on ZooKeeper (which is already there if you have Storm), and a path has to be created manually for it.
Depending on the Storm version, pay attention to the Kafka version in use. It is documented, but it can easily be missed and cause unclear problems.
You can have the data streamed to a broker topic first. Then Flume and the Storm spout can both consume from that topic. Flume has a JMS source that makes it easy to consume from the message broker, and there is a Storm JMS spout to get the messages into Storm.

Using Redis for Pub/Sub: Advantages / Disadvantages over RabbitMQ

Our requirement is very simple. Send messages to users subscribed to a topic. We need our messaging system to be able to support millions of topics and maybe millions of subscribers to any given topic in near real time. Our application is built with Java.
We almost decided on RabbitMQ because of the community support, documentation, and features (it will possibly deliver everything we need). But I am very inclined towards using Redis because it looks promising and lightweight. Honestly, I have limited understanding of Redis as a messaging system, but seeing a growing number of companies using it for queuing (with Ruby's Resque), I want to know whether there is an offering like Resque in Java, and what the advantages or disadvantages of using Redis as an MQ over RabbitMQ are.
RabbitMQ supports clustering and now has active/active highly available queues, allowing for greater scale-out and availability options than are possible with Redis out of the box.
RabbitMQ gives you a greater amount of control on everything from the users/permissions of exchanges/queues, to the durability of a specific exchange or queue (disk vs memory), to the guarantees of delivery (transactions, Publisher Confirms).
It also allows for more flexibility and options on your topologies (fanout, topic, direct) and routing to multiple queues, RPC with private queues and reply-to, etc.
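To give a feel for the three topologies mentioned above, here is a small sketch of how each exchange type decides which queues receive a message. It follows the AMQP 0-9-1 topic convention (`*` matches exactly one word, `#` matches zero or more words), but it is an illustrative model with hypothetical queue names, not RabbitMQ's implementation.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # "#" may absorb zero or more words.
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if p[0] == "*" or p[0] == k[0]:
            return match(p[1:], k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


def route(exchange_type, bindings, routing_key):
    """bindings is a list of (queue_name, binding_key) pairs."""
    if exchange_type == "fanout":
        return [q for q, _ in bindings]          # the routing key is ignored
    if exchange_type == "direct":
        return [q for q, key in bindings if key == routing_key]
    if exchange_type == "topic":
        return [q for q, key in bindings if topic_matches(key, routing_key)]
    raise ValueError(f"unknown exchange type: {exchange_type}")


bindings = [
    ("audit", "#"),
    ("signups", "user.*.created"),
    ("user-events", "user.#"),
]

topic_targets = route("topic", bindings, "user.42.created")
direct_targets = route("direct", bindings, "user.42.created")
fanout_targets = route("fanout", bindings, "anything")
```

Plain Redis pub/sub offers channel and pattern subscriptions, but queue bindings, per-queue durability, and delivery confirmations of this kind would have to be built on top.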