Using both request-reply and pub-sub for microservices communication - RabbitMQ

We are planning to introduce both pub-sub and request-reply communication models to our microservices architecture; both models are needed.
One possible solution is RabbitMQ, as it provides both models along with HA, clustering and other interesting features.
The RabbitMQ request-reply model requires queues for both input and output messages. Only one service can read from the input queue, which increases coupling.
Is there any other recommended solution for using both request-reply and pub-sub communication models in the same system?
Could a service mesh be a better option?
It must be supported by Node.js, Python and .NET Core.
Thank you for your help

There are multiple messaging systems that support both pub-sub and request-reply communication models with HA:
1. Kafka
Kafka relies heavily on the filesystem for storing and caching messages. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel’s pagecache.
Kafka is designed with failure in mind. At some point in time, web communications or storage resources fail. When a broker goes offline, one of the replicas becomes the new leader for the partition. When the broker comes back online, it has no leader partitions. Kafka keeps track of which machine is configured to be the leader. Once the original broker is back up and in a good state, Kafka restores the information it missed in the interim and makes it the partition leader once more.
See:
https://kafka.apache.org/
https://docs.cloudera.com/documentation/kafka/latest/topics/kafka_ha.html
https://docs.confluent.io/4.1.2/installation/docker/docs/tutorials/clustered-deployment.html
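For illustration, a minimal Kafka pub-sub sketch using the kafka-python client; the broker address, topic name and consumer group are placeholders, not part of the original question:

```python
from kafka import KafkaProducer, KafkaConsumer

# Publish to a topic; acks="all" waits for all in-sync replicas to acknowledge.
producer = KafkaProducer(bootstrap_servers="localhost:9092", acks="all")
producer.send("orders", b'{"order_id": 1}')
producer.flush()

# Each consumer group independently reads the full topic (pub-sub semantics).
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="billing-service",
    auto_offset_reset="earliest",
)
for record in consumer:
    print(record.partition, record.offset, record.value)
```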
2. Redis
Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.
See:
https://redis.io/
https://redislabs.com/redis-enterprise/technology/highly-available-redis/
https://redis.io/topics/sentinel
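As a quick sketch of the Redis pub/sub API with the redis-py client (connection details and channel name are assumptions):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Subscriber: the pubsub object holds the subscription on its own connection.
p = r.pubsub()
p.subscribe("test-results")

# Publisher: publish() returns the number of subscribers that received it.
r.publish("test-results", "suite-42 passed")

for message in p.listen():
    # listen() also yields subscribe/unsubscribe confirmations; skip them.
    if message["type"] == "message":
        print(message["channel"], message["data"])
        break
```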
3. ZeroMQ
ZeroMQ (also known as ØMQ, 0MQ, or zmq) looks like an embeddable networking library but acts like a concurrency framework. It gives you sockets that carry atomic messages across various transports like in-process, inter-process, TCP, and multicast. You can connect sockets N-to-N with patterns like fan-out, pub-sub, task distribution, and request-reply. It's fast enough to be the fabric for clustered products. Its asynchronous I/O model gives you scalable multicore applications, built as asynchronous message-processing tasks. It has a score of language APIs and runs on most operating systems.
See:
https://zeromq.org/
http://zguide.zeromq.org/pdf-c:chapter3
http://zguide.zeromq.org/pdf-c:chapter4
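A minimal pyzmq request-reply sketch, with both ends in one process purely for illustration (endpoint and payloads are made up):

```python
import zmq

ctx = zmq.Context()

# Replier (server): binds and answers each request it receives.
rep = ctx.socket(zmq.REP)
rep.bind("tcp://127.0.0.1:5555")

# Requester (client): connects, sends a request, then waits for the reply.
req = ctx.socket(zmq.REQ)
req.connect("tcp://127.0.0.1:5555")

req.send(b"ping")
print(rep.recv())   # b'ping' arrives at the replier
rep.send(b"pong")
print(req.recv())   # b'pong' arrives back at the requester
```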
4. RabbitMQ
RabbitMQ is lightweight and easy to deploy on premises and in the cloud. It supports multiple messaging protocols. RabbitMQ can be deployed in distributed and federated configurations to meet high-scale, high-availability requirements.
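To make the request-reply concern from the question concrete, here is a client-side sketch of the usual RabbitMQ RPC pattern with pika (reply_to plus correlation_id). The rpc_requests queue name is hypothetical, and a responder service consuming it is assumed to exist:

```python
import uuid
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

# Exclusive, server-named reply queue private to this client.
reply_queue = ch.queue_declare(queue="", exclusive=True).method.queue
corr_id = str(uuid.uuid4())

# Send the request, telling the responder where and how to reply.
ch.basic_publish(
    exchange="",
    routing_key="rpc_requests",  # hypothetical request queue read by the responder
    properties=pika.BasicProperties(reply_to=reply_queue, correlation_id=corr_id),
    body=b"get_user:42",
)

# Wait for the reply that carries our correlation id.
for method, props, body in ch.consume(reply_queue, auto_ack=True, inactivity_timeout=5):
    if props is not None and props.correlation_id == corr_id:
        print("reply:", body)
        break
ch.cancel()
conn.close()
```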

My preference would be to use a REST API for the request-reply pattern. This is especially applicable for internal microservices where you are in control of the communication mechanism. I don't understand the comment about REST not being scalable: if the services are defined properly, you can scale the number of instances up and down based on demand. Be it Kafka, RabbitMQ, or any other broker, I don't think they were developed with request-reply as the primary use case. And don't forget that whatever broker you use, what would be A->B->C over REST becomes A->broker->B->broker->C->broker->A, and the broker needs to do its housekeeping.
Then for pub-sub, I would use Kafka, as it is a unified model that can support pub-sub as well as point-to-point.
But if you still want to use a broker for request-reply, I would look at Kafka, as it can scale massively via partitions and a lot of near-real-time streaming applications are built on it, so it could come close to the low-latency requirement of the request-reply pattern. I would then want a framework on top of it to correlate requests and replies, so I would consider using Spring Kafka for that.
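As a rough illustration of what such a framework automates, here is a hedged Python sketch of request-reply over Kafka using a correlation-id header and a reply topic; the topic names and payloads are invented, and a responder that consumes the request topic and publishes to the reply topic is assumed:

```python
import uuid
from kafka import KafkaProducer, KafkaConsumer

BOOTSTRAP = "localhost:9092"
corr_id = str(uuid.uuid4()).encode()

# Send the request, carrying the correlation id and reply topic as headers.
producer = KafkaProducer(bootstrap_servers=BOOTSTRAP)
producer.send(
    "user.requests",  # hypothetical request topic
    value=b"get_user:42",
    headers=[("correlation_id", corr_id), ("reply_topic", b"user.replies")],
)
producer.flush()

# Block on the reply topic until a message with our correlation id shows up.
consumer = KafkaConsumer("user.replies", bootstrap_servers=BOOTSTRAP)
for record in consumer:
    headers = dict(record.headers or [])
    if headers.get("correlation_id") == corr_id:
        print("reply:", record.value)
        break
```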

Related

ActiveMQ datastore for cluster setup

We have been using an ActiveMQ 5.16.0 broker as a single instance in production. Now we are planning to use a cluster of AMQ brokers for HA and load distribution, with consistency in the message data. Currently we are using only one queue.
HA can be achieved using failover, but do we need to use the same datastore, or can it be separate? If I use different instances for the AMQ brokers, how do I set up a common datastore?
Please guide me on how to set up the datastore for HA and load distribution.
Multiple ActiveMQ servers clustered together can provide HA in a couple of ways:
1. Scale message flow by using compute resources across multiple broker nodes.
2. Maintain message flow during a planned or unplanned outage of a single broker node.
3. Share the data store in the event of an ActiveMQ process failure.
A network of brokers solves #1 and #2. A standard 3-node cluster will give you excellent performance and the ability to scale the number of producers and consumers, along with splitting the full flow across the 3 nodes to provide increased capacity.
Solving for #3 is complicated in all messaging products. Brokers are always working to be completely empty, so clustering the data store of a single broker becomes an anti-pattern of sorts. Many times, relying on a RAID disk with a single broker node will provide higher reliability than adding NFSv4, GFSv2, or JDBC and using a shared store.
That being said, if you must use a shared store, follow best practices and use GFSv2 or NFSv4. JDBC is much slower and requires significant DB maintenance to keep running efficiently.
Note: Kevin Boone's note about CIFS/SMB is incorrect and CIFS/SMB should not be used. Otherwise, his responses are solid.
You can configure ActiveMQ so that instances share a message store, or so they have separate message stores. If they share a message store, then (essentially) the brokers will automatically form a master-slave configuration, such that only one broker (at a time) will accept connections from clients, and only one broker will update the store. Clients need to identify both brokers in their connection URIs, and will connect to whichever broker happens to be master.
With a shared message store like this, locks in the message store coordinate the master-slave assignment, which makes the choice of message store critical. Stores can be shared filesystems, or shared databases. Only a few shared filesystem implementations work properly -- anything based on NFS 4.x should work. CIFS/SMB stores can work, but there's so much variation between providers that it's hard to be sure. NFS v3 doesn't work, however well-implemented, because the locking semantics are inappropriate. In any case, the store needs to be robust, or replicated, or both, because the whole broker cluster depends on it. No store, no brokers.
In my experience, it's easier to get good throughput from a shared file store than a shared database although, of course, there are many factors to consider. Poor network connectivity will make it hard to get good throughput with any kind of shared store (or any kind of cluster, for that matter).
When using individual message stores, it's typical to put the brokers into some kind of mesh, with 'network connectors' to pass messages from one broker to another. Both brokers will accept connections from clients (there is no master), and the network connections will deal with the situation where messages are sent to one broker, but need to be consumed from another.
Clients don't necessarily need to specify all brokers in their connection URIs, but generally will, in case one of the brokers is down.
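For reference, the usual way a client lists multiple brokers is ActiveMQ's failover transport; a connection URI looks something like the following (hostnames, port and the randomize option are placeholders):

```
failover:(tcp://broker1:61616,tcp://broker2:61616)?randomize=false
```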
A mesh is generally easier to set up, and (broadly speaking) can handle more client load, than a master-slave with shared filestore. However, (a) losing a broker amounts to losing any messages that were associated with it (until the broker can be restored) and (b) the mesh interferes with messaging patterns like message grouping and exclusive consumers.
There's really no hard-and-fast rule to determine which configuration to use. Many installers who already have some sort of shared store infrastructure (a decent relational database, or a clustered NFS, for example) will tend to want to use it. The rise in cloud deployments has had the effect that mesh operation with no shared store has become (I think) a lot more popular, because it's so symmetric.
There's more -- a lot more -- that could be said here. As a broad question, I suspect the OP is a bit out-of-scope for SO. You'll probably get more traction if you break your question up into smaller, more focused, parts.

What's the difference between RabbitMQ and Kafka?

Which will fare better under different scenarios?
I know that RabbitMQ is based on the AMQP protocol and provides visualization tooling for the developer.
RabbitMQ, as you noted, is an implementation of AMQP. AMQP is a messaging protocol which originated in the financial industry. Its core metaphors are Messages, Exchanges, and Queues.
Kafka was designed as a log-structured distributed datastore. It has features that make it suitable for use as part of a messaging system, but it also can accommodate other use cases, like stream processing. It originated at LinkedIn and its core metaphors are Messages, Topics, and Partitions.
Subjectively, I find RabbitMQ more pleasant to work with: its web-based management tool is nice; there is little friction in creating, deleting, and configuring queues and exchanges. Library support is good in many languages. As in its default configurations Rabbit only stores messages in memory, latencies are low.
Kafka is younger, the tooling feels more clunky, and it has had relatively poor support in non-JVM languages, though this is getting better. On the other hand, it has stronger guarantees in the face of network partitions and broker loss, and since it is designed to move messages to disk as soon as possible, it can accommodate a larger data set on typical deployments. (Rabbit can page messages to disk but sometimes it doesn't perform well).
In either, you can design for direct (one:one), fanout (one:many), and pub-sub (many:many) communication patterns.
If I were building a system that needed to buffer massive amounts of incoming data with strong durability guarantees, I'd choose Kafka for sure. If I was working with a JVM language or needed to do some stream processing over the data, that would only reinforce the choice.
If, on the other hand, I had a use case in which I valued latency over throughput and could handle loss of transient messages, I'd choose Rabbit.
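To make the fanout (one:many) pattern concrete on the Rabbit side, a small pika sketch; the exchange name and payload are invented:

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

# A fanout exchange copies every published message to all bound queues.
ch.exchange_declare(exchange="events", exchange_type="fanout")

# Each subscriber binds its own (here server-named, exclusive) queue.
q = ch.queue_declare(queue="", exclusive=True).method.queue
ch.queue_bind(exchange="events", queue=q)

ch.basic_publish(exchange="events", routing_key="", body=b"user.created")

# basic_get polls once; a real subscriber would use basic_consume instead.
method, props, body = ch.basic_get(queue=q, auto_ack=True)
print(body)
```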
Kafka:
Messages are always retained; you can manage this by specifying a message retention policy.
It is a distributed event streaming platform.
We can use it as a log.
With Kafka Streams, you can transform and process messages automatically.
We cannot set message priority.
Order is retained only inside a partition; within a partition, Kafka guarantees that a whole batch of messages either fails or passes.
Not as many mature client platforms as RMQ (the main ones being Scala and Java).
RabbitMQ:
RabbitMQ is a queuing system, so messages get deleted just after they are consumed.
It is a distributed message broker.
It cannot be used as a log in the same way.
Streaming is not supported in RMQ.
We can set message priority and consume messages on that basis.
It does not guarantee atomicity, even for transactions involving a single queue.
Mature platform (has the best compatibility with all languages).

Pros and Cons of Kafka vs RabbitMQ

Kafka and RabbitMQ are well-known message brokers. I want to build a microservice with Spring Boot, and it seems that Spring Cloud provides out-of-the-box solutions for them as the de facto choices. I know a bit about the trajectory of RabbitMQ, which has a lot of support. Kafka belongs to Apache, so it should be good. So what's the main difference between RabbitMQ and Kafka? Take into consideration that this will be used with Spring Cloud. Please share your experiences and criteria. Thanks in advance.
I certainly wouldn't consider Kafka lightweight. Kafka relies on ZooKeeper, so you'd need to throw ZooKeeper into your stack as well.
Kafka is pubsub but you could re-read messages. If you need to process large volumes of data, Kafka performs much better and its synergy with other big-data tools is a lot better. It specifically targets big data.
Three application-level differences are:
1. Kafka supports re-reading consumed messages, while RabbitMQ does not.
2. Kafka supports ordering of messages within a partition, while RabbitMQ supports it only under constraints such as one exchange routing to one queue, with one consumer per queue.
3. Kafka is faster at publishing data to a partition than RabbitMQ.
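As a sketch of the first difference (re-reading already consumed messages), a Kafka consumer can simply rewind its offsets; the topic, partition and group names below are made up:

```python
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",
    group_id="replay-demo",
    enable_auto_commit=False,
)

# Assign a partition explicitly and rewind to the earliest retained offset,
# re-reading messages that were already consumed before.
tp = TopicPartition("orders", 0)
consumer.assign([tp])
consumer.seek_to_beginning(tp)

for record in consumer:
    print(record.offset, record.value)
```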
Kafka is more than just a pub/sub messaging platform. It also includes APIs for data integration (Kafka Connect) and stream processing (Kafka Streams). These higher level APIs make developers more productive versus using only lower level pub/sub messaging APIs.
Also Kafka has just added Exactly Once Semantics in June 2017 which is another differentiator.
To start with Kafka does more than what RabbitMQ does. Message broker is just one subset of Kafka but Kafka can also act as Message storage and Stream processing. Comparing just the Message broker part, again Kafka is more robust than RabbitMQ as it supports Replication (for availability) and partitioning (for scalability), Replay of messages (if needed to reprocess) and it is Pull based. RabbitMQ can be scalable by using multiple consumers for a given queue but again it is push based and you lose ordering among multiple consumers.
It all depends on the use case and your question doesn't provide the use case and performance requirements to suggest one over other.
I found a nice answer in this YouTube video, Apache Kafka Explained (Comprehensive Overview).
It basically says that the differences between Kafka and standard JMS systems like RabbitMQ or ActiveMQ are that:
Kafka consumers pull messages from the brokers, which allows buffering messages for as long as the retention period holds, while in most JMS systems messages are pushed to the consumers, which makes strategies like back-pressure harder to achieve.
Kafka also eases the replay of events by storing them on disk, so they can be replayed at any time.
Kafka guarantees the ordering of messages within a partition.
Kafka overall provides an easy way to build scalable and fault-tolerant systems.
Kafka is more complex and harder to understand than JMS systems.

Redis PUB/SUB and high availability

Currently I'm working on a distributed test execution and reporting system. I'm planning to use Redis PUB/SUB as a message queue and message distribution system.
I'm new to Redis, so I'm trying to read as many docs as I can and play around with it. One of the most important topics is high availability. As I said, I'm not an expert, but I'm aware of the possible options - using Sentinel, replication, clustering, etc.
What's not clear to me is how the Pub/Sub feature and the HA options relate to each other. What's the best practice for building a reliable messaging system with Redis? By reliable I mean that if my Redis message broker goes down, there should be some kind of backup node (a slave?) able to take over its role.
Is there a purely server-side solution? Or do I need to create a smart wrapper around the Redis client to handle this? Will a Sentinel-driven setup help me?
Doing pub/sub in Redis with failover means thinking about additional factors on the client side. A key piece to understand is that subscriptions are per-connection. If you are subscribed to a channel on a node and it fails, you will need to handle reconnecting and resubscribing. Because subscriptions are done at the connection level, they are not something which can be replicated.
Regarding the details as to how it works and what you can expect to see, along with ways around it see a post I made earlier this year at https://objectrocket.com/blog/how-to/reliable-pubsub-and-blocking-commands-during-redis-failovers
You can lower the risk surface by subscribing to slaves and publishing to the master, but you would then need to have non-promotable slaves to subscribe to and still need to handle losing a slave - there is just as much chance to lose a given slave as there is a master.
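A hedged sketch of that client-side handling with redis-py and Sentinel, subscribing on the master for simplicity; the Sentinel address, master name and channel are assumptions:

```python
import redis
from redis.sentinel import Sentinel

# Sentinel watches the replication group and tells us who the current master is.
sentinel = Sentinel([("localhost", 26379)])

def listen(channel):
    # Subscriptions live on a single connection and are not replicated,
    # so after a failover we must resolve the new master and resubscribe.
    while True:
        try:
            master = sentinel.master_for("mymaster")
            p = master.pubsub()
            p.subscribe(channel)
            for message in p.listen():
                if message["type"] == "message":
                    print(message["data"])
        except redis.ConnectionError:
            continue  # connection lost (e.g. failover): re-resolve and resubscribe

listen("test-results")
```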
IMO, PUB/SUB is not a good choice; maybe Disque (which comes from antirez, the author of Redis) fits better:
Disque, an in-memory, distributed job queue

Using Redis for Pub Sub. Advantages / Disadvantages over RabbitMQ

Our requirement is very simple. Send messages to users subscribed to a topic. We need our messaging system to be able to support millions of topics and maybe millions of subscribers to any given topic in near real time. Our application is built with Java.
We almost decided on RabbitMQ because of the community support, documentation and features (possibly it will deliver everything we need). But I am very inclined towards using Redis because it looks promising and lightweight. Honestly, I have a limited understanding of Redis as a messaging system, but looking at the growing number of companies using it for queuing (with Ruby Resque), I want to know if there is an offering like Resque in Java, and what the advantages or disadvantages of using Redis as an MQ over RabbitMQ are.
RabbitMQ supports clustering and now has active/active highly available queues, allowing for greater scale-out and availability options than are possible with Redis out of the box.
RabbitMQ gives you a greater amount of control on everything from the users/permissions of exchanges/queues, to the durability of a specific exchange or queue (disk vs memory), to the guarantees of delivery (transactions, Publisher Confirms).
It also allows for more flexibility and options on your topologies (fanout, topic, direct) and routing to multiple queues, RPC with private queues and reply-to, etc.
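For instance, here is a small pika sketch of topic routing to multiple queues; the exchange, queue and routing-key names are invented:

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

# A topic exchange routes on pattern-matched routing keys, so a single
# publish can reach several queues with different bindings.
ch.exchange_declare(exchange="logs", exchange_type="topic")
ch.queue_declare(queue="all-logs")
ch.queue_declare(queue="error-logs")
ch.queue_bind(exchange="logs", queue="all-logs", routing_key="#")
ch.queue_bind(exchange="logs", queue="error-logs", routing_key="*.error")

# "payments.error" matches both bindings, so both queues get a copy.
ch.basic_publish(exchange="logs", routing_key="payments.error", body=b"charge failed")
conn.close()
```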