RabbitMQ Quorum Queue - Is the data written to/delivered by only one node?

In a RabbitMQ Quorum Queue (using Raft) cluster of, say, 4 nodes (N1-N4),
can I have a consumer that reads only from N1/N2? In this case, will a message produced on N3 be delivered to a consumer via N1/N2?
As per the documentation from the below post:
https://www.cloudamqp.com/blog/2019-04-03-quorum-queues-internals-a-deep-dive.html
With Raft, all reads and writes go through a leader whose job it is to
replicate the writes to its followers. When a client attempts to
read/write to a follower, it is told who the leader is and told to
send all writes to that node. The leader will only confirm the write
to the client once a quorum of nodes has confirmed they have written
the data to disk. A quorum is simply a majority of nodes.
If this is the case, How can scaling be achieved if it's just the leader node that's gonna do all the work?

First of all, RabbitMQ clusters should have an odd number of nodes, so that a majority can always be established in the event of a network partition.
Consumers can always read from any node in a RabbitMQ cluster. If a queue master/mirror is not running on the node to which the consumer is connected, the communication will be forwarded to another node.
How can scaling be achieved if it's just the leader node that's gonna
do all the work?
"scaling" is so non-specific a word that I hesitate to answer this. But I assume you're asking what happens with multiple quorum queues. The answer is that each queue has its own leader, and these leaders are distributed around the cluster.


RabbitMQ Mirrored Queues on Multiple Clusters

Is it possible to use RabbitMQ HA with multiple (2) RabbitMQ clusters?
Here is my requirement:
We have 2 RabbitMQ clusters (each with 4 nodes). All the nodes in both clusters will use the same Erlang cookie, so that even though the 2 clusters are physically in separate locations, they will act as a single cluster with 8 nodes.
We are planning to use HAProxy to load balance both the clusters (8 nodes). Both publisher and consumer will be using this proxy to connect to the broker.
We would like to use mirrored queues for HA with ha-mode:exactly, ha-params:4, ha-sync-mode:automatic along with auto-heal for cluster_partition_handling.
Question:
In the case of HA, is there a way we can specify using 2 nodes from the first cluster and 2 nodes from the second cluster? As I understand it, this can be done via the policy ha-mode:nodes using node names, but that way it will always use the same nodes; can this setup be dynamic?
As both the clusters are very reliable, will it be the right approach to use auto-heal for cluster_partition_handling (in case of split brain)?
As per this "By default, queues within a RabbitMQ cluster are located on a single node (the node on which they were first declared). This is in contrast to exchanges and bindings, which can always be considered to be on all nodes.". Does this mean exchanges are mirrored by default? So when a message arrives at an exchange and that node goes down, will the message be available on the other exchange on the other node?
So that even though the 2 clusters are physically in separate locations, they will act as a single cluster with 8 nodes.
Please do not do this. RabbitMQ clusters require reliable network connections with low latency. If your cluster crosses a WAN or availability zone your chance of having network partitions greatly increases. See this section of the docs for more information. You should use either the shovel or federation feature.
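As a rough, hedged illustration of the shovel approach (hostnames, credentials, and the queue/shovel names below are placeholders, and the rabbitmq_shovel and rabbitmq_shovel_management plugins must be enabled), a dynamic shovel that moves messages from cluster A to cluster B could be declared through the management HTTP API:

    import requests

    MGMT = "http://cluster-a-node1:15672"   # management API of cluster A (placeholder)
    AUTH = ("guest", "guest")               # placeholder credentials

    shovel = {
        "value": {
            "src-protocol": "amqp091",
            "src-uri": "amqp://guest:guest@cluster-a-node1",
            "src-queue": "orders",
            "dest-protocol": "amqp091",
            "dest-uri": "amqp://guest:guest@cluster-b-node1",
            "dest-queue": "orders",
        }
    }

    # "%2F" is the URL-encoded default vhost "/".
    resp = requests.put(f"{MGMT}/api/parameters/shovel/%2F/orders-shovel",
                        json=shovel, auth=AUTH)
    resp.raise_for_status()

Unlike stretching one cluster across the WAN, the shovel is just an AMQP client between the two sites, so a flaky link only delays the transfer rather than causing cluster partitions.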
Does this mean exchanges are mirrored by default? So when a message arrives at an exchange and that node goes down, will the message be available on the other exchange on the other node?
Yes and yes.

RabbitMQ High Availability

I am new to RabbitMQ. I wanted to know how memory is used in the case of HA.
For example, in Kafka a partition uses a certain amount of memory whether or not it holds data, and so do its replicas. In RabbitMQ, how is memory allocated to queues? How does HA work? Do the mirrored queues occupy the same amount of memory on each replicating node?
Queues in RabbitMQ don't need a lot of resources per se, but messages will be kept in memory in most cases. When a message is sent to a queue that has mirrors, the message is replicated to the other nodes defined by the mirroring policy. The idea of mirrored queues is to provide high availability: if the broker hosting the master queue crashes, a new master is elected among the surviving mirrors. The switch to the new node should happen quite fast, because all messages are already there, ready to be consumed.
Simple example: the cluster consists of 3 nodes. The test queue was created on the node-1.rabbitmq node and a mirroring policy was applied to replicate its messages to all nodes. Approximately 70k messages were sent to the test queue, and the RabbitMQ management tool showed that all nodes got the messages and kept them in memory.
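For reference, a policy like the one used in that example could be applied through the management HTTP API roughly as follows (host name and credentials are placeholders; rabbitmqctl set_policy achieves the same thing):

    import requests

    policy = {
        "pattern": "^test$",                 # match only the "test" queue
        "definition": {
            "ha-mode": "all",                # keep a mirror on every node
            "ha-sync-mode": "automatic",
        },
        "apply-to": "queues",
    }

    resp = requests.put(
        "http://node-1.rabbitmq:15672/api/policies/%2F/mirror-test",  # %2F = vhost "/"
        json=policy,
        auth=("guest", "guest"),             # placeholder credentials
    )
    resp.raise_for_status()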
Memory consumption of RabbitMQ is a tricky topic and there are many factors which can affect it (type of the queue, the amount of messages in other queues, reaching the defined limits, etc.). In the official documentation it is stated:
RabbitMQ can report on its own memory use, to let you see where your system is using memory. Note that all measurements are somewhat approximate, based on values returned by the underlying Erlang virtual machine; however they should still be accurate enough to be useful.
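For example, the same per-node memory figures that the management UI shows can be pulled over the HTTP API; a small sketch with a placeholder host and credentials:

    import requests

    resp = requests.get("http://node-1.rabbitmq:15672/api/nodes",
                        auth=("guest", "guest"))
    resp.raise_for_status()

    for node in resp.json():
        # mem_used is reported in bytes and, as the docs note, is approximate.
        print(node["name"], round(node["mem_used"] / 1024 / 1024), "MiB")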

Is RabbitMQ Clustering including scalability too?

I want to build a RabbitMQ system which is able to scale out for the sake of performance.
I've gone through the official documentation on RabbitMQ clustering. However, its clustering doesn't seem to support scalability. That's because we can publish/consume only through the master queue, even though the master queue is reachable from any node of a cluster. Other than the node on which a master queue resides, we can't process any publish/consume.
Why do we cluster then?
Why do we cluster then?
To ensure availability.
To enforce data replication.
To spread the load/data across queues on different nodes. Master queues can be placed on different nodes and replicated with a factor smaller than the number of cluster nodes.
Other than the node on which a master queue resides, we can't process
any publish/consume.
A client can connect to any node of the cluster. That node will forward 'the request' to the node hosting the master queue and vice versa. As a downside, this generates an extra hop.
Answer to the question in the title, Is RabbitMQ Clustering including scalability too? - yes, it does; this is achieved simply by adding nodes to, or removing nodes from, the cluster. Of course, you have to consider high availability - that is, queue and exchange mirroring, etc.
And just to make something clear regarding:
However, its clustering doesn't seem to support scalability. That's
because we can publish/consume only through the master queue, even though
the master queue is reachable from any node of a cluster.
Publishing is done to an exchange; queues have nothing to do with publishing. A publishing client publishes only to an exchange and a routing key. It doesn't need any knowledge of the queues.
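A tiny pika sketch of that point (the exchange, routing key, and host below are placeholders): the publisher names an exchange and a routing key only, and never references a queue or the node hosting the queue's master.

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="any-cluster-node"))
    channel = connection.channel()

    channel.exchange_declare(exchange="events", exchange_type="topic", durable=True)

    # The broker routes this to whatever queues are bound to "events" with a
    # matching binding key, wherever their masters happen to live.
    channel.basic_publish(
        exchange="events",
        routing_key="order.created",
        body=b'{"id": 42}',
        properties=pika.BasicProperties(delivery_mode=2),  # persistent message
    )

    connection.close()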

Is it necessary to use three nodes to build RabbitMQ cluster?

I have to say the official website provides very little information to understand RabbitMQ clearly.
The official website suggests using three nodes to build a cluster. What is the reason for that? I suppose it's like ZooKeeper, which needs an odd number of nodes to form a quorum and elect a master.
Also, what is the advantage of using a non-HA cluster? Improved performance, or something else? If the node on which a queue resides is down, then the queue is not working. So, for all situations, is it necessary to set the cluster up with mirrored queues and auto-sync?
Three nodes is the minimum for a reasonable HA setup.
Suppose you have a queue mirrored across two nodes: if one goes down, another node will be promoted as the new mirror or master.
Please read here the section Automatically handling partitions and the section More about pause-minority mode:
It is therefore not a good idea to enable pause-minority mode on a
cluster of two nodes since in the event of any network partition or
node failure, both nodes will pause
RabbitMQ can handle the cluster in different ways, depending on where you deploy it - LAN, WAN, unstable LAN, etc. You can also use federation or the shovel plugin.
what is the advantage of using a non-HA cluster? Improve the performance or what?
I'd say yes, or it may simply be that you have an environment where you don't need HA queues, since you only have temporary queues.
is it necessary to set the cluster to be mirror queue and auto-sync?
You can also opt for manual sync, since while a queue is syncing it is blocked, and if you have lots of messages to sync that can be a problem. For example, you can decide to sync the queues when you don't have traffic.
Here (section Unsynchronised Slaves) it is explained clearly.
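A hedged sketch of that combination via the management HTTP API (host, credentials, and the queue/policy names are placeholders; the sync action is the one triggered by the Synchronise button in the management UI or by rabbitmqctl sync_queue):

    import requests

    AUTH = ("guest", "guest")                       # placeholder credentials
    BASE = "http://rabbit-node-1:15672/api"         # placeholder host

    # Mirror matching queues on all nodes, but do not sync new mirrors automatically.
    policy = {
        "pattern": "^work\\.",
        "definition": {"ha-mode": "all", "ha-sync-mode": "manual"},
        "apply-to": "queues",
    }
    requests.put(f"{BASE}/policies/%2F/mirror-work",
                 json=policy, auth=AUTH).raise_for_status()

    # Later, during a quiet period, synchronise a queue explicitly.
    requests.post(f"{BASE}/queues/%2F/work.images/actions",
                  json={"action": "sync"}, auth=AUTH).raise_for_status()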
Your question is a bit general, and it depends on what you are looking for.

How distributed should queues be in a RabbitMQ cluster?

Assume you have a small rabbitmq system of 3 nodes that is supposed to handle 100+ decently high volume queues in the same exchange. Given that queues only exist on the node they are created on (we're not using replicated, High Availability queues), what's the best way to create the queues? Is there any benefit to having the queues distributed among the cluster nodes, or is it better to keep them all on one node and have rmq do the routing?
It depends on your application, really.
RabbitMQ is smart about sending messages, so it'll only send a message to a node in the cluster if
a queue that holds that message resides on that node, or
a consumer has connected to that node and has requested the message.
In general, you should aim to declare queues on the nodes on which both the publishers and the consumers for that queue will connect to. In other words, you should aim to connect publishers and consumers to the node that holds the queues they use. This assumes you're trying to conserve bandwidth used overall.
If you're using clustering to improve throughput (and you probably are), and you don't care about internal bandwidth used, you should aim to connect your publishers/consumers to the nodes in a balanced way and not worry about the internal routing mechanisms.
One last thing to think about is memory and disk space. Queues store messages in main memory and fall back to disk if that's insufficient. So, if you declare all your queues in one place, that'll result in one node that's "over-worked" and two nodes with memory to spare.
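A rough pika sketch of that advice (node and queue names are made up): each application looks up the node that "owns" its queue and connects there, so messages do not need an extra hop between cluster nodes.

    import pika

    # Hypothetical mapping from queue name to the node it was declared on.
    QUEUE_HOME = {
        "orders":  "rabbit-node-1",
        "billing": "rabbit-node-2",
        "audit":   "rabbit-node-3",
    }

    def open_channel_for(queue_name):
        node = QUEUE_HOME[queue_name]
        connection = pika.BlockingConnection(pika.ConnectionParameters(host=node))
        channel = connection.channel()
        # Declaring the queue over this connection keeps it on that node
        # (a non-mirrored queue lives where it was first declared).
        channel.queue_declare(queue=queue_name, durable=True)
        return connection, channel

    connection, channel = open_channel_for("orders")
    channel.basic_publish(exchange="", routing_key="orders", body=b"hello")
    connection.close()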
As part of a move towards redundancy and failover in an application I'm working on, I've just finished setting up a RabbitMQ cluster behind a proxy, and have all of my publishers and consumers connect via the proxy, which round robins connections to the individual nodes as they come in from the clients. Prior to upgrading RabbitMQ to 2.7.1, this seemed to pretty evenly distribute queues to the separate nodes, though this would of course depend pretty heavily on how your proxy balances the requests and when your clients try to connect (and declare a queue)...
Having said all that, I just upgraded to RabbitMQ 2.7.1, which was pretty painless, and gave us HA queues, which is a pretty big win for our apps. At any rate, if you're interested in the set up, and think it would be of benefit to your queue problem, I'd be happy to share the setup.