As the title says: when a queue is declared on one server in a cluster of nodes, does it physically reside on that single server? Or is it physically spread across the nodes and only logically considered to be on one server?
Quote from the RabbitMQ docs:
All data/state required for the operation of a RabbitMQ broker is
replicated across all nodes. An exception to this are message
queues, which by default reside on one node, though they are visible
and reachable from all nodes.
So unless the queues are mirrored, they are on one node (for mirroring queues see here).
Related
I was reading through the RabbitMQ documentation on their website and came across two terms which seem to do the same thing: "Durable Queues" and "Disk Node". As per the documentation, if I make a Disk Node, all data is stored on disk except messages, message store indices, queue indices and other node state (I am not sure what the other node state is).
So, if I make my node a Disk Node, do I still need to mark my queue as durable for it to survive broker restarts?
The same question goes for durable exchanges as well.
Disk nodes and durable queues are two different concepts within RabbitMQ.
RabbitMQ maintains certain internal information (such as users, passwords, vhosts, ...) within specific mnesia tables. Disk nodes store these tables on disk. As the related documentation states:
This does not include messages, message store indices, queue indices and other node state.
To ensure durability/persistence of exchanges, queues or messages you need to explicitly state it when you declare/publish them.
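As a rough illustration of "explicitly state it", here is a minimal sketch assuming the Python pika client. The exchange and queue names are made up for the example, and `channel` is assumed to be an already opened pika channel.

```python
def declare_durable_topology(channel, exchange="orders", queue="orders.created"):
    """Declare an exchange and a queue whose definitions survive a broker restart."""
    # durable=True has to be requested at declaration time; running on a
    # disk node does not imply it.
    channel.exchange_declare(exchange=exchange, exchange_type="direct", durable=True)
    channel.queue_declare(queue=queue, durable=True)
    channel.queue_bind(queue=queue, exchange=exchange, routing_key=queue)


def publish_persistent(channel, exchange, routing_key, body):
    """Publish a message marked as persistent (delivery_mode=2)."""
    import pika  # assumed to be installed; imported lazily so the sketch loads without it

    channel.basic_publish(
        exchange=exchange,
        routing_key=routing_key,
        body=body,
        properties=pika.BasicProperties(delivery_mode=2),
    )
```

Durability of the queue and persistence of the message are requested separately: a durable queue keeps its definition across a restart, but only messages published as persistent are written to disk with it.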
Is it possible to use RabbitMQ HA with multiple (2) RabbitMQ clusters?
Here is my requirement:
We have 2 RabbitMQ clusters (each with 4 nodes). All the nodes in both clusters will use the same Erlang cookie. That way, even though the 2 clusters are physically in separate locations, they will act as a single cluster with 8 nodes.
We are planning to use HAProxy to load balance both the clusters (8 nodes). Both publisher and consumer will be using this proxy to connect to the broker.
We would like to use mirrored queues for HA with ha-mode:exactly, ha-params:4, ha-sync-mode:automatic along with auto-heal for cluster_partition_handling.
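For reference, the mirroring policy described above could be written as a request body for the management HTTP API roughly as follows. This is only a sketch: the policy name `ha-four` and the queue-name pattern are invented for the example, and it builds the payload without sending it anywhere.

```python
import json
from urllib.parse import quote


def mirror_policy(pattern, copies, sync_mode="automatic"):
    """Build the JSON body for a mirrored-queue policy using ha-mode 'exactly'."""
    return {
        "pattern": pattern,
        "definition": {
            "ha-mode": "exactly",
            "ha-params": copies,
            "ha-sync-mode": sync_mode,
        },
        "priority": 0,
        "apply-to": "queues",
    }


def policy_path(vhost, name):
    # The default vhost "/" has to be percent-encoded as %2F in the URL path.
    return "/api/policies/{}/{}".format(quote(vhost, safe=""), quote(name, safe=""))


# Hypothetical pattern: mirror every queue whose name starts with "ha."
body = mirror_policy(pattern="^ha\\.", copies=4)
print(policy_path("/", "ha-four"))
print(json.dumps(body, indent=2))
```

The body would be sent with an HTTP PUT to that path on a node running the management plugin. Note that ha-mode:exactly only fixes the number of copies; it does not say which nodes host them.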
Question:
In case of HA, is there a way to specify that 2 nodes from the first cluster and 2 nodes from the second cluster should be used? As I understand it, this can be done via the policy ha-mode:nodes with node names, but that way it will always use the same nodes. Can this setup be dynamic?
As both the clusters are very reliable, will it be the right approach to use auto-heal for cluster_partition_handling (in case of split brain)?
As per this "By default, queues within a RabbitMQ cluster are located on a single node (the node on which they were first declared). This is in contrast to exchanges and bindings, which can always be considered to be on all nodes.". Does this mean exchanges are mirrored by default? So when a message arrives at an exchange and that node goes down, will the message be available on the other exchange on the other node?
The RabbitMQ team monitors this mailing list and only sometimes answers questions on StackOverflow.
That way, even though the 2 clusters are physically in separate locations, they will act as a single cluster with 8 nodes.
Please do not do this. RabbitMQ clusters require reliable network connections with low latency. If your cluster crosses a WAN or availability zone your chance of having network partitions greatly increases. See this section of the docs for more information. You should use either the shovel or federation feature.
Does this mean exchanges are mirrored by default? So when a message arrives at an exchange and that node goes down, will the message be available on the other exchange on the other node?
Yes and yes.
I want to build a RabbitMQ system which is able to scale out for the sake of performance.
I've gone through the official document of RabbitMQ Clustering. However, its clustering doesn't seem to support scalability. That's because only through master queue we can publish/consume, even though the master queue is reachable from any node of a cluster. Other than the node on which a master queue resides, we can't process any publish/consume.
Why do we cluster then?
Why do we cluster then?
To ensure availability.
To enforce data replication.
To spread the load/data across queues on different nodes. Master queues can be placed on different nodes and replicated with a factor smaller than the number of cluster nodes.
Other than the node on which a master queue resides, we can't process
any publish/consume.
A client can connect to any node of the cluster. That node will forward 'the request' to the node hosting the master queue, and vice versa. The downside is that this adds an extra hop.
To answer the question in the title, "Is RabbitMQ Clustering including scalability too?": yes, it does. This is achieved simply by adding nodes to, or removing nodes from, the cluster. Of course you also have to consider high availability, that is, queue and exchange mirroring etc.
And just to make something clear regarding:
However, its clustering doesn't seem to support scalability. That's
because only through master queue we can publish/consume, even though
the master queue is reachable from any node of a cluster.
Publishing is done to an exchange; queues have nothing to do with publishing. A publishing client publishes only to an exchange with a routing key. It doesn't need any knowledge about the queue.
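To make that point concrete, here is a toy in-memory model of a direct exchange. It is not RabbitMQ client code, and the class and names are invented purely for illustration; it only shows that the publisher hands over a routing key and never names a queue.

```python
from collections import defaultdict


class DirectExchange:
    """Toy in-memory model of a direct exchange (not a RabbitMQ client)."""

    def __init__(self):
        self.bindings = defaultdict(list)  # routing key -> list of bound queues

    def bind(self, queue, routing_key):
        self.bindings[routing_key].append(queue)

    def publish(self, routing_key, body):
        # The publisher supplies only a routing key; the exchange picks the queues.
        targets = self.bindings.get(routing_key, [])
        for queue in targets:
            queue.append(body)
        return len(targets)


# Plain lists stand in for queues here.
invoices, audit = [], []
exchange = DirectExchange()
exchange.bind(invoices, "invoice.created")
exchange.bind(audit, "invoice.created")

delivered = exchange.publish("invoice.created", "invoice #1")  # reaches both queues
dropped = exchange.publish("no.such.key", "lost")              # no binding, no queue
```

Which queues receive a message is decided entirely by the bindings, so queues can be added, moved or mirrored without the publisher changing anything.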
I have to say the official website provides very little information to understand RabbitMQ clearly.
The official website suggests using three nodes to build a cluster. What is the reason for that? I suppose it's like ZooKeeper, which needs an odd number of nodes to do a quorum and elect the master.
Also, what is the advantage of using a non-HA cluster? Improve the performance or what? If the node on which a queue resides is down, then the queue is not working. So in all situations, is it necessary to set the cluster to be mirror queue and auto-sync?
Three nodes is the minimum for reasonable HA.
Suppose you have a queue mirrored on two nodes: if one goes down, another one will be promoted as the new slave or master.
Please read the section "Automatically handling partitions" and the section "More about pause-minority mode":
It is therefore not a good idea to enable pause-minority mode on a
cluster of two nodes since in the event of any network partition or
node failure, both nodes will pause.
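The arithmetic behind that quote can be sketched in a few lines. This assumes the rule that a node pauses when it can reach no more than half of the cluster, itself included; the function name is made up for the example.

```python
def pauses_in_minority(reachable, total):
    """Return True if a node that can reach `reachable` of `total` cluster nodes
    (itself included) would pause in pause-minority mode."""
    # "Minority" here means fewer than or equal to half of the cluster.
    return 2 * reachable <= total


# Two-node cluster split 1/1: each side holds exactly half, so both pause.
two_node = [pauses_in_minority(1, 2), pauses_in_minority(1, 2)]

# Three-node cluster split 2/1: the pair keeps running, the lone node pauses.
three_node = [pauses_in_minority(2, 3), pauses_in_minority(1, 3)]

print(two_node)    # [True, True]
print(three_node)  # [False, True]
```

With three nodes, one side of any two-way split still holds a majority, which is why three is the smallest size where this mode is useful.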
RabbitMQ can handle the cluster in different ways, depending on where you deploy it: LAN, WAN, unstable LAN etc. You can also use federation or shovel.
what is the advantage of using a non-HA cluster? Improve the performance or what?
I'd say yes, or you may simply have an environment where you don't need HA queues because you only use temporary queues.
is it necessary to set the cluster to be mirror queue and auto-sync?
You can also opt for manual sync: while a queue is syncing it is blocked, and if you have lots of messages to sync, that can be a problem. For example, you can decide to sync the queues when you don't have traffic.
It is explained clearly here (section Unsynchronised Slaves).
Your question is a bit general, and the answer depends on what you are looking for.
I am new to RabbitMQ, so please excuse me for trivial questions:
1) In a RabbitMQ cluster, if a node fails, the load shifts to another node (without stopping the other nodes). Similarly, we can add fresh nodes to the existing cluster without stopping the existing nodes. Is that correct?
2) Assume that we start with a single RabbitMQ node and create 100 queues on it. Now producers start sending messages at a faster rate. To handle this load, we add more nodes and form a cluster. But the queues exist on the first node only. How is the load balanced among the nodes now? And if we need to add more queues, on which node should we add them? Or can we add them using a load balancer?
Thanks In Advance
1) In a RabbitMQ cluster, if a node fails, the load shifts to another node (without stopping the other nodes). Similarly, we can add fresh nodes to the existing cluster without stopping the existing nodes. Is that correct?
If the node on which the queue was created fails, RabbitMQ will elect a new master for that queue in the cluster, as long as mirroring is enabled for the queue. Clustering provides HA based on a policy that you can define.
2) Assume that we start with a single RabbitMQ node and create 100 queues on it. Now producers start sending messages at a faster rate. To handle this load, we add more nodes and form a cluster. But the queues exist on the first node only. How is the load balanced among the nodes now?
The load is not balanced. The distributed cluster provides HA and not load balancing. Your requests will be redirected to the node in the cluster on which the queue resides.
And if we need to add more queues, on which node should we add them? Or can we add them using a load balancer?
That depends on your use case. Some folks use a round robin and create queues on separate nodes.
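A round-robin placement of that kind can be sketched as below. It relies on the default behaviour quoted earlier, where a queue lives on the node on which it was first declared. The node and queue names are hypothetical, and the function only computes the mapping; the actual declaration would be done by connecting to the chosen node.

```python
from itertools import cycle


def assign_queues(queue_names, nodes):
    """Map each queue name to the node a client would connect to when declaring it."""
    placement = {}
    node_cycle = cycle(nodes)  # node1, node2, node3, node1, ...
    for name in queue_names:
        placement[name] = next(node_cycle)
    return placement


nodes = ["rabbit@node1", "rabbit@node2", "rabbit@node3"]  # hypothetical node names
queues = ["queue-{}".format(i) for i in range(6)]
placement = assign_queues(queues, nodes)

for name, node in placement.items():
    print(name, "->", node)
```

This spreads queue masters evenly across the nodes, but it does not move queues that already exist on the first node.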
In summary
For HA use mirroring in the cluster.
To balance load across nodes, use a LB to distribute across Queues.
If you'd like to load balance the queue itself take a look at Federated Queues. They allow you to fetch messages on a downstream queue from an upstream queue.
Let me try to answer your question; this is something most developers run into.
Question 1) In a RabbitMQ cluster, if a node fails, the load shifts to another node (without stopping the other nodes). Similarly, we can add fresh nodes to the existing cluster without stopping the existing nodes. Is that correct?
Answer: Absolutely correct (if RabbitMQ is running on a single host), but RabbitMQ queues behave differently in a cluster. By default, a queue lives on only one node in the cluster. RabbitMQ 2.6.0, however, introduced a built-in active-active redundancy option for queues: mirrored queues. Declaring a mirrored queue is just like declaring a normal queue, except that you pass an extra argument called x-ha-policy. Set to all, it tells Rabbit that you want the queue to be mirrored across all nodes in the cluster. This means that if a new node is added to the cluster after the queue is declared, it’ll automatically begin hosting a slave copy of the queue. (From RabbitMQ 3.0 onward, mirroring is configured through policies rather than the x-ha-policy argument.)
Question 2) Assume that we start with a single RabbitMQ node and create 100 queues on it. Now producers start sending messages at a faster rate. To handle this load, we add more nodes and form a cluster. But the queues exist on the first node only. How is the load balanced among the nodes now? And if we need to add more queues, on which node should we add them? Or can we add them using a load balancer?
This question has multiple sub-questions.
How is the load balanced among the nodes now?
Set to all, x-ha-policy tells Rabbit that you want the queue to be mirrored across all nodes in the cluster. This means that if a new node is added to the cluster after the queue is declared, it’ll automatically begin hosting a slave copy of the queue.
On which node should we add them?
Same as the answer above.
Can we add them using a load balancer?
No, but also yes (you would have to call the RabbitMQ API from within the LB, which is not a best-practice approach). A load balancer is used to make the messaging infrastructure resilient: your cluster nodes are the servers behind the load balancer, and your producers and consumers are the customers.