Should I connect to master node in RabbitMQ? - rabbitmq

The docs say:
Assuming all cluster members are available, a client can connect to
any node and perform any operation. Nodes will route operations to the
queue master node transparently to clients.
What does this "route" mean exactly? Does the node I connected to redirect all the traffic to another node? It means that network utilization is doubled, doesn't it?
Should I try to connect to the master node of the queue when I'm only going to make operation on this single queue only?

The question is, can I save some resources by connecting to the home
node?
Yes you can, but it is a premature optimization that is probably not worth the effort. You can only verify this via running benchmarks.
You will probably spend more time developing code to ensure your application connects to the master node, setting up your benchmark environment and running them than you would save by this optimization.
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.

Related

RabbitMQ,Vernemq and HiveMQ - Distribution and HA

I'm trying to pick one up for a project I'm doing.
It's utterly unclear from the documentation how does the HA work in all of them except for RabbitMQ, which states that incoming requests will always find their way to the node that originally created the queue in question. Meaning If I want to publish to QueueA which was created by Node1 but end up making the request at Node2, I will be internally routed to Node2.
I on the other hand was looking for a TRUE distributed solution where I would be able to address any node without having my request re-routed internally, and get an answer back from it.
VerneMQ provides clustering without the use of any additional components like Zookeeper, full syncronisation of the topic tree and subscriptions, self-healing netsplits. VerneMQ routes every message from any cluster node to any other node where subscribes live. VerneMQ can do queue migration in the background for offline queues.
VerneMQ has an eventually consistent synchronisation model. It broadcasts events in the cluster, but also uses anti-entropy to make sure the synchronisation happens.
Contrary to brokers based on Mnesia (Rabbit), VerneMQ has a defined model regarding CAP (consistency, availability).
Contrary to other brokers, VerneMQ offers these features in its fully Open Source version!
Disclaimer: I'm with the VerneMQ project.
Disclaimer: I work for HiveMQ.
From the HiveMQ User Guide:
An MQTT broker cluster is a distributed system that represents one logical MQTT broker to connected MQTT clients, meaning that for the MQTT client it makes zero difference whether it is connected to a single HiveMQ broker node or a multi node HiveMQ cluster.
Data sets get replicated between different node and when a node leaves or enters the cluster data gets re-synchronised to ensure the configured replica count is always upheld.
HiveMQ provides a masterless, fully elastic cluster with true high availability.
No additional software such as Zookeeper is required.

Why would you run a messaging queue (eg RabbitMQ) cluster?

Overview
A RabbitMQ broker is a logical grouping of one or several Erlang
nodes, each running the RabbitMQ application and sharing users,
virtual hosts, queues, exchanges, bindings, and runtime parameters.
Sometimes we refer to the collection of nodes as a cluster.
Why would you do this? I understand to increase durability of messages (if a node goes down, other queues still get the messages). But what about performance? How does cluster improve performance. Won't all consumers/producers connect to the master node's queue anyway? If so, aren't we still getting traffic on a single node regardless? Do we put a load balancer so traffic is directed at different nodes each time?
How does RabbitMQ cluster increase performance?
Well, right after that paragraph, the documentation states the following:
What is Replicated?
All data/state required for the operation of a RabbitMQ broker is
replicated across all nodes. An exception to this are message queues,
which by default reside on one node, though they are visible and
reachable from all nodes. To replicate queues across nodes in a
cluster, see the documentation on high availability (note that you
will need a working cluster first).
So, you would cluster to provide higher capacity in your RabbitMQ broker than a single node can provide alone. Note that clustering by itself is not a high-availability strategy.
Your assertion that message durability is increased is false, as message queues continue to reside on one broker (unless mirroring is used).
By default, contents of a queue within a RabbitMQ cluster are located on a single node (the node on which the queue was declared) [1]
Without mirroring, when that node goes down, messages on it will be lost. The cluster will put the queue onto a different node. RabbitMQ does not handle network partitions well, so this can be a bit of a problem.
"Aren't we still getting traffic on a single node regardless?" - if you only have one queue, then yes. However, a bigger question is "why would you run a message broker with only one queue?" Similarly, if you only create queues on one node, then you will still have one point of failure in the system.

How to send messages to specialised node in a cluster rabbitmq

I have a cluster of rabbitmq servers. But the machines where the servers are hosted show differences in terms of installed software. So, they have no overlapping capabilities, they are specialized workers, for example, only one node has email software installed.
I know that a queue is bound to the node on which is created. My question is, how I can set up my queues so I can send certain messages to the special endowed node, where my special software is waiting for work, bypassing rabbitmq distributing messages round-robin algorithm.
Maybe that is not a solution, am open to any working solution
You could always connect to the IP address of the specific node in the cluster, rather than connecting to some kind of a load balancer that is in front of the cluster - so specify a different IP in the client's method for opening the connection. This of course defeats the purpose of the cluster, but it seems to me that your setup does the same thing :)
By rabbitmq server do you mean actual rabbitmq server or clients/workers?
If I understand correctly, you can create a single exchange of type "topic". For each worker create an exclusive queue and bind it to the exchange with some unique routing key which in your case will be the feature of the host. When submitting a message to the exchange, use the feature as a routing key. The message will be routed to the appropriate host.

Clone RabbitMQ admin users, etc. on replacement server

We have a couple of crusty AWS hosts running a RabbitMQ implementation in a cluster. We need to upgrade the hardware, and therefore we developed a Chef cookbook to spawn replacement servers.
One thing that we would rather not recreate by hand is the admin users, the queues, etc.
What is the best method to get that stuff from the old hosts to the new ones? I believe it's everything that lives in the /var/lib/rabbitmq/mnesia directory.
Is it wise to copy the files from one host to another?
Is there a programmatic means to do this?
Can it be coded into our Chef cookbook?
You can definitely export and import configuration via command line: https://www.rabbitmq.com/management-cli.html
I'm not sure about admin user, though.
If you create new rabbitmq nodes on your new hardware, you will get all the users in that new node. This is easy to try:
run docker container with image of rabbitmq (with management plugin)
and create a user
run another container and add that node to the
cluster of the first one
kill rabbitmq on the first one, or delete
the docker container and you will see that you still have the newly
created user on the 2nd (but now master) node
I wrote docker since it's faster to create a cluster this way, but if you already have a cluster you could use it for testing if you prefer.
For the queues and exchanges, I don't want to quote almost everything found in the rabbitmq doc page for the high availability, but I will just say that you have to pay attention to the following:
exclusive queues because they are gone once the client connection is gone
queue mirroring (if you have any set up, if not it would be wise to consider it, if not even necessary)
I would do the migration gradually, waiting for the queues to get emptied and then kill of the nodes on the old hardware. It maybe doable in a big-bang fashion, but seems riskier. If you have a running system, than set up queue mirroring and try to find appropriate moment to do manual sync - but careful, this has a huge impact on the broker performance.
Additionally there is this shovel plugin (I have to point out that I did not use it or even explore it) but that may be another way to go since (quoting form the link):
In essence, a shovel is a simple pump. Each shovel:
connects to the source broker and the destination broker, consumes
messages from the queue, re-publishes each message to the destination
broker (using, by default, the original exchange name and
routing_key).

RabbitMq Clustering

I am new to RabbitMq. I am not able to understand the concept here. Please find the scenario.
I have two machines (RMQ1, RMQ2) where I have installed rabbitmq in both the machines which are running. Again I clustered RMQ2 to join RMQ1
cmd:/> rabbitmqctl join_cluster rabbit#RMQ1
If you see the status of the machines here it is as below
In RMQ1
c:/> rabbitmqctl cluster_status
Cluster status of node rabbit#RMQ1...
[{nodes,[{disc,[rabbit#RMQ1,rabbit#RMQ2]}]},
{running_nodes,[rabbit#RMQ1,rabbit#RMQ2]}]
In RMQ2
c:\> rabbitmqctl cluster_status
Cluster status of node rabbit#RMQ2 ...
[{nodes,[{disc,[rabbit#RMQ1,rabbit#RMQ2]}]},
{running_nodes,[rabbit#RMQ1,rabbit#RMQ2]}]
The in order to publish and subscribe message I am connecting to RMQ1. Now I see the whenever I sent or message to RMQ1, I see message mirrored in both RMQ1 and RMQ2. This I understand clearly that as both the nodes are in same cluster they are getting mirrored across nodes.
Let say I bring down the RMQ2, I still see message getting published to RMQ1.
But when I bring down the RMQ1, I cannot publish the message anymore. From this I understand that RMQ1 is master and RMQ2 is slave.
Now I have below questions, without changing the code :
How do I make the RMQ2 take up the job of accepting the message.
What is the meaning of Highly Available Queues.
How should be the strategy for implementing this kind scenario.
Please help
Question #2 is best answered first, since it will clear up a lot of things for you.
What is the meaning of highly available queues?
A good source of information for this is the Rabbit doc on high availability. It's very important to understand that mirroring (which is how you achieve high availability in Rabbit) and clustering are not the same thing. You need to create a cluster in order to mirror, but mirroring doesn't happen automatically just because you create a cluster.
When you cluster Rabbit, the nodes in the cluster share exchanges, bindings, permissions, and other resources. This allows you to manage the cluster as a single logical broker and utilize it for scenarios such as load-balancing. However, even though queues in a cluster are accessible from any machine in the cluster, each queue and its messages are still actually located only on the single node where the queue was declared.
This is why, in your case, bringing down RMQ1 will make the queues and messages unavailable. If that's the node you always connect to, then that's where those queues reside. They simply do not exist on RMQ2.
In addition, even if there are queues and messages on RMQ2, you will not be able to access them unless you specifically connect to RMQ2 after you detect that your connection to RMQ1 has been lost. Rabbit will not automatically connect you to some surviving node in a cluster.
By the way, if you look at a cluster in the RabbitMQ management console, what you see might make you think that the messages and queues are replicated. They are not. You are looking at the cluster in the management console. So regardless of which node you connect to in the console, you will see a cluster-wide view.
So with this background now you know the answer to your other two questions:
What should be the strategy for implementing high availability? / how to make RMQ2 accept messages?
From your description, you are looking for the failover that high availability is intended to provide. You need to enable this on your cluster. This is done through a policy, and there are various ways to do it, but the easiest way is in the management console on the Admin tab in the Policies section:
The previously cited doc has more detail on what it means to configure high availability in Rabbit.
What this will give you is mirroring of queues and messages across your cluster. That way, if RMQ1 fails then RMQ2 will still have your queues and messages since they are mirrored across both nodes.
An important note is that Rabbit will not automatically detect a loss of connection to RMQ1 and connect you to RMQ2. Your client needs to do this. I see you tagged your question with EasyNetQ. EasyNetQ provides this "failover connect" type of feature for you. You just need to supply both node hosts in the connection string. The EasyNetQ doc on clustering has details. Note that EasyNetQ even lets you inject a simple load balancing strategy in this case as well.