Let us consider the scenario below.
There are 3 RabbitMQ brokers(B1,B2,B3) deployed in a clustered model. There is an exchange E with bindings which is replicated to all the 3 brokers. There is a producer P and 3 consumers C1,C2,C3. I have the following questions
Lets say a producer connects to broker B1 and creates a Queue Q which is mirrored to B2. Now when a consumer connects to Broker B3, how does it get the messages in the queue?
From my understanding, the exchange and binding information is maintained in memory in each broker. If the exchange is persistent, in order to recover from broker crashes, is the exchange and binding information also persisted in the disk in all brokers?
If the entire queue is maintained in memory in all the mirrored brokers, it consumes a lot of memory in the broker. In order to support potentially large number of queues each holding millions of messages in each broker, is it not a constraint for scalability?
Each mirrored queue has a master node. The master node for that queue is always used for consuming. So when a consumer connects to a node which is does not have the queue storage (or is a slave node), the consumer will actually end up consuming from the master node.
Yes, assuming the node is a disc node and not a RAM node. I'm not 100% sure about the binding, but my guess is yes. Anyway, it's highly recommended to always declare all queues, exchanges etc that your client needs! (do this each time client starts or something)
Yes, that's the point of mirroring: add redundancy in case something goes wrong. It does not increase performance (rather the opposite!). But in general, queues with millions of messages is not exactly a good situation as queues should, on average, be empty
Related
We have currently using a service bus in Azure and for various reasons, we are switching to RabbitMQ.
Under heavy load, and when specific tasks on backend are having problem, one of our queues can have up to 1 million messages waiting to be processed.
RabbitMQ can have a maximum of 50 000 messages per queue.
The question is how can we design the rabbitMQ infrastructure to continue to work when messages are temporarily accumulating?
Note: we want to host our RabbitMQ server in a docker image inside a kubernetes cluster.
we imagine an exchange that would load balance mesages between queues in nodes behind.
But what is unclear to us is how to dynamically add new queues on demand if we detect that queues are getting full.
RabbitMQ can have a maximum of 50 000 messages per queue.
There is no this kind of limit.
RabbitMQ can handle more messages using quorum or classic queues with lazy.
With stream queues RabbitMQ can handle Millions of messages per second.
we imagine an exchange that would load balance messages between queues in nodes behind.
you can do that using different bindings.
kubernetes cluster.
I would suggest to use the k8s Operator
But what is unclear to us is how to dynamically add new queues on demand if we detect that queues are getting full.
There is no concept of FULL in RabbitMQ. There are limits that you can put using max-length or TTL.
A RabbitMQ queue will never be "full" (no such limitation exists in the software). A queue's maximum length rather depends on:
Queue settings (e.g max-length/max-length-bytes)
Message expiration settings such as x-message-ttl
Underlying hardware & cluster setup (available RAM and disk space).
Unless you are using Streams (new feature in v 3.9) you should always try to keep your queues short (if possible). The entire idea of a Message Queue (in it's classical sense) is that a message should be passed along as soon as possible.
Therefore, if you find yourself with long queues you should rather try to match the load of your producers by adding more consumers.
I am new to RabbitMQ. I wanted to know how memory is used in case of HA.
For example, in Kafka the partition use a specific amount of memory if data is present or not in it and so do the replications .In RabbitMQ how are the queues allocated memory ? and How does HA work ?Do the mirrored queues occupy the same amout of memory each replicated node ?
Queues in RabbitMQ don't need a lot of resources per se, but messages will be kept in memory in most of the cases. When a message is sent to the queue that has mirrored queues, this message will be replicated among other nodes defined by the mirroring policy. The idea of mirrored queues is to provide high availability, so if the broker hosting the master queue crashes, a new master queue will be elected among alive mirrored queues. The switch to the new node should happen quite fast, because all messages are ready to be consumed.
Simple example:
The cluster consists of 3 nodes:
The test queue was created on the node-1.rabbitmq node and the mirroring policy was applied to replicate messages on all nodes:
Approximately 70k messages were sent to the test queue and the screenshot from the RabbitMQ management tool is shown below:
It is clear that all nodes got messages and they are kept in memory.
Memory consumption of RabbitMQ is a tricky topic and there are many factors which can affect it (type of the queue, the amount of messages in other queues, reaching the defined limits, etc.). In the official documentation it is stated:
RabbitMQ can report on its own memory use, to let you see where your system is using memory. Note that all measurements are somewhat approximate, based on values returned by the underlying Erlang virtual machine; however they should still be accurate enough to be useful.
I want to build a RabbitMQ system which is able to scale out for the sake of performance.
I've gone through the official document of RabbitMQ Clustering. However, its clustering doesn't seem to support scalability. That's because only through master queue we can publish/consume, even though the master queue is reachable from any node of a cluster. Other than the node on which a master queue resides, we can't process any publish/consume.
Why do we cluster then?
Why do we cluster then?
To ensure availability.
To enforce data replication.
To spread the load/data accross queues on different nodes. Master queues can be stored on different node and replicated with a factor < number of cluster nodes.
Other than the node on which a master queue resides, we can't process
any publish/consume.
Client can be connected on any node of the cluster. This node will transfer 'the request' to the master queue node and vice versa. As a downside it will generate extra hop.
Answer to the question in the title Is RabbitMQ Clustering including scalability too? - yes it does, this is achieved by simply adding more nodes/removing some nodes to/from the cluster. Of course you have to consider high availability - that is queue and exchange mirroring etc.
And just to make something clear regarding:
However, its clustering doesn't seem to support scalability. That's
because only through master queue we can publish/consume, even though
the master queue is reachable from any node of a cluster.
Publishing is done to exchange, queues have nothing to with publishing. A publishing client publishes only to an exchange and a routing key. It doesn't need any knowledge about the queue.
I have one rabbit mq Publisher who is publishing on a Direct exchange. There are multiple rabbit mq consumers bound to the Direct exchange with different routing keys.
Few of these consumers might take more time to process the message.
My question is does one slow consumer affect the performance of other consumers even though they are bound on different routing keys ?
One slow consumer will have no affect on other consumers. Each consumer is independent and can work as fast or as slow as necessary for your application.
It will affect other consumers in the terrible case that said consumer's queue start backing up badly up to the point where you hit the server memory watermark. If that happens tho, you need to review what's going on in your system for such situation to arise.
Assume you have a small rabbitmq system of 3 nodes that is supposed to handle 100+ decently high volume queues in the same exchange. Given that queues only exist on the node they are created on (we're not using replicated, High Availability queues), what's the best way to create the queues? Is there any benefit to having the queues distributed among the cluster nodes, or is it better to keep them all on one node and have rmq do the routing?
It depends on your application, really.
RabbitMQ is smart about sending messages, so it'll only send a message to a node in the cluster if
a queue that holds that message resides on that node or
if a consumer has connected to that node and has requested the message.
In general, you should aim to declare queues on the nodes on which both the publishers and the consumers for that queue will connect to. In other words, you should aim to connect publishers and consumers to the node that holds the queues they use. This assumes you're trying to conserve bandwidth used overall.
If you're using clustering to improve throughput (and you probably are), and you don't care about internal bandwidth used, you should aim to connect your publishers/consumers to the nodes in a balanced way and not worry about the internal routing mechanisms.
One last thing to think about is memory and disk-space. Queues store messages in main memory, and fallback to disk if that's insufficient. So, if you declare all your queues in one place, that'll result in one node that's "over-worked" and two nodes with memory to spare.
As part of a move towards redundancy and failover in an application I'm working on, I've just finished setting up a RabbitMQ cluster behind a proxy, and have all of my publishers and consumers connect via the proxy, which round robins connections to the individual nodes as they come in from the clients. Prior to upgrading RabbitMQ to 2.7.1, this seemed to pretty evenly distribute queues to the separate nodes, though this would of course depend pretty heavily on how your proxy balances the requests and when your clients try to connect (and declare a queue)...
Having said all that, I just upgraded to RabbitMQ 2.7.1, which was pretty painless, and gave us HA queues, which is a pretty big win for our apps. At any rate, if you're interested in the set up, and think it would be of benefit to your queue problem, I'd be happy to share the setup.