RabbitMQ HighAvailability - rabbitmq

I am new to RabbitMQ. I wanted to know how memory is used in case of HA.
For example, in Kafka a partition uses a specific amount of memory whether or not it holds data, and so do its replicas. In RabbitMQ, how is memory allocated to queues, and how does HA work? Do mirrored queues occupy the same amount of memory on each replicated node?

Queues in RabbitMQ don't need a lot of resources per se, but messages will be kept in memory in most cases. When a message is sent to a queue that has mirrors, the message is replicated to the other nodes defined by the mirroring policy. The idea of mirrored queues is to provide high availability: if the broker hosting the master queue crashes, a new master is elected among the mirrors that are still alive. The switch to the new node should happen quite fast, because all messages are already there and ready to be consumed.
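To make the memory implication of replication concrete, here is a rough back-of-the-envelope sketch. It assumes every mirror keeps a full copy of each message in memory and counts payload bytes only, so it ignores per-message overhead, paging to disk and everything else the broker allocates. The function name and the sample numbers are hypothetical.

```python
def estimate_replica_memory(message_count, avg_message_bytes, replica_count):
    """Payload-only estimate of memory held by a replicated queue.

    Returns (bytes_per_node, bytes_across_cluster). This is an
    illustration, not how RabbitMQ itself accounts for memory.
    """
    if min(message_count, avg_message_bytes, replica_count) < 0:
        raise ValueError("inputs must be non-negative")
    per_node = message_count * avg_message_bytes
    return per_node, per_node * replica_count


# Hypothetical workload: 70,000 messages of 1 KiB each on 3 nodes.
per_node, total = estimate_replica_memory(70_000, 1024, 3)
print(f"per node: {per_node / 2**20:.1f} MiB, cluster: {total / 2**20:.1f} MiB")
```

Under these assumptions each node holds roughly the same amount, and the cluster-wide cost grows linearly with the number of mirrors.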
Simple example:
The cluster consists of 3 nodes:
The test queue was created on the node-1.rabbitmq node and the mirroring policy was applied to replicate messages on all nodes:
Approximately 70k messages were sent to the test queue and the screenshot from the RabbitMQ management tool is shown below:
It is clear that all nodes got messages and they are kept in memory.
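A mirroring policy of the kind used in this example could be expressed roughly as follows. This is a minimal sketch that only builds a request for the management HTTP API and does not send it. The host, policy name and queue pattern are assumptions, authentication is left out, and the "ha-mode" key applies to classic mirrored queues, which newer RabbitMQ releases have deprecated in favour of quorum queues.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_ha_policy_request(base_url, vhost, policy_name, queue_pattern):
    """Build (but do not send) a PUT request that defines a mirror-to-all policy."""
    body = {
        "pattern": queue_pattern,          # regex matched against queue names
        "definition": {"ha-mode": "all"},  # mirror to every node in the cluster
        "apply-to": "queues",
    }
    # The vhost must be URL-encoded; the default vhost "/" becomes "%2F".
    url = (
        f"{base_url}/api/policies/"
        f"{quote(vhost, safe='')}/{quote(policy_name, safe='')}"
    )
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical values: local management port, policy "ha-test", queue "test".
request = build_ha_policy_request("http://localhost:15672", "/", "ha-test", "^test$")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request would additionally need credentials for a user that is allowed to manage policies.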
Memory consumption of RabbitMQ is a tricky topic and there are many factors which can affect it (type of the queue, the amount of messages in other queues, reaching the defined limits, etc.). In the official documentation it is stated:
RabbitMQ can report on its own memory use, to let you see where your system is using memory. Note that all measurements are somewhat approximate, based on values returned by the underlying Erlang virtual machine; however they should still be accurate enough to be useful.

Related

How to have more than 50 000 messages in a RabbitMQ Queue

We are currently using a service bus in Azure and, for various reasons, we are switching to RabbitMQ.
Under heavy load, and when specific backend tasks are having problems, one of our queues can have up to 1 million messages waiting to be processed.
RabbitMQ can have a maximum of 50 000 messages per queue.
The question is: how can we design the RabbitMQ infrastructure so that it continues to work when messages are temporarily accumulating?
Note: we want to host our RabbitMQ server in a Docker image inside a Kubernetes cluster.
We imagine an exchange that would load balance messages between queues in nodes behind.
But what is unclear to us is how to dynamically add new queues on demand if we detect that queues are getting full.
"RabbitMQ can have a maximum of 50 000 messages per queue."
There is no such limit.
RabbitMQ can handle more messages using quorum queues or classic queues in lazy mode.
With streams, RabbitMQ can handle millions of messages per second.
"we imagine an exchange that would load balance messages between queues in nodes behind."
You can do that using different bindings.
"kubernetes cluster."
I would suggest using the Kubernetes Operator.
"But what is unclear to us is how to dynamically add new queues on demand if we detect that queues are getting full."
There is no concept of "full" in RabbitMQ. There are limits that you can set yourself using max-length or TTL.
A RabbitMQ queue will never be "full" (no such limitation exists in the software). A queue's maximum length rather depends on:
Queue settings (e.g. max-length / max-length-bytes)
Message expiration settings such as x-message-ttl
Underlying hardware & cluster setup (available RAM and disk space).
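The queue settings listed above are passed as optional arguments when a queue is declared. The helper below is a small sketch that only builds that arguments dictionary. The helper name and the sample limits are hypothetical, while the "x-" keys are the standard argument names.

```python
def bounded_queue_arguments(max_length=None, max_length_bytes=None,
                            message_ttl_ms=None):
    """Return queue-declare arguments for whichever limits are set."""
    candidates = {
        "x-max-length": max_length,              # cap on number of ready messages
        "x-max-length-bytes": max_length_bytes,  # cap on total body size in bytes
        "x-message-ttl": message_ttl_ms,         # per-queue message TTL, milliseconds
    }
    # Leave out limits that were not requested, so the queue stays unbounded there.
    return {key: value for key, value in candidates.items() if value is not None}


# Hypothetical limits: at most one million messages, each kept for 60 seconds.
args = bounded_queue_arguments(max_length=1_000_000, message_ttl_ms=60_000)
print(args)
```

With a Python client such as pika, a dictionary like this would typically be handed to `channel.queue_declare(queue=..., durable=True, arguments=args)`. Without any of these arguments the queue has no length limit of its own.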
Unless you are using Streams (a new feature in v3.9), you should always try to keep your queues short if possible. The entire idea of a message queue (in its classical sense) is that a message should be passed along as soon as possible.
Therefore, if you find yourself with long queues you should rather try to match the load of your producers by adding more consumers.
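As a rough illustration of matching consumers to producer load, the sketch below estimates how long a backlog takes to drain. It assumes constant rates and consumers that scale linearly, which real systems only approximate. All names and numbers are hypothetical.

```python
import math


def backlog_drain_seconds(backlog, publish_rate, consumers, rate_per_consumer):
    """Seconds until the queue is empty, or math.inf if it never drains.

    Rates are in messages per second.
    """
    net_rate = consumers * rate_per_consumer - publish_rate
    if net_rate <= 0:
        # Consumers cannot keep up, so the queue keeps growing.
        return math.inf
    return backlog / net_rate


# Hypothetical: 1,000,000 queued messages, 500 msg/s still arriving,
# 10 consumers each handling 100 msg/s.
print(backlog_drain_seconds(1_000_000, 500, 10, 100))
```

If the result is infinite, adding consumers (or slowing publishers) is the only way the backlog will shrink.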

Flow control limiting message rate on a single queue

I have an exchange with only one queue bound to it. When the message publishing rate goes over some cap, RabbitMQ automatically throttles the incoming message rate.
On further investigation I found that this happens due to the "flow control" throttling mechanism built into RabbitMQ. https://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/
According to this document, my connection and channels are in flow control but the queue is not, which means there is a CPU-bound or disk-bound limit.
My messages are not persistent, so I don't have a disk limitation. While searching, I found posts stating that a queue is limited to a single CPU. https://groups.google.com/forum/#!msg/rabbitmq-users/wzHMV7F0ugU/zhW_9b8ACQAJ
What does this mean? Does a RabbitMQ queue process use only one CPU even when multiple cores are available on the machine? What is the CPU limitation with respect to queue flow control?
A queue is handled by one and only one CPU core, which means that you have to design your message flow through RabbitMQ with multiple queues in order to remain scalable.
If you use only one queue, you will be limited to a maximum message rate no matter whether you have one core or more.
https://www.rabbitmq.com/queues.html#runtime-characteristics
If you have a specific need to build an architecture with only one logical queue, which is explicitly not recommended, or if you have a queue with really high traffic, you can look at sharded queues here: GitHub Sharded queues Plugin
It's a plugin (use it with caution and test everything before going to production, especially failure and replication) that splits a logical queue name into multiple queues.
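The general idea of splitting one logical queue into several physical ones can also be sketched on the client side. The snippet below is not the plugin's algorithm, just a minimal illustration. It assumes queues named "<logical name>.<index>" exist and picks one by hashing a message key. The naming scheme and function name are hypothetical.

```python
import zlib


def shard_queue_name(logical_name, message_key, shard_count):
    """Map a message key onto one of shard_count physical queue names."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # crc32 is stable across processes, unlike Python's built-in hash().
    index = zlib.crc32(message_key.encode("utf-8")) % shard_count
    return f"{logical_name}.{index}"


# Hypothetical usage: spread orders over 4 queues, keyed by customer id.
for customer_id in ("alice", "bob", "carol"):
    print(customer_id, "->", shard_queue_name("orders", customer_id, 4))
```

Because the same key always maps to the same queue, ordering per key is preserved, while different keys can be processed by different queue processes and therefore different cores.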
If you are running a benchmark on RabbitMQ, remember to produce and consume on a number of queues greater than the number of CPU cores present on the server.
Another benchmarking tip: try to produce only, consume only, and both at the same time, with different settings (persistence, message size, lazy queues, ...) and ack settings.

RabbitMQ Durable queue on disk node

I was reading through the documentation of RabbitMQ on their website and came across two terms which seem to describe the same thing: "Durable Queues" and "Disk Node". As per the documentation, if I make a node a Disk Node, all data is stored on disk except messages, message store indices, queue indices and other node state (I am not sure what the other node state is).
So, if I make my node a Disk Node, do I still need to mark my queue as durable to survive broker restarts ?
Same question goes for durable exchanges as well.
Disk nodes and durable queues are two different concepts within RabbitMQ.
RabbitMQ maintains certain internal information (such as users, passwords, vhosts, ...) within specific mnesia tables. Disk nodes store these tables on disk. As the related documentation states:
This does not include messages, message store indices, queue indices and other node state.
To ensure durability/persistence of exchanges, queues or messages you need to explicitly state it when you declare/publish them.
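A minimal sketch of what "explicitly state it" can look like in client code follows. It assumes a channel object whose methods follow the pika naming (`exchange_declare`, `queue_declare`, `queue_bind`). The `RecordingChannel` class is a hypothetical stand-in so the example runs without a broker, and the exchange, queue and routing key names are made up.

```python
def declare_durable_topology(channel, exchange, queue, routing_key):
    """Declare an exchange and a queue that survive a broker restart."""
    channel.exchange_declare(exchange=exchange, exchange_type="direct", durable=True)
    channel.queue_declare(queue=queue, durable=True)
    channel.queue_bind(queue=queue, exchange=exchange, routing_key=routing_key)
    # Messages are a separate matter: each one must also be published as
    # persistent (delivery_mode=2 in its properties) to be written to disk.


class RecordingChannel:
    """Stand-in for a real channel: records calls instead of talking to a broker."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(**kwargs):
            self.calls.append((name, kwargs))
        return record


channel = RecordingChannel()
declare_durable_topology(channel, "orders", "orders.new", "new")
for name, kwargs in channel.calls:
    print(name, kwargs)
```

Whether the node is a disk node or a RAM node does not change these flags. They have to be set on every declaration either way.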

RabbitMQ clustering model

Let us consider the scenario below.
There are 3 RabbitMQ brokers (B1, B2, B3) deployed in a clustered model. There is an exchange E with bindings which is replicated to all 3 brokers. There is a producer P and 3 consumers C1, C2, C3. I have the following questions:
1. Let's say a producer connects to broker B1 and creates a queue Q which is mirrored to B2. Now when a consumer connects to broker B3, how does it get the messages in the queue?
2. From my understanding, the exchange and binding information is maintained in memory in each broker. If the exchange is persistent, in order to recover from broker crashes, is the exchange and binding information also persisted on disk in all brokers?
3. If the entire queue is maintained in memory in all the mirrored brokers, it consumes a lot of memory in each broker. In order to support a potentially large number of queues, each holding millions of messages in each broker, is this not a constraint for scalability?
1. Each mirrored queue has a master node. The master node for that queue is always used for consuming. So when a consumer connects to a node which does not have the queue storage (or is a slave node), the consumer will actually end up consuming from the master node.
2. Yes, assuming the node is a disc node and not a RAM node. I'm not 100% sure about the binding, but my guess is yes. Anyway, it's highly recommended to always declare all queues, exchanges etc. that your client needs (for example, each time the client starts).
3. Yes, that's the point of mirroring: it adds redundancy in case something goes wrong. It does not increase performance (rather the opposite!). But in general, having queues with millions of messages is not exactly a good situation, as queues should, on average, be empty.

How distributed should queues be in a RabbitMQ cluster?

Assume you have a small rabbitmq system of 3 nodes that is supposed to handle 100+ decently high volume queues in the same exchange. Given that queues only exist on the node they are created on (we're not using replicated, High Availability queues), what's the best way to create the queues? Is there any benefit to having the queues distributed among the cluster nodes, or is it better to keep them all on one node and have rmq do the routing?
It depends on your application, really.
RabbitMQ is smart about sending messages, so it'll only send a message to a node in the cluster if
a queue that holds that message resides on that node or
if a consumer has connected to that node and has requested the message.
In general, you should aim to declare queues on the nodes to which both the publishers and the consumers for that queue will connect. In other words, you should aim to connect publishers and consumers to the node that holds the queues they use. This assumes you're trying to conserve the overall bandwidth used.
If you're using clustering to improve throughput (and you probably are), and you don't care about internal bandwidth used, you should aim to connect your publishers/consumers to the nodes in a balanced way and not worry about the internal routing mechanisms.
One last thing to think about is memory and disk space. Queues store messages in main memory, and fall back to disk if that's insufficient. So, if you declare all your queues in one place, that'll result in one node that's "over-worked" and two nodes with memory to spare.
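One simple way to avoid a single over-worked node is to plan the placement up front. The sketch below assigns queue names to nodes round-robin. It assumes, as in the question, that a non-replicated queue lives on the node where it is declared, so the client would connect to the chosen node before declaring. The node and queue names are hypothetical.

```python
from itertools import cycle


def assign_queues_to_nodes(queue_names, node_names):
    """Spread queues over nodes round-robin; returns {node: [queues]}."""
    if not node_names:
        raise ValueError("at least one node is required")
    placement = {node: [] for node in node_names}
    for queue, node in zip(queue_names, cycle(node_names)):
        placement[node].append(queue)
    return placement


# Hypothetical cluster: 100 queues over 3 nodes.
queues = [f"q{i}" for i in range(100)]
plan = assign_queues_to_nodes(queues, ["node-1", "node-2", "node-3"])
for node, assigned in plan.items():
    print(node, len(assigned))
```

Round-robin balances queue counts only. If a few queues carry most of the traffic, a placement weighted by expected volume would be a better fit.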
As part of a move towards redundancy and failover in an application I'm working on, I've just finished setting up a RabbitMQ cluster behind a proxy, and have all of my publishers and consumers connect via the proxy, which round robins connections to the individual nodes as they come in from the clients. Prior to upgrading RabbitMQ to 2.7.1, this seemed to pretty evenly distribute queues to the separate nodes, though this would of course depend pretty heavily on how your proxy balances the requests and when your clients try to connect (and declare a queue)...
Having said all that, I just upgraded to RabbitMQ 2.7.1, which was pretty painless, and gave us HA queues, which is a pretty big win for our apps. At any rate, if you're interested in the set up, and think it would be of benefit to your queue problem, I'd be happy to share the setup.