How to have more than 50 000 messages in a RabbitMQ Queue - rabbitmq

We have currently using a service bus in Azure and for various reasons, we are switching to RabbitMQ.
Under heavy load, and when specific tasks on backend are having problem, one of our queues can have up to 1 million messages waiting to be processed.
RabbitMQ can have a maximum of 50 000 messages per queue.
The question is how can we design the rabbitMQ infrastructure to continue to work when messages are temporarily accumulating?
Note: we want to host our RabbitMQ server in a docker image inside a kubernetes cluster.
we imagine an exchange that would load balance mesages between queues in nodes behind.
But what is unclear to us is how to dynamically add new queues on demand if we detect that queues are getting full.

RabbitMQ can have a maximum of 50 000 messages per queue.
There is no this kind of limit.
RabbitMQ can handle more messages using quorum or classic queues with lazy.
With stream queues RabbitMQ can handle Millions of messages per second.
we imagine an exchange that would load balance messages between queues in nodes behind.
you can do that using different bindings.
kubernetes cluster.
I would suggest to use the k8s Operator
But what is unclear to us is how to dynamically add new queues on demand if we detect that queues are getting full.
There is no concept of FULL in RabbitMQ. There are limits that you can put using max-length or TTL.

A RabbitMQ queue will never be "full" (no such limitation exists in the software). A queue's maximum length rather depends on:
Queue settings (e.g max-length/max-length-bytes)
Message expiration settings such as x-message-ttl
Underlying hardware & cluster setup (available RAM and disk space).
Unless you are using Streams (new feature in v 3.9) you should always try to keep your queues short (if possible). The entire idea of a Message Queue (in it's classical sense) is that a message should be passed along as soon as possible.
Therefore, if you find yourself with long queues you should rather try to match the load of your producers by adding more consumers.

Related

Flow control limitting message rate on single queue

I have a exchange and only one queue bind to it. When the message publishing rate goes over some cap the rabbitmq automatically throttles the incoming message rate.
On further investigation i found this happens due to the "Flow control" trottling mechanism built in rabbitmq. https://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/
As per this document i have connection, channels in flow control and not the queue. which means there is a cpu-bound / disk-bound limit.
My messages are not persistent so i don't have disk limitation. On Searching, i found documents stating a queue is limited to single cpu. https://groups.google.com/forum/#!msg/rabbitmq-users/wzHMV7F0ugU/zhW_9b8ACQAJ
What does it mean ? do the rabbitmq queue process uses only 1 cpu even multiple cores are available in the machine? what is the limitation of cpu with respect to queue flow control?
A queue is handled by one and one only CPU, which mean that you have to design your message flow through rabbit with multiple queue in order to remain scalable.
If you are on one queue only you will be limited to a maximum number of messages no matter if you have 1 or more cores
https://www.rabbitmq.com/queues.html#runtime-characteristics
If you have a specific need to build an architecture with only one logical queue, which is explicitely not recommended ; or if you have a queue with a really high trafic, you can check sharded queues here : Github Sharded queues Plugin
It's a pluggin (take with caution and test everything before going to production, especialy failure and replication) that split a logical queue name into multiple queues.
If you are running a benchmark on rabbitmq, remember to produce and consume on a number of queues superior to the amount of CPU cores present on the server.
Other tips about benchmark, try to produce only, consume only, and both at the same time, with different persistence settings (persistence, message size, lazy queues, ...) and ack settings.

RabbitMQ HighAvailability

I am new to RabbitMQ. I wanted to know how memory is used in case of HA.
For example, in Kafka the partition use a specific amount of memory if data is present or not in it and so do the replications .In RabbitMQ how are the queues allocated memory ? and How does HA work ?Do the mirrored queues occupy the same amout of memory each replicated node ?
Queues in RabbitMQ don't need a lot of resources per se, but messages will be kept in memory in most of the cases. When a message is sent to the queue that has mirrored queues, this message will be replicated among other nodes defined by the mirroring policy. The idea of mirrored queues is to provide high availability, so if the broker hosting the master queue crashes, a new master queue will be elected among alive mirrored queues. The switch to the new node should happen quite fast, because all messages are ready to be consumed.
Simple example:
The cluster consists of 3 nodes:
The test queue was created on the node-1.rabbitmq node and the mirroring policy was applied to replicate messages on all nodes:
Approximately 70k messages were sent to the test queue and the screenshot from the RabbitMQ management tool is shown below:
It is clear that all nodes got messages and they are kept in memory.
Memory consumption of RabbitMQ is a tricky topic and there are many factors which can affect it (type of the queue, the amount of messages in other queues, reaching the defined limits, etc.). In the official documentation it is stated:
RabbitMQ can report on its own memory use, to let you see where your system is using memory. Note that all measurements are somewhat approximate, based on values returned by the underlying Erlang virtual machine; however they should still be accurate enough to be useful.

Handling RabbitMQ node failures in a cluster in order to continue publishing and consuming

I would like to create a cluster for high availability and put a load balancer front of this cluster. In our configuration, we would like to create exchanges and queues manually, so one exchanges and queues are created, no client should make a call to redeclare them. I am using direct exchange with a routing key so its possible to route the messages into different queues on different nodes. However, I have some issues with clustering and queues.
As far as I read in the RabbitMQ documentation a queue is specific to the node it was created on. Moreover, we can only one queue with the same name in a cluster which should be alive in the time of publish/consume operations. If the node dies then the queue on that node will be gone and messages may not be recovered (depends on the configuration of course). So, even if I route the same message to different queues in different nodes, still I have to figure out how to use them in order to continue consuming messages.
I wonder if it is possible to handle this failover scenario without using mirrored queues. Say I would like switch to a new node in case of a failure and continue to consume from the same queue. Because publisher is just using routing key and these messages can go into more than one queue, same situation is not possible for the consumers.
In short, what can I to cope with the failures in an environment explained in the first paragraph. Queue mirroring is the best approach with a performance penalty in the cluster or a more practical solution exists?
Data replication (mirrored queues in RabbitMQ) is a standard approach to achieve high availability. I suggest to use those. If you don't replicate your data, you will lose it.
If you are worried about performance - RabbitMQ does not scale well.
The only way I know to improve performance is just to make your nodes bigger or create second cluster. Adding nodes to cluster does not really improve things. Also if you are planning to use TLS it will decrease throughput significantly as well. If you have high throughput requirement +HA I'd consider Apache Kafka.
If your use case allows not to care about HA, then just re-declare queues/exchanges whenever your consumers/publishers connect to the broker, which is absolutely fine. When you declare queue that's already exists nothing wrong will happen, queue won't be purged etc, same with exchange.
Also, check out RabbitMQ sharding plugin, maybe that will do for your usecase.

celeryev Queue in RabbitMQ Becomes Very Large

I am using celery on rabbitmq. I have been sending thousands of messages to the queue and they are being processed successfully and everything is working just fine. However, the number of messages in several rabbitmq queues are growing quite large (hundreds of thousands of items in the queue). The queues are named celeryev.[...] (see screenshot below). Is this appropriate behavior? What is the purpose of these queues and shouldn't they be regularly purged? Is there a way to purge them more regularly, I think they are taking up quite a bit of disk space.
You can use the CELERY_EVENT_QUEUE_TTL celery option (only working with amqp), that will set the message expiry time, after which it will be deleted from the queue.
For anyone else who is running into problems with a celeryev queue becoming very large and threatening the disk space on your rabbitmq server, beware the accepted answer! Here's my suggestion. Just issue this command on your rabbitmq instance:
rabbitmqctl set_policy limit_celeryev_queues "^celeryev\." '{"max-length":1000000}' --apply-to queues
This will limit any queue beginning with "celeryev" to 1 Million entries. I did some experimenting with a stuck flower instance causing a runaway celeryev queue, and setting CELERY_EVENT_QUEUE_TTL / CELERY_EVENT_QUEUE_EXPIRES did not help control the queue size.
In my testing, I started a flower process, then SIGSTOP'ed it, and watched its celeryev queue start running away. Neither of these two settings helped at all. I confirmed SIGCONT'ing the flower process would bring the queue back to 0 rapidly. I am not certain why these two knobs didn't help, but it may have something to do with how RabbitMQ implements these two settings.
First, the Per-Message TTL corresponding to CELERY_EVENT_QUEUE_TTL only establishes an expiration time on each queue entry -- AIUI it will not automatically delete the message out of the queue to save space upon expiration. Second, the Queue TTL corresponding to CELERY_EVENT_QUEUE_EXPIRES says that it "... guarantees that the queue will be deleted, if unused for at least the expiration period". However, I believe that their definition of "unused" may be too strict to kick in for e.g. an overburdened, stuck, or killed flower process.
EDIT: Unfortunately, one problem with this suggestion is that the set_policy ... apply-to queues will only impact existing queues, and flower can and will create new queues which may overflow.
Celery use celeryev prefixed queues (and exchange) for monitoring, you can configure it as you want or disable at all (celery control disable_events).
You just have to set a config to your Celery.
If you want to avoid Celery from creating celeryev.* queues:
CELERY_SEND_EVENTS = False # Will not create celeryev.* queues
If you need these queues for monitoring purpose (CeleryFlower for instance), you may regularly purge them:
CELERY_EVENT_QUEUE_EXPIRES = 60 # Will delete all celeryev. queues without consumers after 1 minute.
The solution came from here: https://www.cloudamqp.com/docs/celery.html
You can limit the queue size in RabbitMQ with x-max-length queue declaration argument
http://www.rabbitmq.com/maxlength.html

how to recover from message store exhaustion?

when a activemq broker gets flooded with messages or the consumer fails it will stop accepting messages once certain (configurable) limits are reached. In Broker Networks this effect can take down the whole cluster.
I'm currently using the default configuration for memory limits and experience the following behavior:
consumer fails or becomes very slow (known problem)
broker A (the one the consumer connects to) gets filled and stops accepting messages
all other brokers get filled up and stop to accept messages
the cluster is basicly down
if the consumer comes back online now it will try to reconnect to one of the cluster nodes but the nodes will not accept the connection becaus this would create advisory messages that can't be handled because the broker is already full.
How do i have to configure the memory limits so that my productive destinations are limited and blocked but the broker will still be able to accept advisories so my consumer can revover?
You should be able to use producerFlowControl to slow producers to not overwhelm your broker. That being said, this is enabled by default, so you are likely using it already...
I would try something like this (assuming an 8GB box or so)...
use the failover transport everywhere (broker/client connections)
increase JVM heap to 4 GB
increase systemUsage limits substantially (memoryUsage 3gb, storeUsage/tempUsage = 10 gb)
enable producer flow control on both topics and queues
set the memory limit to 2GB divided by the total # of topics+queues
in other words, this should in total be substantially less the the memoryUsage limit
exclude the Advisory topics from the producer flow control (they might be already)
This should limit the producers and leave resources for your system to function/recover/accept consumer connections...