celeryev Queue in RabbitMQ Becomes Very Large - rabbitmq

I am using Celery on RabbitMQ. I have been sending thousands of messages to the queue, they are being processed successfully, and everything is working just fine. However, the number of messages in several RabbitMQ queues is growing quite large (hundreds of thousands of items per queue). The queues are named celeryev.[...]. Is this appropriate behavior? What is the purpose of these queues, and shouldn't they be purged regularly? Is there a way to purge them more often? I think they are taking up quite a bit of disk space.

You can use the CELERY_EVENT_QUEUE_TTL Celery option (it only works with amqp), which sets the message expiry time after which a message is deleted from the queue.

For anyone else who is running into problems with a celeryev queue becoming very large and threatening the disk space on your rabbitmq server, beware the accepted answer! Here's my suggestion. Just issue this command on your rabbitmq instance:
rabbitmqctl set_policy limit_celeryev_queues "^celeryev\." '{"max-length":1000000}' --apply-to queues
This will limit any queue whose name begins with "celeryev." to 1 million entries. I did some experimenting with a stuck Flower instance causing a runaway celeryev queue, and setting CELERY_EVENT_QUEUE_TTL / CELERY_EVENT_QUEUE_EXPIRES did not help control the queue size.
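The same policy can also be created through the RabbitMQ management HTTP API (PUT /api/policies/{vhost}/{name}), assuming the management plugin is enabled. The sketch below only builds that request and checks which queue names the pattern would match; the host, port, and guest credentials are placeholder assumptions, and nothing is sent over the network.

```python
import base64
import json
import re
import urllib.parse
import urllib.request

POLICY_NAME = "limit_celeryev_queues"
PATTERN = r"^celeryev\."
DEFINITION = {"max-length": 1000000}


def build_policy_request(host="localhost", port=15672, vhost="/",
                         user="guest", password="guest"):
    """Build (but do not send) the PUT request that creates the policy.

    Host, port and credentials are placeholders, not recommendations.
    """
    # The default vhost "/" must be percent-encoded as %2F in the URL path.
    vhost_enc = urllib.parse.quote(vhost, safe="")
    url = f"http://{host}:{port}/api/policies/{vhost_enc}/{POLICY_NAME}"
    body = json.dumps({
        "pattern": PATTERN,
        "definition": DEFINITION,
        "apply-to": "queues",
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )


def policy_applies(queue_name):
    # The policy pattern is a regular expression matched against queue names.
    return re.search(PATTERN, queue_name) is not None


req = build_policy_request()
```

To actually apply it you would pass the request to urllib.request.urlopen(req) against a reachable broker; the rabbitmqctl command above does the same thing without the HTTP layer.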
In my testing, I started a flower process, then SIGSTOP'ed it, and watched its celeryev queue start running away. Neither of these two settings helped at all. I confirmed SIGCONT'ing the flower process would bring the queue back to 0 rapidly. I am not certain why these two knobs didn't help, but it may have something to do with how RabbitMQ implements these two settings.
First, the Per-Message TTL corresponding to CELERY_EVENT_QUEUE_TTL only establishes an expiration time on each queue entry -- AIUI it will not automatically delete the message out of the queue to save space upon expiration. Second, the Queue TTL corresponding to CELERY_EVENT_QUEUE_EXPIRES says that it "... guarantees that the queue will be deleted, if unused for at least the expiration period". However, I believe that their definition of "unused" may be too strict to kick in for e.g. an overburdened, stuck, or killed flower process.
EDIT: Unfortunately, one problem with this suggestion is that the set_policy ... apply-to queues will only impact existing queues, and flower can and will create new queues which may overflow.

Celery uses celeryev-prefixed queues (and an exchange) for monitoring. You can configure them as you like or disable events entirely (celery control disable_events).

You just have to set a configuration option in Celery.
If you want to prevent Celery from creating celeryev.* queues:
CELERY_SEND_EVENTS = False # Will not create celeryev.* queues
If you need these queues for monitoring purpose (CeleryFlower for instance), you may regularly purge them:
CELERY_EVENT_QUEUE_EXPIRES = 60 # Will delete all celeryev.* queues without consumers after 1 minute.
The solution came from here: https://www.cloudamqp.com/docs/celery.html
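For reference, these Celery settings (given in seconds) end up as RabbitMQ queue arguments, which RabbitMQ expects in milliseconds: the TTL becomes x-message-ttl and the expiry becomes x-expires. The fragment below is an illustrative celeryconfig.py using the same Celery 3.x names as the answer, plus a hypothetical helper, event_queue_arguments, that shows the unit conversion; it is not Celery's own code, only a sketch of roughly what kombu sends when the event queue is declared.

```python
# celeryconfig.py-style settings (Celery 3.x names, as in the answer above).
CELERY_SEND_EVENTS = True         # keep events on, e.g. for Flower
CELERY_EVENT_QUEUE_TTL = 5        # seconds a single event may sit in the queue
CELERY_EVENT_QUEUE_EXPIRES = 60   # seconds before an unused celeryev.* queue is deleted


def event_queue_arguments(ttl_seconds, expires_seconds):
    """Hypothetical helper: map the Celery settings (seconds) onto the
    RabbitMQ queue arguments they correspond to (milliseconds)."""
    return {
        "x-message-ttl": int(ttl_seconds * 1000),
        "x-expires": int(expires_seconds * 1000),
    }


args = event_queue_arguments(CELERY_EVENT_QUEUE_TTL, CELERY_EVENT_QUEUE_EXPIRES)
```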

You can limit the queue size in RabbitMQ with the x-max-length queue declaration argument:
http://www.rabbitmq.com/maxlength.html
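With the default overflow behaviour (drop-head), a queue declared with x-max-length discards messages from the front, i.e. the oldest ones, once the limit is exceeded. The toy model below uses collections.deque to illustrate that behaviour; it is a simulation rather than a RabbitMQ client, and MAX_LENGTH is an arbitrary example value. With a client library such as pika, the same limit would be passed in the arguments table when declaring the queue.

```python
from collections import deque

MAX_LENGTH = 3  # example limit; a real celeryev queue would use a far larger value

# Queue arguments you would pass at declaration time (e.g. pika's arguments=...).
QUEUE_ARGUMENTS = {"x-max-length": MAX_LENGTH}

# deque(maxlen=N) evicts from the opposite end when full: appending on the
# right drops the leftmost (oldest) item, like RabbitMQ's drop-head overflow.
queue = deque(maxlen=QUEUE_ARGUMENTS["x-max-length"])

for event_id in range(1, 6):
    queue.append(f"event-{event_id}")

print(list(queue))  # ['event-3', 'event-4', 'event-5']
```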

Related

How to have more than 50 000 messages in a RabbitMQ Queue

We are currently using a service bus in Azure and, for various reasons, we are switching to RabbitMQ.
Under heavy load, and when specific tasks on the backend are having problems, one of our queues can have up to 1 million messages waiting to be processed.
RabbitMQ can have a maximum of 50 000 messages per queue.
The question is: how can we design the RabbitMQ infrastructure so that it keeps working when messages are temporarily accumulating?
Note: we want to host our RabbitMQ server in a Docker image inside a Kubernetes cluster.
We imagine an exchange that would load-balance messages between queues on the nodes behind it.
But what is unclear to us is how to dynamically add new queues on demand if we detect that queues are getting full.
RabbitMQ can have a maximum of 50 000 messages per queue.
There is no such limit.
RabbitMQ can handle more messages using quorum queues, or classic queues in lazy mode.
With stream queues, RabbitMQ can handle millions of messages per second.
we imagine an exchange that would load balance messages between queues in nodes behind.
You can do that using different bindings.
Kubernetes cluster.
I would suggest using the RabbitMQ Cluster Kubernetes Operator.
But what is unclear to us is how to dynamically add new queues on demand if we detect that queues are getting full.
There is no concept of "full" in RabbitMQ. There are limits that you can set using max-length or TTL.
A RabbitMQ queue will never be "full" (no such limitation exists in the software). A queue's maximum length instead depends on:
- Queue settings (e.g. max-length/max-length-bytes)
- Message expiration settings such as x-message-ttl
- Underlying hardware and cluster setup (available RAM and disk space)
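As a sketch of the settings listed above, the hypothetical helper below assembles the optional arguments table for a queue declaration. The argument names (x-queue-type, x-max-length, x-max-length-bytes, x-message-ttl, x-queue-mode) are standard RabbitMQ queue arguments, but which combinations are supported depends on your RabbitMQ version and queue type (lazy mode, for instance, is a classic-queue option), and the values used below are arbitrary examples.

```python
from typing import Optional


def queue_arguments(max_length: Optional[int] = None,
                    max_length_bytes: Optional[int] = None,
                    message_ttl_ms: Optional[int] = None,
                    queue_type: str = "classic",
                    lazy: bool = False) -> dict:
    """Hypothetical helper: build the arguments table for a queue declaration."""
    if lazy and queue_type != "classic":
        # Lazy mode is a classic-queue setting; refuse it for other types.
        raise ValueError("lazy mode only applies to classic queues")
    args = {"x-queue-type": queue_type}
    if max_length is not None:
        args["x-max-length"] = max_length            # number of messages
    if max_length_bytes is not None:
        args["x-max-length-bytes"] = max_length_bytes  # total message body bytes
    if message_ttl_ms is not None:
        args["x-message-ttl"] = message_ttl_ms       # milliseconds
    if lazy:
        args["x-queue-mode"] = "lazy"
    return args


classic_args = queue_arguments(max_length=1_000_000, message_ttl_ms=60_000)
quorum_args = queue_arguments(max_length=1_000_000, queue_type="quorum")
```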
Unless you are using Streams (a new feature in v3.9), you should always try to keep your queues short (if possible). The entire idea of a message queue (in its classical sense) is that a message should be passed along as soon as possible.
Therefore, if you find yourself with long queues you should rather try to match the load of your producers by adding more consumers.
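A back-of-the-envelope way to size the consumer pool: if producers publish P messages per second and one consumer handles C messages per second, you need at least ceil(P / C) consumers just to stop the queue from growing, and only the capacity beyond that drains an existing backlog. The rates in the example are made-up numbers, and the function names are illustrative.

```python
import math


def consumers_needed(publish_rate: float, per_consumer_rate: float) -> int:
    """Minimum number of consumers so that total consume rate >= publish rate."""
    return math.ceil(publish_rate / per_consumer_rate)


def drain_time_seconds(backlog: int, publish_rate: float,
                       per_consumer_rate: float, consumers: int) -> float:
    """Time to empty an existing backlog with the given number of consumers."""
    net_rate = consumers * per_consumer_rate - publish_rate
    if net_rate <= 0:
        return math.inf  # consumers cannot keep up, so the queue never shrinks
    return backlog / net_rate


# Example: 2,000 msg/s published, each consumer handles 150 msg/s.
n = consumers_needed(2000, 150)
# With 6 consumers beyond the minimum (20 total), the net drain rate is
# 20 * 150 - 2000 = 1000 msg/s, so a 1,000,000-message backlog takes 1000 s.
t = drain_time_seconds(1_000_000, 2000, 150, n + 6)
```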

Handling RabbitMQ node failures in a cluster in order to continue publishing and consuming

I would like to create a cluster for high availability and put a load balancer in front of this cluster. In our configuration, we would like to create exchanges and queues manually, so once exchanges and queues are created, no client should make a call to redeclare them. I am using a direct exchange with a routing key, so it's possible to route the messages into different queues on different nodes. However, I have some issues with clustering and queues.
As far as I read in the RabbitMQ documentation, a queue is specific to the node it was created on. Moreover, there can be only one queue with a given name in a cluster, and it must be alive at the time of publish/consume operations. If the node dies, the queue on that node will be gone and messages may not be recovered (depending on the configuration, of course). So even if I route the same message to different queues on different nodes, I still have to figure out how to use them in order to keep consuming messages.
I wonder if it is possible to handle this failover scenario without using mirrored queues. Say I would like to switch to a new node in case of a failure and continue consuming from the same queue. Because the publisher just uses a routing key, its messages can go into more than one queue, but the same is not possible for the consumers.
In short, what can I do to cope with failures in the environment described in the first paragraph? Is queue mirroring, with its performance penalty in the cluster, the best approach, or does a more practical solution exist?
Data replication (mirrored queues in RabbitMQ) is a standard approach to achieving high availability. I suggest using it. If you don't replicate your data, you will lose it.
If you are worried about performance: RabbitMQ does not scale well.
The only way I know to improve performance is to make your nodes bigger or create a second cluster. Adding nodes to the cluster does not really improve things. Also, if you are planning to use TLS, it will decrease throughput significantly as well. If you have high-throughput requirements plus HA, I'd consider Apache Kafka.
If your use case allows you not to care about HA, then just re-declare queues/exchanges whenever your consumers/publishers connect to the broker, which is absolutely fine. Declaring a queue that already exists does no harm: the queue won't be purged, and the same goes for exchanges, as long as you declare them with the same properties and arguments (otherwise RabbitMQ rejects the declaration with a PRECONDITION_FAILED error).
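To illustrate the "re-declare on connect" advice, here is a toy in-memory model (not a real client) of how RabbitMQ treats a queue declaration: declaring an existing queue with the same properties is a no-op that leaves its messages alone, while declaring it with different arguments is rejected, which RabbitMQ reports as a 406 PRECONDITION_FAILED channel error. ToyBroker and PreconditionFailed are hypothetical names used only for this sketch.

```python
class PreconditionFailed(Exception):
    """Stands in for RabbitMQ's 406 PRECONDITION_FAILED channel error."""


class ToyBroker:
    """In-memory model of idempotent queue declaration (illustration only)."""

    def __init__(self):
        # name -> {"durable": bool, "arguments": dict, "messages": list}
        self.queues = {}

    def queue_declare(self, name, durable=True, arguments=None):
        arguments = dict(arguments or {})
        existing = self.queues.get(name)
        if existing is None:
            self.queues[name] = {"durable": durable, "arguments": arguments,
                                 "messages": []}
        elif existing["durable"] != durable or existing["arguments"] != arguments:
            raise PreconditionFailed(f"inequivalent arguments for queue '{name}'")
        # Re-declaring with identical properties changes nothing (no purge);
        # like a real declare-ok, report the current message count.
        return len(self.queues[name]["messages"])

    def publish(self, name, body):
        self.queues[name]["messages"].append(body)


broker = ToyBroker()
broker.queue_declare("orders")
broker.publish("orders", "msg-1")
count_after_redeclare = broker.queue_declare("orders")  # still 1 message
```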
Also, check out the RabbitMQ sharding plugin; maybe it will work for your use case.

RabbitMQ queue length limit with flow control

If I declare a queue with x-max-length, all messages will be dropped or dead-lettered once the limit is reached.
I'm wondering whether, instead of dropping or dead-lettering messages, RabbitMQ could activate the flow control mechanism, as it does for the memory/disk watermarks. The reason is that I want to preserve message order (FIFO behaviour when submitting), and it would be much more convenient to slow down the producers.
Try implementing the queue length limit at the application level: for example, increment/decrement a Redis key and check it against a maximum value. It might not be as accurate as the native RabbitMQ mechanism, but it works pretty well on a separate queue/exchange without affecting other ones on the same broker.
P.S. Alternatively, for some tasks RabbitMQ is not the best choice and an old-school relational database (MySQL, PostgreSQL or whatever you like) works best, while RabbitMQ can still be used as an event bus.
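A minimal sketch of that application-level limit, assuming a Redis-like client whose incr(key) and decr(key) return the new value (redis-py's Redis.incr and Redis.decr behave this way). Incrementing first and rolling back when the limit is exceeded avoids a read-then-write race between concurrent producers. BoundedPublisher, InMemoryCounter, and the key name are hypothetical; the publish callable stands in for your real RabbitMQ publish call, and consumers are assumed to call release() after acking each message.

```python
from typing import Callable, Dict, Protocol


class Counter(Protocol):
    def incr(self, key: str) -> int: ...
    def decr(self, key: str) -> int: ...


class InMemoryCounter:
    """Test double with the same incr/decr shape as a Redis client."""

    def __init__(self) -> None:
        self.values: Dict[str, int] = {}

    def incr(self, key: str) -> int:
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def decr(self, key: str) -> int:
        self.values[key] = self.values.get(key, 0) - 1
        return self.values[key]


class BoundedPublisher:
    def __init__(self, counter: Counter, key: str, max_len: int,
                 publish: Callable[[bytes], None]) -> None:
        self.counter = counter
        self.key = key
        self.max_len = max_len
        self.publish = publish

    def try_publish(self, body: bytes) -> bool:
        # Reserve a slot first; undo the reservation if the limit is exceeded.
        if self.counter.incr(self.key) > self.max_len:
            self.counter.decr(self.key)
            return False  # caller should back off and retry later
        try:
            self.publish(body)
        except Exception:
            self.counter.decr(self.key)  # publishing failed: free the slot
            raise
        return True

    def release(self) -> None:
        # Called on the consumer side after a message has been acked.
        self.counter.decr(self.key)


sent = []
pub = BoundedPublisher(InMemoryCounter(), "queue-len:orders", 2, sent.append)
results = [pub.try_publish(b"a"), pub.try_publish(b"b"), pub.try_publish(b"c")]
```

Here the third publish is refused because two messages are already outstanding; once a consumer calls release(), the next try_publish succeeds again, which gives producers the back-pressure the question asks for.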
There are two open issues related to this topic on the rabbitmq-server GitHub repo. I recommend expressing your interest there:
Block publishers when queue length limit is reached
Nack messages that cannot be deposited to all queues due to max length reached

Does clustering also distribute the message queue index in rabbitmq?

Currently we are getting tons of new messages, and our workers can't handle them as fast as they are coming in. The message queue index gets bigger and bigger until the memory high watermark (set_vm_memory_high_watermark) is reached and RabbitMQ stops accepting connections.
So we could increase the memory, but this may not scale beyond a certain point. Instead, I would like to add more servers and distribute the message queue index over several RabbitMQ nodes, so that if we need more memory we just add more servers.
How would I set this up and is this possible or are there any other ways to solve this problem?
Yes, you can use distributed RabbitMQ brokers: choose Federation or Shovel.
You can also store messages on disk if that is an option for you, drop the oldest ones (with a per-message or per-queue TTL), or set a maximum queue length.

Temporary queue made in Celery

I am using Celery with RabbitMQ. Lately, I have noticed that a large number of temporary queues are being created.
So I experimented and found that when a task fails (that is, a task raises an exception), a temporary queue with a random name (like c76861943b0a4f3aaa6a99a6db06952c) is created and remains.
Some properties of the temporary queue as found in rabbitmqadmin are as follows -
auto_delete : True
consumers : 0
durable : False
messages : 1
messages_ready : 1
And one such temporary queue is created every time a task fails (that is, raises an exception). How can I avoid this? In my production environment, a large number of such queues get created.
It sounds like you're using amqp as the result backend. From the docs, here are the pitfalls of that particular setup:
Every new task creates a new queue on the server, with thousands of tasks the broker may be overloaded with queues and this will affect performance in negative ways. If you’re using RabbitMQ then each queue will be a separate Erlang process, so if you’re planning to keep many results simultaneously you may have to increase the Erlang process limit, and the maximum number of file descriptors your OS allows.
Old results will not be cleaned automatically, so you must make sure to consume the results or else the number of queues will eventually go out of control. If you’re running RabbitMQ 2.1.1 or higher you can take advantage of the x-expires argument to queues, which will expire queues after a certain time limit after they are unused. The queue expiry can be set (in seconds) by the CELERY_AMQP_TASK_RESULT_EXPIRES setting (not enabled by default).
From what I've read in the changelog, this is no longer the default backend in versions >= 2.3.0 because users were getting bitten in the rear end by this behavior. I'd suggest changing the result backend if this is not the functionality you need.
Well, Philip is right there. The following is a description of how I solved it. It is a configuration in celeryconfig.py.
I am still using CELERY_BACKEND = "amqp" as Philip had said. But in addition to that, I am now using CELERY_IGNORE_RESULT = True. This configuration will ensure that the extra queues are not formed for every task.
I was already using this configuration, but an extra queue was still created whenever a task failed. Then I noticed another setting that needed to be removed: CELERY_STORE_ERRORS_EVEN_IF_IGNORED = True. With it, results were not stored for all tasks, only for errors (tasks that failed), hence one extra queue per failed task.
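Put together, the celeryconfig.py described in this answer would look roughly like the fragment below. It uses Celery 3.x setting names; CELERY_RESULT_BACKEND is assumed here as the current name of the setting the answer calls CELERY_BACKEND (an older alias in early Celery versions), and the 3600-second expiry is an arbitrary example rather than part of the answer.

```python
# celeryconfig.py (Celery 3.x setting names)
CELERY_RESULT_BACKEND = "amqp"   # the answer writes this as CELERY_BACKEND
CELERY_IGNORE_RESULT = True      # don't store results, so no result queue per task

# Removed: with this set to True, failed tasks still stored their error,
# which created one result queue per failure.
# CELERY_STORE_ERRORS_EVEN_IF_IGNORED = True

# Optional: let any leftover result queues expire sooner
# (seconds; the default is one day). 3600 is only an example value.
CELERY_TASK_RESULT_EXPIRES = 3600
```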
The CELERY_TASK_RESULT_EXPIRES setting dictates the time to live of the temporary queues. The default is 1 day. You can modify this value.
This is happening because the Celery workers' remote control is enabled (it is enabled by default).
You can disable it by setting CELERY_ENABLE_REMOTE_CONTROL to False.
However, note that you will lose the ability to do things like add_consumer, cancel_consumer etc using the celery command
The amqp backend creates a new queue for each task. If you want to avoid that, you can use the rpc backend, which keeps results in a single queue.
In your config, set
CELERY_RESULT_BACKEND = 'rpc'
CELERY_RESULT_PERSISTENT = True
You can read more about this in the Celery docs.