How to resolve "connection.blocked: true" in capabilities on the RabbitMQ UI - rabbitmq

"rabbitmqctl list_connections" shows as running but on the UI in the connections tab, under client properties, i see "connection.blocked: true".
I can see that messages are in queued in RabbitMq and the connection is in idle state.
I am running Airflow with Celery. My jobs are not executing at all.
Is this the reason why jobs are not executing?
How to resolve the issue so that my jobs start running

I'm experiencing the same kind of issue using plain Celery.
It seems that when you have a lot of messages in the queue, and these are fairly chunky, the node's memory usage climbs until the RabbitMQ memory high watermark is exceeded. That triggers blocking of the consumer connections, so no worker can access that node (and its related queues).
At the same time, publishers keep happily sending messages via the exchange, so you end up in a lose-lose situation.
The only solution we found was to avoid hitting that memory watermark and to scale up the number of consumers.
Keep messages/tasks lean so that the signature is measured in KB, not MB (see the sketch below).
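As a sketch of that advice, assuming a hypothetical fetch_document/store_result storage layer and a localhost broker URL (neither is part of the original setup), the task below receives only a small identifier and loads the heavy payload itself, so the message sitting on the broker stays in the KB range:

    # Sketch: keep Celery task signatures small by passing a reference
    # (here a hypothetical document_id) instead of the payload itself.
    from celery import Celery

    # Broker URL is an assumption for illustration only.
    app = Celery("tasks", broker="amqp://guest:guest@localhost:5672//")


    def fetch_document(document_id: str) -> str:
        # Hypothetical stand-in for the real blob-store or database lookup.
        return "large document body loaded from storage"


    def store_result(document_id: str, result: str) -> None:
        # Hypothetical stand-in for persisting the worker's output.
        print(f"stored result for {document_id}: {len(result)} characters")


    @app.task
    def process_document(document_id: str) -> None:
        # The broker only ever sees the small identifier, not the payload.
        payload = fetch_document(document_id)
        store_result(document_id, payload.upper())  # placeholder for real work


    # Enqueue with a tiny signature instead of a multi-MB document body:
    # process_document.delay("doc-12345")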

Related

Which Exception should I get when RabbitMQ as a broker for Celery is out of memory?

I have an application which uses Celery with RabbitMQ as a broker.
Recently I've noticed that the application became slow. All RabbitMQ Connections had the state "blocking" and the used memory was pretty much at the high watermark.
After making sure that RabbitMQ has enough memory, a couple of connections went to the "running" state and the overall system normalized.
Now I want to be able to recognize this earlier. While I will improve the monitoring/alerting for RabbitMQ itself, I was wondering if it's possible to detect this state on the Celery app side. What does Celery do when all connections of the broker are blocking / when the broker has issues?
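One way to spot this earlier, sketched below under the assumption that the management plugin is enabled on its default port 15672 with guest credentials, is to poll the management HTTP API and flag connections whose state is blocked or blocking:

    # Sketch: poll the RabbitMQ management API and flag connections that are
    # blocked or blocking (typically a sign the memory alarm has fired).
    # Host, port 15672 and guest/guest credentials are assumptions.
    import requests


    def blocked_connections(host: str = "localhost", port: int = 15672) -> list:
        resp = requests.get(
            f"http://{host}:{port}/api/connections",
            auth=("guest", "guest"),
            timeout=5,
        )
        resp.raise_for_status()
        # Each connection object reports a "state" such as "running",
        # "blocking" or "blocked".
        return [c["name"] for c in resp.json()
                if c.get("state") in ("blocked", "blocking")]


    if __name__ == "__main__":
        names = blocked_connections()
        if names:
            print("Blocked/blocking connections:", names)
        else:
            print("All connections running")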

RabbitMQ as Message Broker used by Spring Websocket dies under load

I'm developing an application where we need to handle 160k concurrent users who are connected to the backend via a websocket connection.
We decided to use the spring websocket implementation and RabbitMQ as the message broker.
In our application, every user needs to subscribe to its own user queue, /exchange/amq.direct/update, as well as to another queue, /topic/someUniqueName, to which other users can also potentially subscribe.
In our first performance test we did the naive approach where every user subscribes to two new queues.
When running the test RabbitMQ dies silently when around 800 users are connected at the same time, so around 1600 queues are active (See the graph of all RabbitMQ objects here).
I have read, though, that you should be careful about opening many connections to RabbitMQ.
Now I wonder whether the approach anticipated by Spring WebSocket, opening one queue per user, is a conceptual problem for systems under high load, or whether there is another error in my system.
Limiting factors for RabbitMQ are usually:
memory (visible in the dashboard), which grows with the number of messages and the number of queues (unless you use lazy queues, which go straight to disk);
the maximum number of file descriptors (at least 1 per connection), which often defaults to a value that is too low on many distributions (ref: https://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2012-April/019615.html);
CPU for routing the messages.
I did find the issue: I had misconfigured the RabbitMQ service and given it a file descriptor limit of only 1024. Increasing it solved the problem.
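For checking this kind of resource ceiling, a hedged sketch: the management API's /api/nodes endpoint reports per-node file descriptor and memory figures, assuming the management plugin is enabled on the default port with guest credentials:

    # Sketch: read per-node file descriptor and memory figures from the
    # RabbitMQ management API. Host, port and guest/guest creds are assumptions.
    import requests


    def node_limits(host: str = "localhost", port: int = 15672) -> None:
        resp = requests.get(f"http://{host}:{port}/api/nodes",
                            auth=("guest", "guest"), timeout=5)
        resp.raise_for_status()
        for node in resp.json():
            # fd_used/fd_total and mem_used/mem_limit are reported per node.
            print(node["name"],
                  "fds:", node.get("fd_used"), "/", node.get("fd_total"),
                  "mem:", node.get("mem_used"), "/", node.get("mem_limit"))


    if __name__ == "__main__":
        node_limits()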

ActiveMQ Consumer takes a long time to receive messages on startup

I've been experiencing an issue on the consumer side when restarting. After dumping the heap and sifting through threads, I've determined that the issue is due to the compression of the kahadb local repository index file. As this file gets larger, the time it takes for the consumer to start getting messages again increases. I've deleted my local repository directory, restarted, and verified that the consumer gets messages almost instantly.
Has anyone experienced this issue when working with ActiveMQ and KahaDB? On occasion, if the directory isn't wiped out, it can take up to 1.5 hours for my consumer to start getting messages from the broker again.
I've also verified that the messages are being published in a timely manner; they're just not being consumed because the index compression thread is blocking the "add" threads.
Any insight would be greatly appreciated!

What is the expected behavior when a RabbitMQ durable queue runs out of RAM?

My understanding of RabbitMQ durable queues (i.e. delivery_mode = 2) is that they run in RAM, but that messages are flushed to disk so that they can be recovered in the event that the process is restarted or the machine is rebooted.
It's unclear to me though what the expected behavior is when the machine runs out of memory. If the queue gets overloaded, dies, and needs to be restored, then simply loading the messages from the disk-backed store would consume all available RAM.
Do durable queues only load a subset of the messages into RAM in this scenario?
RabbitMQ will page the messages to disk as memory fills up. See the "Configuring the Paging Threshold" section of https://www.rabbitmq.com/memory.html.
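For reference, a minimal pika sketch of the setup the question describes, a durable queue plus persistent messages (delivery_mode = 2); the queue name and localhost broker are assumptions:

    # Sketch: durable queue plus persistent messages (delivery_mode=2).
    # Queue name and localhost broker are assumptions for illustration.
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()

    # The queue definition itself survives a broker restart...
    channel.queue_declare(queue="work", durable=True)

    # ...and delivery_mode=2 asks the broker to persist the message body,
    # so it can be recovered after a restart. Under memory pressure the
    # broker pages messages out to disk, as the linked documentation describes.
    channel.basic_publish(
        exchange="",
        routing_key="work",
        body=b"payload",
        properties=pika.BasicProperties(delivery_mode=2),
    )

    connection.close()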

Accessing AMQP connection from Resque worker when using Thin as Web Server

I'm trying to work past an issue when using a Resque job to process inbound AMQP messages.
I'm using an initializer to set up the message consumer at application startup and then feed the received messages to a Resque job for processing. That is working quite well.
However, I also want to send a response message out of the worker, i.e. publish it back out to a queue, and I'm running into the issue that Resque's forking makes the app-wide AMQP connection unusable from inside the worker. I'd be very interested to see how other folks have tackled this, as I can't believe this pattern is unusual.
Due to message volumes, firing up a new thread and AMQP connection for every response is not a workable solution.
Ideas?
My bust on this: I had my eye off the ball and forgot that Resque forks when it kicks off a worker. I'm going to go the route suggested by others and daemonize the process instead.
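The underlying pattern, opening a fresh broker connection in each child process rather than sharing one across a fork, can be sketched as follows; Python/pika and the "responses" queue are stand-ins here, since the original stack is Ruby/Resque:

    # Sketch of the general pattern: an AMQP connection created in the parent
    # is not usable after fork(), so each worker process opens its own
    # connection once it starts. Python/pika stands in for Ruby/Resque here.
    import multiprocessing

    import pika


    def publish_response(payload: bytes) -> None:
        # Open the connection inside the child process, post-fork.
        connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        channel = connection.channel()
        channel.queue_declare(queue="responses", durable=True)
        channel.basic_publish(exchange="", routing_key="responses", body=payload)
        connection.close()


    if __name__ == "__main__":
        # Each job runs in its own process, mirroring Resque's fork-per-job model.
        job = multiprocessing.Process(target=publish_response,
                                      args=(b"response message",))
        job.start()
        job.join()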