Which Exception should I get when RabbitMQ as a broker for Celery is out of memory? - rabbitmq

I have an application which uses Celery with RabbitMQ as a broker.
Recently I noticed that the application became slow. All RabbitMQ connections were in the "blocking" state, and memory usage was pretty much at the high watermark.
After making sure that RabbitMQ has enough memory, a couple of connections went to the "running" state and the overall system normalized.
Now I want to be able to recognize this earlier. While I will improve the monitoring/alerting for RabbitMQ itself, I was wondering if it's possible to detect this state on the Celery app side. What does Celery do when all connections of the broker are blocking / when the broker has issues?
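One way to spot this earlier from outside the app is to poll the RabbitMQ management plugin's HTTP API: its `/api/connections` endpoint reports a `state` field per connection (e.g. `running`, `blocking`, `blocked`). A minimal sketch, assuming the management plugin is enabled; host, port, and credentials are placeholders:

```python
import json
from base64 import b64encode
from urllib.request import Request, urlopen

def blocked_connections(connections):
    """Given the JSON list from the management API's /api/connections
    endpoint, return the connections that are blocked or blocking."""
    return [c for c in connections if c.get("state") in ("blocked", "blocking")]

def fetch_connections(host="localhost", port=15672, user="guest", password="guest"):
    """Fetch all broker connections via the management HTTP API."""
    req = Request(f"http://{host}:{port}/api/connections")
    token = b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    conns = fetch_connections()
    bad = blocked_connections(conns)
    if bad:
        print(f"warning: {len(bad)} of {len(conns)} connections are blocked/blocking")
```

Running this from cron or your monitoring system would let you alert before all workers stall, independently of what Celery itself reports.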

Related

Is there a redis pub/sub replacement option, with high availability and redundancy, or, probably p2p messaging?

I have an app with hundreds of horizontally scaled servers which uses redis pub/sub, and it works just fine.
The redis server is a central point of failure. Whenever redis fails (and it does happen sometimes), our application falls into an inconsistent state and has to go through a recovery process, which takes time. During recovery the entire app is barely usable.
Is there any messaging system/framework option, similar to redis pub/sub, but with redundancy and high availability so that if one instance fails, other will continue to deliver the messages exchanged between application hosts?
Or, better, is there any distributed messaging system in which app instances exchange the messages in a peer-to-peer manner, so that there is no single point of failure?

How to track celery and rabbitmq in production server

I have installed both Celery and RabbitMQ. Now I would like to track how many messages are in the queue and how they are distributed, see the list of Celery consumers and the tasks they are executing, and so on. This is because I had issues with Celery getting stuck under memory pressure. For a start I tried installing the RabbitMQ management plugin, but when I tried to log in at myservr.com:15672 it said it can only be used through localhost. Is there any workaround? Also, is it a good idea to run such monitoring on production servers? Is there any chance of memory leaks?
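The "can only be used through localhost" message is the management plugin refusing remote logins for the default `guest` user, which is restricted to loopback. A common workaround is to create a dedicated user for monitoring; a sketch, where the user name and password are placeholders:

```shell
# The default "guest" user may only log in from localhost.
# Create a separate user with the "monitoring" tag for remote access:
rabbitmqctl add_user monitor 'choose-a-strong-password'
rabbitmqctl set_user_tags monitor monitoring

# Grant read-only access on the default vhost so queues are visible:
rabbitmqctl set_permissions -p / monitor "" "" ".*"
```

The management plugin itself is commonly run on production nodes; its overhead comes mostly from stats collection, which can be tuned down via the stats collection interval if it becomes a concern.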

how to resolve "connection.blocked: true" in capabilities on the RabbitMQ UI

"rabbitmqctl list_connections" shows as running but on the UI in the connections tab, under client properties, i see "connection.blocked: true".
I can see that messages are in queued in RabbitMq and the connection is in idle state.
I am running Airflow with Celery. My jobs are not executing at all.
Is this the reason why jobs are not executing?
How to resolve the issue so that my jobs start running
I'm experiencing the same kind of issue just using Celery.
It seems that when you have a lot of messages in the queue, and these are fairly chunky, and your node's memory usage climbs, the RabbitMQ memory watermark is exceeded. This blocks consumer connections, so no worker can access that node (and its queues).
At the same time, publishers keep happily sending messages via the exchange, so you end up in a lose-lose situation.
The only solution we found is to avoid hitting that memory watermark: scale up the number of consumers, and keep messages/tasks lean so that the signature is KB, not MB.
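For reference, the watermark can be inspected and raised without restarting the node; the default is 0.4 of detected RAM. A sketch (the 0.6 value is only an example, and raising the limit buys headroom rather than fixing the backlog):

```shell
# Raise the memory high watermark at runtime (resets on node restart):
rabbitmqctl set_vm_memory_high_watermark 0.6

# To make it permanent, set it in rabbitmq.conf instead:
#   vm_memory_high_watermark.relative = 0.6
```

The durable fix is still draining queues faster than publishers fill them; the watermark change just delays the point at which consumers get blocked.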

activemq tuning for 20000 threads

I have a running ActiveMQ instance that 20000+ servers connect to at the same time over the STOMP port to publish and consume messages. The ActiveMQ server has 8 CPUs and 32G of memory. I have assigned the JVM a max heap of -Xmx16384m. But still, when all the servers are connected to this ActiveMQ, the server gets overloaded: virtual memory usage is about 21G and CPU utilization sometimes reaches about 500%.
I'm not sure whether the JVM really uses that much or some other process in this ActiveMQ setup is responsible; I have tried many tunings with no improvement.
Maybe you should reconsider the architecture. If you really need that many servers, you may want to try a non-blocking messaging bus, like ActiveMQ Artemis. I don't know for sure how many STOMP clients it will support under your setup, but it's worth a try. Keeping that many clients as separate threads has a huge memory footprint, and I think Artemis will handle such cases better. Not sure about STOMP, though.
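Before switching brokers, it may also be worth checking which STOMP transport connector is configured: ActiveMQ Classic's plain connectors use roughly a thread per connection, while the NIO variants multiplex many connections over a small thread pool. A sketch of the relevant `conf/activemq.xml` fragment, assuming the stock STOMP port:

```xml
<!-- Inside the <broker> element: use the NIO variant of the STOMP
     transport so 20000+ connections don't each pin a thread. -->
<transportConnectors>
  <transportConnector name="stomp+nio" uri="stomp+nio://0.0.0.0:61613"/>
</transportConnectors>
```

With thread-per-connection transports, each thread's stack alone (often around 512KB-1MB by default) can account for a large share of the virtual memory you're seeing.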

Fail-over for Redis replica set, Celery

We're running a Flask application and do all our heavy processing with Celery. We use a Redis instance from Amazon as our message broker. We just had a failure, causing much pain and bleeding, so we're looking into fail-over strategies.
The first project we came across was celery-redis-sentinel: https://github.com/dealertrack/celery-redis-sentinel
Would this be something that would give us a fail-over capability?
We've been doing some tests, and it seems not to be working as anticipated.
In your case, maybe moving the Celery backend to RabbitMQ would be better, as RabbitMQ is a lot more persistent with its data.
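Note that newer Celery versions (4+) support Redis Sentinel natively through kombu, without the celery-redis-sentinel package. A configuration sketch, where the sentinel host names and the master name "mymaster" are placeholders for your own setup:

```python
# Celery settings fragment: point the broker at the Sentinel nodes
# instead of a single Redis host. On master failover, kombu asks the
# sentinels for the new master.
broker_url = (
    "sentinel://sentinel-1:26379;"
    "sentinel://sentinel-2:26379;"
    "sentinel://sentinel-3:26379"
)
# Tell the transport which monitored master set to follow:
broker_transport_options = {"master_name": "mymaster"}
```

This only covers broker fail-over; tasks that were mid-flight on the failed master can still be lost unless they are acknowledged late and retried.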