How to track Celery and RabbitMQ on a production server

I have installed both Celery and RabbitMQ. Now I would like to track how many messages are in the queue and how they are distributed, see the list of Celery consumers and the tasks they are executing, and so on. This is because I had issues with Celery getting stuck when there was memory pressure. I tried installing the RabbitMQ management plugin for a start, but when I tried to log in at myservr.com:15672 it said it can only be used through localhost. Is there any workaround? Also, is it a good idea to run such monitoring on production servers? Is there any chance of memory leaks?
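For what it's worth, a minimal sketch of how this could be watched from Python, assuming a local broker, the default guest/guest credentials, and a placeholder app module called myapp: Celery's inspect API reports what each worker is currently doing, and the management plugin's HTTP API on port 15672 reports per-queue message counts. (By default, RabbitMQ restricts the built-in guest user to connections from localhost, which is the error you are seeing; creating a dedicated monitoring user is the usual way around it.)

import requests
from celery import Celery

# Placeholder app name and broker URL; adjust to your setup.
app = Celery("myapp", broker="amqp://guest:guest@localhost:5672//")

# 1) Ask the Celery workers what they are doing.
insp = app.control.inspect(timeout=5)
print("active tasks:  ", insp.active())    # tasks currently executing, per worker
print("reserved tasks:", insp.reserved())  # tasks prefetched but not yet started
print("worker stats:  ", insp.stats())

# 2) Ask the RabbitMQ management HTTP API for queue depths.
resp = requests.get("http://localhost:15672/api/queues", auth=("guest", "guest"))
for q in resp.json():
    print(q["name"], "messages:", q.get("messages", 0), "consumers:", q.get("consumers", 0))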

Related

Which Exception should I get when RabbitMQ as a broker for Celery is out of memory?

I have an application which uses Celery with RabbitMQ as a broker.
Recently I've noticed that the application became slow. All RabbitMQ Connections had the state "blocking" and the used memory was pretty much at the high watermark.
After making sure that RabbitMQ has enough memory, a couple of connections went to the "running" state and the overall system normalized.
Now I want to be able to recognize this earlier. While I will improve the monitoring/alerting for RabbitMQ itself, I was wondering if it's possible to detect this state on the Celery app side. What does Celery do when all connections of the broker are blocking / when the broker has issues?
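One way to detect this from the application side, sketched here assuming the management plugin is reachable and guest/guest credentials work (both placeholders), is to poll the management HTTP API: /api/nodes exposes a mem_alarm flag when a node hits its high watermark, and /api/connections exposes each connection's state. In practice a blocked broker connection tends to make publishes simply hang rather than raise a distinctive exception, which is why polling the broker directly is useful.

import requests

MGMT = "http://localhost:15672/api"   # placeholder management URL
AUTH = ("guest", "guest")             # placeholder credentials

# Has any node raised its memory alarm (i.e. hit the high watermark)?
for node in requests.get(MGMT + "/nodes", auth=AUTH).json():
    if node.get("mem_alarm"):
        print("%s hit the memory watermark: %s / %s bytes used"
              % (node["name"], node["mem_used"], node["mem_limit"]))

# Which connections has the broker blocked or throttled?
blocked = [
    c for c in requests.get(MGMT + "/connections", auth=AUTH).json()
    if c.get("state") in ("blocked", "blocking")
]
print(len(blocked), "blocked/blocking connections")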

How to resolve "connection.blocked: true" in capabilities on the RabbitMQ UI

"rabbitmqctl list_connections" shows as running but on the UI in the connections tab, under client properties, i see "connection.blocked: true".
I can see that messages are in queued in RabbitMq and the connection is in idle state.
I am running Airflow with Celery. My jobs are not executing at all.
Is this the reason why jobs are not executing?
How to resolve the issue so that my jobs start running
I'm experiencing the same kind of issue just using Celery.
It seems that when you have a lot of messages in the queue, and these are fairly chunky, and your node's memory usage goes high, the RabbitMQ memory watermark gets exceeded, and this blocks consumer connections, so no worker can access that node (and the related queues).
At the same time, publishers are happily sending stuff via the exchange, so you end up in a lose-lose situation.
The only solution we had was to avoid hitting that memory watermark and to scale up the number of consumers.
Keep messages/tasks lean so that the task signature is kilobytes, not megabytes.
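As a sketch of the "keep tasks lean" point, with hypothetical names throughout: instead of serializing a large object into the task message, pass only an identifier and reload the data inside the worker, so the message that sits in RabbitMQ stays tiny.

from celery import Celery

app = Celery("myapp", broker="amqp://guest:guest@localhost:5672//")  # placeholder

# Heavy: the whole report (possibly megabytes) travels through RabbitMQ.
@app.task
def process_report(report_data):
    render(report_data)                     # hypothetical processing function

# Lean: only a small ID travels through RabbitMQ; the worker reloads the data.
@app.task
def process_report_by_id(report_id):
    report_data = load_report(report_id)    # hypothetical DB/S3 lookup
    render(report_data)

# Caller side: process_report_by_id.delay(report_id) enqueues a few bytes,
# not the full payload.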

Fail-over for Redis replica set, Celery

We're running a Flask application and we do all our heavy processing with Celery. We use a Redis instance from Amazon as our message broker. We just had a failure, causing much pain and bleeding, so we're looking into fail-over strategies.
The first project we came across was celery-redis-sentinel: https://github.com/dealertrack/celery-redis-sentinel
Would this be something that would give us a fail-over capability?
We've been doing some tests, and it seems not to be working as anticipated.
In your case, maybe moving the Celery backend to RabbitMQ would be better, as RabbitMQ is a lot more persistent with its data.
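As a side note, newer Celery versions (4.x and later) support Redis Sentinel directly through Kombu's sentinel:// transport, so a separate package may not be needed. A sketch with placeholder Sentinel hosts and master name:

from celery import Celery

app = Celery("myapp")

# Placeholder Sentinel hosts; Kombu asks them which Redis master is current.
app.conf.broker_url = (
    "sentinel://sentinel-1.example.com:26379;"
    "sentinel://sentinel-2.example.com:26379;"
    "sentinel://sentinel-3.example.com:26379"
)
app.conf.broker_transport_options = {"master_name": "mymaster"}

# The result backend can point at the same Sentinel setup.
app.conf.result_backend = app.conf.broker_url
app.conf.result_backend_transport_options = {"master_name": "mymaster"}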

RabbitMQ creates a number of strange processes

I happened to find a number of strange processes created by RabbitMQ on my RabbitMQ server. I run the RabbitMQ server in a Docker container. I recreated the container, and hours later those processes appeared again. There are some consumers connecting to it. Any idea what those processes are for? Thanks!

Celery workers missing heartbeats and getting substantial drift on EC2

I am testing my Celery implementation over 3 EC2 machines right now. I am pretty confident in my implementation now, but I am getting problems with the actual worker execution. My test structure is as follows:
1 EC2 machine is designated as the broker and also runs a Celery worker.
1 EC2 machine is designated as the client (it runs the client Celery script that enqueues all the tasks using .delay()) and also runs a Celery worker.
1 EC2 machine is purely a worker.
All the machines have 1 Celery worker running. Before, I was immediately getting the message:
"Substantial drift from celery#[other ec2 ip] may mean clocks are out of sync."
A drift amount in seconds would then be printed, which would increase over time.
I would also get messages: "missed heartbeat from celery#[other ec2 ip]."
The machine would be doing very little work at this point, so my Auto Scaling configuration in EC2 would shut down the instance automatically once CPU utilization dropped very low (<5%).
So, to try to solve this problem, I attempted to sync all my machines' clocks (although I thought Celery handled this) with these commands, which were run on startup on all machines:
apt-get -qy install ntp
service ntp start
With this, they all performed well for about 10 minutes with no hitches, after which I started getting missed heartbeats and my EC2 instances stalled and shut down. The weird thing is that the drift sometimes increased and then decreased.
Any idea on why this is happening?
I am using the newest version of Celery (3.1) and RabbitMQ.
EDIT: It should be noted that I am using the us-west-1a and us-west-1c availability zones on EC2.
EDIT2: I am starting to think memory problems might be an issue. I am using a t2.micro instance, and running 3 Celery workers on the same machine (only 1 instance), which is also the broker, still causes heartbeat misses and stalls.
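For reference, a sketch of settings that can reduce broker chatter and memory use on very small instances, using the Celery 3.1-era uppercase setting names; the values are illustrative assumptions, not recommendations.

from celery import Celery

app = Celery("myapp", broker="amqp://guest:guest@broker-host:5672//")  # placeholder

app.conf.update(
    BROKER_HEARTBEAT=10,            # explicit AMQP heartbeat interval, in seconds
    CELERYD_CONCURRENCY=1,          # one worker process on a t2.micro-sized box
    CELERYD_PREFETCH_MULTIPLIER=1,  # don't reserve more tasks than can run now
)

# The clock-drift and missed-heartbeat warnings come from the worker-to-worker
# gossip/mingle events; they can be switched off at start-up if that chatter
# isn't needed:
#   celery -A myapp worker --without-gossip --without-mingle --without-heartbeat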