Celery workers stalled on boot - rabbitmq

We boot up a cluster of 250 worker nodes in AWS at night to handle some long-running distributed tasks.
The worker nodes are running celery with the following command:
celery -A celery_worker worker --concurrency=1 -l info -n background_cluster.i-1b1a0dbb --without-heartbeat --without-gossip --without-mingle -- celeryd.prefetch_multiplier=1
We are using rabbitmq as our broker, and there is only 1 rabbitmq node.
About 60% of our nodes claim to be listening, but will not pick up any tasks.
Their logs look like this:
-------------- celery@background_cluster.i-1b1a0dbb v3.1.18 (Cipater)
---- **** -----
--- * *** * -- Linux-3.2.0-25-virtual-x86_64-with-Ubuntu-14.04-trusty
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: celery_worker:0x7f10c2235cd0
- ** ---------- .> transport: amqp://guest:**@localhost:5672//
- ** ---------- .> results: disabled
- *** --- * --- .> concurrency: 1 (prefork)
-- ******* ----
--- ***** ----- [queues]
-------------- .> background_cluster exchange=root(direct) key=background_cluster
[tasks]
. more.celery_worker.background_cluster
[2015-10-10 00:20:17,110: WARNING/MainProcess] celery@background_cluster.i-1b1a0dbb
[2015-10-10 00:20:17,110: WARNING/MainProcess] consuming from
[2015-10-10 00:20:17,110: WARNING/MainProcess] {'background_cluster': <unbound Queue background_cluster -> <unbound Exchange root(direct)> -> background_cluster>}
[2015-10-10 00:20:17,123: INFO/MainProcess] Connected to amqp://our_server:**@10.0.11.136:5672/our_server
[2015-10-10 00:20:17,144: WARNING/MainProcess] celery@background_cluster.i-1b1a0dbb ready.
However, rabbitmq shows that there are messages waiting in the queue.
If I login to any of the worker nodes and issue this command:
celery -A celery_worker inspect active
...then every (previously stalled) worker node immediately grabs a task and starts cranking.
Any ideas as to why?
Might it be related to these switches?
--without-heartbeat --without-gossip --without-mingle

It turns out that this was a bug in celery where using --without-gossip kept events from draining. Celery's implementation of gossip is pretty new, and it apparently implicitly takes care of draining events, but when you turn it off things get a little wonky.
The details of the issue are outlined in this GitHub issue: https://github.com/celery/celery/issues/1847
Master currently has the fix in this PR: https://github.com/celery/celery/pull/2823
So you can solve this one of three ways:
Use gossip (remove --without-gossip)
Patch your version of celery with https://github.com/celery/celery/pull/2823.patch
Use a cron job to run celery inspect active regularly (a minimal sketch follows)
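For the cron-job workaround, a rough sketch might look like the crontab entry below (the 5-minute interval and the celery binary path are placeholders; the app module celery_worker is taken from the question):

# poke the workers periodically so that pending broker events get drained
*/5 * * * * /usr/local/bin/celery -A celery_worker inspect active > /dev/null 2>&1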

Related

RabbitMQ Ack Timeout

I'm using the RPC pattern for processing my objects with RabbitMQ.
As you'd suspect, I have an object, I want its processing to finish, and after that I want to send an ack to the RPC client.
The ack has a default timeout of about 3 minutes, but my processing takes a long time.
How can I change this timeout for the ack of each object, or what should I do to handle long-running processes like these?
Modern versions of RabbitMQ have a delivery acknowledgement timeout:
In modern RabbitMQ versions, a timeout is enforced on consumer delivery acknowledgement. This helps detect buggy (stuck) consumers that never acknowledge deliveries. Such consumers can affect a node's on-disk data compaction and potentially drive nodes out of disk space.
If a consumer does not ack its delivery for more than the timeout value (30 minutes by default), its channel will be closed with a PRECONDITION_FAILED channel exception. The error will be logged by the node that the consumer was connected to.
The error message will be:
Channel error on connection <####> :
operation none caused a channel exception precondition_failed: consumer ack timed out on channel 1
The timeout is 30 minutes (1,800,000 ms) by default (note 1) and is configured by the consumer_timeout parameter in rabbitmq.conf.
Note 1: The timeout was 15 minutes (900,000 ms) before RabbitMQ 3.8.17.
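As a sketch, raising the timeout in rabbitmq.conf might look like this (the two-hour value is only an example, not a recommendation):

# rabbitmq.conf
# consumer delivery acknowledgement timeout, in milliseconds (2 hours here)
consumer_timeout = 7200000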
If you run RabbitMQ in Docker, you can declare a volume for the rabbitmq.conf file, then create that file on the host and set consumer_timeout in it (as in the snippet above).
For example:
docker-compose:
version: "2.4"
services:
  rabbitmq:
    image: rabbitmq:3.9.13-management-alpine
    network_mode: host
    container_name: 'your-container-name'
    ports:
      - 5672:5672
      - 15672:15672   # only needed if you use the management GUI
    volumes:
      - /etc/rabbitmq/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
And you need to create the rabbitmq.conf file on your server at /etc/rabbitmq/.
Documentation with the available parameters: https://github.com/rabbitmq/rabbitmq-server/blob/v3.8.x/deps/rabbit/docs/rabbitmq.conf.example

Celery workers not working with redis

I tried to start Celery, which consumes data from Redis. It shows that Celery has started (though I'm not sure), but the service could not spawn the worker pools.
The message below shows the status as ACTIVE:
celeryd.service - LSB: celery task worker daemon
   Loaded: loaded (/etc/init.d/celeryd; bad; vendor preset: enabled)
   Active: active (exited) since Wed 2017-09-20 13:40:12 UTC; 7min ago
     Docs: man:systemd-sysv-generator(8)
But still no workers are created; I checked the running processes. Previously it was working fine with the current configuration, so the configuration should be correct, but when I restarted the application this issue started to occur.
My second question: what does bad mean in the highlighted part of the line below?
Loaded: loaded (/etc/init.d/celeryd; ***bad***; vendor preset: enabled)
Please reply with your suggestions/answers. Thanks in advance.

IO thread error : 1595 (Relay log write failure: could not queue event from master)

Slave status :
Last_IO_Errno: 1595
Last_IO_Error: Relay log write failure: could not queue event from master
Last_SQL_Errno: 0
From the error log:
[ERROR] Slave I/O for channel 'db12': Unexpected master's heartbeat data: heartbeat is not compatible with local info; the event's data: log_file_name toku10-bin.000063<D1> log_pos 97223067, Error_code: 1623
[ERROR] Slave I/O for channel 'db12': Relay log write failure: could not queue event from master, Error_code: 1595
I tried restarting the slave_io thread many times; it's still the same.
We have to keep starting the io_thread manually whenever it stops; I suspect it's a bug in Percona.
I have simply written a shell script and scheduled it every 10 minutes to check whether the io_thread is running and, if not, issue start slave io_thread for channel 'db12';. It's working as of now (a rough sketch follows).
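A minimal sketch of such a watchdog (the script path, the use of ~/.my.cnf for credentials, and the crontab entry are assumptions for illustration, not taken from the original answer):

#!/bin/bash
# check_io_thread.sh -- restart the replication IO thread for channel 'db12' if it has stopped
# assumes the mysql client can authenticate via ~/.my.cnf
IO_RUNNING=$(mysql -e "SHOW SLAVE STATUS FOR CHANNEL 'db12'\G" | awk '/Slave_IO_Running:/ {print $2}')
if [ "$IO_RUNNING" != "Yes" ]; then
    mysql -e "START SLAVE IO_THREAD FOR CHANNEL 'db12';"
fi

Scheduled from cron every 10 minutes:
*/10 * * * * /usr/local/bin/check_io_thread.sh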

storm-YARN : Topology submitted but no supervisor nodes assigned

I'm running WordCount storm TOPOLOGY (https://github.com/nathanmarz/storm-starter/).
I'm launching this topology via storm yarn, but it's not running any worker nodes. In the storm ui I cannot see any supervisor nodes assigned. In my code I've configured to start three worker nodes.
Do we need to do something else to get the supervisors started?
I also looked at nimbus.log, stderr, stdout, and ui.log; there are no signs of error.
storm jar target/storm-starter-0.9.3-rc2-SNAPSHOT-jar-with-dependencies.jar storm.starter.WordCountTopology TEST-WC-TOPO
1045 [main] INFO backtype.storm.StormSubmitter - Successfully uploaded topology jar to assigned location: storm-local/nimbus/inbox/stormjar-4a08ebf7-022c-4b22-86be-1efc6e8f6be0.jar
1046 [main] INFO backtype.storm.StormSubmitter - Submitting topology WCT in distributed mode with conf {"topology.workers":3,"topology.debug":true}
1438 [main] INFO backtype.storm.StormSubmitter - Finished submitting topology: WCT

How can I get the result of periodic task scheduling?

Hey guys, I am new to Celery. I am working on periodic task scheduling. I have configured my celeryconfig.py as follows:
from datetime import timedelta
BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = "redis"
CELERY_REDIS_HOST = "localhost"
CELERY_REDIS_PORT = 6379
CELERY_REDIS_DB = 0
CELERY_IMPORTS = ("mytasks",)
CELERYBEAT_SCHEDULE = {
    'runs-every-60-seconds': {
        'task': 'mytasks.add',
        'schedule': timedelta(seconds=60),
        'args': (16, 16),
    },
}
and mytasks.py as follows:
from celery import Celery

celery = Celery("tasks",
                broker='redis://localhost:6379/0',
                backend='redis')

@celery.task
def add(x, y):
    return x + y

@celery.task
def mul(x, y):
    return x * y
When I run
celery beat -s celerybeat-schedule
then I get:
Configuration ->
. broker -> redis://localhost:6379/0
. loader -> celery.loaders.default.Loader
. scheduler -> celery.beat.PersistentScheduler
. db -> celerybeat-schedule
. logfile -> [stderr]@INFO
. maxinterval -> now (0s)
[2012-08-28 12:27:17,825: INFO/MainProcess] Celerybeat: Starting...
[2012-08-28 12:28:00,041: INFO/MainProcess] Scheduler: Sending due task mytasks.add
[2012-08-28 12:29:00,057: INFO/MainProcess] Scheduler: Sending due task mytasks.add
[2012-08-28 12:30:00,064: INFO/MainProcess] Scheduler: Sending due task mytasks.add
[2012-08-28 12:31:00,097: INFO/MainProcess] Scheduler: Sending due task mytasks.add
Now what I don't get is: I have passed the arguments (16, 16), so how can I get the answer of this add(x, y) function?
I'm not sure I quite understand what you have asked, but from what I can tell, your issue may be one of the following:
1) Are you running celeryd (the worker daemon)? If not, did you start a celery worker in a terminal? Celery beat is a task scheduler. It is not a worker. Celerybeat only schedules the tasks (i.e. places them in a queue for a worker to eventually consume).
2) How did you plan on viewing the results? Are they being saved somewhere? Since you have set your result backend to redis, the results are at least temporarily stored in the Redis result backend (a small sketch of starting a worker and fetching a result follows).
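A rough sketch, assuming the mytasks module and Redis backend from the question (the task id is a placeholder; a real one comes from delay()/apply_async() or from the worker logs):

# start a worker in another terminal so the scheduled tasks actually get executed:
#   celery -A mytasks worker -l info
from mytasks import celery  # the Celery app defined in mytasks.py

result = celery.AsyncResult('some-task-id')  # placeholder task id
if result.ready():
    print(result.get())  # -> 32 for add(16, 16)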