I am new to Celery and Redis.
I started my Redis server by running redis-server.
Celery was started with this command:
celery -A proj worker
There is no other configuration. However, I noticed that when I have a long-running job in Celery, it does not process another task in the queue until the long-running task has completed. My understanding is that since I have 8 cores on my CPU, I should be able to process 8 tasks concurrently, since the default value for -c is the number of cores?
Am I missing something here?
This is a classic problem; everybody with long-running tasks runs into it.
The root cause is that Celery tries to optimize throughput by reserving (prefetching) several tasks for each worker in advance. If one of those tasks is long-running, the tasks reserved behind it are stuck until it finishes. How many tasks are reserved is controlled by the prefetch count, and the default is tuned for short tasks.
A related setting is 'late ack'. By default a worker takes a task from the queue and immediately sends an 'acknowledge' signal, and the broker then removes the task from the queue. Because the acknowledged task no longer counts against the prefetch limit, the worker is free to prefetch more messages. With 'late ack' enabled, the worker sends the acknowledgement only after the task has completed.
That is the short version. You can read more about prefetch limits and late ack in the Celery documentation.
As for the solution, use these settings (Celery 4.x):
task_acks_late = True
worker_prefetch_multiplier = 1
or for previous versions (2.x - 3.x):
CELERY_ACKS_LATE = True
CELERYD_PREFETCH_MULTIPLIER = 1
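Starting the worker with the -Ofair option helps with the same problem from another angle: it stops the prefork pool from handing tasks to child processes that are still busy. Since Celery 4.0 this scheduling strategy is the default.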
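To make the effect of prefetching concrete, here is a toy simulation in plain Python. It is a deliberately simplified model, not Celery's actual scheduler: the function name `finish_times` is made up for this sketch, each worker is assumed to run one task at a time, and a worker reserves up to `prefetch` tasks whenever it runs out of work.

```python
import heapq
from collections import deque


def finish_times(durations, workers, prefetch):
    """Return the finish time of each task in a toy prefetch model.

    Assumption: every worker runs one task at a time and, whenever it is
    idle, reserves up to `prefetch` tasks from the front of the queue and
    runs them in order. Reserved tasks cannot be taken by another worker.
    """
    pending = deque(enumerate(durations))
    finished = [None] * len(durations)
    # Heap of (time the worker becomes free, worker id).
    free_at = [(0, w) for w in range(workers)]
    heapq.heapify(free_at)
    while pending:
        now, worker = heapq.heappop(free_at)
        # Reserve a batch; everything in it waits behind earlier batch items.
        for _ in range(min(prefetch, len(pending))):
            index, duration = pending.popleft()
            now += duration
            finished[index] = now
        heapq.heappush(free_at, (now, worker))
    return finished


# One 60-second task followed by three 1-second tasks, two workers.
print(finish_times([60, 1, 1, 1], workers=2, prefetch=2))  # [60, 61, 1, 2]
print(finish_times([60, 1, 1, 1], workers=2, prefetch=1))  # [60, 1, 2, 3]
```

With a prefetch of 2, the second task is reserved by the worker that is busy with the long task and only finishes at t=61, even though the other worker is idle from t=2. With a prefetch of 1, the free worker picks up all the short tasks.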
Related
In Celery using RabbitMQ, I have distributed workers running very long tasks on individual ec2 instances.
My concurrency is set to 2, -Ofair is enabled, and both task_acks_late = True and worker_prefetch_multiplier = 1 are set. The Celery worker runs 2 tasks in parallel as expected, but then it grabs a third task and doesn't run it. This leaves other workers with no tasks to run.
What I would like is for workers to grab jobs only when they can actually work on them, so that other workers that are free can grab the remaining tasks and run them.
Does anyone know how to achieve the result that I'm looking for? Attached below is an example of my concurrency being 2, and there being three jobs on the worker, where one is not yet acknowledged. I would like for there to be only two tasks there, and the other remain on the server until another worker can start them.
With celery, I have created listeners to Redis for getting all write events to Redis. Based on the events, I will trigger celery tasks to migrate data from Redis to DB.
I'm using the eventlet pool along with concurrency of 1000. Also, I'm having 5 celery queues for processing my data.
celery -A proj worker -l info -P eventlet -c 1000 -Q event_queue,vap_queue,client_queue,group_queue,ap_queue
The problem is this: the listener receives all the write events from Redis and the workers receive the tasks from the listener, but the Celery workers fall behind when processing a large volume of data. (For example, I receive about 800 tasks per 10 seconds for each queue.)
I have tried increasing concurrency to higher values, changing the pool from eventlet to gevent, and setting the prefetch multiplier to 1. Still, my workers are slow to complete tasks.
Can anyone help me solve this? I'm new to Celery, actually :)
Sometimes concurrency is not the main factor in how quickly tasks are consumed and processed.
In fact, too much concurrency can cause a lot of context switching and slow things down. Monitor your server's CPU and memory to check that they are not being overwhelmed by the tasks, and find an optimum concurrency value from there.
For CPU-bound tasks I would prefer more worker processes over concurrent threads; for I/O-bound tasks concurrent threads work well.
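As a rough illustration of the I/O-bound case, the sketch below uses only the standard library rather than Celery's eventlet or gevent pools. The names `io_task` and `run_batch` are invented for this example, and `time.sleep` stands in for a network or database call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def io_task(delay):
    # Stand-in for an I/O-bound call; the thread is idle while it waits.
    time.sleep(delay)
    return delay


def run_batch(n_tasks, delay, max_workers):
    """Run n_tasks I/O-bound tasks on a thread pool; return (count, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(io_task, [delay] * n_tasks))
    return len(results), time.perf_counter() - start


if __name__ == "__main__":
    for threads in (1, 10):
        count, elapsed = run_batch(10, 0.05, threads)
        print(f"{threads:>2} thread(s): {count} tasks in {elapsed:.2f}s")
```

Threads help here because the tasks spend their time waiting. For CPU-bound work in CPython the same pool would give little speed-up, which is why separate worker processes are the usual choice there.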
I have a worker whose runtime varies greatly, from 10 seconds up to an hour. I want to run this worker every five minutes. This is fine as long as the job finishes within five minutes. However, if the job takes longer, Iron.io keeps enqueuing the same task over and over, and a bunch of tasks of the same type accumulate while the worker is running.
Furthermore, it is crucial that the task never runs concurrently, so max concurrency for this worker is set to one.
So my question is: Is there a way to prevent Iron.io from enqueuing tasks of workers that are still running?
Answering my own question.
According to Iron.io support it is not possible to prevent IronWorker from enqueuing tasks of workers that are still running. For cases like mine it is better to have master workers that do the scheduling, i.e. creating/enqueuing tasks from script via one of the client libraries.
The best option would be to enqueue the next task from the worker's own code. For example, your task runs for anywhere between 10 seconds and 1 hour and enqueues itself at the end (as its last line of code). This prevents tasks from accumulating while the worker is running.
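The self-enqueueing pattern can be sketched as follows. This is not the IronWorker client API: `task_queue`, `enqueue`, `sync_job` and `drain` are hypothetical names, an in-memory deque stands in for the hosted queue, and `max_runs` exists only so the demo terminates.

```python
from collections import deque

# Hypothetical in-memory stand-in for the hosted task queue.
task_queue = deque()


def enqueue(name, run_number):
    task_queue.append((name, run_number))


def sync_job(run_number, max_runs=3):
    # ... the real work (10 seconds to 1 hour) would happen here ...
    result = f"run {run_number} done"
    # Last step: schedule the next run only after this one has finished.
    if run_number < max_runs:
        enqueue("sync_job", run_number + 1)
    return result


def drain():
    """Process the queue like a single worker; track the largest backlog."""
    log, max_pending = [], 0
    enqueue("sync_job", 1)
    while task_queue:
        max_pending = max(max_pending, len(task_queue))
        _name, run_number = task_queue.popleft()
        log.append(sync_job(run_number))
    return log, max_pending


if __name__ == "__main__":
    print(drain())  # (['run 1 done', 'run 2 done', 'run 3 done'], 1)
```

Because the next run is queued only when the current one ends, there is never more than one pending task, no matter how long a run takes. In a real setup the job would probably enqueue its successor with a delay to keep roughly the five-minute cadence.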
I might be misunderstanding how this works (which is why I'm asking), but I think when a Celery worker consumes a task from RabbitMQ it puts a lock on it -- so to speak -- and then must acknowledge that it completed the task once it's done. So say I have 4 workers, all with the prefetch setting at 1, and a queue of 6 tasks which take a long time. Once I start those workers and run:
rabbitmqctl -q list_queues name messages messages_ready messages_unacknowledged
I'd expect to see something like:
celery 6 2 4
indicating that 4 tasks are running (but not yet acknowledged) and 2 are ready to be consumed.
I think my understanding is wrong because what I actually see is:
celery 2 0 2
So it's as if the acknowledging happens when a message is received by a worker, but before that worker finishes processing that task.
So to sum up, my question is, when does a celery worker acknowledge it has a task? It seems like it's once it receives that task and starts working on it, not when it completes working on it. Can someone confirm?
This is mentioned in the FAQ, but I can't blame you for not finding it:
http://docs.celeryproject.org/en/latest/faq.html#should-i-use-retry-or-acks-late
Early ack is the default behavior because we don't want to force users to write idempotent tasks.
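The two sets of numbers in the question line up with late and early acknowledgement respectively. The sketch below is a back-of-the-envelope model, not how RabbitMQ or Celery compute anything: `queue_counts` is a made-up helper, and it assumes each worker runs a single task at a time.

```python
def queue_counts(tasks, workers, prefetch_limit, acks_late):
    """Return (messages, messages_ready, messages_unacknowledged).

    Toy model: each worker executes one task at a time and may hold at
    most `prefetch_limit` unacknowledged messages.
    """
    if acks_late:
        # Running tasks stay unacknowledged until they finish, so they
        # use up the prefetch window themselves.
        unacked = min(tasks, workers * prefetch_limit)
        ready = tasks - unacked
    else:
        # Tasks are acknowledged when they start and leave the queue,
        # which frees the window for the worker to reserve more.
        running = min(tasks, workers)
        remaining = tasks - running
        unacked = min(remaining, workers * prefetch_limit)
        ready = remaining - unacked
    return (ready + unacked, ready, unacked)


print(queue_counts(6, 4, 1, acks_late=True))   # (6, 2, 4)
print(queue_counts(6, 4, 1, acks_late=False))  # (2, 0, 2)
```

The second line matches what the question observed: four tasks have already been acknowledged and are running, and the two left over have been reserved by workers but not started.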
I'm trying to implement a simple queue that performs one task at a time. Offloading tasks off the main thread using Celery and setting concurrency=1 in the Celery config works fine, but I might want to use more concurrent workers for other tasks.
Is there a way to tell Celery or RabbitMQ to not use multiple concurrent workers for a task (except by forcing concurrency=1)? I can't find anything in the documentation but maybe these tools are not designed for a linear queue?
Thanks!
I think what you need is a separate queue for each type of task. Create separate workers that consume from each queue, with concurrency set to 1.
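A minimal configuration sketch of that idea, assuming Celery 4.x lowercase setting names. The task name `proj.tasks.generate_report` and the queue name `serial` are placeholders, not anything from the question.

```python
# celeryconfig.py (sketch)

# Send the task that must run one at a time to its own queue.
# Tasks without a route go to the default queue, which is named "celery".
task_routes = {
    "proj.tasks.generate_report": {"queue": "serial"},  # placeholder task name
}

# Then start one worker per queue, for example:
#   celery -A proj worker -Q serial -c 1    # strictly one task at a time
#   celery -A proj worker -Q celery -c 8    # everything else, in parallel
```

Only one worker should consume from the `serial` queue. A second worker on that queue would bring back concurrency for those tasks.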