Processing of tasks in Celery workers is getting delayed - Redis

With Celery, I have created listeners on Redis to receive all write events. Based on those events, I trigger Celery tasks to migrate data from Redis to the DB.
I'm using the eventlet pool with a concurrency of 1000, and I have 5 Celery queues for processing my data.
celery -A proj worker -l info -P eventlet -c 1000 -Q event_queue,vap_queue,client_queue,group_queue,ap_queue
The problem I'm facing is that the listener receives all the write events from Redis and the workers receive tasks from the listener, but the Celery workers fall behind when processing a large volume of data (for example, around 800 tasks per 10 seconds for each queue).
I have tried increasing the concurrency to higher values, changing the pool from eventlet to gevent, and setting the prefetch multiplier to 1. Still, my workers take too long to complete tasks.
Can anyone help me solve this? I'm new to Celery, actually :)

Sometimes concurrency is not the main factor in how quickly these tasks are consumed.
In fact, too much concurrency can lead to many context switches and slow things down. Monitor your server's CPU and memory to check whether they are being overwhelmed by the tasks, and find an optimum number.
For CPU-bound tasks I would prefer more worker processes over concurrent threads; for I/O-bound tasks you can use concurrent threads.
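As an illustration of the two approaches, a minimal sketch (the proj module, queue names, and concurrency numbers are placeholders taken from the question, not recommendations):
# CPU-bound tasks: prefork pool, roughly one process per core
celery -A proj worker -P prefork -c 8 -Q event_queue
# I/O-bound tasks: eventlet pool with many lightweight green threads
celery -A proj worker -P eventlet -c 200 -Q vap_queue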

Related

How does concurrency per queue work in Sidekiq?

If I have two queues, default and critical, and concurrency set to 10, does that mean that each of the individual queues is going to run 10 concurrent threads to execute the jobs that are queued up in the Redis database?
Or maybe threads is the wrong terminology to use here?
Each Sidekiq process will create 10 threads. Concurrency is how many jobs each process can work on at the same time.
A queue is an array of jobs in Redis.
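For illustration, a single Sidekiq process consuming both queues could be started like this (queue names from the question; the concurrency value is just an example):
sidekiq -c 10 -q critical -q default
Here -c sets the number of threads for the whole process, shared across both queues, not 10 threads per queue.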

Celery multiple workers but one queue

I am new to Celery and Redis.
I started up my Redis server by using redis-server.
Celery was run with this command:
celery -A proj worker
There are no other configurations. However, I realised that when I have a long-running job in Celery, it does not process another task from the queue until the long-running task is completed. My understanding is that since I have 8 cores on my CPU, I should be able to process 8 tasks concurrently, since the default value for -c is the number of cores?
Am I missing something here?
Your problem is a classic one; everybody who has long-running tasks runs into it.
The root cause is that Celery tries to optimize your execution flow by reserving a number of tasks for each worker. But if one of those tasks is long-running, the others get locked behind it. This is known as the 'prefetch count', and it exists because by default Celery is tuned for short tasks.
Another related setting is 'late ack'. By default a worker takes a task from the queue and immediately sends an 'acknowledge' signal, after which the broker removes the task from the queue. This also means more messages will be prefetched for that worker. Enabling 'late ack' tells the worker to send the acknowledgement only after the task is completed.
That is just a brief summary; you can read more about prefetching and late acknowledgment in the Celery documentation.
As for the solution - just use these settings (Celery 4.x):
task_acks_late = True
worker_prefetch_multiplier = 1
or for previous versions (2.x - 3.x):
CELERY_ACKS_LATE = True
CELERYD_PREFETCH_MULTIPLIER = 1
Also, starting the worker with the -Ofair option has a similar effect.
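As a sketch of how these settings can be applied in code for Celery 4.x (the app name and broker URL are placeholders):
from celery import Celery

app = Celery('proj', broker='redis://localhost:6379/0')

# acknowledge only after the task has finished
app.conf.task_acks_late = True
# prefetch a single task per worker process/thread
app.conf.worker_prefetch_multiplier = 1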

How to prevent IronWorker from enqueuing tasks of workers that are still running?

I have a worker whose runtime varies greatly, from 10 seconds up to an hour. I want to run this worker every five minutes. This is fine as long as the job finishes within five minutes. However, if the job takes longer, Iron.io keeps enqueuing the same task over and over, and a bunch of tasks of the same type accumulate while the worker is running.
Furthermore, it is crucial that the task does not run concurrently, so the max concurrency for this worker is set to one.
So my question is: Is there a way to prevent Iron.io from enqueuing tasks of workers that are still running?
Answering my own question.
According to Iron.io support it is not possible to prevent IronWorker from enqueuing tasks for workers that are still running. For cases like mine it is better to have master workers that do the scheduling, i.e. creating/enqueuing tasks from a script via one of the client libraries.
The best option is to enqueue the next task from the worker's own code. For example, your task runs for 10 seconds to 1 hour and enqueues itself at the end (as the last line of code). This prevents tasks from accumulating while the worker is running.
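A minimal sketch of that pattern in Python; enqueue_self is a hypothetical helper standing in for whatever call your IronWorker client library provides to queue a new task, not a real API name:
def do_the_work():
    # placeholder for the actual 10 sec - 1 hour job
    pass

def enqueue_self():
    # hypothetical helper: call your IronWorker client library here
    # to create/queue a new task for this same worker
    pass

if __name__ == '__main__':
    do_the_work()
    enqueue_self()  # last step: queue the next run only after this one has finished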

Fetching tasks from many queues

I have 2 types of tasks: one is generated by users and the other is created in huge batches. The tasks go to separate queues ("short" and "long").
When there are tasks in one queue (i.e. that huge batch), Celery fetches only those tasks, completely ignoring the other queue until the whole batch is done.
Example:
send 100 slow tasks to "long" queue
send 100 small tasks to "short" queue
send 100 slow tasks to "long" queue
send 100 small tasks to "short" queue
Celery behaviour:
process 100 tasks from "long" queue
process 100 tasks from "short" queue
process 100 tasks from "long" queue
process 100 tasks from "short" queue
That happens even when I set a rate_limit for the slow tasks that go to the "long" queue. All I get is the slow tasks blocking the system for longer :/
Is there a way to ensure that Celery fetches tasks from all queues? (I'm using Celery 2.5.1 with RabbitMQ)
You can launch a separate Celery worker for each queue:
$ celeryd -Q short
$ celeryd -Q long
In that case each type of task will be processed independently by its own worker.
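To make each task type actually land in its own queue, the routing can be configured roughly like this (Celery 2.x/3.x style settings; the task names are placeholders):
CELERY_ROUTES = {
    'tasks.slow_task': {'queue': 'long'},
    'tasks.fast_task': {'queue': 'short'},
}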

Simple queue with Celery and RabbitMQ

I'm trying to implement a simple queue that performs one task at a time. Offloading tasks from the main thread using Celery and setting concurrency=1 in the Celery config works fine, but I might want to use more concurrent workers for other tasks.
Is there a way to tell Celery or RabbitMQ to not use multiple concurrent workers for a task (except by forcing concurrency=1)? I can't find anything in the documentation but maybe these tools are not designed for a linear queue?
Thanks!
I think what you need is a separate queue for each type of task. Create separate workers that consume from each queue, and set concurrency to 1 on the worker for the queue that must process one task at a time.
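As a rough sketch, assuming a task module named tasks and a dedicated queue called serial (both names are placeholders):
# route the one-at-a-time task to its own queue
CELERY_ROUTES = {'tasks.linear_task': {'queue': 'serial'}}

# one worker drains the serial queue a single task at a time
celery -A proj worker -Q serial -c 1
# another worker handles the default queue with more concurrency
celery -A proj worker -Q celery -c 4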