How does concurrency per queue work in Sidekiq? - redis

If I have two queues, default and critical, and concurrency set to 10, does that mean each individual queue will run 10 concurrent threads to execute the jobs queued up in Redis?
Or maybe threads is the wrong terminology to use here?

Each Sidekiq process will create 10 threads. Concurrency is how many jobs each process can work on at the same time, across all of the queues it listens to; threads are not allocated per queue.
A queue is simply a list of jobs in Redis.
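A minimal Python sketch of that model, assuming one worker process with a fixed pool of 10 threads that all pull from the same set of queues, checked in a fixed priority order (the ordering rule is an assumption for illustration). This is not Sidekiq's code; it only shows that two queues share the 10 threads rather than getting 10 each. `fetch`, `run_process` and `QUEUE_NAMES` are hypothetical names.

```python
import queue
import threading
import time

CONCURRENCY = 10                       # threads per process, shared by all queues
QUEUE_NAMES = ["critical", "default"]  # assumed strict priority order


def fetch(queues):
    """Return (queue_name, job) from the first non-empty queue, or None."""
    for name in QUEUE_NAMES:
        try:
            return name, queues[name].get_nowait()
        except queue.Empty:
            continue
    return None


def run_process(queues, results):
    """Run CONCURRENCY threads over all queues; return peak simultaneous jobs."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        while True:
            item = fetch(queues)
            if item is None:
                return  # every queue is empty, so this thread stops
            name, job = item
            with lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)  # pretend to do the job's work
            with lock:
                active -= 1
                results.append((name, job))

    threads = [threading.Thread(target=worker) for _ in range(CONCURRENCY)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    queues = {name: queue.Queue() for name in QUEUE_NAMES}
    for i in range(15):
        queues["critical"].put(i)
        queues["default"].put(i)
    results = []
    peak = run_process(queues, results)
    print(len(results), "jobs done, at most", peak, "at once")
```

Thirty jobs across two queues are processed, but never more than 10 at a time in total, which is the per-process meaning of concurrency.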


RQ Worker Processing Jobs in a Batch

Say you have an RQ queue with lots of jobs, which gets filled from various sources.
Those jobs would be processed more efficiently in batches, e.g. pulling and processing 100 jobs at a time.
How would you achieve this in RQ?
Would you need to write a custom Worker class to pull multiple jobs at once, or a custom Queue Class to batch jobs when they are given out, or some other approach?
Thanks
I think TaskTiger, which like RQ is based on Redis, can better fulfill your needs. From the README:
Batch queues
Batch queues can be used to combine multiple queued tasks into one. That way, your task function can process multiple sets of arguments at the same time, which can improve performance. The batch size is configurable.
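Whatever library you use, the core of batching is a worker that takes up to N jobs at once and hands them to a single function. A minimal Python sketch, with an in-memory `queue.Queue` standing in for the Redis-backed queue; `take_batch`, `process_batch` and `drain` are hypothetical helpers, not part of RQ or TaskTiger.

```python
import queue


def take_batch(q, batch_size):
    """Pull up to batch_size jobs without blocking; may return fewer."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def process_batch(jobs):
    # Hypothetical bulk handler: one call covers many jobs (e.g. a bulk insert).
    return sum(jobs)


def drain(q, batch_size=100):
    """Process the queue batch by batch; return the size of each batch."""
    sizes = []
    while True:
        batch = take_batch(q, batch_size)
        if not batch:
            break
        process_batch(batch)
        sizes.append(len(batch))
    return sizes


if __name__ == "__main__":
    q = queue.Queue()
    for i in range(250):
        q.put(i)
    print(drain(q, 100))  # [100, 100, 50]
```

Taking whatever is available instead of waiting for a full batch keeps latency low when the queue is nearly empty; the last batch is simply smaller.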

Processing of tasks in celery workers are getting delayed

Using Celery, I have created listeners on Redis that receive all write events. Based on those events, I trigger Celery tasks that migrate data from Redis to the DB.
I'm using the eventlet pool with a concurrency of 1000, and I have 5 Celery queues for processing my data:
celery -A proj worker -l info -P eventlet -c 1000 -Q event_queue,vap_queue,client_queue,group_queue,ap_queue
The problem is this: the listener receives all the write events from Redis, and the workers receive the tasks from the listener, but the Celery workers fall behind when processing a large amount of data. (For example, I receive about 800 tasks every 10 seconds for each queue.)
I have tried increasing concurrency to higher values, switching the pool from eventlet to gevent, and setting the prefetch multiplier to 1. My workers are still slow to complete tasks.
Can anyone help me solve this? I'm new to Celery :)
Sometimes concurrency is not the main factor in how fast tasks are consumed.
In fact, too much concurrency can lead to many context switches and slow things down. Monitor your server's CPU and memory to check that they are not being overwhelmed by the tasks, and find an optimum number.
For CPU-bound tasks, I would prefer more worker processes over concurrent threads; for I/O-bound tasks, you can use concurrent threads.
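To illustrate the CPU-bound versus I/O-bound advice in plain Python (not Celery): processes sidestep the GIL for CPU-heavy work, while threads are cheap for work that mostly waits. In Celery terms this is roughly the choice between the default prefork pool and eventlet/gevent, but `make_executor` and `fake_io_task` are hypothetical names and the pool sizes are assumptions.

```python
import os
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Optional


def make_executor(kind: str, workers: Optional[int] = None) -> Executor:
    """Pick a pool by workload type (hypothetical helper, not a Celery API).

    cpu: one process per core, so work runs truly in parallel and extra
         workers do not just add context switches.
    io:  many threads, because they spend most of their time waiting.
    """
    if kind == "cpu":
        return ProcessPoolExecutor(max_workers=workers or os.cpu_count())
    if kind == "io":
        return ThreadPoolExecutor(max_workers=workers or 32)
    raise ValueError(f"unknown workload kind: {kind!r}")


def fake_io_task(i: int) -> int:
    time.sleep(0.05)  # stands in for a Redis read or a DB write
    return i


if __name__ == "__main__":
    with make_executor("io", 20) as pool:
        start = time.monotonic()
        out = list(pool.map(fake_io_task, range(20)))
        print(out[:3], f"{time.monotonic() - start:.2f}s")
```

Twenty waiting tasks on twenty threads finish in roughly the time of one, because they overlap while sleeping; doing the same with CPU-heavy work on threads would not speed it up in CPython.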

How to prevent ironworker from enqueuing tasks of workers that are still running?

I have a worker whose runtime varies greatly, from 10 seconds up to an hour. I want to run this worker every five minutes. This is fine as long as the job finishes within five minutes. However, if the job takes longer, Iron.io keeps enqueuing the same task over and over, and a pile of tasks of the same type accumulates while the worker is running.
Furthermore, it is crucial that the task never runs concurrently, so the max concurrency for this worker is set to one.
So my question is: Is there a way to prevent Iron.io from enqueuing tasks of workers that are still running?
Answering my own question.
According to Iron.io support, it is not possible to prevent IronWorker from enqueuing tasks of workers that are still running. For cases like mine, it is better to have master workers that do the scheduling, i.e. create/enqueue tasks from a script via one of the client libraries.
The best option would be to enqueue the next task from the worker's own code. For example, if your task runs for anywhere from 10 seconds to 1 hour, have it enqueue itself at the end (on its last line of code). This prevents tasks from accumulating while the worker is running.
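A minimal sketch of the self-enqueuing pattern, assuming a client that can enqueue a task with a delay. `FakeScheduler`, `enqueue` and `run_job` are hypothetical stand-ins, not the IronWorker client library: the point is only that the next run is queued by the current run when it finishes, so runs cannot overlap or pile up.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

INTERVAL = 300  # seconds; roughly "every five minutes"


@dataclass
class FakeScheduler:
    """Hypothetical in-memory stand-in for a task queue client."""
    pending: List[Tuple[float, str]] = field(default_factory=list)

    def enqueue(self, task_name: str, delay: float) -> None:
        # Record when the task should run next and which task it is.
        self.pending.append((time.time() + delay, task_name))


def run_job(scheduler: FakeScheduler, work: Callable[[], None]) -> None:
    try:
        work()  # may take 10 seconds or an hour
    finally:
        # Last step: queue the next run only after this one has ended,
        # so at most one copy of the task is ever waiting or running.
        scheduler.enqueue("run_job", delay=INTERVAL)
```

The `finally` keeps the chain alive even if a run raises; if a failed run should stop the schedule instead, move the `enqueue` call to the line after `work()`.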

celeryev Queue in RabbitMQ Becomes Very Large

I am using Celery on RabbitMQ. I have been sending thousands of messages to the queue, and they are being processed successfully; everything is working just fine. However, the number of messages in several RabbitMQ queues is growing quite large (hundreds of thousands of items in the queue). The queues are named celeryev.[...] (see screenshot below). Is this appropriate behavior? What is the purpose of these queues, and shouldn't they be regularly purged? Is there a way to purge them more regularly? I think they are taking up quite a bit of disk space.
You can use the CELERY_EVENT_QUEUE_TTL Celery option (works only with amqp), which sets the message expiry time after which a message is deleted from the queue.
For anyone else who is running into problems with a celeryev queue becoming very large and threatening the disk space on your rabbitmq server, beware the accepted answer! Here's my suggestion. Just issue this command on your rabbitmq instance:
rabbitmqctl set_policy limit_celeryev_queues "^celeryev\." '{"max-length":1000000}' --apply-to queues
This will limit any queue beginning with "celeryev" to 1 million entries. I did some experimenting with a stuck flower instance causing a runaway celeryev queue, and setting CELERY_EVENT_QUEUE_TTL / CELERY_EVENT_QUEUE_EXPIRES did not help control the queue size.
In my testing, I started a flower process, then SIGSTOP'ed it, and watched its celeryev queue start running away. Neither of these two settings helped at all. I confirmed SIGCONT'ing the flower process would bring the queue back to 0 rapidly. I am not certain why these two knobs didn't help, but it may have something to do with how RabbitMQ implements these two settings.
First, the Per-Message TTL corresponding to CELERY_EVENT_QUEUE_TTL only establishes an expiration time on each queue entry -- AIUI it will not automatically delete the message out of the queue to save space upon expiration. Second, the Queue TTL corresponding to CELERY_EVENT_QUEUE_EXPIRES says that it "... guarantees that the queue will be deleted, if unused for at least the expiration period". However, I believe that their definition of "unused" may be too strict to kick in for e.g. an overburdened, stuck, or killed flower process.
EDIT: Unfortunately, one problem with this suggestion is that the set_policy ... apply-to queues will only impact existing queues, and flower can and will create new queues which may overflow.
Celery uses celeryev-prefixed queues (and an exchange) for monitoring; you can configure them as you like or disable events entirely (celery control disable_events).
You just have to set an option in your Celery configuration.
If you want to prevent Celery from creating celeryev.* queues:
CELERY_SEND_EVENTS = False # Will not create celeryev.* queues
If you need these queues for monitoring purpose (CeleryFlower for instance), you may regularly purge them:
CELERY_EVENT_QUEUE_EXPIRES = 60 # Will delete all celeryev.* queues without consumers after 1 minute.
The solution came from here: https://www.cloudamqp.com/docs/celery.html
You can limit the queue size in RabbitMQ with the x-max-length queue declaration argument:
http://www.rabbitmq.com/maxlength.html
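Regarding the max-length approach: when a length-limited RabbitMQ queue is full, the default overflow behaviour drops (or dead-letters) messages from the head of the queue, i.e. the oldest ones. Python's `collections.deque` with `maxlen` behaves the same way, so it makes a small model of what happens to a runaway celeryev queue whose consumer is stuck. This is an in-memory illustration only; the limit of 5 is chosen just to keep the demo small, and other overflow modes such as reject-publish behave differently.

```python
from collections import deque

# Model of a length-limited queue with "drop-head" overflow: once the limit
# is reached, each new message pushes out the oldest one.
MAX_LENGTH = 5  # the rabbitmqctl policy above uses 1000000

events = deque(maxlen=MAX_LENGTH)
for n in range(12):  # a stuck consumer: nothing is ever taken off the queue
    events.append(f"event-{n}")

print(len(events))  # 5
print(events[0])    # event-7
```

The queue stays bounded no matter how long the consumer is stuck; the cost is that the oldest monitoring events are lost.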

Simple queue with Celery and RabbitMQ

I'm trying to implement a simple queue that performs one task at a time. Offloading tasks off the main thread using Celery and setting concurrency=1 in the Celery config works fine, but I might want to use more concurrent workers for other tasks.
Is there a way to tell Celery or RabbitMQ to not use multiple concurrent workers for a task (except by forcing concurrency=1)? I can't find anything in the documentation but maybe these tools are not designed for a linear queue?
Thanks!
I think what you need is a separate queue for each type of task. Create separate workers that consume from each queue, and set concurrency to 1 for the worker whose tasks must run one at a time.
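A minimal Python sketch of that layout, with threads standing in for Celery worker processes: the one-at-a-time queue gets a single consumer and the other queue gets four, so the serial tasks never overlap while other work still runs in parallel. `start_workers` and `PeakCounter` are hypothetical names; with real Celery, the equivalent is routing the task to its own queue and starting a worker for that queue with `-Q` and `-c 1`.

```python
import queue
import threading
import time


class PeakCounter:
    """Tracks how many jobs from one queue are running at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
        return False


def start_workers(q, concurrency, counter):
    """Start `concurrency` threads that drain one queue (hypothetical helper)."""
    def loop():
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return
            with counter:
                job()

    threads = [threading.Thread(target=loop) for _ in range(concurrency)]
    for t in threads:
        t.start()
    return threads


def demo():
    serial_q, other_q = queue.Queue(), queue.Queue()
    for _ in range(8):
        serial_q.put(lambda: time.sleep(0.01))
        other_q.put(lambda: time.sleep(0.01))
    serial, other = PeakCounter(), PeakCounter()
    threads = start_workers(serial_q, 1, serial) + start_workers(other_q, 4, other)
    for t in threads:
        t.join()
    return serial.peak, other.peak  # serial peak is always 1


if __name__ == "__main__":
    print(demo())
```

Keeping the limit on the queue's consumer rather than on the task means the other queues are free to use as much concurrency as they need.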