Is there a reason for uneven task distribution with celery + redis?

We used to have redis as the broker and rpc as the backend with our celery task queuing system. If we sent 50 tasks to 5 workers, each worker would reliably end up with 10 tasks.
Now that we've switched to redis as both the broker and the backend, some workers get 20 tasks and some get 1; it seems completely arbitrary. No code has changed besides the broker and backend. Is there a reason for this?
The code I am running is very simple:
from time import sleep
from celery import shared_task

@shared_task(bind=True)  # bind=True since the task signature takes self
def multiply(self, num_1, num_2):
    sleep(50)
    return num_1 * num_2
I am currently running it 1000 times across 30 nodes, and some workers are running 0 tasks and some are running 11. I really do not understand why some workers would have 0 tasks when there are still tasks in the queue waiting to be run.
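For context, the number of tasks each worker reserves up front is governed by Celery's prefetching, which matters a lot with 50-second tasks. A minimal sketch of the settings that control it (Celery 4+ lowercase names; the values are illustrative, not a confirmed fix for the behaviour above):

# celeryconfig sketch: limit how many messages each worker reserves ahead of time
worker_prefetch_multiplier = 1   # reserve only one message per worker process at a time
task_acks_late = True            # acknowledge after the task finishes, not on receipt

With the default multiplier of 4, a worker can reserve multiplier x concurrency messages before the others see any, which could produce lopsided counts like those described above.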

Related

set slurm to distribute jobs across nodes in nextflow

I am running a nextflow pipeline on a 3-node cluster.
When I run the pipeline through slurm, it creates a high number of jobs, which I limit using the executor.queueSize = X directive.
However, what slurm does is saturate node 1, then saturate node 2, then start sending jobs to node 3.
I'd like it to distribute the job list more evenly.
I've tried a number of slurm commands, including
--spread-job
--ntasks-per-core=5
--distribution=cyclic
-m cyclic=1
--distribution=plane=5
But none of them does what I want, which is simply to assign 1 job to N1, then 1 to N2, then 1 to N3, then 1 to N1 again, and so on.
Any ideas please?
Thanks in advance for your help.
As a user, you do not decide how your independent jobs are allocated with respect to one another. The --spread-job and --distribution=cyclic options decide how the allocation for a single job is built, and how tasks are mapped onto that allocation.
To obtain the behaviour you want, the cluster must be configured with SelectTypeParameters=CR_LLN.
This option leads to fragmented resources and makes it more difficult to schedule large jobs, so it is often not the default choice for clusters.
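For reference, this is an administrator-level change in slurm.conf rather than a per-job option; a minimal sketch, assuming the cons_tres selector plugin is in use:

# slurm.conf (cluster-wide; the selector plugin shown here is an assumption)
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_LLN   # CR_LLN = place work on the least-loaded node first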

Work queue providing retries with increasing delays and a maximum number of attempts. Is a pure RabbitMQ solution possible?

I have repetitive tasks that I want to process with a number of workers (i.e., the competing consumers pattern). The probability of failure during a task is fairly low, so in case of such rare events I would like to try again after a short period of time, say 1 second.
A sequence of consecutive failures is even less probable but still possible, so for a few initial retries, I would like to stick to a 1-second delay.
However, if the sequence of failures reaches some point, then most likely there is some external reason causing them. So from that point on, I would like to start extending the delay.
Let's say that the desired distribution of delays looks like this:
first appearance in the queue - no delay
retry 1 - 1 second
retry 2 - 1 second
retry 3 - 1 second
retry 4 - 5 seconds
retry 5 - 10 seconds
retry 6 - 20 seconds
retry 7 - 40 seconds
retry 8 - 80 seconds
retry 9 - 160 seconds
retry 10 - 320 seconds
another retry - drop the message
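For what it's worth, the schedule above is easy to express in code regardless of which broker ends up enforcing it; a small helper (the function name is my own):

def retry_delay(attempt, max_attempts=10):
    # Schedule from the list above: 1 s for retries 1-3, then doubling from 5 s.
    # Returns None once the message should be dropped.
    if attempt > max_attempts:
        return None
    if attempt <= 3:
        return 1
    return 5 * 2 ** (attempt - 4)

For example, retry_delay(7) returns 40 and retry_delay(11) returns None.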
I have found a lot of information about DLXes (Dead Letter Exchanges) that can partially solve the problem. It appears to be easy to achieve an infinite number of retries with the same delay. At the same time, I haven't found a way to increase the delay or to stop after a certain number of retries.
I'm looking for the purest RabbitMQ solution possible. However, I'm interested in anything that works.
There is a plugin available for this. I think you can use it to achieve what you need.
I've used it in a similar fashion to handle custom retries with dynamic delays.
RabbitMQ Delayed Message Plugin
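In case it helps, this is roughly how the plugin is driven from Python with pika; the exchange and queue names and the 5000 ms delay are made up for illustration:

import pika

conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
ch = conn.channel()
# The plugin adds the x-delayed-message exchange type; x-delayed-type sets the routing behaviour.
ch.exchange_declare(exchange='retry.x', exchange_type='x-delayed-message',
                    arguments={'x-delayed-type': 'direct'})
ch.queue_declare(queue='work')
ch.queue_bind(queue='work', exchange='retry.x', routing_key='work')
# Re-publish a failed message with a per-message delay in milliseconds.
ch.basic_publish(exchange='retry.x', routing_key='work', body=b'payload',
                 properties=pika.BasicProperties(headers={'x-delay': 5000}))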
Using a combination of DLXes and expire/TTL times, you can accomplish this except for the case when you want to change the redelivery time, for instance, implementing an exponential backoff.
The only way I could make it work with a pure RabbitMQ approach is to set the expire time to the smallest time needed and then use the x-death array to figure out how many times the message has been killed, and then reject (i.e., DLX it again) or ack the message accordingly.
Let's say you set the expire time to 1 minute and you need to back off 1 minute the first time, then 5 minutes and then 30 minutes. This translates to x-death.count = 1, followed by 5 and then 30. Any other time you just reject the message.
Note that this can create lots of churn if you have many retry-messages. But if retries are rare, go for it.
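A sketch of that bookkeeping as a pika consumer callback; is_due_for_processing() and process() are placeholders, and the thresholds they encode would follow the scheme described above:

def on_message(ch, method, properties, body):
    deaths = (properties.headers or {}).get('x-death', [])
    count = deaths[0]['count'] if deaths else 0   # how many times the message has been dead-lettered
    if is_due_for_processing(count):
        process(body)
        ch.basic_ack(method.delivery_tag)
    else:
        # Not due yet: reject without requeue so the DLX cycles it through the TTL queue again.
        ch.basic_reject(method.delivery_tag, requeue=False)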

Can Apache NiFi PutElasticsearch wait forever to fill up the batch size?

I am trying to write streaming data into elasticsearch with the Apache NiFi PutElasticsearch processor.
PutElasticsearch has a property named "Batch Size"; when I set this value to 1, all events are written to elasticsearch ASAP.
But such a low batch size is obviously not workable when the load is high, so in order to have reasonable throughput I need to set it to 1000.
My question is: does PutElasticsearch wait until a full batch of events is available? If so, it could wait for hours when there are only 999 events sitting on the processor.
I am trying to understand how logstash does the same job in its elasticsearch output plugin. There may be some time-based flushing logic (if events have been waiting ~2 seconds, flush them to elasticsearch).
Do you have any idea?
Edit: I just found that logstash implements this: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-idle_flush_time :)
How can I get the same functionality in NiFi?
According to the code, the batch size parameter is the maximum number of FlowFiles taken from the incoming queue.
For example, with batch size = 1000:
1/ if the incoming queue contains 1001 flow files, only 1000 will be taken in one transaction;
2/ if the incoming queue contains 999 flow files, 999 will be taken in one transaction.
And everything is processed as soon as there is something in the incoming queue and there are available threads in NiFi.
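If it helps to picture that, the semantics amount to "take up to batch-size items without waiting for a full batch"; a small Python illustration of the pattern (not NiFi code):

import queue

def take_batch(q, batch_size):
    # Take at most batch_size items, but never block waiting for the batch to fill up.
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch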
references:
PutElasticsearch.java
ProcessSession.java

How can I implement this single concurrency distributed queue in any MQ platform?

I am currently struggling to find a solution for implementing a specific kind of queue, which requires the following traits:
Each queue must respect the order in which jobs were added.
Each queue will have a concurrency of 1, which means that only one job executes at a time per queue, not per worker.
There will be more than a few thousand queues like this.
It needs to be distributed and able to scale (for example, if I add a worker).
Basically each one is a single-process FIFO queue, and this is exactly what I want when trying out different message queue software like ActiveMQ or RabbitMQ; but as soon as I scale to 2 workers it just does not work, since I want it to scale while maintaining exactly the same behaviour as a single-process queue. Below I describe how it should work in a distributed environment with multiple workers.
Example of what the topology looks like (note that it's a many-to-many relationship between the queues and the workers):
Example of how it would run:
+------+-----------------+-----------------+-----------------+
| Step | Worker 1 | Worker 2 | Worker 3 |
+------+-----------------+-----------------+-----------------+
| 1 | Fetch Q/1/Job/1 | Fetch Q/2/Job/1 | Waiting |
+------+-----------------+-----------------+-----------------+
| 2 | Running | Running | Waiting |
+------+-----------------+-----------------+-----------------+
| 3 | Running | Done Q/2/Job/1 | Fetch Q/2/Job/2 |
+------+-----------------+-----------------+-----------------+
| 4 | Done Q/1/Job/1 | Fetch Q/1/Job/2 | Running |
+------+-----------------+-----------------+-----------------+
| 5 | Waiting | Running | Running |
+------+-----------------+-----------------+-----------------+
Probably this is not the best representation, but it shows that even though Queue 1 and Queue 2 have more jobs, Worker 3 does not start fetching the next job until the previous one finishes.
This is what I am struggling to find a good solution for.
I have tried a lot of other solutions like RabbitMQ, ActiveMQ, Apollo... These let me create thousands of queues, but in all the ones I tried, Worker 3 would be used to run the next job in the queue, and the concurrency is per worker.
Is there any solution out there that can make this possible on any MQ platform, for example ActiveMQ, RabbitMQ, ZeroMQ, etc.?
Thank you :)
You can achieve this using Redis lists with an additional "dispatch" queue that all workers BRPOP on for their jobs. Each job in the dispatch queue is tagged with the original queue ID, and when the worker has completed the job it goes to this original queue and performs RPOPLPUSH onto the dispatch queue to make the next job available for any other worker. The dispatch queue will therefore have a maximum of num_queues elements.
One thing you'll have to handle is the initial population of the dispatch queue when the source queue is empty. This could just be a check done by the publisher against an "empty" flag for each queue that is set initially, and also set by the worker when there is nothing left in the original queue to dispatch. If this flag is set, the publisher can just LPUSH the first job directly onto the dispatch queue.
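A rough sketch of that scheme with redis-py; the key names, the 'queue_id:payload' job format and handle() are assumptions for illustration:

import redis

r = redis.Redis()

def worker_loop():
    while True:
        _, job = r.brpop('dispatch')                    # any worker may take any dispatched job
        queue_id, payload = job.decode().split(':', 1)  # jobs are stored pre-tagged with their queue id
        handle(payload)                                 # placeholder for the real work
        # Only after finishing do we move the next job of the SAME source queue into the
        # dispatch queue, so each source queue has at most one job in flight at a time.
        if r.rpoplpush('queue:' + queue_id, 'dispatch') is None:
            r.set('empty:' + queue_id, 1)               # flag the publisher checks before LPUSHing directly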

Cassandra: MigrationStage cannot keep up

We have 4 nodes, and running tpstats shows a big backlog for MigrationStage on all nodes; it is not able to reduce the queue over time. For example:
Pool Name         Active  Pending  Completed  Blocked  All time blocked
MigrationStage         1     3946      17766        0                 0
I never see this go down, and the other 3 servers have about 300 pending requests.
Is there a way to speed this up? Or is it possible to stop schema migration since most likely it's trying to migrate old keyspaces?
PS: I tried to drop keyspaces to reduce this (there are about 200 keyspaces). However, the DROP statement always times out (SELECT works). I assume this backlog is also blocking some schema DDL statements.