Using RabbitMQ as a distributed broker - How to serialize jobs per queue

Each job in my system belongs to a specific user ID and can be put into RabbitMQ from multiple sources. My requirements:
No more than 1 job should be running per user at any given time.
Jobs for other users should not experience any delay because of jobs piling up for a specific user.
Each job should be executed at least once. Each job has a max retry count and is re-inserted into the queue with a delay (or otherwise delayed) if it fails.
Maintaining Sequence of Jobs (per user) is desirable but not compulsory.
Jobs should probably be persisted, as I need them executed at least once. Jobs have no expiry time.
Any of the workers should be able to run jobs for any of the user.
With these requirements, I think maintaining a single queue for each individual user makes sense. I would also need all the workers to watch all user queues and execute a job for a user only if no job for that user is currently running anywhere (i.e., no more than 1 job per user).
Would this solution work using RabbitMQ in a cluster setup? Since the number of queues would be large, I am not sure whether each worker watching every user queue would cause significant overhead. Any help is appreciated.

As #dectarin has mentioned, having multiple workers listen to multiple job queues will make it hard to ensure that only one job per user is being executed.
I think it'd work better if the jobs go through a couple of steps.
User submits job
Job gets queued per user until no other jobs are running
Coordinator puts job on the active job queue that is consumed by the workers
Worker picks up the job and executes it
Worker posts the results in a result queue
The results get sent to the user
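As a rough illustration of the worker side of this flow (pick up a job from the active job queue, execute it, post the result), here is a minimal Python/pika sketch; the queue names and the process_job function are placeholders, not part of the original design:

import json
import pika

# Connect and declare the two queues from the flow above (names are placeholders).
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="active_jobs", durable=True)   # jobs cleared to run by the coordinator
channel.queue_declare(queue="results", durable=True)       # results read back by the coordinator

def handle(ch, method, properties, body):
    job = json.loads(body)
    result = process_job(job)   # placeholder for the actual work
    ch.basic_publish(exchange="", routing_key="results",
                     body=json.dumps({"user": job["user"], "result": result}))
    ch.basic_ack(delivery_tag=method.delivery_tag)   # ack only after the result is posted

channel.basic_qos(prefetch_count=1)   # each worker handles one job at a time
channel.basic_consume(queue="active_jobs", on_message_callback=handle)
channel.start_consuming()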
I don't know how the jobs get submitted to the system, so it's hard to tell if actual per-user message queues would be the best way to queue the waiting jobs. If the jobs already sit in a mailbox, that might work as well, for instance. Or store the queued jobs in a database; as a bonus, that'd allow you to write a little front end for users to inspect and manage their queued jobs.
Depending on what you choose, you can then find an elegant way to coordinate the single job per user constraint.
For instance, if the jobs sit in a database, the database keeps things synchronised and multiple coordinator workers could go through the following loop:
while( true ) {
    if incoming job present for any user {
        pick up first job from queue
        put job in database, marking it active if no other active job is present
        if job was marked active {
            put job on active job queue
        }
    }
    if result is present for any user {
        pick up first result from result queue
        send results to user
        mark job as done in database
        if this user has job waiting in database, mark it as active
        if job was marked active {
            put job on active job queue
        }
    }
}
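For the step "put job in database, marking it active if no other active job is present", the database can do the synchronisation in one atomic statement, which is what lets multiple coordinator workers run this loop safely. A minimal sketch using SQLite from Python (the table layout and function name are made up for illustration):

import sqlite3

db = sqlite3.connect("jobs.db")
db.execute("""CREATE TABLE IF NOT EXISTS jobs
              (id INTEGER PRIMARY KEY, user TEXT, payload TEXT,
               status TEXT)""")   # status: 'waiting', 'active' or 'done'

def add_job(user, payload):
    # One transaction: insert the job, then promote the oldest waiting job
    # to 'active' only if this user has no active job yet.
    with db:
        db.execute("INSERT INTO jobs (user, payload, status) VALUES (?, ?, 'waiting')",
                   (user, payload))
        cur = db.execute(
            """UPDATE jobs SET status = 'active'
               WHERE id = (SELECT MIN(id) FROM jobs
                           WHERE user = ? AND status = 'waiting')
                 AND NOT EXISTS (SELECT 1 FROM jobs
                                 WHERE user = ? AND status = 'active')""",
            (user, user))
    # rowcount == 1 means the job was marked active, so the caller
    # should also put it on the active job queue.
    return cur.rowcount == 1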
Or if the waiting jobs sit in per-user message queues, transactions will be easier and a single Coordinator going through the loop won't need to worry about multi-threading.
Making things fully transactional across the database and the queues may be hard, but it need not be necessary. Introducing a pending state should allow you to err on the side of caution, making sure no jobs get lost if a step fails.

Related

Prevent Celery From Grabbing One More Task Than Concurrency

In Celery using RabbitMQ, I have distributed workers running very long tasks on individual EC2 instances.
My concurrency is set to 2, -Ofair is enabled, and task_acks_late = True and worker_prefetch_multiplier = 1 are set, but the Celery worker runs the 2 tasks in parallel, then grabs a third task and doesn't run it. This leaves other workers with no tasks to run.
What I would like to happen is for a worker to only grab jobs when it can perform work on them, allowing other workers that are free to grab the tasks and perform them.
Does anyone know how to achieve the result that I'm looking for? Attached below is an example of my concurrency being 2, and there being three jobs on the worker, where one is not yet acknowledged. I would like there to be only two tasks there, and the other to remain on the server until another worker can start it.
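For reference, the setup described above corresponds to a worker configuration roughly like this (a sketch only; the app name and broker URL are placeholders, the option names are the ones mentioned in the question):

from celery import Celery

app = Celery("tasks", broker="amqp://localhost")   # placeholder app name and broker URL

# Settings mentioned in the question: acknowledge only after a task finishes,
# and prefetch no more than one message per worker process.
app.conf.task_acks_late = True
app.conf.worker_prefetch_multiplier = 1

# The worker is then started with a concurrency of 2 and the -Ofair option:
#   celery -A tasks worker --concurrency=2 -Ofair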

How to achieve dynamic fair processing between batch tasks?

Our use case is that our system supports scheduling multiple multi-channel send jobs at any time; multi-channel send meaning send emails, send notifications, send SMS, etc.
How it currently works is we have an SQS queue per channel. Whenever a job is scheduled, it pushes all its send records to the appropriate channel SQS. Any job scheduled later then pushes its own send records to the appropriate channel SQS, and so on. This leads to starvation of later scheduled jobs if the first scheduled job is high volume, as its records will be processed from the queue before the 2nd job's records are reached.
On the consumer side, our processing rate is much lower than the incoming rate, as we can only do a fixed number of sends per hour. So a high-volume job could go on for a long time after being scheduled.
To solve the starvation problem, our first idea was to create 3 queues per channel (low, medium, and high volume), and jobs would be submitted to a queue as per their volume. The problem is that if 2 or more jobs of the same volume come in, we still face starvation.
The only guaranteed way to ensure no starvation and fair processing seems to be having a queue per job, created dynamically. Consumers process from each queue at an equal rate and processing bandwidth gets divided between jobs. A high-volume job might take a long time to complete, but it won't choke processing for other jobs.
We could create the SQS queues dynamically for every job scheduled, but that would mean monitoring maybe 50+ queues at some point. A better choice seemed to be a Kinesis stream with multiple shards, where we would need to ensure every shard only contains a single partition key identifying a single job; I am not sure if that's possible though.
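As a sketch of the "queue per job, equal processing rate" idea described above, a consumer could discover the dynamically created per-job queues and poll them round-robin, taking one record from each per pass (boto3; the queue-name prefix and the send_record function are assumptions, not part of the existing system):

import boto3

sqs = boto3.client("sqs")

def poll_round_robin(prefix="send-job-"):
    # Discover the per-job queues that were created dynamically.
    queue_urls = sqs.list_queues(QueueNamePrefix=prefix).get("QueueUrls", [])
    for url in queue_urls:
        # Take at most one record from each job's queue per pass,
        # so a high-volume job cannot starve the others.
        resp = sqs.receive_message(QueueUrl=url, MaxNumberOfMessages=1)
        for msg in resp.get("Messages", []):
            send_record(msg["Body"])   # placeholder for the actual channel send
            sqs.delete_message(QueueUrl=url, ReceiptHandle=msg["ReceiptHandle"])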
Are there any better ways to achieve this, so we can do fair processing and not starve any job?
If this is not the right community for such questions, please let me know.

RabbitMQ - How to limit processing to only 2 consumers per end user?

My question is the same as this question: How to limit concurrent message consuming based on a criteria
However, I want to have X consumers processing messages at a time.
The scenario (I've simplified things):
Many end users can start jobs (heavy jobs, like rendering a big PDF for example), from a front end web application (producer).
The jobs are sent to a single durable RabbitMQ queue.
Many worker applications (consumers) process those jobs and write the results back to a datastore.
This fairly standard pattern is working fine.
The problem: if a user starts 10 jobs in the same minute, and only 10 worker applications are up at that time of day, this end user is effectively taking over all the compute time for himself.
The question: How can I make sure only two jobs per end user are processed at any time?
Also, I do not want the front end application to block end users from starting concurrent jobs. I just want the end users to wait for their concurrent jobs to finish one at a time.
The solution?: Should I dynamically create one auto-delete exclusive queue per end user? If yes, how can I tell the worker applications to start consuming from this queue? How do I ensure two workers will consume from this queue?

RabbitMQ - allow only one process per user

To keep it short, here is a simplified situation:
I need to implement a queue for background processing of imported data files. I want to dedicate a number of consumers to this specific task (let's say 10) so that multiple users can be processed in parallel. At the same time, to avoid problems with concurrent data writes, I need to make sure that no one user is processed by multiple consumers at the same time; basically, all files of a single user should be processed sequentially.
Current solution (but it does not feel right):
Have 1 queue where all import tasks are published (file_queue_main)
Have 10 queues for file processing (file_processing_n)
Have 1 result queue (file_results_queue)
Have a manager process (in this case in node.js) which consumes messages from file_queue_main one by one and decides which file_processing queue to distribute each message to. Basically, it keeps track of which file_processing queues the current user is being processed in.
Here is a little animation of my current solution and expected behaviour:
Is RabbitMQ even the tool for the job? For some reason, it feels like some sort of an anti-pattern. Appreciate any help!
The part about this that doesn't "feel right" to me is the manager process. It has to know the current state of each consumer, and it also has to stop and wait if all processors are working on other users. Ideally, you'd prefer to keep each process ignorant of the others. You're also getting very little benefit out of your processing queues, which are only used when a processor is already working on a message from the same user.
Ultimately, the best solution here is going to depend on exactly what your expected usage is and how likely it is that the next message is from a user that is already being processed. If you're expecting most of your messages coming in at any one time to be from 10 users or fewer, what you have might be fine. If you're expecting to be processing messages from many different users with only the occasional duplicate, your processing queues are going to be empty much of the time and you've created a lot of unnecessary complexity.
Other things you could do here:
Have all consumers pull from the same queue and use some sort of distributed locking to prevent collisions. If a consumer gets a message from a user that's already being worked on, requeue it and move on (see the sketch after this list).
Set up your queue routing so that messages from the same user will always go to the same consumer. The downside is that if you don't spread the traffic out evenly, you could have some consumers backed up while others sit idle.
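Here is a sketch of the first option, using Redis for the distributed lock and pika for RabbitMQ (the queue name matches file_queue_main from the question; extract_user_id and process_files are placeholders):

import pika
import redis

r = redis.Redis()
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="file_queue_main", durable=True)

def handle(ch, method, properties, body):
    user_id = extract_user_id(body)   # placeholder: pull the user id out of the message
    # Try to take a short-lived lock for this user; nx=True means "only if not already set".
    if r.set(f"user-lock:{user_id}", "1", nx=True, ex=600):
        try:
            process_files(body)        # placeholder for the actual import work
        finally:
            r.delete(f"user-lock:{user_id}")
        ch.basic_ack(delivery_tag=method.delivery_tag)
    else:
        # Another consumer is already working on this user: requeue and move on.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue="file_queue_main", on_message_callback=handle)
channel.start_consuming()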
Also, if you're getting a lot of messages in from the same user at once that must be processed sequentially, I would question if they should be separate messages at all. Why not send a single message with a list of things to be processed? Much of the benefit of event queues comes from being able to treat each event as a discrete item that can be processed individually.
If the user has a unique ID, or the file being worked on has a unique ID, then hash the ID to determine which processing queue to use. That way you will always have the same user / file task queued on the same processing queue.
I am not sure how this will affect queue length for the processing queues.
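A minimal sketch of that hashing idea (the queue-name pattern matches the file_processing_n queues from the question; the queue count is an assumption):

import hashlib

NUM_PROCESSING_QUEUES = 10   # assumed number of file_processing_n queues

def processing_queue_for(user_id: str) -> str:
    # Stable hash so the same user always maps to the same queue,
    # regardless of which producer or process computes it.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    index = int(digest, 16) % NUM_PROCESSING_QUEUES
    return f"file_processing_{index}"

print(processing_queue_for("user-42"))   # e.g. "file_processing_3"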

How to limit concurrent message consuming based on a criteria

The scenario (I've simplified things):
Many end users can start jobs (heavy jobs, like rendering a big PDF for example), from a front end web application (producer).
The jobs are sent to a single durable RabbitMQ queue.
Many worker applications (consumers) process those jobs and write the results back to a datastore.
This fairly standard pattern is working fine.
The problem: if a user starts 10 jobs in the same minute, and only 10 worker applications are up at that time of day, this end user is effectively taking over all the compute time for himself.
The question: How can I make sure only one job per end user is processed at any time? (Bonus: some end users (admins for example) must not be throttled)
Also, I do not want the front end application to block end users from starting concurrent jobs. I just want the end users to wait for their concurrent jobs to finish one at a time.
The solution?: Should I dynamically create one auto-delete exclusive queue per end user? If yes, how can I tell the worker applications to start consuming from this queue? How do I ensure one (and only one) worker will consume from this queue?
You would need to build something yourself to implement this, as Dimos says. Here is an alternative implementation which requires an extra queue and some persistent storage.
As well as the existing queue for jobs, create a "processable job queue". Only jobs that satisfy your business rules are added to this queue.
Create a consumer (named "Limiter") for the job queue. The Limiter also needs persistent storage (e.g. Redis or a relational database) to record which jobs are currently processing. The limiter reads from the job queue and writes to the processable job queue.
When a worker application finishes processing a job, it adds a "job finished" event to the job queue.
------------    --------------    -----------
| Producer | -> () job queue ) -> | Limiter |
------------    --------------    -----------
                      ^                |
                      |                V
                      |    --------------------------
                      |    () processable job queue )
        job finished  |    --------------------------
                      |                |
                      |                V
                      |    ------------------------
                      \----| Job Processors (x10) |
                           ------------------------
The logic for the limiter is as follows:
When a job message is received, check the persistent storage to see if a job is already running for the current user:
If not, record the job in the storage as running and add the job message to the processable job queue.
If an existing job is running, record the job in the storage as a pending job.
If the job is for an admin user, always add it to the processable job queue.
When a "job finished" message is received, remove that job from the "running jobs" list in the persistent storage. Then check the storage for a pending job for that user:
If a job is found, change the status of that job from pending to running and add it to the processable job queue.
Otherwise, do nothing.
Only one instance of the limiter process can run at a time. This could be achieved either by only starting a single instance of the limiter process, or by using locking mechanisms in the persistent storage.
It's fairly heavyweight, but you can always inspect the persistent storage if you need to see what's going on.
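Here is a rough sketch of that limiter loop, using pika and Redis as the persistent storage (queue names, key names and the message format are invented for illustration):

import json
import pika
import redis

r = redis.Redis()
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="jobs", durable=True)              # job and "job finished" events
channel.queue_declare(queue="processable_jobs", durable=True)  # consumed by the workers

def dispatch(job):
    channel.basic_publish(exchange="", routing_key="processable_jobs", body=json.dumps(job))

def handle(ch, method, properties, body):
    msg = json.loads(body)
    user = msg["user"]
    if msg["type"] == "job":
        if msg.get("admin"):                   # admins are never throttled
            dispatch(msg)
        elif r.setnx(f"running:{user}", 1):    # no job running for this user -> mark it running
            dispatch(msg)
        else:                                  # user already has a running job -> park it
            r.rpush(f"pending:{user}", body)
    elif msg["type"] == "job finished":
        pending = r.lpop(f"pending:{user}")
        if pending is not None:                # promote the next pending job for this user
            dispatch(json.loads(pending))
        else:
            r.delete(f"running:{user}")        # nothing pending: user no longer running
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="jobs", on_message_callback=handle)
channel.start_consuming()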
Such a feature is not provided natively by RabbitMQ.
However, you could implement it in the following way. You will have to use polling though, which is not as efficient as subscribing/publishing. You will also have to leverage ZooKeeper for coordination between the different workers.
You will create 2 queues: 1 high-priority queue (for the admin jobs) and 1 low-priority queue (for the normal user jobs). The 10 workers will retrieve messages from both queues. Each worker will execute an infinite loop (ideally sleeping for an interval when the queues are empty), where it will attempt to retrieve a message from each queue in turn:
For the high-priority queue, the worker just retrieves a message, processes it and acknowledges it to the queue.
For the low-priority queue, the worker attempts to acquire a lock in ZooKeeper (by writing to a specific znode), and if successful, reads a message, processes it and acknowledges it. If the ZooKeeper write was unsuccessful, someone else holds the lock, so this worker skips this step and repeats the loop.
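A sketch of that worker loop, using pika for the queues and kazoo for the ZooKeeper lock (queue names, the znode path and the process function are made up; basic_get is used because this approach polls rather than subscribes):

import time
import pika
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
lock = zk.Lock("/locks/low-priority-consumer")   # made-up znode path

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="admin_jobs", durable=True)
channel.queue_declare(queue="user_jobs", durable=True)

while True:
    # High-priority queue: just take a message if one is there.
    method, properties, body = channel.basic_get(queue="admin_jobs")
    if method:
        process(body)   # placeholder for the actual work
        channel.basic_ack(delivery_tag=method.delivery_tag)

    # Low-priority queue: only the worker holding the ZooKeeper lock may read from it.
    if lock.acquire(blocking=False):
        try:
            method, properties, body = channel.basic_get(queue="user_jobs")
            if method:
                process(body)
                channel.basic_ack(delivery_tag=method.delivery_tag)
        finally:
            lock.release()

    time.sleep(1)   # avoid a busy loop when both queues are empty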