How to limit concurrent message consuming based on a criteria - rabbitmq

The scenario (I've simplified things):
Many end users can start jobs (heavy jobs, like rendering a big PDF for example), from a front end web application (producer).
The jobs are sent to a single durable RabbitMQ queue.
Many worker applications (consumers) process those jobs and write the results back to a datastore.
This fairly standard pattern is working fine.
The problem: if a user starts 10 jobs in the same minute, and only 10 worker applications are up at that time of day, this end user is effectively taking over all the compute time for himself.
The question: How can I make sure only one job per end user is processed at any time? (Bonus: some end users (admins for example) must not be throttled.)
Also, I do not want the front end application to block end users from starting concurrent jobs. I just want the end users to wait for their concurrent jobs to finish one at a time.
The solution?: Should I dynamically create one auto-delete exclusive queue per end user? If yes, how can I tell the worker applications to start consuming this queue? How do I ensure one (and only one) worker will consume from this queue?

You would need to build something yourself to implement this, as Dimos says. Here is an alternative implementation which requires an extra queue and some persistent storage.
As well as the existing queue for jobs, create a "processable job queue". Only jobs that satisfy your business rules are added to this queue.
Create a consumer (named "Limiter") for the job queue. The Limiter also needs persistent storage (e.g. Redis or a relational database) to record which jobs are currently processing. The limiter reads from the job queue and writes to the processable job queue.
When a worker application finishes processing a job, it adds a "job finished" event to the job queue.
 ------------        -------------        -----------
| Producer   | ---> (  job queue  ) ---> |  Limiter  |
 ------------        -------------        -----------
                           ^                    |
                           |                    V
                           |        -------------------------
                           |       ( processable job queue   )
              job finished |        -------------------------
                           |                    |
                           |                    V
                           |        -------------------------
                           \-------|  Job Processors (x10)   |
                                    -------------------------
The logic for the limiter is as follows:
When a job message is received, check the persistent storage to see if a job is already running for the current user:
If not, record the job in the storage as running and add the job message to the processable job queue.
If an existing job is running, record the job in the storage as a pending job.
If the job is for an admin user, always add it to the processable job queue.
When a "job finished" message is received, remove that job from the "running jobs" list in the persistent storage. Then check the storage for a pending job for that user:
If a job is found, change the status of that job from pending to running and add it to the processable job queue.
Otherwise, do nothing.
Only one instance of the limiter process can run at a time. This could be achieved either by only starting a single instance of the limiter process, or by using locking mechanisms in the persistent storage.
It's fairly heavyweight, but you can always inspect the persistent storage if you need to see what's going on.
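For what it's worth, here is a minimal sketch of the Limiter's decision logic, assuming Redis as the persistent storage and pika for the RabbitMQ side; the queue name, the Redis key layout and the helper functions (handle_job_message, handle_job_finished, publish_processable) are only illustrative, not part of the original design:

import json
import redis
import pika

# Assumed connections; adjust host/queue names to your setup.
store = redis.Redis()
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="processable_jobs", durable=True)

def publish_processable(job):
    channel.basic_publish(exchange="",
                          routing_key="processable_jobs",
                          body=json.dumps(job))

def handle_job_message(job):
    """Decide whether a job may run now or has to wait (called from the job queue consumer)."""
    user = job["user_id"]
    if job.get("is_admin"):
        # Admins are never throttled.
        publish_processable(job)
    elif store.setnx(f"running:{user}", job["job_id"]):
        # Nothing was running for this user; mark it running and release it.
        publish_processable(job)
    else:
        # A job is already running; park this one as pending.
        store.rpush(f"pending:{user}", json.dumps(job))

def handle_job_finished(event):
    """Free the user's slot and promote the next pending job, if any."""
    user = event["user_id"]
    store.delete(f"running:{user}")
    pending = store.lpop(f"pending:{user}")
    if pending:
        job = json.loads(pending)
        store.set(f"running:{user}", job["job_id"])
        publish_processable(job)

Because SETNX only succeeds when no value exists for that user, at most one job per user can be marked as running; the single-instance Limiter requirement keeps the check-then-push sequence safe.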

Such a feature is not provided natively by RabbitMQ.
However, you could implement it in the following way. You will have to use polling, though, which is not as efficient as subscribing/publishing. You will also have to leverage Zookeeper for coordination between the different workers.
You will create 2 queues: 1 high-priority queue (for the admin jobs) and 1 low-priority queue (for the normal user jobs). The 10 workers will retrieve messages from both queues. Each worker will execute an infinite loop (ideally sleeping between iterations when the queues are empty), in which it attempts to retrieve a message from each queue alternately:
For the high-priority queue, the worker just retrieves a message, processes it and acknowledges to the queue.
For the low-priority queue, the worker attempts to hold a lock in Zookeeper (by writing to a specific znode), and if successful, it reads a message, processes it and acknowledges it. If the Zookeeper write was unsuccessful, someone else holds the lock, so this worker skips this step and repeats the loop.
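As a rough sketch of that loop, assuming the kazoo client for Zookeeper and pika for RabbitMQ (the queue names, the lock path and process_job are all placeholders):

import time
import pika
from kazoo.client import KazooClient

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
lock = zk.Lock("/job-workers/low-priority-lock")

def process_job(body):
    ...  # your actual job-processing logic

def try_consume(queue_name):
    """Poll one message from a queue; return True if something was processed."""
    method, properties, body = channel.basic_get(queue=queue_name, auto_ack=False)
    if method is None:
        return False
    process_job(body)
    channel.basic_ack(method.delivery_tag)
    return True

while True:
    busy = try_consume("high_priority_jobs")        # admin jobs: never throttled
    # Only touch the low-priority queue if we can grab the shared Zookeeper lock.
    if lock.acquire(blocking=False):
        try:
            busy = try_consume("low_priority_jobs") or busy
        finally:
            lock.release()
    if not busy:
        time.sleep(1)                               # both queues empty; back off briefly

Since only one worker can hold the lock at a time, low-priority jobs are processed one at a time, while high-priority (admin) jobs are picked up by any free worker.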

Related

How to achieve dynamic fair processing between batch tasks?

Our use case is that our system supports scheduling multiple multi-channel send jobs at any time. Multi-channel send means sending emails, notifications, SMS, etc.
How it currently works is we have an SQS queue per channel. Whenever a job is scheduled, it pushes all its send records to the appropriate channel SQS queue. Any job scheduled later then pushes its own send records to the appropriate channel SQS queue, and so on. This leads to starvation of later-scheduled jobs if the first scheduled job is high volume, as its records will all be processed from the queue before the second job's records are reached.
On the consumer side, we have a much lower processing rate than the incoming rate, as we can only do a fixed number of sends per hour. So a high-volume job could go on for a long time after being scheduled.
To solve the starvation problem, our first idea was to create 3 queues per channel (low, medium and high volume), and jobs would be submitted to a queue according to their volume. The problem is that if 2 or more same-volume jobs arrive, we still face starvation.
The only guaranteed way to ensure no starvation and fair processing seems to be having a queue per job, created dynamically. Consumers process from each queue at an equal rate and the processing bandwidth gets divided between jobs. A high-volume job might take a long time to complete, but it won't choke processing for other jobs.
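To illustrate the queue-per-job idea, here is a minimal sketch of round-robin consumption with boto3; the queue-name prefix and the send_record helper are assumptions for the example, not existing code:

import boto3

sqs = boto3.client("sqs")

def send_record(body):
    ...  # channel-specific send (email / SMS / notification)

def fair_poll(batch_size=10):
    # Discover the per-job queues by a naming convention (assumed prefix).
    queue_urls = sqs.list_queues(QueueNamePrefix="send-job-").get("QueueUrls", [])
    # Visit every job's queue once per pass, so one huge job cannot starve the rest.
    for url in queue_urls:
        resp = sqs.receive_message(QueueUrl=url,
                                   MaxNumberOfMessages=batch_size,
                                   WaitTimeSeconds=0)
        for msg in resp.get("Messages", []):
            send_record(msg["Body"])
            sqs.delete_message(QueueUrl=url, ReceiptHandle=msg["ReceiptHandle"])

Each pass gives every job at most batch_size sends, so the fixed hourly send budget is divided roughly evenly across whatever jobs are currently active.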
We could create the SQS queues dynamically for every job scheduled, but that would mean monitoring maybe 50+ queues at some point. A better choice seemed to be a Kinesis stream with multiple shards, where we would need to ensure every shard only contains a single partition key identifying a single job; I am not sure that's possible, though.
Are there any better ways to achieve this, so we can do fair processing and not starve any job?
If this is not the right community for such questions, please let me know.

RabbitMQ - How to limit to only 2 consumers processing per end user?

My question is same this question: How to limit concurrent message consuming based on a criteria
However, I want to have X consumers processing messages at a time.
The scenario (I've simplified things):
Many end users can start jobs (heavy jobs, like rendering a big PDF for example), from a front end web application (producer).
The jobs are sent to a single durable RabbitMQ queue.
Many worker applications (consumers) process those jobs and write the results back to a datastore.
This fairly standard pattern is working fine.
The problem: if a user starts 10 jobs in the same minute, and only 10 worker applications are up at that time of day, this end user is effectively taking over all the compute time for himself.
The question: How can I make sure only two jobs per end user are processed at any time?
Also, I do not want the front end application to block end users from starting concurrent jobs. I just want the end users to wait for their concurrent jobs to finish one at a time.
The solution?: Should I dynamically create one auto-delete exclusive queue per end user? If yes, how can I tell the worker applications to start consuming from this queue? How do I ensure two workers will consume from this queue?

Why does celery need a message broker?

As Celery is a job queue/task queue, the name suggests that it can maintain its tasks and process them. So why does it need a message broker like RabbitMQ or Redis?
Celery is a distributed task queue, which means the system can reside across multiple computers (containers) in multiple locations with a single centralised bus.
The basic architecture is as follows:
workers - processes that take jobs (data) from the bus (task queue) and process them
*a worker can put its result back onto the bus for further processing by a different worker (creating a processing flow)
bus - the task queue; this is basically a database that stores the jobs as messages so the workers can retrieve them.
It's important that this database is concurrent and non-blocking, so that when one process takes a job from or puts a job on the bus, it doesn't block other workers from getting/putting their jobs.
RabbitMQ, Redis, ActiveMQ, Kafka and the like are the best candidates for this sort of behaviour.
The bus has an API which lets you submit jobs for workers and retrieve them (among more complex features).
Most buses implement an ack/fail feature, so workers can acknowledge that their job is done; if a job is not acked (or a failure is reported), the message can be served again to another worker and might get processed successfully this time, so no data is lost (this depends highly on the failover logic and on the data that feeds the task).
Celery includes a scheduler (beat) that periodically puts specific jobs on the bus, thus creating periodic tasks.
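A minimal sketch of that setup, assuming RabbitMQ as the broker (the broker URL, the task and the schedule are only examples):

from celery import Celery

# The broker is the bus: Celery only puts/takes messages, RabbitMQ stores them.
app = Celery("tasks", broker="amqp://guest@localhost//")

@app.task
def scrape(region):
    ...  # heavy work, executed by whichever worker picks the message up

# beat periodically puts this job on the bus, creating a periodic task
app.conf.beat_schedule = {
    "scrape-china-hourly": {
        "task": "tasks.scrape",
        "schedule": 3600.0,
        "args": ("china",),
    },
}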
Let's work with a scraping example: you want to scrape the world, but China only allows traffic from its own region, and so do Europe and the USA.
So you can build workers and place them all over the world.
You can use only one bus, say located in the USA; all other workers know this bus and can connect to it, so by placing a specific job (scrape China) on the bus located in the US, a process in China can work on it, hence distributed.
Of course, workers also increase the throughput of the system purely through parallelism, unrelated to their geo location, and this is the common case for an event-driven architecture (i.e. a central bus, consumers and producers).
I suggest reading the official docs; they're pretty straightforward.

Using RabbitMQ as Distributed broker - How to Serialize jobs per queue

Each Job in my system belongs to a specific userid and can be put into RabbitMQ from multiple sources. My requirements:
No more than 1 job should be running per user at any given time.
Jobs for other users should not experience any delay because of job piling up for a specific user.
Each Job should be executed at least once. Each Job will have a max retries count and, if it fails, is re-inserted into the queue (possibly after a delay).
Maintaining Sequence of Jobs (per user) is desirable but not compulsory.
Jobs should probably be persisted, as I need them executed at least once. There is no expiry time on jobs.
Any of the workers should be able to run jobs for any of the user.
With these requirements, I think maintaining a single queue for each individual user makes sense. I would also need all the workers watching all user queues and executing a job for any user whose job is not currently running anywhere (i.e., no more than one job per user).
Would this solution work using RabbitMQ in a cluster setup? Since the number of queues would be large, I am not sure whether each worker watching every user queue would cause significant overhead. Any help is appreciated.
As #dectarin has mentioned, having multiple workers listen to multiple job queues will make it hard to ensure that only one job per user is being executed.
I think it'd work better if the jobs go through a couple of steps.
User submits job
Job gets queued per user until no other jobs are running
Coordinator puts job on the active job queue that is consumed by the workers
Worker picks up the job and executes it
Worker posts the results in a result queue
The results get sent to the user
I don't know how the jobs get submitted to the system, so it's hard to tell whether actual per-user message queues would be the best way to queue the waiting jobs. If the jobs already sit in a mailbox, that might work as well, for instance. Or store the queued jobs in a database; as a bonus, that'd allow you to write a little front end for users to inspect and manage their queued jobs.
Depending on what you choose, you can then find an elegant way to coordinate the single job per user constraint.
For instance, if the jobs sit in a database, the database keeps things synchronised and multiple coordinator workers could go through the following loop:
while( true ) {
    if incoming job present for any user {
        pick up first job from queue
        put job in database, marking it active if no other active job is present
        if job was marked active {
            put job on active job queue
        }
    }
    if result is present for any user {
        pick up first result from result queue
        send results to user
        mark job as done in database
        if this user has job waiting in database, mark it as active
        if job was marked active {
            put job on active job queue
        }
    }
}
Or if the waiting jobs sit in per-user message queues, transactions will be easier and a single Coordinator going through the loop won't need to worry about multi-threading.
Making things fully transactional across the database and queues may be hard, but that need not be necessary. Introducing a pending state should allow you to err on the side of caution, making sure no jobs get lost if a step fails.
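For the "marking it active if no other active job is present" step, the important part is that the check and the write happen atomically. A minimal sketch of that step, assuming Redis as the persistent store instead of a relational database (the key names are illustrative):

import redis

store = redis.Redis()

def try_activate(user_id, job_id):
    # SETNX is atomic: only the first coordinator to call this while the user
    # is idle succeeds, so at most one job per user is ever marked active.
    return store.setnx(f"active_job:{user_id}", job_id)

def finish_job(user_id):
    # Clear the slot; the coordinator can then promote a waiting job, if any.
    store.delete(f"active_job:{user_id}")

With a relational database, the equivalent would be an insert guarded by a unique constraint on the user id, performed inside a transaction.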

When does a celery worker acknowledge to RabbitMQ that it has a task?

I might be misunderstanding how this works (which is why I'm asking), but I think when a celery worker consumes a task from RabbitMQ it puts a lock on it -- so to speak -- and then must acknowledge it completed that task once it's done. So say I have 4 workers which all have the prefetch setting at 1 and a queue of 6 tasks which take a long time. Once I start those workers and I run:
rabbitmqctl -q list_queues name messages messages_ready messages_unacknowledged
I'd expect to see something like:
celery 6 2 4
indicating that 4 tasks are running (but not yet acknowledged) and 2 are ready to be consumed.
I think my understanding is wrong because what I actually see is:
celery 2 0 2
So it's as if the acknowledging happens when a message is received by a worker, but before that worker finishes processing that task.
So to sum up, my question is, when does a celery worker acknowledge it has a task? It seems like it's once it receives that task and starts working on it, not when it completes working on it. Can someone confirm?
This is mentioned in the FAQ, but I can't blame you for not finding it:
http://docs.celeryproject.org/en/latest/faq.html#should-i-use-retry-or-acks-late
The default behavior of early ack is there because we don't want to force users to write idempotent tasks.
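In code, the late-acknowledgement behaviour the FAQ refers to is opt-in per task (or globally via task_acks_late). A minimal sketch, keeping in mind that such tasks should be idempotent because they can be redelivered after a crash; the broker URL and task are only examples:

from celery import Celery

app = Celery("tasks", broker="amqp://guest@localhost//")

# acks_late=True: the message is acknowledged after the task returns, so
# messages_unacknowledged in rabbitmqctl will reflect tasks still running.
@app.task(acks_late=True)
def long_task(doc_id):
    ...

# or globally:
# app.conf.task_acks_late = True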