Simple queue with Celery and RabbitMQ - rabbitmq

I'm trying to implement a simple queue that performs one task at a time. Offloading tasks off the main thread using Celery and setting concurrency=1 in the Celery config works fine, but I might want to use more concurrent workers for other tasks.
Is there a way to tell Celery or RabbitMQ to not use multiple concurrent workers for a task (except by forcing concurrency=1)? I can't find anything in the documentation but maybe these tools are not designed for a linear queue?
Thanks!

I think what you need is a separate queue for each type of task. Create separate workers that consume from each queue, with concurrency set to 1.
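A minimal sketch of that setup, assuming the settings live in a Celery config module loaded with `app.config_from_object("celeryconfig")`. The project name `proj`, the task name `serial_job`, the queue names, and the broker URL are all placeholders, not taken from the question.

```python
# celeryconfig.py: hypothetical config module, loaded with
# app.config_from_object("celeryconfig").

# Assumed broker URL for a local RabbitMQ instance.
broker_url = "amqp://guest:guest@localhost:5672//"

# Tasks without an explicit route end up on this queue.
task_default_queue = "default"

# Route the task that must run one at a time to its own queue.
# "proj.tasks.serial_job" is a placeholder task name.
task_routes = {
    "proj.tasks.serial_job": {"queue": "serial"},
}

# Then start one worker per queue, each with its own concurrency:
#   celery -A proj worker -Q serial  --concurrency=1 -n serial@%h
#   celery -A proj worker -Q default --concurrency=4 -n default@%h
```

Only one worker should consume from the `serial` queue; a second worker attached to the same queue would run those tasks in parallel again.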

Related

Coordinate scheduled jobs between multiple producers

I have a distributed system of producers and consumers across several servers, with redundant nodes—both for failover and load-balancing. The nodes communicate via RabbitMQ messages.
Each producer runs its own scheduler to invoke jobs, which one of the consumers should run. This works by publishing the appropriate RabbitMQ message, which one of the consumers will process.
Now, the tricky part is, each job should be run only once. In short, my requirements are:
Only one invoke message per scheduled job should be processed (by any of the consumer instances)
If any of the producers goes down, the job should still be invoked by the other instances
I can't figure out how to implement this without relying on anything else but RabbitMQ. I could make it work if there was such a thing as an "exclusive exchange", which only one producer can connect to at a time. I thought about making the consumers ignore any duplicate invokes for the same job, but this will not work, because due to the load-balancing, subsequent messages may be received by any of the other instances. Another idea was implementing a mechanism to declare one of the producers the "principal" node, so only this one is allowed to send invokes, but this basically presented the same problem of coordinating between instances.
Any ideas? Thanks in advance.

When to use - Delayed Job vs RabbitMQ

Can someone clarify the advantages of using RabbitMQ (a message queue) instead of Delayed Job (background processing)?
Basically, I want to know when to use background processing and when to use a message queue.
My web application has 3 components: one main server, which handles all user requests, and 2 app servers, where all the background jobs (such as ES reindexing, ES record updates, sending emails, and cron jobs) should run.
I saw articles saying that using a database as a queue (Delayed Job) is very bad, because the consumers poll the database for new jobs and update job statuses, which locks the tables. How do RabbitMQ and other message queues store jobs to avoid this problem?
There are alternatives to Delayed Job, like Sidekiq, which runs on Redis instead of MySQL. Is it better to use Sidekiq instead of RabbitMQ?
And are there any advantages of using sidekiq over delayed job?
You have 2 workers and 1 web server: I guess your web app dispatches some delayed jobs to your workers. So you need a way to store the data related to those background jobs.
For that, you can use either a database (like Redis, which is what Sidekiq uses) or a message queue (like RabbitMQ). A message queue is a specialized system that is very efficient for this use case (allowing much higher throughput). A database gives you better introspection (you can query the jobs table to see what your current situation is), while a queuing system is more efficient but is also more of a black box and requires new skills.
If you do not have performance issues, the simpler the better: even a simple MySQL database should be enough. If you want a more powerful system or need a lot of monitoring, you can also consider a specialized hosted service such as Zenaton (I'm the founder) that will do all the heavy lifting for you, including scheduling and more sophisticated orchestration of your background jobs.
Both perform the same task, i.e executing jobs in the background, but go about it differently.
With Delayed Job, jobs are stored in a database; workers then query it for pending jobs and process them. It's simple to set up, but the performance and scalability aren't great.
RabbitMQ and alternatives such as Redis are harder to set up, but their performance, flexibility and scalability are great: we are talking upwards of 5000 jobs per second. Besides, you tend to write less code.
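To make the database-backed approach concrete, here is a rough sketch of the store, poll, and update cycle using Python's built-in `sqlite3`. It is not how Delayed Job itself is implemented; the table layout and the function names (`create_queue`, `enqueue`, `reserve_next`, `work_once`) are invented for illustration.

```python
import sqlite3
import time


def create_queue(conn):
    # One row per job; run_at is a Unix timestamp.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " run_at REAL NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )


def enqueue(conn, payload, delay_seconds=0.0, now=None):
    now = time.time() if now is None else now
    with conn:  # commits the insert
        cur = conn.execute(
            "INSERT INTO jobs (payload, run_at) VALUES (?, ?)",
            (payload, now + delay_seconds),
        )
    return cur.lastrowid


def reserve_next(conn, now=None):
    """Return (id, payload) of the oldest due job, or None if nothing is due."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT id, payload FROM jobs"
        " WHERE status = 'pending' AND run_at <= ?"
        " ORDER BY run_at, id LIMIT 1",
        (now,),
    ).fetchone()
    if row is None:
        return None
    with conn:  # commits the status change
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    return row


def work_once(conn, handler, now=None):
    """Poll once: run at most one due job. Returns True if a job ran."""
    job = reserve_next(conn, now)
    if job is None:
        return False
    handler(job[1])
    with conn:
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job[0],))
    return True
```

A worker would call `work_once` in a loop with a short sleep between polls. This single-process sketch does nothing to stop two workers from claiming the same job; doing that safely needs row locking, which is where the table contention mentioned in the question comes from.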
Another option is to use a task orchestration system like Cadence Workflow. It supports both delayed execution and queueing, but provides a higher-level programming model and tons of features that neither queues nor delayed execution frameworks offer.
Cadence offers a lot of advantages over using queues for task processing.
Built-in exponential retries with unlimited expiration interval
Failure handling. For example, it allows executing a task that notifies another service if both updates couldn't succeed during a configured interval.
Support for long running heartbeating operations
Ability to implement complex task dependencies. For example, to implement chaining of calls or compensation logic in case of unrecoverable failures (SAGA)
Gives complete visibility into the current state of the update. For example, when using queues, all you know is whether there are some messages in a queue, and you need an additional DB to track the overall progress. With Cadence every event is recorded.
Ability to cancel an update in flight.
Built-in distributed CRON
See the presentation that goes over Cadence programming model.

Is it possible to define priorities for Celery workers consuming from the same queue?

I have two machines on my network running Celery workers that process tasks from a common queue (the messaging back-end is RabbitMQ).
One machine is much more powerful and processes the tasks faster (which is important). If there is only one task in the queue, I always want it to run on this machine. If the queue is full, I want the less powerful machine to start accepting tasks as well.
Is there a recommended, elegant way to do this? Or do I have to set up two queues ("fast" and "slow") and implement some kind of router that sends tasks to the "slow" queue only when the "fast" queue is full?

Celery, zmq, message passing approach for a distributed system

I need to implement a system which does the following:
1. Transfer data to a remote place.
2. Once the data gets transferred fully, start a computation on the remote server.
3. Once the computation is done, fetch the resulting computed data back to the source.
4. A web interface to track/edit the progress of each task.
I am thinking of using:
1. Ruby on Rails for 4)
2. Celery as the distributed solution.
3. Zmq to pass messages across to RoR app and in between the different "categories" of workers within celery described below.
To decouple these components from each other, I'm considering having 3 sets of celery workers, each belonging to a separate category :-
A. 'Sync' workers,
B. 'Render' workers, and
C. 'Fetch' workers.
I wanna use zmq pub sub or broadcast model to pass messages around between these sets of workers and the web app so that they can be synchronised properly. For example B) should only kick in once A) is done. And C) should follow B).
Does this approach sound reasonable, or can it be done better using perhaps just zmq or celery alone? Should I instead be using a Celery backend like Redis or AMQP?
The reasons I wanna use celery are, of course, data persistence as well as a web interface to monitor the workers.
I'm obviously relatively new to celery, zmq and distributed computation in general so any advice would be welcome.
Thanks all.
I have done something similar for work, but it was all done using RabbitMQ and Celery. The way I would approach this is to have a Celery worker running on the remote server and one on the local host. Give each worker its own unique queue and fire off a chain, something like
chain(sync.s(file), compute.s(), sync_back.s()).delay()
Have the 2 sync tasks go to the localhost queue and the compute task go to the remote host queue.

Do RabbitMQ, Beanstalk or Resque support scheduling a task at a certain date?

I've looked at RabbitMQ, Beanstalk and Resque, which all seem geared towards asynchronous, non-delayed tasks (i.e., run all of these as quickly as possible).
Do any of them support scheduling a task on a certain timestamp?
Beanstalk has provisions for a "delay" parameter whereby you can delay the message on a delay queue for a specific period of time.
Resque has one or more scheduling add-ons to it that will provide for scheduling tasks.
With queues, the delay is often an integer specifying the number of seconds to delay (in which case you'll need to convert to the delta you need). More robust scheduling -- as part of a task queue for example -- will often take datetime values via a client library.
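For queues that only accept a delay in seconds, the conversion from a target timestamp could look roughly like this. The helper name `delay_seconds` is made up for this sketch, and it assumes both datetimes are timezone-aware.

```python
import math
from datetime import datetime, timezone


def delay_seconds(run_at, now=None):
    """Whole seconds from `now` until `run_at`, rounded up, never negative.

    Both arguments must be timezone-aware datetimes; subtracting an
    aware datetime from a naive one raises TypeError.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = (run_at - now).total_seconds()
    # Round up so the job never fires early; clamp past timestamps to 0.
    return max(0, math.ceil(remaining))
```

The result is what you would pass as the queue's delay parameter. A timestamp in the past yields 0, meaning "run as soon as possible".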
Note that you can also use IronMQ push queues (with a delay like beanstalk) or IronWorker (scheduling a task instead of queuing it). (Note that I work for Iron.io.)
Delayed_job does this:
Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 3.days.from_now)
http://railscasts.com/episodes/171-delayed-job?view=asciicast