I'm writing a project which uses RabbitMQ for message passing. It has a producer that generate tasks at scheduled time and put them into RabbitMQ queue. Also I have a pool of workers that get tasks from there, process them and put them into another queue (exchange). I need to store results into database. So the question is should I use the same app (scheduler) that generate tasks or write separate one for this task? This is slightly simplified version of what I do but can you tell me some cons and pros for this?
I will use separate app. Since it will be two simple apps that are totally decoupled.
Related
If as admin I wanted to know from a particular queue A, how many calls initiated by which person and how many get dequeued, and how many are still in queue # any time.
I just want to develop one UI in my application to show those user-specific records from ActiveMQ.
There is no built in functionality in the broker that does this sort of thing. You could develop your own broker plugin that tracks these things but you'd need to build some sort of DB or other storage as you would lose any in-memory stats when a broker is restarted. You should use caution when trying to push all requirements into the message broker for system level management as that is not its purpose and will likely result in other issues when you do.
Can someone give me the clarity of the advantages of using RabbitMQ(message queue) instead of Delayed Job(background processing) ?
Basically I want to know when to use background processing and messaging queue ?
My web application has 3 components one main server which will handle all user requests and 2 app servers where all the background jobs(like es reindex, es record update, sending emails, crons) should be run.
I saw articles which say Database as a queue(delayed job) is very bad as the consumers will be polling the database for new jobs and updating the statuses of jobs which will lock the tables. Then how does rabbit MQ or other messaging queues store to avoid this problem.
There are other alternatives for delayed job like sidekiq which will run over redis instead of mysql. It is better to use sidekiq instead of rabbitmq?
And are there any advantages of using sidekiq over delayed job?
You have 2 workers and 1 web server: I guess your web app dispatches some delayed jobs to your workers. So you need a way to store the data related to those background jobs.
For that, you can use both a database (like Redis, this is what sidekick is doing) or a message queue (like RabbitMQ). A message queue is a specialized system that is very efficient for this use case (allowing a much higher throughput). A database would let you have a better introspection (as you can request the jobs table to see what your current situation is), while the queuing system would be more efficient but also is more a black box and will require new skills.
If you do not have performance issues, the simpler the better, even a simple mysql database should be enough. If you want a more powerful system or need a lot of monitoring you can also consider using a specialized hosted service such as zenaton (I'm founder) that will do all the heavy lifting for you, including scheduling or more sophisticated orchestration of your background jobs.
Both perform the same task, i.e executing jobs in the background, but go about it differently.
With delayed job one uses some sort of a database for storage, queries for the jobs thereafter then processes them. It's simple to set up but the performance and scalability aren't great.
RabbitMQ or its alternatives Redis e.t.c are harder to set up but their performance, flexibility and scalability is great, we are talking in the upwards of 5000 jobs per second besides you have tend to use less code.
Another option is to use task orchestration system like Cadence Workflow. It supports both delayed execution and queueing, but provides higher level programming model and tons of features that neither queues or delayed execution frameworks.
Cadence offers a lot of advantages over using queues for task processing.
Built it exponential retries with unlimited expiration interval
Failure handling. For example it allows to execute a task that notifies another service if both updates couldn't succeed during a configured interval.
Support for long running heartbeating operations
Ability to implement complex task dependencies. For example to implement chaining of calls or compensation logic in case of unrecoverble failures (SAGA)
Gives complete visibility into current state of the update. For example when using queues all you know if there are some messages in a queue and you need additional DB to track the overall progress. With Cadence every event is recorded.
Ability to cancel an update in flight.
Built in distributed CRON
See the presentation that goes over Cadence programming model.
Let's say that I have a pipeline of commands that need to be executed sequentially and that some of the commands contain multiple operations that should be executed in parallel (same correlation id). And let's assume that I need to know when all parallel operations are executed in order to proceed with execution further in the pipeline.
Is it possible to achieve this kind of orchestration with RabbitMQ alone by using exchanges and queues without usage of external data sources like database ?
I am interested in the following use case:
I have just published 3 messages of the same type on the same queue. Those messages are being processed in parallel. I would like to publish a new message only when all messages of the same correlation ID are finished successfully.
Is there a way to achieve this with RabbitMQ ?
It sounds like you could use the scatter-gather pattern. This explains that pattern quite well with a diagram:
http://www.enterpriseintegrationpatterns.com/patterns/messaging/BroadcastAggregate.html
And here is a tutorial on how to implement with RabbitMQ:
http://geekswithblogs.net/michaelstephenson/archive/2012/08/06/150373.aspx
I'm using RabbitMQ as a message queue in a service-oriented architecture, where many separate web services publish messages bound for RabbitMQ queues. Those queues are in turn subscribed to by various consumers, which perform background work; a pretty vanilla use-case for RabbitMQ.
Now I'd like to change some of the queue parameters (specifically, I'd like to bind queues to a new dead-letter exchange with a certain routing key). My problem is that making this change in place on a production system is problematic for a couple reasons.
Whats the best way for me to transition to these new queues without losing messages in a production system?
I've considered everything from versioning queue names to making a new vhost with the new settings to doing all the changes in place.
Here are some of the problems I'm facing:
Because RabbitMQ queues are idempotent, the disparate web services have been declaring the queues before publishing to them (in case they don't already exist). Once you change the queue parameters (but maintain the same routing key), the queue declare fails and RabbitMQ closes the channel.
I'd like to not lose messages when changing a queue (here I'm planning on subscribing an exclusive consumer that saves the messages and then republishes to the new queue).
General coordination between disparate publishers and the consumer base (or, even better, a way to avoid needing to coordinate them).
Queues bindings can be added and removed at runtime without any impact on clients, unless clients manually modify bindings. So if your question only about bindings just change them via CLI or web management panel and skip what written below.
It's a common problem to make back-incompatible changes, especially in heterogeneous environment, especially when multiple applications attempts to declare same entity in their own way (with their specific settings). There are no easy way to change queue declaration at the same time in multiple applications and it highly depends on how whole working process organized, how critical your apps are, what is your infrastructure and etc.
Fast and dirty way:
While the publishers doesn't deals with queues declaration and bindings (at least they should not do that), you can focus on consumers. Wrapping queues declaration in try-except block may be the fast and dirty choice. Also most projects, even numerous can survive small downtime, so you can block rabbitmq user in one shell, alter queue as you wish (create new one and make your consumers use it instead of old one) and then unblock user and let consumers works as before (your workers are under supervisor or monit, right?). Then migrate manually messages from old queue to new one.
Fast and safe solution:
Is is a bit tricky and based on a hack how to migrate messages from one queue to another inside single vhost. The whole solution works inside single vhost but requires extra queue for every queue you want to modify. Set up Dead Letter Exchanges on source queue and point it to route expired messages to your new target queue. Then apply Per-Queue Message TTL to source queue, set x-message-ttl=0 (to it's minimal value, see No Queueing at all note about immediate delivery). Both actions can be done via CLI or management panel and can be done on already declared queue. In this way your publishers can publish messages as usual and even old consumers can work as expected for the first time, but in parallel new consumers can consume from new queue which can be pre-declared with new args manually or in other way.
Note, that on queues with large messages number and huge messages flow there are some risks to met flow control limits, especially if your server utilize almost all of it resources.
Much more complicated but safer approach (for cases when whole messages workflow logic changed):
Make all necessary changes to applications and run new codebase in parallel to existing one, but on the different RabbitMQ vhost (or even use separate server, it depends on your applications load and hardware). Actually, it may be possible to run on the same vhost but change exchanges and queues name, but it even doesn't sound good and smells even in written form. After you set up new apps, switch them with old one and run messages migration from old queues to new one (or just let old system empty the queues). It guaranties seamless migration with minimal downtime. If you have your deployment automatized, whole process will not takes too much efforts.
P.S.: in any case above, if you can, let old consumers to empty queues so you don't need to migrate messages manually.
Update:
You may find very useful Shovel plugin, especially Dynamic Shovels to move messages between exchanges and queues, even between different vhosts and servers. It's the fastest and safest way to migrate messages between queues/exchanges.
My website is hosted on AWS Elastic Beanstalk (PHP). I use Yii Framework as an MVC.
A while ago I wanted to run a SQL query everyday. I looked up how to run crons on Beanstalk and it seemed complicated to merge the concepts of Cloud and Cron. I ran into Iron Worker (http://www.iron.io/worker), and managed to create a worker that is currently doing its job fine.
Today I want to run a more complex cron (Look for notifications in my database, decide whether to send an email, build an email template and send the email (via AWS SES).
From what I understand, worker files are supposed to be self-contained items, with everything they need to work.
However, I have invested a lot of time and effort in building my MVC. I have complex models, verifications, an email templating engine, etc...
It seems very difficult to use the work I've done to create an Iron Worker. Even if I managed to port all of my code to a worker (which seems like a great deal of work), it means anytime I make changes to my main code I need to make sure the worker also has those changes. It means I would have a "branch" of my code. Even more so if I want to create more workers in the future.
What is the correct approach?
Short-term, you could likely just use the scheduling capabilities in IronWorker and have the worker hit an endpoint in your application. The endpoint will then trigger the operations to run within your app environment.
Longer-term, we do suggest you look at more of a service-oriented approach whereby you break your application up to be more loose-coupled and distributed. Here's a post on the subject. The advantages are many especially around scalability and development agility.
https://blog.heroku.com/archives/2013/12/3/end_monolithic_app
You can also take a look at this YII addition.
http://www.yiiframework.com/extension/yiiron/
Certainly don't want you rewrite your app unnecessarily but there are likely areas where you can look to decouple. Suggest creating a worker directory and making efforts to write the workers to be self-contained. In that way, you could run them in a different environment and just pass payloads to the worker. (Push queues can also be used to push to these workers.) Once you get used to distributed async processing, it's a pretty easy process to manage.
(Note: I work at Iron.io)