RabbitMQ architecture for jobs with results back - rabbitmq

I have trying to create a system that people can submit files to sign. Each submission can have multiple files and the results will be sent back to the user if once all the file in each submission is done.
Currently I have 6 signing boxes that signs one single file at a time. So for each submission I need to separate all the files and load balancing them among the 6 signing boxes, and after that I need to fetch the results and keep track of whether all the files in this submission have been completed.
If possible, I want to also limit the maximum number of signing boxes that each submission can use so that a big submission does not starve the other jobs.
My current set up is that I have a submission queue and a few submission queue consumers will consume message from it one by one, break the submission into individual files and send them to a file queue. And the 6 signing boxes will be the 6 consumers for the file queue. I am having trouble writing results back the the submission consumer which keep track of whether all the files are done for this submission, neither can I control the maximum number of signing boxes used for each job.
I looked at the RPC setup for rabbitmq which allows the results to be written back. The problem is rpc call is blocking for each file, and I want multiple files for the same submission to be able to sign in parallel. So I need some suggestions on the rabbitmq architecture and message flow for this type of scenario.

1) Assuming that you know the # of files, assign each submission a job id
2) A job can be composed of multiple tasks.
3) Calculate a priority which naively can be an inverse of the no of documents. So a document with more # of documents by default has a lower priority. Your application should have the bells and whistles to tweak the priority, just in case a bigger job needs to be processed with high priority.
4) Instead of coupling the job submitter and signer, why not use an event based mechanism that emits events on sign success and failure or currently processing for each task.
The submitter would persist the job and task details and the signer would update them as they get processed. The submitter would take into account the priority and no of tasks currently executing before submitting any further tasks. The logic to make that call is left to you.

Related

To be sure about concurrency, same group of works in multiple queues (FIFO)

I have a question about multi consumer concurrency.
I want to send works to rabbitmq that comes from web request to distributed queues.
I just want to be sure about order of works in multiple queues (FIFO).
Because this request comes from different users eech user requests/works must be ordered.
I have found this feature with different names on Azure ServiceBus and ActiveMQ message grouping.
Is there any way to do this in pretty RabbitMQ ?
I want to quaranty that customer's requests must be ordered each other.
Each customer may have multiple requests but those requests for that customer must be processed in order.
I desire to process quickly incoming requests with using multiple consumer on different nodes.
For example different customers 1 to 1000 send requests over 1 millions.
If I put this huge request in only one queue it takes a lot of time to consume. So I want to share this process load between n (5) node. For customer X 's requests must be in same sequence for processing
When working with event-based systems, and especially when using multiple producers and/or consumers, it is important to come to terms with the fact that there usually is no such thing as a guaranteed order of events. And to get a robust system, it is also wise to design the system so the message handlers are idempotent; they should tolerate to get the same message twice (or more).
There are way to many things that may (and actually should be allowed to) interfere with the order;
The producers may deliver the messages in a slightly different pace
One producer might miss an ack (due to a missed package) and will resend the message
One consumer may get and process a message, but the ack is lost on the way back, so the message is delivered twice (to another consumer).
Some other service that your handlers depend on might be down, so that you have to reject the message.
That being said, there is one pattern that servicebus-systems like NServicebus use to enforce the order messages are consumed. There are some requirements:
You will need a centralized storage (like a sql-server or document store) that allows for conditional updates; for instance you want to be able to store the sequence number of the last processed message (or how far you have come in the process), but only if the already stored sequence/progress is the right/expected one. Storing the user-id and the progress even for millions of customers should be a very easy operation for most databases.
You make sure the queue is configured with a dead-letter-queue/exchange for retries, and then set your original queue as a dead-letter-queue for that one again.
You set a TTL (for instance 30 seconds) on the retry/dead-letter-queue. This way the messages that appear on the dead-letter-queue will automatically be pushed back to your original queue after some timeout.
When processing your messages you check your storage/database if you are in the right state to handle the message (i.e. the needed previous steps are already done).
If you are ok to handle it you do and update the storage (conditionally!).
If not - you nack the message, so that it is thrown on the dead-letter queue. Basically you are saying "nah - I can't handle this message, there are probably some other message in the queue that should be handled first".
This way the happy-path is to process a great number of messages in the right order.
But if something happens and a you get a message out of band, you will throw it on the retry-queue (the dead-letter-queue) and Rabbit will make sure it will get back in the queue to be retried at a later stage. But only after a delay.
The beauty of this is that you are able to handle most of the situations that may interfere with processing the message (out of order messages, dependent services being down, your handler being shut down in the middle of handling the message) in exact the same way; by rejecting the message and letting your infrastructure (Rabbit) take care of it being retried after a while.
(Assuming the OP is asking about things like ActiveMQs "message grouping:)
This isn't currently built in to RabbitMQ AFAIK (it wasn't as of 2013 as per this answer) and I'm not aware of it now (though I haven't kept up lately).
However, RabbitMQ's model of exchanges and queues is very flexible - exchanges and queues can be easily created dynamically (this can be done in other messaging systems but, for example, if you read ActiveMQ documentation or Red Hat AMQ documentation you'll find all of the examples in the user guides are using pre-declared queues in configuration files loaded at system startup - except for RPC-like request/response communication).
Also it is very easy in RabbitMQ for a consumer (i.e., message consuming thread) to consume from multiple queues.
So you could build, on top of RabbitMQ, a system where you got your desired grouping semantics.
One way would be to create dynamic queues: The first time a customer order was seen or a new group of customer orders a queue would be created with a unique name for all messages for that group - that queue name would be communicated (via another queue) to a consumer who's sole purpose was to load-balance among other consumers that were responsible for handling customer order groups. I.e., the load-balancer would pull off of its queue a message saying "new group with queue name XYZ" and it would find in a pool of order group consumer a consumer which could take this load and pass it a message saying "start listening to XYZ".
Another way to do it is with pub/sub and topic routing - each customer order group would get a unique topic - and proceed as above.
RabbitMQ Consistent Hash Exchange Type
We are using RabbitMQ and we have found a plugin. It use Consistent Hashing algorithm to distribute messages in order to consistent keys.
For more information about Consistent Hashing ;
https://en.wikipedia.org/wiki/Consistent_hashing
https://www.youtube.com/watch?v=viaNG1zyx1g
You can find this plugin from rabbitmq web page
plugin : rabbitmq_consistent_hash_exchange
https://www.rabbitmq.com/plugins.html

Select consumers before publishing a message rabbitmq

I am trying to build a system where I need to select next available and suitable consumer to send a message from a queue (or may be any other solution not using the queue)
Requirements
We have multiple publishers/clients who would send objects (images) to process on one side and multiple Analysts who would process them, once processed the publisher should get the corresponding response.
The publishers do not care which Analyst is going to process the data.
Users have a web app where they can map each client/publisher to one or more or all agents, say for instance if Publisher P1 is mapped to Agents A & B, all objects coming from P1 can be processed by Agent A or Agent B. Note: an object can only be processed by one agent only.
Depending on the mapping I should have a middleware which consumes the messages from all publishers and distributes to the agents
Solution 1
My initial thoughts were to have a queue where all publishers post their messages. Another queue where Agents publish message saying they are waiting to process an object.
A middleware picks the message, gets the possible list of agents it can send the message to (from cached database) and go through the agents queue to find the next suitable and available agent and publish the message to that agent.
The issue with this solution is if I have agents queue like a,b,c,d and the message I receive can only be processed by agent b I will be rejecting agents d & c and they would end up at the tail of the queue and I have around 180 agents so they might never be picked or if the next message can only be processed by agent d (for example) we have to reject all the agents to get there
Solution 2
First bit from publishers to middleware is still the same
Have a scaled fast nosql database where agents add a record to notify there availability. Basically a key value pair
The middleware gets config from cache and gets the next available + suitable agent from the nosql database sends message to the agent's queue (through direct exchange) and updates the nosql to set isavailable false ad gets the next message.
Issue with this solution is the db and middleware can become a bottleneck, also if I scale the middleware I will end up in database concurrency issues for example f I have two copies of middleware running and each recieves a message which can be proceesed by Agents A & B and both agents are available.
The two middleware copies would query the db and might get A as availble and end up sneding both messages to A while B is still waiting for a message to process.
I will have around 100 publishers and 180 agents to start with.
Any ideas how to improve these solutions or any other feasible solution would be highly appreciated?
Depending on this I also need to figure out how the Agent would send response back to the publisher.
Thank you
I'll answer this from the perspective the perspective of my open-source service bus: Shuttle.Esb
Typically one would ignore any content-based routing and simply have a distributor pattern. All message go to the primary endpoint and it will distribute the messages. However, if you decide to stick to these logical groupings you could have primary endpoints for each logical grouping (per agent group). You would still have the primary endpoint but instead of having worker endpoints mapped to agents you would have agent groupings map to the logical primary endpoint with workers backing that.
Then in the primary endpoint you would, based on your content (being the agent identifier), forward the message to the relevant logical primary endpoint. All the while you keep track of the original sender. In the worker you would then send a message back to the queue of the original sender.
I'm sure you could do pretty much the same using any service bus.
I see several requirements in here, that can be boiled down to a few things, I think:
publisher does not care which agent processes the image
publisher needs to know when the image processing is done
agent can only process 1 image at a time
agent can only process certain images
are these assumptions correct? did I miss anything important?
if not, then your solution is pretty much built into RabbitMQ with routing and queues. there should be no need to build custom middle-tier service to manage this.
With RabbitMQ, you can have a consumer set to only process 1 message at a time. The consumer sets it's "prefetch" limit to 1, and retrieves a message from the queue with "no ack" set to false - meaning, it must acknowledge the message when it is done processing it.
To consume only messages that a particular agent can handle, use RabbitMQ's routing capabilities with multiple queues. The queues would be created based on the type of image or some other criteria by which the consumers can select images.
For example, if there are two types of images: TypeA and TypeB, you would have 2 queues - one for TypeA and one for TypeB.
Then, if Agent1 can only handle TypeA images, it would only consume from the TypeA queue. If Agent2 can handle both types of images, it would consume from both queues.
To put the right images in the right queue, the publisher would need to use the right routing key. If you know if the image type (or whatever the selection criteria is), you would change the routing key on the publisher side to match that selection criteria. The routing in RabbitMQ would be set up to move messages for TypeA into the TypeA queue, etc.
The last part is getting a response on when the image is done processing. That can be accomplished through RabbitMQ's "reply to" field and related code. The gist of it is that the publisher has it's own exclusive queue. When it publishes a message, it includes the name of it's exclusive queue in the "reply to" header of the message. When the agent finishes processing the image, it sends a status update message back through the queue found in the "reply to" header. That status update message tells the producer the status of the request.
From a RabbitMQ perspective, these pieces can be put together using the examples and documentation found here:
http://www.rabbitmq.com/getstarted.html
Look at these specifically:
Work Queues: http://www.rabbitmq.com/tutorials/tutorial-two-python.html
Topics: http://www.rabbitmq.com/tutorials/tutorial-five-python.html
RPC (aka Request/Response): http://www.rabbitmq.com/tutorials/tutorial-six-python.html
You'll find examples in many languages, in these docs.
I also cover most of these scenarios (and others) in my RabbitMQ Patterns eBook
Since the total number of senders and receivers are only hundreds, how about to create one queue for each of your senders. Based on your sender receiver mapping, receivers subscribes to the sender queues (update the subscribing on mapping changes). You could configure your receiver to only receive the next message from all the queues it subscribes (in a random way) when it finishes processing one message.

Queue Fairness and Messaging Servers

I'm looking to solve a problem that I have with the FIFO nature of messaging severs and queues. In some cases, I'd like to distribute the messages in a queue to the pool of consumers on a criteria other than the message order it was delivered in. Ideally, this would prevent users from hogging shared resources in the system. Take this overly simplified scenario:
There is a feature within an application where a user can empty their trash can.
This event dispatches a DELETE message for each item in trash can
The consumers for this queue invoke a web service that has a rate limited API.
Given that each user can have very large volumes of messages in their trash can, what options do we have to allow concurrent processing of each trash can without regard to the enqueue time? It seems to me that there are a few obvious solutions:
Create a separate queue and pool of consumers for each user
Randomize the message delivery from a single queue to a single pool of consumers
In our case, creating a separate queue and managing the consumers for each user really isn't practical. It can be done but I think I really prefer the second option if it's reasonable. We're using RabbitMQ but not necessarily tied to it if there is a technology more suited to this task.
I'm entertaining the idea of using Rabbit's message priorities to help randomize delivery. By randomly assigning a message a priority between 1 and 10, this should help distribute the messages. The problem with this method is that the messages with the lowest priority may be stuck in the queue forever if the queue is never completely emptied. I thought I could use a TTL on the message and then re-queue the message with an escalated priority but I noticed this in the docs:
Messages which should expire will still only expire from the head of
the queue. This means that unlike with normal queues, even per-queue
TTL can lead to expired lower-priority messages getting stuck behind
non-expired higher priority ones. These messages will never be
delivered, but they will appear in queue statistics.
I fear that I may heading down the rabbit hole with this approach. I wonder how others are solving this problem. Any feedback on creative routing, messaging patterns, or any alternative solutions would be appreaciated.
So I ended up taking a page out of the network router handbook. This a problem they routers need to solve to allow fair traffic patterns. This video has a good breakdown of the problem and the solution.
The translation of the problem into my domain:
And the solution:
The load balancer is a wrapper around a channel and a known number of queues that uses a weighted algorithm to balance between messages received on each queue. We found a really interesting article/implementation that seems to be working well so far.
With this solution, I can also prioritize workspaces after messages have been published to increase their throughput. That's a really nice feature.
The biggest challenge ahead of me is management of the queues. There will be too many queues to leave bound to the exchange for an extended period of time. I'm working on some tools to manage their lifecycle.
One solution could be to interpose a Resequencer. The principle is outlined in the diag in that link. In your case, something like:
The app dispatches its DELETE messages into the delete queue as originally.
The Resequencer (a new component you write) is interposed between the original publishers and original consumers. It:
pulls messages off the DELETE queue into memory
places them into (in-memory) queues-by-user
republishes them to a new queue (eg FairPriorityDeleteQueue), round-robinning to interleave fairly any messages from different original users
limits its republish rate into FairPriorityDeleteQueue, either such that the length of FairPriorityDeleteQueue (obtainable via polling the rabbitmq management api periodically) never exceeds some integer you choose N, or limited to some rate related to the rate-limited delete API the consumers use.
doesn't ack any message it pulled off the original DELETE queue, until it's republished it to FairPriorityDeleteQueue (so you never lose a message)
The original consumers subscribe instead to FairPriorityDeleteQueue.
You set the preFetchCount on these consumers fairly low (<10), to prevent them in turn bulk-buffering the contents of FairPriorityDeleteQueue in memory.
--
Some points to watch:
Rate- or length-limiting publishing into and/or drawing messages out of FairPriorityDeleteQueue is essential. If you don't limit, Resequencer may just hand messages on as fast as it receives them, limiting the potential for resequencing.
Resequencer of course acts as a kind of in-memory buffer while resequencing. If the original publishers can publish very large numbers of messages in to the queue suddenly, you may need to memory-limit the Resequencer process so that it doesn't ingest more than it can hold.
Your particular scenario is greatly helped by the fact that you have an external factor (the final delete API) limiting throughput. Without such an extrinsic limiting factor, it is much harder to choose the optimum parameters for such a resequencer, to balance throughput-versus-resequencing in a particular environment.
I don't think a resequencer is needed in this case. Maybe it is, if you need to ensure the items are deleted in a specific order. But that only comes into play when you send multiple messages at roughly the same time and need to guarantee order on the consumer end.
You should also avoid the timeout scenario, for the reasons you've mentioned. timeout is meant to tell RabbitMQ that a message doesn't need to be processed - or that it needs to be routed to a dead letter queue so that i can be processed by some other code. while you might be able to make timeout work, i don't think it's a good choice.
Priorities may solve part of the problem, but could introduce a scenario where files never get processed. if you have a priority 1 message sitting back in the queue somewhere, and you keep putting priority 2, 3, 5, 10, etc. into the queue, the 1 might not be processed. the timeout doesn't solve this, as you've noted.
For my money, I would suggest a different approach: sending delete requests serially, for a single file.
that is, send 1 message to delete 1 file. wait for a response to say it's done. then send the next message to delete the next file.
here's why i think that will work, and how to manage it:
Long-Running Workflow, Single File Delete Requests
In this scenario, I would suggest taking a multi-step approach to the problem using the idea of a "saga" (aka a long-running workflow object).
when a user requests to delete their trashcan, you send a single message through rabbitmq to the service that can handle the delete process. that service creates an instance of the saga for that user's trashcan.
the saga gathers a list of all files in the trashcan that need to be deleted. then it starts to send the requests to delete the individual files, one at a time.
with each request to delete a single file, the saga waits for the response to say the file was deleted.
when the saga receives the message to say the previous file has been deleted, it sends out the next request to delete the next file.
once all the files are deleted, the saga updates itself and any other part of the system to say the trash can is empty.
Handling Multiple Users
When you have a single user requesting a delete, things will happen fairly quickly for them. they will get their trash emptied soon.
u1 = User 1 Trashcan Delete Request
|u1|u1|u1|u1|u1|u1|u1|u1|u1|u1done|
when you have multiple users requesting a delete, the process of sending one file delete request at a time means each user will have an equal chance of getting the next file delete.
u1 = User 1 Trashcan Delete Request
u2 = User 2 Trashcan Delete Request
|u1|u2|u1|u1|u2|u2|u1|u2|u1|u2|u2|u1|u1|u1|u2|u2|u1|u2|u1|u1done|u2|u2done|
This way, there will be shared use of the resources to delete the files. Over-all, it will take a little longer for each person's trashcan to be emptied, but they will see progress sooner and that's an important aspect of people thinking the system is fast / responsive to their request.
Optimizing Small File Set vs Large File Set
In a scenario where you have a small number of users with a small number of files, the above solution may prove to be slower than if you deleted all the files at once. after all, there will be more messages sent across rabbitmq - at least 2 for every file that needs to be deleted (one delete request, one delete confirmation response)
To optimize this further, you could do a couple of things:
have a minimum trashcan size before you split up the work like this. below that minimum, you just delete it all at once
chunk the work into groups of files, instead of one at a time. maybe 10 or 100 files would be a better group size, than 1 file at a time
Either (or both) of these solutions would help to improve the over-all performance of the process by reducing the number of messages being sent, and batching the work a bit.
You would need to do some testing in your real scenario to see which of these (or maybe both) would help and at what settings.
Many Users Problem
There's one additional problem you may face - many users. If you have 2 or 3 users requesting deletes, it won't be a big deal.
But if you have 100 or 1000 users requesting deletes, it could take a very long time for an individual to get their trashcan emptied.
You may need to have a higher level controlling process for this situation, where all requests to empty trashcans would be managed by yet another Saga. This saga would rate-limit the number of active trashcan-deletion sagas.
For example, if you have 10 active requests for deleting trashcans, the rate-limiting saga would only start 3 of them and it would wait for one to finish before starting the next one.
Again, you would need to test your actual scenario to see if this is needed and see what the limits should be, for performance reasons.
There may be additional scenarios that have to be considered in your actual scenario, but I hope this gets you down the path! :)

How to handle a single publisher clogging up my RabbitMQ's queue

In my last project, I am using MassTransit (2.10.1) with RabbitMQ.
On some scenarios, a producer is allowed to send a bulk of messages to the queue.
For example - the user set to a bulk notification to his list of contacts - the list could be as large as 100000 contacts on some cases. This will send a message per each contact to the queue (I need to keep track of each message). Now since - as I understand - messages are being processed in the order of entrance, that user is clogging up the queue for a large amount of time while another user, which may have done a simple thing such as send a test message to himself, waits for the processing to end.
I have considered separating queues for regular VS bulk operations but this still doesn't solve the problem for small bulks (user with dozens of contacts waiting for users with hundred thousands) and causes extra maintenance.
The ideal solution for me - I think - would involve manipulating the routing in such a way that the consumer will be handling x messages from the same user, move the X messages from the next user, than again, and than moving back to the beginning of the queue, until all messages are processed.
Is that possible? Is there a better solution?
Thanks in advance.
You will to have to write code to manage this yourself. RabbitMQ doesn't really have any built-in mechanism to handle a scenario like this, without your code getting involved.
If you want to process a few at a time from bulk, then back to normal, then back to bulk, you'll need 2 queues and code to manage which one is being pulled from, when.
Just my opinion, seeing as how there is no built in way to my knowledge...Have you considered using whatever storage you are using to store the notifications, then just publish one message, with a List of Notifications, store it in you DB, and then have a retrieve notifications for user consumer. the response would be one message, it may have a massive payload, but even if that gets bogged down, add a skip and take property to the message and force it to be between 0 and 50 (or whatever). In what scenario would you want to show a user 100,000 notifications at once?

NServicebus - Stopping a long running process?

Here is my application I'm attempting to put together using NServiceBus:
I have a 1000 files that need to be processed by a service. So far I'm thinking I'd have one endpoint, the client, find all of those files and send them out on the bus to be processed
My other endpoint, the server that does the processing, would listen for these client messages, when one comes in process the file, and return the results.
Client takes the results, marks the file as processed, and waits for the next 999 files to be processed. Client doesn't care the order of the messages that come back, just as long as they all get processed at some point. (In reality the client is going to do something more with the data after it is processed that can't be done by the server, so I can't just fire and forget the request for processing.)
Since processing a single message can take over an hour I would scale out the application to have multiple servers all attempting to eat through the 1000 files that need to be processed.
Conceptually, its like building a personal SETI at home service to run on all of my servers.
The issues I'm having is, how do I stop midway through processing the 1000 files?
I want to keep all of my servers working as much as they can on my data, so when the client starts does it publish a 1000 commands for the 1000 files to process and then sit back and wait? And if it does this, and decides to stop, how can it clear the bus of all of those commands to process files?
If my client only pushes one or two messages on the bus at a time I could easily stop sending messages if I decide to stop on the client, but then I have two other problems
The servers could be underutilized and I'd end up with idle servers.
How do I stop the servers that are loaded up and processing data? Send them a second command of a different message format?
Thoughts, ideas? Am I approaching this problem using the right tool/right methodology?
One of things you might want to think about is how you are going to correlate the message processing. I would use a saga for this and have the client generate some kind of batch id which is attached to all the files to be processed. This allows your client to be able to send a CancelProcessing message to the saga, the handler for which could then stop the processing / sending of messages to the file processing endpoints and perform any clean-up operations such as completing the saga and removing data from the database.
So you would have client endpoint, saga endpoint and one or more file processing endpoints (which would sit behind a distributor). Your client would be responsible for initiating / sending the files to the saga. The saga manages the file correlation and processing activities, while your processing endpoints focus doing the work.
Remember that the processing endpoints don't necessarily have to be physical endpoints. You can have many of these on one server if you wanted to and use monitoring tools to determine whether or not you need to add or remove nodes.