RabbitMQ - allow only one process per user

To keep it short, here is a simplified situation:
I need to implement a queue for background processing of imported data files. I want to dedicate a number of consumers to this specific task (let's say 10) so that multiple users can be processed in parallel. At the same time, to avoid problems with concurrent data writes, I need to make sure that no single user is processed by multiple consumers at the same time; basically, all files of a single user should be processed sequentially.
Current solution (but it does not feel right):
Have 1 queue where all import tasks are published (file_queue_main)
Have 10 queues for file processing (file_processing_n)
Have 1 result queue (file_results_queue)
Have a manager process (in this case in node.js) which consumes messages from file_queue_main one by one and decides which file_processing queue to distribute each message to. Basically, it keeps track of which file_processing queues are currently working on which users.
Is RabbitMQ even the right tool for this job? For some reason, it feels like some sort of anti-pattern. I'd appreciate any help!

The part about this that doesn't "feel right" to me is the manager process. It has to know the current state of each consumer, and it also has to stop and wait if all processors are working on other users. Ideally, you'd prefer to keep each process ignorant of the others. You're also getting very little benefit out of your processing queues, which are only used when a processor is already working on a message from the same user.
Ultimately, the best solution here is going to depend on exactly what your expected usage is and how likely it is that the next message is from a user that is already being processed. If you're expecting most of your messages coming in at any one time to be from 10 users or fewer, what you have might be fine. If you're expecting to be processing messages from many different users with only the occasional duplicate, your processing queues are going to be empty much of the time and you've created a lot of unnecessary complexity.
Other things you could do here:
Have all consumers pull from the same queue and use some sort of distributed locking to prevent collisions. If a consumer gets a message from a user that's already being worked on, requeue it and move on (see the sketch after this list).
Set up your queue routing so that messages from the same user will always go to the same consumer. The downside is that if you don't spread the traffic out evenly, you could have some consumers backed up while others sit idle.
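As a rough illustration of the first option, here is a minimal sketch assuming amqplib and ioredis; the lock key format, the 300-second TTL, and the processFile helper are hypothetical placeholders, not a definitive implementation:

```typescript
// One shared queue plus a per-user Redis lock. If the lock is taken,
// the message is requeued and the consumer moves on.
import { connect, type ConsumeMessage } from "amqplib";
import Redis from "ioredis";

const redis = new Redis();

async function processFile(msg: ConsumeMessage): Promise<void> {
  // your actual import logic goes here
}

async function startConsumer(): Promise<void> {
  const conn = await connect("amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue("file_queue_main", { durable: true });
  await ch.prefetch(1); // one message at a time per consumer

  await ch.consume("file_queue_main", async (msg) => {
    if (!msg) return;
    const { userId } = JSON.parse(msg.content.toString());

    // Take a per-user lock; "NX" means "only set if not already set".
    const locked = await redis.set(`lock:user:${userId}`, "1", "EX", 300, "NX");
    if (!locked) {
      // Another consumer is working on this user: requeue and move on.
      ch.nack(msg, false, true);
      return;
    }
    try {
      await processFile(msg);
      ch.ack(msg);
    } finally {
      await redis.del(`lock:user:${userId}`);
    }
  });
}

startConsumer();
```

Note that a plain requeue puts the message back at the head of the queue, so under contention the same consumer may pick it up again immediately; a small delay or a retry queue can smooth that out.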
Also, if you're getting a lot of messages in from the same user at once that must be processed sequentially, I would question if they should be separate messages at all. Why not send a single message with a list of things to be processed? Much of the benefit of event queues comes from being able to treat each event as a discrete item that can be processed individually.

If the user has a unique ID, or the file being worked on has a unique ID, then hash that ID to pick the processing queue. That way you will always have the same user's / file's tasks queued on the same processing queue.
I am not sure how this will affect queue length for the processing queues.
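A sketch of what that hashing might look like; the queue count and naming are assumptions carried over from the question above:

```typescript
// Map a user ID to one of N processing queues via a stable hash,
// so the same user always lands on the same queue.
import { createHash } from "crypto";

const QUEUE_COUNT = 10;

function queueForUser(userId: string): string {
  const digest = createHash("md5").update(userId).digest();
  const n = digest.readUInt32BE(0) % QUEUE_COUNT;
  return `file_processing_${n}`;
}

console.log(queueForUser("user-42")); // always the same queue for this user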

Related

Coordinate scheduled jobs between multiple producers

I have a distributed system of producers and consumers across several servers, with redundant nodes—both for failover and load-balancing. The nodes communicate via RabbitMQ messages.
Each producer runs its own scheduler to invoke jobs, which one of the consumers should run. This works by publishing the appropriate RabbitMQ message, which one of the consumers will then process.
Now, the tricky part is, each job should be run only once. In short, my requirements are:
Only one invoke message per scheduled job should be processed (by any of the consumer instances)
If any of the producers goes down, the job should still be invoked by the other instances
I can't figure out how to implement this without relying on anything other than RabbitMQ. I could make it work if there were such a thing as an "exclusive exchange", which only one producer can connect to at a time. I thought about making the consumers ignore any duplicate invokes for the same job, but this will not work because, due to the load balancing, subsequent messages may be received by any of the other instances. Another idea was implementing a mechanism to declare one of the producers the "principal" node, so only that one is allowed to send invokes, but this presents basically the same problem of coordinating between instances.
Any ideas? Thanks in advance.

Question about moving events from Redis to Kafka

I have a question about a tricky situation in an event-driven system that I want to ask for advice on. Here is the situation:
In our system, I use Redis as an in-memory cache database and Kafka as the message queue. To increase the performance of Redis, I use Lua scripting to process data and, at the same time, push events into a blocking list in Redis. Then there is a process that picks events from that blocking list and moves them to Kafka. This process has 3 steps:
1) Read events from the Redis list
2) Produce them in a batch into Kafka
3) Delete the corresponding events in Redis
Unfortunately, if the process dies between steps 2 and 3 (that is, after producing all events into Kafka but before deleting the corresponding events in Redis), then when the process is restarted it will produce duplicate events into Kafka, which is unacceptable. Does anyone have a solution for this problem? Thanks in advance, I really appreciate it.
Kafka is prone to reprocessing events, even if they are written exactly once. Reprocessing will almost certainly be caused by rebalancing clients. Rebalancing might be triggered by:
Modification of partitions on a topic.
Redeployment of servers and subsequent temporary unavailability of clients.
Slow message consumption and subsequent recreation of the client by the broker.
In other words, if you need to be sure that messages are processed exactly once, you need to ensure that at the client. You could do so by setting a partition key that ensures related messages are consumed in a sequential fashion by the same client. That client can then maintain a database record of what it has already processed.
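A rough sketch of that consumer-side deduplication, assuming kafkajs and a Redis "seen" set; the topic, group id, and key scheme are illustrative, and any durable store would do in place of Redis:

```typescript
// Consumer-side dedup: the producer sets the message key to the event id,
// so duplicates land on the same partition and the same consumer.
import { Kafka } from "kafkajs";
import Redis from "ioredis";

const kafka = new Kafka({ clientId: "dedup-consumer", brokers: ["localhost:9092"] });
const redis = new Redis();

async function handleEvent(value: Buffer | null): Promise<void> {
  // your processing logic goes here
}

async function run(): Promise<void> {
  const consumer = kafka.consumer({ groupId: "file-events" });
  await consumer.connect();
  await consumer.subscribe({ topic: "events", fromBeginning: false });

  await consumer.run({
    eachMessage: async ({ message }) => {
      const eventId = message.key?.toString();
      if (!eventId) return;

      // Only the first occurrence of an id passes; "NX" = set if absent.
      const firstTime = await redis.set(`seen:${eventId}`, "1", "NX");
      if (firstTime !== "OK") return; // duplicate, skip it

      await handleEvent(message.value);
    },
  });
}

run();
```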

How can I get SQL Service Broker to actually use all available Queue Readers?

I've built a data collection framework around Service Broker. There are several procs that fill the queue with various jobs. A listener (an activated procedure) then takes the jobs, decides what needs to be done with each item, and hands it off to the correct collection proc.
The activation queue has a MAX_QUEUE_READERS of 10, but it almost never reaches that limit. Instead it will take far longer to process with just 1 or 2 activated tasks, as seen from dm_broker_activated_tasks.
How can I incentivize or even force a higher number of workers?
EDIT: This MS doc says it only checks for activation every 5 seconds.
Does that mean that if my tasks take less than 5 seconds, I have no way to parallelize them through Service Broker?
Service Broker has a specific concept for parallelism, namely the conversation group. Only messages from different groups can be processed in parallel. How this manifests is that a RECEIVE will lock the conversation group for the dequeued message and no other RECEIVE can dequeue messages from the same conversation group.
So even if you do have more messages in your queue, if they belong to the same conversation group then SQL Server cannot activate more parallel readers.
Even if you don't manage conversation groups explicitly (almost nobody does), they are managed implicitly by the fact that a conversation handle is also a group. Basically, any time you issue a single BEGIN DIALOG followed by several SENDs on the same handle, those messages will not be processable in parallel. If you issue a separate BEGIN DIALOG for each SEND, they are processable in parallel, but you lose the ordering guarantee.

What's the recommended way to queue "delayed execution" messages via ServiceStack/Redis MQ?

I would like to queue up messages to be processed only after a given duration of time elapses (i.e., a minimum date/time for execution is met), and/or, at processing time of a message, defer its execution to a later point in time (say, if some prerequisite checks are not met).
For example, an event happens which defines a process that needs to run no sooner than 1 hour from the time of the initial event.
Is there any built in/suggested model to orchestrate this using https://github.com/ServiceStack/ServiceStack/wiki/Messaging-and-Redis?
I would probably build this in a two-step approach:
Queue the task into your queueing system, which will process it into a persistence store: SQL Server, MongoDB, RavenDB.
Have a service poll your "queued" tasks for when they should be reinserted back into the queue (a sketch of this polling step follows below).
This is probably the safest way, since presumably you don't want to lose these jobs.
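A bare-bones version of that polling step might look like the following; the Task shape, the store accessors, and the 5-second interval are all hypothetical placeholders:

```typescript
// Poll the persistence store for tasks whose runAt time has passed and
// push them back into the message queue for real processing.
interface Task {
  id: string;
  runAt: Date;
  payload: unknown;
}

// Hypothetical store accessors; swap in your SQL Server/MongoDB/RavenDB queries.
async function fetchTasksDueBefore(cutoff: Date): Promise<Task[]> {
  return [];
}
async function publishToQueue(task: Task): Promise<void> {}
async function markDispatched(id: string): Promise<void> {}

async function pollLoop(): Promise<void> {
  while (true) {
    for (const task of await fetchTasksDueBefore(new Date())) {
      await publishToQueue(task); // reinsert into the MQ
      await markDispatched(task.id); // so it isn't dispatched twice
    }
    await new Promise((resolve) => setTimeout(resolve, 5000)); // poll every 5s
  }
}

pollLoop();
```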
If you use RabbitMQ instead of Redis, you could use dead letter queues to get the same behavior. Dead letter queues are essentially catchers for expired messages.
So you push your messages into a queue with no intention of processing them there, and they have a specific expiration in minutes. When they expire, they pop over into the queue that you actually process out of. A pretty slick way to queue things for later.
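A minimal sketch of that dead-letter trick with amqplib; the queue names and the one-hour TTL are illustrative:

```typescript
// Publish into a "wait" queue with a per-queue TTL and no consumers.
// When messages expire, RabbitMQ dead-letters them into the real work queue.
import { connect } from "amqplib";

async function queueForLater(): Promise<void> {
  const conn = await connect("amqp://localhost");
  const ch = await conn.createChannel();

  await ch.assertQueue("work", { durable: true });
  await ch.assertQueue("wait_1h", {
    durable: true,
    arguments: {
      "x-message-ttl": 60 * 60 * 1000, // expire after one hour
      "x-dead-letter-exchange": "", // default exchange...
      "x-dead-letter-routing-key": "work", // ...delivers expired messages to "work"
    },
  });

  // Nothing consumes wait_1h; in an hour this lands on the "work" queue.
  ch.sendToQueue("wait_1h", Buffer.from(JSON.stringify({ jobId: 123 })));
}

queueForLater();
```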
You could always use https://github.com/dominionenterprises/mongo-queue-csharp or https://github.com/dominionenterprises/mongo-queue-php or https://github.com/gaillard/mongo-queue-java which provides delayed messages and other uncommon features.

Service Broker Design

I'm looking to introduce SQL Server Service Broker.
I have a remote orders database and a local processing database. All activity on the processing database has to happen in sequence, and this seems a perfect job for Service Broker!
I've set up the infrastructure; I can send and receive messages, and now I'm looking at the design of the processing. As I said, all processes for one order need to be completed in sequence, so I'll put them in one conversation.
One of these processes is a request for external flat-file data; we then wait (could be several days), then import and process this file when it returns. How can I process half the tasks, then wait for the flat file to return before processing the other half?
I've had some ideas, but I'm sure I'm missing a trick somewhere:
1) Write all queue items to a status table and use status values – seems to remove some of the flexibility of SSSB and add another layer of tasks
2) Keep the transaction open until we get the data back – not ideal
3) Have the flat file import task continually polling for the file to appear – this seems inefficient
What is the most efficient way of managing this workflow?
thanks in advance
In my opinion this is like a chain of responsibility. As far as I can understand, we have the following workflow:
1) Process the order message.
2) Wait for the external file; this can be a busy wait, or, if the external data source provides a notification, it can be done in a non-polling manner.
3) Once the data is received, process it.
So my suggestion would be to use three different queues, one for each part; when one step is done, it forwards a new message into the next queue in the chain (a sketch follows below).
I am assuming that processing one order will not disrupt the processing of another.
I am thinking MSMQ with Windows Sequential Workflow might also be a candidate for this task.
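To make the chaining idea concrete, here is a hedged sketch using a generic AMQP broker via amqplib rather than Service Broker or MSMQ; the queue names and stage wiring are purely illustrative:

```typescript
// Each stage consumes from its own queue and, when done, forwards a
// message to the next stage's queue. Stage 2 is driven by the file's
// arrival (e.g. a watcher publishes to "import_file"), not by polling.
import { connect } from "amqplib";

async function runStage(from: string, to: string | null): Promise<void> {
  const conn = await connect("amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue(from, { durable: true });
  if (to) await ch.assertQueue(to, { durable: true });

  await ch.consume(from, async (msg) => {
    if (!msg) return;
    // ... do this stage's work on msg.content ...
    if (to) ch.sendToQueue(to, msg.content); // hand off to the next stage
    ch.ack(msg);
  });
}

runStage("request_file", null); // stage 1: request the flat file
runStage("import_file", "process_data"); // stage 2: import, then chain to stage 3
runStage("process_data", null); // stage 3: final processing
```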