Redis reliable queues for multi-threaded processing

For my ongoing project, I am using Redis for message distribution across several processes. Now, I am supposed to make them reliable.
I am considering the Reliable queue pattern, built on the BRPOPLPUSH command. In this pattern, the processing thread removes the extra copy of the message from the "processing list" with the LREM command after the job has completed successfully.
As I am using multiple threads to pop, the extra copies of the popped items from all of these threads go into the same processing list. That is to say, the processing queue contains elements popped by several threads. As a consequence, when a thread completes its job, it cannot know which item to remove from the "processing queue".
To overcome this problem, I am thinking that I should maintain multiple processing queues (one for each thread) based on threadId. So, my BRPOPLPUSH will be:
BRPOPLPUSH <primary-queue> <thread-specific-processing-queue> <timeout>
Then, to clean up timed-out objects, my monitoring thread will have to monitor all of these thread-specific processing queues.
Are there any better approaches to this problem, than the one conceived above?
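The per-thread processing-list idea from the question can be sketched as a minimal, single-process simulation. This is not redis-py. The hypothetical `ListStore` class mimics the semantics of LPUSH, RPOPLPUSH (a non-blocking stand-in for BRPOPLPUSH) and `LREM key 1 value` using in-memory deques. The key names `jobs` and `jobs:processing:<id>` are made up for illustration. The recovery step assumes the monitor only touches the lists of workers it already knows are dead, because requeueing a live worker's item would cause double processing.

```python
from collections import deque


class ListStore:
    """Hypothetical in-memory stand-in for a few Redis list commands (not a Redis client)."""

    def __init__(self):
        self.lists = {}

    def _list(self, key):
        return self.lists.setdefault(key, deque())

    def lpush(self, key, value):
        self._list(key).appendleft(value)

    def rpoplpush(self, src, dst):
        # Non-blocking analogue of BRPOPLPUSH: move the tail of src to the head of dst.
        source = self._list(src)
        if not source:
            return None
        value = source.pop()
        self._list(dst).appendleft(value)
        return value

    def lrem(self, key, value):
        # Mirrors LREM key 1 value: drop the first occurrence, scanning from the head.
        try:
            self._list(key).remove(value)
            return 1
        except ValueError:
            return 0


PRIMARY = "jobs"


def processing_key(worker_id):
    # One processing list per worker, as proposed in the question.
    return f"jobs:processing:{worker_id}"


def work_one(store, worker_id, handler):
    item = store.rpoplpush(PRIMARY, processing_key(worker_id))
    if item is None:
        return None
    handler(item)  # if this raises, the backup copy stays in the processing list
    store.lrem(processing_key(worker_id), item)  # success: drop the backup copy
    return item


def recover(store, dead_worker_ids):
    # Monitor: move whatever dead workers left behind back onto the primary queue.
    moved = 0
    for wid in dead_worker_ids:
        while store.rpoplpush(processing_key(wid), PRIMARY) is not None:
            moved += 1
    return moved


store = ListStore()
for job in ["a", "b", "c"]:
    store.lpush(PRIMARY, job)

done = []
work_one(store, 1, done.append)  # worker 1 finishes "a"
try:
    work_one(store, 2, lambda item: 1 / 0)  # worker 2 crashes while handling "b"
except ZeroDivisionError:
    pass

print(done, list(store.lists[processing_key(2)]))  # ['a'] ['b']
recover(store, [2])
print(list(store.lists[PRIMARY]))  # ['b', 'c']
```

Each worker handles one item at a time, so its list holds at most the item it is currently working on. The monitor can therefore treat anything left in a dead worker's list as unfinished.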

#user779159
To support a reliable queue mechanism, we take the following approach:
- two data structures
-- a Redis List (the original queue from which items are popped regularly)
-- a Redis z-set, which temporarily stores the popped item
Algorithm:
-- When an element is popped, we store it in the z-set.
-- If the task that picked up the item completes its job, it deletes the entry from the z-set.
-- If the task couldn't complete it, the item is left hanging around in the z-set. So we know whether or not a task was done within the expected time.
-- Another background process periodically scans this z-set, picks up items that have timed out, and puts them back into the queue.
How it is done:
-- We use a z-set to store the item that we popped (typically using a Lua script, so that the pop and the z-set insert happen atomically).
-- We store a timeout value as the score of this item.
-- Another scanner process periodically (say every minute) runs the z-set command ZRANGEBYSCORE to select items whose timeout score falls between one minute ago and now.
-- If the above command finds items, the process that popped them (via BRPOP) has not completed its task in time.
-- So this second process puts each such item back into the queue (Redis list) where it originally belonged.
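The list + z-set scheme above can be sketched in memory. Here a dict of member -> deadline score stands in for the z-set and a deque stands in for the Redis list. The `ReliableQueue` class and its method names are hypothetical.

One deliberate deviation from the answer: the scanner selects every entry whose score is at or before `now` (the equivalent of `ZRANGEBYSCORE key -inf <now>`), not just the last minute. That way an item is not missed if a scan runs late.

As with a real z-set, identical payloads share a single entry, so jobs are assumed to carry unique IDs. Time is passed in explicitly to keep the example deterministic.

```python
from collections import deque


class ReliableQueue:
    """In-memory sketch of the list + z-set pattern (a dict stands in for the z-set)."""

    def __init__(self, timeout_s):
        self.queue = deque()  # Redis list: push on the left, pop on the right
        self.inflight = {}  # z-set: member -> score (deadline timestamp)
        self.timeout_s = timeout_s

    def push(self, item):
        self.queue.appendleft(item)

    def pop(self, now):
        # In Redis, this pop + ZADD would be one Lua script so both steps are atomic.
        if not self.queue:
            return None
        item = self.queue.pop()
        self.inflight[item] = now + self.timeout_s  # score = deadline
        return item

    def ack(self, item):
        # Task finished in time: the equivalent of ZREM.
        self.inflight.pop(item, None)

    def requeue_expired(self, now):
        # Scanner: select members with score <= now, remove them, push them back.
        by_score = sorted(self.inflight.items(), key=lambda kv: kv[1])
        expired = [member for member, score in by_score if score <= now]
        for item in expired:
            del self.inflight[item]
            self.queue.appendleft(item)
        return expired


q = ReliableQueue(timeout_s=60)
q.push("job-1")
q.push("job-2")
a = q.pop(now=0)  # "job-1", deadline 60
b = q.pop(now=10)  # "job-2", deadline 70
q.ack(a)  # job-1 completed in time
r1 = q.requeue_expired(now=65)  # job-2's deadline (70) not reached yet
r2 = q.requeue_expired(now=75)  # job-2 timed out and goes back to the queue
print(r1, r2, list(q.queue))  # [] ['job-2'] ['job-2']
```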

Related

RabbitMQ - allow only one process per user

To keep it short, here is a simplified situation:
I need to implement a queue for background processing of imported data files. I want to dedicate a number of consumers to this specific task (let's say 10) so that multiple users can be processed in parallel. At the same time, to avoid problems with concurrent data writes, I need to make sure that no user is processed by multiple consumers at the same time; basically, all files of a single user should be processed sequentially.
Current solution (but it does not feel right):
Have 1 queue where all import tasks are published (file_queue_main)
Have 10 queues for file processing (file_processing_n)
Have 1 result queue (file_results_queue)
Have a manager process (in this case in node.js) which consumes messages from file_queue_main one by one and decides to which file_processing queue to distribute that message. Basically keeps track of in which file_processing queues the current user is being processed.
Here is a little animation of my current solution and expected behaviour:
Is RabbitMQ even the tool for the job? For some reason, it feels like some sort of an anti-pattern. Appreciate any help!
The part about this that doesn't "feel right" to me is the manager process. It has to know the current state of each consumer, and it also has to stop and wait if all processors are working on other users. Ideally, you'd prefer to keep each process ignorant of the others. You're also getting very little benefit out of your processing queues, which are only used when a processor is already working on a message from the same user.
Ultimately, the best solution here is going to depend on exactly what your expected usage is and how likely it is that the next message is from a user that is already being processed. If you're expecting most of your messages coming in at any one time to be from 10 users or fewer, what you have might be fine. If you're expecting to be processing messages from many different users with only the occasional duplicate, your processing queues are going to be empty much of the time and you've created a lot of unnecessary complexity.
Other things you could do here:
Have all consumers pull from the same queue and use some sort of distributed locking to prevent collisions. If a consumer gets a message from a user that's already being worked on, requeue it and move on.
Set up your queue routing so that messages from the same user will always go to the same consumer. The downside is that if you don't spread the traffic out evenly, you could have some consumers backed up while others sit idle.
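The first option (a shared queue plus a per-user lock, requeueing messages for busy users) can be sketched as a single-process illustration. A `deque` stands in for the shared RabbitMQ queue. The hypothetical `UserLocks` class stands in for whatever distributed lock you use, for example a Redis key set with NX. A real broker's requeue semantics and ordering may differ.

```python
import threading
from collections import deque


class UserLocks:
    """Hypothetical stand-in for a distributed lock service: a set guarded by a mutex."""

    def __init__(self):
        self._held = set()
        self._mutex = threading.Lock()

    def try_acquire(self, user):
        with self._mutex:
            if user in self._held:
                return False
            self._held.add(user)
            return True

    def release(self, user):
        with self._mutex:
            self._held.discard(user)


def next_message(queue, locks):
    # Look at each queued message at most once; requeue those whose user is busy.
    for _ in range(len(queue)):
        user, payload = queue.popleft()
        if locks.try_acquire(user):
            return user, payload  # caller must call locks.release(user) when done
        queue.append((user, payload))  # user already being processed: requeue, move on
    return None


q = deque([("alice", "f1"), ("alice", "f2"), ("bob", "f3")])
locks = UserLocks()
m1 = next_message(q, locks)  # ('alice', 'f1')
m2 = next_message(q, locks)  # alice is busy, so f2 is requeued; returns ('bob', 'f3')
m3 = next_message(q, locks)  # only alice's f2 is left and alice is still locked: None
locks.release("alice")  # consumer 1 finished f1
m4 = next_message(q, locks)  # ('alice', 'f2')
print(m1, m2, m3, m4)
```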
Also, if you're getting a lot of messages in from the same user at once that must be processed sequentially, I would question if they should be separate messages at all. Why not send a single message with a list of things to be processed? Much of the benefit of event queues comes from being able to treat each event as a discrete item that can be processed individually.
If the user has a unique ID, or the file being worked on has a unique ID, then hash the ID to choose the processing queue. That way, the same user's or file's tasks will always be queued on the same processing queue.
I am not sure how this will affect queue length for the processing queues.
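Hash-based routing can be sketched in a few lines. The queue names follow the question's `file_processing_n` naming; SHA-256 and the 10-queue count are illustrative choices. A stable digest is used instead of Python's built-in `hash()`, which is randomized per process for strings, so separate producer processes would otherwise disagree about the mapping.

```python
import hashlib

NUM_QUEUES = 10


def processing_queue_for(entity_id: str, num_queues: int = NUM_QUEUES) -> str:
    """Map a user (or file) ID to one processing queue, the same one every time."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % num_queues
    return f"file_processing_{index}"


# Every producer computes the same queue for the same user.
first = processing_queue_for("user-42")
again = processing_queue_for("user-42")
print(first, first == again)

# Rough look at how 1000 synthetic user IDs spread across the queues.
counts = {}
for n in range(1000):
    name = processing_queue_for(f"user-{n}")
    counts[name] = counts.get(name, 0) + 1
print(sorted(counts.items()))
```

The spread printed at the end is one way to check the queue-length concern. Note also that with modulo hashing, changing the number of queues remaps most IDs, so in-flight work should be drained before resizing.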

Keep lock for multiple updates to a single row in SQL

I want to lock a certain single row in a MariaDB table for multiple updates and release the lock when I'm finished updating it. The scenario is that multiple machines could be requesting the lock. When one machine gets the lock, it has a batch of work that needs to update the row data multiple times. The other machines must not get the lock while the first machine is still busy with its batch of work. So if the first machine does two updates, no other machine may get the lock between the two updates.
I've looked at transactions, but they roll back if my application crashes, whereas I want the updates to remain.
Does anyone have an idea on how to solve this kind of issue? Googling it does not produce any good hits, or my search terms might be wrong.
Edit:
Trying to clarify the use case a bit more:
This functionality is for event processing in a distributed system, where there are multiple concurrent consumers.
The events are parts of streams, and the events within a stream must be processed in the correct order, otherwise the system gets corrupted.
For all kinds of reasons, the events of a single stream can end up on different consumers, in the wrong order, with a large delay, etcetera; these are exceptional cases, not normal operating conditions.
The locking of rows helps make sure that different consumers are not working on the same event stream concurrently:
BEGIN;
SELECT counter, someAdditionalFields FROM streamCounters WHERE streamId=x FOR UPDATE;
# this locks the stream to this consumer
# This thread is processing event with eventId counter+1, this can be in the range of 0ms to a few seconds, certainly not minutes.
# The result of this work ends up in another table
UPDATE streamCounters SET counter=(counter+1) WHERE streamId=x;
# signifies that event was processed, the stream advances by 1
# This thread is processing event with eventId counter+2
UPDATE streamCounters SET counter=(counter+1) WHERE streamId=x;
# signifies that event was processed, the stream advances by 1
# ... until no more ready events available in stream
COMMIT; # release the lock on this stream
The reason I don't want this transaction to roll back in case of a crash is that processing the events makes significant changes in other tables.
The changes in the other tables are made by a different part of the application; they provide me with a function to call, and what it does is not really under my control.
Let's say I want to process event 1 in a stream, so I lock the stream.
I call the provided function that processes event 1.
When it returns successfully, I update the streamCounter from 0 to 1.
Now I want to process event 2, but I crash.
The transaction now gets rolled back and the streamCounter goes back to 0, but event 1 was already processed.
My streamCounter no longer represents how much work was done!
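The failure mode described above can be reproduced in a few lines. This sketch uses Python's built-in `sqlite3` module rather than MariaDB, so it is only an analogy:
- SQLite has no `SELECT ... FOR UPDATE`, so `BEGIN IMMEDIATE` (which takes SQLite's write lock) plays that role.
- An explicit `ROLLBACK` stands in for the crash.
- The hypothetical `process_event` function and `side_effects` list stand in for the provided function and the other tables it changes.

```python
import sqlite3

side_effects = []  # stands in for the other tables changed by the provided function


def process_event(event_id):
    # Hypothetical stand-in for the externally provided processing function.
    # Its effects live outside our transaction, so a rollback cannot undo them.
    side_effects.append(event_id)


# isolation_level=None: the module does not open transactions itself; we issue BEGIN explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE streamCounters (streamId INTEGER PRIMARY KEY, counter INTEGER)")
conn.execute("INSERT INTO streamCounters VALUES (1, 0)")

conn.execute("BEGIN IMMEDIATE")  # lock the stream (analogue of SELECT ... FOR UPDATE)
process_event(1)
conn.execute("UPDATE streamCounters SET counter = counter + 1 WHERE streamId = 1")
# ... the process dies while working on event 2; the open transaction is rolled back:
conn.execute("ROLLBACK")

counter = conn.execute("SELECT counter FROM streamCounters WHERE streamId = 1").fetchone()[0]
print(counter, side_effects)  # 0 [1]: the counter says "nothing done", yet event 1 was applied
```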

GCD custom queue for synchronization

In Mike Ash's GCD article, he mentions: "Custom queues can be used as a synchronization mechanism in place of locks."
Questions:
1) How does dispatch_barrier_async work differently from dispatch_async? Doesn't dispatch_async achieve the same function as dispatch_barrier_async synchronization wise?
2) Is custom queue the only option? Can't we use main queue for synchronization purpose?
First, whether a call to submit a task to a queue is _sync or _async does not in any way affect whether the task is synchronized with other threads or tasks. It only affects whether the caller is blocked until the task completes executing or if it can continue on. The _sync stands for "synchronous" and _async stands for "asynchronous", which sound similar to but are different from "synchronized" and "unsynchronized". The former have nothing to do with thread safety, while the latter are crucial.
You can use a serial queue for synchronizing access to shared data structures. A serial queue only executes one task at a time. So, if all tasks which touch a given data structure are submitted to the same serial queue, then they will never be executing simultaneously and their accesses to the data structure will be safe.
The main queue is a serial queue, so it has this same property. However, any long-running task submitted to the main queue will block user interaction. If the tasks don't have to interact with the GUI or have a similar requirement that they run on the main thread, it's better to use a custom serial queue.
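Outside GCD, the serial-queue idea can be approximated in Python with a single-worker `ThreadPoolExecutor`, which runs submitted callables one at a time in submission order. This is an analogy, not GCD. `deposit_async` and `read_balance_sync` are hypothetical names chosen to mirror `dispatch_async` and `dispatch_sync`. Both submit to the same serial executor, which illustrates the point above that the synchronization comes from the queue, not from whether the caller waits.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

serial_queue = ThreadPoolExecutor(max_workers=1)  # runs one task at a time, FIFO
balance = {"value": 0}


def deposit(amount):
    # Only ever runs on serial_queue, so two deposits never overlap.
    balance["value"] += amount


def deposit_async(amount):
    # Like dispatch_async: enqueue and return immediately.
    serial_queue.submit(deposit, amount)


def read_balance_sync():
    # Like dispatch_sync: enqueue and block until the task has run.
    return serial_queue.submit(lambda: balance["value"]).result()


def client():
    for _ in range(1000):
        deposit_async(1)


threads = [threading.Thread(target=client) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final = read_balance_sync()  # submitted after every deposit, so it runs after them
print(final)  # 4000
serial_queue.shutdown()
```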
It's also possible to achieve synchronization using a custom concurrent queue if you use the barrier routines. dispatch_barrier_async() differs from dispatch_async() in that the queue temporarily becomes a serial queue, more or less. When the barrier task reaches the head of the queue, it is not started until all previous tasks in that queue have completed. Once they have, the barrier task is executed. Until the barrier task completes, the queue will not start any subsequent tasks that it holds.
Non-barrier tasks submitted to a concurrent queue may run simultaneously with one another, which means they are not synchronized and, if they access shared data structures, they can corrupt that data structure or get incorrect results, etc.
The barrier routines are useful for read-write synchronization. It is usually safe for multiple threads to be reading from a data structure simultaneously, so long as no thread is trying to modify (write to) the data structure at the same time. A task that modifies or writes to the data structure must not run simultaneously with either readers or other writers. This can be achieved by submitting read tasks as non-barrier tasks to a given queue and submitting write tasks as barrier tasks to that same queue.
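The reader/writer use of barriers can be emulated in Python to make the ordering concrete. This is a toy, not GCD. The hypothetical `ConcurrentQueue` class runs ordinary tasks on a thread pool. When it reaches a barrier task, its dispatcher thread waits for every previously started task, runs the barrier alone, and only then starts later tasks. Exceptions inside a barrier task are not handled.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor, wait


class ConcurrentQueue:
    """Toy emulation of a GCD concurrent queue with barrier support (not GCD itself)."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._tasks = queue.Queue()  # FIFO of (is_barrier, fn)
        self._running = []  # futures of non-barrier tasks started so far
        self._dispatcher = threading.Thread(target=self._dispatch, daemon=True)
        self._dispatcher.start()

    def dispatch_async(self, fn):
        self._tasks.put((False, fn))

    def dispatch_barrier_async(self, fn):
        self._tasks.put((True, fn))

    def _dispatch(self):
        while True:
            is_barrier, fn = self._tasks.get()
            if fn is None:  # shutdown sentinel
                return
            if is_barrier:
                wait(self._running)  # every earlier task must finish first
                self._running.clear()
                fn()  # runs alone; later tasks are not started until it returns
            else:
                self._running = [f for f in self._running if not f.done()]
                self._running.append(self._pool.submit(fn))

    def close(self):
        self._tasks.put((False, None))
        self._dispatcher.join()
        self._pool.shutdown(wait=True)


shared = {"value": "old"}
seen_before, seen_after = [], []

q = ConcurrentQueue()
for _ in range(3):
    q.dispatch_async(lambda: seen_before.append(shared["value"]))  # readers
q.dispatch_barrier_async(lambda: shared.update(value="new"))  # writer
for _ in range(3):
    q.dispatch_async(lambda: seen_after.append(shared["value"]))  # readers
q.close()

print(seen_before, seen_after)  # ['old', 'old', 'old'] ['new', 'new', 'new']
```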

How to create a distributed 'debounce' task to drain a Redis List?

I have the following use case: multiple clients push to a shared Redis List. A separate worker process should drain this list (process and delete). WATCH/MULTI/EXEC is in place to make sure this goes smoothly.
For performance reasons I don't want to call the 'drain'-process right away, but after x milliseconds, starting from the moment the first client pushes to the (then empty) list.
This is akin to a distributed underscore/lodash debounce function, for which the timer starts to run the moment the first item comes in (i.e.: 'leading' instead of 'trailing')
I'm looking for the best way to do this reliably in a fault tolerant way.
Currently I'm leaning towards the following method:
Use Redis SET with the NX and PX options. This allows:
to only set a value (a mutex) in a dedicated keyspace if it doesn't exist yet. This is what the NX argument is for.
to expire the key after x milliseconds. This is what the PX argument is for.
With these options, the command replies OK if the value could be set, meaning no value existed previously, and a nil reply otherwise. OK means the current client is the first client to run the process since the Redis List was drained. Therefore,
this client puts a job on a distributed queue which is scheduled to run in x milliseconds.
After x milliseconds, the worker to receive the job starts the process of draining the list.
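The leading-edge scheme in the question can be simulated with a fake clock to check the timing. The hypothetical `FakeRedis` class below only mimics the semantics of `SET key value NX PX ms` and RPUSH in memory; it is not a Redis client. The `scheduled` list stands in for the distributed delayed-job queue, and the 500 ms delay is arbitrary.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for SET NX PX and RPUSH; time is passed explicitly."""

    def __init__(self):
        self.expiry = {}  # key -> absolute expiry time in ms
        self.lists = {}

    def set_nx_px(self, key, ttl_ms, now_ms):
        # Mirrors SET key 1 NX PX ttl_ms: succeeds only if the key is absent or expired.
        exp = self.expiry.get(key)
        if exp is not None and exp > now_ms:
            return False
        self.expiry[key] = now_ms + ttl_ms
        return True

    def rpush(self, key, value):
        self.lists.setdefault(key, []).append(value)


DEBOUNCE_MS = 500
scheduled = []  # (run_at_ms, list_key): stand-in for the delayed job queue


def client_push(r, list_key, value, now_ms):
    r.rpush(list_key, value)
    # Only the client that creates the mutex schedules the drain job (leading edge).
    if r.set_nx_px(f"{list_key}:debounce", DEBOUNCE_MS, now_ms):
        scheduled.append((now_ms + DEBOUNCE_MS, list_key))


r = FakeRedis()
client_push(r, "events", "a", now_ms=0)  # first push: schedules a drain at 500
client_push(r, "events", "b", now_ms=120)  # mutex held: no new job
client_push(r, "events", "c", now_ms=480)  # still held
client_push(r, "events", "d", now_ms=650)  # mutex expired: schedules a drain at 1150
print(scheduled)  # [(500, 'events'), (1150, 'events')]
```

Because the mutex TTL equals the debounce delay, the first push after the key expires opens a new window and schedules another drain, as with push `d`.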
This works on paper, but feels a bit complicated. Any other ways to make this work in a distributed fault-tolerant way?
Btw: Redis and a distributed queue are already in place so I don't consider it an extra burden to use it for this issue.
Sorry for this, but a proper response requires a bunch of text/theory. Because your question is a good one, you've already written a good answer :)
First of all, we should define the terms. For 'debounce' in the underscore/lodash sense, David Corbacho's article gives a good explanation:
Debounce: Think of it as "grouping multiple events in one". Imagine that you go home, enter in the elevator, doors are closing... and suddenly your neighbor appears in the hall and tries to jump on the elevator. Be polite! and open the doors for him: you are debouncing the elevator departure. Consider that the same situation can happen again with a third person, and so on... probably delaying the departure several minutes.
Throttle: Think of it as a valve, it regulates the flow of the executions. We can determine the maximum number of times a function can be called in certain time. So in the elevator analogy you are polite enough to let people in for 10 secs, but once that delay passes, you must go!
You are asking about a debounce that starts when the first element is pushed to the list.
So, by analogy with the elevator: the elevator should go up 10 minutes after the first person got in. It does not matter how many more people crammed into the elevator.
In case of distributed fault-tolerant system this should be viewed as a set of requirements:
Processing of the new list must begin within X time after the first element is inserted (i.e. after the list is created).
A worker crash should not break anything.
Deadlock free.
The first requirement must be fulfilled regardless of the number of workers - be it 1 or N.
That is, you need to know (in a distributed way) whether the group of workers has to wait or can start processing the list. As soon as we utter the words "distributed" and "fault-tolerant", these concepts always bring their friends along:
Atomicity (e.g. by locking)
Reservation
In practice
In practice, I am afraid your system needs to be a little more complicated (maybe you just have not written it down here and already have it).
Your method:
Pessimistic locking with a mutex via SET NX PX. NX guarantees that only one process at a time does the work (atomicity). PX ensures that if something happens to this process, Redis releases the lock (one part of fault tolerance: no deadlocks).
All workers try to grab one mutex (per list key), so only one succeeds and processes the list after X time. This process can extend the mutex's TTL if it needs more time than originally planned. If the process crashes, the mutex is released after the TTL expires and can be grabbed by another worker.
My suggestion
Fault-tolerant, reliable queue processing in Redis is built around RPOPLPUSH:
RPOPLPUSH an item from the list being processed to a special list (one per worker per list).
Process the item.
Remove the item from the special list.
Requirements
So, if a worker crashes, we can always return the broken message from the special list to the main list. And Redis guarantees the atomicity of RPOPLPUSH/RPOP. That leaves only the problem of making the group of workers wait a while.
Then there are two options. First, if you have many clients and fewer workers, use locking on the worker side: try to lock the mutex in the worker and, if that succeeds, start processing.
Second, the other way around: use SET NX PX each time you execute LPUSH/RPUSH (to get a "wait N time before popping from me" behavior if you have many workers and a few pushing clients). So a push is:
SET myListLock 1 PX 10000 NX
LPUSH myList value
Each worker then just checks whether myListLock exists; if it does, it should wait at least the key's remaining TTL before setting the processing mutex and starting to drain.
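The push-side variant can be simulated the same way. Only `myListLock`, `myList` and the 10000 ms TTL come from the commands above; `myListProcessing`, its 5000 ms TTL, and the `try_drain` helper are hypothetical. `pttl` mimics PTTL's reply of -2 for a missing key, and time is passed explicitly so the example is deterministic. Draining uses plain RPOP for brevity; a real worker would use the RPOPLPUSH pattern described above.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for SET NX PX, PTTL, DEL, LPUSH and RPOP."""

    def __init__(self):
        self.expiry = {}  # key -> absolute expiry time in ms
        self.lists = {}

    def pttl(self, key, now_ms):
        exp = self.expiry.get(key)
        if exp is None or exp <= now_ms:
            return -2  # like PTTL on a missing (or expired) key
        return exp - now_ms

    def set_nx_px(self, key, ttl_ms, now_ms):
        if self.pttl(key, now_ms) > 0:
            return False  # key still alive: NX refuses to overwrite
        self.expiry[key] = now_ms + ttl_ms
        return True

    def delete(self, key):
        self.expiry.pop(key, None)

    def lpush(self, key, value):
        self.lists.setdefault(key, []).insert(0, value)

    def rpop(self, key):
        items = self.lists.get(key)
        return items.pop() if items else None


def push(r, value, now_ms):
    r.set_nx_px("myListLock", 10000, now_ms)  # SET myListLock 1 PX 10000 NX
    r.lpush("myList", value)  # LPUSH myList value


def try_drain(r, now_ms):
    if r.pttl("myListLock", now_ms) > 0:
        return None  # debounce window still open: wait at least the remaining TTL
    if not r.set_nx_px("myListProcessing", 5000, now_ms):
        return None  # another worker already holds the processing mutex
    drained = []
    while (item := r.rpop("myList")) is not None:
        drained.append(item)
    r.delete("myListProcessing")
    return drained


r = FakeRedis()
push(r, "first", now_ms=0)  # creates myListLock, which expires at 10000
push(r, "second", now_ms=3000)  # lock already exists; NX does not reset the TTL
early = try_drain(r, now_ms=5000)
late = try_drain(r, now_ms=10000)
print(early, late)  # None ['first', 'second']
```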

What's the recommended way to queue "delayed execution" messages via ServiceStack/Redis MQ?

I would like to queue up messages to be processed, only after a given duration of time elapses (i.e., a minimum date/time for execution is met), and/or at processing time of a message, defer its execution to a later point in time (say some prerequisite checks are not met).
For example, an event happens which defines a process that needs to run no sooner than 1 hour from the time of the initial event.
Is there any built in/suggested model to orchestrate this using https://github.com/ServiceStack/ServiceStack/wiki/Messaging-and-Redis?
I would probably build this in a two step approach.
Queue the task into your queueing system, which will persist it to a store: SQL Server, MongoDB, RavenDB.
Have a service poll your "Queued" tasks to find when they should be reinserted into the queue.
This is probably the safest way, since presumably you don't want to lose these jobs.
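The two-step approach can be sketched as follows. An in-memory heap ordered by due time stands in for the persistence store, and a `deque` stands in for the live queue. `DelayedStore` and `poll_once` are hypothetical names, not ServiceStack APIs. The same `schedule` call also covers deferring a message at processing time: if its prerequisites are not met, the consumer puts it back with a later due time.

```python
import heapq
import itertools
from collections import deque


class DelayedStore:
    """Hypothetical stand-in for the persistence store of 'Queued' tasks."""

    def __init__(self):
        self._heap = []  # entries: (due_at, seq, task)
        self._seq = itertools.count()  # tie-breaker so tasks are never compared directly

    def schedule(self, task, due_at):
        heapq.heappush(self._heap, (due_at, next(self._seq), task))

    def pop_due(self, now):
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


def poll_once(store, live_queue, now):
    # The polling service: reinsert every task whose time has come into the live queue.
    for task in store.pop_due(now):
        live_queue.append(task)


store = DelayedStore()
live = deque()
store.schedule("run-followup", due_at=3600)  # no sooner than 1 hour after the event at t=0
store.schedule("cleanup", due_at=60)

poll_once(store, live, now=30)
print(list(live))  # []
poll_once(store, live, now=60)
print(list(live))  # ['cleanup']

# A consumer finds a prerequisite unmet and defers the task by 5 minutes.
deferred = live.popleft()
store.schedule(deferred, due_at=60 + 300)
poll_once(store, live, now=4000)
print(list(live))  # ['cleanup', 'run-followup']
```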
If you use RabbitMQ instead of Redis you could use Dead Letter Queues to get the same behavior. Dead letter queues essentially are catchers for expired messages.
So you push your messages into a queue with no intention of processing them, and they have a specific expiration in minutes. When they expire they pop over into the queue that you will process out of. Pretty slick way to queue things for later.
You could always use https://github.com/dominionenterprises/mongo-queue-csharp or https://github.com/dominionenterprises/mongo-queue-php or https://github.com/gaillard/mongo-queue-java which provides delayed messages and other uncommon features.