How to create a distributed 'debounce' task to drain a Redis List? - redis

I have the following use case: multiple clients push to a shared Redis List. A separate worker process should drain this list (process and delete). Wait/multi-exec is in place to make sure this goes smoothly.
For performance reasons I don't want to call the 'drain'-process right away, but after x milliseconds, starting from the moment the first client pushes to the (then empty) list.
This is akin to a distributed underscore/lodash debounce function, for which the timer starts to run the moment the first item comes in (i.e.: 'leading' instead of 'trailing')
I'm looking for the best way to do this reliably in a fault tolerant way.
Currently I'm leaning to the following method:
Use Redis SET with the NX and PX options. This allows:
setting a value (a mutex) at a dedicated key only if it doesn't yet exist. This is what the NX argument is for
expiring the key after x milliseconds. This is what the PX argument is for
This command returns 1 if the value could be set, meaning no value previously existed; it returns 0 otherwise. A 1 means the current client is the first client to push since the Redis List was last drained. Therefore,
this client puts a job on a distributed queue, scheduled to run in x milliseconds.
After x milliseconds, the worker that receives the job starts draining the list.
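For illustration, a minimal redis-py sketch of the method just described; schedule_drain_job is a hypothetical stand-in for the distributed queue that is already in place, and the key names are arbitrary:

import redis

r = redis.Redis()

DEBOUNCE_MS = 500  # the "x milliseconds" from the question

def schedule_drain_job(delay_ms):
    ...  # stand-in for the distributed queue already in place (hypothetical)

def push(value):
    # SET NX PX: succeeds only if the key does not yet exist, and the key
    # expires after DEBOUNCE_MS. redis-py returns True on success, None otherwise.
    first = r.set("mylist:debounce", 1, nx=True, px=DEBOUNCE_MS)
    r.lpush("mylist", value)
    if first:
        # First client since the last drain: schedule the drain job.
        schedule_drain_job(delay_ms=DEBOUNCE_MS)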
This works on paper, but feels a bit complicated. Any other ways to make this work in a distributed fault-tolerant way?
Btw: Redis and a distributed queue are already in place so I don't consider it an extra burden to use it for this issue.

Sorry in advance, but a proper response requires a fair bit of text/theory. Because your question is a good one, you've already written half of a good answer yourself :)
First of all we should define the terms. 'Debounce' in the underscore/lodash sense is best understood through David Corbacho’s article explanation:
Debounce: Think of it as "grouping multiple events in one". Imagine that you go home, enter in the elevator, doors are closing... and suddenly your neighbor appears in the hall and tries to jump on the elevator. Be polite! and open the doors for him: you are debouncing the elevator departure. Consider that the same situation can happen again with a third person, and so on... probably delaying the departure several minutes.
Throttle: Think of it as a valve, it regulates the flow of the executions. We can determine the maximum number of times a function can be called in certain time. So in the elevator analogy you are polite enough to let people in for 10 secs, but once that delay passes, you must go!
You are asking about debounce starting from the moment the first element is pushed to the list:
So, by analogy with the elevator: the elevator should go up X minutes after the first person steps in. It does not matter how many more people cram into the elevator after that.
In case of distributed fault-tolerant system this should be viewed as a set of requirements:
Processing of the list must begin within X time after the first element is inserted (i.e. after the creation of the list).
A worker crash must not break anything.
Deadlock freedom.
The first requirement must be fulfilled regardless of the number of workers - be it 1 or N.
That is, you need to know (in a distributed way) whether the group of workers still has to wait, or whether list processing can start. As soon as we utter the phrases "distributed" and "fault-tolerant", these concepts bring along their usual companions:
Atomicity (e.g. by locking)
Reservation
In practice
In practice, I am afraid your system needs to be a little more complicated (maybe you just haven't written it all down, and you already have it).
Your method:
Pessimistic locking with a mutex via SET NX PX. NX guarantees that only one process at a time does the work (atomicity). PX ensures that if something happens to that process, the lock is released by Redis after the TTL (the deadlock-avoidance half of fault tolerance).
All workers try to grab the one mutex (per list key), so only one wins and processes the list after X time. That process can extend the TTL of the mutex if it needs more time than originally planned. If the process crashes, the mutex is unlocked after the TTL and grabbed by another worker.
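Illustrated as a redis-py sketch (the key name, TTL, and handle function are assumptions, not part of the original answer):

import redis

r = redis.Redis()

LOCK_KEY = "mylist:lock"
LOCK_TTL_MS = 10_000

def handle(item):
    ...  # application-specific processing (hypothetical)

def try_process_list():
    # NX: only one worker at a time gets the mutex.
    # PX: Redis releases it even if this worker crashes mid-run.
    if not r.set(LOCK_KEY, 1, nx=True, px=LOCK_TTL_MS):
        return  # another worker holds the mutex
    try:
        while True:
            item = r.rpop("mylist")
            if item is None:
                break
            handle(item)
            # Still alive: extend the TTL if we need more time than planned.
            r.pexpire(LOCK_KEY, LOCK_TTL_MS)
    finally:
        r.delete(LOCK_KEY)

For strict correctness the final DELETE should verify lock ownership (e.g. a random token checked in a Lua script), since the TTL may have expired mid-run and the key may already belong to another worker.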
My suggestion
Fault-tolerant, reliable queue processing in Redis is built around RPOPLPUSH:
RPOPLPUSH an item from the main list to a special processing list (one per worker per list)
Process the item
Remove the item from the special list
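A minimal redis-py sketch of these three steps, including the crash-recovery path discussed just below (handle is a hypothetical per-item processor; key names are placeholders):

import redis

r = redis.Redis()

def handle(item):
    ...  # application-specific processing (hypothetical)

def work(worker_id):
    processing = f"mylist:processing:{worker_id}"  # per worker per list
    while True:
        # Step 1: atomically move one item from the main list to this
        # worker's special list; nothing is lost if we crash after this.
        item = r.rpoplpush("mylist", processing)
        if item is None:
            break
        handle(item)                 # step 2: process the item
        r.lrem(processing, 1, item)  # step 3: remove the safety copy

def recover(worker_id):
    # If a worker crashed, return its stranded items to the main list.
    processing = f"mylist:processing:{worker_id}"
    while r.rpoplpush(processing, "mylist") is not None:
        pass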
Requirements
So if a worker crashes, we can always return the stranded message from its special list to the main list, and Redis guarantees the atomicity of RPOPLPUSH. That leaves only one problem: making the group of workers wait a while.
There are then two options. First, if you have many clients and fewer workers, use locking on the worker side: each worker tries to take the mutex and, on success, starts processing.
And vice versa: use SET NX PX each time you execute LPUSH/RPUSH (to get a "wait N time before popping from me" solution when you have many workers and few push clients). So a push is:
SET myListLock 1 PX 10000 NX
LPUSH myList value
And each worker simply checks: if myListLock exists, wait at least the key's remaining TTL before setting the processing mutex and starting to drain.
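Putting the push side and the worker side together in a redis-py sketch (the TTL and polling interval are arbitrary; try_process_list is the mutex-guarded drain sketched earlier):

import time
import redis

r = redis.Redis()

def push(value):
    # "Wait N ms before popping from me": (re)armed only when absent (NX).
    r.set("myListLock", 1, nx=True, px=10_000)
    r.lpush("myList", value)

def worker_loop():
    while True:
        ttl_ms = r.pttl("myListLock")  # -2 if the key does not exist
        if ttl_ms > 0:
            # Debounce window still open: wait at least the remaining TTL.
            time.sleep(ttl_ms / 1000)
            continue
        try_process_list()  # the mutex-guarded drain sketched earlier
        time.sleep(0.1)     # arbitrary idle polling interval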

Related

RabbitMQ - allow only one process per user

To keep it short, here is a simplified situation:
I need to implement a queue for background processing of imported data files. I want to dedicate a number of consumers to this specific task (let's say 10) so that multiple users can be processed in parallel. At the same time, to avoid problems with concurrent data writes, I need to make sure that no user is processed by multiple consumers at the same time; basically, all files of a single user should be processed sequentially.
Current solution (but it does not feel right):
Have 1 queue where all import tasks are published (file_queue_main)
Have 10 queues for file processing (file_processing_n)
Have 1 result queue (file_results_queue)
Have a manager process (in this case in node.js) which consumes messages from file_queue_main one by one and decides to which file_processing queue to distribute that message. Basically keeps track of in which file_processing queues the current user is being processed.
Here is a little animation of my current solution and expected behaviour: [animation not reproduced here]
Is RabbitMQ even the tool for the job? For some reason, it feels like some sort of an anti-pattern. Appreciate any help!
The part about this that doesn't "feel right" to me is the manager process. It has to know the current state of each consumer, and it also has to stop and wait if all processors are working on other users. Ideally, you'd prefer to keep each process ignorant of the others. You're also getting very little benefit out of your processing queues, which are only used when a processor is already working on a message from the same user.
Ultimately, the best solution here is going to depend on exactly what your expected usage is and how likely it is that the next message is from a user that is already being processed. If you're expecting most of your messages coming in at any one time to be from 10 users or fewer, what you have might be fine. If you're expecting to be processing messages from many different users with only the occasional duplicate, your processing queues are going to be empty much of the time and you've created a lot of unnecessary complexity.
Other things you could do here:
Have all consumers pull from the same queue and use some sort of distributed locking to prevent collisions. If a consumer gets a message from a user that's already being worked on, requeue it and move on (see the sketch after this list).
Set up your queue routing so that messages from the same user will always go to the same consumer. The downside is that if you don't spread the traffic out evenly, you could have some consumers backed up while others sit idle.
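By way of illustration, a hedged sketch of the first option using pika and a Redis lock; the user_id header, the process function, and the lock TTL are all assumptions, not part of the original setup:

import pika
import redis

r = redis.Redis()
LOCK_TTL = 300  # seconds; guards against a consumer dying mid-task

def process(body):
    ...  # the actual file processing (hypothetical)

def on_message(ch, method, properties, body):
    user_id = properties.headers["user_id"]  # assumes a user_id header
    # Take a short-lived per-user lock; only one consumer wins.
    if r.set(f"user-lock:{user_id}", 1, nx=True, ex=LOCK_TTL):
        try:
            process(body)
            ch.basic_ack(delivery_tag=method.delivery_tag)
        finally:
            r.delete(f"user-lock:{user_id}")
    else:
        # User already being worked on: requeue and move on.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue="file_queue_main", on_message_callback=on_message)
channel.start_consuming()

Note that an immediate requeue can spin if one user's messages dominate the queue; a delayed retry (e.g. via a dead-letter exchange with a TTL) softens that.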
Also, if you're getting a lot of messages in from the same user at once that must be processed sequentially, I would question if they should be separate messages at all. Why not send a single message with a list of things to be processed? Much of the benefit of event queues comes from being able to treat each event as a discrete item that can be processed individually.
If the user has a unique ID, or the file being worked on has a unique ID, then hash the ID to pick the processing queue. That way the same user/file task will always be queued on the same processing queue.
I am not sure how this will affect queue length for the processing queues.
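As a sketch of this idea (assuming the queue names from the question and a 10-queue setup), a stable hash avoids Python's per-process hash randomization:

import hashlib

NUM_QUEUES = 10  # matches the 10 consumers from the question

def queue_for(some_id: str) -> str:
    # Use a stable hash; Python's built-in hash() is randomized per process,
    # so it would route the same user differently after a restart.
    digest = hashlib.sha1(some_id.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % NUM_QUEUES
    return f"file_processing_{index}"

print(queue_for("user-42"))  # always the same queue for this ID

Queue lengths will only be as even as the hash distribution of the IDs, which matches the caveat above.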

Distributed workers that ensure a single instance of a task is running

I need to design a distributed system where a scheduler sends tasks to workers on multiple nodes. Each task is assigned an id, and it can be executed more than once, as scheduled by the scheduler (usually once per hour).
My only requirement is that a task with a specific id should not be executed twice at the same time by the cluster. I can think of a design where the scheduler holds a lock for each task id and sends the task to an appropriate worker. Once the worker has finished the lock should be released and the scheduler might schedule it again.
What should my design include to ensure this? I'm concerned about cases where a task is sent to a worker, which starts the task but then fails to inform the scheduler about it.
What would be the best practice in this scenario to ensure that only a single instance of a job is always executed at a time?
You could use a solution that implements a consensus protocol. Say - for example - that all your nodes in the cluster can communicate using the Raft protocol. As such, whenever a node X would want to start working on a task Y it would attempt to commit a message X starts working on Y. Once such messages are committed to the log, all the nodes will see all the messages in the log in the same order.
When node X finishes or aborts the task it would attempt to commit X no longer works on Y so that another node can start/continue working on it.
It could happen that two nodes (X and Z) may try to commit their start messages concurrently, and the log would then look something like this:
...
N-1: ...
N+0: "X starts working on Y"
...
N+k: "Z starts working on Y"
...
But since there is no X no longer works on Y message between the N+0 and N+k entry, every node (including Z) would know that Z must not start the work on Y.
The only remaining problem would be if node X got partitioned from the cluster before it can attempt to commit its X no longer works on Y for which I believe there is no perfect solution.
A work-around could be that X would try to periodically commit a message X still works on Y at time T and if no such message was committed to the log for some threshold duration, the cluster would assume that no one is working on that task anymore.
With this work-around however, you'd be allowing the possibility that two or more nodes will work on the same task (the partitioned node X and some new node that picks up the task after the timeout).
After some thorough search, I came to the conclusion that this problem can be solved through a method called fencing.
In essence, when you suspect that a node (worker) has failed, the only way to ensure that it will not corrupt the rest of the system is to put up a fence that stops the node from accessing the shared resource you need to protect. That has to be a radical measure, like resetting the machine that runs the failed process or setting up a firewall rule that prevents the process from accessing the shared resource. Once the fence is in place, you can safely break the lock that was being held by the failed process and start a new process.
Another possibility is to use a relational database to store task metadata + proper isolation level (can't go wrong with serializable if performance is not your #1 priority).
SERIALIZABLE
This isolation level specifies that all transactions occur in a completely isolated fashion; i.e., as if all transactions in the system had executed serially, one after the other. The DBMS may execute two or more transactions at the same time only if the illusion of serial execution can be maintained.
Either optimistic or pessimistic locking should work too. https://learning-notes.mistermicheels.com/data/sql/optimistic-pessimistic-locking-sql/
In case you need to rerun the task, simply update the metadata (or, better, create a new task with different metadata to keep track of its execution history).
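A sketch of the pessimistic variant, assuming PostgreSQL and a hypothetical tasks table with id, status, and worker columns (SKIP LOCKED is a PostgreSQL feature, an assumption beyond the answer's plain "pessimistic locking"):

import psycopg2

conn = psycopg2.connect("dbname=scheduler")  # hypothetical DSN

def run_task(task_id):
    ...  # the actual job (hypothetical)

def try_run(task_id, worker_id):
    with conn:  # transaction: commit on success, rollback on error
        with conn.cursor() as cur:
            # Row lock: concurrent workers skip rather than block here.
            cur.execute(
                "SELECT status FROM tasks WHERE id = %s FOR UPDATE SKIP LOCKED",
                (task_id,),
            )
            row = cur.fetchone()
            if row is None or row[0] == "running":
                return False  # locked elsewhere, or already running
            cur.execute(
                "UPDATE tasks SET status = 'running', worker = %s WHERE id = %s",
                (worker_id, task_id),
            )
    run_task(task_id)
    with conn, conn.cursor() as cur:
        cur.execute("UPDATE tasks SET status = 'done' WHERE id = %s", (task_id,))
    return True

A crash after a task is marked 'running' still needs a timeout or heartbeat to recover, as discussed in the other answers.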

Best way to handle timeouts on rabbitmq message processing

I am trying to get my head around an issue I have recently encountered and I hope someone will be able to point me in the most reasonable direction of solving it.
I am using Riak KV store and working on CRDT data, where I have some sort of counter inside each CRDT item stored in database.
I have a rabbitmq queue, where each message is a request to increase or decrease a certain amount of aforementioned counters.
Finally, I have a group of service workers that listen on the queue and, for each request, try to change the counters by the requested amount.
The issue I have is as follows: while a single worker is processing a request, it may get stuck for a while on a write operation to the database – let’s say on the second change of counters out of three. Its connection with rabbitmq gets lost (timeout), so the message-request goes back onto the queue (I cannot afford to miss one). It is then picked up by a second worker, which begins all the processing anew. However, the first worker eventually finishes its work, and as a result I have processed a single message twice.
I can split those increments into single actions, but this still leaves me with the same dilemma – the value of a counter can still be changed twice if some worker gets stuck on a write operation for a long period.
I have no way of making Riak KV CRDT writes faster, nor can I accept missing a message-request. I need to implement some means of checking whether a request has already been processed.
My initial thought was to use an alternative, quick KV store for storing rabbitMQ message IDs while they are being processed. That way other workers could tell whether they are about to process a message that is already being handled elsewhere.
I could use any help and pointers to materials I can read.
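A minimal sketch of the idea from the question, assuming Redis as the "alternative, quick KV store"; the key prefix and TTL are arbitrary:

import redis

r = redis.Redis()

def first_to_claim(message_id: str) -> bool:
    # SET NX EX: only the first worker to claim this message id gets True.
    # The TTL (arbitrary here) keeps the dedupe store from growing forever.
    return bool(r.set(f"msg:{message_id}", "in-progress", nx=True, ex=3600))

This deduplicates within the TTL window, but a worker that crashes after claiming an ID and before finishing still needs the timeout-and-requeue handling discussed elsewhere on this page.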
You can't have "exactly-once delivery" semantics. You can reduce duplicate deliveries or missed deliveries, so it's up to you to decide which misbehavior is the least inconvenient.
First of all, are you sure it's the CRDTs that are too slow? Are you using simple counters or counters inside maps? In my experience they are quite fast, although slower than plain KV. You could try:
- using simple CRDTs (no maps) and more CRDT objects, to spread the load (can you split the counters in two?)
- not using CRDTs, but good old sibling resolution on the client side over simple key/values
- accumulating the count-update orders and applying them in batch; but then you're accepting an increase in latency, so it's equivalent to increasing the timeout
Can you provide some metrics? Like how long the updates take, what numbers you'd expect, whether it's as slow with few updates as with many, etc.

How do I wait for all work to complete in Akka.Net?

I have successfully sent work to a pool of actors to perform my work, but now I want to do some aggregation on the results returned by all the workers. How do I know that everyone is done?
The best I have come up with is to maintain a set of requests ids and wait for that set to go to zero, but this seems inelegant.
Generally, you want to use what we call the "Commander" pattern for this. Essentially, you have one stateful actor (the Commander) that is responsible for starting and monitoring the task. You then farm out the actual work across the actor pool, and have them report back to the Commander as they finish. The commander can then track the progress of the job by calculating # completions / size of worker pool.
This way, the workers can be monitored and restarted independently as they do the work, but all the precious task-level state and information lives in the Commander (this is called the "Error Kernel" pattern).
You can see an example of this in the Akka.NET scalable webcrawler demo.

Redis reliable queues for multi threaded processing

For my ongoing project, I am using Redis for message distribution across several processes. Now, I am supposed to make them reliable.
I am considering the Reliable queue pattern built on the BRPOPLPUSH command. This pattern suggests that the processing thread remove the extra copy of the message from a "processing list" via the LREM command after the job has been successfully completed.
As I am using multiple threads to pop, the extra copies of popped items go into one processing list from several threads. That is to say, the processing queue contains elements popped by several threads. As a consequence, when a thread completes its job, it cannot know which item to remove from the "processing queue".
To overcome this problem, I am thinking that I should maintain multiple processing queues (one per thread) based on threadId. So my BRPOPLPUSH will be:
BRPOPLPUSH <primary-queue> <thread-specific-processing-queue> <timeout>
Then, to clean up timed-out items, my monitoring thread will have to monitor all these thread-specific processing queues.
Are there any better approaches to this problem, than the one conceived above?
@user779159
To support a reliable queue mechanism, we take the following approach:
- two data structures
-- a Redis List (the original queue from which items are popped regularly)
-- a Redis z-set, which temporarily stores the popped items
Algorithm:
-- When an element is popped, we store it in the z-set.
-- If the task that picked up the item completes its job, it deletes the entry from the z-set.
-- If the task couldn't complete it, the item keeps hanging around in the z-set, so we know whether or not a task was done within the expected time.
-- Another background process periodically scans this z-set, picks up the items that have timed out, and puts them back on the queue.
How it is done:
We use the z-set to store the item that we popped (typically via a Lua script), with a timeout value as the rank/score of the item. Another scanner process periodically (say, every minute) runs the z-set command ZRANGEBYSCORE to select items with scores between one minute ago and now. Any items found by this command mean that the process which popped them (via BRPOP) has not completed its task in time. So this second process puts each such item back on the queue (the Redis List) where it originally belonged.
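A hedged redis-py sketch of this scheme; the key names are placeholders, and note that a z-set collapses duplicate payloads, so items should carry unique IDs:

import time
import redis

r = redis.Redis()

# Atomic "pop + remember": move the popped item into the z-set scored with
# its processing deadline, all in one Lua script, as described above.
POP_LUA = """
local item = redis.call('RPOP', KEYS[1])
if item then
  redis.call('ZADD', KEYS[2], ARGV[1], item)
end
return item
"""
reliable_pop = r.register_script(POP_LUA)

def pop(timeout_secs=60):
    deadline = time.time() + timeout_secs
    return reliable_pop(keys=["queue", "queue:processing"], args=[deadline])

def ack(item):
    # Task finished in time: drop the safety copy from the z-set.
    r.zrem("queue:processing", item)

def requeue_timed_out():
    # The periodic scanner: anything whose deadline has passed goes back.
    for item in r.zrangebyscore("queue:processing", 0, time.time()):
        # ZREM returns 1 only for the scanner that wins a potential race.
        if r.zrem("queue:processing", item):
            r.rpush("queue", item)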