I want to lock a single row in a MariaDB table for multiple updates, and release the lock when I'm finished updating it. The scenario is that multiple machines could be requesting the lock. When one machine gets the lock, it has a batch of work that needs to update the row data multiple times. The other machines must not get a chance at the lock while the first machine is still busy with its batch of work. So if the first machine does two updates, another machine must not get the lock between the two updates.
I've looked at transactions, but they do a rollback if my application crashes, whereas I want the updates to remain.
Does anyone have an idea on how to solve this kind of issue? Googling it does not produce any good hits, or my search terms might be wrong.
Edit:
Trying to clarify the use case a bit more:
This functionality is for event processing in a distributed system, where there are multiple concurrent consumers.
The events are parts of streams, and the events within a stream must be processed in the correct order, otherwise the system gets corrupted.
For all kinds of reasons the events of a single stream can end up on different consumers, in the wrong order, with a large delay, et cetera. These are exceptional cases, not normal operating conditions.
The locking of rows helps make sure that different consumers are not working on the same event stream concurrently:
BEGIN;
SELECT counter, someAdditionalFields FROM streamCounters WHERE streamId=x FOR UPDATE;
# this locks the stream to this consumer
# This thread is processing event with eventId counter+1, this can be in the range of 0ms to a few seconds, certainly not minutes.
# The result of this work ends up in another table
UPDATE streamCounters SET counter=(counter+1) WHERE streamId=x;
# signifies that event was processed, the stream advances by 1
# This thread is processing event with eventId counter+2
UPDATE streamCounters SET counter=(counter+1) WHERE streamId=x;
# signifies that event was processed, the stream advances by 1
# ... until no more ready events available in stream
COMMIT; # release the lock on this stream
The reason I don't want this transaction to roll back in case of a crash, is because the processing of the events means significant change in other tables.
The changes in the other tables are done by a different part of the application, they provide me a function that I call, what they do is not really under my control.
Let's say I want to process event 1 in a stream, so I lock the stream.
I call the provided function that processes event 1.
When it returns with success, I update the stream counter from 0 to 1.
Now I want to process event 2, but I crash.
Now this transaction gets rolled back and the stream counter goes back to 0, but the event was processed.
My stream counter no longer represents how much work was done!
To keep it short, here is a simplified situation:
I need to implement a queue for background processing of imported data files. I want to dedicate a number of consumers to this specific task (let's say 10) so that multiple users can be processed in parallel. At the same time, to avoid problems with concurrent data writes, I need to make sure that no single user is processed by multiple consumers at the same time; basically, all files of a single user should be processed sequentially.
Current solution (but it does not feel right):
Have 1 queue where all import tasks are published (file_queue_main)
Have 10 queues for file processing (file_processing_n)
Have 1 result queue (file_results_queue)
Have a manager process (in this case in node.js) which consumes messages from file_queue_main one by one and decides which file_processing queue to send each message to. It basically keeps track of which file_processing queue each user is currently being processed in.
Here is a little animation of my current solution and expected behaviour:
Is RabbitMQ even the tool for the job? For some reason, it feels like some sort of an anti-pattern. Appreciate any help!
The part about this that doesn't "feel right" to me is the manager process. It has to know the current state of each consumer, and it also has to stop and wait if all processors are working on other users. Ideally, you'd prefer to keep each process ignorant of the others. You're also getting very little benefit out of your processing queues, which are only used when a processor is already working on a message from the same user.
Ultimately, the best solution here is going to depend on exactly what your expected usage is and how likely it is that the next message is from a user that is already being processed. If you're expecting most of your messages coming in at any one time to be from 10 users or fewer, what you have might be fine. If you're expecting to be processing messages from many different users with only the occasional duplicate, your processing queues are going to be empty much of the time and you've created a lot of unnecessary complexity.
Other things you could do here:
Have all consumers pull from the same queue and use some sort of distributed locking to prevent collisions. If a consumer gets a message from a user that's already being worked on, requeue it and move on.
Set up your queue routing so that messages from the same user will always go to the same consumer. The downside is that if you don't spread the traffic out evenly, you could have some consumers backed up while others sit idle.
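The first option (shared queue plus a lock per user, requeue on collision) could look roughly like the sketch below. Everything in it is an assumption made for illustration: UserLocks is an in-memory stand-in for whatever distributed lock service you would really use, the queue is a plain deque rather than a RabbitMQ queue, and the message shape is made up.

```python
from collections import deque


class UserLocks:
    """In-memory stand-in for a distributed lock service (hypothetical)."""

    def __init__(self):
        self._held = set()

    def try_acquire(self, user_id):
        # Succeeds only if no consumer currently holds this user's lock.
        if user_id in self._held:
            return False
        self._held.add(user_id)
        return True

    def release(self, user_id):
        self._held.discard(user_id)


def take_next(queue, locks):
    """Claim the first message whose user is not busy; requeue the others.

    Returns the claimed message, or None if every queued message
    belongs to a user that is already being processed.
    """
    for _ in range(len(queue)):
        message = queue.popleft()
        if locks.try_acquire(message["user"]):
            return message
        queue.append(message)  # user is busy elsewhere: requeue and move on
    return None
```

A consumer would call take_next, process the message, and then call locks.release for that user. Note that requeuing at the back can reorder one user's messages relative to each other, so if strict per-user order matters you would need something extra on top of this.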
Also, if you're getting a lot of messages in from the same user at once that must be processed sequentially, I would question if they should be separate messages at all. Why not send a single message with a list of things to be processed? Much of the benefit of event queues comes from being able to treat each event as a discrete item that can be processed individually.
If the user has a unique ID, or the file being worked on has a unique ID, then hash the ID to pick the processing queue. That way the same user's (or file's) tasks are always queued on the same processing queue.
I am not sure how this will affect queue length for the processing queues.
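A minimal sketch of the hashing idea, assuming string user IDs and the ten file_processing_n queues from the question (the function name and the choice of SHA-256 are mine, not anything RabbitMQ prescribes). It uses a stable hash rather than Python's built-in hash(), which is randomized per process for strings.

```python
import hashlib

NUM_QUEUES = 10  # assumed: matches the 10 file_processing_n queues in the question


def queue_for(user_id: str, num_queues: int = NUM_QUEUES) -> str:
    """Map a user ID to a processing queue name, always the same one."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Take the first 8 bytes as an integer and fold it into the queue range.
    index = int.from_bytes(digest[:8], "big") % num_queues
    return f"file_processing_{index + 1}"
```

Every publisher that calls queue_for with the same ID gets the same queue name, so no manager process has to remember where a user is currently being handled. How evenly the queues fill up depends on how the traffic is spread across users.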
I want to implement a mutual exclusion system in PostgreSQL where multiple worker processes will temporarily lock resources (rows) from a table (queue) while they work on them. If the worker processes crash, I want the lock to be cleanly released and not have to rely on another process to clean up the leaked locks.
What I have come up with so far is to use a SELECT ... FOR UPDATE SKIP LOCKED query within a transaction, which locks the row it finds and skips any other locked row.
It works well but one of the issues is that the worker might take a while to do its task and I need to keep the transaction open for the entire duration of its task.
Another problem is that the workers work incrementally and persist their state to the database so that if they're stopped or crash, they can resume quickly where they were. The row being locked makes it impossible to persist their state in the same table (though I think I can get away from that by using another table to persist the state).
I've searched on the Web on how to implement a semaphore or a resource borrowing system in SQL/PostgreSQL but I haven't found something that fits my needs. Is there a simple way of achieving this with PostgreSQL?
I have the following use case: multiple clients push to a shared Redis list. A separate worker process should drain this list (process and delete). WATCH/MULTI/EXEC is in place to make sure this goes smoothly.
For performance reasons I don't want to call the 'drain'-process right away, but after x milliseconds, starting from the moment the first client pushes to the (then empty) list.
This is akin to a distributed underscore/lodash debounce function, for which the timer starts to run the moment the first item comes in (i.e.: 'leading' instead of 'trailing')
I'm looking for the best way to do this reliably in a fault tolerant way.
Currently I'm leaning toward the following method:
Use the Redis SET command with the NX and PX options. This allows me:
to set a value (a mutex) in a dedicated keyspace only if it doesn't yet exist. This is what the NX argument is for.
to expire the key after x milliseconds. This is what the PX argument is for.
This command returns OK if the value could be set, meaning no value previously existed. It returns a nil reply otherwise. OK means the current client is the first client to run the process since the Redis list was drained. Therefore,
this client puts a job on a distributed queue which is scheduled to run in x milliseconds.
After x milliseconds, the worker to receive the job starts the process of draining the list.
This works on paper, but feels a bit complicated. Any other ways to make this work in a distributed fault-tolerant way?
Btw: Redis and a distributed queue are already in place so I don't consider it an extra burden to use it for this issue.
Sorry, but a proper response requires a bunch of text and theory. Your question is a good one, and in it you have already written a good answer :)
First of all we should define the terms. 'Debounce' in the underscore/lodash sense is best understood through the explanation in David Corbacho's article:
Debounce: Think of it as "grouping multiple events in one". Imagine that you go home, enter in the elevator, doors are closing... and suddenly your neighbor appears in the hall and tries to jump on the elevator. Be polite! and open the doors for him: you are debouncing the elevator departure. Consider that the same situation can happen again with a third person, and so on... probably delaying the departure several minutes.
Throttle: Think of it as a valve, it regulates the flow of the executions. We can determine the maximum number of times a function can be called in certain time. So in the elevator analogy you are polite enough to let people in for 10 secs, but once that delay passes, you must go!
You are asking about a debounce whose timer starts when the first element is pushed to the list.
By analogy, the elevator should depart 10 minutes after the first person steps in, no matter how many more people cram in afterwards.
In a distributed fault-tolerant system this should be viewed as a set of requirements:
Processing of the new list must begin within X time after the first element is inserted (i.e. after the list is created).
A worker crash should not break anything.
It must be deadlock-free.
The first requirement must be fulfilled regardless of the number of workers - be it 1 or N.
I.e. you need to know, in a distributed way, whether the group of workers has to wait or can start processing the list. As soon as we utter the words "distributed" and "fault-tolerant", these concepts always bring their friends along:
Atomicity (eg by blocking)
Reservation
In practice
In practice, I am afraid that your system needs to be a little bit more complicated (maybe it already is and you just did not write that down).
Your method:
Pessimistic locking with a mutex via SET NX PX. NX guarantees that only one process at a time does the work (atomicity). PX ensures that if something happens to this process, Redis releases the lock (the part of fault tolerance that deals with deadlocks).
All workers try to grab one mutex (per list key), so only one succeeds and processes the list after X time. This process can extend the TTL of the mutex if it needs more time than originally planned. If the process crashes, the mutex is released after the TTL and can be grabbed by another worker.
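To make the semantics concrete, here is a small in-memory imitation of such a mutex. It is not a Redis client and the class and method names are hypothetical; it only mimics what SET key value NX PX ttl gives you (set if absent, expire after the TTL), plus an extend step for the holder that needs more time. The clock is injectable so the behaviour can be checked without sleeping.

```python
import time


class TtlMutex:
    """In-memory imitation of `SET key value NX PX ttl` (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (owner, expiry time in seconds)

    def try_lock(self, key, owner, ttl_ms):
        """Take the lock only if the key is absent or expired (NX + PX)."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return False  # someone else still holds an unexpired lock
        self._entries[key] = (owner, now + ttl_ms / 1000.0)
        return True

    def extend(self, key, owner, ttl_ms):
        """Let the current holder push the expiry out if it needs more time."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or entry[0] != owner or entry[1] <= now:
            return False  # not the holder, or the lock already expired
        self._entries[key] = (owner, now + ttl_ms / 1000.0)
        return True
```

If the holder crashes it simply never calls extend again, and the next try_lock after the expiry succeeds for another worker, which is the crash behaviour described above.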
My suggestion
Fault-tolerant, reliable queue processing in Redis is built around RPOPLPUSH:
RPOPLPUSH the item from the list being processed to a special list (per worker, per list).
Process item
Remove item from special list
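The three steps, plus recovery after a crash, can be sketched with two in-memory deques standing in for the Redis lists. The function names are made up for illustration; in real Redis, claim would be a single atomic RPOPLPUSH and ack would be an LREM on the special list.

```python
from collections import deque


def claim(main, processing):
    """Move one item from the tail of main to the head of processing,
    which is what RPOPLPUSH does atomically in Redis."""
    if not main:
        return None
    item = main.pop()
    processing.appendleft(item)
    return item


def ack(processing, item):
    """Remove a finished item from the special list (LREM in Redis)."""
    processing.remove(item)


def recover(main, processing):
    """After a worker crash, put unfinished items back on the main list.

    Items are returned newest first so that the oldest one ends up at the
    tail of main and is claimed first again.
    """
    while processing:
        main.append(processing.popleft())
```

An item is never only in the worker's memory: it is either on the main list or on the special list, so a crash between claim and ack loses nothing.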
Requirements
So if a worker crashes, we can always return the unfinished message from the special list to the main list, and Redis guarantees the atomicity of RPOPLPUSH/RPOP. The only remaining problem is making the group of workers wait for a while.
Then there are two options. First: if you have many clients and fewer workers, lock on the worker side. That is, try to lock the mutex in the worker and, on success, start processing.
Second, the other way round: use SET NX PX each time you execute LPUSH/RPUSH (a "wait N time before popping from me" solution, for when you have many workers and few pushing clients). A push then becomes:
SET myListLock 1 PX 10000 NX
LPUSH myList value
Each worker then just checks whether myListLock exists; if it does, the worker should wait at least the key's remaining TTL before setting the processing mutex and starting to drain.
Scenario:
We have a WCF workflow with a client that does NOT use transactionflow.
The workflow contains several sequential TransactedReceiveScopes (using content-based correlation).
The TransactedReceiveScopes contain custom db operations.
Observations:
When we run SQL profiler against the first call, we see all the custom db calls, and the SaveInstance call in the profile trace.
We've noticed that, even though the SendReply is at the very end of the TransactedReceiveScope, sometimes the SendReply occurs a good 10 seconds before the transaction gets committed.
We tried changing the TimeToPersist and TimeToUnload to zero, but that had no effect. (The trace shows the SaveInstance happening immediately anyway; it is the commit that seems to be delayed.)
Questions:
Are our observations correct?
At what point is the transaction committed? Is this like garbage collection - i.e. it commits some time later when it's not busy?
Is there any way to control the commit delay, or is the only way to use transactionflow from the client (and then it should all commit when the client commits, including the persist)?
The TransactedReceiveScope commits the transaction when the body is completed, but as all execution is done through the scheduler, that could be some time later. It is not related to garbage collection, and there is no real way to influence it other than to avoid a busy machine and a lot of other parallel activities that could also be in the execution queue.
I'm taking an intro class on database management systems, and had a question that wasn't answered by my book. This question is not from my homework, I was just curious.
The textbook continually stresses that a transaction is one logical unit of work. However, when coming across the shared/exclusive locking modes, I got a little confused.
There was a diagram in the book that looked like this:
Time | Transaction Status
1    | Request Lock
2    | Receive Lock
3    | Process transaction
4    | Release Lock
5    | Lock is released
Does the transaction get processed all at the same time, or does it get processed as individual locks are obtained?
If there are commands in two transactions that result in a shared lock as well as an exclusive lock, do those transactions run concurrently, or are they scheduled one after the other?
The answer is, as usual, "it depends" :-)
Generally speaking, you don't need to take out all your locks before you begin; however, you need to take out all your locks before you release any locks.
So you can do the following:
lock resource A
update A
lock resource B
update B
unlock A
unlock B
This allows you to be a bit friendlier to other transactions that may want to read B, and don't care about A, for example. It does introduce more risk -- you may be unable to acquire a lock on B, and decide to roll back your transaction. Them's the breaks.
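As a rough illustration of that sequence, here is a sketch using two threading.Lock objects in place of database locks. The resource names, the timeout, and the "undo" step are all assumptions for the example; a real DBMS handles the locking and the rollback for you.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
resources = {"A": 0, "B": 0}


def run_transaction(timeout_s=1.0):
    """Acquire locks as they are needed, release none until all are held."""
    lock_a.acquire()                            # lock resource A
    resources["A"] += 1                         # update A
    if not lock_b.acquire(timeout=timeout_s):   # lock resource B
        resources["A"] -= 1                     # could not get B: undo A
        lock_a.release()
        return False                            # "rolled back"
    resources["B"] += 1                         # update B
    lock_a.release()                            # unlock A (no acquires after this)
    lock_b.release()                            # unlock B
    return True
```

The point is the shape: every acquire happens before the first release, and failing to get B means giving up the work already done on A.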
You also want to always acquire all locks in the same order, so that you don't wind up in a deadlock: transaction 1 has A and wants B; trans 2 has B and wants A; standoff at high noon, no one wins. If you enforce a consistent order, trans 2 will try to get A before B and either wait, letting trans 1 proceed, or fail, if trans 1 already started. Either way, no deadlock.
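One simple way to enforce a consistent order is to sort the resource names before locking. The sketch below assumes named in-process locks (ordered_locks and with_locks are hypothetical names); the same idea applies to rows or tables if you always touch them in, say, primary-key order.

```python
import threading

ordered_locks = {"A": threading.Lock(), "B": threading.Lock()}


def with_locks(names, action):
    """Acquire the named locks in sorted order, run action, then release."""
    ordered = sorted(set(names))
    for name in ordered:
        ordered_locks[name].acquire()
    try:
        return action()
    finally:
        # Release in reverse order of acquisition.
        for name in reversed(ordered):
            ordered_locks[name].release()
```

A caller asking for ("B", "A") still locks A first, so two callers can never each hold one lock while waiting for the other.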
Things get more interesting when you have intent-to-exclude locks -- locks that are taken as shared with an "option" to make them exclusive. This might be covered somewhere in the back of your book :-)
In practice each operation acquires the needed lock before it proceeds. A SELECT will first acquire a shared lock on a row, then read the row. An UPDATE will first acquire an exclusive lock on that row, then update the row. In theory you can say that 'locks are acquired, then the transaction processes', but in real life it is each individual operation in the transaction that knows what locks are required.
If it needs an exclusive lock, it will either block the other transaction or it will wait for the other transaction to finish before obtaining the lock.
Things that need exclusive locks (UPDATE/DELETE/etc) can't happen while anything else is accessing the data.
In general, locks are determined at run time. When the BEGIN TRANSACTION command is processed, nothing has run in the transaction yet, so there are no locks. As commands execute in the transaction, locks are acquired.
"If there are commands in two transactions that result in a shared lock as well as an exclusive lock, do those transactions run concurrently, or are they scheduled one after the other?"
A lock does not consist solely of the notion "shared/exclusive". The most important thing about a lock is the resource that it applies to.
Two transactions that each hold an exclusive lock on distinct resources (say, two separate tables, or two separate partitions, or two separate pages, or two separate rows, or two separate printers, or two separate IP ports, ...) can continue to run concurrently without any problem.
Transaction serialization only becomes necessary when a transaction requests a lock on some resource, where the sharing mode of that lock is incompatible with a lock held on the same resource by some other transaction.
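A toy version of that rule, assuming only two modes, shared (S) and exclusive (X), and the usual compatibility between them. Real lock managers have more modes and more machinery; the names here are made up for the example.

```python
# Can a lock in the requested mode coexist with one already held
# on the same resource? Keys are (held mode, requested mode).
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(held_locks, resource, mode):
    """held_locks: (resource, mode) pairs held by other transactions.

    Locks on other resources never matter; only incompatible modes
    on the same resource force the requester to wait.
    """
    return all(
        COMPATIBLE[(held_mode, mode)]
        for held_resource, held_mode in held_locks
        if held_resource == resource
    )
```

So two transactions holding exclusive locks on different rows run side by side, two readers of the same row run side by side, and only a reader and a writer (or two writers) of the same row get serialized.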
If your textbook really gives the sequence of events as you state, then throw it away. Lock requests emerge as the transaction is being processed, and there is no definitive and final way for the transaction processor to know at the start of the transaction which locks it will be needing (otherwise deadlocking would be a nonexistent problem).