Lock specific table rows to insert a new row - sql

I have an Operations table with the columns sourceId, destinationId, amount, status. Whenever a user makes a transfer, the API inserts a new row into that table after checking the user's balance, which is calculated as the sum of credit operations minus the sum of debit operations. Only when the balance is greater than or equal to the transfer amount is the operation inserted with a successful status.
The issue is concurrency since a user performing multiple transfers at the same time might end up with a negative balance.
There are multiple ways of handling this concurrency issue with PostgreSQL:
Serializable transaction isolation level
Table locking
Row versioning
Row locking
etc.
The expected behavior is that, instead of failing with a unique violation on (sourceId, version), the database should wait for the previous transaction to finish and then pick up the latest inserted version, without setting the transaction isolation level to SERIALIZABLE.
However, I am not completely sure about the best approach. Here's what I tried:
1. Serializable transaction isolation level
This is the easiest approach, but the problem is lock escalation: if the database engine is under heavy load, one transaction's predicate locks can escalate and effectively lock up the whole table, which is the documented behavior.
Pseudo-code:
newId = INSERT INTO "Operations" ("SourceId", "DestinationId", "Amount", "Status", "OccuredAt") VALUES (2, 3, 100, 'PENDING', null) RETURNING "Id";
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM "Operations" WHERE ("SourceId" = 2 or "DestinationId"=2) and "Status" = 'SUCCESSFUL';
'''API: check if balance > transfer amount'''
UPDATE "Operations" SET "Status" = 'SUCCESSFUL' where id = newId
COMMIT;
2. Table locking
This is what we want to avoid, which is why we are NOT using the SERIALIZABLE isolation level.
3. Row versioning
This approach seems best so far performance-wise. We added an integer version column and a unique index on (sourceId, version); each operation is inserted with the next version. If two transactions are concurrent, the database throws an error:
duplicate key value violates unique constraint "IX_Transactions_SourceWalletId_Version"
Pseudo-code:
newId = INSERT INTO "Operations" ("SourceId", "DestinationId", "Amount", "Status", "OccuredAt") VALUES (2, 3, 100, 'PENDING', null) RETURNING "Id";
BEGIN;
lastVersion = SELECT o."Version"
FROM "Operations" o
WHERE o."SourceId" = 2 AND o."Version" IS NOT NULL
ORDER BY o."Version" DESC
LIMIT 1;
SELECT * FROM "Operations" WHERE ("SourceId" = 2 or "DestinationId"=2)
and "Status" = 'SUCCESSFUL';
'''API: check if balance > transfer amount'''
UPDATE "Operations" SET "Status" = 'SUCCESSFUL', "Version" = lastVersion + 1 where id = newId;
COMMIT;
4. Row locking
Before calculating the user balance, lock all transaction rows with sourceWalletId = x (where x is the user making the transfer). But I can't find a way of doing this in PostgreSQL: using FOR UPDATE does the trick, but after a concurrent transaction waits on the first one, the result does not return the newly inserted row, which is the documented behavior for PostgreSQL.

using for update does the trick, but after a concurrent transaction waits on the first one, the result does not return the newly inserted row, which is the documented behavior for PostgreSQL.
Kind of true, but also not a show-stopper.
Yes, in default READ COMMITTED transaction isolation each statement only sees rows that were committed before the query began. The query, mind you, not the transaction. See:
Can concurrent value modification impact single select in PostgreSQL 9.1?
Just start the next query in the same transaction after acquiring the lock.
Assuming a table holding exactly one row per (relevant) user (which you should have), I'll call it "your_wallet_table", based on the cited "sourceWalletId":
BEGIN;
SELECT FROM "your_wallet_table" WHERE "sourceWalletId" = x FOR UPDATE;
-- x is the user making the transfer
-- check the user's balance (separate query!)
INSERT INTO "Operations" ... -- 'SUCCESSFUL' or 'PENDING'
COMMIT;
The lock is only acquired once no other transaction is working on the same user, and only released at the end of the transaction.
The next transaction will see all committed rows in its next statement.
If all transactions stick to this modus operandi, all is fine.
Of course, transactions cannot be allowed to change rows affecting the balance of other users.
Related:
How to use RETURNING with ON CONFLICT in PostgreSQL?

Number one in your example, the serializable variant, is the only one that will guarantee correct behaviour without the code having to retry transactions if the update count is zero or the transaction was rolled back. It is also the simplest to reason about. By the way, REPEATABLE READ would also be good enough for already existing rows.
Number 3 in your example, which looks loosely like optimistic locking, might look more performant, but that depends on the type of load. Updating an index, in your case the unique index, can also be a performance hit. And generally, you have less control over locks on indexes, which makes the situation less deterministic and more difficult to reason about. Also, it is still possible to suffer from different read values in any transaction isolation level below REPEATABLE READ.
Another thing is that your implementation behaves incorrectly in the following scenario:
Process 1 starts and reads version 1
Process 2 starts and reads version 1
Process 2 succeeds and writes version 2
Process 3 starts and reads version 2
Process 3 succeeds and writes version 3
Process 1 succeeds and writes version 2 // NO VIOLATION!
What does work is optimistic locking, which looks somewhat like your number 3. Pseudo code / SQL:
BEGIN;
SELECT "version", "amount" FROM "operations" WHERE "id" = identifier;
// set variable oldVersion to version
// set variable newVersion to version + 1
// set variable newAmount to amount + amountOfOperation
IF newAmount < 0
    ROLLBACK;
ELSE
    UPDATE "operations" SET "version" = newVersion, "amount" = newAmount
    WHERE "id" = identifier AND "version" = oldVersion;
    COMMIT;
ENDIF
This does not require a unique index containing version. And in general, the query in the WHERE condition of the UPDATE and the update itself are correct even with the READ COMMITTED transaction isolation level. I am not certain about PostgreSQL: do verify this in the documentation!
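For what it's worth, in PostgreSQL the version check, the balance check and the write can be collapsed into a single statement under the default READ COMMITTED level. A minimal sketch, reusing the pseudo-code variable names above (identifier, oldVersion, amountOfOperation), which are placeholders rather than real parameters:
-- returns zero rows when the version has moved on (a concurrent writer got there first)
-- or when the new amount would go negative, so the caller knows to retry or abort
UPDATE "operations"
SET "version" = "version" + 1,
    "amount"  = "amount" + amountOfOperation
WHERE "id" = identifier
  AND "version" = oldVersion
  AND "amount" + amountOfOperation >= 0
RETURNING "version", "amount";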
In general, I would start with the serializable number 1 example, until measurements in real use-cases show that it is a problem. Then try optimistic locking and see if it improves, also with actual use-cases.
Do remember that the code must always be able to replay the transaction from BEGIN to END if the UPDATE reports zero updated rows, or if the transaction fails.
Good luck and have fun!

You will have to take a performance hit. SERIALIZABLE isolation would be the easiest way. You can increase max_pred_locks_per_relation, max_pred_locks_per_transaction or max_pred_locks_per_page (depending on which limit you hit) so that predicate locks are escalated later.
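For reference, those settings can be raised without editing postgresql.conf by hand; the values below are only illustrative:
ALTER SYSTEM SET max_pred_locks_per_relation = 128;  -- illustrative value
ALTER SYSTEM SET max_pred_locks_per_page = 4;        -- illustrative value
SELECT pg_reload_conf();  -- the two settings above are picked up on reload
-- max_pred_locks_per_transaction sizes shared memory and needs a server restart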
Alternatively, you could have a table that stores the balance per user, which is updated by a deferred trigger. Then you can have a check constraint on that table. This will serialize operations only per user.
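A rough sketch of that idea; the names ("Balances", apply_operation) are illustrative and not from the original schema, and it assumes every inserted operation should hit the balances immediately:
CREATE TABLE "Balances" (
    "UserId"  int    PRIMARY KEY,
    "Balance" bigint NOT NULL DEFAULT 0 CHECK ("Balance" >= 0)
);
CREATE FUNCTION apply_operation() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- debit the source, credit the destination; the CHECK constraint
    -- aborts the whole transaction if a balance would go negative
    UPDATE "Balances" SET "Balance" = "Balance" - NEW."Amount" WHERE "UserId" = NEW."SourceId";
    UPDATE "Balances" SET "Balance" = "Balance" + NEW."Amount" WHERE "UserId" = NEW."DestinationId";
    RETURN NULL;  -- return value is ignored for AFTER triggers
END $$;
-- deferred: fires at COMMIT, so the row lock on "Balances" is held only briefly
-- and concurrent transfers are serialized per user, not globally
CREATE CONSTRAINT TRIGGER operations_apply_balance
    AFTER INSERT ON "Operations"
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW EXECUTE FUNCTION apply_operation();  -- EXECUTE PROCEDURE before PostgreSQL 11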

Related

PostgreSQL select for update lock, new rows

I have the following concurrency use-case: An endpoint can be called at any time and an operation is supposed to happen. The operation goes like this in pseudocode (current isolation level is READ COMMITTED):
SELECT * FROM TABLE_A WHERE IS_LATEST=true FOR UPDATE
// DO SOME APP LOGIC TO TEST VALIDITY
// ALL GOES WELL => INSERT OR UPDATE NEW ROW WITH IS_LATEST=TRUE => COMMIT
// OTHERWISE => ROLLBACK (all good not interesting)
Now this approach with SELECT FOR UPDATE is fine, with respect to updates, if two of these operations start at the same time. Both transactions see the same set of rows, one updates them, and the second transaction waits its turn before being able to SELECT FOR UPDATE, so the state stays valid.
The issue I have is when the first transaction performs an insert. For example, when the first transaction takes the SELECT FOR UPDATE lock there are two rows; the transaction continues, and in the middle of it the second transaction comes in wanting to SELECT FOR UPDATE (latest) and waits for the first transaction to finish. The first transaction finishes and there is now a third row in the database, but the second transaction picks up only two rows while it was waiting for the row locks to be released. (This is because at the time the SELECT FOR UPDATE was issued, the snapshot contained only the two rows that matched IS_LATEST=true.)
Is there a way to make this transaction such that the SELECT lock picks up the latest snapshot after waiting?
The issue is that each command only sees rows that have been committed before the query started. There are various possible solutions ...
Stricter isolation level
You can solve this with a stricter isolation level, but that's relatively expensive.
Laurenz already provided a solution for this.
Just start a new command
Keep the (cheap) default isolation level READ COMMITTED, and just start a new command.
Only a few rows to lock
While locking only a handful of rows, the dead simple solution is to repeat the same SELECT ... FOR UPDATE. The second iteration sees newly committed rows and locks them additionally.
There is a theoretical race condition with additional transactions that might lock new rows before the waiting transaction does. That would result in a deadlock. Highly unlikely, but to be absolutely sure, lock rows in a consistent order:
BEGIN; -- default READ COMMITTED
SELECT FROM table_a WHERE is_latest ORDER BY id FOR UPDATE; -- consistent order
SELECT * FROM table_a WHERE is_latest ORDER BY id FOR UPDATE; -- just repeat !!
-- DO SOME APP LOGIC TO TEST VALIDITY
-- pseudo-code
IF all_good
   UPDATE table_a SET is_latest = true WHERE ...;
   INSERT INTO table_a (is_latest, ...) VALUES (true, ...);
   COMMIT;
ELSE
   ROLLBACK;
END;
A partial index on (id) WHERE is_latest would be ideal.
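For example (hypothetical index name):
CREATE INDEX table_a_latest_idx ON table_a (id) WHERE is_latest;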
More rows to lock
For more than a handful of rows, I would instead create a dedicated one-row token table. A bullet-proof implementation could look like this, run as admin or superuser:
CREATE TABLE public.single_task_x (just_me bool CHECK (just_me) PRIMARY KEY DEFAULT true);
INSERT INTO public.single_task_x VALUES (true);
REVOKE ALL ON public.single_task_x FROM public;
GRANT SELECT, UPDATE ON public.single_task_x TO public; -- or just to those who need it
See:
How to allow only one row for a table?
Then:
BEGIN; -- default READ COMMITTED
SELECT FROM public.single_task_x FOR UPDATE;
SELECT * FROM table_a WHERE is_latest; -- FOR UPDATE? ①
-- DO SOME APP LOGIC TO TEST VALIDITY
-- pseudo-code
IF all_good
   UPDATE table_a SET is_latest = true WHERE ...;
   INSERT INTO table_a (is_latest, ...) VALUES (true, ...);
   COMMIT;
ELSE
   ROLLBACK;
END;
A single lock is cheaper.
① You may or may not want to lock additionally, to defend against other writes, possibly with a weaker lock ....
Either way, all locks are released at the end of the transaction automatically.
Advisory lock
Or use an advisory lock. pg_advisory_xact_lock() persists for the duration of the transaction:
BEGIN; -- default READ COMMITTED
SELECT pg_advisory_xact_lock(123);
SELECT * FROM table_a WHERE is_latest;
-- do stuff
COMMIT; -- or ROLLBACK;
Make sure to use a unique token for your particular task. 123 in my example. Consider a look-up table if you have many different tasks.
To release the lock at a different point in time (not when the transaction ends), consider a session-level lock with pg_advisory_lock(). Then you can (and must) unlock manually with pg_advisory_unlock() - or close the session.
Both of these wait for the locked resource. There are alternative functions returning false instead of waiting ...
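For illustration, with the same hypothetical token 123:
SELECT pg_try_advisory_xact_lock(123);  -- returns false immediately instead of waiting
SELECT pg_advisory_lock(123);    -- session-level: survives COMMIT / ROLLBACK
-- ... work spanning one or more transactions ...
SELECT pg_advisory_unlock(123);  -- must be released manually (or close the session)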
With your method, the query in the second transaction will return an empty result after the lock is gone, because it sees is_latest = FALSE on the row in question, and the new row is not yet visible. So you would have to retry the transaction in that case.
I suggest that you use REPEATABLE READ isolation level and optimistic locking instead:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM table_a WHERE is_latest; -- no locks!
/* perform your application ruminations */
UPDATE table_a SET is_latest = FALSE WHERE id = <id you found above>;
INSERT INTO table_a (is_latest, ...) VALUES (TRUE, ...);
COMMIT;
Then three things may happen:
Your query finds a row, and the transaction succeeds.
Your query finds no row, then you could insert the first row.
The query finds a row, but the update of that row causes a serialization error.
In that case you know that a concurrent transaction interfered, and you repeat the complete transaction in response.

In Sybase, how would I lock a stored procedure that is executing and alter the table that the stored procedure returns?

I have a table as follows:
id status
-- ------
1 pass
1 fail
1 pass
1 na
1 na
Also, I have a stored procedure that returns a table with the top 100 records having status 'na'. The stored procedure can be called by multiple nodes in an environment, and I don't want them to fetch duplicate data. So I want to lock the stored procedure while it is executing, set the status of the records obtained from the stored procedure to 'In Progress', return that table, and then release the lock, so that different nodes don't fetch the same data. How would I accomplish this?
There is already a solution provided for a similar question in MS SQL, but it produces errors when used in Sybase.
Assuming Sybase ASE ...
The bigger issue you'll likely want to consider is whether you want a single process to lock the entire table while you're grabbing your top 100 rows, or if you want other processes to still access the table?
Another question is whether you'd like multiple processes to concurrently pull 100 rows from the table without blocking each other?
I'm going to assume that you a) don't want to lock the entire table and b) you may want to allow multiple processes to concurrently pull rows from the table.
1 - if possible, make sure the table is using datarows locking (default is usually allpages); this will reduce the granularity of locks to the row level (as opposed to page level for allpages); the table will need to be datarows if you want to allow multiple processes to concurrently find/update rows in the table
2 - make sure the lock escalation setting on the table is high enough to ensure a single process's 100 row update doesn't lock the table (sp_setpglockpromote for allpages, sp_setrowlockpromote for datarows); the key here is to make sure your update doesn't escalate to a table-level lock!
3 - when it comes time to grab your set of 100 rows you'll want to ... inside a transaction ... update the 100 rows with a status value that's unique to your session, select the associated id's, then update the status again to 'In Progress'
The gist of the operation looks like the following:
declare @mysession varchar(10)
select @mysession = convert(varchar(10), @@spid)  -- replace @@spid with anything that
                                                  -- uniquely identifies your session
set rowcount 100  -- limit the update to 100 rows
begin tran get_my_rows
-- start with an update so that we get exclusive access to the desired rows;
-- update the first 100 rows you find with your @@spid
update mytable
set status = @mysession  -- need to distinguish your locked rows from
                         -- other processes; if we used 'In Progress'
                         -- we wouldn't be able to distinguish between
                         -- rows updated earlier in the day or updated
                         -- by other/concurrent processes
from mytable readpast    -- 'readpast' allows your query to skip over
                         -- locks held by other processes, but it only
                         -- works for datarows tables
where status = 'na'
-- select your reserved id's and send back to the client/calling process
select id
from mytable
where status = @mysession
-- update your rows with a status of 'In Progress'
update mytable
set status = 'In Progress'
where status = @mysession
commit  -- close out txn and release our locks
set rowcount 0  -- set back to default of 'unlimited' rows
Potential issues:
if your table is large and you don't have an index on status then your queries could take longer than necessary to run; by making sure lock escalation is high enough and you're using datarows locking (so the readpast works) you should see minimal blocking of other processes regardless of how long it takes to find the desired rows
with an index on the status column, consider that all of these updates are going to force a lot of index updates which is probably going to lead to some expensive deferred updates
if using datarows and your lock escalation is too low then an update could lock the entire table, which would cause another (concurrent) process to readpast the table lock and find no rows to process
if using allpages you won't be able to use readpast so concurrent processes will block on your locks (ie, they won't be able to read around your lock)
if you've got an index on status, and several concurrent processes locking different rows in the table, there could be a chance for deadlocks to occur (likely in the index tree of the index on the status column) which in turn would require your client/application to be coded to expect and address deadlocks
To think about:
if the table is relatively small such that table scanning isn't a big cost, you could drop any index on the status column and this should reduce the performance overhead of deferred updates (related to updating the indexes)
if you can work with a session-specific status value (eg, 'In Progress - @mysession') then you could eliminate the 2nd update statement (could come in handy if you're incurring deferred updates on an indexed status column)
if you have another column (or columns) in the table that you could use to uniquely identify your session's rows (eg, last_updated_by_spid = @@spid, last_updated_date = @mydate - where @mydate is initially set to getdate()) then your first update could set the status = 'In Progress', the select would use @@spid and @mydate for the where clause, and the second update would not be needed [NOTE: This is, effectively, the same thing Gordon is trying to address with his session column.]
assuming you can work with a session-specific status value, consider using something that will allow you to track, and fix, orphaned rows (eg, the row status remains 'In Progress - @mysession' because the calling process died and never came back to (re)set the status)
if you can pass the id list back to the calling program as a single string of concatenated id values, you could use the method I outline in this answer to append the id's into a @variable during the first update, allowing you to set status = 'In Progress' in the first update and also allowing you to eliminate the select and the second update
how would you tell which rows have been orphaned? you may want the ability to update a (small)datetime column with the getdate() of when you issued your update; then, if you would normally expect the status to be updated within, say, 5 minutes, you could have a monitoring process that looks for orphaned rows where status = 'In Progress' and it's been more than, say, 10 minutes since the last update
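A hypothetical monitoring query along those lines, assuming a last_updated datetime column that is set by the reserving update:
select id
from mytable
where status like 'In Progress%'
  and datediff(minute, last_updated, getdate()) > 10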
If the datarows, readpast, lock escalation settings and/or deadlock potential is too much, and you can live with brief table-level locks on the table, you could have the process obtain an exclusive table level lock before performing the update and select statements; the exclusive lock would need to be obtained within a user-defined transaction in order to 'hold' the lock for the duration of your work; a quick example:
begin tran get_my_rows
-- request an exclusive table lock; wait until it's granted
lock table mytable in exclusive mode
update ...
select ...
update ...
commit
I'm not 100% sure how to do this in Sybase. But, the idea is the following.
First, add a new column to the table that represents the session or connection used to change the data. You will use this column to provide isolation.
Then, update the rows:
update top (100) mytable
set status = 'in progress',
    session = @session
where status = 'na'
order by ?; -- however you define the "top" records
Then, you can return or process the 100 ids that are "in progress" for the given connection.
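For example (reusing the session column introduced above):
select id
from mytable
where status = 'in progress'
  and session = @session;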
Create another table, proc_lock, that has one row
When control enters the stored procedure, start a transaction and do a select for update on the row in proc_lock (see this link). If that doesn't work for Sybase, then you could try the technique from this answer to lock the row.
Before the procedure exits, make sure to commit the transaction.
This will ensure that only one user can execute the proc at a time. When the second user tries to execute the proc, it will block until the first user's lock on the proc_lock row is released (e.g. when transaction is committed)
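A minimal sketch of that approach; the table and column names are illustrative, and if SELECT ... FOR UPDATE is not available in your ASE version, an UPDATE on the single row takes an exclusive lock that is held until the end of the transaction:
create table proc_lock (locked bit not null)
insert into proc_lock values (0)
-- inside the stored procedure:
begin tran
    -- concurrent callers block here until the first caller commits
    update proc_lock set locked = 1
    -- ... pick the top 100 'na' rows, mark them 'In Progress', select them ...
commit tran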

40001 error in PostgreSQL

I have a simple table "Counters":
"CounterId" SERIAL (PK)
"CounterName" VARCHAR(50) NOT NULL (UNIQUE INDEX)
"Value" BIGINT NOT NULL
When two serializable transactions (actually there are many transactions like this at the same time) execute the following queries:
SELECT NULL
FROM "Counters"
WHERE "CounterName" = #Value FOR UPDATE
SELECT "CounterId", "CounterName", "Value"
FROM "Counters"
WHERE "CounterName" = #Value
LIMIT 2
(this query is executed by Entity Framework in the same connection and transaction)
UPDATE "Counters" SET "Value" = #Value WHERE "CounterId" = #CounterId
One of the transactions is rolled back with error 40001:
could not serialize access due to read/write dependencies among transactions
I'm retrying the failed transactions (5 times), but the error still occurs.
Maybe this is caused by the different predicates in the first and third queries?
If two transactions like the one you describe above run concurrently, the following will happen:
Transaction 1 locks a row with SELECT ... FOR UPDATE.
Transaction 2 tries to lock the same row and is blocked.
Transaction 1 modifies the value and commits.
Transaction 2 is unblocked, and before it locks the row, it has to recheck if the row version it is about to lock is still the most current version. Because of the modification by Transaction 1, that is no longer the case, and a serialization error is thrown.
There is no way to avoid that problem if several transactions with isolation level REPEATABLE READ or higher are trying to modify the same rows. Be ready to retry often!
It seems like the transaction actually locks more rows than it modifies. That exacerbates the problem. Only lock those rows that you need to change!
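One way to shrink the footprint, if the new value can be derived in SQL (an assumption; the question computes it in the application), is to fold the lock, read and write into a single statement. A serialization failure can still occur under REPEATABLE READ or SERIALIZABLE, so the retry loop stays:
-- locks and modifies only the single matching row
UPDATE "Counters"
SET "Value" = "Value" + 1   -- illustrative; adapt to however the new value is derived
WHERE "CounterName" = @Value
RETURNING "CounterId", "Value";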

Transaction isolation - INSERTS dependant on previous records values

This question is related to, and came out of, a discussion on another question:
What is the correct isolation level for Order header - Order lines transactions?
Imagine a scenario where we have the usual Orders_Headers and Orders_LineItems tables. Let's also say that we have special business rules:
Each order has Discount field which is calculated based on time passed from last order entered
Each next order Discount field is calculated specially if there has been more than X order in last Y hours.
Each next order Discount field is calculated specially if average frequency of last 10 orders was higher than x per minute.
Each next order Discount field is calculated specially
The point here is to show that every Order is dependent on the previous ones, so the isolation level is crucial.
We have a transaction (just logic of the code shown):
BEGIN TRANSACTION
INSERT INTO Order_Headers...
SET @Id = SCOPE_IDENTITY()
INSERT INTO Order_LineItems...(using @Id)
DECLARE @SomeVar INT
--just an example to show selecting the previous x orders
--needed to calculate the Discount value for the new Order
SELECT @SomeVar = COUNT(*) FROM Order_Headers
WHERE ArbitraryCriteria
UPDATE Order_Headers
SET Discount = UDF(@SomeVar)
WHERE Id = @Id
COMMIT TRANSACTION
We also have another transaction to read orders:
SELECT TOP 10 * FROM Order_Headers
ORDER BY Id DESC
QUESTIONS
Is SNAPSHOT isolation level for the first transaction and READ COMMITTED for the second appropriate?
Is there a better way of approaching CREATE/UPDATE transaction or is this the way to do it?
The problem with SNAPSHOT (which I assume you decided to use) is not about inserting/reading. It's the updates you should be concerned about.
Snapshot isolation levels use row versioning. That means any time you insert/update/delete a row, the row is copied to tempdb (the version store, the location for those kinds of rows) and grows by 14 bytes for a versioning tag, so that your newly started transaction can read the row as of the last committed transaction. Keep in mind that these resized rows stay that way until you rebuild the index.
This should be an indicator that, if your table is really busy, your indexes will fragment much faster, and it adds a certain amount of overhead on tempdb. So keep that in mind.
The even bigger concern here, as I mentioned, is updates.
Any time you insert/delete/update a row, you take exclusive locks on those rows (and possibly on the object later), and since snapshot uses row versioning, inserts from another transaction add exclusive locks on a NEW row, which is not a problem. However, if you try to update an existing row and session 2 tries to acquire an X lock on that row, it will fail because session 1 already holds an X lock on it, and this is where you will get an update conflict error.
READ COMMITTED and SERIALIZABLE handle these issues well, so you might want to take that approach and test all solutions before you actually implement one. Remember that all of those will cause blocking on updates, while snapshot / read committed snapshot will simply fail.
Personally, I would have used READ COMMITTED SNAPSHOT and altered the procedure to rerun in a CATCH block N times, but that has flaws as well!
The serializable option:
Using a pessimistic locking strategy by way of the updlock and serializable table hints to acquire a key range lock specified by the where criteria (backed by a supporting index to lock only the range necessary for the query):
declare @Id int, @SomeVar int;
begin tran;
select @SomeVar = count(OrderDate)
from Order_Headers with (updlock, serializable)
where OrderDate >= '20170101';
insert into Order_Headers (OrderDate, SomeVar)
select sysdatetime(), @SomeVar;
set @Id = scope_identity();
insert into Order_LineItems (id, cols)
select @Id, cols
from @TableValuedParameter;
commit tran;
The why and how of using the updlock and serializable table hints to lock a key range with a select, and why you need both, is covered well in Sam Saffron's upsert (update/insert) patterns.
Reference:
Documentation on serializable and other Table Hints - MSDN
Key-Range Locking - MSDN
SQL Server Isolation Levels: A Series - Paul White
Questions About T-SQL Transaction Isolation Levels You Were Too Shy to Ask - Robert Sheldon
Isolation Level references curated by Brent Ozar

NHibernate: Select MAX value concurrently

Suppose I need to select the max value to use as an order number. So I'll select MAX(Number), assign the number to the order, and save the changes to the database. However, how do I prevent others from messing with the number? Will transactions do? Something like:
ordersRepository.StartTransaction();
order.Number = ordersRepository.GetMaxNumber() + 1;
ordersRepository.Commit();
Will the code above "lock" changes so that order numbers are read/written by only one DB client at a time? The transactions are plain NHibernate ones, and GetMaxNumber just does SELECT MAX(Number) FROM Orders.
Using an ITransaction with IsolationLevel.Serializable should do the job. Be careful of table contention, though. If you've got high frequency updates on the table, things could slow down big time. You might want to profile the hit on the db when using GetMaxNumber().
I had to do something similar to generate custom IDs for high concurrency usage. My solution moved the ID generation into the database, and used a separate Counter table to hold the max values.
Using a separate Counter table has a couple of plus points:
It removes the contention on the Order table
It's usually faster
If it's small enough, it can be pinned into memory
I also used a stored proc to return the next available ID:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
UPDATE [COUNTER] SET Value = Value + 1 WHERE COUNTER_ID = @counterId
COMMIT TRAN
RETURN [NEW_VALUE]
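A slightly fuller sketch of such a procedure in T-SQL; the procedure name is illustrative, since the original only shows the body:
create procedure GetNextCounterValue
    @counterId int
as
begin
    set transaction isolation level serializable
    begin tran
        -- the update takes an exclusive lock on the counter row, so concurrent
        -- callers are serialized per counter rather than per table
        update [COUNTER] set Value = Value + 1 where COUNTER_ID = @counterId
        select Value as NewValue from [COUNTER] where COUNTER_ID = @counterId
    commit tran
end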
Hope that helps.