Avoid SQL deadlocks on CurrentAccountBalance update - sql

I'm evaluating PostgreSQL for a personal project.
I was inspired by its Multi-Version Concurrency Control (MVCC).
I simulated a basic need: insert a transaction record and perform a vendor balance update from many threads at the same time, running SQL commands like:
INSERT INTO
VendorAccountTransactions (VendorId, BalanceBefore, BalanceAfter)
VALUES (
1,
(SELECT CurrentBalance FROM VendorAccounts WHERE VendorId = 1),
(SELECT CurrentBalance FROM VendorAccounts WHERE VendorId = 1) + 19.99
);
UPDATE VendorAccounts SET CurrentBalance = CurrentBalance + 19.99 WHERE VendorId = 1;
Any idea how to avoid deadlocks in such a common case?
What is needed is simple: insert the transaction record with "balance before" / "balance after" and update the balance.
It will be used in a high-load application.
How do I achieve the right result for this simple business need?
Thank you.
Update:
Is there maybe another solution, such as re-designing the database to avoid deadlocks, or some other approach that keeps the business need satisfied?

Put the UPDATE first and include both statements in a transaction. The UPDATE takes an update lock on the vendor row and prevents concurrent transactions from proceeding; they will wait until the first transaction completes, because the lock is not available.
This effectively serializes access to a given vendor, which ensures consistency.
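For illustration, a minimal sketch of that update-first pattern in PostgreSQL, reusing the tables from the question (the 19.99 amount is just the example value):
BEGIN;
-- Taking the row lock first serializes concurrent work on this vendor.
UPDATE VendorAccounts
SET CurrentBalance = CurrentBalance + 19.99
WHERE VendorId = 1;
-- Inside the same transaction this SELECT sees the updated balance,
-- so "before" is the new balance minus the amount just applied.
INSERT INTO VendorAccountTransactions (VendorId, BalanceBefore, BalanceAfter)
SELECT VendorId, CurrentBalance - 19.99, CurrentBalance
FROM VendorAccounts
WHERE VendorId = 1;
COMMIT;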

Related

Lock specific table rows to insert a new row

I have an Operations table with the columns sourceId, destinationId, amount, status. Whenever a user makes a transfer, the API inserts a new row into that table, after checking the user's balance by calculating the sum of credit operations minus the sum of debit operations. Only when the balance is greater than or equal to the transfer amount is the operation inserted with a successful status.
The issue is concurrency: a user performing multiple transfers at the same time might end up with a negative balance.
There are multiple ways of handling this concurrency issue with PostgreSQL:
Serializable transaction isolation level
Table locking
Row versioning
Row locking
etc.
Our expected behavior is that, instead of failing with a unique violation on (sourceId, version), the database should wait for the previous transaction to finish and pick up the latest inserted version, without setting the transaction isolation level to SERIALIZABLE.
However, I am not completely sure about the best approach. Here's what I tried:
1. Serializable transaction isolation level
This is the easiest approach, but the problem is lock escalation: if the database engine is under heavy load, one transaction can end up locking the whole table, which is documented behavior.
Pseudo-code:
newId = INSERT INTO "Operations" ("SourceId", "DestinationId", "Amount", "Status", "OccuredAt") VALUES (2, 3, 100, 'PENDING', null);
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM "Operations" WHERE ("SourceId" = 2 or "DestinationId"=2) and "Status" = 'SUCCESSFUL';
'''API: check if balance > transfer amount'''
UPDATE "Operations" SET "Status" = 'SUCCESSFUL' where id = newId
COMMIT;
2. Table locking
This is what we want to avoid, which is why we are not using the SERIALIZABLE isolation level.
3. Row versioning
This approach seems best so far performance-wise. We added an integer version column and a unique index on (sourceId, version); each new transaction is inserted with the next version. If two transactions are concurrent, the database throws an error:
duplicate key value violates unique constraint "IX_Transactions_SourceWalletId_Version"
Pseudo-code:
newId = INSERT INTO "Operations" ("SourceId", "DestinationId", "Amount", "Status", "OccuredAt") VALUES (2, 3, 100, 'PENDING', null);
BEGIN;
lastVersion = SELECT o."Version"
FROM "Operations" o
WHERE (o."SourceId" = 2) AND (o."Version" IS NOT NULL)
ORDER BY o."Version" DESC
LIMIT 1;
SELECT * FROM "Operations" WHERE ("SourceId" = 2 or "DestinationId"=2)
and "Status" = 'SUCCESSFUL';
'''API: check if balance > transfer amount'''
UPDATE "Operations" SET "Status" = 'SUCCESSFUL', "Version" = lastVersion + 1 where id = newId;
COMMIT;
4. Row locking
Before calculating the user balance, lock all transaction rows with sourceWalletId = x (where x is the user making the transfer). I can't find a good way of doing this in PostgreSQL: using FOR UPDATE does the trick, but after a concurrent transaction waits on the first one, the result does not include the newly inserted row, which is documented behavior for PostgreSQL.
using FOR UPDATE does the trick, but after a concurrent transaction waits on the first one, the result does not include the newly inserted row, which is documented behavior for PostgreSQL.
Kind of true, but also not a show-stopper.
Yes, in default READ COMMITTED transaction isolation each statement only sees rows that were committed before the query began. The query, mind you, not the transaction. See:
Can concurrent value modification impact single select in PostgreSQL 9.1?
Just start the next query in the same transaction after acquiring the lock.
Assuming a table holding exactly one row per (relevant) user (as you should have). I'll call it "your_wallet_table", based on the cited "sourceWalletId":
BEGIN;
SELECT FROM "your_wallet_table" WHERE "sourceWalletId" = x FOR UPDATE;
-- x is the user making the transfer
-- check the user's balance (separate query!)
INSERT INTO "Operations" ... -- 'SUCCESSFUL' or 'PENDING'
COMMIT;
The lock is only acquired once no other transaction is working on the same user, and only released at the end of the transaction.
The next transaction will see all committed rows in its next statement.
If all transactions stick to this modus operandi, all is fine.
Of course, transactions cannot be allowed to change rows affecting the balance of other users.
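Putting the pieces together, a slightly fuller sketch under the same assumptions (the wallet table, user id 2, amounts, and the balance expression are all illustrative):
BEGIN;
SELECT FROM "your_wallet_table" WHERE "sourceWalletId" = 2 FOR UPDATE;
-- This statement starts after the lock is acquired, so it sees every row
-- committed by the transaction it may have waited for.
SELECT COALESCE(SUM(CASE WHEN "DestinationId" = 2 THEN "Amount" ELSE -"Amount" END), 0) AS balance
FROM "Operations"
WHERE ("SourceId" = 2 OR "DestinationId" = 2)
  AND "Status" = 'SUCCESSFUL';
-- The API decides based on the balance, then:
INSERT INTO "Operations" ("SourceId", "DestinationId", "Amount", "Status")
VALUES (2, 3, 100, 'SUCCESSFUL');
COMMIT;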
Related:
How to use RETURNING with ON CONFLICT in PostgreSQL?
Number one in your example, the serializable variant, is the only one that guarantees correct behaviour without the code having to retry transactions when the update count is zero or the transaction was rolled back. It is also the simplest to reason about. By the way, REPEATABLE READ would also be good enough for already existing rows.
Number 3 in your example, which looks somewhat like optimistic locking, might look more performant, but that depends on the type of load. Updating an index, in your case the unique index, can also be a performance hit. And generally you have less control over locks on indexes, which makes the situation less deterministic and harder to reason about. Also, it is still possible to read different values within a transaction at any isolation level below REPEATABLE READ.
Another thing is that your implementation behaves incorrectly in the following scenario:
Process 1 starts and reads version 1
Process 2 starts and reads version 1
Process 2 succeeds and writes version 2
Process 3 starts and reads version 2
Process 3 succeeds and writes version 3
Process 1 succeeds and writes version 2 // NO VIOLATION!
What does work is optimistic locking, which looks somewhat like your number 3. Pseudo code / SQL:
BEGIN
SELECT "version", "amount" FROM "operations" WHERE "id" = identifier
// set variable oldVersion to version
// set variable newVersion to version + 1
// set variable newAmount to amount + amountOfOperation
IF newAmount < 0
ROLLBACK
ELSE
UPDATE "operations" SET "version" = newVersion, "amount" = newAmount WHERE "id" = identifier AND "version" = oldVersion
COMMIT
ENDIF
This does not require a unique index containing version. And in general, the query in the WHERE condition of the UPDATE, and the update itself, are correct even at the READ COMMITTED transaction isolation level. I am not certain about PostgreSQL: do verify this in the documentation!
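To make the row-count check concrete, here is a PostgreSQL-flavoured sketch of that compare-and-swap UPDATE (the id 42, the version 7, and the amount 100 are illustrative values):
BEGIN;
SELECT "version", "amount" FROM "operations" WHERE "id" = 42;
-- the application computes the new amount and rolls back here if it would go negative
UPDATE "operations"
SET "version" = "version" + 1,
    "amount"  = "amount" + 100
WHERE "id" = 42
  AND "version" = 7;  -- the version read above; 0 rows updated means another transaction won
-- if the UPDATE reported 0 rows, ROLLBACK and replay the whole transaction; otherwise:
COMMIT;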
In general, I would start with the serializable number 1 example, until measurements in real use-cases show that it is a problem. Then try optimistic locking and see if it improves, also with actual use-cases.
Do remember that the code must always be able to replay the transaction from BEGIN to END if the UPDATE reports zero updated rows, or if the transaction fails.
Good luck and have fun!
You will have to take a performance hit. SERIALIZABLE isolation would be the easiest way. You can increase max_pred_locks_per_relation, max_pred_locks_per_transaction or max_pred_locks_per_page (depending on which limit you hit) so that lock escalation happens later.
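For example, a postgresql.conf sketch (the values are illustrative; the defaults are shown in the comments, and you would tune whichever limit you actually hit):
max_pred_locks_per_transaction = 256  # default 64
max_pred_locks_per_relation = -2      # default -2, i.e. max_pred_locks_per_transaction / 2
max_pred_locks_per_page = 4           # default 2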
Alternatively, you could have a table that stores the balance per user, which is updated by a deferred trigger. Then you can have a check constraint on that table. This will serialize operations only per user.
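A minimal sketch of that idea, with illustrative names (it ignores the PENDING/SUCCESSFUL status handling for brevity):
CREATE TABLE wallet_balances (
    wallet_id int PRIMARY KEY,
    balance   numeric NOT NULL DEFAULT 0,
    CHECK (balance >= 0)  -- rejects transfers that would make the balance negative
);
CREATE FUNCTION apply_operation() RETURNS trigger AS $$
BEGIN
    UPDATE wallet_balances SET balance = balance - NEW."Amount" WHERE wallet_id = NEW."SourceId";
    UPDATE wallet_balances SET balance = balance + NEW."Amount" WHERE wallet_id = NEW."DestinationId";
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE CONSTRAINT TRIGGER operations_apply
AFTER INSERT ON "Operations"
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW EXECUTE FUNCTION apply_operation();
The row locks taken by the balance UPDATE are what serialize concurrent transfers for the same wallet.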

Transaction isolation - INSERTs dependent on previous records' values

This question is related to, and came out of, a discussion about another question:
What is the correct isolation level for Order header - Order lines transactions?
Imagine a scenario where we have the usual Orders_Headers and Orders_LineItems tables. Let's also say that we have special business rules:
Each order has a Discount field which is calculated based on the time passed since the last order was entered.
Each next order's Discount field is calculated specially if there have been more than X orders in the last Y hours.
Each next order's Discount field is calculated specially if the average frequency of the last 10 orders was higher than x per minute.
Each next order's Discount field is calculated specially in further cases along these lines.
The point here is to show that every order depends on the previous ones, so the isolation level is crucial.
We have a transaction (only the logic of the code is shown):
BEGIN TRANSACTION
INSERT INTO Order_Headers...
SET @Id = SCOPE_IDENTITY()
INSERT INTO Order_LineItems...(using @Id)
DECLARE @SomeVar INT
--just an example to show selecting the previous x orders,
--needed to calculate the Discount value for the new Order
SELECT @SomeVar = COUNT(*) FROM Order_Headers
WHERE ArbitraryCriteria
UPDATE Order_Headers
SET Discount = UDF(@SomeVar)
WHERE Id = @Id
COMMIT TRANSACTION
We also have another transaction to read orders:
SELECT TOP 10 * FROM Order_Headers
ORDER BY Id DESC
QUESTIONS
Are SNAPSHOT isolation level for the first transaction and READ COMMITTED for the second appropriate levels?
Is there a better way of approaching the CREATE/UPDATE transaction, or is this the way to do it?
The problem with SNAPSHOT is not the inserting/reading (which I assume you decided to use); it's the updates you should be concerned about.
Snapshot isolation levels use row versioning, which means that any time you insert/update/delete a row, the old row versions are kept in tempdb (the version store, the location for those kinds of rows), and each affected row grows by 14 bytes for a versioning tag, so that a newly started transaction can read the row as of the last committed transaction. Keep in mind that these resized rows stay that way until you rebuild the index.
This should be an indicator that if your table is really busy, your indexes will become fragmented much faster and the version store adds a certain percentage of overhead to tempdb, so keep that in mind.
The even bigger concern here is updates, as I mentioned.
Any time you insert/delete/update a row, you get exclusive locks on those rows (and later on the object). Since snapshot uses row versioning, inserts from another transaction add exclusive locks on a NEW row, and that is not a problem. However, if you try to update an existing row on which session 1 already holds an X lock, session 2 will fail, and this is where you get the update conflict error ("Snapshot isolation transaction aborted due to update conflict...").
READ COMMITTED and SERIALIZABLE handle these issues well, so you might want to take that approach and test all solutions before you actually implement one. Remember that those transactions will block on updates, while snapshot / read committed snapshot will simply fail.
Personally, I would have used read committed snapshot and altered the procedure to rerun in a CATCH block N times, but that has flaws as well!
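For reference, snapshot-based isolation is enabled per database; a minimal sketch (the database name is illustrative):
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;   -- allows the SNAPSHOT isolation level
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;    -- makes READ COMMITTED use row versioning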
The serializable option:
Use a pessimistic locking strategy by way of the UPDLOCK and SERIALIZABLE table hints to acquire a key-range lock covering the WHERE criteria (backed by a supporting index so that only the range necessary for the query is locked):
declare @Id int, @SomeVar int;
begin tran;
select @SomeVar = count(OrderDate)
from Order_Headers with (updlock,serializable)
where OrderDate >= '20170101';
insert into Order_Headers (OrderDate, SomeVar)
select sysdatetime(), @SomeVar;
set @Id = scope_identity();
insert into Order_LineItems (id,cols)
select @Id, cols
from @TableValuedParameter;
commit tran;
The why and how of using the UPDLOCK and SERIALIZABLE table hints to lock a key range with a SELECT, and why you need both, is covered in Sam Saffron's upsert (update/insert) patterns.
Reference:
Documentation on serializable and other Table Hints - MSDN
Key-Range Locking - MSDN
SQL Server Isolation Levels: A Series - Paul White
Questions About T-SQL Transaction Isolation Levels You Were Too Shy to Ask - Robert Sheldon
Isolation Level references curated by Brent Ozar

select combined with an update, race condition

I would like to do something like this
begin tran
declare @maxpriceprice int;
select @maxpriceprice = MAX(price) from products where typeId = 5
and later on
insert into products (price) values (@maxpriceprice + RAND() * 10)
commit tran
It is rather strange, but it is a strange application.
Between the select and the insert I cannot afford to have people inserting stuff into the database.
Would it be OK to do a SELECT MAX(price) WITH (XLOCK) to prevent other people from getting the max price until I complete my transaction?
You never have a race condition in the database if you let the DBMS handle the locking for you. To do that, you have to give it enough information:
insert into products (price)
select MAX(price) + RAND() * 10
from products where typeId = 5
Would it be ok to do a select max price with (XLOCK)
Basically, no. You never want to leave a transaction open and hand control back to an application or a user. You want to get your ducks in a row and execute your transaction. If you want to "lock" something (while, say, the user updates the information), you usually resort to an optimistic concurrency strategy whereby, at the time of the update, you verify that the row has not changed since it was read. You might want to read up on the timestamp (rowversion) datatype, which supports that style.
SQL also defines cursors, row-level locking, and serializable isolation. Those are all available in SQL Server too, at some cost to concurrency.
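A minimal T-SQL sketch of that optimistic pattern, assuming a rowversion column (here called rv) and an id key have been added to the products table (both names, and the id 42, are illustrative):
DECLARE @rv binary(8), @price int;
-- read the row and remember its version stamp
SELECT @rv = rv, @price = price FROM products WHERE id = 42;
-- later: update only if nobody has changed the row in between
UPDATE products
SET price = @price + 1
WHERE id = 42 AND rv = @rv;
IF @@ROWCOUNT = 0
    RAISERROR('Row was changed by another user; reload and retry.', 16, 1);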

SQL Server - Simultaneous Inserts to the table from multiple clients - Check Limit and Block

We have recently been facing an issue with simultaneous inserts into one of our SQL Server tables from multiple clients. I hope you guys can help us through it.
We use a stored procedure to do the transactions. In that stored procedure, for each transaction, we calculate the total sales so far. If the total sales are less than the set limit,
then the transaction is allowed; otherwise, the transaction is denied.
It works fine most of the time, but sometimes when multiple clients try to do the transaction at exactly the same time, the limit check fails and both transactions get through.
Can you guys suggest how we can effectively enforce the limit all the time? Is there a better way to do it?
Thanks!
I don't think it is possible to do this declaratively.
If all inserts are guaranteed to go through the stored procedure and the SaleValue is not updated once inserted, then the following should work (I made up table and column names, as these were not supplied in the initial question):
DECLARE @SumSaleValue MONEY
BEGIN TRAN
SELECT @SumSaleValue = SUM(SaleValue)
FROM dbo.Orders WITH (UPDLOCK, HOLDLOCK)
WHERE TransactionId = @TransactionId
IF @SumSaleValue > 1000
BEGIN
RAISERROR('Cannot do insert as total would exceed order limit',16,1);
ROLLBACK;
RETURN;
END
/*Code for INSERT goes here*/
COMMIT
The HOLDLOCK gives serializable semantics and locks the entire range matching the TransactionId, and the UPDLOCK prevents two concurrent transactions from locking the same range, thus reducing the risk of deadlocks.
An index on TransactionId,SaleValue would be best to support this query.
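For example, one possible shape for that index (table and column names taken from the sketch above):
CREATE INDEX IX_Orders_TransactionId_SaleValue
ON dbo.Orders (TransactionId, SaleValue);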

NHibernate: Select MAX value concurrently

Suppose I need to select the max value to use as an order number. So I'll select MAX(number), assign the number to the order, and save the changes to the database. However, how do I prevent others from messing with the number? Will transactions do? Something like:
ordersRepository.StartTransaction();
order.Number = ordersRepository.GetMaxNumber() + 1;
ordersRepository.Commit();
Will the code above "lock" changes so that order numbers are read/write only by one DB client? Given that transactions are plain NHibernate ones, and GetMaxNumber just does SELECT MAX(Number) FROM Orders.
Using an ITransaction with IsolationLevel.Serializable should do the job. Be careful of table contention, though. If you've got high frequency updates on the table, things could slow down big time. You might want to profile the hit on the db when using GetMaxNumber().
I had to do something similar to generate custom IDs for high concurrency usage. My solution moved the ID generation into the database, and used a separate Counter table to hold the max values.
Using a separate Counter table has a couple of plus points:
It removes the contention on the Order table
It's usually faster
If it's small enough, it can be pinned in memory
I also used a stored proc to return the next available ID:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
DECLARE @NewValue INT;
BEGIN TRAN;
-- capture the incremented value in the same statement
UPDATE [COUNTER] SET @NewValue = Value = Value + 1 WHERE COUNTER_ID = @counterId;
COMMIT TRAN;
RETURN @NewValue;
Hope that helps.