PostgreSQL select for update lock, new rows - sql

I have the following concurrency use-case: An endpoint can be called at any time and an operation is supposed to happen. The operation goes like this in pseudocode (current isolation level is READ COMMITTED):
SELECT * FROM TABLE_A WHERE IS_LATEST=true FOR UPDATE
// DO SOME APP LOGIC TO TEST VALIDITY
// ALL GOES WELL => INSERT OR UPDATE NEW ROW WITH IS_LATEST=TRUE => COMMIT
// OTHERWISE => ROLLBACK (all good not interesting)
Now this approach with SELECT FOR UPDATE is fine if two of these operations start at the same time in the respects of update. Because both transactions see the same number of rows, one will update the rows and the second transaction will wait its turn before being able to SELECT FOR UPDATE and the state is valid.
The issue I have is when I have an insert in the first transaction. What happens is that for example when the first transaction makes that lock SELECT FOR UPDATE there are two rows, then the transaction continues, in the middle of the transaction, the second transaction comes in wanting to SELECT FOR UPDATE (latest) and waits for first transaction to finish.. The first transaction finished and there is a new third item realistically in the db, but the second transaction picks up only two rows while it was waiting for the row locks to be released. (This is because at the time of calling the SELECT FOR UPDATE the snapshot was different had only two rows that matched IS_LATEST=true).
Is there a way to make this transaction such that the SELECT lock picks up the latest snapshot after waiting?

The issue is that each command only sees rows that have been committed before the query started. There are various possible solutions ...
Stricter isolation level
You can solve this with a stricter isolation level, but that's relatively expensive.
Laurenz already provided a solution for this.
Just start a new command
Keep the (cheap) default isolation level READ COMMITTED, and just start a new command.
Only few rows to lock
While only locking a hand full of rows, the dead simple solution is to repeat the same SELECT ... FOR UPDATE. The second iteration sees newly committed rows and locks them additionally.
There is a theoretical race condition with additional transactions that might lock new rows before the waiting transaction does. That would result in a deadlock. Highly unlikely, but to be absolutely sure, lock rows in consistent order:
BEGIN; -- default READ COMMITTED
SELECT FROM table_a WHERE is_latest ORDER BY id FOR UPDATE; -- consistent order
SELECT * FROM table_a WHERE is_latest ORDER BY id FOR UPDATE; -- just repeat !!
-- DO SOME APP LOGIC TO TEST VALIDITY
-- pseudo-code
IF all_good
UPDATE table_a SET is_latest = true WHERE ...;
INSERT table_a (IS_LATEST, ...) VALUES (true, ...);
COMMIT;
ELSE
ROLLBACK;
END;
A partial index on (id) WHERE is_latest would be ideal.
More rows to lock
For more than a hand full of rows, I would instead create a dedicated one-row token table. A bullet-proof implementation could look like this, run as admin or superuser:
CREATE TABLE public.single_task_x (just_me bool CHECK (just_me) PRIMARY KEY DEFAULT true);
INSERT INTO public.single_task_x VALUES (true);
REVOKE ALL ON public.single_task_x FROM public;
GRANT SELECT, UPDATE ON public.single_task_x TO public; -- or just to those who need it
See:
How to allow only one row for a table?
Then:
BEGIN; -- default READ COMMITTED
SELECT FROM public.single_task_x FOR UPDATE;
SELECT * FROM table_a WHERE is_latest; -- FOR UPDATE? ①
-- DO SOME APP LOGIC TO TEST VALIDITY
-- pseudo-code
IF all_good
ROLLBACK;
ELSE
UPDATE table_a SET is_latest = true WHERE ...;
INSERT table_a (IS_LATEST, ...) VALUES (true, ...);
COMMIT;
END;
A single lock is cheaper.
① You may or may not want to lock additionally, to defend against other writes, possibly with a weaker lock ....
Either way, all locks are released at the end of the transaction automatically.
Advisory lock
Or use an advisory lock. pg_advisory_xact_lock() persists for the duration of the transaction:
BEGIN; -- default READ COMMITTED
SELECT pg_advisory_xact_lock(123);
SELECT * FROM table_a WHERE is_latest;
-- do stuff
COMMIT; -- or ROLLBACK;
Make sure to use a unique token for your particular task. 123 in my example. Consider a look-up table if you have many different tasks.
To release the lock at a different point in time (not when the transaction ends), consider a session-level lock with pg_advisory_lock(). Then you can (and must) unlock manually with pg_advisory_unlock() - or close the session.
Both of these wait for the locked resource. There are alternative functions returning false instead of waiting ...

With your method, the query in the second transaction will return an empty result after the lock is gone, because it sees is_latest = FALSE on the row in question, and the new row is not yet visible. So you would have to retry the transaction in that case.
I suggest that you use REPEATABLE READ isolation level and optimistic locking instead:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM table_a WHERE is_latest; -- no locks!
/* perform your application ruminations */
UPDATE table_a SET is_latest = FALSE WHERE id = <id you found above>;
INSERT INTO table_a (is_latest, ...) VALUES (TRUE, ...);
COMMIT;
Then three things may happen:
Your query finds a row, and the transaction succeeds.
Your query finds no row, then you could insert the first row.
The query finds a row, but the update of that row causes a serialization error.
In that case you know that a concurrent transaction interfered, and you repeat the complete transaction in response.

Related

Lock specific table rows to insert a new row

I have an Operations table with the columns sourceId, destinationId, amount, status. Whenever a user makes a transfer the API inserts a new row into that table, after checking the user's balance by calculating the sum of credit operations minus the sum of debit operations. Only when the balance is greater or equal than the transfer amount the operation is inserted with a successful status.
The issue is concurrency since a user performing multiple transfers at the same time might end up with a negative balance.
There are multiple ways of handling this concurrency issue with PostgreSQL:
Serializable transaction isolation level
Table locking
Row versioning
Row locking
etc.
Our expected behavior is, instead of failing with a unique violation on (sourceId, version), the database should wait for the previous transaction to finish, get the latest inserted version without setting the transaction isolation level to SERIALIZABLE.
However, I am not completely sure about the best approach. Here's what I tried:
1. Serializable transaction isolation level
This is the easiest approach, but the problem is lock escalation because if the database engine is under heavy load, 1 transaction can lock the whole table up, which is the documented behavior.
Pseudo-code:
newId = INSERT INTO "Operations" ("SourceId", "DestinationId", "Amount", "Status", "OccuredAt") values (null, 2, 3, 100, 'PENDING', null);
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM "Operations" WHERE ("SourceId" = 2 or "DestinationId"=2) and "Status" = 'SUCCESSFUL';
'''API: check if balance > transfer amount'''
UPDATE "Operations" SET "Status" = 'SUCCESSFUL' where id = newId
COMMIT;
2. Table locking
This is what we want to avoid by NOT using serializable transaction level
3. Row versioning
This approach seems best so far performance-wise. We added a column version int and a unique index on (sourceId, version), and when the transaction is inserted it is inserted with the next version. If two transactions are concurrent the database throws an error:
duplicate key value violates unique constraint "IX_Transactions_SourceWalletId_Version"
Pseudo-code:
newId = INSERT INTO "Operations" ("SourceId", "DestinationId", "Amount", "Status", "OccuredAt") values (null, 2, 3, 100, 'PENDING', null);
BEGIN;
lastVersion = SELECT o."Version"
FROM "Operations"
WHERE ("SourceId" = 2) AND ("Version" IS NOT NULL)
ORDER BY o."Version" DESC
LIMIT 1
SELECT * FROM "Operations" WHERE ("SourceId" = 2 or "DestinationId"=2)
and "Status" = 'SUCCESSFUL';
'''API: check if balance > transfer amount'''
UPDATE "Operations" SET "Status" = 'SUCCESSFUL', "Version" = lastVersion + 1 where id = newId;
COMMIT;
4. Row locking
Before calculating the user balance, lock all transaction rows with sourceWalletId = x (where x is the user making the transfer). But I can't find a way of doing this in PostgreSQL, using for update does the trick, but after a concurrent transaction waits on the first one, the result does not return the newly inserted row, which is the documented behavior for PostgreSQL.
using for update does the trick, but after a concurrent transaction waits on the first one, the result does not return the newly inserted row, which is the documented behavior for PostgreSQL.
Kind of true, but also not a show-stopper.
Yes, in default READ COMMITTED transaction isolation each statement only sees rows that were committed before the query began. The query, mind you, not the transaction. See:
Can concurrent value modification impact single select in PostgreSQL 9.1?
Just start the next query in the same transaction after acquiring the lock.
Assuming a table holding exactly one row per (relevant) user (like you should have). I'll call it "your_wallet_table", based on the cited "sourceWalletId":
BEGIN;
SELECT FROM "your_wallet_table" WHERE "sourceWalletId" = x FOR UPDATE;
-- x is the user making the transfer
-- check the user's balance (separate query!)
INSERT INTO "Operations" ... -- 'SUCCESSFUL' or 'PENDING'
COMMIT;
The lock is only acquired once no other transaction is working on the same user, and only released at the end of the transaction.
The next transaction will see all committed rows in its next statement.
If all transactions stick to this modus operandi, all is fine.
Of course, transactions cannot be allowed to change rows affecting the balance of other users.
Related:
How to use RETURNING with ON CONFLICT in PostgreSQL?
Number one in your example, the serializable variant, is the only one that will guarantee correct behaviour without the code having to retry transactions if the update count is zero or the transaction was rolled back. It is also the simplest to reason about. By the way, REPEATABLE READ would also be good enough for already existing rows.
Number 3 in your example, which looks lightly like optimistic locking, might look more performant, but that depends on the type of load. Updating an index, in your case the unique index, can also be a performance hit. And generally, you have less control over locks on indexes, which makes the situation less deterministic or more difficult to reason about. Also, it is still possible to suffer from different read values in any transaction isolation level below REPEATABLE READ.
Another thing is that your implementation behaves incorrectly in the following scenario:
Process 1 starts and reads version 1
Process 2 starts and reads version 1
Process 2 succeeds and writes version 2
Process 3 starts and reads version 2
Process 3 succeeds and writes version 3
Process 1 succeeds and writes version 2 // NO VIOLATION!
What does work is optimistic locking, which looks somewhat like your number 3. Pseudo code / SQL:
BEGIN
SELECT "version", "amount" FROM "operations" WHERE "id" = identifier
// set variable oldVersion to version
// set variable newVersion to version + 1
// set variable newAmount to amount + amountOfOperation
IF newAmount < 0
ROLLBACK
ELSE
UPDATE "operations" SET "version" = newVersion, "amount" = newAmount WHERE "id" = identifier AND "version" = oldVersion
COMMIT
ENDIF
This does not require a unique index containing version. And in general, the query in the WHERE condition of the UPDATE and the update itself are correct even with READ COMMITTED transaction isolation level. I am not certain about PostGreSQL: do verify this in documentation!
In general, I would start with the serializable number 1 example, until measurements in real use-cases show that it is a problem. Then try optimistic locking and see if it improves, also with actual use-cases.
Do remember that the code must always be able to replay the transaction from BEGIN to END if the UPDATE reports zero updated rows, or if the transaction fails.
Good luck and have fun!
You will have to take a performance hit. SERIALIZABLE isolation would be the easiest way. You can increase max_predicate_locks_per_relation, max_predicate_locks_per_xact or max_predicate_locks_per_page (depending on which limit you hit) to escalate locks later.
Alternatively, you could have a table that stores the balance per user, which is updated by a deferred trigger. Then you can have a check constraint on that table. This will serialize operations only per user.

Retrieve UNLOCKED records

In SSMS, in one session, I acquired a exclusive lock on a table1 for a specific record as below.
Session1
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRAN
SELECT * FROM TABLE1 WITH (XLOCK,ROWLOCK)
WHERE (FIELD1+FIELD2) = ('0101R001');
In another Session2
How to get unlocked records from table1.
When used with readpast as below, the results are inconsistent (displays all records). Is there a alternative ways to identify the unlocked records alone from table1 ?
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM TABLE1 WITH (READPAST)
Before modification, we acquire a xclusive lock and we do some processing on those fetched values and then initiate an update statement followed by commit. During this time, no other process should change the values of that record. Using XLOCK, was able to get an exclusive lock on that record. This is working. In the select statement, if we change to WITH (UPDLOCK), it allows other process to read that record values which we want to restrict.
With Xclusive locks in place, if we use (READPAST) along with where clause involving primary keys to check for specific record, it skips that locked record which is an expected behaviour. If no where clause, it displays all records (incl locked records).
SELECT * FROM TABLE1 WITH (READPAST)

SELECT COUNT(SomeId) with INSERT later to same SomeId: Appropriate locking strategy?

I am using SQL Server 2012. I have a repeatable read transaction where I perform this query:
select count(SomeId)
from dbo.MyTable
where SomeId = #SomeId
SomeId is a column whose value may repeat in the table (think foreign key). However, SomeId is not a member of any index nor is it a foreign key.
Later in the transaction, I insert a record into dbo.MyTable with the same #SomeId, thus changing what the select count(*) would return were I to run it again:
insert into dbo.MyTable (SomeId, ...)
values (#SomeId, ...)
Several threads in my application can execute this transaction at the same time. Because of this, I'm getting deadlocks on the insert statement. At first, I thought an updlock would be appropriate on the select statement, but I quickly realized that it wouldn't work because I'm not actually updating the rows selected by the select count(SomeId).
My question is this: is there any way to avoid a potentially expensive table lock? Is there any way to lock just rows that involve SomeId, even if they haven't been inserted yet (strange, I know)? I want to force other threads to wait while the original transaction completes its work but I don't want to lock rows unnecessarily.
EDIT
Here's what I'm trying to accomplish:
I only want to insert up to eight rows for a particular SomeId. There are several unrelated processes that can start one of these transactions, potentially at the same time. The select count detects whether there are already eight rows and causes the operation to fail for that process. If the count is less than eight, that same transaction performs additional work, then inserts a record at the end, thus effectively incrementing the count were the select count to be run again. I am hitting the deadlock on the insert statement.
If you have several processes that try to do the same thing and you don't want to have more records than some number, you will need to actually prevent those processes from running at the the same time.
One way would be to read the counts with exclusive lock:
select count(SomeId)
from dbo.MyTable with (xlock)
where SomeId = #SomeId
This way those records will be blocked until the transaction completes.
You should create an index for the SomeId column though as it will be most likely that the locks will be on held on the index level this way.

What type of Transaction IsolationLevel should be used to ignore inserts but lock the selected row?

I have a process that starts a transaction, inserts a record into Table1, and then calls a long running web service (up to 30 seconds). If the web service call fails then the insert is rolled back (which is what we want). Here is an example of the insert (it is actually multiple inserts into multiple tables but I am simplifying for this question):
INSERT INTO Table1 (UserId, StatusTypeId) VALUES (#UserId, 1)
I have a second process that queries Table1 from the first step like this:
SELECT TOP 1 * FROM Table1 WHERE StatusTypeId=2
and then updates that row for a user. When process 1 is running, Table1 is locked so process 2 will not complete until process 1 finishes which is a problem because a long delay is introduced while process 1 finishes its web service call.
Process 1 will only ever insert a StatusTypeId of 1 and it is also the only operation that inserts into Table1. Process 2 will only query on StatusTypeId = 2. I want to tell Process 2 to ignore any inserts into Table1 but lock the row that it selects. The default isolation level for Process 2 is waiting on too much but I have a fear that IsolationLevel.ReadUncommitted allows reading of too much dirty data. I do not want two users running Process 2 and then accidentally getting the same row.
Is there a different IsolationLevel to use other than ReadUncommitted that says ignore inserted rows but make sure the select locks the row that is selected?
Regarding the SELECT being blocked by the insert this should be avoidable by providing appropriate indexes.
Test Table.
CREATE TABLE Table1
(
UserId INT PRIMARY KEY,
StatusTypeId INT,
AnotherColumn varchar(50)
)
insert into Table1
SELECT number, (LEN(type)%2)+1, newid()
FROM master.dbo.spt_values
where type='p'
Query window one
BEGIN TRAN
INSERT INTO Table1 (UserId, StatusTypeId) VALUES (5000, 1)
WAITFOR DELAY '00:01';
ROLLBACK
Query window two (Blocks)
SELECT TOP 1 *
FROM Table1
WHERE StatusTypeId=2
ORDER BY AnotherColumn
But if you retry the test after adding an index it won't block CREATE NONCLUSTERED INDEX ix ON Table1 (StatusTypeId,AnotherColumn)
Regarding your locking of rows for Process 2 you can use the following (the READPAST hint will allow 2 concurrent Process 2 transactions to begin processing different rows rather than one blocking the other). You might find this article by Remus Rusanu relevant
BEGIN TRAN
SELECT TOP 1 *
FROM Table1 WITH (UPDLOCK, READPAST)
WHERE StatusTypeId=2
ORDER BY AnotherColumn
/*
Rest of Process Two's code here
*/
COMMIT
Edit: Having re-read the question, the lock on any insert should not effect any select under READ COMMITTED this could be an issue with your indexes.
However, from your comments and rest of the question it seems you want only one transaction to be able to read a row at a time, which is not what an isolation level prevents.
They prevent
Dirty Read - reading uncommitted data in a transaction which could be rolled back - occurs in READ UNCOMMITTED, prevented in READ COMMITTED, REPEATABLE READ, SERIALIZABLE
Non Repeatable Reads - a row is updated whilst being read in an uncommitted transaction, meaning the same read of a particular row can occur twice in a transaction and produce a different results - occurs in READ UNCOMMITTED, READ COMMITTED. prevented in REPEATABLE READ, SERIALIZABLE
phantom rows - a row is inserted or deleted whilst being read in an uncommited transaction, meaning that the same read of multiple rows can occur twice in a transaction and produce different results, with either added or missing rows - occurs in READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, prevented in SERIALIZABLE

Does "SELECT FOR UPDATE" prevent other connections inserting when the row is not present?

I'm interested in whether a SELECT FOR UPDATE query will lock a non-existent row.
Example
Table FooBar with two columns, foo and bar, foo has a unique index.
Issue query SELECT bar FROM FooBar WHERE foo = ? FOR UPDATE
If the first query returns zero rows, issue a query
INSERT INTO FooBar (foo, bar) values (?, ?)
Now is it possible that the INSERT would cause an index violation or does the SELECT FOR UPDATE prevent that?
Interested in behavior on SQLServer (2005/8), Oracle and MySQL.
MySQL
SELECT ... FOR UPDATE with UPDATE
Using transactions with InnoDB (auto-commit turned off), a SELECT ... FOR UPDATE allows one session to temporarily lock down a particular record (or records) so that no other session can update it. Then, within the same transaction, the session can actually perform an UPDATE on the same record and commit or roll back the transaction. This would allow you to lock down the record so no other session could update it while perhaps you do some other business logic.
This is accomplished with locking. InnoDB utilizes indexes for locking records, so locking an existing record seems easy--simply lock the index for that record.
SELECT ... FOR UPDATE with INSERT
However, to use SELECT ... FOR UPDATE with INSERT, how do you lock an index for a record that doesn't exist yet? If you are using the default isolation level of REPEATABLE READ, InnoDB will also utilize gap locks. As long as you know the id (or even range of ids) to lock, then InnoDB can lock the gap so no other record can be inserted in that gap until we're done with it.
If your id column were an auto-increment column, then SELECT ... FOR UPDATE with INSERT INTO would be problematic because you wouldn't know what the new id was until you inserted it. However, since you know the id that you wish to insert, SELECT ... FOR UPDATE with INSERT will work.
CAVEAT
On the default isolation level, SELECT ... FOR UPDATE on a non-existent record does not block other transactions. So, if two transactions both do a SELECT ... FOR UPDATE on the same non-existent index record, they'll both get the lock, and neither transaction will be able to update the record. In fact, if they try, a deadlock will be detected.
Therefore, if you don't want to deal with a deadlock, you might just do the following:
INSERT INTO ...
Start a transaction, and perform the INSERT. Do your business logic, and either commit or rollback the transaction. As soon as you do the INSERT on the non-existent record index on the first transaction, all other transactions will block if they attempt to INSERT a record with the same unique index. If the second transaction attempts to insert a record with the same index after the first transaction commits the insert, then it will get a "duplicate key" error. Handle accordingly.
SELECT ... LOCK IN SHARE MODE
If you select with LOCK IN SHARE MODE before the INSERT, if a previous transaction has inserted that record but hasn't committed yet, the SELECT ... LOCK IN SHARE MODE will block until the previous transaction has completed.
So to reduce the chance of duplicate key errors, especially if you hold the locks for awhile while performing business logic before committing them or rolling them back:
SELECT bar FROM FooBar WHERE foo = ? LOCK FOR UPDATE
If no records returned, then
INSERT INTO FooBar (foo, bar) VALUES (?, ?)
In Oracle, the SELECT ... FOR UPDATE has no effect on a non-existent row (the statement simply raises a No Data Found exception). The INSERT statement will prevent a duplicates of unique/primary key values. Any other transactions attempting to insert the same key values will block until the first transaction commits (at which time the blocked transaction will get a duplicate key error) or rolls back (at which time the blocked transaction continues).
On Oracle:
Session 1
create table t (id number);
alter table t add constraint pk primary key(id);
SELECT *
FROM t
WHERE id = 1
FOR UPDATE;
-- 0 rows returned
-- this creates row level lock on table, preventing others from locking table in exclusive mode
Session 2
SELECT *
FROM t
FOR UPDATE;
-- 0 rows returned
-- there are no problems with locking here
rollback; -- releases lock
INSERT INTO t
VALUES (1);
-- 1 row inserted without problems
I wrote a detailed analysis of this thing on SQL Server: Developing Modifications that Survive Concurrency
Anyway, you need to use SERIALIZABLE isolation level, and you really need to stress test.
SQL Server only has the FOR UPDATE as part of a cursor. And, it only applies to UPDATE statements that are associated with the current row in the cursor.
So, the FOR UPDATE has no relationship with INSERT. Therefore, I think your answer is that it's not applicable in SQL Server.
Now, it may be possible to simulate the FOR UPDATE behavior with transactions and locking strategies. But, that may be more than what you're looking for.