Atomic update in SQL with two queries

Assume that I want to do an atomic update of a column with two queries:
SELECT num FROM table WHERE id = ?
Put num inside variable x
UPDATE table SET num = x + 1 WHERE id = ?
And I run these two queries inside a transaction with isolation level REPEATABLE READ.
Is it possible to have a deadlock situation?
Let's say there are two transactions running.
T1 reads the value num and acquires a shared lock on the row
T2 reads the value num and acquires a shared lock on the row
T1 now wants the exclusive lock to write, but cannot get it because T2 holds a shared lock
T2 now wants the exclusive lock to write, but cannot get it because T1 holds a shared lock
We have a deadlock here.
But when I tried this in code with Spring Data JPA, there was no deadlock. Instead I got a race condition, and the final incremented value was less than expected. If I move to isolation level SERIALIZABLE, then I do get a deadlock and there is no race condition; the final incremented value is correct.
I thought SERIALIZABLE only added range locks. Why does the deadlock happen at the SERIALIZABLE isolation level and not at REPEATABLE READ, with the two queries above?
PS: Don't tell me to use just one single UPDATE statement; I am trying to learn how to do a manual atomic update with two statements and learn more about transactions here.
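For reference, here is how the same two-statement pattern looks when the lock is taken at read time with SELECT ... FOR UPDATE - a sketch only, assuming MySQL/InnoDB or PostgreSQL syntax and an illustrative table named counters:
BEGIN;
-- Taking the exclusive row lock at read time avoids the shared-lock
-- deadlock described above: the second transaction blocks on the SELECT
-- instead of both acquiring S locks and then fighting for the X lock.
SELECT num FROM counters WHERE id = 1 FOR UPDATE;
-- the application stores num in a variable x ...
UPDATE counters SET num = 43 WHERE id = 1; -- writes x + 1, with x = 42 as an example
COMMIT;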

Related

Is it possible to lock on a value of a column in SQL Server?

I have a table that looks like this:
Id  GroupId
1   G1
2   G1
3   G2
4   G2
5   G2
It should at any time be possible to read all of the rows (committed only). When there is an update, I want a transaction that locks on GroupId, i.e. at any given time there should be only one transaction attempting to update per GroupId.
It should ideally still be possible to read all committed rows (i.e. other transactions/ordinary reads that do not try to acquire the "update per group" lock should still be able to read).
The reason I want to do this is that an update cannot rely on "outdated" data. I.e. I do some calculations in a transaction, and another transaction must not edit the row with id 1 or add a new row with the same GroupId after those rows were read by the first transaction (even though the first transaction would never modify the row itself, it depends on its value).
Another "nice to have" requirement is that sometimes I would need the same requirement "cross group", i.e. the update transaction would have to lock 2 groups at the same time. (This is not a dynamic number of groups, but rather just 2)
Here are some ideas. I don't think any of them is perfect - I think you will need to give yourself a set of use cases and try them. Some of the situations I tried after applying locks:
SELECTs with the WHERE filter on another group
SELECTs with the WHERE filter on the locked group
UPDATEs on the table with the WHERE clause on another group
UPDATEs on the table where the Id (not GrpId!) was not locked
UPDATEs on the table where the row was locked (e.g., Ids 1 and 2)
INSERTs into the table with that GrpId
I have the funny feeling that none of these will be 100%, but the most likely answer is the second one (setting the transaction isolation level). It will probably lock more than desired, but will give you the isolation you need.
Also one thing to remember: if you lock many rows (e.g., there are thousands of rows with the GrpId you want), then SQL Server can escalate the lock to a full-table lock. (I believe the tipping point is 5000 locks, but I'm not sure.)
Old-school hackjob
At the start of your transaction, update all the relevant rows somehow, e.g.:
BEGIN TRAN
UPDATE YourTable
SET GrpId = GrpId
WHERE GrpId = N'G1';
-- Do other stuff
COMMIT TRAN;
Nothing else can use those rows because (bravo!) they have been written within an open transaction.
Convenient - set isolation level
See https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-transaction-locking-and-row-versioning-guide?view=sql-server-ver15#isolation-levels-in-the-
Before your transaction, set the isolation level high, e.g., SERIALIZABLE.
You may want to read all the relevant rows at the start of your transaction (e.g., SELECT GrpId FROM YourTable WHERE GrpId = N'G1') to lock them from being updated.
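A sketch of that approach, assuming SQL Server syntax and the table and column names used above:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
-- Under SERIALIZABLE this read takes key-range locks, blocking both
-- updates to these rows and INSERTs of new rows with the same GrpId:
SELECT Id FROM YourTable WHERE GrpId = N'G1';
-- ... calculations and updates that depend on the rows read above ...
COMMIT TRAN;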
Flexible but requires a lot of coding
Use resource locking with sp_getapplock and sp_releaseapplock.
These are used to lock resources, not tables or rows.
What is a resource? Well, anything you want it to be. In this case, I'd suggest 'Grp1', 'Grp2' etc. It doesn't actually lock rows. Instead, you ask (via sp_getapplock, or APPLOCK_TEST) whether you can get the resource lock. If so, continue. If not, then stop.
Any code referring to these tables needs to be reviewed and potentially modified to ask whether it's allowed to run or not. If something doesn't ask for permission and just does it, there are no actual locks stopping it (except via any transactions you've explicitly specified).
You also need to ensure that errors are handled appropriately (e.g., still releasing the app lock) and that blocked processes are retried.
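A minimal sketch of that pattern, assuming SQL Server's documented sp_getapplock parameters and the resource name 'Grp1' suggested above:
BEGIN TRAN;
DECLARE @res INT;
-- Request an exclusive app lock on the resource 'Grp1'. With
-- @LockOwner = 'Transaction' the lock is released automatically at
-- COMMIT/ROLLBACK, so no explicit sp_releaseapplock call is needed.
EXEC @res = sp_getapplock @Resource = N'Grp1',
    @LockMode = 'Exclusive',
    @LockOwner = 'Transaction',
    @LockTimeout = 5000;
IF @res < 0
BEGIN
    -- Could not get the lock within the timeout: give up (or retry).
    ROLLBACK TRAN;
    RETURN;
END;
-- ... work that must be exclusive per group goes here ...
COMMIT TRAN; -- releases the app lock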

Generating a sequence number in a PostgreSQL database - concurrency & isolation levels

Similar question: Update SQL with consecutive numbering
I want to be able to generate a sequence number by incrementing a column num in a table called SeqNum. The SeqNum table layout:
|num|
|===|
| 0 |
The query being run:
BEGIN TRANSACTION
UPDATE SeqNum
SET num = num + 1
SELECT num FROM SeqNum
COMMIT TRANSACTION
My question is: if I have multiple processes running this query at the same time with the READ COMMITTED isolation level, will the SELECT clause always return a unique updated value? I'm assuming this will be consistent and no two processes would ever return the same num... Obviously this is all running in the one transaction. If it weren't in a transaction, I would expect it could return duplicate values.
I'm not sure how the behavior changes (if at all) depending on the isolation level.
In PostgreSQL, you can request any of the four standard transaction isolation levels. But internally, there are only three distinct isolation levels, which correspond to the levels Read Committed, Repeatable Read, and Serializable. When you select the level Read Uncommitted you really get Read Committed ...1
At the READ COMMITTED isolation level, dirty reads are not possible per the standard, which means these transactions cannot read data written by a concurrent uncommitted transaction. That can only happen at the READ UNCOMMITTED isolation level per the standard (but it won't happen in PostgreSQL: the four isolation levels only define which phenomena must not happen; they do not define which phenomena must happen).
In short, your SELECT clause won't always return a unique value. Neither will it if you rewrite it to UPDATE ... RETURNING ..., but the time window will be really small, so the chance of multiple transactions returning the same value is much lower.
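For reference, the RETURNING variant mentioned above would look like this (PostgreSQL syntax, using the SeqNum table from the question):
-- Increment and read back the new value in a single statement:
UPDATE SeqNum SET num = num + 1 RETURNING num;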
But luckily for you, the one thing in PostgreSQL that is not affected by transactions is the sequence:
To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used, even if the transaction that did the nextval later aborts. This means that aborted transactions might leave unused "holes" in the sequence of assigned values. 2
Because sequences are non-transactional, changes made by setval are not undone if the transaction rolls back. 2
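A minimal sketch of the sequence-based approach (the sequence name seq_num is illustrative):
CREATE SEQUENCE seq_num START 1;
-- Each caller gets a distinct value, even across concurrent transactions,
-- because nextval() is never rolled back:
SELECT nextval('seq_num');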

Can an UPDATE in transaction T2, running at the same time as transaction T1's SELECT, get to those rows before the SELECT does?

Let's say we have a table with many rows and a primary key (index).
T1 will do a SELECT that searches for some rows using a WHERE clause, locking them with shared locks. At the same time, T2 will do an update on a row that falls into the range of T1's requested rows.
The question is: can the UPDATE get to those rows before the SELECT does?
How does the engine lock rows when selecting - one by one, like: read this row, lock it, now move to the next, etc.? In that case might the UPDATE get to the rows before the SELECT gets them? And what if no index is used but a table scan instead?
The UPDATE statement has a SELECT component too. How does the UPDATE actually lock a row?
One by one: first read it, then lock it with X, then the next one, etc.? In this scenario could the SELECT get to the rows before the UPDATE does?
And is the SELECT part of the UPDATE affected by the isolation level?
The question is targeted at traditional ANSI isolation systems, not Oracle/MVCC.
There are quite a few questions here, but I'll try to address some of them.
Both SELECT and UPDATE will lock as they go through the index or the records in the table. Records already locked by the SELECT will not be available to the UPDATE, and the other way round. This may even cause a deadlock, depending on the order of those operations (which is beyond your control).
If you need the update to run before the select, you need to control it at your code level. If you start both at once, SQL Server will just start executing them and locking.
SELECT is affected by the isolation level; e.g., when your isolation level is READ UNCOMMITTED, the SELECT will read the data without taking any locks.
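For illustration, requesting that behavior explicitly might look like this in SQL Server (the table name is illustrative):
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- This read takes no shared locks and may see uncommitted ('dirty') rows:
SELECT * FROM SomeTable WHERE Id = 1;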

When does SQL exclusively lock a row in an UPDATE statement?

Can a race condition occur in SQL under these conditions?
If I have this SQL update running in one thread, call it statement 1:
UPDATE Items
SET Flag = 'B'
WHERE Flag = 'A';
And this SQL update running in another thread, call it statement 2:
UPDATE Items
SET Flag = 'C'
WHERE Flag = 'A';
Is it possible for each statement to read the same record where Flag equals 'A' and write the record with its own value, such that statement 1 writes it first and then statement 2 overwrites it, or vice versa?
The answer to this question depends on when the database exclusively locks the rows for the update. Does it happen before it finds the records, or after it finds the records and evaluates the WHERE clause?
First, there are three lock contexts:
Database level lock
Table level lock
Row level lock
Then you have four lock modes:
IX
IS
X
S
IX and IS locks are "intention" locks. These locks are taken before acquiring other types of locks. X locks are exclusive (write) locks and S locks are shared (read) locks.
Locks (IX, IS, X or S) can be taken at any context level. An X lock at the database level will block all other operations in the database, for example. This is the type of lock that SQLite takes: an S lock is taken on the entire database during reads, and an X lock on the entire database during writes. Writes wait for any S locks to complete and block new S and X locks until the write lock is released. This provides the serializable transaction isolation level.
For MySQL, the locking depends on the storage engine. MyISAM takes X and S locks on entire (sets of) tables. X locks wait on existing S or X locks and block new locks. New X locks are given higher priority in the queue, moved ahead of new S locks. This behavior can be changed by setting LOW_PRIORITY_UPDATES, which could result in write starvation because writes are de-prioritized in favor of reads.
It is possible in MySQL to take a global read lock on the entire server using FLUSH TABLES WITH READ LOCK, which blocks all writes until it is released.
InnoDB locks rows as they are encountered via an index read: it locks the index records as they are traversed. InnoDB uses special 'gap' locks to ensure the REPEATABLE READ transaction isolation level. Locks are held on index entries, so if a table is not well indexed for an UPDATE query, many rows will be locked. Note that InnoDB does not take S locks for normal SELECT queries; it uses row versioning, not row-level locking, for consistent snapshots.
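If you do want locking reads in InnoDB, you have to request them explicitly. A sketch in MySQL syntax, using an illustrative table T:
-- A plain SELECT reads a consistent snapshot and takes no row locks;
-- locking reads must be asked for:
SELECT c FROM T WHERE id = 1 LOCK IN SHARE MODE; -- S lock (FOR SHARE in MySQL 8.0)
SELECT c FROM T WHERE id = 1 FOR UPDATE;         -- X lock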
When acquiring X locks, the database needs to detect deadlocks. Consider the following:
>connection 1
start transaction;
update T set c = c + 1 order by id asc;
>connection 2
start transaction;
update T set c = c - 1 order by id desc;
In a row-locking model, these two statements cannot both complete successfully: the first would wait forever to acquire locks the second holds, and vice versa. The database will pick one of the connections to roll back. InnoDB will pick the connection that has made the fewest changes. MyISAM will simply lock the whole table for whichever connection acquires the lock first, and the second will run after the first completes.
The simple example given by you will be resolved by X locks at any context (database, table or row). If two connections begin at exactly the same time, both running updates that try to update the same row, both will attempt to acquire an X lock. Only one connection can acquire the X lock; it is not possible to determine exactly which one will get it. The other connection will have to wait until the lock is released before it can acquire the X lock. Keep in mind that if the row was locked by a DELETE or UPDATE, the waiter might end up not acquiring a lock after waiting, because there is nothing left in the database to lock.
In your example, the first UPDATE will acquire the X lock; the second UPDATE will then wait on the X lock and will eventually execute, but match no rows.
An exclusive lock, used for data-modification operations such as INSERT, UPDATE, or DELETE, will be used in this scenario.
An exclusive lock ensures that multiple updates cannot be made to the same resource at the same time.
You will not get a race condition in this scenario.
If you have a more complex scenario involving multiple tables then you may get race conditions, or deadlocks. There are many ways to avoid this, simplifying and separating queries, etc.
You can also apply hints to queries that tell SQL what type of lock to use.
http://msdn.microsoft.com/en-us/library/aa213026(v=sql.80).aspx
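For example, a sketch of such a hint in SQL Server, using the Items table from the question:
-- UPDLOCK takes update locks while reading, so no other session can
-- acquire a conflicting lock on these rows until the transaction ends;
-- ROWLOCK asks the engine to stay at row granularity:
SELECT * FROM Items WITH (UPDLOCK, ROWLOCK) WHERE Flag = 'A';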
Sounds like you should read about locking. SQL Server has a complex set of logic and will take either table- or row-level locks based on the number of rows it estimates will require updates. Unless you specifically tell it which you want, this can even vary from query to query. Usually, if you are modifying a small subset of the table, it will choose a row-level lock.
SQL Server is designed with ACID in mind: it writes changes to its logs before performing any actual updates to the data. This allows any failed updates to be rolled back and provides consistency between queries (like the one you're asking about). You can perform dirty reads to get around locking issues; however, you cannot prevent SQL Server from locking inserted, updated and/or deleted records.
SQL Server Locking
EDIT: Here is an article about ACID.
ACID - Wikipedia
Pretty much all SQL databases guarantee that such a collision will not occur. When locking happens depends on whether locking is done at the table, partition, page, or row level - or whether you have turned such locking off in your database.
What can happen, if you have concurrent update statements and multiple rows being updated, is that some rows are updated by the first statement and some by the second.
In general, I think of the WHERE clause as being evaluated to select the row set; the rows are then locked one at a time, updated, and unlocked. However, this depends on the type of locking. In this case, the scenario above would continue with the values flipping.
If you are concerned about this situation, use table-level locking to force serialization when concurrent update requests are being processed.
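A sketch of forcing that serialization with a table-level hint (SQL Server syntax, using the Items table from the question):
BEGIN TRAN;
-- TABLOCKX takes an exclusive table lock, so concurrent update
-- requests queue up and run one at a time:
UPDATE Items WITH (TABLOCKX) SET Flag = 'B' WHERE Flag = 'A';
COMMIT TRAN;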

Difference between "read committed" and "repeatable read" in SQL Server

I think the above isolation levels seem very alike. Could someone describe, with some nice examples, what the main difference is?
Read committed is an isolation level that guarantees that any data read was committed at the moment it is read. It simply restricts the reader from seeing any intermediate, uncommitted, 'dirty' reads. It makes no promise whatsoever that if the transaction re-issues the read it will find the same data; data is free to change after it was read.
Repeatable read is a higher isolation level that, in addition to the guarantees of the read committed level, also guarantees that any data read cannot change: if the transaction reads the same data again, it will find the previously read data in place, unchanged, and available to read.
The next isolation level, serializable, makes an even stronger guarantee: in addition to everything repeatable read guarantees, it also guarantees that no new data can be seen by a subsequent read.
Say you have a table T with a column C containing a single row, say with the value '1'. And consider that you have a simple task like the following:
BEGIN TRANSACTION;
SELECT * FROM T;
WAITFOR DELAY '00:01:00'
SELECT * FROM T;
COMMIT;
That is a simple task that issues two reads from table T, with a delay of one minute between them.
Under READ COMMITTED, the second SELECT may return any data. A concurrent transaction may update the record, delete it, or insert new records. The second SELECT will always see the new data.
Under REPEATABLE READ, the second SELECT is guaranteed to return at least the rows that were returned by the first SELECT, unchanged. New rows may be added by a concurrent transaction in that one minute, but the existing rows cannot be deleted or changed.
Under SERIALIZABLE, the second SELECT is guaranteed to see exactly the same rows as the first. No row can be changed or deleted, and no new rows can be inserted by a concurrent transaction.
If you follow the logic above, you quickly realize that SERIALIZABLE transactions, while they may make life easier for you, block every possible concurrent operation, since they require that nobody modify, delete, or insert any row. The default transaction isolation level of the .NET System.Transactions scope is Serializable, and this usually explains the abysmal performance that results.
And finally, there is also the SNAPSHOT isolation level. The SNAPSHOT isolation level makes the same guarantees as SERIALIZABLE, but not by requiring that no concurrent transaction modify the data. Instead, it forces every reader to see its own version of the world (its own 'snapshot'). This makes it very easy to program against, as well as very scalable, because it does not block concurrent updates. However, that benefit comes at a price: extra server resource consumption.
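For reference, enabling and using SNAPSHOT in SQL Server looks roughly like this (the database name MyDb is illustrative):
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
SELECT * FROM T; -- reads row versions from the version store; blocks no writers
COMMIT;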
Supplemental reads:
Isolation Levels in the Database Engine
Concurrency Effects
Choosing Row Versioning-based Isolation Levels
Repeatable Read
The state of the database is maintained from the start of the transaction. If you retrieve a value in session1, then update that value in session2, retrieving it again in session1 will return the same result. Reads are repeatable.
session1> BEGIN;
session1> SELECT firstname FROM names WHERE id = 7;
Aaron
session2> BEGIN;
session2> SELECT firstname FROM names WHERE id = 7;
Aaron
session2> UPDATE names SET firstname = 'Bob' WHERE id = 7;
session2> SELECT firstname FROM names WHERE id = 7;
Bob
session2> COMMIT;
session1> SELECT firstname FROM names WHERE id = 7;
Aaron
Read Committed
Within the context of a transaction, you will always retrieve the most recently committed value. If you retrieve a value in session1, update it in session2, then retrieve it in session1 again, you will get the value as modified in session2. It reads the last committed row.
session1> BEGIN;
session1> SELECT firstname FROM names WHERE id = 7;
Aaron
session2> BEGIN;
session2> SELECT firstname FROM names WHERE id = 7;
Aaron
session2> UPDATE names SET firstname = 'Bob' WHERE id = 7;
session2> SELECT firstname FROM names WHERE id = 7;
Bob
session2> COMMIT;
session1> SELECT firstname FROM names WHERE id = 7;
Bob
Makes sense?
Put simply, according to my reading and understanding of this thread and @remus-rusanu's answer, it comes down to this simple scenario:
There are two transactions, A and B.
Transaction B is reading table X.
Transaction A is writing to table X.
Transaction B is reading table X again.
Read Uncommitted: Transaction B can read uncommitted data from Transaction A, and may see different rows as A writes. No locks at all.
Read Committed: Transaction B can read ONLY committed data from Transaction A, and may see different rows based only on A's committed writes. Could we call it a simple lock?
Repeatable Read: Transaction B will read the same data (rows) whatever Transaction A is doing, but Transaction A can change other rows. Row-level block.
Serializable: Transaction B will read the same rows as before, and Transaction A cannot read or write in the table. Table-level block.
Snapshot: every transaction has its own copy of the data and works on it. Each one has its own view.
This is an old question with an accepted answer already, but I like to think of these two isolation levels in terms of how they change locking behavior in SQL Server. This might be helpful for those who are debugging deadlocks, as I was.
READ COMMITTED (default)
Shared locks are taken in the SELECT and then released when the SELECT statement completes. This is how the system can guarantee that there are no dirty reads of uncommitted data. Other transactions can still change the underlying rows after your SELECT completes and before your transaction completes.
REPEATABLE READ
Shared locks are taken in the SELECT and then released only after the transaction completes. This is how the system can guarantee that the values you read will not change during the transaction (because they remain locked until the transaction finishes).
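A sketch of that difference in practice, using the names table from the examples above:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRAN;
SELECT firstname FROM names WHERE id = 7;
-- The shared lock on this row is now held until COMMIT, so a concurrent
-- UPDATE names SET firstname = 'Bob' WHERE id = 7 would block here.
COMMIT;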
Trying to explain this doubt with simple diagrams.
Read Committed: at this isolation level, Transaction T1 will read the updated value of X committed by Transaction T2.
Repeatable Read: at this isolation level, Transaction T1 will not see the changes committed by Transaction T2.
I think this picture can also be useful; it helps me as a reference when I want to quickly remember the differences between the isolation levels (thanks to kudvenkat on YouTube).
Please note that the "repeatable" in repeatable read applies to a tuple, not to the entire table. Under the ANSI isolation levels, the phantom read anomaly can still occur: reading a table with the same WHERE clause twice may return different result sets. Literally speaking, the read is not repeatable.
My observation on the initially accepted solution:
Under RR (the MySQL default), if a transaction is open and a SELECT has been fired, another transaction can NOT delete any row belonging to the previous read's result set until the previous transaction is committed (in fact, the DELETE statement in the new transaction will just hang); however, the next transaction can delete all rows from the table without any trouble. By the way, a subsequent read in the previous transaction will still see the old data until it commits.