SQL Server Race Condition Question - sql

(Note: this is for MS SQL Server)
Say you have a table ABC with a primary key identity column, and a CODE column. We want every row in here to have a unique, sequentially-generated code (based on some typical check-digit formula).
Say you have another table DEF with only one row, which stores the next available CODE (imagine a simple autonumber).
I know logic like below would present a race condition, in which two users could end up with the same CODE:
1) Run a select query to grab next available code from DEF
2) Insert said code into table ABC
3) Increment the value in DEF so it's not re-used.
I know that, two users could get stuck at Step 1), and could end up with same CODE in the ABC table.
What is the best way to deal with this situation? I thought I could just wrap a "begin tran" / "commit tran" around this logic, but I don't think that worked. I had a stored procedure like this to test, but I didn't avoid the race condition when I ran from two different windows in MS:
begin tran
declare #x int
select #x= nextcode FROM def
waitfor delay '00:00:15'
update def set nextcode = nextcode + 1
select #x
commit tran
Can someone shed some light on this? I thought the transaction would prevent another user from being able to access my NextCodeTable until the first transaction completed, but I guess my understanding of transactions is flawed.
EDIT: I tried moving the wait to after the "update" statement, and I got two different codes... but I suspected that. I have the waitfor statement there to simulate a delay so the race condition can be easily seen. I think the key problem is my incorrect perception of how transactions work.

Set the Transaction Isolation Level to Serializable.
At lower isolation levels, other transactions can read the data in a row that is read, (but not yet modified) in this transaction. So two transactions can indeed read the same value. At very low isolation (Read Uncommitted) other transactions can even read data after it's been modified (but before committed)...
Review details about SQL Server Isolation Levels here
So bottom line is that the Isolation level is crtitical piece here to control what level of access other transactions get into this one.
NOTE. From the link, about Serializable
Statements cannot read data that has been modified but not yet committed by other transactions.
This is because the locks are placed when the row is modified, not when the Begin Trans occurs, So what you have done may still allow another transaction to read the old value until the point where you modify it. So I would change the logic to modify it in the same statement as you read it, thereby putting the lock on it at the same time.
begin tran
declare #x int
update def set #x= nextcode, nextcode += 1
waitfor delay '00:00:15'
select #x
commit tran

As other responders have mentioned, you can set the transaction isolation level to ensure that anything you 'read' using a SELECT statement cannot change within a transaction.
Alternatively, you could take out a lock specifically on the DEF table by adding the syntax WITH HOLDLOCK after the table name, e.g.,
SELECT nextcode FROM DEF WITH HOLDLOCK
It doesn't make much difference here, as your transaction is small, but it can be useful to take out locks for some SELECTs and not others within a transaction. It's a question of 'repeatability versus concurrency'.
A couple of relavant MS-SQL docs.
Isolation levels
Table hints

Late answer. You want to avoid a race condition...
"SQL Server Process Queue Race Condition"

Recap:
You began a transaction. This doesn't actually "do" anything in and of itself, it modifies subsequent behavior
You read data from a table. The default isolation level is Read Committed, so this select statement is not made part of the transaction.
You then wait 15 seconds
You then issue an update. With the declared transaction, this will generate a lock until the transaction is committed.
You then commit the transaction, releasing the lock.
So, guessing you ran this simultaneously in two windows (A and B):
A read the "next" value from table def, then went into wait mode
B read the same "next" value from the table, then went into wait mode. (Since A only did a read, the transaction did not lock anything.)
A then updated the table, and probably commited the change before B exited the wait state.
B then updated the table, after A's write was committed.
Try putting the wait statement after the update, before the commit, and see what happens.

It's not a real race condition. It's more a common problem with concurrent transactions. One solution is to set a read lock on the table and therefor have a serialization in place.

This is actually a common problem in SQL databases and that is why most (all?) of them have some built in features to take care of this issue of obtaining a unique identifier. Here are some things to look into if you are using Mysql or Postgres. If you are using a different database I bet the provide something very similar.
A good example of this is postgres sequences which you can check out here:
Postgres Sequences
Mysql uses something called auto increments.
Mysql auto increment

You can set the column to a computed value that is persisted. This will take care of the race condition.
Persisted Computed Columns
NOTE
Using this method means you do not need to store the next code in a table. The code column becomes the reference point.
Implementation
Give the column the following properties under computed column specification.
Formula = dbo.GetNextCode()
Is Persisted = Yes
Create Function dbo.GetNextCode()
Returns VarChar(10)
As
Begin
Declare #Return VarChar(10);
Declare #MaxId Int
Select #MaxId = Max(Id)
From Table
Select #Return = Code
From Table
Where Id = #MaxId;
/* Generate New Code ... */
Return #Return;
End

Related

How to implement Serializable Isolation Level in SQL Server

I need to implement a serializable isolation level in SQL Server but I've tried many ways and I don't get it.
I need to lock 1 row in one transaction (It doesn´t matter if lock the complete table). So, another transaction can´t even select the row (don´t read).
The last thing I tried:
For transaction 1:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT code FROM table1 WHERE code = 1
-- Here I select in another instance the same row
COMMIT TRAN
For transaction 2:
BEGIN TRAN
SELECT code FROM table1 WHERE code = 1
COMMIT TRAN
I would expect that transaction 2 wait until transaction 1 commit the operation, but the transaction 2 gives me the row.
Anyone can explain me if I miss something?
SQL Server conforms to the strict definition of a Serializable query. That is, there must be a result that can logically be generated IF both queries ran in serial order - Transaction 1 finishing before Transaction 2 can start, or vice versa.
This results in some effects that can be different than you would expect. There is a great explanation of the Serializable isolation level over at SQLPerformance.com that makes clear some of what this logical serializability ends up meaning. (Very helpful site, that one.)
For your above queries, there is no logical requirement to prevent the second query from reading the same row as the first query. No matter in what order the queries are run, they will both return the same data without modifying it. Since the Query Analyzer can identify this, there is no reason to place a read lock on the data. However, if one of the queries performed an update on the data, then (warning - logic assumption here, since I don't actually know the internals of how SQL Server handles this) the QA would set a stronger lock on the selected rows.
TL;DR - SQL Server wants to minimize blocking, so it uses logical analysis to see what types of locks are needed for a serializable isolation level, and it (tries to) use the minimum number and strength of locks needed to achieve its goal.
Now that we've dealt with that - there are only two ways that I can think of to lock a row so that no one else can read it: using XLOCK + TABLOCK (locking the whole table - not a recommended practice) or having some form of a field on each row that is updated when you start your process - something like an SPID field, or a bit flag for Locked. When you update it within your transaction, only SELECTs with NOLOCK hints will be able to read it.
Clearly, neither of these are optimal. I recommend the "This row is busy - go away" flag, as that's probably the approach I would take for an (almost) absolute lock on a row.
According to the documentation:
SERIALIZABLE Specifies the following:
Statements cannot read data that has been modified but not yet committed by other transactions.
No other transactions can modify data that has been read by the current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the
current transaction until the current transaction completes.
If you're not making any changes to data with an INSERT, UPDATE, or DELETE inside transaction 1, SQL will release the Shared Lock after the read completes.
What you might want to try is adding a table hit to prevent the row lock from being released until the end of transaction 1.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT code
FROM table1 WITH(ROWLOCK, HOLDLOCK)
WHERE code = 1
COMMIT TRAN
Maybe you can solve this with some hack like this?
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
UPDATE someTableForThisHack set val = CASE WHEN val = 1 THEN 0 else 1 End
SELECT code from table1.....
COMMIT TRANSACTION
So you create a table someTableForThisHack and insert one row to it.

Use T-SQL Transaction for batch of delete statements?

I have a stored procedure that deletes records from multiple tables.
I wish for either all of the delete statements to complete successfully, or none. The actual purpose here is to wipe all data related to a particular user.
Note that none of this data is related in any way to any other data. E.g. a user's data is not referenced in any way by another users data. However it is possible to have concurrent client sources accessing one user's data simultaneously. I don't know if this is relevant
So I've wrapped it in BEGIN TRANSACTION ... COMMIT TRANSACTION
like so:
CREATE PROCEDURE [dbo].[spDeleteData]
#MyID AS INT
AS
BEGIN TRANSACTION
DELETE FROM [Table1] WHERE myId = #MyID;
DELETE FROM [Table2] WHERE myId = #MyID;
....
COMMIT TRANSACTION
RETURN 0
My question here is what are the implications of wrapping multiple DELETE calls in a transaction? Will it create possible deadlock scenarios, or hurt performance in some way?
From what I am reading, using TRANSACTION ISOLATION LEVEL only applies to read operations, is this true?
What you are guaranteeing is that either all the rows that match the conditions in both tables are successfully deleted or none of the rows are deleted (i.e. if there is a problem the deletes are rolled back.) There are more locks and they are kept for a longer period but if it fails you don't have to manually recreate the rows the deletes are undone for you automatically. You probably want to add the statement:
set xact_abort on
at the beginning of the transaction and to wrap the whole thing in a begin try/begin catch statement.
Please see sommarskog.se/error-handling-I.html#XACT_ABORT for an execellent discussion on this statement and on error handling for TSQL.

What is wrong with the below tsql code

BEGIN TRAN
SELECT * FROM AnySchema.AnyTable
WHERE AnyColumn = SomeCondition
COMMIT
I know the transaction is not required here because it is just a select but just want to know how bad a programming it is and whether it is going to be an overhead on the DB engine.
You may use transaction on SELECT statements to ensure nobody else could update / delete records of the table of while the bunch of your select queries are executing.
Using WITH(NOLOCK):
Anyways, you may also use WITH(NOLOCK) for t_sql
SELECT * FROM AnySchema.AnyTable WITH(NOLOCK)
WHERE AnyColumn = SomeCondition
WITH (NOLOCK) is the equivalent of using READ UNCOMMITED as a transaction isolation level. Here stand the risk of reading an uncommitted row that is subsequently rolled back, i.e. data that never made it into the database.
So, while it can prevent reads being deadlocked by other operations, it comes with a risk.
TRANSACTION Block :
Using TRANSACTION block will not cause much of extra DB overload but if you keep the same type practice on, and suppose , at any SQL block you forget (you / your developers may forget, right ?) to close the transaction, then other processes can't work on the same table.
Anyways, it depends on what type of application you are using. If very frequent update and select things are there , it is advised not to use such transaction blocks. If medium level of updates and select are there, occurs next to each other, you may use transaction blocks for select (but ensure to close the transaction).
That's actually a good question. To understand what's going on, you need to know about SET IMPLICIT_TRANSACTIONS.
When ON, the system is in implicit transaction mode. This means that if ##TRANCOUNT = 0, any of the following Transact-SQL statements begins a new transaction. It is equivalent to an unseen BEGIN TRANSACTION being executed first:
When OFF, each of the preceding T-SQL statements is bounded by an unseen BEGIN TRANSACTION and an unseen COMMIT TRANSACTION statement. When OFF, we say the transaction mode is autocommit. [this is the default]
If your T-SQL code visibly issues a BEGIN TRANSACTION, we say the transaction mode is explicit.
https://msdn.microsoft.com/en-us/library/ms187807.aspx
Since the SQL Server would have created a transaction for you, manually doing doesn't actually change anything. The exact same thing would have happened either way.
Summary: What you are doing isn't 'wrong' because it has no effect, but unnecessary and confusing to the reader.
I think you would be taking a holdlock and most likely a tablock
table hints
That is not always a good thing as you would block any update or deletes (maybe even inserts)
It would be better to let SQL decide what level of of locks to take. Most likely pagelocks. I would stay away from nolock as bad stuff can happen.
On a select on a single table just let the optimizer do it's thing.

Does a SQL UPDATE operation read data to "local memory"?

This answer quotes this Technet article which explains the two interpretations of lost updates:
A lost update can be interpreted in one of two ways. In the first scenario, a lost update is considered to have taken place when data that has been updated by one transaction is overwritten by another transaction, before the first transaction is either committed or rolled back. This type of lost update cannot occur in SQL Server 2005 because it is not allowed under any transaction isolation level.
The other interpretation of a lost update is when one transaction (Transaction #1) reads data into its local memory, and then another transaction (Transaction #2) changes this data and commits its change. After this, Transaction #1 updates the same data based on what it read into memory before Transaction #2 was executed. In this case, the update performed by Transaction #2 can be considered a lost update.
So it looks like the difference is that in the first scenario the whole update happens out of "local memory" while in the second one there's "local memory" used and this makes a difference.
Suppose I have the following code:
UPDATE MagicTable SET MagicColumn = MagicColumn + 10 WHERE SomeCondition
Does this involve "local memory"? Is it prone to the first or to the second interpretation of lost updates?
I suppose it would come under the second interpretation.
However the way this type of UPDATE is implemented in SQL Server a lost update is still not possible. Rows read for the update are protected with a U lock (converted to X lock when the row is actually updated).
U locks are not compatible with other U locks (or X locks)
So at all isolation levels if two concurrent transactions were to run this statement then one of them would end up blocked behind the other transaction's U lock or X lock and would not be able to proceed until that transaction completes.
Therefore it is not possible for lost updates to occur with this pattern in SQL Server at any isolation level.
To achieve a lost update you would need to do something like
BEGIN TRAN
DECLARE #MagicColumn INT;
/*Two concurrent transactions can both read the same pre-update value*/
SELECT #MagicColumn = MagicColumn FROM MagicTable WHERE SomeCondition
UPDATE MagicTable SET MagicColumn = #MagicColumn + 10 WHERE SomeCondition
COMMIT

READ COMMITTED database isolation level in oracle

I'm working on a web app connected to oracle. We have a table in oracle with a column "activated". Only one row can have this column set to 1 at any one time. To enforce this, we have been using SERIALIZED isolation level in Java, however we are running into the "cannot serialize transaction" error, and cannot work out why.
We were wondering if an isolation level of READ COMMITTED would do the job. So my question is this:
If we have a transaction which involves the following SQL:
SELECT *
FROM MODEL;
UPDATE MODEL
SET ACTIVATED = 0;
UPDATE MODEL
SET ACTIVATED = 1
WHERE RISK_MODEL_ID = ?;
COMMIT;
Given that it is possible for more than one of these transactions to be executing at the same time, would it be possible for more than one MODEL row to have the activated flag set to 1 ?
Any help would be appreciated.
your solution should work: your first update will lock the whole table. If another transaction is not finished, the update will wait. Your second update will guarantee that only one row will have the value 1 because you are locking the table (it doesn't prevent INSERT statements however).
You should also make sure that the row with the RISK_MODEL_ID exists (or you will have zero row with the value '1' at the end of your transaction).
To prevent concurrent INSERT statements, you would LOCK the table (in EXCLUSIVE MODE).
You could consider using a unique, function based index to let Oracle handle the constraint of only having a one row with activated flag set to 1.
CREATE UNIQUE INDEX MODEL_IX ON MODEL ( DECODE(ACTIVATED, 1, 1, NULL));
This would stop more than one row having the flag set to 1, but does not mean that there is always one row with the flag set to 1.
If what you want is to ensure that only one transaction can run at a time then you can use the FOR UPDATE syntax. As you have a single row which needs locking this is a very efficient approach.
declare
cursor c is
select activated
from model
where activated = 1
for update of activated;
r c%rowtype;
begin
open c;
-- this statement will fail if another transaction is running
fetch c in r;
....
update model
set activated = 0
where current of c;
update model
set activated = 1
where risk_model_id = ?;
close c;
commit;
end;
/
The commit frees the lock.
The default behaviour is to wait until the row is freed. Otherwise we can specify NOWAIT, in which case any other session attempting to update the current active row will fail immediately, or we can add a WAIT option with a polling time. NOWAIT is the option to choose to absolutely avoid the risk of hanging, and it also gives us the chance to inform the user that someone else is updating the table, which they might want to know.
This approach is much more scalable than updating all the rows in the table. Use a function-based index as WW showed to enforce the rule that only one row can have ACTIVATED=1.