I have two tables:
TableA with columns id (UNIQUEIDENTIFIER) and name (NVARCHAR); the id column uses NEWSEQUENTIALID() to auto-generate its values.
TableB with columns id (IDENTITY) and parentId (UNIQUEIDENTIFIER).
parentId in TableB has a foreign key constraint on TableA id.
I'm trying to execute the following queries:
In session 1:
BEGIN TRAN test1
INSERT INTO dbo.TableA( name )
OUTPUT INSERTED.id
VALUES ('foo')
Note that I do not want to commit the transaction here yet.
In session 2:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRANSACTION test2
INSERT INTO dbo.TableB(parentId)
VALUES('<use_id_from_session1_here>')
The second insert never completes: it hangs, both in SQL Server Management Studio and in my application code.
Shouldn't setting the isolation level to READ UNCOMMITTED allow the insert in the second transaction to read what was uncommitted in the first?
Am I missing something here, or is the way I'm using/setting the transaction isolation level incorrect?
Any suggestions would be appreciated.
What's happening is that when you do the second insert, SQL Server checks the foreign key to make sure you're not inserting something you can't. But since the lock on TableA is still held by your first transaction, your second transaction has to wait. READ UNCOMMITTED doesn't change that: constraints have to be checked before an insert can take place, so the insert waits for the first transaction to finish. And even if the lock weren't there, you'd still be violating the foreign key constraint, because the parent row from the first transaction hasn't been committed yet.
You could commit each insert in session 1 individually, or commit in batches small enough that session 2 never has to wait too long.
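A minimal sketch of the batching idea, using Python's standard-library sqlite3 module as a stand-in for SQL Server (SQLite's locking is coarser, so only the commit-every-N pattern carries over). The table_a name and the helper insert_in_batches are hypothetical; each commit ends a transaction and releases its locks, which is what lets a waiting session proceed.

```python
import sqlite3


def insert_in_batches(conn, names, batch_size):
    """Insert names into table_a, committing after every batch_size rows.

    Returns the number of commits issued, i.e. how many times the
    session released its locks to anyone waiting on them.
    """
    commits = 0
    pending = 0
    for name in names:
        conn.execute("INSERT INTO table_a (name) VALUES (?)", (name,))
        pending += 1
        if pending == batch_size:
            conn.commit()  # end this batch's transaction, releasing its locks
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # flush the final, partial batch
        commits += 1
    return commits


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT)")
    print(insert_in_batches(conn, [f"foo{i}" for i in range(10)], batch_size=4))  # 3
```

Smaller batches mean shorter waits for session 2, at the cost of more commits.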
I have the following concurrency use-case: An endpoint can be called at any time and an operation is supposed to happen. The operation goes like this in pseudocode (current isolation level is READ COMMITTED):
SELECT * FROM TABLE_A WHERE IS_LATEST=true FOR UPDATE
// DO SOME APP LOGIC TO TEST VALIDITY
// ALL GOES WELL => INSERT OR UPDATE NEW ROW WITH IS_LATEST=TRUE => COMMIT
// OTHERWISE => ROLLBACK (all good not interesting)
This SELECT FOR UPDATE approach works fine when two of these operations start at the same time and only update rows: both transactions see the same number of rows, one of them updates the rows, and the second waits its turn before it can SELECT FOR UPDATE, so the state stays valid.
The problem is when the first transaction inserts a row. For example, when the first transaction takes the lock with SELECT FOR UPDATE, two rows match. While it is still running, the second transaction issues its own SELECT FOR UPDATE (for the latest rows) and waits for the first transaction to finish. The first transaction commits, and there is now a third matching row in the database, but the second transaction only picks up the two rows it was waiting on. (That is because when its SELECT FOR UPDATE started, the snapshot contained only two rows matching IS_LATEST=true.)
Is there a way to make this transaction such that the SELECT lock picks up the latest snapshot after waiting?
The issue is that each command only sees rows that have been committed before the query started. There are various possible solutions ...
Stricter isolation level
You can solve this with a stricter isolation level, but that's relatively expensive.
Laurenz already provided a solution for this.
Just start a new command
Keep the (cheap) default isolation level READ COMMITTED, and just start a new command.
Only few rows to lock
While only locking a handful of rows, the dead simple solution is to repeat the same SELECT ... FOR UPDATE. The second iteration sees newly committed rows and locks them additionally.
There is a theoretical race condition with additional transactions that might lock new rows before the waiting transaction does. That would result in a deadlock. Highly unlikely, but to be absolutely sure, lock rows in consistent order:
BEGIN; -- default READ COMMITTED
SELECT FROM table_a WHERE is_latest ORDER BY id FOR UPDATE; -- consistent order
SELECT * FROM table_a WHERE is_latest ORDER BY id FOR UPDATE; -- just repeat !!
-- DO SOME APP LOGIC TO TEST VALIDITY
-- pseudo-code
IF all_good
UPDATE table_a SET is_latest = true WHERE ...;
INSERT INTO table_a (IS_LATEST, ...) VALUES (true, ...);
COMMIT;
ELSE
ROLLBACK;
END;
A partial index on (id) WHERE is_latest would be ideal.
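Why consistent order helps can be shown with a small in-process model. This is not PostgreSQL: it is a Python threading sketch in which each hypothetical row id has its own Lock, and lock_rows plays the role of ORDER BY id FOR UPDATE by always acquiring in ascending id order. Two threads that ask for the same rows in opposite orders then cannot each hold what the other needs.

```python
import threading

# Hypothetical stand-in for table_a's row locks: one Lock per row id.
row_locks = {row_id: threading.Lock() for row_id in range(1, 6)}


def lock_rows(ids, timeout=5.0):
    """Acquire the locks for `ids` in ascending id order (like ORDER BY id FOR UPDATE).

    Because every caller takes locks in the same global order, no caller can
    hold a lock another caller needs while waiting for one that caller holds.
    On timeout, already-acquired locks are released and TimeoutError is raised.
    """
    acquired = []
    for row_id in sorted(set(ids)):
        if not row_locks[row_id].acquire(timeout=timeout):
            unlock_rows(acquired)
            raise TimeoutError(f"could not lock row {row_id}")
        acquired.append(row_id)
    return acquired


def unlock_rows(ids):
    """Release locks in reverse acquisition order (like COMMIT or ROLLBACK)."""
    for row_id in reversed(ids):
        row_locks[row_id].release()


def locked_task(ids, results):
    """Lock the requested rows, record what was held, then release."""
    held = lock_rows(ids)
    try:
        results.append(tuple(held))  # application logic would run here
    finally:
        unlock_rows(held)


if __name__ == "__main__":
    results = []

    def run(ids):
        for _ in range(200):
            locked_task(ids, results)

    # The two threads request the same rows in opposite orders.
    threads = [threading.Thread(target=run, args=(order,)) for order in ([5, 3, 1], [1, 3, 5])]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(results))  # 400
```

Without the sorted() call, the same two threads could each grab one lock and wait forever for the other.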
More rows to lock
For more than a handful of rows, I would instead create a dedicated one-row token table. A bullet-proof implementation could look like this, run as admin or superuser:
CREATE TABLE public.single_task_x (just_me bool CHECK (just_me) PRIMARY KEY DEFAULT true);
INSERT INTO public.single_task_x VALUES (true);
REVOKE ALL ON public.single_task_x FROM public;
GRANT SELECT, UPDATE ON public.single_task_x TO public; -- or just to those who need it
See:
How to allow only one row for a table?
Then:
BEGIN; -- default READ COMMITTED
SELECT FROM public.single_task_x FOR UPDATE;
SELECT * FROM table_a WHERE is_latest; -- FOR UPDATE? ①
-- DO SOME APP LOGIC TO TEST VALIDITY
-- pseudo-code
IF all_good
UPDATE table_a SET is_latest = true WHERE ...;
INSERT INTO table_a (IS_LATEST, ...) VALUES (true, ...);
COMMIT;
ELSE
ROLLBACK;
END;
A single lock is cheaper.
① You may or may not want to lock additionally, to defend against other writes, possibly with a weaker lock ....
Either way, all locks are released at the end of the transaction automatically.
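The token-table idea reduces to "every task serializes on the same single lock". Here is a rough Python model of that, not PostgreSQL code: single_task_x is a threading.Lock standing in for the one-row table, table_a is a hypothetical in-memory list, and the validity check is a placeholder. Holding the token around the read-check-write sequence means a new latest row is always visible to the next task.

```python
import threading

# Hypothetical in-memory model of table_a: one dict per row.
table_a = [{"id": 1, "is_latest": True}]

# Stand-in for the one-row token table: every task locks this same object.
single_task_x = threading.Lock()


def replace_latest(payload):
    """Demote the current latest row and insert a new latest row.

    All work happens while holding the single token lock, mirroring
    SELECT FROM single_task_x FOR UPDATE at the start of the transaction.
    Leaving the with-block releases the lock, as COMMIT or ROLLBACK would.
    """
    with single_task_x:
        latest = [r for r in table_a if r["is_latest"]]  # SELECT ... WHERE is_latest
        if len(latest) > 1:  # placeholder for "app logic to test validity"
            raise RuntimeError("more than one latest row")
        for row in latest:
            row["is_latest"] = False  # demote the previous latest row
        new_id = max(r["id"] for r in table_a) + 1
        table_a.append({"id": new_id, "is_latest": True, "payload": payload})
        return new_id


if __name__ == "__main__":
    workers = [threading.Thread(target=replace_latest, args=(n,)) for n in range(20)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(sum(r["is_latest"] for r in table_a), len(table_a))  # 1 21
```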
Advisory lock
Or use an advisory lock. pg_advisory_xact_lock() persists for the duration of the transaction:
BEGIN; -- default READ COMMITTED
SELECT pg_advisory_xact_lock(123);
SELECT * FROM table_a WHERE is_latest;
-- do stuff
COMMIT; -- or ROLLBACK;
Make sure to use a unique token for your particular task. 123 in my example. Consider a look-up table if you have many different tasks.
To release the lock at a different point in time (not when the transaction ends), consider a session-level lock with pg_advisory_lock(). Then you can (and must) unlock manually with pg_advisory_unlock() - or close the session.
Both of these wait for the locked resource. There are alternative functions that return false instead of waiting, such as pg_try_advisory_xact_lock() and pg_try_advisory_lock().
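To illustrate the blocking versus "try" semantics, here is a small Python model of keyed advisory locks. It is not the PostgreSQL API: the function names only echo pg_advisory_lock(), pg_try_advisory_lock(), pg_advisory_unlock() and pg_advisory_xact_lock(), and, unlike the real session-level locks, this model is neither reentrant nor tied to a session.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in for PostgreSQL advisory locks: one Lock per integer key.
_registry_guard = threading.Lock()
_advisory_locks = {}


def _lock_for(key):
    """Look up (or create) the lock object for `key`."""
    with _registry_guard:
        return _advisory_locks.setdefault(key, threading.Lock())


def advisory_lock(key):
    """Block until `key` is free, then hold it (cf. pg_advisory_lock)."""
    _lock_for(key).acquire()


def try_advisory_lock(key):
    """Return True if acquired, or False at once if held (cf. pg_try_advisory_lock)."""
    return _lock_for(key).acquire(blocking=False)


def advisory_unlock(key):
    """Release `key` (cf. pg_advisory_unlock)."""
    _lock_for(key).release()


@contextmanager
def advisory_xact_lock(key):
    """Hold `key` for the duration of a block (cf. pg_advisory_xact_lock)."""
    advisory_lock(key)
    try:
        yield
    finally:
        advisory_unlock(key)


if __name__ == "__main__":
    with advisory_xact_lock(123):  # 123 = the task's token, as in the answer
        print(try_advisory_lock(123))  # False: already held
    print(try_advisory_lock(123))  # True: released when the block ended
    advisory_unlock(123)
```

The context manager mirrors the transaction-scoped variant: the lock goes away when the block ends, whether it finishes normally or raises.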
With your method, the query in the second transaction will return an empty result after the lock is gone, because it sees is_latest = FALSE on the row in question, and the new row is not yet visible. So you would have to retry the transaction in that case.
I suggest that you use REPEATABLE READ isolation level and optimistic locking instead:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM table_a WHERE is_latest; -- no locks!
/* perform your application ruminations */
UPDATE table_a SET is_latest = FALSE WHERE id = <id you found above>;
INSERT INTO table_a (is_latest, ...) VALUES (TRUE, ...);
COMMIT;
Then three things may happen:
Your query finds a row, and the transaction succeeds.
Your query finds no row, then you could insert the first row.
The query finds a row, but the update of that row causes a serialization error.
In that case you know that a concurrent transaction interfered, and you repeat the complete transaction in response.
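The "repeat the complete transaction" part is a retry loop around the whole read-compute-write sequence. Below is a Python sketch under stated assumptions: VersionedTable is a hypothetical single-row store whose commit fails if anyone committed since the snapshot was taken, roughly the way a REPEATABLE READ update of a concurrently changed row fails; SerializationError stands in for PostgreSQL's serialization failure (SQLSTATE 40001).

```python
import threading


class SerializationError(Exception):
    """Stand-in for PostgreSQL's serialization failure (SQLSTATE 40001)."""


class VersionedTable:
    """Hypothetical store: each commit bumps a version, and a commit fails
    if the version changed since the transaction's snapshot was taken."""

    def __init__(self):
        self._guard = threading.Lock()
        self.version = 0
        self.latest = 0  # value held by the current is_latest row

    def snapshot(self):
        with self._guard:
            return self.version, self.latest

    def commit(self, seen_version, new_latest):
        with self._guard:
            if self.version != seen_version:
                raise SerializationError("concurrent transaction interfered")
            self.version += 1
            self.latest = new_latest


def run_with_retry(table, work, max_attempts=1000):
    """Repeat the whole transaction until it commits; return the attempt count."""
    for attempt in range(1, max_attempts + 1):
        version, latest = table.snapshot()  # "SELECT * FROM table_a WHERE is_latest"
        new_latest = work(latest)  # application ruminations, no locks held
        try:
            table.commit(version, new_latest)  # "UPDATE ...; INSERT ...; COMMIT"
            return attempt
        except SerializationError:
            continue  # start over with a fresh snapshot
    raise RuntimeError("gave up after repeated serialization failures")


def worker(table, n):
    """Run n independent increment transactions against `table`."""
    for _ in range(n):
        run_with_retry(table, lambda latest: latest + 1)


if __name__ == "__main__":
    table = VersionedTable()
    threads = [threading.Thread(target=worker, args=(table, 100)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(table.latest)  # 400
```

No update is lost even though no locks are held during the application logic; conflicts just cost a retry.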
I wondered if there is any way to avoid corruption of data at the READ COMMITTED isolation level.
Here is a sample of my issue: two sessions working with the same tables.
SSN1> ALTER TABLE APPLICANT ADD( AVGSLEVEL2 NUMBER(5,2) )
Meanwhile in another session....
SSN2> INSERT INTO SPossessed VALUES ( 000001, 'TRUCK DRIVING', 9 );
Back in the first session ...
SSN1> UPDATE APPLICANT
      SET AVGSLEVEL = ( ( SELECT SUM(SKILLLEVEL)
                          FROM SPOSSESSED
                          WHERE A# = APPLICANT.A# ) /
                        ( SELECT COUNT(*)
                          FROM SPOSSESSED
                          WHERE A# = APPLICANT.A# ) );
Then second session does ...
SSN2> select AVGSLEVEL from APPLICANT;
But when first session issues a commit ...
SSN1> COMMIT;
... then what does second session get?
SSN2> select AVGSLEVEL from APPLICANT;
SSN2> COMMIT;
How can I improve the first session's SQL script so that it can be safely processed at the READ COMMITTED isolation level?
In a multi-user environment we must be aware that individual users can confuse things with their uncoordinated actions.
The problem isn't READ COMMITTED. It's perfectly sensible that the UPDATE in SSN1 doesn't take account of the TRUCK DRIVING skill, because the second session hasn't committed yet. The user could roll back, or the transaction could fail for some other reason. So, at the time the user in the first session executes their update, truck driving does not exist as a possessable skill.
That is completely safe. How could it be safer?
Now, you could check for uncommitted transactions by serializing things. Before user #1 executes the update, they could issue a LOCK TABLE SPOSSESSED statement. That would fail, because SSN2 has an uncommitted transaction. Or, if issued early enough, it would prevent SSN2 from doing the insert until SSN1 had committed the update.
But SSN2 can subsequently insert the Truck Driving record, so APPLICANT.AVGSLEVEL would still be out of whack. There's nothing you can do about that ...
... except build a transactional API which enforces business rules and prevents random or spontaneous amendment of data in business tables.
Assuming:
I am using REPEATABLE_READ or SERIALIZABLE transaction isolation (locks get retained every time I access a row)
We are talking about multiple threads accessing multiple tables simultaneously.
I have the following questions:
Is it possible for an INSERT operation to cause a deadlock? If so, please provide a detailed scenario demonstrating how a deadlock may occur (e.g. Thread 1 does this, Thread 2 does that, ..., deadlock).
For bonus points: answer the same question for all other operations (e.g. SELECT, UPDATE, DELETE).
UPDATE:
3. For super bonus points: how can I avoid a deadlock in the following scenario?
Given tables:
permissions[id BIGINT PRIMARY KEY]
companies[id BIGINT PRIMARY KEY, name VARCHAR(30), permission_id BIGINT NOT NULL, FOREIGN KEY (permission_id) REFERENCES permissions(id)]
I create a new Company as follows:
INSERT INTO permissions; -- Inserts permissions.id = 100
INSERT INTO companies (name, permission_id) VALUES ('Nintendo', 100); -- Inserts companies.id = 200
I delete a Company as follows:
SELECT permission_id FROM companies WHERE id = 200; -- returns permission_id = 100
DELETE FROM companies WHERE id = 200;
DELETE FROM permissions WHERE id = 100;
In the above example, the INSERT locking order is [permissions, companies] whereas the DELETE locking order is [companies, permissions]. Is there a way to fix this example for REPEATABLE_READ or SERIALIZABLE isolation?
Generally, all modifications can cause a deadlock and selects will not (more on that later). So:
No, you cannot ignore these.
You can somewhat ignore select depending on your database and settings but the others will give you deadlocks.
You don't even need multiple tables.
The best way to create a deadlock is to do the same thing in a different order.
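The "same thing in a different order" pattern can be reproduced outside a database. This Python sketch (threading only, hypothetical names) has two sessions take the same two locks in opposite order. A real DBMS detects the cycle and kills one victim; this model has no detector, so each session just gives up after a timeout, and both report failure.

```python
import threading


def reversed_order_demo(timeout=0.2):
    """Two sessions take the same two locks in opposite order.

    Each grabs its first lock, waits until the other has done the same, then
    tries for its second lock with a timeout. Neither can succeed: that
    circular wait is what a DBMS reports as a deadlock.
    Returns (session1_got_second_lock, session2_got_second_lock).
    """
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    outcome = {}

    def session(name, first, second):
        with first:
            both_hold_first.wait()
            got = second.acquire(timeout=timeout)
            if got:
                second.release()
            outcome[name] = got
            both_tried_second.wait()  # keep holding `first` until both have tried

    t1 = threading.Thread(target=session, args=("session1", lock_a, lock_b))
    t2 = threading.Thread(target=session, args=("session2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return outcome["session1"], outcome["session2"]


if __name__ == "__main__":
    print(reversed_order_demo())  # (False, False)
```

This is the shape of the insert/delete example below: session 1 holds PK=1 and wants PK=7, while session 2 holds PK=7 and wants PK=1.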
SQL Server examples:
create table A
(
PK int primary key
)
Session 1:
begin transaction
insert into A values(1)
Session 2:
begin transaction
insert into A values(7)
Session 1:
delete from A where PK=7
Session 2:
delete from A where PK=1
You will get a deadlock. So that proved inserts & deletes can deadlock.
Updates are similar:
Session 1:
begin transaction
insert into A values(1)
insert into A values(2)
commit
begin transaction
update A set PK=7 where PK=1
Session 2:
begin transaction
update A set pk=9 where pk=2
update A set pk=8 where pk=1
Session 1:
update A set pk=9 where pk=2
Deadlock!
SELECT should never deadlock but on some databases it will because the locks it uses interfere with consistent reads. That's just crappy database engine design though.
SQL Server will not lock on a SELECT if you use SNAPSHOT ISOLATION. Oracle & I think Postgres will never lock on SELECT (unless you have FOR UPDATE which is clearly reserving for an update anyway).
So basically I think you have a few incorrect assumptions. I think I've proved:
Updates can cause deadlocks
Deletes can cause deadlocks
Inserts can cause deadlocks
You do not need more than one table
You do need more than one session
You'll just have to take my word on SELECT ;) but it will depend on your DB and settings.
In addition to LoztInSpace's answer, inserts may cause deadlocks even when no deletes or updates are involved. All you need is a unique index and a reversed order of operations.
Example in Oracle:
create table t1 (id number);
create unique index t1_pk on t1 (id);
--thread 1 :
insert into t1 values(1);
--thread 2
insert into t1 values(2);
--thread 1 :
insert into t1 values(2);
--thread 2
insert into t1 values(1); -- deadlock !
Let us assume you have two relations, A and B, and two users, X and Y. Table A is WRITE locked by user X and table B is WRITE locked by user Y. Then the following query will give you a deadlock if both X and Y run it, because X waits for Y's lock on B while Y waits for X's lock on A:
Select * from A,B
So clearly a SELECT operation can cause a deadlock if it joins more than one table. Insert and delete operations usually involve a single relation, so they may not cause deadlocks.
I have a process that starts a transaction, inserts a record into Table1, and then calls a long running web service (up to 30 seconds). If the web service call fails then the insert is rolled back (which is what we want). Here is an example of the insert (it is actually multiple inserts into multiple tables but I am simplifying for this question):
INSERT INTO Table1 (UserId, StatusTypeId) VALUES (#UserId, 1)
I have a second process that queries Table1 from the first step like this:
SELECT TOP 1 * FROM Table1 WHERE StatusTypeId=2
and then updates that row for a user. When process 1 is running, Table1 is locked so process 2 will not complete until process 1 finishes which is a problem because a long delay is introduced while process 1 finishes its web service call.
Process 1 will only ever insert a StatusTypeId of 1, and it is the only operation that inserts into Table1. Process 2 will only query on StatusTypeId = 2. I want to tell Process 2 to ignore any inserts into Table1 but lock the row that it selects. With the default isolation level, Process 2 waits on too much, but I'm afraid that IsolationLevel.ReadUncommitted allows reading too much dirty data. I do not want two users running Process 2 to accidentally get the same row.
Is there a different IsolationLevel to use other than ReadUncommitted that says ignore inserted rows but make sure the select locks the row that is selected?
Regarding the SELECT being blocked by the insert this should be avoidable by providing appropriate indexes.
Test Table.
CREATE TABLE Table1
(
UserId INT PRIMARY KEY,
StatusTypeId INT,
AnotherColumn varchar(50)
)
insert into Table1
SELECT number, (LEN(type)%2)+1, newid()
FROM master.dbo.spt_values
where type='p'
Query window one
BEGIN TRAN
INSERT INTO Table1 (UserId, StatusTypeId) VALUES (5000, 1)
WAITFOR DELAY '00:01';
ROLLBACK
Query window two (Blocks)
SELECT TOP 1 *
FROM Table1
WHERE StatusTypeId=2
ORDER BY AnotherColumn
But if you retry the test after adding an index, it won't block:
CREATE NONCLUSTERED INDEX ix ON Table1 (StatusTypeId,AnotherColumn)
Regarding your locking of rows for Process 2 you can use the following (the READPAST hint will allow 2 concurrent Process 2 transactions to begin processing different rows rather than one blocking the other). You might find this article by Remus Rusanu relevant
BEGIN TRAN
SELECT TOP 1 *
FROM Table1 WITH (UPDLOCK, READPAST)
WHERE StatusTypeId=2
ORDER BY AnotherColumn
/*
Rest of Process Two's code here
*/
COMMIT
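The effect of UPDLOCK plus READPAST on a work queue can be modelled in a few lines of Python. This is not SQL Server behaviour in detail: rows, claim_next and release are hypothetical, each row carries its own threading.Lock, and a non-blocking acquire stands in for "skip rows someone else has locked" instead of waiting on them.

```python
import threading

# Hypothetical queue rows (all StatusTypeId = 2, already in ORDER BY order),
# each with its own row lock.
rows = [{"user_id": i, "lock": threading.Lock()} for i in range(5)]


def claim_next(rows):
    """Return the first row whose lock is free, skipping locked rows.

    The non-blocking acquire plays the role of UPDLOCK + READPAST: a worker
    never waits behind a row another worker already holds.
    """
    for row in rows:
        if row["lock"].acquire(blocking=False):
            return row
    return None  # every candidate row is currently claimed


def release(row):
    """Give the row back, as COMMIT would release the update lock."""
    row["lock"].release()


if __name__ == "__main__":
    first = claim_next(rows)
    second = claim_next(rows)  # skips the row `first` holds
    print(first["user_id"], second["user_id"])  # 0 1
    release(first)
    release(second)
```

Two concurrent workers therefore pick different rows rather than one blocking on the other, which is what the question asks for.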
Edit: Having re-read the question, the lock from any insert should not affect any select under READ COMMITTED; this could be an issue with your indexes.
However, from your comments and rest of the question it seems you want only one transaction to be able to read a row at a time, which is not what an isolation level prevents.
They prevent:
Dirty reads - reading uncommitted data in a transaction which could be rolled back - occur in READ UNCOMMITTED; prevented in READ COMMITTED, REPEATABLE READ, SERIALIZABLE
Non-repeatable reads - a row is updated while being read in an uncommitted transaction, meaning the same read of a particular row can occur twice in a transaction and produce different results - occur in READ UNCOMMITTED, READ COMMITTED; prevented in REPEATABLE READ, SERIALIZABLE
Phantom rows - a row is inserted or deleted while being read in an uncommitted transaction, meaning that the same read of multiple rows can occur twice in a transaction and produce different results, with either added or missing rows - occur in READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ; prevented in SERIALIZABLE
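The same matrix can be written down as data, which makes it easy to ask "what is the weakest level that rules out these anomalies?". This Python sketch encodes exactly the three anomalies listed above, following the ANSI SQL definitions; the names PERMITTED and weakest_level_preventing are hypothetical, and real engines can be stricter (PostgreSQL's REPEATABLE READ, for instance, also prevents phantoms).

```python
# Anomalies each level still permits, per the list above (ANSI SQL definitions).
LEVELS = ("READ UNCOMMITTED", "READ COMMITTED", "REPEATABLE READ", "SERIALIZABLE")
PERMITTED = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom"},
    "READ COMMITTED": {"non-repeatable read", "phantom"},
    "REPEATABLE READ": {"phantom"},
    "SERIALIZABLE": set(),
}


def weakest_level_preventing(anomalies):
    """Return the weakest level that permits none of `anomalies`."""
    for level in LEVELS:  # ordered from weakest to strongest
        if not PERMITTED[level] & set(anomalies):
            return level
    raise ValueError("unreachable: SERIALIZABLE permits none of these")


if __name__ == "__main__":
    print(weakest_level_preventing({"dirty read"}))  # READ COMMITTED
```

Note that none of these levels stops two transactions from reading the same row; that is a locking question, as the answer points out.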
I'm interested in whether a SELECT FOR UPDATE query will lock a non-existent row.
Example
Table FooBar with two columns, foo and bar, foo has a unique index.
Issue query SELECT bar FROM FooBar WHERE foo = ? FOR UPDATE
If the first query returns zero rows, issue a query
INSERT INTO FooBar (foo, bar) values (?, ?)
Now is it possible that the INSERT would cause an index violation or does the SELECT FOR UPDATE prevent that?
Interested in behavior on SQLServer (2005/8), Oracle and MySQL.
MySQL
SELECT ... FOR UPDATE with UPDATE
Using transactions with InnoDB (auto-commit turned off), a SELECT ... FOR UPDATE allows one session to temporarily lock down a particular record (or records) so that no other session can update it. Then, within the same transaction, the session can actually perform an UPDATE on the same record and commit or roll back the transaction. This would allow you to lock down the record so no other session could update it while perhaps you do some other business logic.
This is accomplished with locking. InnoDB utilizes indexes for locking records, so locking an existing record seems easy--simply lock the index for that record.
SELECT ... FOR UPDATE with INSERT
However, to use SELECT ... FOR UPDATE with INSERT, how do you lock an index for a record that doesn't exist yet? If you are using the default isolation level of REPEATABLE READ, InnoDB will also utilize gap locks. As long as you know the id (or even range of ids) to lock, then InnoDB can lock the gap so no other record can be inserted in that gap until we're done with it.
If your id column were an auto-increment column, then SELECT ... FOR UPDATE with INSERT INTO would be problematic because you wouldn't know what the new id was until you inserted it. However, since you know the id that you wish to insert, SELECT ... FOR UPDATE with INSERT will work.
CAVEAT
At the default isolation level, SELECT ... FOR UPDATE on a non-existent record does not block other transactions. So, if two transactions both do a SELECT ... FOR UPDATE on the same non-existent index record, they'll both get the lock, and neither transaction will be able to insert the record. In fact, if they try, a deadlock will be detected.
Therefore, if you don't want to deal with a deadlock, you might just do the following:
INSERT INTO ...
Start a transaction, and perform the INSERT. Do your business logic, and either commit or rollback the transaction. As soon as you do the INSERT on the non-existent record index on the first transaction, all other transactions will block if they attempt to INSERT a record with the same unique index. If the second transaction attempts to insert a record with the same index after the first transaction commits the insert, then it will get a "duplicate key" error. Handle accordingly.
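A minimal sketch of "insert first, handle the duplicate key", using Python's standard-library sqlite3 module rather than MySQL. SQLite takes a database-level write lock instead of InnoDB's index-record locks, so only the error-handling shape carries over; the helper insert_or_report_duplicate is hypothetical, while FooBar, foo and bar come from the question.

```python
import sqlite3


def insert_or_report_duplicate(conn, foo, bar):
    """Try the INSERT first and treat a unique-key violation as "already there".

    Returns True if this call inserted the row, False if the key existed.
    """
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO FooBar (foo, bar) VALUES (?, ?)", (foo, bar))
        return True
    except sqlite3.IntegrityError:
        # Duplicate key: another transaction got there first. Handle
        # accordingly, e.g. re-read the existing row or report a conflict.
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE FooBar (foo INTEGER UNIQUE, bar TEXT)")
    print(insert_or_report_duplicate(conn, 1, "a"))  # True
    print(insert_or_report_duplicate(conn, 1, "b"))  # False
```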
SELECT ... LOCK IN SHARE MODE
If you select with LOCK IN SHARE MODE before the INSERT, if a previous transaction has inserted that record but hasn't committed yet, the SELECT ... LOCK IN SHARE MODE will block until the previous transaction has completed.
So to reduce the chance of duplicate key errors, especially if you hold the locks for a while while performing business logic before committing or rolling back:
SELECT bar FROM FooBar WHERE foo = ? LOCK IN SHARE MODE
If no records returned, then
INSERT INTO FooBar (foo, bar) VALUES (?, ?)
In Oracle, SELECT ... FOR UPDATE has no effect on a non-existent row (the query simply returns no rows; in PL/SQL, a SELECT ... INTO that finds nothing raises NO_DATA_FOUND). The INSERT statement will prevent duplicate unique/primary key values. Any other transaction attempting to insert the same key values will block until the first transaction commits (at which point the blocked transaction gets a duplicate key error) or rolls back (at which point the blocked transaction continues).
On Oracle:
Session 1
create table t (id number);
alter table t add constraint pk primary key(id);
SELECT *
FROM t
WHERE id = 1
FOR UPDATE;
-- 0 rows returned
-- this creates a row-level lock on the table, preventing others from locking the table in exclusive mode
Session 2
SELECT *
FROM t
FOR UPDATE;
-- 0 rows returned
-- there are no problems with locking here
rollback; -- releases lock
INSERT INTO t
VALUES (1);
-- 1 row inserted without problems
I wrote a detailed analysis of this thing on SQL Server: Developing Modifications that Survive Concurrency
Anyway, you need to use SERIALIZABLE isolation level, and you really need to stress test.
SQL Server only has the FOR UPDATE as part of a cursor. And, it only applies to UPDATE statements that are associated with the current row in the cursor.
So, the FOR UPDATE has no relationship with INSERT. Therefore, I think your answer is that it's not applicable in SQL Server.
Now, it may be possible to simulate the FOR UPDATE behavior with transactions and locking strategies. But, that may be more than what you're looking for.