Assuming:
I am using REPEATABLE_READ or SERIALIZABLE transaction isolation (locks get retained every time I access a row)
We are talking about multiple threads accessing multiple tables simultaneously.
I have the following questions:
Is it possible for an INSERT operation to cause a deadlock? If so, please provide a detailed scenario demonstrating how a deadlock may occur (e.g. Thread 1 does this, Thread 2 does that, ..., deadlock).
For bonus points: answer the same question for all other operations (e.g. SELECT, UPDATE, DELETE).
UPDATE:
3. For super bonus points: how can I avoid a deadlock in the following scenario?
Given tables:
permissions[id BIGINT PRIMARY KEY]
companies[id BIGINT PRIMARY KEY, name VARCHAR(30), permission_id BIGINT NOT NULL, FOREIGN KEY (permission_id) REFERENCES permissions(id)]
I create a new Company as follows:
INSERT INTO permissions; -- Inserts permissions.id = 100
INSERT INTO companies (name, permission_id) VALUES ('Nintendo', 100); -- Inserts companies.id = 200
I delete a Company as follows:
SELECT permission_id FROM companies WHERE id = 200; -- returns permission_id = 100
DELETE FROM companies WHERE id = 200;
DELETE FROM permissions WHERE id = 100;
In the above example, the INSERT locking order is [permissions, companies] whereas the DELETE locking order is [companies, permissions]. Is there a way to fix this example for REPEATABLE_READ or SERIALIZABLE isolation?
Generally, all modifications can cause a deadlock, and selects will not (I'll get to that later). So:
No, you cannot ignore these.
You can somewhat ignore SELECT, depending on your database and settings, but the others will give you deadlocks.
You don't even need multiple tables.
The best way to create a deadlock is to do the same thing in a different order.
SQL Server examples:
create table A
(
PK int primary key
)
Session 1:
begin transaction
insert into A values(1)
Session 2:
begin transaction
insert into A values(7)
Session 1:
delete from A where PK=7
Session 2:
delete from A where PK=1
You will get a deadlock. So that proved inserts & deletes can deadlock.
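To see why the engine calls this a deadlock, it can help to model who is waiting on whom. The sketch below is only an illustration: the function name find_deadlock and the session labels are made up, and it assumes each blocked session waits on exactly one other session, which is far simpler than what a real lock manager tracks. A cycle in the resulting wait-for graph is a deadlock.

```python
def find_deadlock(waits_for):
    """Return a list of sessions forming a wait cycle, or None.

    waits_for maps a blocked session to the session it is waiting on
    (hypothetical model: each session waits on at most one other session).
    """
    for start in waits_for:
        path = []
        current = start
        # Follow the chain of "is waiting on" until it ends or repeats.
        while current in waits_for and current not in path:
            path.append(current)
            current = waits_for[current]
        if current in path:
            # We came back to a session already on the path: a cycle.
            return path[path.index(current):]
    return None


# Session 1 inserted PK=1 and Session 2 inserted PK=7 (both uncommitted).
# Session 1's DELETE needs the row held by Session 2, and vice versa.
waits = {"session1": "session2", "session2": "session1"}
cycle = find_deadlock(waits)
print(cycle)
```

If only one of the two DELETEs had been issued, the graph would have a single edge and no cycle, so that session would simply wait rather than deadlock.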
Updates are similar:
Session 1:
begin transaction
insert into A values(1)
insert into A values(2)
commit
begin transaction
update A set PK=7 where PK=1
Session 2:
begin transaction
update A set pk=9 where pk=2
update A set pk=8 where pk=1
Session 1:
update A set pk=9 where pk=2
Deadlock!
SELECT should never deadlock but on some databases it will because the locks it uses interfere with consistent reads. That's just crappy database engine design though.
SQL Server will not lock on a SELECT if you use SNAPSHOT ISOLATION. Oracle & I think Postgres will never lock on SELECT (unless you have FOR UPDATE which is clearly reserving for an update anyway).
So basically I think you have a few incorrect assumptions. I think I've proved:
Updates can cause deadlocks
Deletes can cause deadlocks
Inserts can cause deadlocks
You do not need more than one table
You do need more than one session
You'll just have to take my word on SELECT ;) but it will depend on your DB and settings.
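Since doing the same thing in a different order is what creates the deadlock, the usual counter-measure is to always take locks in one agreed order. Here is a minimal in-process sketch of that idea, using Python threads as a stand-in for database sessions. The names table_locks and run_transaction are made up for illustration, and this is an analogy, not how any particular engine acquires locks. In a database the equivalent would be to lock or touch rows in a fixed order in every code path, if your engine and schema allow it.

```python
import threading

# One lock per table; the names mirror the tables from the question.
table_locks = {"permissions": threading.Lock(), "companies": threading.Lock()}
completed = []


def run_transaction(name, tables):
    """Lock the requested tables in one global (alphabetical) order."""
    ordered = sorted(tables)
    for table in ordered:
        table_locks[table].acquire()
    try:
        completed.append(name)  # stand-in for the real INSERT/DELETE work
    finally:
        for table in reversed(ordered):
            table_locks[table].release()


# The "create" path asks for [permissions, companies] and the "delete" path
# for [companies, permissions]; sorting makes both lock in the same order.
threads = [
    threading.Thread(target=run_transaction, args=("create", ["permissions", "companies"])),
    threading.Thread(target=run_transaction, args=("delete", ["companies", "permissions"])),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
print(sorted(completed))
```

Because both paths end up locking companies before permissions, neither thread can hold one lock while waiting for the other in reverse.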
In addition to LoztInSpace's answer, inserts may cause deadlocks even when no deletes or updates are involved. All you need is a unique index and a reversed order of operations.
Example in Oracle:
create table t1 (id number);
create unique index t1_pk on t1 (id);
--thread 1 :
insert into t1 values(1);
--thread 2
insert into t1 values(2);
--thread 1 :
insert into t1 values(2);
--thread 2
insert into t1 values(1); -- deadlock !
Let us assume you have two relations A and B and two users X and Y. Table A is write-locked by user X and table B is write-locked by user Y. Then the following query will give you a deadlock if it is run by both users X and Y.
Select * from A,B
So clearly a SELECT operation can cause a deadlock if a join involving more than one table is part of it. Usually INSERT and DELETE operations involve single relations, so they may not cause a deadlock.
We have 2 tables defined as follows
CREATE TABLE foo (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
CREATE TABLE bar (
foo_id BIGINT UNIQUE,
foo_name TEXT NOT NULL UNIQUE REFERENCES foo (name)
);
I've noticed that when executing the following two queries concurrently
INSERT INTO foo (name) VALUES ('BAZ')
INSERT INTO bar (foo_name, foo_id) VALUES ('BAZ', (SELECT id FROM foo WHERE name = 'BAZ'))
it is possible under certain circumstances to end up inserting a row into bar where foo_id is NULL. The two queries are executed in different transactions, by two completely different processes.
How is this possible? I'd expect the second statement to either fail due to a foreign key violation (if the record in foo is not there), or succeed with a non-null value of foo_id (if it is).
What is causing this race condition? Is it due to the subselect, or is it due to the timing of when the foreign key constraint is checked?
We are using isolation level "read committed" and postgres version 10.3.
EDIT
I think the question was not particularly clear about what is confusing me. The question is about how and why two different states of the database were observed during the execution of a single statement. The subselect sees the record in foo as absent, whereas the FK check sees it as present. If it's just that there's no rule preventing this race condition, then that is an interesting question in itself: why would it not be possible to use transaction ids to ensure that the same state of the database is observed for both?
The subselect in the INSERT INTO bar cannot see the new row concurrently inserted in foo because the latter is not committed yet.
But by the time that the query that checks the foreign key constraint is executed, the INSERT INTO foo has committed, so the foreign key constraint doesn't report an error.
A simple way to work around that is to use the REPEATABLE READ isolation level for the INSERT INTO bar. Then the foreign key check uses the same snapshot as the INSERT, it won't see the newly committed row, and a constraint violation error will be thrown.
Logic suggests that the ordering of the commands (including the sub-query), combined with when Postgres checks constraints (which is not necessarily immediate), could cause the issue. For example, the following sequence could occur:
1. The second command starts first
2. Its SELECT component runs and returns NULL
3. The first command starts and inserts the row
4. The second command inserts its row (with the 'name' field and a NULL)
5. The FK reference check succeeds, as 'name' exists
Re deferrable constraints see https://www.postgresql.org/docs/13/sql-set-constraints.html and https://begriffs.com/posts/2017-08-27-deferrable-sql-constraints.html
Suggested answers
Add a NOT NULL constraint on bar.foo_id, or include foo_id as part of the foreign key
Rewrite the two commands to run consecutively rather than simultaneously (if possible)
You do indeed have a race condition. Without some sort of locking or use of a transaction to sequence the events, there is no rule precluding this sequence:
1. The sub-select of the bar INSERT is performed, yielding NULL
2. The INSERT into foo
3. The INSERT into bar, which now does not have any FK violation, but does have a NULL.
Since of course this is the toy version of your real program, I can't recommend how best to fix it. If it makes sense to require these events in a particular sequence, then they can be in a transaction on a single thread. In some other situation, you might prohibit inserting directly into foo and bar (REVOKE permissions as necessary) and allow modifications only through a function/procedure, or through a view that has triggers (possibly rules).
An anonymous plpgsql block will help you avoid the race conditions (by making sure that the inserts run sequentially within the same transaction) without going deep into Postgres internals:
do language plpgsql
$$
declare
v_foo_id bigint;
begin
INSERT into foo (name) values ('BAZ') RETURNING id into v_foo_id;
INSERT into bar (foo_name, foo_id) values ('BAZ', v_foo_id);
end;
$$;
or using plain SQL with a CTE in order to avoid switching context to/from plpgsql:
with t(id) as
(
INSERT into foo (name) values ('BAZ') RETURNING id
)
INSERT into bar (foo_name, foo_id) values ('BAZ', (select id from t));
And, btw, are you sure that the two inserts in your example are executed in the same transaction in the right order? If not then the short answer to your question is "MVCC" since the second statement is not atomic.
This seems more likely to be a scenario where both queries are executed one after another, but the first transaction is not yet committed.
Process 1
INSERT INTO foo (name) VALUES ('BAZ')
The transaction is not committed yet, but Process 2 executes the next query
INSERT INTO bar (foo_name, foo_id) VALUES ('BAZ', (SELECT id FROM foo WHERE name = 'BAZ'))
In this case the Process 2 query will wait until the Process 1 transaction is committed.
From the PostgreSQL documentation:
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress).
I have two tables:
TableA with columns id (UNIQUEIDENTIFIER) and name (NVARCHAR); it uses NEWSEQUENTIALID() to auto-generate values for the 'id' column
TableB with columns id( IDENTITY), parentId(UNIQUEIDENTIFIER).
parentId in TableB has a foreign key constraint on TableA id.
I'm trying to execute the following queries:
In session 1:
BEGIN TRAN test1
INSERT INTO dbo.TableA( name )
OUTPUT INSERTED.id
VALUES ('foo')
Note that I do not want to commit the transaction here yet.
In session 2:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRANSACTION test2
INSERT INTO dbo.TableB(parentId)
VALUES('<use_id_from_session1_here>')
The second insert fails to execute and hangs up in SQL Server Management Studio and in my application code as well.
Shouldn't setting an isolation level of 'ReadUncommitted' allow the insert in the second transaction to read what was uncommitted in the first?
Am I missing something here, or is the way I'm using/setting the transaction isolation level incorrect?
Any suggestions would be appreciated.
What's happening is when you're doing the second insert, SQL Server is trying to check the Foreign Key to make sure you're not inserting something you can't. But since the lock on TableA is still being held by your first transaction, your second transaction is waiting. Read uncommitted doesn't matter there. Constraints have to be checked before an insert can take place, so it will wait for the first transaction to finish. You'd still be violating the Foreign key constraint if the lock weren't in place because the first transaction hasn't been committed yet.
You might commit after every single insert in session 1, or after each batch, with the batch sized so that session 2 does not wait too long.
I want to know the core reason (the mechanics of segments, blocks and locks inside the engine) why a bulk insert (with direct-path) locks the entire table, so that if I insert into one partition, I can't truncate another partition which is (apparently) not affected by the insert.
A conventional insert (without the append hint) permits truncating unaffected partitions. (Note that I am talking about an uncommitted transaction.)
Below is an example to illustrate it.
Consider the following table:
CREATE TABLE FG_TEST
(COL NUMBER )
PARTITION BY RANGE (COL)
(PARTITION "P1" VALUES LESS THAN (1000),
PARTITION "P2" VALUES LESS THAN (2000));
insert into fg_test values (1);
insert into fg_test values (1000);
commit;
Session 1:
insert into fg_test select * from fg_test where col >=1000;
--1 rows inserted;
Session 2:
alter table fg_test truncate partition p1;
--table truncated
Session 1:
rollback;
insert /*+append */ into fg_test select * from fg_test where col >=1000;
--1 rows inserted;
Session 2:
alter table fg_test truncate partition p1;
--this throws ORA-00054: resource busy and acquire with NOWAIT specified
--or timeout expired
The documentation on Direct-Path INSERT is pretty terse on this subject and just says:
During direct-path INSERT, the database obtains exclusive locks on the
table (or on all partitions of a partitioned table). As a result,
users cannot perform any concurrent insert, update, or delete
operations on the table, and concurrent index creation and build
operations are not permitted.
The section "How Direct-Path INSERT Works" does not explain why the lock is needed for all partitions.
And why does a conventional insert not lock unaffected partitions? (My intuition is that the lock is taken at block level.)
Your premise is slightly wrong. A direct-path insert does not lock the entire table if you use the partition extension clause.
Session 1:
insert /*+append */ into fg_test partition (p2)
select * from fg_test where col >=1000;
Session 2:
alter table fg_test truncate partition p1;
--table truncated
The new question is: When the partition extension clause is NOT used, why do conventional and direct-path inserts have different locking mechanisms? This clarification makes the question easier, but without inside knowledge the answer below is still only a guess.
It was easier to code a feature that locks the entire table. And it runs faster, since there is no need to track which partitions are updated.
There's usually no need for a more fine-grained lock. Most systems or processes that use direct-path writes only update one large table at a time. If a more fine-grained lock is really needed, the partition extension clause can be used. It's not quite as convenient, since only one partition can be referenced at a time. But it's good enough 99.9% of the time.
I found the following answer on asktom.oracle.com:
Ask Tom: Inserts with APPEND Hint
Tom explains many of the inner workings, but the reason why Oracle locks the whole table and not only affected partitions is still not clear.
Maybe it's just a design decision (e.g. not wanting the big bulky direct load to be potentially blocked by one smallish uncommitted transaction, and therefore locking all partitions ...)
I am working with an Oracle 11.2g instance.
I'd like to know what I am exposing myself to by inserting rows into tables while generating the primary key values myself.
I would SELECT max(pk) FROM sometables;
and then use the next hundred values, for example, for my next 100 inserts.
Is it playing with fire?
The context is: I have a large number of inserts to do, split over several tables linked by foreign keys. I am trying to get good performance without using PL/SQL.
[EDIT] here a code sample that looks like what I'm dealing with:
QString query1 = "INSERT INTO table1 (pk1_id, val) VALUES (pk1_seq.nextval, ?)";
sqlQuery->prepare(query1);
sqlQuery->addBindValue(vec_of_values);
sqlQuery->execBatch();
QString query2 = "INSERT INTO table2 (pk2_id, another_val, pk1_pk1_id) VALUES (pk2_seq.nextval, ?, ?)";
sqlQuery->prepare(query2);
sqlQuery->addBindValue(vec_of_values);
// How do I get the primary keys (hundreds of them)
// from the first insert??
sqlQuery->addBindValue(vec_of_pk1);
sqlQuery->execBatch();
You are exposing yourself to slower performance, errors in your logic, and extra code to maintain. Oracle sequences are optimized for exactly this purpose. For DML-heavy workloads you may also cache sequence values:
ALTER SEQUENCE customers_seq CACHE 100;
Create a sequence for the master table(s)
Insert into the master table using your_sequence.nextval
Inserts into child (dependent) tables are done using your_sequence.currval
create table parent (id integer primary key not null);
create table child (id integer primary key not null, pid integer not null references parent(id));
create sequence parent_seq;
create sequence child_seq;
insert into parent (id) values (parent_seq.nextval);
insert into child (id, pid) values (child_seq.nextval, parent_seq.currval);
commit;
To explain why max(id) will not work reliably, consider the following scenario:
1. Transaction 1 retrieves max(id) + 1 (yields, say, 42)
2. Transaction 1 inserts a new row with id = 42
3. Transaction 2 retrieves max(id) + 1 (also yields 42, because transaction 1 is not yet committed)
4. Transaction 1 commits
5. Transaction 2 inserts a new row with id = 42
6. Transaction 2 gets a unique key violation (on the insert itself or, if the constraint is deferred, when it tries to commit)
Now think about what happens when you have a lot of transactions doing this. You'll get a lot of errors. Additionally your inserts will be slower and slower, because the cost of calculating max(id) will increase with the size of the table.
Sequences are the only sane (i.e. correct, fast and scalable) way out of this problem.
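The interleaving described above can be replayed in a small script. This is only a sketch under stated assumptions: it uses SQLite through Python's sqlite3 module instead of Oracle, with two connections to one temporary database file playing the two transactions, and the table name t and the function name simulate_max_id_race are made up. Locking details differ between engines, but the outcome of this particular interleaving is the same: both transactions compute the same id and the second insert is rejected.

```python
import os
import sqlite3
import tempfile


def simulate_max_id_race():
    """Replay the scenario with two connections to one database file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        tx1 = sqlite3.connect(path)
        tx2 = sqlite3.connect(path)
        try:
            tx1.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
            tx1.execute("INSERT INTO t (id) VALUES (41)")
            tx1.commit()

            # Transaction 1 computes max(id) + 1 and inserts it, uncommitted.
            id1 = tx1.execute("SELECT max(id) + 1 FROM t").fetchall()[0][0]
            tx1.execute("INSERT INTO t (id) VALUES (?)", (id1,))

            # Transaction 2 cannot see the uncommitted row and gets the same value.
            id2 = tx2.execute("SELECT max(id) + 1 FROM t").fetchall()[0][0]

            # Transaction 1 commits.
            tx1.commit()

            # Transaction 2 inserts its "new" id and collides with the primary key.
            try:
                tx2.execute("INSERT INTO t (id) VALUES (?)", (id2,))
                tx2.commit()
                collided = False
            except sqlite3.IntegrityError:
                tx2.rollback()
                collided = True
            return id1, id2, collided
        finally:
            tx1.close()
            tx2.close()


print(simulate_max_id_race())
```

The function returns the two ids that were computed and whether the second insert hit the key violation, so you can see the duplicate id directly.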
Edit
If you are stuck with yet another ORM which can't cope with this kind of strategy (which is supported by nearly all DBMS nowadays, even SQL Server has sequences now), then you should be able to do the following in your client code:
Retrieve the next PK value using select parent_seq.nextval from dual into a variable in your programming language (this is a fast, scalable and correct way to retrieve the PK value).
If you can run a select max(id) you can also run a select parent_seq.nextval from dual. In both cases just use the value obtained from that select statement.
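For the batch case from the question (hundreds of parent rows whose keys are needed for the child rows), the same idea can be applied by fetching the ids up front and binding them on both inserts. The sketch below is an assumption-laden illustration, not Oracle or Qt code: itertools.count stands in for the sequence (in real code each value would come from select parent_seq.nextval from dual), and the helper names fetch_next_ids and build_batches are hypothetical.

```python
import itertools


def fetch_next_ids(sequence, how_many):
    """Draw how_many values from the sequence, one per row to insert."""
    return [next(sequence) for _ in range(how_many)]


def build_batches(sequence, parent_values, child_values):
    """Pair pre-fetched parent ids with the parent rows and the child rows."""
    parent_ids = fetch_next_ids(sequence, len(parent_values))
    parent_rows = list(zip(parent_ids, parent_values))
    # Each child row carries the id of the parent at the same position.
    child_rows = list(zip(child_values, parent_ids))
    return parent_rows, child_rows


# Hypothetical stand-in for the database sequence.
parent_seq = itertools.count(start=1)
parents, children = build_batches(parent_seq, ["a", "b", "c"], ["x", "y", "z"])
print(parents)
print(children)
```

The two lists can then be bound as the batch parameters of the parent and child inserts, so the client never has to read the generated keys back from the first insert.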
I'm interested in whether a SELECT FOR UPDATE query will lock a non-existent row.
Example
Table FooBar with two columns, foo and bar, foo has a unique index.
Issue query SELECT bar FROM FooBar WHERE foo = ? FOR UPDATE
If the first query returns zero rows, issue a query
INSERT INTO FooBar (foo, bar) values (?, ?)
Now is it possible that the INSERT would cause an index violation or does the SELECT FOR UPDATE prevent that?
Interested in behavior on SQLServer (2005/8), Oracle and MySQL.
MySQL
SELECT ... FOR UPDATE with UPDATE
Using transactions with InnoDB (auto-commit turned off), a SELECT ... FOR UPDATE allows one session to temporarily lock down a particular record (or records) so that no other session can update it. Then, within the same transaction, the session can actually perform an UPDATE on the same record and commit or roll back the transaction. This would allow you to lock down the record so no other session could update it while perhaps you do some other business logic.
This is accomplished with locking. InnoDB utilizes indexes for locking records, so locking an existing record seems easy--simply lock the index for that record.
SELECT ... FOR UPDATE with INSERT
However, to use SELECT ... FOR UPDATE with INSERT, how do you lock an index for a record that doesn't exist yet? If you are using the default isolation level of REPEATABLE READ, InnoDB will also utilize gap locks. As long as you know the id (or even range of ids) to lock, then InnoDB can lock the gap so no other record can be inserted in that gap until we're done with it.
If your id column were an auto-increment column, then SELECT ... FOR UPDATE with INSERT INTO would be problematic because you wouldn't know what the new id was until you inserted it. However, since you know the id that you wish to insert, SELECT ... FOR UPDATE with INSERT will work.
CAVEAT
At the default isolation level, SELECT ... FOR UPDATE on a non-existent record does not block other transactions. So, if two transactions both do a SELECT ... FOR UPDATE on the same non-existent index record, they'll both get the lock, and neither transaction will be able to insert the record. In fact, if they try, a deadlock will be detected.
Therefore, if you don't want to deal with a deadlock, you might just do the following:
INSERT INTO ...
Start a transaction, and perform the INSERT. Do your business logic, and either commit or rollback the transaction. As soon as you do the INSERT on the non-existent record index on the first transaction, all other transactions will block if they attempt to INSERT a record with the same unique index. If the second transaction attempts to insert a record with the same index after the first transaction commits the insert, then it will get a "duplicate key" error. Handle accordingly.
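A minimal sketch of this insert-first pattern is shown below. It assumes SQLite through Python's sqlite3 module rather than MySQL, so it only demonstrates the "try the INSERT, handle the duplicate key" flow on a single connection, not InnoDB's blocking behaviour. The helper name insert_or_report and the index name are made up, and the exception type you catch will depend on your driver.

```python
import sqlite3


def insert_or_report(conn, foo, bar):
    """Try the INSERT first; return False if the unique index rejects it."""
    try:
        conn.execute("INSERT INTO FooBar (foo, bar) VALUES (?, ?)", (foo, bar))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # Duplicate key: an earlier insert already claimed this value of foo.
        conn.rollback()
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FooBar (foo INTEGER, bar TEXT)")
conn.execute("CREATE UNIQUE INDEX FooBar_foo ON FooBar (foo)")
first = insert_or_report(conn, 1, "a")
second = insert_or_report(conn, 1, "b")
print(first, second)
```

The caller decides what "handle accordingly" means when the function reports a duplicate, for example reading the existing row instead.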
SELECT ... LOCK IN SHARE MODE
If you select with LOCK IN SHARE MODE before the INSERT, if a previous transaction has inserted that record but hasn't committed yet, the SELECT ... LOCK IN SHARE MODE will block until the previous transaction has completed.
So to reduce the chance of duplicate key errors, especially if you hold the locks for awhile while performing business logic before committing them or rolling them back:
SELECT bar FROM FooBar WHERE foo = ? LOCK IN SHARE MODE
If no records returned, then
INSERT INTO FooBar (foo, bar) VALUES (?, ?)
In Oracle, SELECT ... FOR UPDATE has no effect on a non-existent row (in PL/SQL, a SELECT ... INTO that finds no row simply raises a NO_DATA_FOUND exception). The INSERT statement will prevent duplicates of unique/primary key values. Any other transaction attempting to insert the same key values will block until the first transaction commits (at which time the blocked transaction will get a duplicate key error) or rolls back (at which time the blocked transaction continues).
On Oracle:
Session 1
create table t (id number);
alter table t add constraint pk primary key(id);
SELECT *
FROM t
WHERE id = 1
FOR UPDATE;
-- 0 rows returned
-- this creates row level lock on table, preventing others from locking table in exclusive mode
Session 2
SELECT *
FROM t
FOR UPDATE;
-- 0 rows returned
-- there are no problems with locking here
rollback; -- releases lock
INSERT INTO t
VALUES (1);
-- 1 row inserted without problems
I wrote a detailed analysis of this thing on SQL Server: Developing Modifications that Survive Concurrency
Anyway, you need to use SERIALIZABLE isolation level, and you really need to stress test.
SQL Server only has the FOR UPDATE as part of a cursor. And, it only applies to UPDATE statements that are associated with the current row in the cursor.
So, the FOR UPDATE has no relationship with INSERT. Therefore, I think your answer is that it's not applicable in SQL Server.
Now, it may be possible to simulate the FOR UPDATE behavior with transactions and locking strategies. But, that may be more than what you're looking for.