Strange PostgreSQL deadlock issue with SELECT FOR UPDATE

I am building a locking system based on PostgreSQL. I have two methods, acquire and release.
For acquire, it works like this:
BEGIN
while True:
    SELECT id FROM my_locks WHERE locked = false AND id = '<NAME>' FOR UPDATE
    if no rows return:
        continue
    UPDATE my_locks SET locked = true WHERE id = '<NAME>'
    COMMIT
    break
And for release
BEGIN
UPDATE my_locks SET locked = false WHERE id = '<NAME>'
COMMIT
This looks pretty straightforward, but it doesn't work. The strange part is that I thought
SELECT id FROM my_locks WHERE locked = false AND id = '<NAME>' FOR UPDATE
should only acquire a lock on the target row if that row's locked is false. But in reality it doesn't work that way. Somehow, even when no locked = false row exists, it acquires the lock anyway. As a result, I have a deadlock issue. It looks like this:
Release is waiting for SELECT FOR UPDATE, and SELECT FOR UPDATE is looping forever while holding a lock for no reason.
To reproduce the issue, I wrote a simple test here
https://gist.github.com/victorlin/d9119dd9dfdd5ac3836b
You can run it with psycopg2 and pytest. Remember to change the database settings, then run:
pip install pytest psycopg2
py.test -sv test_lock.py

The test case plays out like this:
Thread-1 runs the SELECT and acquires the record lock.
Thread-2 runs the SELECT and enters the lock's wait queue.
Thread-1 runs the UPDATE / COMMIT and releases the lock.
Thread-2 acquires the lock. Detecting that the record has changed since its SELECT, it rechecks the data against its WHERE condition. The check fails, and the row is filtered out of the result set, but the lock is still held.
This behaviour is mentioned in the FOR UPDATE documentation:
...rows that satisfied the query conditions as of the query snapshot will be locked, although they will not be returned if they were updated after the snapshot and no longer satisfy the query conditions.
This can have some unpleasant consequences, but a superfluous lock isn't that bad, all things considered.
Probably the simplest workaround is to limit the lock duration by committing after every iteration of acquire. There are various other ways to prevent it from holding this lock (e.g. SELECT ... NOWAIT, running in a REPEATABLE READ or SERIALIZABLE isolation level, SELECT ... SKIP LOCKED in Postgres 9.5).
I think the cleanest implementation using this retry-loop approach would be to skip the SELECT altogether, and just run an UPDATE ... WHERE locked = false, committing each time. You can tell if you acquired the lock by checking cur.rowcount after calling cur.execute(). If there is additional information you need to pull from the lock record, you can use an UPDATE ... RETURNING statement.
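A minimal psycopg2 sketch of that retry loop, assuming the my_locks table from the question; the connection string, sleep interval and function names are illustrative, not code from the original post:

import time
import psycopg2

conn = psycopg2.connect("dbname=test")  # illustrative connection string

def acquire(name):
    with conn.cursor() as cur:
        while True:
            # Flip the flag in a single statement; no separate SELECT FOR UPDATE needed.
            cur.execute(
                "UPDATE my_locks SET locked = true"
                " WHERE id = %s AND locked = false"
                " RETURNING id",
                (name,))
            got_it = cur.rowcount == 1  # 1 row updated means we now own the lock
            conn.commit()               # commit every attempt so no row lock is held while looping
            if got_it:
                return
            time.sleep(0.1)             # back off briefly before retrying

def release(name):
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE my_locks SET locked = false WHERE id = %s AND locked = true",
            (name,))
        conn.commit()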
But I would have to agree with @Kevin, and say that you'd probably be better off leveraging Postgres' built-in locking support than trying to reinvent it. It would solve a lot of problems for you, e.g.:
Deadlocks are automatically detected
Waiting processes are put to sleep, rather than having to poll the server
Lock requests are queued, preventing starvation
Locks would (generally) not outlive a failed process
The easiest way might be to implement acquire as SELECT FROM my_locks FOR UPDATE, release simply as COMMIT, and let the processes contend for the row lock. If you need more flexibility (e.g. blocking/non-blocking calls, transaction/session/custom scope), advisory locks should prove useful.
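A rough psycopg2 sketch of both ideas, reusing the my_locks table from the question; the connection string, key scheme and function names are illustrative:

import psycopg2

conn = psycopg2.connect("dbname=test")  # illustrative connection string

def acquire(name):
    # Blocks until this transaction holds the row lock; Postgres queues waiters
    # and detects deadlocks on its own.
    conn.cursor().execute(
        "SELECT id FROM my_locks WHERE id = %s FOR UPDATE", (name,))

def release():
    conn.commit()  # ending the transaction releases the row lock

# Advisory-lock variant: session-scoped and keyed by a 64-bit integer, so no
# table row is needed at all. With psycopg2 you would typically set
# conn.autocommit = True so these calls don't leave a transaction open.
def acquire_advisory(key):
    conn.cursor().execute("SELECT pg_advisory_lock(%s)", (key,))

def release_advisory(key):
    conn.cursor().execute("SELECT pg_advisory_unlock(%s)", (key,))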

PostgreSQL normally aborts transactions which deadlock:
The use of explicit locking can increase the likelihood of deadlocks, wherein two (or more) transactions each hold locks that the other wants. For example, if transaction 1 acquires an exclusive lock on table A and then tries to acquire an exclusive lock on table B, while transaction 2 has already exclusive-locked table B and now wants an exclusive lock on table A, then neither one can proceed. PostgreSQL automatically detects deadlock situations and resolves them by aborting one of the transactions involved, allowing the other(s) to complete. (Exactly which transaction will be aborted is difficult to predict and should not be relied upon.)
Looking at your Python code, and at the screenshot you showed, it appears to me that:
Thread 3 is holding the locked=true lock, and is waiting to acquire a row lock.
Thread 1 is also waiting for a row lock, and also the locked=true lock.
The only logical conclusion is that Thread 2 is somehow holding the row lock, and waiting for the locked=true lock (note the short time on that query; it is looping, not blocking).
Since Postgres is not aware of the locked=true lock, it is unable to abort transactions to prevent deadlock in this case.
It's not immediately clear to me how T2 acquired the row lock, since all the information I've looked at says it can't do that:
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). Within a REPEATABLE READ or SERIALIZABLE transaction, however, an error will be thrown if a row to be locked has changed since the transaction started. For further discussion see Section 13.4.
I was not able to find any evidence of PostgreSQL "magically" upgrading row locks to table locks or anything similar.
But what you're doing is not obviously safe, either. You're acquiring lock A (the row lock), then acquiring lock B (the explicit locked=true lock), then releasing and re-acquiring A, before finally releasing B and A in that order. This does not properly observe a lock hierarchy since we try both to acquire A while holding B and vice-versa. But OTOH, acquiring B while holding A should not fail (I think), so I'm still not sure this is outright wrong.
Quite frankly, it's my opinion that you'd be better off just using the LOCK TABLE statement on an empty table. Postgres is aware of these locks and will detect deadlocks for you. It also saves you the trouble of the SELECT FOR UPDATE finagling.
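A small sketch of that approach, assuming psycopg2 and a hypothetical empty table called my_lock_table that exists only to be locked:

import psycopg2

conn = psycopg2.connect("dbname=test")  # illustrative connection string
cur = conn.cursor()

# One-time setup: an empty table used purely as a lock handle.
cur.execute("CREATE TABLE IF NOT EXISTS my_lock_table ()")
conn.commit()

def acquire():
    # Blocks until no other transaction holds the lock; because Postgres tracks
    # it, genuine deadlocks are detected and one transaction is aborted.
    cur.execute("LOCK TABLE my_lock_table IN ACCESS EXCLUSIVE MODE")

def release():
    conn.commit()  # table-level locks are released at transaction end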

Also, you should add locked = true in the release code:
BEGIN
UPDATE my_locks SET locked = false WHERE id = '<NAME>' AND locked = true
COMMIT
If not, you are updating the record regardless of its locked state (in your case, even when locked = false), which increases the odds of causing a deadlock.

Related

Understanding locks and query status in Snowflake (multiple updates to a single table)

While using the Python connector for Snowflake with queries of the form
UPDATE X.TABLEY SET STATUS = %(status)s, STATUS_DETAILS = %(status_details)s WHERE ID = %(entry_id)s
sometimes I get the following message:
(snowflake.connector.errors.ProgrammingError) 000625 (57014): Statement 'X' has locked table 'XX' in transaction 1588294931722 and this lock has not yet been released.
and soon after that
Your statement 'X' was aborted because the number of waiters for this lock exceeds the 20 statements limit
This usually happens when multiple queries are trying to update a single table. What I don't understand is that when I look at the query history in Snowflake, it says the query finished successfully (Succeeded status), but in reality the update never happened, because the table did not change.
So according to https://community.snowflake.com/s/article/how-to-resolve-blocked-queries I used
SELECT SYSTEM$ABORT_TRANSACTION(<transaction_id>);
to release the lock, but still nothing happened, and even with the Succeeded status the query seems not to have executed at all. So my question is: how does this really work, and how can a lock be released without losing the execution of the query? (Also, what happens to the other 20+ queries that are queued because of the lock? Sometimes it seems that when the lock is released, the next one takes the lock and has to be aborted as well.)
I would appreciate it if you could help me. Thanks!
Not sure if Sergio got an answer to this. The problem in this case is not with the table. Based on my experience with Snowflake, below is my understanding.
In Snowflake, every table operation also involves a change in the meta table which keeps track of micro-partitions and their min and max values. This meta table supports only 20 concurrent DML statements by default. So if a table is continuously getting updated and hit at the same partition, there is a chance that this limit will be exceeded. In this case, we should look at redesigning the table update/insert logic. In one of our use cases, we increased the limit to 50 after speaking to the Snowflake support team.
UPDATE, DELETE, and MERGE cannot run concurrently on a single table; they will be serialized, as only one can take a lock on a table at a time. Others will queue up in the "blocked" state until it is their turn to take the lock. There is a limit on the number of queries that can be waiting on a single lock.
If you see an update finish successfully but don't see the updated data in the table, then you are most likely not COMMITting your transactions. Make sure you run COMMIT after an update so that the new data is committed to the table and the lock is released.
Alternatively, you can make sure AUTOCOMMIT is enabled so that DML will commit automatically after completion. You can enable it with ALTER SESSION SET AUTOCOMMIT=TRUE; in any sessions that are going to run an UPDATE.
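A minimal sketch with the Snowflake Python connector, reusing the UPDATE from the question; the connection parameters and bind values are placeholders, not real ones:

import snowflake.connector

# Placeholder credentials; fill in your own account details.
conn = snowflake.connector.connect(user="me", password="secret", account="myaccount")
cur = conn.cursor()

# Option 1: let every DML statement commit on its own.
cur.execute("ALTER SESSION SET AUTOCOMMIT = TRUE")

# Option 2: commit explicitly, so the table lock is released as soon as possible.
cur.execute(
    "UPDATE X.TABLEY SET STATUS = %(status)s,"
    " STATUS_DETAILS = %(status_details)s WHERE ID = %(entry_id)s",
    {"status": "DONE", "status_details": "ok", "entry_id": 42},
)
conn.commit()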

SQL queries causing deadlock errors

There is an update query which causes deadlock errors, and I don't know why. The (rowlock, updlock) hint is already used in the update query, yet it still gives deadlock errors.
Sample query:
update table a with (rowlock, updlock)
set a.column1 = value
This same query is used in several stored procedures which may be called simultaneously. But since a lock is specified, should it still cause a deadlock?
Deadlock occurs when two or more tasks permanently block each other by each task having a lock on a resource which the other tasks are trying to lock.
Since you are explicitly specifying with (rowlock, updlock), a lock is added for the transaction. When one transaction is being executed, it places a lock on the table. If another transaction wants to access the same record, it will have to wait until the previous transaction finishes and removes the lock before it can proceed.
Using WITH (NOLOCK) lets you bypass the locks, but in the case of an update that would be too risky, especially since, as you mentioned, several updates can be executed simultaneously.
In your case, it seems that the locks are the culprit. However, locks are not the only source of deadlocks; memory issues or thread execution can also be involved. This LINK might help you identify the real cause of your deadlock.

How to remove deadlocks in SQL Server 2005?

First of all, I would like to know what the actual root cause of deadlocks in SQL Server 2005 is. Is it when two processes access the same row in a table?
Anyways, consider two tables _Table_Now_ and _Table_History_, both having the same structure.
Suppose there is one column called NAME.
So when one process tries to UPDATE a record with NAME='BLUE' in _Table_Now_, it first needs to put the present row with NAME='BLUE' into _Table_History_, then update
_Table_Now_, and also delete the previously present row from _Table_History_.
Deadlock occurs while deleting. I do not understand why?
Please guide me!
A deadlock basically means that process A is dependent on process B and process B is dependent on process A, so A will only start/continue when B finishes and B will only start/continue when A finishes.
What you may be experiencing is a table (or row) lock: SQL Server locks the row before updating the table to make sure no other process tries to access that row while it is doing the update.
Can you be more specific about how you are doing the insert/update/delete? You shouldn't have deadlocks in this scenario.
FYI, don't use WITH (NOLOCK). Yes, it will prevent locking, but it does so by telling SQL Server to read uncommitted data, and it can end up causing data inconsistencies.
Deadlock occurs when Process A is waiting for Process B to release resources and Process B is waiting for Process A to release resources.
If I understand the order of Updates correctly, it is this:
1. Read a row in Table_Now
2. Update a row in Table_History
3. Update a row in Table_Now
4. Delete a row in Table_History.
This could be a risky order if you are using transactions or locks incorrectly.
To avoid deadlocks, for each process you should execute:
1. Begin Transaction (Preferably table lock)
2. Perform all the DB operations
3. Commit the transaction (or Rollback in case any problem occurs while DB update)
This will ensure each process locks both tables, performs all the operations and then exits.
If you are already using transactions, what scope and level are you using? If not, introduce transactions. It should solve the problem.
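A rough illustration of that pattern, assuming pyodbc and the table names from the question; the exact statements and predicates are illustrative, since the question does not show its SQL:

import pyodbc

conn = pyodbc.connect("DSN=sqlserver2005")  # illustrative DSN
conn.autocommit = False                     # work inside an explicit transaction
cur = conn.cursor()

try:
    # The steps from the list above run inside one transaction, in a fixed order.
    cur.execute("INSERT INTO Table_History SELECT * FROM Table_Now WHERE NAME = ?", "BLUE")
    cur.execute("UPDATE Table_Now SET NAME = ? WHERE NAME = ?", "GREEN", "BLUE")
    cur.execute("DELETE FROM Table_History WHERE NAME = ?", "BLUE")  # predicate illustrative
    conn.commit()        # release all locks together
except pyodbc.Error:
    conn.rollback()      # undo every step if any of them fails
    raise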

Why does a deadlock occur?

I use a small transaction which consists of two simple queries: select and update:
SELECT * FROM XYZ WHERE ABC = DEF
and
UPDATE XYZ SET ABC = 123
WHERE ABC = DEF
It is quite often the situation that the transaction is started by two threads, and depending on the isolation level (RepeatableRead, Serializable), a deadlock occurs. Both transactions try to read and update exactly the same row.
I'm wondering why it is happening. What is the order of queries which leads to deadlock? I've read a bit about lock (shared, exclusive) and how long locks last for each isolation level, but I still don't fully understand...
I've even prepared a simple test which always results in a deadlock. I've looked at the results of the test in SSMS and SQL Server Profiler. I started the first query and then immediately the second.
First query:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
SELECT ...
WAITFOR DELAY '00:00:04'
UPDATE ...
COMMIT
Second query:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
SELECT ...
UPDATE ...
COMMIT
Now I'm not able to show you detailed logs, but it looks more or less like this (I've very likely missed Lock:deadlock etc. somewhere):
(1) SQL:BatchStarting: First query
(2) SQL:BatchStarting: Second query
(3) Lock:timeout for second query
(4) Lock:timeout for first query
(5) Deadlock graph
If I understand locks well, in (1) the first query takes a shared lock (to execute the SELECT), then goes to sleep and keeps the shared lock until the end of the transaction. In (2) the second query also takes a shared lock (SELECT) but cannot take an exclusive lock (UPDATE) while there are shared locks on the same row, which results in Lock:timeout. But I can't explain why the timeout for the second query occurs. Probably I don't understand the whole process well. Can anybody give a good explanation?
I haven't noticed deadlocks using ReadCommitted but I'm afraid they may occur.
What solution do you recommend?
A deadlock occurs when two or more tasks permanently block each other by each task having a lock on a resource which the other tasks are trying to lock
http://msdn.microsoft.com/en-us/library/ms177433.aspx
"But I can't explain why timeout for second query occurs."
Because the first query holds a shared lock. Then the update in the first query also tries to get the exclusive lock, which makes it sleep. So the first and second queries are both sleeping, waiting for the other to wake up - and this is a deadlock, which results in the timeout :-)
In MySQL this works better: the deadlock is detected immediately and one of the transactions is rolled back (you don't need to wait for the timeout :-)).
Also, in MySQL, you can do the following to prevent the deadlock:
select ... for update
which will put a write lock (i.e. an exclusive lock) right from the beginning of the transaction, and this way you avoid the deadlock situation! Perhaps you can do something similar in your database engine.
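If the engine here is SQL Server (the question mentions SSMS and SQL Server Profiler), the closest equivalent is an UPDLOCK table hint on the SELECT, which takes update locks up front instead of shared locks. A hedged sketch, assuming pyodbc and the XYZ table from the question, with illustrative values:

import pyodbc

conn = pyodbc.connect("DSN=sqlserver")  # illustrative DSN
conn.autocommit = False
cur = conn.cursor()

# UPDLOCK makes the SELECT acquire update locks, so two transactions cannot both
# hold shared locks on the row and then deadlock while upgrading to exclusive locks.
cur.execute("SELECT * FROM XYZ WITH (UPDLOCK) WHERE ABC = ?", "DEF")
row = cur.fetchone()

cur.execute("UPDATE XYZ SET ABC = ? WHERE ABC = ?", 123, "DEF")
conn.commit()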
For MSSQL there is a mechanism to prevent deadlocks. What you need here is called the WITH NOLOCK hint.
In 99.99% of the cases of SELECT statements it's usable and there is no need to bundle the SELECT with the UPDATE. There is also no need to put a SELECT into a transaction. The only exception is when dirty reads are not allowed.
Changing your queries to this form would solve all your issues:
SELECT ...
FROM yourtable WITH (NOLOCK)
WHERE ...
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
UPDATE ...
COMMIT
It has been a long time since I last dealt with this, but I believe that the select statement creates a read lock, which only prevents the data from being changed; hence multiple queries can hold and share a read lock on the same data. The shared read lock is for read consistency, that is, if you read the same row multiple times in your transaction, read consistency should mean that you always get the same result.
The update statement requires an exclusive lock, and hence the update statement has to wait for the read lock to be released.
Neither of the two transactions will release its locks, so the transactions fail.
Different database implementations have different strategies for how to deal with this: Sybase and MS SQL Server use lock escalation with a timeout (escalating from a read lock to a write lock); Oracle, I believe, (at some point) implemented read consistency through use of the rollback log; MySQL has yet another strategy.

Update Zero Rows and then Committing?

Which procedure is more performant for an update which affects zero rows?
UPDATE table SET column = value WHERE id = number;
IF SQL%Rowcount > 0 THEN
COMMIT;
END IF;
or
UPDATE table SET column = value WHERE id = number;
COMMIT;
In other words, if an UPDATE affects zero rows and a COMMIT is issued, am I incurring any added expense at all?
I have a system which is being hampered by log file sync waits... and I'm wondering whether issuing a COMMIT against a transaction which affects zero rows will write that statement to the log or not and thus cause more contention on LGWR.
COMMIT does force a log file sync, so the system will indeed have to wait.
However, ROLLBACK does too, and at some point one or the other will have to happen.
So if you issue neither COMMIT nor ROLLBACK, you are just staying with an open transaction which sooner or later will cause a log sync wait.
Probably you want to batch your UPDATE operations rather than waiting for the first successful update and committing it.
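A sketch of that batching idea, assuming the python-oracledb driver and illustrative table, column and bind names:

import oracledb

# Illustrative connection details.
conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/orclpdb1")
cur = conn.cursor()

# Apply many updates in one round trip and commit once, instead of committing
# after each individual statement; this keeps the number of log file syncs down.
rows = [(10, 1), (20, 2), (30, 3)]  # (new_value, id) pairs, illustrative data
cur.executemany("UPDATE my_table SET my_column = :1 WHERE id = :2", rows)
conn.commit()  # a single commit covers the whole batch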
There are risks in this. Technically, while the UPDATE may affect zero rows, it can still fire statement-level BEFORE UPDATE and AFTER UPDATE triggers on the table (though not row-level ones). Those triggers could potentially "do something" that requires a commit/rollback.
Safer to check to see if LOCAL_TRANSACTION_ID is set.
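A sketch of that check, again assuming python-oracledb and illustrative names: DBMS_TRANSACTION.LOCAL_TRANSACTION_ID returns NULL when no transaction has actually been started, so the COMMIT (and its log file sync) can be skipped in that case:

import oracledb

conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/orclpdb1")
cur = conn.cursor()

cur.execute("UPDATE my_table SET my_column = :1 WHERE id = :2", [10, 1])

# Even if zero rows were updated, a statement-level trigger may have done work
# that opened a transaction; LOCAL_TRANSACTION_ID tells us whether one exists.
cur.execute("SELECT dbms_transaction.local_transaction_id FROM dual")
txn_id = cur.fetchone()[0]
if txn_id is not None:
    conn.commit()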
There are any number of reasons which can underlie waits for log file sync. It seems unlikely that the main culprit is committing SQL statements which have updated zero rows. It is true that issuing too many commits can be the cause of this problem. For instance, if the application is set up to commit after every statement (e.g. by using AUTOCOMMIT=TRUE) instead of designing proper transactions. If this is the cause then there is not much you can do, short of a major rewrite of the application.
If you want to delve deeper into the root causes of your problem I recommend you read this exhaustive (and exhausting) article by Pythian's Riyaj Shamsudeen on Tuning ‘log file sync’ Event Waits.