Preventing deadlocks in SQL Server

Preventing deadlocks in SQL Server - sql

I have an application connected to a SQL Server 2014 database that combines several rows into one. There are no other connections to this database while the application is running.
First, select a chunk of rows within a specific time span. This query uses a non-clustered seek (TIME column) merged with a clustered lookup.
select ...
from FOO
where TIME >= #from and TIME < #to and ...
Then, we process these rows in c# and write changes as a single update and multiple deletes, this happens many times per chunk. These also use non-clustered index seeks.
begin tran
update FOO set ...
where NON_CLUSTERED_ID = #id
delete FOO where NON_CLUSTERED_ID in (#id1, #id2, #id3, ...)
commit
I am getting deadlocks when running this with multiple parallel chunks. I tried using ROWLOCK for the update and delete but that caused even more deadlocks than before for some reason, even though there are no overlaps between chunks.
Then I tried TABLOCKX, HOLDLOCK on the update, but that means I can't perform my select in parallel so I'm losing the advantages of parallelism.
Any idea how I can avoid deadlocks but still process multiple parallel chunks?
Would it be safe to use NOLOCK on my select in this case, given there is no row overlap between chunks? Then TABLOCKX, HOLDLOCK would only block the update and delete, correct?
Or should I just accept that deadlocks will happen and retry the query in my application?
UPDATE (additional information): All deadlocks so far have happened in the update and delete phase, none in the select. I'll try to get some deadlock logs up if I can't get this solved today (the correct trace flags weren't enabled before).
UPDATE: These are the two arrangements of deadlocks that occur with ROWLOCK, they both refer only to the delete statement and the non-clustered index it uses. I'm not sure if these are the same as the deadlocks that occur without any table hints as I wasn't able to reproduce any of those.
Ask if there's anything else needed from the .xdl, I'm a bit weary of attaching the whole thing.

The general advice regarding deadlocks: make sure you do everything in the same order, i.e. acquire locks in the same order, for different processes.
You can find the same advice in this technical article on microsoft.com regarding Minimizing Deadlocks. There's a good reason it is listed first.
Access objects in the same order.
Avoid user interaction in transactions.
Keep transactions short and in one batch.
Use a lower isolation level.
Use a row versioning-based isolation level.
Set READ_COMMITTED_SNAPSHOT database option ON to enable read-committed transactions to use row versioning.
Use snapshot isolation.
Use bound connections.
Update after question from Cato:
How would acquiring locks in the same order apply here? Have you got any advice on how he would change his SQL to do that?
Deadlocks are always the same, no matter what environment: two processes (say A & B) acquire multiple locks (say X & Y) in a different order so that A is waiting for Y and B is waiting for X while A is holding X and B is holding Y.
It applies here because DELETE and UPDATE statements implicitely acquire locks on the rows or index range or table (depending on what the engine deems appropriate).
You should analyze your process and see if there are scenarios where locks could be acquired in a different order. If that doesn't reveal anything, you can analyze deadlocks using the SQL Server Profiler:
To trace deadlock events, add the Deadlock graph event class to a trace. This event class populates the TextData data column in the trace with XML data about the process and objects that are involved in the deadlock. SQL Server Profiler can extract the XML document to a deadlock XML (.xdl) file which you can view later in SQL Server Management Studio. You can configure SQL Server Profiler to extract Deadlock graph events to a single file that contains all Deadlock graph events, or to separate files.

I'd use sp_getapplock in the updating transaction to prevent multiple instances of this code running in parallel. This will not block the selecting statement as table locking hints do.
You still should program the retrying logic, because it may take a while to acquire the lock, longer than the timeout parameter.
This is how the updating transaction can be wrapped into sp_getapplock.
BEGIN TRANSACTION;
BEGIN TRY
DECLARE #VarLockResult int;
EXEC #VarLockResult = sp_getapplock
#Resource = 'some_unique_name_app_lock',
#LockMode = 'Exclusive',
#LockOwner = 'Transaction',
#LockTimeout = 60000,
#DbPrincipal = 'public';
IF #VarLockResult >= 0
BEGIN
-- Acquired the lock
update FOO set ...
where NON_CLUSTERED_ID = #id
delete FOO where NON_CLUSTERED_ID in (#id1, #id2, #id3, ...)
END ELSE BEGIN
-- return some error code, so that the caller could retry
END;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION;
-- handle the error
END CATCH;
The selecting statement doesn't need any changes.
I would recommend against NOLOCK, even though you say that IDs in chunks do not overlap. With this hint the SELECT query can skip some pages that are being changed, it can read some pages twice. It is unlikely that such behavior can be tolerated.

Kindly use get applock in such format in code.  The stored procedure sp_getapplock puts the lock on the application resource .
EXEC Sp_getapplock
#Resource = 'storeprocedurename',
#LockMode = 'Exclusive',
#LockOwner = 'Transaction',
#LockTimeout = 25000
It is very helpful. Kindly increase LockTimeout to reduce deadlock

Related

Should I use sp_getapplock to prevent multiple instances of a stored procedure that conditionally inserts?

Hear me out! I know this use case sounds suspect, but...
I have a stored procedure which checks a table (effectively a cache) for data for a given requested ID. If it doesn't find any data for that ID, or deems it out of date, it executes a second stored procedure which will pull data from a separate database (using dynamic SQL, source DB name is based on the requested ID) and insert it into the local table. It then selects from this table.
If the data is in the table, everything returns quickly (ms), but if it needs to be brought in from the other database, it takes about 10 seconds. We're seeing race conditions where two concurrent instances check the local cache, see something is missing, and queue up sequential ingestions of the remote data into the cache. To avoid double-insertion, the cache-populating procedure will clear whatever is already there for this id, but this just means the first instance of the procedure can selecting no rows because the second instance deleted the just-inserted records before re-inserting them itself.
I think I want to put a lock around the entire procedure (checking the cache, potentially populating the cache, selecting from the cache) - although I'm open to other solutions. I think the overall caching approach has to remain on-demand though, the remote databases come and go by the hundreds, and we only want to cache the ones actually requested by reporting as-needed.
BEGIN TRANSACTION;
BEGIN TRY
-- Take out a lock intended to prevent anyone else modifying the cache while we're reading and potentially modifying it
EXEC sp_getapplock #Resource = '[private].[cache_entries]', #LockOwner='Transaction', #LockMode = 'Exclusive', #LockTimeout = 120000;
-- Invoke a stored procedure that ingests any required data that is not already cached
EXEC [private].populate_cache #required_dbs
-- CALCULATIONS
-- ... SELECT FROM [private].cache_entries
COMMIT TRANSACTION; -- Free the lock
END TRY
BEGIN CATCH --Ensure we release our lock on failure
ROLLBACK TRANSACTION;
THROW
END CATCH;

The alternative to sp_getapplock is to use locking hints with your transaction. Both are reasonable approaches. Locking hints can be complex, but they protect the target object itself rather than a single code path. So sometimes necessary. sp_getapplock is simple (with Transaction as owner), and reliable.

You can do this without sp_getapplock, which tends to inhibit concurrency a lot.
The way to do this is to continue do your checks within a transaction, but to apply a HOLDLOCK hint, as well as a UPDLOCK hint.
HOLDLOCK aka the SERIALIZABLE isolation level, will place a lock not only on the ID you specify, but even on the absence of such data, in other words it will prevent anyone else inserting into that ID.
You must use both these hints, as well as have an index that matches that SELECT, otherwise you could run into major blocking and deadlocking problems due to full table scans.
Also, you don't need a CATCH and ROLLBACK. Just use SET XACT_ABORT ON; which ensures a rollback in any event of an error.
SET XACT_ABORT ON; -- always have this set
BEGIN TRANSACTION;
DECLARE #SomeData nvarchar(100) = (
SELECT ce.SomeColumn
FROM [private].cache_entries ce WITH (HOLDLOCK, UPDLOCK)
WHERE ce.SomeCondition = 1
);
IF #SomeData IS NULL
BEGIN
-- Invoke a stored procedure that ingests any required data that is not already cached
EXEC [private].populate_cache #required_dbs
END
-- CALCULATIONS
-- ... SELECT FROM [private].cache_entries
COMMIT TRANSACTION; -- Free the lock

Use T-SQL Transaction for batch of delete statements?

I have a stored procedure that deletes records from multiple tables.
I wish for either all of the delete statements to complete successfully, or none. The actual purpose here is to wipe all data related to a particular user.
Note that none of this data is related in any way to any other data. E.g. a user's data is not referenced in any way by another users data. However it is possible to have concurrent client sources accessing one user's data simultaneously. I don't know if this is relevant
So I've wrapped it in BEGIN TRANSACTION ... COMMIT TRANSACTION
like so:
CREATE PROCEDURE [dbo].[spDeleteData]
#MyID AS INT
AS
BEGIN TRANSACTION
DELETE FROM [Table1] WHERE myId = #MyID;
DELETE FROM [Table2] WHERE myId = #MyID;
....
COMMIT TRANSACTION
RETURN 0
My question here is what are the implications of wrapping multiple DELETE calls in a transaction? Will it create possible deadlock scenarios, or hurt performance in some way?
From what I am reading, using TRANSACTION ISOLATION LEVEL only applies to read operations, is this true?

What you are guaranteeing is that either all the rows that match the conditions in both tables are successfully deleted or none of the rows are deleted (i.e. if there is a problem the deletes are rolled back.) There are more locks and they are kept for a longer period but if it fails you don't have to manually recreate the rows the deletes are undone for you automatically. You probably want to add the statement:
set xact_abort on
at the beginning of the transaction and to wrap the whole thing in a begin try/begin catch statement.
Please see sommarskog.se/error-handling-I.html#XACT_ABORT for an execellent discussion on this statement and on error handling for TSQL.

Why deadlock occurs?

I use a small transaction which consists of two simple queries: select and update:
SELECT * FROM XYZ WHERE ABC = DEF
and
UPDATE XYZ SET ABC = 123
WHERE ABC = DEF
It is quite often situation when the transaction is started by two threads, and depending on Isolation Level deadlock occurs (RepeatableRead, Serialization). Both transactions try to read and update exactly the same row.
I'm wondering why it is happening. What is the order of queries which leads to deadlock? I've read a bit about lock (shared, exclusive) and how long locks last for each isolation level, but I still don't fully understand...
I've even prepared a simple test which always result in deadlock. I've looked at results of the test in SSMS and SQL Server Profiler. I started first query and then immediately the second.
First query:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
SELECT ...
WAITFOR DELAY '00:00:04'
UPDATE ...
COMMIT
Second query:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
SELECT ...
UPDATE ...
COMMIT
Now I'm not able to show you detailed logs, but it looks less or more like this (I've very likely missed Lock:deadlock etc. somewhere):
(1) SQL:BatchStarting: First query
(2) SQL:BatchStarting: Second query
(3) Lock:timeout for second query
(4) Lock:timeout for first query
(5) Deadlock graph
If I understand locks well, in (1) first query takes a shared lock (to execute SELECT), then goes to sleep and keeps the shared lock until the end of transaction. In (2) second query also takes shared lock (SELECT) but cannot take exclusive lock (UPDATE) while there are shared locks on the same row, which results in Lock:timeout. But I can't explain why timeout for second query occurs. Probably I don't understand the whole process well. Can anybody give a good explanation?
I haven't noticed deadlocks using ReadCommitted but I'm afraid they may occur.
What solution do you recommend?

A deadlock occurs when two or more tasks permanently block each other by each task having a lock on a resource which the other tasks are trying to lock
http://msdn.microsoft.com/en-us/library/ms177433.aspx

"But I can't explain why timeout for second query occurs."
Because the first query holds shared lock. Then the update in the first query also tries to get the exclusive lock, which makes him sleep. So the first and second query are both sleeping waiting for the other to wake up - and this is a deadlock which results in timeout :-)
In mysql it works better - the deadlock is detected immediatelly and one of the transactions is rolled back (you need not to wait for timeout :-)).
Also, in mysql, you can do the following to prevent deadlock:
select ... for update
which will put a write-lock (i.e. exclusive lock) just from the beginning of the transaction, and this way you avoid the deadlock situation! Perhaps you can do something similar in your database engine.

For MSSQL there is a mechanism to prevent deadlocks. What you need here is called the WITH NOLOCK hint.
In 99.99% of the cases of SELECT statements it's usable and there is no need to bundle the SELECT with the UPDATE. There is also no need to put a SELECT into a transaction. The only exception is when dirty reads are not allowed.
Changing your queries to this form would solve all your issues:
SELECT ...
FROM yourtable WITH (NOLOCK)
WHERE ...
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
UPDATE ...
COMMIT

It has been a long time since I last dealt with this, but I believe that the select statement creates a read-lock, which only prevents the data to be changed -- hence multiple queries can hold and share a read-lock on the same data. The shared-read-lock is for read consistency, that is if you multiple times in your transaction reads the same row, then read-consistency should mean that you should always get the same result.
The update statement requires an exclusive lock, and hence the update statement have to wait for the read-lock to be released.
None of the two transactions will release the locks, so the transactions fails.
Different databases implementations have different strategies for how to deal with this, with Sybase and MS-SQL-servers using lock escalation with timeout (escalate from read-to-write-lock) -- Oracle I believe (at some point) implemented read consistency though use of the roll-back-log, where MySQL have yet a different strategy.

Transactions within loop within stored procedure

I'm working on a procedure that will update a large number of items on a remote server, using records from a local database. Here's the pseudocode.
CREATE PROCEDURE UpdateRemoteServer
pre-processing
get cursor with ID's of records to be updated
while on cursor
process the item
No matter how much we optimize it, the routine is going to take a while, so we don't want the whole thing to be processed as a single transaction. The items are flagged after being processed, so it should be possible to pick up where we left off if the process is interrupted.
Wrapping the contents of the loop ("process the item") in a begin/commit tran does not do the trick... it seems that the whole statement
EXEC UpdateRemoteServer
is treated as a single transaction. How can I make each item process as a complete, separate transaction?
Note that I would love to run these as "non-transacted updates", but that option is only available (so far as I know) in 2008.

EXEC procedure does not create a transaction. A very simple test will show this:
create procedure usp_foo
as
begin
select ##trancount;
end
go
exec usp_foo;
The ##trancount inside usp_foo is 0, so the EXEC statement does not start an implicit transaction. If you have a transaction started when entering UpdateRemoteServer it means somebody started that transaction, I can't say who.
That being said, using remote servers and DTC to update items is going to perform quite bad. Is the other server also SQL Server 2005 at least? Maybe you can queue the requests to update and use messaging between the local and remote server and have the remote server perform the updates based on the info from the message. It would perform significantly better because both servers only have to deal with local transactions, and you get much better availability due to the loose coupling of queued messaging.
Updated
Cursors actually don't start transactions. The typical cursor based batch processing is usually based on cursors and batches updates into transactions of a certain size. This is fairly common for overnight jobs, as it allows for better performance (log flush throughput due to larger transaction size) and jobs can be interrupted and resumed w/o losing everithing. A simplified version of a batch processing loop is typically like this:
create procedure usp_UpdateRemoteServer
as
begin
declare #id int, #batch int;
set nocount on;
set #batch = 0;
declare crsFoo cursor
forward_only static read_only
for
select object_id
from sys.objects;
open crsFoo;
begin transaction
fetch next from crsFoo into #id ;
while ##fetch_status = 0
begin
-- process here
declare #transactionId int;
SELECT #transactionId = transaction_id
FROM sys.dm_tran_current_transaction;
print #transactionId;
set #batch = #batch + 1
if #batch > 10
begin
commit;
print ##trancount;
set #batch = 0;
begin transaction;
end
fetch next from crsFoo into #id ;
end
commit;
close crsFoo;
deallocate crsFoo;
end
go
exec usp_UpdateRemoteServer;
I ommitted the error handling part (begin try/begin catch) and the fancy ##fetch_status checks (static cursors actually don't need them anyway). This demo code shows that during the run there are several different transactions started (different transaction IDs). Many times batches also deploy transaction savepoints at each item processed so they can skip safely an item that causes an exception, using a pattern similar to the one in my link, but this does not apply to distributed transactions since savepoints and DTC don't mix.

EDIT: as pointed out by Remus below, cursors do NOT open a transaction by default; thus, this is not the answer to the question posed by the OP. I still think there are better options than a cursor, but that doesn't answer the question.
Stu
ORIGINAL ANSWER:
The specific symptom you describe is due to the fact that a cursor opens a transaction by default, therefore no matter how you work it, you're gonna have a long-running transaction as long as you are using a cursor (unless you avoid locks altogether, which is another bad idea).
As others are pointing out, cursors SUCK. You don't need them for 99.9999% of the time.
You really have two options if you want to do this at the database level with SQL Server:
Use SSIS to perform your operation; very fast, but may not be available to you in your particular flavor of SQL Server.
Because you're dealing with remote servers, and you're worried about connectivity, you may have to use a looping mechanism, so use WHILE instead and commit batches at a time. Although WHILE has many of the same issues as a cursor (looping still sucks in SQL), you avoid creating the outer transaction.
Stu

Are yo running this only from within sql server, or from an app? if so, get the list to be processed, then loop in the app to only process for the subsets as required.
Then the transaction should be handled by your app, and should only lock the items being updated/pages the items are in.

NEVER process one item at a time in a loop when you are doing transactional work. You can loop through records processing groups of them but never ever do one record at a time. Do set-based inserts instead and your performance will change from hours to minutes or even seconds. If you are using a cursor to insert update or delete and it isn't handling at least 1000 rowa in each statement (not one at atime) you are doing the wrong thing. Cursors are an extremely poor practice for such thing.

Just an idea ..
Only process a few items when the procedure is called (e.g. only get the TOP 10 items to process)
Process those
Hopefully, this will be the end of the transaction.
Then write a wrapper that calls the procedure as long as there is more work to do (either use a simple count(..) to see if there are items or have the procedure return true indicating that there is more work to do.
Don't know if this works, but maybe the idea is helpful.

Sybase ASE: "Your server command encountered a deadlock situation"

When running a stored procedure (from a .NET application) that does an INSERT and an UPDATE, I sometimes (but not that often, really) and randomly get this error:
ERROR [40001] [DataDirect][ODBC Sybase Wire Protocol driver][SQL Server]Your server command (family id #0, process id #46) encountered a deadlock situation. Please re-run your command.
How can I fix this?
Thanks.

Your best bet for solving you deadlocking issue is to set "print deadlock information" to on using
sp_configure "print deadlock information", 1
Everytime there is a deadlock this will print information about what processes were involved and what sql they were running at the time of the dead lock.
If your tables are using allpages locking. It can reduce deadlocks to switch to datarows or datapages locking. If you do this make sure to gather new stats on the tables and recreate indexes, views, stored procedures and triggers that access the tables that are changed. If you don't you will either get errors or not see the full benefits of the change depending on which ones are not recreated.

I have a set of long term apps which occasionally over lap table access and sybase will throw this error. If you check the sybase server log it will give you the complete info on why it happened. Like: The sql that was involved the two processes trying to get a lock. Usually one trying to read and the other doing something like a delete. In my case the apps are running in separate JVMs, so can't sychronize just have to clean up periodically.

Assuming that your tables are properly indexed (and that you are actually using those indexes - always worth checking via the query plan) you could try breaking the component parts of the SP down and wrapping them in separate transactions so that each unit of work is completed before the next one starts.
begin transaction
update mytable1
set mycolumn = "test"
where ID=1
commit transaction
go
begin transaction
insert into mytable2 (mycolumn) select mycolumn from mytable1 where ID = 1
commit transaction
go

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas