Avoid deadlock for concurrent delete - SQL

I have a table called Products with many columns. It is a temporary table used for reporting purposes. Data is written to it concurrently by multiple user requests, and separate stored procedures perform DML operations against it.
Table Structure:
CREATE TABLE Products (
instance uniqueidentifier,
inserted datetime,
col1,
col2,
...
)
The inserted column will be populated with GETDATE() to record when each row was inserted, and the instance column will contain a value from NEWID(). One user request has one unique id but may produce millions of rows. Below are the queries that execute concurrently and cause the deadlock. Please advise me.
Query 1:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
DELETE P
FROM Products P WITH (NOLOCK)
WHERE instance = 'XXXX-xxx-xxx-xx'
Query 2:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
DELETE P
FROM Products P WITH (NOLOCK)
WHERE inserted <= DATEADD(hh, -10, GETDATE())
Note: A nonclustered index exists on the instance column.
Please advise me which lock I can use in this scenario.
Note: I couldn't add a primary key because creating it takes too long when I insert 10 million rows into the table (that is for one transaction; there are 20 concurrent transactions).
The report should be generated quickly, and my procedure has 35 DML statements, around 15 of which are DELETE statements that filter on the instance column together with other columns (DELETE FROM table WHERE instance = @instance AND col1 = @col1).

(1) You should stop using read uncommitted isolation. Use at least read committed.
(2) There are a number of things you could try to avoid deadlocks, like ensuring your different transactions access database objects in the same order, etc. This would be worth a read - http://support.microsoft.com/kb/169960
(3) Disable lock escalation for your table (more granular locks so better concurrency, but more lock overhead):
ALTER TABLE Products SET (lock_escalation = disable)
(4) Disallow page locks, and allow row locks, on your indexes (this means you can't reorganize/defragment the indexes, but you can still rebuild them):
ALTER INDEX [<YourIndex>] ON Products SET (allow_row_locks = on, allow_page_locks = off)

First, there's no lock you can take on these delete statements other than an exclusive lock. Your isolation level and NOLOCK hints are being ignored by SQL Server:
(NOLOCK) only applies to the SELECT statement.
Two suggestions:
Change your non-clustered index on instance to a clustered index. BUT, only do this if you can change NEWID() to NEWSEQUENTIALID().
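A sketch of that change, assuming hypothetical index names and that the application can rely on a column default instead of generating NEWID() itself:
-- NEWSEQUENTIALID() can only be used as a column default, so the application
-- must stop supplying its own NEWID() values for instance (assumption).
ALTER TABLE Products ADD CONSTRAINT DF_Products_instance
    DEFAULT NEWSEQUENTIALID() FOR instance;
-- IX_Products_instance is a placeholder for the existing nonclustered index name.
DROP INDEX IX_Products_instance ON Products;
CREATE CLUSTERED INDEX CX_Products_instance ON Products (instance);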
Second, instead of performing a delete to remove records older than 10 hours, consider implementing rolling partitions. This removes the contention between the cleanup and your other delete operations.
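A minimal sketch of the sliding-window idea; all names and boundary dates are illustrative, and Products would need to be rebuilt on the partition scheme first:
-- Partition by the hour of [inserted] (boundary values are examples only)
CREATE PARTITION FUNCTION pf_ProductsByHour (datetime)
AS RANGE RIGHT FOR VALUES ('20240101 00:00', '20240101 01:00', '20240101 02:00');
CREATE PARTITION SCHEME ps_ProductsByHour
AS PARTITION pf_ProductsByHour ALL TO ([PRIMARY]);
-- Cleanup becomes a metadata-only operation: switch the oldest partition into
-- an empty staging table with identical structure, then truncate the staging table.
ALTER TABLE Products SWITCH PARTITION 1 TO Products_Staging;
TRUNCATE TABLE Products_Staging;
-- Merge away the emptied boundary so the window keeps sliding.
ALTER PARTITION FUNCTION pf_ProductsByHour() MERGE RANGE ('20240101 00:00');
This removes the time-based DELETE entirely, so it can no longer deadlock with the per-instance deletes.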


Efficiency UPDATE WHERE ... vs CASE ... WHEN

For a big table (~5 million rows), what is more efficient for updating all rows that match a condition (circa 1000 rows):
1. A simple update statement ?
UPDATE table SET last_modif_date = NOW() WHERE condition;
2. A CASE expression that applies the new value only where the condition matches?
UPDATE table SET
last_modif_date = (CASE WHEN CONDITION THEN NOW() ELSE last_modif_date END)
And why ?
Thanks in advance
I made a simple test, and the result was that the WHERE version is more efficient than the CASE version.
Here is the test I've made:
/*
-- Create a numbers (tally) table if you don't already have one
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Tally
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Tally ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (Number)
*/
-- Create a dates table with 10000 rows
SELECT Number As Id, DATEADD(DAY, Number, DATEADD(DAY, -5000, GETDATE())) As TheDate
INTO DatesTest
FROM Tally
-- Update using a where clause
UPDATE DatesTest
SET TheDate = GETDATE()
WHERE Id % 100 = 0
-- Drop and re-create the same dates table
DROP TABLE DatesTest
SELECT Number As Id, DATEADD(DAY, Number, DATEADD(DAY, -5000, GETDATE())) As TheDate
INTO DatesTest
FROM Tally
-- Update using case
UPDATE DatesTest
SET TheDate = CASE WHEN Id % 100 = 0 THEN GETDATE() ELSE TheDate END
As you can see from the execution plans, the WHERE version accounts for only 7% of the combined execution cost, while the CASE version accounts for 34%.
I'd say we have a winner.
A general best practice is not to update or delete without a where clause because people make mistakes and recovering from some of them can be a real pain.
Beyond that, even non-updating updates can have a significant impact on the database beyond just a poor performing query. Executing an update without a where can also lead to excessive locking and blocking of other concurrent operations.
I cannot summarize this article any better than Paul White did himself; here are some more things to consider:
SQL Server contains a number of optimisations to avoid unnecessary logging or page flushing when processing an UPDATE operation that will not result in any change to the persistent database.
Non-updating updates to a clustered table generally avoid extra logging and page flushing, unless a column that forms (part of) the cluster key is affected by the update operation.
If any part of the cluster key is ‘updated’ to the same value, the operation is logged as if data had changed, and the affected pages are marked as dirty in the buffer pool. This is a consequence of the conversion of the UPDATE to a delete-then-insert operation.
Heap tables behave the same as clustered tables, except they do not have a cluster key to cause any extra logging or page flushing. This remains the case even where a non-clustered primary key exists on the heap. Non-updating updates to a heap therefore generally avoid the extra logging and flushing (but see below).
Both heaps and clustered tables will suffer the extra logging and flushing for any row where a LOB column containing more than 8000 bytes of data is updated to the same value using any syntax other than ‘SET column_name = column_name’.
Simply enabling either type of row versioning isolation level on a database always causes the extra logging and flushing. This occurs regardless of the isolation level in effect for the update transaction.
~ The Impact of Non-Updating Updates - Paul White
The results are simply different: the UPDATE ... SET with CASE will update ALL the rows (and fire ON UPDATE triggers, and so on) even when the new value is in fact the previous value.
The UPDATE with a WHERE clause will update only the rows you really want to update, and will fire triggers only for those rows. Assuming you have an index, it is more efficient in most cases.
The only way to be sure is to analyse the actual execution plans of both queries and compare. In any case, using UPDATE ... SET with CASE instead of UPDATE ... WHERE is somewhat unnatural.
Usually a WHERE clause will be more efficient than a CASE expression if you have an index on the column used in the condition. A CASE expression makes the query effectively sequential: every row must be checked against the condition. A WHERE clause with an index on the filtered column will touch only a subset of the records and can use a faster search algorithm (in most cases).
The bottom line is that the actual condition you are using and the available indexes can make a big difference to the statement's performance. The best way to analyse a query is to look at its execution plan in SQL Server Management Studio.
You can read this post to learn more about execution plans:
https://www.simple-talk.com/sql/performance/execution-plan-basics/
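To make the difference concrete, here is a sketch with hypothetical names (Orders, Status):
-- With an index on the filtered column, the WHERE form can seek straight to the rows:
CREATE INDEX IX_Orders_Status ON Orders (Status);
UPDATE Orders SET last_modif_date = GETDATE()
WHERE Status = 'PENDING';
-- The CASE form has no WHERE clause, so every row must be read to evaluate the
-- expression and all rows pass through the update operator, index or not:
UPDATE Orders SET last_modif_date =
    CASE WHEN Status = 'PENDING' THEN GETDATE() ELSE last_modif_date END;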

SQL Server : how do I add a hint to every query against a table?

I have a transactional database. One of the tables (A) is almost empty. It has a unique indexed column (x) and no clustered index.
2 concurrent transactions:
begin trans
insert into A (x,y,z) values (1,2,3)
WAITFOR DELAY '00:00:02'; -- or manually run the first 2 lines only
select * from A where x=1; -- small tables produce a query plan of table scan here, and block against the transaction below.
commit
begin trans
insert into A (x,y,z) values (2,3,4)
WAITFOR DELAY '00:00:02';
-- on a table with 3 or less pages this hint is needed to not block against the above transaction
select * from A with(forceseek) -- force query plan of index seek + rid lookup
where x=2;
commit
My problem is that when the table has very few rows the 2 transactions can deadlock, because SQL Server generates a table scan for the select, even though there is an index, and both wait on the lock held by the newly inserted row of the other transaction.
When there are lots of rows in this table, the query plan changes to an index seek, and both happily complete.
When the table is small, the WITH(FORCESEEK) hint forces the correct query plan (5% more expensive for tiny tables).
Is it possible to provide a default hint for all queries on a table, so that they behave as if they had the FORCESEEK hint?
The deadlocking code above was generated by Hibernate; is it possible to have Hibernate emit the needed query hints?
We could make the tables pretend to be large enough that the query optimizer selects the index seek, using the undocumented features in UPDATE STATISTICS http://msdn.microsoft.com/en-AU/library/ms187348(v=sql.110).aspx . Can anyone see any downsides to making all tables with fewer than 1000 rows pretend they have 1000 rows over 10 pages?
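For reference, the undocumented options look like this (documented as unsupported on the linked UPDATE STATISTICS page):
-- Make table A appear to the optimizer as 1000 rows over 10 pages
UPDATE STATISTICS A WITH ROWCOUNT = 1000, PAGECOUNT = 10;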
You can create a Plan Guide (see the sketch at the end of this answer).
Or you can enable Read Committed Snapshot isolation level in the database.
Better still: make the index clustered.
For small tables that experience high update ratio, perhaps you can apply the advice from Using tables as Queues.
Can anyone see any downsides to making all tables with less than 1000 rows pretend they have 1000 rows over 10 pages?
If the table appears in another, more complex query (think joins), then the cardinality estimates may cascade wildly off and produce bad plans.
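As for the plan guide route, a minimal sketch, assuming the batch text shown in the question (the text must match what the application sends character-for-character or the guide will not be applied):
EXEC sp_create_plan_guide
    @name = N'PG_A_ForceSeek',
    @stmt = N'select * from A where x=1;',
    @type = N'SQL',
    @module_or_batch = NULL,  -- NULL means @stmt is treated as the whole batch
    @params = NULL,
    @hints = N'OPTION (TABLE HINT(A, FORCESEEK))';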
You could create a view that is a copy of the table but with the hint and have queries use the view instead:
create view A2 as
select * from A with(forceseek)
If you want to preserve the table name used by queries, rename the table to something else then name the view "A":
sp_rename 'A', 'A2';
create view A as
select * from A2 with(forceseek)
Just to add another option you may consider.
You can let SQL Server escalate to a lock on the entire table during updates by using
ALTER TABLE MyTable SET (LOCK_ESCALATION = TABLE)
This workaround is fine if you do not have too many updates that would queue up and slow performance.
It is table-wide, and no changes to other code are needed.

SELECT COUNT(SomeId) with INSERT later to same SomeId: Appropriate locking strategy?

I am using SQL Server 2012. I have a repeatable read transaction where I perform this query:
select count(SomeId)
from dbo.MyTable
where SomeId = @SomeId
SomeId is a column whose value may repeat in the table (think foreign key). However, SomeId is not a member of any index nor is it a foreign key.
Later in the transaction, I insert a record into dbo.MyTable with the same @SomeId, thus changing what the select count(SomeId) would return were I to run it again:
insert into dbo.MyTable (SomeId, ...)
values (@SomeId, ...)
Several threads in my application can execute this transaction at the same time. Because of this, I'm getting deadlocks on the insert statement. At first, I thought an updlock would be appropriate on the select statement, but I quickly realized that it wouldn't work because I'm not actually updating the rows selected by the select count(SomeId).
My question is this: is there any way to avoid a potentially expensive table lock? Is there any way to lock just rows that involve SomeId, even if they haven't been inserted yet (strange, I know)? I want to force other threads to wait while the original transaction completes its work but I don't want to lock rows unnecessarily.
EDIT
Here's what I'm trying to accomplish:
I only want to insert up to eight rows for a particular SomeId. There are several unrelated processes that can start one of these transactions, potentially at the same time. The select count detects whether there are already eight rows and causes the operation to fail for that process. If the count is less than eight, that same transaction performs additional work, then inserts a record at the end, thus effectively incrementing the count were the select count to be run again. I am hitting the deadlock on the insert statement.
If you have several processes that try to do the same thing and you don't want to have more records than some number, you will need to actually prevent those processes from running at the same time.
One way would be to read the counts with an exclusive lock:
select count(SomeId)
from dbo.MyTable with (xlock)
where SomeId = @SomeId
This way those records will be blocked until the transaction completes.
You should create an index on the SomeId column, though, as the locks will then most likely be held at the index-key level.
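For example (the index name is illustrative):
-- A narrow index so the SomeId rows can be located and locked at the key level
CREATE NONCLUSTERED INDEX IX_MyTable_SomeId ON dbo.MyTable (SomeId);
With the index in place, widening the hint to with (xlock, holdlock) takes key-range locks on the seek range, which also blocks other transactions from inserting new rows for the same SomeId until the transaction completes.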

Running large queries in the background MS SQL

I am using MS SQL Server 2008
I have a table which is constantly in use (data is always changing and being inserted into it);
it now contains ~70 million rows.
I am trying to run a simple query over the table with a stored procedure that will probably take a few days to complete.
I need the table to remain usable. I executed the stored procedure, and after a while every simple select-by-identity query I try to run against the table either does not respond or runs so long that I cancel it.
What should I do?
Here is what my stored procedure looks like:
SET NOCOUNT ON;
update SOMETABLE
set
[some_col] = dbo.ufn_SomeFunction(CONVERT(NVARCHAR(500), another_column))
WHERE
[some_col] = 243
Even if I try it with this added to the WHERE clause (with AND logic):
ID_COL > 57000000 and ID_COL < 60000000 and
it still doesn't work.
BTW, SomeFunction does some simple mathematical operations and looks up rows in another table that contains about 300k rows but never changes.
From my perspective your server has a serious performance problem. Even if we assume that none of the records in the query
select some_col from SOMETABLE with (nolock) where id_col between 57000000 and 57001000
was in memory, it shouldn't take 21 seconds to read the few pages sequentially from disk (your clustered index on the id_col should not be fragmented if it's an auto-identity and you didn't do something stupid like adding a "desc" to the index definition).
But if you can't/won't fix that, my advice would be to make the update in small batches, like 100-1000 records at a time (depending on how much time the lookup function consumes). One update/transaction should take no more than 30 seconds.
You see, each update keeps an exclusive lock on all the records it modified until the transaction completes. If you don't use an explicit transaction, each statement is executed in its own automatic transaction context, so the locks are released when the update statement is done.
But you can still run into deadlocks that way, depending on what the other processes do. If they modify more than one record at a time, too, or even if they gather and hold read locks on several rows, you can get deadlocks.
To avoid the deadlocks, your update statement needs to take a lock on all the records it will modify at once. The way to do this is to place the single update statement (with only the few rows limited by the id_col) in a serializable transaction like
IF @@TRANCOUNT > 0
    RAISERROR('Error: You are in a transaction context already', 16, 1);
SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
-- Insert Loop here to work "x" through the id range
BEGIN TRANSACTION
UPDATE SOMETABLE
SET [some_col] = dbo.ufn_SomeFunction(CONVERT(NVARCHAR(500), another_column))
WHERE [some_col] = 243 AND id_col BETWEEN x AND x+500 -- or whatever keeps the update in the small timerange
COMMIT
-- Next loop
-- Get all records added while you were running the loop. If these are too many, you may have to paginate this as well:
BEGIN TRANSACTION
UPDATE SOMETABLE
SET [some_col] = dbo.ufn_SomeFunction(CONVERT(NVARCHAR(500), another_column))
WHERE [some_col] = 243 AND id_col >= x
COMMIT
For each update this will take an update/exclusive key-range lock on the given records (but only on them, because you limit the update through the clustered index key). It will wait for any other updates on the same records to finish, then acquire its lock (blocking all other transactions, but still only for the given records), then update the records and release the lock.
The last extra statement is important because it takes a key-range lock up to "infinity" and thus prevents even inserts at the end of the range while the update statement runs.

Delete statement in SQL is very slow

I have statements like this that are timing out:
DELETE FROM [table] WHERE [COL] IN ( '1', '2', '6', '12', '24', '7', '3', '5')
I tried doing one at a time like this:
DELETE FROM [table] WHERE [COL] IN ( '1' )
and so far it's at 22 minutes and still going.
The table has 260,000 rows in it and four columns.
Does anyone have any ideas why this would be so slow and how to speed it up?
I do have a non-unique, non-clustered index on [COL], the column in the WHERE clause.
I'm using SQL Server 2008 R2
Update: I have no triggers on the table.
Things that can cause a delete to be slow:
deleting a lot of records
many indexes
missing indexes on foreign keys in child tables (thank you to @CesarAlvaradoDiaz for mentioning this in the comments)
deadlocks and blocking
triggers
cascade delete (those ten parent records you are deleting could mean millions of child records getting deleted)
Transaction log needing to grow
Many Foreign keys to check
So your choices are to find out what is blocking and fix it, or to run the deletes during off hours when they won't interfere with the normal production load. You can run the delete in batches (useful if you have triggers, cascade deletes, or a large number of records). You can also drop and recreate the indexes (best done during off hours as well).
Disable CONSTRAINT
ALTER TABLE [TableName] NOCHECK CONSTRAINT ALL;
Disable Index
ALTER INDEX ALL ON [TableName] DISABLE;
Rebuild Index
ALTER INDEX ALL ON [TableName] REBUILD;
Enable CONSTRAINT
ALTER TABLE [TableName] CHECK CONSTRAINT ALL;
Delete again
Deleting a lot of rows can be very slow. Try to delete a few at a time, like:
delete top (10) YourTable where col in ('1','2','3','4')
while @@rowcount > 0
begin
delete top (10) YourTable where col in ('1','2','3','4')
end
In my case the database statistics had become corrupt. The statement
delete from tablename where col1 = 'v1'
was taking 30 seconds even though there were no matching records but
delete from tablename where col1 = 'rubbish'
ran instantly
running
update statistics tablename
fixed the issue
If the table you are deleting from has BEFORE/AFTER DELETE triggers, something in there could be causing your delay.
Additionally, if you have foreign keys referencing that table, additional UPDATEs or DELETEs may be occurring.
Preventive Action
Check for the root cause of this issue with the help of SQL Server Profiler. There may be triggers causing the delay in execution; it could be anything. Don't forget to select the database name and object name when starting the trace, to exclude scanning unnecessary queries:
Database Name Filtering
Table/Stored Procedure/Trigger Name Filtering
Corrective Action
As you said, your table contains 260,000 records and the IN predicate contains eight values. Each of the 260,000 records is checked against every value in the IN predicate. Instead, it should be an inner join, like below...
Delete K From YourTable1 K
Inner Join YourTable2 T on T.id = K.id
Or insert the IN predicate values into a temporary table or table variable and join against it.
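A sketch of that approach, using the values from the question:
-- Stage the IN-list values, then delete via a join
CREATE TABLE #ids (id varchar(10) NOT NULL PRIMARY KEY);
INSERT INTO #ids (id)
VALUES ('1'), ('2'), ('6'), ('12'), ('24'), ('7'), ('3'), ('5');
DELETE K
FROM [table] K
INNER JOIN #ids I ON I.id = K.[COL];
DROP TABLE #ids;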
It's possible that other tables have FK constraints referencing your [table].
If so, the DB needs to check those tables to maintain referential integrity.
Even if you have all the indexes needed to support these FKs, check how many of them there are.
I had a situation where NHibernate incorrectly created duplicate FKs on the same columns, but with different names (which is allowed by SQL Server).
It drastically slowed down the DELETE statement.
Check the execution plan of the delete statement, and have a look at whether an index seek is used. Also, what is the data type of [COL]?
If you are using the wrong data type, change the statement accordingly (e.g. from '1' to 1, or to N'1').
If an index scan is used, consider adding a query hint.
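For example (hypothetical, assuming [COL] is nvarchar and the nonclustered index exists):
-- Match the column's data type to avoid an implicit conversion, and
-- force a seek if the plan still shows a scan:
DELETE FROM [table] WITH (FORCESEEK)
WHERE [COL] = N'1';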
If you're deleting all the records in the table rather than a select few it may be much faster to just drop and recreate the table.
Is [COL] really a character field that's holding numbers, or can you get rid of the single-quotes around the values? @Alex is right that IN is slower than =, so if you can do this, you'll be better off:
DELETE FROM [table] WHERE [COL] = '1'
But better still is using numbers rather than strings to find the rows (SQL likes numbers):
DELETE FROM [table] WHERE [COL] = 1
Maybe try:
DELETE FROM [table] WHERE CAST([COL] AS INT) = 1
In either event, make sure you have an index on column [COL] so the delete can avoid a full table scan.
I read this article and it was really helpful for troubleshooting this kind of problem:
https://support.microsoft.com/en-us/kb/224453
This was a case of a waitresource:
KEY: 16:72057595075231744 (ab74b4daaf17)
-- First, find the SPID (session id). Then identify the problem:
-- check status, open_tran, lastwaittype, waittype, and waittime,
-- and most importantly the waitresource value.
select * from sys.sysprocesses where spid = 57
select * from sys.databases where database_id = 16
-- With the waitresource value, check these to obtain the object id
select * from sys.partitions where hobt_id = 72057595075231744
select * from sys.objects where object_id = 2105058535
After inspecting an SSIS package (because a SQL Server instance was executing commands really slowly) that had been set up at a client of ours about 4-5 years before the time of writing, I found the following tasks:
1) Insert data from an XML file into a table called [ImportBarcodes].
2) Run a MERGE command on another target table, using the above-mentioned table as the source.
3) Run DELETE FROM [ImportBarcodes] to clear the table of the rows that were inserted after the XML file was read by the package.
After a quick inspection, all statements (SELECT, UPDATE, DELETE, etc.) on the ImportBarcodes table, which held only 1 row, took about 2 minutes to execute.
Extended Events showed a whole lot of PAGEIOLATCH_EX wait notifications.
No indexes were present on the table and no triggers were registered.
Upon close inspection of the properties of the table, in the Storage tab under the General section, the Data Space field showed more than 6 gigabytes of space allocated in pages.
What happened:
The package had run for a good portion of time each day for the last 4 years, inserting and deleting data in the table and leaving unused pages behind without freeing them up.
That was the main reason for the wait events captured by the Extended Events session and for the slowly executed commands on the table.
Running ALTER TABLE ImportBarcodes REBUILD fixed the issue by freeing up all the unused space. TRUNCATE TABLE ImportBarcodes does a similar thing, with the difference that it deallocates all pages and deletes the data.
Older topic, but still relevant.
Another issue occurs when an index has become fragmented to the point of being more of a problem than a help. In such a case, the answer is to rebuild or drop and recreate the index, then issue the delete statement again.
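For example (the index name is hypothetical):
-- Rebuild the fragmented index before retrying the delete
ALTER INDEX IX_table_COL ON [table] REBUILD;
-- or rebuild every index on the table:
ALTER INDEX ALL ON [table] REBUILD;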
As an extension to Andomar's answer above: I had a scenario where the first 700,000,000 records (of ~1.2 billion) processed very quickly, with chunks of 25,000 records processing per second (roughly). But then it started taking 15 minutes to do a batch of 25,000. I reduced the chunk size to 5,000 records and it went back to its previous speed. I'm not certain what internal threshold I hit, but the fix was to reduce the batch size further to regain the speed.
Open CMD and run these commands:
NET STOP MSSQLSERVER
NET START MSSQLSERVER
This will restart the SQL Server instance.
Try running your delete command again afterwards.
I have these commands in a batch script and run them from time to time when I encounter problems like this. A normal PC restart is not the same, so restarting the instance is the most effective option if you are running into issues like this with your SQL Server.