I have a routine in our .NET web application that allows a user on our platform to clear their account (i.e. delete all their data). This routine runs in a stored procedure and essentially loops through the relevant data tables and clears down all the various items they have created.
The stored procedure looks something like this.
ALTER procedure [dbo].[spDeleteAccountData](
@accountNumber varchar(30) )
AS
BEGIN
SET ANSI_NULLS ON ;
SET NOCOUNT ON;
BEGIN TRAN
BEGIN TRY
DELETE FROM myDataTable1 WHERE accountNumber = @accountNumber
DELETE FROM myDataTable2 WHERE accountNumber = @accountNumber
DELETE FROM myDataTable3 WHERE accountNumber = @accountNumber
-- Etc.........
END TRY
BEGIN CATCH
-- CATCH ERROR
END CATCH
IF @@TRANCOUNT > 0
COMMIT TRANSACTION;
SET ANSI_NULLS OFF;
SET NOCOUNT OFF;
END
The problem is that in some cases we can have over 10,000 rows in a table, and the procedure can take 3-5 minutes. During this period all the other connections to the database get blocked, causing time-out errors like the one below:
System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Are there any general changes I can make to improve performance? I appreciate there are many unknowns related to the design of our database schema, but general best-practice advice would be welcomed! I thought about scheduling this task to run during the early hours to minimise impact, but this is far from ideal as the user wouldn't be able to regain access to their account until the task had completed.
Additional Information:
SQL Server 2008 R2 Standard
All tables have a clustered index
No triggers have been associated to any delete commands on any of the relevant tables
Foreign key references exist on a number of tables but the deletion order accounts for this.
Edit: 16:52 GMT
The delete proc affects around 20 tables. The largest one has approx 5 million records. The others have no more than 200,000, with some containing only 1,000-2,000 records.
Do you have an index on accountNumber in all tables?
Seeing that you delete using a WHERE clause by that column, this might help.
Another option (and probably an even better solution) would be to schedule deletion operations at night: when a user chooses to delete their account, you only set a flag, and a delete job runs at night actually deleting the accounts flagged for deletion.
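For illustration, a rough sketch of that approach. The Accounts table and the PendingDeletion flag column are assumptions (not in the question's schema); the purge procedure name is the one from the question:
-- When the user asks to delete their account, only flag it (schema is hypothetical):
DECLARE @accountNumber varchar(30) = 'ACC-12345';   -- placeholder value
UPDATE dbo.Accounts
SET PendingDeletion = 1
WHERE accountNumber = @accountNumber;

-- Nightly job body: run the existing purge procedure for each flagged account.
DECLARE @acct varchar(30);
DECLARE acct_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT accountNumber FROM dbo.Accounts WHERE PendingDeletion = 1;
OPEN acct_cursor;
FETCH NEXT FROM acct_cursor INTO @acct;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.spDeleteAccountData @accountNumber = @acct;
    FETCH NEXT FROM acct_cursor INTO @acct;
END
CLOSE acct_cursor;
DEALLOCATE acct_cursor;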
If you have an index on the accountNumber field, then I guess the long deletion time is due to locks (held by other processes) or to foreign keys on the affected tables.
If it is due to locks, then you should see if you can reduce them, using nolock where you can actually do that.
If it is a problem of foreign keys... well, you have to wait. If you do not want to wait, though, and your application logic does not rely on enforcing the FKs (like sending errors to the application for FK violations and testing against them), or you feel your application is solid enough that for a short period of time you do not need the FKs, then you can disable the related FKs prior to the deletions with ALTER TABLE xxx NOCHECK CONSTRAINT ALL and re-enable them afterwards.
Of course purists will blame me for the latter, but I have used this many times when the need arises.
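As a rough illustration, assuming this runs inside the existing procedure where @accountNumber is the parameter (the table name is taken from the question; repeat for each referencing table as needed):
-- Disable all FK checks on the referencing table before the bulk delete (use with care):
ALTER TABLE dbo.myDataTable2 NOCHECK CONSTRAINT ALL;

DELETE FROM dbo.myDataTable2 WHERE accountNumber = @accountNumber;

-- Re-enable afterwards; WITH CHECK revalidates existing rows so the constraints stay trusted:
ALTER TABLE dbo.myDataTable2 WITH CHECK CHECK CONSTRAINT ALL;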
One way you might want to try is this:
Create a SP.
For each table, delete rows in small batches of some size that works for you (say 10 rows per batch).
Put each batch deletion inside a transaction and add a custom delay between each transaction.
Example:
DECLARE @DeletedRowsCount INT = 1, @BatchSize INT = 300;
WHILE (@DeletedRowsCount > 0) BEGIN
BEGIN TRANSACTION
DELETE TOP (@BatchSize) dbo.[Table]
FROM dbo.[Table]
WHERE Id = @PortalId;
SET @DeletedRowsCount = @@ROWCOUNT;
COMMIT;
WAITFOR DELAY '00:00:05';
END
I guess you can do the same without a SP as well.
In fact, it might be better like that.
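Applied to the question's schema, a standalone batch (no stored procedure) might look roughly like this; the batch size and delay are guesses to tune for your workload, and the account number is a placeholder:
DECLARE @accountNumber varchar(30) = 'ACC-12345';   -- placeholder value
DECLARE @BatchSize int = 1000, @Rows int = 1;

WHILE @Rows > 0
BEGIN
    DELETE TOP (@BatchSize) FROM dbo.myDataTable1
    WHERE accountNumber = @accountNumber;

    SET @Rows = @@ROWCOUNT;

    WAITFOR DELAY '00:00:01';   -- give other sessions a chance to acquire locks between batches
END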
SqlCommand.CommandTimeout is the short answer. Increase its value.
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlcommand.commandtimeout.aspx
Note, the Connection Timeout is not the same thing as the CommandTimeout.
...
Do you have an index on "accountNumber" on each table?
You could have a clustered key on the surrogate-key of the table, but not the "accountNumber".
...
Basically, you're gonna have to look at the execution plan (or post the execution plan) here.
But here is some "starter code" for trying an index on that column(s).
if exists (select * from dbo.sysindexes where name = N'IX_myDataTable1_accountNumber' and id = object_id(N'[dbo].[myDataTable1]'))
DROP INDEX [dbo].[myDataTable1].[IX_myDataTable1_accountNumber]
GO
CREATE INDEX [IX_myDataTable1_accountNumber] ON [dbo].[myDataTable1]([accountNumber])
GO
It could be worth switching the database into Read Committed Snapshot mode. This will have a performance impact, how much depends on your application.
In Read Committed Snapshot mode, writers and readers no longer block each other, although writers still block writers. You don't say what sort of activity on the table is being blocked by the delete, so it's a little hard to say whether this will help.
http://msdn.microsoft.com/en-us/library/ms188277(v=sql.105).aspx
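Enabling it is a single database option (the database name below is a placeholder); note that the ALTER needs exclusive access to the database, so either run it when nothing else is connected or add WITH ROLLBACK IMMEDIATE:
-- Database name is a placeholder for your own.
ALTER DATABASE MyAppDb SET READ_COMMITTED_SNAPSHOT ON;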
Having said that, 3-5 minutes for a deletion on tables with ~10k rows seems absurdly slow. You mention foreign keys; are the foreign keys indexed? If not, a deletion can cause table scans on the other end to make sure you're not breaking RI, so maybe check that first. What does SQL Server Profiler say for reads/writes for these deletion queries?
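As a starting point for that check, something along these lines lists foreign key columns that are not the leading column of any index (a sketch against the standard catalog views, not tailored to your schema):
SELECT fk.name AS foreign_key,
       OBJECT_NAME(fkc.parent_object_id) AS referencing_table,
       c.name AS referencing_column
FROM sys.foreign_key_columns AS fkc
JOIN sys.foreign_keys AS fk ON fk.[object_id] = fkc.constraint_object_id
JOIN sys.columns      AS c  ON c.[object_id]  = fkc.parent_object_id
                           AND c.column_id    = fkc.parent_column_id
WHERE NOT EXISTS (SELECT 1
                  FROM sys.index_columns AS ic
                  WHERE ic.[object_id] = fkc.parent_object_id
                    AND ic.column_id   = fkc.parent_column_id
                    AND ic.key_ordinal = 1);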
Can we avoid deadlock by creating different database users for different processes
e.g. one user for communicating with API 'ABC', one user for communicating with API 'PQR', and another user for processing the system data brought in by APIs 'ABC' and 'PQR'? All these users would work on the same tables.
Deadlocks happen because different sessions fight over the same resources (tables, indexes, rows, etc.). SQL Server doesn't care who owns the sessions; it can be the same user with multiple sessions or multiple users. So creating multiple users solely to avoid deadlocks isn't going to help.
Things that can help:
Access objects in the same order.
Avoid user interaction in transactions.
Keep transactions short and in one batch.
Use a lower isolation level (with caution).
Use a row versioning-based isolation level.
Set READ_COMMITTED_SNAPSHOT database option ON to enable read-committed transactions to use row versioning.
Use snapshot isolation if possible (be aware it will hammer the hell out of your tempdb).
Have a look at Minimizing Deadlocks.
I guess that would prevent deadlocks because you would have different users accessing different processes, but it wouldn't really fix a deadlock problem. A deadlock is more where two entities are accessing the same piece of data, the data gets blocked, and then neither can finish its transaction. It's more of a catch-22 situation where both are waiting for the other to finish but neither can. Creating different users for different processes would prevent deadlocks, but it's not really practical.
A normal approach/best practice would simply be to program the system so that transactions acquire locks in a consistent order when accessing the same entities. This prevents transactions from falling into a deadlock scenario: if one transaction is using data, another trying to access the same piece is forced to wait for the first to finish before it can proceed.
Personally, I would add a rowversion (timestamp) column to the table to help maintain the integrity of the database when multiple users are updating rows at the same time. You may also want to know how many rows, and which rows, were updated without re-querying the table.
CREATE TABLE MyTest (myKey int PRIMARY KEY, myValue int, RV rowversion);
Then you can use the following sample Transact-SQL statements to implement optimistic concurrency control on the [table-name] table during the update.
DECLARE @t TABLE (myKey int);
UPDATE MyTest
SET myValue = 2
OUTPUT inserted.myKey INTO @t(myKey)
WHERE myKey = 1
AND RV = [row-version-value];
IF (SELECT COUNT(*) FROM @t) = 0
BEGIN
RAISERROR ('error changing row with myKey = %d'
,16 -- Severity.
,1 -- State
,1) -- myKey that was changed
END;
It may not be suitable in all cases, but we try to handle the processing logic in a stored procedure and use sp_getapplock to prevent the procedure's transaction from being run concurrently.
No; first find the deadlock victim (look at this article). In most cases a missing or bad index is what causes the deadlock...
If you can post your deadlock details, we can suggest the best possible solution.
Based on what you have asked, it's better to set a deadlock priority so you control which session becomes the victim.
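For example, the session whose work matters less can volunteer to be the victim (a sketch; choose the priorities to suit your own processes):
-- In the less important session (it will be picked as the deadlock victim first):
SET DEADLOCK_PRIORITY LOW;

-- In the session whose work should survive the deadlock:
SET DEADLOCK_PRIORITY HIGH;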
I have an application connected to a SQL Server 2014 database that combines several rows into one. There are no other connections to this database while the application is running.
First, we select a chunk of rows within a specific time span. This query uses a non-clustered index seek (on the TIME column) combined with a key lookup into the clustered index.
select ...
from FOO
where TIME >= @from and TIME < @to and ...
Then we process these rows in C# and write the changes back as a single update and multiple deletes; this happens many times per chunk. These statements also use non-clustered index seeks.
begin tran
update FOO set ...
where NON_CLUSTERED_ID = @id
delete FOO where NON_CLUSTERED_ID in (@id1, @id2, @id3, ...)
commit
I am getting deadlocks when running this with multiple parallel chunks. I tried using ROWLOCK for the update and delete but that caused even more deadlocks than before for some reason, even though there are no overlaps between chunks.
Then I tried TABLOCKX, HOLDLOCK on the update, but that means I can't perform my select in parallel so I'm losing the advantages of parallelism.
Any idea how I can avoid deadlocks but still process multiple parallel chunks?
Would it be safe to use NOLOCK on my select in this case, given there is no row overlap between chunks? Then TABLOCKX, HOLDLOCK would only block the update and delete, correct?
Or should I just accept that deadlocks will happen and retry the query in my application?
UPDATE (additional information): All deadlocks so far have happened in the update and delete phase, none in the select. I'll try to get some deadlock logs up if I can't get this solved today (the correct trace flags weren't enabled before).
UPDATE: These are the two arrangements of deadlocks that occur with ROWLOCK; they both refer only to the delete statement and the non-clustered index it uses. I'm not sure if they are the same as the deadlocks that occur without any table hints, as I wasn't able to reproduce any of those.
Ask if there's anything else needed from the .xdl; I'm a bit wary of attaching the whole thing.
The general advice regarding deadlocks: make sure you do everything in the same order, i.e. acquire locks in the same order, for different processes.
You can find the same advice in this technical article on microsoft.com regarding Minimizing Deadlocks. There's a good reason it is listed first.
Access objects in the same order.
Avoid user interaction in transactions.
Keep transactions short and in one batch.
Use a lower isolation level.
Use a row versioning-based isolation level.
Set READ_COMMITTED_SNAPSHOT database option ON to enable read-committed transactions to use row versioning.
Use snapshot isolation.
Use bound connections.
Update after question from Cato:
How would acquiring locks in the same order apply here? Have you got any advice on how he would change his SQL to do that?
Deadlocks are always the same, no matter what environment: two processes (say A & B) acquire multiple locks (say X & Y) in a different order so that A is waiting for Y and B is waiting for X while A is holding X and B is holding Y.
It applies here because DELETE and UPDATE statements implicitly acquire locks on rows, index ranges, or the table (depending on what the engine deems appropriate).
You should analyze your process and see if there are scenarios where locks could be acquired in a different order. If that doesn't reveal anything, you can analyze deadlocks using the SQL Server Profiler:
To trace deadlock events, add the Deadlock graph event class to a trace. This event class populates the TextData data column in the trace with XML data about the process and objects that are involved in the deadlock. SQL Server Profiler can extract the XML document to a deadlock XML (.xdl) file which you can view later in SQL Server Management Studio. You can configure SQL Server Profiler to extract Deadlock graph events to a single file that contains all Deadlock graph events, or to separate files.
I'd use sp_getapplock in the updating transaction to prevent multiple instances of this code running in parallel. This will not block the selecting statement as table locking hints do.
You still should program the retrying logic, because it may take a while to acquire the lock, longer than the timeout parameter.
This is how the updating transaction can be wrapped into sp_getapplock.
BEGIN TRANSACTION;
BEGIN TRY
DECLARE @VarLockResult int;
EXEC @VarLockResult = sp_getapplock
@Resource = 'some_unique_name_app_lock',
@LockMode = 'Exclusive',
@LockOwner = 'Transaction',
@LockTimeout = 60000,
@DbPrincipal = 'public';
IF @VarLockResult >= 0
BEGIN
-- Acquired the lock
update FOO set ...
where NON_CLUSTERED_ID = @id
delete FOO where NON_CLUSTERED_ID in (@id1, @id2, @id3, ...)
END ELSE BEGIN
-- return some error code, so that the caller could retry
END;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION;
-- handle the error
END CATCH;
The selecting statement doesn't need any changes.
I would recommend against NOLOCK, even though you say that IDs in chunks do not overlap. With this hint the SELECT query can skip some pages that are being changed, it can read some pages twice. It is unlikely that such behavior can be tolerated.
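For the retry logic mentioned above, a rough server-side sketch that retries only on deadlock (error 1205); the body of the transaction is a placeholder comment, not the actual chunk-processing code:
DECLARE @retriesLeft int = 3;

WHILE @retriesLeft > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- ... the sp_getapplock call and the UPDATE / DELETE work for the chunk go here ...

        COMMIT TRANSACTION;
        SET @retriesLeft = 0;             -- success: leave the loop
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

        IF ERROR_NUMBER() = 1205 AND @retriesLeft > 1
        BEGIN
            SET @retriesLeft -= 1;
            WAITFOR DELAY '00:00:01';     -- brief back-off before the next attempt
        END
        ELSE
            THROW;                        -- not a deadlock, or out of retries: re-raise
    END CATCH;
END;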
Kindly use sp_getapplock in the following format in your code. The stored procedure sp_getapplock puts a lock on the application resource.
EXEC Sp_getapplock
@Resource = 'storeprocedurename',
@LockMode = 'Exclusive',
@LockOwner = 'Transaction',
@LockTimeout = 25000
It is very helpful. Kindly increase the LockTimeout to reduce deadlocks.
I have a table that contains two columns: a resource key, and (very roughly) when it was last accessed.
I have a number of servers that are periodically dumping data about resource accesses to the table. They should either update the access time for a resource key if it already exists, or insert it if it doesn't.
Another server will very rarely generate a report from this table.
I don't require this table to be consistent. I'm okay with the reporting server reading the table in the middle of a dump. If two writing servers try to update the same row, I don't care which one gets its data in.
There are two major questions:
Is what I'm looking for even possible with SQL Server?
If it is possible, I'm potentially going to have multiple servers racing on their 'insert or update' and resulting in primary key constraint violations. Is there any way to resolve this problem?
I'm okay with the reporting server reading the table in the middle of a dump.
Look into the READ COMMITTED SNAPSHOT ISOLATION option. This was introduced in SQL Server 2005 and appears to be available across all editions. It is typically better than using the WITH (NOLOCK) table hint. For more info, check out:
Snapshot Isolation in SQL Server
Understanding Row Versioning-Based Isolation Levels
If two writing servers try to update the same row, I don't care which gets its data in.
It is not possible for two operations to write the same row at the same time. One will wait.
Regarding two trying to INSERT the same value at the same time, since you don't care which one "wins", just trap and discard the error ;-).
Maybe something along the lines of:
BEGIN TRY
UPDATE tbl
SET tbl.AccessTime = GETDATE()
FROM SchemaName.TableName tbl
WHERE tbl.ResourceKey = @ResourceKey;
IF (@@ROWCOUNT = 0)
BEGIN
INSERT INTO SchemaName.TableName (ResourceKey, AccessTime)
VALUES (@ResourceKey, GETDATE());
END;
END TRY
BEGIN CATCH
IF (ERROR_NUMBER() <> 2627) -- 2627 = Violation of PRIMARY KEY constraint
BEGIN
;THROW;
END;
END CATCH;
If you are on SQL Server 2014 (or newer, whenever that happens), then you can look into using:
the WITH DELAYED_DURABILITY = ON option for COMMIT TRAN (see the sketch after this list). Look here for more info: Control Transaction Durability
In-Memory OLTP (64 bit, Enterprise Edition only)
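A minimal illustration of the delayed durability option (the database name is a placeholder; table, column, and parameter names are the ones used in the snippet above, and the database must allow the option first):
-- Database name is a placeholder.
ALTER DATABASE MyAppDb SET DELAYED_DURABILITY = ALLOWED;

DECLARE @ResourceKey int = 1;   -- placeholder value and type

BEGIN TRANSACTION;
    UPDATE SchemaName.TableName
    SET AccessTime = GETDATE()
    WHERE ResourceKey = @ResourceKey;
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);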
I'm using SQL Server 2008 R2.
I have a view; let's call it view1. This view is complex and slow. It cannot be made into an indexed view because it uses left joins and various other trickery. As such, we created a stored procedure which basically:
obtains an exclusive lock
selects * into computed_view1_tmp from view1; (slow)
creates indexes on the above computed table (slow)
renames computed_view1 to computed_view1_todelete; and does the same for its indexes (assumed fast)
renames computed_view1_tmp to computed_view1; and does the same for its indexes (assumed fast)
drops the table computed_view1_todelete (slow)
releases the lock.
We run this procedure when we know we're changing the data in our web application. We then have other views, such as view2 using computed_view1 instead of view1.
Once in a while, we get:
Invalid object name 'dbo.computed_view1'. Could not use view or
function 'dbo.view2' because of binding errors.
I assume this is because we're trying to access dbo.computed_view1 at the same time as it's being renamed. I assume this is a very short period, but the frequency I am seeing this error in my logs makes me wonder if something else might be at play. I'm getting the error many times per day on a site with about a dozen users active throughout the day.
In development, this procedure takes about five seconds given the amount of data in the view. Renaming is instantaneous. In production, it must be taking longer but I don't understand why. I once saw the procedure fail to obtain the exclusive lock within 90 seconds.
Any thoughts on how to fix or a better solution?
Edit: Extra notes on my locking - maybe I'm not doing this right:
BEGIN TRANSACTION
DECLARE @result int
EXEC @result = sp_getapplock @Resource = 'lock_computed_view1', @LockMode = 'Exclusive', @LockTimeout = 90
IF @result NOT IN ( 0, 1 ) -- Only successful return codes
BEGIN
PRINT @result
RAISERROR ( 'Lock failed to acquire...', 16, 1 )
END
ELSE
BEGIN
-- rest of the magic
END
EXEC @result = sp_releaseapplock @Resource = 'lock_computed_view1'
COMMIT TRANSACTION
If your locking and transaction scope are right, I would expect other transactions to wait and never see the view missing. This might be a SQL Server idiosyncrasy that I don't know about.
It is often possible to do without dynamic DDL. Here are two ways to do it:
TRUNCATE the computed table and insert into it. This takes an exclusive lock automatically. No need to rename. All of this is atomic and supports rollback.
Use a staging table with the same schema. Work on that; so far, no service interruption at all. Then switch the staging table with the production table using ALTER TABLE ... SWITCH (sketched below). This is quick and atomic, and it does not require Enterprise Edition.
With these approaches the problem is solved by just not renaming.
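A sketch of the swap step for the second option, assuming a staging table named computed_view1_staging with an identical structure and indexes, on the same filegroup as the production table (names are placeholders based on the question):
BEGIN TRANSACTION;

    TRUNCATE TABLE dbo.computed_view1;           -- the target of SWITCH must be empty

    ALTER TABLE dbo.computed_view1_staging
        SWITCH TO dbo.computed_view1;            -- metadata-only move, effectively instant

COMMIT TRANSACTION;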
I am receiving deadlock errors when trying to run a sproc containing a delete statement. What is happening is that a row in a table related by a foreign key constraint is being updated at the same time as the row I'm deleting that it is related to.
The data being updated in the constrained table is no longer important and will never be retrieved by anyone for any reason; it just so happens that the update and the delete can occur at the same time. So I need the delete to be the principal operation.
What do I need to do to stop a deadlock like this?
DELETE FROM Storefront.Sidelite WHERE ID = @SideliteID;
Below is a screen shot of a Sidelite table and the Size constraint table.
OK, there are no reads taking place here. The only thing taking place is many updates to the Size table while a delete on the Sidelite table targets a record whose Size row is being updated, and this is causing a deadlock.
I need to stop all operations on the Size table while a delete takes place on the Sidelite table; then I'll delete the related Size record in a trigger.
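A trigger along those lines might look roughly like this (a sketch only; the trigger name is made up, and the column names are assumed from the procedure shown further down):
CREATE TRIGGER Storefront.trg_Sidelite_DeleteSize
ON Storefront.Sidelite
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Remove the Size rows that belonged to the deleted Sidelite rows.
    DELETE sz
    FROM Storefront.Size AS sz
    INNER JOIN deleted AS d ON d.SizeID = sz.ID;
END;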
On the SELECT statement where it is getting the initial value, you can use a WITH (READUNCOMMITTED) or WITH (NOLOCK) hint. This will, however, give you dirty reads. Please see the following link for more information: Why use a READ UNCOMMITTED isolation level?
1.) ALTER DATABASE ... SET ALLOW_SNAPSHOT_ISOLATION ON
2.) SET TRANSACTION ISOLATION LEVEL SNAPSHOT
ALTER proc [Storefront].[proc_DeleteSidelite]
@SideliteID INT
AS
BEGIN
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
DECLARE @SizeID INT;
BEGIN TRAN
SELECT @SizeID = sl.SizeID FROM Storefront.Sidelite sl
with(nolock) WHERE sl.ID = @SideliteID
DELETE FROM Storefront.Sidelite WHERE ID = @SideliteID;
DELETE FROM Storefront.Size WHERE ID = @SizeID;
COMMIT TRAN
END;