SQL Server - Poor performance of PK delete

I have a table in SQL Server 2008 R2 consisting of about 400 rows (pretty much nothing) - it has a clustered index on the primary key (which is an identity). The table is referenced via referential integrity (no cascade delete or update) by about 13 other tables.
Inserts/Updates/Gets are almost instant - we're talking a split second (as should be expected). However, a delete using the PK takes as long as 3 minutes and I've never seen it faster than 1.5 minutes:
DELETE FROM [TABLE] WHERE [TABLE].[PK_WITH_CLUSTERED_INDEX] = 1
The index was heavily fragmented - 90%. I rebuilt and reorganized that index (along with the rest on that table), but I can't get it below 50%.
Additionally, I did a backup/restore of the database to my local PC and I have no issues with deleting - less than a second.
The one thing I have not done is delete the clustered index entirely and re-add it. That, in and of itself, is a problem, because SQL Server does not allow you to drop a PK index while it is referenced by other tables.
Any ideas?
Update
I should have included this in my original post. The execution plan places the 'blame' on the clustered index delete - 70% of the cost. Of the 13 tables that reference this table, the execution plan says that none exceeds 3% of the overall query - almost all of them are index seeks.

If you delete a row, the database must check that none of the 13 tables references that row. Are there sufficient indexes on the foreign key columns on those other tables that reference the table you are deleting from?
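If those indexes are missing, every single-row delete has to scan each referencing table in full. A minimal sketch of adding one, with hypothetical table and column names:
-- Hypothetical: dbo.ChildTable references the parent's PK via ParentId.
-- Without this index, each DELETE on the parent scans ChildTable
-- to verify that no row still references the deleted key.
CREATE NONCLUSTERED INDEX IX_ChildTable_ParentId
    ON dbo.ChildTable (ParentId);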

Well, I have an answer...
First off, I pretty much exhausted all options indicated in the question above, along with the associated answers. I had no luck with what seemed like a trivial problem.
What I decided to do was the following:
1. Add a temporary unique index (so SQL Server would allow me to delete the clustered index).
2. Delete the clustered index.
3. Re-add the clustered index.
4. Delete the temporary unique index.
Essentially, I wiped and re-added the clustered index. The only thing I'm able to take away from this is that perhaps part of the index or where it was physically stored was 'corrupted' (I use that term loosely).
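For reference, the sequence looks roughly like this in T-SQL (names are hypothetical; depending on which index the foreign keys are bound to, they may also need to be dropped and re-created around the swap):
-- 1. Temporary unique index so the engine keeps a candidate key available
CREATE UNIQUE NONCLUSTERED INDEX IX_Temp ON dbo.MyTable (Id);
-- 2./3. Drop and re-create the clustered primary key
ALTER TABLE dbo.MyTable DROP CONSTRAINT PK_MyTable;
ALTER TABLE dbo.MyTable ADD CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (Id);
-- 4. Drop the temporary unique index
DROP INDEX IX_Temp ON dbo.MyTable;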

Maybe the table is locked by another time-consuming process in production.
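You can check for blocking while the delete hangs with something like this (a sketch; requires VIEW SERVER STATE permission):
-- Sessions that are currently blocked, and who is blocking them
SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;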

Another thought: is there a delete trigger on the table? Could it be causing the issue?
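A quick way to find out (table name is hypothetical):
-- List any triggers defined on the table, and whether they are disabled
SELECT t.name, t.is_disabled, t.is_instead_of_trigger
FROM sys.triggers AS t
WHERE t.parent_id = OBJECT_ID('dbo.MyTable');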

Related

Fast deletion of many rows in data warehouse data

In SQL Server 2008 I have some millions of rows of data which need to be deleted. They are scattered across a handful of tables. Deletion takes up to 20 seconds, which I think is way too slow! The data to be deleted is identified by a timestamp column. Here is what I have done so far in order to optimize:
Using isolation level READ UNCOMMITTED. I don't care about transactions; if we fail, the user will issue the delete operation again, and new data is guaranteed not to have the timestamp we are deleting.
Deleting leaf tables before parent tables.
The timestamp column is part of the clustered PK index; in fact it's in the first position of the PK/index.
Each table is emptied using a loop which deletes the top 200000 entries at a time, in order to reduce the transaction log overhead (see the sketch after this list).
Neither I/O nor CPU is maxed out on the server
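The delete loop I am using looks roughly like this (names are hypothetical):
-- Delete in chunks of 200000 so each transaction stays small
DECLARE @cutoff DATETIME = '2010-01-01';  -- hypothetical batch timestamp
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (200000) FROM dbo.LeafTable
    WHERE BatchTimestamp = @cutoff;
    SET @rows = @@ROWCOUNT;
END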
What have I overlooked?
Also, I am in doubt about the effect of moving the timestamp column to the first position in the PK. After doing so, must I reorganize the tables, or is SQL Server smart enough to do this itself? My understanding of a clustered index is that since it defines the physical layout of the rows, changing it forces the data to be reorganized. But we have had no complaints from the customer that changing the clustered index took a long time to perform.
Please make sure the tables you want to delete data from have a primary key explicitly declared.
Wrong: create table myTable (ID int)
Right: create table myTable (ID int PRIMARY KEY)
In addition to that, try adding OPTION (RECOMPILE), which may help performance:
DELETE FROM myTable
WHERE timestamp in (select timestamp from other_table)
OPTION (RECOMPILE)

Performance difference between Primary Key and Unique Clustered Index in SQL Server

We have 2 tables that have about 40M rows each. The size of the database is about 20GB, most of it in these 2 tables. Every day, we need to delete some data, i.e. about 10M rows, so we are using batched deletes to keep the log file within a certain size.
Originally, there was no primary key on the tables, but each had a unique clustered index. The delete took forever, i.e. about 2-3 hours to delete 500K rows on a virtual machine (the index was rebuilt before the delete).
Now that we have converted the unique clustered index to a primary key, it takes about 20-30 minutes to delete 2M rows.
I understand there is a difference between a primary key and a unique clustered index, but why is the performance so different?
Does anyone have some insight?
Thanks
Rolling my 8-Ball: if you declared a non-clustered primary key (as your post seems to suggest) then on each batch you would very likely hit the index tipping point. Thus each batch would do a full scan of 40M rows to delete the batch size. Then, on the next batch, again a full scan. And so on until your 10M rows were deleted. With a clustered key the batches should scan only the actual rows being deleted (of course, I assume your batch delete criteria actually use the clustered key...). As you see, there are many unknowns when one starts guessing...
But ultimately... you have a performance question and you should investigate it using the performance troubleshooting techniques. Capture the execution plans, the wait stats, the statistics io. Follow a methodology like Waits and Queues. Measure. Don't listen to guesses from someone on the internet that just rolled an 8-Ball...
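For example, something along these lines (a sketch; table and column names are hypothetical):
-- Per-statement I/O and timing for the delete under test
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
DELETE TOP (10000) FROM dbo.BigTable WHERE BatchKey = 42;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
-- Top waits on the instance since the stats were last cleared
SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;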
You can try removing the index prior to the delete and then re-adding it afterwards. If I'm not mistaken, the index is reorganized after each delete, which takes extra time.
I imagine it could be something like your index was very fragmented before one delete operation but not before another. How fragmented was the clustered unique index? You could see if there is still a difference in runtime after doing a rebuild on all indexes before the delete with something like ALTER INDEX ALL ON blah REBUILD
What options did you use when creating your unique clustered index (specifically what are the following set to: PAD_INDEX, STATISTICS_NORECOMPUTE, SORT_IN_TEMPDB, IGNORE_DUP_KEY, ALLOW_ROW_LOCKS, and ALLOW_PAGE_LOCKS)?
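You can read those settings back from the catalog rather than guessing (table name is hypothetical):
-- Current options on the table's indexes
SELECT i.name, i.fill_factor, i.is_padded, i.ignore_dup_key,
       i.allow_row_locks, i.allow_page_locks
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('dbo.MyTable');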

Table fragmentation in SQL Server

Does anybody have an idea about table fragmentation in SQL Server (not index fragmentation)? We have a table - it is the main table and it does not store any data permanently; data comes in and goes out continuously. There is no index on it, because only insert and delete statements run against it frequently. Recently we faced a huge delay in the response from this table: if we select anything, it takes 2 to 5 minutes to return a result, even though there is very little data. In the end we dropped and recreated the table, and now it's working fine. I'd appreciate any comments on how this can happen.
Joseph
A table without a clustered index is called a heap. A heap can be fragmented too.
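You can measure heap fragmentation the same way as index fragmentation; for a heap, the forwarded record count is the number to watch (table name is hypothetical):
-- DETAILED mode is needed to populate forwarded_record_count
SELECT index_type_desc, avg_fragmentation_in_percent, forwarded_record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'DETAILED');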
Performance would probably improve if you added a clustered index with an auto-increment primary key. A clustered index does not slow deletes or inserts (on the contrary). In addition, scheduled routine maintenance will keep tables with a clustered index defragmented.
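A sketch of what that could look like (names are hypothetical):
-- Turn the heap into a clustered table with an auto-increment key
ALTER TABLE dbo.QueueTable ADD Id INT IDENTITY(1,1) NOT NULL;
ALTER TABLE dbo.QueueTable
    ADD CONSTRAINT PK_QueueTable PRIMARY KEY CLUSTERED (Id);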
If you are selecting parts of the data from the table, then it may well be beneficial to have appropriate indexes on the table.

Slow bulk insert for table with many indexes

I am trying to insert millions of records into a table that has more than 20 indexes.
In the last run it took more than 4 hours per 100.000 rows, and the query was cancelled after 3½ days...
Do you have any suggestions about how to speed this up.
(I suspect the many indexes to be the cause. If you also think so, how can I automatically drop the indexes before the operation, and then re-create the same indexes afterwards?)
Extra info:
The space used by the indexes is about 4 times the space used by the data alone
The inserts are wrapped in a transaction per 100.000 rows.
Update on status:
The accepted answer helped me make it much faster.
You can disable and re-enable the indexes. Note that disabling them can have unwanted side effects (such as duplicate primary keys or unique keys) which will only surface when re-enabling the indexes. Also note that disabling the clustered index makes the table's data inaccessible until that index is rebuilt, so only disable the non-clustered ones.
--Disable Index
ALTER INDEX [IXYourIndex] ON YourTable DISABLE
GO
--Enable Index
ALTER INDEX [IXYourIndex] ON YourTable REBUILD
GO
This sounds like a data warehouse operation.
It would be normal to drop the indexes before the insert and rebuild them afterwards.
When you rebuild the indexes, build the clustered index first, and conversely drop it last. They should all have fillfactor 100%.
Code should be something like this
if object_id('IndexList') is not null drop table IndexList
select name into IndexList from dbo.sysindexes where id = object_id('Fact')
if exists (select name from IndexList where name = 'id1') drop index Fact.id1
if exists (select name from IndexList where name = 'id2') drop index Fact.id2
if exists (select name from IndexList where name = 'id3') drop index Fact.id3
...
-- BIG INSERT
-- RECREATE THE INDEXES
As noted by another answer, disabling indexes will be a very good start.
4 hours per 100.000 rows
[...]
The inserts are wrapped in a transaction per 100.000 rows.
You should look at reducing that number: the server has to maintain a huge amount of state while inside a transaction (so that it can be rolled back), and this (along with the indexes) means adding data is very hard work.
Why not wrap each insert statement in its own transaction?
Also look at the nature of the SQL you are using: are you adding one row per statement (and network roundtrip), or adding many?
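For instance, SQL Server 2008 and later let a single statement (and a single round trip) carry many rows (names are hypothetical):
-- One statement, several rows, one network roundtrip
INSERT INTO dbo.Fact (Col1, Col2)
VALUES (1, 'a'),
       (2, 'b'),
       (3, 'c');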
Disabling and then re-enabling indices is frequently suggested in those cases. I have my doubts about this approach, though, because:
(1) The application's DB user needs schema alteration privileges, which it normally should not possess.
(2) The chosen insert approach and/or index schema might be less than optimal in the first place, otherwise rebuilding complete index trees would not be faster than some decent batch-inserting (e.g. the client issuing one insert statement at a time, causing thousands of server roundtrips; or a poor choice of clustered index, leading to constant index node splits).
That's why my suggestions look a little bit different:
Increase ADO.NET BatchSize
Choose the target table's clustered index wisely, so that inserts won't lead to clustered index node splits. Usually an identity column is a good choice
Let the client insert into a temporary heap table first (heap tables don't have any clustered index); then issue one big "insert-into-select" statement to push all of that staging table data into the actual target table (see the sketch after this list)
Apply SqlBulkCopy
Decrease transaction logging by choosing bulk-logged recovery model
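A sketch of the staging pattern from the third point, with hypothetical names:
-- Heap staging table: no clustered index, so bulk loads are cheap
CREATE TABLE dbo.Staging_Fact (Col1 INT, Col2 NVARCHAR(50));
-- ... client bulk-loads dbo.Staging_Fact here (e.g. via SqlBulkCopy) ...
-- One set-based statement moves everything into the real table
INSERT INTO dbo.Fact (Col1, Col2)
SELECT Col1, Col2 FROM dbo.Staging_Fact;
DROP TABLE dbo.Staging_Fact;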
You might find more detailed information in this article.

DELETE Statement hangs on SQL Server for no apparent reason

Edit: Solved, there was a trigger with a loop on the table (read my own answer further below).
We have a simple delete statement that looks like this:
DELETE FROM tablename WHERE pk = 12345
This just hangs, no timeout, no nothing.
We've looked at the execution plan, and it consists of many lookups on related tables to ensure no foreign keys would trip up the delete, but we've verified that none of those other tables have any rows referring to that particular row.
There is no other user connected to the database at this time.
We've run DBCC CHECKDB against it, and it reports 0 errors.
Looking at the results of sp_who and sp_lock while the query is hanging, I notice that my spid has plenty of PAG and KEY locks, as well as the occasional TAB lock.
The table has 1.777.621 rows, and yes, pk is the primary key, so it's a single-row delete based on the index. There is no table scan in the execution plan, though I notice that it contains something that says Table Spool (Eager Spool), with an estimated number of rows of 1. Can this actually be a table scan in disguise? It only says it looks at the primary key column.
Tried DBCC DBREINDEX and UPDATE STATISTICS on the table. Both completed within reasonable time.
There is unfortunately a high number of indexes on this particular table. It is the core table in our system, with plenty of columns, and references, both outgoing and incoming. The exact number is 48 indexes + the primary key clustered index.
What else should we look at?
Note also that this table did not have this problem before; the problem occurred suddenly today. We also have many databases with the same table setup (copies of customer databases), and they behave as expected; it's just this one that is problematic.
One piece of information missing is the number of indexes on the table you are deleting the data from. As SQL Server uses the clustered key as a row locator in every nonclustered index, any change to the clustered index requires updating every index. Though, unless we are talking about a high number, this shouldn't be an issue.
I am guessing, from your description, that this is a primary table in the database, referenced by many other tables in FK relationships. This would account for the large number of locks, as it checks the rest of the tables for references. And if you have cascading deletes turned on, this could lead to a delete in table A requiring checks several tables deep.
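The incoming references (and whether any of them cascade) can be listed directly from the catalog (table name is hypothetical):
-- Every foreign key pointing at the table, with its delete action
SELECT fk.name AS fk_name,
       OBJECT_NAME(fk.parent_object_id) AS referencing_table,
       fk.delete_referential_action_desc
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID('dbo.CoreTable');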
Try rebuilding the indexes on that table and regenerating the statistics:
DBCC DBREINDEX ('tablename')
UPDATE STATISTICS tablename
Ok, this is embarrassing.
A colleague had added a trigger to that table a while ago, and the trigger had a bug. Although he had fixed the bug, the trigger had never been recreated for that table.
So the server was actually doing nothing; it just did it a huge number of times.
Oh well...
Thanks for the eyeballs to everyone who read this and pondered the problem.
I'm going to accept Josef's answer, as his was the closest, and it indirectly touched upon the issue with the cascading deletes.