Delete statement in SQL is very slow

I have statements like this that are timing out:
DELETE FROM [table] WHERE [COL] IN ( '1', '2', '6', '12', '24', '7', '3', '5')
I tried doing one at a time like this:
DELETE FROM [table] WHERE [COL] IN ( '1' )
and so far it's at 22 minutes and still going.
The table has 260,000 rows in it and is four columns.
Does anyone have any ideas why this would be so slow and how to speed it up?
I do have a non-unique, non-clustered index on the [COL] that I'm doing the WHERE on.
I'm using SQL Server 2008 R2.
update: I have no triggers on the table.

Things that can cause a delete to be slow:
deleting a lot of records
many indexes
missing indexes on foreign keys in child tables. (thank you to @CesarAlvaradoDiaz for mentioning this in the comments)
deadlocks and blocking
triggers
cascade delete (those ten parent records you are deleting could mean millions of child records getting deleted)
Transaction log needing to grow
Many Foreign keys to check
So your choices are to find out what is blocking and fix it or run the deletes in off hours when they won't be interfering with the normal production load. You can run the delete in batches (useful if you have triggers, cascade delete, or a large number of records). You can drop and recreate the indexes (best if you can do that in off hours too).

Disable CONSTRAINT
ALTER TABLE [TableName] NOCHECK CONSTRAINT ALL;
Disable Index
ALTER INDEX ALL ON [TableName] DISABLE;
Rebuild Index
ALTER INDEX ALL ON [TableName] REBUILD;
Enable CONSTRAINT
ALTER TABLE [TableName] CHECK CONSTRAINT ALL;
Delete again

Deleting a lot of rows can be very slow. Try to delete a few at a time, like:
delete top (10) YourTable where col in ('1','2','3','4')
while @@rowcount > 0
begin
delete top (10) YourTable where col in ('1','2','3','4')
end
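For anyone who wants to try the batching idea outside SQL Server, here is a minimal sketch of the same loop in Python. It assumes SQLite (via the stdlib sqlite3 module) as a stand-in, so DELETE TOP is replaced by a rowid subquery with LIMIT; the table name your_table, the column col and the helper delete_in_batches are made up for illustration.

```python
import sqlite3

def delete_in_batches(conn, values, batch_size=1000):
    """Delete rows whose col is in `values`, at most batch_size rows per transaction."""
    placeholders = ",".join("?" for _ in values)
    # SQLite has no DELETE TOP (n); pick a limited set of rowids instead.
    sql = (
        "DELETE FROM your_table WHERE rowid IN ("
        f"SELECT rowid FROM your_table WHERE col IN ({placeholders}) LIMIT ?)"
    )
    total = 0
    while True:
        cur = conn.execute(sql, (*values, batch_size))
        conn.commit()  # commit per batch so each transaction stays small
        if cur.rowcount == 0:  # nothing left to delete
            break
        total += cur.rowcount
    return total

# Hypothetical demo data: 5000 rows, col cycles through '0'..'9'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany(
    "INSERT INTO your_table (col) VALUES (?)",
    [(str(i % 10),) for i in range(5000)],
)
conn.commit()

deleted = delete_in_batches(conn, ["1", "2", "3", "4"], batch_size=300)
remaining = conn.execute("SELECT COUNT(*) FROM your_table").fetchone()[0]
print(deleted, remaining)
```

The batch size is a tuning knob: smaller batches hold locks for less time, larger ones mean fewer round trips.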

In my case the database statistics had become corrupt. The statement
delete from tablename where col1 = 'v1'
was taking 30 seconds even though there were no matching records but
delete from tablename where col1 = 'rubbish'
ran instantly
running
update statistics tablename
fixed the issue

If the table you are deleting from has INSTEAD OF/AFTER DELETE triggers (SQL Server has no BEFORE triggers), something in there could be causing your delay.
Additionally, if you have foreign keys referencing that table, additional UPDATEs or DELETEs may be occurring.

Preventive Action
Check with the help of SQL Profiler for the root cause of this issue. There may be Triggers causing the delay in Execution. It can be anything. Don't forget to Select the Database Name and Object Name while Starting the Trace to exclude scanning unnecessary queries...
Database Name Filtering
Table/Stored Procedure/Trigger Name Filtering
Corrective Action
As you said, your table contains 260,000 records and the IN predicate contains eight values, so each row may have to be compared against every value in the list. Instead, you can try an inner join against a table holding those values, like below:
Delete K From YourTable1 K
Inner Join YourTable2 T on T.id = K.id
Insert the IN predicate values into a temporary table or table variable (YourTable2 above) first.

It's possible that other tables have FK constraint to your [table].
So the DB needs to check these tables to maintain the referential integrity.
Even if you have all needed indexes corresponding these FKs, check their amount.
I had the situation when NHibernate incorrectly created duplicated FKs on the same columns, but with different names (which is allowed by SQL Server).
It has drastically slowed down running of the DELETE statement.

Check the execution plan of this delete statement and see whether an index seek is used. Also, what is the data type of col?
If the literal has the wrong data type, change the delete statement (for example from '1' to 1 or N'1') so that it matches the column type.
If an index scan is used, consider using a query hint.

If you're deleting all the records in the table rather than a select few it may be much faster to just drop and recreate the table.

Is [COL] really a character field that's holding numbers, or can you get rid of the single-quotes around the values? @Alex is right that IN is slower than =, so if you can do this, you'll be better off:
DELETE FROM [table] WHERE [COL] = '1'
But better still is using numbers rather than strings to find the rows (sql likes numbers):
DELETE FROM [table] WHERE [COL] = 1
Maybe try:
DELETE FROM [table] WHERE CAST([COL] AS INT) = 1
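In either event, make sure you have an index on column [COL] so the delete can seek to the rows instead of scanning the table. (Note that wrapping [COL] in CAST, as in the last example, generally prevents an index seek on that column.)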

I read this article; it was really helpful for troubleshooting this kind of problem:
https://support.microsoft.com/en-us/kb/224453
this is a case of waitresource
KEY: 16:72057595075231744 (ab74b4daaf17)
-- First SQL Provider to find the SPID (Session ID)
-- Second Identify problem, check Status, Open_tran, Lastwaittype, waittype, and waittime
-- IMPORTANT: Waitresource
select * from sys.sysprocesses where spid = 57
select * from sys.databases where database_id=16
-- with Waitresource check this to obtain object id
select * from sys.partitions where hobt_id=72057595075231744
select * from sys.objects where object_id=2105058535

While inspecting an SSIS package (because a SQL Server was executing commands really slowly) that had been set up for a client of ours about 4-5 years before the time of writing, I found that it contained the tasks below:
1) insert data from an XML file into a table called [ImportBarcodes].
2) merge command on another target table, using the above mentioned table as the source.
3) "delete from [ImportBarcodes]", to clear the table of the row that was inserted after the XML file was read by the task of the SSIS package.
A quick inspection showed that all statements (SELECT, UPDATE, DELETE etc.) on the table ImportBarcodes, which had only 1 row, took about 2 minutes to execute.
Extended Events showed a whole lot of PAGEIOLATCH_EX wait notifications.
No indexes were present on the table and no triggers were registered.
Upon close inspection of the properties of the table, in the Storage tab and under the General section, the Data Space field showed more than 6 GIGABYTES of space allocated in pages.
What happened:
The query ran for a good portion of time each day for the last 4 years, inserting and deleting data in the table, leaving unused pages behind without freeing them up.
So, that was the main reason for the wait events that were captured by the Extended Events session and for the slowly executed commands on the table.
Running ALTER TABLE ImportBarcodes REBUILD fixed the issue, freeing up all the unused space. TRUNCATE TABLE ImportBarcodes did a similar thing, with the only difference being that it removes all pages along with the data.

Older topic but one still relevant.
Another issue occurs when an index has become fragmented to the extent of becoming more of a problem than a help. In such a case, the answer would be to rebuild the index (or drop and recreate it) and then issue the delete statement again.

As an extension to Andomar's answer, above, I had a scenario where the first 700,000,000 records (of ~1.2 billion) processed very quickly, with chunks of 25,000 records processing per second (roughly). But then it started taking 15 minutes to do a batch of 25,000. I reduced the chunk size down to 5,000 records and it went back to its previous speed. I'm not certain what internal threshold I hit, but the fix was to reduce the number of records per batch further to regain the speed.

Open CMD and run these commands:
NET STOP MSSQLSERVER
NET START MSSQLSERVER
This will restart the SQL Server instance.
Then try to run your delete command again.
I have this command in a batch script and run it from time to time if I'm encountering problems like this. A normal PC restart will not be the same so restarting the instance is the most effective way if you are encountering some issues with your sql server.

Related

Copying timestamp columns within a Postgres table

I have a table with about 30 million rows in a Postgres 9.4 db. This table has 6 columns, the primary key id, 2 text, one boolean, and two timestamp. There are indices on one of the text columns, and obviously the primary key.
I want to copy the values in the first timestamp column, call it timestamp_a into the second timestamp column, call it timestamp_b. To do this, I ran the following query:
UPDATE my_table SET timestamp_b = timestamp_a;
This worked, but it took an hour and 15 minutes to complete, which seems a really long time to me considering, as far as I know, it's just copying values from one column to the next.
I ran EXPLAIN on the query and nothing seemed particularly inefficient. I then used pgtune to modify my config file, most notably it increased the shared_buffers, work_mem, and maintenance_work_mem.
I re-ran the query and it took essentially the same amount of time, actually slightly longer (an hour and 20 mins).
What else can I do to improve the speed of this update? Is this just how long it takes to write 30 million timestamps into postgres? For context I'm running this on a macbook pro, osx, quadcore, 16 gigs of ram.
The reason this is slow is that internally PostgreSQL doesn't update the field. It actually writes new rows with the new values. This usually takes a similar time to inserting that many values.
If there are indexes on any column this can further slow the update down. Even if they're not on columns being updated, because PostgreSQL has to write a new row and write new index entries to point to that row. HOT updates can help and will do so automatically if available, but that generally only helps if the table is subject to lots of small updates. It's also disabled if any of the fields being updated are indexed.
Since you're basically rewriting the table, if you don't mind locking out all concurrent users while you do it you can do it faster with:
BEGIN
DROP all indexes
UPDATE the table
CREATE all indexes again
COMMIT
PostgreSQL also has an optimisation for writes to tables that've just been TRUNCATEd, but to benefit from that you'd have to copy the data to a temp table, then TRUNCATE and copy it back. So there's no benefit.
@Craig mentioned an optimization for COPY after TRUNCATE: Postgres can skip WAL entries because if the transaction fails, nobody will ever have seen the new table anyway.
The same optimization is true for tables created with CREATE TABLE AS:
What causes large INSERT to slow down and disk usage to explode?
Details are missing in your description, but if you can afford to write a new table (no concurrent transactions get in the way, no dependencies), then the fastest way might be (except if you have big TOAST table entries - basically big columns):
BEGIN;
LOCK TABLE my_table IN SHARE MODE; -- only for concurrent access
SET LOCAL work_mem = '???? MB'; -- just for this transaction
CREATE TABLE my_table2 AS
SELECT ..., timestamp_a, timestamp_a AS timestamp_b
-- columns in order, timestamp_a overwrites timestamp_b
FROM my_table
ORDER BY ??; -- optionally cluster table while being at it.
DROP TABLE my_table;
ALTER TABLE my_table2 RENAME TO my_table;
ALTER TABLE my_table
ADD CONSTRAINT my_table_id_pk PRIMARY KEY (id);
-- more constraints, indices, triggers?
-- recreate views etc. if any
COMMIT;
The additional benefit: a pristine (optionally clustered) table without bloat. Related:
Best way to populate a new column in a large table?

DELETE SQL with correlated subquery for table with 42 million rows?

I have a table cats with 42,795,120 rows.
Apparently this is a lot of rows. So when I do:
/* owner_cats is a many-to-many join table */
DELETE FROM cats
WHERE cats.id_cat IN (
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)
the query times out :(
(edit: I need to increase my CommandTimeout value, default is only 30 seconds)
I can't use TRUNCATE TABLE cats because I don't want to blow away cats from other owners.
I'm using SQL Server 2005 with "Recovery model" set to "Simple."
So, I thought about doing something like this (executing this SQL from an application btw):
DELETE TOP (25) PERCENT FROM cats
WHERE cats.id_cat IN (
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)
DELETE TOP(50) PERCENT FROM cats
WHERE cats.id_cat IN (
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)
DELETE FROM cats
WHERE cats.id_cat IN (
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)
My question is: what is the threshold of the number of rows I can DELETE in SQL Server 2005?
Or, if my approach is not optimal, please suggest a better approach. Thanks.
This post didn't help me enough:
SQL Server Efficiently dropping a group of rows with millions and millions of rows
EDIT (8/6/2010):
Okay, I just realized after reading the above link again that I did not have indexes on these tables. Also, some of you have already pointed out that issue in the comments below. Keep in mind this is a fictitious schema, so even id_cat is not a PK, because in my real life schema, it's not a unique field.
I will put indexes on:
cats.id_cat
owner_cats.id_cat
owner_cats.id_owner
I guess I'm still getting the hang of this data warehousing, and obviously I need indexes on all the JOIN fields right?
However, it takes hours for me to do this batch load process. I'm already doing it as a SqlBulkCopy (in chunks, not 42 mil all at once). I have some indexes and PKs. I read the following posts which confirms my theory that the indexes are slowing down even a bulk copy:
SqlBulkCopy slow as molasses
What’s the fastest way to bulk insert a lot of data in SQL Server (C# client)
So I'm going to DROP my indexes before the copy and then re CREATE them when it's done.
Because of the long load times, it's going to take me awhile to test these suggestions. I'll report back with the results.
UPDATE (8/7/2010):
Tom suggested:
DELETE c
FROM cats c
WHERE EXISTS (SELECT 1
FROM owner_cats o
WHERE o.id_cat = c.id_cat
AND o.id_owner = 1)
And still with no indexes, for 42 million rows, it took 13:21 min:sec versus 22:08 with the way described above. However, for 13 million rows, it took 2:13 versus 2:10 my old way. It's a neat idea, but I still need to use indexes!
Update (8/8/2010):
Something is terribly wrong! Now with the indexes on, my first delete query above took 1:9 hrs:min (yes an hour!) versus 22:08 min:sec and 13:21 min:sec versus 2:10 min:sec for 42 mil rows and 13 mil rows respectively. I'm going to try Tom's query with the indexes now, but this is heading in the wrong direction. Please help.
Update (8/9/2010):
Tom's delete took 1:06 hrs:min for 42 mil rows and 10:50 min:sec for 13 mil rows with indexes versus 13:21 min:sec and 2:13 min:sec respectively. Deletes are taking longer on my database when I use indexes by an order of magnitude! I think I know why, my database .mdf and .ldf grew from 3.5 GB to 40.6 GB during the first (42 mil) delete! What am I doing wrong?
Update (8/10/2010):
For lack of any other options, I have come up with what I feel is a lackluster solution (hopefully temporary):
Increase timeout for database connection to 1 hour (CommandTimeout=60000; default was 30 sec)
Use Tom's query: DELETE FROM WHERE EXISTS (SELECT 1 ...) because it performed a little faster
DROP all indexes and PKs before running delete statement (???)
Run DELETE statement
CREATE all indexes and PKs
Seems crazy, but at least it's faster than using TRUNCATE and starting over my load from the beginning with the first owner_id, because one of my owner_id takes 2:30 hrs:min to load versus 17:22 min:sec for the delete process I just described with 42 mil rows. (Note: if my load process throws an exception, I start over for that owner_id, but I don't want to blow away previous owner_id, so I don't want to TRUNCATE the owner_cats table, which is why I'm trying to use DELETE.)
Any more help would still be appreciated :)
There is no practical threshold. It depends on what your command timeout is set to on your connection.
Keep in mind that the time it takes to delete all of these rows is contingent upon:
The time it takes to find the rows of interest
The time it takes to log the transaction in the transaction log
The time it takes to delete the index entries of interest
The time it takes to delete the actual rows of interest
The time it takes to wait for other processes to stop using the table so you can acquire what in this case will most likely be an exclusive table lock
The last point may often be the most significant. Do an sp_who2 command in another query window to make sure that there isn't lock contention going on, preventing your command from executing.
Improperly configured SQL Servers will do poorly at this type of query. Transaction logs which are too small and/or share the same disks as the data files will often incur severe performance penalties when working with large rows.
As for a solution, well, like all things, it depends. Is this something you intend to be doing often? Depending on how many rows you have left, the fastest way might be to rebuild the table as another name and then rename it and recreate its constraints, all inside a transaction. If this is just an ad-hoc thing, make sure your ADO CommandTimeout is set high enough and you can just bear the cost of this big delete.
If the delete will remove "a significant number" of rows from the table, this can be an alternative to a DELETE: put the records to keep somewhere else, truncate the original table, put back the 'keepers'. Something like:
SELECT *
INTO #cats_to_keep
FROM cats
WHERE cats.id_cat NOT IN ( -- note the NOT
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)
TRUNCATE TABLE cats
INSERT INTO cats
SELECT * FROM #cats_to_keep
Have you tried a join instead of a subquery?
DELETE c
FROM
cats c
INNER JOIN owner_cats oc
on c.id_cat = oc.id_cat
WHERE
oc.id_owner = 1
And if you have, have you also tried different join hints, e.g.
DELETE c
FROM
cats c
INNER HASH JOIN owner_cats oc
on c.id_cat = oc.id_cat
WHERE
oc.id_owner = 1
If you use an EXISTS rather than an IN, you may get better performance. Try this:
DELETE c
FROM cats c
WHERE EXISTS (SELECT 1
FROM owner_cats o
WHERE o.id_cat = c.id_cat
AND o.id_owner = 1)
There's no threshold as such - you can DELETE all the rows from any table given enough transaction log space - which is where your query is most likely falling over. If you're getting some results from your DELETE TOP (n) PERCENT FROM cats WHERE ... then you can wrap it in a loop as below:
SELECT 1
WHILE @@ROWCOUNT <> 0
BEGIN
DELETE TOP (somevalue) PERCENT FROM cats
WHERE cats.id_cat IN (
SELECT owner_cats.id_cat FROM owner_cats
WHERE owner_cats.id_owner = 1)
END
As others have mentioned, when you delete 42 million rows, the db has to log 42 million deletions against the database. Thus, the transaction log has to grow substantially. What you might try is to break up the delete into chunks. In the following query, I use the NTile ranking function to break up the rows into 100 buckets. If that is too slow, you can expand the number of buckets so that each delete is smaller. It will help tremendously if there is an index on owner_cats.id_owner, owner_cats.id_cat and cats.id_cat (which I assumed is the primary key and numeric).
Declare @Cats Cursor
Declare @Grp int -- bucket number returned by NTile
Declare @Start int --assuming an integer PK here
Declare @End int
Declare @GroupCount int
Set @GroupCount = 100
Set @Cats = Cursor Fast_Forward For
With CatHerd As
(
Select cats.id_cat
, NTile(@GroupCount) Over ( Order By cats.id_cat ) As Grp
From cats
Join owner_cats
On owner_cats.id_cat = cats.id_cat
Where owner_cats.id_owner = 1
)
Select Grp, Min(id_cat) As MinCat, Max(id_cat) As MaxCat
From CatHerd
Group By Grp
Open @Cats
Fetch Next From @Cats Into @Grp, @Start, @End
While @@Fetch_Status = 0
Begin
-- restrict to this owner's cats: the id range may also contain other owners' rows
Delete cats
Where id_cat Between @Start And @End
And Exists ( Select 1 From owner_cats
Where owner_cats.id_cat = cats.id_cat
And owner_cats.id_owner = 1 )
Fetch Next From @Cats Into @Grp, @Start, @End
End
Close @Cats
Deallocate @Cats
The notable catch with the above approach is that it is not transactional. Thus, if it fails on the 40th chunk, you will have deleted 40% of the rows and the other 60% will still exist.
Might be worth trying MERGE e.g.
MERGE INTO cats
USING owner_cats
ON cats.id_cat = owner_cats.id_cat
AND owner_cats.id_owner = 1
WHEN MATCHED THEN DELETE;
<Edit> (9/28/2011)
My answer performs basically the same way as Thomas' solution (Aug 6 '10). I missed it when I posted my answer because he uses an actual CURSOR so I thought to myself "bad" because of the # of records involved. However, when I reread his answer just now I realize that the WAY he uses the cursor is actually "good". Very clever. I just voted up his answer and will probably use his approach in the future. If you don't understand why, take a look at it again. If you still can't see it, post a comment on this answer and I will come back and try to explain in detail. I decided to leave my answer because someone may have a DBA who refuses to let them use an actual CURSOR regardless of how "good" it is. :-)
</Edit>
I realize that this question is a year old but I recently had a similar situation. I was trying to do "bulk" updates to a large table with a join to a different table, also fairly large. The problem was that the join was resulting in so many "joined records" that it took too long to process and could have led to contention problems. Since this was a one-time update I came up with the following "hack." I created a WHILE LOOP that went through the table to be updated and picked 50,000 records to update at a time. It looked something like this:
DECLARE @RecId bigint
DECLARE @NumRecs bigint
SET @NumRecs = (SELECT MAX(Id) FROM [TableToUpdate])
SET @RecId = 1
-- <= so the last Id is still processed when it falls exactly on a chunk boundary
WHILE @RecId <= @NumRecs
BEGIN
UPDATE [TableToUpdate]
SET UpdatedOn = GETDATE(),
SomeColumn = t2.[ColumnInTable2]
FROM [TableToUpdate] t
INNER JOIN [Table2] t2 ON t2.Name = t.DBAName
AND ISNULL(t.PhoneNumber,'') = t2.PhoneNumber
AND ISNULL(t.FaxNumber, '') = t2.FaxNumber
LEFT JOIN [Address] d ON d.AddressId = t.DbaAddressId
AND ISNULL(d.Address1,'') = t2.DBAAddress1
AND ISNULL(d.[State],'') = t2.DBAState
AND ISNULL(d.PostalCode,'') = t2.DBAPostalCode
WHERE t.Id BETWEEN @RecId AND (@RecId + 49999)
SET @RecId = @RecId + 50000
END
Nothing fancy but it got the job done. Because it was only processing 50,000 records at a time, any locks that got created were short lived. Also, the optimizer realized that it did not have to do the entire table so it did a better job of picking an execution plan.
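To make the key-range chunking easier to experiment with, here is a small sketch of the same pattern in Python. It assumes SQLite (stdlib sqlite3) rather than SQL Server, a simplified two-table schema, and a chunk of 1,000 ids instead of 50,000; table_to_update, table2 and update_in_id_ranges are hypothetical names, not the ones from my real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table2 (name TEXT PRIMARY KEY, val TEXT);
CREATE TABLE table_to_update (id INTEGER PRIMARY KEY, name TEXT, some_column TEXT);
""")
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [(f"n{k}", f"v{k}") for k in range(5)],
)
conn.executemany(
    "INSERT INTO table_to_update (id, name) VALUES (?, ?)",
    [(i, f"n{i % 5}") for i in range(1, 2002)],  # ids 1..2001
)
conn.commit()

def update_in_id_ranges(conn, chunk=1000):
    """Update the table one id range at a time, committing after each range."""
    max_id = conn.execute("SELECT MAX(id) FROM table_to_update").fetchone()[0] or 0
    start, batches = 1, 0
    while start <= max_id:  # <= so an id sitting exactly on a boundary is included
        conn.execute(
            """UPDATE table_to_update
               SET some_column = (SELECT t2.val FROM table2 AS t2
                                  WHERE t2.name = table_to_update.name)
               WHERE id BETWEEN ? AND ?""",
            (start, start + chunk - 1),
        )
        conn.commit()  # short transaction per range
        start += chunk
        batches += 1
    return batches

batches = update_in_id_ranges(conn)
missing = conn.execute(
    "SELECT COUNT(*) FROM table_to_update WHERE some_column IS NULL"
).fetchone()[0]
print(batches, missing)
```

Gaps in the id sequence only mean that some ranges touch fewer rows; the loop still terminates because it walks the id space, not the row count.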
<Edit> (9/28/2011)
There is a HUGE caveat to the suggestion that has been mentioned here more than once and is posted all over the place around the web regarding copying the "good" records to a different table, doing a TRUNCATE (or DROP and reCREATE, or DROP and rename) and then repopulating the table.
You cannot do this if the table is the PK table in a PK-FK relationship (or other CONSTRAINT). Granted, you could DROP the relationship, do the clean up, and re-establish the relationship, but you would have to clean up the FK table, too. You can do that BEFORE re-establishing the relationship, which means more "down-time", or you can choose to not ENFORCE the CONSTRAINT on creation and clean up afterwards. I guess you could also clean up the FK table BEFORE you clean up the PK table. Bottom line is that you have to explicitly clean up the FK table, one way or the other.
My answer is a hybrid SET-based/quasi-CURSOR process. Another benefit of this method is that if the PK-FK relationship is setup to CASCADE DELETES you don't have to do the clean up I mention above because the server will take care of it for you. If your company/DBA discourage cascading deletes, you can ask that it be enabled only while this process is running and then disabled when it is finished. Depending on the permission levels of the account that runs the clean up, the ALTER statements to enable/disable cascading deletes can be tacked onto the beginning and the end of the SQL statement.
</Edit>
Bill Karwin's answer to another question applies to my situation also:
"If your DELETE is intended to eliminate a great majority of the rows in that table, one thing that people often do is copy just the rows you want to keep to a duplicate table, and then use DROP TABLE or TRUNCATE to wipe out the original table much more quickly."
Matt in this answer says it this way:
"If offline and deleting a large %, may make sense to just build a new table with data to keep, drop the old table, and rename."
ammoQ in this answer (from the same question) recommends (paraphrased):
issue a table lock when deleting a large amount of rows
put indexes on any foreign key columns

SQL Server DELETE is slower with indexes

I have an SQL Server 2005 database, and I tried putting indexes on the appropriate fields in order to speed up the DELETE of records from a table with millions of rows (big_table has only 3 columns), but now the DELETE execution time is even longer! (1 hour versus 13 min for example)
I have a relationship between two tables, and the column that I filter my DELETE by is in the other table. For example
DELETE FROM big_table
WHERE big_table.id_product IN (
SELECT small_table.id_product FROM small_table
WHERE small_table.id_category = 1)
Btw, I've also tried:
DELETE FROM big_table
WHERE EXISTS
(SELECT 1 FROM small_table
WHERE small_table.id_product = big_table.id_product
AND small_table.id_category = 1)
and while it seems to run slightly faster than the first, it's still a lot slower with the indexes than without.
I created indexes on these fields:
big_table.id_product
small_table.id_product
small_table.id_category
My .ldf file grows a lot during the DELETE.
Why are my DELETE queries slower when I have indexes on my tables? I thought they were supposed to run faster.
UPDATE
Okay, consensus seems to be that indexes will slow down a huge DELETE because the index has to be updated. Although, I still don't understand why it can't DELETE all the rows at once, and just update the index once at the end.
I was under the impression from some of my reading that indexes would speed up DELETE by making searches for fields in the WHERE clause faster.
Odetocode.com says:
"Indexes work just as well when searching for a record in DELETE and UPDATE commands as they do for SELECT statements."
But later in the article, it says that too many indexes can hurt performance.
Answers to bob's questions:
55 million rows in table
42 million rows being deleted
Similar SELECT statement would not run (Exception of type 'System.OutOfMemoryException' was thrown)
I tried the following 2 queries:
SELECT * FROM big_table
WHERE big_table.id_product IN (
SELECT small_table.id_product FROM small_table
WHERE small_table.id_category = 1)
SELECT * FROM big_table
INNER JOIN small_table
ON small_table.id_product = big_table.id_product
WHERE small_table.id_category = 1
Both failed after running for 25 min with this error message from SQL Server 2005:
An error occurred while executing batch. Error message is: Exception of type 'System.OutOfMemoryException' was thrown.
The database server is an older dual core Xeon machine with 7.5 GB ram. It's my toy test database :) so it's not running anything else.
Do I need to do something special with my indexes after I CREATE them to make them work properly?
Indexes make lookups faster - like the index at the back of a book.
Operations that change the data (like a DELETE) are slower, as they involve manipulating the indexes. Consider the same index at the back of the book. You have more work to do if you add, remove or change pages because you have to also update the index.
I agree with Bob's comment above: if you are deleting large volumes of data from large tables, deleting the index entries can take a while on top of deleting the data. It's the cost of doing business, though. As it deletes the data, you are causing reindexing events to happen.
With regard to the logfile growth: if you aren't doing anything with your logfiles you could switch to the Simple recovery model, but I urge you to read up on the impact that might have on your IT department before you change.
If you need to do the delete in real time, it's often a good workaround to flag the data as inactive, either directly on the table or in another table, and exclude that data from queries; then come back later and delete the data when the users aren't staring at an hourglass. There is a second reason for covering this: if you are deleting lots of data out of the table (which is what I am supposing based on your logfile issue) then you will likely want to do an indexdefrag to reorganise the index; doing that out of hours is the way to go if you don't like users on the phone!
JohnB is deleting about 75% of the data. I think the following would have been a possible solution and probably one of the faster ones. Instead of deleting the data, create a new table and insert the data that you need to keep. Create the indexes on that new table after inserting the data. Now drop the old table and rename the new one to the same name as the old one.
The above of course assumes that sufficient disk space is available to temporarily store the duplicated data.
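A rough sketch of that copy-and-swap sequence, written in Python against SQLite (stdlib sqlite3) purely so it can be run anywhere; on SQL Server the steps would be SELECT ... INTO, DROP TABLE and sp_rename instead. The qty column, the index name ix_big_product and the row counts are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE small_table (id_product INTEGER PRIMARY KEY, id_category INTEGER);
CREATE TABLE big_table (id INTEGER PRIMARY KEY, id_product INTEGER, qty INTEGER);
CREATE INDEX ix_big_product ON big_table (id_product);
""")
# Hypothetical data: products 1..75 are in category 1 (to be removed), 76..100 are kept.
conn.executemany(
    "INSERT INTO small_table VALUES (?, ?)",
    [(p, 1 if p <= 75 else 2) for p in range(1, 101)],
)
conn.executemany(
    "INSERT INTO big_table (id_product, qty) VALUES (?, ?)",
    [(1 + i % 100, i) for i in range(10000)],
)
conn.commit()

conn.executescript("""
BEGIN;
-- 1. copy only the rows to keep (no indexes on the new table yet)
CREATE TABLE big_table_new AS
  SELECT b.* FROM big_table AS b
  WHERE NOT EXISTS (SELECT 1 FROM small_table AS s
                    WHERE s.id_product = b.id_product
                      AND s.id_category = 1);
-- 2. drop the old table (its indexes go with it) and swap names
DROP TABLE big_table;
ALTER TABLE big_table_new RENAME TO big_table;
-- 3. build the indexes once, after the data is in place
--    (CREATE TABLE AS does not carry over keys or constraints)
CREATE INDEX ix_big_product ON big_table (id_product);
COMMIT;
""")

remaining = conn.execute("SELECT COUNT(*) FROM big_table").fetchone()[0]
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'big_table'"
    )
]
print(remaining, index_names)
```

Keys, constraints, triggers and permissions on the original table are not copied by this kind of swap and have to be recreated by hand.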
Try something like this to avoid bulk delete (and thereby avoid log file growth)
declare @continue bit
set @continue = 1
-- delete in batches of 10000 until no matching rows remain
while @continue = 1
begin
set @continue = 0
delete top (10000) u
from <tablename> u WITH (READPAST)
where <condition>
if @@ROWCOUNT > 0
set @continue = 1
end
You can also try TSQL extension to DELETE syntax and check whether it improves performance:
DELETE FROM big_table
FROM big_table AS b
INNER JOIN small_table AS s ON (s.id_product = b.id_product)
WHERE s.id_category =1

Improving performance of Sql Delete

We have a query to remove some rows from the table based on an id field (primary key). It is a pretty straightforward query:
delete all from OUR_TABLE where ID in (123, 345, ...)
The problem is no.of ids can be huge (Eg. 70k), so the query takes a long time. Is there any way to optimize this?
(We are using sybase - if that matters).
There are two ways to make statements like this one perform:
Create a new table and copy all but the rows to delete. Swap the tables afterwards (alter table name ...) I suggest to give it a try even when it sounds stupid. Some databases are much faster at copying than at deleting.
Partition your tables. Create N tables and use a view to join them into one. Sort the rows into different tables grouped by the delete criterion. The idea is to drop a whole table instead of deleting individual rows.
Consider running this in batches. A loop running 1000 records at a time may be much faster than one query that does everything and in addition will not keep the table locked out to other users for as long at a stretch.
If you have cascade delete (and lots of foreign key tables affected) or triggers involved, you may need to run in even smaller batches. You'll have to experiment to see which is the best number for your situation. I've had tables where I had to delete in batches of 100 and others where 50000 worked (fortunate in that case as I was deleting a million records).
But in any event I would put the key values that I intend to delete into a temp table and delete from there.
I'm wondering if parsing an IN clause with 70K items in it is a problem. Have you tried a temp table with a join instead?
Can Sybase handle 70K arguments in an IN clause? All databases I have worked with have some limit on the number of arguments in an IN clause. For example, Oracle has a limit of 1000.
Can you create subselect instead of IN clause? That will shorten sql. Maybe that could help for such a big number of values in IN clause. Something like this:
DELETE FROM OUR_TABLE WHERE ID IN
(SELECT ID FROM somewhere WHERE some_condition)
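A minimal sketch of the keys-in-a-temp-table approach mentioned in several answers here, driven from client code in Python. It assumes SQLite (stdlib sqlite3) in place of Sybase, so the exact temp-table syntax will differ on your server; our_table follows the question, while ids_to_delete and the payload column are made-up names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE our_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO our_table VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 100001)],
)
conn.commit()

# Hypothetical list of 70,000 keys that the application wants to remove.
ids_to_delete = [i for i in range(1, 100001) if i % 10 < 7]

# Load the keys into an indexed temp table instead of building a huge IN (...) list.
conn.execute("CREATE TEMP TABLE ids_to_delete (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO ids_to_delete VALUES (?)",
    [(i,) for i in ids_to_delete],
)

# One short statement; the server matches keys against the temp table.
cur = conn.execute(
    "DELETE FROM our_table WHERE id IN (SELECT id FROM ids_to_delete)"
)
conn.commit()

deleted = cur.rowcount
remaining = conn.execute("SELECT COUNT(*) FROM our_table").fetchone()[0]
print(deleted, remaining)
```

The statement text stays the same size no matter how many keys there are, which sidesteps both parsing cost and any limit on IN-list length.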
Deleting a large number of records can be sped up with some interventions in the database, if the database model permits. Here are some strategies:
1. Drop the indexes, delete the records, then recreate the indexes. This avoids rebalancing the index trees while the records are being deleted.
   drop all indexes on table
   delete records
   recreate indexes
2. If you have lots of relations to this table, try disabling constraints, but only if you are absolutely sure that the delete will not break any integrity constraint. The delete will go much faster because the database won't be checking integrity. Enable the constraints after the delete.
   disable integrity constraints, disable check constraints
   delete records
   enable constraints
3. Disable the triggers on the table, if you have any and if your business rules allow that. Delete the records, then enable the triggers.
4. Last, do as others suggested: make a copy of the table that contains only the rows that are not to be deleted, then drop the original, rename the copy and recreate the integrity constraints, if there are any.
I would try a combination of 1, 2 and 3. If that does not work, then 4. If everything is still slow, I would look for a bigger box: more memory, faster disks.
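As a rough illustration of strategies 2 and 3, here is a sketch in SQL Server syntax (other databases spell these statements differently). ChildTable is a hypothetical table whose foreign keys reference OurTable; for a delete on the parent, those are the constraints the database has to check.

```sql
-- Assumption: SQL Server syntax; OurTable and ChildTable are hypothetical names.
alter table ChildTable nocheck constraint all   -- stop checking FK/CHECK constraints defined on ChildTable
alter table OurTable disable trigger all        -- skip delete triggers on OurTable
go
delete from OurTable where ID in (2, 3)
go
alter table OurTable enable trigger all
-- WITH CHECK re-validates existing rows, so this fails if the delete left orphans.
alter table ChildTable with check check constraint all
go
```

Re-enabling with validation is the safety net: if it fails, the delete did break integrity and the orphaned child rows need to be dealt with.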
Find out what is using up the performance!
In many cases you might use one of the solutions provided. But there might be others (based on Oracle knowledge, so things will be different on other databases. Edit: just saw that you mentioned Sybase):
Do you have foreign keys on that table? Make sure the referring IDs are indexed.
Do you have indexes on that table? Dropping them before the delete and recreating them afterwards might be faster.
Check the execution plan. Is it using an index where a full table scan might be faster? Or the other way round? Hints might help.
Instead of a select into new_table as suggested above, a create table as select might be even faster.
But remember: Find out what is using up the performance first.
When you are using DDL statements, make sure you understand and accept the consequences they might have on transactions and backups.
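For the foreign key point, the fix is usually a single index on the referencing column. A sketch, assuming a hypothetical ChildTable whose ParentID column references the table being deleted from:

```sql
-- Hypothetical schema: ChildTable.ParentID references OurTable.ID.
-- Without this index, each deleted parent row can force a scan of ChildTable
-- to verify that no child rows still reference it.
create index IX_ChildTable_ParentID on ChildTable (ParentID)
```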
Try sorting the IDs you are passing into "in" in the same order as the table or index is stored in. You may then get more hits on the disk cache.
Putting the IDs to be deleted into a temp table that has them sorted in the same order as the main table may let the database do a simple scan over the main table.
You could try using more than one connection and splitting the work over the connections so as to use all the CPUs on the database server; however, think first about what locks will be taken out.
I also think that the temp table is likely the best solution.
If you were to do a "delete from .. where ID in (select id from ...)" it can still be slow with large queries, though. I thus suggest that you delete using a join - many people don't know about that functionality.
So, given this example table:
-- set up tables for this example
if exists (select id from sysobjects where name = 'OurTable' and type = 'U')
drop table OurTable
go
create table OurTable (ID integer primary key not null)
go
insert into OurTable (ID) values (1)
insert into OurTable (ID) values (2)
insert into OurTable (ID) values (3)
insert into OurTable (ID) values (4)
go
We can then write our delete code as follows:
create table #IDsToDelete (ID integer not null)
go
insert into #IDsToDelete (ID) values (2)
insert into #IDsToDelete (ID) values (3)
go
-- ... etc ...
-- Now do the delete - notice that we aren't using 'from'
-- in the usual place for this delete
delete OurTable from #IDsToDelete
where OurTable.ID = #IDsToDelete.ID
go
drop table #IDsToDelete
go
-- This returns only items 1 and 4
select * from OurTable order by ID
go
Does our_table have a reference on delete cascade?

Multi Rows Deletion from table in SQL Server

How can I delete 1.5 million rows from SQL Server 2000, and how much time will it take to complete this task?
I don't want to delete all records from the table. I just want to delete the records that fulfil the WHERE condition.
EDITED from a comment to an answer below.
"I fired the same query, i.e. delete from table_name with a WHERE clause... Is it possible to disable indexing while the query is running, because the query has been running for the past 20 hours? Also help me out with how I can disable indexing."
If (and only if) you want to delete all of the records in a table, you can use DROP TABLE or TRUNCATE TABLE.
DELETE removes one record at a time and records an entry in the transaction log for each deleted row.
TRUNCATE TABLE is much faster because it logs only the page deallocations instead of an entry for every row. It removes all rows from a table, but the table structure and its columns, constraints, indexes and so on remain. DROP TABLE would remove those.
Use caution if you decide to TRUNCATE. Once committed it cannot be undone without a backup (in SQL Server it can be rolled back only while it is still inside an open transaction).
Create a second table, inserting all rows from the first that you don't want to delete.
Drop the first table.
Rename the second table to be the first.
(or a variation on the above)
This can often be quicker than doing a delete of selected records from a big table.
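In SQL Server the three steps could look roughly like this. It is a sketch under assumptions: YourTable, YourTable_keep and the KeepFlag predicate are placeholders, and the drop will fail if other tables still reference YourTable through foreign keys.

```sql
-- Assumptions: SQL Server; table names and the KeepFlag predicate are placeholders.
select *
into YourTable_keep
from YourTable
where KeepFlag = 1            -- the rows you want to keep
go
drop table YourTable
go
exec sp_rename 'YourTable_keep', 'YourTable'
go
-- select ... into copies columns and data only: recreate indexes, constraints,
-- triggers and permissions on the renamed table afterwards.
```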
You may want to try deleting in batches too. I just tested this on a table I have and the delete operation went from 13 seconds to 3 seconds.
While Exists(Select * From YourTable Where YourCondition = True)
Delete Top (100000)
From YourTable
Where YourCondition = True
I don't think you can use TOP in a DELETE if you are running SQL2000, but it works with SQL2005 and up. If you are using SQL2000, then you can use this syntax instead:
Set RowCount 100000
While Exists(Select * From YourTable Where YourCondition = True)
    Delete
    From YourTable
    Where YourCondition = True
-- Reset the limit so later statements are not capped at 100000 rows
Set RowCount 0
DELETE FROM table WHERE a=b;
When deleting that many rows you may want to disable the indexes so they don't get updated on every delete. Rewriting the indexes on every deletion will significantly slow down the whole process.
You'll want to disable these indexes before beginning your deletion or else there may be table locks already in place.
--Disable Index (ALTER INDEX needs SQL Server 2005 or later; ON takes the table name, not a column)
ALTER INDEX [IX_MyIndex] ON MyTable DISABLE
--Enable Index
ALTER INDEX [IX_MyIndex] ON MyTable REBUILD
If you wish to remove all entries in a table you can use TRUNCATE.
Does the table you are deleting from have multiple foreign keys, or cascaded deletes or triggers? All of these will impact performance.
Depending on what you want to do and the transactional integrity you need, can you delete things in small batches? For example, if you are trying to delete 1.5 million records that make up one year's worth of data, can you do it one week at a time?
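A week-at-a-time delete might be sketched like this. YourTable, the CreatedAt column and the date range are assumptions for illustration; the variable assignments are written the long way so the loop also runs on SQL Server 2000.

```sql
-- Hypothetical: YourTable has a datetime column CreatedAt; delete all of 2008.
declare @from datetime, @end datetime
select @from = '20080101', @end = '20090101'
while @from < @end
begin
    delete from YourTable
    where CreatedAt >= @from
      and CreatedAt < dateadd(week, 1, @from)
      and CreatedAt < @end           -- do not run past the end of the range
    select @from = dateadd(week, 1, @from)
end
```

An index on the date column keeps each weekly delete from scanning the whole table.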
Delete from table where condition for those 1.5 million rows
The time depends.
On Oracle it is also possible to use
truncate table <table>
Not sure if that is standard SQL or available in SQL Server. It will however clear the whole table - but then it is quicker than "delete from " (it will also conduct a commit).
TRUNCATE also does not fire DELETE triggers, and SQL Server refuses to truncate a table that is referenced by a foreign key constraint. DELETE FROM ... WHERE checks referential integrity and fires triggers. The time will depend on the indexing of your condition columns, your hardware, and any additional system load.
The delete SQL is exactly the same as a normal SQL delete
delete from table where [your condition ]
However, if you're worried about time then I'll assume your question is a little deeper than this. If your table has a significant number of non-clustered indexes then in some circumstances it may be faster to drop all these indexes first and rebuild them after the delete. This is unusual, but in cases where your straightforward delete is vulnerable to timeout issues it may be helpful.
CREATE TABLE new_table as select <data you want to keep> from old_table;
index new_table
grant on new table
add constraints on new_table
etc on new_table
drop table old_table
rename new_table to old_table;