Improving performance of Sql Delete - sql

We have a query to remove some rows from the table based on an id field (primary key). It is a pretty straightforward query:
delete all from OUR_TABLE where ID in (123, 345, ...)
The problem is no.of ids can be huge (Eg. 70k), so the query takes a long time. Is there any way to optimize this?
(We are using sybase - if that matters).

There are two ways to make statements like this one perform:
Create a new table and copy all but the rows to delete. Swap the tables afterwards (alter table name ...) I suggest to give it a try even when it sounds stupid. Some databases are much faster at copying than at deleting.
Partition your tables. Create N tables and use a view to join them into one. Sort the rows into different tables grouped by the delete criterion. The idea is to drop a whole table instead of deleting individual rows.

Consider running this in batches. A loop running 1000 records at a time may be much faster than one query that does everything and in addition will not keep the table locked out to other users for as long at a stretch.
If you have cascade delete (and lots of foreign key tables affected) or triggers involved, you may need to run in even smaller batches. You'll have to experiement to see which is the best number for your situation. I've had tables where I had to delete in batches of 100 and others where 50000 worked (fortunate in that case as I was deleting a million records).
But in any even I would put my key values that I intend to delete into a temp table and delete from there.

I'm wondering if parsing an IN clause with 70K items in it is a problem. Have you tried a temp table with a join instead?

Can Sybase handle 70K arguments in IN clause? All databases I worked with have some limit on number of arguments for IN clause. For example, Oracle have limit around 1000.
Can you create subselect instead of IN clause? That will shorten sql. Maybe that could help for such a big number of values in IN clause. Something like this:
DELETE FROM OUR_TABLE WHERE ID IN
(SELECT ID FROM somewhere WHERE some_condition)
Deleting large number of records can be sped up with some interventions in database, if database model permits. Here are some strategies:
you can speed things up by dropping indexes, deleting records and recreating indexes again. This will eliminate rebalancing index trees while deleting records.
drop all indexes on table
delete records
recreate indexes
if you have lots of relations to this table, try disabling constraints if you are absolutely sure that delete command will not break any integrity constraint. Delete will go much faster because database won't be checking integrity. Enable constraints after delete.
disable integrity constraints, disable check constraints
delete records
enable constraints
disable triggers on table, if you have any and if your business rules allow that. Delete records, then enable triggers.
last, do as other suggested - make a copy of the table that contains rows that are not to be deleted, then drop original, rename copy and recreate integrity constraints, if there are any.
I would try combination of 1, 2 and 3. If that does not work, then 4. If everything is slow, I would look for bigger box - more memory, faster disks.

Find out what is using up the performance!
In many cases you might use one of the solutions provided. But there might be others (based on Oracle knowledge, so things will be different on other databases. Edit: just saw that you mentioned sybase):
Do you have foreign keys on that table? Makes sure the referring ids are indexed
Do you have indexes on that table? It might be that droping before delete and recreating after the delete might be faster.
check the execution plan. Is it using an index where a full table scan might be faster? Or the other way round? HINTS might help
instead of a select into new_table as suggested above a create table as select might be even faster.
But remember: Find out what is using up the performance first.
When you are using DDL statements make sure you understand and accept the consequences it might have on transactions and backups.

Try sorting the ID you are passing into "in" in the same order as the table, or index is stored in. You may then get more hits on the disk cache.
Putting the ID to be deleted into a temp table that has the Ids sorted in the same order as the main table, may let the database do a simple scanned over the main table.
You could try using more then one connection and spiting the work over the connections so as to use all the CPUs on the database server, however think about what locks will be taken out etc first.

I also think that the temp table is likely the best solution.
If you were to do a "delete from .. where ID in (select id from ...)" it can still be slow with large queries, though. I thus suggest that you delete using a join - many people don't know about that functionality.
So, given this example table:
-- set up tables for this example
if exists (select id from sysobjects where name = 'OurTable' and type = 'U')
drop table OurTable
go
create table OurTable (ID integer primary key not null)
go
insert into OurTable (ID) values (1)
insert into OurTable (ID) values (2)
insert into OurTable (ID) values (3)
insert into OurTable (ID) values (4)
go
We can then write our delete code as follows:
create table #IDsToDelete (ID integer not null)
go
insert into #IDsToDelete (ID) values (2)
insert into #IDsToDelete (ID) values (3)
go
-- ... etc ...
-- Now do the delete - notice that we aren't using 'from'
-- in the usual place for this delete
delete OurTable from #IDsToDelete
where OurTable.ID = #IDsToDelete.ID
go
drop table #IDsToDelete
go
-- This returns only items 1 and 4
select * from OurTable order by ID
go

Does our_table have a reference on delete cascade?

Related

Performance issues while DELETE operation

I have a simple delete statement like this:
DELETE FROM MY_TABLE WHERE ID='Something'
This deletes only one record for that ID, which is taking more than a minute since MY_TABLE is referenced by a Foreign Key created on another table that has around 40 millions records and the DELETE operation checks existence of this ID in that large table.
One way is to drop/re-create this Foreign Key, but this DELETE operation gets performed by the application users and I can not just drop/re-create this FK on the fly unless it could have been an off-hours operation.
Can anyone help?
So there's an excellent SO question that you can refer
SQL delete performance.
Apart from the suggestions on the question like
Possible causes:
1) cascading delete operations
2) trigger(s)
3) the type of your primary key column is something other than an
integer, thereby forcing a type conversion on each pk value to do the
comparison. this requires a full table scan.
4) does your query really end in a dot like you posted it in the
question? if so, the number may considered to be a floating point
number instead of an integer, thereby causing a type conversion
similar to 3)
5) your delete query is waiting for some other slow query to release a
lock
Importantly:
We should definitely look into triggers. You should try to do away with triggers if there are any.
Also try to see if there are index on the FK in the other large table.
See if you can convert your query to a CASCADE delete to deleting from large table and then deleting from the MY_TABLE like in query below
DELETE FROM LARGE_TABLE WHERE LT.FK IN (SELECT PK FROM MY_TABLE WHERE ID='Something')
GO
DELETE FROM MY_TABLE WHERE ID='Something'
Also see the performance plan for first query on LARGE_TABLE.
You should try to replace WHERE ID='Something' with an INT key operation like WHERE anotherID=34. Basically instead of using a string in WHERE try to use integers for deletion. It is hard to believe that ID column in varchar. You should have a Primary KEY clustered indexed column in MY_Table, and use this key in WHERE clause.
Another way to improve performance is to drop the FK contraint between tables. You can always periodically do garbage deletion from LARGE_TABLE followed by Index rebuilding.
Another simpler way is to introduce mapping table between these tables. Where say a table named Map_My_Table_Large_Table exists which keeps following columns (FK column from My_Table, FK column from Large_Table). You can shift your CASCADE delete to this table.

updating a very large oracle table

I have a very large table people with 60M rows indexed on id, wish to populate a field newid for every record based on a look up table id_conversion (1M rows) which contains id and newid, indexed on id.
when I run
update people p set p.newid=(select l.newid from id_conversion l where l.id=p.id)
it runs for an hour or so and then I get an archive error ora 00257.
Any suggestions for either running update in sections or better sql command?
To avoid writing to Oracle's undo log if your update statement hits every single row of the table then you are likely better off running a create table as select query which will bypass all undo logs, which is likely the issue you're running into as it is logging the impact across 60 million rows. You can then drop the old table and rename the new table to that of the old table's name.
Something like:
create table new_people as
select l.newid,
p.col2,
p.col3,
p.col4,
p.col5
from people p
join id_conversion l
on p.id = l.id;
drop table people;
-- rebuild any constraints and indexes
-- from old people table to new people table
alter table new_people rename to people;
For reference, read some of the tips here: http://www.dba-oracle.com/t_efficient_update_sql_dml_tips.htm
If you are basically creating a new table and not just updating some of the rows of a table it will likely prove the faster method.
I doubt you will be able to get this to run in seconds. Your query, as written, needs to update all 60 million rows.
My first advice is to add an index on id_conversion(id, newid), to make the subquery more efficient. If that doesn't help, then doing the update in batches might be the best way to go.
I should add. Because you are updating all the rows, it might be faster to take the following approach:
Copy the data into a new table with the new values.
Truncate the original table.
Insert the new data into the old table.
Inserts are faster than updates.
In addition to the answers above, which probably will work better in this case, you should know the MERGE statement
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm
that is used for updating one table according to another table and is far faster then update according to a select statement

How to optimize the following delete SQL query?

I have a following delete query in oracle. There will be about 1000 records to be deleted from the database at a time.
I have used "in" the query. Is there any better way to write this query?
DELETE FROM BI_EMPLOYEE_ACTIVITY
WHERE EMPLOYEE_ID in (
SELECT
EMP_ID
FROM
BI_EMPLOYEE
WHERE
PRODUCT_ID = IN_PRODUCT_ID
);
It is not really possible to answer this question as we're missing a description of the data distribution: How many rows are in each table? What's the relationship between the tables? How many rows are affected by the delete?
I'll be assuming that both tables are large (since this is an optimization question) and that BI_EMPLOYEE and BI_EMPLOYEE_ACTIVITY have a parent-child 1..N relationship.
If there are few rows affected by the delete, this means that not many employees have the same PRODUCT_ID and each employee has few activities. In this case it would make sense to index both BI_EMPLOYEE (product_id) and BI_EMPLOYEE_ACTIVITY (employee_id).
This is probably not the case though, the delete probably affects lots of rows. In that case the indexes could be a hindrance. If the delete affects lots of rows, the fastest access path probably is FULL TABLE SCAN + HASH JOIN.
We need some metrics here: how many rows are deleted? How long does it take? This is because large DML will always take time, especially DELETE since they produce the largest amount of undo.
There are alternatives to a large DELETE, as explained in "Deleting many rows from a big table" from asktom:
recreate the table without the deleted rows
partition the data, do a parallel delete
partition the data so that the delete is done by dropping a partition
Putting index on EMP_ID may help, I dont believe if any other optimization is possible, query is quite simple and straight forward
Create an index on PRODUCT_ID column. This would speed up the search. If the column is of varchar type, make use to function index if you are converting values to uppercase or lowercase
Maybe you can try EXIST instead of IN:
DELETE FROM BI_EMPLOYEE_ACTIVITY
WHERE EXISTS (
SELECT
EMP_ID
FROM
BI_EMPLOYEE
WHERE
PRODUCT_ID = IN_PRODUCT_ID
AND
EMP_ID = EMPLOYEE_ID
);
Create an index on BI_EMPLOYEE table for PRODUCT_ID, EMP_ID columns in this order (product_id on the first place).
And create an index on the BI_EMPLOYEE_ACTIVITY table for the column EMPLOYEE_ID
I'll just add that other than creating an index for the query, you need to take a look at the locking issue when your table grows really big, try to lock the table in exclusive mode (if possible) as this will only take a lock from the db, and if it's not possible try to commit the delete over each 2500 records so if you're stuck with row locking you don't endup starving the database of locks.

Delete statement in SQL is very slow

I have statements like this that are timing out:
DELETE FROM [table] WHERE [COL] IN ( '1', '2', '6', '12', '24', '7', '3', '5')
I tried doing one at a time like this:
DELETE FROM [table] WHERE [COL] IN ( '1' )
and so far it's at 22 minutes and still going.
The table has 260,000 rows in it and is four columns.
Does anyone have any ideas why this would be so slow and how to speed it up?
I do have a non-unique, non-clustered index on the [COL] that i'm doing the WHERE on.
I'm using SQL Server 2008 R2
update: I have no triggers on the table.
Things that can cause a delete to be slow:
deleting a lot of records
many indexes
missing indexes on foreign keys in child tables. (thank you to #CesarAlvaradoDiaz for mentioning this in the comments)
deadlocks and blocking
triggers
cascade delete (those ten parent records you are deleting could mean
millions of child records getting deleted)
Transaction log needing to grow
Many Foreign keys to check
So your choices are to find out what is blocking and fix it or run the deletes in off hours when they won't be interfering with the normal production load. You can run the delete in batches (useful if you have triggers, cascade delete, or a large number of records). You can drop and recreate the indexes (best if you can do that in off hours too).
Disable CONSTRAINT
ALTER TABLE [TableName] NOCHECK CONSTRAINT ALL;
Disable Index
ALTER INDEX ALL ON [TableName] DISABLE;
Rebuild Index
ALTER INDEX ALL ON [TableName] REBUILD;
Enable CONSTRAINT
ALTER TABLE [TableName] CHECK CONSTRAINT ALL;
Delete again
Deleting a lot of rows can be very slow. Try to delete a few at a time, like:
delete top (10) YourTable where col in ('1','2','3','4')
while ##rowcount > 0
begin
delete top (10) YourTable where col in ('1','2','3','4')
end
In my case the database statistics had become corrupt. The statement
delete from tablename where col1 = 'v1'
was taking 30 seconds even though there were no matching records but
delete from tablename where col1 = 'rubbish'
ran instantly
running
update statistics tablename
fixed the issue
If the table you are deleting from has BEFORE/AFTER DELETE triggers, something in there could be causing your delay.
Additionally, if you have foreign keys referencing that table, additional UPDATEs or DELETEs may be occurring.
Preventive Action
Check with the help of SQL Profiler for the root cause of this issue. There may be Triggers causing the delay in Execution. It can be anything. Don't forget to Select the Database Name and Object Name while Starting the Trace to exclude scanning unnecessary queries...
Database Name Filtering
Table/Stored Procedure/Trigger Name Filtering
Corrective Action
As you said your table contains 260,000 records...and IN Predicate contains six values. Now, each record is being search 260,000 times for each value in IN Predicate. Instead it should be the Inner Join like below...
Delete K From YourTable1 K
Inner Join YourTable2 T on T.id = K.id
Insert the IN Predicate values into a Temporary Table or Local Variable
It's possible that other tables have FK constraint to your [table].
So the DB needs to check these tables to maintain the referential integrity.
Even if you have all needed indexes corresponding these FKs, check their amount.
I had the situation when NHibernate incorrectly created duplicated FKs on the same columns, but with different names (which is allowed by SQL Server).
It has drastically slowed down running of the DELETE statement.
Check execution plan of this delete statement. Have a look if index seek is used. Also what is data type of col?
If you are using wrong data type, change update statement (like from '1' to 1 or N'1').
If index scan is used consider using some query hint..
If you're deleting all the records in the table rather than a select few it may be much faster to just drop and recreate the table.
Is [COL] really a character field that's holding numbers, or can you get rid of the single-quotes around the values? #Alex is right that IN is slower than =, so if you can do this, you'll be better off:
DELETE FROM [table] WHERE [COL] = '1'
But better still is using numbers rather than strings to find the rows (sql likes numbers):
DELETE FROM [table] WHERE [COL] = 1
Maybe try:
DELETE FROM [table] WHERE CAST([COL] AS INT) = 1
In either event, make sure you have an index on column [COL] to speed up the table scan.
I read this article it was really helpful for troubleshooting any kind of inconveniences
https://support.microsoft.com/en-us/kb/224453
this is a case of waitresource
KEY: 16:72057595075231744 (ab74b4daaf17)
-- First SQL Provider to find the SPID (Session ID)
-- Second Identify problem, check Status, Open_tran, Lastwaittype, waittype, and waittime
-- iMPORTANT Waitresource select * from sys.sysprocesses where spid = 57
select * from sys.databases where database_id=16
-- with Waitresource check this to obtain object id
select * from sys.partitions where hobt_id=72057595075231744
select * from sys.objects where object_id=2105058535
After inspecting an SSIS Package(due to a SQL Server executing commands really slow), that was set up in a client of ours about 5-4 years before the time of me writing this, I found out that there were the below tasks:
1) insert data from an XML file into a table called [Importbarcdes].
2) merge command on an another target table, using as source the above mentioned table.
3) "delete from [Importbarcodes]", to clear the table of the row that was inserted after the XML file was read by the task of the SSIS Package.
After a quick inspection all statements(SELECT, UPDATE, DELETE etc.) on the table ImportBarcodes that had only 1 row, took about 2 minutes to execute.
Extended Events showed a whole lot PAGEIOLATCH_EX wait notifications.
No indexes were present of the table and no triggers were registered.
Upon close inspection of the properties of the table, in the Storage Tab and under general section, the Data Space field showed more than 6 GIGABYTES of space allocated in pages.
What happened:
The query ran for a good portion of time each day for the last 4 years, inserting and deleting data in the table, leaving unused pagefiles behind with out freeing them up.
So, that was the main reason of the wait events that were captured by the Extended Events Session and the slowly executed commands upon the table.
Running ALTER TABLE ImportBarcodes REBUILD fixed the issue freeing up all the unused space. TRUNCATE TABLE ImportBarcodes did a similar thing, with the only difference of deleting all pagefiles and data.
Older topic but one still relevant.
Another issue occurs when an index has become fragmented to the extent of becoming more of a problem than a help. In such a case, the answer would be to rebuild or drop and recreate the index and issuing the delete statement again.
As an extension to Andomar's answer, above, I had a scenario where the first 700,000,000 records (of ~1.2 billion) processed very quickly, with chunks of 25,000 records processing per second (roughly). But, then it starting taking 15 minutes to do a batch of 25,000. I reduced the chunk size down to 5,000 records and it went back to its previous speed. I'm not certain what internal threshold I hit, but the fix was to reduce the number of records, further, to regain the speed.
open CMD and run this commands
NET STOP MSSQLSERVER
NET START MSSQLSERVER
this will restart the SQL Server instance.
try to run again after your delete command
I have this command in a batch script and run it from time to time if I'm encountering problems like this. A normal PC restart will not be the same so restarting the instance is the most effective way if you are encountering some issues with your sql server.

Multi Rows Deletion from table in SQL Server

How I can Delete 1.5 Millions Rows From SQL Server 2000, And how much time it will take to complete this task.
I dont want to delete all records from table.... I just want to delete all records which are fullfilling WHERE condition.
EDITED from a comment to an answer below.
"I fire the same query i.e. delete from table_name with Where Clause... Is it possible to Disable Indexing at the running Query, becuase Query is going on from past 20 hr.. Also help me out how i can disable Indexing.."
If (and only if) you want to delete all of the records in a table, you can use DROP TABLE or TRUNCATE TABLE.
DELETE removes one record at a time and records an entry in the transaction log for each deleted row.
TRUNCATE TABLE is much faster because it doesn't record the activity in the transaction log. It removes all rows from a table, but the table structure & its columns, constraints, indexes and so on remain. DROP TABLE would remove those.
Use caution if you decide to TRUNCATE. It's irreversible (unless you have a backup).
create a second table, inserting all rows from the first that you don't want deleting.
delete the first table
rename the second table to be the first
(or a variation on the above)
This can often be quicker than doing a delete of selected records from a big table.
You may want to try deleting in batches too. I just tested this on a table I have and the delete operation went from 13 seconds to 3 seconds.
While Exists(Select * From YourTable Where YourCondition = True)
Delete Top (100000)
From YourTable
Where YourCondition = True
I don't think you can use the TOP predicate if you are running SQL2000, but it works with SQL2005 and up. If you are using SQL2000, then you can use this syntax instead:
Set RowCount 100000
While Exists(Select * From YourTable Where YourCondition = True)
Delete
From YourTable
Where YourCondition = True
DELETE FROM table WHERE a=b;
When deleting that many rows you may want to disable the indexes so they don't get updated on every delete. Rewriting the indexes on every deletion will significantly slow down the whole process.
You'll want to disable these indexes before beginning your deletion or else there may be table locks already in place.
--Disable Index
ALTER INDEX [IX_MyIndex] ON MyTable.MyColumn DISABLE
--Enable Index
ALTER INDEX [IX_MyIndex] ON MyTable.MyColumn REBUILD
If you wish to remove all entries in a table you can use TRUNCATE.
Does the table you are deleting from have multiple foreign keys, or cascaded deletes or triggers? All of these will impact performance.
Depending on what you want to do and the transactional integrity, can you delete things in small batches e.g. if you are trying to delete 1.5 million records that is 1 years worth of data, can you do it 1 week at a time?
Delete from table where condition for those 1.5 million rows
The time depends.
On Oracle it is also possible to use
truncate table <table>
Not sure if that is standard SQL or available in SQL Server. It will however clear the whole table - but then it is quicker than "delete from " (it will also conduct a commit).
TRUNCATE will also ignore any referential integrity or triggers on the table. DELETE FROM ... WHERE will respect both. The time will depend on the indexing of your condition columns, your hardware, and any additional system load.
The delete SQL is exactly the same as a normal SQL delete
delete from table where [your condition ]
However if your worried about time then I'll assume your question is a little deeper than this. If your table is has a significant number of non-clustered indexes then in some circumstances it may be faster to drop all these indexes first and rebuild after the delete. This is unusual but in cases where your straightforward delete is vulnerable to timeout issues it may be helpful
CREATE TABLE new_table as select <data you want to keep> from old_table;
index new_table
grant on new table
add constraints on new_table
etc on new_table
drop table old_table
rename new_table to old_table;