I have a query like
DELETE from tablename where colname = value;
which takes an awfully long time to execute.
What could be the reason? I have an index on colname.
There could be several explanations as to why your query takes a long time:
You could be blocked by another session (most likely). Before you delete, you should make sure no one else is locking the rows, e.g. issue SELECT NULL FROM tablename WHERE colname=:value FOR UPDATE NOWAIT,
There could be a ON DELETE TRIGGER that does additional work,
Check for UNINDEXED REFERENCE CONSTRAINTS pointing to this table (there is a script from AskTom that will help you determine if such unindexed foreign keys exist).
It could be that your table is related to multiple tables that have a huge row count.
How selective is that index? If your table has one million rows and that value hits one hundred and fifty thousand of them, then your index is useless. In fact, it may be worse than useless if it is actually being used. Remember, a DELETE is like a SELECT statement: we can tune its access path.
Also, deletes take up a lot of undo tablespace, so you might be suffering from contention if the system is experiencing heavy use. In a multi-user system, another session might have a lock on the row(s) you want to delete.
Do you have ON DELETE triggers? Do you have ON DELETE CASCADE foreign key constraints?
Edit: Given all that you say, and especially the column in question being the primary key so you are attempting to delete a single row, if it is taking a long time it is much more likely that some other process or user has a lock on the row. Is anything showing up in V$LOCK?
So I'll just post my experience. Might be helpful for someone.
The query
delete from foo
where foo_id not in (
select max(foo_id) from foo group by foo_bar_id, foo_qux_id
);
took 16 sec. deleting 1700 records from 2300 total in table foo.
I checked all the indexes on foreign keys as directed in other answers. That did not help.
Solution:
Changed the query to
delete from foo
where foo_id in (
select foo_id from foo
minus
select max(foo_id) from foo group by foo_bar_id, foo_qux_id
);
I changed NOT IN to IN and used MINUS to achieve the correct result.
Now the query executes in 0.04 sec.
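The rewrite above relies on the two forms deleting the same rows. Here is a minimal sketch that checks this on a toy table, using SQLite through Python as a stand-in for Oracle (SQLite spells MINUS as EXCEPT). The sample rows are made up, and the sketch says nothing about the timing difference, which depends on the Oracle optimizer. Note that the equivalence assumes foo_id is never NULL, since NOT IN behaves differently when the subquery returns NULLs.

```python
import sqlite3

def make_foo(conn):
    """Create a tiny foo table with duplicate (foo_bar_id, foo_qux_id) groups."""
    conn.execute(
        "CREATE TABLE foo (foo_id INTEGER PRIMARY KEY, "
        "foo_bar_id INTEGER, foo_qux_id INTEGER)"
    )
    rows = [(1, 10, 100), (2, 10, 100), (3, 10, 100),
            (4, 20, 200), (5, 20, 200), (6, 30, 300)]
    conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", rows)

# Original form from the answer.
NOT_IN_DELETE = """
DELETE FROM foo
WHERE foo_id NOT IN (
    SELECT max(foo_id) FROM foo GROUP BY foo_bar_id, foo_qux_id
)"""

# Rewritten form; EXCEPT is SQLite's spelling of Oracle's MINUS.
IN_EXCEPT_DELETE = """
DELETE FROM foo
WHERE foo_id IN (
    SELECT foo_id FROM foo
    EXCEPT
    SELECT max(foo_id) FROM foo GROUP BY foo_bar_id, foo_qux_id
)"""

def surviving_ids(delete_sql):
    """Run one delete form on a fresh copy of the table and return what is left."""
    conn = sqlite3.connect(":memory:")
    make_foo(conn)
    conn.execute(delete_sql)
    ids = [r[0] for r in conn.execute("SELECT foo_id FROM foo ORDER BY foo_id")]
    conn.close()
    return ids

print(surviving_ids(NOT_IN_DELETE))
print(surviving_ids(IN_EXCEPT_DELETE))
```

Both calls keep only the highest foo_id of each group, so the rewrite is safe as far as results go.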
Just posting my experience if it helps.
I had the same issue (long delete or update times) and so my gut reaction was to look for and kill 'stuck' queries. But my normal SQL for finding those wasn't showing anything. After a little digging, I went looking for 'locked tables' instead, and was able to find an inactive session that was apparently actively blocking the tables I was trying to alter. Here is the query I used (Oracle 12c) to find the sessions to kill that were blocking my table updates:
SELECT
c.owner,
c.object_name,
c.object_type,
b.sid,
b.serial#,
b.status,
b.osuser,
b.machine
FROM
v$locked_object a ,
v$session b,
dba_objects c
WHERE
b.sid = a.session_id
AND
a.object_id = c.object_id;
Then with the sid, and serial# from the above query:
ALTER SYSTEM KILL SESSION 'SID, SERIAL#' to end those sessions and free the locks.
I'm no dba, that might be bad practice, but it worked for me.
Does your table hold a large number of records?
Are there recursive programs (nested loops, etc.) running on the database server?
Check for network problems if the database server is on a different machine.
There is a significant difference between Oracle and MySQL:
Oracle does not automatically create an index for a foreign key, but MySQL does. So if you delete from a parent table, you must create indexes on the foreign key columns of its child tables. Otherwise the delete on the parent table will be very slow when the child tables have many rows, because the whole child table must be scanned for each deleted parent row.
So be careful when you delete from a parent table in an Oracle database.
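To make the cost concrete, here is a small sketch of the lookup the database has to run against the child table for every deleted parent row. It uses SQLite through Python only as a convenient stand-in, not Oracle itself; the table and index names (parent, child, idx_child_parent) are hypothetical. The point is simply that the same lookup is a full scan without the index and an index search with it.

```python
import sqlite3

def child_lookup_plan(with_index):
    """Return SQLite's query plan for 'find children of one parent'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child (id INTEGER PRIMARY KEY, "
        "parent_id INTEGER REFERENCES parent(id))"
    )
    if with_index:
        # The index that Oracle will not create for you.
        conn.execute("CREATE INDEX idx_child_parent ON child (parent_id)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT 1 FROM child WHERE parent_id = ?", (1,)
    ).fetchall()
    conn.close()
    # The human-readable plan text is the fourth column of each row.
    return " | ".join(row[3] for row in rows)

print(child_lookup_plan(with_index=False))  # a full scan of child
print(child_lookup_plan(with_index=True))   # a search using idx_child_parent
```

The exact plan wording varies between SQLite versions, so only the SCAN versus SEARCH distinction should be relied on.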
Related
In SQL Server 2008 I have some million rows of data which need to be deleted. They are scattered across a handful of tables. Deletion takes up to 20 seconds, which I think is way too slow! The data to be deleted is identified by a timestamp column. Here is what I have done so far in order to optimize:
Using isolation level read uncommitted. I don't care about transactions. If we fail the user will issue the delete operation again. And new data is ensured not to have the timestamp we are deleting.
Deleting leaf tables before parent tables.
The timestamp column is part of the PK clustered index; in fact, it's in the first position of the PK/index.
Each table is emptied using a loop which deletes top 200000 entries in order to reduce the transaction log overhead.
Neither I/O nor CPU is maxed out on the server
What have I overlooked?
Also, I am unsure about the effect of moving the timestamp column to the first position in the PK. After doing so, must I reorganize the tables, or is SQL Server smart enough to do this itself? My understanding of a clustered index is that, since it defines the physical layout of the rows, SQL Server is forced to reorganize the data. But we have had no complaints from the customer that changing the clustered index took a long time.
Please make sure the tables you want to delete data from have a "primary key" explicitly declared.
Wrong: create table myTable (ID int)
Right: create table myTable (ID int PRIMARY KEY)
In addition to that, please try adding "option (recompile)", which may help performance:
DELETE FROM myTable
WHERE timestamp in (select timestamp from other_table)
OPTION (RECOMPILE)
I am getting a large text file of updated information from a customer that contains updates for 500,000 users. However, as I am processing this file, I often am running into SQL Server timeout errors.
Here's the process I follow in my VB application that processes the data (in general):
Delete all records from temporary table (to remove last month's data) (e.g. DELETE FROM tempTable)
Rip text file into the temp table
Fill in extra information into the temp table, such as their organization_id, their user_id, group_code, etc.
Update the data in the real tables based on the data computed in the temp table
The problem is that I often run commands like UPDATE tempTable SET user_id = (SELECT user_id FROM myUsers WHERE external_id = tempTable.external_id) and these commands frequently time out. I have tried bumping the timeouts up to as far as 10 minutes, but they still fail. Now, I realize that 500k rows is no small number of rows to manipulate, but I would think that a database purported to be able to handle millions and millions of rows should be able to cope with 500k pretty easily. Am I doing something wrong with how I am going about processing this data?
Please help. Any and all suggestions welcome.
Subqueries like the one you give us in the question:
UPDATE tempTable SET user_id = (SELECT user_id FROM myUsers WHERE external_id = tempTable.external_id)
are only good on one row at a time, so you must be looping. Think set based:
UPDATE t
SET user_id = u.user_id
FROM tempTable t
inner join myUsers u ON t.external_id=u.external_id
and remove your loops. This will update all rows in one statement and be significantly faster!
Needs more information. I am manipulating 3-4 million rows in a 150 million row table regularly and I am NOT thinking this is a lot of data. I have a "products" table that contains about 8 million entries, including full text search. No problems either.
Can you elaborate on your hardware? I assume "normal desktop PC" or "low end server", both with an absolutely non-optimal disk layout, and thus tons of IO problems on updates.
Make sure you have indexes on your tables that you are doing the selects from. In your example UPDATE command, you select the user_id from the myUsers table. Do you have an index with the user_id column on the myUsers table? The downside of indexes is that they increase time for inserts/updates. Make sure you don't have indexes on the tables you are trying to update. If the tables you are trying to update do have indexes, consider dropping them and then rebuilding them after your import.
Finally, run your queries in SQL Server Management Studio and have a look at the execution plan to see how the query is being executed. Look for things like table scans to see where you might be able to optimize.
Look at KM's answer and don't forget about indexes and primary keys.
Are you indexing your temp table after importing the data?
temp_table.external_id should definitely have an index since it is in the where clause.
There are more efficient ways of importing large blocks of data. Look in SQL Books Online under BCP (Bulk Copy Program).
delete from a A where a.ID = 132.
Table A contains around 5000 records, and A.ID is its primary key, but the delete takes a long time and sometimes times out. The table has three indexes and is referenced by three foreign keys. Can anyone explain why it takes so long even though we are deleting by primary key, and suggest a way to optimize it?
Possible causes:
1) cascading delete operations
2) trigger(s)
3) the type of your primary key column is something other than an integer, thereby forcing a type conversion on each pk value to do the comparison. this requires a full table scan.
4) does your query really end in a dot like you posted it in the question? if so, the number may be considered a floating point number instead of an integer, thereby causing a type conversion similar to 3)
5) your delete query is waiting for some other slow query to release a lock
Obviously it should not be taking a long time. However, there isn't enough information here to figure out exactly why. I can tell you, though, that you should focus on the Foreign Keys.
These can slow things down if they impose constraints from other, much larger, tables. You may also find out that your timeouts are due to integrity checks that prevent the delete (then the question is why you aren't getting exceptions instead of a timeout).
My next step would be to remove the foreign keys and then check performance. Then add each one back in at a time and check performance as you go.
Are other operations (e.g. Inserts, Selects, Updates) taking a long time?
First thought: Indexes on foreign keys?
This is related to the cascading deletes mentioned.
All child tables must be checked, and if you have a total of 500,000 child rows, this might take some time of course...
Second thought: Triggers firing?
On this table or on child tables or trying to cascade via code etc
God forbid, cursor for each row in DELETED...
Try to update the statistics. 5000 rows is not a big deal. If you're doing this regularly you should schedule maintenance on that table as well (i.e. re-build indexes, update stats etc.)
As others have observed, the probable suspects are the foreign keys.
Firstly because the ON DELETE CASCADE can gather momentum if the dependent tables in turn are referenced by other tables, which in turn may be referenced, and so on.
Secondly, because other users may have locks on the rows which need to be deleted. This is the most likely cause of the timeouts. Quite how this works will depend on the flavour and version of your database. For instance, older versions of Oracle (<=8.0) needed to lock the entire dependent table unless the foreign key columns were indexed.
Deletes on SQL Server are sometimes slow, and I have often needed to optimize them to reduce the time they take.
I've been googling a bit looking for tips on how to do that, and I've found diverse suggestions.
I'd like to know your favorite and most effective techniques to tame the delete beast, and how and why they work.
until now:
be sure foreign keys have indexes
be sure the where conditions are indexed
use of WITH ROWLOCK
destroy unused indexes, delete, rebuild the indexes
now, your turn.
The following article, Fast Ordered Delete Operations, may be of interest to you.
Performing fast SQL Server delete operations
The solution focuses on utilising a view in order to simplify the execution plan produced for a batched delete operation. This is achieved by referencing the given table once, rather than twice which in turn reduces the amount of I/O required.
I have much more experience with Oracle, but very likely the same applies to SQL Server as well:
when deleting a large number of rows, issue a table lock, so the database doesn't have to do lots of row locks
if the table you delete from is referenced by other tables, make sure those other tables have indexes on the foreign key column(s) (otherwise the database will do a full table scan for each deleted row on the other table to ensure that deleting the row doesn't violate the foreign key constraint)
I wonder if it's time for garbage-collecting databases? You mark a row for deletion and the server deletes it later during a sweep. You wouldn't want this for every delete - because sometimes a row must go now - but it would be handy on occasion.
Summary of Answers through 2014-11-05
This answer is flagged as community wiki since this is an ever-evolving topic with a lot of nuances, but very few possible answers overall.
The first issue is you must ask yourself what scenario you're optimizing for? This is generally either performance with a single user on the db, or scale with many users on the db. Sometimes the answers are the exact opposite.
For single user optimization
Hint a TABLELOCK
Remove indexes not used in the delete then rebuild them afterward
Batch using something like SET ROWCOUNT 20000 (or whatever, depending on log space) and loop (perhaps with a WAITFOR DELAY) until you get rid of it all (@@ROWCOUNT = 0)
If deleting a large % of table, just make a new one and delete the old table
Partition the rows to delete, then drop the partition.
For multi user optimization
Hint row locks
Use the clustered index
Design clustered index to minimize page re-organization if large blocks are deleted
Update "is_deleted" column, then do actual deletion later during a maintenance window
For general optimization
Be sure FKs have indexes on their source tables
Be sure WHERE clause has indexes
Identify the rows to delete in the WHERE clause with a view or derived table instead of referencing the table directly.
To be honest, deleting a million rows from a table scales just as badly as inserting or updating a million rows. It's the size of the rowset that's the problem, and there's not much you can do about that.
My suggestions:
Make sure that the table has a primary key and clustered index (this is vital for all operations).
Make sure that the clustered index is such that minimal page re-organisation would occur if a large block of rows were to be deleted.
Make sure that your selection criteria are SARGable.
Make sure that all your foreign key constraints are currently trusted.
(if the indexes are "unused", why are they there at all?)
One option I've used in the past is to do the work in batches. The crude way would be to use SET ROWCOUNT 20000 (or whatever) and loop (perhaps with a WAITFOR DELAY) until you get rid of it all (@@ROWCOUNT = 0).
This might help reduce the impact upon other systems.
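The batching loop described above can be sketched as follows. This is only an illustration of the control flow, written against SQLite through Python rather than SQL Server: the events table, its ts column, and the batch size are all hypothetical, and a LIMIT subquery stands in for SET ROWCOUNT. Each batch is committed separately so no single transaction grows large, and the loop stops once a batch deletes nothing, which is the role @@ROWCOUNT = 0 plays in the T-SQL version.

```python
import sqlite3

def delete_in_batches(conn, cutoff, batch_size):
    """Delete rows with ts < cutoff in small committed batches.

    Returns the number of non-empty batches that were executed.
    """
    batches = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE rowid IN ("
            "  SELECT rowid FROM events WHERE ts < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # keep each transaction short
        if cur.rowcount == 0:  # nothing left to delete
            break
        batches += 1
        # A pause could go here to yield to other sessions (WAITFOR DELAY in T-SQL).
    return batches

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER)")
conn.executemany("INSERT INTO events (ts) VALUES (?)", [(i,) for i in range(1000)])
conn.commit()

batches = delete_in_batches(conn, cutoff=700, batch_size=200)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(batches, remaining)
```

With 700 matching rows and a batch size of 200, the loop runs four non-empty batches and leaves the 300 newer rows untouched.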
The problem is you haven't defined your conditions enough. I.e. what exactly are you optimizing?
For example, is the system down for nightly maintenance and no users are on the system? And are you deleting a large % of the database?
If offline and deleting a large %, may make sense to just build a new table with data to keep, drop the old table, and rename. If deleting a small %, you likely want to batch things in as large batches as your log space allows. It entirely depends on your database, but dropping indexes for the duration of the rebuild may hurt or help -- if even possible due to being "offline".
If you're online, what's the likelihood your deletes are conflicting with user activity (and is user activity predominantly read, update, or what)? Or, are you trying to optimize for user experience or speed of getting your query done? If you're deleting from a table that's frequently updated by other users, you need to batch but with smaller batch sizes. Even if you do something like a table lock to enforce isolation, that doesn't do much good if your delete statement takes an hour.
When you define your conditions better, you can pick one of the other answers here. I like the link in Rob Sanders' post for batching things.
If you have lots of foreign key tables, start at the bottom of the chain and work up. The final delete will go faster and block fewer things if there are no child records to cascade delete (which I would NOT turn on if I had a large number of child tables, as it will kill performance).
Delete in batches.
If you have foreign key tables that are no longer being used (you'd be surprised how often production databases end up with old tables nobody will get rid of), get rid of them or at least break the FK/PK connection. No sense checking a table for records if it isn't being used.
Don't delete: mark records as deleted and then exclude marked records from all queries. This is best set up at the time of database design. A lot of people use this because it is also the fastest way to get back accidentally deleted records. But it is a lot of work to set up in an existing system.
I'll add another one to this:
Make sure your transaction isolation level and database options are set appropriately. If your SQL server is set not to use row versioning, or you're using an isolation level on other queries where you will wait for the rows to be deleted, you could be setting yourself up for some very poor performance while the operation is happening.
On very large tables where you have a very specific set of criteria for deletes, you could also partition the table, switch out the partition, and then process the deletions.
The SQLCAT team has been using this technique on really really large volumes of data. I found some references to it here but I'll try and find something more definitive.
I think the big trap with deletes that kills performance is that, after each row is deleted, SQL Server updates all the related indexes for any column in that row. What about dropping all indexes before a bulk delete?
There are deletes and then there are deletes. If you are aging out data as part of a trim job, you will hopefully be able to delete contiguous blocks of rows by clustered key. If you have to age out data from a high volume table that is not contiguous it is very very painful.
If it is true that UPDATES are faster than DELETES, you could add a status column called DELETED and filter on it in your selects. Then run a proc at night that does the actual deletes.
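A minimal sketch of that mark-now, purge-later pattern follows, again using SQLite through Python as a stand-in. The orders table and the is_deleted column name are hypothetical, and whether the UPDATE is actually cheaper than the DELETE on a given system is exactly the "if" in the suggestion above, so this shows only the mechanics.

```python
import sqlite3

def make_orders():
    """Create a small table with a soft-delete flag (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "is_deleted INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany("INSERT INTO orders (id) VALUES (?)",
                     [(i,) for i in range(1, 11)])
    return conn

def soft_delete(conn, order_id):
    """Mark a row as deleted instead of removing it."""
    conn.execute("UPDATE orders SET is_deleted = 1 WHERE id = ?", (order_id,))

def visible_ids(conn):
    """Every normal query has to filter on the flag."""
    return [r[0] for r in conn.execute(
        "SELECT id FROM orders WHERE is_deleted = 0 ORDER BY id")]

def purge(conn):
    """The nightly job: physically remove the marked rows."""
    cur = conn.execute("DELETE FROM orders WHERE is_deleted = 1")
    return cur.rowcount

conn = make_orders()
soft_delete(conn, 3)
soft_delete(conn, 7)
print(visible_ids(conn))  # rows 3 and 7 are hidden but still stored
print(purge(conn))        # number of rows physically removed
```

The cost of this design is that every reader must remember the is_deleted = 0 filter, which is why it is easier to adopt at design time than to retrofit.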
Do you have foreign keys with referential integrity activated?
Do you have triggers active?
Simplify any use of functions in your WHERE clause! Example:
DELETE FROM Claims
WHERE dbo.YearMonthGet(DataFileYearMonth) = dbo.YearMonthGet(@DataFileYearMonth)
This form of the WHERE clause required 8 minutes to delete 125,837 records.
The YearMonthGet function composed a date with the year and month from the input date and set day = 1. This was to ensure we deleted records based on year and month but not day of month.
I rewrote the WHERE clause to:
WHERE YEAR(DataFileYearMonth) = YEAR(@DataFileYearMonth)
AND MONTH(DataFileYearMonth) = MONTH(@DataFileYearMonth)
The result: The delete required about 38-44 seconds to delete those 125,837 records!
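A possible further step, which is not part of the answer above, is to remove functions from the column side entirely and compare the raw column against a date range, so that an index on the column can be used. The sketch below illustrates the idea with SQLite through Python as a stand-in for SQL Server. The claims table, the data_file_date column, the index name, and the sample dates are all assumptions, and strftime plays the role of the function wrapped around the column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (id INTEGER PRIMARY KEY, data_file_date TEXT)")
conn.execute("CREATE INDEX idx_claims_date ON claims (data_file_date)")
conn.executemany(
    "INSERT INTO claims (data_file_date) VALUES (?)",
    [(f"2020-{m:02d}-{d:02d}",) for m in (2, 3, 4) for d in range(1, 29)],
)

def plan(sql, params=()):
    """Return SQLite's query plan text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(r[3] for r in rows)

# Function applied to the column: every row must be evaluated.
FUNC_FORM = ("SELECT COUNT(*) FROM claims "
             "WHERE strftime('%Y-%m', data_file_date) = ?")
# Half-open range on the raw column: the index can be searched.
RANGE_FORM = ("SELECT COUNT(*) FROM claims "
              "WHERE data_file_date >= ? AND data_file_date < ?")

func_plan = plan(FUNC_FORM, ("2020-03",))
range_plan = plan(RANGE_FORM, ("2020-03-01", "2020-04-01"))
func_count = conn.execute(FUNC_FORM, ("2020-03",)).fetchone()[0]
range_count = conn.execute(RANGE_FORM, ("2020-03-01", "2020-04-01")).fetchone()[0]

print(func_plan)
print(range_plan)
print(func_count, range_count)
```

Both predicates select the same March rows, but only the range form lets the engine seek into the index instead of scanning. The same WHERE clause shape would apply to a DELETE.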
Edit: Solved, there was a trigger with a loop on the table (read my own answer further below).
We have a simple delete statement that looks like this:
DELETE FROM tablename WHERE pk = 12345
This just hangs, no timeout, no nothing.
We've looked at the execution plan, and it consists of many lookups on related tables to ensure no foreign keys would trip up the delete, but we've verified that none of those other tables have any rows referring to that particular row.
There is no other user connected to the database at this time.
We've run DBCC CHECKDB against it, and it reports 0 errors.
Looking at the results of sp_who and sp_lock while the query is hanging, I notice that my spid has plenty of PAG and KEY locks, as well as the occasional TAB lock.
The table has 1.777.621 rows, and yes, pk is the primary key, so it's a single row delete based on index. There is no table scan in the execution plan, though I notice that it contains something that says Table Spool (Eager Spool), but says Estimated number of rows 1. Can this actually be a table-scan in disguise? It only says it looks at the primary key column.
Tried DBCC DBREINDEX and UPDATE STATISTICS on the table. Both completed within reasonable time.
There is unfortunately a high number of indexes on this particular table. It is the core table in our system, with plenty of columns, and references, both outgoing and incoming. The exact number is 48 indexes + the primary key clustered index.
What else should we look at?
Note also that this table did not have this problem before; the problem occurred suddenly today. We also have many databases with the same table setup (copies of customer databases), and they behave as expected. It's just this one that is problematic.
One piece of information missing is the number of indices on the table you are deleting the data from. As SQL Server uses the Primary Key as a pointer in every index, any change to the primary index requires updating every index. Though, unless we are talking a high number, this shouldn't be an issue.
I am guessing, from your description, that this is a primary table in the database, referenced by many other tables in FK relationships. This would account for the large number of locks as it checks the rest of the tables for references. And, if you have cascading deletes turned on, this could lead to a delete in table a requiring checks several tables deep.
Try recreating the index on that table, and try regenerating the statistics.
DBCC DBREINDEX
UPDATE STATISTICS
Ok, this is embarrassing.
A colleague had added a trigger to that table a while ago, and the trigger had a bug. Although he had fixed the bug, the trigger had never been recreated for that table.
So the server was actually doing nothing, it just did it a huge number of times.
Oh well...
Thanks for the eyeballs to everyone who read this and pondered the problem.
I'm going to accept Josef's answer, as his was the closest, and indirectly touched upon the issue with the cascading deletes.