SQL DELETE performance - sql

delete from a A where a.ID = 132.
The table A contains around 5000 records and A.ID is the primary key in the table A. But it is taking a long time to delete . Sometimes its getting timed out also . That table contains three indexes and it is referenced by three foreign keys . Can anyone explain me why its taking long time even though we are deleting based on the primary key . And please tell me some way to optimize this problem ...?

Possible causes:
1) cascading delete operations
2) trigger(s)
3) the type of your primary key column is something other than an integer, thereby forcing a type conversion on each pk value to do the comparison. this requires a full table scan.
4) does your query really end in a dot like you posted it in the question? if so, the number may considered to be a floating point number instead of an integer, thereby causing a type conversion similar to 3)
5) your delete query is waiting for some other slow query to release a lock

Obviously it should not be taking a long time. However, there isn't enough information here to figure out exactly why. I can tell you, though, that you should focus on the Foreign Keys.
These can slow things down if they impose constraints from other, much larger, tables. You may also find out that your timeouts are due to integrity checks that prevent the delete (then the question is why you aren't getting exceptions instead of a timeout).
My next step would be to remove the foreign keys and then check performance. Then add each one back in at a time and check performance as you go.
Are other operations (e.g. Inserts, Selects, Updates) taking a long time?

First thought: Indexes on foreign keys?
This is related to cascading deletes mentioned
All child tables muts be checked and if you have a total of 500,000 child rows, this might take some time of course...
Second thought: Triggers firing?
On this table or on child tables or trying to cascade via code etc
God forbid, cursor for each row in DELETED...

Try to update the statistics. 5000 rows is not a big deal. If you're doing this regularly you should schedule maintenance on that table as well (i.e. re-build indexes, update stats etc.)

As others have observed, the probable suspects are the foreign keys.
Firstly because the ON DELETE CASCADE can gather momentum if the dependent tables in turn are referenced by other tables, which in turn may be referenced, and so on.
Secondly, because other users may have locks on the rows which need to be deleted. This is the most likely cause of the timeouts. Quite how this works will depend on the flavour and version of your database. For instance, older versions of Oracle (<=8.0) needed to lock the entire dependent table unless the foreign key columns were indexed.

Related

Records updating based on primary key is slow

I'm updating 60k records in 200k records using informatica based only only one primary key. Still it is running longer. Is there a way to reduce the time as we cannot create index on primary key again which is not necessary.
60k updated rows out of 200k total is a pretty high hit rate. An indexed read would be extremely inefficient compared to a Full Table Scan. So really you don't want to use the primary key index to do an update like this.
However it's difficult to provide more assistance unless you can post the exact query that's being executed, preferably with its explain plan.
Most likely your target table has multiple indexes defined on it (regardless of how many keys you are using to do the update) also there might be multiple foreign keys which need to be resolved against their related tables. Ignore informatica for a minute and try run the update directly on the db and resolve
The best way for this is deleting and inserting which is a faster alternative way and it works.

Which database operation is heaviest?

If I perform the CRUD operations on the same table, what is the heaviest operation in terms of performance?
People say DELETE and then INSERT is better than UPDATE in some cases, is this true? Then UPDATE is the heaviest operation?
Like all things in life, it depends.
SQL Server uses WAL (write ahead logging) to maintain ACID (Atomicity, Consistency, Isolation, Durability) properties.
A insert needs to log entries for data page and index page changes. If page splits occur, it takes longer. Then the data is written to the data file.
A delete marks the data and index pages for re-use. The data will still be there right after the operation.
A update is implemented as an delete and insert. There for double the log entries.
What can help inserts is pre-allocating the space in the data file before running the job. Auto growing the data files is expensive.
In summary, I would expect updates on average to be the most expensive operation.
I am by no way an expert on the storage engine.
Please check out http://www.sqlskills.com - Paul Randals blog and/or Kalen Daleny SQL Server Internals book, http://sqlserverinternals.com/. These authors go in depth on all the cases that might happen.
It depends mostly on foregin keys and indexes which you have on this table. For deletion and isertion every column that is a foreign key and part of an index has to be checked on foreign key references and every index containing that column has to be rebuilt.
If you do DELETE and then INSERT then checking and rebuilding happens twice. If it is a really large table then rebuilding indexes can take very long time and in this case update will be MUCH faster.
Of course if you have index on the key that you're searching with update statement and you are not updating the key.
For a small table with almost no indexes/foreign keys the operations run so fast that it's not a big issue.

Benefit to keeping a record in the database, rather than deleting it, for performance issues?

So I have a client that I am building a Rails app for....I am using PostgreSQL.
He made this comment about preferring to hide records, rather than delete them, and given that this is the first time I have heard about this I figured I would ask you guys to hear your thoughts.
I'd rather hide than delete because deletions in tables eventually lead to table index havoc that causes queries to take longer than expected (much worse than Inserts or Updates). This won't be a problem in the beginning of the site (it gets exponentially worse over time), but seems like an easy issue to never encounter by just not deleting anything (yet) as part of the "everyday" web application functionality. We can always handle deletions much later as part of a Data Optimization & Maintenance process and re-index tables in that process on some (yet to be determined) scheduled basis.
In all the Rails apps I have built, I have never had an issue with records being deleted and it affecting the index.
Am I missing something? Is this a problem that used to exist, but modern RDBMS products have fixed it?
There may be functional reasons for preferring that records not be deleted, but reasons relating to some form of table index "havoc" are almost certainly bogus unless supported by some technical evidence.
You hear this sort of thing quite often in the Oracle world -- that indexes do not re-use space freed up by deletions. It's usually based on some misinterpretation of the facts (eg. that index blocks are not freed for re-use until they are completely empty). Hence you end up with people giving advice to periodically rebuild indexes. If you give these issues some thought, you wonder why the RDBMS developers would not have fixed such an issue, given that it supposedly harms the system performance.
So there may be some piece of Postgres-related, possibly obsolete, information on which this is based, but the onus is really on the person objecting to a perfectly normal type of database operation to come with evidence to support their position.
Another thought: I believe that in Postgres an update is implemented as a delete and insert, hence the advice to vacuum frequently on heavily updated tables. Based on that, updates should also cause the same index problems that are supposed to be associated with deletes.
Other reasons for not deleting the records.
you don't have to worry about cascading a delete through various other tables in the database that reference the row you are deleting
Every bit of data is useful. Debugging and auditing becomes easy.
Easier to rollback if needed.
Create an deleted column in your table and dont index that one.
If you update that record with deleted = 1 or deleted = o only the data needs to be rewriten, the index doesnt have to be updated that saves lots off IO reads and IO writes
This advies goes for all modern RDBMS when they make use of B tree indexes. B tree is an very good algorithm for searching but not for updating and deleting because off high number off IO reads and IO writes thats needed to insert or update node in the node or delete notes from the tree this is also the reason why you should not "over index" your table
"Delete" an record like this
UPDATE table SET deleted = 1 WHERE id = 1 -- if deleted not is indexed assuming id is index as an primary key this also should be fast
Check if an record is deleted
SELECT * FROM table WHERE id = 1 and deleted = 1 -- assuming id is index as an primary key this also should be fast
Check if an record not is deleted
SELECT * FROM table WHERE id = 1 and deleted = 0 -- assuming id is index as an primary key this also should be fast

Optimizing Delete on SQL Server

Deletes on sql server are sometimes slow and I've been often in need to optimize them in order to diminish the needed time.
I've been googleing a bit looking for tips on how to do that, and I've found diverse suggestions.
I'd like to know your favorite and most effective techinques to tame the delete beast, and how and why they work.
until now:
be sure foreign keys have indexes
be sure the where conditions are indexed
use of WITH ROWLOCK
destroy unused indexes, delete, rebuild the indexes
now, your turn.
The following article, Fast Ordered Delete Operations may be of interest to you.
Performing fast SQL Server delete operations
The solution focuses on utilising a view in order to simplify the execution plan produced for a batched delete operation. This is achieved by referencing the given table once, rather than twice which in turn reduces the amount of I/O required.
I have much more experience with Oracle, but very likely the same applies to SQL Server as well:
when deleting a large number of rows, issue a table lock, so the database doesn't have to do lots of row locks
if the table you delete from is referenced by other tables, make sure those other tables have indexes on the foreign key column(s) (otherwise the database will do a full table scan for each deleted row on the other table to ensure that deleting the row doesn't violate the foreign key constraint)
I wonder if it's time for garbage-collecting databases? You mark a row for deletion and the server deletes it later during a sweep. You wouldn't want this for every delete - because sometimes a row must go now - but it would be handy on occasion.
Summary of Answers through 2014-11-05
This answer is flagged as community wiki since this is an ever-evolving topic with a lot of nuances, but very few possible answers overall.
The first issue is you must ask yourself what scenario you're optimizing for? This is generally either performance with a single user on the db, or scale with many users on the db. Sometimes the answers are the exact opposite.
For single user optimization
Hint a TABLELOCK
Remove indexes not used in the delete then rebuild them afterward
Batch using something like SET ROWCOUNT 20000 (or whatever, depending on log space) and loop (perhaps with a WAITFOR DELAY) until you get rid of it all (##ROWCOUNT = 0)
If deleting a large % of table, just make a new one and delete the old table
Partition the rows to delete, then drop the parition. [Read more...]
For multi user optimization
Hint row locks
Use the clustered index
Design clustered index to minimize page re-organization if large blocks are deleted
Update "is_deleted" column, then do actual deletion later during a maintenance window
For general optimization
Be sure FKs have indexes on their source tables
Be sure WHERE clause has indexes
Identify the rows to delete in the WHERE clause with a view or derived table instead of referencing the table directly. [Read more...]
To be honest, deleting a million rows from a table scales just as badly as inserting or updating a million rows. It's the size of the rowset that's the problem, and there's not much you can do about that.
My suggestions:
Make sure that the table has a primary key and clustered index (this is vital for all operations).
Make sure that the clustered index is such that minimal page re-organisation would occur if a large block of rows were to be deleted.
Make sure that your selection criteria are SARGable.
Make sure that all your foreign key constraints are currently trusted.
(if the indexes are "unused", why are they there at all?)
One option I've used in the past is to do the work in batches. The crude way would be to use SET ROWCOUNT 20000 (or whatever) and loop (perhaps with a WAITFOR DELAY) until you get rid of it all (##ROWCOUNT = 0).
This might help reduce the impact upon other systems.
The problem is you haven't defined your conditions enough. I.e. what exactly are you optimizing?
For example, is the system down for nightly maintenance and no users are on the system? And are you deleting a large % of the database?
If offline and deleting a large %, may make sense to just build a new table with data to keep, drop the old table, and rename. If deleting a small %, you likely want to batch things in as large batches as your log space allows. It entirely depends on your database, but dropping indexes for the duration of the rebuild may hurt or help -- if even possible due to being "offline".
If you're online, what's the likelihood your deletes are conflicting with user activity (and is user activity predominantly read, update, or what)? Or, are you trying to optimize for user experience or speed of getting your query done? If you're deleting from a table that's frequently updated by other users, you need to batch but with smaller batch sizes. Even if you do something like a table lock to enforce isolation, that doesn't do much good if your delete statement takes an hour.
When you define your conditions better, you can pick one of the other answers here. I like the link in Rob Sanders' post for batching things.
If you have lots of foreign key tables, start at the bottom of the chain and work up. The final delete will go faster and block less things if there are no child records to cascade delete (which I would NOT turn on if I had a large number fo child tables as it will kill performance).
Delete in batches.
If you have foreign key tables that are no longer being used (you'd be surprised how often production databses end up with old tables nobody will get rid of), get rid of them or at least break the FK/PK connection. No sense cheking a table for records if it isn't being used.
Don't delete - mark records as delted and then exclude marked records from all queries. This is best set up at the time of database design. A lot of people use this because it is also the best fastest way to get back records accidentlally deleted. But it is a lot of work to set up in an already existing system.
I'll add another one to this:
Make sure your transaction isolation level and database options are set appropriately. If your SQL server is set not to use row versioning, or you're using an isolation level on other queries where you will wait for the rows to be deleted, you could be setting yourself up for some very poor performance while the operation is happening.
On very large tables where you have a very specific set of criteria for deletes, you could also partition the table, switch out the partition, and then process the deletions.
The SQLCAT team has been using this technique on really really large volumes of data. I found some references to it here but I'll try and find something more definitive.
I think, the big trap with delete that kill the performance is that sql after each row deleted, it updates all the related indexes for any column in this row. what about delting all indexes before bulk delete?
There are deletes and then there are deletes. If you are aging out data as part of a trim job, you will hopefully be able to delete contiguous blocks of rows by clustered key. If you have to age out data from a high volume table that is not contiguous it is very very painful.
If it is true that UPDATES are faster than DELETES, you could add a status column called DELETED and filter on it in your selects. Then run a proc at night that does the actual deletes.
Do you have foreign keys with referential integrity activated?
Do you have triggers active?
Simplify any use of functions in your WHERE clause! Example:
DELETE FROM Claims
WHERE dbo.YearMonthGet(DataFileYearMonth) = dbo.YearMonthGet(#DataFileYearMonth)
This form of the WHERE clause required 8 minutes to delete 125,837 records.
The YearMonthGet function composed a date with the year and month from the input date and set day = 1. This was to ensure we deleted records based on year and month but not day of month.
I rewrote the WHERE clause to:
WHERE YEAR(DataFileYearMonth) = YEAR(#DataFileYearMonth)
AND MONTH(DataFileYearMonth) = MONTH(#DataFileYearMonth)
The result: The delete required about 38-44 seconds to delete those 125,837 records!

DELETE Statement hangs on SQL Server for no apparent reason

Edit: Solved, there was a trigger with a loop on the table (read my own answer further below).
We have a simple delete statement that looks like this:
DELETE FROM tablename WHERE pk = 12345
This just hangs, no timeout, no nothing.
We've looked at the execution plan, and it consists of many lookups on related tables to ensure no foreign keys would trip up the delete, but we've verified that none of those other tables have any rows referring to that particular row.
There is no other user connected to the database at this time.
We've run DBCC CHECKDB against it, and it reports 0 errors.
Looking at the results of sp_who and sp_lock while the query is hanging, I notice that my spid has plenty of PAG and KEY locks, as well as the occasional TAB lock.
The table has 1.777.621 rows, and yes, pk is the primary key, so it's a single row delete based on index. There is no table scan in the execution plan, though I notice that it contains something that says Table Spool (Eager Spool), but says Estimated number of rows 1. Can this actually be a table-scan in disguise? It only says it looks at the primary key column.
Tried DBCC DBREINDEX and UPDATE STATISTICS on the table. Both completed within reasonable time.
There is unfortunately a high number of indexes on this particular table. It is the core table in our system, with plenty of columns, and references, both outgoing and incoming. The exact number is 48 indexes + the primary key clustered index.
What else should we look at?
Note also that this table did not have this problem before, this problem occured suddently today. We also have many databases with the same table setup (copies of customer databases), and they behave as expected, it's just this one that is problematic.
One piece of information missing is the number of indices on the table you are deleting the data from. As SQL Server uses the Primary Key as a pointer in every index, any change to the primary index requires updating every index. Though, unless we are talking a high number, this shouldn't be an issue.
I am guessing, from your description, that this is a primary table in the database, referenced by many other tables in FK relationships. This would account for the large number of locks as it checks the rest of the tables for references. And, if you have cascading deletes turned on, this could lead to a delete in table a requiring checks several tables deep.
Try recreating the index on that table, and try regenerating the statistics.
DBCC REINDEX
UPDATE STATISTICS
Ok, this is embarrasing.
A collegue had added a trigger to that table a while ago, and the trigger had a bug. Although he had fixed the bug, the trigger had never been recreated for that table.
So the server was actually doing nothing, it just did it a huge number of times.
Oh well...
Thanks for the eyeballs to everyone who read this and pondered the problem.
I'm going to accept Josef's answer, as his was the closest, and indirectly thouched upon the issue with the cascading deletes.