Deleting records with BLOB is very slow

Deleting records with BLOB is very slow - sql

We have a Oracle 11g database with several different tables. One of the tables contains around 120,000 records and each record has a BLOB field. The data size in the BLOBs vary from 2-3KB to several MB. The problem is that when we want to delete a record from this table, it takes a long time - sometimes over a minute to delete just one record (using SQL developer or using a DBMS-job). We have no problem with deleting records of the other tables. For test purposes, we set the BLOB field of one record which we wanted to delete to EMPTY_BLOB(), but again deleting this record took a long time.
Is this behaviour normal for records containing BLOB? Is there any way to tune deleting such records so that they can be deleted a bit faster?

Thanks for the hints, specially the hint related to index. The problem was originated from lack of indexes on foreign keys. Actually, we had a "ON DELETE SET NULL" statement in foreign keys which were not indexed. When we indexed the foreign keys, the delete became very fast.

Related

Why CASCADE constraint is preventing any operation on tables having bulk records?

As I had raised a question here at Please recommend the best bulk-delete option, CASCADE constraint is the one that prevents me to delete the records in all the tables when they were loaded with bulk records.
Is there any reason for why CASCADE takes time when DELETE FROM table1; Or TRUNCATE table1 CASCADE is attempted?
FYI, I'm using PostgreSQL 8.1.4. Though outdated, when I remove CASCADE constraint in my tables (listed in the top link), both DELETE and TRUNCATE queries work fine.
However, CASCADE is what I needed! I can't just remove the constraint. Please help me on this.

A common mistake is a missing index on the column of the foreign key. When deleting one row from the referenced table, all refering rows have to be found. Witout an index each row will lead to a SLOW sequential scan. With an index - easy and fast.
Perhaps this is your problem.

Using cascade delete is a very poor idea! You have now discovered why. It siomply takes too long if large numbers of records are deleted. You should correctly delete by starting with the child records first. If you are deleting a large number of records, you may need to write a script to delete in batches to avoid locking and taking too long for one command.
Let me explain why it gets slower. Suppose you want to delete 1000 records from the parent table, called TableA. There are three child tables involved. TableB averages 10 records per parent record. TableC averages 5 records per parent record. TableD averages 100 records per parant record. So your delete of 1000 records in Table A actually involves deleting 115000 records. Now suppose you were deleting 10,000 records from tableA, now your cascade delete will delete 1,150,000 records. Now in most databases, a parent table could have considerably more than three related tables (we have one with over 100 FKS). If we were to allow a cascade delete on our databases and someone tried to deleted 1000 records, they would end up deleting hundreds of millions of records.

ON CASCADE DELETE on small operations, but it performs poorly on large ones. To understand why we have to look at what is going on behind the scenes: On PostgreSQL we use triggers.
So if we delete from the parent table, for every row we delete, it goes and deletes on the child table as well. This happens for each row deleted. Now, note, that sequential scans are relatively cheap in PostgreSQL so you may be forcing a large number of index scans when a single sequential scan would be a lot faster.
So suppose on table 1 we are deleting 1000 records, and this means on table 2 we are deleting 10000 records. If we do this right, we go and we delete from table 2, performing a single scan to do it. Might take a few seconds on good hardware. Then we go and delete from the parent record and this is fast. Good, right?
Now suppose we rely on triggers to do the deletion.....
Scan through table 1, for each of 1000 rows we delete, scan through table 2's index, delete 10 rows, go to next. We entirely lose any help we could get from the OS's prefetch routines, and we substitute a lot of redundant, random page reads for a much smaller number of sequential reads. Now we are spending a lot of time waiting for disk platters to turn and heads to move. Ouch......
ON DELETE CASCADE triggers have their place. They work fine if we are just deleting from a few records. But they fall apart very fast on bulk deletions. Wrap all your deletions in a transaction, and delete from child tables first, and it will be far faster.

Delete large portion of huge tables

I have a very large table (more than 300 millions records) that will need to be cleaned up. Roughly 80% of it will need to be deleted. The database software is MS SQL 2005. There are several indexes and statistics on the table but not external relationships.
The best solution I came up with, so far, is to put the database into "simple" recovery mode, copy all the records I want to keep to a temporary table, truncate the original table, set identity insert to on and copy back the data from the temp table.
It works but it's still taking several hours to complete. Is there a faster way to do this ?

As per the comments my suggestion would be to simply dispense with the copy back step and promote the table containing records to be kept to become the new main table by renaming it.
It should be quite straightforward to script out the index/statistics creation to be applied to the new table before it gets swapped in.
The clustered index should be created before the non clustered indexes.
A couple of points I'm not sure about though.
Whether it would be quicker to insert into a heap then create the clustered index afterwards. (I guess no if the insert can be done in clustered index order)
Whether the original table should be truncated before being dropped (I guess yes)

#uriDium -- Chunking using batches of 50,000 will escalate to a table lock, unless you have disabled lock escalation via alter table (sql2k8) or other various locking tricks.

I am not sure what the structure of your data is. When does a row become eligible for deletion? If it is a purely ID based on date based thing then you can create a new table for each day, insert your new data into the new tables and when it comes to cleaning simply drop the required tables. Then for any selects construct a view over all the tables. Just an idea.
EDIT: (In response to comments)
If you are maintaining a view over all the tables then no it won't be complicated at all. The complex part is coding the dropping and recreating of the view.
I am assuming that you don't want you data to be locked down too much during deletes. Why not chunk the delete operations. Created a SP that will delete the data in chunks, 50 000 rows at a time. This should make sure that SQL Server keeps a row lock instead of a table lock. Use the
WAITFOR DELAY 'x'
In your while loop so that you can give other queries a bit of breathing room. Your problem is the old age computer science, space vs time.

SQL DELETE performance

delete from a A where a.ID = 132.
The table A contains around 5000 records and A.ID is the primary key in the table A. But it is taking a long time to delete . Sometimes its getting timed out also . That table contains three indexes and it is referenced by three foreign keys . Can anyone explain me why its taking long time even though we are deleting based on the primary key . And please tell me some way to optimize this problem ...?

Possible causes:
1) cascading delete operations
2) trigger(s)
3) the type of your primary key column is something other than an integer, thereby forcing a type conversion on each pk value to do the comparison. this requires a full table scan.
4) does your query really end in a dot like you posted it in the question? if so, the number may considered to be a floating point number instead of an integer, thereby causing a type conversion similar to 3)
5) your delete query is waiting for some other slow query to release a lock

Obviously it should not be taking a long time. However, there isn't enough information here to figure out exactly why. I can tell you, though, that you should focus on the Foreign Keys.
These can slow things down if they impose constraints from other, much larger, tables. You may also find out that your timeouts are due to integrity checks that prevent the delete (then the question is why you aren't getting exceptions instead of a timeout).
My next step would be to remove the foreign keys and then check performance. Then add each one back in at a time and check performance as you go.
Are other operations (e.g. Inserts, Selects, Updates) taking a long time?

First thought: Indexes on foreign keys?
This is related to cascading deletes mentioned
All child tables muts be checked and if you have a total of 500,000 child rows, this might take some time of course...
Second thought: Triggers firing?
On this table or on child tables or trying to cascade via code etc
God forbid, cursor for each row in DELETED...

Try to update the statistics. 5000 rows is not a big deal. If you're doing this regularly you should schedule maintenance on that table as well (i.e. re-build indexes, update stats etc.)

As others have observed, the probable suspects are the foreign keys.
Firstly because the ON DELETE CASCADE can gather momentum if the dependent tables in turn are referenced by other tables, which in turn may be referenced, and so on.
Secondly, because other users may have locks on the rows which need to be deleted. This is the most likely cause of the timeouts. Quite how this works will depend on the flavour and version of your database. For instance, older versions of Oracle (<=8.0) needed to lock the entire dependent table unless the foreign key columns were indexed.

Optimizing Delete on SQL Server

Deletes on sql server are sometimes slow and I've been often in need to optimize them in order to diminish the needed time.
I've been googleing a bit looking for tips on how to do that, and I've found diverse suggestions.
I'd like to know your favorite and most effective techinques to tame the delete beast, and how and why they work.
until now:
be sure foreign keys have indexes
be sure the where conditions are indexed
use of WITH ROWLOCK
destroy unused indexes, delete, rebuild the indexes
now, your turn.

The following article, Fast Ordered Delete Operations may be of interest to you.
Performing fast SQL Server delete operations
The solution focuses on utilising a view in order to simplify the execution plan produced for a batched delete operation. This is achieved by referencing the given table once, rather than twice which in turn reduces the amount of I/O required.

I have much more experience with Oracle, but very likely the same applies to SQL Server as well:
when deleting a large number of rows, issue a table lock, so the database doesn't have to do lots of row locks
if the table you delete from is referenced by other tables, make sure those other tables have indexes on the foreign key column(s) (otherwise the database will do a full table scan for each deleted row on the other table to ensure that deleting the row doesn't violate the foreign key constraint)

I wonder if it's time for garbage-collecting databases? You mark a row for deletion and the server deletes it later during a sweep. You wouldn't want this for every delete - because sometimes a row must go now - but it would be handy on occasion.

Summary of Answers through 2014-11-05
This answer is flagged as community wiki since this is an ever-evolving topic with a lot of nuances, but very few possible answers overall.
The first issue is you must ask yourself what scenario you're optimizing for? This is generally either performance with a single user on the db, or scale with many users on the db. Sometimes the answers are the exact opposite.
For single user optimization
Hint a TABLELOCK
Remove indexes not used in the delete then rebuild them afterward
Batch using something like SET ROWCOUNT 20000 (or whatever, depending on log space) and loop (perhaps with a WAITFOR DELAY) until you get rid of it all (##ROWCOUNT = 0)
If deleting a large % of table, just make a new one and delete the old table
Partition the rows to delete, then drop the parition. [Read more...]
For multi user optimization
Hint row locks
Use the clustered index
Design clustered index to minimize page re-organization if large blocks are deleted
Update "is_deleted" column, then do actual deletion later during a maintenance window
For general optimization
Be sure FKs have indexes on their source tables
Be sure WHERE clause has indexes
Identify the rows to delete in the WHERE clause with a view or derived table instead of referencing the table directly. [Read more...]

To be honest, deleting a million rows from a table scales just as badly as inserting or updating a million rows. It's the size of the rowset that's the problem, and there's not much you can do about that.
My suggestions:
Make sure that the table has a primary key and clustered index (this is vital for all operations).
Make sure that the clustered index is such that minimal page re-organisation would occur if a large block of rows were to be deleted.
Make sure that your selection criteria are SARGable.
Make sure that all your foreign key constraints are currently trusted.

(if the indexes are "unused", why are they there at all?)
One option I've used in the past is to do the work in batches. The crude way would be to use SET ROWCOUNT 20000 (or whatever) and loop (perhaps with a WAITFOR DELAY) until you get rid of it all (##ROWCOUNT = 0).
This might help reduce the impact upon other systems.

The problem is you haven't defined your conditions enough. I.e. what exactly are you optimizing?
For example, is the system down for nightly maintenance and no users are on the system? And are you deleting a large % of the database?
If offline and deleting a large %, may make sense to just build a new table with data to keep, drop the old table, and rename. If deleting a small %, you likely want to batch things in as large batches as your log space allows. It entirely depends on your database, but dropping indexes for the duration of the rebuild may hurt or help -- if even possible due to being "offline".
If you're online, what's the likelihood your deletes are conflicting with user activity (and is user activity predominantly read, update, or what)? Or, are you trying to optimize for user experience or speed of getting your query done? If you're deleting from a table that's frequently updated by other users, you need to batch but with smaller batch sizes. Even if you do something like a table lock to enforce isolation, that doesn't do much good if your delete statement takes an hour.
When you define your conditions better, you can pick one of the other answers here. I like the link in Rob Sanders' post for batching things.

If you have lots of foreign key tables, start at the bottom of the chain and work up. The final delete will go faster and block less things if there are no child records to cascade delete (which I would NOT turn on if I had a large number fo child tables as it will kill performance).
Delete in batches.
If you have foreign key tables that are no longer being used (you'd be surprised how often production databses end up with old tables nobody will get rid of), get rid of them or at least break the FK/PK connection. No sense cheking a table for records if it isn't being used.
Don't delete - mark records as delted and then exclude marked records from all queries. This is best set up at the time of database design. A lot of people use this because it is also the best fastest way to get back records accidentlally deleted. But it is a lot of work to set up in an already existing system.

I'll add another one to this:
Make sure your transaction isolation level and database options are set appropriately. If your SQL server is set not to use row versioning, or you're using an isolation level on other queries where you will wait for the rows to be deleted, you could be setting yourself up for some very poor performance while the operation is happening.

On very large tables where you have a very specific set of criteria for deletes, you could also partition the table, switch out the partition, and then process the deletions.
The SQLCAT team has been using this technique on really really large volumes of data. I found some references to it here but I'll try and find something more definitive.

I think, the big trap with delete that kill the performance is that sql after each row deleted, it updates all the related indexes for any column in this row. what about delting all indexes before bulk delete?

There are deletes and then there are deletes. If you are aging out data as part of a trim job, you will hopefully be able to delete contiguous blocks of rows by clustered key. If you have to age out data from a high volume table that is not contiguous it is very very painful.

If it is true that UPDATES are faster than DELETES, you could add a status column called DELETED and filter on it in your selects. Then run a proc at night that does the actual deletes.

Do you have foreign keys with referential integrity activated?
Do you have triggers active?

Simplify any use of functions in your WHERE clause! Example:
DELETE FROM Claims
WHERE dbo.YearMonthGet(DataFileYearMonth) = dbo.YearMonthGet(#DataFileYearMonth)
This form of the WHERE clause required 8 minutes to delete 125,837 records.
The YearMonthGet function composed a date with the year and month from the input date and set day = 1. This was to ensure we deleted records based on year and month but not day of month.
I rewrote the WHERE clause to:
WHERE YEAR(DataFileYearMonth) = YEAR(#DataFileYearMonth)
AND MONTH(DataFileYearMonth) = MONTH(#DataFileYearMonth)
The result: The delete required about 38-44 seconds to delete those 125,837 records!

DELETE Statement hangs on SQL Server for no apparent reason

Edit: Solved, there was a trigger with a loop on the table (read my own answer further below).
We have a simple delete statement that looks like this:
DELETE FROM tablename WHERE pk = 12345
This just hangs, no timeout, no nothing.
We've looked at the execution plan, and it consists of many lookups on related tables to ensure no foreign keys would trip up the delete, but we've verified that none of those other tables have any rows referring to that particular row.
There is no other user connected to the database at this time.
We've run DBCC CHECKDB against it, and it reports 0 errors.
Looking at the results of sp_who and sp_lock while the query is hanging, I notice that my spid has plenty of PAG and KEY locks, as well as the occasional TAB lock.
The table has 1.777.621 rows, and yes, pk is the primary key, so it's a single row delete based on index. There is no table scan in the execution plan, though I notice that it contains something that says Table Spool (Eager Spool), but says Estimated number of rows 1. Can this actually be a table-scan in disguise? It only says it looks at the primary key column.
Tried DBCC DBREINDEX and UPDATE STATISTICS on the table. Both completed within reasonable time.
There is unfortunately a high number of indexes on this particular table. It is the core table in our system, with plenty of columns, and references, both outgoing and incoming. The exact number is 48 indexes + the primary key clustered index.
What else should we look at?
Note also that this table did not have this problem before, this problem occured suddently today. We also have many databases with the same table setup (copies of customer databases), and they behave as expected, it's just this one that is problematic.

One piece of information missing is the number of indices on the table you are deleting the data from. As SQL Server uses the Primary Key as a pointer in every index, any change to the primary index requires updating every index. Though, unless we are talking a high number, this shouldn't be an issue.
I am guessing, from your description, that this is a primary table in the database, referenced by many other tables in FK relationships. This would account for the large number of locks as it checks the rest of the tables for references. And, if you have cascading deletes turned on, this could lead to a delete in table a requiring checks several tables deep.

Try recreating the index on that table, and try regenerating the statistics.
DBCC REINDEX
UPDATE STATISTICS

Ok, this is embarrasing.
A collegue had added a trigger to that table a while ago, and the trigger had a bug. Although he had fixed the bug, the trigger had never been recreated for that table.
So the server was actually doing nothing, it just did it a huge number of times.
Oh well...
Thanks for the eyeballs to everyone who read this and pondered the problem.
I'm going to accept Josef's answer, as his was the closest, and indirectly thouched upon the issue with the cascading deletes.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas