Which database operation is heaviest? - sql

If I perform the CRUD operations on the same table, what is the heaviest operation in terms of performance?
People say that DELETE and then INSERT is better than UPDATE in some cases; is this true? If so, does that make UPDATE the heaviest operation?

Like all things in life, it depends.
SQL Server uses WAL (write ahead logging) to maintain ACID (Atomicity, Consistency, Isolation, Durability) properties.
An insert needs to log entries for data page and index page changes. If page splits occur, it takes longer. Then the data is written to the data file.
A delete marks the data and index pages for re-use; the data will still physically be there right after the operation.
An update is implemented as a delete plus an insert, and therefore generates roughly double the log entries.
What can help inserts is pre-allocating the space in the data file before running the job. Auto growing the data files is expensive.
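For example, a minimal sketch of pre-growing the data file before a big load (SQL Server; the database and logical file names here are hypothetical) - growing the file once up front avoids repeated autogrow events in the middle of the insert:
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_data, SIZE = 200GB);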
In summary, I would expect updates on average to be the most expensive operation.
I am by no means an expert on the storage engine.
Please check out http://www.sqlskills.com - Paul Randal's blog - and/or Kalen Delaney's SQL Server Internals book, http://sqlserverinternals.com/. These authors go in depth on all the cases that might happen.

It depends mostly on the foreign keys and indexes you have on the table. For a deletion and an insertion, every column that is a foreign key has to be checked for foreign key references, and every index containing that column has to be updated.
If you do DELETE and then INSERT, that checking and index maintenance happens twice. On a really large table that index work can take a very long time, and in that case UPDATE will be MUCH faster.
Of course, that assumes you have an index on the key you're searching on in the UPDATE statement and you are not updating that key itself.
For a small table with almost no indexes/foreign keys the operations run so fast that it's not a big issue.
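As a hedged sketch of the first point, an index on the referencing column is what keeps those foreign-key checks cheap during DELETE/INSERT (the table and column names below are hypothetical - an Orders table whose CustomerId references Customers):
CREATE INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId);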

Related

Is it possible to do usual atomic INSERT operation but update Indexes asynchronously?

indexes make read fast but write slower. But why can't you have single writes and have db add indexes asynchronously with time, also cache in the INSERT until it's indexed?
Is there any database like that?
Converting my comments to an answer:
indexes make read fast but write slower
That's an oversimplification and it's also misleading.
Indexes make data lookups faster because the DBMS doesn't need to do a table-scan to find rows matching a predicate (the WHERE part of a query). Indexes don't make "reads" any faster (that's entirely dependent on the characteristics of your disk IO) and when used improperly they can sometimes even make queries slower (for reasons I won't get into).
I want to stress that the additional cost of writing to a single index, or even multiple indexes, when executing a DML statement (INSERT/UPDATE/DELETE/MERGE/etc) is negligible, really! (In actuality: foreign-key constraints are a much bigger culprit - and I note you can practically eliminate the cost of foreign-key constraint checking by adding additional indexes!). Indexes are primarily implemented using B-trees (a B-tree is essentially like a binary-tree, except rather than each node having only 2 children it can have many children because each tree-node comes with unused space for all those child node pointers, so inserting into the middle of a B-tree won't require data to be moved-around on-disk unlike with other kinds of trees, like a heap-tree).
Consider this QA where a Postgres user (like yourself) reports inserting 10,000 rows into a table. Without an index it took 78ms, with an index it took 84ms, that's only a 7.5% increase, which at that scale (6ms!) is so small it may as well be a rounding error or caused by IO scheduling. That should be proof enough it shouldn't be something you should worry about without actual hard data showing it's a problem for you and your application.
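If you want to sanity-check that kind of number yourself, here is a rough sketch (PostgreSQL; the table and column names are made up) - run it in psql with \timing on and compare the two inserts:
-- first insert: no index on the table yet
CREATE TABLE insert_test (id int, payload text);
INSERT INTO insert_test
SELECT i, md5(i::text) FROM generate_series(1, 10000) AS i;
-- second insert: same work, but now the index has to be maintained too
CREATE INDEX idx_insert_test_id ON insert_test (id);
INSERT INTO insert_test
SELECT i, md5(i::text) FROM generate_series(1, 10000) AS i;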
I assume you have this negative impression about indexes after reading an article like this one, which certainly gives the impression that "indexes are bad" - but while the points mentioned in that article are not wrong, there's a LOT of problems with that article so you shouldn't take it dogmatically. (I'll list my concerns with that article in the footer).
But why can't you have single writes and have db add indexes asynchronously with time
By this I assume you mean you'd like a DBMS to do a single-row INSERT by simply appending a new record to the end of a table and then immediately returning, and then at an arbitrary point later the DBMS' housekeeping system would update the indexes afterwards.
The problem with that is that it breaks the A, C, and I parts of the A.C.I.D. model.
Indexes are used for more than just avoiding table-scans: they're also used to store copies of table data for the benefit of queries that use the index and also need (for example) only a small subset of the table's data, which significantly reduces disk reads. For this reason, RDBMS (and ISO SQL) allow indexes to include non-indexed data using the INCLUDE clause.
Consider this scenario:
CREATE INDEX IX_Owners ON cars ( ownerId ) INCLUDE ( colour );
CREATE INDEX IX_Names ON people ( name ) INCLUDE ( personId, hairColour );
GO
SELECT
people.name,
people.hairColour,
cars.colour
FROM
cars
INNER JOIN people ON people.personId = cars.ownerId
WHERE
people.name LIKE 'Steve%'
The above query will not need to read either the cars or people tables on-disk. The DBMS will be able to fully answer the query using data only in the index - which is great because indexes tend to exist in a small number of pages on-disk which tend to be in proximal-locality which is good for performance because it means it will use sequential IO which scales much better than random IO.
The RDBMS will perform a string-prefix index-scan of the people.IX_Names index to get all of the personId (and hairColour) values, then it will look-up those personId values in the cars.IX_Owners index and be able to get the car.colour from the copy of the data inside the IX_Owners index without needing to read the tables directly.
Now, assume that another database client has just finished inserting a load of records into the cars and/or people table, with a COMMIT TRANSACTION just for good measure, and the RDBMS uses your idea of only updating indexes later whenever it feels like it. If that same database client re-runs the query from above it would return stale data (i.e. wrong data) because the query uses the index, but the index is old.
In addition to using index tree nodes to store copies of table data to avoid non-proximal disk IO, many RDBMS also use index-trees to store entire copies - even multiple copies of table data, to enable other scenarios, such as columnar data storage and indexed-VIEWs - both of these features absolutely require that indexes are updated atomically with table data.
Is there any database like that?
Yes, they exist - but they're not widely used (or they're niche) because for the vast majority of applications it's entirely undesirable behaviour for the reasons described above.
There are distributed databases that are designed around eventual consistency, but clients (and the entire application code) need to be designed with that in mind, and it's a huge PITA to have to redesign a data-centric application to support eventual consistency, which is why you only really see them being used in truly massive systems (like Facebook, Google, etc) where availability (uptime) is more important than users seeing stale data for a few minutes.
Footnote:
Regarding this article: https://use-the-index-luke.com/sql/dml/insert
The number of indexes on a table is the most dominant factor for insert performance. The more indexes a table has, the slower the execution becomes. The insert statement is the only operation that cannot directly benefit from indexing because it has no where clause.
I disagree. I'd argue that foreign-key constraints (and triggers) are far more likely to have a larger detrimental effect on DML operations.
Adding a new row to a table involves several steps. First, the database must find a place to store the row. For a regular heap table—which has no particular row order—the database can take any table block that has enough free space. This is a very simple and quick process, mostly executed in main memory. All the database has to do afterwards is to add the new entry to the respective data block.
I agree with this.
If there are indexes on the table, the database must make sure the new entry is also found via these indexes. For this reason it has to add the new entry to each and every index on that table. The number of indexes is therefore a multiplier for the cost of an insert statement.
This is true, but I don't know if I agree that it's a "multiplier" of the cost of an insert.
For example, consider a table with hundreds of nvarchar(1000) columns and several int columns - and there's separate indexes for each int column (with no INCLUDE columns). If you're inserting 100x megabyte-sized rows all-at-once (using an INSERT INTO ... SELECT FROM statement) the cost of updating those int indexes is very likely to require much less IO than the table data.
Moreover, adding an entry to an index is much more expensive than inserting one into a heap structure because the database has to keep the index order and tree balance. That means the new entry cannot be written to any block—it belongs to a specific leaf node. Although the database uses the index tree itself to find the correct leaf node, it still has to read a few index blocks for the tree traversal.
I strongly disagree with this, especially the first sentence: "adding an entry to an index is much more expensive than inserting one into a heap structure".
Indexes in RDBMS today are invariably based on B-trees, not binary-trees or heap-trees. B-trees are essentially like binary-trees except each node has built-in space for dozens of child node pointers and B-trees are only rebalanced when a node fills its internal child pointer list, so a B-tree node insert will be considerably cheaper than the article is saying because each node will have plenty of empty space for a new insertion without needing to re-balance itself or any other relatively expensive operation (besides, DBMS can and do index maintenance separately and independently of any DML statement).
The article is correct about how the DBMS will need to traverse the B-tree to find the node to insert into, but index nodes are efficiently arranged on-disk, such as keeping related nodes in the same disk page, which minimizes index IO reads (assuming they aren't already loaded into memory first). If an index tree is too big to store in-memory the RDBMS can always keep a "meta-index" in-memory so it could potentially find the correct B-tree node almost instantly without needing to traverse the B-tree from the root.
Once the correct leaf node has been identified, the database confirms that there is enough free space left in this node. If not, the database splits the leaf node and distributes the entries between the old and a new node. This process also affects the reference in the corresponding branch node as that must be duplicated as well. Needless to say, the branch node can run out of space as well so it might have to be split too. In the worst case, the database has to split all nodes up to the root node. This is the only case in which the tree gains an additional layer and grows in depth.
In practice this isn't a problem, because the RDBMS's index maintenance will ensure there's sufficient free space in each index node.
The index maintenance is, after all, the most expensive part of the insert operation. That is also visible in Figure 8.1, “Insert Performance by Number of Indexes”: the execution time is hardly visible if the table does not have any indexes. Nevertheless, adding a single index is enough to increase the execute time by a factor of a hundred. Each additional index slows the execution down further.
I feel the article is being dishonest by suggesting (implying? stating?) that index-maintenance happens with every DML. This is not true. This may have been the case with some early dBase-era databases, but this is certainly not the case with modern RDBMS like Postgres, MS SQL Server, Oracle and others.
Considering insert statements only, it would be best to avoid indexes entirely—this yields by far the best insert performance.
Again, this claim in the article is not wrong, but it's basically saying if you want a clean and tidy house you should get rid of all of your possessions. Indexes are a fact of life.
However tables without indexes are rather unrealistic in real world applications. You usually want to retrieve the stored data again so that you need indexes to improve query speed. Even write-only log tables often have a primary key and a respective index.
Indeed.
Nevertheless, the performance without indexes is so good that it can make sense to temporarily drop all indexes while loading large amounts of data—provided the indexes are not needed by any other SQL statements in the meantime. This can unleash a dramatic speed-up which is visible in the chart and is, in fact, a common practice in data warehouses.
Again, with modern RDBMS this isn't necessary. If you do a batch insert then an RDBMS won't update indexes until after the table-data has finished being modified, as a batch index update is cheaper than many individual updates. Similarly I expect that multiple DML statements and queries inside an explicit BEGIN TRANSACTION may cause an index-update deferral, provided no subsequent query in the transaction relies on an updated index.
But my biggest issue with that article is that the author is making these bold claims about detrimental IO performance without providing any citations or even benchmarks they've run themselves. It's even more galling that they posted a bar-chart with arbitrary numbers on it, again without any citation or raw benchmark data and instructions for how to reproduce their results. Always demand citations and evidence from anything you read making claims: because the only claims anyone should accept without evidence are logical axioms - and a quantitative claim about database index IO cost is not a logical axiom :)
For PostgreSQL GIN indexes, there is the fastupdate feature. This stores new index entries in an unordered, unconsolidated area, waiting for some other process to file them away into the main index structure. But this doesn't directly match up with what you want. It is mostly designed so that the index updates are done in bulk (which can be more IO efficient), rather than in the background. Once the unconsolidated area gets large enough, a foreground process might take on the task of filing them away, and it can be hard to tune the settings in a way that gets this to always be done by a background process instead of a foreground process. And it only applies to GIN indexes. (With the btree_gin extension, you can create GIN indexes on regular scalar columns rather than the array-like columns they usually work with.) While waiting for the entries to be consolidated, every query will have to sequentially scan the unconsolidated buffer area, so delaying the updates for the sake of INSERT can come at a high cost for SELECTs.
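For reference, a minimal sketch of the fastupdate option mentioned above (PostgreSQL; the table, column and index names are hypothetical):
CREATE EXTENSION IF NOT EXISTS btree_gin;   -- lets GIN index plain scalar columns
CREATE INDEX idx_events_account
    ON events USING gin (account_id)
    WITH (fastupdate = on);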
There are more general techniques to do something like this, such as fractal tree indexes. But these are not implemented in PostgreSQL, and wherever they are implemented they seem to be proprietary.

Oracle BEST PRACTICE to update 50 million child rows in a table using value from parent table

I have a child table with 100 million rows and need to update 50 million rows of a column using the value from the parent table. I have read around that assuming if we have enough space, it would be the fastest to "create table as select", but I want to know if anyone disagrees or if other factors are required in order to make a better guess? Would it be better to go this route versus using pl/sql's BULK COLLECT FORALL UPDATE feature?
If you have a lot of data then CREATE TABLE AS SELECT is definitely faster because it does not require UNDO tablespace. However, re-creating all the indexes on the new table can be quite a hassle due to name conflicts.
Good news is: 50 million rows is not really a lot of data. If you have a modern midrange server it should not cause problems, so it is not worth the extra work. The best way to find out is to make a copy of the original table (including all indexes) and try the update there. Then you get a rough idea how long it will take.
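If you do go the CREATE TABLE AS SELECT route, a rough sketch might look like the following (Oracle; all table and column names are hypothetical) - you would then re-create indexes, constraints and grants on the new table and swap it in:
CREATE TABLE child_new NOLOGGING PARALLEL AS
SELECT c.child_id,
       p.new_value AS col_to_update,   -- value taken from the parent table
       c.other_col
FROM   child c
JOIN   parent p ON p.parent_id = c.parent_id;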
Parallel UPDATE is probably the best option for a large change to a child table. (If you have Enterprise Edition, sufficient resources, a sane configuration, etc.)
alter session enable parallel dml;
update /*+ parallel */ ...;
(You might want to play with different parallel numbers, like parallel(8). The default degree of parallelism is usually good enough. But some platforms like SPARC inflate their "CPU_COUNT", leading to ridiculous degrees of parallelism.)
Parallel UPDATE is likely not the optimal solution. Recreating the objects can be faster because it can almost completely avoid generating REDO and UNDO. But re-creating objects is usually buggy and getting that optimal performance is tricky.
Here are things to consider before you decide to simply drop and recreate a table:
Grants. Save and re-apply the object grants after the objects are recreated.
Dependent objects. The process needs to re-create all objects, and dependent objects, in the exact same way. This can be painfully difficult depending on how complex your schema is. DBMS_METADATA can be tricky, and in some cases still won't make the objects exactly the same way. If you decide to hard-code the DDL instead you have to remember to update the process whenever the objects change.
Invalid objects. Most objects will automatically recompile when necessary. But you probably don't want to wait for that because it always looks bad to have invalid objects. And even if they do compile correctly, some programs may still get those pesky ORA-04068: existing state of packages has been discarded errors. (Because most PL/SQL programmers are unaware of session state and make every package variable public by default.)
Statistics. Simply re-gathering them after the table is re-created is not always sufficient. Histograms depend on whether columns were used in a predicate. If the table is re-created all the columns are new and no histograms will be initially created.
Direct-path writes are elusive. A parent-child table implies a foreign key, which normally prevents direct-path writes. The process needs to disable or drop the foreign key, and also set the table and index to NOLOGGING, and then remember to set them back to LOGGING at the end. When you re-create the foreign key, if you want to validate it in parallel you have to initially create it as NOVALIDATE, set the table to parallel, validate the constraint, and then set the table back to NOPARALLEL (see the sketch after this list).
In a large data warehouse it's worth going through all those steps and building code for dealing with all the issues. If this is your only large table UPDATE I suggest you avoid that work and accept a slightly non-optimal solution.
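For reference, the constraint/logging dance from the direct-path bullet above, sketched out (Oracle; the table and constraint names are hypothetical):
ALTER TABLE child MODIFY CONSTRAINT fk_child_parent DISABLE;
ALTER TABLE child NOLOGGING;
-- ... direct-path load or rebuild happens here ...
ALTER TABLE child LOGGING;
ALTER TABLE child MODIFY CONSTRAINT fk_child_parent ENABLE NOVALIDATE;
ALTER TABLE child PARALLEL;
ALTER TABLE child MODIFY CONSTRAINT fk_child_parent VALIDATE;   -- validated in parallel
ALTER TABLE child NOPARALLEL;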

Drop/Rebuild indexes during Bulk Insert

I have tables with more than 70 million records in them; what I just found is that developers were dropping indexes before a bulk insert and then creating them again after the bulk insert is over. Execution time for the stored procedure is nearly 30 minutes (drop the indexes, do the bulk insert, then recreate the indexes from scratch).
Advice: Is it good practice to drop indexes on a table which has 70+ million records and is growing by 3-4 million every day?
Would it help performance to not drop the indexes before the bulk insert?
What is the best practice to follow while doing a bulk insert into a big table?
Thanks and Regards
Like everything in SQL Server, "It Depends"
There is overhead in maintaining indexes during the insert and there is overhead in rebuilding the indexes after the insert. The only way to definitively determine which method incurs less overhead is to try them both and benchmark them.
If I were a betting man I would put my wager that leaving the indexes in place would edge out the full rebuild but I don't have the full picture to make an educated guess. Again, the only way to know for sure is to try both options.
One key optimization is to make sure your bulk insert is in clustered key order.
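For example, a sketch of a bulk insert that tells SQL Server the incoming file is already sorted by the clustered key (the file path, table and column names are hypothetical):
BULK INSERT dbo.BigTable
FROM 'D:\loads\bigtable.dat'
WITH (TABLOCK, ORDER (Id ASC), BATCHSIZE = 100000);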
If I'm reading your question correctly, that table is pretty much off limits (locked) for the duration of the load and this is a problem.
If your primary goal is to increase availability/decrease blocking, try taking the A/B table approach.
The A/B approach breaks down as follows:
Given a table called "MyTable" you would actually have two physical tables (MyTable_A and MyTable_B) and one view (MyTable).
If MyTable_A contains the current "active" dataset, your view (MyTable) is selecting all columns from MyTable_A. Meanwhile you can have carte blanche on MyTable_B (which contains a copy of MyTable_A's data and the new data you're writing.) Once MyTable_B is loaded, indexed and ready to go, update your "MyTable" view to point to MyTable_B and truncate MyTable_A.
This approach assumes that you're willing to increase I/O and storage costs (dramatically, in your case) to maintain availability. It also assumes that your big table is also relatively static. If you do follow this approach, I would recommend a second view, something like MyTable_old which points to the non-live table (i.e. if MyTable_A is the current presentation table and is referenced by the MyTable view, MyTable_old will reference MyTable_B) You would update the MyTable_old view at the same time you update the MyTable view.
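A minimal sketch of the A/B switch itself, using the names from this answer (T-SQL):
CREATE VIEW dbo.MyTable AS SELECT * FROM dbo.MyTable_A;
GO
-- ... load and index dbo.MyTable_B while dbo.MyTable_A stays live ...
ALTER VIEW dbo.MyTable AS SELECT * FROM dbo.MyTable_B;
GO
TRUNCATE TABLE dbo.MyTable_A;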
Depending on the nature of the data you're inserting (and your SQL Server version/edition), you may also be able to take advantage of partitioning (MSDN blog on this topic.)

Optimizing Delete on SQL Server

Deletes on SQL Server are sometimes slow and I've often needed to optimize them in order to reduce the time they take.
I've been googling a bit looking for tips on how to do that, and I've found diverse suggestions.
I'd like to know your favorite and most effective techniques to tame the delete beast, and how and why they work.
until now:
be sure foreign keys have indexes
be sure the where conditions are indexed
use of WITH ROWLOCK
destroy unused indexes, delete, rebuild the indexes
now, your turn.
The following article, Fast Ordered Delete Operations may be of interest to you.
Performing fast SQL Server delete operations
The solution focuses on utilising a view in order to simplify the execution plan produced for a batched delete operation. This is achieved by referencing the given table once, rather than twice which in turn reduces the amount of I/O required.
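The core of that technique is roughly this (T-SQL; the view, table and column names are hypothetical) - the TOP/ORDER BY lives in the view, so the batched delete references the base table only once:
CREATE VIEW dbo.v_MyTable_OldestRows AS
SELECT TOP (10000) * FROM dbo.MyTable ORDER BY CreatedDate;
GO
DELETE FROM dbo.v_MyTable_OldestRows;   -- repeat until @@ROWCOUNT = 0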
I have much more experience with Oracle, but very likely the same applies to SQL Server as well:
when deleting a large number of rows, issue a table lock, so the database doesn't have to do lots of row locks
if the table you delete from is referenced by other tables, make sure those other tables have indexes on the foreign key column(s) (otherwise the database will do a full table scan for each deleted row on the other table to ensure that deleting the row doesn't violate the foreign key constraint)
I wonder if it's time for garbage-collecting databases? You mark a row for deletion and the server deletes it later during a sweep. You wouldn't want this for every delete - because sometimes a row must go now - but it would be handy on occasion.
Summary of Answers through 2014-11-05
This answer is flagged as community wiki since this is an ever-evolving topic with a lot of nuances, but very few possible answers overall.
The first issue is you must ask yourself what scenario you're optimizing for? This is generally either performance with a single user on the db, or scale with many users on the db. Sometimes the answers are the exact opposite.
For single user optimization
Hint a TABLELOCK
Remove indexes not used in the delete then rebuild them afterward
Batch using something like SET ROWCOUNT 20000 (or whatever, depending on log space) and loop (perhaps with a WAITFOR DELAY) until you get rid of it all (@@ROWCOUNT = 0)
If deleting a large % of table, just make a new one and delete the old table
Partition the rows to delete, then drop the partition. [Read more...]
For multi user optimization
Hint row locks
Use the clustered index
Design clustered index to minimize page re-organization if large blocks are deleted
Update "is_deleted" column, then do actual deletion later during a maintenance window
For general optimization
Be sure FKs have indexes on their source tables
Be sure WHERE clause has indexes
Identify the rows to delete in the WHERE clause with a view or derived table instead of referencing the table directly. [Read more...]
To be honest, deleting a million rows from a table scales just as badly as inserting or updating a million rows. It's the size of the rowset that's the problem, and there's not much you can do about that.
My suggestions:
Make sure that the table has a primary key and clustered index (this is vital for all operations).
Make sure that the clustered index is such that minimal page re-organisation would occur if a large block of rows were to be deleted.
Make sure that your selection criteria are SARGable.
Make sure that all your foreign key constraints are currently trusted.
(if the indexes are "unused", why are they there at all?)
One option I've used in the past is to do the work in batches. The crude way would be to use SET ROWCOUNT 20000 (or whatever) and loop (perhaps with a WAITFOR DELAY) until you get rid of it all (@@ROWCOUNT = 0).
This might help reduce the impact upon other systems.
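The crude loop described above, sketched out (T-SQL; the table name and the delete criteria are hypothetical; on newer versions DELETE TOP (n) is preferred over SET ROWCOUNT):
SET ROWCOUNT 20000;
WHILE 1 = 1
BEGIN
    DELETE FROM dbo.BigTable WHERE CreatedDate < '2010-01-01';
    IF @@ROWCOUNT = 0 BREAK;        -- nothing left to delete
    WAITFOR DELAY '00:00:05';       -- give other workloads a chance to get in
END
SET ROWCOUNT 0;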
The problem is you haven't defined your conditions enough. I.e. what exactly are you optimizing?
For example, is the system down for nightly maintenance and no users are on the system? And are you deleting a large % of the database?
If offline and deleting a large %, may make sense to just build a new table with data to keep, drop the old table, and rename. If deleting a small %, you likely want to batch things in as large batches as your log space allows. It entirely depends on your database, but dropping indexes for the duration of the rebuild may hurt or help -- if even possible due to being "offline".
If you're online, what's the likelihood your deletes are conflicting with user activity (and is user activity predominantly read, update, or what)? Or, are you trying to optimize for user experience or speed of getting your query done? If you're deleting from a table that's frequently updated by other users, you need to batch but with smaller batch sizes. Even if you do something like a table lock to enforce isolation, that doesn't do much good if your delete statement takes an hour.
When you define your conditions better, you can pick one of the other answers here. I like the link in Rob Sanders' post for batching things.
If you have lots of foreign key tables, start at the bottom of the chain and work up. The final delete will go faster and block fewer things if there are no child records to cascade delete (which I would NOT turn on if I had a large number of child tables as it will kill performance).
Delete in batches.
If you have foreign key tables that are no longer being used (you'd be surprised how often production databases end up with old tables nobody will get rid of), get rid of them or at least break the FK/PK connection. No sense checking a table for records if it isn't being used.
Don't delete - mark records as deleted and then exclude marked records from all queries. This is best set up at the time of database design. A lot of people use this because it is also the fastest way to get back records accidentally deleted. But it is a lot of work to set up in an already existing system.
I'll add another one to this:
Make sure your transaction isolation level and database options are set appropriately. If your SQL server is set not to use row versioning, or you're using an isolation level on other queries where you will wait for the rows to be deleted, you could be setting yourself up for some very poor performance while the operation is happening.
On very large tables where you have a very specific set of criteria for deletes, you could also partition the table, switch out the partition, and then process the deletions.
The SQLCAT team has been using this technique on really really large volumes of data. I found some references to it here but I'll try and find something more definitive.
I think the big trap that kills delete performance is that, after each row is deleted, SQL updates all the related indexes covering any column in that row. What about dropping all indexes before the bulk delete?
There are deletes and then there are deletes. If you are aging out data as part of a trim job, you will hopefully be able to delete contiguous blocks of rows by clustered key. If you have to age out data from a high volume table that is not contiguous it is very very painful.
If it is true that UPDATES are faster than DELETES, you could add a status column called DELETED and filter on it in your selects. Then run a proc at night that does the actual deletes.
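A minimal sketch of that pattern (T-SQL; the table and column names are hypothetical):
ALTER TABLE dbo.Claims ADD Deleted bit NOT NULL DEFAULT 0;

-- during the day: flag rows instead of deleting them
UPDATE dbo.Claims SET Deleted = 1 WHERE ClaimId = 12345;   -- example key

-- nightly job: do the real deletes in batches
DELETE TOP (20000) FROM dbo.Claims WHERE Deleted = 1;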
Do you have foreign keys with referential integrity activated?
Do you have triggers active?
Simplify any use of functions in your WHERE clause! Example:
DELETE FROM Claims
WHERE dbo.YearMonthGet(DataFileYearMonth) = dbo.YearMonthGet(@DataFileYearMonth)
This form of the WHERE clause required 8 minutes to delete 125,837 records.
The YearMonthGet function composed a date with the year and month from the input date and set day = 1. This was to ensure we deleted records based on year and month but not day of month.
I rewrote the WHERE clause to:
WHERE YEAR(DataFileYearMonth) = YEAR(@DataFileYearMonth)
AND MONTH(DataFileYearMonth) = MONTH(@DataFileYearMonth)
The result: The delete required about 38-44 seconds to delete those 125,837 records!

Slow deletes from table with CLOB fields in Oracle 10g

I am encountering an issue where Oracle is very slow when I attempt to delete rows from a table which contains two CLOB fields. The table has millions of rows, no constraints, and the deletes are based on the Primary Key. I have rebuilt indexes and recomputed statistics, to no avail.
What can I do to improve the performance of deletes from this table?
Trace it, with waits enabled
http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_monitor.htm#i1003679
Find the trace file in the UDUMP directory. TKPROF it.
Look at the end and it will tell you what the database spent its time doing during that SQL. The following link is a good overview of how to analyze a performance issue.
http://www.method-r.com/downloads/doc_download/10-for-developers-making-friends-with-the-oracle-database-cary-millsap
With Oracle you have to consider the amount of redo you are generating when deleting a row. If the CLOB fields are very big, it may just take a while for Oracle to delete them due to the amount of redo being written, and there may not be much you can do.
A test you may perform is seeing if the delete takes a long time on a row where both CLOB fields are set to null. If that's the case, then it may be the index updates taking a long time. If that is the case, you may need to investigate consolidating indexes if possible, if deletes occur very frequently.
If the table is a derived table, meaning it can be rebuilt from other tables, you may look at the NOLOGGING option on the table. You can then rebuild the table from the source table with minimal logging.
I hope this entry helps some, however some more details could help diagnose the issue.
Are there any child tables that reference the table you are deleting from? (You can do a select from user_constraints where r_constraint_name = the primary key name on the table you are deleting from.)
A delete can be slow if Oracle needs to look into another table to check there are no child records. Normal practice is to index all foreign keys on the child tables so this is not a problem.
Follow Gary's advice, perform the trace and post the TKPROF results here; someone will be able to help further.
Your UNDO tablespace seems to be the bottleneck in this case.
Check how long it takes to make a ROLLBACK after you delete the data. If it takes time comparable to the time of the query itself (within 50%), then this certainly is the case.
When you perform a DML query, your data (both original and changed) are written into redo logs and then applied to the datafiles and to the UNDO tablespace.
Deleting millions of CLOB rows means copying several hundred megabytes, if not gigabytes, to the UNDO tablespace, which itself takes tens of seconds.
What can you do about this?
Create a faster UNDO: put it onto a separate disk, make it less sparse (create a larger datafile).
Use ROLLBACK SEGMENTS instead of managed UNDO, assign a ROLLBACK SEGMENT for this very query and issue SET TRANSACTION USE ROLLBACK SEGMENT before running the query.
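A sketch of that second point (Oracle; the rollback segment name, table name and criteria are hypothetical):
SET TRANSACTION USE ROLLBACK SEGMENT rbs_big;
DELETE FROM clob_table WHERE load_date < DATE '2009-01-01';
COMMIT;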
If it's not the case, i.e. ROLLBACK executes much faster than the query itself, then try to play with your REDO parameters:
Increase your REDO buffer size using LOG_BUFFER parameter.
Increase the size of your logfiles.
Create your logfiles on separate disks so that reading from a first datafile does not hinder writing to a second, and so on.
Note that UNDO operations also generate REDO, so it's useful to do all this anyway.
NOLOGGING, advised before, is useless, as it applies only to a certain set of operations listed here, DELETE not being one of those operations.
Deleted CLOBs do not end up in the UNDOTBS since they are versioned and retained in the LOB segment. I think it will generate some LOBINDEX changes in the undo.
If you null or empty the LOBs beforehand, did you actually measure that time with the commit separate from the DELETE? If you issue thousands of deletes, do you use batch commits? Is the instance idle? Then an AWR report should tell you what is going on.