I am encountering an issue where Oracle is very slow when I attempt to delete rows from a table which contains two CLOB fields. The table has millions of rows, no constraints, and the deletes are based on the Primary Key. I have rebuilt indexes and recomputed statistics, to no avail.
What can I do to improve the performance of deletes from this table?
Trace it, with waits enabled
http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_monitor.htm#i1003679
Find the trace file in the UDUMP directory. TKPROF it.
Look at the end and it will tell you what the database spent its time doing during that SQL. The following link is a good overview of how to analyze a performance issue.
http://www.method-r.com/downloads/doc_download/10-for-developers-making-friends-with-the-oracle-database-cary-millsap
With Oracle you have to consider the amount of redo you are generating when deleting a row. If the CLOB fields are very big, it may just take awhile for Oracle to delete them due to the amount of redo being written and there may not be much you can do.
A test you may perform is seeing if the delete takes a long time on a row, where both CLOB fields are set to null. If that's the case, then it may be the index updates taking a long time. If that is the case, you may need to investigate consolidating indexes if possible, if deletes occur very frequently.
If the table is a derived table, meaning, it can be rebuilt from other tables, you may look at the NOLOGGING option on the table. You can the rebuild the table from the source table, with minimal logging.
I hope this entry helps some, however some more details could help diagnose the issue.
Are there any child tables that reference this table from which are deleting? (You can do a select from user_constraints where r_constraint_name = primary key name on the table you are deleting from).
A delete can be slow if Oracle needs to look into another table to check there are no child records. Normal practice is to index all foreign keys on the child tables so this is not a problem.
Follow Gary's advice, perform the trace and post the TKPROF results here someone will be able to help further.
Your UNDO tablespace seems to be the bottleneck in this case.
Check how long it takes to make a ROLLBACK after you delete the data. If it takes time comparable to the time of the query itself (within 50%), then this certainly is the case.
When you perform a DML query, your data (both original and changed) are written into redo logs and then applied to the datafiles and to the UNDO tablespace.
Deleting millions of CLOB rows takes copying several hundreds of megabytes, if not gigabytes, to the UNDO tablespace, which takes tens of seconds itself.
What can you do about this?
Create a faster UNDO: put it onto a separate disk, make it less sparse (create a larger datafile).
Use ROLLBACK SEGMENTS instead of managed UNDO, assign a ROLLBACK SEGMENT for this very query and issue SET TRANSACTION USE ROLLBACK SEGMENT before running the query.
If it's not the case, i. e. ROLLBACK executes much faster that the query itself, then try to play with you REDO parameters:
Increase your REDO buffer size using LOG_BUFFER parameter.
Increate the size of your logfiles.
Create your logfiles on separate disks so that reading from a first datafile does not hinder writing to a second an so on.
Note that UNDO operations also generate REDO, so it's useful to do all this anyway.
NOLOGGING adviced before is useless, as it is applied only to certain set of operations listed here, DELETE not being one of those operations.
Deleted CLOBs do not end up in the UNDOTBS since they are versioned and retented in the LOB Segment. I think it will generate some LOBINDEX changes in the undo.
If you null or empty the LOBs before, did you actually measured that time with commit separate of the DELETE? If you issue thousands of deletes, do you use batch commits? Is the instance idle? Then AWR report should tell you what is going on.
Related
I have a table where i have millions of records. The total size of that table only is somewhere 6-7 GigaByte. This table is my application log table. This table is growing really fast, which makes sense. Now I want to move records from log table into backup table. Here is the scenario and here is my question.
Table Log_A
Insert into Log_b select * from Log_A;
Delete from Log_A;
I am using postgres database. the question is
When this query is performed Does all the records from Log_A gets load in physical memory ? NOTE: My both of the above query runs inside a stored procedure.
If No, then how will it works ?
I hope this question applies for all database.
I hope if somebody could provide me some idea on this.
In PostgreSQL, that's likely to execute a sequential scan, loading some records into shared_buffers, inserting them, writing the dirty buffers out, and carrying on.
All the records will pass through main memory, but they don't all have to be in memory at once. Because they all get read from disk using normal buffered reads (pread) it will affect the operating system disk cache, potentially pushing other data out of the cache.
Other databases may vary. Some could execute the whole SELECT before processing the INSERT (though I'd be surprised if any serious ones did). Some do use O_DIRECT reads or raw disk I/O to avoid the OS cache affects, so the buffer cache effects might be different. I'd be amazed if any database relied on loading the whole SELECT into memory, though.
When you want to see what PostgreSQL is doing and how, the EXPLAIN and EXPLAIN (BUFFERS, ANALYZE) commands are quite useful. See the manual.
You may find writable common table expressions interesting for this purpose; it lets you do all this in one statement. In this simple case there's probably little benefit, but it can be a big win in more complex data migrations.
BTW, make sure to run that pair of queries wrapped in BEGIN and COMMIT.
Probably not.
Each record is individually processed; this particular query doesn't need to have knowledge of any of the other records to successfully execute. So the only record that needs to be in memory at any given moment is the one currently being processed.
But it really depends on whether or not the database thinks it can do it faster by loading up the whole table. Check the execution plan of the query.
If your setup allows it, just rename the old table and create a new empty one. Much faster, obviously, as no copying is done at all.
ALTER TABLE log_a RENAME TO log_b;
CREATE TABLE log_a (LIKE log_b INCLUDING ALL);
The LIKE clause copies the structure of the (now renamed) old table. INCLUDING ALL includes defaults, constraints, indexes, ...
Foreign key constraints or views depending on the table or other less common dependencies (but not queries in plpgsql functions) might be a hurdle for this route. You would have to recreate those to have them point to the new table. But a logging table like you describe probably carries no such dependencies.
This acquires an exclusive lock on the table. I assume, typical write access will be INSERT only in your case? One way to deal with concurrent access would then be to create the new table in a different schema and alter the search_path for your application user. Then the applications starts to write to the new table without concurrency issues. Of course, you wouldn't schema-qualify the table name in your INSERT statements for this to take effect.
CREATE SCHEMA log20121018;
CREATE TABLE log20121018.log_a (LIKE log20121011.log_a INCLUDING ALL);
ALTER ROLE myrole SET search_path = app, log20121018, public;
Or alter the search_path setting at whatever level is effective for you:
globally, per database, per role, per session, per function ...
I use SQL Server. I got a large table - millions of rows. And I iterate through them (SELECT .. WHERE ..). This is a long operation (and I assume can't be shorter).
So what am I asking is if there will be any problems to insert data into that table in the progress of selecting? If yes, what should I do to reduce that? Same questing for update command (with indexed parameters of course).
Yes, you will have performance, and more specifically, locking and blocking issues. If your SELECT statements are using indexes, which they should be, these indexes will be locked every time that you INSERT data into the table. Since the table is relatively large, the lock will probably be long enough to block your SELECT statements, and deadlocks are likely as well.
This might be a scenario where you need to re-evaluate your table structure, and possibly even consider denormalizing to avoid this.
You might also consider Enabling Row Versioning-Based Isolation Levels, assuming that you can throughly test the rest of your system to understand the impact.
The answer is yes, absolutely. A simple solution (if it's an acceptable trade off within your application) is to specify the NOLOCK locking hint. IE:
select * from table with NOLOCK
The tradeoff is that you won't get a consistent read, but in many cases this isn't problem.
It's generally not a good idea to have long running queries on a database with frequent updates. This decrease performance significantly because of locking.
It might be a good idea to look into data warehouses and see if that is something that you could use. That would enable you to have the transactions on a separate database and the bulk load from it in to another database that would have your warehouse.
This would greatly improve performance for both inserts and queries. The trans-actional database could have no indexes, and the warehouse could have all the indexes you want.
You could also put the warehouse in a column store database. That would give you the best query time with the minimal effort because there isn't any need to create indexes in a column store, all you would have to do is to design the schema properly. The drawback with column stores is how ever that inserts, updates and deletes are very slow compared to relational databases. But bulk loading from the transactional database should do the trick. If you require the data to be very up to date, you could bulk load every few minutes. If you just need data from the previous day you could bulk load into the warehouse each night.
The possibilities are endless. If you want to look into column store warehouses you could try MonetDB. Its an open source column store so you could try it out and see if that's anything that suits you.
Do not assume execution time can't be shorter. If you query a date range, an index on date is a must!
Solve your problem indexing on date field:
-- please use correct names for your_table and date_field --
CREATE INDEX index_name ON your_table date_field
Warehousing, as per #Gisli, is a good option: build a copy of the data elsewhere, and run your long-running queries there, freeing up the "main" database for OLTP processing.
If this is not an option, you can mess around with snapshot isolation (something I know about, but have never worked with personally). Esssentially, this will take a "snapshot" of the database at the point in time you start the query, and will execute the query as if no subsequent changes were made to the database, even if changes are made to the database while the query is running. More importantly, any such changes are "real" and permanent. Think of it like a short-term branching of your database.
The duration of the branch (snapshot) is where I get weak. I believe you can have the snapshot last for the duration of the query, which means you'd (possibly) never be able to get the same results for a given query twice (if the data changes while you are running it); or you can create a "saved" snapshot that can be re-used over and over until you get around to deleting it. Be wary with this, you don't want your system to get cluttered up with old forgotten branches of past data!
There is no PROBLEM. SQL Serve is built to deal with this kind of situations, you just need to set the correct isolation level on the transactions.
There are several possible scenarios, for example, if you don't mind reading the data that is being inserted, set your isolation level to read uncommited on your read transaction. If you are inserting values in a range and reading values on another range, you can use SERIALIZABLE.
Take a look at the possible isolation levels:
http://msdn.microsoft.com/en-us/library/ms173763.aspx
I have a SQLite table with 6 million rows.
Doing a DELETE FROM TABLE is quite slow;
Dropping the table and then re-creating it seems quicker.
I'm using this for a database import.
Would dropping the table be a better approach or is there a way to delete all data quickly?
One big difference is that DELETE FROM TABLE is DML and DROP TABLE is DDL. This is very important when it comes to db transactions. The result at the end may be the same, but these operations are very different.
If it's just performance you've to be aware of then it may be ok to drop and recreate the table. If you need transactions in your imports then you've to be aware that DDL is not covered and cannot be rollbacked for example.
Generally speaking the DROP TABLE would be a non-logged transaction. DELETE FROM would require a transient journal to log the records until the DELETE statement has been completed.
TRUNCATE TABLE is a lot faster.
The following except from an Oracle website explains why:
'Deletes' perform normal DML. That is, they take locks on rows, they generate redo (lots of it), and they require segments in the UNDO tablespace. Deletes clear records out of blocks carefully. If a mistake is made a rollback can be issued to restore the records prior to a commit. A delete does not relinquish segment space thus a table in which all records have been deleted retains all of its original blocks.
Truncates are DDL and, in a sense, cheat. A truncate moves the High Water Mark of the table back to zero. No row-level locks are taken, no redo or rollback is generated. All extents bar the initial are de-allocated from the table (if you have MINEXTENTS set to anything other than 1, then that number of extents is retained rather than just the initial). By re-positioning the high water mark, they prevent reading of any table data, so they have the same effect as a delete, but without all the overhead. Just one slight problem: a truncate is a DDL command, so you can't roll it back if you decide you made a mistake. (It's also true that you can't selectively truncate -no "WHERE" clause is permitted, unlike with deletes, of course).
By resetting the High Water Mark, the truncate prevents reading of any table's data, so they it has the same effect as a delete, but without the overhead. There is, however, one aspect of a Truncate that must be kept in mind. Because a Truncate is DDL it issues a COMMIT before it acts and another COMMIT afterward so no rollback of the transaction is possible.
Note that by default, TRUNCATE drops storage even if DROP STORAGE is not specified.
I don't think SQLite implements TRUNCATE but if it does it likely will be more performant than DELETE
I have a database of around 20GB. I need to delete 5 tables & drop a few columns in some other 3 tables.
Dropping 5 tables with free some 3 GB and dropping columns in other tables should free another 8GB.
How do I reclaim this space from MySQL.
I've read dumping the database and restoring it back as one of the solution but I'm not really sure how that works, I am not even sure if this only works for deleting the entire database or just parts of it?
Please suggest how to go about this. THanks.
From the comments, it sounds like you're using InnoDB without the file per table option.
Reclaiming space from the innodb tablespace is not generally possible in this mode. Your only course of action is to dump the whole database, turn on file-per-table mode, and reload it (with a completely clean mysql instance). This is going to take a long time with a large database; mk-parallel-dump and restore tools might be a bit quicker, but it will still take a while. Be sure to test this process on a non-production server first.
EDIT: Doesn't apply without file_per_table, Mark is right there.
What's going on is that once MySQL takes space, it won't give it back. This is so that if you delete 500 rows and then immediately insert 500, it doesn't have to give that space back to the file system and then request it back. It's an optimization to avoid filesystem overhead, and it works well when you delete little bits.
If you delete a large amount, it will take a long time to end up using all that space again, which can be annoying. This can be fixed two ways: dropping the table and reloading the contents, or optimizing the table (which I believe basically reloads the table internally).
All you have to do to get space back from a table is:
OPTIMIZE TABLE my_big_table;
Note that this can take a while, it's not a near instant operation. Basically, plan for a some downtime. If your tables are just a few gigs, it shouldn't be too long (probably a few minutes). This also rebuilds the indexes and does some other housekeeping.
You can see more about optimize on the MySQL site. Here is it's advice:
OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns). Deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions. You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file.
Deletes on sql server are sometimes slow and I've been often in need to optimize them in order to diminish the needed time.
I've been googleing a bit looking for tips on how to do that, and I've found diverse suggestions.
I'd like to know your favorite and most effective techinques to tame the delete beast, and how and why they work.
until now:
be sure foreign keys have indexes
be sure the where conditions are indexed
use of WITH ROWLOCK
destroy unused indexes, delete, rebuild the indexes
now, your turn.
The following article, Fast Ordered Delete Operations may be of interest to you.
Performing fast SQL Server delete operations
The solution focuses on utilising a view in order to simplify the execution plan produced for a batched delete operation. This is achieved by referencing the given table once, rather than twice which in turn reduces the amount of I/O required.
I have much more experience with Oracle, but very likely the same applies to SQL Server as well:
when deleting a large number of rows, issue a table lock, so the database doesn't have to do lots of row locks
if the table you delete from is referenced by other tables, make sure those other tables have indexes on the foreign key column(s) (otherwise the database will do a full table scan for each deleted row on the other table to ensure that deleting the row doesn't violate the foreign key constraint)
I wonder if it's time for garbage-collecting databases? You mark a row for deletion and the server deletes it later during a sweep. You wouldn't want this for every delete - because sometimes a row must go now - but it would be handy on occasion.
Summary of Answers through 2014-11-05
This answer is flagged as community wiki since this is an ever-evolving topic with a lot of nuances, but very few possible answers overall.
The first issue is you must ask yourself what scenario you're optimizing for? This is generally either performance with a single user on the db, or scale with many users on the db. Sometimes the answers are the exact opposite.
For single user optimization
Hint a TABLELOCK
Remove indexes not used in the delete then rebuild them afterward
Batch using something like SET ROWCOUNT 20000 (or whatever, depending on log space) and loop (perhaps with a WAITFOR DELAY) until you get rid of it all (##ROWCOUNT = 0)
If deleting a large % of table, just make a new one and delete the old table
Partition the rows to delete, then drop the parition. [Read more...]
For multi user optimization
Hint row locks
Use the clustered index
Design clustered index to minimize page re-organization if large blocks are deleted
Update "is_deleted" column, then do actual deletion later during a maintenance window
For general optimization
Be sure FKs have indexes on their source tables
Be sure WHERE clause has indexes
Identify the rows to delete in the WHERE clause with a view or derived table instead of referencing the table directly. [Read more...]
To be honest, deleting a million rows from a table scales just as badly as inserting or updating a million rows. It's the size of the rowset that's the problem, and there's not much you can do about that.
My suggestions:
Make sure that the table has a primary key and clustered index (this is vital for all operations).
Make sure that the clustered index is such that minimal page re-organisation would occur if a large block of rows were to be deleted.
Make sure that your selection criteria are SARGable.
Make sure that all your foreign key constraints are currently trusted.
(if the indexes are "unused", why are they there at all?)
One option I've used in the past is to do the work in batches. The crude way would be to use SET ROWCOUNT 20000 (or whatever) and loop (perhaps with a WAITFOR DELAY) until you get rid of it all (##ROWCOUNT = 0).
This might help reduce the impact upon other systems.
The problem is you haven't defined your conditions enough. I.e. what exactly are you optimizing?
For example, is the system down for nightly maintenance and no users are on the system? And are you deleting a large % of the database?
If offline and deleting a large %, may make sense to just build a new table with data to keep, drop the old table, and rename. If deleting a small %, you likely want to batch things in as large batches as your log space allows. It entirely depends on your database, but dropping indexes for the duration of the rebuild may hurt or help -- if even possible due to being "offline".
If you're online, what's the likelihood your deletes are conflicting with user activity (and is user activity predominantly read, update, or what)? Or, are you trying to optimize for user experience or speed of getting your query done? If you're deleting from a table that's frequently updated by other users, you need to batch but with smaller batch sizes. Even if you do something like a table lock to enforce isolation, that doesn't do much good if your delete statement takes an hour.
When you define your conditions better, you can pick one of the other answers here. I like the link in Rob Sanders' post for batching things.
If you have lots of foreign key tables, start at the bottom of the chain and work up. The final delete will go faster and block less things if there are no child records to cascade delete (which I would NOT turn on if I had a large number fo child tables as it will kill performance).
Delete in batches.
If you have foreign key tables that are no longer being used (you'd be surprised how often production databses end up with old tables nobody will get rid of), get rid of them or at least break the FK/PK connection. No sense cheking a table for records if it isn't being used.
Don't delete - mark records as delted and then exclude marked records from all queries. This is best set up at the time of database design. A lot of people use this because it is also the best fastest way to get back records accidentlally deleted. But it is a lot of work to set up in an already existing system.
I'll add another one to this:
Make sure your transaction isolation level and database options are set appropriately. If your SQL server is set not to use row versioning, or you're using an isolation level on other queries where you will wait for the rows to be deleted, you could be setting yourself up for some very poor performance while the operation is happening.
On very large tables where you have a very specific set of criteria for deletes, you could also partition the table, switch out the partition, and then process the deletions.
The SQLCAT team has been using this technique on really really large volumes of data. I found some references to it here but I'll try and find something more definitive.
I think, the big trap with delete that kill the performance is that sql after each row deleted, it updates all the related indexes for any column in this row. what about delting all indexes before bulk delete?
There are deletes and then there are deletes. If you are aging out data as part of a trim job, you will hopefully be able to delete contiguous blocks of rows by clustered key. If you have to age out data from a high volume table that is not contiguous it is very very painful.
If it is true that UPDATES are faster than DELETES, you could add a status column called DELETED and filter on it in your selects. Then run a proc at night that does the actual deletes.
Do you have foreign keys with referential integrity activated?
Do you have triggers active?
Simplify any use of functions in your WHERE clause! Example:
DELETE FROM Claims
WHERE dbo.YearMonthGet(DataFileYearMonth) = dbo.YearMonthGet(#DataFileYearMonth)
This form of the WHERE clause required 8 minutes to delete 125,837 records.
The YearMonthGet function composed a date with the year and month from the input date and set day = 1. This was to ensure we deleted records based on year and month but not day of month.
I rewrote the WHERE clause to:
WHERE YEAR(DataFileYearMonth) = YEAR(#DataFileYearMonth)
AND MONTH(DataFileYearMonth) = MONTH(#DataFileYearMonth)
The result: The delete required about 38-44 seconds to delete those 125,837 records!