deleting a large number of rows from a table - sql

We have a requirement to delete rows in the order of millions from multiple tables as a batch job (note that we are not deleting all the rows, we are deleting based on a timestamp stored in an indexed column). Obviously a normal DELETE takes forever (because of logging, referential constraint checking etc.). I know in the LUW world, we have ALTER TABLE NOT LOGGED INITIALLY but I can't seem to find the an equivalent SQL statement for DB2 v8 z/OS. Any one has any ideas on how to do this really fast? Also, any ideas on how to avoid the referential checks when deleting the rows? Please let me know.

In the past I have solved this kind of problem by exporting the data and re-loading it with a replace style command. For example:
EXPORT to myfile.ixf OF ixf
SELECT *
FROM my_table
WHERE last_modified < CURRENT TIMESTAMP - 30 DAYS;
Then you can LOAD it back in, replacing the old stuff.
LOAD FROM myfile.ixf OF ixf
REPLACE INTO my_table
NONRECOVERABLE INDEXING MODE INCREMENTAL;
I'm not sure whether this will be faster or not for you (probably it depends on whether you're deleting more than you're keeping).

Do the foreign keys already have indexes as well?
How do you have your delete action set? CASCADE, NULL, NO ACTION
Use SET INTEGRITY to temporarily disable constraints on the batch process.
http://www.ibm.com/developerworks/data/library/techarticle/dm-0401melnyk/index.html
http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r

We modified the tablespace so the lock would occur at the tablespace level instead of at the page level. Once we changed that DB2 only required one lock to do the DELETE and we didn't have any issues with locking. As for the logging, we just asked the customer to be aware of the amount of logging required (as there did not seem to be a solution to get around the logging issue). As for the constraints, we just dropped and recreated them after the delete.
Thanks all for your help.

Related

sqlldr direct load=true with referential partition options

So I need to do multiple bulk inserts into a table with row level triggers. I thought it would be a good idea to gather the generated ids first, combine them with my data and then do a direct=true sql load. Normally this would work fine but the table is partitioned by reference so it cannot disable the foreign key constraint that would allow me to do the direct load.
Does anyone know of anyway around this? My first solution of bulk collecting into a varray and inserting every 100,000 went moderately fast but if I was able to do a direct load, that would be much faster.
ERROR: SQL*Loader-965: Error -1 disabling constraint client_fk on
table my_table
The manual implies there is no way to have SQL*Loader use a direct path load but not disable the foreign keys.
But direct-path inserts can work on reference partitioned tables, even with the foreign keys enabled, as I demonstrated in this question and answer.
Convert the process from SQL*Loader to an external table INSERT statement. SQL*Loader and external tables use similar mechanisms so the conversion shouldn't be too difficult. External tables require a little more work - you have to write the INSERT with an append hint, and manually disable and re-enable triggers and perhaps other objects. But that extra control allows loading data quickly with direct-path inserts.

How Do I Ignore Errors When Deleting Records

I'm in the process of testing a large number of schema changes to upgrade our db to run with the latest version of a packaged product we have.
At this point I'm not interested in the data contained in the db, only the schema (i.e. the tables, views, constraints, keys, stored procedures, etc.).
My testing entails running scripts, resolving errors, an re-running the scripts. If I want to re-run the scripts I need to first restore the db to get it back to a known state. Restoring the db is very time consuming as it has lots of data. I would like to "slim down" the db and remove as much data as possible. That way it will be quicker to restore the db and re-run my scripts
When I attempt to delete records from many of the tables ("delete from table-name") I run into constraint errors and the command stops.
Is there a way to allow the command to continue and, in effect, delete all the records in the table where there aren't constraint issues? In other words I'd like the command to ignore errors and continue to delete all the records it can.
Any Suggestions would be greatly appreciated.
Byron above makes a good point; you can disable FOREIGN KEY or CHECK constraints with the ALTER TABLE command:
ALTER TABLE TableName NOCHECK CONSTRAINT ALL
Then do your work, and when finished,
ALTER TABLE TableName CHECK CONSTRAINT ALL
to re-enable constraints. Be careful, though: if you leave your data in an inconsistent state that violates constraints once you re-enable them, future updates can fail due to constraint errors.
Sources:
http://weblogs.sqlteam.com/joew/archive/2008/10/01/60719.aspx
http://technet.microsoft.com/en-us/library/ms190273.aspx

Oracle performance - disabling FKs to make DELETE statement work faster?

When deleting from a large table in Oracle - let's call it table X - does it make sense to disable table X's FKs that do not have ON DELETE CASCADE? I'm not referring to disabling FKs on other tables that link to table X, but just disabling FKs on table X to improve the performance of the DELETE statements.
I'm making the indexes on table X unusable, but the DELETE still takes a while.
I think that those FKs don't matter to the performance of the DELETE statement since we're just deleting, and not inserting or updating, so the FKs don't need to be checked. What do you think?
That seems like a really bad idea. No matter what you do, you'll have a period where referential integrity is not enforced on your database. Then you go to put the FKs back in place and, oops, someone has inserted an invalid row.
Furthermore, ALTER TABLE is a DDL statement, so executing it will commit any work up to that point. You'll lose the ability to rollback if something goes wrong elsewhere in your transaction.
Can you look through the explain plan to see why your DELETE statement is taking so long?
Eventually I didn't have to disable those FKs before the archive process starts and enable them when the process ends. But instead, in order to improve performance of the DELETE statements, we had to drop indexes before the archive process starts and recreate them after the archive process finishes. We also committed more often.

Bulk delete (truncate vs delete)

We have a table with a 150+ million records. We need to clear/delete all rows. Delete operation would take forever due to it writing to the t-logs and we cannot change our recovery model for the whole DB. We have tested the truncate table option.
What we realized that truncate deallocates pages from the table, and if I am not wrong makes them available for reuse but doesn't shrink the db automatically. So, if we want to reduce the DB size, we really would need to do run the shrink db command after truncating the table.
Is this normal procedure? Anything we need to be careful or aware about, or are there any better alternatives?
truncate is what you're looking for. If you need to slim down the db afterwards, run a shrink.
This MSDN refernce (if you're talking T-SQL) compares the behind the scenes of deleting rows versus truncating.
"Delete all rows"... wouldn't DROP TABLE (and re-recreate an empty one with same schema / indices) be preferable ? (I personally like "fresh starts" ;-) )
This said TRUNCATE TABLE is quite OK too, and yes, DBCC SHRINKFILE may be required afterwards if you wish to recover the space.
Depending on the size of the full database, the shrink may take a while; I've found it to go faster if it is shrunk in smaller chunks, rather than trying to get it back all at once.
One thing to remember with Truncate Table (as well as drop table) is going forward this will not work if you ever have foreign keys referencing the table.
As pointed out, if you can't use truncate or drop
SELECT 1
WHILE ##ROWCOUNT <> 0
DELETE TOP (100000) MyTable
You have a normal solution (truncate + shrink db) to remove all the records from a table.
As Irwin pointed out. The TRUNCATE command won't work while being referenced by a Foreign key constraint. So first drop the constraints, truncate the table and recreate the constraints.
If your concerned about performance and this is a regular routine for your system. You might want to look into moving this table to it's own data file, then run shrink only against the target datafile!

Why 'delete from table' takes a long time when 'truncate table' takes 0 time?

(I've tried this in MySql)
I believe they're semantically equivalent. Why not identify this trivial case and speed it up?
truncate table cannot be rolled back, it is like dropping and recreating the table.
...just to add some detail.
Calling the DELETE statement tells the database engine to generate a transaction log of all the records deleted. In the event the delete was done in error, you can restore your records.
Calling the TRUNCATE statement is a blanket "all or nothing" that removes all the records with no transaction log to restore from. It is definitely faster, but should only be done when you're sure you don't need any of the records you're going to remove.
Delete from table deletes each row from the one at a time and adds a record into the transaction log so that the operation can be rolled back. The time taken to delete is also proportional to the number of indexes on the table, and if there are any foreign key constraints (for innodb).
Truncate effectively drops the table and recreates it and can not be performed within a transaction. It therefore required fewer operations and executes quickly. Truncate also does not make use of any on delete triggers.
Exact details about why this is quicker in MySql can be found in the MySql documentation:
http://dev.mysql.com/doc/refman/5.0/en/truncate-table.html
Your question was about MySQL and I know little to nothing about MySQL as a product but I thought I'd add that in SQL Server a TRUNCATE statement can be rolled back. Try it for yourself
create table test1 (col1 int)
go
insert test1 values(3)
begin tran
truncate table test1
select * from test1
rollback tran
select * from test1
In SQL Server TRUNCATE is logged, it's just not logged in such a verbose way as DELETE is logged. I believe it's referred to as a minimally logged operation. Effectively the data pages still contain the data but their extents have been marked for deletion. As long as the data pages still exist you can roll back the truncate. Hope this is helpful. I'd be interested to know the results if somebody tries it on MySQL.
For MySql 5 using InnoDb as the storage engine, TRUNCATE acts just like DELETE without a WHERE clause: i.e. for large tables it takes ages because it deletes rows one-by-one. This is changing in version 6.x.
see
http://dev.mysql.com/doc/refman/5.1/en/truncate-table.html
for 5.1 info (row-by-row with InnoDB) and
http://blogs.mysql.com/peterg/category/personal-opinion/
for changes in 6.x
Editor's note
This answer is clearly contradicted by the MySQL documentation:
"For an InnoDB table before version 5.0.3, InnoDB processes TRUNCATE TABLE by deleting rows one by one. As of MySQL 5.0.3, row by row deletion is used only if there are any FOREIGN KEY constraints that reference the table. If there are no FOREIGN KEY constraints, InnoDB performs fast truncation by dropping the original table and creating an empty one with the same definition, which is much faster than deleting rows one by one."
Truncate is on a table level, while Delete is on a row level. If you would translate this to sql in an other syntax, truncate would be:
DELETE * FROM table
thus deleting all rows at once, while DELETE statement (in PHPMyAdmin) goes like:
DELETE * FROM table WHERE id = 1
DELETE * FROM table WHERE id = 2
Just until the table is empty. Each query taking a number of (milli)seconds which add up to taking longer than a truncate.