I'm working on Oracle 9i. I have a table with 135,000,000 records, partitioned so that each partition has approx. 10,000,000 rows, all fully indexed.
I need to delete around 70,000,000 rows from this table to meet a new business requirement.
So I created a backup of the rows to be deleted as a separate table.
Table1 <col1, col2........> -- main table (135,000,000 rows)
Table2 <col1, col2........> -- backup table (70,000,000 rows)
I tried the delete query below.
Delete from table1 t1 where exists (select 1 from table2 t2 where t2.col1 = t1.col1)
but it ran for hours without finishing.
Then I tried:
declare
  cursor c1 is
    select col1 from table2;
  c2  c1%rowtype;
  cnt number := 0;
begin
  open c1;
  loop
    fetch c1 into c2;
    exit when c1%notfound;
    delete from table1 t1 where t1.col1 = c2.col1;
    cnt := cnt + 1;
    if cnt >= 100000 then   -- commit every 100,000 deletes
      commit;
      cnt := 0;
    end if;
  end loop;
  close c1;
  commit;
end;
Even this has been running for more than 12 hours and has still not completed.
Please note that there are multiple indexes on table1 and an index on col1 on table2. All the tables and indexes are analysed.
Please advise if there is any way of optimizing for this scenario.
Thanks guys.
Drop all indexes (back up their CREATE statements first)
Take the SELECT statement that was used to build the backup table and turn it into a DELETE command, as sketched below
Recreate all indexes
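A rough sketch of that sequence, assuming TABLE2 was populated with a simple predicate and using hypothetical index names (DBMS_METADATA can capture the index DDL on 9i):

-- 1) Capture the index DDL, then drop the indexes (names are hypothetical)
SELECT dbms_metadata.get_ddl('INDEX', index_name)
FROM   user_indexes
WHERE  table_name = 'TABLE1';

DROP INDEX table1_ix1;
-- ...repeat for every index on TABLE1

-- 2) Delete with the same predicate that built the backup table
DELETE FROM table1
WHERE  <<predicate used to populate table2>>;
COMMIT;

-- 3) Recreate the indexes from the DDL captured in step 1
CREATE INDEX table1_ix1 ON table1 (col1);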
I remember facing this issue earlier. In that case, we resorted to the following, since it worked out faster than any other delete operation (a rough sketch follows the steps):
1) Create another table with identical structure
2) Insert into the new table the records you want to keep (use Direct path insert to speed this up)
3) Drop the old table
4) Rename the new table
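A minimal sketch of those four steps, assuming the rows to keep are the ones with no match in the backup table (note that CTAS does not carry over indexes, constraints, or grants, so script those separately):

-- 1) Empty copy of the structure
CREATE TABLE table1_new AS
SELECT * FROM table1 WHERE 1 = 0;

-- 2) Direct-path insert of the rows to keep
INSERT /*+ APPEND */ INTO table1_new
SELECT *
FROM   table1 t1
WHERE  NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.col1 = t1.col1);
COMMIT;

-- 3) and 4) Drop the old table and take over its name
DROP TABLE table1;
ALTER TABLE table1_new RENAME TO table1;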
You say that the table is partitioned. Is your intention to drop all the data in certain partitions? If so, you should be able to simply drop the 7 partitions that have the 70 million rows that you want to drop. I'm assuming, however, that your problem isn't that simple.
If you can do interim commits (which implies that you don't care about transactional consistency), the most efficient approach is likely something along the lines of:
CREATE TABLE rows_to_save
AS SELECT *
FROM table1
WHERE <<criteria to select the 65 million rows you want to keep>>;
TRUNCATE TABLE table1;
INSERT /*+ append */
INTO table1
SELECT *
FROM rows_to_save;
Barring that, rather than creating the backup table, it would be more efficient to simply issue the DELETE statement
DELETE FROM table1
WHERE <<criteria to select the 70 million rows you want to delete>>;
You may also benefit from dropping or disabling the indexes and constraints before running the DELETE.
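A hedged sketch of that last point, with hypothetical index and constraint names; unusable indexes must be rebuilt afterwards, and unique indexes backing constraints cannot be skipped:

ALTER TABLE table1 DISABLE CONSTRAINT table1_fk1;  -- repeat per constraint
ALTER INDEX table1_ix1 UNUSABLE;                   -- repeat per index
ALTER SESSION SET skip_unusable_indexes = TRUE;

DELETE FROM table1
WHERE  <<criteria to select the 70 million rows you want to delete>>;
COMMIT;

ALTER INDEX table1_ix1 REBUILD;
ALTER TABLE table1 ENABLE CONSTRAINT table1_fk1;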
I'm going to answer this assuming that it is cheaper to filter against the backup table, but it would probably be cheaper to just use the negation of the criteria you used to populate the backup table.
1) create a new table with the same structure. No indexes, constraints, or triggers.
2)
-- generates one INSERT ... MINUS statement per partition
-- (NOLOGGING is a table attribute, not a hint, so set it on NEW_TABLE beforehand)
select 'insert /*+ append */ into new_table partition (' || n.partition_name ||
       ') select * from old_table partition (' || o.partition_name ||
       ') minus select * from bak_table partition (' || b.partition_name || ');'
from   all_tab_partitions o, all_tab_partitions n, all_tab_partitions b
where  o.partition_position = n.partition_position
and    o.partition_position = b.partition_position
and    o.table_name = 'OLD_TABLE' and o.table_owner = 'OWNER'
and    n.table_name = 'NEW_TABLE' and n.table_owner = 'OWNER'
and    b.table_name = 'BAK_TABLE' and b.table_owner = 'OWNER';
-- note: I haven't run this; it may need minor corrections in addition to the obvious substitutions
3) verify and then run the output of the previous query
4) rebuild the indexes, constraints, and triggers if needed
This avoids massive amounts of redo and undo compared to the delete.
The APPEND hint gives you direct-path inserts.
NOLOGGING on the new table further reduces redo (it is a table attribute, not a hint) - make sure you take a backup afterwards.
This takes advantage of your partitioning to break the work into chunks that can be sorted in fewer passes.
You could probably go faster with a parallel insert plus a parallel select, but it is probably not necessary. Just don't do a parallel select without the parallel insert and an "alter session enable parallel dml".
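If you do go parallel, the session setting plus hints would look roughly like this (degree 8 is just an example; adjust to your hardware):

ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(new_table, 8) */ INTO new_table
SELECT /*+ PARALLEL(old_table, 8) */ *
FROM   old_table;

COMMIT;  -- required before the table can be queried again after direct-path/parallel DML
ALTER SESSION DISABLE PARALLEL DML;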
Below is the update statement that I am running 32k times, and it has been taking more than 15 hours and is still running.
I have to update the value in TABLE_2 for 32k different M_DISPLAY values.
UPDATE TABLE_2 T2 SET T2.M_VALUE = 'COL_ANC'
WHERE EXISTS (SELECT 1 FROM TABLE_2 T1 WHERE TRIM(T1.M_DISPLAY) = 'ANCHORTST' AND T1.M_LABEL=T2.M_LABEL );
I am not sure why it is taking such a long time, as I have tuned the query.
I have put the 32,000 update statements in an Update.sql file and am running the SQL from the command line.
Though it is updating the table, it is a never-ending process.
Please advise if I have gone wrong anywhere.
Regards
Using FORALL
If you cannot rewrite the query to run a single bulk-update instead of 32k individual updates, you might still get lucky by using PL/SQL's FORALL. An example:
DECLARE
TYPE rec_t IS RECORD (
m_value table_2.m_value%TYPE,
m_display table_2.m_display%TYPE
);
TYPE tab_t IS TABLE OF rec_t;
data tab_t := tab_t();
BEGIN
-- Fill in data object. Replace this by whatever your logic for matching
-- m_value to m_display is
data.extend(1);
data(1).m_value := 'COL_ANC';
data(1).m_display := 'ANCHORTST';
-- Then, run the 32k updates using FORALL
FORALL i IN 1 .. data.COUNT
UPDATE table_2 t2
SET t2.m_value = data(i).m_value
WHERE EXISTS (
SELECT 1
FROM table_2 t1
WHERE trim(t1.m_display) = data(i).m_display
AND t1.m_label = t2.m_label
);
END;
/
Concurrency
If you're not the only process on the system, 32k updates in a single transaction can hurt. It's definitely worth committing a few thousand rows in sub-transactions to reduce concurrency effects with other processes that might read the same table while you're updating.
Bulk update
Really, the goal of any improvement should be bulk updating the entire data set in one go (or perhaps split in a few bulks, see concurrency).
If you had a staging table containing the update instructions:
CREATE TABLE update_instructions (
m_value VARCHAR2(..),
m_display VARCHAR2(..)
);
Then you could pull off something along the lines of:
MERGE INTO table_2 t2
USING (
SELECT u.*, t1.m_label
FROM update_instructions u
JOIN table_2 t1 ON trim(t1.m_display) = u.m_display
) t1
ON t2.m_label = t1.m_label
WHEN MATCHED THEN UPDATE SET t2.m_value = t1.m_value;
This should be even faster than FORALL (but might have more concurrency implications).
Indexing and data sanitisation
Of course, one thing that might definitely hurt you when running 32k individual update statements is the TRIM() function, which prevents using an index on M_DISPLAY efficiently. If you could sanitise your data so it doesn't need trimming first, that would definitely help. Otherwise, you could add a function based index just for the update (and then drop it again):
CREATE INDEX i ON table_2 (trim (m_display));
The query and subquery query the same table: TABLE_2. Assuming that M_LABEL is unique, the subquery returns 1s for all rows in TABLE_2 where M_DISPLAY is ANCHORTST. Then the update query updates the same (!) TABLE_2 for all 1s returned from subquery - so for all rows where M_DISPLAY is ANCHORTST.
Therefore, the query could be simplified, exploiting the fact that both update and select work on the same table - TABLE_2:
UPDATE TABLE_2 T2 SET T2.M_VALUE = 'COL_ANC' WHERE TRIM(T2.M_DISPLAY) = 'ANCHORTST'
If M_LABEL is not unique, then the above is not going to work - thanks to commentators for pointing that out!
For significantly faster execution:
Ensure that you have created an index on the M_DISPLAY and M_LABEL columns that appear in your WHERE clause.
Ensure that M_DISPLAY has a function-based index. If it does not, then do not pass it to the TRIM function, because the function will prevent the database from using the index that you have created for the M_DISPLAY column. TRIM the data before storing it in the table, for example:
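A one-off sanitisation pass along these lines (hedged sketch, assuming M_DISPLAY is a VARCHAR2) lets later statements use a plain index on M_DISPLAY:

UPDATE table_2
SET    m_display = TRIM(m_display)
WHERE  m_display <> TRIM(m_display);   -- only touch rows that actually have padding
COMMIT;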
That's it.
By the way, as has been mentioned, you shouldn't need 32k queries to meet your objective. One will probably suffice. Look into query-based updates. As an example, see the accepted answer here:
Oracle SQL: Update a table with data from another table
I have a table PROD_MAIN which has 750 million records in a single database. The database infrastructure is very basic: no RAC, just one database.
The requirement is to delete the records which are more than one year old. I wrote PL/SQL code with a parallel hint and bulk collect, but it takes a very long time to execute. Please find the code below.
ALTER SESSION ENABLE PARALLEL DML;

DECLARE
  TYPE TABLE_DELETE IS TABLE OF ROWID;
  T_DELETE        TABLE_DELETE;
  CURSOR C_DELETE IS
    SELECT /*+ PARALLEL(10) */ ROWID
    FROM   PROD_MAIN
    WHERE  RECORD_DATE < (TRUNC(SYSDATE) - 366);
  L_DELETE_BUFFER PLS_INTEGER := 50000;
BEGIN
  OPEN C_DELETE;
  LOOP
    FETCH C_DELETE BULK COLLECT INTO T_DELETE LIMIT L_DELETE_BUFFER;
    FORALL I IN 1 .. T_DELETE.COUNT
      DELETE /*+ PARALLEL(10) */ PROD_MAIN WHERE ROWID = T_DELETE(I);
    EXIT WHEN C_DELETE%NOTFOUND;
    COMMIT;
  END LOOP;
  CLOSE C_DELETE;
  COMMIT;
END;
/

ALTER SESSION DISABLE PARALLEL DML;
I also set NOLOGGING on the table, created indexes, and gathered statistics, but the performance did not improve. So, is there any other way I can delete these millions of records within 3-5 hours?
If the table is partitioned by date, you can truncate the partitions holding data older than one year (truncating a partition takes almost no time and does not degrade the table), for example:
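Assuming PROD_MAIN is range-partitioned on RECORD_DATE and the partition name below is a hypothetical placeholder, that would be along the lines of:

ALTER TABLE prod_main TRUNCATE PARTITION p_2015_01 UPDATE GLOBAL INDEXES;
-- or remove the partition altogether:
ALTER TABLE prod_main DROP PARTITION p_2015_01 UPDATE GLOBAL INDEXES;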
If it has no partitions, I think the best thing you can do is not to try to remove all the records in one single transaction. Remove a few records at a time in a loop. For example, to delete 10,000 records per statement you can do:
DELETE FROM your_table WHERE your_conditions LIMIT 10000;       -- (MySQL)
DELETE FROM your_table WHERE your_conditions AND ROWNUM < 10000; -- (Oracle)
Remember to rebuild or optimize the indexes after finishing (or even in between deletes), because heavy deleting will degrade them.
Depending on your environment and requirements, another thing you can try is to create an empty copy of the table and perform an INSERT ... SELECT, inserting into the new table all the rows that you want to keep. After that, truncate the original table, drop it, and rename the new one:
MyOriginalTable with all data
Create an empty copy: MyTemporalTable (without indexes)
Move the valid data from MyOriginalTable to MyTemporalTable
Truncate and drop MyOriginalTable
Create the indexes on MyTemporalTable
Rename MyTemporalTable to MyOriginalTable
I think the problem is that this table is a master table for other table(s).
To speed things up, disable the foreign keys in those other tables, delete the rows, then re-enable the constraints, for example:
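A hedged sketch, with hypothetical child table and constraint names (re-enabling a constraint re-validates it, which can itself take a while; ENABLE NOVALIDATE is an option if you trust the data):

ALTER TABLE child_table DISABLE CONSTRAINT fk_child_prod_main;
-- ...repeat for every table referencing PROD_MAIN

DELETE FROM prod_main WHERE record_date < TRUNC(SYSDATE) - 366;
COMMIT;

ALTER TABLE child_table ENABLE CONSTRAINT fk_child_prod_main;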
But Diego Sal Diaz's third solution, copying the remaining rows to a temporary table and renaming it, is also good.
I resolved this issue by creating a temporary table PROD_MAIN_TEMP with exactly the same structure as PROD_MAIN. After creating it, I inserted the data that I want to keep:
INSERT /*+ APPEND */ INTO PROD_MAIN_TEMP
SELECT /*+ PARALLEL(10) */ * FROM PROD_MAIN WHERE RECORD_DATE >= (TRUNC(SYSDATE) - 366);
Then I dropped the main table PROD_MAIN and renamed the temporary table PROD_MAIN_TEMP to PROD_MAIN.
This whole process completed in 3 hours.
Delete SQL scripts are taking a very long time and even hanging forever in Oracle 12c. We have hundreds of delete scripts like the ones below, and we even tried running them with a parallel hint /*+ PARALLEL (a,4) */, but there was no improvement in performance.
Is there any way to tune these delete scripts?
Can we use a PL/SQL FOR loop to get any performance improvement?
If yes, please share your thoughts and advice.
Some sample SQL scripts:
DELETE
FROM
E_PROJ_DETAIL
WHERE
CATEGORY_ID in (SELECT PRIMARY_KEY FROM Y_OBJ_CATEGORY WHERE TREE_POSITION='VEN$_MADD');
COMMIT;
delete
from
e_proj_group_access
where enterprise_object_id in (select primary_key from t_project where application_id in (select application_id from y_object_definition where unique_code ='VEN$_MADD'));
commit;
I don't know of any way to 'tune' DELETE statements, except perhaps dropping any useless (i.e. unused) indexes and constraints upfront and recreating them afterwards.
In these cases (deleting many rows) I have used FOR loops with commits inside, something like this:
DECLARE
  i PLS_INTEGER := 0;
BEGIN
  FOR c IN (SELECT id FROM your_table WHERE [conditions to delete])
  LOOP
    DELETE FROM your_table t WHERE t.id = c.id; /* id = primary key */
    i := i + 1;
    IF i > 1000 THEN
      COMMIT;
      i := 0;
    END IF;
  END LOOP;
  COMMIT;
END;
But here you can occasionally run into ORA-01555: snapshot too old, because you are deleting rows from the same table from which you opened the cursor in the FOR loop.
In other situations, you could do CREATE TABLE newtable AS SELECT * FROM oldtable WHERE [conditions for rows I want to keep], then TRUNCATE oldtable, and finally INSERT /*+ APPEND */ INTO oldtable SELECT * FROM newtable; to write the correct data back.
It really depends on the situation you are in (as others commented - how many rows do you have in the table, how many rows do you want to delete, etc.).
hth :-)
Depending on whether it is a one-shot deal or not, creating a new table with only the rows you want to keep is often much faster:
CREATE TABLE E_PROJ_DETAIL_NEW AS
SELECT * FROM
E_PROJ_DETAIL
WHERE
CATEGORY_ID NOT IN (SELECT PRIMARY_KEY FROM Y_OBJ_CATEGORY WHERE TREE_POSITION='VEN$_MADD');
Then drop the old table and rename the new one.
You may need to re-create indexes / foreign keys if you had any, for example:
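Those follow-up steps would look roughly like this (the index and constraint shown are hypothetical placeholders):

DROP TABLE e_proj_detail;
ALTER TABLE e_proj_detail_new RENAME TO e_proj_detail;

-- recreate whatever existed on the original table
CREATE INDEX e_proj_detail_cat_ix ON e_proj_detail (category_id);
-- ALTER TABLE e_proj_detail ADD CONSTRAINT ... FOREIGN KEY (...) REFERENCES ...;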
I need to delete about 5 million records from an Oracle table.
Due to performance concerns (redo logs), I would like to remove 100,000 records per transaction, like this:
DECLARE
v_limit PLS_INTEGER :=100000;
CURSOR person_deleted_cur
IS
SELECT rowid
FROM Persons p
WHERE City = 'ABC'
AND NOT EXISTS
(SELECT O_Id
FROM Orders o
WHERE p.P_Id = o.P_Id);
TYPE person_deleted_nt IS TABLE OF person_deleted_cur%ROWTYPE
INDEX BY PLS_INTEGER;
BEGIN
OPEN person_deleted_cur;
LOOP
FETCH person_deleted_cur
BULK COLLECT INTO person_deleted_nt LIMIT v_limit;
FORALL indx IN 1 .. person_deleted_nt.COUNT
DELETE FROM Persons WHERE rowid=person_deleted_nt(indx);
EXIT WHEN person_deleted_cur%NOTFOUND;
END LOOP;
CLOSE person_deleted_cur;
COMMIT;
END;
But Liquibase runs a changeSet in one transaction and rolls it back if there are any errors. Is it good practice to use explicit COMMITs in Liquibase scripts?
What would a well-written script look like?
In the book "Oracle for professionals", Tom Kyte writes about doing updates in separate transactions. The point is: if you can change the table with one query, then do so, because a single query will be faster than several transactions or a PL/SQL loop doing piecewise deletes.
Another approach would be to use CREATE TABLE ... NOLOGGING instead of UPDATE/DELETE. It is the best solution for changing many rows.
So create a NOLOGGING table with your query, then drop the original table, recreate the indexes, constraints, etc., and rename the temporary table to the original name, for example:
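A minimal sketch of that approach using the tables from the question (the keep-predicate is the negation of the delete criteria; mind NULLs in City):

CREATE TABLE persons_keep NOLOGGING AS
SELECT *
FROM   persons p
WHERE  p.city IS NULL
   OR  p.city <> 'ABC'
   OR  EXISTS (SELECT 1 FROM orders o WHERE o.p_id = p.p_id);

DROP TABLE persons;
ALTER TABLE persons_keep RENAME TO persons;
-- then recreate indexes, constraints, grants, triggers, etc.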
I agree with #jimmbraddock, but a simpler solution with lower impact on an OLTP system might be to repeatedly run this query until it affects no more rows:
DELETE FROM Persons p
WHERE City = 'ABC'
AND NOT EXISTS
(SELECT O_Id
FROM Orders o
WHERE p.P_Id = o.P_Id)
AND ROWNUM <= 100000;
The total resource usage would be higher than a single delete, and thus a single delete would still be better if your system can accommodate it, but this would be pretty robust, and with an index on persons(city,p_id) and one on orders(p_id) it should be very performant.
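One hedged way to drive that repetition from PL/SQL (outside Liquibase, given the commit discussion above) until the statement affects no more rows:

BEGIN
  LOOP
    DELETE FROM persons p
    WHERE  p.city = 'ABC'
    AND    NOT EXISTS (SELECT o_id FROM orders o WHERE p.p_id = o.p_id)
    AND    ROWNUM <= 100000;

    EXIT WHEN SQL%ROWCOUNT = 0;  -- nothing left to delete
    COMMIT;                      -- keep each transaction's undo/redo bounded
  END LOOP;
  COMMIT;
END;
/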
At work, I have a large table (some 3 million rows, like 40-50 columns). I sometimes need to empty some of the columns and fill them with new data. What I did not expect is that
UPDATE table1 SET y = null
takes much more time than filling the column with data that is generated, for example, from other columns of the same table or queried from other tables in a subquery. It does not matter whether I go through all table rows at once (as in the update above) or use a cursor to go through the table row by row (using the PK). It also does not matter whether I use the large table at work or a small test table filled with a few hundred thousand test rows. Setting the column to null always takes far longer (throughout my tests I saw factors of 2 to 10) than updating the column with some dynamic data (which is different for each row).
What is the reason for this? What does Oracle do when setting a column to null? Or where is my error in reasoning?
Thanks for your help!
P.S.: I am using Oracle 11gR2, and found these results using both PL/SQL Developer and Oracle SQL Developer.
Is column Y indexed? It could be that setting the column to null means Oracle has to delete the entry from the index, rather than just update it. If that's the case, you could drop the index and rebuild it after updating the data.
EDIT:
Is it just column Y that exhibits the issue, or is it independent of the column being updated? Can you post the table definition, including constraints?
Summary
I think updating to null is slower because Oracle (incorrectly) tries to take advantage of the way it stores nulls, causing it to frequently re-organize the rows in the block ("heap block compress"), creating a lot of extra UNDO and REDO.
What's so special about null?
From the Oracle Database Concepts:
"Nulls are stored in the database if they fall between columns with data values. In these cases they require 1 byte to store the length of the column (zero).
Trailing nulls in a row require no storage because a new row header signals that the remaining columns in the previous row are null. For example, if the last three columns of a table are null, no information is stored for those columns. In tables with many columns,
the columns more likely to contain nulls should be defined last to conserve disk space."
Test
Benchmarking updates is very difficult because the true cost of an update cannot be measured just from the update statement. For example, log switches will
not happen with every update, and delayed block cleanout will happen later. To accurately test an update, there should be multiple runs,
objects should be recreated for each run, and the high and low values should be discarded.
For simplicity the script below does not throw out high and low results, and only tests a table with a single column. But the problem still occurs regardless of the number of columns, their data, and which column is updated.
I used the RunStats utility from http://www.oracle-developer.net/utilities.php to compare the resource consumption of updating-to-a-value with updating-to-a-null.
create table test1(col1 number);
BEGIN
dbms_output.enable(1000000);
runstats_pkg.rs_start;
for i in 1 .. 10 loop
execute immediate 'drop table test1 purge';
execute immediate 'create table test1 (col1 number)';
execute immediate 'insert /*+ append */ into test1 select 1 col1
from dual connect by level <= 100000';
commit;
execute immediate 'update test1 set col1 = 1';
commit;
end loop;
runstats_pkg.rs_pause;
runstats_pkg.rs_resume;
for i in 1 .. 10 loop
execute immediate 'drop table test1 purge';
execute immediate 'create table test1 (col1 number)';
execute immediate 'insert /*+ append */ into test1 select 1 col1
from dual connect by level <= 100000';
commit;
execute immediate 'update test1 set col1 = null';
commit;
end loop;
runstats_pkg.rs_stop();
END;
/
Result
There are dozens of differences, these are the four I think are most relevant:
Type   Name                          Run1          Run2          Diff
-----  ----------------------------  ------------  ------------  ------------
TIMER  elapsed time (hsecs)                 1,269         4,738         3,469
STAT   heap block compress                      1         2,028         2,027
STAT   undo change vector size         55,855,008   181,387,456   125,532,448
STAT   redo size                      133,260,596   581,641,084   448,380,488
Solutions?
The only possible solution I can think of is to enable table compression. The trailing-null storage trick doesn't happen for compressed tables.
So even though the "heap block compress" number gets even higher for Run2, from 2028 to 23208, I guess it doesn't actually do anything.
The redo, undo, and elapsed time between the two runs is almost identical with table compression enabled.
However, there are lots of potential downsides to table compression. Updating to a null will run much faster, but every other update will run at least slightly slower.
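For completeness, enabling basic compression in the benchmark above only means changing the table definition that the loops create, roughly (results will vary by version and block contents):

-- compressed variant of the test table; swap this into the dynamic CREATE TABLE strings
CREATE TABLE test1 (col1 NUMBER) COMPRESS;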
That's because setting a column to null effectively deletes that data from the blocks.
And delete is the hardest operation. If you can avoid a delete, do it.
I recommend creating another table with that column set to null (with CREATE TABLE AS SELECT, for example, or INSERT ... SELECT), and filling the column with your procedure. Then drop the old table and rename the new table to the current name.
UPDATE:
Another important point is that you should update the column directly with the new values. It is pointless to set it to null and then refill it.
If you do not have values for all rows, you can do the update like this:
update table1
set    y = (select s.new_value from source s where s.key = table1.key);
This will set y to null for those rows that have no match in source.
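If nulling the unmatched rows is not desired, a hedged variant restricts the update to rows that do have a match in source:

update table1
set    y = (select s.new_value from source s where s.key = table1.key)
where  exists (select 1 from source s where s.key = table1.key);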
I would try what Tom Kyte suggested for large updates.
When it comes to huge tables, it is best to work like this: take a few rows, update them, take some more, update those, and so on. Don't try to issue a single update over the whole table; that's a killer move right from the start.
Basically, create a BINARY_INTEGER-indexed table, fetch a limited number of rows at a time, and update them.
Here is a piece of code that I have used on large tables with success. Because I'm lazy and it's like 2 AM now, I'll just copy-paste it here and let you figure it out, but let me know if you need help:
DECLARE
TYPE BookingRecord IS RECORD (
bprice number,
bevent_id number,
book_id number
);
TYPE array is TABLE of BookingRecord index by binary_integer;
l_data array;
CURSOR c1 is
SELECT LVC_USD_PRICE_V2(ev.activity_version_id,ev.course_start_date,t.local_update_date,ev.currency,nvl(t.delegate_country,ev.sponsor_org_country),ev.price,ev.currency,t.ota_status,ev.location_type) x,
ev.title,
t.ota_booking_id
FROM ota_gsi_delegate_bookings_t#diseulprod t,
inted_parted_events_t#diseulprod ev
WHERE t.event_id = ev.event_id
and t.ota_booking_id =
BEGIN
open c1;
loop
fetch c1 bulk collect into l_data limit 20;
for i in 1..l_data.count
loop
update ou_inc_int_t_01
set price = l_data(i).bprice,
updated = 'Y'
where booking_id = l_data(i).book_id;
end loop;
exit when c1%notfound;
end loop;
close c1;
END;
What can also help speed up updates is ALTER TABLE table1 NOLOGGING, but note that this only reduces redo for direct-path operations, not for conventional updates. Another possibility is to drop the column and re-add it; be aware, though, that a physical DROP COLUMN still has to visit every row, so only ALTER TABLE ... SET UNUSED avoids the row-by-row work (and its redo and undo).