I am trying to delete a lot of rows from a table. I want to try the approach of putting rows I want to delete into a cursor and then keep doing fetch, delete, commit on each row of the cursor until it is empty.
In the code below we are fetching rows and putting them into a collection TYPE.
How can I modify the code to take the TYPE out of the picture and simply do fetch, delete, commit on the cursor itself?
OPEN bulk_delete_dup;
LOOP
    FETCH bulk_delete_dup BULK COLLECT INTO arr_inc_del LIMIT c_rows;

    FORALL i IN arr_inc_del.FIRST .. arr_inc_del.LAST
        DELETE FROM UIV_RESPONSE_INCOME
         WHERE ROWID = arr_inc_del(i);

    COMMIT;
    arr_inc_del.DELETE;

    EXIT WHEN bulk_delete_dup%NOTFOUND;
END LOOP;

arr_inc_del.DELETE;
CLOSE bulk_delete_dup;
Why do you want to commit in batches? That is only going to slow down your processing. Unless there are other sessions that are trying to modify the rows you are trying to delete, which seems problematic for other reasons, the most efficient approach would be simply to delete the data with a single DELETE, i.e.
DELETE FROM uiv_response_income uri
 WHERE EXISTS (
           SELECT 1
             FROM (<<bulk_delete_dup query>>) bdd
            WHERE bdd.rowid = uri.rowid
       );
Of course, there may well be a more optimal way of writing this depending on how the query behind your cursor is designed.
If you really want to eliminate the BULK COLLECT (which will slow the process down substantially), you could use the WHERE CURRENT OF syntax to do the DELETE
SQL> create table foo
2 as
3 select level col1
4 from dual
5 connect by level < 10000;
Table created.
SQL> ed
Wrote file afiedt.buf
1 declare
2 cursor c1 is select * from foo for update;
3 l_rowtype c1%rowtype;
4 begin
5 open c1;
6 loop
7 fetch c1 into l_rowtype;
8 exit when c1%notfound;
9 delete from foo where current of c1;
10 end loop;
11* end;
SQL> /
PL/SQL procedure successfully completed.
Be aware, however, that since you have to lock the row (with the FOR UPDATE clause), you cannot put a commit in the loop. Doing a commit would release the locks you had requested with the FOR UPDATE and you'll get an ORA-01002: fetch out of sequence error
SQL> ed
Wrote file afiedt.buf
1 declare
2 cursor c1 is select * from foo for update;
3 l_rowtype c1%rowtype;
4 begin
5 open c1;
6 loop
7 fetch c1 into l_rowtype;
8 exit when c1%notfound;
9 delete from foo where current of c1;
10 commit;
11 end loop;
12* end;
SQL> /
declare
*
ERROR at line 1:
ORA-01002: fetch out of sequence
ORA-06512: at line 7
You may not get a runtime error if you remove the locking and avoid the WHERE CURRENT OF syntax, deleting the data based on the value(s) you fetched from the cursor. However, this is still doing a fetch across commit which is a poor practice and radically increases the odds that you will, at least intermittently, get an ORA-01555: snapshot too old error. It will also be painfully slow compared to the single SQL statement or the BULK COLLECT option.
SQL> ed
Wrote file afiedt.buf
1 declare
2 cursor c1 is select * from foo;
3 l_rowtype c1%rowtype;
4 begin
5 open c1;
6 loop
7 fetch c1 into l_rowtype;
8 exit when c1%notfound;
9 delete from foo where col1 = l_rowtype.col1;
10 commit;
11 end loop;
12* end;
SQL> /
PL/SQL procedure successfully completed.
Of course, you also have to ensure that your process is restartable in case you process some subset of rows and have some unknown number of interim commits before the process dies. If the DELETE is sufficient to cause the row to no longer be returned from your cursor, your process is probably already restartable. But in general, that's a concern if you try to break a single operation into multiple transactions.
A few things. It seems from your company's "no transaction over 8 seconds" rule (8 seconds, you in Texas?) that you have a production db instance that traditionally supported apps doing OLTP work (insert 1 row, update 2 rows, etc.) and has now also become the batch-processing db (remove 50% of the rows and replace them with 1mm new rows).
Batch processing should be separated from the OLTP instance. In a batch ("data factory") instance I wouldn't try deleting in this case; I'd probably do a CTAS, drop old table, rename new table, rebuild indexes/stats, recompile invalid objects approach.
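Purely as an illustration, that approach might look something like the sketch below (the table and column names are placeholders, and the NOLOGGING clause and the stats call are assumptions to confirm with your DBA):

-- keep only the rows you are NOT deleting
CREATE TABLE someTable_new NOLOGGING AS
    SELECT *
      FROM someTable
     WHERE someCol <> 'BLAH';   -- i.e. the keep-predicate, the inverse of the delete criteria

-- recreate indexes, constraints and grants on the new table, then swap the names
DROP TABLE someTable;
ALTER TABLE someTable_new RENAME TO someTable;

-- refresh optimizer statistics on the renamed table
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'SOMETABLE');
-- (and recompile any objects invalidated by the swap)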
Assuming you are stuck doing batch processing in your "8 second" instance, you'll probably find your company will ask for more and more of this in the future, so ask the DBAs for as much rollback as you can get, and hope you don't get a snapshot too old by fetching across commits (cursor select driving the deletes, commit every 1000 rows or so, delete using rowid).
If the DBAs can't help, you may be able to first create a temporary table containing the rowids you wish to delete, and then loop through that table to delete from the main table (avoiding the fetch across commits); a sketch of that variant follows the code below. Your company will probably have a rule against this as well, since it is another (basic) batch technique.
Something like:
declare
    -- assuming index on someCol
    cursor sel_cur is
        select rowid as row_id
          from someTable
         where someCol = 'BLAH';

    v_ctr pls_integer := 0;
begin
    for rec in sel_cur loop
        v_ctr := v_ctr + 1;

        -- watch out for snapshot too old...
        delete from someTable
         where rowid = rec.row_id;

        if (mod(v_ctr, 1000) = 0) then
            commit;
        end if;
    end loop;
    commit;
end;
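And a rough sketch of the temporary-table variant mentioned above. The table names are the same made-up ones as in the block just shown; the ON COMMIT PRESERVE ROWS temporary table is an assumption (any scratch table would do):

-- one-time setup: the rowids must survive the interim commits within the session
create global temporary table tmp_del_rowids (
    row_id rowid
) on commit preserve rows;

declare
    v_ctr pls_integer := 0;
begin
    -- capture the target rowids once, up front
    insert into tmp_del_rowids (row_id)
        select rowid from someTable where someCol = 'BLAH';
    commit;

    -- the driving cursor now reads only the static temp table, so the
    -- interim commits no longer fetch across the table being deleted from
    for rec in (select row_id from tmp_del_rowids) loop
        delete from someTable where rowid = rec.row_id;
        v_ctr := v_ctr + 1;
        if mod(v_ctr, 1000) = 0 then
            commit;
        end if;
    end loop;
    commit;
end;
/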
Related
I have created a query to update a column across 35 million records,
but unfortunately it took more than an hour to process.
Did I miss anything in the query below?
DECLARE
CURSOR exp_cur IS
SELECT
DECODE(
COLUMN_NAME,
NULL, NULL,
standard_hash(COLUMN_NAME)
) AS COLUMN_NAME
FROM TABLE1;
TYPE nt_fName IS TABLE OF VARCHAR2(100);
fname nt_fName;
BEGIN
OPEN exp_cur;
FETCH exp_cur BULK COLLECT INTO fname LIMIT 1000000;
CLOSE exp_cur;
--Print data
FOR idx IN 1 .. fname.COUNT
LOOP
UPDATE TABLE1 SET COLUMN_NAME=fname(idx);
commit;
DBMS_OUTPUT.PUT_LINE (idx||' '||fname(idx) );
END LOOP;
END;
The reason bulk collect used with a forall construction is generally faster than the equivalent row-by-row loop is that it applies all the updates in one shot, instead of laboriously stepping through the rows one at a time and launching 35 million separate update statements, each one requiring the database to search for the individual row before updating it. But what you have written (even when the bugs are fixed) is still a row-by-row loop with 35 million search-and-update statements, plus the additional overhead of populating a 700 MB array in memory, 35 million commits, and 35 million dbms_output messages. It has to be slower because it has significantly more work to do than a plain update.
If it is practical to copy the data to a new table, an insert will be a lot faster than an update. At the end you can reapply any grants, indexes and constraints to the new table, rename both tables and drop the old one. You can also insert with the /*+ parallel enable_parallel_dml */ hints (prior to Oracle 12c, you have to alter session enable parallel dml separately). You could define the new table as nologging during the copy, but check with your DBA, as that can affect replication and backups, though that might not matter if this is a test system. This will all need careful scripting if it's going to form part of a routine workflow.
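A rough sketch of that copy approach for the table in the question, written as if COLUMN_NAME were its only column (the new table name, the hints and the parallel degree of 8 are illustrative assumptions; the DECODE mirrors the original query so NULLs stay NULL):

-- empty copy of the table's structure (nologging is optional; check with your DBA)
CREATE TABLE table1_new NOLOGGING AS
    SELECT * FROM TABLE1 WHERE 1 = 0;

-- direct-path, parallel insert; prior to 12c use ALTER SESSION ENABLE PARALLEL DML instead of the hint
INSERT /*+ APPEND PARALLEL(8) ENABLE_PARALLEL_DML */ INTO table1_new
    SELECT DECODE(COLUMN_NAME,
                  NULL, NULL,
                  STANDARD_HASH(COLUMN_NAME)) AS COLUMN_NAME
           -- ...plus any remaining columns, listed unchanged
      FROM TABLE1;

COMMIT;

-- reapply grants, indexes and constraints to table1_new, then swap the names
RENAME TABLE1 TO table1_old;
RENAME table1_new TO TABLE1;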
Your code updates all the records of TABLE1 in every iteration. (It loops once per fetched value, and each iteration updates every row of TABLE1; that is why it is taking so long.)
You can simply use a single update statement as follows:
UPDATE TABLE1 SET COLUMN_NAME = standard_hash(COLUMN_NAME)
WHERE COLUMN_NAME IS NOT NULL;
So, if you really want to use BULK COLLECT and FORALL, you can use them as follows:
DECLARE
    CURSOR EXP_CUR IS
        SELECT COLUMN_NAME
          FROM TABLE1
         WHERE COLUMN_NAME IS NOT NULL;

    TYPE NT_FNAME IS TABLE OF VARCHAR2(100);
    FNAME NT_FNAME;
BEGIN
    OPEN EXP_CUR;
    LOOP
        FETCH EXP_CUR BULK COLLECT INTO FNAME LIMIT 1000000;
        EXIT WHEN FNAME.COUNT = 0;  -- stop once nothing more is fetched

        FORALL IDX IN 1 .. FNAME.COUNT
            UPDATE TABLE1
               SET COLUMN_NAME = STANDARD_HASH(COLUMN_NAME)
             WHERE COLUMN_NAME = FNAME(IDX);

        COMMIT;
    END LOOP;
    CLOSE EXP_CUR;
END;
/
I made this procedure to bulk delete data (35m records). Can you see why this PL/SQL procedure runs without exiting and the rows are not getting deleted?
create or replace procedure clear_logs
as
    CURSOR c_logstodel IS SELECT * FROM test where id=23;
    TYPE typ_log is table of test%ROWTYPE;
    v_log_del typ_log;
BEGIN
    OPEN c_logstodel;
    LOOP
        FETCH c_logstodel BULK COLLECT INTO v_log_del LIMIT 5000;
        EXIT WHEN c_logstodel%NOTFOUND;
        FORALL i IN v_log_del.FIRST..v_log_del.LAST
            DELETE FROM test WHERE id = v_log_del(i).id;
        COMMIT;
    END LOOP;
    CLOSE c_logstodel;
END clear_logs;
Using rowid instead of the column name, using exit when v_log_del.count = 0; instead of EXIT WHEN c_logstodel%NOTFOUND;, and raising the chunk limit to 50,000 allowed the script to clear 35 million rows in 15 minutes.
create or replace procedure clear_logs
as
    CURSOR c_logstodel IS SELECT rowid FROM test where id=23;
    TYPE typ_log is table of rowid index by binary_integer;
    v_log_del typ_log;
BEGIN
    OPEN c_logstodel;
    LOOP
        FETCH c_logstodel BULK COLLECT INTO v_log_del LIMIT 50000;
        exit when v_log_del.count = 0;
        FORALL i IN v_log_del.FIRST..v_log_del.LAST
            DELETE FROM test WHERE rowid = v_log_del(i);
        COMMIT;
    END LOOP;
    COMMIT;
    CLOSE c_logstodel;
END clear_logs;
First off, when using BULK COLLECT ... LIMIT x, %NOTFOUND takes on a slightly unexpected meaning: it is set as soon as Oracle cannot retrieve x rows, even though some rows may still have been fetched. (Technically that is always what it means; with a plain fetch you ask for the next 1 row and it reports it could not fill the 1-row buffer.) Just move the EXIT WHEN %NOTFOUND to after the FORALL. But there is actually no reason to retrieve the data and then delete the retrieved rows. While a single statement would be considerably faster, deleting 35M rows in one transaction would require significant rollback space. There is an intermediate solution.
Although not commonly used, DELETE statements can filter on ROWNUM just as SELECTs can, and that can be used to limit the number of rows processed. So to break the work into a given commit size, just limit ROWNUM on the delete:
create or replace procedure clear_logs
as
    k_max_rows_per_iteration constant integer := 50000;
begin
    loop
        delete
          from test
         where id = 23
           and rownum <= k_max_rows_per_iteration;

        exit when sql%rowcount < k_max_rows_per_iteration;
        commit;
    end loop;
    commit;
end;
As #Stilgar points out, deletes are expensive (meaning slow), so their solution may be better. But this approach has the advantage that it does not essentially take the table completely out of service during the operation. NOTE: I tend to use a much larger commit interval, generally around 300,000 - 400,000 rows. I suggest you talk with your DBA to see what they think this limit should be; remember, it is their job to properly size rollback space for typical operations, and if this is normal in your operation they need to set it correctly. If you can get enough rollback space to do the 35M deletes in a single statement, then that is the fastest you are going to get.
I have got a table:
table foo{
bar number,
status varchar2(50)
}
I have multiple threads/hosts each consuming the table. Each thread updates the status, i.e. pessimistically locks the row.
In Oracle 12.2.
select ... for update skip locked seems to do the job, but I want to limit the number of rows. The new FETCH NEXT syntax sounds right, but I can't get it to work:
SELECT * FROM foo ORDER BY bar
OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY
FOR UPDATE SKIP LOCKED;
What is the simplest way to achieve this, i.e. with minimum code [1] (ideally without a PL/SQL function)?
I want something like this:
select * from (select * from foo
where status<>'baz' order by bar
) where rownum<10 for update skip locked
PS
[1] We are considering moving away from Oracle.
I suggest creating a PL/SQL function and using dynamic SQL to control the number of locked records. The lock is acquired at fetch time, so fetching N records automatically locks them. Keep in mind that the records are unlocked once you finish the transaction - commit or rollback.
The following is the example to lock N records and return their id values as an array (assume you have added the primary key ID column in your table):
create or replace function get_next_unlocked_records(iLockSize number)
return sys.odcinumberlist
is
cRefCursor sys_refcursor;
aIds sys.odcinumberlist := sys.odcinumberlist();
begin
-- open cursor. No locks so far
open cRefCursor for
'select id from foo '||
'for update skip locked';
-- we fetch and lock at the same time
fetch cRefCursor bulk collect into aIds limit iLockSize;
-- close cursor
close cRefCursor;
-- return locked ID values,
-- lock is kept until the transaction is finished
return aIds;
end;
sys.odcinumberlist is the built-in array of numbers.
Here is the test script to run in db:
declare
aRes sys.odcinumberlist;
begin
aRes := get_next_unlocked_records(10);
for c in (
select column_value id
from table(aRes)
) loop
dbms_output.put_line(c.id);
end loop;
end;
Assume there are two tables TST_SAMPLE (10000 rows) and TST_SAMPLE_STATUS (empty).
I want to iterate over each record in TST_SAMPLE and add exactly one record to TST_SAMPLE_STATUS accordingly.
In a single thread that would be simply this:
begin
for r in (select * from TST_SAMPLE)
loop
insert into TST_SAMPLE_STATUS(rec_id, rec_status)
values (r.rec_id, 'TOUCHED');
end loop;
commit;
end;
/
In a multithreaded solution there's a situation which is not clear to me.
Could you explain what causes one row of TST_SAMPLE to be processed several times?
Please see the details below.
create table TST_SAMPLE(
rec_id number(10) primary key
);
create table TST_SAMPLE_STATUS(
rec_id number(10),
rec_status varchar2(10),
session_id varchar2(100)
);
begin
insert into TST_SAMPLE(rec_id)
select LEVEL from dual connect by LEVEL <= 10000;
commit;
end;
/
CREATE OR REPLACE PROCEDURE tst_touch_recs(pi_limit int) is
v_last_iter_count int;
begin
loop
v_last_iter_count := 0;
--------------------------
for r in (select *
from TST_SAMPLE A
where rownum < pi_limit
and NOT EXISTS (select null
from TST_SAMPLE_STATUS B
where B.rec_id = A.rec_id)
FOR UPDATE SKIP LOCKED)
loop
insert into TST_SAMPLE_STATUS(rec_id, rec_status, session_id)
values (r.rec_id, 'TOUCHED', SYS_CONTEXT('USERENV', 'SID'));
v_last_iter_count := v_last_iter_count + 1;
end loop;
commit;
--------------------------
exit when v_last_iter_count = 0;
end loop;
end;
/
In the FOR loop I try to iterate over rows that:
- have no status yet (the NOT EXISTS clause)
- are not currently locked by another thread (FOR UPDATE SKIP LOCKED)
There's no requirement for the exact amount of rows in an iteration.
Here pi_limit is just a maximal size of one batch. The only thing needed is to process each row of TST_SAMPLE in exactly one session.
So let's run this procedure in 3 threads.
declare
v_job_id number;
begin
dbms_job.submit(v_job_id, 'begin tst_touch_recs(100); end;', sysdate);
dbms_job.submit(v_job_id, 'begin tst_touch_recs(100); end;', sysdate);
dbms_job.submit(v_job_id, 'begin tst_touch_recs(100); end;', sysdate);
commit;
end;
Unexpectedly, we see that some rows were processed in several sessions
select count(unique rec_id) AS unique_count,
count(rec_id) AS total_count
from TST_SAMPLE_STATUS;
| unique_count | total_count |
------------------------------
| 10000 | 17397 |
------------------------------
-- run to see duplicates
select *
from TST_SAMPLE_STATUS
where REC_ID in (
select REC_ID
from TST_SAMPLE_STATUS
group by REC_ID
having count(*) > 1
)
order by REC_ID;
Please help me find the mistakes in the implementation of the procedure tst_touch_recs.
Here's a little example that shows why you're reading rows twice.
Run the following code in two sessions, starting the second a few seconds after the first:
declare
cursor c is
select a.*
from TST_SAMPLE A
where rownum < 10
and NOT EXISTS (select null
from TST_SAMPLE_STATUS B
where B.rec_id = A.rec_id)
FOR UPDATE SKIP LOCKED;
type rec is table of c%rowtype index by pls_integer;
rws rec;
begin
open c; -- data are read consistent to this time
dbms_lock.sleep ( 10 );
fetch c
bulk collect
into rws;
for i in 1 .. rws.count loop
dbms_output.put_line ( rws(i).rec_id );
end loop;
commit;
end;
/
You should see both sessions display the same rows.
Why?
Because Oracle Database has statement-level consistency, the result set for both is frozen when you open the cursor.
But when you have SKIP LOCKED, the FOR UPDATE locking only kicks in when you fetch the rows.
So session 1 starts and finds the first 9 rows not in TST_SAMPLE_STATUS. It then waits 10 seconds.
Provided you start session 2 within these 10 seconds, the cursor will look for the same nine rows.
At this point no rows are locked.
Now, here's where it gets interesting.
The sleep in the first session will finish. It'll then fetch the rows, locking them and skipping any that are already locked.
Very shortly after, it'll commit. Releasing the lock.
A few moments later, session 2 comes to read these rows. At this point the rows are not locked!
So there's nothing to skip.
How exactly you solve this depends on what you're trying to do.
Assuming you can't move to a set-based approach, you could make the transactions serializable by adding:
set transaction isolation level serializable;
before the cursor loop. This will then move to transaction-level consistency. Enabling the database to detect "something changed" when fetching rows.
But you'll need to catch ORA-08177: can't serialize access for this transaction errors within the outer loop, or any process that re-reads the same rows will drop out at this point.
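A sketch of how the procedure from the question might be adapted to do that; the procedure name, the ORA-08177 handler and the retry-by-forcing-another-pass logic are illustrative assumptions rather than a drop-in fix:

CREATE OR REPLACE PROCEDURE tst_touch_recs_ser(pi_limit int) is
    e_cant_serialize exception;
    pragma exception_init(e_cant_serialize, -8177);
    v_last_iter_count int;
begin
    loop
        v_last_iter_count := 0;
        begin
            -- transaction-level read consistency for this batch
            set transaction isolation level serializable;

            for r in (select *
                        from TST_SAMPLE A
                       where rownum < pi_limit
                         and NOT EXISTS (select null
                                           from TST_SAMPLE_STATUS B
                                          where B.rec_id = A.rec_id)
                       FOR UPDATE SKIP LOCKED)
            loop
                insert into TST_SAMPLE_STATUS(rec_id, rec_status, session_id)
                values (r.rec_id, 'TOUCHED', SYS_CONTEXT('USERENV', 'SID'));
                v_last_iter_count := v_last_iter_count + 1;
            end loop;

            commit;
        exception
            when e_cant_serialize then
                -- another session changed rows this transaction had read;
                -- throw the batch away and try again on the next pass
                rollback;
                v_last_iter_count := pi_limit;
        end;

        exit when v_last_iter_count = 0;
    end loop;
end;
/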
Or, as commenters have suggested, use Advanced Queuing.
Edit: Please answer one of the two answers I ask. I know there are other options that would be better in a different case. These other potential options (partitioning the table, running as one large delete statement w/o committing in batches, etc) are NOT options in my case due to things outside my control.
I have several very large tables to delete from. All have the same foreign key that is indexed. I need to delete certain records from all tables.
table source
id --primary_key
import_source --used for choosing the ids to delete
table t1
id --foreign key
--other fields
table t2
id --foreign key
--different other fields
Usually when doing a delete like this, I'll put together a loop to step through all the ids:
declare
my_counter integer := 0;
begin
for cur in (
select id from source where import_source = 'bad.txt'
) loop
begin
delete from source where id = cur.id;
delete from t1 where id = cur.id;
delete from t2 where id = cur.id;
my_counter := my_counter + 1;
if my_counter > 500 then
my_counter := 0;
commit;
end if;
end;
end loop;
commit;
end;
However, in some code I saw elsewhere, it was put together in separate loops, one for each delete.
declare
    type import_ids is table of integer index by pls_integer;
    my_import_ids import_ids;
    my_count integer := 0;
begin
    select id bulk collect into my_import_ids from source where import_source = 'bad.txt';

    for h in 1..my_import_ids.count loop
        delete from t1 where id = my_import_ids(h);
        --do commit check
    end loop;

    for h in 1..my_import_ids.count loop
        delete from t2 where id = my_import_ids(h);
        --do commit check
    end loop;
end;
--do commit check will be replaced with the same chunk to commit every 500 rows as the above query
So I need one of the following answered:
1) Which of these is better?
2) How can I find out which is better for my particular case? (i.e., does it depend on how many tables I have, how big they are, etc.?)
Edit:
I must do this in a loop due to the size of these tables. I will be deleting thousands of records from tables with hundreds of millions of records. This is happening on a system that can't afford to have the tables locked for that long.
EDIT:
NOTE: I am required to commit in batches. The amount of data is too large to do it in one batch. The rollback tables will crash our database.
If there is a way to commit in batches other than looping, I'd be willing to hear it. Otherwise, don't bother saying that I shouldn't use a loop...
Why loop at all?
delete from t1 where id IN (select id from source where import_source = 'bad.txt');
delete from t2 where id IN (select id from source where import_source = 'bad.txt');
delete from source where import_source = 'bad.txt';
That's using standard SQL. I don't know Oracle specifically, but many DBMSes also feature multi-table JOIN-based DELETEs as well that would let you do the whole thing in a single statement.
David,
If you insist on committing, you can use the following code:
declare
    type import_ids is table of integer index by pls_integer;
    my_import_ids import_ids;
    cursor c is select id from source where import_source = 'bad.txt';
begin
    open c;
    loop
        fetch c bulk collect into my_import_ids limit 500;

        forall h in 1..my_import_ids.count
            delete from t1 where id = my_import_ids(h);

        forall h in 1..my_import_ids.count
            delete from t2 where id = my_import_ids(h);

        commit;
        exit when c%notfound;
    end loop;
    close c;
end;
This program fetches ids in batches of 500 rows, deleting and committing each batch. It should be much faster than row-by-row processing, because bulk collect and forall work as single operations (a single round-trip between the PL/SQL and SQL engines per batch), thus minimizing the number of context switches. See Bulk Binds, Forall, Bulk Collect for details.
First of all, you shouldn't commit inside the loop - it is not efficient (it generates lots of redo) and if an error occurs, you can't roll back.
As mentioned in previous answers, you should issue single deletes, or, if you are deleting most of the records, then it could be more optimal to create new tables with remaining rows, drop old ones and rename the new ones to old names.
Something like this:
CREATE TABLE new_table AS SELECT * FROM old_table WHERE <filter only remaining rows>;
-- create indexes on new_table
-- re-apply grants on new_table
-- add constraints on new_table
-- etc.
DROP TABLE old_table;
RENAME new_table TO old_table;
See also Ask Tom
Larry Lustig is right that you don't need a loop. Nonetheless there may be some benefit in doing the delete in smaller chunks. Here PL/SQL bulk binds can improve speed greatly:
declare
    type import_ids is table of integer index by pls_integer;
    my_import_ids import_ids;
begin
    select id bulk collect into my_import_ids from source where import_source = 'bad.txt';

    forall h in 1..my_import_ids.count
        delete from t1 where id = my_import_ids(h);

    forall h in 1..my_import_ids.count
        delete from t2 where id = my_import_ids(h);
end;
The way I wrote it, it does it all at once, in which case, yes, the single SQL statement is better. But you can change your loop conditions to break it into chunks. The key points are:
don't commit on every row. If anything, commit only every N rows.
When using chunks of N, don't run the delete in an ordinary loop. Use forall to run the delete as a bulk bind, which is much faster.
The reason, aside from the overhead of commits, is that each time you execute an SQL statement inside PL/SQL code it essentially does a context switch. Bulk binds avoid that.
You may try partitioning anyway to use parallel execution, not just to drop one partition. The Oracle documentation may prove useful in setting this up. Each partition would use its own rollback segment in this case.
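For what it's worth, a sketch of what one of those deletes might look like run in parallel (the degree of 8 is an arbitrary assumption; confirm with your DBA that parallel DML is acceptable on this system):

ALTER SESSION ENABLE PARALLEL DML;

DELETE /*+ PARALLEL(t1 8) */ FROM t1
 WHERE id IN (SELECT id FROM source WHERE import_source = 'bad.txt');

COMMIT;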
If you are doing the delete from the source before the t1/t2 deletes, that suggests you don't have referential integrity constraints (as otherwise you'd get errors saying child records exist).
I'd go for creating the constraint with ON DELETE CASCADE. Then a simple
DECLARE
v_cnt NUMBER := 1;
BEGIN
WHILE v_cnt > 0 LOOP
DELETE FROM source WHERE import_source = 'bad.txt' and rownum < 5000;
v_cnt := SQL%ROWCOUNT;
COMMIT;
END LOOP;
END;
The child records would get deleted automatically.
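For reference, the cascading constraints assumed above might be declared like this (the constraint names are made up):

ALTER TABLE t1 ADD CONSTRAINT t1_source_fk
    FOREIGN KEY (id) REFERENCES source (id) ON DELETE CASCADE;

ALTER TABLE t2 ADD CONSTRAINT t2_source_fk
    FOREIGN KEY (id) REFERENCES source (id) ON DELETE CASCADE;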
If you can't have the ON DELETE CASCADE, I'd go with a GLOBAL TEMPORARY TABLE with ON COMMIT DELETE ROWS
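The temporary table (called temp in the block below) would be created once, up front, something like:

CREATE GLOBAL TEMPORARY TABLE temp (id NUMBER) ON COMMIT DELETE ROWS;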
DECLARE
v_cnt NUMBER := 1;
BEGIN
WHILE v_cnt > 0 LOOP
INSERT INTO temp (id)
SELECT id FROM source WHERE import_source = 'bad.txt' and rownum < 5000;
v_cnt := SQL%ROWCOUNT;
DELETE FROM t1 WHERE id IN (SELECT id FROM temp);
DELETE FROM t2 WHERE id IN (SELECT id FROM temp);
DELETE FROM source WHERE id IN (SELECT id FROM temp);
COMMIT;
END LOOP;
END;
I'd also go for the largest chunk your DBA will allow.
I'd expect each transaction to last for at least a minute. More frequent commits would be a waste.
"This is happening on a system that can't afford to have the tables locked for that long."
Oracle doesn't lock tables, only rows. I'm assuming no-one will be locking the rows you are deleting (or at least not for long). So locking is not an issue.