I have about 94,000 records that need to be deleted, but I have been told not to delete them all at once because it will hurt performance due to the delete trigger. What would be the best way to accomplish this? I was thinking of adding a loop that commits every 1,000 rows, but I'm not sure how to implement it or whether that will reduce performance even more.
DECLARE
    CURSOR CLEAN IS
        SELECT EMP_ID, ACCT_ID
        FROM RECORDS_TO_DELETE; -- Table containing the records that need to be deleted.
    COUNTER INTEGER := 0;
BEGIN
    FOR F IN CLEAN LOOP
        COUNTER := COUNTER + 1;
        DELETE FROM EMPLOYEES
        WHERE EMP_ID = F.EMP_ID
          AND ACCT_ID = F.ACCT_ID;
        IF MOD(COUNTER, 1000) = 0 THEN
            COMMIT;
        END IF;
    END LOOP;
    COMMIT;
END;
You should read a bit about BULK COLLECT in Oracle. It is commonly considered the proper way of working with large tables.
Example:
LOOP
    FETCH c_delete BULK COLLECT INTO t_delete LIMIT l_delete_buffer;
    FORALL i IN 1 .. t_delete.COUNT
        DELETE FROM ps_al_chk_memo
        WHERE ROWID = t_delete(i);
    COMMIT;
    EXIT WHEN c_delete%NOTFOUND;
END LOOP;
CLOSE c_delete;
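The fragment above leaves out the declarations. For completeness, here is a self-contained sketch of the same pattern applied to the tables in the question; the ROWID-based cursor, the collection type, and the 1000-row batch size are my assumptions, not tested code:
DECLARE
    CURSOR c_delete IS
        SELECT E.ROWID
        FROM   EMPLOYEES E
        WHERE  (E.EMP_ID, E.ACCT_ID) IN
               (SELECT F.EMP_ID, F.ACCT_ID FROM RECORDS_TO_DELETE F);
    TYPE t_rowid_tab IS TABLE OF ROWID INDEX BY PLS_INTEGER;
    t_delete        t_rowid_tab;
    l_delete_buffer CONSTANT PLS_INTEGER := 1000;  -- batch size (an assumption)
BEGIN
    OPEN c_delete;
    LOOP
        FETCH c_delete BULK COLLECT INTO t_delete LIMIT l_delete_buffer;
        FORALL i IN 1 .. t_delete.COUNT
            DELETE FROM EMPLOYEES
            WHERE ROWID = t_delete(i);
        COMMIT;
        EXIT WHEN c_delete%NOTFOUND;
    END LOOP;
    CLOSE c_delete;
END;
Deleting by ROWID means each delete goes straight to the row instead of probing the EMP_ID/ACCT_ID indexes again.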
You can do it in a single statement, which should be the fastest approach of all:
DELETE FROM EMPLOYEES
WHERE (EMP_ID, ACCT_ID) =ANY (SELECT EMP_ID, ACCT_ID FROM RECORDS_TO_DELETE)
Since the volume of records is not that large, you can still go with plain SQL rather than PL/SQL. Whenever possible, try SQL first. I don't think it will cause that much of a performance impact.
DELETE FROM EMPLOYEES E
WHERE EXISTS
   (SELECT 1 FROM RECORDS_TO_DELETE F
     WHERE E.EMP_ID  = F.EMP_ID
       AND E.ACCT_ID = F.ACCT_ID);
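If the batched commits really are mandatory (because of the delete trigger mentioned in the question), the same set-based delete can be chunked with ROWNUM. A rough sketch, with an arbitrary 1000-row batch size:
DECLARE
    v_rows PLS_INTEGER := 1;
BEGIN
    WHILE v_rows > 0 LOOP
        DELETE FROM EMPLOYEES E
        WHERE EXISTS
              (SELECT 1 FROM RECORDS_TO_DELETE F
                WHERE E.EMP_ID  = F.EMP_ID
                  AND E.ACCT_ID = F.ACCT_ID)
          AND ROWNUM <= 1000;
        v_rows := SQL%ROWCOUNT;
        COMMIT;
    END LOOP;
END;
Each pass deletes up to 1,000 matching rows and commits, and the loop stops once SQL%ROWCOUNT reports nothing left to delete.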
Hope this helps.
I'm new to PL/SQL.
I have a procedure like:
create or replace procedure insert_charge is
v_count number;
begin
for i in (select t.name, t.hire_date, t.salary
from emp t
where t.create_date >= (sysdate - 30)
and t.salary = 0) loop
insert into charge
(name, hire_date, salary)
values
(i.name, i.hire_date, i.salary);
commit;
update emp l
set l.status = 1
where l.name = i.name
and l.status = 0
and l.hire_date = i.hire_date;
commit;
end loop;
exception
when others then
rollback;
end insert_charge;
How can use FORALL statement instead of this?
You can't.
The FORALL statement runs one DML statement multiple times
ONE DML statement. You have two (update and insert).
As for the code you wrote:
move COMMIT out of the loop
remove that WHEN OTHERS "handler", as it handles nothing; if an error happens, Oracle will silently roll back and the procedure will report success while it actually failed (see the sketch below)
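With just those two changes applied (and nothing else, so still row-by-row rather than FORALL), the original procedure would look roughly like this; a sketch, not tested:
create or replace procedure insert_charge is
begin
  for i in (select t.name, t.hire_date, t.salary
              from emp t
             where t.create_date >= (sysdate - 30)
               and t.salary = 0) loop
    insert into charge (name, hire_date, salary)
    values (i.name, i.hire_date, i.salary);

    update emp l
       set l.status = 1
     where l.name = i.name
       and l.status = 0
       and l.hire_date = i.hire_date;
  end loop;

  commit;  -- single commit, after the loop
end insert_charge;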
There are a few additional tasks for FORALL, namely defining a collection type for the bulk area and a variable of that collection type to contain the actual data. As a safety valve you should place a LIMIT on the number of rows fetched at once. BULK COLLECT / FORALL is a trade-off of speed vs memory, and at a certain point (depending on your configuration) it has diminishing returns. Besides, the memory you use for it is unavailable to other processes in the database, so play well with your fellow queries. Then, as @Littlefoot points out, DO NOT SQUASH EXCEPTIONS: log them and re-raise. Finally, a note about commits: do not commit after each DML statement. You may want to spend some time investigating [transactions][1]. With this in mind your procedure becomes something like:
create or replace procedure insert_charge is
cursor c_emp_cur is
select t.name, t.hire_date, t.salary
from emp t
where t.create_date >= (sysdate - 30)
and t.salary = 0;
type c_emp_array_t is table of c_emp_cur%rowtype; -- define collection for rows selected
k_emp_rows_max constant integer := 500; -- defines the maximum rows per fetch
l_emp_collect c_emp_array_t; -- define variable of rows collection
begin
open c_emp_cur;
loop
fetch c_emp_cur -- fetch up to LIMIT rows from cursor
bulk collect
into l_emp_collect
limit k_emp_rows_max;
forall i in 1 .. l_emp_collect.count -- run insert for ALL rows in the collection
insert into charge(name, hire_date, salary)
values( l_emp_collect(i).name
, l_emp_collect(i).hire_date
, l_emp_collect(i).salary);
forall i in 1 .. l_emp_collect.count -- run update for ALL rows in the collection
update emp l
set l.status = 1
where l.name = l_emp_collect(i).name
and l.status = 0
and l.hire_date = l_emp_collect(i).hire_date;
exit when c_emp_cur%notfound; -- no more rows so exit
end loop;
close c_emp_cur;
commit; -- JUST 1 COMMIT;
exception
when others then
generate_exception_log ('insert_charge', sysdate, sqlerrm); -- ASSUMED autonomous transaction procedure for an exception log table.
raise;
end insert_charge;
DISCLAIMER: Not tested.
[1]: https://www.techopedia.com/definition/16455/transaction-databases
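generate_exception_log is only assumed above, it is not defined anywhere in the question. A minimal sketch of such an autonomous-transaction logger could look like this (the error_log table and its columns are made up for illustration):
create or replace procedure generate_exception_log
    ( p_source  in varchar2
    , p_when    in date
    , p_message in varchar2
    ) is
    pragma autonomous_transaction;  -- commit the log row independently of the caller's transaction
begin
    insert into error_log (source_name, logged_at, error_message)  -- assumed log table
    values (p_source, p_when, p_message);
    commit;
end generate_exception_log;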
Assume there are two tables TST_SAMPLE (10000 rows) and TST_SAMPLE_STATUS (empty).
I want to iterate over each record in TST_SAMPLE and add exactly one record to TST_SAMPLE_STATUS accordingly.
In a single thread that would be simply this:
begin
for r in (select * from TST_SAMPLE)
loop
insert into TST_SAMPLE_STATUS(rec_id, rec_status)
values (r.rec_id, 'TOUCHED');
end loop;
commit;
end;
/
In a multithreaded solution there's a situation that is not clear to me.
Could you explain what causes one row of TST_SAMPLE to be processed several times?
Please, see details below.
create table TST_SAMPLE(
rec_id number(10) primary key
);
create table TST_SAMPLE_STATUS(
rec_id number(10),
rec_status varchar2(10),
session_id varchar2(100)
);
begin
insert into TST_SAMPLE(rec_id)
select LEVEL from dual connect by LEVEL <= 10000;
commit;
end;
/
CREATE OR REPLACE PROCEDURE tst_touch_recs(pi_limit int) is
v_last_iter_count int;
begin
loop
v_last_iter_count := 0;
--------------------------
for r in (select *
from TST_SAMPLE A
where rownum < pi_limit
and NOT EXISTS (select null
from TST_SAMPLE_STATUS B
where B.rec_id = A.rec_id)
FOR UPDATE SKIP LOCKED)
loop
insert into TST_SAMPLE_STATUS(rec_id, rec_status, session_id)
values (r.rec_id, 'TOUCHED', SYS_CONTEXT('USERENV', 'SID'));
v_last_iter_count := v_last_iter_count + 1;
end loop;
commit;
--------------------------
exit when v_last_iter_count = 0;
end loop;
end;
/
In the FOR loop I try to iterate over rows that:
- have no status yet (the NOT EXISTS clause)
- are not currently locked by another thread (FOR UPDATE SKIP LOCKED)
There's no requirement on the exact number of rows in an iteration; pi_limit is just the maximum size of one batch.
The only thing needed is to process each row of TST_SAMPLE in exactly one session.
So let's run this procedure in 3 threads.
declare
v_job_id number;
begin
dbms_job.submit(v_job_id, 'begin tst_touch_recs(100); end;', sysdate);
dbms_job.submit(v_job_id, 'begin tst_touch_recs(100); end;', sysdate);
dbms_job.submit(v_job_id, 'begin tst_touch_recs(100); end;', sysdate);
commit;
end;
Unexpectedly, we see that some rows were processed in several sessions:
select count(unique rec_id) AS unique_count,
count(rec_id) AS total_count
from TST_SAMPLE_STATUS;
| unique_count | total_count |
------------------------------
| 10000 | 17397 |
------------------------------
-- run to see duplicates
select *
from TST_SAMPLE_STATUS
where REC_ID in (
select REC_ID
from TST_SAMPLE_STATUS
group by REC_ID
having count(*) > 1
)
order by REC_ID;
Please help me recognize the mistakes in the implementation of procedure tst_touch_recs.
Here's a little example that shows why you're reading rows twice.
Run the following code in two sessions, starting the second a few seconds after the first:
declare
cursor c is
select a.*
from TST_SAMPLE A
where rownum < 10
and NOT EXISTS (select null
from TST_SAMPLE_STATUS B
where B.rec_id = A.rec_id)
FOR UPDATE SKIP LOCKED;
type rec is table of c%rowtype index by pls_integer;
rws rec;
begin
open c; -- data are read consistent to this time
dbms_lock.sleep ( 10 );
fetch c
bulk collect
into rws;
for i in 1 .. rws.count loop
dbms_output.put_line ( rws(i).rec_id );
end loop;
commit;
end;
/
You should see both sessions display the same rows.
Why?
Because Oracle Database has statement-level consistency, the result set for both is frozen when you open the cursor.
But when you have SKIP LOCKED, the FOR UPDATE locking only kicks in when you fetch the rows.
So session 1 starts and finds the first 9 rows not in TST_SAMPLE_STATUS. It then waits 10 seconds.
Provided you start session 2 within these 10 seconds, the cursor will look for the same nine rows.
At this point no rows are locked.
Now, here's where it gets interesting.
The sleep in the first session will finish. It'll then fetch the rows, locking them and skipping any that are already locked.
Very shortly after, it'll commit. Releasing the lock.
A few moments later, session 2 comes to read these rows. At this point the rows are not locked!
So there's nothing to skip.
How exactly you solve this depends on what you're trying to do.
Assuming you can't move to a set-based approach, you could make the transactions serializable by adding:
set transaction isolation level serializable;
before the cursor loop. This moves you to transaction-level consistency, enabling the database to detect that "something changed" when fetching the rows.
But you'll need to catch "ORA-08177: can't serialize access for this transaction" errors within the outer loop. Otherwise any process that re-reads the same rows will drop out at that point.
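For illustration only, here is one way the procedure from the question could be wired up with serializable transactions and an ORA-08177 handler; the retry-on-conflict behaviour is my assumption about what the job should do, and it is untested:
CREATE OR REPLACE PROCEDURE tst_touch_recs(pi_limit int) is
  e_cant_serialize exception;
  pragma exception_init(e_cant_serialize, -8177);
  v_last_iter_count int;
begin
  loop
    v_last_iter_count := 0;
    begin
      set transaction isolation level serializable;
      for r in (select *
                  from TST_SAMPLE A
                 where rownum < pi_limit
                   and NOT EXISTS (select null
                                     from TST_SAMPLE_STATUS B
                                    where B.rec_id = A.rec_id)
                   FOR UPDATE SKIP LOCKED)
      loop
        insert into TST_SAMPLE_STATUS(rec_id, rec_status, session_id)
        values (r.rec_id, 'TOUCHED', SYS_CONTEXT('USERENV', 'SID'));
        v_last_iter_count := v_last_iter_count + 1;
      end loop;
      commit;
    exception
      when e_cant_serialize then
        rollback;
        v_last_iter_count := 1;  -- another session got there first; force another pass
    end;
    exit when v_last_iter_count = 0;
  end loop;
end;
/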
Or, as commenters have suggested, use Advanced Queuing.
I'm new to PL/SQL.
I decided to use the following SQL to insert thousands of records into a table. However, is it correct to place the FORALL statement outside the FOR loop?
Or is it better to move the FORALL statement inside the FOR loop block? Thank you.
DECLARE
CURSOR books_cur
IS
SELECT book_id, book_type
FROM books
WHERE book_category = 'PROGRAMMING';
TYPE book_ids_t IS TABLE OF
books.book_id%TYPE;
l_book_ids book_ids_t := book_ids_t();
BEGIN
FOR i IN books_cur LOOP
IF(i.book_type = 'PLSQL') THEN
l_book_ids.EXTEND;
l_book_ids(l_book_ids.LAST) := i.book_id;
END IF;
END LOOP;
FORALL i IN l_book_ids.FIRST..l_book_ids.LAST
INSERT INTO table_a (book_id) VALUES (l_book_ids(i));
END;
"Is it better to move the FORALL statement inside for loop block"
No. FORALL is a set operation. You need to execute it once per unit of work, not once per row.
" to insert thousands of record to a table"
Please remember that collections are maintained in session memory. This means you need to be more conscious of memory management when you're working with very large numbers of records.
So you may want to work in batches, something like this:
BEGIN
FOR i IN books_cur LOOP
IF(i.book_type = 'PLSQL') THEN
l_book_ids.EXTEND;
l_book_ids(l_book_ids.LAST) := i.book_id;
END IF;
if l_book_ids.count() = 1000 -- say
then
FORALL i IN l_book_ids.FIRST..l_book_ids.LAST
INSERT INTO table_a (book_id) VALUES (l_book_ids(i));
l_book_ids.delete();
end if;
END LOOP;
FORALL i IN l_book_ids.FIRST..l_book_ids.LAST
INSERT INTO table_a (book_id) VALUES (l_book_ids(i));
END;
This version of the code inserts records in batches of 1000, with a supplementary FORALL statement to catch the last batch which is likely to be less than 1000 records.
However, the most efficient way of implementing your example remains
INSERT INTO table_a (book_id)
SELECT book_id
FROM books
WHERE book_category = 'PROGRAMMING'
and book_type = 'PLSQL';
No, your FORALL is in the correct place. Of course, your example program doesn't need this at all, it could be written as:
BEGIN
INSERT INTO table_a (book_id)
SELECT book_id
FROM books
WHERE book_category = 'PROGRAMMING'
AND book_type = 'PLSQL';
END;
But I realise you are interested in learning the FORALL concept rather than writing the simplest code!
In your example, FORALL requires the collection l_book_ids as input.
Inside the FOR loop, the collection is not yet fully populated, so you cannot usefully run FORALL on it there.
FORALL is one statement; as APC mentioned, you cannot break it into multiple steps. As a matter of fact, this is its advantage: it does the work fast, in one shot.
I am fairly new to SQL and was wondering if someone can help me.
I got a database that has around 10 million rows.
I need to make a script that finds the records that have some NULL fields and then updates them to a certain value.
The problem with doing a simple update statement is that it will blow out the rollback space.
I was reading around that I need to use BULK COLLECT AND FETCH.
My idea was to fetch 10,000 records at a time, update, commit, and continue fetching.
I tried looking for examples on Google but I have not found anything yet.
Any help?
Thanks!!
This is what I have so far:
DECLARE
CURSOR rec_cur IS
SELECT DATE_ORIGIN
FROM MAIN_TBL WHERE DATE_ORIGIN IS NULL;
TYPE date_tab_t IS TABLE OF DATE;
date_tab date_tab_t;
BEGIN
OPEN rec_cur;
LOOP
FETCH rec_cur BULK COLLECT INTO date_tab LIMIT 1000;
EXIT WHEN date_tab.COUNT() = 0;
FORALL i IN 1 .. date_tab.COUNT
UPDATE MAIN_TBL SET DATE_ORIGIN = '23-JAN-2012'
WHERE DATE_ORIGIN IS NULL;
END LOOP;
CLOSE rec_cur;
END;
I think I see what you're trying to do. There are a number of points I want to make about the differences between the code below and yours.
Your forall loop will not use an index. This is easy to get round by using rowid to update your table.
By committing after each FORALL you reduce the amount of undo needed, but you make it more difficult to roll back if something goes wrong. Logically, though, your query could be restarted in the middle easily and without detriment to your objective.
ROWIDs are small; collect at least 25k at a time, if not 100k.
You cannot index a NULL in Oracle. There are plenty of questions on Stack Overflow about this if you need more information. A functional index on something like nvl(date_origin,'x'), as a loose example, would increase the speed at which you select data. It also means you never actually have to touch the table itself; you only select from the index.
Your date data-type seems to be a string. I've kept this but it's not wise.
If you can get someone to increase your undo tablespace size then a straight up update will be quicker.
Assuming, as per your comments, that date_origin is a date, the index should be on something like:
nvl(date_origin,to_date('absolute_minimum_date_in_Oracle_as_a_string','yyyymmdd'))
I don't have access to a DB at the moment, but to find that absolute minimum date in Oracle run the following query:
select to_date('0001','yyyy') from dual;
It should raise a useful error for you.
Working example in PL/SQL Developer.
create table main_tbl as
select cast( null as date ) as date_origin
from all_objects
;
create index i_main_tbl
   on main_tbl ( nvl( date_origin
                    , to_date('0001-01-01','yyyy-mm-dd') )
               )
;
declare
cursor c_rec is
select rowid
from main_tbl
where nvl(date_origin,to_date('0001-01-01','yyyy-mm-dd'))
= to_date('0001-01-01','yyyy-mm-dd')
;
type t__rec is table of rowid index by binary_integer;
t_rec t__rec;
begin
open c_rec;
loop
fetch c_rec bulk collect into t_rec limit 50000;
exit when t_rec.count = 0;
forall i in t_rec.first .. t_rec.last
update main_tbl
set date_origin = to_date('23-JAN-2012','DD-MON-YYYY')
where rowid = t_rec(i)
;
commit ;
end loop;
close c_rec;
end;
/
Edit: Please answer one of the two answers I ask. I know there are other options that would be better in a different case. These other potential options (partitioning the table, running as one large delete statement w/o committing in batches, etc) are NOT options in my case due to things outside my control.
I have several very large tables to delete from. All have the same foreign key that is indexed. I need to delete certain records from all tables.
table source
id --primary_key
import_source --used for choosing the ids to delete
table t1
id --foreign key
--other fields
table t2
id --foreign key
--different other fields
Usually when doing a delete like this, I'll put together a loop to step through all the ids:
declare
my_counter integer := 0;
begin
for cur in (
select id from source where import_source = 'bad.txt'
) loop
begin
delete from source where id = cur.id;
delete from t1 where id = cur.id;
delete from t2 where id = cur.id;
my_counter := my_counter + 1;
if my_counter > 500 then
my_counter := 0;
commit;
end if;
end;
end loop;
commit;
end;
However, in some code I saw elsewhere, it was put together in separate loops, one for each delete.
declare
    type import_ids is table of integer index by pls_integer;
    my_import_ids import_ids;
    my_count integer := 0;
begin
    select id bulk collect into my_import_ids from source where import_source = 'bad.txt';
    for h in 1..my_import_ids.count loop
        delete from t1 where id = my_import_ids(h);
        --do commit check
    end loop;
    for h in 1..my_import_ids.count loop
        delete from t2 where id = my_import_ids(h);
        --do commit check
    end loop;
end;
--"do commit check" will be replaced with the same chunk that commits every 500 rows as in the query above
So I need one of the following answered:
1) Which of these is better?
2) How can I find out which is better for my particular case? (IE if it depends on how many tables I have, how big they are, etc)
Edit:
I must do this in a loop due to the size of these tables. I will be deleting thousands of records from tables with hundreds of millions of records. This is happening on a system that can't afford to have the tables locked for that long.
EDIT:
NOTE: I am required to commit in batches. The amount of data is too large to do it in one batch. The rollback tables will crash our database.
If there is a way to commit in batches other than looping, I'd be willing to hear it. Otherwise, don't bother saying that I shouldn't use a loop...
Why loop at all?
delete from t1 where id IN (select id from source where import_source = 'bad.txt');
delete from t2 where id IN (select id from source where import_source = 'bad.txt');
delete from source where import_source = 'bad.txt';
That's using standard SQL. I don't know Oracle specifically, but many DBMSes also feature multi-table, JOIN-based DELETEs that would let you do the whole thing in a single statement.
David,
If you insist on committing, you can use the following code:
declare
type import_ids is table of integer index by pls_integer;
my_import_ids import_ids;
cursor c is select id from source where import_source = 'bad.txt';
begin
open c;
loop
fetch c bulk collect into my_import_ids limit 500;
forall h in 1..my_import_ids.count
delete from t1 where id = my_import_ids(h);
forall h in 1..my_import_ids.count
delete from t2 where id = my_import_ids(h);
commit;
exit when c%notfound;
end loop;
close c;
end;
This program fetches ids in pieces of 500 rows, deleting and committing each piece. It should be much faster than row-by-row processing, because BULK COLLECT and FORALL each work as a single operation (in a single round-trip to and from the database), minimizing the number of context switches. See Bulk Binds, Forall, Bulk Collect for details.
First of all, you shouldn't commit in the loop: it is not efficient (it generates lots of redo) and, if some error occurs, you can't roll back.
As mentioned in previous answers, you should issue single deletes, or, if you are deleting most of the records, it could be more efficient to create new tables with the remaining rows, drop the old ones, and rename the new ones to the old names.
Something like this:
CREATE TABLE new_table as select * from old_table where <filter only remaining rows>;
index new_table
grant on new table
add constraints on new_table
etc on new_table
drop table old_table
rename new_table to old_table;
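A rough sketch of those steps for one of the tables (t1); the index, grant and constraint lines are placeholders for whatever the real table actually has:
CREATE TABLE t1_new AS
    SELECT * FROM t1
    WHERE id NOT IN (SELECT id FROM source WHERE import_source = 'bad.txt');

CREATE INDEX t1_new_id_ix ON t1_new (id);  -- recreate each index (name assumed)
-- GRANT ... ON t1_new TO ...;             -- recreate grants
-- ALTER TABLE t1_new ADD CONSTRAINT ...;  -- recreate constraints

DROP TABLE t1;
ALTER TABLE t1_new RENAME TO t1;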
See also Ask Tom
Larry Lustig is right that you don't need a loop. Nonetheless there may be some benefit in doing the delete in smaller chunks. Here PL/SQL bulk binds can improve speed greatly:
declare
    type import_ids is table of integer index by pls_integer;
    my_import_ids import_ids;
begin
    select id bulk collect into my_import_ids from source where import_source = 'bad.txt';
    forall h in 1..my_import_ids.count
        delete from t1 where id = my_import_ids(h);
    forall h in 1..my_import_ids.count
        delete from t2 where id = my_import_ids(h);
end;
The way I wrote it, it does it all at once, in which case yes, the single SQL is better. But you can change your loop conditions to break it into chunks. The key points are:
don't commit on every row. If anything, commit only every N rows.
When using chunks of N, don't run the delete in an ordinary loop. Use forall to run the delete as a bulk bind, which is much faster.
The reason, aside from the overhead of commits, is that each time you execute an SQL statement inside PL/SQL code it essentially does a context switch. Bulk binds avoid that.
You may try partitioning anyway to use parallel execution, not just to drop one partition. The Oracle documentation may prove useful in setting this up. Each partition would use its own rollback segment in this case.
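If you do end up using parallel execution, the delete itself can also run as parallel DML. A hedged example, with an arbitrary degree of parallelism:
ALTER SESSION ENABLE PARALLEL DML;

DELETE /*+ PARALLEL(t1 8) */ FROM t1
WHERE id IN (SELECT id FROM source WHERE import_source = 'bad.txt');

COMMIT;  -- a parallel DML statement must be committed before the table is touched again in the session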
If you are doing the delete from the source before the t1/t2 deletes, that suggests you don't have referential integrity constraints (as otherwise you'd get errors saying child records exist).
I'd go for creating the constraint with ON DELETE CASCADE. Then a simple
DECLARE
v_cnt NUMBER := 1;
BEGIN
WHILE v_cnt > 0 LOOP
DELETE FROM source WHERE import_source = 'bad.txt' and rownum < 5000;
v_cnt := SQL%ROWCOUNT;
COMMIT;
END LOOP;
END;
The child records would get deleted automatically.
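For reference, the constraints would be declared roughly like this (the constraint names are made up):
ALTER TABLE t1 ADD CONSTRAINT t1_source_fk
    FOREIGN KEY (id) REFERENCES source (id) ON DELETE CASCADE;

ALTER TABLE t2 ADD CONSTRAINT t2_source_fk
    FOREIGN KEY (id) REFERENCES source (id) ON DELETE CASCADE;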
If you can't have the ON DELETE CASCADE, I'd go with a GLOBAL TEMPORARY TABLE with ON COMMIT DELETE ROWS, created once up front.
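A sketch of that table, matching the name used in the block below (only the id column is needed here):
CREATE GLOBAL TEMPORARY TABLE temp (
    id NUMBER
) ON COMMIT DELETE ROWS;
Then each batch goes through it: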
DECLARE
v_cnt NUMBER := 1;
BEGIN
WHILE v_cnt > 0 LOOP
INSERT INTO temp (id)
SELECT id FROM source WHERE import_source = 'bad.txt' and rownum < 5000;
v_cnt := SQL%ROWCOUNT;
DELETE FROM t1 WHERE id IN (SELECT id FROM temp);
DELETE FROM t2 WHERE id IN (SELECT id FROM temp);
DELETE FROM source WHERE id IN (SELECT id FROM temp);
COMMIT;
END LOOP;
END;
I'd also go for the largest chunk your DBA will allow.
I'd expect each transaction to last for at least a minute. More frequent commits would be a waste.
"This is happening on a system that can't afford to have the tables locked for that long."
Oracle doesn't lock tables, only rows. I'm assuming no-one will be locking the rows you are deleting (or at least not for long). So locking is not an issue.