I'm new to PL/SQL.
I decided to use the following SQL to insert thousands of record to a table. However, is it correct to place the FORALL statement outside for loop?
Is it better to move the FORALL statement inside for loop block? Thank you.
DECLARE
CURSOR books_cur
IS
SELECT book_id, book_type
FROM books
WHERE book_category = 'PROGRAMMING';
TYPE book_ids_t IS TABLE OF
books.book_id%TYPE;
l_book_ids book_ids_t := book_ids_t();
BEGIN
FOR i IN books_cur LOOP
IF(i.book_type = 'PLSQL') THEN
l_book_ids.EXTENDS;
l_book_ids(l_book_ids.LAST) := i.book_id;
END IF;
END LOOP;
FORALL i IN l_book_ids.FIRST..l_book_ids.LAST
INSERT INTO table_a (book_id) VALUES l_book_ids(i);
END;
"Is it better to move the FORALL statement inside for loop block"
No. FORALL is a set operation. You need to execute it once per unit of work, not once per row.
" to insert thousands of record to a table"
Please remember that collections are maintained in session memory. This means you need to be more conscious of memory management when you're working with very large numbers of records.
So you may want to work in batches, something like this:
BEGIN
FOR i IN books_cur LOOP
IF(i.book_type = 'PLSQL') THEN
l_book_ids.EXTENDS;
l_book_ids(l_book_ids.LAST) := i.book_id;
END IF;
if l_book_ids.count() = 1000 -- say
then
FORALL i IN l_book_ids.FIRST..l_book_ids.LAST
INSERT INTO table_a (book_id) VALUES l_book_ids(i);
l_book_ids.delete();
end if;
END LOOP;
FORALL i IN l_book_ids.FIRST..l_book_ids.LAST
INSERT INTO table_a (book_id) VALUES l_book_ids(i);
END;
This version of the code inserts records in batches of 1000, with a supplementary FORALL statement to catch the last batch which is likely to be less than 1000 records.
However, the most efficient way of implementing your example remains
INSERT INTO table_a (book_id)
SELECT book_id
FROM books
WHERE book_category = 'PROGRAMMING'
and book_type = 'PLSQL';
No, your FORALL is in the correct place. Of course, your example program doesn't need this at all, it could be written as:
BEGIN
INSERT INTO table_a (book_id)
SELECT book_id
FROM books
WHERE book_category = 'PROGRAMMING'
AND book_type = 'PLSQL';
END;
But I realise you are interested in learning the FORALL concept rather than writing the simplest code!
In your example, FORALL requires the collection l_book_ids as input.
Inside the FOR loop, the collection is not yet fully created, so you cannot use FORALL on it.
FORALL is one statement, as APC mentioned, you cannot break it into multiple steps - as a matter of fact this is its advantage - it does the work fast, in one shot.
Related
I have about 94,000 records that need to be deleted, but I have been told not to delete all at once because it will slow the performance due to the delete trigger. What would be the best solution to accomplish this? I was thinking of an additional loop after the commit of 1000, but not too sure how to implement or know if that will reduce performance even more.
DECLARE
CURSOR CLEAN IS
SELECT EMP_ID, ACCT_ID FROM RECORDS_TO_DELETE F; --Table contains the records that needs to be deleted.
COUNTER INTEGER := 0;
BEGIN
FOR F IN CLEAN LOOP
COUNTER := COUNTER + 1;
DELETE FROM EMPLOYEES
WHERE EMP_ID = F.EMP_ID AND ACCT_ID = F.ACCT_ID;
IF MOD(COUNTER, 1000) = 0 THEN
COMMIT;
END IF;
END LOOP;
COMMIT;
END;
You need to read a bit about BULK COLLECT statements in oracle. This is commonly considered as proper way working with large tables.
Example:
LOOP
FETCH c_delete BULK COLLECT INTO t_delete LIMIT l_delete_buffer;
FORALL i IN 1..t_delete.COUNT
DELETE ps_al_chk_memo
WHERE ROWID = t_delete (i);
COMMIT;
EXIT WHEN c_delete%NOTFOUND;
COMMIT;
END LOOP;
CLOSE c_delete;
You can do it in a single statement, this should be the fastest way in any kind:
DELETE FROM EMPLOYEES
WHERE (EMP_ID, ACCT_ID) =ANY (SELECT EMP_ID, ACCT_ID FROM RECORDS_TO_DELETE)
Since I can see the volume of record is not that much so can still go with SQL not by PLSQL.Whenever possible try SQL. I think it should not cause that much performance impact.
DELETE FROM EMPLOYEES
WHERE EXISTS
(SELECT 1 FROM RECORDS_TO_DELETE F
WHERE EMP_ID = F.EMP_ID
AND ACCT_ID= F.ACCT_ID);
Hope this helps.
Just wondering if the way I put COMMIT in the code block is appropriate or not? Should I put them when it finished loop or after each insert statement or after the if else statement?
FOR VAL1 IN (SELECT A.* FROM TABLE_A A) LOOP
IF VAL1.QTY >= 0 THEN
INSERT INTO TEMP_TABLE VALUES('MORE OR EQUAL THAN 0');
COMMIT; /*<-- Should I put this here?*/
INSERT INTO AUDIT_TABLE VALUE('DATA INSERTED >= 0');
COMMIT; /*<-- Should I put this here too?*/
ELSE
INSERT INTO TEMP_TABLE VALUES ('0');
COMMIT; /*<-- Should I put this here too?*/
INSERT INTO AUDIT_TABLE('DATA INSERTED IS 0');
COMMIT; /*<-- Should I put this here too?*/
END IF;
/*Or put commit here?*/
END LOOP;
/*Or here??*/
Generally, committing in a loop is not a good idea, especially after every DML in that loop. Doing so you force oracle(LGWR) to write data in redo log files and may find yourself in a situation when other sessions hang because of log file sync wait event. Or facing ORA-1555 because undo segments will be cleared more often.
Divide your DMLs into logical units of work (transactions) and commit when that unit of work is done, not before and not too late or in a middle of a transaction. This will allow you to keep your database in a consistent state. If, for example, two insert statements form a one unit of work(one transaction), it makes sense to commit or rollback them altogether not separately.
So, generally, you should commit as less as possible. If you have to commit in a loop, introduce some threshold. For instance issue commit after, let say 150 rows:
declare
l_commit_rows number;
For i in (select * from some_table)
loop
l_commit_rows := l_commit_rows + 1;
insert into some_table(..) values(...);
if mode(l_commit_rows, 150) = 0
then
commit;
end if;
end loop;
-- commit the rest
commit;
It is rarely appropriate; say your insert into TEMP_TABLE succeeds but your insert into AUDIT_TABLE fails. You then don't know where you are at all. Additionally, commits will increase the amount of time it takes to perform an operation.
It would be more normal to do everything within a single transaction; that is remove the LOOP and perform your inserts in a single statement. This can be done by using a multi-table insert and would look something like this:
insert
when ( a.qty >= 0 ) then
into temp_table values ('MORE OR EQUAL THAN 0')
into audit_table values ('DATA INSERTED >= 0')
else
into temp_table values ('0')
into audit_table values ('DATA INSERTED IS 0')
select qty from table_a
A simple rule is to not commit in the middle of an action; you need to be able to tell exactly where you were if you have to restart an operation. This normally means, go back to the beginning but doesn't have to. For instance, if you were to place your COMMIT inside your loop but outside the IF statement then you know that that has completed. You'd have to write back somewhere to tell you that this operation has been completed though or use your SQL statement to determine whether you need to re-evaluate that row.
If you insert commit after each insert statement then the database will commit each row inserted. Same will happen if you insert commit after the IF statement ends. (So both will commit after each inserted row). If commit is given after loop then commit will happen after all rows are inserted.
Commit after the loop should work faster as it will commit bulk data but if your loop encounters any error (say after 50 rows are processed there is an error) then your 50 rows also won't be inserted.
So according to your requirement u can either use commit after if or after loop
I wrote a function to create posts for a simple blogging engine:
CREATE FUNCTION CreatePost(VARCHAR, TEXT, VARCHAR[])
RETURNS INTEGER AS $$
DECLARE
InsertedPostId INTEGER;
TagName VARCHAR;
BEGIN
INSERT INTO Posts (Title, Body)
VALUES ($1, $2)
RETURNING Id INTO InsertedPostId;
FOREACH TagName IN ARRAY $3 LOOP
DECLARE
InsertedTagId INTEGER;
BEGIN
-- I am concerned about this part.
BEGIN
INSERT INTO Tags (Name)
VALUES (TagName)
RETURNING Id INTO InsertedTagId;
EXCEPTION WHEN UNIQUE_VIOLATION THEN
SELECT INTO InsertedTagId Id
FROM Tags
WHERE Name = TagName
FETCH FIRST ROW ONLY;
END;
INSERT INTO Taggings (PostId, TagId)
VALUES (InsertedPostId, InsertedTagId);
END;
END LOOP;
RETURN InsertedPostId;
END;
$$ LANGUAGE 'plpgsql';
Is this prone to race conditions when multiple users delete tags and create posts at the same time?
Specifically, do transactions (and thus functions) prevent such race conditions from happening?
I'm using PostgreSQL 9.2.3.
It's the recurring problem of SELECT or INSERT under possible concurrent write load, related to (but different from) UPSERT (which is INSERT or UPDATE).
This PL/pgSQL function uses UPSERT (INSERT ... ON CONFLICT ..) to INSERT or SELECT a single row:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
LANGUAGE plpgsql AS
$func$
BEGIN
SELECT tag_id -- only if row existed before
FROM tag
WHERE tag = _tag
INTO _tag_id;
IF NOT FOUND THEN
INSERT INTO tag AS t (tag)
VALUES (_tag)
ON CONFLICT (tag) DO NOTHING
RETURNING t.tag_id
INTO _tag_id;
END IF;
END
$func$;
There is still a tiny window for a race condition. To make absolutely sure we get an ID:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
LANGUAGE plpgsql AS
$func$
BEGIN
LOOP
SELECT tag_id
FROM tag
WHERE tag = _tag
INTO _tag_id;
EXIT WHEN FOUND;
INSERT INTO tag AS t (tag)
VALUES (_tag)
ON CONFLICT (tag) DO NOTHING
RETURNING t.tag_id
INTO _tag_id;
EXIT WHEN FOUND;
END LOOP;
END
$func$;
db<>fiddle here
This keeps looping until either INSERT or SELECT succeeds.
Call:
SELECT f_tag_id('possibly_new_tag');
If subsequent commands in the same transaction rely on the existence of the row and it is actually possible that other transactions update or delete it concurrently, you can lock an existing row in the SELECT statement with FOR SHARE.
If the row gets inserted instead, it is locked (or not visible for other transactions) until the end of the transaction anyway.
Start with the common case (INSERT vs SELECT) to make it faster.
Related:
Get Id from a conditional INSERT
How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
Related (pure SQL) solution to INSERT or SELECT multiple rows (a set) at once:
How to use RETURNING with ON CONFLICT in PostgreSQL?
What's wrong with this pure SQL solution?
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
LANGUAGE sql AS
$func$
WITH ins AS (
INSERT INTO tag AS t (tag)
VALUES (_tag)
ON CONFLICT (tag) DO NOTHING
RETURNING t.tag_id
)
SELECT tag_id FROM ins
UNION ALL
SELECT tag_id FROM tag WHERE tag = _tag
LIMIT 1;
$func$;
Not entirely wrong, but it fails to seal a loophole, like #FunctorSalad worked out. The function can come up with an empty result if a concurrent transaction tries to do the same at the same time. The manual:
All the statements are executed with the same snapshot
If a concurrent transaction inserts the same new tag a moment earlier, but hasn't committed, yet:
The UPSERT part comes up empty, after waiting for the concurrent transaction to finish. (If the concurrent transaction should roll back, it still inserts the new tag and returns a new ID.)
The SELECT part also comes up empty, because it's based on the same snapshot, where the new tag from the (yet uncommitted) concurrent transaction is not visible.
We get nothing. Not as intended. That's counter-intuitive to naive logic (and I got caught there), but that's how the MVCC model of Postgres works - has to work.
So do not use this if multiple transactions can try to insert the same tag at the same time. Or loop until you actually get a row. The loop will hardly ever be triggered in common work loads anyway.
Postgres 9.4 or older
Given this (slightly simplified) table:
CREATE table tag (
tag_id serial PRIMARY KEY
, tag text UNIQUE
);
An almost 100% secure function to insert new tag / select existing one, could look like this.
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT tag_id int)
LANGUAGE plpgsql AS
$func$
BEGIN
LOOP
BEGIN
WITH sel AS (SELECT t.tag_id FROM tag t WHERE t.tag = _tag FOR SHARE)
, ins AS (INSERT INTO tag(tag)
SELECT _tag
WHERE NOT EXISTS (SELECT 1 FROM sel) -- only if not found
RETURNING tag.tag_id) -- qualified so no conflict with param
SELECT sel.tag_id FROM sel
UNION ALL
SELECT ins.tag_id FROM ins
INTO tag_id;
EXCEPTION WHEN UNIQUE_VIOLATION THEN -- insert in concurrent session?
RAISE NOTICE 'It actually happened!'; -- hardly ever happens
END;
EXIT WHEN tag_id IS NOT NULL; -- else keep looping
END LOOP;
END
$func$;
db<>fiddle here
Old sqlfiddle
Why not 100%? Consider the notes in the manual for the related UPSERT example:
https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
Explanation
Try the SELECT first. This way you avoid the considerably more expensive exception handling 99.99% of the time.
Use a CTE to minimize the (already tiny) time slot for the race condition.
The time window between the SELECT and the INSERT within one query is super tiny. If you don't have heavy concurrent load, or if you can live with an exception once a year, you could just ignore the case and use the SQL statement, which is faster.
No need for FETCH FIRST ROW ONLY (= LIMIT 1). The tag name is obviously UNIQUE.
Remove FOR SHARE in my example if you don't usually have concurrent DELETE or UPDATE on the table tag. Costs a tiny bit of performance.
Never quote the language name: 'plpgsql'. plpgsql is an identifier. Quoting may cause problems and is only tolerated for backwards compatibility.
Don't use non-descriptive column names like id or name. When joining a couple of tables (which is what you do in a relational DB) you end up with multiple identical names and have to use aliases.
Built into your function
Using this function you could largely simplify your FOREACH LOOP to:
...
FOREACH TagName IN ARRAY $3
LOOP
INSERT INTO taggings (PostId, TagId)
VALUES (InsertedPostId, f_tag_id(TagName));
END LOOP;
...
Faster, though, as a single SQL statement with unnest():
INSERT INTO taggings (PostId, TagId)
SELECT InsertedPostId, f_tag_id(tag)
FROM unnest($3) tag;
Replaces the whole loop.
Alternative solution
This variant builds on the behavior of UNION ALL with a LIMIT clause: as soon as enough rows are found, the rest is never executed:
Way to try multiple SELECTs till a result is available?
Building on this, we can outsource the INSERT into a separate function. Only there we need exception handling. Just as safe as the first solution.
CREATE OR REPLACE FUNCTION f_insert_tag(_tag text, OUT tag_id int)
RETURNS int
LANGUAGE plpgsql AS
$func$
BEGIN
INSERT INTO tag(tag) VALUES (_tag) RETURNING tag.tag_id INTO tag_id;
EXCEPTION WHEN UNIQUE_VIOLATION THEN -- catch exception, NULL is returned
END
$func$;
Which is used in the main function:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
LANGUAGE plpgsql AS
$func$
BEGIN
LOOP
SELECT tag_id FROM tag WHERE tag = _tag
UNION ALL
SELECT f_insert_tag(_tag) -- only executed if tag not found
LIMIT 1 -- not strictly necessary, just to be clear
INTO _tag_id;
EXIT WHEN _tag_id IS NOT NULL; -- else keep looping
END LOOP;
END
$func$;
This is a bit cheaper if most of the calls only need SELECT, because the more expensive block with INSERT containing the EXCEPTION clause is rarely entered. The query is also simpler.
FOR SHARE is not possible here (not allowed in UNION query).
LIMIT 1 would not be necessary (tested in pg 9.4). Postgres derives LIMIT 1 from INTO _tag_id and only executes until the first row is found.
There's still something to watch out for even when using the ON CONFLICT clause introduced in Postgres 9.5. Using the same function and example table as in #Erwin Brandstetter's answer, if we do:
Session 1: begin;
Session 2: begin;
Session 1: select f_tag_id('a');
f_tag_id
----------
11
(1 row)
Session 2: select f_tag_id('a');
[Session 2 blocks]
Session 1: commit;
[Session 2 returns:]
f_tag_id
----------
NULL
(1 row)
So f_tag_id returned NULL in session 2, which would be impossible in a single-threaded world!
If we raise the transaction isolation level to repeatable read (or the stronger serializable), session 2 throws ERROR: could not serialize access due to concurrent update instead. So no "impossible" results at least, but unfortunately we now need to be prepared to retry the transaction.
Edit: With repeatable read or serializable, if session 1 inserts tag a, then session 2 inserts b, then session 1 tries to insert b and session 2 tries to insert a, one session detects a deadlock:
ERROR: deadlock detected
DETAIL: Process 14377 waits for ShareLock on transaction 1795501; blocked by process 14363.
Process 14363 waits for ShareLock on transaction 1795503; blocked by process 14377.
HINT: See server log for query details.
CONTEXT: while inserting index tuple (0,3) in relation "tag"
SQL function "f_tag_id" statement 1
After the session that received the deadlock error rolls back, the other session continues. So I guess we should treat deadlock just like serialization_failure and retry, in a situation like this?
Alternatively, insert the tags in a consistent order, but this is not easy if they don't all get added in one place.
I think there is a slight chance that when the tag already existed it might be deleted by another transaction after your transaction has found it. Using a SELECT FOR UPDATE should solve that.
I am fairly new to SQL and was wondering if someone can help me.
I got a database that has around 10 million rows.
I need to make a script that finds the records that have some NULL fields, and then updates it to a certain value.
The problem I have from doing a simple update statement, is that it will blow the rollback space.
I was reading around that I need to use BULK COLLECT AND FETCH.
My idea was to fetch 10,000 records at a time, update, commit, and continue fetching.
I tried looking for examples on Google but I have not found anything yet.
Any help?
Thanks!!
This is what I have so far:
DECLARE
CURSOR rec_cur IS
SELECT DATE_ORIGIN
FROM MAIN_TBL WHERE DATE_ORIGIN IS NULL;
TYPE date_tab_t IS TABLE OF DATE;
date_tab date_tab_t;
BEGIN
OPEN rec_cur;
LOOP
FETCH rec_cur BULK COLLECT INTO date_tab LIMIT 1000;
EXIT WHEN date_tab.COUNT() = 0;
FORALL i IN 1 .. date_tab.COUNT
UPDATE MAIN_TBL SET DATE_ORIGIN = '23-JAN-2012'
WHERE DATE_ORIGIN IS NULL;
END LOOP;
CLOSE rec_cur;
END;
I think I see what you're trying to do. There are a number of points I want to make about the differences between the code below and yours.
Your forall loop will not use an index. This is easy to get round by using rowid to update your table.
By committing after each forall you reduce the amount of undo needed; but make it more difficult to rollback if something goes wrong. Though logically your query could be re-started in the middle easily and without detriment to your objective.
rowids are small, collect at least 25k at a time; if not 100k.
You cannot index a null in Oracle. There are plenty of questions on stackoverflow about this is you need more information. A functional index on something like nvl(date_origin,'x') as a loose example would increase the speed at which you select data. It also means you never actually have to use the table itself. You only select from the index.
Your date data-type seems to be a string. I've kept this but it's not wise.
If you can get someone to increase your undo tablespace size then a straight up update will be quicker.
Assuming as per your comments date_origin is a date then the index should be on something like:
nvl(date_origin,to_date('absolute_minimum_date_in_Oracle_as_a_string','yyyymmdd'))
I don't have access to a DB at the moment but to find out the amdiOaas run the following query:
select to_date('0001','yyyy') from dual;
It should raise a useful error for you.
Working example in PL/SQL Developer.
create table main_tbl as
select cast( null as date ) as date_origin
from all_objects
;
create index i_main_tbl
on main_tbl ( nvl( to_date(date_origin,'yyyy-mm-dd')
, to_date('0001-01-01' ,'yyyy-mm-dd') )
)
;
declare
cursor c_rec is
select rowid
from main_tbl
where nvl(date_origin,to_date('0001-01-01','yyyy-mm-dd'))
= to_date('0001-01-01','yyyy-mm-dd')
;
type t__rec is table of rowid index by binary_integer;
t_rec t__rec;
begin
open c_rec;
loop
fetch c_rec bulk collect into t_rec limit 50000;
exit when t_rec.count = 0;
forall i in t_rec.first .. t_rec.last
update main_tbl
set date_origin = to_date('23-JAN-2012','DD-MON-YYYY')
where rowid = t_rec(i)
;
commit ;
end loop;
close c_rec;
end;
/
Edit: Please answer one of the two answers I ask. I know there are other options that would be better in a different case. These other potential options (partitioning the table, running as one large delete statement w/o committing in batches, etc) are NOT options in my case due to things outside my control.
I have several very large tables to delete from. All have the same foreign key that is indexed. I need to delete certain records from all tables.
table source
id --primary_key
import_source --used for choosing the ids to delete
table t1
id --foreign key
--other fields
table t2
id --foreign key
--different other fields
Usually when doing a delete like this, I'll put together a loop to step through all the ids:
declare
my_counter integer := 0;
begin
for cur in (
select id from source where import_source = 'bad.txt'
) loop
begin
delete from source where id = cur.id;
delete from t1 where id = cur.id;
delete from t2 where id = cur.id;
my_counter := my_counter + 1;
if my_counter > 500 then
my_counter := 0;
commit;
end if;
end;
end loop;
commit;
end;
However, in some code I saw elsewhere, it was put together in separate loops, one for each delete.
declare
type import_ids is table of integer index by pls_integer;
my_count integer := 0;
begin
select id bulk collect into my_import_ids from source where import_source = 'bad.txt'
for h in 1..my_import_ids.count
delete from t1 where id = my_import_ids(h);
--do commit check
end loop;
for h in 1..my_import_ids.count
delete from t2 where id = my_import_ids(h);
--do commit check
end loop;
--do commit check will be replaced with the same chunk to commit every 500 rows as the above query
So I need one of the following answered:
1) Which of these is better?
2) How can I find out which is better for my particular case? (IE if it depends on how many tables I have, how big they are, etc)
Edit:
I must do this in a loop due to the size of these tables. I will be deleting thousands of records from tables with hundreds of millions of records. This is happening on a system that can't afford to have the tables locked for that long.
EDIT:
NOTE: I am required to commit in batches. The amount of data is too large to do it in one batch. The rollback tables will crash our database.
If there is a way to commit in batches other than looping, I'd be willing to hear it. Otherwise, don't bother saying that I shouldn't use a loop...
Why loop at all?
delete from t1 where id IN (select id from source where import_source = 'bad.txt';
delete from t2 where id IN (select id from source where import_source = 'bad.txt';
delete from source where import_source = 'bad.txt'
That's using standard SQL. I don't know Oracle specifically, but many DBMSes also feature multi-table JOIN-based DELETEs as well that would let you do the whole thing in a single statement.
David,
If you insist on commiting, you can use the following code:
declare
type import_ids is table of integer index by pls_integer;
my_import_ids import_ids;
cursor c is select id from source where import_source = 'bad.txt';
begin
open c;
loop
fetch c bulk collect into my_import_ids limit 500;
forall h in 1..my_import_ids.count
delete from t1 where id = my_import_ids(h);
forall h in 1..my_import_ids.count
delete from t2 where id = my_import_ids(h);
commit;
exit when c%notfound;
end loop;
close c;
end;
This program fetches ids by pieces of 500 rows, deleting and commiting each piece. It should be much faster then row-by-row processing, because bulk collect and forall works as a single operation (in a single round-trip to and from database), thus minimizing the number of context switches. See Bulk Binds, Forall, Bulk Collect for details.
First of all, you shouldn't commit in the loop - it is not efficient (generates lots of redo) and if some error occurrs, you can't rollback.
As mentioned in previous answers, you should issue single deletes, or, if you are deleting most of the records, then it could be more optimal to create new tables with remaining rows, drop old ones and rename the new ones to old names.
Something like this:
CREATE TABLE new_table as select * from old_table where <filter only remaining rows>;
index new_table
grant on new table
add constraints on new_table
etc on new_table
drop table old_table
rename new_table to old_table;
See also Ask Tom
Larry Lustig is right that you don't need a loop. Nonetheless there may be some benefit in doing the delete in smaller chunks. Here PL/SQL bulk binds can improve speed greatly:
declare
type import_ids is table of integer index by pls_integer;
my_count integer := 0;
begin
select id bulk collect into my_import_ids from source where import_source = 'bad.txt'
forall h in 1..my_import_ids.count
delete from t1 where id = my_import_ids(h);
forall h in 1..my_import_ids.count
delete from t2 where id = my_import_ids(h);
The way I wrote it it does it all at once, in which case yeah the single SQL is better. But you can change your loop conditions to break it into chunks. The key points are
don't commit on every row. If anything, commit only every N rows.
When using chunks of N, don't run the delete in an ordinary loop. Use forall to run the delete as a bulk bind, which is much faster.
The reason, aside from the overhead of commits, is that each time you execute an SQL statement inside PL/SQL code it essentially does a context switch. Bulk binds avoid that.
You may try partitioning anyway to use parallel execution, not just to drop one partition. The Oracle documentation may prove useful in setting this up. Each partition would use it's own rollback segment in this case.
If you are doing the delete from the source before the t1/t2 deletes, that suggests you don't have referential integrity constraints (as otherwise you'd get errors saying child records exist).
I'd go for creating the constraint with ON DELETE CASCADE. Then a simple
DECLARE
v_cnt NUMBER := 1;
BEGIN
WHILE v_cnt > 0 LOOP
DELETE FROM source WHERE import_source = 'bad.txt' and rownum < 5000;
v_cnt := SQL%ROWCOUNT;
COMMIT;
END LOOP;
END;
The child records would get deleted automatically.
If you can't have the ON DELETE CASCADE, I'd go with a GLOBAL TEMPORARY TABLE with ON COMMIT DELETE ROWS
DECLARE
v_cnt NUMBER := 1;
BEGIN
WHILE v_cnt > 0 LOOP
INSERT INTO temp (id)
SELECT id FROM source WHERE import_source = 'bad.txt' and rownum < 5000;
v_cnt := SQL%ROWCOUNT;
DELETE FROM t1 WHERE id IN (SELECT id FROM temp);
DELETE FROM t2 WHERE id IN (SELECT id FROM temp);
DELETE FROM source WHERE id IN (SELECT id FROM temp);
COMMIT;
END LOOP;
END;
I'd also go for the largest chunk your DBA will allow.
I'd expect each transaction to last for at least a minute. More frequent commits would be a waste.
This is happening on a system that
can't afford to have the tables locked
for that long.
Oracle doesn't lock tables, only rows. I'm assuming no-one will be locking the rows you are deleting (or at least not for long). So locking is not an issue.