Why is a row visible to several sessions when selected FOR UPDATE SKIP LOCKED?

Assume there are two tables TST_SAMPLE (10000 rows) and TST_SAMPLE_STATUS (empty).
I want to iterate over each record in TST_SAMPLE and add exactly one record to TST_SAMPLE_STATUS accordingly.
In a single thread that would be simply this:
begin
for r in (select * from TST_SAMPLE)
loop
insert into TST_SAMPLE_STATUS(rec_id, rec_status)
values (r.rec_id, 'TOUCHED');
end loop;
commit;
end;
/
In a multithreaded solution there's a situation that is not clear to me.
Could you explain what causes one row of TST_SAMPLE to be processed several times?
Please see the details below.
create table TST_SAMPLE(
rec_id number(10) primary key
);
create table TST_SAMPLE_STATUS(
rec_id number(10),
rec_status varchar2(10),
session_id varchar2(100)
);
begin
insert into TST_SAMPLE(rec_id)
select LEVEL from dual connect by LEVEL <= 10000;
commit;
end;
/
CREATE OR REPLACE PROCEDURE tst_touch_recs(pi_limit int) is
v_last_iter_count int;
begin
loop
v_last_iter_count := 0;
--------------------------
for r in (select *
from TST_SAMPLE A
where rownum < pi_limit
and NOT EXISTS (select null
from TST_SAMPLE_STATUS B
where B.rec_id = A.rec_id)
FOR UPDATE SKIP LOCKED)
loop
insert into TST_SAMPLE_STATUS(rec_id, rec_status, session_id)
values (r.rec_id, 'TOUCHED', SYS_CONTEXT('USERENV', 'SID'));
v_last_iter_count := v_last_iter_count + 1;
end loop;
commit;
--------------------------
exit when v_last_iter_count = 0;
end loop;
end;
/
In the FOR loop I try to iterate over rows that:
- have no status yet (the NOT EXISTS clause)
- are not currently locked by another session (FOR UPDATE SKIP LOCKED)
There's no requirement on the exact number of rows in an iteration; pi_limit is just the maximum size of one batch. The only thing needed is that each row of TST_SAMPLE is processed in exactly one session.
So let's run this procedure in 3 threads.
declare
v_job_id number;
begin
dbms_job.submit(v_job_id, 'begin tst_touch_recs(100); end;', sysdate);
dbms_job.submit(v_job_id, 'begin tst_touch_recs(100); end;', sysdate);
dbms_job.submit(v_job_id, 'begin tst_touch_recs(100); end;', sysdate);
commit;
end;
Unexpectedly, we see that some rows were processed in several sessions
select count(unique rec_id) AS unique_count,
count(rec_id) AS total_count
from TST_SAMPLE_STATUS;
| unique_count | total_count |
------------------------------
|        10000 |       17397 |
------------------------------
-- run to see duplicates
select *
from TST_SAMPLE_STATUS
where REC_ID in (
select REC_ID
from TST_SAMPLE_STATUS
group by REC_ID
having count(*) > 1
)
order by REC_ID;
Please help me find the mistakes in the implementation of the procedure tst_touch_recs.

Here's a little example that shows why you're reading rows twice.
Run the following code in two sessions, starting the second a few seconds after the first:
declare
cursor c is
select a.*
from TST_SAMPLE A
where rownum < 10
and NOT EXISTS (select null
from TST_SAMPLE_STATUS B
where B.rec_id = A.rec_id)
FOR UPDATE SKIP LOCKED;
type rec is table of c%rowtype index by pls_integer;
rws rec;
begin
open c; -- data are read consistent to this time
dbms_lock.sleep ( 10 );
fetch c
bulk collect
into rws;
for i in 1 .. rws.count loop
dbms_output.put_line ( rws(i).rec_id );
end loop;
commit;
end;
/
You should see both sessions display the same rows.
Why?
Because Oracle Database has statement-level consistency, the result set for both is frozen when you open the cursor.
But when you have SKIP LOCKED, the FOR UPDATE locking only kicks in when you fetch the rows.
So session 1 starts and finds the first 9 rows not in TST_SAMPLE_STATUS. It then waits 10 seconds.
Provided you start session 2 within these 10 seconds, the cursor will look for the same nine rows.
At this point no rows are locked.
Now, here's where it gets interesting.
The sleep in the first session will finish. It'll then fetch the rows, locking them and skipping any that are already locked.
Very shortly after, it'll commit, releasing the locks.
A few moments later, session 2 comes to read these rows. At this point the rows are not locked!
So there's nothing to skip.
How exactly you solve this depends on what you're trying to do.
Assuming you can't move to a set-based approach, you could make the transactions serializable by adding:
set transaction isolation level serializable;
before the cursor loop. This moves you to transaction-level consistency, enabling the database to detect that "something changed" when fetching the rows.
But you'll need to catch "ORA-08177: can't serialize access for this transaction" errors within the outer loop, or any process that re-reads the same rows will drop out at this point.
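As a rough sketch, the procedure from the question could be adapted along these lines (the ORA-08177 handler and the simple retry behaviour are assumptions about how you might want to recover):
CREATE OR REPLACE PROCEDURE tst_touch_recs(pi_limit int) is
  v_last_iter_count int;
  e_cant_serialize  exception;
  pragma exception_init(e_cant_serialize, -8177);
begin
  loop
    v_last_iter_count := 0;
    begin
      -- must be the first statement of the transaction
      set transaction isolation level serializable;
      for r in (select *
                  from TST_SAMPLE A
                 where rownum < pi_limit
                   and NOT EXISTS (select null
                                     from TST_SAMPLE_STATUS B
                                    where B.rec_id = A.rec_id)
                   FOR UPDATE SKIP LOCKED)
      loop
        insert into TST_SAMPLE_STATUS(rec_id, rec_status, session_id)
        values (r.rec_id, 'TOUCHED', SYS_CONTEXT('USERENV', 'SID'));
        v_last_iter_count := v_last_iter_count + 1;
      end loop;
      commit;
    exception
      when e_cant_serialize then
        -- another session changed rows this transaction had read;
        -- throw the batch away and let the outer loop try again
        rollback;
        v_last_iter_count := 1;
    end;
    exit when v_last_iter_count = 0;
  end loop;
end;
/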
Or, as commenters have suggested, use Advanced Queuing.

Related

Oracle: how to limit number of rows in "select .. for update skip locked"

I have got a table:
create table foo (
  bar    number,
  status varchar2(50)
);
I have multiple threads/hosts each consuming the table. Each thread updates the status, i.e. pessimistically locks the row.
In Oracle 12.2, select ... for update skip locked seems to do the job, but I want to limit the number of rows. The new FETCH NEXT sounds right, but I can't get the syntax right:
SELECT * FROM foo ORDER BY bar
OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY
FOR UPDATE SKIP LOCKED;
What is the simplest way to achieve this, i.e. with minimum code1 (ideally without a PL/SQL function)?
I want something like this:
select * from (select * from foo
where status<>'baz' order by bar
) where rownum<10 for update skip locked
PS
1. We are considering moving away from oracle.
I suggest creating a PL/SQL function and using dynamic SQL to control the number of locked records. The lock is acquired at fetch time, so fetching N records automatically locks them. Keep in mind that the records are unlocked once you finish the transaction with a commit or rollback.
The following is an example that locks N records and returns their id values as an array (it assumes you have added a primary key ID column to your table):
create or replace function get_next_unlocked_records(iLockSize number)
return sys.odcinumberlist
is
cRefCursor sys_refcursor;
aIds sys.odcinumberlist := sys.odcinumberlist();
begin
-- open cursor. No locks so far
open cRefCursor for
'select id from foo '||
'for update skip locked';
-- we fetch and lock at the same time
fetch cRefCursor bulk collect into aIds limit iLockSize;
-- close cursor
close cRefCursor;
-- return locked ID values,
-- lock is kept until the transaction is finished
return aIds;
end;
sys.odcinumberlist is the built-in array of numbers.
Here is the test script to run in db:
declare
aRes sys.odcinumberlist;
begin
aRes := get_next_unlocked_records(10);
for c in (
select column_value id
from table(aRes)
) loop
dbms_output.put_line(c.id);
end loop;
end;
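The caller then does its work on the rows it managed to lock and ends the transaction to release them; for example (the 'PROCESSING' status value is just a placeholder):
declare
  aIds sys.odcinumberlist;
begin
  aIds := get_next_unlocked_records(10);
  if aIds.count > 0 then
    -- hypothetical processing step: mark the rows this session locked
    forall i in 1 .. aIds.count
      update foo
         set status = 'PROCESSING'
       where id = aIds(i);
  end if;
  commit; -- ends the transaction and releases the locks taken by the fetch
end;
/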

How to create a trigger to allow both a manual and an auto primary key in SQL Developer with Oracle 11g

I am attempting to create a trigger (using SQL Developer with Oracle 11g) that will allow manual insertion of the primary key, and if a record is created without a specified primary key, it will assign one from a sequence. First I tried to use a select statement in the trigger that checks whether the id generated by the sequence is already in the table because of a manual insertion:
DROP TABLE testing;
DROP SEQUENCE testing_seq;
CREATE TABLE testing (
id_number NUMBER PRIMARY KEY,
test_data VARCHAR(50)
);
CREATE SEQUENCE testing_seq
MINVALUE 1
MAXVALUE 10000
START WITH 1
INCREMENT BY 1
NOORDER
NOCYCLE;
CREATE OR REPLACE TRIGGER auto_testing_id BEFORE
INSERT ON testing
FOR EACH ROW
DECLARE
tmp NUMBER;
seq NUMBER;
BEGIN
IF :NEW.id_number IS NULL THEN
seq := testing_seq.nextval;
SELECT
(SELECT 1 FROM testing WHERE id_number = seq
) INTO tmp FROM dual;
while (tmp = 1)
loop
seq := testing_seq.nextval;
SELECT
(SELECT 1 FROM testing WHERE id_number = seq
) INTO tmp FROM dual;
END loop;
:NEW.id_number := seq;
END IF;
END;
/
INSERT INTO testing VALUES(1,'test1');
INSERT INTO testing (test_data) VALUES('test2');
SELECT * FROM testing;
Table TESTING dropped.
Sequence TESTING_SEQ dropped.
Table TESTING created.
Sequence TESTING_SEQ created.
Trigger AUTO_TESTING_ID compiled
1 row inserted.
1 row inserted.
ID_NUMBER TEST_DATA
---------- --------------------------------------------------
1 test1
2 test2
This works for manual insertions, but not if I try to insert using a select statement. I believe this is because I am referencing the table being inserted into inside the trigger (the mutating-table problem).
I tried a trigger without the check, but as expected, if the trigger created an id that was already in the table, it threw a unique constraint error:
CREATE OR REPLACE TRIGGER auto_testing_id2 BEFORE
INSERT ON testing
FOR EACH ROW
DECLARE
BEGIN
IF :NEW.id_number is null
then
:NEW.id_number := testing_seq.nextval;
end if;
end;
/
Trigger AUTO_TESTING_ID2 compiled
1 row inserted.
Error starting at line : 59 in command -
INSERT INTO testing (test_data) VALUES('test2')
Error report -
SQL Error: ORA-00001: unique constraint (KATRINA_LEARNING.SYS_C001190313) violated
00001. 00000 - "unique constraint (%s.%s) violated"
*Cause: An UPDATE or INSERT statement attempted to insert a duplicate key.
For Trusted Oracle configured in DBMS MAC mode, you may see
this message if a duplicate entry exists at a different level.
*Action: Either remove the unique restriction or do not insert the key.
ID_NUMBER TEST_DATA
---------- --------------------------------------------------
1 test1
I tried to catch this error (using the exception name DUP_VAL_ON_INDEX) and then loop until it found the next number in the sequence that isn't in the table (with and without error catching), but it wouldn't even raise a test error message, and when I added the loop it wouldn't compile...
Can anyone please help me create a trigger that works without using a select statement to see if the sequence nextval is already used?
The example I'm adding here is clunky and performance here would be poor, but I wanted to put out an idea that could perhaps be a starting place.
You mentioned that you don't want to SELECT to check whether NEXTVAL is already used. If you meant that you didn't want to have to perform any SELECT at all, then this answer would be cheating as it does include a SELECT statement. But if you only want to avoid the mutating-table problem, then it could be a possibility.
The approach I'll add here takes a few steps to avoid collisions, and runs them any time a non-NULL value is provided as a key. It is currently set up to fire per row, but if entire statements will be all-NULL or all-non-NULL then it could possibly be changed to a statement or compound trigger to improve efficiency a little. In any event, the collision avoidance is inefficient in this example.
General steps:
- If NULL, just use the NEXTVAL
- If non-NULL, check the LAST_NUMBER and CACHE_SIZE of the SEQUENCE against the provided value.
- If the provided value is within the CACHE (or beyond the cache) and could cause a collision, jump the SEQUENCE beyond the provided value and throw away the values in the cache.
Create the test table/sequence:
CREATE TABLE MY_TABLE (
MY_TABLE_ID NUMBER NOT NULL PRIMARY KEY
);
--Default cache here 20
CREATE SEQUENCE MY_SEQUENCE;
Create an autonomous synchronizer. Please note, this is not at all efficient, and concurrency could be a real problem (a serializing alternative is below).
It assumes a CACHE of at least 1 and may be incompatible with NOCACHE. (Actually, the whole situation might be simpler with a NOCACHE sequence; a sketch of that variant follows the procedure.)
CREATE OR REPLACE PROCEDURE SYNC_MY_SEQUENCE(P_CANDIDATE_MAX_VALUE IN NUMBER)
IS
PRAGMA AUTONOMOUS_TRANSACTION;
V_LAST_NUMBER NUMBER;
V_CACHE_SIZE NUMBER;
V_SEQUENCE_GAP NUMBER;
V_SEQUENCE_DELTA NUMBER;
V_NEXTVAL NUMBER;
BEGIN
SELECT
LAST_NUMBER,
CACHE_SIZE
INTO V_LAST_NUMBER, V_CACHE_SIZE
FROM USER_SEQUENCES
WHERE SEQUENCE_NAME = 'MY_SEQUENCE';
--Only do anything if the provided value could cause a collision.
IF P_CANDIDATE_MAX_VALUE >= (V_LAST_NUMBER - V_CACHE_SIZE)
THEN
-- Get the delta, in case the provided value is way way higher than the SEQUENCE
V_SEQUENCE_DELTA := P_CANDIDATE_MAX_VALUE + V_CACHE_SIZE - V_LAST_NUMBER ;
-- Use the biggest gap to get a safe zone when resetting the SEQUENCE
V_SEQUENCE_GAP := GREATEST(V_SEQUENCE_DELTA, V_CACHE_SIZE);
-- Set the increment so the distance between the last_value and the safe zone can be moved in one jump
EXECUTE IMMEDIATE 'ALTER SEQUENCE MY_SEQUENCE INCREMENT BY '||V_SEQUENCE_GAP;
-- Jump to the safe zone.
V_NEXTVAL := MY_SEQUENCE.NEXTVAL;
-- Reset the increment. Note there is a window here in which other sessions could get big NEXTVALs from concurrent access
EXECUTE IMMEDIATE 'ALTER SEQUENCE MY_SEQUENCE INCREMENT BY 1';
--Chew through the rest of at least one cache cycle.
FOR CACHE_POINTER IN 1..V_CACHE_SIZE LOOP
V_NEXTVAL := MY_SEQUENCE.NEXTVAL;
END LOOP;
END IF;
COMMIT;
END;
/
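As noted above, a NOCACHE sequence would make this simpler, because USER_SEQUENCES.LAST_NUMBER then reflects the next value the sequence will hand out. A minimal sketch of that variant (it assumes MY_SEQUENCE is recreated with NOCACHE and carries the same concurrency caveats):
CREATE OR REPLACE PROCEDURE SYNC_MY_SEQUENCE(P_CANDIDATE_MAX_VALUE IN NUMBER)
IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  V_LAST_NUMBER NUMBER;
  V_GAP         NUMBER;
  V_NEXTVAL     NUMBER;
BEGIN
  SELECT LAST_NUMBER
  INTO V_LAST_NUMBER
  FROM USER_SEQUENCES
  WHERE SEQUENCE_NAME = 'MY_SEQUENCE';
  -- With NOCACHE, LAST_NUMBER is the next value to be handed out,
  -- so a collision is only possible if the provided value is at or past it.
  IF P_CANDIDATE_MAX_VALUE >= V_LAST_NUMBER
  THEN
    -- Jump the sequence just past the provided value in a single step.
    V_GAP := P_CANDIDATE_MAX_VALUE - V_LAST_NUMBER + 1;
    EXECUTE IMMEDIATE 'ALTER SEQUENCE MY_SEQUENCE INCREMENT BY '||V_GAP;
    V_NEXTVAL := MY_SEQUENCE.NEXTVAL;
    EXECUTE IMMEDIATE 'ALTER SEQUENCE MY_SEQUENCE INCREMENT BY 1';
  END IF;
  COMMIT;
END;
/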
EDIT: it would be even more costly, but one might be able to serialize access to manage concurrency with something like the below alternative:
CREATE OR REPLACE PROCEDURE SYNC_MY_SEQUENCE(P_CANDIDATE_MAX_VALUE IN NUMBER)
IS
PRAGMA AUTONOMOUS_TRANSACTION;
V_LAST_NUMBER NUMBER;
V_CACHE_SIZE NUMBER;
V_SEQUENCE_GAP NUMBER;
V_SEQUENCE_DELTA NUMBER;
V_NEXTVAL NUMBER;
V_LOCK_STATUS NUMBER;
V_LOCK_HANDLE VARCHAR2(64);
C_LOCK_KEY CONSTANT VARCHAR2(20) := 'SYNC_MY_SEQUENCE';
BEGIN
DBMS_LOCK.ALLOCATE_UNIQUE (C_LOCK_KEY,V_LOCK_HANDLE,10);
--Serialize access
V_LOCK_STATUS := DBMS_LOCK.REQUEST(
LOCKHANDLE => V_LOCK_HANDLE,
LOCKMODE => DBMS_LOCK.X_MODE,
TIMEOUT => 10,
RELEASE_ON_COMMIT => TRUE);
SELECT
LAST_NUMBER,
CACHE_SIZE
INTO V_LAST_NUMBER, V_CACHE_SIZE
FROM USER_SEQUENCES
WHERE SEQUENCE_NAME = 'MY_SEQUENCE';
IF P_CANDIDATE_MAX_VALUE >= (V_LAST_NUMBER - V_CACHE_SIZE)
THEN
V_SEQUENCE_DELTA := P_CANDIDATE_MAX_VALUE + V_CACHE_SIZE - V_LAST_NUMBER ;
V_SEQUENCE_GAP := GREATEST(V_SEQUENCE_DELTA, V_CACHE_SIZE);
EXECUTE IMMEDIATE 'ALTER SEQUENCE MY_SEQUENCE INCREMENT BY '||V_SEQUENCE_GAP;
V_NEXTVAL := MY_SEQUENCE.NEXTVAL;
EXECUTE IMMEDIATE 'ALTER SEQUENCE MY_SEQUENCE INCREMENT BY 1';
FOR CACHE_POINTER IN 1..V_CACHE_SIZE LOOP
V_NEXTVAL := MY_SEQUENCE.NEXTVAL;
END LOOP;
END IF;
COMMIT;
END;
/
Create the trigger:
CREATE OR REPLACE TRIGGER MAYBE_SET
BEFORE INSERT ON MY_TABLE
FOR EACH ROW
BEGIN
IF :NEW.MY_TABLE_ID IS NULL
THEN
:NEW.MY_TABLE_ID := MY_SEQUENCE.NEXTVAL;
ELSE
SYNC_MY_SEQUENCE(:NEW.MY_TABLE_ID);
END IF;
END;
/
Then test it:
INSERT INTO MY_TABLE SELECT LEVEL FROM DUAL CONNECT BY LEVEL < 5;
SELECT * FROM MY_TABLE ORDER BY 1 ASC;
MY_TABLE_ID
1
2
3
4
It just used the NEXTVAL each time.
Then add a collidable value. Adding this will fire the sproc and do the extra work to push the SEQUENCE into a safe zone.
INSERT INTO MY_TABLE VALUES(5);
SELECT * FROM MY_TABLE ORDER BY 1 ASC;
MY_TABLE_ID
1
2
3
4
5
Then use NULL again:
INSERT INTO MY_TABLE VALUES(NULL);
SELECT * FROM MY_TABLE ORDER BY 1 ASC;
MY_TABLE_ID
1
2
3
4
5
41
The SEQUENCE had a costly operation to get there, but has settled and didn't collide.
If other provided values are below the SEQUENCE visibility, they add freely and don't change the NEXTVAL:
INSERT INTO MY_TABLE VALUES(7);
INSERT INTO MY_TABLE VALUES(19);
INSERT INTO MY_TABLE VALUES(-9999);
INSERT INTO MY_TABLE VALUES(NULL);
SELECT * FROM MY_TABLE ORDER BY 1 ASC;
MY_TABLE_ID
-9999
1
2
3
4
5
7
19
41
42
If the gap is huge, it jumps way out there:
INSERT INTO MY_TABLE VALUES(50000);
INSERT INTO MY_TABLE VALUES(NULL);
SELECT * FROM MY_TABLE ORDER BY 1 ASC;
MY_TABLE_ID
-9999
1
2
3
4
5
7
19
41
42
50000
50022
This could be too costly for your use case, and I haven't tested it in a RAC, but I wanted to throw out an idea that can avoid collisions.

How can we tune this DELETE query?

Justin, as per your suggestion, here I added a loop. Is there any way we can tune this procedure? I haven't tested it yet.
Here we are just deleting records from the master and child tables that are older than 90 days of history.
Assume the tables have more than 20k records to delete, and here I commit every 5k records. Please let me know if I am wrong here.
create or replace
Procedure PURGE_CLE_ALL_STATUS
( days_in IN number )
IS
LV_EXCEPTIONID NUMBER;
i number := 0;
cursor s1 is
select EXCEPTIONID
from EXCEPTIONREC --- master table
where TIME_STAMP < (sysdate -days_in);
BEGIN
for c1 in s1 loop
delete from EXCEPTIONRECALTKEY -- child table
where EXCEPTIONID =c1.EXCEPTIONID ;
delete from EXCEPTIONREC
where EXCEPTIONID =c1.EXCEPTIONID;
i := i + 1;
if i > 5000 then
commit;
i := 0;
end if;
end loop;
commit;
EXCEPTION
WHEN OTHERS THEN
raise_application_error(-20001,'An error was encountered - '||
SQLCODE||' -ERROR- '||SQLERRM);
END;
/
Instead of a cursor, you can give the condition directly in the delete statement, like below:
create or replace
Procedure PURGE_CLE_ALL_STATUS
( days_in IN number )
IS
LV_EXCEPTIONID NUMBER;
BEGIN
delete from EXCEPTIONRECALTKEY -- child table
where EXCEPTIONID = -- (1)
(select EXCEPTIONID
from EXCEPTIONREC --- master table
where TIME_STAMP < (sysdate -days_in));
delete from EXCEPTIONREC
where EXCEPTIONID = --(2)
(select EXCEPTIONID
from EXCEPTIONREC --- master table
where TIME_STAMP < (sysdate -days_in));
commit;
END;
/
I fully agree with Justin Cave; he made a very good point.
If you are getting multiple rows from your cursor, you can use IN in place of = at (1) and (2).
"delete records from master and child table prior to 90days history records "
To get full performance benefit , you shall implement partition table. of courser this change in design but may not change your application.
your use case is to delete data on regular basis. You try creating daily or weekly partition which will store the new data in newer partition & drop the older partition regularly.
Current way of deleting will have performance bottle neck, since you trying to delete data older than 90 days. let us say there 10,000 record per day, then by 90th day there will be 90*10000. locating & deleting few records from 90 million record will be always will be slow & create some other lock problem.
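For example, with interval partitioning the purge becomes a partition drop instead of a row-by-row delete. This is a sketch only; the column list, the initial partition boundary, and the child-table strategy are assumptions, and the master table would have to be rebuilt in this partitioned form:
create table EXCEPTIONREC (
  EXCEPTIONID number primary key,
  TIME_STAMP  date not null
  -- ... remaining columns ...
)
partition by range (TIME_STAMP)
interval (numtodsinterval(1, 'DAY'))
(partition p_initial values less than (date '2020-01-01'));

-- purging a day of history is then a metadata operation, e.g.
alter table EXCEPTIONREC drop partition for (date '2020-01-05') update global indexes;
The child table EXCEPTIONRECALTKEY would need a matching scheme (or reference partitioning) so its old rows can be removed the same way.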

how to fetch, delete, commit from cursor

I am trying to delete a lot of rows from a table. I want to try the approach of putting rows I want to delete into a cursor and then keep doing fetch, delete, commit on each row of the cursor until it is empty.
In the below code we are fetching rows and putting them in a TYPE.
How can I modify the below code to remove the TYPE from the picture and simply do fetch, delete, commit on the cursor itself?
OPEN bulk_delete_dup;
LOOP
FETCH bulk_delete_dup BULK COLLECT INTO arr_inc_del LIMIT c_rows;
FORALL i IN arr_inc_del.FIRST .. arr_inc_del.LAST
DELETE FROM UIV_RESPONSE_INCOME
WHERE ROWID = arr_inc_del(i);
COMMIT;
arr_inc_del.DELETE;
EXIT WHEN bulk_delete_dup%NOTFOUND;
END LOOP;
arr_inc_del.DELETE;
CLOSE bulk_delete_dup;
Why do you want to commit in batches? That is only going to slow down your processing. Unless there are other sessions that are trying to modify the rows you are trying to delete, which seems problematic for other reasons, the most efficient approach would be simply to delete the data with a single DELETE, i.e.
DELETE FROM uiv_response_income uri
WHERE EXISTS(
SELECT 1
FROM (<<bulk_delete_dup query>>) bdd
WHERE bdd.rowid = uri.rowid
)
Of course, there may well be a more optimal way of writing this depending on how the query behind your cursor is designed.
If you really want to eliminate the BULK COLLECT (which will slow the process down substantially), you could use the WHERE CURRENT OF syntax to do the DELETE
SQL> create table foo
2 as
3 select level col1
4 from dual
5 connect by level < 10000;
Table created.
SQL> ed
Wrote file afiedt.buf
1 declare
2 cursor c1 is select * from foo for update;
3 l_rowtype c1%rowtype;
4 begin
5 open c1;
6 loop
7 fetch c1 into l_rowtype;
8 exit when c1%notfound;
9 delete from foo where current of c1;
10 end loop;
11* end;
SQL> /
PL/SQL procedure successfully completed.
Be aware, however, that since you have to lock the row (with the FOR UPDATE clause), you cannot put a commit in the loop. Doing a commit would release the locks you had requested with the FOR UPDATE and you'll get an ORA-01002: fetch out of sequence error
SQL> ed
Wrote file afiedt.buf
1 declare
2 cursor c1 is select * from foo for update;
3 l_rowtype c1%rowtype;
4 begin
5 open c1;
6 loop
7 fetch c1 into l_rowtype;
8 exit when c1%notfound;
9 delete from foo where current of c1;
10 commit;
11 end loop;
12* end;
SQL> /
declare
*
ERROR at line 1:
ORA-01002: fetch out of sequence
ORA-06512: at line 7
You may not get a runtime error if you remove the locking and avoid the WHERE CURRENT OF syntax, deleting the data based on the value(s) you fetched from the cursor. However, this is still doing a fetch across commit which is a poor practice and radically increases the odds that you will, at least intermittently, get an ORA-01555: snapshot too old error. It will also be painfully slow compared to the single SQL statement or the BULK COLLECT option.
SQL> ed
Wrote file afiedt.buf
1 declare
2 cursor c1 is select * from foo;
3 l_rowtype c1%rowtype;
4 begin
5 open c1;
6 loop
7 fetch c1 into l_rowtype;
8 exit when c1%notfound;
9 delete from foo where col1 = l_rowtype.col1;
10 commit;
11 end loop;
12* end;
SQL> /
PL/SQL procedure successfully completed.
Of course, you also have to ensure that your process is restartable in case you process some subset of rows and have some unknown number of interim commits before the process dies. If the DELETE is sufficient to cause the row to no longer be returned from your cursor, your process is probably already restartable. But in general, that's a concern if you try to break a single operation into multiple transactions.
A few things. It seems from your company's "no transaction over 8 second" rule (8 seconds, you in Texas?), you have a production db instance that traditionally supported apps doing OLTP stuff (insert 1 row, update 2 rows, etc), and has now also become the batch processing db (remove 50% of the rows and replace with 1mm new rows).
Batch processing should be separated from OLTP instance. In a batch ("data factory") instance, I wouldn't try deleting in this case, I'd probably do a CTAS, drop old table, rename new table, rebuild indexes/stats, recompile invalid objs approach.
Assuming you are stuck doing batch processing in your "8 second" instance, you'll probably find your company will ask for more and more of this in the future, so ask the DBAs for as much rollback as you can get, and hope you don't get a snapshot too old by fetching across commits (cursor select driving the deletes, commit every 1000 rows or so, delete using rowid).
If the DBAs can't help, you may be able to first create a temp table containing the rowids that you wish to delete, and then loop through the temp table to delete from the main table (avoiding fetching across commits), but your company will probably have some rule against this as well, as this is another (basic) batch technique.
Something like:
declare
-- assuming index on someCol
cursor sel_cur is
select rowid as row_id
from someTable
where someCol = 'BLAH';
v_ctr pls_integer := 0;
begin
for rec in sel_cur
loop
v_ctr := v_ctr + 1;
-- watch out for snapshot too old...
delete from someTable
where rowid = rec.row_id;
if (mod(v_ctr, 1000) = 0) then
commit;
end if;
end loop;
commit;
end;
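And the temp-table-of-rowids variant mentioned above might look something like this (the staging table name is made up; each pass runs its own short query, so no fetch crosses a commit):
-- 1) capture the target rowids once
create table rowids_to_delete as
  select rowid as row_id
  from someTable
  where someCol = 'BLAH';

-- 2) work through them in batches
declare
  type t_rowids is table of rowid index by pls_integer;
  v_batch t_rowids;
begin
  loop
    select row_id bulk collect into v_batch
    from rowids_to_delete
    where rownum <= 1000;
    exit when v_batch.count = 0;
    forall i in 1 .. v_batch.count
      delete from someTable where rowid = v_batch(i);
    forall i in 1 .. v_batch.count
      delete from rowids_to_delete where row_id = v_batch(i);
    commit;
  end loop;
end;
/

drop table rowids_to_delete;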

SQL optimization question (oracle)

Edit: Please answer one of the two questions I ask. I know there are other options that would be better in a different case. These other potential options (partitioning the table, running as one large delete statement without committing in batches, etc.) are NOT options in my case due to things outside my control.
I have several very large tables to delete from. All have the same foreign key that is indexed. I need to delete certain records from all tables.
table source
id --primary_key
import_source --used for choosing the ids to delete
table t1
id --foreign key
--other fields
table t2
id --foreign key
--different other fields
Usually when doing a delete like this, I'll put together a loop to step through all the ids:
declare
my_counter integer := 0;
begin
for cur in (
select id from source where import_source = 'bad.txt'
) loop
begin
delete from source where id = cur.id;
delete from t1 where id = cur.id;
delete from t2 where id = cur.id;
my_counter := my_counter + 1;
if my_counter > 500 then
my_counter := 0;
commit;
end if;
end;
end loop;
commit;
end;
However, in some code I saw elsewhere, it was put together in separate loops, one for each delete.
declare
  type import_ids is table of integer index by pls_integer;
  my_import_ids import_ids;
  my_count integer := 0;
begin
  select id bulk collect into my_import_ids from source where import_source = 'bad.txt';
  for h in 1..my_import_ids.count loop
    delete from t1 where id = my_import_ids(h);
    --do commit check
  end loop;
  for h in 1..my_import_ids.count loop
    delete from t2 where id = my_import_ids(h);
    --do commit check
  end loop;
end;
--do commit check will be replaced with the same chunk to commit every 500 rows as the above query
So I need one of the following answered:
1) Which of these is better?
2) How can I find out which is better for my particular case? (IE if it depends on how many tables I have, how big they are, etc)
Edit:
I must do this in a loop due to the size of these tables. I will be deleting thousands of records from tables with hundreds of millions of records. This is happening on a system that can't afford to have the tables locked for that long.
EDIT:
NOTE: I am required to commit in batches. The amount of data is too large to do it in one batch. The rollback tables will crash our database.
If there is a way to commit in batches other than looping, I'd be willing to hear it. Otherwise, don't bother saying that I shouldn't use a loop...
Why loop at all?
delete from t1 where id IN (select id from source where import_source = 'bad.txt');
delete from t2 where id IN (select id from source where import_source = 'bad.txt');
delete from source where import_source = 'bad.txt';
That's using standard SQL. I don't know Oracle specifically, but many DBMSes also feature multi-table JOIN-based DELETEs that would let you do the whole thing in a single statement.
David,
If you insist on committing, you can use the following code:
declare
type import_ids is table of integer index by pls_integer;
my_import_ids import_ids;
cursor c is select id from source where import_source = 'bad.txt';
begin
open c;
loop
fetch c bulk collect into my_import_ids limit 500;
forall h in 1..my_import_ids.count
delete from t1 where id = my_import_ids(h);
forall h in 1..my_import_ids.count
delete from t2 where id = my_import_ids(h);
commit;
exit when c%notfound;
end loop;
close c;
end;
This program fetches ids in pieces of 500 rows, deleting and committing each piece. It should be much faster than row-by-row processing, because bulk collect and forall work as single operations (a single round-trip to and from the database), thus minimizing the number of context switches. See Bulk Binds, Forall, Bulk Collect for details.
First of all, you shouldn't commit in the loop - it is not efficient (it generates lots of redo) and if some error occurs, you can't roll back.
As mentioned in previous answers, you should issue single deletes, or, if you are deleting most of the records, then it could be more optimal to create new tables with remaining rows, drop old ones and rename the new ones to old names.
Something like this:
CREATE TABLE new_table as select * from old_table where <filter only remaining rows>;
index new_table
grant on new table
add constraints on new_table
etc on new_table
drop table old_table
rename new_table to old_table;
See also Ask Tom
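A concrete sketch of that swap for one of the tables in this question (the index, grant, and constraint details are placeholders you would adapt):
create table t1_new as
  select *
  from t1
  where id not in (select id from source where import_source = 'bad.txt');

-- recreate supporting objects (placeholders)
create index t1_new_id_idx on t1_new (id);
grant select, insert, update, delete on t1_new to some_app_user;
-- alter table t1_new add constraint ... (as needed)

drop table t1;
rename t1_new to t1;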
Larry Lustig is right that you don't need a loop. Nonetheless there may be some benefit in doing the delete in smaller chunks. Here PL/SQL bulk binds can improve speed greatly:
declare
  type import_ids is table of integer index by pls_integer;
  my_import_ids import_ids;
  my_count integer := 0;
begin
  select id bulk collect into my_import_ids from source where import_source = 'bad.txt';
  forall h in 1..my_import_ids.count
    delete from t1 where id = my_import_ids(h);
  forall h in 1..my_import_ids.count
    delete from t2 where id = my_import_ids(h);
end;
The way I wrote it, it does it all at once, in which case, yeah, the single SQL is better. But you can change your loop conditions to break it into chunks. The key points are:
- Don't commit on every row. If anything, commit only every N rows.
- When using chunks of N, don't run the delete in an ordinary loop. Use forall to run the delete as a bulk bind, which is much faster.
The reason, aside from the overhead of commits, is that each time you execute an SQL statement inside PL/SQL code it essentially does a context switch. Bulk binds avoid that.
You may try partitioning anyway to use parallel execution, not just to drop one partition. The Oracle documentation may prove useful in setting this up. Each partition would use its own rollback segment in this case.
If you are doing the delete from the source before the t1/t2 deletes, that suggests you don't have referential integrity constraints (as otherwise you'd get errors saying child records exist).
I'd go for creating the constraint with ON DELETE CASCADE. Then a simple
DECLARE
v_cnt NUMBER := 1;
BEGIN
WHILE v_cnt > 0 LOOP
DELETE FROM source WHERE import_source = 'bad.txt' and rownum < 5000;
v_cnt := SQL%ROWCOUNT;
COMMIT;
END LOOP;
END;
The child records would get deleted automatically.
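The constraints themselves would be along these lines (the constraint names are made up):
alter table t1 add constraint t1_source_fk
  foreign key (id) references source (id) on delete cascade;

alter table t2 add constraint t2_source_fk
  foreign key (id) references source (id) on delete cascade;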
If you can't have the ON DELETE CASCADE, I'd go with a GLOBAL TEMPORARY TABLE with ON COMMIT DELETE ROWS
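The temporary table is created once, up front, along the lines of:
create global temporary table temp (
  id number
) on commit delete rows;
The batched purge then becomes: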
DECLARE
v_cnt NUMBER := 1;
BEGIN
WHILE v_cnt > 0 LOOP
INSERT INTO temp (id)
SELECT id FROM source WHERE import_source = 'bad.txt' and rownum < 5000;
v_cnt := SQL%ROWCOUNT;
DELETE FROM t1 WHERE id IN (SELECT id FROM temp);
DELETE FROM t2 WHERE id IN (SELECT id FROM temp);
DELETE FROM source WHERE id IN (SELECT id FROM temp);
COMMIT;
END LOOP;
END;
I'd also go for the largest chunk your DBA will allow.
I'd expect each transaction to last for at least a minute. More frequent commits would be a waste.
"This is happening on a system that can't afford to have the tables locked for that long."
Oracle doesn't lock tables, only rows. I'm assuming no-one will be locking the rows you are deleting (or at least not for long). So locking is not an issue.