Oracle delete rows in multiple tables based on single primary key. - sql

There are about 20 tables that branch off a single primary key, EmployeeId. There are about 12,000 employees I want completely gone from my database. The chances of other processes updating these employees while I am deleting them are close to zero. I am planning on deleting them in bulk and then committing. Ideally none of the deletes should fail, but I am unsure whether to go the cursor route, commit every 500 rows, or something else. Here's what it looks like now.
--STEP 1: Collect the EmployeeIds to delete in a temp table
Create table temp as select EmployeeId from Employee where <all conditions are met>;
--STEP 2: Delete names
Delete from EmployeeName where EmployeeId in (select EmployeeId from temp);
--STEP 3 - STEP 30: Delete all other child tables
Delete from child_table
where EmployeeId in (select EmployeeId from temp);
-- ...repeated for each of the remaining child tables
--STEP 31: Commit
commit;

If you're going to do this often, how about letting Oracle do the job for you?
Set all your foreign keys referencing table Employee to "ON DELETE CASCADE" (see this link for example)
delete from Employee where <all your conditions>;
With the FKs set to ON DELETE CASCADE, Oracle automatically deletes the matching rows from the child tables whenever a row is deleted from the parent table.
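For example, a child table's FK could be recreated with the cascade option; this is a minimal sketch where the constraint name is a placeholder for whatever your schema actually uses:
ALTER TABLE EmployeeName DROP CONSTRAINT fk_empname_emp;
ALTER TABLE EmployeeName
    ADD CONSTRAINT fk_empname_emp
    FOREIGN KEY (EmployeeId) REFERENCES Employee (EmployeeId)
    ON DELETE CASCADE;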

Assuming you want to maintain data integrity, and that when there is an error deleting from one table you want to roll back all the deletes for that employee, you could do something like:
DECLARE
    TYPE Emp_ID_Tab_Type IS TABLE OF Employee.EmployeeID%TYPE;
    All_Employees     Emp_ID_Tab_Type;
    Deleted_Employees Emp_ID_Tab_Type := Emp_ID_Tab_Type();
    Error_Employees   Emp_ID_Tab_Type := Emp_ID_Tab_Type();
BEGIN
    SELECT EmployeeID
    BULK COLLECT INTO All_Employees
    FROM   Employee
    WHERE  1 = 0; -- Your conditions

    FOR i IN 1 .. All_Employees.COUNT LOOP
        BEGIN
            DELETE FROM child_table1
            WHERE  EmployeeID = All_Employees(i);

            DELETE FROM child_table2
            WHERE  EmployeeID = All_Employees(i);

            -- ...

            DELETE FROM child_table20
            WHERE  EmployeeID = All_Employees(i);

            DELETE FROM Employee
            WHERE  EmployeeID = All_Employees(i);

            COMMIT;

            Deleted_Employees.EXTEND;
            Deleted_Employees(Deleted_Employees.COUNT) := All_Employees(i);
            DBMS_OUTPUT.PUT_LINE( All_Employees(i) || ' deleted' );
        EXCEPTION
            WHEN others THEN
                ROLLBACK;
                Error_Employees.EXTEND;
                Error_Employees(Error_Employees.COUNT) := All_Employees(i);
                DBMS_OUTPUT.PUT_LINE( All_Employees(i) || ' error - ' || SQLERRM );
        END;
    END LOOP;

    -- Do something with the errors
END;
It is not going to be the fastest with a COMMIT at the end of each loop iteration, but it does ensure you can roll back the deletes for each individual employee.

If you anticipate problems during deletion and still want to run the whole operation as set-based statements rather than a cursor loop, you could use DML error logging:
In some situations the most obvious solution to a problem is a DML statement (INSERT ... SELECT, UPDATE, DELETE), but you may choose to avoid DML because of the way it reacts to exceptions.
By default, when a DML statement fails the whole statement is rolled back, regardless of how many rows were processed successfully before the error was detected.
In the past, the only way around this problem was to process each row individually, preferably with a bulk operation using FORALL and the SAVE EXCEPTIONS clause. In Oracle Database 10g Release 2, the DML error logging feature was introduced to solve this problem. Adding the appropriate LOG ERRORS clause to most INSERT, UPDATE, MERGE and DELETE statements enables the operations to complete, regardless of errors.
BEGIN
DBMS_ERRLOG.create_error_log (dml_table_name => 'EmployeeName');
END;
/
Delete from EmployeeName
where EmployeeId in (select EmployeeId from temp)
LOG ERRORS INTO err$_EmployeeName ('DELETE') REJECT LIMIT UNLIMITED;
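Any rows the delete could not remove end up in the generated err$_ table instead of failing the statement; a quick way to inspect them afterwards (the tag 'DELETE' matches the one passed in the LOG ERRORS clause above):
SELECT ora_err_number$, ora_err_mesg$, EmployeeId
FROM   err$_EmployeeName
WHERE  ora_err_tag$ = 'DELETE';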

Related

Removing records from related tables

I have two tables:
reports
report_contents
which are related by foreign key content_id on reports table.
I need to create a procedure which deletes certain reports together with their contents, like this:
DELETE FROM report_contents WHERE id IN
(SELECT content_id FROM reports WHERE extra_value = extraValue)
DELETE FROM reports WHERE extra_value = extraValue
But it is impossible to delete records from the report_contents table first, because there is a constraint on the content_id column of the reports table.
On the other hand, if I delete records from the reports table first, I won't know which report_contents rows should be deleted afterwards...
CREATE OR REPLACE PROCEDURE delete_reports (extraValue NUMBER) IS
BEGIN
/* removing reports with extra_value = extraValue */
/* removing their report_contents */
END;
What is the best way to do this? (I don't want to add an ON DELETE CASCADE constraint.)
If the number of ids is relatively small (i.e. just a few hundred or thousand) you can comfortably store the IDs to delete temporarily in a PL/SQL array.
PROCEDURE delete_reports (p_extra_value NUMBER) IS
    TYPE id_table IS TABLE OF reports.content_id%TYPE INDEX BY BINARY_INTEGER;
    ids id_table;
BEGIN
    /* which reports to delete? */
    SELECT content_id BULK COLLECT INTO ids
    FROM   reports WHERE extra_value = p_extra_value;

    /* removing reports with extra_value = p_extra_value */
    DELETE reports WHERE extra_value = p_extra_value;

    /* removing their report_contents */
    FORALL i IN 1..ids.COUNT
        DELETE report_contents WHERE id = ids(i);
END delete_reports;
If the number of ids is large (e.g. millions or more) then I'd probably break this into a loop and get the ids in batches.
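A rough sketch of that batched variant, using BULK COLLECT with a LIMIT; the procedure name and the batch size of 10000 are illustrative only:
PROCEDURE delete_reports_batched (p_extra_value NUMBER) IS
    TYPE id_table IS TABLE OF reports.content_id%TYPE;
    ids id_table;
    CURSOR c IS
        SELECT content_id FROM reports WHERE extra_value = p_extra_value;
BEGIN
    OPEN c;
    LOOP
        FETCH c BULK COLLECT INTO ids LIMIT 10000;
        EXIT WHEN ids.COUNT = 0;

        /* parent rows first, because reports references report_contents */
        FORALL i IN 1..ids.COUNT
            DELETE reports WHERE content_id = ids(i);
        FORALL i IN 1..ids.COUNT
            DELETE report_contents WHERE id = ids(i);
    END LOOP;
    CLOSE c;
END delete_reports_batched;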
Since it's an SP, you could use an intermediate TABLE variable to store your results:
CREATE OR REPLACE PROCEDURE delete_reports (extraValue NUMBER) IS
BEGIN
    DECLARE @TABLE TABLE (CONTENT_ID int);

    INSERT INTO @TABLE
    SELECT content_id FROM reports WHERE extra_value = extraValue;

    DELETE FROM reports B
    WHERE EXISTS (SELECT * FROM @TABLE A WHERE A.CONTENT_ID = B.CONTENT_ID);

    DELETE FROM report_contents C
    WHERE EXISTS (SELECT * FROM @TABLE A WHERE A.CONTENT_ID = C.ID);
END
I am assuming that you could use CONTENT_ID to delete from both tables.
EDIT: Thanks to a helpful commenter, who pointed out that my original solution left out the relationship between CONTENT_ID of the REPORTS table and ID of the REPORT_CONTENTS table. The cursor query of my first attempt assumed that any orphaned ID in the REPORT_CONTENTS table would be an ideal candidate for deletion. This assumption is not supportable, so I rewrote the cursor into two different cursors below.
I think the original post was a question about how to do this in Oracle? Here's an alternate approach using an Oracle PL/SQL cursor.
CREATE OR REPLACE PROCEDURE proc_delete_reports ( p_extra_value IN VARCHAR2 )
IS
    CURSOR del_reports IS
        SELECT content_id FROM reports WHERE extra_value = p_extra_value
        FOR UPDATE;

    CURSOR del_contents (p_content_id IN NUMBER) IS
        SELECT id
        FROM   report_contents
        WHERE  id = p_content_id
        FOR UPDATE;
BEGIN
    FOR i IN del_reports
    LOOP
        DELETE FROM reports
        WHERE CURRENT OF del_reports;

        FOR j IN del_contents(p_content_id => i.content_id)
        LOOP
            DELETE FROM report_contents
            WHERE CURRENT OF del_contents;
        END LOOP;
    END LOOP;

    COMMIT;
END proc_delete_reports;
With the FOR UPDATE / WHERE CURRENT OF syntax, you can modify the rows a cursor returns as you walk through each value in a loop.

BEFORE DELETE trigger

How can I use a trigger to stop a row from being deleted when its PK appears in another table (without an FK)?
Would CALL cannot_delete_error stop the delete?
This is what I've got so far.
CREATE TRIGGER T1
BEFORE DELETE ON Clients
FOR EACH ROW
BEGIN
SELECT Client, Ref FROM Clients K, Invoice F
IF F.Client = K.Ref
CALL cannot_delete_error
END IF;
END
Use an 'INSTEAD OF DELETE' trigger.
Basically, you can evaluate whether or not you should delete the item. In the trigger you can ultimately decide to delete the item like:
--test to see if you actually should delete it.
--if you do decide to delete it
DELETE FROM MyTable
WHERE ID IN (SELECT ID FROM deleted)
One side note: remember that the 'deleted' table may contain several rows.
Another side note: try to do this outside of the db if possible, or with a preceding query. Triggers are downright difficult to maintain. A simple query or function (e.g. dbo.udf_CanIDeleteThis()) can be much more versatile.
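For example, without any trigger at all, a guarded delete can refuse to touch rows that are still referenced. This is a sketch against the question's Clients/Invoice tables; the literal key value is only an example:
DELETE FROM Clients
WHERE  Ref = 123
AND    NOT EXISTS (SELECT 1 FROM Invoice WHERE Invoice.Client = Clients.Ref);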
If you're using MySQL 5.5 or up you can use SIGNAL
DELIMITER //
CREATE TRIGGER tg_fk_check
BEFORE DELETE ON clients
FOR EACH ROW
BEGIN
    IF EXISTS(SELECT *
              FROM invoices
              WHERE client_id = OLD.client_id) THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'Cannot delete a parent row: child rows exist';
    END IF;
END//
DELIMITER ;
Here is SQLFiddle demo. Uncomment the last delete and click Build Schema to see it in action.

How to merge rows + retrieve new and existing keys

In an Oracle table (e.g. MYTABLE, with a numeric sequenced field as primary key), I have to insert several thousand rows, but some of them are expected to already exist in the table.
Naturally, I should try to use MERGE but I need, as well, to retrieve all created (when inserting) and existing (when updating) primary keys.
As well, it should be as fast as possible.
Is the following attempt (pseudo code) the only way to go? Thanks.
keys_list = empty array
for each row to merge
do query 'SELECT PK_MYTABLE FROM MYTABLE WHERE PK_MYTABLE = '+row.pk_mytable
==> retrieve key
if found then:
add key to keys_list
else:
do query 'INSERT INTO MYTABLE (PK_MYTABLE, ...) VALUES (SEQ_MYTABLE.NEXTVAL, ...)'
do query 'SELECT SEQ_MYTABLE.CURRVAL FROM DUAL' ==> retrieve key
add key to keys_list
Add a MODIFICATION_DATE column to the table. Grab and save the sysdate. When you merge, set that saved sysdate on the updated and inserted rows as well. When the merge is complete, select the rows where MODIFICATION_DATE equals the saved sysdate and you have the set you are interested in.
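A sketch of how that might look, assuming a MODIFICATION_DATE column has been added to MYTABLE; staging_table, some_col and the output loop are illustrative only. The key point is that a single captured timestamp is stamped on every merged row and then used to read the affected keys back:
DECLARE
    v_run_date DATE := SYSDATE;  -- grab and save the timestamp once per run
BEGIN
    MERGE INTO mytable mt
    USING (SELECT pk_mytable, some_col FROM staging_table) st
    ON (mt.pk_mytable = st.pk_mytable)
    WHEN MATCHED THEN UPDATE
        SET mt.some_col = st.some_col,
            mt.modification_date = v_run_date
    WHEN NOT MATCHED THEN INSERT (pk_mytable, some_col, modification_date)
        VALUES (st.pk_mytable, st.some_col, v_run_date);

    -- every key touched by this run, whether inserted or updated
    FOR r IN (SELECT pk_mytable
              FROM   mytable
              WHERE  modification_date = v_run_date) LOOP
        DBMS_OUTPUT.PUT_LINE(r.pk_mytable);
    END LOOP;
END;
/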
Why can't you use a MERGE statement for this? This is exactly what a MERGE is for. Here is a rough idea of how it would look...
merge into mytable mt
using
(
    select key_field, value_field from sourcetable
) st
on
( mt.key_field = st.key_field )
when matched then update
    set mt.value_field = st.value_field
when not matched then insert
    ( key_field, value_field )
    values
    ( st.key_field, st.value_field );
Using a MERGE statement is fast because it is a single statement and the Oracle optimizer can utilize indexes and choose a better execution plan than iterating through a cursor using PL/SQL.
If the keys are being generated from a sequence, then the normal way to get the key generated by that insert is to use the returning clause:
declare
    v_insert_seq integer;
begin
    insert into t1 (pk, c1)
    values (myseq.nextval, 'value')
    returning pk into v_insert_seq;
end;
/
However, as best as I can tell, the merge statement doesn't support that returning feature.
Depending on the source of your new rows, there are different ways you could do this. If you are inserting one row at a time, then the approach above will work pretty well.
To detect the duplicate records, just catch the exceptions when you are inserting (when dup_val_on_index) and then handle them with updates.
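A bare-bones sketch of that insert-or-update fallback for a single row; the natural_key column and the bind values are placeholders, since DUP_VAL_ON_INDEX would be raised by whichever unique constraint the incoming row collides with:
DECLARE
    v_key mytable.pk_mytable%TYPE;
BEGIN
    INSERT INTO mytable (pk_mytable, natural_key, some_col)
    VALUES (seq_mytable.NEXTVAL, :nk, :val)
    RETURNING pk_mytable INTO v_key;          -- key of the new row
EXCEPTION
    WHEN DUP_VAL_ON_INDEX THEN
        UPDATE mytable
        SET    some_col = :val
        WHERE  natural_key = :nk
        RETURNING pk_mytable INTO v_key;      -- key of the existing row
END;
/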
If your source of rows is another table, you probably want to look at bulk inserts, and allowing Oracle to return you an array of new PK values. I tried this, but couldn't get it working, so perhaps it's not supported (or I'm missing something today - it gives a syntax error):
declare
    type t_type is table of t1.pk%type;
    v_insert_seqs t_type;
begin
    insert into t1 (pk, c1)
    select level newpk, 'value' c1value
    from   dual
    connect by level <= 10
    returning pk bulk collect into v_insert_seqs;
exception
    when dup_val_on_index then
        raise;
end;
/
The next best thing is to select the rows into arrays and then use bulk binds with the RETURNING clause to capture the new PK IDs, and also use SAVE EXCEPTIONS to catch all the rows that failed to insert. Then you can process any of the failed inserts afterwards:
set serveroutput on
declare
    type t_pk is table of t1.pk%type;
    type t_c1 is table of t1.c1%type;
    v_pks     t_pk;
    v_c1s     t_c1;
    v_new_pks t_pk;
    ex_dml_errors EXCEPTION;
    PRAGMA EXCEPTION_INIT(ex_dml_errors, -24381);
begin
    -- get the batch of rows you want to insert
    select level newpk, 'value' c1
    bulk collect into v_pks, v_c1s
    from dual connect by level <= 10;

    -- bulk bind insert, saving exceptions and capturing the newly inserted
    -- records
    forall i in v_pks.first .. v_pks.last save exceptions
        insert into t1 (pk, c1)
        values (v_pks(i), v_c1s(i))
        returning pk bulk collect into v_new_pks;
exception
    -- Process the exceptions
    when ex_dml_errors then
        for i in 1..SQL%BULK_EXCEPTIONS.count loop
            DBMS_OUTPUT.put_line('Error: ' || i ||
                ' Array Index: ' || SQL%BULK_EXCEPTIONS(i).error_index ||
                ' Message: ' || SQLERRM(-SQL%BULK_EXCEPTIONS(i).ERROR_CODE));
        end loop;
end;
/
If you are running Oracle 10 or better, you might be able to do much the same thing for nearly free by issuing a COMMIT before the merge to update the SCN, then after the merge using ORA_ROWSCN to detect which rows have changed.
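A rough sketch of that idea; note that unless the table was created with ROWDEPENDENCIES, ORA_ROWSCN is tracked per block rather than per row, so this can over-report the changed set:
COMMIT;
-- remember where we are before the merge
SELECT dbms_flashback.get_system_change_number AS scn_before FROM dual;
-- ... run the MERGE ...
-- rows touched by the merge now carry a newer SCN
SELECT pk_mytable FROM mytable WHERE ora_rowscn > :scn_before;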

Efficient way to update all rows in a table

I have a table with a lot of records (could be more than 500 000 or 1 000 000). I added a new column in this table and I need to fill a value for every row in the column, using the corresponding row value of another column in this table.
I tried to use separate transactions, selecting each successive chunk of 100 records and updating the value for them, but this still takes hours to update all records (in Oracle 10, for example).
What is the most efficient way to do this in SQL, without using dialect-specific features, so that it works everywhere (Oracle, MSSQL, MySQL, PostgreSQL, etc.)?
ADDITIONAL INFO: There are no calculated fields. There are indexes. Used generated SQL statements which update the table row by row.
The usual way is to use UPDATE:
UPDATE mytable
SET new_column = <expr containing old_column>
You should be able to do this in a single transaction.
As Marcelo suggests:
UPDATE mytable
SET new_column = <expr containing old_column>;
If this takes too long and fails due to "snapshot too old" errors (e.g. if the expression queries another highly-active table), and if the new value for the column is always NOT NULL, you could update the table in batches:
UPDATE mytable
SET new_column = <expr containing old_column>
WHERE new_column IS NULL
AND ROWNUM <= 100000;
Just run this statement, COMMIT, then run it again; rinse, repeat until it reports "0 rows updated". It'll take longer but each update is less likely to fail.
EDIT:
A better alternative that should be more efficient is to use the DBMS_PARALLEL_EXECUTE API.
Sample code (from Oracle docs):
DECLARE
    l_sql_stmt VARCHAR2(1000);
    l_try      NUMBER;
    l_status   NUMBER;
BEGIN
    -- Create the TASK
    DBMS_PARALLEL_EXECUTE.CREATE_TASK ('mytask');

    -- Chunk the table by ROWID
    DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('mytask', 'HR', 'EMPLOYEES', true, 100);

    -- Execute the DML in parallel
    l_sql_stmt := 'update EMPLOYEES e
                   SET e.salary = e.salary + 10
                   WHERE rowid BETWEEN :start_id AND :end_id';
    DBMS_PARALLEL_EXECUTE.RUN_TASK('mytask', l_sql_stmt, DBMS_SQL.NATIVE,
                                   parallel_level => 10);

    -- If there is an error, RESUME it for at most 2 times.
    l_try := 0;
    l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
    WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
    LOOP
        l_try := l_try + 1;
        DBMS_PARALLEL_EXECUTE.RESUME_TASK('mytask');
        l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
    END LOOP;

    -- Done with processing; drop the task
    DBMS_PARALLEL_EXECUTE.DROP_TASK('mytask');
END;
/
Oracle Docs: https://docs.oracle.com/database/121/ARPLS/d_parallel_ex.htm#ARPLS67333
You could drop any indexes on the table, then do your insert, and then recreate the indexes.
Might not work for you, but it's a technique I've used a couple of times in the past for similar circumstances.
Create updated_{table_name}, then insert-select into this table in batches. Once finished (and this hinges on Oracle, which I don't know or use, supporting the ability to rename tables in an atomic fashion), updated_{table_name} becomes {table_name} while {table_name} becomes original_{table_name}.
Last time I had to do this was for a heavily indexed table with several million rows that absolutely positively could not be locked for the duration needed to make some serious changes to it.
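In Oracle that swap could look something like this sketch; the names are placeholders, and note that the two RENAMEs are separate DDL statements rather than one atomic operation:
CREATE TABLE updated_mytable AS
    SELECT t.*, <expr containing old_column> AS new_column
    FROM   mytable t;

RENAME mytable TO original_mytable;
RENAME updated_mytable TO mytable;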
What is the database version? Check out virtual columns in 11g:
Adding Columns with a Default Value
http://www.oracle.com/technology/pub/articles/oracle-database-11g-top-features/11g-schemamanagement.html
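A sketch of both features mentioned there (table and column names are illustrative): in 11g, adding a NOT NULL column with a DEFAULT is a metadata-only change, and a virtual column is computed on the fly rather than stored, so neither requires touching every row:
ALTER TABLE mytable ADD (new_column NUMBER DEFAULT 0 NOT NULL);

ALTER TABLE mytable ADD (derived_column NUMBER
    GENERATED ALWAYS AS (old_column * 2) VIRTUAL);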
update Hotels set Discount=30 where Hotelid >= 1 and Hotelid <= 5504
For PostgreSQL I do something like this (if we are sure no more updates/inserts take place):
create table new_table as table orig_table with data;
update new_table set column = <expr>;
start transaction;
drop table orig_table;
alter table new_table rename to orig_table;
commit;
Update:
One advantage is that if your table is very large you will not lock it, although the operation in this case could take minutes.
Only do this if you are sure no inserts and/or updates take place during the process.

SQL optimization question (oracle)

Edit: Please answer one of the two questions I ask. I know there are other options that would be better in a different case. These other potential options (partitioning the table, running as one large delete statement without committing in batches, etc.) are NOT options in my case due to things outside my control.
I have several very large tables to delete from. All have the same foreign key that is indexed. I need to delete certain records from all tables.
table source
id --primary_key
import_source --used for choosing the ids to delete
table t1
id --foreign key
--other fields
table t2
id --foreign key
--different other fields
Usually when doing a delete like this, I'll put together a loop to step through all the ids:
declare
    my_counter integer := 0;
begin
    for cur in (
        select id from source where import_source = 'bad.txt'
    ) loop
        begin
            delete from source where id = cur.id;
            delete from t1 where id = cur.id;
            delete from t2 where id = cur.id;

            my_counter := my_counter + 1;
            if my_counter > 500 then
                my_counter := 0;
                commit;
            end if;
        end;
    end loop;
    commit;
end;
However, in some code I saw elsewhere, it was put together in separate loops, one for each delete.
declare
    type import_ids is table of integer index by pls_integer;
    my_import_ids import_ids;
    my_count integer := 0;
begin
    select id bulk collect into my_import_ids from source where import_source = 'bad.txt';

    for h in 1..my_import_ids.count loop
        delete from t1 where id = my_import_ids(h);
        --do commit check
    end loop;

    for h in 1..my_import_ids.count loop
        delete from t2 where id = my_import_ids(h);
        --do commit check
    end loop;
end;
--do commit check will be replaced with the same chunk to commit every 500 rows as the above query
So I need one of the following answered:
1) Which of these is better?
2) How can I find out which is better for my particular case? (i.e., does it depend on how many tables I have, how big they are, etc.?)
Edit:
I must do this in a loop due to the size of these tables. I will be deleting thousands of records from tables with hundreds of millions of records. This is happening on a system that can't afford to have the tables locked for that long.
EDIT:
NOTE: I am required to commit in batches. The amount of data is too large to do it in one batch. The rollback tables will crash our database.
If there is a way to commit in batches other than looping, I'd be willing to hear it. Otherwise, don't bother saying that I shouldn't use a loop...
Why loop at all?
delete from t1 where id IN (select id from source where import_source = 'bad.txt');
delete from t2 where id IN (select id from source where import_source = 'bad.txt');
delete from source where import_source = 'bad.txt';
That's using standard SQL. I don't know Oracle specifically, but many DBMSes also feature multi-table JOIN-based DELETEs as well that would let you do the whole thing in a single statement.
David,
If you insist on committing, you can use the following code:
declare
    type import_ids is table of integer index by pls_integer;
    my_import_ids import_ids;
    cursor c is select id from source where import_source = 'bad.txt';
begin
    open c;
    loop
        fetch c bulk collect into my_import_ids limit 500;

        forall h in 1..my_import_ids.count
            delete from t1 where id = my_import_ids(h);
        forall h in 1..my_import_ids.count
            delete from t2 where id = my_import_ids(h);

        commit;
        exit when c%notfound;
    end loop;
    close c;
end;
This program fetches ids in batches of 500 rows, deleting and committing each batch. It should be much faster than row-by-row processing, because BULK COLLECT and FORALL work as a single operation (in a single round-trip to and from the database), thus minimizing the number of context switches. See Bulk Binds, Forall, Bulk Collect for details.
First of all, you shouldn't commit in the loop - it is not efficient (generates lots of redo) and if some error occurs, you can't roll back.
As mentioned in previous answers, you should issue single deletes, or, if you are deleting most of the records, it could be more optimal to create new tables with the remaining rows, drop the old ones and rename the new ones to the old names.
Something like this:
CREATE TABLE new_table AS SELECT * FROM old_table WHERE <filter only remaining rows>;
-- index new_table
-- grants on new_table
-- add constraints on new_table
-- etc. on new_table
DROP TABLE old_table;
RENAME new_table TO old_table;
See also Ask Tom
Larry Lustig is right that you don't need a loop. Nonetheless there may be some benefit in doing the delete in smaller chunks. Here PL/SQL bulk binds can improve speed greatly:
declare
    type import_ids is table of integer index by pls_integer;
    my_import_ids import_ids;
    my_count integer := 0;
begin
    select id bulk collect into my_import_ids from source where import_source = 'bad.txt';

    forall h in 1..my_import_ids.count
        delete from t1 where id = my_import_ids(h);
    forall h in 1..my_import_ids.count
        delete from t2 where id = my_import_ids(h);
end;
The way I wrote it, it does it all at once; in that case, yes, the single SQL statement is better. But you can change your loop conditions to break it into chunks. The key points are:
don't commit on every row. If anything, commit only every N rows.
When using chunks of N, don't run the delete in an ordinary loop. Use forall to run the delete as a bulk bind, which is much faster.
The reason, aside from the overhead of commits, is that each time you execute an SQL statement inside PL/SQL code it essentially does a context switch. Bulk binds avoid that.
You may try partitioning anyway to use parallel execution, not just to drop one partition. The Oracle documentation may prove useful in setting this up. Each partition would use its own rollback segment in this case.
If you are doing the delete from the source before the t1/t2 deletes, that suggests you don't have referential integrity constraints (as otherwise you'd get errors saying child records exist).
I'd go for creating the constraint with ON DELETE CASCADE. Then a simple
DECLARE
    v_cnt NUMBER := 1;
BEGIN
    WHILE v_cnt > 0 LOOP
        DELETE FROM source WHERE import_source = 'bad.txt' AND rownum < 5000;
        v_cnt := SQL%ROWCOUNT;
        COMMIT;
    END LOOP;
END;
The child records would get deleted automatically.
If you can't have the ON DELETE CASCADE, I'd go with a GLOBAL TEMPORARY TABLE with ON COMMIT DELETE ROWS
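The temp table referenced below would be created once up front; a minimal sketch:
CREATE GLOBAL TEMPORARY TABLE temp (id NUMBER)
ON COMMIT DELETE ROWS;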
DECLARE
    v_cnt NUMBER := 1;
BEGIN
    WHILE v_cnt > 0 LOOP
        INSERT INTO temp (id)
        SELECT id FROM source WHERE import_source = 'bad.txt' AND rownum < 5000;
        v_cnt := SQL%ROWCOUNT;

        DELETE FROM t1 WHERE id IN (SELECT id FROM temp);
        DELETE FROM t2 WHERE id IN (SELECT id FROM temp);
        DELETE FROM source WHERE id IN (SELECT id FROM temp);
        COMMIT;
    END LOOP;
END;
I'd also go for the largest chunk your DBA will allow.
I'd expect each transaction to last for at least a minute. More frequent commits would be a waste.
This is happening on a system that can't afford to have the tables locked for that long.
Oracle doesn't lock tables, only rows. I'm assuming no-one will be locking the rows you are deleting (or at least not for long). So locking is not an issue.