Massive delete (2 joins required) with no partitioning - sql

My delete seems to run forever, as if it were in an infinite loop. I have tried many ways to deal with this problem, but it still takes too much time. I will try to be as clear as possible.
There are 4 tables involved in this problem.
The deletion is driven by a given pool_id.
Table 1 contains the pool_id.
Table 2 contains the ticket_id; its foreign key ticket_pool_id joins to table 1's pool_id.
Table 3 contains the ticket_child_id; its foreign key ticket_id joins to table 2's ticket_id.
Table 4 contains the ticket_grand_child_id; its foreign key ticket_child_id joins to table 3's ticket_child_id.
Row counts for each:
table 1 ----> 1
table 2 ----> 1,200,000
table 3 ----> 6,300,000
table 4 ----> 6,300,000
So in fact it's 6.3M + 6.3M + 1.2M + 1 rows to be deleted.
Here are the constraints:
No partitioning
Oracle version 9
Online all the time, so no downtime and no CTAS
We cannot use cascade constraints
The normalization is very important
Here's what I tried:
Bulk delete
Delete with subquery (IN and EXISTS clauses)
Temp table for each level and a one-level join
Procedure committing every 20k rows
None of those completed in a decent time frame, i.e. less than one hour. The fact that we cannot base the delete on one of the column values is not helping. Is there a way?

If you are trying to delete by joining all the tables, the complexity may become cubic or even worse. With tables with this many records, that becomes a performance killer. You can instead output the list of values for deletion from the first table into a temporary table, then use another one to select the IDs for deletion from the second table, and so on. I suppose having proper indexes will keep the complexity down and will complete the task in a reasonable amount of time. Good luck
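For illustration, here is a minimal sketch of that level-by-level approach, using the hypothetical table names table1 through table4 and the key columns from the question; the temporary table names and the bind variable :pool_id are assumptions:

-- Global temporary tables to hold the keys for each level (supported in Oracle 9).
CREATE GLOBAL TEMPORARY TABLE tmp_tickets (ticket_id NUMBER) ON COMMIT PRESERVE ROWS;
CREATE GLOBAL TEMPORARY TABLE tmp_children (ticket_child_id NUMBER) ON COMMIT PRESERVE ROWS;

-- Resolve the keys one level at a time instead of joining all four tables at once.
INSERT INTO tmp_tickets
SELECT ticket_id FROM table2 WHERE ticket_pool_id = :pool_id;

INSERT INTO tmp_children
SELECT ticket_child_id FROM table3
WHERE ticket_id IN (SELECT ticket_id FROM tmp_tickets);

-- Delete bottom-up so no foreign key constraint is violated.
DELETE FROM table4 WHERE ticket_child_id IN (SELECT ticket_child_id FROM tmp_children);
DELETE FROM table3 WHERE ticket_child_id IN (SELECT ticket_child_id FROM tmp_children);
DELETE FROM table2 WHERE ticket_id IN (SELECT ticket_id FROM tmp_tickets);
DELETE FROM table1 WHERE pool_id = :pool_id;
COMMIT;

Indexes on table3(ticket_id) and table4(ticket_child_id) are what keep each of these deletes from scanning the full table.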


How can I optimize this query to delete records that have no corresponding Id in another table?

I have a "Main" table of half a million records which contains a primary key Id (MovieId) that is used by several other tables as a foreign key (many-to-many tables).
Some of these many-to-many tables have millions of records (up to 20 million).
I want to remove all of the records from the many-to-many tables whose foreign key does not exist in the main table. This will reduce their size tremendously (down to "only" a million or two million records each).
But the SQL to accomplish this seems as if it will be very time consuming: essentially looping over 20 million records and, for each one, looking through up to half a million records in the main table to see whether that record's foreign key exists there as a primary key.
I can imagine this taking a LONG time. Is there a (relatively) quick way to do this?
My first idea to accomplish this is something like:
DELETE FROM ACTORS_MOVIES_M2M
WHERE MovieId NOT IN (SELECT MovieId FROM MOVIES_MAIN)
...and again, I perceive this will take ... a WHILE.
The query you wanted to write:
delete m2m
from actors_movies_m2m m2m
where not exists (select 1 from movies_main m where m.movieid = m2m.movieid)
An index on movies_main(movieid) would help the subquery execute quickly (and if movieid is the primary key of movies_main, that index is already there).
While this is technically correct, this might not be the most efficient approach. If you are going to delete a significant part of the table, then it might be more efficient to empty and refill it.
create table tmp_actors_movies_m2m as
select *
from actors_movies_m2m m2m
where exists (select 1 from movies_main m where m.movieid = m2m.movieid)
truncate table actors_movies_m2m; -- back it up first!
insert into actors_movies_m2m
select *
from tmp_actors_movies_m2m;
drop table tmp_actors_movies_m2m;
Note that your question itself indicates a potential design problem. You can avoid orphan records from the start by setting up a proper foreign key with the on delete cascade option:
create table actors_movies_m2m (
... -- columns here
movieid int references movies_main (movieid) on delete cascade
);

How to Tune Delete operation over the DB Link

I am working on an Oracle 12c database and I am performing a delete operation from a source database to a target database over a db_link. The delete operation is taking a huge amount of time and I have to tune it.
I have already tried the driving_site hint on the target, but it is still taking a lot of time to execute. The source table has about 1.2 million records and the target table has about 38 million. The report_id and id columns are the respective PKs of their tables.
DELETE FROM PRIMARY_TABLE WHERE REPORT_ID IN (SELECT ID FROM PROCESS_DATA@PVA_TO_SRC WHERE TYPE='E');
If this is a batch/one-off job, then I suggest that, as a first step, you bring the 1.2-million-row table into the same database as the target.
Eg:
CREATE TABLE TEMP_DELETE AS
SELECT ID
FROM PROCESS_DATA@PVA_TO_SRC
WHERE TYPE = 'E';
Follow that by gathering statistics on the newly created table. After that the delete should work out better.
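A minimal sketch of those two steps, assuming the names above and the current schema:

-- Gather optimizer statistics on the staging table.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TEMP_DELETE');
END;
/

-- The delete now runs entirely locally, with no round trips over the db link.
DELETE FROM PRIMARY_TABLE
WHERE REPORT_ID IN (SELECT ID FROM TEMP_DELETE);
COMMIT;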
Also, a 38-million-row table has presumably been partitioned on some field? If there is any way you can use the partition key of the destination table and link it to the records in TEMP_DELETE, it would make the delete better.
Also, if you have indexes on the 38-million-row table, drop those indexes, perform the delete, and then recreate them, as in the sketch below.
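A sketch of that, with a hypothetical secondary index name and column (the primary key index itself should stay in place):

-- Hypothetical secondary index; adjust the name and columns to match your table.
DROP INDEX primary_table_status_idx;

DELETE FROM PRIMARY_TABLE
WHERE REPORT_ID IN (SELECT ID FROM TEMP_DELETE);
COMMIT;

CREATE INDEX primary_table_status_idx ON PRIMARY_TABLE (STATUS);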
Create a materialized view for
SELECT ID FROM PROCESS_DATA@PVA_TO_SRC WHERE TYPE='E';
Whenever you want to delete the data, just refresh it. You can also put an index on it.
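A minimal sketch of that materialized view, assuming the db link and table from the question (the view and index names are made up):

CREATE MATERIALIZED VIEW MV_PROCESS_DATA_E
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT ID FROM PROCESS_DATA@PVA_TO_SRC WHERE TYPE = 'E';

CREATE INDEX MV_PROCESS_DATA_E_IDX ON MV_PROCESS_DATA_E (ID);

-- Refresh ('C' = complete) right before each delete run.
BEGIN
  DBMS_MVIEW.REFRESH('MV_PROCESS_DATA_E', 'C');
END;
/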
Cheers!!

How to delete 3 billion rows from 2 related tables

I have a table with 5 billion rows (table1) and another table with 3 billion rows (table2). These 2 tables are related; table1 is a child of table2. I have to delete 3 billion rows from table1 and their related rows from table2. I tried using the FORALL method from PL/SQL, but it didn't help much. Then I thought of using an Oracle partitioning strategy. Since I am not a DBA, I would like to know whether partitioning an existing table is possible on the primary key column for a selected set of ids? My primary key is a 64-bit auto-generated number.
It is hard to partition the objects online (it can be done using dbms_redefinition), and it is not necessary (with the details you gave).
The best idea would be to recreate the objects without the undesired rows.
For example, some simple code would look like:
create table undesired_data as
  select key from table1 where <condition for undesired rows>;
create table table1_new as
  select * from table1 where key not in (select key from undesired_data);
create table table2_new as
  select * from table2 where key not in (select key from undesired_data);
rename table1 to table1_old;
rename table2 to table2_old;
rename table1_new to table1;
rename table2_new to table2;
-- recreate the constraints on table1 and table2
-- check if everything is ok
drop table table1_old;
drop table table2_old;
This has to be done with consumers offline, but the downtime for them would be very small if the scripts are right (you should test them in a test environment first).
Sounds very dubious.
If it is a real use-case then you don't delete; you create another table, well defined, including partitioning, and you fill it using insert /*+ append */ into MyNewTable select ....
The most common practice is to define partitions on dates (record create date, event date etc.).
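A minimal sketch of that approach, assuming Oracle 11g or later for interval partitioning and a hypothetical event_date column to partition on:

-- New table partitioned by month on a hypothetical event_date column.
create table my_new_table
partition by range (event_date)
interval (numtoyminterval(1, 'MONTH'))
(partition p0 values less than (date '2000-01-01'))
as
select * from table1 where 1 = 0;  -- copies the structure only

-- Direct-path load of only the rows you want to keep
-- (the date filter is a made-up keep-condition; adjust to your data).
insert /*+ append */ into my_new_table
select * from table1 where event_date >= date '2020-01-01';
commit;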
Again, if this is a real use-case, I strongly recommend that you get real help rather than seeking advice on the internet and doing it yourself.

Performing mass DELETE operation in Oracle 11g

I have a table MyTable with multiple int columns and one column containing a date. The date column has an index created as follows:
CREATE INDEX some_index_name ON MyTable(my_date_column);
because the table will often be queried for its contents between a user-specified date range. The table has no foreign keys pointing to it, nor any indexes other than this one and the primary key, which is an auto-incrementing index filled by a sequence/trigger.
Now, the issue I have is that the data in this table is often replaced for a given time period because it has gone out of date. The way it is updated is by deleting all the entries within a given time period and inserting the new ones. The delete is performed using:
DELETE FROM MyTable
WHERE my_date_column >= initialDate
AND my_date_column < endDate
However, because the number of rows deleted is massive (from 5 million to 12 million rows), the program pretty much blocks during the delete.
Is there something I can disable to make the operation faster? Or maybe an option to specify on the index? I read something about redo space having to do with this, but I don't know how to disable it during an operation.
EDIT: The process runs every day: it deletes the last 5 days of data, then fetches the data for those 5 days (which may have changed in the external source) and reinserts it.
The amount of data deleted is a tiny fraction of the whole table (< 1%), so copying the data I want to keep into another table and dropping/recreating the table may not be the best solution.
I can only think of two ways to speed this up.
1. If you do this on a regular basis, you should consider partitioning your table by month. Then you just drop the partition of the month you want to delete, which is basically as fast as dropping a table (see the sketch after this list). Partitioning requires an Enterprise Edition license, if I'm not mistaken.
2. Create a new table with the data you want to keep (using create table new_table as select ...), drop the old table, and rename the interim table. This will be much faster, but has the drawback that you need to re-create all indexes and (primary, foreign key) constraints on the new table.
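A minimal sketch of the partitioning option, assuming monthly interval partitions on my_date_column (the table and partition names are illustrative):

-- One-off migration to a table partitioned by month on the date column.
CREATE TABLE MyTable_part
PARTITION BY RANGE (my_date_column)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(PARTITION p_initial VALUES LESS THAN (DATE '2000-01-01'))
AS
SELECT * FROM MyTable;

-- Replacing a period then becomes a metadata operation instead of a mass delete.
ALTER TABLE MyTable_part DROP PARTITION FOR (DATE '2015-06-01');

Since the job replaces only the last 5 days, daily interval partitions (NUMTODSINTERVAL(1, 'DAY')) might match the replacement window even better.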

Database Update Query for Huge Records

We have around 2,080,000 records in the table.
We needed to add a new column to it, and we added it.
Since this new column needs to be the primary key, we want to populate all rows from a sequence.
Here's the query:
BEGIN
  FOR loop_counter IN 1 .. 211 LOOP
    UPDATE user_char
    SET id = USER_CHAR__ID_SEQ.nextval
    WHERE user_char.id IS NULL
      AND rownum < 100000;
    COMMIT;
  END LOOP;
END;
But it has now been running for almost a day, and the query is still going.
Note: I am not a db developer/programmer.
Is there anything wrong with this query, or is there another (quicker) query to do the same job?
First, there does not appear to be any reason to use PL/SQL here. It would be more efficient to simply issue a single SQL statement to update every row:
UPDATE user_char
SET id = USER_CHAR__ID_SEQ.nextval
WHERE id IS NULL;
Depending on the situation, it may also be more efficient to create a new table and move the data from the old table to the new table in order to avoid row migration, i.e.
ALTER TABLE user_char
RENAME TO user_char_old;

CREATE TABLE user_char
AS
SELECT USER_CHAR__ID_SEQ.nextval AS id, <<list of other columns>>
FROM user_char_old;
<<Build indexes on user_char>>
<<Drop and recreate any foreign key constraints involving user_char>>
If this were a large table, you could use parallelism in the CREATE TABLE statement. It's not obvious that you'd get a lot of benefit from parallelism on a small 2-million-row table, but it might shave a few seconds off the operation.
Second, if it is taking a day to update a mere 2 million rows, there must be something else going on. A 2-million-row table is pretty small these days; I can populate and update one on my laptop in somewhere between a few seconds and a few minutes. Are there triggers on this table? Are there foreign keys? Are there other sessions updating the rows? What is the query waiting on?
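To answer that last question yourself, here is a minimal sketch of checking what the session is waiting on, using the standard v$session view (the username filter is a placeholder; adjust it to identify your session):

-- Show the active session's current wait event and the SQL it is running.
SELECT sid, status, event, wait_class, seconds_in_wait, sql_id
FROM v$session
WHERE username = 'YOUR_SCHEMA'  -- placeholder; use your actual user
  AND status = 'ACTIVE';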