Bulk delete in a performant, incremental fashion - SQL

We want to delete some matching rows within the same table, and this seems to have a performance issue as the table has about 1 billion rows.
Since it is an Oracle database, we could use PL/SQL to delete incrementally, but we want to see what options are available using plain SQL to improve the performance.
DELETE
FROM schema.adress
WHERE key = 6776
  AND matchSequence = 1
  AND EXISTS
      ( SELECT 1
        FROM schema.adress t2
        WHERE t2.flngEntityKey = 9909
          AND t2.matchType = 'NEW'
          AND t2.matchType = schema.adress.matchType
          AND t2.key = schema.adress.key
          AND t2.sequence = schema.adress.sequence )
Additional details
Cardinality is 900 million rows
No triggers
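One pattern often suggested for this (not from the original post; the batch size and commit frequency are assumptions) is to cap each delete with ROWNUM inside a PL/SQL loop, committing between batches so undo stays small:

BEGIN
  LOOP
    DELETE FROM schema.adress a
    WHERE a.key = 6776
      AND a.matchSequence = 1
      AND EXISTS
          ( SELECT 1
            FROM schema.adress t2
            WHERE t2.flngEntityKey = 9909
              AND t2.matchType = 'NEW'
              AND t2.matchType = a.matchType
              AND t2.key = a.key
              AND t2.sequence = a.sequence )
      AND ROWNUM <= 50000;  -- batch size is an assumption; tune it
    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;  -- keeps undo/redo per batch small
  END LOOP;
  COMMIT;
END;
/

The same ROWNUM cap also works as a plain SQL statement run repeatedly, which matches the question's SQL-only constraint.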

Related

DML operation performance for large table with multiple projections in Vertica

I have a table with 3 billion records and 2 projections on it. Currently the delete operation takes around 3-4 hours in daily loads.
Does having multiple projections impact data loads or DML operations in Vertica? Or is there a better way to tune the delete operation in Vertica?
DELETE FROM TABLE1 WHERE EXISTS (SELECT 1 FROM TABLE2 WHERE ID = TABLE1.ID);
Table1 has 3 billion records while Table2 has 50k records.
Projection1 for Table1 includes the ID column, while Projection2 does not.
To be more efficient, the statement should be:
DELETE FROM table1 WHERE id IN (
SELECT id FROM table2
);
But that can also be inefficient if id is not part of all projections of table1.
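If id is missing from a projection, one option (a hedged sketch; the projection name and extra column are assumptions) is to create a projection that carries id, so the delete predicate can be resolved in every projection:

CREATE PROJECTION table1_id_proj AS
SELECT id, some_other_col  -- include the columns this projection must serve
FROM table1
ORDER BY id;               -- sorting on id helps locate the rows to delete

SELECT REFRESH('table1');  -- populate the new projection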

Oracle SQL tuning with index

I have a table T with some 500,000 records. It is a hierarchical table.
My goal is to update the table by self-joining it, based on a condition expressing the parent-child relationship.
The update query is taking really long because the number of rows is high. I have created a unique index on the columns that identify the rows to update (meaning X and Y). After creating the index the cost has gone down, but the query still performs very slowly.
This is my query format:
update T
set (a1, b1)
  = (select tp.a1, tp.b1
     from T tp, T tc
     where tp.id = tc.parent_id
     and T.x = tc.x
     and T.y = tc.y);
After creating the index, the execution plan shows an index scan for CRS.PARENT but a full table scan for CRS.CHILD, and also during the update; as a result the query takes forever to complete.
Please suggest any tips or recommendations to solve this problem.
You are updating all 500,000 rows, so an index is a bad idea: 500,000 individual index lookups will take much longer than necessary.
You would be better served using a MERGE statement.
It is hard to tell exactly what your table structure is, but it would look something like this, assuming X and Y are the primary key columns in T (...could be wrong about that):
MERGE INTO T
USING ( SELECT TC.X,
               TC.Y,
               TP.A1,
               TP.B1
        FROM T TC
        INNER JOIN T TP ON TP.ID = TC.PARENT_ID ) U
ON ( T.X = U.X AND T.Y = U.Y )
WHEN MATCHED THEN UPDATE SET T.A1 = U.A1,
                             T.B1 = U.B1;

Bulk update in Postgres

I need to update one table's records based on another one.
I tried
update currencies
set priority_order= t2.priority_order
from currencies t1
inner join currencies1 t2
on t1.id = t2.id
but it gives an error (the same query works in MySQL and SQL Server).
Then I tried below:
update currencies
set priority_order = (select priority_order
                      from currencies1
                      where currencies.id = currencies1.id);
It works, but it is very slow, and I need to do this for some big tables as well.
Any ideas?
In Postgres, this would look something like:
update currencies t1
set priority_order = t2.priority_order
from currencies1 t2
where t1.id = t2.id;
UPDATE currencies dst
SET priority_order = src.priority_order
FROM currencies src
WHERE dst.id = src.id
-- Suppress updates if the value does not actually change
-- This will avoid creation of row-versions
-- which will need to be cleaned up afterwards, by (auto)vacuum.
AND dst.priority_order IS DISTINCT FROM src.priority_order
;
Testing (10K rows, after cache warmup), using the same table for both source and target for the updates:
CREATE TABLE
INSERT 0 10000
VACUUM
Timing is on.
cache warming:
UPDATE 0
Time: 16,410 ms
zero-rows-touched:
UPDATE 0
Time: 8,520 ms
all-rows-touched:
UPDATE 10000
Time: 84,375 ms
Normally you will seldom see the case where no rows are affected, nor the case where all rows are affected. But with only 50% of the rows touched, the query would still be about twice as fast (plus the reduced work for vacuum after the query).
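For reference, a setup along these lines reproduces such a test (table name, column types, and row count are assumptions):

CREATE TABLE currencies
    ( id             integer PRIMARY KEY
    , priority_order integer NOT NULL
    );
INSERT INTO currencies (id, priority_order)
SELECT g, g FROM generate_series(1, 10000) g;
VACUUM ANALYZE currencies;
\timing on

With a self-join like the query above, the values always match, which reproduces the zero-rows-touched case; the all-rows-touched case needs a source with differing values.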

Fine tuning an UPDATE query on a table

I have 2 tables, Table1 and Table2, both holding large amounts of data: Table1 has 5 million records and Table2 has 80,000. I am running this update:
Update Table1 a
Set a.id1 = (SELECT DISTINCT p.col21
             FROM Table2 p
             WHERE p.col21 = SUBSTR(a.id, 2, LENGTH(a.id)));
The substr and distinct in the query are making it slow.
How can this query be re-written to speed up the process, and which columns do I need to index?
Maybe a merge:
merge into Table1 a
using Table2 p
on (p.col21 = SUBSTR(a.id, 2, LENGTH(a.id)))
when matched then
update set a.id1 = p.col21;
and a function-based index on a.id.
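Such a function-based index might look like this (the index name is an assumption):

CREATE INDEX t1_id_substr_idx
    ON Table1 (SUBSTR(id, 2, LENGTH(id)));

With that in place, the join predicate in the merge can be resolved through the index instead of computing the expression against every row.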
I see that you are dynamically calculating:
p.col21 = SUBSTR(a.id, 2, LENGTH(a.id))
This takes substantial time and makes it impossible to use a plain index. Did you consider actually storing a column with that value? That would let you index it and make the query much faster. If the id is static, this seems like an easy win.
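On Oracle 11g and later, that stored value can be a virtual column, which stays in sync automatically (the column name, index name, and length are assumptions):

ALTER TABLE Table1 ADD
    ( id_tail VARCHAR2(30)
          GENERATED ALWAYS AS (SUBSTR(id, 2, LENGTH(id))) VIRTUAL );
CREATE INDEX t1_id_tail_idx ON Table1 (id_tail);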
How many rows does your subquery return, and how many rows are you updating? With a large number of updates, indexes may not help you out at all.

Executing a SQL query takes a lot of time

I have two tables. Table 2 contains the more recent records.
Table 1 has 900K records and Table 2 about the same.
Executing the query below takes about 10 minutes. While it runs, most other queries against Table 1 throw timeout exceptions.
DELETE T1
FROM Table1 T1 WITH(NOLOCK)
LEFT OUTER JOIN Table2 T2
ON T1.ID = T2.ID
WHERE T2.ID IS NULL AND T1.ID IS NOT NULL
Could someone help me to optimize the query above or write something more efficient?
Also, how can I fix the timeout issue?
The optimizer will likely choose to lock the whole table, as that is easier when it needs to delete that many rows. In cases like this, I delete in chunks.
while (1 = 1)
begin
    with cte as
    (
        select *
        from Table1
        where Id not in (select Id from Table2)
    )
    delete top (1000) from cte;

    if @@rowcount = 0
        break;

    waitfor delay '00:00:01' -- give it some rest :)
end
So the query deletes 1000 rows at a time. The optimizer will likely lock just a page to delete the rows, not the whole table.
The total time of this query execution will be longer, but it will not block other callers.
Disclaimer: assumed MS SQL.
Another approach is to use SNAPSHOT transaction. This way table readers will not be blocked while rows are being deleted.
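Enabling snapshot isolation looks like this (the database name is a placeholder):

ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- readers then opt in per session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;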
Wait a second, are you trying to do this...
DELETE Table1 WHERE ID NOT IN (SELECT ID FROM Table2)
?
If so, that's how I would write it.
You could also try to update the statistics on both tables. And of course indexes on Table1.ID and Table2.ID could speed things up considerably.
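For example (the index names are assumptions):

UPDATE STATISTICS Table1;
UPDATE STATISTICS Table2;

CREATE INDEX IX_Table1_ID ON Table1 (ID);
CREATE INDEX IX_Table2_ID ON Table2 (ID);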
EDIT: If you're getting timeouts from the designer, increase the "Designer" timeout value in SSMS (default is 30 seconds). Tools -> Options -> Designers -> "Override connection string time-out value for table designer updates" -> enter reasonable number (in seconds).
Both ID columns need an index.
Then use simpler SQL:
DELETE Table1 WHERE NOT EXISTS (SELECT * FROM Table2 WHERE Table1.ID = Table2.ID)