Delete duplicate rows from oracle DB with one condition

The script below works, but it takes about 5 minutes to delete 11320860 records. Is there an alternative way to write this query that reduces the execution time?
The scenario: the same combination of a, b, c and d can have both E and A records. The code deletes both the A and the E records when at least one E record exists for that combination.
Delete from tableA u
WHERE EXISTS
(Select 1 from tableA w
WHERE w.a = u.a
AND w.b = u.b
AND w.c = u.c
AND w.d = u.d
AND w.flag ='E' ); -- deletes about 11320860 records in 4 minutes

So you need this, I think:
Delete from tableA u
WHERE u.flag in ('A', 'E')
and EXISTS
(Select 1 from tableA w
WHERE w.a = u.a
AND w.b = u.b
AND w.c = u.c
AND w.d = u.d
AND w.flag ='E')
This should also work:
delete from tableA
where flag in ('A', 'E')
and (a, b, c, d) in
(select a, b, c, d
from tableA
where flag = 'E')
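
To check that the two rewrites above select the same rows, here is a minimal sketch that runs both against a tiny table. It uses Python's sqlite3 module rather than Oracle (an assumption made only so the example runs anywhere), the sample rows are invented, and because SQLite is used the outer table is referenced by name instead of through the alias u. It says nothing about Oracle's execution plan or timing.

```python
import sqlite3

# Invented sample rows: (a, b, c, d, flag)
ROWS = [
    (1, 1, 1, 1, "A"),
    (1, 1, 1, 1, "E"),  # this combination has an E row, so both rows go
    (2, 2, 2, 2, "A"),  # no E row for this combination, so it stays
    (3, 3, 3, 3, "E"),  # an E row on its own is deleted as well
]

# Correlated EXISTS form (outer table referenced by name for SQLite).
DELETE_EXISTS = """
DELETE FROM tableA
WHERE flag IN ('A', 'E')
  AND EXISTS (SELECT 1 FROM tableA w
              WHERE w.a = tableA.a AND w.b = tableA.b
                AND w.c = tableA.c AND w.d = tableA.d
                AND w.flag = 'E')
"""

# Multi-column IN form.
DELETE_IN = """
DELETE FROM tableA
WHERE flag IN ('A', 'E')
  AND (a, b, c, d) IN (SELECT a, b, c, d FROM tableA WHERE flag = 'E')
"""

def remaining_after(delete_sql):
    """Load ROWS into a fresh in-memory table, run the delete, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableA (a, b, c, d, flag)")
    conn.executemany("INSERT INTO tableA VALUES (?, ?, ?, ?, ?)", ROWS)
    conn.execute(delete_sql)
    rows = conn.execute(
        "SELECT a, b, c, d, flag FROM tableA ORDER BY a, flag"
    ).fetchall()
    conn.close()
    return rows

left_exists = remaining_after(DELETE_EXISTS)
left_in = remaining_after(DELETE_IN)
print(left_exists)
print(left_in)
```

Whether either form is faster than the original on 11 million rows depends on the indexes and on the plan Oracle chooses, so it is worth comparing the plans on the real table.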

Related

SQL replace empty row with 0

I am trying to get a list of locks in Oracle. When there are no locks, an empty result is returned. How can I make the query output 0 if there are no rows, and output the required result if there are rows?
SELECT (b.seconds_in_wait) as TIME
FROM sys.v_$session b, sys.dba_blockers c, sys.dba_lock a
WHERE c.holding_session = a.session_id AND c.holding_session = b.sid and (username like '%MOBILE%');
I don't even know where to look for the answer.

Left join your SQL to a dummy row and handle the NULL with the Nvl() function:
WITH
dummy AS
( Select 0 "DUMMY" From Dual),
blocked AS
( -- your SQL using Joins
SELECT b.SECONDS_IN_WAIT "A_TIME"
FROM sys.v_$session b
INNER JOIN sys.dba_lock a ON(a.SESSION_ID = b.SID)
INNER JOIN sys.dba_blockers c ON(c.HOLDING_SESSION = b.SID)
WHERE b.username LIKE('%MOBILE%')
)
SELECT Nvl(b.A_TIME, 0) "A_TIME"
FROM dummy
LEFT JOIN blocked b ON(1 = 1)

Use the query below, which creates a virtual row using DUAL:
WITH SUB_QUERY AS(
SELECT (b.seconds_in_wait) as TIME
FROM sys.v_$session b, sys.dba_blockers c, sys.dba_lock a
WHERE c.holding_session = a.session_id AND c.holding_session = b.sid
and(username like '%MOBILE%'))
select * FROM SUB_QUERY
union all
select 0 FROM DUAL
where NOT EXISTS (SELECT 1 FROM SUB_QUERY);

I would simply query the dual table and outer join the wait time query to it, like so:
WITH wait_time AS
(SELECT (b.seconds_in_wait) AS TIME
FROM sys.v_$session b,
sys.dba_blockers c,
sys.dba_lock a
WHERE c.holding_session = a.session_id
AND c.holding_session = b.sid
AND (username LIKE '%MOBILE%'))
SELECT NVL(wt.time, 0)
FROM dual
LEFT OUTER JOIN wait_time wt ON 1=1;
That way, you'll always get at least one row returned, and you're only querying the wait time query once.
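
The outer-join-to-a-single-row idea can be tried without access to the Oracle dictionary views. The sketch below is only an illustration under assumptions: it uses Python's sqlite3 module, a one-row subquery stands in for DUAL, COALESCE stands in for NVL, and session_waits is a hypothetical table replacing the v$session / dba_blockers / dba_lock join.

```python
import sqlite3

QUERY = """
WITH wait_time AS (
    SELECT seconds_in_wait AS wait_seconds
    FROM session_waits              -- hypothetical stand-in for the lock query
    WHERE username LIKE '%MOBILE%'
)
SELECT COALESCE(wt.wait_seconds, 0) -- COALESCE plays the role of NVL
FROM (SELECT 1) AS one_row          -- one-row source, like Oracle's DUAL
LEFT JOIN wait_time wt ON 1 = 1
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE session_waits (username TEXT, seconds_in_wait INTEGER)"
)

# No matching sessions: the outer join still yields one row, with 0.
no_locks = conn.execute(QUERY).fetchall()

# One matching session: its wait time is returned instead.
conn.execute("INSERT INTO session_waits VALUES ('MOBILE_APP', 42)")
with_lock = conn.execute(QUERY).fetchall()
conn.close()

print(no_locks)
print(with_lock)
```

When several sessions match, the join returns one row per session, which is the same behaviour as the original query.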

SQL Finding duplicate values in two of the three columns of each row

Let's say we have three columns: A, B, and C.
I would like to filter the results as follows:
The values of A and B are the same (duplicated) for > 1 (more than 1) row, and the value of C is always different.
In the attached image, the values that appear selected would meet the conditions mentioned above.
What I've tried:
SELECT
a.notation as A, a.gene as B, b.id as C
FROM
`db-dummy`.sgdata c
join `db-dummy`.g_info a on a.rec_id = c.gen_id
join `db-dummy`.spec_data b on b.rec_id = c.spec_id GROUP BY A, B HAVING COUNT(*) > 1;
I thought that using GROUP BY and HAVING COUNT(*) > 1 I could get the desired result, but I get the following error:
SQL Error [1055] [42000]: (conn=1632) Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'db-dummy.b.spec_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

If you had a single table, I would suggest just using exists. But because you have a join, use window functions. If you are looking for different values of id:
SELECT A, B, C
FROM (SELECT a.notation as A, a.gene as B, b.id as C,
MIN(b.id) OVER (PARTITION BY a.notation, a.gene) as min_id,
MAX(b.id) OVER (PARTITION BY a.notation, a.gene) as max_id
FROM `db-dummy`.sgdata c JOIN
`db-dummy`.g_info a
ON a.rec_id = c.gen_id JOIN
`db-dummy`.spec_data b
ON b.rec_id = c.spec_id
) x
WHERE min_id <> max_id;
If you are just looking for multiple rows for a given A and B, then you can use:
SELECT A, B, C
FROM (SELECT a.notation as A, a.gene as B, b.id as C,
COUNT(*) OVER (PARTITION BY a.notation, a.gene) as cnt
FROM `db-dummy`.sgdata c JOIN
`db-dummy`.g_info a
ON a.rec_id = c.gen_id JOIN
`db-dummy`.spec_data b
ON b.rec_id = c.spec_id
) x
WHERE cnt > 1;
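
The difference between the two window-function queries shows up on a small data set. The sketch below is an assumption-laden illustration: it uses Python's sqlite3 module (window functions need SQLite 3.25 or later) and a hypothetical table named joined that already holds the A, B, C columns the three-table join would produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-joined rows standing in for sgdata/g_info/spec_data.
conn.execute("CREATE TABLE joined (A TEXT, B TEXT, C INTEGER)")
conn.executemany("INSERT INTO joined VALUES (?, ?, ?)", [
    ("n1", "g1", 10),
    ("n1", "g1", 11),  # same A and B, different C
    ("n2", "g2", 20),  # A and B appear only once
    ("n3", "g3", 30),
    ("n3", "g3", 30),  # same A and B, but C is identical
])

# First variant: (A, B) groups whose C values are not all the same.
different_c = conn.execute("""
SELECT A, B, C
FROM (SELECT A, B, C,
             MIN(C) OVER (PARTITION BY A, B) AS min_c,
             MAX(C) OVER (PARTITION BY A, B) AS max_c
      FROM joined) x
WHERE min_c <> max_c
ORDER BY A, B, C
""").fetchall()

# Second variant: any (A, B) group with more than one row.
repeated = conn.execute("""
SELECT A, B, C
FROM (SELECT A, B, C,
             COUNT(*) OVER (PARTITION BY A, B) AS cnt
      FROM joined) x
WHERE cnt > 1
ORDER BY A, B, C
""").fetchall()
conn.close()

print(different_c)
print(repeated)
```

Only the first variant leaves out the n3/g3 group, where A and B repeat but C does not change. Note that MIN/MAX only detects that at least two different C values exist in a group; it does not require every C to be distinct.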

SELECT * FROM `db-dummy`.sgdata a
LEFT JOIN
(SELECT COUNT(Id) as count, notation, gene
FROM `db-dummy`.sgdata
GROUP BY notation, gene
HAVING COUNT(id) > 1) b
on a.notation = b.notation AND a.gene = b.gene

Summarized table in postgreSQL for better performance

I am using postgreSQL as my database. I have a table MASTER(A, B, C, D, N1, N2, N3, N4, N5, N6) where the primary key is (A, B, C, D) and N1, N2, N3, N4, N5, N6 are the numeric columns.
I have a query as below to get the summarized data of each A selected from each list in MASTERCOMB.
SELECT MASTERCOMB.A
,STATS.sumn1
,STATS.sumn2
,STATS.sumn3
,STATS.sumn4
,STATS.sumn5
,STATS.sumn6
FROM (WITH
sum1 AS (SELECT A, SUM(N1) FROM MASTER WHERE B = $1 GROUP BY A ORDER BY SUM(N1) DESC LIMIT $2),
sum2 AS (SELECT A, SUM(N2) FROM MASTER WHERE B = $1 GROUP BY A ORDER BY SUM(N2) DESC LIMIT $2),
sum3 AS (SELECT A, SUM(N3) FROM MASTER WHERE B = $1 GROUP BY A ORDER BY SUM(N3) DESC LIMIT $2),
sum4 AS (SELECT A, SUM(N4) FROM MASTER WHERE B = $1 GROUP BY A ORDER BY SUM(N4) DESC LIMIT $2),
sum5 AS (SELECT A, SUM(N5) FROM MASTER WHERE B = $1 GROUP BY A ORDER BY SUM(N5) DESC LIMIT $2),
sum6 AS (SELECT A, SUM(N6) FROM MASTER WHERE B = $1 GROUP BY A ORDER BY SUM(N6) DESC LIMIT $2)
SELECT DISTINCT COALESCE(sum1.A, sum2.A, sum3.A, sum4.A, sum5.A, sum6.A) A
FROM sum1
FULL OUTER JOIN sum2 ON sum2.A = sum1.A
FULL OUTER JOIN sum3 ON sum3.A = sum1.A
FULL OUTER JOIN sum4 ON sum4.A = sum1.A
FULL OUTER JOIN sum5 ON sum5.A = sum1.A
FULL OUTER JOIN sum6 ON sum6.A = sum1.A) MASTERCOMB
LEFT JOIN (SELECT A
,SUM(N1) sumn1
,SUM(N2) sumn2
,SUM(N3) sumn3
,SUM(N4) sumn4
,SUM(N5) sumn5
,SUM(N6) sumn6
FROM MASTER WHERE B = $1 GROUP BY A) AS STATS
ON STATS.A = MASTERCOMB.A
This is just one kind of query, with B in the WHERE clause. I may have to query with different conditions such as 'WHERE C = $3' or 'WHERE D = $4'. In rare cases I may have to query with conditions on B, C and D combined.
As the table grows, the performance of the queries could drop, so I am considering two approaches:
Approach #1:
Create Summary Tables SMRY_A_B, SMRY_A_C, SMRY_A_D
On each insert, update and delete of MASTER table, SUM the values and insert/update/delete respective tables
Approach #2:
Create a Summary table SMRY_A_B_C_D with primary key (A, B, C, D)
On each insert, update and delete of MASTER table, SUM the values and insert/update/delete SMRY_A_B_C_D table
possible values for SMRY_A_B_C_D could be
(valA, valB, 'N/A', 'N/A', sumn1, sumn2, sumn3, sumn4, sumn5, sumn6)
(valA, 'N/A', valC, 'N/A', sumn1, sumn2, sumn3, sumn4, sumn5, sumn6)
(valA, 'N/A', 'N/A', valD, sumn1, sumn2, sumn3, sumn4, sumn5, sumn6)
Questions:
Which approach is better to go with?
Should I not consider both the approaches and query from the master table itself? If so should I optimize the query?

How can I get the result of one query and use it in the next

1)
SELECT A.SSID FROM T_TABLE_1 A, T_TABLE_2 B WHERE A.SSID = B.SSID AND B.NUMBER = '123456';
2)
delete from T_TABLE_3 where ssid='139729252';
delete from T_TABLE_4 where ssid='139729252';
The result of 1) is an SSID, e.g. '139729252'. How can I use the result of 1) in 2) without copying and pasting it every time? Thanks.

Use the IN operator if you expect the SELECT statement to return more than one record; otherwise you can use the = operator.
Example:
delete from T_TABLE_3
where ssid=(SELECT A.SSID FROM T_TABLE_1 A, T_TABLE_2 B WHERE A.SSID = B.SSID AND B.NUMBER = '123456');
or
delete from T_TABLE_3
where ssid IN (SELECT A.SSID FROM T_TABLE_1 A, T_TABLE_2 B WHERE A.SSID = B.SSID AND B.NUMBER = '123456');
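
A quick way to see the IN form at work is to run it against throwaway tables. The sketch below assumes Python's sqlite3 module instead of the asker's database, and the extra '555' rows are invented so that something is left after the delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_TABLE_1 (SSID TEXT);
CREATE TABLE T_TABLE_2 (SSID TEXT, NUMBER TEXT);
CREATE TABLE T_TABLE_3 (SSID TEXT);
INSERT INTO T_TABLE_1 VALUES ('139729252'), ('555');
INSERT INTO T_TABLE_2 VALUES ('139729252', '123456'), ('555', '999');
INSERT INTO T_TABLE_3 VALUES ('139729252'), ('555');
""")

# The lookup from step 1) feeds the delete from step 2) directly.
conn.execute("""
DELETE FROM T_TABLE_3
WHERE ssid IN (SELECT A.SSID
               FROM T_TABLE_1 A, T_TABLE_2 B
               WHERE A.SSID = B.SSID AND B.NUMBER = '123456')
""")

left = conn.execute("SELECT SSID FROM T_TABLE_3").fetchall()
conn.close()
print(left)
```

The same subquery can be repeated in the delete for T_TABLE_4.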

delete from T_TABLE_3 where ssid in ( select a.ssid from t_table_1 a, t_table_2 b where a.ssid=b.ssid and b.number='123456');

convert if-else statement inside a cursor in a set-based approach

I have a script containing a cursor with an if-else statement, but it takes too much time to browse the table (a table with 79000 rows takes 1 hour).
So I need to convert it to a set-based approach.
The if statement is
IF (
SELECT count (b.key)
FROM general..ean a,
general..mainframe b,
general..hope c
WHERE a.ean = #ean
AND a.c_suppression = '0'
AND a.key = b.key
AND b.key = c.key
AND c.canal = #canal
) = 0
where #ean and #canal are values retrieved from each row by the cursor. The table being browsed is tmp_day_house_info_corporate.
So I need to retrieve all rows from tmp_day_house_info_corporate for which the query in the if statement, given that row's #ean and #canal, returns 0.
Thank you for any help.

SELECT *
FROM tmp_day_house_info_corporate
WHERE not exists(
SELECT b.key
FROM general..ean a,
general..mainframe b,
general..hope c
WHERE a.ean = tmp_day_house_info_corporate.ean
AND a.c_suppression = '0'
AND a.key = b.key
AND b.key = c.key
AND c.canal = tmp_day_house_info_corporate.canal
)
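
To convince yourself that the single NOT EXISTS query returns the same rows as the row-by-row check, both can be run side by side on sample data. The sketch below is an illustration under assumptions: it uses Python's sqlite3 module, drops the general.. database prefix, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ean (ean TEXT, c_suppression TEXT, "key" INTEGER);
CREATE TABLE mainframe ("key" INTEGER);
CREATE TABLE hope ("key" INTEGER, canal TEXT);
CREATE TABLE tmp_day_house_info_corporate (ean TEXT, canal TEXT);
INSERT INTO ean VALUES ('e1', '0', 1), ('e2', '1', 2);
INSERT INTO mainframe VALUES (1), (2);
INSERT INTO hope VALUES (1, 'web'), (2, 'web');
INSERT INTO tmp_day_house_info_corporate VALUES
    ('e1', 'web'), ('e2', 'web'), ('e3', 'shop');
""")

# Cursor style: one COUNT query per row, keep the row when the count is 0.
PER_ROW = """
SELECT COUNT(b."key")
FROM ean a, mainframe b, hope c
WHERE a.ean = ? AND a.c_suppression = '0'
  AND a."key" = b."key" AND b."key" = c."key" AND c.canal = ?
"""
all_rows = conn.execute(
    "SELECT ean, canal FROM tmp_day_house_info_corporate ORDER BY ean"
).fetchall()
loop_result = [
    row for row in all_rows
    if conn.execute(PER_ROW, row).fetchone()[0] == 0
]

# Set-based style: one statement with a correlated NOT EXISTS.
set_result = conn.execute("""
SELECT t.ean, t.canal
FROM tmp_day_house_info_corporate t
WHERE NOT EXISTS (
    SELECT 1
    FROM ean a, mainframe b, hope c
    WHERE a.ean = t.ean AND a.c_suppression = '0'
      AND a."key" = b."key" AND b."key" = c."key" AND c.canal = t.canal)
ORDER BY t.ean
""").fetchall()
conn.close()

print(loop_result)
print(set_result)
```

The set-based form sends one statement to the database instead of one per row, which is where the saving over the cursor comes from.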

COUNT is highly inefficient for checking whether a record exists; you would be much better off with NOT EXISTS:
IF NOT EXISTS (
SELECT *
FROM general..ean a,
general..mainframe b,
general..hope c
WHERE a.ean = #ean
AND a.c_suppression = '0'
AND a.key = b.key
AND b.key = c.key
AND c.canal = #canal
)
If this is still too slow, show your full query and maybe it will be possible to make it set-based.
