How to find records with duplicate values for some specific columns only in Oracle PL/SQL

I work in PL/SQL Developer with Oracle.
I have this simple SQL query below:
select
col1,
col2,
col3,
col4,
col5
from table t1
(condition required)
and col1=X or col=X or...
and I want to select all the records whose col2 and col3 values are identical to those of at least one other record.
For example:
Record 1: col2=5 col3=orange
Record 2: col2=5 col3=orange
Record 3: col2=8 col3=apple
Record 4: col2=8 col3=apple

Use analytic functions:
select t.*
from (select t.*, count(*) over (partition by col2, col3) as cnt
from t
) t
where cnt > 1
order by col2, col3;
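A minimal sketch of the analytic-function approach above, run through Python's built-in sqlite3 module as a stand-in for Oracle (this assumes SQLite 3.25 or later, which added window functions). The table contents are invented for illustration and mirror the example records in the question.

```python
import sqlite3

# SQLite stands in for Oracle here; the table and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 integer, col2 integer, col3 text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(1, 5, "orange"), (2, 5, "orange"), (3, 8, "apple"), (4, 8, "apple"), (5, 9, "pear")],
)

# Count the rows in each (col2, col3) group, then keep only rows whose group has more than one member.
dup_rows = conn.execute(
    """
    select col1, col2, col3
    from (select t.*, count(*) over (partition by col2, col3) as cnt
          from t) as x
    where cnt > 1
    order by col2, col3, col1
    """
).fetchall()

for row in dup_rows:
    print(row)
```

The row with col2=9 and col3=pear has no partner, so it is filtered out; the other four rows are returned.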

select
t1.col1,
t1.col2,
t1.col3,
t1.col4,
t1.col5
from table t1
join table t2 on t1.col2 = t2.col2 and t1.col3 = t2.col3 and t1.rowid <> t2.rowid
where ...
;
If you have a primary key column on the table, use that instead of rowid.

Related

Delete Duplicate record in sql server if 2 columns matching

Col1  Col2  Col3
A     B     1
A     B     1
A     B     2
A     B     2
A     c     1
When the Col1 and Col2 values are the same and the Col3 values are different, I don't want those rows in the result set.
I want the result below. I tried row_number, group by, and many other things, but nothing worked. Please help me here.
Col1  Col2  Col3
A     c     1
You can use exists:
delete from t
where exists (select 1
from t t2
where t2.col1 = t.col1 and t2.col2 = t.col2 and
t2.col3 <> t.col3
);
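As a rough check of the exists-based delete, here is a sketch that loads the sample rows from the question into an in-memory SQLite database (a stand-in for SQL Server, used only so the example is self-contained) and deletes every row that has a sibling with the same col1 and col2 but a different col3.

```python
import sqlite3

# SQLite stands in for SQL Server; the rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 text, col2 text, col3 integer)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [("A", "B", 1), ("A", "B", 1), ("A", "B", 2), ("A", "B", 2), ("A", "c", 1)],
)

# Delete a row when another row shares col1 and col2 but differs in col3.
conn.execute(
    """
    delete from t
    where exists (select 1
                  from t as t2
                  where t2.col1 = t.col1 and t2.col2 = t.col2 and
                        t2.col3 <> t.col3)
    """
)

remaining = conn.execute("select col1, col2, col3 from t").fetchall()
print(remaining)
```

Only the (A, c, 1) row survives, which matches the result the question asks for.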
You can also use window functions:
with todelete as (
select t.*,
min(col3) over (partition by col1, col2) as min_col3,
max(col3) over (partition by col1, col2) as max_col3
from t
)
delete from todelete
where min_col3 <> max_col3;
The best way is to make these columns a unique composite key. But here is a query to delete all records other than your desired result:
delete from Table_1
where exists (select 1
from Table_1 t2
where t2.Col1 = Table_1.Col1 and t2.Col2 = Table_1.Col2
group by t2.Col1, t2.Col2
having count(*) > 1)
This might not be the most optimized and efficient query, but it works. If you don't want to delete the duplicated records and just want to retrieve the unique ones:
SELECT Col1,Col2
FROM table_1
GROUP BY Col1, Col2
HAVING Count(*) = 1
To get duplicating records:
SELECT Col2,Col1
FROM table_1
GROUP BY Col1, Col2
HAVING Count(*) > 1

How Can I Use the Max Function to Filter a List and Insert Into

Is there a way to do something like:
Insert Into (col1, col2, col3)
Select col1, col2, col3, max(col4)
From mytable
Group By col1, col2, col3
That gives me: The select list for the INSERT statement contains more items than the insert list.
I want to use the max function to filter out dupes but when I select this extra field, the order of fields and number of fields doesn’t match up. How can I filter a list from a table, use the max function, and insert all records except the ones in the max field?
"I want to use the max function to filter out dupes"
Well, I suspect that you actually want distinct:
insert into my_target_table(col1, col2, col3)
select distinct col1, col2, col3 from my_source_table
This will insert one record in the target table for each distinct (col1, col2, col3) tuple in the source table.
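A small sketch of the insert ... select distinct pattern, using Python's sqlite3 module as a stand-in database. The table names follow the answer above; the column types and sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database; column types and rows are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("create table my_source_table (col1 text, col2 text, col3 text, col4 integer)")
conn.execute("create table my_target_table (col1 text, col2 text, col3 text)")
conn.executemany(
    "insert into my_source_table values (?, ?, ?, ?)",
    [("a", "b", "c", 1), ("a", "b", "c", 7), ("x", "y", "z", 3)],
)

# col4 is left out of the select list, so the insert list and select list match.
conn.execute(
    """
    insert into my_target_table (col1, col2, col3)
    select distinct col1, col2, col3 from my_source_table
    """
)

target_rows = conn.execute(
    "select col1, col2, col3 from my_target_table order by col1"
).fetchall()
print(target_rows)
```

The two source rows that differ only in col4 collapse into a single target row.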
You are describing something like this:
Insert Into (col1, col2, col3)
select col1, col2, col3
from mytable t
where t.col4 = (select max(t2.col4)
from mytable t2
where t2.col1 = t.col1 and t2.col2 = t.col2 and t2.col3 = t.col3
);
However, this is pretty much equivalent to select distinct (NULL values might be treated differently). You probably want dupes defined on only one column, so I'm thinking:
insert into (col1, col2, col3)
select col1, col2, col3
from mytable t
where t.col4 = (select max(t2.col4)
from mytable t2
where t2.col1 = t.col1
);

using partition by clause in delete statement postgresql

I am trying to debug the code below. It throws an error saying: ERROR: syntax error at or near "(".
My aim is to delete duplicate records in the table.
delete FROM (SELECT *,
ROW_NUMBER() OVER (partition BY snapshot,col1,col2,col3,col4,col5) AS rnum
FROM table where snapshot='2019-08-31') as t
WHERE t.rnum > 1;
Try the following:
DELETE FROM table a
WHERE a.ctid <> (SELECT min(b.ctid)
FROM table b
WHERE a.snapshot = b.snapshot
and a.col1=b.col1 and a.col2=b.col2
and a.col3=b.col3 and a.col4=b.col4 and a.col5=b.col5);
Postgres does not allow deleting from subqueries. You can join in other tables. But in this case, I think a correlated subquery is sufficient, assuming you have a unique id of some sort:
delete from t
where snapshot = '2019-08-31' and
id > (select min(id)
from t t2
where t2.snapshot = t.snapshot and
t2.col1 = t.col1 and
t2.col2 = t.col2 and
t2.col3 = t.col3 and
t2.col4 = t.col4 and
t2.col5 = t.col5
);
Note: This also assumes that the columns are not NULL. You can replace = with is not distinct from if NULLs are a possibility.
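The correlated-subquery delete can be tried out with a sketch like the one below. It uses SQLite through Python's sqlite3 module rather than Postgres, trims the table to two data columns for brevity, and assumes a unique id column; the rows are invented.

```python
import sqlite3

# SQLite stands in for Postgres; the schema is a shortened, hypothetical version of the table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id integer primary key, snapshot text, col1 integer, col2 integer)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1, "2019-08-31", 1, 1),
        (2, "2019-08-31", 1, 1),  # duplicate of id 1
        (3, "2019-08-31", 2, 2),
        (4, "2019-09-30", 1, 1),
        (5, "2019-09-30", 1, 1),  # duplicate, but in a different snapshot
    ],
)

# Within the chosen snapshot, keep the lowest id of each duplicate group and delete the rest.
conn.execute(
    """
    delete from t
    where snapshot = '2019-08-31' and
          id > (select min(id)
                from t t2
                where t2.snapshot = t.snapshot and
                      t2.col1 = t.col1 and
                      t2.col2 = t.col2)
    """
)

remaining_ids = [row[0] for row in conn.execute("select id from t order by id")]
print(remaining_ids)
```

Only id 2 is removed: it is the later copy within the 2019-08-31 snapshot, while the duplicates in the other snapshot are left alone by the snapshot filter.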
If you have lots of duplicates and no identity column, you might find it simpler to remove and re-insert the data:
create table temp_snapshot as
select distinct on (col1, col2, col3, col4, col5) t.*
from t
where snapshot = '2019-08-31'
order by col1, col2, col3, col4, col5;
delete from t
where snapshot = '2019-08-31';
insert into t
select *
from temp_snapshot;
If your table is partitioned by snapshot (possibly a very good idea), then you can drop the partition instead and then add the data back in. That process is typically faster than deleting records.

Get Rows from table where column one has same value and column 2 has a different value

I need to find all the rows where col2 has the same value but col3 has a different value. From the table above, it should return Pk1, Pk3 and Pk4. I tried the following self join, but I see duplicate records.
SELECT T1.COL1,T1.COL2,T1.COl3
FROM Tab T1, Tab T2
WHERE T1.Col2=T2.Col1
AND T1.Col3 <> T2.Col3
;
I would use exists:
select t.*
from t
where exists (select 1 from t t2 where t2.col2 = t.col2 and t2.col3 <> t.col3);
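Here is a sketch of the exists query on a tiny table, run with Python's sqlite3 module. The question's original table is not shown, so the rows below are hypothetical, chosen so that Pk1, Pk3 and Pk4 share a col2 value while col3 differs among them.

```python
import sqlite3

# Hypothetical data: the original table from the question is not available.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 text, col2 integer, col3 text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [("Pk1", 10, "a"), ("Pk2", 20, "b"), ("Pk3", 10, "c"), ("Pk4", 10, "a")],
)

# A row qualifies when some other row has the same col2 but a different col3.
# Each qualifying row is returned once, unlike a self join.
matches = conn.execute(
    """
    select t.col1
    from t
    where exists (select 1 from t t2 where t2.col2 = t.col2 and t2.col3 <> t.col3)
    order by t.col1
    """
).fetchall()

keys = [row[0] for row in matches]
print(keys)
```

Because exists only tests for the presence of a matching row, Pk1 appears once even though it could pair with more than one row in a join.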
Analytic functions are better for this kind of job - they avoid all joins. For example:
select col1, col2, col3
from (
select t.*,
case when min(col3) over (partition by col2) !=
max(col3) over (partition by col2) then 0 end as flag
from tab t
)
where flag = 0;
It is not entirely clear how you want to handle null in col3 - does that count as a "different" value? What if you have null more than once (for the same value in col2)? Also - what if col2 can be null?
Try this:
SELECT COL1,COL2,COL3 FROM
(SELECT COL1,COL2,COL3, COUNT(DISTINCT COL3) OVER (PARTITION BY COL2) CNT
FROM TEST)
WHERE CNT > 1
Cheers!!

Oracle: Alternative to sub-query

Is there an alternative way to rewrite the Oracle query below without using sub-queries?
SELECT COL1,COL2 FROM TABLE WHERE COL2 IN (SELECT MAX(COL2) FROM TABLE)
Edit: There is only 1 table, with COL1 and COL2, and the row with the maximum value of COL2 is the expected output.
SELECT COL1,COL2
FROM TABLE
ORDER BY COL2 DESC
FETCH FIRST 1 ROW WITH TIES
This one should also work:
SELECT MAX(COL1) KEEP (DENSE_RANK LAST ORDER BY COL2) as COL1,
MAX(COL2) as COL2
FROM TABLE;
Use ROW_NUMBER with PARTITION BY to find the maximum COL2 for each COL1, as below:
select COL1, COL2 from (
select COL1, COL2, ROW_NUMBER() over(PARTITION BY COL1 ORDER BY COL2 desc) row_num from TABLE
) where row_num=1;
Assuming COL1 can be used as a candidate key:
SELECT T1.COL1,
T1.COL2
FROM TABLE1 T1
INNER JOIN TABLE1 T2
ON (T1.COL1 = T2.COL1
AND T1.COL2 >= T2.COL2)