I work in PL/SQL Developer with Oracle.
I have this simple SQL query below:
select
col1,
col2,
col3,
col4,
col5
from table t1
(condition required)
and col1=X or col=X or...
and I want to select all the records that share the same col2 and col3 values with at least one other record.
For example:
Record 1: col2=5 col3=orange
Record 2: col2=5 col3=orange
Record 3: col2=8 col3=apple
Record 4: col2=8 col3=apple
Use analytic functions:
select t.*
from (select t.*, count(*) over (partition by col2, col3) as cnt
from t
) t
where cnt > 1
order by col2, col3;
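To see what the analytic query returns, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite (3.25 or later, for window functions) only stands in for Oracle here, and the table `t` and its five rows are made up for illustration, based on the example records in the question plus one non-duplicate row.

```python
import sqlite3

# Hypothetical data: SQLite stands in for Oracle, table `t` is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 integer, col2 integer, col3 text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(1, 5, "orange"), (2, 5, "orange"), (3, 8, "apple"), (4, 8, "apple"), (5, 9, "pear")],
)

# Count the rows in each (col2, col3) group and keep groups with more than one row.
rows = conn.execute(
    """
    select col1, col2, col3
    from (select t.*, count(*) over (partition by col2, col3) as cnt
          from t) t
    where cnt > 1
    order by col2, col3, col1
    """
).fetchall()
print(rows)
```

The row with col2=9 is the only one in its group, so it should be filtered out while both pairs of duplicates are kept.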
select
t1.col1,
t1.col2,
t1.col3,
t1.col4,
t1.col5
from table t1
join table t2 on t1.col2 = t2.col2 and t1.col3 = t2.col3 and t1.rowid <> t2.rowid
where ...
;
If you have a primary key column on the table, use that instead of rowid.
Col1  Col2  Col3
A     B     1
A     B     1
A     B     2
A     B     2
A     c     1
When the Col1 and Col2 values are the same but the Col3 values differ, I don't want those rows in the result set.
I want the result below. I tried row_number, group by, and many other things, but none of them worked. Please help.
Col1  Col2  Col3
A     c     1
You can use exists:
delete from t
where exists (select 1
from t t2
where t2.col1 = t.col1 and t2.col2 = t.col2 and
t2.col3 <> t.col3
);
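As a rough check of the exists approach, the sketch below loads the sample rows from the question into an in-memory SQLite database and deletes every row that has a sibling with the same col1 and col2 but a different col3. The table name `t` and the use of SQLite are assumptions; the question does not say which database is in use.

```python
import sqlite3

# Hypothetical setup: the five sample rows from the question in a table `t`.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 text, col2 text, col3 integer)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [("A", "B", 1), ("A", "B", 1), ("A", "B", 2), ("A", "B", 2), ("A", "c", 1)],
)

# Delete rows whose (col1, col2) group contains more than one col3 value.
conn.execute(
    """
    delete from t
    where exists (select 1
                  from t t2
                  where t2.col1 = t.col1 and t2.col2 = t.col2 and
                        t2.col3 <> t.col3)
    """
)

remaining = conn.execute("select col1, col2, col3 from t").fetchall()
print(remaining)
```

Only the (A, c, 1) row should survive, which matches the result the question asks for.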
You can also use window functions:
with todelete as (
select t.*,
min(col3) over (partition by col1, col2) as min_col3,
max(col3) over (partition by col1, col2) as max_col3
from t
)
delete from todelete
where min_col3 <> max_col3;
The best way is to make these columns a unique composite key. But here is a query that deletes all records other than your desired result.
delete from Table_1
where exists (SELECT 1
FROM Table_1 t2
WHERE t2.Col1 = Table_1.Col1
AND t2.Col2 = Table_1.Col2
GROUP BY t2.Col1, t2.Col2
HAVING Count(*) > 1)
This might not be the most efficient query. If you don't want to delete the duplicated records and just want to retrieve the unique ones:
SELECT Col1,Col2
FROM table_1
GROUP BY Col1, Col2
HAVING Count(*) = 1
To get duplicating records:
SELECT Col2,Col1
FROM table_1
GROUP BY Col1, Col2
HAVING Count(*) > 1
Is there a way to do something like:
Insert Into (col1, col2, col3)
Select col1, col2, col3, max(col4)
From mytable
Group By col1, col2, col3
That gives me: The select list for the INSERT statement contains more items than the insert list.
I want to use the max function to filter out dupes but when I select this extra field, the order of fields and number of fields doesn’t match up. How can I filter a list from a table, use the max function, and insert all records except the ones in the max field?
"I want to use the max function to filter out dupes"
Well, I suspect that you actually want distinct:
insert into my_target_table(col1, col2, col3)
select distinct col1, col2, col3 from my_source_table
This will insert one record in the target table for each distinct (col1, col2, col3) tuple in the source table.
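A small sketch of the select distinct insert, run against in-memory SQLite from Python. The table names follow the answer above; the column types and sample rows are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical source and target tables; the data is invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("create table my_source_table (col1 integer, col2 integer, col3 integer, col4 integer)")
conn.execute("create table my_target_table (col1 integer, col2 integer, col3 integer)")
conn.executemany(
    "insert into my_source_table values (?, ?, ?, ?)",
    [(1, 2, 3, 10), (1, 2, 3, 20), (4, 5, 6, 30)],
)

# col4 is left out of the select list, so rows that differ only in col4 collapse.
conn.execute(
    """
    insert into my_target_table(col1, col2, col3)
    select distinct col1, col2, col3 from my_source_table
    """
)

inserted = conn.execute(
    "select col1, col2, col3 from my_target_table order by col1"
).fetchall()
print(inserted)
```

The two source rows that differ only in col4 should end up as a single target row, with no need to select max(col4) at all.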
You are describing something like this:
Insert Into (col1, col2, col3)
select col1, col2, col3
from mytable t
where t.col4 = (select max(t2.col4)
from mytable t2
where t2.col1 = t.col1 and t2.col2 = t.col2 and t2.col3 = t.col3
);
However, this is pretty much equivalent to select distinct (NULL values might be treated differently). You probably want dupes defined on only one column, so I'm thinking:
insert into (col1, col2, col3)
select col1, col2, col3
from mytable t
where t.col4 = (select max(t2.col4)
from mytable t2
where t2.col1 = t.col1
);
I am trying to debug the code below. It throws an error saying ERROR: syntax error at or near "(".
My aim is to delete duplicate records in the table.
delete FROM (SELECT *,
ROW_NUMBER() OVER (partition BY snapshot,col1,col2,col3,col4,col5) AS rnum
FROM table where snapshot='2019-08-31') as t
WHERE t.rnum > 1;
Try something like this:
DELETE FROM table a
WHERE a.ctid <> (SELECT min(b.ctid)
FROM table b
WHERE a.snapshot = b.snapshot
and a.col1=b.col1 and a.col2=b.col2);
Postgres does not allow deleting from subqueries. You can join in other tables. But in this case, I think a correlated subquery is sufficient, assuming you have a unique id of some sort:
delete from t
where snapshot = '2019-08-31' and
id > (select min(id)
from t t2
where t2.snapshot = t.snapshot and
t2.col1 = t.col1 and
t2.col2 = t.col2 and
t2.col3 = t.col3 and
t2.col4 = t.col4 and
t2.col5 = t.col5
);
Note: This also assumes that the columns are not NULL. You can replace = with is not distinct from if NULLs are a possibility.
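The correlated delete can be tried out with a sketch like the following, which uses in-memory SQLite from Python in place of Postgres. The table is reduced to two data columns to keep the example short, and the `id` column and sample rows are assumptions. None of the values are NULL, so plain = comparisons are enough here.

```python
import sqlite3

# Hypothetical table with a unique id; only col1 and col2 are kept for brevity.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, snapshot text, col1 text, col2 text)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1, "2019-08-31", "a", "b"),
        (2, "2019-08-31", "a", "b"),  # duplicate of id 1 in the target snapshot
        (3, "2019-08-31", "a", "c"),
        (4, "2019-08-30", "a", "b"),
        (5, "2019-08-30", "a", "b"),  # duplicate, but in a different snapshot
    ],
)

# Keep the lowest id in each duplicate group of the chosen snapshot.
conn.execute(
    """
    delete from t
    where snapshot = '2019-08-31' and
          id > (select min(id)
                from t t2
                where t2.snapshot = t.snapshot and
                      t2.col1 = t.col1 and
                      t2.col2 = t.col2)
    """
)

remaining_ids = [row[0] for row in conn.execute("select id from t order by id")]
print(remaining_ids)
```

Only id 2 should be removed: the duplicate pair in the 2019-08-30 snapshot is untouched because of the snapshot filter.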
If you have lots of duplicates and no identity column, you might find it simpler to remove and re-insert the data:
create table temp_snapshot as
select distinct on (col1, col2, col3, col4, col5) t.*
from t
where snapshot = '2019-08-31'
order by col1, col2, col3, col4, col5;
delete from t
where snapshot = '2019-08-31';
insert into t
select *
from temp_snapshot;
If your table is partitioned by snapshot (possibly a very good idea), then you can drop the partition instead and then add the data back in. That process is typically faster than deleting records.
I need to find all the rows where col2 has the same value but col3 has a different value. From the table above, it should return Pk1, Pk3 and Pk4. I tried the following self join, but I see duplicate records.
SELECT T1.COL1,T1.COL2,T1.COl3
FROM Tab T1, Tab T2
WHERE T1.Col2=T2.Col1
AND T1.Col3 <> T2.Col3
;
I would use exists:
select t.*
from t
where exists (select 1 from t t2 where t2.col2 = t.col2 and t2.col3 <> t.col3);
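Since the question's table is not reproduced here, the sketch below uses invented rows that are consistent with the expected answer (Pk1, Pk3 and Pk4) and runs the exists query on in-memory SQLite from Python. The data, the column types, and the choice of SQLite instead of Oracle are all assumptions.

```python
import sqlite3

# Hypothetical rows chosen so that Pk1, Pk3 and Pk4 share col2 but not all col3.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 text primary key, col2 text, col3 text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [("Pk1", "A", "x"), ("Pk2", "B", "y"), ("Pk3", "A", "z"), ("Pk4", "A", "x")],
)

# A row qualifies if some other row has the same col2 and a different col3.
keys = [
    row[0]
    for row in conn.execute(
        """
        select t.col1
        from t
        where exists (select 1 from t t2 where t2.col2 = t.col2 and t2.col3 <> t.col3)
        order by t.col1
        """
    )
]
print(keys)
```

Because exists only tests whether a matching row is present, each qualifying row appears once, which avoids the duplicates produced by the self join.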
Analytic functions are better for this kind of job - they avoid all joins. For example:
select col1, col2, col3
from (
select t.*,
case when min(col3) over (partition by col2) !=
max(col3) over (partition by col2) then 0 end as flag
from tab t
)
where flag = 0;
It is not entirely clear how you want to handle null in col3 - does that count as a "different" value? What if you have null more than once (for the same value in col2)? Also - what if col2 can be null?
Try this:
SELECT COL1,COL2,COL3 FROM
(SELECT COL1,COL2,COL3, COUNT(DISTINCT COL3) OVER (PARTITION BY COL2) CNT
FROM TEST)
WHERE CNT > 1
Cheers!!
Is there a way to rewrite the Oracle query below without using subqueries?
SELECT COL1,COL2 FROM TABLE WHERE COL2 IN (SELECT MAX(COL2) FROM TABLE)
Edit: There is only one table, with COL1 and COL2, and the expected output is the row with the maximum value of COL2.
SELECT COL1,COL2
FROM TABLE
ORDER BY COL2 DESC
FETCH FIRST 1 ROW WITH TIES
This one should also work:
SELECT MAX(COL1) KEEP (DENSE_RANK LAST ORDER BY COL2) as COL1,
MAX(COL2) as COL2
FROM TABLE;
Use ROW_NUMBER with PARTITION BY to find the maximum COL2 for each COL1, as below:
select COL1, COL2 from (
select COL1, COL2, ROW_NUMBER() over(PARTITION BY COL1 ORDER BY COL2 desc) row_num from TABLE
) where row_num=1;
Assuming COL1 can be used as a candidate key:
SELECT T1.COL1,
T1.COL2
FROM TABLE1 T1
INNER JOIN TABLE1 T2
ON (T1.COL1 = T2.COL1
AND T1.COL2 >= T2.COL2)