I have rows in my table that needs deleting based on a few columns being duplicates.
e.g Col1,Col2,Col3,Col4
If Col1,Col2 and Col3 are duplicates regardless of what value is in Col4 I want both these duplicates deleted. How do I do this?
You can do this using the where clause:
delete from t
where (col1, col2, col3) in (select col1, col2, col3
from t
group by col1, col2, col3
having count(*) > 1
);
Group by these IDs and check with HAVING whether there are duplicates. With the duplicates thus found delete the records.
delete from mytable
where (col1,col2,col3) in
(
select col1,col2,col3
from mytable
group by col1,col2,col3
having count(*) > 1
);
Use EXISTS to remove a row if another row with same col1, col2 and col3 exists with a lower col4 value. I.e keep one col1, col2, col3 row.
delete from tablename t1
where exists (select 1 from tablename t2
where t2.col1 = t1.col1
and t2.col2 = t1.col2
and t2.col3 = t1.col3
and t2.col4 < t1.col4)
To remove both/all rows, skip the col4 condition, do a group by instead:
delete from tablename t1
where exists (select 1 from tablename t2
where t2.col1 = t1.col1
and t2.col2 = t1.col2
and t2.col3 = t1.col3
group by t2.col1, t2.col2, t2.col3
having count(*) > 1)
Related
I would like to understand the easy/better way to join 2 tables with same characteristics and different measures as an example described below:
tab1
Col1
Col2
Measure1
1
1
10
1
2
5
tab2
Col1
Col2
Measure2
1
1
20
2
1
25
Expected Result
Col1
Col2
Measure1
Measure2
1
1
10
20
1
2
5
0
2
1
0
25
Questions:
How to avoid message: Ambiguous column name col1?
How to create a correct Join?
I have tried:
select col1, col2, t1.Measure1, t2.Measure2
from tab1 t1
full outer jon tab2 t2
on t1.col1 = t2.col1
and t1.col2 = t2.col2
I have tried a Union and it works, but i am looking a easy way using joins:
Select col1, col2, Measure1, 0 as Measure2 From tab1
Union
Select col1, col2, 0 as Measure1, Measure2 From tab2
The full join is the correct approach. But you need to disambiguate col1 and col2 in the select clause: both tables have both columns, so it is unclear to which column an unprefixed col1 refers.
A typical approach uses coalesce():
select
coalesce(t1.col1, t2.col1) col1,
coalesce(t1.col2, t2.col2) col2,
coalesce(t1.measure1, 0) measure1,
coalesce(t2.measure2, 0) measure2
from tab1 t1
full outer jon tab2 t2
on t1.col1 = t2.col1 and t1.col2 = t2.col2
Note that you also need coalesce() around the measures to return 0 instead of null on "missing" values.
In some databases (eg Postgres), you can use the using syntax to declare the join conditions for columns that have the same name across the tables ; this syntax automagically disambiguates the unprefixed column names, so:
select
col1,
col2,
coalesce(t1.measure1, 0) measure1,
coalesce(t2.measure2, 0) measure2
from tab1 t1
full join tab2 t2 using (col1, col2)
You should reference source tables for col1 and col2 as well.
As you're using FULL OUTER JOIN I'd recommend using COALESCE statement.
SELECT COALESCE(t1.col1, t2.col1) as col1,
COALESCE(t1.col2, t2.col2) as col2,
t1.Measure1,
t2.Measure2
FROM tab1 t1
FULL OUTER JOIN tab2 t2
on t1.col1 = t2.col1
and t1.col2 = t2.col2
Let's say I have two tables t1 and t2.
t1 has two integer cols col1 (primary) and col2
t2 has two cols a foreign key of t1.col1 and t2.col2
I want to do the following
Retrieve only the records where t1.col2 is unique OR if t1.col2 is duplicate only those if t2.col2 is not null.
Insert the above records into another summary table, let's say t3
This is what I tried:
insert into t3 (col1,col2)
select col1, col2
from t1
where t.col1 in (select A.col1 from t1 as A
group by 1
having count(*) > 1
union
select col1, col2
from t1, t2
where t.col1 in (select A.col1 from t1 as A
group by 1
having count(*) > 1
and t2.col2 is not null;
While the 'union qry' works on its own, the insert is not happening.
Any ideas or any other efficient way to achieve this please
You can select the records you want using:
select t1.*
from (select t1.*, count(*) over (partition by col2) as cnt
from t1
) t1
where cnt = 1 or
exists (select 1 from t2.col1 = t1.col1 and t2.col2 is null);
The rest is just an insert.
I have two tables with the same columns, There are no unique columns for these tables. Lets say the columns are Col1, Col2, Col3 and Col4. The tables are T1 and T2.
What I want to do is insert all the rows from T2 to T1 where Col1 & Col2 combinations do not exist in T1 already. Col1 is a string and Col2 is an int.
So for example Col1 = "APPLE" and Col2 = "2019". If a row contains Col1 = "APPLE" and Col2=2019 in T2 I do not want to insert it into T1 whereas if a row contains Col1 = "APPLE" and Col2=2020 then I want to insert it into T1.
I am trying to find the easiest solution to do this and cant seem to find a straightforward way using INSERT INTO WHERE NOT EXISTS or using UPSERT.
You can use insert ... select ... where not exists with tuple equality to compare (col1, col2):
insert into t1(col1, col2, col3, col4)
select * from t2
where not exists (
select 1 from t1 where (t1.col1, t1.col2) = (t2.col1, t2.col2)
)
With NOT EXISTS:
insert into t1(Col1, Col2, Col3, Col4)
select Col1, Col2, Col3, Col4
from t2
where not exists (
select 1 from t1
where t1.Col1 = t2.Col1 and t1.Col2 = t2.Col2
)
See a simplified demo.
I am having some trouble using a subquery for the IN clause of a query.
Hard-coding the IN .. values allows the query to execute quickly, but using a subquery slows everything down. Is there a way to speed this query up?
SELECT col1, col2, col3 FROM table1
WHERE ...
and col1 in (SELECT col1 FROM table2)
...
*The values for the IN clause will be a list of strings
SELECT col1, col2, col3 FROM table1
WHERE ...
and col1 in ('str1', 'str2', 'str3', ...)
...
The above works fine.
EDIT:
I think I was oversimplifying the problem. The query I am trying to execute looks like this:
SELECT col1, col2, col3, ...
FROM table1 t1, table2 t2
WHERE t1.col1 IN (SELECT col FROM table3)
and t1.col2 < 50
and t2.col3 = t1.col3
...
You cant write select * from . If you give select * from, it doesnot understand which column to compare with from table2. Use the column name you need.
SELECT * FROM table1
WHERE ...
and col1 in (SELECT col1 FROM table2)
...
Use JOIN instead,
and keep an index defined on table1.col1 or table2.col3 or table1.col3 or table3.col :
SELECT col1, col2, col3, ...
FROM table1 t1
INNER JOIN table2 t2 on ( t2.col3 = t1.col3 )
INNER JOIN table3 t3 on ( t1.col1 = t3.col )
WHERE t1.col2 < 50;
Never use commas in the FROM clause. Always use proper, explicit, standard JOIN syntax. You should write the query as:
SELECT col1, col2, col3, ...
FROM table1 t1 JOIN
table2 t2
ON t2.col3 = t1.col3
WHERE t1.col1 IN (SELECT col FROM table3) AND
t1.col2 < 50;
I would write this using EXISTS, rather than IN:
SELECT col1, col2, col3, ...
FROM table1 t1 JOIN
table2 t2
ON t2.col3 = t1.col3
WHERE EXISTS (SELECT 1 FROM table3 t3 WHERE t1.col1 = t3.col) AND
t1.col2 < 50;
The filtering is all on table1; however, the columns are being compared with inequalities. I would try the following indexes: table2(col3), table1(col2, col1), and table3(col).
I have two tables:
Table Tab1 ( Col1, Col2 )
Table Tab2 ( Col1, Col2 )
I want to check in the table Tab2 Col2 is there any same value with the table Tab1 in column Col2.
How to do that?
Using EXISTS
SELECT t2.col2
FROM TABLE2 t2
WHERE EXISTS (SELECT NULL
FROM TABLE1 t1
WHERE t1.col2 = t2.col2)
Using IN
SELECT t2.col2
FROM TABLE2 t2
WHERE t2.col2 IN (SELECT t1.col2
FROM TABLE1 t1)
Using JOIN:
SELECT DISTINCT t2.col2
FROM TABLE2 t2
JOIN TABLE1 t1 ON t1.col2 = t2.col2