Remove Duplicates from my SQL Server table

I have a table with the below data.
Col1 Col2
A B
B A
C D
D C
E F
F E
Two rows are considered duplicates if the (Col1, Col2) values of one row equal the (Col2, Col1) values of the other. In the example above, row 1 (A, B) and row 2 (B, A) are duplicates. We need to keep only one row from each such pair.
So the output for the above example would be either
Col1 Col2
A B
C D
E F
or
Col1 Col2
B A
D C
F E
Please help me.
Thanks.

Try this:
rextester: http://rextester.com/XCYU52032
create table tb (col1 char(1), col2 char(1));
insert into tb (col1, col2) values
('a','b')
,('b','a')
,('c','d')
,('d','c')
,('e','f')
,('f','e');
with cte as (
select col1, col2, rn = row_number() over(order by col1, col2)
from tb
)
/*
select x.* -- preview the rows that would be deleted
from cte as x
where exists (
select 1
from cte as y
where y.col1 = x.col2 -- y is x with the columns swapped
and y.col2 = x.col1
and y.rn < x.rn -- returns col1 in ('b','d','f')
--and y.rn > x.rn -- returns col1 in ('a','c','e')
)
--*/
delete x
from cte as x
where exists (
select 1
from cte as y
where y.col1 = x.col2 -- y is x with the columns swapped
and y.col2 = x.col1
and y.rn < x.rn -- deletes col1 in ('b','d','f')
--and y.rn > x.rn -- deletes col1 in ('a','c','e')
);
select * from tb;
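A minimal sketch of the same delete, runnable with Python's built-in sqlite3 module (SQLite 3.25+ for ROW_NUMBER) instead of SQL Server. SQLite cannot delete through a CTE alias, so here the CTE only produces the rowids to delete. The check matches both columns and uses EXISTS, so a row is removed only when its reversed twin appears earlier in the ROW_NUMBER order. The ('g','h') row is a hypothetical addition with no twin, included to show that it is left alone; dedupe_reversed_pairs is an illustrative name.

```python
import sqlite3

def dedupe_reversed_pairs(rows):
    """Delete the later row of each (x, y) / (y, x) pair; return what is left."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tb (col1 TEXT, col2 TEXT)")
    con.executemany("INSERT INTO tb (col1, col2) VALUES (?, ?)", rows)
    con.execute("""
        WITH cte AS (
            SELECT rowid AS rid, col1, col2,
                   ROW_NUMBER() OVER (ORDER BY col1, col2) AS rn
            FROM tb
        )
        DELETE FROM tb
        WHERE rowid IN (
            SELECT x.rid
            FROM cte AS x
            WHERE EXISTS (
                SELECT 1 FROM cte AS y
                WHERE y.col1 = x.col2   -- y is x with the columns swapped
                  AND y.col2 = x.col1
                  AND y.rn < x.rn       -- and y comes first, so x is deleted
            )
        )
    """)
    left = con.execute("SELECT col1, col2 FROM tb ORDER BY col1, col2").fetchall()
    con.close()
    return left

data = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c"),
        ("e", "f"), ("f", "e"), ("g", "h")]  # ('g','h') is hypothetical, unpaired
print(dedupe_reversed_pairs(data))
# [('a', 'b'), ('c', 'd'), ('e', 'f'), ('g', 'h')]
```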

Try this. From each reversed pair it deletes the row whose col1 is greater than its col2:
delete t1
from myTable t1
where t1.col1 > t1.col2 and exists (select 1
from myTable t2
where t2.col1 = t1.col2 and t2.col2 = t1.col1);
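The same idea as a sketch against SQLite through Python's sqlite3 module. SQLite does not accept the T-SQL `delete t1 from ...` form, so the outer table is referenced by name inside the subquery. The ('H','G') row is a hypothetical unpaired row: it has col1 > col2 but no reversed twin, so the EXISTS check keeps it from being deleted.

```python
import sqlite3

def delete_reversed_twins(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE myTable (col1 TEXT, col2 TEXT)")
    con.executemany("INSERT INTO myTable VALUES (?, ?)", rows)
    # Only the col1 > col2 member of a pair is deleted, and only if its twin exists.
    con.execute("""
        DELETE FROM myTable
        WHERE col1 > col2
          AND EXISTS (SELECT 1
                      FROM myTable AS t2
                      WHERE t2.col1 = myTable.col2
                        AND t2.col2 = myTable.col1)
    """)
    kept = con.execute("SELECT col1, col2 FROM myTable ORDER BY col1").fetchall()
    con.close()
    return kept

rows = [("A", "B"), ("B", "A"), ("C", "D"), ("D", "C"),
        ("E", "F"), ("F", "E"), ("H", "G")]  # ('H','G') is hypothetical, unpaired
print(delete_reversed_twins(rows))
# [('A', 'B'), ('C', 'D'), ('E', 'F'), ('H', 'G')]
```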

Related

Need help finding duplicate values for Data Quality checks

I have a table in which each combination of certain attributes must map to a unique record.
col1 col2 col3
a b x
a b y
a c x
a d z
e b w
How do I ensure that each col1+col2 combination has only one col3 value? Here, (a, b) has both x and y as col3 values. I need to send such rows to a reject file and am looking for the right filter query.
We can use an aggregation approach. To identify the rows that fail the uniqueness requirement, use:
WITH cte AS (
SELECT col1, col2
FROM yourTable
GROUP BY col1, col2
HAVING MIN(col3) <> MAX(col3)
)
SELECT t1.*
FROM yourTable t1
INNER JOIN cte t2
ON t2.col1 = t1.col1 AND
t2.col2 = t1.col2;
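To check the filter on the sample rows, here is a sketch that runs the same query in SQLite through Python's sqlite3 module; rejected_rows is a hypothetical helper name. Note that MIN and MAX ignore NULLs, so a group whose only other col3 value is NULL would not be flagged.

```python
import sqlite3

def rejected_rows(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yourTable (col1 TEXT, col2 TEXT, col3 TEXT)")
    con.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)
    result = con.execute("""
        WITH cte AS (
            SELECT col1, col2
            FROM yourTable
            GROUP BY col1, col2
            HAVING MIN(col3) <> MAX(col3)   -- more than one non-NULL col3 value
        )
        SELECT t1.*
        FROM yourTable t1
        INNER JOIN cte t2
            ON t2.col1 = t1.col1 AND t2.col2 = t1.col2
        ORDER BY t1.col1, t1.col2, t1.col3
    """).fetchall()
    con.close()
    return result

data = [("a", "b", "x"), ("a", "b", "y"), ("a", "c", "x"),
        ("a", "d", "z"), ("e", "b", "w")]
print(rejected_rows(data))  # [('a', 'b', 'x'), ('a', 'b', 'y')]
```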

Perform column search on next table if record not found in previous table in db2 sql query

I have three tables with the same columns:
table 1
col1 col2 col3 col4
table 2
col1 col2 col3 col4
table 3
col1 col2 col3 col4
I need to search table1 first. Only if the record is not found there should I search table2, and only if it is not found in table2 either should I search table3. As soon as the record is found in any of these tables, I need to perform a calculation on col4 and return col4 without searching further. I am using DB2 but have not been able to find the exact solution. How can I achieve this?
If you want to keep this as one query, you can use UNION ALL to get the correct table:
SELECT col4, 1 as SortCol
FROM Table1
WHERE col1 = 'whatever'
UNION ALL
SELECT col4, 2 as SortCol
FROM Table2
WHERE col1 = 'whatever'
UNION ALL
SELECT col4, 3 as SortCol
FROM Table3
WHERE col1 = 'whatever'
ORDER BY SortCol
FETCH FIRST 1 ROW ONLY;
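A sketch of the priority idea, using SQLite through Python's sqlite3 module as a stand-in for DB2. SQLite has no FETCH FIRST, so LIMIT 1 takes its place. The keys 'k1' and 'k3' and the col4 values are made-up test data. Because SortCol records which table a row came from, Table1 wins whenever it has a match.

```python
import sqlite3

def first_match(con, key):
    # Returns (col4, table number) from the highest-priority table, or None.
    return con.execute("""
        SELECT col4, 1 AS SortCol FROM Table1 WHERE col1 = ?
        UNION ALL
        SELECT col4, 2 AS SortCol FROM Table2 WHERE col1 = ?
        UNION ALL
        SELECT col4, 3 AS SortCol FROM Table3 WHERE col1 = ?
        ORDER BY SortCol
        LIMIT 1
    """, (key, key, key)).fetchone()

con = sqlite3.connect(":memory:")
for name in ("Table1", "Table2", "Table3"):
    con.execute(f"CREATE TABLE {name} (col1 TEXT, col2 TEXT, col3 TEXT, col4 INTEGER)")
con.execute("INSERT INTO Table1 VALUES ('k1', 'x', 'y', 10)")
con.execute("INSERT INTO Table3 VALUES ('k1', 'x', 'y', 30)")  # shadowed by Table1
con.execute("INSERT INTO Table3 VALUES ('k3', 'x', 'y', 33)")

print(first_match(con, "k1"))  # (10, 1)
print(first_match(con, "k3"))  # (33, 3)
print(first_match(con, "zz"))  # None
```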
EDIT
Another method is possible. Coming from SQL Server, I'm unsure of the exact DB2 syntax, but it would be something like this:
SELECT COALESCE(t1.col4, t2.col4, t3.col4)
FROM (VALUES (@col1, @col2) ) v(col1, col2)
LEFT JOIN Table1 t1 ON t1.col1 = v.col1 AND t1.col2 = v.col2
LEFT JOIN Table2 t2 ON t2.col1 = v.col1 AND t2.col2 = v.col2
AND t1.col4 IS NULL
LEFT JOIN Table3 t3 ON t3.col1 = v.col1 AND t3.col2 = v.col2
AND t1.col4 IS NULL AND t2.col4 IS NULL;
The idea is to use the VALUES clause (or, in DB2, a SELECT ... FROM SYSIBM.SYSDUMMY1) as a driving row, so that a row is returned even when none of the tables contains a match.
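A sketch of the driving-row variant in SQLite through Python's sqlite3 module, where `SELECT ? AS col1, ? AS col2` plays the role of the VALUES row and the ? markers carry the search values. The tables and data are hypothetical: the match exists only in Table2 and Table3, so COALESCE should return Table2's value.

```python
import sqlite3

con = sqlite3.connect(":memory:")
for name in ("Table1", "Table2", "Table3"):
    con.execute(f"CREATE TABLE {name} (col1 TEXT, col2 TEXT, col4 INTEGER)")
con.execute("INSERT INTO Table2 VALUES ('k', 'v', 20)")
con.execute("INSERT INTO Table3 VALUES ('k', 'v', 30)")

def lookup(col1, col2):
    # v is a one-row driving table; each later join only fires if earlier ones missed.
    return con.execute("""
        SELECT COALESCE(t1.col4, t2.col4, t3.col4)
        FROM (SELECT ? AS col1, ? AS col2) AS v
        LEFT JOIN Table1 t1 ON t1.col1 = v.col1 AND t1.col2 = v.col2
        LEFT JOIN Table2 t2 ON t2.col1 = v.col1 AND t2.col2 = v.col2
                           AND t1.col4 IS NULL
        LEFT JOIN Table3 t3 ON t3.col1 = v.col1 AND t3.col2 = v.col2
                           AND t1.col4 IS NULL AND t2.col4 IS NULL
    """, (col1, col2)).fetchone()[0]

print(lookup("k", "v"))  # 20: no Table1 match, so Table2 wins over Table3
print(lookup("x", "y"))  # None: the driving row still yields one row
```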
You can try this:
select col1, col2, col3, col4
from (
select col1, col2, col3, col4, 1 as lvl
from table_1
where some_condition
union all
select col1, col2, col3, col4, 2 as lvl
from table_2
where some_condition
union all
select col1, col2, col3, col4, 3 as lvl
from table_3
where some_condition) as t
order by lvl
fetch first 1 row only
Similar to what the other posters suggest, you could use code like this if you explicitly want to follow your "if then" logic:
CREATE TABLE TABLE_1(C1 INT, C2 INT, C3 INT, C4 INT);
CREATE TABLE TABLE_2(C1 INT, C2 INT, C3 INT, C4 INT);
CREATE TABLE TABLE_3(C1 INT, C2 INT, C3 INT, C4 INT);
WITH
C4(C1, C2, C3, C4) AS (VALUES (1,2,3,4)) -- the values being searched for
, T1 AS ( SELECT '1' AS LVL, * FROM TABLE_1 JOIN C4 USING (C1, C2, C3, C4) )
, T2 AS ( SELECT '2' AS LVL, * FROM TABLE_2 JOIN C4 USING (C1, C2, C3, C4)
WHERE NOT EXISTS (SELECT 1 FROM T1) )
, T3 AS ( SELECT '3' AS LVL, * FROM TABLE_3 JOIN C4 USING (C1, C2, C3, C4)
WHERE NOT EXISTS (SELECT 1 FROM T1) AND NOT EXISTS (SELECT 1 FROM T2) )
, T4 AS ( SELECT '4' AS LVL, * FROM C4 -- not found in any table
WHERE NOT EXISTS (SELECT 1 FROM T1) AND NOT EXISTS (SELECT 1 FROM T2)
AND NOT EXISTS (SELECT 1 FROM T3) )
SELECT * FROM T1 UNION ALL
SELECT * FROM T2 UNION ALL
SELECT * FROM T3 UNION ALL
SELECT * FROM T4

Filter rows if value in one column exists in another column

I have the following table in Postgres 11:
col1 col2 col3 col4
1 trial_1 ag-270 ag
2 trial_2 ag ag
3 trial_3 methotexate (mtx) mtx
4 trial_4 mtx mtx
5 trial_5 hep-nor-b nor-b
I would like to search for each value of col4 throughout col3. If a row's col4 value exists anywhere in col3, I would like to keep that row; otherwise the row should be excluded.
Desired output is:
col1 col2 col3 col4
1 trial_1 ag-270 ag
2 trial_2 ag ag
3 trial_3 methotexate (mtx) mtx
4 trial_4 mtx mtx
I haven't tried anything yet, because I have not been able to find a solution.
If the value in col4 exists in col3, I would like to keep the rows.
... translates to:
SELECT *
FROM tbl a
WHERE EXISTS (SELECT FROM tbl b WHERE b.col3 = a.col4);
Produces your desired result.
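A quick check of the EXISTS filter on the question's data, run in SQLite through Python's sqlite3 module. SQLite does not accept the empty select list that Postgres allows in `SELECT FROM`, so the sketch uses `SELECT 1`. Matching is exact equality, which is why 'nor-b' is not found inside 'hep-nor-b'.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (col1 INTEGER, col2 TEXT, col3 TEXT, col4 TEXT)")
con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", [
    (1, "trial_1", "ag-270", "ag"),
    (2, "trial_2", "ag", "ag"),
    (3, "trial_3", "methotexate (mtx)", "mtx"),
    (4, "trial_4", "mtx", "mtx"),
    (5, "trial_5", "hep-nor-b", "nor-b"),
])
# Keep a row only if some row's col3 equals this row's col4.
kept = con.execute("""
    SELECT col1
    FROM tbl a
    WHERE EXISTS (SELECT 1 FROM tbl b WHERE b.col3 = a.col4)
    ORDER BY col1
""").fetchall()
print([r[0] for r in kept])  # [1, 2, 3, 4]
```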
This can be done as an inner join (DISTINCT removes the repeats produced when a col4 value matches more than one col3 row):
select distinct t.col1, t.col2, t.col3, t.col4
from T t inner join T t2 on t2.col3 = t.col4;
select a.*
from myTable a
where exists (
select 1
from myTable b
where b.col3 = a.col4)
If your table has many rows, you should ensure that col3 is indexed.

Using the names of columns stored within fields to retrieve data from a different table

Is there code to accomplish the following in SQL Server 2008?
Table 1:
id column name
-------------------
1 col1
2 col2
3 col3
4 col2
Table 2:
col1 col2 col3
--------------------
a b c
Result Table:
id data
--------------------
1 a
2 b
3 c
4 b
Thanks in advance; I really have no idea how to do this.
You can UNPIVOT table2 to access the data from the columns:
select t1.id, t2.value
from table1 t1
left join
(
select value, col
from table2
unpivot
(
value
for col in (col1, col2, col3)
) u
) t2
on t1.name = t2.col
Or you can use a UNION ALL to access the data in table2:
select t1.id, t2.value
from table1 t1
left join
(
select col1 value, 'col1' col
from table2
union all
select col2 value, 'col2' col
from table2
union all
select col3 value, 'col3' col
from table2
) t2
on t1.name = t2.col
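A sketch of the UNION ALL approach run in SQLite through Python's sqlite3 module (SQLite has no UNPIVOT, which makes UNION ALL the portable option there). As in the answer, the column-name column of table1 is assumed to be called name.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
con.executemany("INSERT INTO table1 VALUES (?, ?)",
                [(1, "col1"), (2, "col2"), (3, "col3"), (4, "col2")])
con.execute("CREATE TABLE table2 (col1 TEXT, col2 TEXT, col3 TEXT)")
con.execute("INSERT INTO table2 VALUES ('a', 'b', 'c')")
# Each UNION ALL branch turns one column into (value, column-name) rows.
rows = con.execute("""
    SELECT t1.id, t2.value
    FROM table1 t1
    LEFT JOIN (
        SELECT col1 AS value, 'col1' AS col FROM table2
        UNION ALL
        SELECT col2, 'col2' FROM table2
        UNION ALL
        SELECT col3, 'col3' FROM table2
    ) t2 ON t1.name = t2.col
    ORDER BY t1.id
""").fetchall()
print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'b')]
```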
I don't see how you can do it without a column connecting them, for example:
Table1:
ID
ColumnName
Table2:
Table1ID
Letter
Select table1.id, table2.Letter
from table1
inner join table2 on table1.ID = table2.Table1ID
You can do this with a CASE expression and a cross join:
select t1.id,
(case when t1.columnname = 'col1' then t2.col1
when t1.columnname = 'col2' then t2.col2
when t1.columnname = 'col3' then t2.col3
end) as data
from table1 t1 cross join
table2 t2
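A sketch of the CASE approach in SQLite through Python's sqlite3 module, using the question's sample data; the column name columnname follows the answer above. Each branch is listed by hand, so a new column in table2 needs a new WHEN.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (id INTEGER, columnname TEXT)")
con.executemany("INSERT INTO table1 VALUES (?, ?)",
                [(1, "col1"), (2, "col2"), (3, "col3"), (4, "col2")])
con.execute("CREATE TABLE table2 (col1 TEXT, col2 TEXT, col3 TEXT)")
con.execute("INSERT INTO table2 VALUES ('a', 'b', 'c')")
# The cross join pairs every table1 row with the single table2 row;
# CASE then picks the column named in columnname.
rows = con.execute("""
    SELECT t1.id,
           CASE WHEN t1.columnname = 'col1' THEN t2.col1
                WHEN t1.columnname = 'col2' THEN t2.col2
                WHEN t1.columnname = 'col3' THEN t2.col3
           END AS data
    FROM table1 t1 CROSS JOIN table2 t2
    ORDER BY t1.id
""").fetchall()
print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'b')]
```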

SQL Server self join

I have a table as below:
table1
col1 col2 col3
1 A 1
2 B 1
3 A 2
4 D 2
5 X 3
6 G 3
How can I get the result below from the table above? The rows are paired by col3: in table1, A and B share the same col3 value (1), so in the result set they become the two columns of one row. Likewise, A and D share col3 = 2, and X and G share col3 = 3. Can anyone write a SQL query that produces the following result?
col1 col2
A B
A D
X G
SELECT
col1 = t.col2,
col2 = t2.col2
FROM table1 t
INNER JOIN table1 t2 ON t.col3 = t2.col3 AND t.col1 < t2.col1
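A sketch that runs this self-join on the question's data in SQLite through Python's sqlite3 module. SQLite does not support the T-SQL `alias = expression` select syntax, so AS is used instead. ORDER BY t.col3 is added only to make the output order predictable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT, col3 INTEGER)")
con.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                [(1, "A", 1), (2, "B", 1), (3, "A", 2),
                 (4, "D", 2), (5, "X", 3), (6, "G", 3)])
# t.col1 < t2.col1 keeps each pair once, with the earlier row on the left.
pairs = con.execute("""
    SELECT t.col2 AS col1, t2.col2 AS col2
    FROM table1 t
    INNER JOIN table1 t2 ON t.col3 = t2.col3 AND t.col1 < t2.col1
    ORDER BY t.col3
""").fetchall()
print(pairs)  # [('A', 'B'), ('A', 'D'), ('X', 'G')]
```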
SELECT
t1.col2 as col1,
t2.col2
FROM Table1 t1
INNER JOIN Table1 t2 on t1.col3 = t2.col3
WHERE t1.col1 < t2.col1
If you are on SQL Server 2005 or later:
WITH ranked AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY col3 ORDER BY col1)
FROM table1
)
SELECT
col1 = r1.col2,
col2 = r2.col2
FROM ranked r1
INNER JOIN ranked r2 ON r1.col3 = r2.col3
WHERE r1.rn = 1
AND r2.rn = 2
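A sketch of the ROW_NUMBER pairing in SQLite (3.25+ for window functions) through Python's sqlite3 module. pair_up is a hypothetical helper that takes the ORDER BY column, so the two choices can be compared. Ordering by col1 reproduces the question's X, G order. Ordering by col2 sorts the letters alphabetically and yields G, X for col3 = 3.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT, col3 INTEGER)")
con.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                [(1, "A", 1), (2, "B", 1), (3, "A", 2),
                 (4, "D", 2), (5, "X", 3), (6, "G", 3)])

def pair_up(order_col):
    # order_col is interpolated into the SQL, so pass only trusted column names.
    return con.execute(f"""
        WITH ranked AS (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY col3 ORDER BY {order_col}) AS rn
            FROM table1
        )
        SELECT r1.col2 AS col1, r2.col2 AS col2
        FROM ranked r1
        INNER JOIN ranked r2 ON r1.col3 = r2.col3
        WHERE r1.rn = 1 AND r2.rn = 2
        ORDER BY r1.col3
    """).fetchall()

print(pair_up("col1"))  # [('A', 'B'), ('A', 'D'), ('X', 'G')]
print(pair_up("col2"))  # [('A', 'B'), ('A', 'D'), ('G', 'X')]
```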
select
a.col2 as "col1",
b.col2 as "Col2"
from
table1 a
join table1 b on a.col3 = b.col3 and a.col1 < b.col1
This assumes that every distinct value in col3 appears in exactly 2 rows.
DECLARE @table1 TABLE([col1] int, [col2] varchar(10), [col3] int);
INSERT INTO @table1(col1, col2, col3) VALUES(1, 'A', 1);
INSERT INTO @table1(col1, col2, col3) VALUES(2, 'B', 1);
INSERT INTO @table1(col1, col2, col3) VALUES(3, 'A', 2);
INSERT INTO @table1(col1, col2, col3) VALUES(4, 'D', 2);
INSERT INTO @table1(col1, col2, col3) VALUES(5, 'X', 3);
INSERT INTO @table1(col1, col2, col3) VALUES(6, 'G', 3);
SELECT
(SELECT TOP(1) t1.[col2] FROM @table1 AS t1 WHERE t1.[col3] = g.[GroupId] ORDER BY t1.[col1] ASC) AS [a],
(SELECT TOP(1) t2.[col2] FROM @table1 AS t2 WHERE t2.[col3] = g.[GroupId] ORDER BY t2.[col1] DESC) AS [b]
FROM
(SELECT DISTINCT u.col3 AS [GroupId] FROM @table1 AS u) AS g;