How to use SQL DISTINCT to remove duplicates from multiple columns?

Let's say I have a table with lots of duplicated values. I want to remove the duplicates for each column individually. Using DISTINCT removes duplicate combinations of columns so other columns still contain duplicated values.
Original table is:
Col1 | Col2 | Col3
-----+------+------
a1 | b1 | c1
a1 | b2 | c1
a2 | b1 | NULL
a2 | b2 | c1
a3 | b1 | c1
a3 | NULL | NULL
My desired result is:
Col1 | Col2 | Col3
-----+------+------
a1 | b1 | c1
a2 | b2 | NULL
a3 | NULL | NULL
I can get each column's distinct values with several separate queries:
SELECT DISTINCT Col1
FROM TABLE
SELECT DISTINCT Col2
FROM TABLE
SELECT DISTINCT Col3
FROM TABLE
But how can I do it in a single query and return everything in one result set?
Thanks

I'd use a group by...
;WITH c1 AS (
SELECT col1
, ROW_NUMBER() OVER (ORDER BY col1) AS [r]
FROM #foo
WHERE col1 IS NOT NULL
GROUP BY col1
)
, c2 AS (
SELECT col2
, ROW_NUMBER() OVER (ORDER BY col2) as [r]
FROM #foo
WHERE col2 IS NOT NULL
GROUP BY col2
)
, c3 AS (
SELECT col3
, ROW_NUMBER() OVER (ORDER BY col3) as [r]
FROM #foo
WHERE col3 IS NOT NULL
GROUP BY col3
)
select c1.col1
, c2.col2
, c3.col3
from c1 LEFT join c2
on c1.r = c2.r
left join c3
on c1.r = c3.r
ORDER BY c1.r ASC;
I wasn't quite sure from the problem description what you wanted. I crafted this based on the ideal-output provided.
Here is the sample data set I used.
CREATE TABLE #foo (
col1 char(2)
, col2 char(2)
, col3 char(2)
);
INSERT INTO #foo (col1, col2, col3)
VALUES ('a1', 'b2', null)
, ('a1', 'b1', 'c1')
, ('a2', Null, 'c1')
, ('a2', 'b1', null)
, ('a3', null, 'c1')
GO
Here is the dataset and output from the query:
Hope this helps!
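To make the idea behind the CTE query concrete, here is a minimal Python sketch (not the answerer's code) that dedupes each column independently and then pairs the values up by position, which is what joining on the row number does. It uses the table from the question; the helper name distinct_non_null is made up for illustration.

```python
from itertools import zip_longest

# Rows from the question's original table (NULL is modelled as None).
rows = [
    ("a1", "b1", "c1"),
    ("a1", "b2", "c1"),
    ("a2", "b1", None),
    ("a2", "b2", "c1"),
    ("a3", "b1", "c1"),
    ("a3", None, None),
]

def distinct_non_null(values):
    """Sorted distinct values of one column, NULLs dropped (hypothetical helper)."""
    return sorted({v for v in values if v is not None})

# zip(*rows) turns the list of rows into one tuple per column.
columns = [distinct_non_null(col) for col in zip(*rows)]

# Pair values by position; shorter columns are padded with None.
result = list(zip_longest(*columns))

for row in result:
    print(row)
```

One difference worth noting: zip_longest keeps going until the longest column is exhausted, while the SQL above LEFT JOINs from c1, so it would drop values if col2 or col3 had more distinct values than col1.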

You can UNION those three queries together:
SELECT DISTINCT Col1 FROM TABLE
UNION
SELECT DISTINCT Col2 FROM TABLE
UNION
SELECT DISTINCT Col3 FROM TABLE
This requires that all three fields be of the same type (can't mix numbers and strings and dates).
This smells of bad design though. If you find yourself unioning these often then perhaps change your table to look like the UNION'd results.

Related

In BigQuery, how can I check if at least one element from one array is in another array? [duplicate]

I have a column like ['11999999999','12999999999','31999999999'] and another column like ['5511777777777','5512888888888','5531999999999']. I want a CASE WHEN that returns 1 if any item in the first column is in any item of the second column. How can I do this?
Consider below approach
select *, if(exists (
select * from t.col1 intersect distinct
select * from t.col2
), 1, 0) as has_overlap
from your_table t
if applied to sample data like in your question - output is
See if following helps:
with sample as (
select array_agg(col1) as col1, array_agg(col2) as col2
from (
select '11999999999' as col1, '123345567' as col2
union all
select '12999999999' as col1 , '31999999999' as col2
union all
select '31999999999' as col1 , '5512888888888' as col2
)
)
select (
  case when array_length(array(
    (SELECT * FROM UNNEST(sample.col1))
    INTERSECT DISTINCT
    (( SELECT * FROM UNNEST(sample.col2)))
  )) > 0 then true else false end
)
from sample
results => true (because 31999999999 from col1 is in col2 as well)
You can use a JOIN to check whether an element exists in both arrays.
WITH sample AS (
SELECT ['11999999999','12999999999','31999999999' ] col1,
['5511777777777','5512888888888','5531999999999', '11999999999'] col2
)
-- LIMIT 1 keeps the scalar subquery to a single row when several elements match
SELECT (SELECT 1 FROM UNNEST(col1) c1 JOIN UNNEST(col2) c2 ON c1 = c2 LIMIT 1)
FROM sample;
--or
SELECT (SELECT 1 FROM UNNEST(col1) c1, UNNEST(col2) c2 WHERE c1 = c2 LIMIT 1)
FROM sample;
Query results:
+-----+------+
| Row | f0_ |
+-----+------+
| 1 | 1 |
+-----+------+
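The answers above test for exact equality between array elements. With the sample values from the question, no element is exactly equal, but '31999999999' appears inside '5531999999999', so the asker may have meant a substring match. The following Python sketch only illustrates the difference between the two readings; it is not BigQuery code, and the variable names are made up.

```python
# Sample arrays copied from the question.
col1 = ["11999999999", "12999999999", "31999999999"]
col2 = ["5511777777777", "5512888888888", "5531999999999"]

# Reading 1: at least one element is identical in both arrays
# (what INTERSECT DISTINCT or a join on c1 = c2 checks).
exact_overlap = int(bool(set(col1) & set(col2)))

# Reading 2: at least one element of col1 occurs inside an element of col2
# (for example a phone number stored with a "55" country prefix).
substring_overlap = int(any(a in b for a in col1 for b in col2))

print(exact_overlap, substring_overlap)
```

If the second reading is the intended one, the equality condition in the queries above would need to become a suffix or substring comparison instead.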

How to get duplicate records in one table which are not in other table?

I have table1 and table2 as follows:
Table1:
col1 col2
-------------
a1 b1
a2 b1
a3 b2
a4 b3
a5 b3
a5 b4
a5 b2
Table2:
col2 col3
----------
b1 c1
b4 c2
To get all duplicate entries for col2 in table1, I have written following query:
SELECT x.col1,x.col2
FROM table1 x
JOIN (SELECT t.col2
FROM table1 t
GROUP BY t.col2
HAVING COUNT(t.col2) > 1) y ON y.col2 = x.col2
Now I want to remove the entries from above result which are in table2
Expected output:
col1 col2
----------
a3 b2
a4 b3
a5 b3
a5 b2
Query I wrote:
SELECT x.col1,x.col2
FROM table1 x
JOIN (SELECT t.col2
FROM table1 t
GROUP BY t.col2
HAVING COUNT(t.col2) > 1) y ON y.col2 = x.col2
WHERE x.col2 NOT IN (SELECT col2 FROM table2)
I see the expected results using the above query. Is there a more efficient way of achieving the same result? And are there any cases that I could be missing?
Thanks
This script leaves b4 from Table2 because b4 in col2 is not a duplicate in col2 from Table1.
DROP TABLE IF EXISTS Table1
DROP TABLE IF EXISTS Table2
CREATE TABLE Table1
(
col1 VARCHAR(10),
col2 VARCHAR(10)
)
GO
CREATE TABLE Table2
(
col2 VARCHAR(10),
col3 VARCHAR(10)
)
GO
INSERT INTO Table1
VALUES
('a1', 'b1'),
('a2', 'b1'),
('a3', 'b2'),
('a4', 'b3'),
('a5', 'b3'),
('a5', 'b4'),
('a5', 'b2')
INSERT INTO Table2
VALUES
('b1', 'c1'),
('b4', 'c2')
SELECT T2.*
FROM Table2 T2
LEFT JOIN
(
SELECT col2
FROM Table1
GROUP BY col2
HAVING COUNT(*) > 1
) T1 ON T1.col2 = T2.col2
WHERE T1.col2 IS NULL
Depending on which DBMS you're actually using, you could use a window function and an EXISTS expression.
SELECT
*
FROM
(
SELECT
*,
COUNT(*) OVER (PARTITION BY col2) AS occurrences
FROM
table1
)
t1
WHERE
occurrences > 1
AND
NOT EXISTS (
SELECT *
FROM table2
WHERE col2 = t1.col2
)
If your DBMS supports CTEs (Common Table Expressions), you may try the following:
with cte as (
select col1,col2, count(*) over (partition by col2) as cn from Table1
)
select T.col1,T.col2 from cte T
left join Table2 D on T.col2=D.col2
where T.cn>1 and D.col2 is null
See a demo from here.
This works on MySQL 8.0 and above and PostgreSQL.
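On the question of cases that could be missed: one known pitfall of NOT IN is that it returns no rows at all once the subquery yields a NULL, whereas NOT EXISTS and the LEFT JOIN ... IS NULL form are unaffected. The sketch below checks this with Python's built-in sqlite3 module as a stand-in for the asker's DBMS (an assumption; the thread does not say which one is used). The extra NULL row in table2 is invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 TEXT, col2 TEXT);
    CREATE TABLE table2 (col2 TEXT, col3 TEXT);
    INSERT INTO table1 VALUES
        ('a1','b1'), ('a2','b1'), ('a3','b2'), ('a4','b3'),
        ('a5','b3'), ('a5','b4'), ('a5','b2');
    INSERT INTO table2 VALUES ('b1','c1'), ('b4','c2');
""")

# Rows of table1 whose col2 value occurs more than once (the asker's join).
DUPLICATES = """
    SELECT x.col1, x.col2
    FROM table1 x
    JOIN (SELECT col2 FROM table1 GROUP BY col2 HAVING COUNT(*) > 1) y
      ON y.col2 = x.col2
"""
NOT_IN = DUPLICATES + " WHERE x.col2 NOT IN (SELECT col2 FROM table2)"
NOT_EXISTS = DUPLICATES + (
    " WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.col2 = x.col2)"
)

def run(sql):
    """Run a query and return its rows sorted, so results are comparable."""
    return sorted(conn.execute(sql).fetchall())

before_not_in = run(NOT_IN)
before_not_exists = run(NOT_EXISTS)

# Hypothetical extra row: a NULL key in table2.
conn.execute("INSERT INTO table2 VALUES (NULL, 'c3')")
after_not_in = run(NOT_IN)          # NOT IN now filters out every row
after_not_exists = run(NOT_EXISTS)  # NOT EXISTS still returns the four rows

print(before_not_in, after_not_in, after_not_exists, sep="\n")
```

If table2.col2 can never be NULL (for example, it is declared NOT NULL), the NOT IN version is fine as written.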

Row count discrepancy between Intersect and Except queries

I'm getting some strange behaviour using intersect and except. Tb1 has the least rows out of the two tables, and the difference in row count between tb1 and the intersect query results is 143 (intersect = 9782, tb1 = 9925).
But when I run the same query with except, it returns 24 lines. My understanding is that it should have returned 143 rows, being the rows that didn't match in the intersect query. Could someone help me understand why this might be?
There is a possibility that both datasets have multiple duplicate rows (being subset data). Could this be the cause of the difference?
SELECT
amount
,date
FROM tb1
INTERSECT
SELECT
amount
,date
FROM tb2
As you're probably already aware, the difference between UNION and UNION ALL is that the former returns a unique result, while the latter doesn't.
The same can be said about INTERSECT versus INTERSECT ALL.
And also about EXCEPT versus EXCEPT ALL.
So when there are dups, then the totals can be different from what you expect.
Here's a simplified demo to illustrate.
create table TableA (
col1 int not null,
col2 varchar(8)
);
create table TableB (
col1 int not null,
col2 varchar(8)
);
insert into TableA (Col1, Col2) values
(1,'A') -- only A
, (3,'AB') -- 1 in both
, (4,'AAB'), (4,'AAB') -- 2 in A, 1 in B
, (5,'ABB') -- 1 in A, 2 in B
, (6,'AABB'), (6,'AABB') -- 2 in both
, (7, NULL); -- 1 NULL in both
8 rows affected
insert into TableB (Col1, Col2) values
(2,'B') -- only B
, (3,'AB') -- 1 in both
, (4,'AAB') -- 2 in A, 1 in B
, (5,'ABB'), (5,'ABB') -- 1 in A, 2 in B
, (6,'AABB'), (6,'AABB') -- 2 in both
, (7, null); -- 1 NULL in both
8 rows affected
select Col1, Col2 from TableA
intersect
select Col1, Col2 from TableB
order by Col1, Col2
col1 | col2
---: | :---
3 | AB
4 | AAB
5 | ABB
6 | AABB
7 | null
select Col1, Col2 from TableA
intersect all
select Col1, Col2 from TableB
order by Col1, Col2
col1 | col2
---: | :---
3 | AB
4 | AAB
5 | ABB
6 | AABB
6 | AABB
7 | null
select Col1, Col2 from TableA
except
select Col1, Col2 from TableB
order by Col1, Col2
col1 | col2
---: | :---
1 | A
select Col1, Col2 from TableA
except all
select Col1, Col2 from TableB
order by Col1, Col2
col1 | col2
---: | :---
1 | A
4 | AAB
Demo on db<>fiddle here
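To connect the demo back to the row counts in the question, here is a small Python model of the four operators using the same TableA and TableB rows. Sets stand in for the distinct variants and collections.Counter for the ALL variants; this is an illustration of the semantics, not database code. It shows that the table's row count only splits cleanly into "matched" plus "unmatched" with INTERSECT ALL and EXCEPT ALL.

```python
from collections import Counter

# Same rows as TableA and TableB in the demo (NULL is modelled as None).
table_a = [(1, "A"), (3, "AB"), (4, "AAB"), (4, "AAB"),
           (5, "ABB"), (6, "AABB"), (6, "AABB"), (7, None)]
table_b = [(2, "B"), (3, "AB"), (4, "AAB"), (5, "ABB"), (5, "ABB"),
           (6, "AABB"), (6, "AABB"), (7, None)]

# INTERSECT / EXCEPT work on distinct rows, like Python sets.
intersect_rows = len(set(table_a) & set(table_b))
except_rows = len(set(table_a) - set(table_b))

# INTERSECT ALL / EXCEPT ALL keep multiplicities, like Counter arithmetic:
# "&" takes the minimum count per row, "-" subtracts and drops non-positive counts.
intersect_all_rows = sum((Counter(table_a) & Counter(table_b)).values())
except_all_rows = sum((Counter(table_a) - Counter(table_b)).values())

print(len(table_a), intersect_rows, except_rows)          # 8 5 1
print(len(table_a), intersect_all_rows, except_all_rows)  # 8 6 2
```

With the distinct operators, 5 + 1 does not add up to the 8 rows in TableA because duplicate rows are collapsed, which is the same kind of gap as the 143 versus 24 in the question. With the ALL variants, 6 + 2 equals 8.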

sql selecting unique rows based on a specific column

I have a table like this:
Col1 Col2 Col3 Col4
asasa 1 d 44
asasa 2 sd 34
asasa 3 f 3
dssd 4 d 2
sdsdsd 5 sd 11
dssd 1 dd 34
xxxsdsds 2 d 3
erewer 3 sd 3
I am trying to filter out something like this based on Col1
Col1 Col2 Col3 Col4
asasa 1 d 44
dssd 4 d 2
sdsdsd 5 sd 11
xxxsdsds 2 d 3
erewer 3 sd 3
I am trying to get the all unique rows based on the values in Col1. If I have duplicates in Col1, the first row should be taken.
I tried SELECT Col1 FROM tblname GROUP BY Col1 and got the unique Col1 values, but extending it with * gives me an error.
You should be able to achieve your goal using something like the following:
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2) AS rn FROM MyTable
)
SELECT * FROM CTE WHERE rn = 1
This creates a CTE (Common Table Expression) that uses ROW_NUMBER to number the rows within each Col1 group, ordered by Col2.
In the outer select, we then only grab the rows from the CTE where the row number generated is 1.
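One thing to watch with the ROW_NUMBER approach above: "first row" in SQL only has meaning relative to an ORDER BY, and ordering by Col2 does not always match the order in which the rows are listed in the question. The Python sketch below (an illustration with made-up variable names, not SQL) compares "first as listed" with "lowest Col2 per group" on the question's data.

```python
# Rows in the order they are listed in the question.
rows = [
    ("asasa", 1, "d", 44),
    ("asasa", 2, "sd", 34),
    ("asasa", 3, "f", 3),
    ("dssd", 4, "d", 2),
    ("sdsdsd", 5, "sd", 11),
    ("dssd", 1, "dd", 34),
    ("xxxsdsds", 2, "d", 3),
    ("erewer", 3, "sd", 3),
]

# "First row" by listing order: keep the first row seen for each Col1 value.
first_seen = {}
for row in rows:
    first_seen.setdefault(row[0], row)
by_position = list(first_seen.values())

# "First row" as ROW_NUMBER() ... ORDER BY Col2 defines it: lowest Col2 per group.
lowest_col2 = {}
for row in rows:
    key = row[0]
    if key not in lowest_col2 or row[1] < lowest_col2[key][1]:
        lowest_col2[key] = row
by_col2 = list(lowest_col2.values())

print(by_position[1])  # ('dssd', 4, 'd', 2), as in the desired output
print(by_col2[1])      # ('dssd', 1, 'dd', 34)
```

The two results differ only for dssd. To reproduce the listing order in SQL, the table would need a column that records it (an identity or timestamp column, say) to use in the ORDER BY.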
Try this
;WITH CTE AS (
SELECT *,
-- ORDER BY (SELECT NULL) means an arbitrary row is picked per Col1 value
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY (SELECT NULL)) AS RN
FROM tblname
)
SELECT Col1, Col2, Col3, Col4 FROM CTE WHERE RN = 1;
Depending on the flavor of SQL that you are using, window functions may help you.
In SQL Server, this can be accomplished with the FIRST_VALUE window function like so:
DROP TABLE IF EXISTS #vals;
CREATE TABLE #vals (COL1 VARCHAR(10), COL2 INT, COL3 VARCHAR(5), COL4 INT);
INSERT INTO #vals (COL1, COL2, COL3, COL4)
VALUES ('asasa', 1, 'd', 44),
('asasa', 2, 'sd', 34),
('asasa', 3, 'f', 3),
('dssd' , 4, 'd', 2),
('sdsdsd', 5, 'sd', 11),
('dssd', 1, 'dd', 34),
('xxxsdsds', 2, 'd', 3),
('erewer', 3, 'sd', 3);
SELECT *
FROM #vals
SELECT DISTINCT COL1,
FIRST_VALUE(COL2) OVER (PARTITION BY COL1 ORDER BY Col1) AS Col2,
FIRST_VALUE(COL3) OVER (PARTITION BY COL1 ORDER BY Col1) AS Col3,
FIRST_VALUE(COL4) OVER (PARTITION BY COL1 ORDER BY Col1) AS Col4
FROM #vals AS v1
This returns:
|COL1 | Col2 | Col3 | Col4|
|-----------|-----------|-----------|-------|
|asasa | 1 | d | 44 |
|dssd | 4 | d | 2 |
|erewer | 3 | sd | 3 |
|sdsdsd | 5 | sd | 11 |
|xxxsdsds | 2 | d | 3 |
which may then be ORDERed in whatever way is needed.
SELECT DISTINCT should do the trick. Here is a good reference: https://www.w3schools.com/sql/sql_distinct.asp

How to select last three non-NULL columns across multiple columns

For example, if my dataset looks like this:
id | col1 | col2 | col3 | col4 | col5 | col6
---+------+------+------+------+------+-----
A | a1 | a2 | a3 | a4 | a5 | a6
B | b1 | b2 | b3 | b4 | NULL | NULL
C | c1 | c2 | c3 | NULL | NULL | NULL
The desired output would be:
id | col1 | col2 | col3 | col4 | col5 | col6
---+------+------+------+------+------+-----
A | a4 | a5 | a6 |
B | b2 | b3 | b4 |
C | c1 | c2 | c3 |
Does anyone know how to achieve that?
I just found this thread: https://dba.stackexchange.com/questions/210431/select-first-and-last-non-empty-blank-column-of-a-record-mysql
This lets me pick the last non-NULL column, but I have no idea how to get the second- and third-to-last columns at the same time.
This will do what you request (db <> fiddle)
Edit: The initial version probably didn't do what you want if there were fewer than three NOT NULL values in a row. This version will shift them left.
SELECT Id,
CA.Col1,
CA.Col2,
CA.Col3,
NULL AS Col4,
NULL AS Col5,
NULL AS Col6
FROM YourTable
CROSS APPLY (SELECT MAX(CASE WHEN RN = 1 THEN val END) AS Col1,
MAX(CASE WHEN RN = 2 THEN val END) AS Col2,
MAX(CASE WHEN RN = 3 THEN val END) AS Col3
FROM (SELECT val,
ROW_NUMBER() OVER (ORDER BY ord) AS RN
FROM
(SELECT TOP 3 *
FROM (VALUES(1, col1),
(2, col2),
(3, col3),
(4, col4),
(5, col5),
(6, col6) ) v(ord, val)
WHERE val IS NOT NULL
ORDER BY ord DESC
) d1
) d2
) CA
You can also use PIVOT and UNPIVOT to achieve the desired result.
Try the following:
;with cte as
(
select id, cols, col as val, ROW_NUMBER() over (partition by id order by cols desc) rn
from #t
unpivot
(
col for cols in ([col1], [col2], [col3], [col4], [col5], [col6])
)upvt
)
select id, ISNULL([3], '') as col1, ISNULL([2], '') as col2, ISNULL([1], '') as col3, '' col4, '' col5, '' col6
from
(
select id, val, rn from cte
)t
pivot
(
max(val) for rn in ([1], [2], [3])
)pvt
order by 1
Please find the db<>fiddle here.
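Both answers follow the same steps per row: drop the NULLs, keep the last three remaining values in their original order, and pad on the right if fewer than three are left. A minimal Python sketch of that logic, using the rows from the question (the function name last_three_non_null is made up for illustration):

```python
# Rows from the question, keyed by id (NULL is modelled as None).
rows = {
    "A": ["a1", "a2", "a3", "a4", "a5", "a6"],
    "B": ["b1", "b2", "b3", "b4", None, None],
    "C": ["c1", "c2", "c3", None, None, None],
}

def last_three_non_null(values):
    """Return the last three non-NULL values in column order, padded with None."""
    present = [v for v in values if v is not None]
    tail = present[-3:]
    return tail + [None] * (3 - len(tail))

result = {row_id: last_three_non_null(values) for row_id, values in rows.items()}

for row_id, values in result.items():
    print(row_id, values)
```

A row with fewer than three non-NULL values ends up shifted left, for example ['x', None, None], which matches the behaviour described in the edit note of the CROSS APPLY answer.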