How to select non-distinct rows with a distinct on multiple columns

I have found many answers on selecting non-distinct rows that group by a single column, for example e-mail. However, there seems to have been an issue in our system where we are getting some duplicate data in which everything is the same except the identity column.
SELECT DISTINCT
COLUMN1,
COLUMN2,
COLUMN3,
...
COLUMN14
FROM TABLE1
How can I get the non-distinct rows from the query above? Ideally the result would include the identity column, which is obviously missing from the DISTINCT query.

select COLUMN1,COLUMN2,COLUMN3
from TABLE_NAME
group by COLUMN1,COLUMN2,COLUMN3
having COUNT(*) > 1

With _cte (col1, col2, col3, cnt) As
(
Select col1, col2, col3, Count(*)
From mySchema.myTable
Group By col1, col2, col3
Having Count(*) > 1
)
Select t.*
From _cte As c
Join mySchema.myTable As t
On c.col1 = t.col1
And c.col2 = t.col2
And c.col3 = t.col3
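
The join-back idea can be tried without a database server. The following is a minimal sketch using Python's built-in sqlite3 module; the table my_table, its columns, and the sample rows are invented for illustration, and only the shape of the query follows the answer above.

```python
import sqlite3

# In-memory database with a hypothetical table: id is the identity column,
# col1..col3 stand in for the columns that may be duplicated.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT)"
)
conn.executemany(
    "INSERT INTO my_table (id, col1, col2, col3) VALUES (?, ?, ?, ?)",
    [
        (1, "a", "x", "p"),
        (2, "a", "x", "p"),  # duplicate of id 1
        (3, "b", "y", "q"),  # unique
        (4, "c", "z", "r"),
        (5, "c", "z", "r"),  # duplicate of id 4
    ],
)

# Find the duplicated combinations, then join back to fetch the full rows,
# identity column included.
dup_rows = conn.execute(
    """
    WITH dup (col1, col2, col3, cnt) AS (
        SELECT col1, col2, col3, COUNT(*)
        FROM my_table
        GROUP BY col1, col2, col3
        HAVING COUNT(*) > 1
    )
    SELECT t.id, t.col1, t.col2, t.col3
    FROM dup AS d
    JOIN my_table AS t
      ON d.col1 = t.col1 AND d.col2 = t.col2 AND d.col3 = t.col3
    ORDER BY t.id
    """
).fetchall()

print(dup_rows)
```

One caveat worth keeping in mind: `=` never matches NULL, so a duplicated group whose key columns contain NULL is found by the GROUP BY but dropped by the join.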

SELECT * FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY COL1, COL2, ..., COLN ORDER BY COLM) RN
FROM TABLE_NAME
) T
WHERE T.RN > 1
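
A point that is easy to miss with the ROW_NUMBER() form is that RN > 1 returns only the surplus copies, not the first row of each duplicated group. A small sketch with Python's sqlite3 module illustrates this; it assumes an SQLite build with window-function support (3.25 or later), and the table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "a", "x"), (4, "b", "y")],
)

# Number the rows inside each (col1, col2) group, lowest id first,
# and keep everything after the first row of the group.
surplus = conn.execute(
    """
    SELECT id, col1, col2, rn
    FROM (
        SELECT id, col1, col2,
               ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY id) AS rn
        FROM my_table
    ) AS t
    WHERE rn > 1
    ORDER BY id
    """
).fetchall()

print(surplus)  # ids 2 and 3 only; id 1 is the "kept" copy and is not listed
```

This makes the form convenient as the input to a cleanup DELETE, while the GROUP BY and join form is the better fit when every member of a duplicated group should be shown.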

Related

SQL with having statement now want complete rows

Here is a mock table
MYTABLE ROWS
PKEY 1,2,3,4,5,6
COL1 a,b,b,c,d,d
COL2 55,44,33,88,22,33
I want to know which rows have duplicated COL1 values:
select col1, count(*)
from MYTABLE
group by col1
having count(*) > 1
This returns :
b,2
d,2
I now want all the rows that contain b and d. Normally I would use a WHERE ... IN clause, but with the count column in the result I am not certain what type of statement to use.

maybe you need
select * from MYTABLE
where col1 in
(
select col1
from MYTABLE
group by col1
having count(*) > 1
)

Use a CTE and a windowed aggregate:
WITH CTE AS(
SELECT Pkey,
Col1,
Col2,
COUNT(1) OVER (PARTITION BY Col1) AS C
FROM dbo.YourTable)
SELECT PKey,
Col1,
Col2
FROM CTE
WHERE C > 1;

There are lots of ways to solve this; here's another:
select * from MYTABLE
join
(
select col1, count(*) as cnt
from MYTABLE
group by col1
having count(*) > 1
) s on s.col1 = mytable.col1;
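
As a quick check, the three approaches above can be run against the mock table from the question. This sketch uses Python's sqlite3 module rather than SQL Server, so the dbo schema prefix is dropped, and the windowed variant assumes SQLite 3.25 or later; the dictionary keys are just labels chosen here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (pkey INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "a", 55), (2, "b", 44), (3, "b", 33), (4, "c", 88), (5, "d", 22), (6, "d", 33)],
)

queries = {
    # Subquery with HAVING feeding a WHERE ... IN
    "where_in": """
        SELECT pkey, col1, col2 FROM mytable
        WHERE col1 IN (SELECT col1 FROM mytable GROUP BY col1 HAVING COUNT(*) > 1)
        ORDER BY pkey
    """,
    # Windowed aggregate inside a CTE
    "windowed_count": """
        WITH cte AS (
            SELECT pkey, col1, col2, COUNT(*) OVER (PARTITION BY col1) AS c
            FROM mytable
        )
        SELECT pkey, col1, col2 FROM cte WHERE c > 1 ORDER BY pkey
    """,
    # Join back to the grouped derived table
    "join_back": """
        SELECT t.pkey, t.col1, t.col2
        FROM mytable AS t
        JOIN (SELECT col1, COUNT(*) AS cnt
              FROM mytable GROUP BY col1 HAVING COUNT(*) > 1) AS s
          ON s.col1 = t.col1
        ORDER BY t.pkey
    """,
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

All three return the rows with PKEY 2, 3, 5 and 6, which are the complete rows for b and d.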

How do I SELECT two distinct columns?

I want to select the distinct combinations of col1 and col2, ordered by id.
I'm struggling to do this because when I write the following SQL query...
SELECT DISTINCT col1, col2
FROM table
ORDER BY id
I can't ORDER BY id because it's not in the SELECT list, but if I put id in the SELECT list, DISTINCT applies to id, col1 and col2 together, which is basically the whole table since the id column is unique.
How do I do this?

You can use aggregation, and put an aggregate function in the order by clause:
select col1, col2 from mytable group by col1, col2 order by min(id) limit 10
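
To see how ordering by an aggregate behaves, here is a small sketch using Python's sqlite3 module. The table contents are invented; each distinct (col1, col2) pair is ordered by the smallest id at which it first appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "b", 2), (2, "a", 1), (3, "b", 2), (4, "a", 1), (5, "c", 3)],
)

# GROUP BY collapses duplicates; MIN(id) gives each group a single id to sort on.
pairs = conn.execute(
    "SELECT col1, col2 FROM mytable GROUP BY col1, col2 ORDER BY MIN(id)"
).fetchall()

print(pairs)  # ('b', 2) comes first because it first appears at id 1
```

Using MAX(id) instead would order each pair by its last appearance.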

This is one way to do it:
select A.col1, A.col2
from
(select id, col1, col2
from Tablet) A
left join
(select min(id) id2, col1, col2
from Tablet
GROUP BY COL1, COL2) B
on A.COL1 = B.COL1 AND A.COL2 = B.COL2
where A.id = B.id2
order by A.id
LIMIT 4;

Distinct over multiple columns in SQL Server

How to apply distinct on multiple columns in SQL Server? The query that I have tried below does not work on SQL Server.
select distinct(column1, column2), column3
from table_name

select distinct applies to all columns in the row. So, you can do:
select distinct col1, col2, col3
from t;
If you only want col1 and col2 to be distinct, then group by works:
select col1, col2, min(col3)
from t
group by col1, col2;
Or if you want random rows, you can use row_number(). For instance:
select t.*
from (select t.*,
row_number() over (partition by col1, col2 order by newid()) as seqnum
from t
) t
where seqnum = 1;
A clever version of this doesn't require a subquery:
select top (1) with ties t.*
from t
order by row_number() over (partition by col1, col2 order by newid());

Pgsql Delete rows with some columns (not all) duplicate

Table columns: col_pk, col1, col2, col3, col4, col_date_updated
This table has some rows with duplicate values in col2 and col3.
For each such group I want to keep only the row whose col_date_updated is the latest (max).
Eg:
col_pk, col1, col2, col3, col4, col_date_updated
1, A, hello, now, 200.00, 2017-12-12 15:09:44.437546
2, B, hello, now, 490.00, 2017-12-12 15:09:42.437065
3, C, hi, now, 300.00, 2017-12-12 15:09:41.436617
4, D, hello, now, 250.00, 2017-12-12 15:09:45.436617
5, E, hi, now, 250.00, 2017-12-12 10:09:41.436617
Expected Result:
col_pk, col1, col2, col3, col4, col_date_updated
3, C, hi, now, 300.00, 2017-12-12 15:09:41.436617
4, D, hello, now, 250.00, 2017-12-12 15:09:45.436617

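Check this.
SELECT DISTINCT ON (col2, col3) t.*
FROM yourtable t
ORDER BY col2, col3, col_date_updated DESC
Apply DISTINCT ON to col2 and col3, since those are the columns you want unique. PostgreSQL requires the DISTINCT ON expressions to come first in the ORDER BY; the trailing col_date_updated DESC then keeps the latest row of each group.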

If you just want to select to get your expected output, then ROW_NUMBER comes in handy:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY col2, col3
ORDER BY col_date_updated DESC) rn
FROM yourTable
)
SELECT col_pk, col1, col2, col3, col4, col_date_updated
FROM cte
WHERE rn = 1;
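
The ROW_NUMBER query can be checked against the sample data from the question. This is a sketch run through Python's sqlite3 module instead of PostgreSQL, assuming SQLite 3.25 or later for window functions; timestamps are stored as ISO-formatted text, which sorts in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (col_pk INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, "
    "col3 TEXT, col4 REAL, col_date_updated TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "A", "hello", "now", 200.00, "2017-12-12 15:09:44.437546"),
        (2, "B", "hello", "now", 490.00, "2017-12-12 15:09:42.437065"),
        (3, "C", "hi", "now", 300.00, "2017-12-12 15:09:41.436617"),
        (4, "D", "hello", "now", 250.00, "2017-12-12 15:09:45.436617"),
        (5, "E", "hi", "now", 250.00, "2017-12-12 10:09:41.436617"),
    ],
)

# rn = 1 marks the most recently updated row of each (col2, col3) group.
latest = conn.execute(
    """
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY col2, col3
                                     ORDER BY col_date_updated DESC) AS rn
        FROM yourtable
    )
    SELECT col_pk, col1, col2, col3, col4, col_date_updated
    FROM cte
    WHERE rn = 1
    ORDER BY col_pk
    """
).fetchall()

for row in latest:
    print(row)
```

The rows with col_pk 3 and 4 come back, matching the expected result in the question.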
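If you instead want to delete the other records, the same CTE can be attached to a DELETE statement:
WITH cte AS (SELECT col_pk, ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY col_date_updated DESC) rn FROM yourTable)
DELETE FROM yourTable WHERE col_pk IN (SELECT col_pk FROM cte WHERE rn > 1);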

You could try something like this.
SELECT t.*
FROM yourtable t
WHERE col_date_updated IN (SELECT MAX (col_date_updated)
FROM yourtable i
WHERE t.col2 = i.col2 AND t.col3 = i.col3);
If you wish to delete the other records, you may use this:
DELETE
FROM yourtable t
WHERE col_date_updated NOT IN (SELECT MAX (col_date_updated)
FROM yourtable i
WHERE t.col2 = i.col2 AND t.col3 = i.col3);

If you want to suppress all but the most recent rows for any {col2,col3}:
SELECT *
FROM thetable zt
WHERE NOT EXISTS (
-- If a record exists with the same col2,col3,
-- but a more recent date than zt.col_date_updated
-- then zt.* cannot be the most recent one
SELECT *
FROM thetable nx
WHERE nx.col2 = zt.col2 -- same value
AND nx.col3 = zt.col3 -- same value
AND nx.col_date_updated > zt.col_date_updated -- more recent
);
If you want to physically delete all but the most recent rows for the same {col2,col3}:
DELETE
FROM thetable zt
WHERE EXISTS (
-- If a record exists with the same col2,col3,
-- but a more recent date than zt.col_date_updated
-- then zt.* cannot be the most recent one
-- and we can delete zt.
SELECT *
FROM thetable nx
WHERE nx.col2 = zt.col2 -- same value
AND nx.col3 = zt.col3 -- same value
AND nx.col_date_updated > zt.col_date_updated -- more recent
);
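
The EXISTS-based delete can be exercised on the question's sample data as well. The sketch below uses Python's sqlite3 module in place of PostgreSQL and refers to the outer table by its name instead of the zt alias, which is a choice made here to keep the statement portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thetable (col_pk INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, "
    "col3 TEXT, col4 REAL, col_date_updated TEXT)"
)
conn.executemany(
    "INSERT INTO thetable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "A", "hello", "now", 200.00, "2017-12-12 15:09:44.437546"),
        (2, "B", "hello", "now", 490.00, "2017-12-12 15:09:42.437065"),
        (3, "C", "hi", "now", 300.00, "2017-12-12 15:09:41.436617"),
        (4, "D", "hello", "now", 250.00, "2017-12-12 15:09:45.436617"),
        (5, "E", "hi", "now", 250.00, "2017-12-12 10:09:41.436617"),
    ],
)

# Delete every row for which a more recent row with the same (col2, col3) exists.
cur = conn.execute(
    """
    DELETE FROM thetable
    WHERE EXISTS (
        SELECT 1
        FROM thetable AS nx
        WHERE nx.col2 = thetable.col2
          AND nx.col3 = thetable.col3
          AND nx.col_date_updated > thetable.col_date_updated
    )
    """
)
deleted = cur.rowcount

remaining = [r[0] for r in conn.execute("SELECT col_pk FROM thetable ORDER BY col_pk")]
print(deleted, remaining)
```

Because the comparison is a strict `>`, two rows that share the same latest timestamp within a group would both survive; a tie-breaker on col_pk would be needed to keep exactly one.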

This should be a fast way. The SELECT lists the rows that are not the latest in their (col2, col3) group:
SELECT * FROM tablename WHERE col_pk IN
(SELECT col_pk FROM
(SELECT col_pk, ROW_NUMBER() OVER (partition BY col2, col3 ORDER BY col_date_updated DESC) AS rnum
FROM tablename) t
WHERE t.rnum > 1);
If you want to delete them:
DELETE FROM tablename WHERE col_pk IN
(SELECT col_pk FROM
(SELECT col_pk, ROW_NUMBER() OVER (partition BY col2, col3 ORDER BY col_date_updated DESC) AS rnum
FROM tablename) t
WHERE t.rnum > 1);

SQL query to simulate distinct

SELECT DISTINCT col1, col2 FROM table t ORDER BY col1;
This gives me the distinct combinations of col1 and col2. Is there an alternative way of writing the Oracle SQL query to get the unique combinations of col1 and col2 without using the keyword DISTINCT?

Use the UNIQUE keyword which is a synonym for DISTINCT:
SELECT UNIQUE col1, col2 FROM table t ORDER BY col1;

I don't see why you would want to, but you could do
SELECT col1, col2 FROM table_t GROUP BY col1, col2 ORDER BY col1

Another - yet overly complex and somewhat useless - solution:
select *
from (
select col1,
col2,
row_number() over (partition by col1, col2 order by col1, col2) as rn
from the_table
)
where rn = 1
order by col1

select col1, col2
from table
group by col1, col2
order by col1
or a less elegant way:
select col1,col2 from table
UNION
select col1,col2 from table
order by col1;
or an even less elegant way (note that it also drops rows where col1 is NULL):
select a.col1, a.col2
from (select col1, col2 from table
UNION
select NULL, NULL) a
where a.col1 is not null
order by a.col1
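
To compare these alternatives with plain DISTINCT, the following sketch runs them on a small invented table through Python's sqlite3 module. SQLite stands in for Oracle here, and the table name t and its rows are hypothetical; a group with a NULL col1 is included on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 1), ("a", 1), ("b", 2), (None, 9), (None, 9)],
)

distinct = conn.execute("SELECT DISTINCT col1, col2 FROM t ORDER BY col1").fetchall()
group_by = conn.execute(
    "SELECT col1, col2 FROM t GROUP BY col1, col2 ORDER BY col1"
).fetchall()
union = conn.execute(
    "SELECT col1, col2 FROM t UNION SELECT col1, col2 FROM t ORDER BY col1"
).fetchall()

# The UNION-with-(NULL, NULL) trick filters on col1 IS NOT NULL afterwards,
# so any genuine row with a NULL col1 disappears along with the dummy row.
null_trick = conn.execute(
    """
    SELECT a.col1, a.col2
    FROM (SELECT col1, col2 FROM t
          UNION
          SELECT NULL, NULL) AS a
    WHERE a.col1 IS NOT NULL
    ORDER BY a.col1
    """
).fetchall()

print(set(distinct) == set(group_by) == set(union))
print(null_trick)
```

GROUP BY and UNION give the same set of rows as DISTINCT, while the last variant returns one combination fewer on this data.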

Yet another ...
select
col1,
col2
from
table t1
where
not exists (select *
from table t2
where t2.col1 = t1.col1 and
t2.col2 = t1.col2 and
t2.rowid > t1.rowid)
order by
col1;

Variations on the UNION solution by @aF.:
INTERSECT:
SELECT col1, col2 FROM tableX
INTERSECT
SELECT col1, col2 FROM tableX
ORDER BY col1;
MINUS:
SELECT col1, col2 FROM tableX
MINUS
SELECT col1, col2 FROM tableX WHERE 0 = 1
ORDER BY col1;
MINUS (2nd version; it returns one row fewer than the other versions if there is a (NULL, NULL) group):
SELECT col1, col2 FROM tableX
MINUS
SELECT NULL, NULL FROM dual
ORDER BY col1;
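
The set-operator variations can be sketched the same way. SQLite, used here through Python's sqlite3 module, spells Oracle's MINUS as EXCEPT and has no dual table, so the queries below are adapted accordingly; the table tableX is filled with invented rows, including a (NULL, NULL) group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO tableX VALUES (?, ?)",
    [("a", 1), ("a", 1), ("b", 2), (None, None)],
)

# Intersecting a table with itself removes duplicate rows.
intersect = conn.execute(
    "SELECT col1, col2 FROM tableX INTERSECT SELECT col1, col2 FROM tableX"
).fetchall()

# Subtracting an empty set also deduplicates the left-hand side.
except_empty = conn.execute(
    "SELECT col1, col2 FROM tableX EXCEPT SELECT col1, col2 FROM tableX WHERE 0 = 1"
).fetchall()

# Subtracting (NULL, NULL) removes that group too, because set operators
# treat NULLs as equal to each other when comparing rows.
except_null = conn.execute(
    "SELECT col1, col2 FROM tableX EXCEPT SELECT NULL, NULL"
).fetchall()

print(intersect, except_empty, except_null)
```

On this data the first two results contain three distinct combinations and the third contains two.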

Another ...
select col1,
col2
from (
select col1,
col2,
rowid,
min(rowid) over (partition by col1, col2) min_rowid
from table)
where rowid = min_rowid
order by col1;

We Keep Coding
