Identify duplicate rows based on multiple columns

#SQL Experts,
I am trying to fetch duplicate records from a SQL table where the 1st and 2nd column values are the same but the 3rd column values are different.
Below is my table
ID NAME DEPT
--------------------
1 VRK CSE
1 VRK ECE
2 AME MEC
3 BMS CVL
From the above table, I am trying to fetch the first 2 rows. Below is my query; please suggest why it isn't giving the correct results.
SELECT A.ID, A.NAME, A.DEPT
FROM TBL A
INNER JOIN TBL B ON A.ID = B.ID
AND A.NAME = B.NAME
AND A.DEPT <> B.DEPT
Somehow I am not getting the expected results.

Your sample data does not make it completely clear what you want here. Assuming you want the groups of records that share the same first and second column values but have differing third column values, we may try:
SELECT ID, NAME, DEPT
FROM
(
SELECT ID, NAME, DEPT,
COUNT(*) OVER (PARTITION BY ID, NAME) cnt,
MIN(DEPT) OVER (PARTITION BY ID, NAME) min_dept,
MAX(DEPT) OVER (PARTITION BY ID, NAME) max_dept
FROM yourTable
) t
WHERE cnt > 1 AND min_dept <> max_dept;

UPDATE
select *
from
(
select *,
COUNT(*) over (partition by id, [name]) cnt1,
COUNT(*) over (partition by id, [name], dept) cnt2
from dbo.T
) x
where x.cnt1 > 1 and x.cnt2 < x.cnt1;

To find the id/name pairs that have more than one distinct dept:
select x.id, x.name, count(*)
from
(select distinct a.id, a.name, a.dept
from tab a) x
group by x.id, x.name
having count(*) > 1

If you want the original rows, I would just go for exists:
select t.*
from tbl t
where exists (select 1
from tbl t2
where t2.id = t.id and t2.name = t.name and
t2.dept <> t.dept
);
If you just want the id/name pairs:
select t.id, t.name
from tbl t
group by t.id, t.name
having min(t.dept) <> max(t.dept);
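As a quick check, both forms above can be run against the sample rows from the question. This is only a minimal sketch: it assumes Python's built-in sqlite3 module with an in-memory database (the question does not name a DBMS), and it reuses the table name tbl from the answer.

```python
import sqlite3

# In-memory database loaded with the four sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, "VRK", "CSE"), (1, "VRK", "ECE"), (2, "AME", "MEC"), (3, "BMS", "CVL")],
)

# Original rows: same id and name, and another row exists with a different dept.
rows = conn.execute(
    """
    SELECT t.id, t.name, t.dept
    FROM tbl t
    WHERE EXISTS (SELECT 1
                  FROM tbl t2
                  WHERE t2.id = t.id AND t2.name = t.name
                    AND t2.dept <> t.dept)
    ORDER BY t.id, t.dept
    """
).fetchall()

# Only the id/name pairs that carry more than one distinct dept.
pairs = conn.execute(
    """
    SELECT t.id, t.name
    FROM tbl t
    GROUP BY t.id, t.name
    HAVING MIN(t.dept) <> MAX(t.dept)
    """
).fetchall()

print(rows)   # [(1, 'VRK', 'CSE'), (1, 'VRK', 'ECE')]
print(pairs)  # [(1, 'VRK')]
```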

Related

SQL Return only duplicate records

I want to return rows that have duplicate values in both Full Name and Address columns in SQL. So in the example, I would just want the first two rows returned. How do I code this?

Why return duplicate values? Just aggregate and return the count:
select fullname, address, count(*) as cnt
from t
group by fullname, address
having count(*) >= 2;

One option uses window functions:
select *
from (
select t.*, count(*) over(partition by fullname, address) cnt
from mytable t
) t
where cnt > 1
If your table has a primary key, say id, you can also use exists:
select t.*
from mytable t
where exists (
select 1
from mytable t1
where t1.fullname = t.fullname and t1.address = t.address and t1.id <> t.id
)
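To see that the window-function and EXISTS variants select the same rows, here is a small sketch. The sample rows are hypothetical (the question's example table is not shown), and it assumes Python's built-in sqlite3 with an SQLite library of version 3.25 or later, which is needed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, fullname TEXT, address TEXT)"
)
# Hypothetical data: only rows 1 and 2 match on BOTH fullname and address.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "Ann Lee", "1 Main St"),
        (2, "Ann Lee", "1 Main St"),
        (3, "Ann Lee", "9 Oak Ave"),
        (4, "Bob Ray", "1 Main St"),
    ],
)

# Window-function variant: count rows per (fullname, address) group.
window_ids = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY fullname, address) AS cnt
          FROM mytable t) AS t
    WHERE cnt > 1
    ORDER BY id
    """
)]

# EXISTS variant: another row with the same values but a different primary key.
exists_ids = [r[0] for r in conn.execute(
    """
    SELECT t.id
    FROM mytable t
    WHERE EXISTS (SELECT 1
                  FROM mytable t1
                  WHERE t1.fullname = t.fullname
                    AND t1.address = t.address
                    AND t1.id <> t.id)
    ORDER BY t.id
    """
)]

print(window_ids)  # [1, 2]
print(exists_ids)  # [1, 2]
```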

SQL(Need to print all the duplicate value IDs)

Empid Name
1 aa
2 bb
3 cc
4 aa
5 bb
I need to get output to print EmpId number for which names are repeated
output Required:
1,2,4,5.

If you are using SQL Server, use the script below.
;WITH CTE_1 AS
(
SELECT *, COUNT(1) OVER (PARTITION BY Name) CNT
FROM [YourTable]
)
SELECT Empid
FROM [CTE_1]
WHERE CNT > 1

Try this
select empid from table
where name in (select name from table group by name having count(*)>1)

SELECT *
FROM table AS parent
WHERE EXISTS(
SELECT *
FROM table AS sub
WHERE sub.Name = parent.Name AND parent.Empid <> sub.Empid
)

Try this.
select distinct t.Empid from
#Your_Table t inner join
(
select Name, COUNT (Name) as count
from #Your_Table
group by Name
having COUNT (Name) > 1
)a on a.Name=t.Name
order by t.Empid

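Note that this returns only the extra copies of each repeated name (RowNo > 1), not the first row of each group:
SELECT * FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) RowNo,*
From Your_Table
) a
WHERE RowNo > 1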
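The difference between "every row whose name is repeated" and "every row after the first of its name" can be checked on the question's five rows. This is a sketch under assumptions: the table name emp is made up, SQLite (3.25 or later) is used through Python's sqlite3 module, and the ROW_NUMBER ordering is changed to empid so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, "aa"), (2, "bb"), (3, "cc"), (4, "aa"), (5, "bb")],
)

# IN + GROUP BY/HAVING: every row whose name occurs more than once.
all_dupes = [r[0] for r in conn.execute(
    """
    SELECT empid
    FROM emp
    WHERE name IN (SELECT name FROM emp GROUP BY name HAVING COUNT(*) > 1)
    ORDER BY empid
    """
)]

# ROW_NUMBER() > 1: only the second and later rows of each repeated name.
extra_copies = [r[0] for r in conn.execute(
    """
    SELECT empid
    FROM (SELECT empid,
                 ROW_NUMBER() OVER (PARTITION BY name ORDER BY empid) AS row_no
          FROM emp) AS a
    WHERE row_no > 1
    ORDER BY empid
    """
)]

print(all_dupes)     # [1, 2, 4, 5]
print(extra_copies)  # [4, 5]
```

The first list matches the output the question asks for; the second is what you would want when deleting duplicates while keeping one row per name.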

SQL query: how to distinct count of a column group by another column

In my table I need to know if each ID has one and only one ID_name. How can I write such query?
I tried:
select ID, count(distinct ID_name) as count_name
from table
group by ID
having count_name > 1
But it takes forever to run.
Any thoughts?

select ID
from YourTable
group by
ID
having count(distinct ID_name) > 1
or
select *
from YourTable yt1
where exists
(
select *
from YourTable yt2
where yt1.ID = yt2.ID
and yt1.ID_Name <> yt2.ID_Name
)
Now, most ID columns are defined as primary key and are unique. So in a regular database you'd expect both queries to return an empty set.
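A small sketch of the HAVING form on made-up data, to show what it does and does not flag. The table name, column values, and the use of SQLite through Python's sqlite3 module are all assumptions for illustration. The MIN/MAX comparison is an alternative formulation that avoids COUNT(DISTINCT ...); whether it runs faster depends on the DBMS and indexes, so measure before relying on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, id_name TEXT)")
# Hypothetical rows: ID 1 repeats one name, ID 2 has two names, ID 3 has one.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "x"), (2, "y"), (3, "z")],
)

# IDs with more than one distinct ID_name.
by_count = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM your_table
    GROUP BY id
    HAVING COUNT(DISTINCT id_name) > 1
    ORDER BY id
    """
)]

# Same question phrased as "smallest name differs from largest name".
by_min_max = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM your_table
    GROUP BY id
    HAVING MIN(id_name) <> MAX(id_name)
    ORDER BY id
    """
)]

print(by_count)    # [2]
print(by_min_max)  # [2]
```

ID 1 is not reported even though it has two rows, because both rows carry the same name.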

select tt.ID, max(tt.myRank)
from
(
select
ip.ID, ip.ID_name,
DENSE_RANK() over (partition by ip.ID order by ip.ID_name) as myRank
from YourTable ip
) tt
group by tt.ID
This gives you every ID with its total number of distinct ID_name values, because DENSE_RANK numbers the distinct names within each ID.
If you want only those IDs which have more than one name associated, just add a where clause, e.g.
select tt.ID, max(tt.myRank)
from
(
select
ip.ID, ip.ID_name,
DENSE_RANK() over (partition by ip.ID order by ip.ID_name) as myRank
from YourTable ip
) tt
where tt.myRank > 1
group by tt.ID

Show all rows that have certain columns duplicated

Suppose I have the following SQL table:
objid firstname lastname active
1 test test 0
2 test test 1
3 test1 test1 1
4 test2 test2 0
5 test2 test2 0
6 test3 test3 1
Now, the result I am interested in is as follows:
objid firstname lastname active
1 test test 0
2 test test 1
4 test2 test2 0
5 test2 test2 0
How can I achieve this?
I have tried the following query,
select firstname,lastname from table
group by firstname,lastname
having count(*) > 1
But this query gives results like
firstname lastname
test test
test2 test2

You've found your duplicated records but you're interested in getting all the information attached to them. You need to join your duplicates to your main table to get that information.
select *
from my_table a
join ( select firstname, lastname
from my_table
group by firstname, lastname
having count(*) > 1 ) b
on a.firstname = b.firstname
and a.lastname = b.lastname
This is the same as an inner join: for every firstname and lastname combination that the sub-query identifies as duplicated, you get every row from your main table with that same combination.
You can also do this with in, though you should test the difference:
select *
from my_table a
where ( firstname, lastname ) in
( select firstname, lastname
from my_table
group by firstname, lastname
having count(*) > 1 )
Further Reading:
A visual representation of joins from Coding Horror
Join explanation from Wikipedia

SELECT DISTINCT t1.*
FROM myTable AS t1
INNER JOIN myTable AS t2
ON t1.firstname = t2.firstname
AND t1.lastname = t2.lastname
AND t1.objid <> t2.objid
This will output every row which has a duplicate, based on firstname and lastname.

Here's a little more legible way to do Ben's first answer:
WITH duplicates AS (
select firstname, lastname
from my_table
group by firstname, lastname
having count(*) > 1
)
SELECT a.*
FROM my_table a
JOIN duplicates b ON (a.firstname = b.firstname and a.lastname = b.lastname)
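The join-to-grouped-duplicates approach can be tried on the six rows from the question. The sketch below assumes SQLite via Python's sqlite3 module; it also runs the row-value IN form, which SQLite accepts from version 3.15 but which is not valid in every DBMS (SQL Server, for instance, does not support it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (objid INTEGER PRIMARY KEY, firstname TEXT, "
    "lastname TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, "test", "test", 0),
        (2, "test", "test", 1),
        (3, "test1", "test1", 1),
        (4, "test2", "test2", 0),
        (5, "test2", "test2", 0),
        (6, "test3", "test3", 1),
    ],
)

# CTE finds the duplicated name pairs; the join brings back the full rows.
joined = [r[0] for r in conn.execute(
    """
    WITH duplicates AS (
        SELECT firstname, lastname
        FROM my_table
        GROUP BY firstname, lastname
        HAVING COUNT(*) > 1
    )
    SELECT a.objid
    FROM my_table a
    JOIN duplicates b
      ON a.firstname = b.firstname AND a.lastname = b.lastname
    ORDER BY a.objid
    """
)]

# Row-value IN form of the same filter.
via_in = [r[0] for r in conn.execute(
    """
    SELECT objid
    FROM my_table
    WHERE (firstname, lastname) IN (SELECT firstname, lastname
                                    FROM my_table
                                    GROUP BY firstname, lastname
                                    HAVING COUNT(*) > 1)
    ORDER BY objid
    """
)]

print(joined)  # [1, 2, 4, 5]
print(via_in)  # [1, 2, 4, 5]
```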

SELECT user_name,email_ID
FROM User_Master WHERE
email_ID
in (SELECT email_ID
FROM User_Master GROUP BY
email_ID HAVING COUNT(*)>1)

A nice option to get all duplicated values from a table:
select * from Employee where Name in (select Name from Employee group by Name having COUNT(*)>1)

This is the easiest way:
SELECT * FROM yourtable a WHERE EXISTS (SELECT * FROM yourtable b WHERE a.firstname = b.firstname AND a.lastname = b.lastname AND a.objid <> b.objid)

If you want to print all duplicate IDs from the table:
select * from table where id in (select id from table group By id having count(id)>1)

I'm surprised that there is no answer using Window function. I just came across this use case and this helped me.
select t.objid, t.firstname, t.lastname, t.active
from
(
select t.*, count(*) over (partition by firstname, lastname) as cnt
from my_table t
) t
where t.cnt > 1;
Fiddle - https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=c0cc3b679df63c4d7d632cbb83a9ef13
The format goes like
select
tbl.relevantColumns
from
(
select t.*, count(*) over (partition by key_columns) as cnt
from desiredTable t
) as tbl
where tbl.cnt > 1;
This format selects whatever columns you require from the table (sometimes all columns) where the count > 1 for the key_columns being used to identify the duplicate rows. key_columns can be any number of columns.
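Since the template only varies in the table and the key columns, it could be wrapped in a small helper. The function below is a hypothetical sketch, not part of any library: find_duplicates and its parameters are names invented here, it assumes SQLite 3.25 or later through Python's sqlite3 module, and it interpolates identifiers directly into the SQL, so it must only be called with trusted table and column names.

```python
import sqlite3


def find_duplicates(conn, table, key_columns):
    """Return all rows of `table` whose key_columns combination occurs more than once.

    Hypothetical helper: identifiers cannot be bound as parameters, so `table`
    and `key_columns` are formatted into the statement and must be trusted.
    Each returned row has the table's columns followed by the group count.
    """
    keys = ", ".join(key_columns)
    sql = (
        f"SELECT * FROM ("
        f"SELECT t.*, COUNT(*) OVER (PARTITION BY {keys}) AS cnt FROM {table} t"
        f") AS tbl WHERE tbl.cnt > 1"
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (objid INTEGER PRIMARY KEY, firstname TEXT, "
    "lastname TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, "test", "test", 0),
        (2, "test", "test", 1),
        (3, "test1", "test1", 1),
        (4, "test2", "test2", 0),
        (5, "test2", "test2", 0),
        (6, "test3", "test3", 1),
    ],
)

dupes = find_duplicates(conn, "my_table", ["firstname", "lastname"])
dupe_ids = sorted(row[0] for row in dupes)
print(dupe_ids)  # [1, 2, 4, 5]
```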

This answer may not be great one, but I think it is simple to understand.
SELECT * FROM table1 WHERE (firstname, lastname) IN ( SELECT firstname, lastname FROM table1 GROUP BY firstname, lastname having count(*) > 1);

This query returns the duplicates, excluding the row with the lowest objid in each group:
SELECT * FROM (
SELECT a.*
FROM table a
WHERE (`firstname`,`lastname`) IN (
SELECT `firstname`,`lastname` FROM table
GROUP BY `firstname`,`lastname` HAVING COUNT(*)>1
)
)z WHERE z.`objid` NOT IN (
SELECT MIN(`objid`) FROM table
GROUP BY `firstname`,`lastname` HAVING COUNT(*)>1
)

Please try
WITH cteTemp AS (
SELECT EmployeeID, JoinDT,
row_number() OVER(PARTITION BY EmployeeID, JoinDT ORDER BY EmployeeID) AS [RowFound]
FROM dbo.Employee
)
SELECT * FROM cteTemp WHERE [RowFound] > 1 ORDER BY JoinDT

Find duplicates, display each result in sql

I want to write something like this :
select t.id, t.name, from table t
group by t.name having count(t.name) > 1
To produce the following :
id name count
904834 jim 2
904835 jim 2
90145 Fred 3
90132 Fred 3
90133 Fred 3

For SQL Server 2005+, you can do the following:
SELECT *
FROM (SELECT id, Name, COUNT(*) OVER(PARTITION BY Name) [Count]
FROM table) t
WHERE [Count]>1

If you remove the ID column then you can get all the names that have multiple entries
select t.name
from table t
group by t.name
having count(t.name) > 1
For each name, if you want the minimum or maximum id you can do this
select t.name, min (t.id) as min_id, max (t.id) as max_id
from table t
group by t.name
having count(t.name) > 1
For each name, if you want all the ids that are duplicates, then you have to use a subquery
select t.id, t.name
from table t
where name in
(
select t1.name
from table t1
group by t1.name
having count(t1.name) > 1
)

Just join the table to a subquery pulling the count for each name
SELECT t.ID, t.Name, d.Count
FROM #MyTable t
JOIN
(
SELECT name, COUNT(*) as Count
FROM #MyTable
GROUP BY Name
HAVING COUNT(*) > 1
) D
ON t.Name = d.Name
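Run against the rows from the question, the join produces the requested id / name / count layout. This sketch assumes SQLite through Python's sqlite3 module and a regular table named my_table instead of the temp table #MyTable; the row for 'Ann' is a hypothetical non-duplicate added to show that it is filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        (904834, "jim"),
        (904835, "jim"),
        (90145, "Fred"),
        (90132, "Fred"),
        (90133, "Fred"),
        (90200, "Ann"),  # hypothetical row with a unique name
    ],
)

# Subquery d holds one row per repeated name with its count;
# joining back to the table attaches that count to every matching id.
result = conn.execute(
    """
    SELECT t.id, t.name, d.cnt
    FROM my_table t
    JOIN (SELECT name, COUNT(*) AS cnt
          FROM my_table
          GROUP BY name
          HAVING COUNT(*) > 1) AS d
      ON t.name = d.name
    ORDER BY d.cnt, t.id
    """
).fetchall()

for row in result:
    print(row)
# (904834, 'jim', 2)
# (904835, 'jim', 2)
# (90132, 'Fred', 3)
# (90133, 'Fred', 3)
# (90145, 'Fred', 3)
```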

Assuming mysql (when I wrote the answer, I do not think the person specified the dbms)
SELECT t.id, t.name, (SELECT COUNT(t2.name) FROM test t2 WHERE t2.name = t.name) AS t_count
FROM test t
HAVING t_count > 1;

Similar to previous answers with less code. Tested on SQL Server 2008:
SELECT t.id, t.name,COUNT(*)
FROM table t
GROUP BY t.id, t.name
HAVING COUNT(t.id) > 1

Please Check it once .... in SQL Server 2008
SELECT t.id,
t.NAME,
Count(t.id) AS duplicate_id,
count(t.NAME) AS duplicate_names
FROM t
GROUP BY t.id,
t.NAME
HAVING count(t.NAME) > 1

We Keep Coding
