Find duplicates in two columns in a table

I just browsed this post and tried all the scripts, but I am still not getting what I expect.
Here is my table:
Name           | email
BRIAN MAT      | BRIAN.MAT#abc.Com
BRIAN MAT      | BRIAN MAT#abc.Com
AMY Lee        | AMY.Lee#abc.Com
AMY.Lee        | AMY.Lee#abc.Com
Madison Taylor | Madison.Tyl#abc.com
SELECT Name
FROM Employee
GROUP BY Name
HAVING COUNT(Name) > 1
Result:
BRIAN MAT
SELECT email
FROM Employee
GROUP BY email
HAVING COUNT(email) > 1
Result:
AMY.Lee#abc.Com
I tried to combine these two scripts, but the result is empty:
SELECT
Name, email,COUNT(*)
FROM
Employee
GROUP BY
Name, email
HAVING
COUNT(*) > 1
Please tell me what I am missing in my script to achieve a result like the one below:
Name      | email
BRIAN MAT | BRIAN.MAT#abc.Com
BRIAN MAT | BRIAN MAT#abc.Com
AMY Lee   | AMY.Lee#abc.Com
AMY.Lee   | AMY.Lee#abc.Com

You could use windowed COUNT:
WITH cte AS (
SELECT *,
COUNT(*) OVER(PARTITION BY name) AS c_name,
COUNT(*) OVER(PARTITION BY Email) AS c_email
FROM Employee
)
SELECT name, email
FROM cte
WHERE c_name > 1 OR c_email > 1;
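A minimal sketch for checking the windowed COUNT idea end to end: it loads the question's sample rows into an in-memory SQLite database through Python's standard sqlite3 module and runs the same CTE. SQLite stands in for SQL Server here (an assumption; window functions need SQLite 3.25 or later), and the `id` column is a hypothetical addition used only to keep the output in insertion order.

```python
import sqlite3

# Sample rows from the question ('#' kept exactly as in the original data).
ROWS = [
    ("BRIAN MAT", "BRIAN.MAT#abc.Com"),
    ("BRIAN MAT", "BRIAN MAT#abc.Com"),
    ("AMY Lee", "AMY.Lee#abc.Com"),
    ("AMY.Lee", "AMY.Lee#abc.Com"),
    ("Madison Taylor", "Madison.Tyl#abc.com"),
]

def find_duplicates(rows):
    """Return (Name, Email) rows whose Name OR Email occurs more than once."""
    con = sqlite3.connect(":memory:")
    # Hypothetical id column, used only to return rows in insertion order.
    con.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, Name TEXT, Email TEXT)")
    con.executemany("INSERT INTO Employee (Name, Email) VALUES (?, ?)", rows)
    query = """
        WITH cte AS (
            SELECT *,
                   COUNT(*) OVER (PARTITION BY Name)  AS c_name,
                   COUNT(*) OVER (PARTITION BY Email) AS c_email
            FROM Employee
        )
        SELECT Name, Email
        FROM cte
        WHERE c_name > 1 OR c_email > 1
        ORDER BY id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

duplicates = find_duplicates(ROWS)
for name, email in duplicates:
    print(name, "|", email)
```

Only Madison Taylor is left out, because neither her name nor her email repeats.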

SELECT *
FROM Employee
WHERE Name IN (SELECT Name
FROM Employee
GROUP BY Name
HAVING COUNT(*) > 1
)
OR Email IN (SELECT email
FROM Employee
GROUP BY email
HAVING COUNT(*) > 1
)
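The same kind of check for the IN-subquery approach, again a sketch that uses SQLite through Python's sqlite3 in place of SQL Server, with each HAVING clause written as `COUNT(*) > 1` and a hypothetical `id` column used only for ordering:

```python
import sqlite3

# Sample rows from the question ('#' kept exactly as in the original data).
ROWS = [
    ("BRIAN MAT", "BRIAN.MAT#abc.Com"),
    ("BRIAN MAT", "BRIAN MAT#abc.Com"),
    ("AMY Lee", "AMY.Lee#abc.Com"),
    ("AMY.Lee", "AMY.Lee#abc.Com"),
    ("Madison Taylor", "Madison.Tyl#abc.com"),
]

con = sqlite3.connect(":memory:")
# Hypothetical id column, used only to keep the output in insertion order.
con.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, Name TEXT, Email TEXT)")
con.executemany("INSERT INTO Employee (Name, Email) VALUES (?, ?)", ROWS)

# A row qualifies if its Name is in the set of repeated names,
# or its Email is in the set of repeated emails.
query = """
    SELECT Name, Email
    FROM Employee
    WHERE Name IN (SELECT Name FROM Employee GROUP BY Name HAVING COUNT(*) > 1)
       OR Email IN (SELECT Email FROM Employee GROUP BY Email HAVING COUNT(*) > 1)
    ORDER BY id
"""
in_subquery_rows = con.execute(query).fetchall()
con.close()
print(in_subquery_rows)
```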

The least complicated option is a quick and dirty self-join:
SELECT
a.name,
a.email, count(*)
FROM
employee a
INNER JOIN
employee b on b.name = a.name or b.email = a.email
GROUP BY a.name, a.email
HAVING COUNT(*) > 1
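A small sketch of the self-join under the same assumptions (SQLite via Python's sqlite3 instead of SQL Server, the question's sample rows). Because every row joins at least to itself, `COUNT(*) > 1` means some other row shares its name or its email:

```python
import sqlite3

# Sample rows from the question ('#' kept exactly as in the original data).
ROWS = [
    ("BRIAN MAT", "BRIAN.MAT#abc.Com"),
    ("BRIAN MAT", "BRIAN MAT#abc.Com"),
    ("AMY Lee", "AMY.Lee#abc.Com"),
    ("AMY.Lee", "AMY.Lee#abc.Com"),
    ("Madison Taylor", "Madison.Tyl#abc.com"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (name TEXT, email TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)", ROWS)

# Each a-row is paired with every b-row that shares its name OR its email
# (including itself), then the pairs are counted per a-row.
query = """
    SELECT a.name, a.email, COUNT(*) AS matches
    FROM employee a
    INNER JOIN employee b ON b.name = a.name OR b.email = a.email
    GROUP BY a.name, a.email
    HAVING COUNT(*) > 1
    ORDER BY a.name, a.email
"""
self_join_rows = con.execute(query).fetchall()
con.close()
for row in self_join_rows:
    print(row)
```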

Related

SQL Server: Duplicates but based on specific criteria

I am trying to find duplicates based on forename, surname, and dateofbirth in my database. Below is what I have
Customers table:
custid | cust_refno | forename | surname | dateofbirth
1      | 10         | David    | John    | 10-02-1980
2      | 20         | Peter    | Broad   | 15-08-1978
3      | 30         | Sarah    | Holly   | 16-09-1982
4      | 40         | Mathew   | Mark    | 25-08-2001
5      | 50         | Matt     | Mark    | 25-08-2001
Address table:
addid | cust_refno | addresstype | line1
1     | 10         | address     | No. 10, Mineview Road
2     | 10         | address     | No. 20, Mineview Lane
3     | 20         | address     | Rockview cottage, blackthorn
4     | 30         | mobile      | 0504135864
5     | 40         | address     | No. 64, New Lane
6     | 40         | mobile      | 0504896532
7     | 50         | address     | No. 11, John's cottage
Some customers have multiple addresses, so they are not duplicates. I am trying to find a way to avoid displaying those as duplicates. Can you advise how I can do that?
My query:
SELECT DISTINCT t.FORENAME, t.SURNAME, t.CUST_REFNO, t.DATE_OF_BIRTH, a.LINE1 FROM CUSTOMERS AS t
LEFT OUTER JOIN dbo.ADDRESS a
ON t.CUST_REFNO = a.CUST_REFNO
INNER JOIN (
SELECT FORENAME, SURNAME, DATE_OF_BIRTH
FROM CUSTOMERS GROUP BY FORENAME, SURNAME, DATE_OF_BIRTH
HAVING COUNT(*) > 1) AS td
ON t.FORENAME = td.FORENAME AND t.DATE_OF_BIRTH = td.DATE_OF_BIRTH
AND t.SURNAME = td.SURNAME
WHERE a.addresstype = 'address'
My result is:
Forename | surname | cust_refno | dateofbirth | line1
David    | John    | 10         | 10-02-1980  | No. 10, Mineview Road
David    | John    | 10         | 10-02-1980  | No. 20, Mineview Lane
But in reality it is not a duplicate; it's just that the addresses are different. Is there a way to compare the cust_refno and check whether it already exists, so that a customer with the same cust_refno does not show again even when the address is different?

If you want to get the customers with duplicate addresses, you can count how many times a customer has the same address and return only those with more than one:
SELECT t.FORENAME, t.SURNAME, t.CUST_REFNO, t.DATE_OF_BIRTH , a.LINE1
FROM CUSTOMERS AS t INNER JOIN ADDRESS a ON t.CUST_REFNO = a.CUST_REFNO
GROUP BY t.FORENAME, t.SURNAME, t.CUST_REFNO, t.DATE_OF_BIRTH , a.LINE1
HAVING COUNT(a.LINE1) > 1

You can use window functions to filter out customers with more than one address. Then aggregation can be used to return the duplicates:
select forename, surname, dateofbirth
from customers c join
(select a.*,
count(*) over (partition by cust_refno) as cnt
from addresses a
where addresstype = 'address'
) a
on c.cust_refno = a.cust_refno
where cnt = 1
group by forename, surname, dateofbirth
having count(*) > 1;
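A sketch that runs this first query against the sample tables from the question, using SQLite through Python's sqlite3 as a stand-in for SQL Server (an assumption; dates are stored as plain text, and the table is named `addresses` as in the answer). It also lists which customers pass the `cnt = 1` filter: customer 10 has two postal addresses and customer 30 has only a mobile entry, so both drop out. With these rows no forename, surname and date-of-birth group repeats (Mathew and Matt Mark differ in forename), so the final result is empty:

```python
import sqlite3

# Sample data from the question.
CUSTOMERS = [
    (1, 10, "David", "John", "10-02-1980"),
    (2, 20, "Peter", "Broad", "15-08-1978"),
    (3, 30, "Sarah", "Holly", "16-09-1982"),
    (4, 40, "Mathew", "Mark", "25-08-2001"),
    (5, 50, "Matt", "Mark", "25-08-2001"),
]
ADDRESSES = [
    (1, 10, "address", "No. 10, Mineview Road"),
    (2, 10, "address", "No. 20, Mineview Lane"),
    (3, 20, "address", "Rockview cottage, blackthorn"),
    (4, 30, "mobile", "0504135864"),
    (5, 40, "address", "No. 64, New Lane"),
    (6, 40, "mobile", "0504896532"),
    (7, 50, "address", "No. 11, John's cottage"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (custid INTEGER, cust_refno INTEGER, "
            "forename TEXT, surname TEXT, dateofbirth TEXT)")
con.execute("CREATE TABLE addresses (addid INTEGER, cust_refno INTEGER, "
            "addresstype TEXT, line1 TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", CUSTOMERS)
con.executemany("INSERT INTO addresses VALUES (?, ?, ?, ?)", ADDRESSES)

# Postal-address rows, each tagged with how many such rows its customer has.
counted = """
    SELECT a.*, COUNT(*) OVER (PARTITION BY cust_refno) AS cnt
    FROM addresses a
    WHERE addresstype = 'address'
"""
# Customers that survive the cnt = 1 filter (exactly one postal address).
single_address = [r[0] for r in con.execute(
    f"SELECT cust_refno FROM ({counted}) AS a WHERE cnt = 1 ORDER BY cust_refno")]

# The answer's query: name/date-of-birth groups that still occur more than once.
duplicates = con.execute(f"""
    SELECT forename, surname, dateofbirth
    FROM customers c
    JOIN ({counted}) a ON c.cust_refno = a.cust_refno
    WHERE cnt = 1
    GROUP BY forename, surname, dateofbirth
    HAVING COUNT(*) > 1
""").fetchall()
con.close()

print(single_address)
print(duplicates)
```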
If you want the full customer record, just use window functions twice:
select c.*
from (select c.*,
count(*) over (partition by forename, surname, dateofbirth) as cnt
from customers c
) c join
(select a.*,
count(*) over (partition by cust_refno) as cnt
from addresses a
where addresstype = 'address'
) a
on c.cust_refno = a.cust_refno
where a.cnt = 1 and c.cnt > 1;

You can use the analytic functions COUNT and ROW_NUMBER as follows:
select * from
(SELECT t.FORENAME, t.SURNAME, t.CUST_REFNO, t.DATE_OF_BIRTH,
a.LINE1,
row_number() over (partition by t.FORENAME, t.SURNAME, t.DATE_OF_BIRTH
order by t.CUST_REFNO) as rn,
count(1) over (partition by t.FORENAME, t.SURNAME, t.DATE_OF_BIRTH) as cnt
FROM CUSTOMERS AS t
LEFT OUTER JOIN dbo.ADDRESS a ON t.CUST_REFNO = a.CUST_REFNO
WHERE a.addresstype = 'address') t
where cnt > 1 and rn = 1

SQL Server : finding duplicates based on first few characters on column

I want to find duplicates based on the first three characters of the surname. Is there a way to do that in SQL? I can compare the whole name, but how do we compare only the first few characters?
Below is my table:
custid forename surname dateofbirth
----------------------------------------
1 David John 16-09-1985
2 David Jon 16-09-1985
3 Sarah Smith 10-08-2015
4 Peter Proca 11-06-2011
5 Peter Proka 11-06-2011
This is the query I am currently running to compare:
SELECT
y.custid, y.forename, y.surname
FROM
customers y
INNER JOIN
(SELECT
forename, surname, COUNT(*) AS CountOf
FROM customers
GROUP BY forename, surname
HAVING COUNT(*) > 1) dt ON y.forename = dt.forename AND y.surname = dt.surname

You can use left():
select c.*
from (select c.*, count(*) over (partition by left(surname, 3)) as cnt
from customers c
) c
where cnt > 1
order by surname;
You can also include forename in the PARTITION BY if you mean the same forename plus the same first three letters of the surname.
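A sketch of the same idea in SQLite via Python's sqlite3 (an assumption, standing in for SQL Server). SQLite has no `LEFT()`, so `substr(surname, 1, n)` takes its place, and the prefix length `n` is a hypothetical parameter added for illustration. With the sample data, John and Jon differ at the third character, so only Proca and Proka match on three characters; with two characters, John and Jon match as well:

```python
import sqlite3

# Sample data from the question.
CUSTOMERS = [
    (1, "David", "John", "16-09-1985"),
    (2, "David", "Jon", "16-09-1985"),
    (3, "Sarah", "Smith", "10-08-2015"),
    (4, "Peter", "Proca", "11-06-2011"),
    (5, "Peter", "Proka", "11-06-2011"),
]

def prefix_duplicates(n):
    """Customers whose surname shares its first n characters with another customer."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (custid INTEGER, forename TEXT, "
                "surname TEXT, dateofbirth TEXT)")
    con.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", CUSTOMERS)
    # substr(surname, 1, n) plays the role of SQL Server's LEFT(surname, n);
    # int() guarantees only a number is formatted into the SQL text.
    rows = con.execute(f"""
        SELECT custid, forename, surname
        FROM (SELECT c.*,
                     COUNT(*) OVER (PARTITION BY substr(surname, 1, {int(n)})) AS cnt
              FROM customers c) c
        WHERE cnt > 1
        ORDER BY surname
    """).fetchall()
    con.close()
    return rows

print(prefix_duplicates(3))
print(prefix_duplicates(2))
```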

You can use exists as follows:
select t.* from t
Where exists
(select 1 from t tt
Where left(t.surname, 3) = left(tt.surname, 3) and t.custid <> tt.custid
)
order by t.surname;
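The EXISTS version can be checked under the same assumptions: SQLite through Python's sqlite3, `substr(..., 1, 3)` in place of `LEFT(..., 3)`, and the sample customers loaded into a table named `t` as in the answer. It needs no window functions, only a correlated subquery:

```python
import sqlite3

# Sample data from the question.
CUSTOMERS = [
    (1, "David", "John", "16-09-1985"),
    (2, "David", "Jon", "16-09-1985"),
    (3, "Sarah", "Smith", "10-08-2015"),
    (4, "Peter", "Proca", "11-06-2011"),
    (5, "Peter", "Proka", "11-06-2011"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (custid INTEGER, forename TEXT, surname TEXT, dateofbirth TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", CUSTOMERS)

# Keep a row if some OTHER row (different custid) has the same 3-letter prefix.
exists_rows = con.execute("""
    SELECT t.custid, t.surname
    FROM t
    WHERE EXISTS (SELECT 1 FROM t tt
                  WHERE substr(t.surname, 1, 3) = substr(tt.surname, 1, 3)
                    AND t.custid <> tt.custid)
    ORDER BY t.surname
""").fetchall()
con.close()
print(exists_rows)
```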

Name of Teacher with Highest Wage - recursive CTE

I am trying to get the maximum salary in each department and display the first name of that teacher as a separate column. So department 1 may have 4 rows, but only one name is shown for the maximum salary. I'm using SQL Server.
With TeacherList AS(
Select Teachers.FirstName,Teachers.LastName,
Teachers.FacultyID,TeacherID, 1 AS LVL,PrincipalTeacherID AS ManagerID
FROM dbo.Teachers
WHERE PrincipalTeacherID IS NULL
UNION ALL
Select Teachers.FirstName,Teachers.LastName,
Teachers.FacultyID,Teachers.TeacherID, TeacherList.LVL +
1,Teachers.PrincipalTeacherID
FROM dbo.Teachers
INNER JOIN TeacherList ON Teachers.PrincipalTeacherID =
TeacherList.TeacherID
WHERE Teachers.PrincipalTeacherID IS NOT NULL)
SELECT * FROM TeacherList;
Sample output:
Teacher First Name | Teacher Last Name | Faculty | Highest Paid In Faculty
Eric               | Smith             | 1       | Eric
Alex               | John              | 1       | Eric
Jessica            | Sewel             | 1       | Eric
Aaron              | Gaye              | 2       | Aaron
Bob                | Turf              | 2       | Aaron

I'm not sure from your description, but this will return all teachers, and the last column is the first name of the teacher with the highest pay in the faculty.
select tr.FirstName,
tr.LastName,
tr.FacultyID,
th.FirstName
from Teachers tr
join (
select FacultyID, max(pay) highest_pay
from Teachers
group by FacultyID
) t on tr.FacultyID = t.FacultyID
join Teachers th on th.FacultyID = t.FacultyID and
th.pay = t.highest_pay
This will produce duplicate rows if more than one teacher shares the highest salary in a faculty. In that case you can use window functions as follows:
select tr.FirstName,
tr.LastName,
tr.FacultyID,
t.FirstName
from Teachers tr
join
(
select t.FirstName,
t.FacultyID
from
(
select t.*,
row_number() over (partition by FacultyID order by pay desc) rn
from Teachers t
) t
where t.rn = 1
) t on tr.FacultyID = t.FacultyID
This displays just one teacher per faculty with the highest salary; if several teachers share it, which one is picked is arbitrary.
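A sketch of the ROW_NUMBER approach in SQLite via Python's sqlite3 (standing in for SQL Server). The names and faculties follow the sample output in the question; the `pay` values are hypothetical, chosen so that Eric and Aaron are the top earners:

```python
import sqlite3

# Names and faculties from the question's sample output; pay values are hypothetical.
TEACHERS = [
    ("Eric", "Smith", 1, 90000),
    ("Alex", "John", 1, 70000),
    ("Jessica", "Sewel", 1, 65000),
    ("Aaron", "Gaye", 2, 80000),
    ("Bob", "Turf", 2, 60000),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Teachers (FirstName TEXT, LastName TEXT, FacultyID INTEGER, pay INTEGER)")
con.executemany("INSERT INTO Teachers VALUES (?, ?, ?, ?)", TEACHERS)

# rn = 1 keeps exactly one top earner per faculty, even when pay values tie;
# every teacher is then joined to that single row of their faculty.
report = con.execute("""
    SELECT tr.FirstName, tr.LastName, tr.FacultyID, t.FirstName AS HighestPaid
    FROM Teachers tr
    JOIN (SELECT FirstName, FacultyID
          FROM (SELECT t.*,
                       ROW_NUMBER() OVER (PARTITION BY FacultyID ORDER BY pay DESC) AS rn
                FROM Teachers t) t
          WHERE rn = 1) t ON tr.FacultyID = t.FacultyID
    ORDER BY tr.FacultyID, tr.pay DESC
""").fetchall()
con.close()
for row in report:
    print(row)
```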

You can do this with a CROSS APPLY.
SELECT FirstName, LastName, FacultyID, HighestPaid
FROM Teachers t
CROSS APPLY (SELECT TOP 1 FirstName AS HighestPaid
FROM Teachers
WHERE FacultyID = t.FacultyID
ORDER BY Salary DESC) ca

Query returning records with duplicate data because of the wrong data in one of the columns

I have a record of an employee, but my query returns 2 records for this employee because the address column differs between the two. How can I solve this problem? Is it something that can be done? EMP_ID, CUS_LAST_NAME, CUS_FIRST_NAME, and GUARDIAN_ADDRESS come from 3 separate tables.
Example:
ID       | EMP_ID   | CUS_LAST_NAME | CUS_FIRST_NAME | GUARDIAN_ADDRESS
00000000 | 11111111 | Jackson       | Michael        | 1111 Street Apt 1
00000000 | 11111111 | Jackson       | Michael        | 1111 Street

If you can, delete one of the two rows.
If it doesn't matter which address the SELECT returns, you can use an aggregate function to get only one row:
select ID , EMP_ID , EMP_LAST_NAME, EMP_FIRST_NAME, min(ADDRESS)
from my_table
group by ID , EMP_ID , EMP_LAST_NAME, EMP_FIRST_NAME
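A sketch in SQLite via Python's sqlite3 (standing in for SQL Server) that collapses the two rows from the question with MIN and with MAX, using the answer's single-table column names (an assumption, since the question says the data comes from 3 tables). Because `1111 Street` is a prefix of `1111 Street Apt 1`, it sorts first, so MIN keeps the shorter address and MAX the longer one:

```python
import sqlite3

# The two conflicting rows from the question (IDs kept as text to preserve leading zeros).
ROWS = [
    ("00000000", "11111111", "Jackson", "Michael", "1111 Street Apt 1"),
    ("00000000", "11111111", "Jackson", "Michael", "1111 Street"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (id TEXT, emp_id TEXT, emp_last_name TEXT, "
            "emp_first_name TEXT, address TEXT)")
con.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", ROWS)

def collapse(agg):
    """Group on the identifying columns and keep one address via MIN or MAX."""
    assert agg in ("MIN", "MAX")  # only the two aggregates used here are allowed
    return con.execute(f"""
        SELECT id, emp_id, emp_last_name, emp_first_name, {agg}(address) AS address
        FROM employees
        GROUP BY id, emp_id, emp_last_name, emp_first_name
    """).fetchall()

min_rows = collapse("MIN")
max_rows = collapse("MAX")
con.close()
print(min_rows)
print(max_rows)
```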

If you want to detect which employees have duplicate entries:
SELECT *
FROM employees
WHERE EMP_ID IN (
SELECT EMP_ID
FROM employees
GROUP BY EMP_ID
HAVING COUNT(*) > 1
)

-- start with a unique list of clients
SELECT DISTINCT a.ID, a.EMP_ID, e.EMP_LAST_NAME, e.EMP_FIRST_NAME, e.ADDRESS
FROM TABLE1 a
-- attach employee data by id
OUTER APPLY (SELECT TOP 1 b.EMP_LAST_NAME, b.EMP_FIRST_NAME, b.ADDRESS
FROM TABLE2 b
WHERE a.id = b.id
-- change the ORDER BY clause to choose which employee record is kept
ORDER BY b.address
) e

The quick and dirty way with max():
select id, emp_id, emp_last_name, emp_first_name, max(address) as address
from employees
group by id, emp_id, emp_last_name, emp_first_name
An alternative using TOP 1 WITH TIES:
select top 1 with ties
id, emp_id, emp_last_name, emp_first_name, address
from employees
order by row_number() over (partition by emp_id order by address desc)
rextester demo for both: http://rextester.com/EGGA75008

how to find people with the same first and last name

So I have a table with 3 columns:
id | first_name | last_name
and I need to find how many people share the same full name.
I had something like this:
SELECT COUNT(*)
FROM ACTOR
WHERE FIRST_NAME IN (SELECT FIRST_NAME,LAST_NAME
FROM ACTOR
HAVING COUNT(FIRST_NAME,LAST_NAME) >1);

Use GROUP BY
SELECT FIRST_NAME, LAST_NAME, Count(*) AS CNT
FROM ACTOR
GROUP BY FIRST_NAME, LAST_NAME
HAVING COUNT(*) > 1
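A sketch of this GROUP BY query in SQLite via Python's sqlite3. The question has no sample data, so the ACTOR rows below are hypothetical:

```python
import sqlite3

# Hypothetical ACTOR rows: ANNA LEE twice, BEN COLE three times,
# and two full names that occur only once.
ACTORS = [
    (1, "ANNA", "LEE"),
    (2, "ANNA", "LEE"),
    (3, "BEN", "COLE"),
    (4, "BEN", "COLE"),
    (5, "BEN", "COLE"),
    (6, "CARL", "LEE"),
    (7, "ANNA", "COLE"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ACTOR (id INTEGER, FIRST_NAME TEXT, LAST_NAME TEXT)")
con.executemany("INSERT INTO ACTOR VALUES (?, ?, ?)", ACTORS)

# One row per full name that occurs more than once, with its frequency.
duplicated_names = con.execute("""
    SELECT FIRST_NAME, LAST_NAME, COUNT(*) AS CNT
    FROM ACTOR
    GROUP BY FIRST_NAME, LAST_NAME
    HAVING COUNT(*) > 1
    ORDER BY FIRST_NAME, LAST_NAME
""").fetchall()
con.close()
print(duplicated_names)
```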
This returns each first and last name that has duplicates, together with how often it appears. If you only want to know how many such names there are, you can use:
In SQL Server:
SELECT TOP 1 COUNT(*) OVER () AS RecordCount -- TOP 1 because the total-count is repeated for every row
FROM ACTOR
GROUP BY FIRST_NAME, LAST_NAME
HAVING COUNT(*) > 1
In all other databases:
Select COUNT(*) AS RecordCount
From
(
SELECT FIRST_NAME, LAST_NAME
FROM ACTOR
GROUP BY FIRST_NAME, LAST_NAME
HAVING COUNT(*) > 1
) As X
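A sketch of the portable count in SQLite via Python's sqlite3, again on hypothetical ACTOR rows. It counts duplicated full names, so ANNA LEE (2 rows) and BEN COLE (3 rows) give 2:

```python
import sqlite3

# Hypothetical ACTOR rows: ANNA LEE twice, BEN COLE three times,
# and two full names that occur only once.
ACTORS = [
    (1, "ANNA", "LEE"),
    (2, "ANNA", "LEE"),
    (3, "BEN", "COLE"),
    (4, "BEN", "COLE"),
    (5, "BEN", "COLE"),
    (6, "CARL", "LEE"),
    (7, "ANNA", "COLE"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ACTOR (id INTEGER, FIRST_NAME TEXT, LAST_NAME TEXT)")
con.executemany("INSERT INTO ACTOR VALUES (?, ?, ?)", ACTORS)

# Wrap the grouped query and count its rows: one row per duplicated full name.
duplicate_name_count = con.execute("""
    SELECT COUNT(*) AS RecordCount
    FROM (SELECT FIRST_NAME, LAST_NAME
          FROM ACTOR
          GROUP BY FIRST_NAME, LAST_NAME
          HAVING COUNT(*) > 1) AS X
""").fetchone()[0]
con.close()
print(duplicate_name_count)
```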

Use concatenation and GROUP BY:
select full_name, count(*)
from
(
select FIRST_NAME || ' ' || LAST_NAME as full_name
from
actor) x
group by full_name
having count(*) > 1;

Try this (SQL Server, which concatenates strings with +):
SELECT COUNT(*) as Totals, NAME
FROM
(SELECT FIRST_NAME + ' ' + LAST_NAME AS NAME
FROM ACTOR) A
GROUP BY NAME
HAVING COUNT(*) > 1

There are several ways to fix your approach. I think the most useful one to learn is EXISTS:
SELECT COUNT(*)
FROM ACTOR a
WHERE EXISTS (SELECT 1
FROM ACTOR a2
WHERE a2.FIRST_NAME = a.FIRST_NAME AND a2.LAST_NAME = a.LAST_NAME AND
a2.id <> a.id
);
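A sketch of the EXISTS count in SQLite via Python's sqlite3, on hypothetical ACTOR rows. It counts actors who share their full name with someone else, not distinct names, so ANNA LEE (2 rows) and BEN COLE (3 rows) give 5:

```python
import sqlite3

# Hypothetical ACTOR rows: ANNA LEE twice, BEN COLE three times,
# and two full names that occur only once.
ACTORS = [
    (1, "ANNA", "LEE"),
    (2, "ANNA", "LEE"),
    (3, "BEN", "COLE"),
    (4, "BEN", "COLE"),
    (5, "BEN", "COLE"),
    (6, "CARL", "LEE"),
    (7, "ANNA", "COLE"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ACTOR (id INTEGER, FIRST_NAME TEXT, LAST_NAME TEXT)")
con.executemany("INSERT INTO ACTOR VALUES (?, ?, ?)", ACTORS)

# An actor is counted if a different actor (a2.id <> a.id) has the same full name.
exists_count = con.execute("""
    SELECT COUNT(*)
    FROM ACTOR a
    WHERE EXISTS (SELECT 1
                  FROM ACTOR a2
                  WHERE a2.FIRST_NAME = a.FIRST_NAME AND a2.LAST_NAME = a.LAST_NAME
                    AND a2.id <> a.id)
""").fetchone()[0]
con.close()
print(exists_count)
```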

We Keep Coding
