Find duplicate records based on two columns - sql

I have a table named employee with many records. Here is some sample data:
fullname | address | city
AA address1 City1
AA address3 City1
AA address8 City2
BB address5 City2
BB address2 City1
CC address6 City1
CC address7 City2
DD address4 City1
I want a SELECT query in SQL Server that shows only the duplicate records based on the columns fullname and city. For the given data and this condition, only the first two records are duplicates. So my expected output is:
fullname | address | city
AA address1 City1
AA address3 City1
To get this output, I have this query :
select fullname, city from employee group by fullname, city having count(*)>1
As you can see, it selects only two columns, so it gives the following output:
fullname | city
AA City1
If I re-write the query like below :
select fullname, city, address from employee group by fullname, city, address
having count(*)>1
Unfortunately it is showing no records! Can anybody please help me to write the correct query to get the expected result?

Instead of a grouped COUNT you can use it as a windowed aggregate to access the other columns:
SELECT fullname, address, city
FROM (SELECT fullname, address, city,
COUNT(*) OVER (PARTITION BY fullname, city) AS cnt
FROM employee) e
WHERE cnt > 1
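For anyone who wants to try the windowed COUNT without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in (an assumption on my part: SQLite 3.25 or later, which supports window functions). The table and data are the sample from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fullname TEXT, address TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("AA", "address1", "City1"),
        ("AA", "address3", "City1"),
        ("AA", "address8", "City2"),
        ("BB", "address5", "City2"),
        ("BB", "address2", "City1"),
        ("CC", "address6", "City1"),
        ("CC", "address7", "City2"),
        ("DD", "address4", "City1"),
    ],
)

# COUNT(*) OVER (...) attaches the group size to every row instead of
# collapsing the group, so address is still available in the outer query.
query = """
SELECT fullname, address, city
FROM (SELECT fullname, address, city,
             COUNT(*) OVER (PARTITION BY fullname, city) AS cnt
      FROM employee) AS e
WHERE cnt > 1
ORDER BY address
"""
duplicates = conn.execute(query).fetchall()
for row in duplicates:
    print(row)
```

With the sample data this prints only the two AA / City1 rows, which matches the expected output in the question.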

I agree with the answer above.
But if you don't want to use window functions, which may not be supported on all databases, you can group with HAVING first and then join the result back to the table on fullname and city to get the addresses:
Select employee.* from employee
join (select fullname, city from employee group by fullname, city having count(*)>1) q1
on q1.fullname = employee.fullname and q1.city = employee.city

Try the Following Code:
create table ##Employee
(Fullname varchar(25),
Address varchar(25),
City varchar(25))
insert into ##Employee values
( 'AA', 'address1', 'City1')
,( 'AA', 'address3', 'City1')
,( 'AA', 'address8', 'City2')
,( 'BB', 'address5', 'City2')
,( 'BB', 'address2', 'City1')
,( 'CC', 'address6', 'City1')
,( 'CC', 'address7', 'City2')
select E.* from ##Employee E
cross apply(
select Fullname,City,count(Fullname) cnt from ##Employee
group by Fullname,City
having Count(Fullname)>1)x
where E.Fullname=x.Fullname
and E.City=x.City

If you have a unique id or the address is always different, you can try:
select e.*
from employee e
where exists (select 1
from employee e2
where e2.fullname = e.fullname and e2.city = e.city and
e2.address <> e.address -- or id or some other unique column
)
Although I would probably go with the window function approach, you might find that under some circumstances, this is faster (especially if you have an index on employee(fullname, city, address)).

Here you go with the solution:
CREATE TABLE #Employee
(
Fullname VARCHAR(25),
[Address] VARCHAR(25),
City VARCHAR(25)
)
INSERT INTO #Employee VALUES
('AA', 'address1', 'City1')
,('AA', 'address1', 'City1')
,('AA', 'address3', 'City1')
,('AA', 'address8', 'City2')
,('BB', 'address5', 'City2')
,('BB', 'address2', 'City1')
,('CC', 'address6', 'City1')
,('CC', 'address7', 'City2')
;WITH cte AS (
SELECT e.FullName, e.[Address], e.City,
ROW_NUMBER() OVER(PARTITION BY FullName, [Address], [City] ORDER BY Fullname) AS sl,
HashBytes('MD5', FullName + [Address] + [City]) AS RecordId
FROM #Employee AS e
)
SELECT c.FullName, c.[Address], c.City
FROM cte AS c
INNER JOIN cte AS c1
ON c.RecordId = c1.RecordId
WHERE c1.sl = 2 -- keep only groups that contain a second copy
Result :
FullName Address City
AA address1 City1
AA address1 City1

SELECT Field1, Field2, COUNT(*) FROM table_name GROUP BY Field1, Field2 HAVING COUNT(*) > 1
This will give you the answer.


How to query rows matching for a particular string?

This is what the table looks like. Empid and idnumber are unique for each employee, and an employee can have multiple badges with different badge numbers.
Now I want to filter employees whose badges start with 6542 and 3214, i.e. employees carrying both a badge starting with 6542 and a badge starting with 3214.
Thank you.
Update 1
Some records have only a single badge starting with 6542 or 3214, but I want only employees who carry both badges.
Do a GROUP BY, use HAVING to ensure both badges:
select empid, name
from Table
where badge like '6542-%' or badge like '3214-%'
group by empid, name
having count(distinct badge) > 1
Or use INTERSECT:
select empid, name from Table where badge like '6542-%'
intersect
select empid, name from Table where badge like '3214-%'
Just use LIKE on the badge field:
Select empid, name
from TableName
where badge like '6542%' or badge like '3214%'
group by empid, name
having count(*)>1
This will work:
select empid from table_name where regexp_like(Badge,'^(6542)(.*)$')
intersect
select empid from table_name where regexp_like(Badge,'^(3214)(.*)$');
SQL Server equivalent (PATINDEX takes LIKE-style wildcards, not regular expressions):
select empid from table_name where PATINDEX('6542%',Badge) = 1
intersect
select empid from table_name where PATINDEX('3214%',Badge) = 1
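We can use COUNT(DISTINCT ...) with CASE in HAVING, as below:
-- column types are assumed; the original table definition was not shown
CREATE TABLE #test (empid INT, badge VARCHAR(20), idnumber INT, Ename VARCHAR(50))
INSERT INTO #test VALUES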
(1148, '6542-74488', 66448, 'Adam Jhon'),
(1148, '642-8562', 66448, 'Adam Jhon'),
(1148, '3214-52874', 66448, 'Adam Jhon'),
(1149, '3214-45220', 209541, 'Tom Koyaski'),
(1150, '3214-23134', 63339, 'Shirin Abdulla'),
(1151, '3214-42355', 65498, 'Linda Jhon'),
(1151, '6542-2546', 65498, 'Linda Jhon'),
(1152, '3214-47632', 208673, 'Gayeth'),
(1153, '6542-73085', 83209, 'Maria Smith'),
(1153, '3214-58073', 65498, 'Maria Smith'),
(1154, '3214-26735', 208673, 'Ayan Jacob'),
(1155, '642-26739', 53959, 'Wo Li')
SELECT empid, Ename
FROM #test
WHERE badge LIKE '6542%' OR badge LIKE '3214%'
GROUP BY empid, Ename
HAVING COUNT(DISTINCT (CASE WHEN badge LIKE '6542%' THEN 1
WHEN badge LIKE '3214%' THEN 2 END))>1
Result:
empid Ename
1148 Adam Jhon
1151 Linda Jhon
1153 Maria Smith
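To see why counting distinct prefixes is safer than a plain COUNT(*) > 1, here is a rough sketch in Python with sqlite3 as a stand-in for SQL Server. It uses the sample rows above plus one extra employee (1156, hypothetical, not in the original data) who carries two 3214 badges and no 6542 badge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (empid INTEGER, badge TEXT, idnumber INTEGER, ename TEXT)"
)
rows = [
    (1148, "6542-74488", 66448, "Adam Jhon"),
    (1148, "642-8562", 66448, "Adam Jhon"),
    (1148, "3214-52874", 66448, "Adam Jhon"),
    (1149, "3214-45220", 209541, "Tom Koyaski"),
    (1150, "3214-23134", 63339, "Shirin Abdulla"),
    (1151, "3214-42355", 65498, "Linda Jhon"),
    (1151, "6542-2546", 65498, "Linda Jhon"),
    (1152, "3214-47632", 208673, "Gayeth"),
    (1153, "6542-73085", 83209, "Maria Smith"),
    (1153, "3214-58073", 65498, "Maria Smith"),
    (1154, "3214-26735", 208673, "Ayan Jacob"),
    (1155, "642-26739", 53959, "Wo Li"),
    # Hypothetical extra rows: two badges, both starting with 3214.
    (1156, "3214-11111", 70001, "Two Badges"),
    (1156, "3214-22222", 70001, "Two Badges"),
]
conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)

base = """
SELECT empid FROM test
WHERE badge LIKE '6542%' OR badge LIKE '3214%'
GROUP BY empid, ename
HAVING {condition}
ORDER BY empid
"""

# Counts matching rows, so two badges with the same prefix also pass.
naive = [r[0] for r in conn.execute(base.format(condition="COUNT(*) > 1"))]

# Counts distinct prefixes, so both prefixes must be present.
strict = [
    r[0]
    for r in conn.execute(
        base.format(
            condition="COUNT(DISTINCT CASE WHEN badge LIKE '6542%' THEN 1 "
            "WHEN badge LIKE '3214%' THEN 2 END) = 2"
        )
    )
]

print("COUNT(*) > 1:      ", naive)
print("distinct prefixes: ", strict)
```

The row-count version lets employee 1156 through, while the distinct-prefix version returns only 1148, 1151 and 1153.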
Simply do it like this:
Select a.empid, a.name
from TableName as a
inner join TableName as b on a.Empid = b.Empid and a.idnumber = b.idnumber and b.badge like '3214%'
where a.badge like '6542%'

How to split a row in multiple rows SQL Server?

I want to convert a row in a SQL table to multiple rows in another table.
Example: say I have a table 'UserDetail' and a user has 2 addresses (home, office...), then the table looks like...
I want the result to be in the second table as shown in the image.
We can use CROSS APPLY:
;WITH CTE(UseriD,Address1Line,Address1City,Address1State,Address2Line,Address2City,Address2State )
AS
(
SELECT 1,'Line1','City1','State1','Line2','City2','State2'
)
SELECT UseriD,[Address],City,[State]
FROM CTE
CROSS APPLY ( VALUES (Address1Line,Address1City,Address1State ),
(Address2Line,Address2City,Address2State )
)AS Dt([Address],City,[State])
Result:
UseriD Address City State
1 Line1 City1 State1
1 Line2 City2 State2
You could use UNION ALL to do that, like this:
select * into newTable
from (
select UserId, Address1Line as Address, Address1City as City, Address1State as State
from myTable
union all
select UserId, Address2Line as Address, Address2City as City, Address2State as State
from myTable
) tmp
If you use just UNION instead of UNION ALL, you also remove the duplicates where Address1 and Address2 are the same.
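The difference between UNION ALL and UNION here can be checked with a small sketch in Python and sqlite3 (a stand-in for SQL Server; SELECT ... INTO is left out because SQLite does not have it). User 2 is a hypothetical row whose two addresses are identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE myTable (
           UserId INTEGER,
           Address1Line TEXT, Address1City TEXT, Address1State TEXT,
           Address2Line TEXT, Address2City TEXT, Address2State TEXT)"""
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Line1", "City1", "State1", "Line2", "City2", "State2"),
        # Hypothetical user whose home and office addresses are the same.
        (2, "LineX", "CityX", "StateX", "LineX", "CityX", "StateX"),
    ],
)

template = """
SELECT UserId, Address1Line AS Address, Address1City AS City, Address1State AS State
FROM myTable
{op}
SELECT UserId, Address2Line, Address2City, Address2State
FROM myTable
ORDER BY 1, 2
"""

all_rows = conn.execute(template.format(op="UNION ALL")).fetchall()
distinct_rows = conn.execute(template.format(op="UNION")).fetchall()

print(len(all_rows), "rows with UNION ALL")
print(len(distinct_rows), "rows with UNION")
```

UNION ALL keeps both copies of user 2's address (4 rows in total), while UNION collapses them into one (3 rows).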
You can CROSS JOIN or CROSS APPLY the table to a list of numbers.
Then use IIF or CASE to get the correspondent numbered fields.
Or CROSS APPLY on the address values directly on the table.
Example snippet:
declare @UserTable table (UserId int, Address1Line varchar(30), Address1City varchar(30), Address1State varchar(30), Address2Line varchar(30), Address2City varchar(30), Address2State varchar(30));
insert into @UserTable (UserId, Address1Line, Address1City, Address1State, Address2Line, Address2City, Address2State) values
(1,'Wonder Lane 42','WonderTown', 'WonderState', 'Somewhere 1 B', 'Nowhere', 'Anywhere'),
(2,'Backstreet 69','Los Libros', 'Everland', 'Immortal Cave 777', 'Ghost City', 'The Wild Lands');
-- Cross Join on numbers
select UserId,
case n when 1 then Address1Line when 2 then Address2Line end as [Address],
case n when 1 then Address1City when 2 then Address2City end as [City],
case n when 1 then Address1State when 2 then Address2State end as [State]
from @UserTable u
cross join (values (1),(2)) as nums(n);
-- Cross Apply on Address values
select UserId, [Address], [City], [State]
from @UserTable Usr
cross apply (values
(1, Address1Line, Address1City, Address1State),
(2, Address2Line, Address2City, Address2State)
) AS Addr(n, [Address], [City], [State]);
Both return:
UserId Address City State
------ ----------------- ---------- --------------
1 Wonder Lane 42 WonderTown WonderState
1 Somewhere 1 B Nowhere Anywhere
2 Backstreet 69 Los Libros Everland
2 Immortal Cave 777 Ghost City The Wild Lands

How to Select data from SQL Server 2000 using distinct clause on ONLY 1 column

I have a query where I'm trying to select some customer information: name, address, city, state, and zip. I'd like to pull all the information, but only one of the records if there is a dupe.
Example of data:
Invoice_Date First Last Addr City State Zip
11/11/14 Jim Jones 12 Cedar alkdjf TN 29430
11/11/15 Ralph Jones 12 Cedar alkdjf TN 29430
11/11/14 Robert Smith 15 block slkjdd TX 10932
What I want to return:
Invoice_Date First Last Addr City State Zip
11/11/15 Ralph Jones 12 Cedar alkdjf TN 29430 (newest Record)
11/11/14 Robert Smith 15 block slkjdd TX 10932
This is my query, which pulls ALL customers for the specified dates:
SELECT
Invoice_Tb.Invoice_Date, Invoice_Tb.Customer_First_Name,
Invoice_Tb.Customer_Last_Name,
Invoice_Tb.Customer_Address, Invoice_Tb.City,
Invoice_Tb.Customer_State, Invoice_Tb.ZIP_Code
FROM Invoice_Tb
INNER JOIN Invoice_Detail_Tb ON Invoice_Tb.Store_Number = Invoice_Detail_Tb.Store_Number
AND Invoice_Tb.Invoice_Number = Invoice_Detail_Tb.Invoice_Number
AND Invoice_Tb.Invoice_Date = Invoice_Detail_Tb.Invoice_Date
WHERE
(Invoice_Tb.Invoice_Date IN ('11/11/14', '11/11/15'))
AND (Invoice_Detail_Tb.Invoice_Detail_Code = 'FSV')
AND (LEN(Invoice_Tb.Customer_Address) > 4)
Now, obviously I can't use Row_Number here, because it's not an option in SQL Server 2000, so that's out.
I've tried Select Distinct - but I'm in need of the other information (first name, last name, etc), and when using select Distinct, it also returns distinct records for First and Last name. I only want 1 record, per address.
How can I return 1 row, for each distinct Address while including first name last name of the MOST recent visit, in this case - 11/11/15.
create table #SomeTest (Invoice_Number Int, InvoiceDt DateTime, FName Varchar(24), LName Varchar(24), Addr Varchar(24), City Varchar(24), St Varchar(2), Zip Varchar(12) )
insert into #SomeTest (Invoice_Number, InvoiceDt, FName, LName, Addr, City, St, Zip) values (1, '11/11/14', 'Jim','Jones', '12 Cedar', 'alkdjf', 'TN', '29430')
insert into #SomeTest (Invoice_Number, InvoiceDt, FName, LName, Addr, City, St, Zip) values (2, '11/11/15', 'Ralph','Jones', '12 Cedar', 'alkdjf', 'TN', '29430')
insert into #SomeTest (Invoice_Number, InvoiceDt, FName, LName, Addr, City, St, Zip) values (3, '11/11/14', 'Robert','Smith', '15 block', 'slkjdd', 'TX', '10932')
select * from #SomeTest
where Invoice_Number in
(
select Invoice_Number from
(select Invoice_Number = max(Invoice_Number), SupperAddy = Addr + '#' + City + '#' + St + '#' + Zip from #SomeTest
group by Addr + '#' + City + '#' + St + '#' + Zip) X
)
If you don't want to use any analytic function (like row_number) then you could do something like this:
--This gives you the most recent date for an address
Select max(Invoice_Tb.Invoice_Date) as Invoice_Date
, Invoice_Tb.Customer_Address
, Invoice_Tb.City
, Invoice_Tb.Customer_State
, Invoice_Tb.ZIP_Code
into #tmp1
from Invoice_Tb
group by Invoice_Tb.Customer_Address
, Invoice_Tb.City
, Invoice_Tb.Customer_State
, Invoice_Tb.ZIP_Code
--link back to the name for the most recent address
Select a.Invoice_Date
,b.Customer_First_Name as [First]
,b.Customer_Last_Name as [Last]
, a.Customer_Address as [Addr]
, a.City
, a.Customer_State as [State]
, a.ZIP_Code as [Zip]
from #tmp1 a
left join Invoice_Tb b on
a.Invoice_Date = b.Invoice_Date
and a.Customer_Address = b.Customer_Address
and a.City = b.City
and a.Customer_State = b.Customer_State
and a.ZIP_Code = b.ZIP_Code
Here's the "standard" way I always did this in my MsSql 2000 days. In your case, a subquery in the WHERE clause would also work, but I will use the INNER JOIN version of this solution. I employed a little bit of pseudo-code to cut down on my typing. You should be able to figure it out:
SELECT
t1.Invoice_Date, t1.Customer_First_Name,
t1.Customer_Address, t1.City,
t1.Customer_State, t1.ZIP_Code
FROM
Invoice_Tb t1
INNER JOIN Invoice_Tb t2
ON t1.Invoice_Number = t2.Invoice_Number
AND t2.Invoice_Number = (
SELECT TOP 1 t3.Invoice_Number
FROM Invoice_Tb t3
WHERE t3.Customer_Address=t1.Customer_Address
AND {t3.City,State,Zip=t1.City,State,Zip} --pseudocode
ORDER BY t3.Invoice_Date DESC
)
INNER JOIN Invoice_Detail_Tb ON ...
Also note that if any of the "address" fields can be NULL, you will have to handle that possibility when comparing them in the subquery.
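As a rough illustration of the "newest row per address" idea without ROW_NUMBER, here is a sketch in Python with sqlite3. It is not the SQL Server 2000 query itself: LIMIT 1 plays the role of TOP 1, the table and column names are simplified, and the dates from the question are assumed to mean 11 November 2014 and 2015 and are stored in ISO format so they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE invoice (
           invoice_number INTEGER PRIMARY KEY,
           invoice_date TEXT, first TEXT, last TEXT,
           addr TEXT, city TEXT, state TEXT, zip TEXT)"""
)
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "2014-11-11", "Jim", "Jones", "12 Cedar", "alkdjf", "TN", "29430"),
        (2, "2015-11-11", "Ralph", "Jones", "12 Cedar", "alkdjf", "TN", "29430"),
        (3, "2014-11-11", "Robert", "Smith", "15 block", "slkjdd", "TX", "10932"),
    ],
)

# For each row, the correlated subquery finds the newest invoice at the
# same address; the row is kept only if it is that invoice.
query = """
SELECT t1.invoice_date, t1.first, t1.last, t1.addr
FROM invoice t1
WHERE t1.invoice_number = (
    SELECT t2.invoice_number
    FROM invoice t2
    WHERE t2.addr = t1.addr AND t2.city = t1.city
      AND t2.state = t1.state AND t2.zip = t1.zip
    ORDER BY t2.invoice_date DESC, t2.invoice_number DESC
    LIMIT 1
)
ORDER BY t1.invoice_number
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

The second ORDER BY term breaks ties when two invoices at one address share a date, so exactly one row per address survives.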
You can do something like this query:
SELECT
MAX(invoice_date) AS InvoiceDate,
MAX(first) AS First,
MAX(addr) AS addr,
MAX(zip) AS zip
FROM invoice i
INNER JOIN (
SELECT last,
MAX(invoice_date) AS MaxInvoiceDate
FROM invoice
GROUP BY last) a
ON a.last = i.last
AND a.MaxInvoiceDate = i.Invoice_date
GROUP BY i.last
Several rows can share the same MAX(invoice_date); in that case you need to take the TOP 1 of them.

SQL Server : query to update record with latest entry

I have a table that maintains records of employers and employees' data. Something like this
EmployerName EmployerPhone EmployerAddress EmployeeName EmployeePhone EmployeeAddress Date
John 12345 NewYork Harry 59786 NewYork 12-1-1991
Mac 22345 Bankok John 12345 Delhi 12-3-1991
Smith 54732 Arab Amar 59226 China 21-6-1991
Sarah 12345 Bhutan Mac 22345 NewYork 5-9-1991
Root 85674 NewYork Smith 54732 Japan 2-11-1991
I have another table that will have generic records on the basis of phone number (both employers and employees).
Table structure is as following
Phone Name Address
I want to copy the latest records (by date) from Table1 to Table2, keyed on phone.
Like this
Phone Name Address
59786 Harry NewYork
22345 Mac NewYork
59226 Amar China
12345 Sarah Bhutan
22345 Mac NewYork
85674 Root NewYork
54732 Smith Arab
I've written many queries but none of them produced the required result.
Any kind of help will be appreciated.
To initialize the table without duplicate phone numbers (INSERT IGNORE is MySQL syntax and relies on a unique key on Phone):
INSERT IGNORE INTO Table2 (Phone, Name, Address)
SELECT X.* FROM (
SELECT EmployeePhone, EmployeeName, EmployeeAddress FROM Table1
UNION
SELECT EmployerPhone, EmployerName, EmployerAddress FROM Table1
) X
I think this is what you are looking for if I understand your question correctly. Should work for a once-off
CREATE TABLE #restbl (
Name varchar(100),
Phone varchar(20),
Addr varchar(100),
[Date] date,
RecType varchar(100)
)
INSERT INTO #restbl
SELECT EmployerName, EmployerPhone, NULL, MAX([Date]), 'Employer'
FROM #tbl
GROUP BY EmployerName, EmployerPhone
UNION
SELECT EmployeeName, EmployeePhone, NULL, MAX([Date]), 'Employee'
FROM #tbl
GROUP BY EmployeeName, EmployeePhone;
WITH LatestData (Name, Phone, [Date])
AS (
SELECT Name, Phone, MAX([Date])
FROM #restbl
GROUP BY Name, Phone
)
INSERT INTO FinalTable (Name, Phone, [Address])
SELECT DISTINCT ld.Name, ld.Phone, ISNULL(tEmployer.EmployerAddress, tEmployee.EmployeeAddress) AS [Address]
FROM LatestData ld
LEFT JOIN #tbl tEmployer ON ld.Name = tEmployer.EmployerName AND ld.Phone = tEmployer.EmployerPhone AND ld.Date = tEmployer.Date
LEFT JOIN #tbl tEmployee ON ld.Name = tEmployee.EmployeeName AND ld.Phone = tEmployee.EmployeePhone AND ld.Date = tEmployee.Date
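A simpler way to think about it is "flatten both roles into one list, then keep the newest row per phone". Here is a minimal sketch of that idea in Python with sqlite3 as a stand-in, using the sample rows from the question. The dates are assumed to be day-month-year and are stored in ISO form; the column name RecordDate is my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Table1 (
           EmployerName TEXT, EmployerPhone TEXT, EmployerAddress TEXT,
           EmployeeName TEXT, EmployeePhone TEXT, EmployeeAddress TEXT,
           RecordDate TEXT)"""
)
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("John", "12345", "NewYork", "Harry", "59786", "NewYork", "1991-01-12"),
        ("Mac", "22345", "Bankok", "John", "12345", "Delhi", "1991-03-12"),
        ("Smith", "54732", "Arab", "Amar", "59226", "China", "1991-06-21"),
        ("Sarah", "12345", "Bhutan", "Mac", "22345", "NewYork", "1991-09-05"),
        ("Root", "85674", "NewYork", "Smith", "54732", "Japan", "1991-11-02"),
    ],
)

# 1) flat: one row per (phone, name, address, date), whatever the role.
# 2) m:    newest date per phone.
# 3) join: keep the flat row that carries that newest date.
query = """
WITH flat AS (
    SELECT EmployerPhone AS Phone, EmployerName AS Name,
           EmployerAddress AS Address, RecordDate
    FROM Table1
    UNION ALL
    SELECT EmployeePhone, EmployeeName, EmployeeAddress, RecordDate
    FROM Table1
)
SELECT f.Phone, f.Name, f.Address
FROM flat f
JOIN (SELECT Phone, MAX(RecordDate) AS MaxDate FROM flat GROUP BY Phone) m
  ON m.Phone = f.Phone AND m.MaxDate = f.RecordDate
ORDER BY f.Phone
"""
latest_by_phone = conn.execute(query).fetchall()
for row in latest_by_phone:
    print(row)
```

Under the day-month-year reading, phone 54732 resolves to the Japan row dated 2-11-1991, since that is later than the 21-6-1991 row. If a phone could appear twice on the same date, a tie-breaker would be needed as well.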

Please help me design a sql query for this problem

For a particular name, I want to fetch the other names who have lived in three or more of the cities this person has lived in.
this is what you should do:
redesign your database to have a city table:
city(id int, name varchar)
and a user table:
user(id int, name varchar, ...)
and a user_city table:
user_city(user_id int, city_id int)
that alone will eliminate the limit of 10 cities per user.
to find the cities lived in by a user:
select city_id from user_city where user_id = ?
now how would you find users that have lived in 3 or more cities from that list?
one way to do it would be to count the number of cities from the list each user lived in, something like:
select user_id,count(*) n
from user_city
where city_id in (select city_id
from user_city
where user_id = ?)
group by user_id having n >= 3;
I didn't really test this, but it should work.
you will also have to figure out how to index those tables.
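Since the query above was posted untested, here is a quick sketch that runs the same idea in Python with sqlite3. The rows in user_city are made up for illustration, and I added a condition to leave the searched user out of the result and repeated COUNT(*) in HAVING instead of relying on the alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_city (user_id INTEGER, city_id INTEGER)")
conn.executemany(
    "INSERT INTO user_city VALUES (?, ?)",
    [
        # Hypothetical data: user 1 is the person we search for.
        (1, 10), (1, 20), (1, 30), (1, 40),
        # User 2 shares three cities with user 1.
        (2, 10), (2, 20), (2, 30), (2, 50),
        # User 3 shares only two.
        (3, 10), (3, 20), (3, 60),
    ],
)

query = """
SELECT user_id, COUNT(*) AS n
FROM user_city
WHERE city_id IN (SELECT city_id FROM user_city WHERE user_id = ?)
  AND user_id <> ?
GROUP BY user_id
HAVING COUNT(*) >= 3
ORDER BY user_id
"""
target_user = 1
matches = conn.execute(query, (target_user, target_user)).fetchall()
print(matches)
```

This assumes each (user_id, city_id) pair appears at most once in user_city; otherwise COUNT(DISTINCT city_id) would be the safer count.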
You'd need binomial(10,3)^2 OR conditions to do your query. That's 120^2 = 14,400. You do not want to do that.
You need to redesign your table. Instead of
name , city1 , city2 , city3 ,city4 , city5 ,city6 , city7 , city8 , city9 city10
it should be more like
Person, City, rank
name , city1 ,1
name , city2 ,2
name , city3 ,3
name , city4 ,4
name , city5 ,5
name , city6 ,6
name , city7 ,7
name , city8 ,8
name , city9 ,9
name , city10,10
and take TomTom's advice and learn about data normalization!
Respecting your request not to redesign the database, here is an untried idea (I have no way to test it right now):
Make a view (name, city) by unioning select name, c1, select name, c2, etc.
select m2.name from myview m1
inner join myview m2 on m1.city = m2.city
where m1.name = @Name AND m2.name != @Name
group by m2.name
having count(m2.city) > 2
You send the table back to whoever designed it with a comment to learn how to design tables. First normal form, normalization.
Once the table follows SQL rules, the query is pretty easy.
Try something like this:
SELECT PersonName,COUNT(*) AS CountOf
FROM (SELECT PersonName,city1 FROM PersonCities WHERE city1 IS NOT NULL
UNION SELECT PersonName,city2 FROM PersonCities WHERE city2 IS NOT NULL
UNION SELECT PersonName,city3 FROM PersonCities WHERE city3 IS NOT NULL
UNION SELECT PersonName,city4 FROM PersonCities WHERE city4 IS NOT NULL
UNION SELECT PersonName,city5 FROM PersonCities WHERE city5 IS NOT NULL
) dt
WHERE dt.city1 IN (SELECT city1 FROM PersonCities WHERE PersonName=..SearchPerson.. AND city1 IS NOT NULL
UNION SELECT city2 FROM PersonCities WHERE PersonName=..SearchPerson.. AND city2 IS NOT NULL
UNION SELECT city3 FROM PersonCities WHERE PersonName=..SearchPerson.. AND city3 IS NOT NULL
UNION SELECT city4 FROM PersonCities WHERE PersonName=..SearchPerson.. AND city4 IS NOT NULL
UNION SELECT city5 FROM PersonCities WHERE PersonName=..SearchPerson.. AND city5 IS NOT NULL
)
AND PersonName!=..SearchPerson..
GROUP BY PersonName
I don't have mysql, so here it is running using SQL Server:
DECLARE @PersonCities table(PersonName varchar(10), city1 varchar(10), city2 varchar(10), city3 varchar(10), city4 varchar(10), city5 varchar(10))
INSERT INTO @PersonCities VALUES ('Pat','BBB','DDD','EEE','FFF','GGG')
INSERT INTO @PersonCities VALUES ('Ron','HHH','DDD','EEE','FFF', NULL)
DECLARE @SearchPerson varchar(10)
SET @SearchPerson='Pat'
SELECT PersonName,COUNT(*) AS CountOf
FROM (SELECT PersonName,city1 FROM @PersonCities WHERE city1 IS NOT NULL
UNION SELECT PersonName,city2 FROM @PersonCities WHERE city2 IS NOT NULL
UNION SELECT PersonName,city3 FROM @PersonCities WHERE city3 IS NOT NULL
UNION SELECT PersonName,city4 FROM @PersonCities WHERE city4 IS NOT NULL
UNION SELECT PersonName,city5 FROM @PersonCities WHERE city5 IS NOT NULL
) dt
WHERE dt.city1 IN (SELECT city1 FROM @PersonCities WHERE PersonName=@SearchPerson AND city1 IS NOT NULL
UNION SELECT city2 FROM @PersonCities WHERE PersonName=@SearchPerson AND city2 IS NOT NULL
UNION SELECT city3 FROM @PersonCities WHERE PersonName=@SearchPerson AND city3 IS NOT NULL
UNION SELECT city4 FROM @PersonCities WHERE PersonName=@SearchPerson AND city4 IS NOT NULL
UNION SELECT city5 FROM @PersonCities WHERE PersonName=@SearchPerson AND city5 IS NOT NULL
)
AND PersonName!=@SearchPerson
GROUP BY PersonName
PersonName CountOf
---------- -----------
Ron 3
(1 row(s) affected)
You need to normalize your database.
Doing that you will get the columns
Name, City (optionally CityOrder).
After that you will need to find a way to combine these results into what you need. Doing this you'll need to understand Join, Count and Group by.
Try this:
< table > Person
< fields > PersonId, PersonName |
< table > City
< fields > CityId, CityName |
< table > LivedIn
< fields > LivedInId, PersonId, CityId
Logically you would do the following things for each scenario:
Find the person who has lived in the maximum number of different cities:
Make a list of the PersonId's (all people)
Iterate over that and count the number of cities each person lived in
Find the maximum number of cities lived in by any one person
Find the person name related to the personId that had the max cities
Find all people that lived in 3 or more of the same cities as a given person
Let's call the person Bob
Make a list of all cities (the CityIds) that Bob lived in.
Make a list which includes personId, and common cities (maybe a HashMap in Java)
Iterate over the LivedIn table and update the count of how many cities are common
Find all the people who have a count of 3 or more
I would do this with a combination of Java and SQL but I am not that good with either so can't give you the code here without having to look a lot of stuff up.
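The second list of steps (the "Bob" scenario) can be sketched procedurally like this. It is only an illustration in Python rather than Java: the function name, the (person, city) pair layout and the sample data are all made up, and a Counter stands in for the HashMap mentioned above.

```python
from collections import Counter


def people_sharing_cities(lived_in, person, minimum=3):
    """Return the people who share at least `minimum` cities with `person`.

    `lived_in` is a list of (person_name, city_name) pairs, playing the
    role of the LivedIn table joined to Person and City.
    """
    # Step 1: the set of cities the given person lived in.
    target_cities = {city for name, city in lived_in if name == person}

    # Steps 2 and 3: walk the list and count common cities per other person.
    shared = Counter()
    seen = set()  # guards against the same (person, city) pair appearing twice
    for name, city in lived_in:
        if name != person and city in target_cities and (name, city) not in seen:
            seen.add((name, city))
            shared[name] += 1

    # Step 4: keep the people whose count reaches the threshold.
    return sorted(name for name, count in shared.items() if count >= minimum)


# Hypothetical sample data.
lived_in = [
    ("Bob", "Paris"), ("Bob", "Rome"), ("Bob", "Oslo"), ("Bob", "Lima"),
    ("Ann", "Paris"), ("Ann", "Rome"), ("Ann", "Oslo"),
    ("Cy", "Paris"), ("Cy", "Rome"), ("Cy", "Cairo"),
]
result = people_sharing_cities(lived_in, "Bob")
print(result)
```

In practice a single GROUP BY query does the same work inside the database, but the loop makes the counting step explicit.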
Breaking this data out into three tables to provide a more flexible many-to-many relationship.
person table to store names
city table to store cities
person_city to relate the two (many to many)
To retrieve other people who have lived in 3 or more of the cities that navin has:
SELECT name FROM (
SELECT p.name, p.person_id, COUNT(c.city_id) AS lived
FROM person p
JOIN person_city pc ON (pc.person_id = p.person_id)
JOIN city c ON (c.city_id = pc.city_id)
WHERE c.city_id IN (
SELECT c2.city_id
FROM city c2
JOIN person_city pc2 ON (c2.city_id = pc2.city_id)
JOIN person p2 ON (p2.person_id = pc2.person_id)
WHERE p2.name = 'navin'
)
GROUP BY p.person_id HAVING lived >= 3
) AS multihome
WHERE name <> 'navin';