Find duplicates in SQL

I have a large table with the following data on users.
social security number
name
address
I want to find all possible duplicates in the table
where the ssn is equal but the name is not
My attempt is:
SELECT * FROM Table t1
WHERE (SELECT count(*) from Table t2 where t1.name <> t2.name) > 1

A grouping on SSN should do it
SELECT
ssn
FROM
Table t1
GROUP BY
ssn
HAVING COUNT(*) > 1
...or, if you have many rows per ssn and only want to find those with differing names:
...
HAVING COUNT(DISTINCT name) > 1
Edit: oops, I misunderstood the question.
SELECT
ssn
FROM
Table t1
GROUP BY
ssn
HAVING MIN(name) <> MAX(name)
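As a quick sanity check of the grouping approach, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table name `users`, the sample rows, and the helper names `build_users_db` and `ssns_with_conflicting_names` are made up for illustration; only the `GROUP BY ssn ... HAVING COUNT(DISTINCT name) > 1` pattern comes from the answer above.

```python
import sqlite3


def build_users_db():
    """Create a hypothetical users table with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (ssn TEXT, name TEXT, address TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [
            ("111", "Ann", "1 Main St"),
            ("111", "Ann", "2 Oak Ave"),  # same ssn, same name: not a conflict
            ("222", "Bob", "3 Elm Rd"),
            ("222", "Rob", "3 Elm Rd"),   # same ssn, different name: conflict
            ("333", "Cid", "4 Pine Ln"),
        ],
    )
    return conn


def ssns_with_conflicting_names(conn):
    """Return ssn values that appear with more than one distinct name."""
    rows = conn.execute(
        """
        SELECT ssn
        FROM users
        GROUP BY ssn
        HAVING COUNT(DISTINCT name) > 1
        ORDER BY ssn
        """
    ).fetchall()
    return [ssn for (ssn,) in rows]


if __name__ == "__main__":
    print(ssns_with_conflicting_names(build_users_db()))
```

On this sample data only ssn 222 is reported. Swapping the HAVING clause for MIN(name) <> MAX(name) selects the same groups here.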

This will handle more than two records with duplicate ssn's:
select count(*), t1.ssn, t1.name
from table t1
join (
select count(*) ssn_count, ssn
from table
group by ssn
having count(*) > 1
) t2
on t1.ssn = t2.ssn
group by t1.ssn, t1.name, t2.ssn_count
having count(*) <> t2.ssn_count

Related

SQL Return only duplicate records

I want to return rows that have duplicate values in both the Full Name and Address columns in SQL. So in the example, I would just want the first two rows returned. How do I code this?
Why return duplicate values? Just aggregate and return the count:
select fullname, address, count(*) as cnt
from t
group by fullname, address
having count(*) >= 2;
One option uses window functions:
select *
from (
select t.*, count(*) over(partition by fullname, address) cnt
from mytable t
) t
where cnt > 1
If your table has a primary key, say id, you can also use exists:
select t.*
from mytable t
where exists (
select 1
from mytable t1
where t1.fullname = t.fullname and t1.address = t.address and t1.id <> t.id
)
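To see how the aggregate and EXISTS variants differ in what they return, here is a small sketch using Python's built-in sqlite3 module. The sample rows and the helper names (`build_mytable_db`, `duplicate_groups`, `duplicate_rows`) are invented for illustration, and the table is assumed to have an `id` primary key as in the EXISTS answer.

```python
import sqlite3


def build_mytable_db():
    """Create a hypothetical mytable with an id primary key and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, fullname TEXT, address TEXT)"
    )
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?)",
        [
            (1, "Ann Lee", "1 Main St"),
            (2, "Ann Lee", "1 Main St"),  # duplicate of row 1 on both columns
            (3, "Ann Lee", "9 Hill Rd"),  # same name, different address
            (4, "Bob Ray", "1 Main St"),  # same address, different name
        ],
    )
    return conn


def duplicate_groups(conn):
    """One row per duplicated (fullname, address) pair, with its count."""
    return conn.execute(
        """
        SELECT fullname, address, COUNT(*) AS cnt
        FROM mytable
        GROUP BY fullname, address
        HAVING COUNT(*) >= 2
        """
    ).fetchall()


def duplicate_rows(conn):
    """Every individual row whose (fullname, address) pair occurs in another row."""
    return conn.execute(
        """
        SELECT t.id, t.fullname, t.address
        FROM mytable t
        WHERE EXISTS (
            SELECT 1
            FROM mytable t1
            WHERE t1.fullname = t.fullname
              AND t1.address = t.address
              AND t1.id <> t.id
        )
        ORDER BY t.id
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_mytable_db()
    print(duplicate_groups(conn))
    print(duplicate_rows(conn))
```

The aggregate form collapses each duplicated pair into a single summary row, while the EXISTS form returns the original rows (ids 1 and 2 in this sample), which is closer to what the question asks for.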

I have a table without any primary key and I want all its duplicate records

I have a table without any primary key and I want to return all its duplicate records.
Table --
EmpName City
-------------
Shivam Noida
Ankit Delhi
Mani Gurugram
Shivam Faizabad
Mukesh Noida
and want output like this --
EmpName City
-------------
Shivam Noida
Shivam Faizabad
Mukesh Noida
Thanks in Advance.
I think you want exists:
select t.*
from t
where exists (select 1
from t t2
where (t2.empname = t.empname and t2.city <> t.city) or
(t2.city = t.city and t2.empname <> t.empname)
);
You seem to be looking for all rows where the name or city also appears in another row of the table.
select *
from mytable
where city in (select city from mytable group by city having count(*) > 1)
or empname in (select empname from mytable group by empname having count(*) > 1);
You say there is no primary key. This suggests that there can be duplicate rows (same name and city). This makes it impossible in many DBMS to use EXISTS here to look up the other rows. This is why I am suggesting IN and COUNT.
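The difference shows up once the table really does contain an exact duplicate row. Below is a minimal sketch with Python's sqlite3 module that runs the EXISTS query with `<>` from the earlier answer and the IN/COUNT query side by side. The table name `emp`, the extra ("Ravi", "Pune") pair, and the helper `build_emp_db` are assumptions added for illustration; the other rows are the question's sample data.

```python
import sqlite3

# EXISTS variant: needs another row that differs in the other column.
EXISTS_QUERY = """
SELECT t.empname, t.city
FROM emp t
WHERE EXISTS (SELECT 1
              FROM emp t2
              WHERE (t2.empname = t.empname AND t2.city <> t.city)
                 OR (t2.city = t.city AND t2.empname <> t.empname))
ORDER BY t.empname, t.city
"""

# IN + COUNT variant: only counts occurrences, so exact duplicates qualify too.
IN_QUERY = """
SELECT empname, city
FROM emp
WHERE city IN (SELECT city FROM emp GROUP BY city HAVING COUNT(*) > 1)
   OR empname IN (SELECT empname FROM emp GROUP BY empname HAVING COUNT(*) > 1)
ORDER BY empname, city
"""


def build_emp_db():
    """Create the sample table (no primary key) plus one exact duplicate row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (empname TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?)",
        [
            ("Shivam", "Noida"),
            ("Ankit", "Delhi"),
            ("Mani", "Gurugram"),
            ("Shivam", "Faizabad"),
            ("Mukesh", "Noida"),
            ("Ravi", "Pune"),  # hypothetical exact duplicate pair,
            ("Ravi", "Pune"),  # not part of the question's data
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_emp_db()
    print(conn.execute(EXISTS_QUERY).fetchall())
    print(conn.execute(IN_QUERY).fetchall())
```

With this data the EXISTS query skips both ("Ravi", "Pune") rows because no other row differs from them in exactly one column, whereas the IN/COUNT query returns them along with the three rows from the expected output.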
Use EXISTS with an OR condition:
with cte as
(
select 'Shivam' as name, 'Noida' as city union all
select 'Ankit' , 'Delhi' union all
select 'Mani' , 'Gurugram' union all
select 'Shivam' , 'Faizabad' union all
select 'Mukesh' , 'Noida'
)
select t1.*
from cte t1
where exists (
select 1 from cte t2
where t1.name=t2.name
group by name
having count(*)>1
)
or exists
(
select 1 from cte t2
where t1.city=t2.city
group by city
having count(*)>1
)
output
name city
Shivam Noida
Shivam Faizabad
Mukesh Noida
Do a UNION ALL to put both types of names in one column. (d1)
GROUP BY its result, and use HAVING to return only duplicates (d2).
JOIN:
select EmpName, City
from tablename t1
join (select name from
(select EmpName name from tablename
union all
select City from tablename) d1
group by name
having count(*) > 1) d2
on d2.name in (EmpName, City)
select distinct * from table
where col1 in (select col1 from table group by col1 having count(1) > 1)
or col2 in (select col2 from table group by col2 having count(1) > 1)

sql query that partitions the data and orders by time and then returns only specific records within a partition

What I mean exactly is: the data is partitioned by name and ordered by date.
I would now like to select only those rows in each partition that come after the row where NO is null and GENRE is null (after rowNo 3 in the provided example).
So the query should return rowNo 4 and 5.
Query used:
select
name, no, genre, date,
ROW_NUMBER() OVER(PARTITION BY name, genre ORDER BY date)
from
sourceTable
Assuming there is only one row per name where no and genre are null, you can use
select t1.*
from tablename t1
join tablename t2 on t1.name = t2.name and t2.no is null and t2.genre is null
where t1.date > t2.date
Why wouldn't you just do this?
select t.*
from (select name, no, genre, date,
ROW_NUMBER() OVER(PARTITION BY name, genre ORDER BY date) as rowno
from sourceTable
) t
where rowno > 3;

Oracle SQL to delete duplicate records based on columns

I have a table with records:
DATE NAME AGE ADDRESS
01/13/2014 abc 27 us
01/29/2014 abc 27 ma <- duplicate
02/03/2014 abc 27 ny <- duplicate
02/03/2014 def 28 ca
I want to delete records 2 and 3 since they are duplicates of record 1 based on name and age. The DATE column is a timestamp recording when the row was added (SQL date) and is considered unique.
I found this SQL but I'm not sure it will work, and I'm a bit concerned because the table has 2 million records and deleting the wrong ones would be a bad idea:
SELECT A.DATE, A.NAME, A.AGE
FROM table A
WHERE EXISTS (SELECT B.DATE
FROM table B
WHERE B.NAME = A.NAME
AND B.AGE = A.AGE);
There are many instances of such records. Can someone help me write the SQL to delete them?
Query
DELETE FROM tbl t1
WHERE dt IN
(
SELECT t1.dt
FROM tbl t1
JOIN tbl t2 ON
(
t2.name = t1.name
AND t2.age=t1.age
AND t2.dt > t1.dt
)
);
delete from table
where (date, name, age) not in ( select max( date ), name, age from table group by name, age )
Before delete verify with
select * from table
where (date, name, age) not in ( select max( date ), name, age from table group by name, age )
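The verify-then-delete habit can be sketched end to end with Python's sqlite3 module. This is only an illustration under several assumptions: the table is called `people`, the date column is renamed `dt` and stored as ISO text so that MAX compares chronologically, and the helpers `build_people_db` and `preview_then_delete` are invented names. The filter itself is the row-value NOT IN from the answer above.

```python
import sqlite3

# Rows that are NOT the latest record of their (name, age) group.
STALE_FILTER = """
(dt, name, age) NOT IN (
    SELECT MAX(dt), name, age FROM people GROUP BY name, age
)
"""


def build_people_db():
    """Create the question's sample data (dates rewritten as ISO strings)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (dt TEXT, name TEXT, age INTEGER, address TEXT)"
    )
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?, ?)",
        [
            ("2014-01-13", "abc", 27, "us"),
            ("2014-01-29", "abc", 27, "ma"),
            ("2014-02-03", "abc", 27, "ny"),
            ("2014-02-03", "def", 28, "ca"),
        ],
    )
    conn.commit()
    return conn


def preview_then_delete(conn):
    """Select the rows the filter matches, delete them, and return the preview."""
    preview = conn.execute(
        f"SELECT dt, name, age, address FROM people WHERE {STALE_FILTER} ORDER BY dt"
    ).fetchall()
    cur = conn.execute(f"DELETE FROM people WHERE {STALE_FILTER}")
    if cur.rowcount != len(preview):
        # Something changed between the preview and the delete: undo it.
        conn.rollback()
        raise RuntimeError("DELETE touched a different number of rows than the preview")
    conn.commit()
    return preview


if __name__ == "__main__":
    conn = build_people_db()
    print(preview_then_delete(conn))
    print(conn.execute("SELECT * FROM people ORDER BY name").fetchall())
```

Note that this filter keeps the latest row per (name, age). To keep the earliest row instead, as the question describes, MAX would be replaced with MIN.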
The ROW_NUMBER analytic function will be helpful here (it is supported by Oracle and SQL Server).
It assigns a unique, ordered number to each row inside a partition, so the ORDER BY clause must be chosen carefully: it decides which row gets number 1 and is therefore kept.
SELECT A_TABLE.*,
ROW_NUMBER ()
OVER (PARTITION BY NAME, AGE
ORDER BY DATE DESC)
seq_no
FROM A_TABLE;
Then you may use the result for delete operation:
DELETE FROM A_TABLE
WHERE (DATE, NAME, AGE) IN
(
SELECT DATE, NAME, AGE FROM
(
SELECT A_TABLE.*,
ROW_NUMBER ()
OVER (PARTITION BY NAME, AGE
ORDER BY DATE DESC)
seq_no
FROM A_TABLE
)
WHERE seq_no != 1
)

How to remove duplicate rows as well as old records in Oracle database

I searched the net and didn't find an answer for this kind of problem.
I have a table emp_master_data, which has many columns, but I want to use a few columns to filter the data (select query) and then, after analyzing, delete those records.
The filter should be applied on three columns: emp_card_no, emp_id, and enrollment_exp_dt. An employee can be enrolled multiple times, which means you'll have multiple records with the same emp_card_no and emp_id and the same or a different enrollment_exp_dt.
Now, I need to do this:
Remove the duplicate records if there are multiple records with the same enrollment_exp_dt, emp_card_no and emp_id.
If I have multiple records for the same employee but with different enrollment_exp_dt, then remove the old records and keep only the latest record (it doesn't have to be > sysdate).
Please let me know the best way to do this. I did try the following, but it doesn't solve all the problems.
SELECT *
FROM brm_staging A
WHERE EXISTS (
SELECT 1 FROM brm_staging
WHERE enrollment_exp_dt = A.enrollment_exp_dt
and emp_id= A.emp_id
and emp_card_no =A.emp_card_no
AND ROWID < A.ROWID
);
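One way to read the two requirements together is "keep exactly one row per (emp_card_no, emp_id): the one with the latest enrollment_exp_dt, breaking ties by row id". The sketch below tries that reading with Python's sqlite3 module, which also exposes a `rowid`. It is an assumption-laden illustration, not Oracle code: the table holds only the three filter columns, dates are ISO text, and the helper names `build_enrollment_db` and `keep_latest_per_employee` are invented.

```python
import sqlite3

# Delete every row for which a "better" row exists for the same employee:
# either a later enrollment_exp_dt, or the same date with a higher rowid.
DELETE_STALE = """
DELETE FROM emp_master_data
WHERE EXISTS (
    SELECT 1
    FROM emp_master_data AS b
    WHERE b.emp_card_no = emp_master_data.emp_card_no
      AND b.emp_id = emp_master_data.emp_id
      AND (b.enrollment_exp_dt > emp_master_data.enrollment_exp_dt
           OR (b.enrollment_exp_dt = emp_master_data.enrollment_exp_dt
               AND b.rowid > emp_master_data.rowid))
)
"""


def build_enrollment_db():
    """Create a cut-down, hypothetical emp_master_data table with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emp_master_data "
        "(emp_card_no TEXT, emp_id INTEGER, enrollment_exp_dt TEXT)"
    )
    conn.executemany(
        "INSERT INTO emp_master_data VALUES (?, ?, ?)",
        [
            ("C1", 1, "2020-01-01"),  # older enrollment: should go
            ("C1", 1, "2021-01-01"),  # exact duplicate of the latest: should go
            ("C1", 1, "2021-01-01"),  # latest, highest rowid: kept
            ("C2", 2, "2019-06-30"),  # only row for this employee: kept
        ],
    )
    conn.commit()
    return conn


def keep_latest_per_employee(conn):
    """Run the delete and return the number of rows removed."""
    cur = conn.execute(DELETE_STALE)
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = build_enrollment_db()
    print(keep_latest_per_employee(conn))
    print(conn.execute(
        "SELECT * FROM emp_master_data ORDER BY emp_card_no"
    ).fetchall())
```

Running the same condition as a SELECT first, as the question's own attempt does, shows which rows would be removed before anything is deleted.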
It got really complicated. Can you try the select statement first, before deleting, and see if this works? (This is for the second scenario.)
DELETE FROM YOUR_TABLE T1
INNER JOIN (
SELECT T2.* FROM YOUR_TABLE T2,
(SELECT EMP_ID, CARD_NO, COUNT(*) FROM
YOUR_TABLE
GROUP BY EMP_ID, CARD_NO
HAVING COUNT(*) > 1) T3
WHERE T2.EMP_ID=T3.EMP_ID AND T2.CARD_NO = T3.CARD_NO AND
T2.ENROLLMENT_EXP_DT NOT IN (SELECT MAX(T4.ENROLLMENT_EXP_DT)
FROM YOUR_TABLE T4)) T5 ON
T1.EMP_ID=T5.EMP_ID AND T1.CARD_NO=T5.CARD_NO AND T1.ENROLLMENT_EXP_DT=T5.ENROLLMENT_EXP_DT
(EDIT) I think this works too (simplified):
DELETE FROM YOUR_TABLE T1
WHERE EXISTS (
SELECT T2.* FROM YOUR_TABLE T2,
(SELECT EMP_ID, CARD_NO, COUNT(*) FROM
YOUR_TABLE
GROUP BY EMP_ID, CARD_NO
HAVING COUNT(*) > 1) T3
WHERE T2.EMP_ID=T3.EMP_ID AND T2.CARD_NO = T3.CARD_NO AND
T2.ENROLLMENT_EXP_DT NOT IN (SELECT MAX(T4.ENROLLMENT_EXP_DT)
FROM YOUR_TABLE T4)