Remove only duplicates from specific Postgresql table - sql

I have a table that contains duplicates, and I would like to keep only one row for each group of duplicates.
I can select one row per group with this SQL command:
SELECT DISTINCT ON (email, first_name, last_name) * from customer;
But I would like to turn that command into a DELETE.
This command should work, right?
DELETE FROM customer WHERE customer.id NOT IN
(SELECT id FROM
(SELECT DISTINCT ON (email, first_name, last_name) * from customer));
Is that correct?

I assume you have an id column.
delete from customer
where id not in (
select min(id)
from customer
group by email, first_name, last_name
)
The subquery finds the ids of the rows you want to keep.
The DELETE then removes all the other rows.
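As a quick way to check this on sample data, here is a minimal sketch that runs the same DELETE through Python's built-in sqlite3 module. SQLite is only a stand-in for PostgreSQL here, and the sample rows and the helper name dedupe_keep_lowest_id are made up for illustration.

```python
import sqlite3


def dedupe_keep_lowest_id(conn):
    """Delete every customer row except the lowest id in each duplicate group."""
    conn.execute(
        """
        DELETE FROM customer
        WHERE id NOT IN (
            SELECT MIN(id)
            FROM customer
            GROUP BY email, first_name, last_name
        )
        """
    )


# Hypothetical sample data: ids 1, 2 and 4 are duplicates of each other.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, email TEXT, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        (1, "a@example.com", "Ann", "Lee"),
        (2, "a@example.com", "Ann", "Lee"),
        (3, "b@example.com", "Bob", "Ray"),
        (4, "a@example.com", "Ann", "Lee"),
    ],
)

dedupe_keep_lowest_id(conn)
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM customer ORDER BY id")]
print(remaining_ids)  # [1, 3]
```

On a real table it is probably worth running the inner SELECT on its own first, to see which ids would be kept before deleting anything.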

I can't tell which id you would get from (SELECT DISTINCT ON (email, first_name, last_name) * from customer).
DISTINCT ON returns only the first row of each group of duplicates, and without an ORDER BY it is unpredictable which row that is.

Related

How to delete all duplicate records from sql

I have a table in SQL and I want to remove duplicate records, but I want to remove every copy of any duplicated record.
First name  last name
a           b
a           b
a           c
After running the query, the table should be:
First name  last name
a           c
You can use group by:
select first_name, last_name
from t
group by first_name, last_name
having count(*) = 1;
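A small sketch of that query on the sample data from the question, run through Python's sqlite3 module so it can be tried without a database server. The DELETE variant at the end is my own addition rather than part of the answer above, and it assumes the database supports row values in an IN clause (SQLite 3.15+ does, as do PostgreSQL and MySQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", "b"), ("a", "b"), ("a", "c")],
)

# Rows whose (first_name, last_name) combination occurs exactly once.
unique_rows = conn.execute(
    """
    SELECT first_name, last_name
    FROM t
    GROUP BY first_name, last_name
    HAVING COUNT(*) = 1
    """
).fetchall()
print(unique_rows)  # [('a', 'c')]

# Assumed variant: actually remove every copy of a duplicated combination.
conn.execute(
    """
    DELETE FROM t
    WHERE (first_name, last_name) IN (
        SELECT first_name, last_name
        FROM t
        GROUP BY first_name, last_name
        HAVING COUNT(*) > 1
    )
    """
)
rows_after_delete = conn.execute("SELECT first_name, last_name FROM t").fetchall()
print(rows_after_delete)  # [('a', 'c')]
```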

SQL Select column which is not used in select section of subquery which find duplicates

I am trying to find records in my database that have duplicated fields such as name, surname and type.
Example:
SELECT name, surname, type, COUNT(*)
FROM customers
GROUP BY name, surname
HAVING COUNT(*)>1
Query results:
Robb|Stark|1|2
Tyrion|Lannister|1|3
So the customer "Robb Stark" appears 2 times and "Tyrion Lannister" appears 3 times.
Now I want to know the ids of these records.
I found a similar problem described here:
Finding duplicate values in a SQL table
There is an answer there, but no example.
Use COUNT as an analytic function:
WITH cte AS (
SELECT *, COUNT(*) OVER (PARTITION BY name, surname) cnt
FROM customers
)
SELECT * -- return all columns
FROM cte
WHERE cnt > 1
ORDER BY name, surname;
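To see what the analytic COUNT returns, here is a minimal sketch using Python's sqlite3 module as a stand-in database (window functions need SQLite 3.25 or newer). The sample rows, including the extra "Jon Snow" row, are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, surname TEXT, type INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Robb", "Stark", 1),
        (2, "Robb", "Stark", 1),
        (3, "Tyrion", "Lannister", 1),
        (4, "Tyrion", "Lannister", 1),
        (5, "Tyrion", "Lannister", 1),
        (6, "Jon", "Snow", 1),  # not duplicated, so it should be filtered out
    ],
)

# Every row keeps its own id; cnt is the size of its (name, surname) group.
duplicate_rows = conn.execute(
    """
    WITH cte AS (
        SELECT id, name, surname,
               COUNT(*) OVER (PARTITION BY name, surname) AS cnt
        FROM customers
    )
    SELECT id, name, surname, cnt
    FROM cte
    WHERE cnt > 1
    ORDER BY name, surname, id
    """
).fetchall()

duplicate_ids = [row[0] for row in duplicate_rows]
print(duplicate_ids)  # [1, 2, 3, 4, 5]
```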
The simplest way is to use EXISTS, as follows:
SELECT t.*
FROM customers t
where exists
(select 1 from customers tt
where tt.name = t.name
and tt.surname = t.surname
and tt.id <> t.id)
Or use your original query in an IN clause, as follows:
select * from customers where (name, surname) in
(SELECT name, surname
FROM customers
GROUP BY name, surname
HAVING COUNT(*)>1)
If you want one row per group of duplicates, with the list of ids in a comma-separated string, you can just use string aggregation with your existing query:
SELECT name, surname, COUNT(*) as cnt,
STRING_AGG(id, ',') WITHIN GROUP (ORDER BY id) as all_ids
FROM customers
GROUP BY name, surname
HAVING COUNT(*) > 1

Retrieving data with single occurrence of repeated data

SELECT *
FROM employee
GROUP BY first_name
HAVING count(first_name) >= 1;
How can I retrieve all rows and columns with a single occurrence of each duplicate? I want to retrieve all the table contents, but repeated data must appear only once. In the table, first_name and last_name are repeated twice, but the other columns differ.
Please help.
Try this SQL query:
SELECT * FROM EMPLOYEE WHERE FIRST_NAME NOT IN
(
SELECT FIRST_NAME FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY FIRST_NAME ORDER BY FIRST_NAME) RNK,FIRST_NAME FROM EMPLOYEE
)A WHERE A.RNK=2
)

how to remove a multiple records for same zipcode keeping atleast one record for that zipcode in database table

How can I remove multiple records for the same zipcode while keeping at least one record for that zipcode in a database table?
id zipcode
1 38000
2 38000
3 38000
4 38005
5 38005
I want a table with two columns, id and zipcode ...
My final result should be the following:
id zipcode
1 38000
4 38005
How about
delete from myTable
where id not in (
select Min( id )
from myTable
group by zipcode )
That lets you keep your lowest IDs, which is what you seemed to want.
To just select that result set, group by zipcode and take the lowest id (a plain DISTINCT on zipcode would still match every row):
SELECT MIN(id) AS id, zipcode
FROM table
GROUP BY zipcode
To delete the other records and keep only one per zipcode, use a subquery like so:
DELETE FROM table
WHERE id NOT IN
(SELECT MIN(id)
FROM table
GROUP BY zipcode
)
You can also accomplish this using a join if you prefer.
with cte as (
select row_number() over (partition by zipcode order by id desc) as rn
from table)
delete from cte
where rn > 1;
This has the advantage of correctly handling duplicates and offers tight control over what gets deleted and what gets kept.
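Deleting directly from a CTE like this is not accepted by every database, so here is a rough sketch of a more portable variant that selects the ids to delete via ROW_NUMBER() in a subquery. It runs on SQLite (3.25+) through Python's sqlite3 module. The table name zips, the helper names and the keep parameter are assumptions made for the example.

```python
import sqlite3


def make_zip_table():
    """Build an in-memory table with the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE zips (id INTEGER PRIMARY KEY, zipcode TEXT)")
    conn.executemany(
        "INSERT INTO zips VALUES (?, ?)",
        [(1, "38000"), (2, "38000"), (3, "38000"), (4, "38005"), (5, "38005")],
    )
    return conn


def delete_duplicate_zipcodes(conn, keep="lowest"):
    """Keep one row per zipcode; the ORDER BY direction decides which one."""
    # Only two fixed strings can end up in the SQL text, so no user input is interpolated.
    direction = {"lowest": "ASC", "highest": "DESC"}[keep]
    conn.execute(
        f"""
        DELETE FROM zips
        WHERE id IN (
            SELECT id
            FROM (
                SELECT id,
                       ROW_NUMBER() OVER (
                           PARTITION BY zipcode ORDER BY id {direction}
                       ) AS rn
                FROM zips
            ) AS ranked
            WHERE rn > 1
        )
        """
    )


def remaining(conn):
    return conn.execute("SELECT id, zipcode FROM zips ORDER BY id").fetchall()


conn_low = make_zip_table()
delete_duplicate_zipcodes(conn_low, keep="lowest")
kept_lowest = remaining(conn_low)
print(kept_lowest)  # [(1, '38000'), (4, '38005')]

conn_high = make_zip_table()
delete_duplicate_zipcodes(conn_high, keep="highest")
kept_highest = remaining(conn_high)
print(kept_highest)  # [(3, '38000'), (5, '38005')]
```

The row numbered 1 in each partition is the one that survives, so changing the ORDER BY changes which duplicate is kept.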
Create a temporary table with the desired result:
select min(id) as id, zipcode
into tmp_sometable
from sometable
group by zipcode
Remove the original table:
drop table sometable
Rename temporary table:
sp_rename 'tmp_sometable', 'sometable';
or something like:
delete from sometable
where id not in
(
select min(id)
from sometable
group by zipcode
)
delete from table where id not in (select min(id) from table where zipcode in (select distinct zipcode from table) group by zipcode);
select distinct zipcode from table - gives the distinct zipcodes in the table
select min(id) from table where zipcode in (select distinct zipcode from table) group by zipcode - gives the min id for each zipcode
delete from table where id not in (select min(id) from table where zipcode in (select distinct zipcode from table) group by zipcode) - deletes all the records in the table that are not in the result of query 2
There's an easier way if you want the lowest ID number. I just tested this:
SELECT
min(ID),
zipcode
FROM #zip
GROUP BY zipcode

MySQL, return only rows where there are duplicates among two columns

I have a table in MySQL of contact information ;
first name, last name, address, etc.
I would like to run a query on this table that will return only rows with first and last name combinations which appear in the table more than once.
I do not want to group the "duplicates" (which may only be duplicates of the first and last name, but not other information like address or birthdate) -
I want to return all the "duplicate" rows so I can look over the results and determine if they are dupes or not. This seemed like it would be a simple thing to do, but it has not been.
Every solution I can find either groups the dupes and gives me a count only (which is not useful for what I need to do with the results) or doesn't work at all.
Is this kind of logic even possible in a query ? Should I try and do this in Python or something?
You should be able to do this with the GROUP BY approach in a sub-query.
SELECT t.first_name, t.last_name, t.address
FROM your_table t
JOIN ( SELECT first_name, last_name
FROM your_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
) t2
ON ( t.first_name = t2.first_name AND t.last_name = t2.last_name )
The sub-query returns all names (first_name and last_name) that exist more than once, and the JOIN returns all records that match these names.
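Since the question mentions Python, here is a minimal sketch that runs this kind of join from Python. It uses the built-in sqlite3 module as a stand-in for MySQL, with the two join conditions combined with AND. The sample rows and the id column are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "1 Main St"),
        (2, "Ann", "Lee", "9 Oak Ave"),  # same name, different address
        (3, "Bob", "Ray", "5 Elm Rd"),   # name appears only once
    ],
)

# The sub-query finds names that occur more than once;
# the join returns every full row carrying one of those names.
possible_dupes = conn.execute(
    """
    SELECT t.id, t.first_name, t.last_name, t.address
    FROM your_table t
    JOIN (
        SELECT first_name, last_name
        FROM your_table
        GROUP BY first_name, last_name
        HAVING COUNT(*) > 1
    ) t2
      ON t.first_name = t2.first_name
     AND t.last_name = t2.last_name
    ORDER BY t.last_name, t.first_name, t.id
    """
).fetchall()

for row in possible_dupes:
    print(row)
```

Ordering by name keeps each group of candidate duplicates next to each other, which makes it easier to review them by eye.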
You could do it with a GROUP BY / HAVING and a sub-select. Something like:
SELECT t.*
FROM Table t INNER JOIN
(
SELECT FirstName, LastName
FROM Table
GROUP BY FirstName, LastName
HAVING COUNT(*) > 1
) Dups ON t.FirstName = Dups.FirstName
AND t.LastName = Dups.LastName
select * from people
join (select firstName, lastName
from people
group by firstName, lastName
having count(*) > 1
) dupe
using (firstName, lastName)