Remove Duplicate Row Data - sql

I need to remove duplicates from my table (user_info). I always want to remove the row with the id column that is lower of the two rows being returned from my select/having query below. Any ideas on how to write the delete statement to remove the duplicates (lower user_info.id column) from the results of my select/having query below? I'm using Oracle 11g.
user_info table structure:
id (unique primary key number 10 generated by sequence)
user_id (number 10)
first_name (varchar2)
last_name (varchar2)
data example:
id user_id
______ ___________
37265 1455
265798 1455
sql to show duplicates:
select user_id, count(*)
from user_info
group by user_id
HAVING count(*) > 1

You can use the following query:
DELETE
FROM user_info
WHERE id NOT IN
(SELECT MAX(id)
FROM user_info
GROUP BY user_id);
This query will delete all the duplicate rows except the user_id row with maximum id.
Here's a SQL Fiddle which demonstrates the delete.

Start with this to show you only the duplicates
Select user_id, count(*) NumRows, Min(Id) SmallestId, Max(Id) LargestId
From user_info
Group by user_id
HAVING count(*) > 1
This will show you the min and max for each user_id (with the same value for SmallestId and LargestId if there are no duplicates.
Select user_id, count(*) NumRows, Min(Id) SmallestId, Max(Id) LargestId
From user_info
Group by user_id
For a User, you want to keep the MaxId and Delete everything else. So you can write a DELETE statement to be
DELETE From user_info
Where Id Not IN
(
Select Max(Id)
From user_info
Group by user_id
)
This will get the

drop table test;
/
create table test
(
ids number,
user_id number
);
/
insert into test
values(37265,1455);
/
insert into test
values(265798,1455);
/
select * from test;
delete from test t
where t.ids < (select max(ids) from test t1 where T1.USER_ID= T.USER_ID)
This query employs the sub query to do the same !

Related

Deleting multiple duplicate rows

This code I have finds duplicate rows in a table. H
SELECT position, name, count(*) as cnt
FROM team
GROUP BY position, name,
HAVING COUNT(*) > 1
How do I delete the duplicate rows that I have found in Hiveql?
Apart from distinct, you can use row_number for this in Hive. Explicit delete and update can only be performed on tables that support ACID. So insert overwrite is more universal.
insert overwrite table team
select position, name, other1, other2...
from (
select
*,
row_number() over(partition by position, name order by rand()) as rn
from team
) tmp
where rn = 1
;
Please try this.assuming id is primary key column
delete from team where id in (
select t1.id from team t1,
(SELECT position, name, count(*) as cnt ,max(id) as id1
FROM team
GROUP BY position, name,
HAVING COUNT(*) > 1) t2
where t1.position=t2.position
and t1.name=t2.name
and t1.id<>t2.id1)
This is an alternative way, since deletes are expensive in Hive
Create table Team_new
As
Select distinct <col1>, <col2>,...
from Team;
Drop table Team purge;
Alter table Team_new rename to Team;
This is assuming you don’t have an id column. If you have an id column then the 1st query would change slightly as
Create table Team_new
As
Select <col1>,<col2>,...,max(id) as id from Team
Group by <col1>,<col2>,... ;
Other queries (drop & alter post this) would remain the same as above.

How to delete the duplicate data in table (Postgres)

I want to delete the duplicated data in a table , I know there is a way use
SELECT
fruit,
COUNT( fruit )
FROM
basket
GROUP BY
fruit
HAVING
COUNT( fruit )> 1
ORDER BY
fruit;
to find them , buy I need to determine every column's value is equal , which means tableA.* = tableA.* (except id , id is the auto-increment primary key )
and I tried this:
SELECT
*,
COUNT( * )
FROM
myTable
GROUP BY
*
HAVING
COUNT( * )> 1
ORDER BY
id;
but it says I can't use GROUP BY * , so how can I find & delete the duplicated data(need every column's value is equal except id)?
using
SELECT * DISTINCT
DISTINCT remove duplicated result
You need to try something similar to be below query. You apply PARTITION BY for the columns other than Id (as it is incrementing unique value). PARTITION BY should be applied for columns, for which you want to check duplicates.
Also refer to Row_Number in Postgres & Common Table expression in Postgres
WITH DuplicateTableRows AS
(
SELECT Id, Row_Number() OVER (PARTITION BY col1, col2... ORDER BY Id)
FROM
Table1
)
DELETE FROM Table1
WHERE Id IN (SELECT Id FROM Table1 WHERE row_number > 1)
You can do this using JSON:
select (to_jsonb(b) - 'id')
from basket b
group by 1
having count(*) > 1;
The result is as JSON. Unfortunately, to extract the values back into a record, you need to list the columns individually.

delete records from the table sorted by two columns

I have a table with these columns:
tel_number,date,time
There might be several records for a single tel_number. For each tel_number, I want to delete all except the record which has the most recent date from the table and in cases where there are multiple records for the most recent date, the one with the most recent time get to be selected and all the rest should be removed from the table.
for example from records like these:
1 2223333,14/01/28,08:30
2 2223333,14/01/27,08:30
3 2223333,14/01/28,16:30
4 2225555,14/01/27,10:34
5 2225555,13/12/29,10:34
all record except these two should be deleted:
3 2223333,14/01/28,16:30
4 2225555,14/01/27,10:34
edit:
I have tried this so far, but it doesn't delete records which has the same date but different times:
delete from table where (tel_number,date) not in
(select tel_number,max(date) from table group by tel_number);
According to Deleting Duplicate Rows In Oracle:
1. Using MIN(rowid) : The most common method of removing duplicate rows.
DELETE FROM Table1
WHERE ROWID NOT IN (SELECT MIN (ROWID)
FROM Table1
GROUP BY tel_number, Date, Time);
2. Using MIN(rowid) & Join: More or less the same as first one
DELETE FROM Table1 t
WHERE t.ROWID NOT IN (SELECT MIN (b.ROWID)
FROM Table1 b
WHERE b.tel_number = t.tel_number
AND b.Date = t.Date
AND b.Time = t.Time);
3. Using Analytic functions
DELETE FROM Table1
WHERE ROWID IN (
SELECT rid
FROM (SELECT ROWID rid,
ROW_NUMBER () OVER (PARTITION BY tel_number,Date, Time ORDER BY ROWID) rn
FROM Table1 )
WHERE rn <> 1);
I solved the problem with this query:
delete from table a where (a.tel_number,a.DATE,a.TIME) not in
(
select tel_number,date,max(time)
from table group by tel_number,date
having date in
(
select max(date) from table group by tel_number
)
);

How to delete duplicate rows from an Oracle Database?

We have a table that has had the same data inserted into it twice by accident meaning most (but not all) rows appears twice in the table. Simply put, I'd like an SQL statement to delete one version of a row while keeping the other; I don't mind which version is deleted as they're identical.
Table structure is something like:
FID, unique_ID, COL3, COL4....
Unique_ID is the primary key, meaning each one appears only once.
FID is a key that is unique to each feature, so if it appears more than once then the duplicates should be deleted.
To select features that have duplicates would be:
select count(*) from TABLE GROUP by FID
Unfortunately I can't figure out how to go from that to a SQL delete statement that will delete extraneous rows leaving only one of each.
This sort of question has been asked before, and I've tried the create table with distinct, but how do I get all columns without naming them? This only gets the single column FID and itemising all the columns to keep gives an: ORA-00936: missing expression
CREATE TABLE secondtable NOLOGGING as select distinct FID from TABLE
If you don't care which row is retained
DELETE FROM your_table_name a
WHERE EXISTS( SELECT 1
FROM your_table_name b
WHERE a.fid = b.fid
AND a.unique_id < b.unique_id )
Once that's done, you'll want to add a constraint to the table that ensures that FID is unique.
Try this
DELETE FROM table_name A WHERE ROWID > (
SELECT min(rowid) FROM table_name B
WHERE A.FID = B.FID)
A suggestion
DELETE FROM x WHERE ROWID IN
(WITH y AS (SELECT xCOL, MIN(ROWID) FROM x GROUP BY xCOL HAVING COUNT(xCOL) > 1)
SELCT a.ROWID FROM x, y WHERE x.XCOL=y.XCOL and x.ROWIDy.ROWID)
Try with this.
DELETE FROM firsttable WHERE unique_ID NOT IN
(SELECT MAX(unique_ID) FROM firsttable GROUP BY FID)
EDIT:
One explanation:
SELECT MAX(unique_ID) FROM firsttable GROUP BY FID;
This sql statement will pick each maximum unique_ID row from each duplicate rows group. And delete statement will keep these maximum unique_ID rows and delete other rows of each duplicate group.
You can try this.
delete from tablename a
where a.logid, a.pointid, a.routeid) in (select logid, pointid, routeid from tablename
group by logid, pointid, routeid having count(*) > 1)
and rowid not in (select min(rowid) from tablename
group by logid, pointid, routeid having count(*) > 1)

How to keep only one row of a table, removing duplicate rows?

I have a table that has a lot of duplicates in the Name column. I'd
like to only keep one row for each.
The following lists the duplicates, but I don't know how to delete the
duplicates and just keep one:
SELECT name FROM members GROUP BY name HAVING COUNT(*) > 1;
Thank you.
See the following question: Deleting duplicate rows from a table.
The adapted accepted answer from there (which is my answer, so no "theft" here...):
You can do it in a simple way assuming you have a unique ID field: you can delete all records that are the same except for the ID, but don't have "the minimum ID" for their name.
Example query:
DELETE FROM members
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM members
GROUP BY name
)
In case you don't have a unique index, my recommendation is to simply add an auto-incremental unique index. Mainly because it's good design, but also because it will allow you to run the query above.
It would probably be easier to select the unique ones into a new table, drop the old table, then rename the temp table to replace it.
#create a table with same schema as members
CREATE TABLE tmp (...);
#insert the unique records
INSERT INTO tmp SELECT * FROM members GROUP BY name;
#swap it in
RENAME TABLE members TO members_old, tmp TO members;
#drop the old one
DROP TABLE members_old;
We have a huge database where deleting duplicates is part of the regular maintenance process. We use DISTINCT to select the unique records then write them into a TEMPORARY TABLE. After TRUNCATE we write back the TEMPORARY data into the TABLE.
That is one way of doing it and works as a STORED PROCEDURE.
If we want to see first which rows you are about to delete. Then delete them.
with MYCTE as (
SELECT DuplicateKey1
,DuplicateKey2 --optional
,count(*) X
FROM MyTable
group by DuplicateKey1, DuplicateKey2
having count(*) > 1
)
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1=cte.DuplicateKey1
AND E.DuplicateKey2=cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt
Full example at http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/
You can join table with yourself by matched field and delete unmatching rows
DELETE t1 FROM table_name t1
LEFT JOIN tablename t2 ON t1.match_field = t2.match_field
WHERE t1.id <> t2.id;
delete dup row keep one
table has duplicate rows and may be some rows have no duplicate rows then it keep one rows if have duplicate or single in a table.
table has two column id and name if we have to remove duplicate name from table
and keep one. Its Work Fine at My end You have to Use this query.
DELETE FROM tablename
WHERE id NOT IN(
SELECT id FROM
(
SELECT MIN(id)AS id
FROM tablename
GROUP BY name HAVING
COUNT(*) > 1
)AS a )
AND id NOT IN(
(SELECT ids FROM
(
SELECT MIN(id)AS ids
FROM tablename
GROUP BY name HAVING
COUNT(*) =1
)AS a1
)
)
before delete table is below see the screenshot:
enter image description here
after delete table is below see the screenshot this query delete amit and akhil duplicate rows and keep one record (amit and akhil):
enter image description here
if you want to remove duplicate record from table.
CREATE TABLE tmp SELECT lastname, firstname, sex
FROM user_tbl;
GROUP BY (lastname, firstname);
DROP TABLE user_tbl;
ALTER TABLE tmp RENAME TO user_tbl;
show record
SELECT `page_url`,count(*) FROM wl_meta_tags GROUP BY page_url HAVING count(*) > 1
delete record
DELETE FROM wl_meta_tags
WHERE meta_id NOT IN( SELECT meta_id
FROM ( SELECT MIN(meta_id)AS meta_id FROM wl_meta_tags GROUP BY page_url HAVING COUNT(*) > 1 )AS a )
AND meta_id NOT IN( (SELECT ids FROM (
SELECT MIN(meta_id)AS ids FROM wl_meta_tags GROUP BY page_url HAVING COUNT(*) =1 )AS a1 ) )
Source url
WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [emp_id] ORDER BY [emp_id]) AS Row, * FROM employee_salary
)
DELETE FROM CTE
WHERE ROW <> 1