This question already has answers here:
How can I remove duplicate rows?
(43 answers)
Closed 4 years ago.
I have a SQL Server table with ~100 columns including the columns Id and CreationDate. Due to a bad constraint in its initial design, there are now many duplicate rows (i.e. rows whose values are identical across ALL columns).
Can you suggest a script to remove those duplicate rows?
Also, what would be a script to select all distinct Ids with the latest CreationDate?
Thanks
You can use the following script to remove duplicate rows from a Microsoft SQL Server table:
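-- 1. Copy one instance of each duplicated row into a holding table.
--    SELECT * cannot be combined with GROUP BY key_value, so the
--    duplicated keys are found in a subquery instead.
SELECT DISTINCT *
INTO duplicate_table
FROM original_table
WHERE key_value IN (SELECT key_value
                    FROM original_table
                    GROUP BY key_value
                    HAVING COUNT(*) > 1);
-- 2. Delete every row whose key appears in the holding table.
DELETE FROM original_table
WHERE key_value IN (SELECT key_value
                    FROM duplicate_table);
-- 3. Put the single surviving copy of each row back.
INSERT INTO original_table
SELECT *
FROM duplicate_table;
-- 4. Drop the holding table.
DROP TABLE duplicate_table;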
When this script is executed, it follows these steps:
It moves one instance of any duplicate row in the original table to a duplicate table.
It deletes all rows from the original table that also reside in the duplicate table.
It moves the rows in the duplicate table back into the original table.
It drops the duplicate table.
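As an alternative to the holding-table script (a different technique, not the one described above), a ROW_NUMBER-based delete can remove the extra copies in place, and a plain GROUP BY covers the second part of the question. This is only a sketch: it assumes the table is called original_table, and that two rows with the same Id and CreationDate are full duplicates. The CTE name numbered and the alias rn are illustrative.

```sql
-- Assumption: rows sharing Id and CreationDate are identical in every column.
WITH numbered AS (
    SELECT Id,
           ROW_NUMBER() OVER (PARTITION BY Id, CreationDate
                              ORDER BY (SELECT NULL)) AS rn
    FROM original_table
)
DELETE FROM numbered   -- deleting through the CTE removes rows from original_table
WHERE rn > 1;          -- keep one copy in each group

-- Each distinct Id with its latest CreationDate.
SELECT Id, MAX(CreationDate) AS LatestCreationDate
FROM original_table
GROUP BY Id;
```

If the assumption does not hold, every column that defines a duplicate would have to be listed in the PARTITION BY clause.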
I have around 200 duplicate records in a table. I want to remove all of them except one. How can I do this?
Source http://www.devx.com
It's easy to introduce duplicate rows of data into Oracle tables by running a data load twice without the primary key or unique indexes created or enabled.
Here column1, column2, column3 constitute the identifying key for each record.
DELETE FROM our_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM our_table
GROUP BY column1, column2, column3) ;
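Before running the delete, it may be worth previewing which key combinations are duplicated and how many copies each has. The following is a small sketch that reuses the same table and column names; the alias copies is illustrative.

```sql
-- List each identifying key that occurs more than once, with its row count.
SELECT column1, column2, column3, COUNT(*) AS copies
FROM our_table
GROUP BY column1, column2, column3
HAVING COUNT(*) > 1;
```

After the delete, the same query should return no rows.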
Use the following query. It applies only if the table has an Id column whose value differs between the duplicate rows. Note that TOP is SQL Server syntax, not Oracle.
DELETE FROM tableA WHERE id IN (SELECT TOP 199 id FROM tableA)
I have this table, where every column is a VARCHAR (or equivalent):
field001 field002 field003 field004 field005 .... field500
500 VARCHAR columns. No primary keys. And no column is guaranteed to be unique. So the only way to know for sure if two rows are the same is to compare the values of all columns.
(Yes, this should be in TheDailyWTF. No, it's not my fault. Bear with me here).
I inserted a duplicate set of rows by mistake, and I need to find them and remove them.
There are 12 million rows in this table, so I'd rather not recreate it.
However, I do know what rows were mistakenly inserted (I have the .sql file).
So I figured I'd create another table and load it with those. And then I'd do some sort of join that would compare all columns on both tables and then delete the rows that are equal from the first table. I tried a NATURAL JOIN as that looked promising, but nothing was returned.
What are my options?
I'm using Amazon Redshift (so PostgreSQL 8.4 if I recall), but I think this is a general SQL question.
You can treat the whole row as a single record in Postgres (and thus I think in Redshift).
The following works in Postgres and keeps one of the duplicates:
delete from the_table
where ctid not in (select min(ctid)
from the_table
group by the_table); --<< Yes, the group by is correct!
This is going to be slow!
Grouping over so many columns and then deleting with a NOT IN will take quite some time, especially if a lot of rows are going to be deleted.
If you want to delete all duplicate rows (not keeping any of them), you can use the following:
delete from the_table
where the_table in (select the_table
from the_table
group by the_table
having count(*) > 1);
You should be able to identify all the mistakenly inserted rows using CREATEXID. If you group by CREATEXID on your table as below and get the counts, you can see how many rows were inserted in your transaction and remove them with a DELETE command.
SELECT CREATEXID,COUNT(1)
FROM yourtable
GROUP BY 1;
One simplistic solution is to recreate the table, e.g.
CREATE TABLE my_temp_table (
-- add column definitions here, just like the original table
);
INSERT INTO my_temp_table SELECT DISTINCT * FROM original_table;
DROP TABLE original_table;
ALTER TABLE my_temp_table RENAME TO original_table;
or even
CREATE TABLE my_temp_table AS SELECT DISTINCT * FROM original_table;
DROP TABLE original_table;
ALTER TABLE my_temp_table RENAME TO original_table;
This is a trick, but it may help.
Each row in the table carries the ID of the transaction that inserted or updated it, in the xmin system column (see System Columns in the PostgreSQL documentation). You can use it to find the ID of the transaction that inserted the wrong data, and then delete those rows with:
delete from my_table where xmin = <the_wrong_transaction_id>;
PS: Be careful, and try it on a test table first.
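To find the transaction ID to plug into the delete above, one option in PostgreSQL is to count rows per xmin and look for the group whose size matches the accidental load. This is a sketch only; the aliases inserting_xid and row_count are illustrative, and the cast to text is there so that grouping does not depend on which operators the xid type supports.

```sql
-- Count rows per inserting transaction; the accidental load should show up
-- as one group with the expected number of rows.
SELECT xmin::text AS inserting_xid,
       COUNT(*)   AS row_count
FROM my_table
GROUP BY xmin::text
ORDER BY row_count DESC;
```

Rows that were updated after the load carry the updating transaction's ID instead, so the counts may not match exactly.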
I'm new to databases. I want to find the duplicate records in a database table that already exists. I'm not concerned with preventing duplicate insertions; I just want to see which records are duplicated.
I tried the DISTINCT keyword, but that shows the records with the duplicates removed.
After that I tried a unique index, which reports the name of the table that contains duplicate records but does not show the duplicate records themselves.
Thanks in advance.
List the columns that make a record a duplicate, then group by them and count the occurrences:
select col1, col2
from your_table
group by col1, col2
having count(*) > 1
The having clause keeps only those combinations that have more than one entry in the table.
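The grouping query returns only the duplicated key values. To see the full duplicate rows, one common approach is to join the table back to that result. The sketch below assumes the same your_table, col1 and col2 names; the aliases t and d are illustrative.

```sql
-- Show every row whose (col1, col2) combination occurs more than once.
SELECT t.*
FROM your_table t
JOIN (SELECT col1, col2
      FROM your_table
      GROUP BY col1, col2
      HAVING COUNT(*) > 1) d
  ON t.col1 = d.col1
 AND t.col2 = d.col2
ORDER BY t.col1, t.col2;
```

Rows where col1 or col2 is NULL will not match in the join, so duplicates on NULL values need separate handling.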
An MS Access database was corrupted, and a few rows in one table were duplicated. They are exactly the same, and no field, not even the primary key, differs between the duplicates. Because of this, the primary key was removed from the table when the database was repaired.
Now I can only find out which rows were duplicated:
select * from tablename
where id in(
select id from tablename
group by id
having count (*) > 1)
To designate a primary key I must delete one of each pair of duplicates, but I don't know how.
One way you can do this is with a temporary table:
select distinct t.*
into TempTABLE
from tablename t;
delete from tablename;
insert into tablename
select *
from TempTable;
That is, remove the duplicates using distinct. Delete all the rows from the original table, and then insert the unique rows.
I need a way to delete repeated rows from the database.
I found that the table contains repeated rows by comparing this query:
SELECT GoodCode FROM Good_
with the distinct query:
SELECT DISTINCT GoodCode FROM Good_
The second one returns fewer records. Please show me how to delete the repeated rows from the table.
Simple method:
SELECT DISTINCT *
INTO #TempGood
FROM Good_
TRUNCATE TABLE Good_
INSERT Good_
SELECT *
FROM #TempGood
DROP TABLE #TempGood
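The steps above empty the table before refilling it, so a failure partway through could leave Good_ empty. A hedged sketch of the same steps wrapped in a transaction follows. It assumes SQL Server (as the #TempGood temporary table suggests), and that Good_ has no identity column and is not referenced by foreign keys.

```sql
SET XACT_ABORT ON;      -- roll the whole transaction back if any statement fails

BEGIN TRANSACTION;

SELECT DISTINCT *
INTO #TempGood
FROM Good_;

TRUNCATE TABLE Good_;   -- can be rolled back inside a transaction in SQL Server

INSERT INTO Good_
SELECT *
FROM #TempGood;

COMMIT TRANSACTION;

DROP TABLE #TempGood;
```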
create table temptable as select distinct * from Good_;
drop table Good_;
rename temptable to Good_;