I am using SQL Server 2008 R2. I have a composite primary key made up of 5 columns
(NOID, CODE_CLIENT, CODE_DEST, DATE_CLOTURE, DATE_CLOTUR_REEL)
on my table.
I tried to delete the duplicates with this statement:
DELETE
FROM [LETTRE_VOIT_FINAL]
WHERE EXISTS
(SELECT NOID ,
CODE_CLIENT,
CODE_DEST,
DATE_CLOTURE,
DATE_CLOTUR_REEL
FROM LETTRE_VOIT_FINAL
GROUP BY NOID ,
CODE_CLIENT,
CODE_DEST,
DATE_CLOTURE,
DATE_CLOTUR_REEL HAVING count(*) > 1)
It deleted every row; fortunately I had made a backup.
Previously the composite primary key had only 4 columns, and I added the last one, DATE_CLOTUR_REEL. Because primary key columns cannot be null, I filled that column with getdate(). As a result I cannot define all 5 columns as the composite primary key, because there are duplicates.
So now there is no primary key on my table.
To delete the duplicates from your table:
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY NOID ,CODE_CLIENT,CODE_DEST,DATE_CLOTURE,DATE_CLOTUR_REEL
ORDER BY ( SELECT 0)) RN
FROM LETTRE_VOIT_FINAL)
DELETE FROM cte
WHERE RN > 1
The problem is that the subquery in the EXISTS clause is not correlated with the outer "delete from" in any way. As a result, the existence of ANY duplicates deletes ALL records in the table. Besides, I think you have mixed something up with the primary keys (as other users commented).
Anyway, you could use a CTE to remove duplicates:
WITH CTE (COl1,Col2, DuplicateCount)
AS
(
SELECT COl1,Col2,
ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount
FROM DuplicateRcordTable
)
DELETE
FROM CTE
WHERE DuplicateCount > 1
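To make the difference between an unbound and a correlated EXISTS concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server here, and the table name demo, its two columns, and the sample rows are hypothetical. The correlated variant relies on SQLite's implicit rowid instead of ROW_NUMBER(), so it illustrates the idea rather than reproducing the T-SQL above.

```python
import sqlite3

# Hypothetical sample data: the pair (1, "A") is duplicated once.
ROWS = [(1, "A"), (1, "A"), (2, "B"), (3, "C")]


def fresh_table():
    """Create an in-memory table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo (noid INTEGER, code_client TEXT)")
    conn.executemany("INSERT INTO demo VALUES (?, ?)", ROWS)
    return conn


def remaining(conn):
    """Return the rows still present, in insertion order."""
    return conn.execute(
        "SELECT noid, code_client FROM demo ORDER BY rowid"
    ).fetchall()


# 1) Unbound subquery: EXISTS is true as soon as ANY duplicate group exists,
#    so the condition holds for every row of the outer DELETE.
conn = fresh_table()
conn.execute("""
    DELETE FROM demo
    WHERE EXISTS (SELECT noid, code_client
                  FROM demo
                  GROUP BY noid, code_client
                  HAVING COUNT(*) > 1)
""")
after_unbound = remaining(conn)

# 2) Correlated subquery: a row is deleted only if an earlier row
#    with the same values exists, so one copy of each group survives.
conn = fresh_table()
conn.execute("""
    DELETE FROM demo
    WHERE EXISTS (SELECT 1
                  FROM demo AS d2
                  WHERE d2.noid = demo.noid
                    AND d2.code_client = demo.code_client
                    AND d2.rowid < demo.rowid)
""")
after_bound = remaining(conn)

print(after_unbound)
print(after_bound)
```

The first DELETE empties the table, which matches the behaviour reported in the question; the second removes only the extra copy.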
This code finds duplicate rows in a table.
SELECT position, name, count(*) as cnt
FROM team
GROUP BY position, name
HAVING COUNT(*) > 1
How do I delete the duplicate rows that I have found in HiveQL?
Apart from distinct, you can use row_number for this in Hive. Explicit delete and update can only be performed on tables that support ACID. So insert overwrite is more universal.
insert overwrite table team
select position, name, other1, other2...
from (
select
*,
row_number() over(partition by position, name order by rand()) as rn
from team
) tmp
where rn = 1
;
Please try this, assuming id is the primary key column:
delete from team where id in (
select t1.id from team t1,
(SELECT position, name, count(*) as cnt ,max(id) as id1
FROM team
GROUP BY position, name
HAVING COUNT(*) > 1) t2
where t1.position=t2.position
and t1.name=t2.name
and t1.id<>t2.id1)
This is an alternative approach, since deletes are expensive in Hive:
Create table Team_new
As
Select distinct <col1>, <col2>,...
from Team;
Drop table Team purge;
Alter table Team_new rename to Team;
This is assuming you don’t have an id column. If you have an id column then the 1st query would change slightly as
Create table Team_new
As
Select <col1>,<col2>,...,max(id) as id from Team
Group by <col1>,<col2>,... ;
Other queries (drop & alter post this) would remain the same as above.
It is a datawarehouse project in which I load a table in which each column refers to another table. The problem is that due to an error in the process, many duplicate records were loaded (approximately 13,000) but they do not have a unique identifier, therefore they are exactly the same. Is there a way to delete only one of the duplicate records so that I don't have to delete everything and repeat the table loading process?
You can use row_number() and a cte:
with cte as (
select row_number() over(
partition by col1, col2, ...
order by (select null)) rn
from mytable
)
delete from cte where rn > 1
The window function guarantees that the same number will not be assigned twice within a partition. You need to enumerate all columns in the partition by clause.
If you are going to delete a significant part of the rows, then it might be simpler to empty and recreate the table:
create table tmptable as select distinct * from mytable;
truncate table mytable; -- back it up first!
insert into mytable select * from tmptable;
drop table tmptable;
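As a rough illustration of the empty-and-reload approach, the sketch below runs the same four steps against an in-memory SQLite database through Python's sqlite3 module. The column names and sample rows are hypothetical, and because SQLite has no TRUNCATE TABLE statement, an unqualified DELETE stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 TEXT)")
# Hypothetical sample data: two fully identical pairs plus one unique row.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "x"), (1, "x"), (2, "y"), (2, "y"), (3, "z")],
)

conn.executescript("""
    -- Copy one instance of every distinct row aside.
    CREATE TEMP TABLE tmptable AS SELECT DISTINCT * FROM mytable;
    -- SQLite has no TRUNCATE TABLE; an unqualified DELETE empties the table.
    DELETE FROM mytable;
    -- Reload the deduplicated rows and clean up.
    INSERT INTO mytable SELECT * FROM tmptable;
    DROP TABLE tmptable;
""")

rebuilt = conn.execute(
    "SELECT col1, col2 FROM mytable ORDER BY col1"
).fetchall()
print(rebuilt)
```

Only one copy of each row remains afterwards. On a real system, the backup mentioned in the comment above still applies before emptying the table.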
You can use row_number to delete the duplicate rows by partitioning them and then ordering by one of the columns within each partition.
You have to list all your columns in partition by if the records are completely identical.
WITH CTE1 AS (
SELECT A.*
, ROW_NUMBER() OVER (PARTITION BY CODDIMALUMNO, (OTHER COLUMNS) ORDER BY CODDIMALUMNO) RN
FROM TABLE1 A
)
DELETE FROM CTE1
WHERE RN > 1;
You can use row_number() and an updatable CTE:
with todelete as (
select t.*, row_number() over (partition by . . . order by (select null)) as seqnum
from t
)
delete from todelete
where seqnum > 1;
The . . . is for the columns that define duplicates.
I have this table
I want to remove successive similar rows and keep the recent.
so the result I want to have is something like this
Here is how I would do it:
;WITH cte AS (
SELECT valeur, date_reference, id, rownum = ROW_NUMBER() OVER (ORDER BY date_reference) FROM #temperatures
UNION ALL
SELECT NULL, NULL, NULL, (SELECT COUNT(*) FROM #temperatures) + 1
)
SELECT A.* FROM cte AS A INNER JOIN cte AS B ON A.rownum + 1 = B.rownum AND COALESCE(a.valeur, -459) != COALESCE(b.valeur, -459)
I am calling the table #temperatures. Use a CTE to assign a ROW_NUMBER to each record and to include an extra record with the last Row_Number (otherwise the last record will not be included in the following query). Then, SELECT from the CTE where the next ROW_NUMBER does not have the same valeur.
Now, if you want to DELETE from the original table, you can review this query's return to make sure you really want to delete all the records not in this return. Then, assuming historique_id is the primary key, DELETE FROM #temperatures WHERE historique_id NOT IN (SELECT historique_id FROM cte AS A....
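The selection rule used by that query (keep a row only when the next row by date has a different valeur, and always keep the very last row) can be sketched in plain Python to check which records would survive. The tuple layout (id, valeur, date_reference) and the sample values are assumptions made for illustration; they are not taken from the original table.

```python
from datetime import date


def keep_last_of_each_run(rows):
    """Return the rows that close a run of equal consecutive values.

    rows: iterable of (id, valeur, date_reference) tuples (assumed layout).
    """
    ordered = sorted(rows, key=lambda r: r[2])  # same role as ORDER BY date_reference
    kept = []
    for i, row in enumerate(ordered):
        is_last = i == len(ordered) - 1
        # Keep the row if nothing follows it or the following value differs.
        if is_last or ordered[i + 1][1] != row[1]:
            kept.append(row)
    return kept


# Hypothetical readings: 20, 20, 21, 20, 20 on consecutive days.
sample = [
    (1, 20, date(2020, 1, 1)),
    (2, 20, date(2020, 1, 2)),
    (3, 21, date(2020, 1, 3)),
    (4, 20, date(2020, 1, 4)),
    (5, 20, date(2020, 1, 5)),
]

kept_ids = [row[0] for row in keep_last_of_each_run(sample)]
print(kept_ids)
```

Rows 1 and 4 are dropped because the row that follows each of them carries the same value. Every id not in kept_ids corresponds to a record the NOT IN delete would remove.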
You can collect all the rows that you want to keep in a temp table, truncate your original table, and insert the rows from the temp table back into the original table. This is more efficient than just deleting rows when you have "a lot of duplicates". Note that TRUNCATE TABLE has the following restrictions:
You cannot use TRUNCATE TABLE on tables that:
Are referenced by a FOREIGN KEY constraint. (You can truncate a table that has a foreign key that references itself.)
Participate in an indexed view.
Are published by using transactional replication or merge replication.
TRUNCATE TABLE cannot activate a trigger because the operation does not log individual row deletions. For more information, see CREATE TRIGGER (Transact-SQL).
In Azure SQL Data Warehouse and Parallel Data Warehouse:
TRUNCATE TABLE is not allowed within the EXPLAIN statement.
TRUNCATE TABLE cannot be ran inside of a transaction.
You can find more information in the following topics:
Truncate in SQL SERVER
Deleting Data in SQL Server with TRUNCATE vs DELETE commands
You can use this script to remove duplicate rows with the truncate-insert strategy:
CREATE TABLE #temp_hisorique(
code varchar(50),
code_trim varchar(50),
libelle varchar(50),
unite varchar(50),
valeur varchar(50),
date_reference datetime,
hisoriqueID int
)
GO
;WITH cte AS (
select *, row_number() over(partition by code, code_trim, libelle, unite, valeur order by date_reference desc) as rownum
from mytable
)
insert into #temp_hisorique(code, code_trim, libelle, unite, valeur, date_reference, hisoriqueID)
select code, code_trim, libelle, unite, valeur, date_reference, hisoriqueID
from cte
where rownum = 1
TRUNCATE TABLE mytable
insert into mytable(code, code_trim, libelle, unite, valeur, date_reference, hisoriqueID)
select code, code_trim, libelle, unite, valeur, date_reference, hisoriqueID
from #temp_hisorique
Or you can simply remove the rows with a DELETE joined to the CTE:
;WITH cte AS (
select *, row_number() over(partition by code, code_trim, libelle, unite, valeur order by date_reference desc) as rownum
from mytable
)
delete T
from mytable T
join cte on T.hisoriqueID = cte.hisoriqueID
where cte.rownum > 1
My question is how to delete rows whose primary key value is duplicated. The other fields may or may not be duplicates. I am interested only in the primary key being duplicated, and I would like to keep the first instance while deleting the other duplicate entries.
For example,
I have 2 tables with the following data:
Table1:- Portfolio
Columns:- PortfolioID(PK), PortfolioName
Sample data :-
1, North America
2, Europe
3, Asia
Table2:- Account
Columns:- AccountID(PK), PortfolioID(FK), AccountName
Sample data :-
1,1,Quake
1,1,Wind
2,1,Fire
3,1,Quake
4,2,Flood
5,2,Wind
Lets say for PortfolioID = 1,
I am trying to delete row number 2 from the Account table where the AccountID 1 is repeated for PortfolioID =1. I have tried using the CTE expression where I use the ROW_NUMBER statement and try to delete ROWNUMBER <> 1. But this query doesn't work as it deletes all the rows in the table.
The query I tried:
WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [Account].[AccountID] ORDER BY [Account].[AccountID]) AS [ROWNUMBER],
[Account].[AccountID]
FROM [Account]
INNER JOIN [Portfolio] ON [Portfolio].[PortfolioID] = [Account]. [PortfolioID]
WHERE [Portfolio].[PortfolioID] = 1
)
DELETE [Account]
FROM [CTE]
WHERE [ROWNUMBER] <> 1
Am I doing something wrong in the query? Thanks in advance for the help.
Firstly, if you define the AccountID column as the primary key in your database, that will prevent this kind of problem going forward.
Secondly, are you using SQL Server? Which version?
Assuming you are using SQL Server in a version recent enough to support window functions, you can try something like this to delete any duplicates that you have.
This will delete ALL copies of ALL duplicates:
WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY AccountID,PortfolioID)
FROM Account)
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)
This alternative script will keep one of the duplicates if that is what you prefer:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY AccountID,PortfolioID ORDER BY AccountID,PortfolioID) AS RN
FROM Account
)
DELETE FROM CTE WHERE RN<>1
Finally, if you want to only delete duplicates for Portfolio Id 1:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY AccountID,PortfolioID ORDER BY AccountID,PortfolioID) AS RN
FROM Account
Where PortfolioID = 1
)
DELETE FROM CTE WHERE RN<>1
A primary key column never allows duplicate entries.
Try with the below query for the desired result based on the given data/inputs.
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY a.[AccountID],a.PortfolioID ORDER BY a.[AccountID]) AS [ROWNUMBER],*
FROM [Account] a
WHERE a.[PortfolioID] = 1
)
DELETE
FROM [CTE]
WHERE [ROWNUMBER] > 1
We have a table that has had the same data inserted into it twice by accident, meaning most (but not all) rows appear twice in the table. Simply put, I'd like an SQL statement to delete one version of a row while keeping the other; I don't mind which version is deleted as they're identical.
Table structure is something like:
FID, unique_ID, COL3, COL4....
Unique_ID is the primary key, meaning each one appears only once.
FID is a key that is unique to each feature, so if it appears more than once then the duplicates should be deleted.
To select features that have duplicates would be:
select count(*) from TABLE GROUP by FID
Unfortunately I can't figure out how to go from that to a SQL delete statement that will delete extraneous rows leaving only one of each.
This sort of question has been asked before, and I've tried the create table with distinct, but how do I get all columns without naming them? This only gets the single column FID and itemising all the columns to keep gives an: ORA-00936: missing expression
CREATE TABLE secondtable NOLOGGING as select distinct FID from TABLE
If you don't care which row is retained:
DELETE FROM your_table_name a
WHERE EXISTS( SELECT 1
FROM your_table_name b
WHERE a.fid = b.fid
AND a.unique_id < b.unique_id )
Once that's done, you'll want to add a constraint to the table that ensures that FID is unique.
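A minimal sketch of that follow-up step, again using SQLite through Python's sqlite3 module as a stand-in for Oracle: the table name features, the index name ux_features_fid, and the sample rows are hypothetical. After the duplicates are removed, a unique index on fid makes the database reject any new duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features (unique_id INTEGER PRIMARY KEY, fid INTEGER)"
)
# Hypothetical sample data: fid 10 was loaded twice.
conn.executemany(
    "INSERT INTO features VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)]
)

# Same idea as the DELETE above: drop a row when another row with the
# same fid and a larger unique_id exists.
conn.execute("""
    DELETE FROM features
    WHERE EXISTS (SELECT 1
                  FROM features AS b
                  WHERE b.fid = features.fid
                    AND features.unique_id < b.unique_id)
""")

# With the duplicates gone, the uniqueness rule can be created.
conn.execute("CREATE UNIQUE INDEX ux_features_fid ON features (fid)")

try:
    conn.execute("INSERT INTO features VALUES (4, 10)")  # fid 10 already exists
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

survivors = conn.execute(
    "SELECT unique_id, fid FROM features ORDER BY unique_id"
).fetchall()
print(survivors, duplicate_rejected)
```

Creating the index before the cleanup would fail, because the existing duplicates violate it; that is why the constraint comes last.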
Try this
DELETE FROM table_name A WHERE ROWID > (
SELECT min(rowid) FROM table_name B
WHERE A.FID = B.FID)
A suggestion
DELETE FROM x WHERE ROWID IN
(WITH y AS (SELECT xCOL, MIN(ROWID) AS min_rowid FROM x GROUP BY xCOL HAVING COUNT(xCOL) > 1)
SELECT x.ROWID FROM x, y WHERE x.xCOL = y.xCOL AND x.ROWID <> y.min_rowid)
Try with this.
DELETE FROM firsttable WHERE unique_ID NOT IN
(SELECT MAX(unique_ID) FROM firsttable GROUP BY FID)
EDIT:
One explanation:
SELECT MAX(unique_ID) FROM firsttable GROUP BY FID;
This SQL statement picks the row with the maximum unique_ID from each group of duplicate rows. The delete statement then keeps those maximum unique_ID rows and deletes the other rows of each duplicate group.
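The two steps of that explanation can be checked on a small example. The sketch below uses SQLite via Python's sqlite3 module rather than Oracle, and the sample rows are hypothetical; the table and column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE firsttable (unique_ID INTEGER PRIMARY KEY, FID INTEGER)"
)
# Hypothetical sample data: FID 100 appears twice, FID 300 three times.
conn.executemany(
    "INSERT INTO firsttable VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 200), (4, 300), (5, 300), (6, 300)],
)

# Step 1: the inner query returns one unique_ID (the largest) per FID.
keepers = [
    row[0]
    for row in conn.execute(
        "SELECT MAX(unique_ID) FROM firsttable GROUP BY FID ORDER BY 1"
    )
]

# Step 2: delete every row whose unique_ID is not one of those keepers.
conn.execute("""
    DELETE FROM firsttable
    WHERE unique_ID NOT IN (SELECT MAX(unique_ID)
                            FROM firsttable
                            GROUP BY FID)
""")

left = conn.execute(
    "SELECT unique_ID, FID FROM firsttable ORDER BY unique_ID"
).fetchall()
print(keepers)
print(left)
```

Rows whose FID occurs only once are their own maximum, so they are never deleted.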
You can try this.
delete from tablename a
where (a.logid, a.pointid, a.routeid) in (select logid, pointid, routeid from tablename
group by logid, pointid, routeid having count(*) > 1)
and rowid not in (select min(rowid) from tablename
group by logid, pointid, routeid having count(*) > 1)