Deleting rows where the primary key is duplicated - SQL

My issue is how to delete rows whose primary key value is duplicated. The other fields may or may not be duplicated. I am only interested in the primary key being duplicated, and I would like to keep the first instance and delete the other duplicate entries.
For example,
I have 2 tables with the following data:
Table1:- Portfolio
Columns:- PortfolioID(PK), PortfolioName
Sample data :-
1, North America
2, Europe
3, Asia
Table2:- Account
Columns:- AccountID(PK), PortfolioID(FK), AccountName
Sample data :-
1,1,Quake
1,1,Wind
2,1,Fire
3,1,Quake
4,2,Flood
5,2,Wind
Let's say for PortfolioID = 1,
I am trying to delete row number 2 from the Account table, where AccountID 1 is repeated for PortfolioID = 1. I tried a CTE that uses ROW_NUMBER and then deletes the rows where ROWNUMBER <> 1, but the query doesn't work: it deletes all the rows in the table.
The query I tried:
WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [Account].[AccountID] ORDER BY [Account].[AccountID]) AS [ROWNUMBER],
[Account].[AccountID]
FROM [Account]
INNER JOIN [Portfolio] ON [Portfolio].[PortfolioID] = [Account]. [PortfolioID]
WHERE [Portfolio].[PortfolioID] = 1
)
DELETE [Account]
FROM [CTE]
WHERE [ROWNUMBER] <> 1
Am I doing something wrong in the query? Thanks in advance for the help.

Firstly, if you define the AccountID column as the primary key in your database, that will prevent this kind of problem going forward.
Secondly, are you using SQL Server? Which version?
Assuming you are using SQL Server, and a version recent enough to support window functions, you can try something like this to delete any duplicates that you have.
This will delete ALL copies of ALL duplicates:
WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY AccountID,PortfolioID)
FROM Account)
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)
This alternative script will keep one of the duplicates if that is what you prefer:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY AccountID,PortfolioID ORDER BY AccountID,PortfolioID) AS RN
FROM Account
)
DELETE FROM CTE WHERE RN<>1
Finally, if you want to only delete duplicates for Portfolio Id 1:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY AccountID,PortfolioID ORDER BY AccountID,PortfolioID) AS RN
FROM Account
Where PortfolioID = 1
)
DELETE FROM CTE WHERE RN<>1
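As for why the query in the question removed every row: its DELETE targets [Account], while the FROM clause lists only [CTE], so nothing ties an Account row to a CTE row, and as soon as any CTE row has ROWNUMBER <> 1 every Account row appears to qualify. Deleting from the CTE itself avoids that. The sketch below is one way to try this safely. It assumes SQL Server, and the temp table #Account is a hypothetical scratch copy of the question's sample data, not the real table. Because a table has no inherent "first" row, it also assumes AccountName order decides which duplicate is kept.

```sql
-- Hypothetical scratch copy of the sample data (no primary key, so duplicates can exist).
CREATE TABLE #Account (AccountID int, PortfolioID int, AccountName varchar(50));
INSERT INTO #Account (AccountID, PortfolioID, AccountName)
VALUES (1, 1, 'Quake'), (1, 1, 'Wind'), (2, 1, 'Fire'),
       (3, 1, 'Quake'), (4, 2, 'Flood'), (5, 2, 'Wind');

BEGIN TRANSACTION;

WITH CTE AS
(
    -- Number the rows inside each AccountID group; ordering by AccountName
    -- is an assumption about which duplicate should count as the "first".
    SELECT ROW_NUMBER() OVER (PARTITION BY AccountID ORDER BY AccountName) AS RN
    FROM #Account
    WHERE PortfolioID = 1
)
DELETE FROM CTE WHERE RN > 1;   -- the CTE is the delete target, so only numbered rows are affected

-- Inspect what is left before deciding to keep the change.
SELECT AccountID, PortfolioID, AccountName FROM #Account ORDER BY AccountID;

ROLLBACK TRANSACTION;           -- switch to COMMIT TRANSACTION once the result looks right
DROP TABLE #Account;
```

Running the delete inside a transaction and checking the remaining rows first makes a mistake like the original one recoverable without a backup.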

A primary key column never allows duplicate entries.
Try with the below query for the desired result based on the given data/inputs.
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY a.[AccountID],a.PortfolioID ORDER BY a.[AccountID]) AS [ROWNUMBER],*
FROM [Account] a
WHERE a.[PortfolioID] = 1
)
DELETE
FROM [CTE]
WHERE [ROWNUMBER] > 1

Related

SQL: Deleting Duplicates using Not in and Group by

I have the following SQL statement to delete duplicate rows, but no rows are ever affected.
DELETE FROM content_stacks WHERE id NOT IN (
SELECT id
FROM content_stacks
GROUP BY user_id, content_id
);
The subquery itself is returning the id list of first entries correctly.
SELECT id
FROM content_stacks
GROUP BY user_id, content_id
When I paste the result list in as literal values, it works, too:
DELETE FROM content_stacks WHERE id NOT IN (239,231,217,218,219,232,233,220,230,226,234,235,224,225,221,223,222,227,228,229,236,237,238,216,208,209,210,204,211,212,242,203,240,201,241,205,206,207,213,214,215);
I checked many similar examples and this should be working in my opinion. What am I missing?
First find the first row of each group using ROW_NUMBER, then delete the records with a row number greater than 1:
WITH CTE AS (
SELECT id , ROW_NUMBER() OVER(PARTITION BY user_id, content_id, ORDER BY id) rn
FROM content_stacks
)
DELETE cs
FROM content_stacks cs
INNER JOIN CTE ON CTE.id = cs.id
WHERE rn > 1
Sorry to ask, but if you are deleting, why would you need to group the records?
Aren't you just increasing the runtime?
The code from Meyssam Toluie does not work as written, but I made a similar solution based on the same row-number idea:
DELETE FROM content_stacks WHERE id IN
(SELECT id FROM (
SELECT id, ROW_NUMBER() OVER(PARTITION BY user_id, content_id)row_num
FROM content_stacks
) sub
WHERE row_num > 1)
This works for me now.
My first command did not work because the GROUP BY does not show all ids in the output, but they are still there, so in fact all ids were returned in the NOT IN id list. Row numbers seem to be the easiest way to solve this problem.
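A variant that stays close to the original NOT IN idea is to name the id to keep explicitly with MIN(id), instead of relying on whichever id the GROUP BY happens to return. The following is only a sketch and assumes MySQL, which the GROUP BY usage in the question suggests but does not confirm. The table content_stacks_demo and its rows are hypothetical, and the inner derived table is there because MySQL otherwise refuses to select from the table it is deleting from.

```sql
-- Hypothetical demo table; the real content_stacks may have more columns.
CREATE TABLE content_stacks_demo (id INT PRIMARY KEY, user_id INT, content_id INT);
INSERT INTO content_stacks_demo (id, user_id, content_id)
VALUES (1, 10, 100), (2, 10, 100), (3, 10, 101), (4, 11, 100), (5, 11, 100);

-- Keep the lowest id of every (user_id, content_id) pair and delete the rest.
DELETE FROM content_stacks_demo
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MIN(id) AS keep_id
        FROM content_stacks_demo
        GROUP BY user_id, content_id
    ) AS keepers
);

-- Every remaining pair should now occur exactly once.
SELECT user_id, content_id, COUNT(*) AS copies
FROM content_stacks_demo
GROUP BY user_id, content_id;

DROP TABLE content_stacks_demo;
```

With the sample rows above, ids 2 and 5 are the ones removed, since ids 1 and 4 are the minimum of their groups.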

How to delete duplicate rows that are exactly the same in SQL Server [duplicate]

I loaded some data into a SQL Server table from a .CSV file for test purposes, I don't have any primary key, unique key or auto-generated ID in that table.
Below is an example of the situation:
select *
from people
where name in (select name
from people
group by name
having count(name) > 1)
When I run this query, I get these results:
The goal is to keep one row and remove other duplicate rows.
Is there any way other than saving the content somewhere else, deleting all the duplicate rows and inserting a new one?
Thanks for helping!
You could use an updatable CTE for this.
If you want to delete rows that are exact duplicates on the three columns (as shown in your sample data and explained in the question):
with cte as (
select row_number() over(partition by name, age, gender order by (select null)) rn
from people
)
delete from cte where rn > 1
If you want to delete duplicates on name only (as shown in your existing query):
with cte as (
select row_number() over(partition by name order by (select null)) rn
from people
)
delete from cte where rn > 1
How are you defining "duplicate"? Based on your code example, it appears to be by name.
For the deletion, you can use an updatable CTE with row_number():
with todelete as (
select p.*,
row_number() over (partition by name order by (select null)) as seqnum
from people p
)
delete from todelete
where seqnum > 1;
If more columns define the duplicate, then adjust the partition by clause.
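Since ORDER BY (SELECT NULL) leaves it up to the server which row of each group gets number 1, it can help to look at the rows that would be removed before deleting them. Here is a minimal sketch of that preview-then-delete routine, assuming SQL Server. The temp table #people and its three sample rows of 'Ann' are made up for illustration; the column names come from the answer above.

```sql
-- Hypothetical scratch data standing in for the CSV import.
CREATE TABLE #people (name varchar(50), age int, gender char(1));
INSERT INTO #people (name, age, gender)
VALUES ('Ann', 30, 'F'), ('Ann', 30, 'F'), ('Bob', 41, 'M'), ('Ann', 30, 'F');

-- Step 1: preview. Same CTE as the delete, but only selecting.
WITH todelete AS (
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY name, age, gender ORDER BY (SELECT NULL)) AS seqnum
    FROM #people p
)
SELECT * FROM todelete WHERE seqnum > 1;

-- Step 2: the actual delete, once the preview shows only unwanted rows.
WITH todelete AS (
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY name, age, gender ORDER BY (SELECT NULL)) AS seqnum
    FROM #people p
)
DELETE FROM todelete WHERE seqnum > 1;

-- One row per distinct (name, age, gender) combination remains.
SELECT name, age, gender FROM #people;

DROP TABLE #people;
```

For exact duplicates it does not matter which copy survives. If the partition covers only some columns, replacing (SELECT NULL) with a real column makes the kept row predictable.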

How do you create a delete query that deletes rows with criteria?

I have an insert query that goes into a table linked to QuickBooks. The table Test_InvoiceLine has a lot of Part_ID's and Descriptions that are exactly the same.
INSERT INTO InvoiceLine (Part_ID, More_Info )
SELECT Part_ID, Description
FROM Test_InvoiceLine;
How can I write a query that goes into the InvoiceLine table and deletes duplicates with the same Part_ID and Description that are already there?
Use a CTE as a temporary result set. You essentially query for duplicate records that are partitioned by your criteria, and delete the records that are not the first record (keeps the original record).
Double-check me before you blow anything away, because it has been a while since I've done this, but this should work.
WITH CTE AS(
SELECT Part_ID, Description,
RowNum = ROW_NUMBER()OVER(PARTITION BY Part_ID, Description ORDER BY Part_ID, Description)
FROM Test_InvoiceLine
)
DELETE FROM CTE WHERE RowNum > 1
Source: How to delete duplicate records in SQL Server
Is there a reason why you need to do an insert then delete? If you're trying to insert -> delete, it sounds like you're actually trying to update the data by Part_ID.
update InvoiceLine a
set More_Info = b.Description
from ( select Part_ID,
Description
from Test_InvoiceLine ) b
where b.Part_ID = a.Part_ID
This will update More_Info in the InvoiceLine table with Description.
This is for SQL Server; however, ROW_NUMBER is widely supported in other RDBMSs.
Here is the query you need.
;WITH Data AS
(
SELECT
Part_ID,Description,
RowNumber = ROW_NUMBER() OVER(PARTITION BY Part_Id,Description ORDER BY Part_Id,Description)
FROM Test_InvoiceLine
)
DELETE FROM Data WHERE RowNumber > 1
I don't know how More_Info makes a difference here, since according to your post the duplicate key does not include it. However, if you need to inspect the More_Info values in the delete statement, then perhaps you could use something similar to the query below.
;WITH Data AS
(
SELECT
More_Info,
Part_ID,Description,
RowNumber = ROW_NUMBER() OVER(PARTITION BY Part_Id,Description ORDER BY Part_Id,Description)
FROM Test_InvoiceLine
)
DELETE T
FROM Test_InvoiceLine T
INNER JOIN Data D ON D.RowNumber > 1 AND D.More_Info = 'Y' AND D.Part_Id = T.Part_ID

delete double on SQL with multiple primary keys

I am using SQL Server 2008 R2. I have a composite primary key made up of 5 columns
(NOID ,CODE_CLIENT,CODE_DEST,DATE_CLOTURE,DATE_CLOTUR_REEL)
on my table.
I tried to delete the duplicates with this syntax:
DELETE
FROM [LETTRE_VOIT_FINAL]
WHERE EXISTS
(SELECT NOID ,
CODE_CLIENT,
CODE_DEST,
DATE_CLOTURE,
DATE_CLOTUR_REEL
FROM LETTRE_VOIT_FINAL
GROUP BY NOID ,
CODE_CLIENT,
CODE_DEST,
DATE_CLOTURE,
DATE_CLOTUR_REEL HAVING count(*) > 1)
It deleted all the entries; fortunately I had made a backup.
Before, I had just 4 columns in the composite primary key, and then I added the last one, DATE_CLOTUR_REEL. Because primary key values cannot be null, I put the value getdate() into this column. Because of that, I cannot set all 5 columns as the composite primary key, since I now have duplicates.
So now, there is no primary key on my table.
To delete the duplicates from your table:
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY NOID ,CODE_CLIENT,CODE_DEST,DATE_CLOTURE,DATE_CLOTUR_REEL
ORDER BY ( SELECT 0)) RN
FROM LETTRE_VOIT_FINAL)
DELETE FROM cte
WHERE RN > 1
The problem is that the subquery in the EXISTS clause is not tied to the rows of the DELETE in any way. As a result, the existence of ANY duplicates deletes ALL records in the table. Besides, I think you messed something up with the primary keys (as other users commented).
Anyway, you could use CTE to remove duplicates:
WITH CTE (COl1,Col2, DuplicateCount)
AS
(
SELECT COl1,Col2,
ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount
FROM DuplicateRcordTable
)
DELETE
FROM CTE
WHERE DuplicateCount > 1
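The effect of the unbound EXISTS described above can be checked without deleting anything. The two read-only queries below are a sketch against the table and columns from the question, assuming SQL Server. The first reuses the failing predicate in a SELECT COUNT(*), and the second lists the groups that are actually duplicated.

```sql
-- Same predicate as the failing DELETE, but only counting.
-- Because the subquery never refers to the outer row, this returns either
-- zero or the full row count of the table.
SELECT COUNT(*) AS rows_the_delete_would_remove
FROM LETTRE_VOIT_FINAL
WHERE EXISTS (SELECT 1
              FROM LETTRE_VOIT_FINAL
              GROUP BY NOID, CODE_CLIENT, CODE_DEST, DATE_CLOTURE, DATE_CLOTUR_REEL
              HAVING COUNT(*) > 1);

-- The groups that really are duplicated, with the number of copies of each.
SELECT NOID, CODE_CLIENT, CODE_DEST, DATE_CLOTURE, DATE_CLOTUR_REEL,
       COUNT(*) AS copies
FROM LETTRE_VOIT_FINAL
GROUP BY NOID, CODE_CLIENT, CODE_DEST, DATE_CLOTURE, DATE_CLOTUR_REEL
HAVING COUNT(*) > 1;
```

Comparing the first number with the sum of (copies - 1) from the second query shows how far the original statement overshoots.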

How to delete duplicate record where PK is uniqueidentifier field

I want to know how to remove duplicate records where the PK is a uniqueidentifier.
I have to delete records on the basis of duplicate values in a set of fields. One option is to build a temp table using Row_Number() and delete every record except row number one.
But I wanted to build a one-liner query. Any suggestions?
You can use a CTE to do this. Without seeing your table structure, here is the basic SQL:
;with cte as
(
select *, row_number() over(partition by yourfields order by yourfields) rn
from yourTable
)
delete
from cte
where rn > 1
delete from table t using table ta where ta.dup_field = t.dup_field and t.pk > ta.pk;
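The DELETE ... USING form above is not T-SQL syntax. Assuming the target is SQL Server, as the uniqueidentifier type suggests, the same self-join idea could be sketched with DELETE ... FROM ... JOIN instead. The temp table #yourTable and its rows are hypothetical. uniqueidentifier values can be compared with >, so the row with the smallest pk in each group is the one that survives, although which GUID counts as smallest is not something you would normally care about.

```sql
-- Hypothetical scratch table with a uniqueidentifier primary key.
CREATE TABLE #yourTable (
    pk        uniqueidentifier NOT NULL PRIMARY KEY DEFAULT NEWID(),
    dup_field varchar(20)      NOT NULL
);
INSERT INTO #yourTable (dup_field) VALUES ('a'), ('a'), ('b'), ('a');

-- Delete every row for which another row with the same dup_field
-- and a smaller pk exists; the smallest pk of each group is kept.
DELETE t
FROM #yourTable AS t
INNER JOIN #yourTable AS ta
    ON ta.dup_field = t.dup_field
   AND t.pk > ta.pk;

-- Each dup_field value should now appear exactly once.
SELECT dup_field, COUNT(*) AS copies
FROM #yourTable
GROUP BY dup_field;

DROP TABLE #yourTable;
```

This stays a single statement, which is what the question asked for, at the cost of a self-join that can be slow on large tables without an index on dup_field.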