Delete duplicate entries in SQL Server

We have a Microsoft SQL Server table [database].[dbo].[UserInAppPurchase] with these columns:
[Id]
,[UserEmail]
,[UserId]
,[PurchaseDate]
,[ProductId]
,[TransactionId]
,[OriginalTransactionId]
,[ValidationTime]
,[ValidationReceipt]
,[ValidFrom]
,[ValidTo]
,[Platfrom]
We have multiple entries with the same [TransactionId], but there should be only one row per TransactionId. We would therefore like to delete the duplicate rows for each TransactionId and keep only the one with the lowest [Id].
Thanks for the help
Andreas

One nice method uses updatable CTEs:
with todelete as (
select uiap.*,
row_number() over (partition by TransactionID order by id) as seqnum
from UserInAppPurchase uiap
)
delete todelete
where seqnum > 1;
You can, of course, use other methods that are more compatible with other databases, such as:
delete uiap from UserInAppPurchase uiap
where uiap.id > (select min(uiap2.id) from UserInAppPurchase uiap2 where uiap2.TransactionID = uiap.TransactionID);
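Before running a delete like this on real data, it can help to preview the rows it would remove and to wrap the delete in a transaction. The following is only a sketch: the temp table #UserInAppPurchase and its sample values are hypothetical and carry just the two columns that matter here.

```sql
-- Hypothetical scratch table with only the relevant columns.
CREATE TABLE #UserInAppPurchase (
    Id            int         NOT NULL PRIMARY KEY,
    TransactionId varchar(50) NOT NULL
);

INSERT INTO #UserInAppPurchase (Id, TransactionId)
VALUES (1, 'T-100'), (2, 'T-100'), (3, 'T-200'), (4, 'T-100'), (5, 'T-200');

-- Preview: every row except the lowest Id per TransactionId.
WITH todelete AS (
    SELECT uiap.*,
           ROW_NUMBER() OVER (PARTITION BY TransactionId ORDER BY Id) AS seqnum
    FROM #UserInAppPurchase AS uiap
)
SELECT Id, TransactionId
FROM todelete
WHERE seqnum > 1
ORDER BY Id;              -- lists Ids 2, 4 and 5

BEGIN TRANSACTION;

-- Same CTE, now used as the delete target.
WITH todelete AS (
    SELECT uiap.*,
           ROW_NUMBER() OVER (PARTITION BY TransactionId ORDER BY Id) AS seqnum
    FROM #UserInAppPurchase AS uiap
)
DELETE FROM todelete
WHERE seqnum > 1;

SELECT Id, TransactionId
FROM #UserInAppPurchase
ORDER BY Id;              -- Ids 1 and 3 remain

COMMIT TRANSACTION;       -- or ROLLBACK TRANSACTION if the result looks wrong

DROP TABLE #UserInAppPurchase;
```

The preview and the delete share the same CTE definition, so whatever the SELECT lists is what the DELETE removes.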

Related

SQL delete duplicate query

I am trying to delete duplicates (based on the service column), but my query is not working:
DELETE FROM contactactionnodup
WHERE service IN (SELECT service, COUNT(*), contactid
FROM ContactActionNoDup
GROUP BY service, contactid
HAVING COUNT(*) > 1)
How can I correct this query? Thank you.
If you want to keep one of the rows for each service/contactid pair, then use an updatable CTE:
with todelete as (
select ca.*, row_number() over (partition by service, contactid order by service) as seqnum
from ContactActionNoDup as ca
)
delete from todelete
where seqnum > 1;

delete duplicates with two ifs

We have a Microsoft SQL Server table [database].[dbo].[UserInAppPurchase] with these columns:
[Id]
,[UserEmail]
,[UserId]
,[PurchaseDate]
,[ProductId]
,[TransactionId]
,[OriginalTransactionId]
,[ValidationTime]
,[ValidationReceipt]
,[ValidFrom]
,[ValidTo]
,[Platfrom]
A UserId can have multiple records of the same purchase by mistake. The duplicates have an identical ValidTo date.
How would I delete all duplicates, so that in the end each UserId has exactly one entry with that particular ValidTo date?
Thanks for the help
Andreas
row_number() with an updatable CTE comes to mind:
with todelete as (
select uiap.*, row_number() over (partition by userid, validto order by id) as seqnum
from UserInAppPurchase uiap
)
delete from todelete
where seqnum > 1;
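To confirm the result, one option is to re-run a grouped count after the delete. This is only a sketch that assumes the table and column names from the question; it should return no rows once every (UserId, ValidTo) pair is unique.

```sql
-- Lists any (UserId, ValidTo) pair that still occurs more than once.
SELECT UserId, ValidTo, COUNT(*) AS cnt
FROM UserInAppPurchase
GROUP BY UserId, ValidTo
HAVING COUNT(*) > 1;
```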

Removing duplicate data in SQL Server columns

I am trying to remove some duplicate data from a table called [dbo].[FactGunSales], based on the column [sale_id]. The first query below checks for duplicates and works. The second query is the one I am having issues with: it reports no rows affected.
-- Detecting Duplicate
SELECT [sale_id], COUNT(*) TotalCount
FROM [dbo].[FactGunSales]
GROUP BY [sale_id]
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
GO
-- Deleting Duplicate
DELETE FROM [dbo].[FactGunSales]
WHERE [sale_id] NOT IN (SELECT MAX([sale_id])
FROM [dbo].[FactGunSales]
GROUP BY [sale_id])
GO
Any help would be great
Use ROW_NUMBER() or COUNT(*) with an updatable CTE instead. Your code seems intended to be equivalent to:
WITH todelete AS (
SELECT fgs.*, COUNT(*) OVER (PARTITION BY sale_id) as cnt
FROM [dbo].[FactGunSales] fgs
)
DELETE FROM todelete
WHERE cnt > 1;
Normally, though, you don't want to delete all duplicates. You want to keep one of them. For that, use ROW_NUMBER():
WITH todelete AS (
SELECT fgs.*, ROW_NUMBER() OVER (PARTITION BY sale_id ORDER BY sale_id) as seqnum
FROM [dbo].[FactGunSales] fgs
)
DELETE FROM todelete
WHERE seqnum > 1;
Your query doesn't give an indication about which row to keep. This version keeps an arbitrary row. You can keep the newest or oldest or biggest or smallest or whateverest by changing the ORDER BY clause.
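Your version doesn't delete anything because the subquery groups by sale_id and then takes MAX(sale_id), so it returns every distinct sale_id in the table and no row can satisfy NOT IN. On top of that, if any value returned by the subquery is NULL, NOT IN never evaluates to true and the WHERE filters out all rows. Usually, I strongly recommend using NOT EXISTS instead, but for this purpose an updatable CTE makes more sense.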
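The NULL behaviour of NOT IN mentioned above can be reproduced in isolation. This is a minimal sketch using two hypothetical table variables, @sales and @keep, and assuming the default ANSI_NULLS ON setting:

```sql
-- Hypothetical sample data; @keep contains a NULL on purpose.
DECLARE @sales TABLE (sale_id int NULL);
DECLARE @keep  TABLE (sale_id int NULL);

INSERT INTO @sales (sale_id) VALUES (1), (2), (3);
INSERT INTO @keep  (sale_id) VALUES (1), (NULL);

-- NOT IN: the comparison with NULL is UNKNOWN, so no row qualifies.
SELECT s.sale_id
FROM @sales AS s
WHERE s.sale_id NOT IN (SELECT k.sale_id FROM @keep AS k);   -- returns no rows

-- NOT EXISTS: the NULL in @keep simply never matches.
SELECT s.sale_id
FROM @sales AS s
WHERE NOT EXISTS (SELECT 1 FROM @keep AS k WHERE k.sale_id = s.sale_id)
ORDER BY s.sale_id;                                          -- returns 2 and 3
```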
You can consider using a CTE that ranks the records within each sale_id, so that any duplicate sale_id gets a rank of 2, 3, 4 and so on. After that, delete the entries whose rank is not 1:
with cte
as (select row_number() over(partition by sale_id order by sale_id) as rnk
,*
from [dbo].[FactGunSales]
)
delete
from cte
where rnk <> 1

Delete duplicates but keep 1 with multiple column key

I have the following SQL select. How can I convert it to a delete statement so it keeps 1 of the rows but deletes the duplicate?
select s.ForsNr, t.*
from [testDeleteDublicates] s
join (
select ForsNr, period, count(*) as qty
from [testDeleteDublicates]
group by ForsNr, period
having count(*) > 1
) t on s.ForsNr = t.ForsNr and s.Period = t.Period
Try one of the following:
Method 1:
DELETE FROM Mytable WHERE RowID NOT IN (SELECT MIN(RowID) FROM Mytable GROUP BY Col1,Col2,Col3)
Method 2:
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY ForsNr, period
ORDER BY ( SELECT 0)) RN
FROM testDeleteDublicates)
DELETE FROM cte
WHERE RN > 1
Hope this helps!
NOTE:
Please change the table & column names according to your need!
This is easy as long as you have a generated primary key column (which is a good idea). You can simply select the min(id) of each duplicate group and delete everything else - Note that I have removed the having clause so that the ids of non-duplicate rows are also excluded from the delete.
delete from [testDeleteDublicates]
where id not in (
select Min(Id) as Id
from [testDeleteDublicates]
group by ForsNr, period
)
If you don't have an artificial primary key you may have to achieve the same effect using row numbers, which will be a bit more fiddly as their implementation varies from vendor to vendor.
You can do this in two ways:
1. Add a primary key and delete accordingly:
http://www.mssqltips.com/sqlservertip/1103/delete-duplicate-rows-with-no-primary-key-on-a-sql-server-table/
2. Use row_number() with the partition option: number the rows within each group at run time, then delete the duplicate rows.
Removing duplicates using partition by SQL Server
-- put the group by columns in the partition by clause
;with cte as (
select ROW_NUMBER() over (partition by ForsNr, period order by ForsNr, period) as RowNo, *
from [testDeleteDublicates]
)
-- every row after the first in each (ForsNr, period) group is a duplicate
delete from cte
where RowNo > 1

How do I delete duplicate rows in SQL Server using the OVER clause?

Here are the columns in my table:
Id
EmployeeId
IncidentRecordedById
DateOfIncident
Comments
TypeId
Description
IsAttenIncident
I would like to delete duplicate rows where EmployeeId, DateOfIncident, TypeId and Description are the same - just to clarify - I do want to keep one of them. I think I should be using the OVER clause with PARTITION, but I am not sure.
Thanks
If you want to keep one row from each group of duplicates, you can use ROW_NUMBER. In this example I keep the row with the lowest Id:
WITH CTE AS
(
SELECT rn = ROW_NUMBER()
OVER(
PARTITION BY employeeid, dateofincident, typeid, description
ORDER BY Id ASC), *
FROM dbo.TableName
)
DELETE FROM cte
WHERE rn > 1
You can also do this without a CTE, by deleting from a derived table:
delete a from
(select id,name,place, ROW_NUMBER() over (partition by id,name,place order by id) row_Count
from dup_table) a
where a.row_Count >1
You can use the following query. This has an assumption that you want to keep the latest row and delete the other duplicates.
DELETE [YourTable]
FROM [YourTable]
LEFT OUTER JOIN (
SELECT MAX(ID) as RowId
FROM [YourTable]
GROUP BY EmployeeId, DateOfIncident, TypeId, Description
) as KeepRows ON
[YourTable].ID = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
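For comparison, the same keep-the-latest rule can also be expressed with ROW_NUMBER() by sorting on Id in descending order. This is a minimal sketch on a hypothetical temp table #Incident with made-up rows, using only the columns that define a duplicate in the question:

```sql
-- Hypothetical scratch table and sample data.
CREATE TABLE #Incident (
    Id             int          NOT NULL PRIMARY KEY,
    EmployeeId     int          NOT NULL,
    DateOfIncident date         NOT NULL,
    TypeId         int          NOT NULL,
    Description    varchar(100) NOT NULL
);

INSERT INTO #Incident (Id, EmployeeId, DateOfIncident, TypeId, Description)
VALUES (1, 10, '2020-01-01', 1, 'Late'),
       (2, 10, '2020-01-01', 1, 'Late'),
       (3, 10, '2020-01-02', 1, 'Late'),
       (4, 11, '2020-01-01', 2, 'Absent'),
       (5, 11, '2020-01-01', 2, 'Absent');

-- ORDER BY Id DESC gives rn = 1 to the highest Id in each group,
-- so the latest row survives and the older duplicates are deleted.
WITH cte AS (
    SELECT rn = ROW_NUMBER() OVER (
                    PARTITION BY EmployeeId, DateOfIncident, TypeId, Description
                    ORDER BY Id DESC),
           *
    FROM #Incident
)
DELETE FROM cte
WHERE rn > 1;

SELECT Id
FROM #Incident
ORDER BY Id;   -- Ids 2, 3 and 5 remain

DROP TABLE #Incident;
```

Switching the sort back to ascending order keeps the lowest Id in each group instead.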