delete duplicates with two ifs

delete duplicates with two ifs - sql

We have a Microsoft SQL Server table [database].[dbo].[UserInAppPurchase] with this columns:
[Id]
,[UserEmail]
,[UserId]
,[PurchaseDate]
,[ProductId]
,[TransactionId]
,[OriginalTransactionId]
,[ValidationTime]
,[ValidationReceipt]
,[ValidFrom]
,[ValidTo]
,[Platfrom]
So a UserID can have multiple records of the same purchase by error. The duplicates would have an identical ValidTo date.
So how would I delete all duplicates? In the end each UserId would have exactly one entry with that particular ValidTo date.
Thanks for the help
Andreas

row_number() with an updatable CTE comes to mind:
with todelete as (
select uiap.*, row_number() over (partition by userid, validto order by id) as seqnum
from UserInAppPurchase uiap
)
delete from todelete
where seqnum > 1;

Related

Delete Distinct column and latest date of other column

I have a table where the primary key is a composite key of ID and date. Is there a way that I can delete a single row where ID matches and the date is the latest date?
I am new to SQL, so I have tried a few things, but I either don't get the results I am looking for or cant get the syntax correct
DELETE FROM Master
WHERE ((Identifier = 'SomeID')
AND (EffectiveDate = MAX(EffectiveDate));
There are multiple columns with the same ID, but different dates, ie.
ID EffectiveDate
-------------------------
A '2019-09-18'
A '2019-09-17'
A '2019-09-16'
Is there a way I can delete only the row with A | '2019-09-18'?

You can use window functions and an updatable CTE:
with todelete as (
select t.*, row_number() over (partition by id order by effective_date desc) as seqnum
from t
)
delete from todelete
where seqnum = 1;
Note: If you want to limit this to a single id, then be sure to include a where id = 'a' in either the subquery or outer query.

use row_number()
delete from (select *, row_number() over(partition by id order by effectivedate desc) rn from table_name
) a where a.rn=1

A correlated subquery might get the job done:
DELETE FROM Master
WHERE
Identifier = 'SomeID'
AND EffectiveDate = (
SELECT MAX(EffectiveDate) FROM Master WHERE Identifier = 'SomeID'
)
;

Use the CTE Function to Delete the Row but the below Query will not delete the Record of Max Date of those ID's where Single Record exist against that.
with todelete as (
select t.*, row_number() over (partition by id order by effective_date desc) as seqnum
from t
)
delete from todelete
where seqnum = 1 and id in(select distinct id from todelete where seqnum<>1)

With correlated subquery for all IDs:
delete table1
from table1 t1
where t1.EffectiveDate =
(
select max(t2.EffectiveDate)
from table1 t2
where t2.ID = t1.ID
)

Select an id from a group sql which returns duplicate records?

I have to select CompanyId column only from the following SQL;
select CompanyId,
row_number() over (partition by [GradeName] order by [TankNumber] ) rn
from [Data_DB].[dbo].[Company] where CompanyCode='ASAAA'
In the SQL, I try to figure out duplicate records, and from another table i want to delete some records based on the CompanyId from above query.
that is;
delete from [[dbo].ObservationData
where CompanyId in (select CompanyId,
row_number() over (partition by [GradeName] order by [TankNumber] ) rn
from [Data_DB].[dbo].[Company] where CompanyCode='ASAAA')
How can I modify above query?

Assuming you don't care which duplicate gets retained or deleted, you may try using a deletable CTE here:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [GradeName] ORDER BY [TankNumber]) rn
FROM [Data_DB].[dbo].[Company]
WHERE CompanyCode = 'ASAAA'
)
DELETE
FROM cte
WHERE rn > 1;
This answer arbitrarily retains the "first" duplicate, with first being defined as the record with the earliest row number.

delete from [[dbo].ObservationData
where CompanyId in (select CompanyId from (select CompanyId,
row_number() over (partition by [GradeName] order by [TankNumber] ) rn
from [Datat_DB].[dbo].[Company] where CompanyCode='ASAAA') a where rn > 1 ;

Delete double entries in SQL Server

We have a Microsoft SQL Server table [database].[dbo].[UserInAppPurchase] with this columns:
[Id]
,[UserEmail]
,[UserId]
,[PurchaseDate]
,[ProductId]
,[TransactionId]
,[OriginalTransactionId]
,[ValidationTime]
,[ValidationReceipt]
,[ValidFrom]
,[ValidTo]
,[Platfrom]
We have multiple entries with the same [TransactionID], but per TransactionID only one row should be there. Thus we would like to delete all rows with same TransactionID and keep the one with the lowest [Id].
Thanks for the help
Andreas

One nice method uses updatable CTEs:
with todelete as (
select uiap.*,
row_number() over (partition by TransactionID order by id) as seqnum
from UserInAppPurchase uiap
)
delete todelete
where seqnum > 1;
You can, of course, use other methods that are more compatible with other databases, such as:
delete uiap from UserInAppPurchase uiap
where uiap.id > (select min(uiap2.id) from UserInAppPurchase uiap2 where uiap2.TransactionID = uiap.TransactionID);

Delete Duplicate Rows in SQL

I have a table with unique id but duplicate row information.
I can find the rows with duplicates using this query
SELECT
PersonAliasId, StartDateTime, GroupId, COUNT(*) as Count
FROM
Attendance
GROUP BY
PersonAliasId, StartDateTime, GroupId
HAVING
COUNT(*) > 1
I can manually delete the rows while keeping the 1 I need with this query
Delete
From Attendance
Where Id IN(SELECT
Id
FROM
Attendance
Where PersonAliasId = 15
and StartDateTime = '9/24/2017'
and GroupId = 1429
Order By ModifiedDateTIme Desc
Offset 1 Rows)
I am not versed in SQL enough to figure out how to use the rows in the first query to delete the duplicates leaving behind the most recent. There are over 3481 records returned by the first query to do this one by one manually.
How can I find the duplicate rows like the first query and delete all but the most recent like the second?

You can use a Common Table Expression to delete the duplicates:
WITH Cte AS(
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId
ORDER BY ModifiedDateTIme DESC)
FROM Attendance
)
DELETE FROM Cte WHERE Rn > 1;
This will keep the most recent record for each PersonAliasId - StartDateTime - GroupId combination.

Use the MAX aggregate function to identify the latest startdatetime for each group/person combination. Then delete records which do not have that latest time.
DELETE a
FROM attendance as a
INNER JOIN (
SELECT
PersonAliasId, MAX(StartDateTime) AS LatestTime, GroupId,
FROM
Attendance
GROUP BY
PersonAliasId, GroupId
HAVING
COUNT(*) > 1
) as b
on a.personaliasid=b.personaliasid and a.groupid=b.groupid and a.startdatetime < b.latesttime

Same as the CTE answer - give Felix the check
delete
from ( SELECT rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId
ORDER BY ModifiedDateTIme DESC)
FROM Attendance
) tt
where tt.rn > 1

How do I delete duplicate rows in SQL Server using the OVER clause?

Here are the columns in my table:
Id
EmployeeId
IncidentRecordedById
DateOfIncident
Comments
TypeId
Description
IsAttenIncident
I would like to delete duplicate rows where EmployeeId, DateOfIncident, TypeId and Description are the same - just to clarify - I do want to keep one of them. I think I should be using the OVER clause with PARTITION, but I am not sure.
Thanks

If you want to keep one row of the duplicate-groups you can use ROW_NUMBER. In this example i keep the row with the lowest Id:
WITH CTE AS
(
SELECT rn = ROW_NUMBER()
OVER(
PARTITION BY employeeid, dateofincident, typeid, description
ORDER BY Id ASC), *
FROM dbo.TableName
)
DELETE FROM cte
WHERE rn > 1

use this query without using CTE....
delete a from
(select id,name,place, ROW_NUMBER() over (partition by id,name,place order by id) row_Count
from dup_table) a
where a.row_Count >1

You can use the following query. This has an assumption that you want to keep the latest row and delete the other duplicates.
DELETE [YourTable]
FROM [YourTable]
LEFT OUTER JOIN (
SELECT MAX(ID) as RowId
FROM [YourTable]
GROUP BY EmployeeId, DateOfIncident, TypeId, Description
) as KeepRows ON
[YourTable].ID = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

delete duplicates with two ifs - sql

row_number() with an updatable CTE comes to mind: with todelete as ( select uiap.*, row_number() over (partition by userid, validto order by id) as seqnum from UserInAppPurchase uiap ) delete from todelete where seqnum > 1;

Related

Delete Distinct column and latest date of other column

Select an id from a group sql which returns duplicate records?

Delete double entries in SQL Server

Delete Duplicate Rows in SQL

How do I delete duplicate rows in SQL Server using the OVER clause?

Categories

Resources