Delete duplicate rows with an order by - sql

I have this table:
I want to delete duplicate rows in that table based on different STATUSIN values,
and this is my query to delete the duplicate rows:
;WITH CTE AS
(
SELECT ID,NIP, ROW_NUMBER()OVER(PARTITION BY STATUSIN ORDER BY STATUSIN) AS RowNumber
FROM DAILYDATAWH
), CTE2 AS
(
SELECT TOP (1000) *
FROM CTE
ORDER BY RowNumber DESC
)
DELETE FROM CTE2 WHERE RowNumber > 1
and this is the output:
How do I delete the duplicate rows and show the output like this:

In your particular scenario the logic as written will not work, because if you look closely at the output of your CTE you will always have RowNumber as 1.
Your query would be something like this:
CREATE TABLE #Temp
(
ID INT IDENTITY(1,1)
,NIP VARCHAR(2)
,[NAME] VARCHAR(10)
,DEPARTMENT VARCHAR(4)
,STATUSIN DATETIME
)
INSERT INTO #Temp
(
NIP
,[NAME]
,DEPARTMENT
,STATUSIN
)
VALUES
('A1','ARIA','BB',GETDATE())
,('A1','ARIA','BB',GETDATE())
,('A1','ARIA','BB',DATEADD(MINUTE,-1,GETDATE()))
,('A1','ARIA','BB',DATEADD(MINUTE,-1,GETDATE()))
,('A2','CHLOE','BB',DATEADD(MINUTE,-2,GETDATE()))
,('A2','CHLOE','BB',DATEADD(MINUTE,-3,GETDATE()))
,('A2','CHLOE','BB',DATEADD(MINUTE,-3,GETDATE()))
,('A3','Test','BB',DATEADD(MINUTE,-6,GETDATE()))
;WITH CTE AS
(
SELECT
NIP
,[NAME]
,ID = MAX(Id)
,STATUSIN
,ROW_NUMBER()OVER(PARTITION BY [Name] ORDER BY STATUSIN) AS RowNumber
FROM #Temp
GROUP BY
NIP
,[NAME]
,STATUSIN
)
SELECT * -- To delete instead, change this line to DELETE T and remove the ORDER BY
FROM
#Temp AS T
LEFT OUTER JOIN CTE ON T.ID = CTE.ID
WHERE
CTE.ID IS NULL
ORDER BY
T.[NAME]
,T.STATUSIN
I have only written a SELECT, which displays the records that need to be deleted. You can check the rows that will be kept by changing CTE.ID IS NULL to CTE.ID IS NOT NULL.
I hope this helps. Good luck!

You missed ID in the ORDER BY of the partition. Adding it produces what you want, because you are then always deleting the 2nd, 3rd, ... duplicate. Anchor your query on the first instance as below.
SELECT
ROW_NUMBER()OVER(PARTITION BY STATUSIN ORDER BY ID, STATUSIN) AS RowNumber,
ID, NIP, Name,DEPARTMENT,STATUSIN,STATUSOUT FROM #DAILYDATAWH
I would also mention that you should probably strengthen how you are partitioning. What happens if you get multiple customer records with the exact same timestamp? For example, is NIP + Name unique? I added Name to the example below.
SELECT
ROW_NUMBER()OVER(PARTITION BY Name,STATUSIN ORDER BY Name, STATUSIN) AS RowNumber,
ID, NIP, Name,DEPARTMENT,STATUSIN,STATUSOUT FROM #DAILYDATAWH
Solution for your query
;WITH CTE AS
(
SELECT
ROW_NUMBER()OVER(PARTITION BY STATUSIN ORDER BY ID, STATUSIN) AS RowNumber
,
ID, NIP, Name,DEPARTMENT,STATUSIN,STATUSOUT FROM #DAILYDATAWH
), CTE2 AS
(
SELECT TOP (1000) *
FROM CTE
ORDER BY RowNumber DESC
)
DELETE FROM CTE2 WHERE RowNumber > 1
SELECT * FROM #DAILYDATAWH

ID is, I guess, the primary key. If you want to rearrange the records in your main table, that is not possible, because they are stored in ascending ID order.
If you just want to show the records as in the picture below, then after deleting the duplicates, use
select * from DAILYDATAWH order by NIP,NAME,StatusIn

Your code includes:
PARTITION BY STATUSIN ORDER BY STATUSIN
Having the same columns in both the PARTITION BY and the ORDER BY makes no sense. You say:
I want to delete duplicate rows in that table based on different STATUSIN
Good, you have defined what the ORDER BY should be. This decides which row to keep among the duplicates.
The PARTITION BY part must include the columns that define which rows are duplicates. A guess would be NIP,NAME, but you have to decide for yourself. So try something like this:
ROW_NUMBER() OVER(PARTITION BY NIP,NAME ORDER BY STATUSIN) AS RowNumber
The rest of the code seems ok to me.
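Put together, the suggestion above might look like the following. This is only a sketch: it assumes NIP and NAME are what make two rows duplicates (a guess, as noted above) and it keeps the row with the earliest STATUSIN in each group.

```sql
-- Sketch only: assumes NIP + NAME identify a duplicate group.
;WITH CTE AS
(
    SELECT ID, NIP, NAME, STATUSIN,
           ROW_NUMBER() OVER (PARTITION BY NIP, NAME ORDER BY STATUSIN) AS RowNumber
    FROM DAILYDATAWH
)
-- RowNumber = 1 is the earliest STATUSIN per group; every later row is removed.
DELETE FROM CTE WHERE RowNumber > 1;
```

Use ORDER BY STATUSIN DESC instead if the latest row per group is the one to keep.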

Related

Add DELETE to query that returns duplicate records | SQL Server

I need your help.
I have this query that detects duplicate records:
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1
I need to delete those results, so I tried this:
Delete FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1
but it doesn't work. Could you help me? Thanks.
Try this query:
DELETE
FROM users
WHERE ID NOT IN
(SELECT MAX(ID)
FROM users
GROUP BY name, email)
Since your question does not show an ID column (primary key), I am assuming there is none.
The CTE below will delete the duplicate records.
;WITH CTE AS (
SELECT name
, email
, ROW_NUMBER() Over(Partition by name, email Order by(Select 1)) as Sno
FROM users
)
DELETE FROM CTE WHERE SNO>1
I'm assuming you want to remove JUST the extra copies, not every row that belongs to a duplicated group.
You can use ROW_NUMBER here. You haven't provided the whole schema for the USERS table, so here's an example:
CREATE TABLE dbo.Users(
Id INT IDENTITY(1,1)
,[Name] NVARCHAR(50)
,eMail NVARCHAR(50)
,DateCreated DATETIME
)
SELECT
Id
,[Name]
,eMail
,DateCreated
,RN = ROW_NUMBER()OVER(PARTITION BY Name, eMail ORDER BY DateCreated ASC)
FROM dbo.Users
You can change this query into a Common Table Expression and then DELETE from it:
;WITH cteDups
AS(
SELECT
Id
,[Name]
,eMail
,DateCreated
,RN = ROW_NUMBER()OVER(PARTITION BY Name, eMail ORDER BY DateCreated ASC)
FROM dbo.Users
)
DELETE FROM cteDups WHERE RN > 1
The RN > 1 condition removes only the duplicate records; the first row of each group (RN = 1) is kept.
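Before running the DELETE, it may be worth previewing which rows it would remove. A minimal sketch, assuming the example dbo.Users table defined above: the CTE is unchanged and only the final statement becomes a SELECT.

```sql
-- Preview only: lists the rows that the DELETE would remove.
;WITH cteDups
AS(
    SELECT
        Id
        ,[Name]
        ,eMail
        ,DateCreated
        ,RN = ROW_NUMBER() OVER (PARTITION BY [Name], eMail ORDER BY DateCreated ASC)
    FROM dbo.Users
)
SELECT Id, [Name], eMail, DateCreated
FROM cteDups
WHERE RN > 1;
```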

How to get the row that holds the last value in a queue of identical values? (SQL)

I think it's easier to show you an image:
So, for each fld_call_id, go to the next value, if it's identical. When we get to the last value, I need the value in column fld_menu_id.
Or, to put it in another way, eliminate fld_call_id duplicates and save only the last one.
You can use ROW_NUMBER:
WITH CTE AS(
SELECT RN = ROW_NUMBER() OVER (PARTITION BY fld_call_id ORDER BY fld_id DESC),
fld_menu_id
FROM dbo.TableName
)
SELECT fld_menu_id FROM CTE WHERE RN = 1
You can create a Rank column and only select that row, something along the lines of the following:
;WITH cte AS
(
SELECT
*
,RANK() OVER (PARTITION BY fld_call_id ORDER BY fld_id DESC) Rnk
FROM YourTable
)
SELECT
*
FROM cte
WHERE Rnk=1
So you PARTITION BY fld_call_id and ORDER BY fld_id in descending order so that the last value comes first. These are the rows where Rnk=1.
Edit after comments of OP.
SELECT [Table].*
FROM [Table]
INNER JOIN
(
SELECT MAX(fldMenuID) AS fldMenuID,
fldCallID
FROM [Table]
GROUP BY fldCallID
) maxValues
ON (maxValues.fldMenuID = [Table].fldMenuID
AND maxValues.fldCallID = [Table].fldCallID)
Hope this works:
SELECT A.*
FROM [table] A
JOIN (SELECT fld_id,
ROW_NUMBER() OVER (PARTITION BY Fld_call_id ORDER BY fld_id DESC) [Row]
FROM [table]) LU ON A.fld_id = LU.fld_id
WHERE LU.[Row] = 1

Delete field Duplicates from the Same table

I am writing this query to display a bunch of Names from a table filled automatically from an outside source:
select MAX(UN_ID) as [ID] , MAX(UN_Name) from UnavailableNames group by (UN_Name)
I have a lot of duplicate names, so I used "Group by".
I want to delete all the duplicates right after I run this select query.
(Where a UN_Name value appears twice, delete one row and leave the other.)
Is there any way to do this?
Something like this should work:
WITH CTE AS
(
SELECT rn = ROW_NUMBER()
OVER(
PARTITION BY UN_Name
ORDER BY UN_ID ASC), *
FROM dbo.UnavailableNames
)
DELETE FROM cte
WHERE rn > 1
You basically assign an increasing "row number" within each group that shares the same "un_name".
Then you just delete all rows which have a "row number" higher than 1 and keep all the ones that appeared first.
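To make the numbering concrete, here is a small worked sketch. The temp table #NamesDemo and its sample rows are invented for illustration; only the column names UN_ID and UN_Name come from the question.

```sql
-- Hypothetical demo table and data, used only to illustrate the technique.
CREATE TABLE #NamesDemo (UN_ID INT, UN_Name VARCHAR(20));

INSERT INTO #NamesDemo (UN_ID, UN_Name)
VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Ann'), (4, 'Ann'), (5, 'Bob');

-- Within each UN_Name group the rows are numbered by UN_ID:
--   Ann: UN_ID 1 -> rn 1, UN_ID 3 -> rn 2, UN_ID 4 -> rn 3
--   Bob: UN_ID 2 -> rn 1, UN_ID 5 -> rn 2
WITH CTE AS
(
    SELECT rn = ROW_NUMBER() OVER (PARTITION BY UN_Name ORDER BY UN_ID ASC), *
    FROM #NamesDemo
)
DELETE FROM CTE
WHERE rn > 1;

-- Only the first row of each group is left: (1, 'Ann') and (2, 'Bob').
SELECT UN_ID, UN_Name FROM #NamesDemo ORDER BY UN_ID;

DROP TABLE #NamesDemo;
```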
With CTE As
(
Select uid,ROW_NUMBER() OVER( PARTITION BY uname order by uid) as rownum
From yourTable
)
Delete
From yourTable
where uid in (select uid from CTE where rownum> 1 )

How do I delete duplicate rows in SQL Server using the OVER clause?

Here are the columns in my table:
Id
EmployeeId
IncidentRecordedById
DateOfIncident
Comments
TypeId
Description
IsAttenIncident
I would like to delete duplicate rows where EmployeeId, DateOfIncident, TypeId and Description are the same - just to clarify - I do want to keep one of them. I think I should be using the OVER clause with PARTITION, but I am not sure.
Thanks
If you want to keep one row of each duplicate group you can use ROW_NUMBER. In this example I keep the row with the lowest Id:
WITH CTE AS
(
SELECT rn = ROW_NUMBER()
OVER(
PARTITION BY employeeid, dateofincident, typeid, description
ORDER BY Id ASC), *
FROM dbo.TableName
)
DELETE FROM cte
WHERE rn > 1
You can also use this query, which does not need a CTE:
delete a from
(select id,name,place, ROW_NUMBER() over (partition by id,name,place order by id) row_Count
from dup_table) a
where a.row_Count >1
You can use the following query. It assumes that you want to keep the latest row (the one with the highest ID) and delete the other duplicates.
DELETE [YourTable]
FROM [YourTable]
LEFT OUTER JOIN (
SELECT MAX(ID) as RowId
FROM [YourTable]
GROUP BY EmployeeId, DateOfIncident, TypeId, Description
) as KeepRows ON
[YourTable].ID = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
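The same "keep the latest row" assumption can also be written with ROW_NUMBER. A minimal sketch, assuming the table is called dbo.TableName (a placeholder) and that a higher Id means a later row:

```sql
-- Sketch only: dbo.TableName is a placeholder for the real table.
;WITH CTE AS
(
    SELECT Id,
           rn = ROW_NUMBER() OVER (
                    PARTITION BY EmployeeId, DateOfIncident, TypeId, Description
                    ORDER BY Id DESC)   -- highest Id gets rn = 1 and is kept
    FROM dbo.TableName
)
DELETE FROM CTE
WHERE rn > 1;
```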

Select the first instance of a record

I have a table, myTable, that has two fields in it: ID and patientID. The same patientID can be in the table more than once with a different ID. How can I make sure that I get only ONE instance of every patientID?
EDIT: I know this isn't perfect design, but I need to get some info out of the database today and then fix it later.
You could use a CTE with ROW_NUMBER function:
WITH CTE AS(
SELECT myTable.*
, RN = ROW_NUMBER()OVER(PARTITION BY patientID ORDER BY ID)
FROM myTable
)
SELECT * FROM CTE
WHERE RN = 1
It sounds like you're looking for DISTINCT:
SELECT DISTINCT patientID FROM myTable
You can get the same "effect" with GROUP BY:
SELECT patientID FROM myTable GROUP BY patientID
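If the ID of the chosen instance is needed as well, GROUP BY can carry it along with an aggregate. A minimal sketch, assuming "first instance" means the lowest ID for each patientID:

```sql
-- Sketch only: returns one row per patientID, with the lowest ID in that group.
SELECT patientID, MIN(ID) AS ID
FROM myTable
GROUP BY patientID;
```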
The simple way would be to add LIMIT 1 to the end of your query. This will ensure only a single row is returned in the result set.
WITH CTE AS
(
SELECT tableName.*,ROW_NUMBER() OVER(PARTITION BY patientID ORDER BY patientID) As 'Position' FROM tableName
)
SELECT * FROM CTE
WHERE
Position = 1