As shown in the image, I have a table with the columns Id, OrderId and SubId. Some rows have 0 in the SubId column, as shown in the left table in the image. I have to update each row whose SubId is 0 to Max(SubId)+1 for its OrderId, taking the rows in ascending order of Id. The result should look like the right table in the image. Please help me with a solution or some suggestions.
You can use window functions:
with toupdate as (
select t.*,
row_number() over (partition by orderid, subid order by id) as seqnum,
max(subid) over (partition by orderid) as max_orderid
from t
)
update toupdate
set subid = max_orderid + seqnum
where subid = 0;
Here is a db<>fiddle.
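To see the effect of the update, here is a minimal, self-contained sketch. The temporary table #t and its sample rows are made up for illustration; only the column names come from the question.

```sql
-- Hypothetical sample data; only the column names come from the question.
CREATE TABLE #t (Id int PRIMARY KEY, OrderId int, SubId int);

INSERT INTO #t (Id, OrderId, SubId) VALUES
    (1, 100, 1), (2, 100, 2), (3, 100, 0), (4, 100, 0),
    (5, 200, 1), (6, 200, 0);

WITH toupdate AS (
    SELECT t.*,
           -- numbers the zero rows 1, 2, ... within each order, by Id
           ROW_NUMBER() OVER (PARTITION BY OrderId, SubId ORDER BY Id) AS seqnum,
           -- highest SubId already used in the order
           MAX(SubId) OVER (PARTITION BY OrderId) AS max_subid
    FROM #t AS t
)
UPDATE toupdate
SET SubId = max_subid + seqnum
WHERE SubId = 0;

SELECT Id, OrderId, SubId FROM #t ORDER BY Id;
-- Ids 3 and 4 become SubId 3 and 4; Id 6 becomes SubId 2.

DROP TABLE #t;
```

Partitioning the row number by both OrderId and SubId is what makes the zero rows count from 1 on their own, independent of the non-zero rows in the same order.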
I have this table:
I want to delete duplicate rows in that table based on different STATUSIN values.
This is my query to delete the duplicate rows:
;WITH CTE AS
(
SELECT ID,NIP, ROW_NUMBER()OVER(PARTITION BY STATUSIN ORDER BY STATUSIN) AS RowNumber
FROM DAILYDATAWH
), CTE2 AS
(
SELECT TOP (1000) *
FROM CTE
ORDER BY RowNumber DESC
)
DELETE FROM CTE2 WHERE RowNumber > 1
This is the output:
How can I delete the duplicate rows so that the output looks like this:
In your particular scenario the logic you have written will not work, because if you look closely at the output of your CTE, RowNumber is always 1.
Your query should look something like this.
CREATE TABLE #Temp
(
ID INT IDENTITY(1,1)
,NIP VARCHAR(2)
,[NAME] VARCHAR(10)
,DEPARTMENT VARCHAR(4)
,STATUSIN DATETIME
)
INSERT INTO #Temp
(
NIP
,[NAME]
,DEPARTMENT
,STATUSIN
)
VALUES
('A1','ARIA','BB',GETDATE())
,('A1','ARIA','BB',GETDATE())
,('A1','ARIA','BB',DATEADD(MINUTE,-1,GETDATE()))
,('A1','ARIA','BB',DATEADD(MINUTE,-1,GETDATE()))
,('A2','CHLOE','BB',DATEADD(MINUTE,-2,GETDATE()))
,('A2','CHLOE','BB',DATEADD(MINUTE,-3,GETDATE()))
,('A2','CHLOE','BB',DATEADD(MINUTE,-3,GETDATE()))
,('A3','Test','BB',DATEADD(MINUTE,-6,GETDATE()))
;WITH CTE AS
(
SELECT
NIP
,[NAME]
,ID = MAX(Id)
,STATUSIN
,ROW_NUMBER()OVER(PARTITION BY [Name] ORDER BY STATUSIN) AS RowNumber
FROM #Temp
GROUP BY
NIP
,[NAME]
,STATUSIN
)
SELECT * -- To do a delete change this line to DELETE T
FROM
#Temp AS T
LEFT OUTER JOIN CTE ON T.ID = CTE.ID
WHERE
CTE.ID IS NULL
ORDER BY
T.[NAME]
,T.STATUSIN
I have only written a SELECT, which displays the records that need to be deleted. You can check the records that will be kept by changing CTE.ID IS NULL to CTE.ID IS NOT NULL.
I hope this will help... Good Luck
You missed ID in the ORDER BY of the window. Adding it gives you what you want, because the first instance then gets row number 1 and you always delete the 2nd, 3rd, ... duplicate. Anchor your query on the first instance as below.
SELECT
ROW_NUMBER()OVER(PARTITION BY STATUSIN ORDER BY ID, STATUSIN) AS RowNumber,
ID, NIP, Name,DEPARTMENT,STATUSIN,STATUSOUT FROM #DAILYDATAWH
I would also mention that you should probably strengthen how you are partitioning. What happens if you get multiple customer records with the exact same timestamp? For example, is NIP + Name unique? I added Name to the example below.
SELECT
ROW_NUMBER()OVER(PARTITION BY Name,STATUSIN ORDER BY ID) AS RowNumber,
ID, NIP, Name,DEPARTMENT,STATUSIN,STATUSOUT FROM #DAILYDATAWH
Solution for your query
;WITH CTE AS
(
SELECT
ROW_NUMBER()OVER(PARTITION BY STATUSIN ORDER BY ID, STATUSIN) AS RowNumber
,
ID, NIP, Name,DEPARTMENT,STATUSIN,STATUSOUT FROM #DAILYDATAWH
), CTE2 AS
(
SELECT TOP (1000) *
FROM CTE
ORDER BY RowNumber DESC
)
DELETE FROM CTE2 WHERE RowNumber > 1
SELECT * FROM #DAILYDATAWH
I guess ID is the primary key. If you want to rearrange the records in the main table itself, that is not possible, because the rows are stored in ascending ID order.
If you just want to show records like in the below picture, then after deleting the records, use
select * from DAILYDATAWH order by NIP,NAME,StatusIn
Your code includes:
PARTITION BY STATUSIN ORDER BY STATUSIN
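Having the same columns in the PARTITION BY and the ORDER BY makes no sense: every row in a partition has the same value for those columns, so the ordering within the partition is arbitrary. You say: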
I want to delete duplicate rows in that table based on different STATUSIN
Good, you have defined what the ORDER BY should be. This decides which row to keep among duplicates.
The PARTITION BY part must include the columns that define which rows are duplicates. A guess would be NIP,NAME, but you have to decide for yourself. So try something like this:
ROW_NUMBER() OVER(PARTITION BY NIP,NAME ORDER BY STATUSIN) AS RowNumber
The rest of the code seems ok to me.
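Putting the pieces together, a minimal sketch of the whole delete could look like this. The temporary table and its rows are hypothetical, and both the choice of NIP,NAME as the duplicate key and the choice to keep the earliest STATUSIN are assumptions; ID is added to the ORDER BY only to break ties deterministically.

```sql
-- Hypothetical rows; the real table has more columns (DEPARTMENT, STATUSOUT).
CREATE TABLE #DAILYDATAWH (
    ID int IDENTITY(1,1) PRIMARY KEY,
    NIP varchar(2),
    NAME varchar(10),
    STATUSIN datetime
);

INSERT INTO #DAILYDATAWH (NIP, NAME, STATUSIN) VALUES
    ('A1', 'ARIA',  '2020-01-01T08:00:00'),
    ('A1', 'ARIA',  '2020-01-01T08:05:00'),
    ('A2', 'CHLOE', '2020-01-01T09:00:00');

WITH CTE AS (
    SELECT ID,
           -- assumption: NIP + NAME identify a duplicate group
           ROW_NUMBER() OVER (PARTITION BY NIP, NAME
                              ORDER BY STATUSIN, ID) AS RowNumber
    FROM #DAILYDATAWH
)
DELETE FROM CTE
WHERE RowNumber > 1;   -- keeps the earliest STATUSIN per NIP, NAME

SELECT * FROM #DAILYDATAWH ORDER BY NIP, NAME, STATUSIN;
-- One row remains for ARIA (08:00) and one for CHLOE.

DROP TABLE #DAILYDATAWH;
```

To keep the latest row instead, order by STATUSIN DESC.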
I have to select CompanyId column only from the following SQL;
select CompanyId,
row_number() over (partition by [GradeName] order by [TankNumber] ) rn
from [Data_DB].[dbo].[Company] where CompanyCode='ASAAA'
In this SQL I try to find duplicate records, and I want to delete some records from another table based on the CompanyId values returned by the query above.
That is:
delete from [dbo].[ObservationData]
where CompanyId in (select CompanyId,
row_number() over (partition by [GradeName] order by [TankNumber] ) rn
from [Data_DB].[dbo].[Company] where CompanyCode='ASAAA')
How can I modify above query?
Assuming you don't care which duplicate gets retained or deleted, you may try using a deletable CTE here:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [GradeName] ORDER BY [TankNumber]) rn
FROM [Data_DB].[dbo].[Company]
WHERE CompanyCode = 'ASAAA'
)
DELETE
FROM cte
WHERE rn > 1;
This answer arbitrarily retains the "first" duplicate, with first being defined as the record with the earliest row number.
delete from [dbo].[ObservationData]
where CompanyId in (select CompanyId from (select CompanyId,
row_number() over (partition by [GradeName] order by [TankNumber] ) rn
from [Data_DB].[dbo].[Company] where CompanyCode='ASAAA') a where rn > 1);
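A runnable sketch of the same idea on made-up data is shown below. The temporary tables stand in for [Data_DB].[dbo].[Company] and [dbo].[ObservationData]; the ObservationId column and all rows are hypothetical.

```sql
-- Hypothetical stand-ins for the two real tables.
CREATE TABLE #Company (
    CompanyId int, CompanyCode varchar(10), GradeName varchar(10), TankNumber int
);
CREATE TABLE #ObservationData (ObservationId int, CompanyId int);

INSERT INTO #Company (CompanyId, CompanyCode, GradeName, TankNumber) VALUES
    (1, 'ASAAA', 'G1', 1),
    (2, 'ASAAA', 'G1', 2),   -- duplicate GradeName, gets rn = 2
    (3, 'ASAAA', 'G2', 1);
INSERT INTO #ObservationData (ObservationId, CompanyId) VALUES
    (10, 1), (11, 2), (12, 3);

DELETE FROM #ObservationData
WHERE CompanyId IN (
    SELECT CompanyId
    FROM (
        SELECT CompanyId,
               ROW_NUMBER() OVER (PARTITION BY GradeName ORDER BY TankNumber) AS rn
        FROM #Company
        WHERE CompanyCode = 'ASAAA'
    ) AS a
    WHERE rn > 1            -- only the 2nd, 3rd, ... row of each GradeName
);

SELECT * FROM #ObservationData;   -- observation 11 (CompanyId 2) is gone

DROP TABLE #ObservationData;
DROP TABLE #Company;
```

The derived table is needed because a window function cannot be used directly in a WHERE clause, and because IN requires the subquery to return a single column.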
I have a table as follows:
This is a result of this select:
SELECT ParentID, ID, [Default], IsOnTop, OrderBy
FROM [table]
WHERE ParentID IN (SELECT ParentID
FROM [table]
GROUP BY ParentID
HAVING SUM([Default]) <> 1)
ORDER BY ParentID
Now, what I want to do is to: for each ParentID group, set one of the rows as a Default ([Default] = 1), where the row is chosen using this logic:
if group has a row with IsOnTop = 1 then take this row, otherwise take top 1 row ordered by OrderBy.
I'm completely clueless as to how to do that in SQL, and I have over 40 such groups, so I'd like to ask you for help, preferably with some explanation of your query.
Just slightly modify your current query by assigning a row number, across each ParentID group. The ordering logic for the row number assignment is that records with IsOnTop values of 1 come first, and after that the OrderBy column determines position. I update the CTE under the condition that only the first record in each ParentID group gets assigned a Default value of 1.
WITH cte AS (
SELECT ParentID, ID, [Default], IsOnTop, OrderBy,
ROW_NUMBER() OVER (PARTITION BY ParentID
ORDER BY IsOnTop DESC, OrderBy) rn
FROM [table]
WHERE ParentID IN (SELECT ParentID FROM [table]
GROUP BY ParentID HAVING SUM([Default]) <> 1)
)
UPDATE cte
SET [Default] = 1
WHERE rn = 1;
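Here is a small worked sketch of that update. The temporary table #items and its rows are hypothetical, and [Default] and IsOnTop are assumed to be int columns (SUM does not accept bit).

```sql
-- Hypothetical sample data; column names follow the question.
CREATE TABLE #items (
    ParentID int, ID int PRIMARY KEY, [Default] int, IsOnTop int, OrderBy int
);

INSERT INTO #items (ParentID, ID, [Default], IsOnTop, OrderBy) VALUES
    (1, 10, 0, 0, 1),
    (1, 11, 0, 1, 2),   -- IsOnTop = 1 wins in group 1
    (2, 20, 0, 0, 2),
    (2, 21, 0, 0, 1),   -- lowest OrderBy wins in group 2
    (3, 30, 1, 0, 1);   -- group 3 already has exactly one default

WITH cte AS (
    SELECT ParentID, ID, [Default], IsOnTop, OrderBy,
           ROW_NUMBER() OVER (PARTITION BY ParentID
                              ORDER BY IsOnTop DESC, OrderBy) AS rn
    FROM #items
    WHERE ParentID IN (SELECT ParentID FROM #items
                       GROUP BY ParentID HAVING SUM([Default]) <> 1)
)
UPDATE cte
SET [Default] = 1
WHERE rn = 1;

SELECT ParentID, ID, [Default] FROM #items ORDER BY ID;
-- IDs 11, 21 and 30 have [Default] = 1; IDs 10 and 20 stay 0.

DROP TABLE #items;
```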
There might be a quicker way, but this is how I would do it.
First we create a CTE in which we add a row_number over each ParentID, ordered so that a row with IsOnTop = 1 comes first; otherwise the first row by the OrderBy column comes first.
Then we update the rows with row number 1.
WITH FindSoonToBeDefault AS (
SELECT ParentID, ID, [Default], IsOnTop, OrderBy, row_number() OVER(PARTITION BY ParentID ORDER BY IsOnTop DESC, [OrderBy] ASC) AS [rn]
FROM [table]
WHERE ParentID IN (SELECT ParentID
FROM [table]
GROUP BY ParentID
HAVING SUM([Default]) <> 1)
)
UPDATE FindSoonToBeDefault
SET [Default] = 1
WHERE [rn] = 1
In your screenshot, row 12 will become the default and row 13 will not.
(1-IsOnTop)*OrderBy combines IsOnTop and OrderBy into a single result that can be ranked so that the lowest value is the one you want. This assumes OrderBy is always positive, so that 0 can only come from a row with IsOnTop = 1. Use a derived table to identify the lowest result for each ParentID, then JOIN to that to identify your defaults.
UPDATE [table]
SET [Default] = 1
FROM [table]
INNER JOIN
(
SELECT ParentID, MIN((1-IsOnTop)*OrderBy) DefaultRank
FROM [table]
GROUP BY ParentID
) AS rankForDefault
ON rankForDefault.ParentID=[table].ParentID
AND rankForDefault.DefaultRank=(1-[table].IsOnTop)*[table].OrderBy
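To make the arithmetic concrete, this small sketch computes the combined rank for a few made-up rows. The values are hypothetical and assume IsOnTop is 0 or 1 and OrderBy is a positive integer.

```sql
-- Hypothetical rows: (ParentID, ID, IsOnTop, OrderBy)
SELECT ParentID, ID, IsOnTop, OrderBy,
       (1 - IsOnTop) * OrderBy AS DefaultRank
FROM (VALUES
        (1, 10, 0, 1),
        (1, 11, 1, 2),
        (2, 20, 0, 2),
        (2, 21, 0, 1)
     ) AS v(ParentID, ID, IsOnTop, OrderBy)
ORDER BY ParentID, DefaultRank;
-- ID 11 gets rank 0 because IsOnTop = 1 zeroes the product;
-- IDs 10, 21 and 20 keep their OrderBy values 1, 1 and 2.
-- The minimum per ParentID therefore picks ID 11 and ID 21.
```

If two rows in a group tie on the lowest rank, the join matches both and both become defaults, so the approach relies on OrderBy being unique within a ParentID.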
We have a chat system that generates multiple event logs per second sometimes for every event during a chat. The issue is that these consume a massive amount of data storage (which is very expensive on that platform) and we'd like to streamline what we actually store and delete things that really aren't necessary.
To that end, there's an event type for the chat's position in the queue. We don't care about each individual position as long as there are no intervening events for that chat. So, within each distinct run that has no other event types in between, we want to keep only the first and last record, which is enough to get "total time in queue" for that period.
To complicate this, a customer can go in and out of queue as they get transferred by department, so the SAME CHAT can have multiple blocks of these queue position records. I've tried using FIRST_VALUE and LAST_VALUE and it gets me most of the way there, but fails when we have the case of two distinct blocks of these events.
Here's the script to generate the test data:
CREATE TABLE #testdata (
id varchar(18),
name varchar(8),
[type] varchar(20),
livechattranscriptid varchar(18),
groupid varchar(40))
INSERT INTO #testdata (id,name,[type],livechattranscriptid,groupid) VALUES
('0DZ14000003I2pOGAS','34128314','ChatRequest','57014000000ltfIAAQ','57014000000ltfIAAQChatRequest'),
('0DZ14000003IGmQGAW','34181980','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IHbqGAG','34185171','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003ILuHGAW','34201743','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IQ6cGAG','34217778','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IR7JGAW','34221794','PushAssignment','57014000000ltfIAAQ','57014000000ltfIAAQPushAssignment'),
('0DZ14000003IiDnGAK','34287448','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IiDoGAK','34287545','PushAssignment','57014000000ltfIAAQ','57014000000ltfIAAQPushAssignment'),
('0DZ14000003Iut5GAC','34336044','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003Iv7HGAS','34336906','Accept','57014000000ltfIAAQ','57014000000ltfIAAQAccept')
And here is the attempt to identify the first and last id for each group, ordered by the name field and grouped by the transcript id:
select *,FIRST_VALUE(id) OVER(Partition BY groupid order by livechattranscriptid,name asc) as firstinstancegroup,
LAST_VALUE(id) OVER(Partition BY groupid order by livechattranscriptid,name asc RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as lastinstancegroup from #testdata order by livechattranscriptid,name
The issue is, it gives me the same first and last id for ALL of them by that entire group rather than treating each group of Enqueue records as a distinct group. How would I treat each distinct grouping instance of Enqueue as a unique group?
Here's a similar solution: Grouping contiguous table data
It's not pretty, but the logic follows the OP's data: it finds contiguous runs of the same value in one column.
create table #mytable (
id varchar(18),
name varchar(8),
[type] varchar(20),
livechattranscriptid varchar(18),
groupid varchar(100))
INSERT INTO #mytable (id,name,[type],livechattranscriptid,groupid) VALUES
('0DZ14000003I2pOGAS','34128314','ChatRequest','57014000000ltfIAAQ','57014000000ltfIAAQChatRequest'),
('0DZ14000003IGmQGAW','34181980','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IHbqGAG','34185171','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003ILuHGAW','34201743','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IQ6cGAG','34217778','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IR7JGAW','34221794','PushAssignment','57014000000ltfIAAQ','57014000000ltfIAAQPushAssignment'),
('0DZ14000003IiDnGAK','34287448','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IiDoGAK','34287545','PushAssignment','57014000000ltfIAAQ','57014000000ltfIAAQPushAssignment'),
('0DZ14000003Iut5GAC','34336044','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003Iv7HGAS','34336906','Accept','57014000000ltfIAAQ','57014000000ltfIAAQAccept')
;with myend as ( --- get all ends
select
*
from
(select
iif(groupid <> lead(groupid,1,groupid) over (order by name),
id,
'x') [newid],name
from #mytable
)x
where newid <> 'x'
)
, mystart as -- get all starts
(
select
*
from
(select
iif(groupid <> lag(groupid,1,groupid) over (order by name),
id,
'x') [newid], name,type,livechattranscriptid
from #mytable
)x
where newid <> 'x'
) ,
finalstart as ( --- get all starts including the first row
select id,
name,type,livechattranscriptid,
row_number() over (order by name) rn
from (
select id,name,type,livechattranscriptid
from (
select top 1 id, name,type,livechattranscriptid
from #mytable
order by name) x
union all
select newid,name,type,livechattranscriptid from mystart
) y
),
finalend as -- get all ends and add the last row
(
select id,
row_number() over (order by name) rn
from (
select id,name from (
select top 1 id,name
from #mytable
order by name desc) x
union all
select newid,name from myend
) y
)
select
s.id [startid]
,s.name
,s.type
,s.livechattranscriptid
,e.id [lastid]
from
finalend e
inner join finalstart s
on e.rn = s.rn --- bind the two results over the positions or row number
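As an alternative sketch, and not the method used above, the same grouping can be done with the common "gaps and islands" trick: the difference between two row numbers is constant within each contiguous run of one type. The sample rows below are hypothetical and shortened; the column names follow the question, and the alias island is made up for illustration.

```sql
-- Hypothetical, shortened sample: one chat, events ordered by name.
WITH chat_events AS (
    SELECT *
    FROM (VALUES
        ('A', '01', 'ChatRequest',    'chat1'),
        ('B', '02', 'Enqueue',        'chat1'),
        ('C', '03', 'Enqueue',        'chat1'),
        ('D', '04', 'Enqueue',        'chat1'),
        ('E', '05', 'PushAssignment', 'chat1'),
        ('F', '06', 'Enqueue',        'chat1'),
        ('G', '07', 'Accept',         'chat1')
    ) AS v(id, name, [type], livechattranscriptid)
),
numbered AS (
    SELECT id, name, [type], livechattranscriptid,
           -- position within the chat minus position within the chat's type:
           -- constant for consecutive rows of the same type
           ROW_NUMBER() OVER (PARTITION BY livechattranscriptid ORDER BY name)
         - ROW_NUMBER() OVER (PARTITION BY livechattranscriptid, [type] ORDER BY name) AS island
    FROM chat_events
)
SELECT id, name, [type],
       FIRST_VALUE(id) OVER (PARTITION BY livechattranscriptid, [type], island
                             ORDER BY name) AS firstinstancegroup,
       LAST_VALUE(id)  OVER (PARTITION BY livechattranscriptid, [type], island
                             ORDER BY name
                             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lastinstancegroup
FROM numbered
ORDER BY name;
-- Rows B, C, D share first = 'B' and last = 'D'; row F forms its own Enqueue block.
```

Rows whose id is neither the first nor the last of their block are then the candidates for deletion.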