Grouping SQL rows into a transaction then calculating duration

I have a table that's structured like the one below. I'm wondering if it's possible to compress each transaction by the TransactionID, pull values from the first and last row, then finally calculate a duration for the transaction.
For instance below, I would like to pull RecordID and CustomerID. Then alias a new column called startingStatus which would be 'Submitted' from the first row. I would alias a second column called endingStatus that would be 'Success' from the last row. I then need to grab the timestamps in the same manner, creating startingTime and endingTime. The final column would be the difference between starting and ending in seconds.
RecordID  TransactionID  CustomerID  Status       TimeStamp
1         12             10          Submitted    04/07/2014 14:32:23
2         12             10          Queued       04/07/2014 14:32:24
3         12             10          Processing   04/08/2014 14:32:26
4         12             10          Error        04/09/2014 14:32:27
5         12             10          Resubmitted  04/10/2014 15:12:29
6         12             10          Queued       04/11/2014 15:12:31
7         12             10          Processing   04/12/2014 15:12:34
8         12             10          Success      04/13/2014 15:12:47
I've been trying to group by TransactionID and use MIN and MAX, but I haven't gotten it working yet.
How would I go about doing something like this?

Give this a try:
SELECT
    CustomerID,
    TransactionID,
    MIN([TimeStamp]) AS StartTime,
    MAX([TimeStamp]) AS EndTime,
    -- the question asks for the duration in seconds, so use SECOND rather than MINUTE
    DATEDIFF(SECOND, MIN([TimeStamp]), MAX([TimeStamp])) AS TransactionTime
FROM YourTable
GROUP BY CustomerID, TransactionID
ORDER BY CustomerID, TransactionID

I would use window functions to retrieve the first and last records for a group.
This will work for SQL Server 2005 and later by using the ROW_NUMBER() window function and then pivoting the results with the MAX/CASE WHEN method:
SELECT
    [TransactionID],
    [CustomerID],
    MAX(CASE WHEN [rn_asc] = 1 THEN [Status] ELSE NULL END) [startingStatus],
    MAX(CASE WHEN [rn_desc] = 1 THEN [Status] ELSE NULL END) [endingStatus],
    MAX(CASE WHEN [rn_asc] = 1 THEN [TimeStamp] ELSE NULL END) [startingTimeStamp],
    MAX(CASE WHEN [rn_desc] = 1 THEN [TimeStamp] ELSE NULL END) [endingTimeStamp],
    DATEDIFF(
        SECOND,
        MAX(CASE WHEN [rn_asc] = 1 THEN [TimeStamp] ELSE NULL END),
        MAX(CASE WHEN [rn_desc] = 1 THEN [TimeStamp] ELSE NULL END)
    ) [duration]
FROM
(
    SELECT
        [TransactionID],
        [CustomerID],
        ROW_NUMBER() OVER (PARTITION BY [TransactionID], [CustomerID] ORDER BY [RecordID] ASC) [rn_asc],
        ROW_NUMBER() OVER (PARTITION BY [TransactionID], [CustomerID] ORDER BY [RecordID] DESC) [rn_desc],
        [Status],
        [TimeStamp]
    FROM [tbl]
) A
GROUP BY
    [TransactionID],
    [CustomerID]
SQL Server 2012 introduced the FIRST_VALUE function, so if you're running that version or later you can consider this query:
SELECT DISTINCT
    [TransactionID],
    [CustomerID],
    FIRST_VALUE([Status]) OVER (PARTITION BY [TransactionID], [CustomerID] ORDER BY [RecordID] ASC) [startingStatus],
    FIRST_VALUE([Status]) OVER (PARTITION BY [TransactionID], [CustomerID] ORDER BY [RecordID] DESC) [endingStatus],
    FIRST_VALUE([TimeStamp]) OVER (PARTITION BY [TransactionID], [CustomerID] ORDER BY [RecordID] ASC) [startingTimeStamp],
    FIRST_VALUE([TimeStamp]) OVER (PARTITION BY [TransactionID], [CustomerID] ORDER BY [RecordID] DESC) [endingTimeStamp],
    DATEDIFF(
        SECOND,
        FIRST_VALUE([TimeStamp]) OVER (PARTITION BY [TransactionID], [CustomerID] ORDER BY [RecordID] ASC),
        FIRST_VALUE([TimeStamp]) OVER (PARTITION BY [TransactionID], [CustomerID] ORDER BY [RecordID] DESC)
    ) [duration]
FROM [tbl]
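To try the first-row/last-row idea without a SQL Server instance, here is a minimal sketch that runs the same FIRST_VALUE pattern on SQLite through Python's sqlite3 module. It makes several assumptions that are not in the original answers: SQLite 3.25 or later (for window functions), timestamps stored as ISO-8601 text, a strftime('%s', ...) subtraction standing in for DATEDIFF(SECOND, ...), and the hypothetical names SAMPLE_ROWS and summarize_transactions.

```python
import sqlite3

# A few rows from the question, with timestamps rewritten as ISO-8601 text
# (an assumption: SQLite's strftime() needs that format).
SAMPLE_ROWS = [
    (1, 12, 10, "Submitted", "2014-04-07 14:32:23"),
    (2, 12, 10, "Queued", "2014-04-07 14:32:24"),
    (4, 12, 10, "Error", "2014-04-09 14:32:27"),
    (8, 12, 10, "Success", "2014-04-13 15:12:47"),
]

QUERY = """
    SELECT DISTINCT
        TransactionID,
        CustomerID,
        FIRST_VALUE(Status)    OVER asc_w  AS startingStatus,
        FIRST_VALUE(Status)    OVER desc_w AS endingStatus,
        FIRST_VALUE(TimeStamp) OVER asc_w  AS startingTime,
        FIRST_VALUE(TimeStamp) OVER desc_w AS endingTime,
        -- epoch-second subtraction plays the role of DATEDIFF(SECOND, ...)
        CAST(strftime('%s', FIRST_VALUE(TimeStamp) OVER desc_w) AS INTEGER)
      - CAST(strftime('%s', FIRST_VALUE(TimeStamp) OVER asc_w) AS INTEGER) AS duration
    FROM tbl
    WINDOW
        asc_w  AS (PARTITION BY TransactionID, CustomerID ORDER BY RecordID ASC),
        desc_w AS (PARTITION BY TransactionID, CustomerID ORDER BY RecordID DESC)
    ORDER BY TransactionID, CustomerID
"""


def summarize_transactions(rows):
    """Return one summary row per (TransactionID, CustomerID)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tbl (RecordID INTEGER, TransactionID INTEGER, "
        "CustomerID INTEGER, Status TEXT, TimeStamp TEXT)"
    )
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in summarize_transactions(SAMPLE_ROWS):
    print(row)
```

Ordering both windows by RecordID rather than by TimeStamp follows the answer above; it relies on RecordID increasing with time inside a transaction.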

Related

SQL - Return count of consecutive days where value was unchanged

I have a table like
date          ticker  Action
'2022-03-01'  AAPL    BUY
'2022-03-02'  AAPL    SELL
'2022-03-03'  AAPL    BUY
'2022-03-01'  CMG     SELL
'2022-03-02'  CMG     HOLD
'2022-03-03'  CMG     HOLD
'2022-03-01'  GPS     SELL
'2022-03-02'  GPS     SELL
'2022-03-03'  GPS     SELL
I want to group by ticker and count how many consecutive prior days the Action has held the value it has on the last date (here 2022-03-03). For this example table the result would be:
ticker  NumSequentialDaysAction
AAPL    0
CMG     1
GPS     2
Fine to pass in 2022-03-03 as a value, don't need to figure that out on the fly.
Tried something like this
---Table Creation---
CREATE TABLE UserTable
([Date] DATETIME2, [Ticker] varchar(5), [Action] varchar(5))
;
INSERT INTO UserTable
([Date], [Ticker], [Action])
VALUES
('2022-03-01' , 'AAPL' , 'BUY'),
('2022-03-02' , 'AAPL' , 'SELL'),
('2022-03-03' , 'AAPL' , 'BUY'),
('2022-03-01' , 'CMG' , 'SELL'),
('2022-03-02' , 'CMG' , 'HOLD'),
('2022-03-03' , 'CMG' , 'HOLD'),
('2022-03-01' , 'GPS' , 'SELL'),
('2022-03-02' , 'GPS' , 'SELL'),
('2022-03-03' , 'GPS' , 'SELL')
;
---Attempted Solution---
I'm thinking that I need a subquery to get the last value and a self-join to get the matching values, then a window function ordered by date to check that the preceding value is sequential.
WITH CTE AS (SELECT Date, Ticker, Action,
ROW_NUMBER() OVER (PARTITION BY Ticker, Action ORDER BY Date) as row_num
FROM UserTable)
SELECT Ticker, COUNT(DISTINCT Date) as count_of_days
FROM CTE
WHERE row_num = 1
GROUP BY Ticker;
WITH CTE AS (SELECT Date, Ticker, Action,
DENSE_RANK() OVER (PARTITION BY Ticker ORDER BY Action,Date) as rank
FROM table)
SELECT Ticker, COUNT(DISTINCT Date) as count_of_days
FROM CTE
WHERE rank = 1
GROUP BY Ticker;
You can do this with the help of the LEAD function like so. You didn't specify which RDBMS you're using. This solution works in PostgreSQL:
WITH "withSequential" AS (
    SELECT
        ticker,
        (LEAD("Action") OVER (PARTITION BY ticker ORDER BY date ASC) = "Action") AS "nextDayIsSameAction"
    FROM UserTable
)
SELECT
    ticker,
    SUM(
        CASE
            WHEN "nextDayIsSameAction" IS TRUE THEN 1
            ELSE 0
        END
    ) AS "NumSequentialDaysAction"
FROM "withSequential"
GROUP BY ticker
Here is a way to do this using a gaps-and-islands approach.
Thanks for sharing the create and insert scripts, which help to build the solution quickly.
dbfiddle link: https://dbfiddle.uk/rZLDTrNR
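with data
as (
    select date
          ,ticker
          ,action
          -- marker is 1 on a day whose action differs from the previous day's
          ,case when lag(action) over(partition by ticker order by date) <> action then
                1
           else 0
           end as marker
    from usertable
)
,interim_data
as (
    -- the running total of markers labels each run (island) of identical actions
    select *
          ,sum(marker) over(partition by ticker order by date) as grp_val
    from data
)
,interim_data2
as (
    -- subtract 1 so the as-of date itself is not counted, matching the expected 0/1/2
    select *
          ,count(*) over(partition by ticker,grp_val) - 1 as NumSequentialDaysAction
    from interim_data
)
select ticker,NumSequentialDaysAction
from interim_data2
where date='2022-03-03'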
Another option is the difference-between-two-row_numbers approach, as follows:
select [Ticker], count(*)-1 NumSequentialDaysAction -- you could use (distinct) to remove duplicate rows
from
(
    select *,
           row_number() over (partition by [Ticker] order by [Date]) -
           row_number() over (partition by [Ticker], [Action] order by [Date]) grp
    from UserTable
    where [date] <= '2022-03-03'
) RN_Groups
/* get only rows where [Action] = last date [Action] */
where [Action] = (select top 1 [Action] from UserTable T
                  where T.[Ticker] = RN_Groups.[Ticker] and [date] <= '2022-03-03'
                  order by [Date] desc)
group by [Ticker], [Action], grp
See demo
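As a way to check the gaps-and-islands idea against the expected 0/1/2 output, here is a minimal sketch on SQLite via Python's sqlite3. The table name user_table, the column trade_date, and the function trailing_streak are hypothetical names chosen for illustration, and SQLite 3.25 or later is assumed for LAG and windowed SUM. It counts only the run that ends on the as-of date, so an earlier run of the same action (for example BUY, BUY, SELL, BUY) does not inflate the result.

```python
import sqlite3

ROWS = [
    ("2022-03-01", "AAPL", "BUY"), ("2022-03-02", "AAPL", "SELL"), ("2022-03-03", "AAPL", "BUY"),
    ("2022-03-01", "CMG", "SELL"), ("2022-03-02", "CMG", "HOLD"), ("2022-03-03", "CMG", "HOLD"),
    ("2022-03-01", "GPS", "SELL"), ("2022-03-02", "GPS", "SELL"), ("2022-03-03", "GPS", "SELL"),
]

QUERY = """
    WITH marked AS (
        -- changed = 1 when the action differs from the previous day's action
        SELECT trade_date, ticker, action,
               CASE WHEN LAG(action) OVER (PARTITION BY ticker ORDER BY trade_date) <> action
                    THEN 1 ELSE 0 END AS changed
        FROM user_table
        WHERE trade_date <= :as_of
    ),
    islands AS (
        -- the running total of changes gives every run of equal actions its own id
        SELECT *, SUM(changed) OVER (PARTITION BY ticker ORDER BY trade_date) AS island
        FROM marked
    ),
    sized AS (
        SELECT *, COUNT(*) OVER (PARTITION BY ticker, island) AS island_size
        FROM islands
    )
    -- minus 1: the as-of day itself is not counted as a prior day
    SELECT ticker, island_size - 1 AS num_sequential_days
    FROM sized
    WHERE trade_date = :as_of
    ORDER BY ticker
"""


def trailing_streak(rows, as_of):
    """Count prior consecutive days sharing the as-of day's action, per ticker."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE user_table (trade_date TEXT, ticker TEXT, action TEXT)")
    con.executemany("INSERT INTO user_table VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY, {"as_of": as_of}).fetchall()
    con.close()
    return result


print(trailing_streak(ROWS, "2022-03-03"))
```

Filtering on trade_date <= :as_of inside the first CTE keeps later rows from extending the final island when the as-of date is not the newest date in the table.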

Continuous subset and ranking using sql

I have a dataset like below:
Now, I need the output as below:
start_time end_time count
10:01 10:04 3
10:05 10:07 2
For this purpose, I wrote a query but it is not giving me the desired sequence. My query is as below:
with on_off as
(
    select time, status,
           case when status != lag(status) over(order by time) then 1 else 0 end as continuous_count
    from time_status
),
grp as
(
    select *, row_number() over(partition by continuous_count order by time) rnk
    from on_off
)
select * from grp order by time
It generates the output as below:
But in the rank section I need something as below:
So, what exactly am I doing wrong here?
Here are the PostgreSQL DDLs:
create table time_status(time varchar(10) null, status varchar(10) null);
INSERT into time_status(time,status) values('10:01','ON');
INSERT into time_status(time,status) values('10:02','ON');
INSERT into time_status(time,status) values('10:03','ON');
INSERT into time_status(time,status) values('10:04','OFF');
INSERT into time_status(time,status) values('10:05','ON');
INSERT into time_status(time,status) values('10:06','ON');
INSERT into time_status(time,status) values('10:07','OFF');
Try this query:
SELECT min(time) as start_time,
max(time) as end_time,
sum(case when status = 'ON' then 1 else 0 end) as cnt
FROM (SELECT time, status,
sum(case when status = 'OFF' then 1 else 0 end)
over (order by time desc) as grp
FROM time_status) _
GROUP BY grp
ORDER BY min(time);
->Fiddle
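The query above works because a running count of 'OFF' rows, taken from the latest time backwards, gives every 'ON' row the same number as the 'OFF' row that ends its run. Here is a minimal sketch that replays it on SQLite through Python's sqlite3; the column is renamed to ts, the subquery alias to t, and the function name on_runs is hypothetical. SQLite 3.25 or later is assumed.

```python
import sqlite3

ROWS = [
    ("10:01", "ON"), ("10:02", "ON"), ("10:03", "ON"), ("10:04", "OFF"),
    ("10:05", "ON"), ("10:06", "ON"), ("10:07", "OFF"),
]

QUERY = """
    SELECT MIN(ts) AS start_time,
           MAX(ts) AS end_time,
           SUM(CASE WHEN status = 'ON' THEN 1 ELSE 0 END) AS cnt
    FROM (
        SELECT ts, status,
               -- counting OFF rows from the end backwards: each OFF row opens a
               -- new group that also covers the ON rows immediately before it
               SUM(CASE WHEN status = 'OFF' THEN 1 ELSE 0 END)
                   OVER (ORDER BY ts DESC) AS grp
        FROM time_status
    ) AS t
    GROUP BY grp
    ORDER BY MIN(ts)
"""


def on_runs(rows):
    """Return (start_time, end_time, count of ON rows) for each run ended by OFF."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE time_status (ts TEXT, status TEXT)")
    con.executemany("INSERT INTO time_status VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for run in on_runs(ROWS):
    print(run)
```

Because the times are stored as zero-padded text, ordering them as strings matches chronological order; a real time column would not need that assumption.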

Add "now" row to duration calculation

I have a query that calculates duration of an incident. However, it doesn't include the current time for the still "open" incidents. I am trying to figure out a way to add that to the below. This is running on Azure SQL 12.0.2000.8. Per the example, Incident 18 and 19 are closed (last record has StatusID<>1), so my current calculation is correct. However, Incident 20 is ongoing (last record has StatusId=1) and needs to calculate the time between the last update and now.
Structure:
CREATE TABLE [dbo].[IncidentActions](
[Id] [INT] IDENTITY(1,1) NOT NULL,
[IncidentId] [INT] NOT NULL,
[ActionDate] [DATETIMEOFFSET](7) NOT NULL,
[Description] [NVARCHAR](MAX) NOT NULL,
[StatusId] [INT] NOT NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[IncidentActions] VALUES
( 51, 18, N'2020-03-10T13:39:27.8621563+00:00', N'This is a demo of the app ops incident management portal', 1 ),
( 52, 18, N'2020-03-10T13:41:42.4306254+00:00', N'Superfast update we''re on it', 1 ),
( 53, 18, N'2020-03-10T13:42:19.0766735+00:00', N'Found a workaround', 1 ),
( 55, 18, N'2020-03-10T13:44:05.7958553+00:00', N'Suspending for now', 2 ),
( 56, 18, N'2020-03-10T13:44:49.732564+00:00', N'No longer suspended', 1 ),
( 57, 18, N'2020-03-10T13:45:09.8056202+00:00', N'All sorted', 3 ),
( 58, 19, N'2020-03-11T14:47:05.6968653+00:00', N'This is just a test', 1 ),
( 59, 19, N'2020-03-11T14:51:20.4522014+00:00', N'Found workaround and root cause, not yet fixed', 1 ),
( 60, 19, N'2020-03-11T14:52:34.857061+00:00', N'Networking issues, updates suspended', 2 ),
( 61, 19, N'2020-03-11T14:54:48.2262037+00:00', N'Network issue resolved, full functionality restored', 3 ),
( 62, 20, N'2020-03-12T10:49:11.5595048+00:00', N'There is an ongoing issue', 1 ),
( 63, 20, N'2020-03-12T11:29:37.9376805+00:00', N'This incident is ongoing....', 1 )
GO
CREATE TABLE [dbo].[IncidentStatuses](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Description] [nvarchar](500) NOT NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[IncidentStatuses] VALUES
( 1, N'OPEN' ), ( 2, N'SUSPENDED' ), ( 3, N'CLOSED' )
GO
Query:
WITH allActions
AS (SELECT IncidentId,
ActionDate,
IncidentStatuses.Description,
ROW_NUMBER() OVER (PARTITION BY IncidentId ORDER BY ActionDate) AS rowNum
FROM IncidentActions
INNER JOIN dbo.IncidentStatuses ON IncidentStatuses.Id = IncidentActions.StatusId
)
,actionPeriods
AS (SELECT firstAction.IncidentId,
firstAction.Description StartStatus,
secondAction.Description EndStatus,
DATEDIFF(SECOND, firstAction.ActionDate, secondAction.ActionDate) SecondsElapsed
FROM allActions firstAction
INNER JOIN allActions secondAction ON firstAction.rowNum +1 = secondAction.rowNum --the next action
AND firstAction.IncidentId = secondAction.IncidentId --for the same incident
)
SELECT
actionPeriods.IncidentId,
SUM(CASE WHEN actionPeriods.StartStatus = 'OPEN' THEN actionPeriods.SecondsElapsed ELSE 0 END) SecondsActive,
SUM(CASE WHEN actionPeriods.StartStatus <> 'OPEN' THEN actionPeriods.SecondsElapsed ELSE 0 END) SecondsInactive,
SUM(actionPeriods.SecondsElapsed) SecondsElapsed
FROM actionPeriods
GROUP BY actionPeriods.IncidentId
GO
Using a CTE is quite overkill. You could use T-SQL window functions instead, in this case the LAG and/or LEAD functions. I have added sample code based on your tables.
select
    IncidentId,
    StatusId,
    actiondate,
    lag(actiondate) over (partition by incidentid order by incidentid, actiondate) as previousrow,
    coalesce(
        lead(actiondate) over (partition by incidentid order by incidentid, actiondate),
        case
            when (max(actiondate) over (partition by incidentid order by incidentid) = actiondate) and (statusid = 3) then actiondate
            when (max(actiondate) over (partition by incidentid order by incidentid) = actiondate) and (statusid = 1) then convert([DATETIMEOFFSET](7),getdate())
        end
    ) as nextrow
from
    dbo.IncidentActions
order by
    IncidentId, ActionDate
Applying LEAD instead of a self-join based on Row_number:
WITH allPeriods AS
(
    SELECT IncidentId,
           Lead(ActionDate) Over (PARTITION BY IncidentId ORDER BY ActionDate DESC) AS ActionDate,
           Lead(st.Description) Over (PARTITION BY IncidentId ORDER BY ActionDate DESC) AS StartStatus,
           st.Description AS EndStatus,
           CASE -- return NOW if the last row is "open"
               WHEN Row_Number() Over (PARTITION BY IncidentId ORDER BY ActionDate DESC) = 1
                    AND StatusId = 1
               THEN getdate()
               ELSE ActionDate
           END AS nextDate
    FROM IncidentActions AS act
    JOIN dbo.IncidentStatuses AS st
      ON st.Id = act.StatusId
),
elapsed AS
(
    SELECT *,
           DATEDIFF(SECOND, ActionDate, nextDate) AS SecondsElapsed
    FROM allPeriods
)
SELECT
    IncidentId,
    Sum(CASE WHEN StartStatus = 'OPEN' THEN SecondsElapsed ELSE 0 END) SecondsActive,
    Sum(CASE WHEN StartStatus <> 'OPEN' THEN SecondsElapsed ELSE 0 END) SecondsInactive,
    Sum(SecondsElapsed) SecondsElapsed
FROM elapsed
WHERE ActionDate IS NOT NULL
GROUP BY IncidentId
See fiddle
For the curious, the final version, courtesy of Remko
WITH actionPeriods /* This one determines the elapsed time between actions */
AS (SELECT IncidentId,
           IncidentStatuses.Description StatusDesc,
           /* LAG(ActionDate) OVER (PARTITION BY IncidentId ORDER BY IncidentId, ActionDate) AS previousrow, */
           DATEDIFF(SECOND, ActionDate,
               COALESCE(LEAD(ActionDate) OVER (PARTITION BY IncidentId ORDER BY IncidentId, ActionDate), /* LEAD gets the next action */
                   CASE /* If the next action is NULL, then get either the current time for active, or the original time for closed */
                       WHEN (MAX(ActionDate) OVER (PARTITION BY IncidentId ORDER BY IncidentId) = ActionDate) AND (StatusId = (SELECT Id FROM dbo.IncidentStatuses WHERE Description = 'CLOSED')) THEN ActionDate
                       WHEN (MAX(ActionDate) OVER (PARTITION BY IncidentId ORDER BY IncidentId) = ActionDate) AND (StatusId <> (SELECT Id FROM dbo.IncidentStatuses WHERE Description = 'CLOSED')) THEN GETUTCDATE()
                   END
               )
           ) SecondsElapsed
    FROM dbo.IncidentActions
    INNER JOIN dbo.IncidentStatuses ON IncidentStatuses.Id = IncidentActions.StatusId
)
SELECT
    actionPeriods.IncidentId,
    SUM(CASE WHEN actionPeriods.StatusDesc = 'OPEN' THEN actionPeriods.SecondsElapsed ELSE 0 END) SecondsActive, /* This counts periods that are active */
    SUM(CASE WHEN actionPeriods.StatusDesc = 'SUSPENDED' THEN actionPeriods.SecondsElapsed ELSE 0 END) SecondsInactive, /* This counts periods that are inactive */
    SUM(CASE WHEN actionPeriods.StatusDesc IN ('OPEN','SUSPENDED') THEN actionPeriods.SecondsElapsed ELSE 0 END) SecondsElapsed /* We don't want periods that were CLOSED, as the ticket was not elapsing during that time */
FROM actionPeriods
GROUP BY actionPeriods.IncidentId
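The core of the final version is COALESCE(LEAD(...), fallback): each action's period ends at the next action, and the last action of an incident falls back to its own time when closed or to "now" when still open. Here is a minimal sketch of that idea on SQLite via Python's sqlite3. It is a simplification under stated assumptions: timestamps are truncated to whole seconds and stored as ISO-8601 text, the status description is stored directly instead of being joined, "now" is passed in as a parameter so the result is reproducible, and the names incident_actions and incident_seconds are hypothetical.

```python
import sqlite3

ACTIONS = [
    (19, "2020-03-11 14:47:05", "OPEN"),
    (19, "2020-03-11 14:51:20", "OPEN"),
    (19, "2020-03-11 14:52:34", "SUSPENDED"),
    (19, "2020-03-11 14:54:48", "CLOSED"),
    (20, "2020-03-12 10:49:11", "OPEN"),
    (20, "2020-03-12 11:29:37", "OPEN"),
]

QUERY = """
    WITH periods AS (
        SELECT incident_id, status,
               CAST(strftime('%s',
                   COALESCE(
                       -- normally a period ends at the next action
                       LEAD(action_date) OVER (PARTITION BY incident_id ORDER BY action_date),
                       -- last action: zero-length if closed, otherwise runs until "now"
                       CASE WHEN status = 'CLOSED' THEN action_date ELSE :now END
                   )) AS INTEGER)
             - CAST(strftime('%s', action_date) AS INTEGER) AS seconds_elapsed
        FROM incident_actions
    )
    SELECT incident_id,
           SUM(CASE WHEN status = 'OPEN' THEN seconds_elapsed ELSE 0 END) AS seconds_active,
           SUM(CASE WHEN status = 'SUSPENDED' THEN seconds_elapsed ELSE 0 END) AS seconds_inactive
    FROM periods
    GROUP BY incident_id
    ORDER BY incident_id
"""


def incident_seconds(actions, now):
    """Return (incident_id, seconds_active, seconds_inactive) as of `now`."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE incident_actions (incident_id INTEGER, action_date TEXT, status TEXT)"
    )
    con.executemany("INSERT INTO incident_actions VALUES (?, ?, ?)", actions)
    result = con.execute(QUERY, {"now": now}).fetchall()
    con.close()
    return result


print(incident_seconds(ACTIONS, "2020-03-12 12:00:00"))
```

Passing the current time in as a parameter, rather than calling a clock function inside the query, is a design choice for the sketch: it makes the open-incident arithmetic easy to verify.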

SQL - counts for two days

I have a table like
CREATE TABLE [dbo].[Log](
[Id] [BIGINT] IDENTITY(1,1) NOT NULL,
[GuidId] [UNIQUEIDENTIFIER] NOT NULL,
[TableName] [VARCHAR](50) NULL,
[CreatedDate] [DATETIME] NULL,
[Operation] [VARCHAR](50) NULL,
[Status] [VARCHAR](50) NULL,
[UserId] [VARCHAR](50) NOT NULL)
I am trying to write a query like
SELECT TOP 25 UserId, TableName, Operation, COUNT(1) Records
FROM dbo.Log
WHERE CreatedDate > GETDATE() - 1
AND Status='failed'
GROUP BY UserId, TableName,Operation
I need to add another column to the same SELECT that outputs the count for the GETDATE() - 7 criterion as well.
Any thoughts?
Something like this?
SELECT TOP 25
UserId
, TableName
, Operation
, SUM(CASE WHEN CreatedDate > GETDATE() - 1 THEN 1 ELSE 0 END) Records1Day
, COUNT(1) Records7Days
FROM dbo.Log
WHERE
CreatedDate > GETDATE() - 7
AND Status='failed'
GROUP BY
UserId
, TableName
, Operation
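The pattern in this answer is conditional aggregation: filter once on the wider window (7 days), count everything for that window, and count the narrower window (1 day) with a CASE inside SUM. Here is a minimal sketch on SQLite via Python's sqlite3, under these assumptions: the table is called failure_log, dates are ISO-8601 text, datetime(:now, '-1 day') stands in for GETDATE() - 1, and the current time is passed in as a parameter. The function name failure_counts and the sample rows are made up for illustration.

```python
import sqlite3

ROWS = [
    ("u1", "Orders", "INSERT", "failed", "2024-01-10 09:00:00"),  # within 1 day
    ("u1", "Orders", "INSERT", "failed", "2024-01-08 09:00:00"),  # within 7 days
    ("u1", "Orders", "INSERT", "failed", "2024-01-01 09:00:00"),  # older than 7 days
    ("u1", "Orders", "INSERT", "ok", "2024-01-10 10:00:00"),      # not a failure
    ("u2", "Users", "UPDATE", "failed", "2024-01-05 08:00:00"),   # within 7 days
]

QUERY = """
    SELECT user_id, table_name, operation,
           -- narrower window: counted conditionally
           SUM(CASE WHEN created_date > datetime(:now, '-1 day') THEN 1 ELSE 0 END) AS records_1_day,
           -- wider window: already enforced by the WHERE clause
           COUNT(*) AS records_7_days
    FROM failure_log
    WHERE created_date > datetime(:now, '-7 days')
      AND status = 'failed'
    GROUP BY user_id, table_name, operation
    ORDER BY user_id, table_name, operation
"""


def failure_counts(rows, now):
    """Return per-group failure counts for the last day and the last 7 days."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE failure_log (user_id TEXT, table_name TEXT, operation TEXT, "
        "status TEXT, created_date TEXT)"
    )
    con.executemany("INSERT INTO failure_log VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY, {"now": now}).fetchall()
    con.close()
    return result


print(failure_counts(ROWS, "2024-01-10 12:00:00"))
```

A group with failures in the last 7 days but none in the last day still appears, with 0 in the 1-day column, because the WHERE clause only applies the wider window.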
If I understand correctly, you want the users whose most recent failure, MAX(CreatedDate), is later than yesterday. The subquery dT below finds them using MAX(CreatedDate) per UserID and HAVING to filter, ordered by failure date descending. I used a subquery because you also want the date 7 days prior to this, and the outer query derives it by using DATEADD to subtract 7 days from the [Most Recent UserID Failure] date.
SELECT dT.UserId
      ,dT.[Most Recent UserID Failure]
      ,DATEADD(DAY, -7, dT.[Most Recent UserID Failure]) AS [7 Days ago]
      ,(SELECT COUNT (*)
        FROM dbo.Log
        WHERE UserId = dT.UserID
          AND CreatedDate >= DATEADD(DAY, -7, CAST(dT.[Most Recent UserID Failure] as date))
          AND CreatedDate <= dT.[Most Recent UserID Failure]
          AND Status = 'failed'
       ) AS [Num Fails]
FROM (
    SELECT TOP 25
           UserID
          ,MAX(CreatedDate) AS [Most Recent UserID Failure]
    FROM dbo.Log
    WHERE Status = 'failed'
    GROUP BY UserID
    HAVING MAX(CreatedDate) > GETDATE() - 1
    ORDER BY [Most Recent UserID Failure] DESC
) AS dT

SQL Server: query to get the data between two values from same columns and calculate time difference

I have a requirement to get the number of hours between two values, say 20 and 25 or above (these will be user input values, not fixed). Below is the table with sample data.
In the table, Value_ID reaches 25 on 01-09-2016 08:40 and drops back to 20 on 02-09-2016 13:20; I need the number of hours between these two points, i.e. 28 hours and 40 minutes. Similarly, on 04-09-2016 19:20 it reached 26.3 (which is above 25) and on 06-09-2016 16:20 it reached 19.3 (below 20), which is 45 hours. I tried creating a function, but it's not working.
CODE TO CREATE TABLE:
CREATE TABLE [dbo].[NumOfHrs](
[ID] [float] NULL,
[Date] [datetime] NULL,
[Value_ID] [float] NULL
) ON [PRIMARY]
CODE to insert data :
INSERT INTO [dbo].[NumOfHrs]
([ID]
,[Date]
,[Value_ID])
VALUES
(112233,'8-31-2016 08:20:00',19.2),
(112233,'9-01-2016 08:30:00',24),
(112233,'9-01-2016 08:40:00',25),
(112233,'9-01-2016 09:20:00',26),
(112233,'9-02-2016 10:20:00',27),
(112233,'9-02-2016 10:20:00',24),
(112233,'9-02-2016 10:20:00',23),
(112233,'9-02-2016 11:20:00',22),
(112233,'9-02-2016 12:20:00',21),
(112233,'9-02-2016 13:20:00',20),
(112233,'9-03-2016 13:20:00',19.8),
(112233,'9-04-2016 13:20:00',21),
(112233,'9-04-2016 14:20:00',24),
(112233,'9-04-2016 16:20:00',24.6),
(112233,'9-04-2016 19:20:00',26.3),
(112233,'9-04-2016 23:20:00',27),
(112233,'9-05-2016 00:20:00',22),
(112233,'9-06-2016 16:20:00',19.3),
(112233,'9-07-2016 00:20:00',22),
(112233,'9-08-2016 00:20:00',21),
(112233,'9-09-2016 00:20:00',23),
(445566,'9-10-2016 00:20:00',24),
(445566,'9-11-2016 00:20:00',25),
(445566,'9-12-2016 00:20:00',26),
(445566,'9-13-2016 00:20:00',24),
(445566,'9-14-2016 00:20:00',23),
(445566,'9-15-2016 00:20:00',24),
(445566,'9-16-2016 00:20:00',21),
(445566,'9-17-2016 00:20:00',20),
(445566,'9-18-2016 00:20:00',18.5),
(445566,'9-19-2016 00:20:00',17)
image of the table:
Well, I couldn't think of anything simpler. Here's my try to solve the problem:
;with NumOfHrs_rn as (
    select id, [Date], Value_ID,
           row_number() over (partition by id order by [date]) AS rn
    from [dbo].[NumOfHrs]
), NumOfHrs_lag as (
    select t1.id, t1.[date],
           t2.Value_ID as prev_value,
           t1.Value_ID as curr_value
    from NumOfHrs_rn as t1
    -- get previous value (lag)
    join NumOfHrs_rn as t2 on t1.id = t2.id and t1.rn = t2.rn + 1
), NumOfHrs_flag as (
    select id, [Date], prev_value, curr_value,
           case
               when curr_value >= 25 and prev_value < 25 then 'start'
               when curr_value <= 20 and prev_value > 20 then 'stop'
               else 'ignore'
           end as flag
    from NumOfHrs_lag
), NumOfHrs_grp as (
    select id, [Date], curr_value, flag,
           row_number() over (partition by id order by [Date]) -
           case flag
               when 'start' then 0
               when 'stop' then 1
           end as grp
    from NumOfHrs_flag
    where flag in ('start', 'stop')
)
select min([Date]) AS 'start', max([Date]) as 'stop'
from NumOfHrs_grp
group by id, grp
order by min([Date])
Output:
start stop
------------------------------------------------
2016-09-01 08:40:00.000 2016-09-02 13:20:00.000
2016-09-04 19:20:00.000 2016-09-06 16:20:00.000
2016-09-11 00:20:00.000 2016-09-17 00:20:00.000
You can manipulate the above query in order to get the time difference expressed in hours/minutes/seconds format.
Demo here
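To get the difference itself rather than just the start and stop timestamps, one option is to pair each 'start' event with the event that follows it and subtract. Here is a minimal sketch on SQLite via Python's sqlite3. It uses LAG and LEAD instead of the self-join and row_number arithmetic above, so it is a variation and not the answer's exact method. Assumptions: unique ISO-8601 timestamps per id, SQLite 3.25 or later, the thresholds passed as parameters, and the hypothetical names readings and threshold_intervals.

```python
import sqlite3

READINGS = [
    (1, "2016-09-01 08:30:00", 24.0),
    (1, "2016-09-01 08:40:00", 25.0),   # crosses up through 25: start
    (1, "2016-09-01 09:20:00", 26.0),
    (1, "2016-09-02 12:20:00", 21.0),
    (1, "2016-09-02 13:20:00", 20.0),   # falls to 20: stop
    (1, "2016-09-03 13:20:00", 19.8),
    (1, "2016-09-04 19:20:00", 26.3),   # start
    (1, "2016-09-06 16:20:00", 19.3),   # stop
]

QUERY = """
    WITH flagged AS (
        SELECT id, ts, value,
               LAG(value) OVER (PARTITION BY id ORDER BY ts) AS prev_value
        FROM readings
    ),
    events AS (
        SELECT id, ts,
               CASE WHEN value >= :high AND prev_value < :high THEN 'start'
                    WHEN value <= :low  AND prev_value > :low  THEN 'stop'
               END AS flag          -- NULL for rows that are neither
        FROM flagged
    ),
    paired AS (
        -- after dropping non-events, the row following a start is its candidate stop
        SELECT id, ts AS start_ts, flag,
               LEAD(ts)   OVER (PARTITION BY id ORDER BY ts) AS stop_ts,
               LEAD(flag) OVER (PARTITION BY id ORDER BY ts) AS next_flag
        FROM events
        WHERE flag IS NOT NULL
    )
    SELECT id, start_ts, stop_ts,
           CAST(strftime('%s', stop_ts) AS INTEGER)
         - CAST(strftime('%s', start_ts) AS INTEGER) AS seconds_between
    FROM paired
    WHERE flag = 'start' AND next_flag = 'stop'
    ORDER BY id, start_ts
"""


def threshold_intervals(rows, low, high):
    """Return (id, start, stop, seconds) for each rise to `high` followed by a fall to `low`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE readings (id INTEGER, ts TEXT, value REAL)")
    con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY, {"low": low, "high": high}).fetchall()
    con.close()
    return result


for interval in threshold_intervals(READINGS, 20, 25):
    print(interval)
```

Dividing seconds_between by 3600.0 gives hours. A start that is followed by another start before any stop is skipped by the final filter, which is one possible policy; the original question does not say how such cases should be handled.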