select a chain of records using mssql

select a chain of records using mssql - sql

I have a table with records in TimeLines, I need to get rows that form a chain of 45 minutes set.
1|2016-01-01 00:00
2|2016-01-01 00:30
3|2016-01-01 00:45
4|2016-01-01 01:00
How I can find 2nd row depending from it time, cause 2nd, 3rd and 4th rows are indissoluble 15 minutes chain of timeline for 45 min set?
1st and 2nd is not okay, cause interval between timelines is 30 min.
2nd, 3rd and 4th rows are consistent chain of timeline.
2nd row plus 15 min - okay. cause existed 3rd row with that time.
3rd row plus 15 min - okay. cause existed 4th row with that time.
as result i have 45 min consistent timeline chain.
1row plus 15 min - not okay. cause 00:15 time with date not existed.

Try this
DECLARE #Tbl TABLE (Id INT, StartDate DATETIME)
INSERT INTO #Tbl
VALUES
(1,'2016-01-01 00:00'),
(2,'2016-01-01 00:30'),
(3,'2016-01-01 00:45'),
(4,'2016-01-01 01:00')
;WITH CTE
AS
(
SELECT
Id ,
StartDate,
ROW_NUMBER() OVER (ORDER BY Id) AS RowId
FROM
#Tbl
)
SELECT
CurRow.*,
CASE
WHEN
DATEDIFF(MINUTE, CurRow.StartDate, NextRow.StartDate ) = 15 OR
DATEDIFF(MINUTE, PrevRow.StartDate, CurRow.StartDate ) = 15
THEN '15 MIN'
ELSE 'NO' END Flag
FROM
CTE CurRow LEFT JOIN
(SELECT *, C.RowId - 1 AS TmpRowId FROM CTE C) NextRow ON CurRow.RowId = NextRow.TmpRowId LEFT JOIN
(SELECT *, C.RowId + 1 AS TmpRowId FROM CTE C) PrevRow ON CurRow.RowId = PrevRow.TmpRowId
OUTPUT:
Id StartDate RowId Flag
1 2016-01-01 00:00:00.000 1 NO
2 2016-01-01 00:30:00.000 2 15 MIN
3 2016-01-01 00:45:00.000 3 15 MIN
4 2016-01-01 01:00:00.000 4 15 MIN

If I understand you correctly, you can use LEAD/LAG:
WITH Src AS
(
SELECT * FROM (VALUES
(1,'2016-01-01 00:00'),
(2,'2016-01-01 00:30'),
(3,'2016-01-01 00:45'),
(4,'2016-01-01 01:00')) T(ID, [Date])
)
SELECT *, CASE WHEN LEAD([Date]) OVER (ORDER BY ID)=DATEADD(MINUTE, 15, [Date])
OR LAG([Date]) OVER (ORDER BY ID)=DATEADD(MINUTE, -15, [Date])
THEN 'Chained' END [Status]
FROM Src
It produces:
ID Date Status
-- ---- ------
1 2016-01-01 00:00 NULL
2 2016-01-01 00:30 Chained
3 2016-01-01 00:45 Chained
4 2016-01-01 01:00 Chained

You can do this with OUTER APPLY and tricky ROW_NUMBER():
;WITH TimeLines AS ( --This CTE is similar to your table
SELECT *
FROM (VALUES
(1, '2016-01-01 00:00'),(2, '2016-01-01 00:30'),
(3, '2016-01-01 00:45'),(4, '2016-01-01 01:00'),
(5, '2016-01-01 01:05'),(6, '2016-01-01 01:07'),
(7, '2016-01-01 01:15'),(8, '2016-01-01 01:30'),
(9, '2016-01-01 01:45'),(10, '2016-01-01 02:00')
) as t(id, datum)
)
, cte AS (
SELECT t.id,
t.datum,
CASE WHEN ISNULL(DATEDIFF(MINUTE,t1.datum,t.datum),0) != 15 THEN DATEDIFF(MINUTE,t.datum,t2.datum) ELSE 15 END as i
FROM TimeLines t --in this cte with the help of
OUTER APPLY ( --OUTER APPLY we are getting next and previous dates to compare them
SELECT TOP 1 *
FROM TimeLines
WHERE t.datum > datum
ORDER BY datum desc) t1
OUTER APPLY (
SELECT TOP 1 *
FROM TimeLines
WHERE t.datum < datum
ORDER BY datum asc) t2
)
SELECT *, --this is final select to get rows you need with chaines
(ROW_NUMBER() OVER (ORDER BY (SELECT 1))+2)/3 as seq
FROM cte
WHERE i = 15
Output:
id datum i seq
2 2016-01-01 00:30 15 1
3 2016-01-01 00:45 15 1
4 2016-01-01 01:00 15 1
7 2016-01-01 01:15 15 2
8 2016-01-01 01:30 15 2
9 2016-01-01 01:45 15 2
10 2016-01-01 02:00 15 3

Related

Compare date in multiple rows and calculate the downtime

I'm trying to calculate the downtime for a train from a service record, below is a sample scenario
There can be multiple jobs running simultaneous for a train which can overlap at times
For:
Job_number 1 the date diff between the work start and end date is 360 Minute
Job_number 2 the date diff between the work start and end date is 60 Minute but this overlap with Job_number 1 so we shouldn't consider this
Job_number 3 the date diff between the work start and end date is 45 Minute but this partially overlap with Job_number 1 so we should consider only 10 Minute
So the actual down time should be 360 Minute (Job 1) + 0 Minute (Job 2) + 10 Minute (Job 3) = 370 Minute
My desired output is :-
I'm having 20 trains as of now for which I need to calculate the downtime as above
How do I do this?
Sample Data script:
CREATE TABLE [dbo].[tb_ServiceMemo](
[Job_Number] [nvarchar](500) NULL,
[Train_Number] [nvarchar](500) NULL,
[Work_Start_Date] [datetime] NULL,
[Work_Completed_Date] [datetime] NULL
) ON [PRIMARY]
INSERT INTO [dbo].[tb_ServiceMemo]
VALUES (1,1,'01-08-2018 12:35','01-08-18 18:35'),
(2,1,'01-08-2018 14:20','01-08-18 15:20'),
(3,1,'01-08-2018 18:00','01-08-18 18:45')

This is a gaps-and-islands problem, but it is tricky because it has start and end times.
The idea for the solution is to determine when an outage starts. What is the characteristic? Well, the period starts at a time where there is no overlap with preceding work. The tricky part is that more than one "work" effort could start at the same time (although your data does not show this).
Once you know the time when an outage starts, you can use a cumulative sum to assign a group to each record and then simply aggregate by that group (and other information).
The following query should do what you want:
with starts as (
select sm.*,
(case when exists (select 1
from tb_ServiceMemo sm2
where sm2.Train_Number = sm.Train_Number and
sm2.Work_Start_Date < sm.Work_Start_Date and
sm2.Work_Completed_Date >= sm.Work_Start_Date
)
then 0 else 1
end) as isstart
from tb_ServiceMemo sm
)
select Train_Number, min(Work_Start_Date) as outage_start_date, max(Work_Completed_Date) as outage_end_date,
datediff(minute, min(Work_Start_Date), max(Work_Completed_Date))
from (select s.*, sum(isstart) over (partition by Train_Number order by Work_Start_Date) as grp
from starts s
) s
group by Train_Number, grp;
In this db<>fiddle, I added a few more rows to show how the code works in different scenarios.

This is a Gaps and Islands in Sequences problem.
You can try to use recursive CTE, get the minute during every row.
then use every MAX and MIN DateTime to calculate the result.
;WITH CTE AS (
SELECT [Train_Number], [Work_Start_Date] ,[Work_Completed_Date]
FROM [tb_ServiceMemo]
UNION ALL
SELECT [Train_Number], DATEADD(minute,1,[Work_Start_Date]) ,[Work_Completed_Date]
FROM CTE
WHERE DATEADD(minute,1,[Work_Start_Date]) <= [Work_Completed_Date]
),CTE2 AS (
SELECT DISTINCT Train_Number,
Work_Start_Date,
MAX(Work_Completed_Date) OVER(PARTITION BY Train_Number ORDER BY Work_Completed_Date DESC) MAX_Time
FROM CTE
),CTE_RESULT AS (
SELECT *,datediff(mi,MAX_Time,Work_Start_Date) - row_number() over(PARTITION BY Train_Number ORDER BY Work_Start_Date) grp
FROM CTE2
)
SELECT Train_Number,sum(time_diff)
FROM (
SELECT Train_Number,DATEDIFF(MI,MIN(Work_Start_Date),MAX(Work_Start_Date)) time_diff
FROM CTE_RESULT
GROUP BY Train_Number,grp
)t1
GROUP BY Train_Number
option ( MaxRecursion 0 );
sqlfiddle

This is the infamous gaps and islands problem with dates. The following is a solution that uses a recursive CTE. It might be a little tough to understand if you aren't used to working with them, I commented all parts that might need clarifying.
I also added a few more examples to contemplate different scenarios, such as different days on periods and overlapping times exactly at the start/end.
Example setup:
IF OBJECT_ID('tempdb..#tb_ServiceMemo') IS NOT NULL
DROP TABLE #tb_ServiceMemo
CREATE TABLE #tb_ServiceMemo(
Job_Number INT, -- This is an INT not VARCHAR!! (even the name says so)
Train_Number INT, -- This one also!!
Work_Start_Date DATETIME,
Work_Completed_Date DATETIME)
INSERT INTO #tb_ServiceMemo (
Job_Number,
Train_Number,
Work_Start_Date,
Work_Completed_Date)
VALUES
-- Total time train 1: 6h 10m (370m)
(1,1,'2018-08-01 12:35','2018-08-01 18:35'), -- Make sure to write date literals in ISO format (yyyy-MM-dd) to avoid multiple interpretations
(2,1,'2018-08-01 14:20','2018-08-01 15:20'),
(3,1,'2018-08-01 18:00','2018-08-01 18:45'),
-- Total time train 2: 2h (120m)
(4,2,'2018-08-01 12:00','2018-08-01 12:10'),
(5,2,'2018-08-01 12:15','2018-08-01 12:20'),
(6,2,'2018-08-01 13:15','2018-08-01 13:45'),
(9,2,'2018-08-01 13:45','2018-08-01 15:00'),
-- Total time train 3: 3h 45m (225m)
(7,3,'2018-08-01 23:30','2018-08-02 00:30'),
(8,3,'2018-08-02 00:15','2018-08-02 03:15'),
-- Total time train 4: 2d 8h 15m (3375m)
(10,4,'2018-08-01 23:00','2018-08-03 23:00'),
(11,4,'2018-08-02 00:15','2018-08-04 07:15')
The solution:
;WITH TimeLapses AS
(
-- Recursive Anchor: Find the minimum Jobs for each train that doesn't overlap with previous Jobs
SELECT
InitialJobNumber = T.Job_Number,
JobNumber = T.Job_Number,
TrainNumber = T.Train_Number,
IntervalStart = T.Work_Start_Date,
IntervalEnd = T.Work_Completed_Date,
JobExtensionPath = CONVERT(VARCHAR(MAX), T.Job_Number), -- Will store the chained jobs together for clarity
RecursionLevel = 1
FROM
#tb_ServiceMemo AS T
WHERE
NOT EXISTS (
SELECT
'Job doesn''t overlap with previous Jobs (by train)'
FROM
#tb_ServiceMemo AS S
WHERE
S.Train_Number = T.Train_Number AND
S.Job_Number < T.Job_Number AND
S.Work_Completed_Date >= T.Work_Start_Date AND -- Conditions for the periods to overlap
S.Work_Start_Date <= T.Work_Completed_Date)
UNION ALL
-- Recursive Union: Chain overlapping Jobs by train and keep intervals boundaries (min & max)
SELECT
InitialJobNumber = L.InitialJobNumber,
JobNumber = T.Job_Number,
TrainNumber = L.TrainNumber,
IntervalStart = CASE -- Minimum of both starts
WHEN L.IntervalStart <= T.Work_Start_Date THEN L.IntervalStart
ELSE T.Work_Start_Date END,
IntervalEnd = CASE -- Maximum of both ends
WHEN L.IntervalEnd >= T.Work_Completed_Date THEN L.IntervalEnd
ELSE T.Work_Completed_Date END,
JobExtensionPath = L.JobExtensionPath + '->' + CONVERT(VARCHAR(MAX), T.Job_Number),
RecursionLevel = L.RecursionLevel + 1
FROM
TimeLapses AS L -- Recursive CTE!
INNER JOIN #tb_ServiceMemo AS T ON
L.TrainNumber = T.Train_Number AND
T.Work_Completed_Date >= L.IntervalStart AND -- Conditions for the periods to overlap
T.Work_Start_Date <= L.IntervalEnd
WHERE
L.JobNumber < T.Job_Number -- Prevent joining in both directions (that would be "<>") to avoid infinite loops
),
MaxRecursionLevelByTrain AS
(
/*
Max recursion level will hold the longest interval for each train, as there might be recursive paths that skips some jobs. For example: Train 1's job 1 will
join with Job 2 and Job 3 on the first recursive level, then Job 2 will join with Job 3 on the next recursion. The higher the recursion level the more Jobs we
are taking into account for the longest interval.
We also need to group by InitialJobNumber as there might be different, idependent gaps for each train.
*/
SELECT
TrainNumber = T.TrainNumber,
InitialJobNumber = T.InitialJobNumber,
MaxRecursionLevel = MAX(T.RecursionLevel)
FROM
TimeLapses AS T
GROUP BY
T.TrainNumber,
T.InitialJobNumber
),
ExpandedLapses AS
(
SELECT
TrainNumber = T.TrainNumber,
InitialJobNumber = M.InitialJobNumber,
IntervalStart = T.IntervalStart,
IntervalEnd = T.IntervalEnd,
DownTime = DATEDIFF(MINUTE, T.IntervalStart, T.IntervalEnd),
JobExtensionPath = T.JobExtensionPath,
RecursionLevel = T.RecursionLevel
FROM
MaxRecursionLevelByTrain AS M
INNER JOIN TimeLapses AS T ON
M.TrainNumber = T.TrainNumber AND
M.MaxRecursionLevel = T.RecursionLevel AND
M.InitialJobNumber = T.InitialJobNumber
)
SELECT
TrainNumber = E.TrainNumber,
TotalDownTime = SUM(DownTime)
FROM
ExpandedLapses AS E
GROUP BY
E.TrainNumber
And these are the partial results from each CTE, so you can see each step:
TimeLapses:
InitialJobNumber JobNumber TrainNumber IntervalStart IntervalEnd JobExtensionPath RecursionLevel
1 1 1 2018-08-01 12:35:00.000 2018-08-01 18:35:00.000 1 1
1 2 1 2018-08-01 12:35:00.000 2018-08-01 18:35:00.000 1->2 2
1 3 1 2018-08-01 12:35:00.000 2018-08-01 18:45:00.000 1->3 2
1 3 1 2018-08-01 12:35:00.000 2018-08-01 18:45:00.000 1->2->3 3
4 4 2 2018-08-01 12:00:00.000 2018-08-01 12:10:00.000 4 1
5 5 2 2018-08-01 12:15:00.000 2018-08-01 12:20:00.000 5 1
6 6 2 2018-08-01 13:15:00.000 2018-08-01 13:45:00.000 6 1
6 9 2 2018-08-01 13:15:00.000 2018-08-01 15:00:00.000 6->9 2
7 8 3 2018-08-01 23:30:00.000 2018-08-02 03:15:00.000 7->8 2
7 7 3 2018-08-01 23:30:00.000 2018-08-02 00:30:00.000 7 1
10 10 4 2018-08-01 23:00:00.000 2018-08-03 23:00:00.000 10 1
10 11 4 2018-08-01 23:00:00.000 2018-08-04 07:15:00.000 10->11 2
MaxRecursionLevelByTrain:
TrainNumber InitialJobNumber MaxRecursionLevel
1 1 3
2 4 1
2 5 1
2 6 2
3 7 2
4 10 2
ExtendedLapses:
TrainNumber InitialJobNumber IntervalStart IntervalEnd DownTime JobExtensionPath RecursionLevel
1 1 2018-08-01 12:35:00.000 2018-08-01 18:45:00.000 370 1->2->3 3
2 4 2018-08-01 12:00:00.000 2018-08-01 12:10:00.000 10 4 1
2 5 2018-08-01 12:15:00.000 2018-08-01 12:20:00.000 5 5 1
2 6 2018-08-01 13:15:00.000 2018-08-01 15:00:00.000 105 6->9 2
3 7 2018-08-01 23:30:00.000 2018-08-02 03:15:00.000 225 7->8 2
4 10 2018-08-01 23:00:00.000 2018-08-04 07:15:00.000 3375 10->11 2
Final Result:
TrainNumber TotalDownTime
1 370
2 120
3 225
4 3375
A few things worth mentioning:
While this solution will definitely be faster than using a cursor, it might not be the best one available, specially if you have a huge dataset (more than 100k records). There is room for improving performance.
You might benefit from a index on #tb_ServiceMemo (Train_Number, Job_Number, Work_Start_Date) to speed up the query.
You might need to add OPTION (MAXRECURSION N) at the end of the SELECT statement, being N the max recursion level you want to try. Default is 100, so if there are more than 100 periods that chain together for a particular train, an error message will pop up. You can use 0 as N for unlimited.
Make sure that every end time is higher than the start time, and that the job numbers don't repeat, at least by each train.

Can you try this one ? I added other test case to besure but I think it's OK. I also think there is more simple
INSERT INTO [dbo].[tb_ServiceMemo]
SELECT 1, 1, CONVERT(DATETIME, '2018-08-01 09:35:00', 120), CONVERT(DATETIME, '2018-08-01 12:45:00', 120) union
SELECT 2, 1, CONVERT(DATETIME, '2018-08-01 12:35:00', 120), CONVERT(DATETIME, '2018-08-01 18:35:00', 120) union
SELECT 3, 1, CONVERT(DATETIME, '2018-08-01 14:20:00', 120), CONVERT(DATETIME, '2018-08-01 15:20:00', 120) union
SELECT 4, 1, CONVERT(DATETIME, '2018-08-01 18:00:00', 120), CONVERT(DATETIME, '2018-08-01 18:45:00', 120) union
SELECT 5, 1, CONVERT(DATETIME, '2018-08-01 19:00:00', 120), CONVERT(DATETIME, '2018-08-01 19:45:00', 120)
SELECT [Train_Number], SUM(DATEDIFF(MINUTE, T.[Work_Start_Date], T.Work_Completed_Date)) as Delay
FROM (
SELECT
[Job_Number],
[Train_Number],
CASE
WHEN EXISTS(SELECT * FROM [tb_ServiceMemo] T3 WHERE T1.[Work_Start_Date] BETWEEN T3.[Work_Start_Date] AND T3.[Work_Completed_Date] AND T1.[Job_Number] <> T3.[Job_Number] AND T1.Train_Number = T3.Train_Number)
THEN (SELECT MAX(T3.[Work_Completed_Date]) FROM [tb_ServiceMemo] T3 WHERE T1.[Work_Start_Date] BETWEEN T3.[Work_Start_Date] AND T3.[Work_Completed_Date] AND T1.[Job_Number] <> T3.[Job_Number] AND T1.Train_Number = T3.Train_Number)
ELSE [Work_Start_Date] END as [Work_Start_Date],
[Work_Completed_Date]
FROM [tb_ServiceMemo] T1
WHERE NOT EXISTS( -- To kick off the ignored case
SELECT T2.*
FROM [tb_ServiceMemo] T2
WHERE T2.[Work_Start_Date] < T1.[Work_Start_Date] AND T2.[Work_Completed_Date] > T1.[Work_Completed_Date]
)
) as T
GROUP BY [Train_Number]
The idea is to :
ignore the result contained into another
rewrite the start date value of each rown if she is contained into another

Group time series by time intervals (e.g. days) with aggregate of duration

I have a table containing a time series with following information. Each record represents the event of "changing the mode".
Timestamp | Mode
------------------+------
2018-01-01 12:00 | 1
2018-01-01 18:00 | 2
2018-01-02 01:00 | 1
2018-01-02 02:00 | 2
2018-01-04 04:00 | 1
By using the LEAD function, I can create a query with the following result. Now each record contains the information, when and how long the "mode was active".
Please check the 2nd and the 4th record. They "belong" to multiple days.
StartDT | EndDT | Mode | Duration
------------------+------------------+------+----------
2018-01-01 12:00 | 2018-01-01 18:00 | 1 | 6:00
2018-01-01 18:00 | 2018-01-02 01:00 | 2 | 7:00
2018-01-02 01:00 | 2018-01-02 02:00 | 1 | 1:00
2018-01-02 02:00 | 2018-01-04 04:00 | 2 | 50:00
2018-01-04 04:00 | (NULL) | 1 | (NULL)
Now I would like to have a query that groups the data by day and mode and aggregates the duration.
This result table is needed:
Date | Mode | Total
------------+------+-------
2018-01-01 | 1 | 6:00
2018-01-01 | 2 | 6:00
2018-01-02 | 1 | 1:00
2018-01-02 | 2 | 23:00
2018-01-03 | 2 | 24:00
2018-01-04 | 2 | 04:00
I didn't known how to handle the records that "belongs" to multiple days. Any ideas?

create table ChangeMode ( ModeStart datetime2(7), Mode int )
insert into ChangeMode ( ModeStart, Mode ) values
( '2018-11-15T21:00:00.0000000', 1 ),
( '2018-11-16T17:18:19.1231234', 2 ),
( '2018-11-16T18:00:00.5555555', 1 ),
( '2018-11-16T18:00:01.1234567', 2 ),
( '2018-11-16T19:02:22.8888888', 1 ),
( '2018-11-16T20:00:00.9876543', 2 ),
( '2018-11-17T09:00:00.0000000', 1 ),
( '2018-11-17T23:23:23.0230450', 2 ),
( '2018-11-19T17:00:00.0172839', 1 ),
( '2018-11-20T03:07:00.7033077', 2 )
;
with
-- Determine the earliest and latest dates.
-- Cast to date to remove the time portion.
-- Cast results back to datetime because we're going to add hours later.
MinMaxDates
as
(select cast(min(cast(ModeStart as date))as datetime) as MinDate,
cast(max(cast(ModeStart as date))as datetime) as MaxDate from ChangeMode),
-- How many days have passed during that period
Dur
as
(select datediff(day,MinDate,MaxDate) as Duration from MinMaxDates),
-- Create a list of numbers.
-- These will be added to MinDate to get a list of dates.
NumList
as
( select 0 as Num
union all
select Num+1 from NumList,Dur where Num<Duration ),
-- Create a list of dates by adding those numbers to MinDate
DayList
as
( select dateadd(day,Num,MinDate)as ModeDate from NumList, MinMaxDates ),
-- Create a list of day periods
PeriodList
as
( select ModeDate as StartTime,
dateadd(day,1,ModeDate) as EndTime
from DayList ),
-- Use LEAD to get periods for each record
-- Final record would return NULL for ModeEnd
-- We replace that with end of last day
ModePeriodList
as
( select ModeStart,
coalesce( lead(ModeStart)over(order by ModeStart),
dateadd(day,1,MaxDate) ) as ModeEnd,
Mode from ChangeMode, MinMaxDates ),
ModeDayList
as
( select * from ModePeriodList, PeriodList
where ModeStart<=EndTime and ModeEnd>=StartTime
),
-- Keep the later of the mode start time, and the day start time
-- Keep the earlier of the mode end time, and the day end time
ModeDayPeriod
as
( select case when ModeStart>=StartTime then ModeStart else StartTime end as StartTime,
case when ModeEnd<=EndTime then ModeEnd else EndTime end as EndTime,
Mode from ModeDayList ),
SumDurations
as
( select cast(StartTime as date) as ModeDate,
Mode,
DateDiff_Big(nanosecond,StartTime,EndTime)
/3600000000000
as DurationHours from ModeDayPeriod )
-- List the results in order
-- Use MaxRecursion option in case there are more than 100 days
select ModeDate as [Date], Mode, sum(DurationHours) as [Total Duration Hours]
from SumDurations
group by ModeDate, Mode
order by ModeDate, Mode
option (maxrecursion 0)
Result is:
Date Mode Total Duration Hours
---------- ----------- ---------------------------------------
2018-11-15 1 3.00000000000000
2018-11-16 1 18.26605271947221
2018-11-16 2 5.73394728052777
2018-11-17 1 14.38972862361111
2018-11-17 2 9.61027137638888
2018-11-18 2 24.00000000000000
2018-11-19 1 6.99999519891666
2018-11-19 2 17.00000480108333
2018-11-20 1 3.11686202991666
2018-11-20 2 20.88313797008333

you could use a CTE to create a table of days then join the time slots to it
DECLARE #MAX as datetime2 = (SELECT MAX(CAST(Timestamp as date)) MX FROM process);
WITH StartEnd AS (select p1.Timestamp StartDT,
P2.Timestamp EndDT ,
p1.mode
from process p1
outer apply
(SELECT TOP 1 pOP.* FROM
process pOP
where pOP.Timestamp > p1.Timestamp
order by pOP.Timestamp asc) P2
),
CAL AS (SELECT (SELECT MIN(cast(StartDT as date)) MN FROM StartEnd) DT
UNION ALL
SELECT DATEADD(day,1,DT) DT FROM CAL WHERE CAL.DT < #MAX
),
TMS AS
(SELECT CASE WHEN S.StartDT > C.DT THEN S.StartDT ELSE C.DT END AS STP,
CASE WHEN S.EndDT < DATEADD(day,1,C.DT) THEN S.ENDDT ELSE DATEADD(day,1,C.DT) END AS STE
FROM StartEnd S JOIN CAL C ON NOT(S.EndDT <= C.DT OR S.StartDT>= DATEADD(day,1,C.dt))
)
SELECT *,datediff(MI ,TMS.STP, TMS.ste) as x from TMS

The following uses recursive CTE to build a list of dates (a calendar or number table works equally well). It then intersect the dates with date times so that missing dates are populated with matching data. The important bit is that for each row, if start datetime belongs to previous day then it is clamped to 00:00. Likewise for end datetime.
DECLARE #t TABLE (timestamp DATETIME, mode INT);
INSERT INTO #t VALUES
('2018-01-01 12:00', 1),
('2018-01-01 18:00', 2),
('2018-01-02 01:00', 1),
('2018-01-02 02:00', 2),
('2018-01-04 04:00', 1);
WITH cte1 AS (
-- the min and max dates in your data
SELECT
CAST(MIN(timestamp) AS DATE) AS mindate,
CAST(MAX(timestamp) AS DATE) AS maxdate
FROM #t
), cte2 AS (
-- build all dates between min and max dates using recursive cte
SELECT mindate AS day_start, DATEADD(DAY, 1, mindate) AS day_end, maxdate
FROM cte1
UNION ALL
SELECT DATEADD(DAY, 1, day_start), DATEADD(DAY, 2, day_start), maxdate
FROM cte2
WHERE day_start < maxdate
), cte3 AS (
-- pull end datetime from next row into current
SELECT
timestamp AS dt_start,
LEAD(timestamp) OVER (ORDER BY timestamp) AS dt_end,
mode
FROM #t
), cte4 AS (
-- join datetime with date using date overlap query
-- then clamp start datetime to 00:00 of the date
-- and clamp end datetime to 00:00 of next date
SELECT
IIF(dt_start < day_start, day_start, dt_start) AS dt_start_fix,
IIF(dt_end > day_end, day_end, dt_end) AS dt_end_fix,
mode
FROM cte2
INNER JOIN cte3 ON day_end > dt_start AND dt_end > day_start
)
SELECT dt_start_fix, dt_end_fix, mode, datediff(minute, dt_start_fix, dt_end_fix) / 60.0 AS total
FROM cte4
DB Fiddle

Thanks everybody!
The answer from Cato put me on the right track. Here my final solution:
DECLARE #Start AS datetime;
DECLARE #End AS datetime;
DECLARE #Interval AS int;
SET #Start = '2018-01-01';
SET #End = '2018-01-05';
SET #Interval = 24 * 60 * 60;
WITH
cteDurations AS
(SELECT [Timestamp] AS StartDT,
LEAD ([Timestamp]) OVER (ORDER BY [Timestamp]) AS EndDT,
Mode
FROM tblLog
WHERE [Timestamp] BETWEEN #Start AND #End
),
cteTimeslots AS
(SELECT #Start AS StartDT,
DATEADD(SECOND, #Interval, #Start) AS EndDT
UNION ALL
SELECT EndDT,
DATEADD(SECOND, #Interval, EndDT)
FROM cteTimeSlots WHERE StartDT < #End
),
cteDurationsPerTimesplot AS
(SELECT CASE WHEN S.StartDT > C.StartDT THEN S.StartDT ELSE C.StartDT END AS StartDT,
CASE WHEN S.EndDT < C.EndDT THEN S.EndDT ELSE C.EndDT END AS EndDT,
C.StartDT AS Slot,
S.Mode
FROM cteDurations S
JOIN cteTimeslots C ON NOT(S.EndDT <= C.StartDT OR S.StartDT >= C.EndDT)
)
SELECT Slot,
Mode,
SUM(DATEDIFF(SECOND, StartDT, EndDT)) AS Duration
FROM cteDurationsPerTimesplot
GROUP BY Slot, Mode
ORDER BY Slot, Mode;
With the variable #Interval you are able to define the size of the timeslots.
The CTE cteDurations creates a subresult with the durations of all necessary entries by using the TSQL function LEAD (available in MSSQL >= 2012). This will be a lot faster than an OUTER APPLY.
The CTE cteTimeslots generates a list of timeslots with start time and end time.
The CTE cteDurationsPerTimesplot is a subresult with a JOIN between cteDurations and cteTimeslots. This this the magic JOIN statement from Cato!
And finally the SELECT statement will do the grouping and sum calculation per Slot and Mode.
Once again: Thanks a lot to everybody! Especially to Cato! You saved my weekend!
Regards
Oliver

How to add values of weekend and holiday's to the previous working day

I have to add weekend and holiday's value to the previous working day value so that weekend and holiday's should not display in the report but if we don't have previous working day we should simply skip the row as 2018-01-01 skipped in the below output
**DAYS VALUE**
2018-01-01 10 Holiday-1
2018-01-02 20
2018-01-03 30
2018-01-04 40
2018-01-05 50
2018-01-06 60 Saturday
2018-01-07 70 Sunday
2018-01-08 80
2018-01-09 90
2018-01-10 100 Holiday-2
OUTPUT
2018-01-02 20
2018-01-03 30
2018-01-04 40
2018-01-05 180
2018-01-08 80
2018-01-09 190
I am trying with LEAD, LAG, DATEDIFF and in other ways but not getting any solution so please guys help he with this problem.

When there is a row in your Holidays calendar table (I will assume, that weekends are there too), you need to find the max date, prior the current one, for which there is no row in holidays table. Then group by this "real date" and sum the value. Something like this:
declare #t table([DAYS] date, [VALUE] int)
declare #Holidays table([DAYS] date, Note varchar(100))
insert into #t values
('2018-01-01', 10),
('2018-01-02', 20),
('2018-01-03', 30),
('2018-01-04', 40),
('2018-01-05', 50),
('2018-01-06', 60),
('2018-01-07', 70),
('2018-01-08', 80),
('2018-01-09', 90),
('2018-01-10', 100)
insert into #Holidays values
('2018-01-01', 'Holiday-1'),
('2018-01-06', 'Saturday'),
('2018-01-07', 'Sunday'),
('2018-01-10', 'Holiday-2')
;with cte as (
select
IIF(h1.[DAYS] is not null /* i.e. it is a holiday */,
(select max([DAYS])
from #t t2
where t2.[DAYS] < t1.[DAYS] and not exists(select * from #Holidays h2 where h2.[DAYS] = t2.[DAYS])), t1.[DAYS]) as RealDate
, t1.[VALUE]
from #t t1
left join #Holidays h1 on t1.DAYS = h1.[DAYS]
)
select
RealDate
, sum([VALUE]) as RealValue
from cte
where RealDate is not null
group by RealDate

You can do this with cumulative sums (to define groups) and aggregation. Define the groups as the number of non-holidays on or before a given day, then aggregate. This is the same value for a non-holiday followed by a holiday.
Then aggregate:
select max(days) as days, sum(value)
from (select t.*,
sum(case when holiday is null then 1 else 0 end) over (order by days asc) as grp
from t
) t
group by grp;
EDIT:
With a separate holidays table, you just need to add the join:
select max(days) as days, sum(value)
from (select t.*,
sum(case when h.holiday is null then 1 else 0 end) over (order by t.days asc) as grp
from t left join
holidays h
on t.days = h.date
) t
group by grp;

MsSql Compare specific datetimes in sequence based on ID

I have a table where we store our data from a call and it looks like this:
CallID Arrive_Seq DateTime ActivitytypeID
1 1 2018-01-01 05:00:00 1
1 2 2018-01-01 05:00:01 2
1 3 2018-01-01 06:00:00 21
1 4 2018-01-01 06:00:01 28
1 5 2018-01-01 06:00:02 13
1 6 2018-01-01 06:00:03 22
1 7 2018-01-01 06:00:05 29
1 8 2018-01-01 06:05:00 21
1 9 2018-01-01 06:05:01 28
1 10 2018-01-01 06:05:02 13
1 11 2018-01-01 06:05:03 22
1 12 2018-01-01 06:07:45 29
Now I want to select the datediff between ActivitytypeID 21 and 29 in the arrive_sew order. In this example they occur twice (on arrive_seq 3,8 and 7,12). This order is not specific and ActivitytypeID can occur both more and less times in the sequence but they are always connected with eachother. Think of it as ActivitytypeID 21 = 'call started' AND ActivitytypeID = 29 'Call ended'.
In the example the answer whould be:
SELECT DATEDIFF (SECOND, '2018-01-01 06:00:00', '2018-01-01 06:00:05') = 5 -- Compares datetime of arrive_seq 3 and 7
AND
SELECT DATEDIFF (SECOND, '2018-01-01 06:00:05', '2018-01-01 06:07:45') = 460 -- Compares datetime of arrive_seq 21 and 29
Total duration = 465
I have tried with this code but it doesn't work all the time due to row# can change based on arrive_seq and ActivitytypeID
;WITH CallbackDuration AS (
SELECT ROW_NUMBER() OVER(ORDER BY a.time_stamp ASC) AS RowNumber, DATEDIFF(second, a.time_stamp, b.time_stamp) AS 'Duration'
FROM Table a
JOIN Table b on a.call_id = b.call_id
WHERE a.call_id = 1 AND a.activity_type = 21 AND b.activity_type = 29
GROUP BY a.time_stamp, b.time_stamp,a.call_id)
SELECT SUM(Duration) AS 'Duration' FROM CallbackDuration WHERE RowNumber in (1,5,9)

I think this is what you want:
select
call_start,
call_end,
datediff (second, call_start, call_end) as duration
from
(
select
call_timestamp as call_end,
lag(call_timestamp) over (partition by call_id order by call_timestamp) as call_start,
activity_type as call_end_activity,
lag (activity_type) over (partition by call_id order by call_timestamp) as call_start_activity
from
call_log
where
activity_type in (21, 29)
) x
where
call_start_activity = 21;
Result:
call_start call_end duration
----------------------- ----------------------- -----------
2018-01-01 06:00:00.000 2018-01-01 06:00:05.000 5
2018-01-01 06:05:00.000 2018-01-01 06:07:45.000 165
(2 rows affected)
Note that the time of the second call is based on your sample data with start time 2018-01-01 06:05:00

This query seems to return your expected result
declare #x int = 21
declare #y int = 29
;with cte(CallID, Arrive_Seq, DateTime, ActivitytypeID) as (
select
a, b, cast(c as datetime), d
from (values
(1,1,'2018-01-01 05:00:00',1)
,(1,2,'2018-01-01 05:00:01',2)
,(1,3,'2018-01-01 06:00:00',21)
,(1,4,'2018-01-01 06:00:01',28)
,(1,5,'2018-01-01 06:00:02',13)
,(1,6,'2018-01-01 06:00:03',22)
,(1,7,'2018-01-01 06:00:05',29)
,(1,8,'2018-01-01 06:05:00',21)
,(1,9,'2018-01-01 06:05:01',28)
,(1,10,'2018-01-01 06:05:02',13)
,(1,11,'2018-01-01 06:05:03',22)
,(1,12,'2018-01-01 06:07:45',29)
) t(a,b,c,d)
)
select
sum(ss)
from (
select
*, ss = datediff(ss, DateTime, lead(datetime) over (order by Arrive_Seq))
, rn = row_number() over (order by Arrive_Seq)
from
cte
where
ActivitytypeID in (#x, #y)
) t
where
rn % 2 = 1

T/SQL - Group/Multiply records

Source date:
CREATE TABLE #Temp (ID INT Identity(1,1) Primary Key, BeginDate datetime, EndDate datetime, GroupBy INT)
INSERT INTO #Temp
SELECT '2015-06-05 00:00:00.000','2015-06-12 00:00:00.000',7
UNION
SELECT '2015-06-05 00:00:00.000', '2015-06-08 00:00:00.000',7
UNION
SELECT '2015-10-22 00:00:00.000', '2015-10-31 00:00:00.000',7
SELECT *, DATEDIFF(DAY,BeginDate, EndDate) TotalDays FROM #Temp
DROP TABLE #Temp
ID BeginDate EndDate GroupBy TotalDays
1 6/5/15 0:00 6/8/15 0:00 7 3
2 6/5/15 0:00 6/12/15 0:00 7 7
3 10/22/15 0:00 10/31/15 0:00 7 9
Desired Output:
ID BeginDate EndDate GroupBy TotalDays GroupCnt GroupNum
1 6/5/15 0:00 6/8/15 0:00 7 3 1 1
2 6/5/15 0:00 6/12/15 0:00 7 7 1 1
3 10/22/15 0:00 10/29/15 0:00 7 9 2 1
3 10/29/15 0:00 10/31/15 0:00 7 9 2 2
Goal:
Group the records based on ID/BeginDate/EndDate.
Based on the GroupBy number (# of days) and TotalDays (days diff),
if the GroupBy => TotalDays, keep a single group record
else multiply the group records (1 record per GroupBy count) while staying within TotalDays limit.
Apologies if it's confusing but basically, in the above example, there should be one record for each group (ID/BeginDate/EndDate) for the record where days diff b/w Begin/End date = 7 or less (GroupBy).
If the days diff goes above 7 days, create another record (for every additional 7 days diff).
So since 1st two records have days diff of 7 days or less, there's only one record.
The 3rd record has days diff of 9 (7 + 2). Therefore, there should be 2 records (1st for the first 7 days and 2nd for the additional 2 days).
GroupCNT = how many records there're of the grouped records after applying the above records.
GroupNum is basically row number of the group.
GroupBy # can be different for each record. Dataset is huge so performance does matter.
One pattern I was able to figure out was related to the modulus b/w GroupBy and days diff.
When the GroupBy value is < days diff, modulus is always less than GroupBy. When the GroupBy value = days diff, modulus is always 0. And when the GroupBy value > days diff, modulus is always equals GroupBy. I'm not sure if/how to use that to group/multiply records to meet the requirement.
SELECT DISTINCT
ID
, BeginDate
, EndDate
, GroupBy
, DATEDIFF(DAY,BeginDate, EndDate) TotalDays
, CAST(GroupBy as decimal(18,6))%CAST(DATEDIFF(DAY,BeginDate, EndDate) AS decimal(18,6)) Modulus
, CASE WHEN DATEDIFF(DAY,BeginDate, EndDate) <= GroupBy THEN BeginDate END NewBeginDate
, CASE WHEN DATEDIFF(DAY,BeginDate, EndDate) <= GroupBy THEN EndDate END NewEndDate
FROM #Temp
Update:
Forgot to mention/include that the begin/enddate, when the records gets multiplied, will change accordingly. In other words, begin/end date will reflect the GroupBy - desired output shows what I mean more clearly in the 3rd and 4th record.
Also, GroupCnt/GroupNum are not as important to calculate as grouping/multiplying the records.

You could do something like this using a recursive CTE..
;WITH cte AS (
SELECT ID,
BeginDate,
EndDate,
GroupBy,
DATEDIFF(DAY, BeginDate, EndDate) AS TotalDays,
1 AS GroupNum
FROM #Temp
UNION ALL
SELECT ID,
BeginDate,
EndDate,
GroupBy,
TotalDays,
GroupNum + 1
FROM cte
WHERE GroupNum * GroupBy < TotalDays
)
SELECT ID,
BeginDate = CASE WHEN GroupNum = 1 THEN BeginDate
ELSE DATEADD(DAY, GroupBy * (GroupNum - 1), BeginDate)
END ,
EndDate = CASE WHEN TotalDays <= GroupBy THEN EndDate
WHEN DATEADD(DAY, GroupBy * GroupNum, BeginDate) > EndDate THEN EndDate
ELSE DATEADD(DAY, GroupBy * GroupNum, BeginDate)
END ,
GroupBy,
TotalDays,
COUNT(*) OVER (PARTITION BY ID) GroupCnt,
GroupNum
FROM cte
OPTION (MAXRECURSION 0)
the cte builds out a recordset like this.
ID BeginDate EndDate GroupBy TotalDays GroupNum
----------- ----------------------- ----------------------- ----------- ----------- -----------
1 2015-06-05 00:00:00.000 2015-06-08 00:00:00.000 7 3 1
2 2015-06-05 00:00:00.000 2015-06-12 00:00:00.000 7 7 1
3 2015-10-22 00:00:00.000 2015-10-31 00:00:00.000 7 9 1
3 2015-10-22 00:00:00.000 2015-10-31 00:00:00.000 7 9 2
then you just have to take this and use some case statements to determine what the begin and end date should be.
you should end up with
ID BeginDate EndDate GroupBy TotalDays GroupCnt GroupNum
----------- ----------------------- ----------------------- ----------- ----------- ----------- -----------
1 2015-06-05 00:00:00.000 2015-06-08 00:00:00.000 7 3 1 1
2 2015-06-05 00:00:00.000 2015-06-12 00:00:00.000 7 7 1 1
3 2015-10-22 00:00:00.000 2015-10-29 00:00:00.000 7 9 2 1
3 2015-10-29 00:00:00.000 2015-10-31 00:00:00.000 7 9 2 2
since you're using SQL 2012, you can also use the LAG and LEAD functions in your final query.
;WITH cte AS (
SELECT ID,
BeginDate,
EndDate,
GroupBy,
DATEDIFF(DAY, BeginDate, EndDate) AS TotalDays,
1 AS GroupNum
FROM #Temp
UNION ALL
SELECT ID,
BeginDate,
EndDate,
GroupBy,
TotalDays,
GroupNum + 1
FROM cte
WHERE GroupNum * GroupBy < TotalDays
)
SELECT ID,
BeginDate = COALESCE(LAG(BeginDate) OVER (PARTITION BY ID ORDER BY GroupNum) + GroupBy * (GroupNum - 1), BeginDate),
EndDate = COALESCE(LEAD(BeginDate) OVER (PARTITION BY ID ORDER BY GroupNum) + GroupBy * GroupNum, EndDate),
GroupBy,
TotalDays,
COUNT(*) OVER (PARTITION BY ID) GroupCnt,
GroupNum
FROM cte
OPTION (MAXRECURSION 0)

CREATE TABLE dim_number (id INT);
INSERT INTO dim_number VALUES ((0), (1), (2), (3)); -- Populate this to a large number
SELECT
#Temp.Id,
CASE WHEN dim_number.id = 0
THEN #Temp.BeginDate
ELSE DATEADD(DAY, dim_number.id * #Temp.GroupBy, #Temp.BeginDate)
END AS BeginDate,
CASE WHEN dim_number.id = parts.count
THEN #Temp.EndDate
ELSE DATEADD(DAY, (dim_number.id + 1) * #Temp.GroupBy, #Temp.BeginDate)
END AS EndDate,
#Temp.GroupBy AS GroupBy,
DATEDIFF(DAY, #Temp.BeginDate, #Temp.EndDate) AS TotalDays,
parts.count + 1 AS GroupCnt,
dim_number.id + 1 AS GroupNum
FROM
#Temp
CROSS APPLY
(SELECT DATEDIFF(DAY, #Temp.BeginDate, #Temp.EndDate) / #Temp.GroupBy AS count) AS parts
INNER JOIN
dim_number
ON dim_number.id >= 0
AND dim_number.id <= parts.count

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

select a chain of records using mssql - sql

Related

Compare date in multiple rows and calculate the downtime

Group time series by time intervals (e.g. days) with aggregate of duration

How to add values of weekend and holiday's to the previous working day

MsSql Compare specific datetimes in sequence based on ID

T/SQL - Group/Multiply records

Categories

Resources