SQL Server: Gap / Island, datetime, contiguous 365-day block

I have a table that looks like this:-
tblMeterReadings
id meter period_start period_end amount
1 1 2014-01-01 00:00 2014-01-01 00:29:59 100.3
2 1 2014-01-01 00:30 2014-01-01 00:59:59 50.5
3 1 2014-01-01 01:00 2014-01-01 01:29:59 70.7
4 1 2014-01-01 01:30 2014-01-01 01:59:59 900.1
5 1 2014-01-01 02:00 2014-01-01 02:29:59 400.0
6 1 2014-01-01 02:30 2014-01-01 02:59:59 200.3
7 1 2014-01-01 03:00 2014-01-01 03:29:59 100.8
8 1 2014-01-01 03:30 2014-01-01 03:59:59 140.3
This is a tiny "contiguous block" from '2014-01-01 00:00' to '2014-01-01 03:59:59'.
In the real table there are "contiguous blocks" of years in length.
I need to find the period_start and period_end of the most recent CONTINUOUS 365 COMPLETE DAYs (filtered by the meter column).
When I say COMPLETE DAYs I mean a day that has entries spanning 00:00 to 23:59.
When I say CONTINUOUS I mean there must be no days missing.
I would like to select all the rows that make up this block of CONTINUOUS COMPLETE DAYs.
I also need an output like:
block_start block_end total_amount_for_block
2013-02-26 00:00 2014-02-26 23:59:59 1034234.5
This is beyond me, so if someone can solve it... I will be very impressed.

Since your granularity is 1 second, you need to expand your periods into all the date/times between the start and end at 1-second intervals. To do this you need to cross join with a numbers table. (The numbers table is generated on the fly by ranking object IDs from an arbitrary system view; I have limited it to TOP 86400 since this is the number of seconds in a day, and you have stated your time periods never span more than one day.)
WITH Numbers AS
( SELECT TOP (86400)
Number = ROW_NUMBER() OVER(ORDER BY a.object_id) - 1
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
ORDER BY a.object_id
)
SELECT r.ID, r.meter, dt.[DateTime]
FROM tblMeterReadings r
CROSS JOIN Numbers n
OUTER APPLY
( SELECT [DateTime] = DATEADD(SECOND, n.Number, r.period_start)
) dt
WHERE dt.[DateTime] <= r.Period_End;
You then have your continuous range in which to perform the normal gaps and islands grouping:
WITH Numbers AS
( SELECT TOP (86400)
Number = ROW_NUMBER() OVER(ORDER BY a.object_id) - 1
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
ORDER BY a.object_id
), Grouped AS
( SELECT r.meter,
Amount = CASE WHEN Number = 1 THEN r.Amount ELSE 0 END,
dt.[DateTime],
GroupingSet = DATEADD(SECOND,
-DENSE_RANK() OVER(PARTITION BY r.Meter
ORDER BY dt.[DateTime]),
dt.[DateTime])
FROM tblMeterReadings r
CROSS JOIN Numbers n
OUTER APPLY
( SELECT [DateTime] = DATEADD(SECOND, n.Number, r.period_start)
) dt
WHERE dt.[DateTime] <= r.Period_End
)
SELECT meter,
PeriodStart = MIN([DateTime]),
PeriodEnd = MAX([DateTime]),
Amount = SUM(Amount)
FROM Grouped
GROUP BY meter, GroupingSet
HAVING DATEADD(YEAR, 1, MIN([DateTime])) < MAX([DateTime]);
N.B. Since the join to Numbers causes amounts to be duplicated, it is necessary to set all duplicates to 0 using CASE WHEN Number = 1 THEN r.Amount ELSE 0 END, i.e. only include the amount for a single row per ID.
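As a quick sanity check of that claim (a sketch reusing the same Numbers CTE; it assumes every period lasts at least two seconds, so a Number = 1 row always exists for each reading), the following should return no rows:
WITH Numbers AS
( SELECT TOP (86400)
      Number = ROW_NUMBER() OVER(ORDER BY a.object_id) - 1
  FROM sys.all_objects a
  CROSS JOIN sys.all_objects b
  ORDER BY a.object_id
)
SELECT r.id
FROM tblMeterReadings r
CROSS JOIN Numbers n
WHERE DATEADD(SECOND, n.Number, r.period_start) <= r.period_end
GROUP BY r.id, r.amount
-- each reading should contribute its amount exactly once across all of its expanded rows
HAVING SUM(CASE WHEN n.Number = 1 THEN r.amount ELSE 0 END) <> r.amount;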
Removing the Having clause for your sample data will give:
meter | PeriodStart | PeriodEnd | Amount
------+---------------------+---------------------+----------
1 | 2014-01-01 00:00:00 | 2014-01-01 03:59:59 | 1963
Example on SQL Fiddle

You could try this:
Select MIN(period_start) as "block start"
     , MAX(period_end) as "block end"
     , SUM(amount) as "total amount"
FROM YourTable
GROUP BY datepart(year, period_start)
       , datepart(month, period_start)
       , datepart(day, period_start)
       , datepart(year, period_end)
       , datepart(month, period_end)
       , datepart(day, period_end)
Having datepart(year, period_start) = datepart(year, period_end)
   AND datepart(month, period_start) = datepart(month, period_end)
   AND datepart(day, period_start) = datepart(day, period_end)
   AND datepart(hour, MIN(period_start)) = 0
   AND datepart(minute, MIN(period_start)) = 0
   AND datepart(hour, MAX(period_end)) = 23
   AND datepart(minute, MAX(period_end)) = 59

Related

Compare date in multiple rows and calculate the downtime

I'm trying to calculate the downtime for a train from a service record; below is a sample scenario.
There can be multiple jobs running simultaneously for a train, and they can overlap at times.
For:
Job_number 1, the date diff between the work start and end date is 360 minutes.
Job_number 2, the date diff between the work start and end date is 60 minutes, but this overlaps with Job_number 1 so we shouldn't consider it.
Job_number 3, the date diff between the work start and end date is 45 minutes, but this partially overlaps with Job_number 1 so we should consider only 10 minutes.
So the actual downtime should be 360 minutes (Job 1) + 0 minutes (Job 2) + 10 minutes (Job 3) = 370 minutes.
My desired output is:
I have 20 trains at the moment for which I need to calculate the downtime as above.
How do I do this?
Sample Data script:
CREATE TABLE [dbo].[tb_ServiceMemo](
[Job_Number] [nvarchar](500) NULL,
[Train_Number] [nvarchar](500) NULL,
[Work_Start_Date] [datetime] NULL,
[Work_Completed_Date] [datetime] NULL
) ON [PRIMARY]
INSERT INTO [dbo].[tb_ServiceMemo]
VALUES (1,1,'01-08-2018 12:35','01-08-18 18:35'),
(2,1,'01-08-2018 14:20','01-08-18 15:20'),
(3,1,'01-08-2018 18:00','01-08-18 18:45')
This is a gaps-and-islands problem, but it is tricky because it has start and end times.
The idea for the solution is to determine when an outage starts. What is the characteristic? Well, the period starts at a time where there is no overlap with preceding work. The tricky part is that more than one "work" effort could start at the same time (although your data does not show this).
Once you know the time when an outage starts, you can use a cumulative sum to assign a group to each record and then simply aggregate by that group (and other information).
The following query should do what you want:
with starts as (
select sm.*,
(case when exists (select 1
from tb_ServiceMemo sm2
where sm2.Train_Number = sm.Train_Number and
sm2.Work_Start_Date < sm.Work_Start_Date and
sm2.Work_Completed_Date >= sm.Work_Start_Date
)
then 0 else 1
end) as isstart
from tb_ServiceMemo sm
)
select Train_Number, min(Work_Start_Date) as outage_start_date, max(Work_Completed_Date) as outage_end_date,
datediff(minute, min(Work_Start_Date), max(Work_Completed_Date))
from (select s.*, sum(isstart) over (partition by Train_Number order by Work_Start_Date) as grp
from starts s
) s
group by Train_Number, grp;
In this db<>fiddle, I added a few more rows to show how the code works in different scenarios.
This is a Gaps and Islands in Sequences problem.
You can try a recursive CTE to expand every row into one row per minute,
then use the MAX and MIN DateTime of each island to calculate the result.
;WITH CTE AS (
SELECT [Train_Number], [Work_Start_Date] ,[Work_Completed_Date]
FROM [tb_ServiceMemo]
UNION ALL
SELECT [Train_Number], DATEADD(minute,1,[Work_Start_Date]) ,[Work_Completed_Date]
FROM CTE
WHERE DATEADD(minute,1,[Work_Start_Date]) <= [Work_Completed_Date]
),CTE2 AS (
SELECT DISTINCT Train_Number,
Work_Start_Date,
MAX(Work_Completed_Date) OVER(PARTITION BY Train_Number ORDER BY Work_Completed_Date DESC) MAX_Time
FROM CTE
),CTE_RESULT AS (
SELECT *,datediff(mi,MAX_Time,Work_Start_Date) - row_number() over(PARTITION BY Train_Number ORDER BY Work_Start_Date) grp
FROM CTE2
)
SELECT Train_Number,sum(time_diff)
FROM (
SELECT Train_Number,DATEDIFF(MI,MIN(Work_Start_Date),MAX(Work_Start_Date)) time_diff
FROM CTE_RESULT
GROUP BY Train_Number,grp
)t1
GROUP BY Train_Number
option ( MaxRecursion 0 );
sqlfiddle
This is the infamous gaps and islands problem with dates. The following is a solution that uses a recursive CTE. It might be a little tough to understand if you aren't used to working with them, so I commented all the parts that might need clarifying.
I also added a few more examples to cover different scenarios, such as periods spanning different days and periods overlapping exactly at the start/end.
Example setup:
IF OBJECT_ID('tempdb..#tb_ServiceMemo') IS NOT NULL
DROP TABLE #tb_ServiceMemo
CREATE TABLE #tb_ServiceMemo(
Job_Number INT, -- This is an INT not VARCHAR!! (even the name says so)
Train_Number INT, -- This one also!!
Work_Start_Date DATETIME,
Work_Completed_Date DATETIME)
INSERT INTO #tb_ServiceMemo (
Job_Number,
Train_Number,
Work_Start_Date,
Work_Completed_Date)
VALUES
-- Total time train 1: 6h 10m (370m)
(1,1,'2018-08-01 12:35','2018-08-01 18:35'), -- Make sure to write date literals in ISO format (yyyy-MM-dd) to avoid multiple interpretations
(2,1,'2018-08-01 14:20','2018-08-01 15:20'),
(3,1,'2018-08-01 18:00','2018-08-01 18:45'),
-- Total time train 2: 2h (120m)
(4,2,'2018-08-01 12:00','2018-08-01 12:10'),
(5,2,'2018-08-01 12:15','2018-08-01 12:20'),
(6,2,'2018-08-01 13:15','2018-08-01 13:45'),
(9,2,'2018-08-01 13:45','2018-08-01 15:00'),
-- Total time train 3: 3h 45m (225m)
(7,3,'2018-08-01 23:30','2018-08-02 00:30'),
(8,3,'2018-08-02 00:15','2018-08-02 03:15'),
-- Total time train 4: 2d 8h 15m (3375m)
(10,4,'2018-08-01 23:00','2018-08-03 23:00'),
(11,4,'2018-08-02 00:15','2018-08-04 07:15')
The solution:
;WITH TimeLapses AS
(
-- Recursive Anchor: Find the minimum Jobs for each train that doesn't overlap with previous Jobs
SELECT
InitialJobNumber = T.Job_Number,
JobNumber = T.Job_Number,
TrainNumber = T.Train_Number,
IntervalStart = T.Work_Start_Date,
IntervalEnd = T.Work_Completed_Date,
JobExtensionPath = CONVERT(VARCHAR(MAX), T.Job_Number), -- Will store the chained jobs together for clarity
RecursionLevel = 1
FROM
#tb_ServiceMemo AS T
WHERE
NOT EXISTS (
SELECT
'Job doesn''t overlap with previous Jobs (by train)'
FROM
#tb_ServiceMemo AS S
WHERE
S.Train_Number = T.Train_Number AND
S.Job_Number < T.Job_Number AND
S.Work_Completed_Date >= T.Work_Start_Date AND -- Conditions for the periods to overlap
S.Work_Start_Date <= T.Work_Completed_Date)
UNION ALL
-- Recursive Union: Chain overlapping Jobs by train and keep intervals boundaries (min & max)
SELECT
InitialJobNumber = L.InitialJobNumber,
JobNumber = T.Job_Number,
TrainNumber = L.TrainNumber,
IntervalStart = CASE -- Minimum of both starts
WHEN L.IntervalStart <= T.Work_Start_Date THEN L.IntervalStart
ELSE T.Work_Start_Date END,
IntervalEnd = CASE -- Maximum of both ends
WHEN L.IntervalEnd >= T.Work_Completed_Date THEN L.IntervalEnd
ELSE T.Work_Completed_Date END,
JobExtensionPath = L.JobExtensionPath + '->' + CONVERT(VARCHAR(MAX), T.Job_Number),
RecursionLevel = L.RecursionLevel + 1
FROM
TimeLapses AS L -- Recursive CTE!
INNER JOIN #tb_ServiceMemo AS T ON
L.TrainNumber = T.Train_Number AND
T.Work_Completed_Date >= L.IntervalStart AND -- Conditions for the periods to overlap
T.Work_Start_Date <= L.IntervalEnd
WHERE
L.JobNumber < T.Job_Number -- Prevent joining in both directions (that would be "<>") to avoid infinite loops
),
MaxRecursionLevelByTrain AS
(
/*
Max recursion level will hold the longest interval for each train, as there might be recursive paths that skip some jobs. For example: Train 1's Job 1 will
join with Job 2 and Job 3 on the first recursive level, then Job 2 will join with Job 3 on the next recursion. The higher the recursion level, the more Jobs we
are taking into account for the longest interval.
We also need to group by InitialJobNumber as there might be different, independent gaps for each train.
*/
SELECT
TrainNumber = T.TrainNumber,
InitialJobNumber = T.InitialJobNumber,
MaxRecursionLevel = MAX(T.RecursionLevel)
FROM
TimeLapses AS T
GROUP BY
T.TrainNumber,
T.InitialJobNumber
),
ExpandedLapses AS
(
SELECT
TrainNumber = T.TrainNumber,
InitialJobNumber = M.InitialJobNumber,
IntervalStart = T.IntervalStart,
IntervalEnd = T.IntervalEnd,
DownTime = DATEDIFF(MINUTE, T.IntervalStart, T.IntervalEnd),
JobExtensionPath = T.JobExtensionPath,
RecursionLevel = T.RecursionLevel
FROM
MaxRecursionLevelByTrain AS M
INNER JOIN TimeLapses AS T ON
M.TrainNumber = T.TrainNumber AND
M.MaxRecursionLevel = T.RecursionLevel AND
M.InitialJobNumber = T.InitialJobNumber
)
SELECT
TrainNumber = E.TrainNumber,
TotalDownTime = SUM(DownTime)
FROM
ExpandedLapses AS E
GROUP BY
E.TrainNumber
And these are the partial results from each CTE, so you can see each step:
TimeLapses:
InitialJobNumber JobNumber TrainNumber IntervalStart IntervalEnd JobExtensionPath RecursionLevel
1 1 1 2018-08-01 12:35:00.000 2018-08-01 18:35:00.000 1 1
1 2 1 2018-08-01 12:35:00.000 2018-08-01 18:35:00.000 1->2 2
1 3 1 2018-08-01 12:35:00.000 2018-08-01 18:45:00.000 1->3 2
1 3 1 2018-08-01 12:35:00.000 2018-08-01 18:45:00.000 1->2->3 3
4 4 2 2018-08-01 12:00:00.000 2018-08-01 12:10:00.000 4 1
5 5 2 2018-08-01 12:15:00.000 2018-08-01 12:20:00.000 5 1
6 6 2 2018-08-01 13:15:00.000 2018-08-01 13:45:00.000 6 1
6 9 2 2018-08-01 13:15:00.000 2018-08-01 15:00:00.000 6->9 2
7 8 3 2018-08-01 23:30:00.000 2018-08-02 03:15:00.000 7->8 2
7 7 3 2018-08-01 23:30:00.000 2018-08-02 00:30:00.000 7 1
10 10 4 2018-08-01 23:00:00.000 2018-08-03 23:00:00.000 10 1
10 11 4 2018-08-01 23:00:00.000 2018-08-04 07:15:00.000 10->11 2
MaxRecursionLevelByTrain:
TrainNumber InitialJobNumber MaxRecursionLevel
1 1 3
2 4 1
2 5 1
2 6 2
3 7 2
4 10 2
ExpandedLapses:
TrainNumber InitialJobNumber IntervalStart IntervalEnd DownTime JobExtensionPath RecursionLevel
1 1 2018-08-01 12:35:00.000 2018-08-01 18:45:00.000 370 1->2->3 3
2 4 2018-08-01 12:00:00.000 2018-08-01 12:10:00.000 10 4 1
2 5 2018-08-01 12:15:00.000 2018-08-01 12:20:00.000 5 5 1
2 6 2018-08-01 13:15:00.000 2018-08-01 15:00:00.000 105 6->9 2
3 7 2018-08-01 23:30:00.000 2018-08-02 03:15:00.000 225 7->8 2
4 10 2018-08-01 23:00:00.000 2018-08-04 07:15:00.000 3375 10->11 2
Final Result:
TrainNumber TotalDownTime
1 370
2 120
3 225
4 3375
A few things worth mentioning:
While this solution will definitely be faster than using a cursor, it might not be the best one available, especially if you have a huge dataset (more than 100k records). There is room for improving performance.
You might benefit from an index on #tb_ServiceMemo (Train_Number, Job_Number, Work_Start_Date) to speed up the query.
You might need to add OPTION (MAXRECURSION N) at the end of the SELECT statement, N being the maximum recursion level you want to allow. The default is 100, so if more than 100 periods chain together for a particular train, an error will be raised. You can use 0 as N for unlimited (see the sketch after this list).
Make sure that every end time is later than the start time, and that the job numbers don't repeat, at least within each train.
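For example, a sketch of where the hint goes, appended to the final SELECT of the solution above:
SELECT
    TrainNumber = E.TrainNumber,
    TotalDownTime = SUM(DownTime)
FROM
    ExpandedLapses AS E
GROUP BY
    E.TrainNumber
OPTION (MAXRECURSION 0) -- 0 lifts the default limit of 100 recursion levels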
Can you try this one? I added another test case to be sure, but I think it's OK. I also think there may be a simpler way.
INSERT INTO [dbo].[tb_ServiceMemo]
SELECT 1, 1, CONVERT(DATETIME, '2018-08-01 09:35:00', 120), CONVERT(DATETIME, '2018-08-01 12:45:00', 120) union
SELECT 2, 1, CONVERT(DATETIME, '2018-08-01 12:35:00', 120), CONVERT(DATETIME, '2018-08-01 18:35:00', 120) union
SELECT 3, 1, CONVERT(DATETIME, '2018-08-01 14:20:00', 120), CONVERT(DATETIME, '2018-08-01 15:20:00', 120) union
SELECT 4, 1, CONVERT(DATETIME, '2018-08-01 18:00:00', 120), CONVERT(DATETIME, '2018-08-01 18:45:00', 120) union
SELECT 5, 1, CONVERT(DATETIME, '2018-08-01 19:00:00', 120), CONVERT(DATETIME, '2018-08-01 19:45:00', 120)
SELECT [Train_Number], SUM(DATEDIFF(MINUTE, T.[Work_Start_Date], T.Work_Completed_Date)) as Delay
FROM (
SELECT
[Job_Number],
[Train_Number],
CASE
WHEN EXISTS(SELECT * FROM [tb_ServiceMemo] T3 WHERE T1.[Work_Start_Date] BETWEEN T3.[Work_Start_Date] AND T3.[Work_Completed_Date] AND T1.[Job_Number] <> T3.[Job_Number] AND T1.Train_Number = T3.Train_Number)
THEN (SELECT MAX(T3.[Work_Completed_Date]) FROM [tb_ServiceMemo] T3 WHERE T1.[Work_Start_Date] BETWEEN T3.[Work_Start_Date] AND T3.[Work_Completed_Date] AND T1.[Job_Number] <> T3.[Job_Number] AND T1.Train_Number = T3.Train_Number)
ELSE [Work_Start_Date] END as [Work_Start_Date],
[Work_Completed_Date]
FROM [tb_ServiceMemo] T1
WHERE NOT EXISTS( -- To kick off the ignored case
SELECT T2.*
FROM [tb_ServiceMemo] T2
WHERE T2.[Work_Start_Date] < T1.[Work_Start_Date] AND T2.[Work_Completed_Date] > T1.[Work_Completed_Date]
)
) as T
GROUP BY [Train_Number]
The idea is to:
ignore any record that is entirely contained within another
rewrite the start date of each row if it is partially contained within another

Group time series by time intervals (e.g. days) with aggregate of duration

I have a table containing a time series with the following information. Each record represents the event of "changing the mode".
Timestamp | Mode
------------------+------
2018-01-01 12:00 | 1
2018-01-01 18:00 | 2
2018-01-02 01:00 | 1
2018-01-02 02:00 | 2
2018-01-04 04:00 | 1
By using the LEAD function, I can create a query with the following result. Now each record contains the information of when and for how long the mode was active.
Please check the 2nd and the 4th record. They "belong" to multiple days.
StartDT | EndDT | Mode | Duration
------------------+------------------+------+----------
2018-01-01 12:00 | 2018-01-01 18:00 | 1 | 6:00
2018-01-01 18:00 | 2018-01-02 01:00 | 2 | 7:00
2018-01-02 01:00 | 2018-01-02 02:00 | 1 | 1:00
2018-01-02 02:00 | 2018-01-04 04:00 | 2 | 50:00
2018-01-04 04:00 | (NULL) | 1 | (NULL)
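For reference, the LEAD query described above might look roughly like this (a sketch; the table name tblLog is borrowed from the final solution further down, and Duration comes out in whole minutes rather than the hh:mm formatting shown):
SELECT [Timestamp] AS StartDT,
       LEAD([Timestamp]) OVER (ORDER BY [Timestamp]) AS EndDT,
       Mode,
       DATEDIFF(MINUTE, [Timestamp],
                LEAD([Timestamp]) OVER (ORDER BY [Timestamp])) AS Duration
FROM tblLog;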
Now I would like to have a query that groups the data by day and mode and aggregates the duration.
This result table is needed:
Date | Mode | Total
------------+------+-------
2018-01-01 | 1 | 6:00
2018-01-01 | 2 | 6:00
2018-01-02 | 1 | 1:00
2018-01-02 | 2 | 23:00
2018-01-03 | 2 | 24:00
2018-01-04 | 2 | 04:00
I don't know how to handle the records that "belong" to multiple days. Any ideas?
create table ChangeMode ( ModeStart datetime2(7), Mode int )
insert into ChangeMode ( ModeStart, Mode ) values
( '2018-11-15T21:00:00.0000000', 1 ),
( '2018-11-16T17:18:19.1231234', 2 ),
( '2018-11-16T18:00:00.5555555', 1 ),
( '2018-11-16T18:00:01.1234567', 2 ),
( '2018-11-16T19:02:22.8888888', 1 ),
( '2018-11-16T20:00:00.9876543', 2 ),
( '2018-11-17T09:00:00.0000000', 1 ),
( '2018-11-17T23:23:23.0230450', 2 ),
( '2018-11-19T17:00:00.0172839', 1 ),
( '2018-11-20T03:07:00.7033077', 2 )
;
with
-- Determine the earliest and latest dates.
-- Cast to date to remove the time portion.
-- Cast results back to datetime because we're going to add hours later.
MinMaxDates
as
(select cast(min(cast(ModeStart as date))as datetime) as MinDate,
cast(max(cast(ModeStart as date))as datetime) as MaxDate from ChangeMode),
-- How many days have passed during that period
Dur
as
(select datediff(day,MinDate,MaxDate) as Duration from MinMaxDates),
-- Create a list of numbers.
-- These will be added to MinDate to get a list of dates.
NumList
as
( select 0 as Num
union all
select Num+1 from NumList,Dur where Num<Duration ),
-- Create a list of dates by adding those numbers to MinDate
DayList
as
( select dateadd(day,Num,MinDate)as ModeDate from NumList, MinMaxDates ),
-- Create a list of day periods
PeriodList
as
( select ModeDate as StartTime,
dateadd(day,1,ModeDate) as EndTime
from DayList ),
-- Use LEAD to get periods for each record
-- Final record would return NULL for ModeEnd
-- We replace that with end of last day
ModePeriodList
as
( select ModeStart,
coalesce( lead(ModeStart)over(order by ModeStart),
dateadd(day,1,MaxDate) ) as ModeEnd,
Mode from ChangeMode, MinMaxDates ),
ModeDayList
as
( select * from ModePeriodList, PeriodList
where ModeStart<=EndTime and ModeEnd>=StartTime
),
-- Keep the later of the mode start time, and the day start time
-- Keep the earlier of the mode end time, and the day end time
ModeDayPeriod
as
( select case when ModeStart>=StartTime then ModeStart else StartTime end as StartTime,
case when ModeEnd<=EndTime then ModeEnd else EndTime end as EndTime,
Mode from ModeDayList ),
SumDurations
as
( select cast(StartTime as date) as ModeDate,
Mode,
DateDiff_Big(nanosecond,StartTime,EndTime)
/3600000000000
as DurationHours from ModeDayPeriod )
-- List the results in order
-- Use MaxRecursion option in case there are more than 100 days
select ModeDate as [Date], Mode, sum(DurationHours) as [Total Duration Hours]
from SumDurations
group by ModeDate, Mode
order by ModeDate, Mode
option (maxrecursion 0)
Result is:
Date Mode Total Duration Hours
---------- ----------- ---------------------------------------
2018-11-15 1 3.00000000000000
2018-11-16 1 18.26605271947221
2018-11-16 2 5.73394728052777
2018-11-17 1 14.38972862361111
2018-11-17 2 9.61027137638888
2018-11-18 2 24.00000000000000
2018-11-19 1 6.99999519891666
2018-11-19 2 17.00000480108333
2018-11-20 1 3.11686202991666
2018-11-20 2 20.88313797008333
You could use a CTE to create a table of days, then join the time slots to it:
DECLARE @MAX as datetime2 = (SELECT MAX(CAST(Timestamp as date)) MX FROM process);
WITH StartEnd AS (select p1.Timestamp StartDT,
P2.Timestamp EndDT ,
p1.mode
from process p1
outer apply
(SELECT TOP 1 pOP.* FROM
process pOP
where pOP.Timestamp > p1.Timestamp
order by pOP.Timestamp asc) P2
),
CAL AS (SELECT (SELECT MIN(cast(StartDT as date)) MN FROM StartEnd) DT
UNION ALL
SELECT DATEADD(day,1,DT) DT FROM CAL WHERE CAL.DT < @MAX
),
TMS AS
(SELECT CASE WHEN S.StartDT > C.DT THEN S.StartDT ELSE C.DT END AS STP,
CASE WHEN S.EndDT < DATEADD(day,1,C.DT) THEN S.ENDDT ELSE DATEADD(day,1,C.DT) END AS STE
FROM StartEnd S JOIN CAL C ON NOT(S.EndDT <= C.DT OR S.StartDT>= DATEADD(day,1,C.dt))
)
SELECT *,datediff(MI ,TMS.STP, TMS.ste) as x from TMS
The following uses a recursive CTE to build a list of dates (a calendar or numbers table works equally well). It then intersects the dates with the date-time ranges so that missing dates are populated with matching data. The important bit is that, for each row, if the start datetime falls on a previous day it is clamped to 00:00 of the day in question, and likewise the end datetime is clamped to 00:00 of the following day.
DECLARE @t TABLE (timestamp DATETIME, mode INT);
INSERT INTO @t VALUES
('2018-01-01 12:00', 1),
('2018-01-01 18:00', 2),
('2018-01-02 01:00', 1),
('2018-01-02 02:00', 2),
('2018-01-04 04:00', 1);
WITH cte1 AS (
-- the min and max dates in your data
SELECT
CAST(MIN(timestamp) AS DATE) AS mindate,
CAST(MAX(timestamp) AS DATE) AS maxdate
FROM @t
), cte2 AS (
-- build all dates between min and max dates using recursive cte
SELECT mindate AS day_start, DATEADD(DAY, 1, mindate) AS day_end, maxdate
FROM cte1
UNION ALL
SELECT DATEADD(DAY, 1, day_start), DATEADD(DAY, 2, day_start), maxdate
FROM cte2
WHERE day_start < maxdate
), cte3 AS (
-- pull end datetime from next row into current
SELECT
timestamp AS dt_start,
LEAD(timestamp) OVER (ORDER BY timestamp) AS dt_end,
mode
FROM @t
), cte4 AS (
-- join datetime with date using date overlap query
-- then clamp start datetime to 00:00 of the date
-- and clamp end datetime to 00:00 of next date
SELECT
IIF(dt_start < day_start, day_start, dt_start) AS dt_start_fix,
IIF(dt_end > day_end, day_end, dt_end) AS dt_end_fix,
mode
FROM cte2
INNER JOIN cte3 ON day_end > dt_start AND dt_end > day_start
)
SELECT dt_start_fix, dt_end_fix, mode, datediff(minute, dt_start_fix, dt_end_fix) / 60.0 AS total
FROM cte4
DB Fiddle
Thanks everybody!
The answer from Cato put me on the right track. Here is my final solution:
DECLARE @Start AS datetime;
DECLARE @End AS datetime;
DECLARE @Interval AS int;
SET @Start = '2018-01-01';
SET @End = '2018-01-05';
SET @Interval = 24 * 60 * 60;
WITH
cteDurations AS
(SELECT [Timestamp] AS StartDT,
LEAD ([Timestamp]) OVER (ORDER BY [Timestamp]) AS EndDT,
Mode
FROM tblLog
WHERE [Timestamp] BETWEEN @Start AND @End
),
cteTimeslots AS
(SELECT @Start AS StartDT,
DATEADD(SECOND, @Interval, @Start) AS EndDT
UNION ALL
SELECT EndDT,
DATEADD(SECOND, @Interval, EndDT)
FROM cteTimeSlots WHERE StartDT < @End
),
cteDurationsPerTimesplot AS
(SELECT CASE WHEN S.StartDT > C.StartDT THEN S.StartDT ELSE C.StartDT END AS StartDT,
CASE WHEN S.EndDT < C.EndDT THEN S.EndDT ELSE C.EndDT END AS EndDT,
C.StartDT AS Slot,
S.Mode
FROM cteDurations S
JOIN cteTimeslots C ON NOT(S.EndDT <= C.StartDT OR S.StartDT >= C.EndDT)
)
SELECT Slot,
Mode,
SUM(DATEDIFF(SECOND, StartDT, EndDT)) AS Duration
FROM cteDurationsPerTimesplot
GROUP BY Slot, Mode
ORDER BY Slot, Mode;
With the variable @Interval you are able to define the size of the timeslots.
The CTE cteDurations creates a subresult with the durations of all the necessary entries by using the T-SQL function LEAD (available in SQL Server 2012 and later). This will be a lot faster than an OUTER APPLY.
The CTE cteTimeslots generates a list of timeslots with start time and end time.
The CTE cteDurationsPerTimesplot is a subresult with a JOIN between cteDurations and cteTimeslots. This is the magic JOIN statement from Cato!
And finally the SELECT statement will do the grouping and sum calculation per Slot and Mode.
Once again: Thanks a lot to everybody! Especially to Cato! You saved my weekend!
Regards
Oliver

SQL Showing Every Hour of Every Day

I wrote the below code to break out my data that shows patient arrival and departure by day, into patient census by hour of every day.
The code works but for every date, instead of adding one hour each for the hours 0-23, it adds a second line for 0, so it breaks every day into 25 lines instead of 24. I'm pretty sure the problem is somewhere in the Cross Apply below, but I included the rest of the code for your reference.
I'd really appreciate any help you can give. Also, if you have any tips on how to post code in here and have it look more normal, let me know. Thank you!
--Create my temporary table
SELECT *
INTO #Temporary
FROM dbo.Census
WHERE YEAR(startdatetime) >= 2018
ORDER BY
startdatetime
,pt_id
--Use the Cross Apply to split out every day into every hour
SELECT
Date = CAST(D AS DATE)
,Hour = DATEPART(HOUR, D)
,pt_id
,cendate
,locationid
,[room-bed]
,startdatetime
,enddatetime
,minutes
,DayOfWeek
,WeekInt
,MyStartMinutes = 0
,MyEndMinutes = 0
INTO #Temporary2
FROM #Temporary A
CROSS APPLY
(
SELECT TOP ( ABS(DATEDIFF(HOUR, A.startdatetime, A.enddatetime) + 1))
D = DATEADD(HOUR, -1 + ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL )), A.startdatetime)
FROM master..spt_values n1
,master..spt_values n2
) B
--Update values for MyStartMinutes and MyEndMinutes
UPDATE #Temporary2
SET MyStartMinutes = CASE WHEN ( DATEPART(HOUR, startdatetime) = Hour )
THEN DATEPART(mi, enddatetime)
ELSE 0 END
UPDATE #Temporary2
SET MyEndMinutes = CASE WHEN ( DATEPART(HOUR, enddatetime) = Hour )
AND DATEDIFF(DAY, enddatetime, cendate) = 0
THEN DATEPART(mi, enddatetime)
ELSE 0 END
--Update values of startdatetime and enddatetime
UPDATE #Temporary2
SET startdatetime = DATEADD(HOUR, Hour, DATEADD(MINUTE, MyStartMinutes, CAST(CAST(startdatetime AS DATE) AS DATETIME)))
UPDATE #Temporary2
SET enddatetime = CASE WHEN ( Hour < 23 )
THEN ( DATEADD(HOUR, Hour + 1, DATEADD(MINUTE, MyEndMinutes, CAST(CAST(startdatetime AS DATE) AS DATETIME))))
WHEN Hour = 23
THEN ( DATEADD(HOUR, 0, DATEADD(MINUTE, MyEndMinutes, CAST(CAST(enddatetime AS DATE) AS DATETIME))))
ELSE '' END
--Update Value of Minutes
UPDATE #Temporary2
SET Minutes = DATEDIFF(mi, startdatetime, enddatetime)
SELECT *
FROM #Temporary2
ORDER BY minutes DESC
Here is the sample data from dbo.Census:
org pt_id cendate location bed startdate enddate minutes DOW
A 5 1/8/2018 7E 50 1/8/2018 8:00 1/9/2018 0:00 960 Mon
A 5 1/9/2018 7E 50 1/9/2018 0:00 1/10/2018 0:00 1440 Tue
A 5 1/10/2018 7E 50 1/10/2018 0:00 1/11/2018 0:00 1440 Wed
A 5 1/11/2018 7E 50 1/11/2018 0:00 1/11/2018 14:00 840 Thu
A 1 10/17/2016 ED 10 10/17/2016 1:05 10/17/2016 10:21 556 Mon
A 2 5/10/2017 4L 20 5/10/2017 15:09 5/11/2017 0:00 531 Wed
A 3 5/14/2017 4L 30 5/14/2017 0:00 5/14/2017 8:12 492 Sun
A 4 6/3/2017 5C 40 6/3/2017 0:00 6/4/2017 0:00 1440 Sat
I think you're correct that your CROSS APPLY is the culprit here. After testing your code on my own sample data, I found that if there were separate records in dbo.Census that had overlapping days between their startdates and enddates, those dates and hours would get duplicated, depending on how many records and how many days they share.
So what I did was add the PK from dbo.Census into the CROSS APPLY, and then used that id column in the subquery to filter the results to only those where the ids matched. Here's the section of code I changed:
SELECT
Date = CAST(D AS DATE)
,Hour = DATEPART(HOUR, D)
,A.pt_id
,cendate
,locationid
,[room-bed]
,startdatetime
,enddatetime
,minutes
,DayOfWeek
,WeekInt
,MyStartMinutes = 0
,MyEndMinutes = 0
INTO #Temporary2
FROM #Temporary A
CROSS APPLY
(
SELECT TOP ( ABS(DATEDIFF(HOUR, A.startdatetime, A.enddatetime) + 1))
D = DATEADD(HOUR, -1 + ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL )), A.startdatetime)
,A.pt_id
FROM master..spt_values n1
,master..spt_values n2
) B
WHERE A.pt_id = B.pt_id
I made the assumption that pt_id is the primary key of dbo.Census. If that's not the case, you would just replace pt_id with the PK from dbo.Census.

SQL how to count census points occurring between date records

I'm using MS SQL Server 2008 R2, trying to write a script that calculates the Number of Hospital Beds occupied on any given day, at 2 census points: midnight and 09:00.
I’m working from a data set of patient Ward Stays. Basically, each row in the table is a record of an individual patient's stay on a single ward, and records the date/time the patient is admitted onto the ward, and the date/time the patient leaves the ward.
A sample of this table is below:
Ward_Stay_Primary_Key | Ward_Start_Date_Time | Ward_End_Date_Time
1 | 2017-09-03 15:04:00.000 | 2017-09-27 16:55:00.000
2 | 2017-09-04 18:08:00.000 | 2017-09-06 18:00:00.000
3 | 2017-09-04 13:00:00.000 | 2017-09-04 22:00:00.000
4 | 2017-09-04 20:54:00.000 | 2017-09-08 14:30:00.000
5 | 2017-09-04 20:52:00.000 | 2017-09-13 11:50:00.000
6 | 2017-09-05 13:32:00.000 | 2017-09-11 14:49:00.000
7 | 2017-09-05 13:17:00.000 | 2017-09-12 21:00:00.000
8 | 2017-09-05 23:11:00.000 | 2017-09-06 17:38:00.000
9 | 2017-09-05 11:35:00.000 | 2017-09-14 16:12:00.000
10 | 2017-09-05 14:05:00.000 | 2017-09-11 16:30:00.000
The key thing to note here is that a patient’s Ward Stay can span any length of time, from a few hours to many days.
The following code enables me to calculate the number of beds at both census points for any given day, by specifying the date in the case statement:
SELECT
'05/09/2017' [Date]
,SUM(case when Ward_Start_Date_Time <= '05/09/2017 00:00:00.000' AND (Ward_End_Date_Time >= '05/09/2017 00:00:00.000' OR Ward_End_Date_Time IS NULL)then 1 else 0 end)[No. Beds Occupied at 00:00]
,SUM(case when Ward_Start_Date_Time <= '05/09/2017 09:00:00.000' AND (Ward_End_Date_Time >= '05/09/2017 09:00:00.000' OR Ward_End_Date_Time IS NULL)then 1 else 0 end)[No. Beds Occupied at 09:00]
FROM
WardStaysTable
And, based on the sample 10 records above, generates this output:
Date | No. Beds Occupied at 00:00 | No. Beds Occupied at 09:00
05/09/2017 | 4 | 4
To perform this for any number of days is obviously onerous. So what I'm looking to create is a query where I can specify start/end date parameters (e.g. 1st-5th Sept), and have the query evaluate the Ward_Start_Date_Time and Ward_End_Date_Time values for each record and, grouping by the dates defined by the parameters, count each time the 00:00:00.000 and 09:00:00.000 census points fall between these 2 values, to give an output something along these lines (based on the above 10 records):
Date | No. Beds Occupied at 00:00 | No. Beds Occupied at 09:00
01/09/2017 | 0 | 0
02/09/2017 | 0 | 0
03/09/2017 | 0 | 0
04/09/2017 | 1 | 1
05/09/2017 | 4 | 4
I’ve approached this (perhaps naively) thinking that if I use a cte to create a table of dates (defined by the input parameters), along with associated midnight and 9am census date/time points, then I could use these variables to group and evaluate the dataset.
So, this code generates the grouping dates and census date/time points:
DECLARE
    @StartDate DATE = '01/09/2017'
   ,@EndDate DATE = '05/09/2017'
   ,@0900 INT = 540

SELECT
    DATEADD(DAY, nbr - 1, @StartDate) [Date]
   ,CONVERT(DATETIME, (DATEADD(DAY, nbr - 1, @StartDate))) [MidnightDate]
   ,DATEADD(mi, @0900, (CONVERT(DATETIME, (DATEADD(DAY, nbr - 1, @StartDate))))) [0900Date]
FROM
(
    SELECT
        ROW_NUMBER() OVER ( ORDER BY c.object_id ) AS nbr
    FROM sys.columns c
) nbrs
WHERE nbr - 1 <= DATEDIFF(DAY, @StartDate, @EndDate)
The stumbling block I’ve hit is how to join the cte to the WardStays dataset, because there’s no appropriate key… I’ve tried a few iterations of using a subquery to make this work, but either I’m taking the wrong approach or I’m getting my syntax in a mess.
In simple terms, the logic I’m trying to create to get the output is something like:
SELECT
    [Date]
   ,SUM(case when WST.Ward_Start_Date_Time <= [MidnightDate] AND (WST.Ward_End_Date_Time >= [MidnightDate] OR WST.Ward_End_Date_Time IS NULL) then 1 else 0 end) [No. Beds Occupied at 00:00]
   ,SUM(case when WST.Ward_Start_Date_Time <= [0900Date] AND (WST.Ward_End_Date_Time >= [0900Date] OR WST.Ward_End_Date_Time IS NULL) then 1 else 0 end) [No. Beds Occupied at 09:00]
FROM WardStaysTable WST
GROUP BY [Date]
Is the above somehow possible, or am I barking up the wrong tree and need to take a different approach altogether? Appreciate any advice.
I would expect something like this:
WITH dates as (
SELECT CAST(@StartDate as DATETIME) as dte
UNION ALL
SELECT DATEADD(DAY, 1, dte)
FROM dates
WHERE dte < @EndDate
)
SELECT dates.dte [Date],
SUM(CASE WHEN Ward_Start_Date_Time <= dte AND
Ward_END_Date_Time >= dte
THEN 1 ELSE 0
END) as num_beds_0000,
SUM(CASE WHEN Ward_Start_Date_Time <= dte + CAST('09:00' as DATETIME) AND
Ward_END_Date_Time >= dte + CAST('09:00' as DATETIME)
THEN 1 ELSE 0
END) as num_beds_0900
FROM dates LEFT JOIN
WardStaysTable wt
ON wt.Ward_Start_Date_Time <= DATEADD(day, 1, dates.dte) AND
wt.Ward_END_Date_Time >= dates.dte
GROUP BY dates.dte
ORDER BY dates.dte;
The cte is just creating the list of dates.
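One caveat worth noting: a recursive CTE is capped at 100 recursion levels by default, so for ranges longer than roughly 100 days the statement needs a MAXRECURSION hint. A minimal sketch of the dates CTE on its own with the limit lifted (the five-month range here is just an example):
DECLARE @StartDate DATE = '2017-09-01', @EndDate DATE = '2018-01-31';

WITH dates as (
      SELECT CAST(@StartDate as DATETIME) as dte
      UNION ALL
      SELECT DATEADD(DAY, 1, dte)
      FROM dates
      WHERE dte < @EndDate
)
SELECT dte
FROM dates
OPTION (MAXRECURSION 0); -- allow more than the default 100 iterations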
What a cool exercise. Here is what I came up with:
CREATE TABLE #tmp (ID int, StartDte datetime, EndDte datetime)
INSERT INTO #tmp values(1,'2017-09-03 15:04:00.000','2017-09-27 06:55:00.000')
INSERT INTO #tmp values(2,'2017-09-04 08:08:00.000','2017-09-06 18:00:00.000')
INSERT INTO #tmp values(3,'2017-09-04 13:00:00.000','2017-09-04 22:00:00.000')
INSERT INTO #tmp values(4,'2017-09-04 20:54:00.000','2017-09-08 14:30:00.000')
INSERT INTO #tmp values(5,'2017-09-04 20:52:00.000','2017-09-13 11:50:00.000')
INSERT INTO #tmp values(6,'2017-09-05 13:32:00.000','2017-09-11 14:49:00.000')
INSERT INTO #tmp values(7,'2017-09-05 13:17:00.000','2017-09-12 21:00:00.000')
INSERT INTO #tmp values(8,'2017-09-05 23:11:00.000','2017-09-06 07:38:00.000')
INSERT INTO #tmp values(9,'2017-09-05 11:35:00.000','2017-09-14 16:12:00.000')
INSERT INTO #tmp values(10,'2017-09-05 14:05:00.000','2017-09-11 16:30:00.000')
DECLARE
@StartDate DATE = '09/01/2017'
,@EndDate DATE = '10/01/2017'
, @nHours INT = 9
;WITH d(OrderDate) AS
(
SELECT DATEADD(DAY, n-1, @StartDate)
FROM (SELECT TOP (DATEDIFF(DAY, @StartDate, @EndDate) + 1)
ROW_NUMBER() OVER (ORDER BY [object_id]) FROM sys.all_objects) AS x(n)
)
, CTE AS(
select OrderDate, t2.*
from #tmp t2
cross apply(select orderdate from d ) d
where StartDte >= @StartDate and EndDte <= @EndDate)
select OrderDate,
SUM(CASE WHEN OrderDate >= StartDte and OrderDate <= EndDte THEN 1 ELSE 0 END) [No. Beds Occupied at 00:00],
SUM(CASE WHEN StartDTE <= DateAdd(hour,@nHours,CAST(OrderDate as datetime)) and DateAdd(hour,@nHours,CAST(OrderDate as datetime)) <= EndDte THEN 1 ELSE 0 END) [No. Beds Occupied at 09:00]
from CTE
GROUP BY OrderDate
This should allow you to check for any hour of the day using the @nHours parameter if you so choose. If you only want to see records that actually fall within your date range then you can filter the cross apply on start and end dates.

Breaking out yearly payments into monthly payments with month name in a 3 year period

I was wondering where to go from my initial idea. I used the query below to get the month beginning dates for each of the three years:
DECLARE @STARTDATE DATETIME,
        @ENDDATE DATETIME;
SELECT @STARTDATE='2013-01-01 00:00:00.000',
       @ENDDATE='2015-12-31 00:00:00.000';
WITH [3YearDateMonth]
AS
(
SELECT TOP (DATEDIFF(mm,@STARTDATE,@ENDDATE) + 1)
MonthDate = (DATEADD(mm,DATEDIFF(mm,0,@STARTDATE) + (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1),0))
FROM sys.all_columns ac1
)
SELECT MonthDate
FROM [3YearDateMonth]
I am not sure if I should apply DATENAME(MONTH, MonthDate) later for the month names or just do it in the CTE; any suggestions would be great.
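For illustration, a sketch of the second option, leaving the CTE unchanged and resolving the name in the outer SELECT:
DECLARE @STARTDATE DATETIME,
        @ENDDATE DATETIME;
SELECT @STARTDATE = '2013-01-01 00:00:00.000',
       @ENDDATE = '2015-12-31 00:00:00.000';

WITH [3YearDateMonth]
AS
(
    SELECT TOP (DATEDIFF(mm, @STARTDATE, @ENDDATE) + 1)
        MonthDate = DATEADD(mm, DATEDIFF(mm, 0, @STARTDATE) + (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1), 0)
    FROM sys.all_columns ac1
)
SELECT MonthDate,
       DATENAME(MONTH, MonthDate) AS MonthName -- month name resolved outside the CTE
FROM [3YearDateMonth]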
My data looks like this:
BeginDate EndDate Payment
2013-01-01 00:00:00.000 2013-12-31 00:00:00.000 3207.70
2014-01-01 00:00:00.000 2014-12-31 00:00:00.000 3303.93
2015-01-01 00:00:00.000 2015-12-31 00:00:00.000 3403.05
Since the payment is yearly I can use payment/12 to get an average monthly amount. I want my data to look like this:
BeginDate EndDate Month MonthlyAmount
2013-01-01 00:00:00.000 2013-01-31 00:00:00.000 January 267.3083
2013-02-01 00:00:00.000 2013-02-28 00:00:00.000 February 267.3083
...
2014-01-01 00:00:00.000 2014-01-31 00:00:00.000 January 275.3275
2014-02-01 00:00:00.000 2014-02-28 00:00:00.000 February 275.3275
...
2015-01-01 00:00:00.000 2015-01-31 00:00:00.000 January 283.5875
2015-02-01 00:00:00.000 2015-02-28 00:00:00.000 February 283.5875
All the way through December for each yearly pay period.
I will be pivoting the Month column later to put the monthly amounts under the corresponding month they belong in.
Is this even doable? I feel lost at this point.
Starting with your three data rows, you can use the following query to get your desired results:
with months as
(
select BeginDate
, EndDate
, Payment = Payment / 12.0
from MyTable
union all
select BeginDate = dateadd(mm, 1, BeginDate)
, EndDate
, Payment
from months
where dateadd(mm, 1, BeginDate) < EndDate
)
select BeginDate
, EndDate = dateadd(dd, -1, dateadd(mm, 1, BeginDate))
, Month = datename(mm, BeginDate)
, MonthlyAmount = Payment
from months
order by BeginDate
SQL Fiddle with demo.
Here's a query for you:
WITH L1 (N) AS (SELECT 1 UNION ALL SELECT 1),
L2 (N) AS (SELECT 1 FROM L1, L1 B),
L3 (N) AS (SELECT 1 FROM L2, L2 B),
Num (N) AS (SELECT Row_Number() OVER (ORDER BY (SELECT 1)) FROM L3)
SELECT
P.BeginDate,
P.EndDate,
M.MonthlyPayDate,
MonthlyAmount =
CASE
WHEN N.N = C.MonthCount
THEN P.Payment - Round(P.Payment / C.MonthCount, 2) * (C.MonthCount - 1)
ELSE Round(P.Payment / C.MonthCount, 2)
END
FROM
dbo.Payment P
CROSS APPLY (
SELECT DateDiff(month, BeginDate, EndDate) + 1
) C (MonthCount)
INNER JOIN Num N
ON C.MonthCount >= N.N
CROSS APPLY (
SELECT DateAdd(month, N.N - 1, BeginDate)
) M (MonthlyPayDate)
ORDER BY
P.BeginDate,
M.MonthlyPayDate
;
See a Live Demo at SQL Fiddle
Pluses:
Doesn't assume 12 months--it will work with any date range.
Properly rounds all non-final months, then assigns the remainder to the last month so that the total sum is accurate. For example, for 2013 the regular monthly payment is 267.31, but December's payment is 267.29 (a quick arithmetic check follows this list).
Minuses:
Assumes all dates entirely enclose full months, starting on the 1st and ending on the last day of the month.
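As a quick check of the rounding behaviour described under Pluses (plain arithmetic, independent of the query):
SELECT RegularMonth = ROUND(3207.70 / 12, 2),                 -- 267.31
       FinalMonth   = 3207.70 - ROUND(3207.70 / 12, 2) * 11;  -- 267.29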
If you provide more detail about further requirements regarding pro-rating, I can improve the query for you.