Creating a status log from rows of datetimes of status changes - sql

I'm pulling down some data from a remote API to a local SQL Server table, which is formatted like so. (imagine it's sorted by StatusDT descending)
DriverID StatusDT Status
-------- -------- ------
b103 2019-03-05 05:42:52.000 D
b103 2019-03-03 23:45:42.000 SB
b103 2019-03-03 21:49:41.000 ON
What would be the best way to eventually get to a point where I can return a query showing the total amount of time spent in each status on each day for each driver?
Also, it's possible that there could be gaps of a whole day or more between status updates, in which case I'd need a row showing a continuation of the previous status from 00:00:00 to 23:59:59 for each skipped day. So, if I'm looping through this table to populate another with the structure below, the example above would need to wind up looking like this... (again, sorted descending by date)
DriverID StartDT EndDT Status
-------- --------------- -------------- ------
b103 2019-03-05 05:42:52 D
b103 2019-03-05 00:00:00 2019-03-05 05:42:51 SB
b103 2019-03-04 00:00:00 2019-03-04 23:59:59 SB
b103 2019-03-03 23:45:42 2019-03-03 23:59:59 SB
b103 2019-03-03 21:49:41 2019-03-03 23:45:41 ON
Does that make sense?
I wound up dumping the API data to a "work" table and running a cursor over it to add rows to another table, with the starting and ending date/time, but I'm curious if there's another way that might be more efficient.
Thanks very much.

I think this query is what you need. I couldn't test it, however, for syntax errors:
with x as (
  select
    DriverID,
    StatusDT as StartDT,
    lead(StatusDT) over(partition by DriverID order by StatusDT) as EndDT,
    Status
  from my_table
)
select -- start & end on the same day
  DriverID, StartDT, EndDT, Status
from x
where convert(date, StartDT) = convert(date, EndDT)
   or EndDT is null
union all
select -- start & end on different days; first day up to midnight
  DriverID, StartDT,
  -- cast back to datetime: dateadd(ms, ...) is not supported on the date type
  dateadd(ms, -3, convert(datetime, convert(date, EndDT))) as EndDT,
  Status
from x
where convert(date, StartDT) <> convert(date, EndDT)
  and EndDT is not null
union all
select -- start & end on different days; next day from midnight
  DriverID, convert(date, EndDT) as StartDT, EndDT, Status
from x
where convert(date, StartDT) <> convert(date, EndDT)
  and EndDT is not null
order by StartDT desc

Most of your answer is just using lead():
select driverid, status, statusdt,
lead(statusdt) over (partition by driverid order by statusdt) as enddte
from t;
This does not give the breaks by day, but you can add those. I think the easiest way is to add in the dates (using a recursive CTE) and compute the status at that time. I would do the following:
use a recursive CTE to calculate the dates
"fill in" the statuses and union to the original table
use lead() to get the end date
This looks like:
with day_boundaries as (
      select driverid, dateadd(day, 1, cast(min(statusdt) as date)) as statusdt, max(statusdt) as finaldt
      from t
      group by driverid
      having datediff(day, min(statusdt), max(statusdt)) > 0
      union all
      select driverid, dateadd(day, 1, statusdt), finaldt
      from day_boundaries
      where dateadd(day, 1, statusdt) < finaldt  -- stop at the last midnight before the final status
     ),
     unioned as (
      select driverid, status, statusdt
      from t
      union all
      select db.driverid, s.status, db.statusdt
      from day_boundaries db cross apply
           (select top (1) status
            from t
            where t.driverid = db.driverid and  -- only look at this driver's rows
                  t.statusdt < db.statusdt
            order by t.statusdt desc
           ) s
     )
select driverid, status, statusdt,
       lead(statusdt) over (partition by driverid order by statusdt) as enddte
from unioned;
Note that this does not subtract any seconds from the end date. The end date matches the previous start date. Time is continuous. It makes no sense to have gaps for records that should snugly fit together.
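To sanity-check the output of either query, the same splitting logic can be sketched outside the database. The Python below is only an illustration, not part of either answer: the function name split_by_day and the end_of_log cutoff are hypothetical (the question leaves the last status open-ended, so some cutoff has to be assumed). Intervals are half-open, so each row's end equals the next row's start, as argued above.

```python
from datetime import datetime, timedelta

def split_by_day(changes, end_of_log):
    """Turn (status_dt, status) changes for ONE driver into per-day rows.

    Each row is (start, end, status) with a half-open interval [start, end).
    end_of_log is an assumed cutoff that closes the final, open-ended status.
    """
    changes = sorted(changes)
    rows = []
    for i, (start, status) in enumerate(changes):
        # The status lasts until the next change (or the assumed cutoff).
        end = changes[i + 1][0] if i + 1 < len(changes) else end_of_log
        while start < end:
            midnight = datetime.combine(start.date(), datetime.min.time())
            piece_end = min(end, midnight + timedelta(days=1))
            rows.append((start, piece_end, status))
            start = piece_end
    return rows

changes = [
    (datetime(2019, 3, 3, 21, 49, 41), "ON"),
    (datetime(2019, 3, 3, 23, 45, 42), "SB"),
    (datetime(2019, 3, 5, 5, 42, 52), "D"),
]
rows = split_by_day(changes, end_of_log=datetime(2019, 3, 5, 12, 0, 0))

# Total time per (day, status), which is what the question ultimately asks for.
totals = {}
for start, end, status in rows:
    key = (start.date(), status)
    totals[key] = totals.get(key, timedelta()) + (end - start)

for row in rows:
    print(row)
```

With the sample data from the question, the skipped day (2019-03-04) shows up as one full-day SB row, and totals holds the per-day, per-status durations.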


Get Start and End date from multiple rows of dates, excluding weekends

I'm trying to figure out how to return the Start Date and End Date based on data like in the table below:
Date From
Date To
The result I am after would show like this:
Date From
Date To
The dates in first table will sometimes go over weekends with the Date From and Date To, but in cases where the row ends on a Friday and next row starts on following Monday it will need to be classified as the same "block", as presented in the second table. I was hoping to use DATEFIRST setting to cater for the weekends to avoid using a calendar table, as per How do I exclude Weekend days in a SQL Server query?, but if calendar table ends up being the easiest way out I'm happy to look into creating one.
In above example I only have 1 Name, but the table will have multiple names and it will need to be grouped by that.
The only examples of this I am seeing are using only 1 date column for records and I struggled changing their code around to cater for my example. The closest example I found doesn't work for me as it is based on datetime fields and the time differences - find start and stop date for contiguous dates in multiple rows
This is a gaps-and-islands problem, with the twist that you need to consider weekend continuity.
You can do:
select max(name) as name, min(date_from) as date_from, max(date_to) as date_to
from (
select *, sum(inc) over(order by date_to) as grp
from (
select *,
case when lag(ext_to) over(order by date_to) = date_from
then 0 else 1 end as inc
from (
select *,
case when (datepart(weekday, date_to) = 6)
then dateadd(day, 3, date_to)
else dateadd(day, 1, date_to) end as ext_to
from t
) x
) y
) z
group by grp
name date_from date_to
---- ---------- ----------
A 2021-11-08 2021-11-09
A 2021-12-23 2022-01-03
See running example at db<>fiddle #1.
Note: Your question doesn't mention it, but you probably want to segment per person. I didn't do it.
EDIT: Adding partition by name
Partitioning by name is quite easy actually. The following query does it:
select name, min(date_from) as date_from, max(date_to) as date_to
from (
select *, sum(inc) over(partition by name order by date_to) as grp
from (
select *,
case when lag(ext_to) over(partition by name order by date_to) = date_from
then 0 else 1 end as inc
from (
select *,
case when (datepart(weekday, date_to) = 6)
then dateadd(day, 3, date_to)
else dateadd(day, 1, date_to) end as ext_to
from t
) x
) y
) z
group by name, grp
order by name, grp
See running query at db<>fiddle #2.
with extended as (
    select name, date_from,
        -- a range ending on Friday is stretched to Sunday so Monday counts as adjacent
        case when datepart(weekday, date_to) = 6
            then dateadd(day, 2, date_to) else date_to end as date_to
    from t
), adjacent as (
    select *,
        case when dateadd(day, 1,
                lag(date_to) over (partition by name order by date_from)) = date_from
            then 0 else 1 end as brk
    from extended
), blocked as (
    select *, sum(brk) over (partition by name order by date_from) as grp
    from adjacent
)
select name, min(date_from), max(date_to) from blocked
group by name, grp;
I'm assuming that ranges do not overlap and that all input dates fall on weekdays. While hammering this out on my cellphone I originally made two mistakes. For some reason I got the to and from dates reversed in my head, and then I was thinking that Friday is 5 (as with @@datefirst) rather than 6. (Of course this could otherwise vary with the regional setting anyway.) One advantage of using table expressions is to modularize and bury certain details in lower levels of the logic. In this case it would be very easy to adjust dates should some of these assumptions prove to be wrong.
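For comparison, here is a small Python sketch of the same weekend-aware merging, under the same assumptions (ranges do not overlap, all dates fall on weekdays). It is not taken from any of the answers. The input table did not survive in the question, so the sample rows are made up to be consistent with the result shown earlier, and the helper names are hypothetical. Using Python's weekday() sidesteps the DATEFIRST dependency, since Friday is always 4 there.

```python
from datetime import date, timedelta

def next_working_day(d):
    """Day after d, jumping over the weekend when d is a Friday."""
    return d + timedelta(days=3 if d.weekday() == 4 else 1)

def merge_blocks(rows):
    """rows: iterable of (name, date_from, date_to); returns merged blocks."""
    blocks = []
    for name, date_from, date_to in sorted(rows):
        last = blocks[-1] if blocks else None
        if last and last[0] == name and next_working_day(last[2]) == date_from:
            last[2] = date_to          # extends the current block
        else:
            blocks.append([name, date_from, date_to])
    return [tuple(b) for b in blocks]

# Made-up sample rows (the question's input table is not available).
sample = [
    ("A", date(2021, 11, 8), date(2021, 11, 8)),
    ("A", date(2021, 11, 9), date(2021, 11, 9)),
    ("A", date(2021, 12, 23), date(2021, 12, 24)),   # ends on a Friday
    ("A", date(2021, 12, 27), date(2021, 12, 31)),   # Monday to Friday
    ("A", date(2022, 1, 3), date(2022, 1, 3)),       # Monday
]
merged = merge_blocks(sample)
print(merged)
```

Unlike the "extended" CTE, this keeps the real date_to of each block, so a block that ends on a Friday is reported as ending on that Friday rather than on the following Sunday.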

Collapse multiple rows based on time values

I'm trying to collapse rows with consecutive timeline within the same day into one row but having an issue because of gap in time. For example, my dataset looks like this.
Date StartTime EndTime ID
2017-12-1 09:00:00 11:00:00 12345
2017-12-1 11:00:00 13:00:00 12345
2018-09-08 09:00:00 10:00:00 78465
2018-09-08 10:00:00 12:00:00 78465
2018-09-08 15:00:00 16:00:00 78465
2018-09-08 16:00:00 18:00:00 78465
As you can see, the first two rows can simply be combined because there's no time gap within that day. However, for the entries on 2018-09-08, there is a gap between 12:00 and 15:00, and I'd like to merge these four records into two rows like this:
Date StartTime EndTime ID
2017-12-1 09:00:00 13:00:00 12345
2018-09-08 09:00:00 12:00:00 78465
2018-09-08 15:00:00 18:00:00 78465
In other words, I want to collapse rows only when the times are consecutive within the same day for the same ID.
Could anyone please help me with this? I tried to generate unique group using LAG and LEAD functions but it didn't work.
You can use a recursive CTE. Rows belong to the same group if one row's EndTime equals the next row's StartTime. Then take the MIN() and MAX() per group.
with cte as
(
    select rn = row_number() over (partition by [ID], [Date] order by [StartTime]),
           *
    from tbl
),
rcte as
(
    -- anchor member
    select rn, [ID], [Date], [StartTime], [EndTime], grp = 1
    from cte
    where rn = 1
    union all
    -- recursive member
    select c.rn, c.[ID], c.[Date], c.[StartTime], c.[EndTime],
           grp = case when r.[EndTime] = c.[StartTime]
                      then r.grp
                      else r.grp + 1
                 end
    from rcte r
    inner join cte c on r.[ID] = c.[ID]
                    and r.[Date] = c.[Date]
                    and r.rn = c.rn - 1
)
select [ID], [Date],
       min([StartTime]) as StartTime,
       max([EndTime]) as EndTime
from rcte
group by [ID], [Date], grp
db<>fiddle demo
Unless you have a particular objection to collapsing non-consecutive rows, which are consecutive for that ID, you can just use GROUP BY:
SELECT ID, Date,
StartTime = MIN(StartTime),
EndTime = MAX(EndTime)
FROM table
GROUP BY ID, Date
Otherwise you can use a solution based on ROW_NUMBER:
FROM table
) t
WHERE rn = 1
This is an example of a gaps-and-islands problem -- actually a pretty simple example. The idea is to assign an "island" grouping to each row specifying that they should be combined because they overlap. Then aggregate.
How do you assign the island? In this case, look at the previous endtime and if it is different from the starttime, then the row starts a new island. Voila! A cumulative sum of the start flag identifies each island.
select id, date, min(starttime), max(endtime)
from (select t.*,
sum(case when prev_endtime = starttime then 0 else 1 end) over (partition by id, date order by starttime) as grp
from (select t.*,
lag(endtime) over (partition by id, date order by starttime) as prev_endtime
from t
) t
) t
group by id, date, grp;
Here is a db<>fiddle.
Note: This assumes that the time periods never span multiple days. The code can be very easily modified to handle that . . . but with a caveat. The start and end times should be stored as datetime (or a related timestamp) rather than separating the date and times into different columns. Why? SQL Server doesn't support '24:00:00' as a valid time.
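The lag-plus-cumulative-sum idea can be traced step by step in a few lines of Python. This is only a sketch of the technique, not a replacement for the query: the function name collapse is hypothetical, and times are kept as zero-padded 'HH:MM:SS' strings so that string comparison orders them correctly.

```python
def collapse(rows):
    """rows: (date, start_time, end_time, id) tuples; returns one row per island."""
    ordered = sorted(rows, key=lambda r: (r[3], r[0], r[1]))
    islands = {}                       # (id, date, grp) -> [min start, max end]
    prev_key, prev_end, grp = None, None, 0
    for day, start, end, id_ in ordered:
        if (id_, day) != prev_key:     # new partition: restart the running sum
            grp, prev_end = 0, None
        if prev_end != start:          # start flag is 1, so the island number grows
            grp += 1
        span = islands.setdefault((id_, day, grp), [start, end])
        span[0] = min(span[0], start)
        span[1] = max(span[1], end)
        prev_key, prev_end = (id_, day), end
    return [(day, s, e, id_) for (id_, day, _), (s, e) in islands.items()]

data = [
    ("2017-12-1", "09:00:00", "11:00:00", 12345),
    ("2017-12-1", "11:00:00", "13:00:00", 12345),
    ("2018-09-08", "09:00:00", "10:00:00", 78465),
    ("2018-09-08", "10:00:00", "12:00:00", 78465),
    ("2018-09-08", "15:00:00", "16:00:00", 78465),
    ("2018-09-08", "16:00:00", "18:00:00", 78465),
]
result = collapse(data)
for row in result:
    print(row)
```

Here prev_end plays the role of lag(endtime), and grp is the running sum of the start flag within each (id, date) partition.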

Sql split entries into two entries if EndDateTime is the next day(after midnight)

I have this query:
select
    sth.Id,
    StartDateTime = sth.CreatedDateTime,
    EndDateTime = Lead(sth.CreatedDateTime, 1) over (partition by sth.Id order by sth.Id, sth.CreatedDateTime)
from
    Sth as sth
order by
    sth.Id, sth.CreatedDateTime
Which returns these results:
Id StartDateTime EndDate
2746743 2019-11-20 14:35:05.5841266 NULL
2746744 2019-11-20 14:35:05.5841266 NULL
3 2018-06-25 23:35:12.2799952 2018-06-26 09:57:27.8943163
13 2018-06-26 09:57:27.8943163 2018-06-26 10:41:19.2973307
I have been asked to update the above query to split the row with Id=3 into two rows.
Meaning: as you can see, the record with Id 3 starts at 23:35 and ends the **next day** at 09:57.
What I need is to split this record into two.
The first one should be from 23:35 -> 23:59
And the one below should be from 00:00 -> 09:57
If a record spans more than one day, nothing needs to be done. Also, the final solution should work on a history table with more than 3 million rows.
So the record should result in something like this:
Id StartDateTime EndDateTime
3 2018-06-25 23:35:12.2799952 2018-06-25 23:59:59.000000
3 2018-06-26 00:00:00.0000000 2018-06-26 09:57:27.8943163
I hope this makes sense!
All other records will yield similar results. Some records do not need to be split.
The result set in your question cannot be a result from the query you have specified (every id would have a null value for the end date). So, I am interpreting the question as handling the situation where the end date is present and one day after the start date.
I would just use a lateral join:
with t as (
      select sth.*, CreatedDateTime as StartDateTime,
             Lead(sth.CreatedDateTime, 1) over (partition by sth.Id order by sth.Id, sth.CreatedDateTime) as EndDateTime
      from Sth as sth
     )
select t.id, v.*
from t cross apply
     (values (startdatetime,
              (case when datediff(day, startdatetime, enddatetime) = 1
                    then dateadd(second, -1, dateadd(day, 1, convert(datetime, convert(date, startdatetime))))
                    else enddatetime
               end)
             ),
             (dateadd(day, 1, convert(date, startdatetime)),
              (case when datediff(day, startdatetime, enddatetime) = 1
                    then enddatetime
               end)
             )
     ) v(startdatetime, enddatetime)
where v.enddatetime is not null;
Here is a db<>fiddle.

SQL calculates total down time minutes

I'm working on a down time management system that is capable of saving support tickets for problems in a database, my database has the following columns:
I want to obtain the sum of minutes in a day, taking into account that the tickets can be simultaneous, for example:
ID | DateOpen | DateClosed | Total
1 2019-04-01 08:00:00 AM 2019-04-01 08:45:00 45
2 2019-04-01 08:10:00 AM 2019-04-01 08:20:00 10
3 2019-04-01 09:06:00 AM 2019-04-01 09:07:00 1
4 2019-04-01 09:06:00 AM 2019-04-01 09:41:00 33
Can someone help me with that, please?
If I use SUM, it returns 89, but if you look at the dates, you will see that the actual result must be 78, because tickets 2 and 3 were opened while another ticket was still in progress.
DECLARE @DateOpen date = '2019-04-01'
SELECT AlarmID, DateOpen, DateClosed, TDT FROM AlarmHistory
WHERE CONVERT(date,DateOpen) = @DateOpen
What you need to do is generate a sequence of integers and use that to generate times of the day. Join that sequence of times on between your open and close dates, then count the number of distinct times.
Here is an example that will work with MySQL:
SET @row_num = 0;
SELECT COUNT(DISTINCT time_stamp) AS total_minutes
-- this simulates your dateopen and dateclosed table
FROM (SELECT '2019-04-01 08:00:00' open_time, '2019-04-01 08:45:00' close_time
      UNION SELECT '2019-04-01 08:10:00', '2019-04-01 08:20:00'
      UNION SELECT '2019-04-01 09:06:00', '2019-04-01 09:07:00'
      UNION SELECT '2019-04-01 09:06:00', '2019-04-01 09:41:00') times_used
JOIN (
    -- generate sequence of minutes in day
    SELECT TIME(sequence*100) time_stamp
    FROM (
        -- create sequence 1 - 10000
        SELECT (@row_num:=@row_num + 1) AS sequence
        FROM {table_with_10k+_records}
        LIMIT 10000
    ) minutes
    LIMIT 1440
) times ON (time_stamp >= TIME(open_time) AND time_stamp < TIME(close_time));
Since you are selecting only distinct times that are found in the result, minutes that overlap will not be counted.
NOTE: Depending on your database, there may be a better way to generate a sequence. MySQL does not have a generate-sequence function, so I did it this way to show the basic idea, which can easily be converted to work with whatever database you are using.
@drakin8564's answer adapted for SQL Server, which I believe you're using:
FROM sys.all_objects a1
JOIN sys.all_objects a2
FROM incidents inci
ON Gen.t >= CONVERT(TIME, inci.DateOpen)
AND Gen.t < CONVERT(TIME, inci.DateClosed)
Your total for the last record is wrong, says 33 while it's 35, so the query results in 80, not 78.
By the way, just as MarcinJ told you, 41 - 6 is 35, not 33. So the answer is 80, not 78.
The following solution would work even if the date parameter is not one day only (1,440 minutes). Say if the date parameter is a month, or even year, this solution would still work.
Live demo:!18/462ac/5
-- arranged the opening and closing downtime
with a as
(
    select DateOpen d, 1 status
    from dt
    union all
    select DateClosed, 2
    from dt
)
-- don't compute the downtime from previous date
-- if the current date's status is opened
-- yet the previous status is closed
, downtime_minutes AS
(
    select *,
        lag(status) over(order by d, status desc) as prev_status,
        case when status = 1 and lag(status) over(order by d, status desc) = 2 then
            null
        else
            datediff(minute, lag(d) over(order by d, status desc), d)
        end as downtime
    from a
)
select sum(downtime) as all_downtime from downtime_minutes;
| all_downtime |
| 80 |
See how it works:
It works by computing the downtime from previous downtime. Don't compute downtime if the current date's status is open and the previous date's status is closed, which means the current downtime is a non-overlapping one. Non-overlapping downtime are denoted by null.
For that new downtime opened, its downtime is null initially, downtime will be computed on succeeding dates up to when it is closed.
The code can be made shorter by reversing the condition:
-- arranged the opening and closing downtime
with a as
(
    select DateOpen d, 1 status
    from dt
    union all
    select DateClosed, 2
    from dt
    -- order by d. postgres can do this?
)
-- don't compute the downtime from previous date
-- if the current date's status is opened
-- yet the previous status is closed
, downtime_minutes AS
(
    select *,
        lag(status) over(order by d, status desc) as prev_status,
        case when not ( status = 1 and lag(status) over(order by d, status desc) = 2 ) then
            datediff(minute, lag(d) over(order by d, status desc), d)
        end as downtime
    from a
)
select sum(downtime) from downtime_minutes;
Not particularly proud of my original solution:!18/462ac/1
As for the status desc on order by d, status desc, if a DateClosed is similar to other downtime's DateOpen, status desc will sort the DateClosed first.
For this data where 8:00 is present on both DateOpened and DateClosed:
INSERT INTO dt
([ID], [DateOpen], [DateClosed], [Total])
VALUES
(1, '2019-04-01 07:00:00', '2019-04-01 07:50:00', 50),
(2, '2019-04-01 07:45:00', '2019-04-01 08:00:00', 15),
(3, '2019-04-01 08:00:00', '2019-04-01 08:45:00', 45);
For similar time (e.g., 8:00), if we will not sort the closing first before the open, then 7:00 will be computed up to 7:50 only, instead of up to 8:00, as 8:00-open's downtime is initially zero. Here's how the open and closed downtimes are arranged and computed if there's no status desc for similar date, e.g., 8:00. The total downtime is 95 minutes only, which is wrong. It should be 105 minutes.
Here's how that will be arranged and computed if we sort the DateClosed first before the DateOpen (by using status desc) when they have similar date, e.g., 8:00. The total downtime is 105 minutes, which is correct.
Another approach, uses gaps and islands approach. Answer is based on SQL Time Packing of Islands
Live test:!18/462ac/11
with gap_detector as
(
    select
        DateOpen, DateClosed,
        case when
            lag(DateClosed) over (order by DateOpen) is null
            or lag(DateClosed) over (order by DateOpen) < DateOpen
        then 1 else 0
        end as gap
    from dt
)
, downtime_grouper as
(
    select
        DateOpen, DateClosed,
        sum(gap) over (order by DateOpen) as downtime_group
    from gap_detector
)
-- group's open and closed detector. then computes the group's downtime
select
    downtime_group,
    min(DateOpen) as group_date_open,
    max(DateClosed) as group_date_closed,
    datediff(minute, min(DateOpen), max(DateClosed)) as group_downtime,
    sum(datediff(minute, min(DateOpen), max(DateClosed)))
        over(order by downtime_group) as downtime_running_total
from downtime_grouper
group by downtime_group
How it works
A DateOpen is the start of the series of downtime if it has no previous downtime (indicated by null lag(DateClosed)). A DateOpen is also a start of the series of downtime if it has a gap from the previous downtime's DateClosed.
with gap_detector as
(
    select
        lag(DateClosed) over (order by DateOpen) as previous_downtime_date_closed,
        DateOpen, DateClosed,
        case when
            lag(DateClosed) over (order by DateOpen) is null
            or lag(DateClosed) over (order by DateOpen) < DateOpen
        then 1 else 0
        end as gap
    from dt
)
select *
from gap_detector
order by DateOpen;
After detecting the gap starters, we do a running total of the gap so we can group downtimes that are contiguous to each other.
with gap_detector as
(
    select
        DateOpen, DateClosed,
        case when
            lag(DateClosed) over (order by DateOpen) is null
            or lag(DateClosed) over (order by DateOpen) < DateOpen
        then 1 else 0
        end as gap
    from dt
)
select
    DateOpen, DateClosed, gap,
    sum(gap) over (order by DateOpen) as downtime_group
from gap_detector
order by DateOpen;
As we can see from the output above, we can now easily detect the downtime group's earliest DateOpen and latest DateClosed by applying MIN(DateOpen) and MAX(DateClosed) by grouping on downtime_group. On downtime_group 1, we have earliest DateOpen of 08:00 and latest DateClosed of 08:45. On downtime_group 2, we have earliest DateOpen of 09:06 and latest DateClosed of 9:41. And from that we can recalculate the correct downtime even if there are simultaneous downtimes.
We can make the code shorter by eliminating the null check for the previous downtime (needed when the current row is the very first row in the table) by reversing the logic. Instead of detecting the gaps, we detect the islands (contiguous downtimes). A downtime is contiguous if the previous downtime's DateClosed overlaps the current downtime's DateOpen, denoted by 0. If it does not overlap, it is a gap, denoted by 1.
Here's the query:
Live test:!18/462ac/12
with gap_detector as
(
    select
        DateOpen, DateClosed,
        case when lag(DateClosed) over (order by DateOpen) >= DateOpen
        then 0 else 1
        end as gap
    from dt
)
, downtime_grouper as
(
    select
        DateOpen, DateClosed,
        sum(gap) over (order by DateOpen) as downtime_group
    from gap_detector
)
-- group's open and closed detector. then computes the group's downtime
select
    downtime_group,
    min(DateOpen) as group_date_open,
    max(DateClosed) as group_date_closed,
    datediff(minute, min(DateOpen), max(DateClosed)) as group_downtime,
    sum(datediff(minute, min(DateOpen), max(DateClosed)))
        over(order by downtime_group) as downtime_running_total
from downtime_grouper
group by downtime_group
If you are using SQL Server 2012 or higher:
iif(lag(DateClosed) over (order by DateOpen) >= DateOpen, 0, 1) as gap
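As a cross-check on the totals, the island merging can be sketched in Python. This is an illustration, not the code from any answer, and the function name is hypothetical. One deliberate difference: it compares each DateOpen against the latest DateClosed seen so far in the island, not only the previous row's DateClosed. That is an assumption on my part about what is wanted, and it matters when a short ticket sits inside a long one that is still open.

```python
from datetime import datetime

def total_downtime_minutes(tickets):
    """tickets: iterable of (opened, closed) datetimes; overlaps count once."""
    total = 0
    island_start = island_end = None
    for opened, closed in sorted(tickets):
        if island_end is None or opened > island_end:
            # A gap: close the previous island and start a new one.
            if island_end is not None:
                total += int((island_end - island_start).total_seconds() // 60)
            island_start, island_end = opened, closed
        else:
            # Overlapping or touching: extend the island to the latest close.
            island_end = max(island_end, closed)
    if island_end is not None:
        total += int((island_end - island_start).total_seconds() // 60)
    return total

tickets = [
    (datetime(2019, 4, 1, 8, 0), datetime(2019, 4, 1, 8, 45)),
    (datetime(2019, 4, 1, 8, 10), datetime(2019, 4, 1, 8, 20)),
    (datetime(2019, 4, 1, 9, 6), datetime(2019, 4, 1, 9, 7)),
    (datetime(2019, 4, 1, 9, 6), datetime(2019, 4, 1, 9, 41)),
]
touching = [
    (datetime(2019, 4, 1, 7, 0), datetime(2019, 4, 1, 7, 50)),
    (datetime(2019, 4, 1, 7, 45), datetime(2019, 4, 1, 8, 0)),
    (datetime(2019, 4, 1, 8, 0), datetime(2019, 4, 1, 8, 45)),
]
print(total_downtime_minutes(tickets))   # 80
print(total_downtime_minutes(touching))  # 105
```

Both sample sets from this thread give the totals discussed above: 80 minutes for the question's data and 105 minutes for the three tickets that touch at 8:00.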

SQL how to write a query that return missing date ranges?

I am trying to figure out how to write a query that looks at certain records and finds missing date ranges between today and 9999-12-31.
My data looks like below:
ID |start_dt |end_dt |prc_or_disc_1
10412 |2018-07-17 00:00:00.000 |2018-07-20 00:00:00.000 |1050.000000
10413 |2018-07-23 00:00:00.000 |2018-07-26 00:00:00.000 |1040.000000
So for this data I would want my query to return:
2018-07-10 | 2018-07-16
2018-07-21 | 2018-07-22
2018-07-27 | 9999-12-31
I'm not really sure where to start. Is this possible?
You can do that using the lag() function in MS SQL (but that is available starting with 2012?).
with myData as
(
select *,
lag(end_dt,1) over (order by start_dt) as lagEnd
from myTable),
myMax as
(
select Max(end_dt) as maxDate from myTable
)
select dateadd(d,1,lagEnd) as StartDate, dateadd(d, -1, start_dt) as EndDate
from myData
where lagEnd is not null and dateadd(d,1,lagEnd) < start_dt
union all
select dateAdd(d,1,maxDate) as StartDate, cast('99991231' as Datetime) as EndDate
from myMax
where maxDate < '99991231';
If lag() is not available in MS SQL 2008, then you can mimic it with row_number() and joining.
select
CASE WHEN DATEDIFF(day, end_dt, ISNULL(LEAD(start_dt) over (order by ID), '99991231')) > 1 then end_dt +1 END as F1,
CASE WHEN DATEDIFF(day, end_dt, ISNULL(LEAD(start_dt) over (order by ID), '99991231')) > 1 then ISNULL(LEAD(start_dt) over (order by ID) - 1, '99991231') END as F2
from t
Working SQLFiddle example is -> Here
X.end_dt + 1 as F1,
ISNULL(Y.start_dt-1, '99991231') as F2
WHERE DATEDIFF(day, X.end_dt, ISNULL(Y.start_dt, '99991231')) > 1
Working SQLFiddle example is -> Here
This should work in 2008, it assumes that ranges in your table do not overlap. It will also eliminate rows where the end_date of the current row is a day before the start date of the next row.
with dtRanges as (
    select start_dt, end_dt, row_number() over (order by start_dt) as rownum
    from table1
)
select t2.end_dt + 1, coalesce(start_dt_next -1,'99991231')
from
    ( select dr1.start_dt, dr1.end_dt,dr2.start_dt as start_dt_next
      from dtRanges dr1
      left join dtRanges dr2 on dr2.rownum = dr1.rownum + 1
    ) t2
where
    t2.end_dt + 1 <> coalesce(start_dt_next,'99991231')
!18/65238/1
end_dt+1 AS start_dt,
LEAD(start_dt-1, 1, '9999-12-31')
OVER (ORDER BY start_dt)
AS end_dt
gaps.end_dt >= gaps.start_dt
I would, however, strongly urge you to use end dates that are "exclusive". That is, the range is everything up to but excluding the end_dt.
That way, a range of one day becomes '2018-07-09', '2018-07-10'.
It's really clear that my range is one day long, if you subtract one from the other you get a day.
Also, if you ever change to needing hour granularity or minute granularity you don't need to change your data. It just works. Always. Reliably. Intuitively.
If you search the web you'll find plenty of documentation on why inclusive-start and exclusive-end is a very good idea from a software perspective. (Then, in the query above, you can remove the wonky +1 and -1.)
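To illustrate the point about exclusive end dates, here is a short Python sketch of gap finding over half-open ranges. It is not taken from the answer: the function name missing_ranges is hypothetical, the sample rows are the question's two rows with their end dates shifted by one day to make them exclusive, and 2018-07-10 stands in for "today" (an assumption read off the expected output in the question).

```python
from datetime import date

def missing_ranges(ranges, window_start, window_end):
    """ranges: (start, end) pairs with EXCLUSIVE end; returns uncovered [start, end) gaps."""
    gaps = []
    cursor = window_start              # first day not yet known to be covered
    for start, end in sorted(ranges):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps

ranges = [
    (date(2018, 7, 17), date(2018, 7, 21)),   # 10412: covers 07-17 through 07-20
    (date(2018, 7, 23), date(2018, 7, 27)),   # 10413: covers 07-23 through 07-26
]
gaps = missing_ranges(ranges, window_start=date(2018, 7, 10), window_end=date.max)
for gap in gaps:
    print(gap)
```

No +1 or -1 appears anywhere: a gap starts exactly where the previous range ends and stops exactly where the next one starts. Using max() for the cursor also means overlapping ranges do not produce bogus gaps.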
This solves your case, but provide some sample data if there will ever be overlaps, fringe cases, etc.
Take one day after your end date and 1 day before the next line's start date.
DECLARE @ TABLE (ID int, start_dt DATETIME, end_dt DATETIME, prc VARCHAR(100))
INSERT INTO @ (id, start_dt, end_dt, prc)
VALUES
(10410, '2018-07-09 00:00:00.00','2018-07-12 00:00:00.000','1025.000000'),
(10412, '2018-07-17 00:00:00.00','2018-07-20 00:00:00.000','1050.000000'),
(10413, '2018-07-23 00:00:00.00','2018-07-26 00:00:00.000','1040.000000')
SELECT DATEADD(DAY, 1, end_dt)
, DATEADD(DAY, -1, LEAD(start_dt, 1, '9999-12-31') OVER(ORDER BY id) )
FROM @
You may want to take a look at this:!18/3a224/1
You just have to edit the begin range to today and the end range to 9999-12-31.