Finding the largest time overlap in T-SQL

I'm trying to do this on SQL Server 2008 R2.
I have a table with 4 columns:
parent_id INT
child_id INT
start_time TIME
end_time TIME
Think of the children as sub-processes that run for the parent program. All these sub-processes run once every day, and each child runs within its given time span. I want to find the largest overlap of time intervals for each parent based on the times of its children, i.e. the longest possible period during which all the sub-processes are running. Because each time span repeats every day, a child whose interval spans midnight (e.g. 23:00-10:00) can overlap with a child that only runs in the morning (e.g. 07:00-09:00): even if they don't overlap on "the first day", they will overlap on all subsequent days.
The output should look like this:
parent_id INT
start_time TIME
end_time TIME
valid BIT
Where valid = 1 if an overlap was found and valid = 0 if no overlap was found.
A couple of important pieces of information:
1. A time interval can span midnight, e.g. start_time = 23:00 and end_time = 03:00, which is a time interval of 4 hours.
2. Two time intervals may overlap in two different places, e.g. start_time1 = 13:00, end_time1 = 06:00, start_time2 = 04:00, end_time2 = 14:00. The overlaps are 04:00-06:00 (2 hours) and 13:00-14:00 (1 hour), so the largest overlap is 04:00-06:00.
3. There may be no overlap common to all the children of a given parent, in which case the output for that parent would be start_time = NULL, end_time = NULL and valid = 0.
4. If a child interval spans the whole day, then start_time = NULL and end_time = NULL. This was chosen to avoid representing a day as 00:00-24:00, which would slice overlaps crossing midnight in two, e.g. parent 3 below would end up having two overlaps (23:00-24:00 and 00:00-04:00) instead of one (23:00-04:00).
5. An overlap is only an overlap if the time interval is shared by all the children of a parent.
6. The time span of one child can never be longer than 24 hours.
Take this example:
parent_id child_id start_time end_time
1 1 06:00 14:00
1 2 13:00 09:00
1 3 07:00 09:00
2 1 12:00 17:00
2 2 09:00 11:00
3 1 NULL NULL
3 2 23:00 04:00
4 1 NULL NULL
4 2 NULL NULL
10 1 06:11 14:00
10 2 06:00 09:00
10 3 05:00 08:44
11 1 11:38 17:00
11 2 09:02 12:11
These data would produce this result set:
parent_id start_time end_time valid
1 07:00 09:00 1
2 NULL NULL 0
3 23:00 04:00 1
4 NULL NULL 1
10 06:11 08:44 1
11 11:38 12:11 1
The overlap for a parent is the time interval that is shared by all its children. So the overlap for parent 10 is found by finding the overlap where all 3 children share time:
Child 1 (06:11-14:00) and 2 (06:00-09:00) overlap from 06:11 to 09:00. This overlap time interval is then applied to child 3 (05:00-08:44), which gives an overlap of 06:11 to 08:44, since this interval is the only interval where all 3 children share common time.
I hope this makes sense.
I can do it with a cursor, but I would really prefer to avoid cursors. I have been wracking my brain about how to do it without cursors, but I have come up short. Is there any way of doing it without cursors?
EDIT: Expanded the text for clause 4, to explain the decision of having a full day be NULL to NULL, instead of 00:00 to 00:00.
EDIT: Expanded the examples with two more cases. The new cases have parent ID 10 and 11.
EDIT: Inserted explanation of how the overlap for parent 10 is found.
EDIT: Clarified clause 3. Added clauses 5 and 6. Went into detail about what this is all about.

Based on your question, I think your output should be:
parent_id start_time end_time valid
1 07:00 09:00 1
2 NULL NULL 0
3 23:00 04:00 1
4 NULL NULL 1
10 06:11 08:44 1
11 11:38 12:11 1
And here is a set-based solution:
DECLARE @Times TABLE
(
    parent_id INT
    ,child_id INT
    ,start_time TIME
    ,end_time TIME
);
INSERT INTO @Times
VALUES
(1, 1, '06:00', '14:00')
,(1, 2, '13:00', '09:00')
,(1, 3, '07:00', '09:00')
,(2, 1, '12:00', '17:00')
,(2, 2, '09:00', '11:00')
,(3, 1, NULL, NULL)
,(3, 2, '23:00', '04:00')
,(4, 1, NULL, NULL)
,(4, 2, NULL, NULL)
,(10, 1, '06:11', '14:00')
,(10, 2, '06:00', '09:00')
,(10, 3, '05:00', '08:44')
,(11, 1, '11:38', '17:00')
,(11, 2, '09:02', '12:11');
--One row per parent with the number of children it has
DECLARE @Parents TABLE
(
    parent_id INT PRIMARY KEY
    ,ChildCount INT
)
INSERT INTO @Parents
SELECT
    parent_id
    ,COUNT(DISTINCT child_id) AS ChildCount
FROM
    @Times
GROUP BY
    parent_id
--Build a table holding every minute of a two-day window
DECLARE @StartTime DATETIME2 = '00:00'
DECLARE @MinutesInTwoDays INT = 2880
DECLARE @Minutes TABLE(ThisMinute DATETIME2 PRIMARY KEY);
WITH
MinutesCTE AS
(
    SELECT
        1 AS MinuteNumber
        ,@StartTime AS ThisMinute
    UNION ALL
    SELECT
        NextMinuteNumber
        ,NextMinute
    FROM MinutesCTE
    CROSS APPLY (VALUES(MinuteNumber+1,DATEADD(MINUTE,1,ThisMinute))) NextDates(NextMinuteNumber,NextMinute)
    WHERE
        NextMinuteNumber <= @MinutesInTwoDays
)
INSERT INTO @Minutes
SELECT ThisMinute FROM MinutesCTE M OPTION (MAXRECURSION 2880);
DECLARE @SharedMinutes TABLE
(
    ThisMinute DATETIME2
    ,parent_id INT
    ,UNIQUE(ThisMinute,parent_id)
);
WITH TimesCTE AS
(
    --Place each interval in the two-day window; NULL/NULL covers the whole window,
    --and an interval that spans midnight ends on the second day
    SELECT
        Times.parent_id
        ,Times.child_id
        ,CAST(ISNULL(Times.start_time,'00:00') AS datetime2) AS start_time
        ,DATEADD
        (
            DAY
            ,CASE
                WHEN Times.end_time IS NULL THEN 2
                WHEN Times.start_time > Times.end_time THEN 1
                ELSE 0
            END
            ,CAST(ISNULL(Times.end_time,'00:00') AS datetime2)
        ) AS end_time
    FROM
        @Times Times
    UNION ALL
    --Repeat intervals that do not span midnight on the second day
    SELECT
        Times.parent_id
        ,Times.child_id
        ,DATEADD(DAY,1,CAST(Times.start_time AS datetime2)) AS start_time
        ,DATEADD(DAY,1,CAST(Times.end_time AS datetime2)) AS end_time
    FROM
        @Times Times
    WHERE
        start_time < end_time
)
--Get minutes shared by all children of each parent
INSERT INTO @SharedMinutes
SELECT
    M.ThisMinute
    ,P.parent_id
FROM
    @Minutes M
JOIN
    TimesCTE T
    ON M.ThisMinute BETWEEN start_time AND end_time
JOIN
    @Parents P
    ON T.parent_id = P.parent_id
GROUP BY
    M.ThisMinute
    ,P.parent_id
    ,P.ChildCount
HAVING
    COUNT(DISTINCT T.child_id) = P.ChildCount
--Get results
SELECT
    parent_id
    ,CAST(CASE WHEN start_time = '1900-01-01' AND end_time = '1900-01-02 23:59' THEN NULL ELSE start_time END AS TIME) AS start_time
    ,CAST(CASE WHEN start_time = '1900-01-01' AND end_time = '1900-01-02 23:59' THEN NULL ELSE end_time END AS TIME) AS end_time
    ,valid
FROM
(
    SELECT
        P.parent_id
        ,MIN(ThisMinute) AS start_time
        ,MAX(ThisMinute) AS end_time
        ,CASE WHEN MAX(ThisMinute) IS NOT NULL THEN 1 ELSE 0 END AS valid
    FROM
        @Parents P
    LEFT JOIN
        @SharedMinutes SM
        ON P.parent_id = SM.parent_id
    GROUP BY
        P.parent_id
) Results
You may find that the iterative algorithm you have outlined in your question would be more efficient. But I would use a WHILE loop instead of a cursor if you take that approach.
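For readers who want to check expected results outside the database, the minute-by-minute idea can be sketched in Python. This is only an illustration, not a translation of the query above: the function names are made up for the example, intervals are treated as half-open [start, end) on a 24-hour circle, and a non-NULL interval with start equal to end is assumed to be empty.

```python
MINUTES_PER_DAY = 24 * 60

def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

def to_hhmm(minute):
    """Convert minutes since midnight back to 'HH:MM'."""
    return f"{minute // 60:02d}:{minute % 60:02d}"

def minutes_covered(start, end):
    """Set of minutes of the day covered by one child; (None, None) is the whole day."""
    if start is None and end is None:
        return set(range(MINUTES_PER_DAY))
    s, e = to_minutes(start), to_minutes(end)
    length = (e - s) % MINUTES_PER_DAY  # handles intervals spanning midnight
    return {(s + i) % MINUTES_PER_DAY for i in range(length)}

def largest_common_overlap(children):
    """Return (start, end, valid) for a list of (start, end) child intervals."""
    shared = set(range(MINUTES_PER_DAY))
    for start, end in children:
        shared &= minutes_covered(start, end)
    if not shared:
        return (None, None, 0)
    if len(shared) == MINUTES_PER_DAY:
        return (None, None, 1)  # every child runs all day
    # Start scanning right after a minute that is NOT shared, so a run that
    # crosses midnight is seen as one run rather than two.
    gap = next(m for m in range(MINUTES_PER_DAY) if m not in shared)
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for offset in range(1, MINUTES_PER_DAY + 1):
        m = (gap + offset) % MINUTES_PER_DAY
        if m in shared:
            if run_len == 0:
                run_start = m
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    best_end = (best_start + best_len) % MINUTES_PER_DAY
    return (to_hhmm(best_start), to_hhmm(best_end), 1)

print(largest_common_overlap([("06:11", "14:00"), ("06:00", "09:00"), ("05:00", "08:44")]))
print(largest_common_overlap([(None, None), ("23:00", "04:00")]))
```

Scanning from a known gap is what lets the sketch report 23:00-04:00 for parent 3 as a single overlap instead of splitting it at midnight.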

This might be a very verbose method of achieving the desired results, but it works for the given dataset, although it should be tested with larger data.
I've simply joined the table to itself where the parent_id matches and the child_id differs, to get all of the combinations of times that might overlap, and then used DATEDIFF to calculate the overlap before filtering and grouping the output.
You can run the below in isolation to test and tweak if required:
-- setup initial table
CREATE TABLE #OverlapTable
(
[parent_id] INT ,
[child_id] INT ,
[start_time] TIME ,
[end_time] TIME
);
-- insert dummy data
INSERT INTO #OverlapTable
( [parent_id], [child_id], [start_time], [end_time] )
VALUES ( 1, 1, '06:00', '14:00' ),
( 1, 2, '13:00', '09:00' ),
( 1, 3, '07:00', '09:00' ),
( 2, 1, '12:00', '17:00' ),
( 2, 2, '09:00', '11:00' ),
( 3, 1, NULL, NULL ),
( 3, 2, '23:00', '04:00' ),
( 4, 1, NULL, NULL ),
( 4, 2, NULL, NULL );
-- insert all combinations into a new temp table #Results with overlap calculations
SELECT *
INTO #Results
FROM ( SELECT t1.parent_id ,
t1.start_time ,
t1.end_time ,
t2.start_time AS t2_start_time ,
t2.end_time AS t2_end_time ,
CASE WHEN t1.start_time IS NULL
AND t1.end_time IS NULL THEN 0
WHEN t1.start_time BETWEEN t2.start_time
AND t2.end_time
THEN DATEDIFF(HOUR, t1.start_time, t2.end_time)
WHEN t1.end_time BETWEEN t2.start_time AND t2.end_time
THEN DATEDIFF(HOUR, t2.start_time, t1.end_time)
ELSE NULL
END AS Overlap
FROM #OverlapTable t1
INNER JOIN #OverlapTable t2 ON t2.parent_id = t1.parent_id
AND t2.child_id != t1.child_id
) t
-- SELECT * FROM #Results -- this shows intermediate results
-- filter and group results with the largest overlaps and handle other cases
SELECT DISTINCT
r.parent_id ,
CASE WHEN r.Overlap IS NULL THEN NULL
ELSE CASE WHEN r.start_time IS NULL THEN r.t2_start_time
ELSE r.start_time
END
END start_time ,
CASE WHEN r.Overlap IS NULL THEN NULL
ELSE CASE WHEN r.end_time IS NULL THEN r.t2_end_time
ELSE r.end_time
END
END end_time ,
CASE WHEN r.Overlap IS NULL THEN 0
ELSE 1
END Valid
FROM #Results r
WHERE EXISTS ( SELECT parent_id ,
MAX(Overlap)
FROM #Results
WHERE r.parent_id = parent_id
GROUP BY parent_id
HAVING MAX(Overlap) = r.Overlap
OR ( MAX(Overlap) IS NULL
AND r.Overlap IS NULL
) )
DROP TABLE #Results
DROP TABLE #OverlapTable
Hope that helps.

Related

Table with inconsistent starttime and stoptime

I have a table that contains a date, a starttime, and a stoptime. I have several problems that I don't know how to solve for. Each row contains either a starttime OR a stoptime (not both). While this itself is not a problem, I need to calculate the runtime by date. Also, there are instances of multiple starttimes before there is a stoptime registered. Assume the following:
Date, Starttime, Stoptime
4/1/2016, 23:00:00, NULL
4/2/2016, NULL, 03:00:00
4/2/2016, 05:00:00, NULL
4/2/2016, 07:00:00, NULL
4/2/2016, NULL, 08:00:00
4/2/2016, 10:00:00, NULL
4/2/2016, NULL, 10:15:00
I need the output to be:
4/1/2016, 01:00:00
4/2/2016, 06:15:00
I have tried a few things, with very poor results. Can any experts out there solve this problem?
Here's one way you could handle this:
Setup:
CREATE TABLE #Table1
([Date] date, [Starttime] time, [Stoptime] time)
;
INSERT INTO #Table1
([Date], [Starttime], [Stoptime])
VALUES
('2016-04-01', '23:00:00', NULL),
('2016-04-02', NULL, '00:00:00'),
('2016-04-02', '00:00:00', NULL),
('2016-04-02', NULL, '03:00:00'),
('2016-04-02', '05:00:00', NULL),
('2016-04-02', '07:00:00', NULL),
('2016-04-02', NULL, '08:00:00'),
('2016-04-02', '10:00:00', NULL),
('2016-04-02', NULL, '10:15:00')
;
SQL:
select
convert(date, StartTime),
convert(time, dateadd(second, sum(datediff(second, StartTime, EndTime)),0))
from (
select
min(Time) as StartTime,
min(case when Type = 0 then Time end) as EndTime
from
(
select
sum(Type) over (order by Time asc, Type Asc) as GRP, *
from (
select
isnull(lag(Type) over (order by Time asc, Type Asc), -1) as LagType, *
from (
select
case when Starttime is NULL then 0 else 1 end as Type,
case when Starttime is NULL
then convert(datetime, [Date]) + convert(datetime, Stoptime)
else convert(datetime, [Date]) + convert(datetime, Starttime) end as Time
from
#Table1
) A
) B
where Type != 1 or LagType != 1
) C
group by GRP
) D
group by convert(date, StartTime)
Result:
2016-04-01 01:00:00
2016-04-02 06:15:00
This requires a start and a stop record in your data for each day (note the added midnight rows in the setup), so you'll need a UNION or something similar to add them to your initial data.
The innermost select generates a type (0/1) for each row and combines the date with the start or stop time into a single column. The next one adds LagType to the data, which is used to remove the duplicate start records. The next select has a running total over Type, so that start and end records belonging together get a unique group number. The rest just picks the start time and the earliest end time from each group and calculates the durations.
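The pairing logic described above can also be checked with a small iterative sketch. The Python below is an assumption-laden illustration rather than the window-function query itself: `runtime_by_date` is a hypothetical helper, events are given as (timestamp, kind) pairs, and it uses the same data as the setup, including the added midnight rows.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def runtime_by_date(events):
    """Sum run time per start date from (timestamp, 'start' | 'stop') events."""
    # Sort by time, with stops before starts on ties (like ORDER BY Time, Type).
    ordered = sorted(events, key=lambda e: (e[0], 0 if e[1] == "stop" else 1))
    totals = defaultdict(timedelta)
    open_start = None
    for ts, kind in ordered:
        if kind == "start":
            if open_start is None:  # ignore duplicate starts
                open_start = ts
        elif open_start is not None:  # first stop after a start closes the run
            totals[open_start.date()] += ts - open_start
            open_start = None
    return dict(totals)

events = [
    (datetime(2016, 4, 1, 23, 0), "start"),
    (datetime(2016, 4, 2, 0, 0), "stop"),
    (datetime(2016, 4, 2, 0, 0), "start"),
    (datetime(2016, 4, 2, 3, 0), "stop"),
    (datetime(2016, 4, 2, 5, 0), "start"),
    (datetime(2016, 4, 2, 7, 0), "start"),
    (datetime(2016, 4, 2, 8, 0), "stop"),
    (datetime(2016, 4, 2, 10, 0), "start"),
    (datetime(2016, 4, 2, 10, 15), "stop"),
]
for day, total in sorted(runtime_by_date(events).items()):
    print(day, total)
```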

Clone rows based on the column

I have data like shown below:
ID Duration Start Date End Date
------------------------------------------------------
10 2 2013-09-03 05:00:00 2013-09-03 05:02:00
I need output like below:
10 2 2013-09-03 05:00:00 2013-09-03 05:01:00 1
10 2 2013-09-03 05:01:00 2013-09-03 05:02:00 2
Based on the Duration column, if the value is 2, I need the row to be duplicated twice.
As the output shows, the Start Date and End Date of each duplicated row should be adjusted accordingly.
If Duration is 0 or 1, do nothing; rows should be duplicated only when Duration > 1.
Finally, an additional column should give the row sequence (1, 2, 3, ...) within the duplicated rows, like the 1 / 2 shown above.
Try the SQL below; I added some comments where I thought it seemed necessary.
declare @table table(Id integer not null, Duration int not null, StartDate datetime, EndDate datetime)
insert into @table values (10,2, '2013-09-03 05:00:00', '2013-09-03 05:02:00')
insert into @table values (11,3, '2013-09-04 05:00:00', '2013-09-04 05:03:00')
;WITH
numbers AS (
    --this is the number series generator
    --(limited to 100, you can change that to whatever you need
    -- max possible duration in your case).
    SELECT 1 AS num
    UNION ALL
    SELECT num+1 FROM numbers WHERE num+1<=100
)
SELECT t.Id
    , t.Duration
    , StartDate = DATEADD(MINUTE, IsNull(Num,1) - 1, t.StartDate)
    , EndDate = DATEADD(MINUTE, IsNull(Num,1), t.StartDate)
    , N.num
FROM @table t
LEFT JOIN numbers N
    ON t.Duration >= N.Num
-- join it with numbers generator for Duration times
ORDER BY t.Id
    , N.Num
This works better when Duration = 0:
declare @table table(Id integer not null, Duration int not null, StartDate datetime, EndDate datetime)
insert into @table values (10,2, '2013-09-03 05:00:00', '2013-09-03 05:02:00')
insert into @table values (11,3, '2013-09-04 05:00:00', '2013-09-04 05:03:00')
insert into @table values (12,0, '2013-09-04 05:00:00', '2013-09-04 05:03:00')
insert into @table values (13,1, '2013-09-04 05:00:00', '2013-09-04 05:03:00')
;WITH
numbers AS (
    --this is the number series generator
    --(limited to 100, you can change that to whatever you need
    -- max possible duration in your case).
    SELECT 1 AS num
    UNION ALL
    SELECT num+1 FROM numbers WHERE num+1<=100
)
SELECT
    Id
    , Duration
    , StartDate
    , EndDate
    , num
FROM
    (SELECT
        t.Id
        , t.Duration
        , StartDate = DATEADD(MINUTE, Num - 1, t.StartDate)
        , EndDate = DATEADD(MINUTE, Num, t.StartDate)
        , N.num
    FROM @table t
    INNER JOIN numbers N
        ON t.Duration >= N.Num ) A
-- join it with numbers generator for Duration times
UNION
    -- rows with Duration = 0 are returned unchanged, with sequence number 1
    (SELECT
        t.Id
        , t.Duration
        , StartDate
        , EndDate
        , 1 AS num
    FROM @table t
    WHERE Duration = 0)
ORDER BY Id,Num
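The row expansion itself is simple enough to sketch outside SQL. The Python below is only an illustration of the rule stated in the question (leave rows with Duration 0 or 1 alone, otherwise emit one one-minute row per unit of Duration); `expand_row` is a hypothetical helper, and Duration is assumed to be in minutes, as in the sample data.

```python
from datetime import datetime, timedelta

def expand_row(row_id, duration, start, end):
    """Split one row into `duration` one-minute rows with a sequence number."""
    if duration <= 1:
        # Nothing to duplicate: keep the original row, sequence number 1.
        return [(row_id, duration, start, end, 1)]
    return [
        (
            row_id,
            duration,
            start + timedelta(minutes=n - 1),  # start of the n-th minute
            start + timedelta(minutes=n),      # end of the n-th minute
            n,
        )
        for n in range(1, duration + 1)
    ]

rows = expand_row(10, 2, datetime(2013, 9, 3, 5, 0), datetime(2013, 9, 3, 5, 2))
for row in rows:
    print(row)
```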

Calculate time duration with daily limits in SQL

I have a table that stores some events in the database, which log operational time of machines and I'd like to calculate a total running time within a specific date for a specific shift (or all shifts).
CREATE TABLE events (
ID int,
StartTime datetime,
EndTime datetime,
DurationSeconds bigint,
...
)
I'd like to select the total duration of events in the specified date range (@dateStart datetime, @dateEnd datetime) while considering daily shifts (@shiftStart time, @shiftEnd time). The events will overlap the shifts and the date ranges.
For example, if I have shift starting at 6:00 and ending at 12:00, and the event lasted for whole 2 days (2014/01/01 00:00 - 2014/01/03 00:00), the total time for this row is (48 hours - 2*6 hours = 36 hours).
If an event starts in the middle of the shift, then only the 'in-shift' portion should be considered.
So far I had an implementation without considering the shifts like:
select sum(
    --duration minus start overlap and end overlap
    duration - dbo.udf_max(datediff(s,@pTo,endtime),0) - dbo.udf_max(datediff(s,starttime,@pFrom),0)
)
from events
where starttime < @pTo and endtime > @pFrom
I'd really like to have a set-based solution as the data sets are rather large and consider a looping cursor-based solution as the last resort.
OK, let's make some test data (I changed the times a little to show variance):
DECLARE @Events TABLE
(
    ID int IDENTITY (1, 1),
    StartTime datetime,
    EndTime datetime,
    DurationSeconds bigint
)
INSERT INTO @Events
( StartTime, EndTime )
VALUES
( '2014/01/01 01:00', '2014/01/02 22:00');
DECLARE @Shift TABLE
(
    ShiftName VARCHAR(20),
    StartTime DATETIME,
    EndTime DATETIME
)
INSERT INTO @Shift
( ShiftName, StartTime, EndTime )
VALUES
( 'Night', '2014/01/01 00:00', '2014/01/01 06:00' ),
( 'Morning', '2014/01/01 06:00', '2014/01/01 12:00' ),
( 'Afternoon', '2014/01/01 12:00', '2014/01/01 18:00' ),
( 'Evening', '2014/01/01 18:00', '2014/01/02 00:00' ),
( 'Night', '2014/01/02 00:00', '2014/01/02 06:00' ),
( 'Morning', '2014/01/02 06:00', '2014/01/02 12:00' ),
( 'Afternoon', '2014/01/02 12:00', '2014/01/02 18:00' ),
( 'Evening', '2014/01/02 18:00', '2014/01/03 00:00' );
Here I make a numbers table to find all the minutes for the event's duration, then find all the minutes worked in each shift and total them:
DECLARE @StartDate DATETIME = '1/1/2014';
DECLARE @number_of_numbers INT = 100000;
;WITH
a AS (SELECT 1 AS i UNION ALL SELECT 1),
b AS (SELECT 1 AS i FROM a AS x, a AS y),
c AS (SELECT 1 AS i FROM b AS x, b AS y),
d AS (SELECT 1 AS i FROM c AS x, c AS y),
e AS (SELECT 1 AS i FROM d AS x, d AS y),
f AS (SELECT 1 AS i FROM e AS x, e AS y),
numbers AS
(
    SELECT TOP(@number_of_numbers)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS number
    FROM f
), mins_in_day AS
(
    -- one row per minute, counted from @StartDate
    SELECT DATEADD(MINUTE, n.number, @StartDate) AS DayMinute
    FROM numbers n
), output AS
(
    -- count the minutes that fall inside both a shift and an event
    SELECT s.ShiftName, CONVERT(DATE, s.StartTime) ShiftDay, COUNT(1) AS TotalMinutes FROM @Shift s
    INNER JOIN mins_in_day sc
        ON sc.DayMinute >= s.StartTime AND sc.DayMinute < s.EndTime
    INNER JOIN @Events e
        ON sc.DayMinute >= e.StartTime AND sc.DayMinute < e.EndTime
    GROUP BY s.ShiftName, s.StartTime
)
SELECT * FROM output
Here is the output:
ShiftName ShiftDay TotalMinutes
Night 2014-01-01 300
Morning 2014-01-01 360
Afternoon 2014-01-01 360
Evening 2014-01-01 360
Night 2014-01-02 360
Morning 2014-01-02 360
Afternoon 2014-01-02 360
Evening 2014-01-02 240
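As a cross-check of these totals, the in-shift minutes can also be computed directly from the interval endpoints instead of counting minute rows. The Python sketch below is an alternative illustration, not the method used in the query above; `overlap_minutes` and `minutes_per_shift` are hypothetical helpers, and only three of the eight shift rows are included for brevity.

```python
from datetime import datetime

def overlap_minutes(a_start, a_end, b_start, b_end):
    """Whole minutes shared by two half-open intervals [start, end)."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return max(0, int((end - start).total_seconds() // 60))

def minutes_per_shift(event, shifts):
    """Return (shift name, shift day, minutes of the event inside the shift)."""
    ev_start, ev_end = event
    return [
        (name, s_start.date(), overlap_minutes(ev_start, ev_end, s_start, s_end))
        for name, s_start, s_end in shifts
    ]

event = (datetime(2014, 1, 1, 1, 0), datetime(2014, 1, 2, 22, 0))
shifts = [
    ("Night", datetime(2014, 1, 1, 0, 0), datetime(2014, 1, 1, 6, 0)),
    ("Morning", datetime(2014, 1, 1, 6, 0), datetime(2014, 1, 1, 12, 0)),
    ("Evening", datetime(2014, 1, 2, 18, 0), datetime(2014, 1, 3, 0, 0)),
]
for row in minutes_per_shift(event, shifts):
    print(row)
```

The same max-of-starts, min-of-ends comparison could be expressed in SQL with CASE expressions, which would avoid generating one row per minute on large data sets.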

SQL Counting Total Time but resetting total if large gap

I have a table containing device movements.
MoveID DeviceID Start End
I want to find out if there is a way to sum up the total movement days for each device to the present. However, if there is a gap of more than 6 weeks between an end date and the next start date, then the time count is reset.
MoveID DeviceID Start End
1 1 2011-1-1 2011-2-1
2 1 2011-9-1 2011-9-20
3 1 2011-9-25 2011-9-28
The total for the device should be 24 days, because there is a gap of greater than 6 weeks. I'd also like to find the number of days since the first movement in the group, in this case 28 days, as the latest count group started on 2011-9-1.
I thought I could do it with a stored proc and a cursor etc (which is not good) just wondered if there was anything better?
Thanks
Graeme
create table #test
(
MoveID int,
DeviceID int,
Start date,
End_time date
)
--drop table #test
insert into #test values
(1,1,'2011-1-1','2011-2-1'),
(2,1,'2011-9-1','2011-9-20'),
(3,1,'2011-9-25','2011-9-28')
select
a.DeviceID,
sum(case when datediff(dd, a.End_time, isnull(b.Start, a.End_time)) > 42 /*6 weeks = 42 days*/ then 0 else datediff(dd,a.Start, a.End_time)+1 /*we will count also the last day*/ end) as movement_days,
sum(case when datediff(dd, a.End_time, isnull(b.Start, a.End_time)) > 42 /*6 weeks = 42 days*/ then 0 else datediff(dd,a.Start, a.End_time)+1 /*we will count also the last day*/ end + case when b.MoveID is null then datediff(dd, a.Start, a.End_time) + 1 else 0 end) as total_days
from
#test a
left join #test b
on a.DeviceID = b.DeviceID
and a.MoveID + 1 = b.MoveID
group by
a.DeviceID
Let me know if you need some explanation - there can be more ways to do that...
DECLARE @Times TABLE
(
    MoveID INT,
    DeviceID INT,
    Start DATETIME,
    [End] DATETIME
)
INSERT INTO @Times VALUES (1, 1, '1/1/2011', '2/1/2011')
INSERT INTO @Times VALUES (2, 1, '9/1/2011', '9/20/2011')
INSERT INTO @Times VALUES (3, 1, '9/25/2011', '9/28/2011')
INSERT INTO @Times VALUES (4, 2, '1/1/2011', '2/1/2011')
INSERT INTO @Times VALUES (5, 2, '3/1/2011', '4/20/2011')
INSERT INTO @Times VALUES (6, 2, '5/1/2011', '6/20/2011')
DECLARE @MaxGapInWeeks INT
SET @MaxGapInWeeks = 6
SELECT
    validTimes.DeviceID,
    SUM(DATEDIFF(DAY, validTimes.Start, validTimes.[End]) + 1) AS TotalDays,
    DATEDIFF(DAY, MIN(validTimes.Start), MAX(validTimes.[End])) + 1 AS TotalDaysInGroup
FROM
    @Times validTimes LEFT JOIN
    @Times timeGap
        ON timeGap.DeviceID = validTimes.DeviceID
        AND timeGap.MoveID <> validTimes.MoveID
        AND DATEDIFF(WEEK, validTimes.[End], timeGap.Start) > @MaxGapInWeeks
WHERE timeGap.MoveID IS NULL
GROUP BY validTimes.DeviceID
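The reset rule from the question can be written as a short loop, which may be useful for checking either query against the expected 24 and 28 days. This Python sketch is an illustration under stated assumptions: `movement_totals` is a hypothetical helper for a single device, days are counted inclusively (as both answers do with the + 1), the gap threshold is taken as more than 42 days, and the span is measured to the last end date rather than to the present.

```python
from datetime import date

def movement_totals(moves, max_gap_days=42):
    """Return (movement days, days since group start) for one device.

    `moves` is a list of (start, end) dates. The running total restarts
    whenever the gap between one end and the next start exceeds max_gap_days.
    """
    total = 0
    group_start = None
    prev_end = None
    for start, end in sorted(moves):
        if prev_end is None or (start - prev_end).days > max_gap_days:
            total = 0            # gap too large: reset the count
            group_start = start  # a new group begins with this movement
        total += (end - start).days + 1  # inclusive day count
        prev_end = end
    if group_start is None:
        return (0, 0)
    return (total, (prev_end - group_start).days + 1)

device_1 = [
    (date(2011, 1, 1), date(2011, 2, 1)),
    (date(2011, 9, 1), date(2011, 9, 20)),
    (date(2011, 9, 25), date(2011, 9, 28)),
]
print(movement_totals(device_1))
```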

SQL Audit Log Running Totals

I have a table with an audit log:
BugId Timestamp Status
1 2010-06-24 10:00:00 open
2 2010-06-24 11:00:00 open
1 2010-06-25 12:00:00 closed
2 2010-06-26 13:00:00 closed
I want a running total of open and closed bugs like:
Timestamp # Status
2010-06-25 00:00:00 2 open
2010-06-26 00:00:00 1 open
2010-06-26 00:00:00 1 closed
2010-06-27 00:00:00 2 closed
How may I do this query (or similar) in Microsoft SQL Server 2000?
The output is intended to be used to feed a time series chart so I do not care if there are rows with 0 output since I will probably only select a timespan like the last month.
I think the output actually matches the sample data: on the 25th (12am), there are two open bugs. On the 26th, there is one open bug and one closed. And by the 27th, all bugs are closed.
It isn't clear how the main dates should be created. For my example, I pre-loaded the dates that I knew to be right but this could be accomplished in a variety of ways depending on the requirements of the user.
Anyway, the code is below. This should work for instances where a bug is opened and closed multiple times on the same day. It operates under the assumption that a bug cannot be opened and closed at the same time.
/** Setup the tables **/
IF OBJECT_ID('tempdb..#bugs') IS NOT NULL DROP TABLE #bugs
CREATE TABLE #bugs (
BugID INT,
[Timestamp] DATETIME,
[Status] VARCHAR(10)
)
IF OBJECT_ID('tempdb..#dates') IS NOT NULL DROP TABLE #dates
CREATE TABLE #dates (
[Date] DATETIME
)
/** Load the sample data. **/
INSERT #bugs
SELECT 1, '2010-06-24 10:00:00', 'open' UNION ALL
SELECT 2, '2010-06-24 11:00:00', 'open' UNION ALL
SELECT 1, '2010-06-25 12:00:00', 'closed' UNION ALL
SELECT 2, '2010-06-26 13:00:00', 'closed'
/** Build an arbitrary date table **/
INSERT #dates
SELECT '2010-06-24' UNION ALL
SELECT '2010-06-25' UNION ALL
SELECT '2010-06-26' UNION ALL
SELECT '2010-06-27'
/**
Subquery x:
For each date in the #date table,
get the BugID and its last status.
This is for BugIDs that have been
opened and closed on the same day.
Subquery y:
Drawing from subquery x, get the
date, BugID, and Status of its
last status for that day
Main query:
For each date, get the count
of the most recent statuses for
that date. This will give the
running totals of open and
closed bugs for each date
**/
SELECT
[Date],
COUNT(*) AS [#],
[Status]
FROM (
SELECT
Date,
x.BugID,
b.[Status]
FROM (
SELECT
[Date],
BugID,
MAX([Timestamp]) AS LastStatus
FROM #dates d
INNER JOIN #bugs b
ON d.[Date] > b.[Timestamp]
GROUP BY
[Date],
BugID
) x
INNER JOIN #bugs b
ON x.BugID = b.BugID
AND x.LastStatus = b.[Timestamp]
) y
GROUP BY [Date], [Status]
ORDER BY [Date], CASE WHEN [Status] = 'Open' THEN 1 ELSE 2 END
Results:
Date # Status
----------------------- ----------- ----------
2010-06-25 00:00:00.000 2 open
2010-06-26 00:00:00.000 1 open
2010-06-26 00:00:00.000 1 closed
2010-06-27 00:00:00.000 2 closed
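The same "latest status before each date" idea can be sketched procedurally to confirm these results. The Python below is an illustration only; `running_totals` is a hypothetical helper, and it takes the audit rows and the list of reporting dates as plain tuples and datetimes.

```python
from collections import Counter
from datetime import datetime

def running_totals(log, dates):
    """Count bugs per status as of each date, using each bug's latest entry."""
    result = {}
    for day in dates:
        latest = {}
        # Later entries overwrite earlier ones, leaving the last status per bug.
        for bug_id, ts, status in sorted(log, key=lambda r: r[1]):
            if ts < day:
                latest[bug_id] = status
        for status, count in Counter(latest.values()).items():
            result[(day, status)] = count
    return result

log = [
    (1, datetime(2010, 6, 24, 10, 0), "open"),
    (2, datetime(2010, 6, 24, 11, 0), "open"),
    (1, datetime(2010, 6, 25, 12, 0), "closed"),
    (2, datetime(2010, 6, 26, 13, 0), "closed"),
]
dates = [datetime(2010, 6, d) for d in (24, 25, 26, 27)]
for key, count in sorted(running_totals(log, dates).items()):
    print(key[0].date(), count, key[1])
```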
use tempdb
go
create table audit_log
(
BugID integer not null
, dt_entered_utc datetime not null default ( getutcdate () )
, [status] varchar(10) not null
);
INSERT INTO audit_log ( BugID, dt_entered_utc, [status] ) VALUES ( 1, '2010-06-24 10:00', 'open' );
INSERT INTO audit_log ( BugID, dt_entered_utc, [status] ) VALUES ( 2, '2010-06-24 11:00', 'open' );
INSERT INTO audit_log ( BugID, dt_entered_utc, [status] ) VALUES ( 1, '2010-06-25 12:00', 'closed' );
INSERT INTO audit_log ( BugID, dt_entered_utc, [status] ) VALUES ( 2, '2010-06-26 13:00', 'closed' );
SELECT
[Date] = CAST ( CONVERT ( varchar, a.dt_entered_utc, 101 ) as datetime )
, [#] = COUNT ( 1 )
, [Status] = a.status
FROM audit_log a
GROUP BY CAST ( CONVERT ( varchar, a.dt_entered_utc, 101 ) as datetime ), a.status
ORDER by [Date] ASC
Date # Status
2010-06-24 00:00:00.000 2 open
2010-06-25 00:00:00.000 1 closed
2010-06-26 00:00:00.000 1 closed