I have a table with attendance dates in SQL Workbench/J. I need to define the attendance periods of the people, where any attendance period with gaps less or equal than 90 days are merged into a single attendance period and any gaps larger than that are considered a different attendance period. For example for a single person this is the table I have
id
year
start_date
end_date
prev_att_month
diff
1
2012
2012-08-01
2012-08-31
2012-07-01
31
1
2012
2012-07-01
2012-07-31
2012-04-01
91
1
2012
2012-04-01
2012-04-30
2012-03-01
31
1
2012
2012-03-01
2012-03-31
2012-02-01
29
1
2012
2012-02-01
2012-02-29
2012-01-01
31
1
2012
2012-01-01
2012-01-31
2011-12-01
31
1
2011
2011-12-01
2011-12-31
2011-11-01
30
1
2011
2011-11-01
2011-11-30
2011-10-01
31
1
2011
2011-10-01
2011-10-31
2011-09-01
30
1
2011
2011-09-01
2011-09-30
2011-08-01
31
1
2011
2011-08-01
2011-08-31
2011-07-01
31
1
2011
2011-07-01
2011-07-31
2011-05-01
61
1
2011
2011-05-01
2011-05-31
2011-04-01
30
1
2011
2011-04-01
2011-04-30
2011-03-01
31
1
2011
2011-03-01
2011-03-31
2011-02-01
28
1
2011
2011-02-01
2011-02-28
2010-08-01
184
1
2010
2010-08-01
2010-08-31
2010-07-01
31
1
2010
2010-07-01
2010-07-31
2010-06-01
30
1
2010
2010-06-01
2010-06-30
2010-05-01
31
1
2010
2010-05-01
2010-05-31
2010-04-01
30
1
2010
2010-04-01
2010-04-30
where I defined the previous attendance month column with a lag function and then found the difference between that column and the the start_date in the diff column. This way I can check the gaps between the attendance months
I want this output of the attendance periods with the 90 day rule explained above:
id
start_date
end_date
1
1/04/2010
31/08/2010
1
1/02/2011
30/04/2012
1
1/07/2012
31/08/2012
Does any one have an idea of how to do this?
So far I was just able to define the difference between the attendance months but since this is a large data set I have not been able to find a solution to define the attendance periods without making a row to row analysis.
with [table] as (
select id, year, start_date, end_date,
lag(start_date) over (partition by id order by id, year, start_date) as prev_att_month,
start_date-prev_att_month as diff
from source
)
select *
from [table]
where id = 1
One method would be to use a windowed COUNT to count how many times a value greater than 90 has appeared in the diff column, which provides a unique group number. Then you can just group your data into those groups and get the MIN and MAX values:
WITH Grps AS(
SELECT V.id,
V.year,
V.start_date,
V.end_date,
V.prev_att_month,
V.diff,
COUNT(CASE WHEN diff > 90 THEN 1 END) OVER (PARTITION BY ID ORDER BY V.start_date ASC) AS Grp
FROM (VALUES(1,2012,CONVERT(date,'20120801'),CONVERT(date,'20120831'),CONVERT(date,'2012-07-01'),31),
(1,2012,CONVERT(date,'20120701'),CONVERT(date,'20120731'),CONVERT(date,'2012-04-01'),91),
(1,2012,CONVERT(date,'20120401'),CONVERT(date,'20120430'),CONVERT(date,'2012-03-01'),31),
(1,2012,CONVERT(date,'20120301'),CONVERT(date,'20120331'),CONVERT(date,'2012-02-01'),29),
(1,2012,CONVERT(date,'20120201'),CONVERT(date,'20120229'),CONVERT(date,'2012-01-01'),31),
(1,2012,CONVERT(date,'20120101'),CONVERT(date,'20120131'),CONVERT(date,'2011-12-01'),31),
(1,2011,CONVERT(date,'20111201'),CONVERT(date,'20111231'),CONVERT(date,'2011-11-01'),30),
(1,2011,CONVERT(date,'20111101'),CONVERT(date,'20111130'),CONVERT(date,'2011-10-01'),31),
(1,2011,CONVERT(date,'20111001'),CONVERT(date,'20111031'),CONVERT(date,'2011-09-01'),30),
(1,2011,CONVERT(date,'20110901'),CONVERT(date,'20110930'),CONVERT(date,'2011-08-01'),31),
(1,2011,CONVERT(date,'20110801'),CONVERT(date,'20110831'),CONVERT(date,'2011-07-01'),31),
(1,2011,CONVERT(date,'20110701'),CONVERT(date,'20110731'),CONVERT(date,'2011-05-01'),61),
(1,2011,CONVERT(date,'20110501'),CONVERT(date,'20110531'),CONVERT(date,'2011-04-01'),30),
(1,2011,CONVERT(date,'20110401'),CONVERT(date,'20110430'),CONVERT(date,'2011-03-01'),31),
(1,2011,CONVERT(date,'20110301'),CONVERT(date,'20110331'),CONVERT(date,'2011-02-01'),28),
(1,2011,CONVERT(date,'20110201'),CONVERT(date,'20110228'),CONVERT(date,'2010-08-01'),184),
(1,2010,CONVERT(date,'20100801'),CONVERT(date,'20100831'),CONVERT(date,'2010-07-01'),31),
(1,2010,CONVERT(date,'20100701'),CONVERT(date,'20100731'),CONVERT(date,'2010-06-01'),30),
(1,2010,CONVERT(date,'20100601'),CONVERT(date,'20100630'),CONVERT(date,'2010-05-01'),31),
(1,2010,CONVERT(date,'20100501'),CONVERT(date,'20100531'),CONVERT(date,'2010-04-01'),30),
(1,2010,CONVERT(date,'20100401'),CONVERT(date,'20100430'),NULL,NULL))V(id,year,start_date,end_date,prev_att_month,diff))
SELECT id,
MIN(Start_date) AS Start_date,
MAX(End_Date) AS End_Date
FROM Grps
GROUP BY Id,
Grp
ORDER BY id,
Start_date;
Table example:
TableName:DownTime
Columns:
EventID - int,
ReasonID - int,
StartTime - DateTime,
EndTime - DateTime
I would like to calculate the daily downtime of each event over the last 30 days. However, an event could be down for multiple days, so the start time would could start 31 days prior and could end next week. So I need to return a row of data for that event 30 times. The event could also be 10 minutes long, so I need a row of data showing that.
So far I only get one row representing the entire event that occurs over 30 days instead of 30 rows.
MS SQL Server 2005 database
Thank you for any help.
You may want to create a date dimension table - a table that is a collection of days. The absolute simplest example for your purposes would have one column, date, and you would have one row for every consecutive day. Youu could then join this table to your downtime table:
from dim_date t join downtime d on t.date between convert(varchar(10), d.start_date, 120) and convert(varchar(10), d.end_date, 120)
This way if you have down time, you will get a row for every day containing down time.
before you try my code, you need to do a one time setup of a Numbers table. this will create a table names Numbers with a column Number having rows with values from 1 to 10,000:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
This Numbers table is joined to your existing table to "create" the rows you need, try this:
DECLARE #DownTime table (
EventID int,
ReasonID int,
StartTime DateTime,
EndTime DateTime)
INSERT INTO #DownTime VALUES (1,1,'9/10/2009 9:00am','9/10/2009 10:00am')
INSERT INTO #DownTime VALUES (2,1,'9/10/2009 1:00am','9/15/2009 1:00am')
INSERT INTO #DownTime VALUES (3,1,'9/10/2009' ,'9/11/2009')
SELECT
d.EventID,d.ReasonID,d.StartTime,d.EndTime
, DATEADD(day,Number-1,d.StartTime) AS SequenceDate
FROM #DownTime d
INNER JOIN Numbers n ON n.Number<=DATEDIFF(day,d.StartTime,d.EndTime)+1
OUTPUT:
EventID ReasonID StartTime EndTime SequenceDate
------- -------- ----------------------- ----------------------- -----------------------
1 1 2009-09-10 09:00:00.000 2009-09-10 10:00:00.000 2009-09-10 09:00:00.000
2 1 2009-09-10 01:00:00.000 2009-09-15 01:00:00.000 2009-09-10 01:00:00.000
2 1 2009-09-10 01:00:00.000 2009-09-15 01:00:00.000 2009-09-11 01:00:00.000
2 1 2009-09-10 01:00:00.000 2009-09-15 01:00:00.000 2009-09-12 01:00:00.000
2 1 2009-09-10 01:00:00.000 2009-09-15 01:00:00.000 2009-09-13 01:00:00.000
2 1 2009-09-10 01:00:00.000 2009-09-15 01:00:00.000 2009-09-14 01:00:00.000
2 1 2009-09-10 01:00:00.000 2009-09-15 01:00:00.000 2009-09-15 01:00:00.000
3 1 2009-09-10 00:00:00.000 2009-09-11 00:00:00.000 2009-09-10 00:00:00.000
3 1 2009-09-10 00:00:00.000 2009-09-11 00:00:00.000 2009-09-11 00:00:00.000
(9 row(s) affected)