I have a table with attendance dates in SQL Workbench/J. I need to define the attendance periods of the people, where any attendance period with gaps less or equal than 90 days are merged into a single attendance period and any gaps larger than that are considered a different attendance period. For example for a single person this is the table I have
id
year
start_date
end_date
prev_att_month
diff
1
2012
2012-08-01
2012-08-31
2012-07-01
31
1
2012
2012-07-01
2012-07-31
2012-04-01
91
1
2012
2012-04-01
2012-04-30
2012-03-01
31
1
2012
2012-03-01
2012-03-31
2012-02-01
29
1
2012
2012-02-01
2012-02-29
2012-01-01
31
1
2012
2012-01-01
2012-01-31
2011-12-01
31
1
2011
2011-12-01
2011-12-31
2011-11-01
30
1
2011
2011-11-01
2011-11-30
2011-10-01
31
1
2011
2011-10-01
2011-10-31
2011-09-01
30
1
2011
2011-09-01
2011-09-30
2011-08-01
31
1
2011
2011-08-01
2011-08-31
2011-07-01
31
1
2011
2011-07-01
2011-07-31
2011-05-01
61
1
2011
2011-05-01
2011-05-31
2011-04-01
30
1
2011
2011-04-01
2011-04-30
2011-03-01
31
1
2011
2011-03-01
2011-03-31
2011-02-01
28
1
2011
2011-02-01
2011-02-28
2010-08-01
184
1
2010
2010-08-01
2010-08-31
2010-07-01
31
1
2010
2010-07-01
2010-07-31
2010-06-01
30
1
2010
2010-06-01
2010-06-30
2010-05-01
31
1
2010
2010-05-01
2010-05-31
2010-04-01
30
1
2010
2010-04-01
2010-04-30
where I defined the previous attendance month column with a lag function and then found the difference between that column and the the start_date in the diff column. This way I can check the gaps between the attendance months
I want this output of the attendance periods with the 90 day rule explained above:
id
start_date
end_date
1
1/04/2010
31/08/2010
1
1/02/2011
30/04/2012
1
1/07/2012
31/08/2012
Does any one have an idea of how to do this?
So far I was just able to define the difference between the attendance months but since this is a large data set I have not been able to find a solution to define the attendance periods without making a row to row analysis.
with [table] as (
select id, year, start_date, end_date,
lag(start_date) over (partition by id order by id, year, start_date) as prev_att_month,
start_date-prev_att_month as diff
from source
)
select *
from [table]
where id = 1
One method would be to use a windowed COUNT to count how many times a value greater than 90 has appeared in the diff column, which provides a unique group number. Then you can just group your data into those groups and get the MIN and MAX values:
WITH Grps AS(
SELECT V.id,
V.year,
V.start_date,
V.end_date,
V.prev_att_month,
V.diff,
COUNT(CASE WHEN diff > 90 THEN 1 END) OVER (PARTITION BY ID ORDER BY V.start_date ASC) AS Grp
FROM (VALUES(1,2012,CONVERT(date,'20120801'),CONVERT(date,'20120831'),CONVERT(date,'2012-07-01'),31),
(1,2012,CONVERT(date,'20120701'),CONVERT(date,'20120731'),CONVERT(date,'2012-04-01'),91),
(1,2012,CONVERT(date,'20120401'),CONVERT(date,'20120430'),CONVERT(date,'2012-03-01'),31),
(1,2012,CONVERT(date,'20120301'),CONVERT(date,'20120331'),CONVERT(date,'2012-02-01'),29),
(1,2012,CONVERT(date,'20120201'),CONVERT(date,'20120229'),CONVERT(date,'2012-01-01'),31),
(1,2012,CONVERT(date,'20120101'),CONVERT(date,'20120131'),CONVERT(date,'2011-12-01'),31),
(1,2011,CONVERT(date,'20111201'),CONVERT(date,'20111231'),CONVERT(date,'2011-11-01'),30),
(1,2011,CONVERT(date,'20111101'),CONVERT(date,'20111130'),CONVERT(date,'2011-10-01'),31),
(1,2011,CONVERT(date,'20111001'),CONVERT(date,'20111031'),CONVERT(date,'2011-09-01'),30),
(1,2011,CONVERT(date,'20110901'),CONVERT(date,'20110930'),CONVERT(date,'2011-08-01'),31),
(1,2011,CONVERT(date,'20110801'),CONVERT(date,'20110831'),CONVERT(date,'2011-07-01'),31),
(1,2011,CONVERT(date,'20110701'),CONVERT(date,'20110731'),CONVERT(date,'2011-05-01'),61),
(1,2011,CONVERT(date,'20110501'),CONVERT(date,'20110531'),CONVERT(date,'2011-04-01'),30),
(1,2011,CONVERT(date,'20110401'),CONVERT(date,'20110430'),CONVERT(date,'2011-03-01'),31),
(1,2011,CONVERT(date,'20110301'),CONVERT(date,'20110331'),CONVERT(date,'2011-02-01'),28),
(1,2011,CONVERT(date,'20110201'),CONVERT(date,'20110228'),CONVERT(date,'2010-08-01'),184),
(1,2010,CONVERT(date,'20100801'),CONVERT(date,'20100831'),CONVERT(date,'2010-07-01'),31),
(1,2010,CONVERT(date,'20100701'),CONVERT(date,'20100731'),CONVERT(date,'2010-06-01'),30),
(1,2010,CONVERT(date,'20100601'),CONVERT(date,'20100630'),CONVERT(date,'2010-05-01'),31),
(1,2010,CONVERT(date,'20100501'),CONVERT(date,'20100531'),CONVERT(date,'2010-04-01'),30),
(1,2010,CONVERT(date,'20100401'),CONVERT(date,'20100430'),NULL,NULL))V(id,year,start_date,end_date,prev_att_month,diff))
SELECT id,
MIN(Start_date) AS Start_date,
MAX(End_Date) AS End_Date
FROM Grps
GROUP BY Id,
Grp
ORDER BY id,
Start_date;
I've been trying to get start and end dates range for each quarter given a specific date/year, like this:
SELECT DATEADD(mm, (quarter - 1) * 3, year_date) StartDate,
DATEADD(dd, 0, DATEADD(mm, quarter * 3, year_date)) EndDate
--quarter QuarterNo
FROM
(
SELECT '2012-01-01' year_date
) s CROSS JOIN
(
SELECT 1 quarter UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4
) q
which produces the following output:
2012-01-01 00:00:00 2012-04-01 00:00:00
2012-04-01 00:00:00 2012-07-01 00:00:00
2012-07-01 00:00:00 2012-10-01 00:00:00
2012-10-01 00:00:00 2013-01-01 00:00:00
Problem: I need to do this for a given start_date and end_date, the problem being the end_date=current_day, so how can I achieve this:
2012-01-01 00:00:00 2012-04-01 00:00:00
2012-04-01 00:00:00 2012-07-01 00:00:00
2012-07-01 00:00:00 2012-10-01 00:00:00
2012-10-01 00:00:00 2013-01-01 00:00:00
... ...
2021-01-01 00:00:00 2021-01-06 00:00:00
I think here is what you want to do :
SET startdatevar AS DATEtime = '2020-01-10'
;WITH RECURSIVE cte AS (
SELECT startdatevar AS startdate , DATEADD(QUARTER, 1 , startdatevar) enddate , 1 quarter
UNION ALL
SELECT enddate , CASE WHEN DATEADD(QUARTER, 1 , enddate) > CURRENT_DATE() THEN GETDATE() ELSE DATEADD(QUARTER, 1 , enddate) END enddate, quarter + 1
FROM cte
WHERE
cte.enddate <= CURRENT_DATE()
and quarter < 4
)
SELECT * FROM cte
to use your code , if you want to have more than 4 quarters :
SET quarter_limit = DATEDIFF(quarter , <startdate>,<enddate>)
;WITH RECURSIVE cte(q, qDate,enddate) as
(
select 1,
DATEFROMPARTS(year('2012-01-01'::date), 1, 1) -- First quarter date
,time_slice('2012-01-01'::date, 3, 'MONTH', 'END')
UNION ALL
select q+1,
DATEADD(q, 1, qdate) -- next quarter start date
,time_slice(qdate::date, (q+1)*3, 'MONTH', 'END')
from cte
where q < quarter_limit -- limiting the number of next quarters
AND cte.endDate <= <enddate>
)
SELECT * FROM cte
After #eshirvana's answer, I came up with this slightly change after your answer:
WITH RECURSIVE cte(q, qDate,enddate) as
(
select 1,
DATEFROMPARTS(year('2012-01-01'::date), 1, 1) -- First quarter date
,time_slice('2012-01-01'::date, 3, 'MONTH', 'END')
UNION ALL
select q+1,
DATEADD(q, 1, qdate) -- next quarter start date
,time_slice(qdate::date, (q+1)*3, 'MONTH', 'END')
from cte
where q <4 -- limiting the number of next quarters
AND cte.endDate <= CURRENT_DATE()
)
SELECT * FROM cte
Which works fine for whatever year I pass there (2012 will produce 4 records, 2021 just one, since we're still on the first quarter right now).
[EDIT]: it still doesn't work as expected after your 2nd code sugestion:
WITH RECURSIVE cte(q, qDate,enddate) as
(
select 1,
DATEFROMPARTS(year('2012-01-01'::date), 1, 1) -- First quarter date
,CASE WHEN time_slice('2012-01-01'::date, 3, 'MONTH', 'END') > CURRENT_DATE
THEN current_date
ELSE time_slice('2012-01-01'::date, 3, 'MONTH', 'END')
END
UNION ALL
select q+1,
DATEADD(q, 1, qdate) -- next quarter start date
,time_slice(qdate::date, (q+1)*3, 'MONTH', 'END')
from cte
where q < DATEDIFF(quarter , '2012-01-01'::date,'2021-01-06'::date)
AND cte.endDate <= '2021-01-06'::date
)
SELECT * FROM cte
is outputing this:
Sorry #eshirvana, it doesn't work as expected though. It all goes well to some point, but it's not returning all the records. Instead, it produces less records and wrong one, like this:
1 2012-01-01 2012-04-01
2 2012-04-01 2012-07-01
3 2012-07-01 2012-10-01
4 2012-10-01 2013-01-01
5 2013-01-01 2013-10-01
6 2013-04-01 2013-07-01
7 2013-07-01 2013-10-01
8 2013-10-01 2014-01-01
9 2014-01-01 2015-01-01
10 2014-04-01 2015-01-01
11 2014-07-01 2016-10-01
12 2014-10-01 2015-01-01
13 2015-01-01 2015-07-01
14 2015-04-01 2015-07-01
15 2015-07-01 2018-10-01
16 2015-10-01 2018-01-01
17 2016-01-01 2016-10-01
18 2016-04-01 2019-07-01
19 2016-07-01 2017-07-01
20 2016-10-01 2020-01-01
21 2017-01-01 2017-04-01
22 2017-04-01 2019-07-01
23 2017-07-01 2021-10-01
Although my logic it's still not ok for not printing just Q1 dates for 2021, could this output issues be related to date format or something?
Now, it seems to be working, at least for 2012-01-01 till today (2021-01-06).
The code :
WITH RECURSIVE cte(q, qDate,enddate) as
(
select
-- it might not be the first quarter, so better to protect that:
quarter('2012-01-01'::date)::numeric
, DATEFROMPARTS(year('2012-01-01'::date), 1, 1) -- First quarter date
, CASE WHEN time_slice('2012-01-01'::date, 3, 'MONTH', 'END') > '2021-01-06'::date
THEN '2021-01-06'::date
ELSE time_slice('2012-01-01'::date, 3, 'MONTH', 'END')
END
UNION ALL
select q+1
, DATEADD(q, 1, qdate) -- next quarter start date
,CASE WHEN time_slice(DATEADD(q, 1, qdate), 3, 'MONTH', 'END')> '2021-01-06'::date
THEN '2021-01-06'::date
ELSE time_slice(DATEADD(q, 1, qdate), 3, 'MONTH', 'END')
END
from cte
where q <= DATEDIFF(quarter , '2012-01-01'::date,'2021-01-06'::date)
AND cte.endDate <= '2021-01-06'::date
)
SELECT * FROM cte
The output:
1 2012-01-01 2012-04-01
2 2012-04-01 2012-07-01
3 2012-07-01 2012-10-01
4 2012-10-01 2013-01-01
5 2013-01-01 2013-04-01
6 2013-04-01 2013-07-01
7 2013-07-01 2013-10-01
8 2013-10-01 2014-01-01
9 2014-01-01 2014-04-01
10 2014-04-01 2014-07-01
11 2014-07-01 2014-10-01
12 2014-10-01 2015-01-01
13 2015-01-01 2015-04-01
14 2015-04-01 2015-07-01
15 2015-07-01 2015-10-01
16 2015-10-01 2016-01-01
17 2016-01-01 2016-04-01
18 2016-04-01 2016-07-01
19 2016-07-01 2016-10-01
20 2016-10-01 2017-01-01
21 2017-01-01 2017-04-01
22 2017-04-01 2017-07-01
23 2017-07-01 2017-10-01
24 2017-10-01 2018-01-01
25 2018-01-01 2018-04-01
26 2018-04-01 2018-07-01
27 2018-07-01 2018-10-01
28 2018-10-01 2019-01-01
29 2019-01-01 2019-04-01
30 2019-04-01 2019-07-01
31 2019-07-01 2019-10-01
32 2019-10-01 2020-01-01
33 2020-01-01 2020-04-01
34 2020-04-01 2020-07-01
35 2020-07-01 2020-10-01
36 2020-10-01 2021-01-01
37 2021-01-01 2021-01-06
In case you're wondering: yes, the idea is to present the end_date as last_day of the month+one. But it could easily be adapted.
It's not pretty, but I think it's somehow easy to understand.