I have a table employee with the columns name, startdate, enddate:
name | startdate | enddate
--------------------------------
A | 12/12/2012 | 12/12/2014
B | 05/08/2006 | 07/11/2009
I want result like this:
name | Year of Employee
-------------------------
A | 2012
A | 2013
A | 2014
B | 2006
B | 2007
B | 2008
B | 2009
Do I have to use a loop and/or cross join here?
Create a sample table
CREATE TABLE #test(name VARCHAR(10), startdate DATE, enddate DATE)
Inserting some sample data
INSERT INTO #test VALUES ('A', '12/12/2012', '12/12/2014')
,('B', '05/08/2006', '07/11/2009')
Using recursive CTE
;WITH CTE AS (
SELECT name, DATEPART(YEAR, startdate) AS yr FROM #test
UNION ALL
SELECT #test.name, yr + 1
FROM CTE
INNER JOIN #test ON #test.name = CTE.name
WHERE yr < DATEPART(YEAR, enddate)
)
SELECT name, yr AS [Year of Employee]
FROM CTE
ORDER BY name, yr
Output
name Year of Employee
A 2012
A 2013
A 2014
B 2006
B 2007
B 2008
B 2009
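For anyone who wants to try the recursive idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is an adaptation, not the query above: SQLite stands in for SQL Server, strftime('%Y', ...) replaces DATEPART(YEAR, ...), the dates are assumed to be ISO formatted, and the end year is carried along inside the CTE (column last_yr, a name made up for this sketch) so the recursive step does not need to join back to the table.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; dates are stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, startdate TEXT, enddate TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("A", "2012-12-12", "2014-12-12"), ("B", "2006-05-08", "2009-07-11")],
)

YEARS_SQL = """
WITH RECURSIVE cte(name, yr, last_yr) AS (
    -- anchor: one row per employee, starting at the start year
    SELECT name,
           CAST(strftime('%Y', startdate) AS INTEGER),
           CAST(strftime('%Y', enddate) AS INTEGER)
    FROM employee
    UNION ALL
    -- recursive step: add one year until the end year is reached
    SELECT name, yr + 1, last_yr
    FROM cte
    WHERE yr < last_yr
)
SELECT name, yr FROM cte ORDER BY name, yr
"""

rows = conn.execute(YEARS_SQL).fetchall()
for name, yr in rows:
    print(name, yr)
```

Carrying the end year in the CTE also means the result does not depend on name being unique in the table.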
Define and populate a CalYear table that delivers this :
CalYearStartDate CalYearEndDate
2005-01-01 2005-12-12
2006-01-01 2006-12-12
2007-01-01 2007-12-12
2008-01-01 2008-12-12
2009-01-01 2009-12-12
2010-01-01 2010-12-12
2011-01-01 2011-12-12
2012-01-01 2012-12-12
2013-01-01 2013-12-12
2014-01-01 2014-12-12
Define and populate an Emp table that delivers this :
EmpName StartDate EndDate
A 2012-12-12 2014-12-12
B 2006-05-08 2009-07-11
Use this query to retrieve results:
Select EmpName,
CalYearStartDate
From Emp
Inner
Join CalYear
on YEAR(CalYearStartDate) >= YEAR(Emp.StartDate)
and YEAR(CalYearEndDate) <= YEAR(Emp.EndDate)
Create statement:
CREATE TABLE #test(name VARCHAR(10), startdate DATE, enddate DATE)
INSERT INTO #test VALUES ('A', '12/12/2012', '12/12/2014')
,('B', '05/08/2006', '07/11/2009')
Query:
SELECT t.name, YEAR(DATEADD(year,n.number, t.startdate)) AS Year
FROM #test t,
(SELECT number
FROM master..spt_values
WHERE [type] = 'P') n
WHERE startdate <= DATEADD(year,n.number, t.startdate)
AND enddate >= DATEADD(year,n.number, t.startdate)
Result:
name Year
A 2012
A 2013
A 2014
B 2006
B 2007
B 2008
B 2009
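One edge case worth checking with the numbers-table approach: the query above compares shifted dates, so a row that starts on 2012-12-12 and ends on 2014-01-01 would not return 2014, because 2012-12-12 plus two years falls after the end date. A hedged sketch of a variant that compares year numbers instead, again using Python's sqlite3 as a stand-in for SQL Server. The numbers table and row C are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (name TEXT, startdate TEXT, enddate TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        ("A", "2012-12-12", "2014-12-12"),
        ("B", "2006-05-08", "2009-07-11"),
        ("C", "2012-12-12", "2014-01-01"),  # hypothetical row: ends early in its last year
    ],
)

# Assumption: a small numbers table replaces master..spt_values.
conn.execute("CREATE TABLE numbers (number INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in range(100)])

sql = """
SELECT t.name,
       CAST(strftime('%Y', t.startdate) AS INTEGER) + n.number AS yr
FROM test AS t
JOIN numbers AS n
  ON n.number <= CAST(strftime('%Y', t.enddate) AS INTEGER)
               - CAST(strftime('%Y', t.startdate) AS INTEGER)
ORDER BY t.name, yr
"""

# Collect the years per name so the result is easy to inspect.
years = {}
for name, yr in conn.execute(sql):
    years.setdefault(name, []).append(yr)
print(years)
```

Joining on the difference between the two year numbers keeps the last year even when the end date falls before the anniversary of the start date.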
My knowledge is pretty basic so your help would be highly appreciated.
I'm trying to return the same row multiple times when it meets the condition (I only have access to select query).
I have a table of more than 500000 records with Customer ID, Start Date and End Date, where end date could be null.
I am trying to add a new column called Week_No and list all rows accordingly. For example if the date range is more than one week, then the row must be returned multiple times with corresponding week number. Also I would like to count overlapping days, which will never be more than 7 (week) per row and then count unavailable days using second table.
Sample data below
t1
ID | Start_Date | End_Date
000001 | 12/12/2017 | 03/01/2018
000002 | 13/01/2018 |
000003 | 02/01/2018 | 11/01/2018
...
t2
ID | Unavailable
000002 | 14/01/2018
000003 | 03/01/2018
000003 | 04/01/2018
000003 | 08/01/2018
...
I cannot pass the stage of adding week no. I have tried using CASE and UNION ALL but keep getting errors.
declare @week01start datetime = '2018-01-01 00:00:00'
declare @week01end datetime = '2018-01-07 00:00:00'
declare @week02start datetime = '2018-01-08 00:00:00'
declare @week02end datetime = '2018-01-14 00:00:00'
...
SELECT
ID,
'01' as Week_No,
'2018' as YEAR,
Start_Date,
End_Date
FROM t1
WHERE (Start_Date <= @week01end and End_Date >= @week01start)
or (Start_Date <= @week01end and End_Date is null)
UNION ALL
SELECT
ID,
'02' as Week_No,
'2018' as YEAR,
Start_Date,
End_Date
FROM t1
WHERE (Start_Date <= @week02end and End_Date >= @week02start)
or (Start_Date <= @week02end and End_Date is null)
...
The new table should look like this
ID | Week_No | Year | Start_Date | End_Date | Overlap | Unavail_Days
000001 | 01 | 2018 | 12/12/2017 | 03/01/2018 | 3 |
000002 | 02 | 2018 | 13/01/2018 | | 2 | 1
000003 | 01 | 2018 | 02/01/2018 | 11/01/2018 | 6 | 2
000003 | 02 | 2018 | 02/01/2018 | 11/01/2018 | 4 | 1
...
Business-wise, I cannot understand what you are trying to achieve. You can use the following code, though, as a starting point for calculating your overlapping days etc. I did it the way you asked, but I would recommend a separate table, such as a Time dimension, to produce a "cleaner" solution.
/*sample data set in temp table*/
select '000001' as id, '2017-12-12' as start_dt, '2018-01-03' as end_dt into #tmp union
select '000002' as id, '2018-01-13' as start_dt, null as end_dt union
select '000003' as id, '2018-01-02' as start_dt, '2018-01-11' as end_dt
/*calculate week numbers and week diff according to dates*/
select *,
DATEPART(WK,start_dt) as start_weekNumber,
DATEPART(WK,end_dt) as end_weekNumber,
case
when DATEPART(WK,end_dt) - DATEPART(WK,start_dt) > 0 then (DATEPART(WK,end_dt) - DATEPART(WK,start_dt)) +1
else (52 - DATEPART(WK,start_dt)) + DATEPART(WK,end_dt)
end as WeekDiff
into #tmp1
from
(
SELECT *,DATEADD(DAY, 2 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [start_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [startdt_Week_End_Date],
DATEADD(DAY, 2 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_End_Date]
from #tmp
) s
/*cte used to create duplicates when week diff is over 1*/
;with x as
(
SELECT TOP (10) rn = ROW_NUMBER() --modify the max you want
OVER (ORDER BY [object_id])
FROM sys.all_columns
ORDER BY [object_id]
)
/*final query*/
select --*
ID,
start_weekNumber+ (r-1) as Week,
DATEPART(YY,start_dt) as [YEAR],
start_dt,
end_dt,
null as Overlap,
null as unavailable_days
from
(
select *,
ROW_NUMBER() over (partition by id order by id) r
from
(
select d.* from x
CROSS JOIN #tmp1 AS d
WHERE x.rn <= d.WeekDiff
union all
select * from #tmp1
where WeekDiff is null
) a
)a_ext
order by id,start_weekNumber
--drop table #tmp1,#tmp
The above will produce the results you want except for the Overlap and unavailable_days columns. Instead of just counting weeks, I added the week number within the year based on start_dt, but you can change that if you don't like it:
ID Week YEAR start_dt end_dt Overlap unavailable_days
000001 50 2017 2017-12-12 2018-01-03 NULL NULL
000001 51 2017 2017-12-12 2018-01-03 NULL NULL
000001 52 2017 2017-12-12 2018-01-03 NULL NULL
000002 2 2018 2018-01-13 NULL NULL NULL
000003 1 2018 2018-01-02 2018-01-11 NULL NULL
000003 2 2018 2018-01-02 2018-01-11 NULL NULL
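Since the Overlap and unavailable_days columns are left as NULL above, here is a rough sketch of the arithmetic for them, written in plain Python rather than T-SQL. It assumes the two week windows declared in the question, treats a missing end date as still open, and uses made-up names (weekly_rows, ranges, unavailable, weeks). The overlap is the number of days between the later of the two start dates and the earlier of the two end dates, inclusive.

```python
from datetime import date

# Sample data from the question (dates read as dd/mm/yyyy).
ranges = [
    ("000001", date(2017, 12, 12), date(2018, 1, 3)),
    ("000002", date(2018, 1, 13), None),  # None = no end date yet
    ("000003", date(2018, 1, 2), date(2018, 1, 11)),
]
unavailable = [
    ("000002", date(2018, 1, 14)),
    ("000003", date(2018, 1, 3)),
    ("000003", date(2018, 1, 4)),
    ("000003", date(2018, 1, 8)),
]
# Assumption: only the two week windows declared in the question.
weeks = [
    ("01", date(2018, 1, 1), date(2018, 1, 7)),
    ("02", date(2018, 1, 8), date(2018, 1, 14)),
]

def weekly_rows(ranges, unavailable, weeks):
    """Return (id, week_no, overlap_days, unavailable_days) per overlapping week."""
    rows = []
    for cust_id, start, end in ranges:
        for week_no, week_start, week_end in weeks:
            lo = max(start, week_start)
            hi = week_end if end is None else min(end, week_end)
            if lo > hi:
                continue  # the range does not touch this week
            overlap = (hi - lo).days + 1  # inclusive day count, never more than 7
            unavail = sum(
                1 for uid, day in unavailable
                if uid == cust_id and lo <= day <= hi
            )
            rows.append((cust_id, week_no, overlap, unavail))
    return rows

result = weekly_rows(ranges, unavailable, weeks)
for row in result:
    print(row)
```

In SQL the same idea would be a join to a week table on overlapping ranges, with the overlap computed from the two clamped dates.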
I have a table with columns for a start- and enddate.
My goal is to get a list of each year in that timespan for each row, so
+-------------------------+
| startdate | enddate |
+------------+------------+
| 2004-08-01 | 2007-01-08 |
| 2005-06-02 | 2007-05-08 |
+------------+------------+
should output this:
+-------+
| years |
+-------+
| 2004 |
| 2005 |
| 2006 |
| 2007 |
| 2005 |
| 2006 |
| 2007 |
+-------+
I am having trouble generating the years in between the two dates. My first approach was to use a UNION (the order of the dates is irrelevant), but the years in between are missing in this case...
Select
Extract(Year From startdate)
From
table1
Union
Select
Extract(Year From enddate)
From
table1
Thanks for any advice!
Row Generator technique
WITH DATA1 AS (
SELECT TO_DATE('2004-08-01','YYYY-MM-DD') STARTDATE, TO_DATE('2007-01-08','YYYY-MM-DD') ENDDATE FROM DUAL UNION ALL
SELECT TO_DATE('2005-06-02','YYYY-MM-DD') STARTDATE, TO_DATE('2007-05-08','YYYY-MM-DD') ENDDATE FROM DUAL
),
DATA2 AS (
SELECT EXTRACT(YEAR FROM STARTDATE) ST, EXTRACT(YEAR FROM ENDDATE) ED FROM DATA1
),
DATA3 AS (
-- one row per offset 0 .. MAX(ED-ST); the +1 lets the longest span reach its end year
SELECT LEVEL-1 LINE
FROM DUAL
CONNECT BY LEVEL <= (SELECT MAX(ED-ST)+1 FROM DATA2)
)
SELECT ST+LINE FROM
DATA2, DATA3
WHERE LINE <= ED-ST
ORDER BY 1
/
ST+LINE
----------
2004
2005
2005
2006
2006
2007
2007
7 rows selected.
Try this Query
; with CTE as
(
select datepart(year, '2005-12-25') as yr
union all
select yr + 1
from CTE
where yr < datepart(year, '2013-11-14')
)
select yr
from CTE
Try this:
Create a table with years as follow:
CREATE TABLE tblyears(y int)
INSERT INTO tblyears VALUES (1900);
INSERT INTO tblyears VALUES (1901);
INSERT INTO tblyears VALUES (1902);
and so on until
INSERT INTO tblyears VALUES (2100)
So, you'll write this query:
SELECT y.y
FROM tblyears y
JOIN table1 t
ON y.y >= EXTRACT(year from startdate)
AND y.y <= EXTRACT(year from enddate)
ORDER BY y.y
I have a table in SQL Server 2005:
id eid name datetime
-- |----|------- |------------------------
1 | 1 | john | 2013-11-18 15:30:00.000
2 | 1 | john | 2013-11-18 14:10:00.000
3 | 1 | john | 2013-11-18 13:30:00.000
4 | 1 | john | 2013-11-18 16:00:00.000
5 | 1 | john | 2013-11-18 17:00:00.000
6 | 2 | Richard| 2013-11-18 13:40:00.000
7 | 2 | Richard| 2013-11-18 16:20:00.000
8 | 3 | Mandy | 2013-11-18 20:22:00.000
9 | 3 | Mandy | 2013-11-18 20:20:00.000
10| 4 | Micheal| 2013-11-18 13:00:00.000
Input will be a date such as - 2013-11-18 15:50:00.000
Expected Output : Need Minimum and Maximum datetime adjacent(closest) to input date...
Grouping by eid is also required.
id eid name AdjacentMinimumDateTime AdjacentMaximumDateTime
-- |----|------- |---------------------------|------------------------
1 | 1 | john | 2013-11-18 15:30:00.000 | 2013-11-18 16:00:00.000
6 | 2 | Richard| 2013-11-18 13:40:00.000 | 2013-11-18 16:20:00.000
8 | 3 | Mandy | NULL | 2013-11-18 20:20:00.000
10| 4 | Micheal| 2013-11-18 13:00:00.000 | NULL
Give this a try:
WITH
BEFORE AS (
SELECT eid, max(datetime) date FROM t
WHERE datetime <= '2013-11-18 15:50:00.000'
GROUP BY eid
),
AFTER AS (
SELECT eid, min(datetime) date FROM t
WHERE datetime >= '2013-11-18 15:50:00.000'
GROUP BY eid
)
SELECT t.eid, t.name, max(b.date) beforeDate, min(a.date) afterDate FROM t
LEFT JOIN BEFORE b ON t.eid = b.eid
LEFT JOIN AFTER a ON t.eid = a.eid
GROUP BY t.eid, t.name
ORDER BY t.eid
Or the non-CTE version:
SELECT t.eid, t.name, max(b.date) beforeDate, min(a.date) afterDate FROM t
LEFT JOIN (
SELECT eid, max(datetime) date FROM t
WHERE datetime <= '2013-11-18 15:50:00.000'
GROUP BY eid
) b ON t.eid = b.eid
LEFT JOIN (
SELECT eid, min(datetime) date FROM t
WHERE datetime >= '2013-11-18 15:50:00.000'
GROUP BY eid
) a ON t.eid = a.eid
GROUP BY t.eid, t.name
ORDER BY t.eid
I've added repeated dates to test it works with them too.
Output:
| EID | NAME | BEFOREDATE | AFTERDATE |
|-----|---------|----------------------------|----------------------------|
| 1 | john | November, 18 2013 15:30:00 | November, 18 2013 16:00:00 |
| 2 | Richard | November, 18 2013 13:40:00 | November, 18 2013 16:20:00 |
| 3 | Mandy | (null) | November, 18 2013 20:20:00 |
| 4 | Michael | November, 18 2013 13:00:00 | (null) |
| 5 | Mosty | November, 18 2013 15:00:00 | November, 18 2013 16:00:00 |
Try This...
-- [table] stands for your real table name; @InputDate is assumed to hold the input date
SELECT
MIN(id) [id],
eid,
name,
(SELECT MAX(datetime) FROM [table] t1 WHERE t1.datetime < @InputDate
AND t1.eid = t.eid) [AdjacentMinimumDateTime],
(SELECT MIN(datetime) FROM [table] t2 WHERE t2.datetime > @InputDate
AND t2.eid = t.eid) [AdjacentMaximumDateTime]
FROM [table] t
GROUP BY t.eid, t.name
Start by finding the one before and the one after, then you can combine them into a single query if you like:
declare @TestTable table (ID int, eid int, Name varchar(10), TestDate datetime)
declare @InputDate datetime = '2013-11-18 15:50:00.000'
insert into @TestTable (ID,eid,Name,TestDate)
values (1,1,'john', '2013-11-18 15:30:00.000')
,(2,1,'john', '2013-11-18 14:10:00.000')
,(3,1,'john', '2013-11-18 13:30:00.000')
,(4,1,'john', '2013-11-18 16:00:00.000')
,(5,1,'john', '2013-11-18 17:00:00.000')
,(6,2,'richard', '2013-11-18 13:40:00.000')
,(7,2,'richard', '2013-11-18 16:20:00.000')
,(8,3,'mandy', '2013-11-18 20:22:00.000')
,(9,3,'mandy', '2013-11-18 20:20:00.000')
,(10,4,'michael', '2013-11-18 13:00:00.000');
SELECT *
FROM @TestTable
ORDER BY TestDate
--get the one previous
SELECT TOP 1 *
FROM @TestTable
WHERE TestDate < @InputDate
ORDER BY TestDate desc
--get the one after
SELECT TOP 1 *
FROM @TestTable
WHERE TestDate > @InputDate
ORDER BY TestDate
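As a hedged illustration of combining the "one before" and "one after" lookups into a single query per eid, here is a small sketch using Python's sqlite3 module in place of SQL Server. The table name t, the column dt and the :input parameter are assumptions for this sketch; the timestamps are stored as ISO text so that string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, eid INTEGER, name TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 1, "john", "2013-11-18 15:30:00"),
        (2, 1, "john", "2013-11-18 14:10:00"),
        (3, 1, "john", "2013-11-18 13:30:00"),
        (4, 1, "john", "2013-11-18 16:00:00"),
        (5, 1, "john", "2013-11-18 17:00:00"),
        (6, 2, "richard", "2013-11-18 13:40:00"),
        (7, 2, "richard", "2013-11-18 16:20:00"),
        (8, 3, "mandy", "2013-11-18 20:22:00"),
        (9, 3, "mandy", "2013-11-18 20:20:00"),
        (10, 4, "michael", "2013-11-18 13:00:00"),
    ],
)

# One row per eid; each correlated subquery picks the closest value on one side.
sql = """
SELECT e.eid, e.name,
       (SELECT MAX(b.dt) FROM t AS b
         WHERE b.eid = e.eid AND b.dt < :input) AS before_dt,
       (SELECT MIN(a.dt) FROM t AS a
         WHERE a.eid = e.eid AND a.dt > :input) AS after_dt
FROM (SELECT DISTINCT eid, name FROM t) AS e
ORDER BY e.eid
"""

closest = conn.execute(sql, {"input": "2013-11-18 15:50:00"}).fetchall()
for row in closest:
    print(row)
```

An eid with nothing on one side of the input date simply gets NULL (None in Python) for that column.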
Try mine. It works:
declare @TestTable table (ID int, eid int, Name varchar(10), TestDate datetime)
declare @InputDate datetime = '2013-11-18 15:50:00.000'
insert into @TestTable (ID,eid,Name,TestDate)
values (1,1,'john', '2013-11-18 15:30:00.000')
,(2,1,'john', '2013-11-18 14:10:00.000')
,(3,1,'john', '2013-11-18 13:30:00.000')
,(4,1,'john', '2013-11-18 16:00:00.000')
,(5,1,'john', '2013-11-18 17:00:00.000')
,(6,2,'richard', '2013-11-18 13:40:00.000')
,(7,2,'richard', '2013-11-18 16:20:00.000')
,(8,3,'mandy', '2013-11-18 20:22:00.000')
,(9,3,'mandy', '2013-11-18 20:20:00.000')
,(10,4,'michael', '2013-11-18 13:00:00.000');
with cte as
(
select id, eid, name, TestDate, (datediff(s, TestDate, @InputDate)) as DateDiffSeconds
from @TestTable
)
select cte.eid, cte.name, x.testdate as maxunder, y.testdate as minover, @InputDate as InputDateForComparison
from cte
left join
(
select eid, testdate
from cte
join (
select eid as eidmin, min(DateDiffSeconds) as datematchunder
from cte
where DateDiffSeconds >= 0
group by eid
-- also match on eid so that equal differences from other employees do not join
) as datematchunder on datematchunder.datematchunder = cte.DateDiffSeconds
and datematchunder.eidmin = cte.eid
) x on x.eid = cte.eid
left join
(
select eid, testdate
from cte
join (
select eid as eidmin, max(DateDiffSeconds) as datematchover
from cte
where DateDiffSeconds <= 0
group by eid
) as datematchover on datematchover.datematchover = cte.DateDiffSeconds
and datematchover.eidmin = cte.eid
) y on y.eid = cte.eid
group by cte.eid, cte.name, x.testdate, y.testdate;
BAM!
If I have a table that looks like this
begin date end date data
2013-01-01 2013-01-04 7
2013-01-05 2013-01-06 9
How can I make it be returned like this...
date data
2013-01-01 7
2013-01-02 7
2013-01-03 7
2013-01-04 7
2013-01-05 9
2013-01-06 9
One thing I was thinking of doing is to have another table that just has all the dates and then join the table with just dates to the above table using date>=begin date and date<=end date but that seems a little clunky to have to maintain that extra table with nothing but repetitive dates.
In some instances I don't have a data range but just an as of date which basically looks like my first example but with no end date. The end date is implied by the next row's 'as of' date (ie end date should be the next row's as of -1). I had a "solution" for this that uses the row_number() function to get the next value but I suspect that methodology, which the way I'm doing it has a bunch of nested self joins, contributes to very long query times.
Using some sample data...
create table data (begindate datetime, enddate datetime, data int);
insert data select
'20130101', '20130104', 7 union all select
'20130105', '20130106', 9;
The Query: (Note: if you already have a numbers/tally table - use it)
select dateadd(d,v.number,d.begindate) adate, data
from data d
join master..spt_values v on v.type='P'
and v.number between 0 and datediff(d, begindate, enddate)
order by adate;
Results:
| COLUMN_0 | DATA |
-----------------------------------------
| January, 01 2013 00:00:00+0000 | 7 |
| January, 02 2013 00:00:00+0000 | 7 |
| January, 03 2013 00:00:00+0000 | 7 |
| January, 04 2013 00:00:00+0000 | 7 |
| January, 05 2013 00:00:00+0000 | 9 |
| January, 06 2013 00:00:00+0000 | 9 |
Alternatively you can generate a number table on the fly (0-99) or as many numbers as you need
;WITH Numbers(number) AS (
select top(100) row_number() over (order by (select 0))-1
from sys.columns a
cross join sys.columns b
cross join sys.columns c
cross join sys.columns d
)
select dateadd(d,v.number,d.begindate) adate, data
from data d
join Numbers v on v.number between 0 and datediff(d, begindate, enddate)
order by adate;
You can use recursive CTE to get all the dates between two dates. Another CTE is to get ROW_NUMBERs to help you with those missing EndDates.
DECLARE @startDate DATE
DECLARE @endDate DATE
SELECT @startDate = MIN(begindate) FROM Table1
SELECT @endDate = MAX(enddate) FROM Table1
;WITH CTE_Dates AS
(
SELECT @startDate AS DT
UNION ALL
SELECT DATEADD(DD, 1, DT)
FROM CTE_Dates
WHERE DATEADD(DD, 1, DT) <= @endDate
)
,CTE_Data AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY BeginDate) AS RN FROM Table1
)
SELECT DT, t1.data FROM CTE_Dates d
LEFT JOIN CTE_Data t1 on d.DT
BETWEEN t1.[BeginDate] AND COALESCE(t1.EndDate,
(SELECT DATEADD(DD,-1,t2.BeginDate) FROM CTE_Data t2 WHERE t1.RN + 1 = t2.RN))
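To make the "as of" case from the question concrete, here is a minimal plain-Python sketch of the same logic: each row's end date is taken as the day before the next row's as-of date, and every row is then expanded into one entry per day. The function name expand_as_of and the last_end argument (the inclusive end to use for the final row, which has no successor) are assumptions made for this illustration.

```python
from datetime import date, timedelta

def expand_as_of(rows, last_end):
    """rows: list of (as_of_date, value). Returns one (day, value) per day.

    The end of each row is implied by the next row's as-of date minus one day;
    last_end is the inclusive end assumed for the final row.
    """
    ordered = sorted(rows, key=lambda r: r[0])
    out = []
    for i, (start, value) in enumerate(ordered):
        if i + 1 < len(ordered):
            end = ordered[i + 1][0] - timedelta(days=1)
        else:
            end = last_end
        day = start
        while day <= end:
            out.append((day, value))
            day += timedelta(days=1)
    return out

daily = expand_as_of(
    [(date(2013, 1, 1), 7), (date(2013, 1, 5), 9)],
    last_end=date(2013, 1, 6),
)
for day, value in daily:
    print(day.isoformat(), value)
```

Sorting once and looking at the neighbouring row replaces the nested self joins on ROW_NUMBER() that the question mentions.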
Using SQL Server 2008 R2,
I'm trying to combine date ranges into the maximum date range given that one end date is next to the following start date.
The data is about different employments. Some employees may have ended their employment and have rejoined at a later time. Those should count as two different employments (example ID 5). Some people have different types of employment, running after each other (enddate and startdate neck-to-neck), in this case it should be considered as one employment in total (example ID 30).
An employment period that has not ended has an enddate that is null.
An example is probably enlightening:
declare @t as table (employmentid int, startdate datetime, enddate datetime)
insert into @t values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null)
-- expected outcome
EmploymentId StartDate EndDate
5 2007-12-03 2011-08-26
5 2013-05-02 NULL
30 2006-10-02 NULL
66 2007-09-24 NULL
I've been trying different "islands-and-gaps" techniques but haven't been able to crack this one.
The strange bit you see with my use of the date '32121231' is just a very large date to handle your "no-end-date" scenario. I have assumed you won't really have many date ranges per employee, so I've used a simple Recursive Common Table Expression to combine the ranges.
To make it run faster, the starting anchor query keeps only those dates that will not link up to a prior range (per employee). The rest is just tree-walking the date ranges and growing the range. The final GROUP BY keeps only the largest date range built up per starting ANCHOR (employmentid, startdate) combination.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
create table Tbl (
employmentid int,
startdate datetime,
enddate datetime);
insert Tbl values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null);
/*
-- expected outcome
EmploymentId StartDate EndDate
5 2007-12-03 2011-08-26
5 2013-05-02 NULL
30 2006-10-02 NULL
66 2007-09-24 NULL
*/
Query 1:
;with cte as (
select a.employmentid, a.startdate, a.enddate
from Tbl a
left join Tbl b on a.employmentid=b.employmentid and a.startdate-1=b.enddate
where b.employmentid is null
union all
select a.employmentid, a.startdate, b.enddate
from cte a
join Tbl b on a.employmentid=b.employmentid and b.startdate-1=a.enddate
)
select employmentid,
startdate,
nullif(max(isnull(enddate,'32121231')),'32121231') enddate
from cte
group by employmentid, startdate
order by employmentid
Results:
| EMPLOYMENTID | STARTDATE | ENDDATE |
-----------------------------------------------------------------------------------
| 5 | December, 03 2007 00:00:00+0000 | August, 26 2011 00:00:00+0000 |
| 5 | May, 02 2013 00:00:00+0000 | (null) |
| 30 | October, 02 2006 00:00:00+0000 | (null) |
| 66 | September, 24 2007 00:00:00+0000 | (null) |
SET NOCOUNT ON
DECLARE @T TABLE(ID INT,FromDate DATETIME, ToDate DATETIME)
INSERT INTO @T(ID,FromDate,ToDate)
SELECT 1,'20090801','20090803' UNION ALL
SELECT 2,'20090802','20090809' UNION ALL
SELECT 3,'20090805','20090806' UNION ALL
SELECT 4,'20090812','20090813' UNION ALL
SELECT 5,'20090811','20090812' UNION ALL
SELECT 6,'20090802','20090802'
SELECT ROW_NUMBER() OVER(ORDER BY s1.FromDate) AS ID,
s1.FromDate,
MIN(t1.ToDate) AS ToDate
FROM @T s1
INNER JOIN @T t1 ON s1.FromDate <= t1.ToDate
AND NOT EXISTS(SELECT * FROM @T t2
WHERE t1.ToDate >= t2.FromDate
AND t1.ToDate < t2.ToDate)
WHERE NOT EXISTS(SELECT * FROM @T s2
WHERE s1.FromDate > s2.FromDate
AND s1.FromDate <= s2.ToDate)
GROUP BY s1.FromDate
ORDER BY s1.FromDate
An alternative solution that uses window functions rather than recursive CTEs (note that the ROWS BETWEEN frame on SUM ... OVER requires SQL Server 2012 or later):
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT
employmentid,
startdate,
enddate,
DATEADD(
DAY,
-COALESCE(
SUM(DATEDIFF(DAY, startdate, enddate)+1) OVER (PARTITION BY employmentid ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
0
),
startdate
) as grp
FROM @t
) withGroup
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
This works by calculating a grp value that is the same for all consecutive rows. This is achieved as follows:
Determine the total number of days each span occupies (+1 because the dates are inclusive)
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
Take the cumulative sum of the days spanned for each employment, ordered by startdate. This gives us the total days spanned by all the previous employment spans
We coalesce with 0 so that the first row of each employment, which has no preceding rows, gets 0 instead of NULL
We do not include the current row in the cumulative sum, because we will use the value against startdate rather than enddate (we can't use it against enddate because of the NULLs)
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
Subtract the cumulative days from the startdate to get our grp. This is the crux of the solution.
If the start date increases at the same rate as the days spanned, then the spans are consecutive, and subtracting the two gives the same value.
If the startdate increases faster than the days spanned, then there is a gap and we get a new grp value greater than the previous one.
Although grp is a date, the date itself is meaningless; we use it only as a grouping value
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
) inner2
With the results
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| employmentid | startdate | enddate | daysSpanned | cumulativeDaysSpanned | grp |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 | 1363 | 0 | 2007-12-03 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL | NULL | 1363 | 2009-08-08 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | 2011-01-16 00:00:00.000 | 1568 | 0 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2011-01-17 00:00:00.000 | 2012-08-12 00:00:00.000 | 574 | 1568 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2012-08-13 00:00:00.000 | NULL | NULL | 2142 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL | NULL | 0 | 2007-09-24 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
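If the grp trick is easier to follow outside SQL, here is a small plain-Python sketch of the same calculation on the sample rows. It is only an illustration under the same assumptions as the SQL (no overlapping spans within an employment, startdate never NULL); the function names add_grp and collapse are made up. A NULL enddate is represented by None and contributes nothing to the running total, mirroring how SUM ignores NULLs.

```python
from datetime import date, timedelta

rows = [
    (5, date(2007, 12, 3), date(2011, 8, 26)),
    (5, date(2013, 5, 2), None),
    (30, date(2006, 10, 2), date(2011, 1, 16)),
    (30, date(2011, 1, 17), date(2012, 8, 12)),
    (30, date(2012, 8, 13), None),
    (66, date(2007, 9, 24), None),
]

def add_grp(rows):
    """Append grp = startdate minus the days spanned by all previous rows."""
    out = []
    running = {}  # employmentid -> cumulative days spanned so far
    for emp, start, end in sorted(rows, key=lambda r: (r[0], r[1])):
        prior = running.get(emp, 0)
        out.append((emp, start, end, start - timedelta(days=prior)))
        if end is not None:
            running[emp] = prior + (end - start).days + 1  # inclusive span
    return out

def collapse(rows):
    """Group by (employmentid, grp): earliest start, latest end (None wins)."""
    merged = {}
    for emp, start, end, grp in add_grp(rows):
        key = (emp, grp)
        if key not in merged:
            merged[key] = [emp, start, end]
        else:
            cur = merged[key]
            cur[1] = min(cur[1], start)
            cur[2] = None if (cur[2] is None or end is None) else max(cur[2], end)
    return sorted((tuple(v) for v in merged.values()), key=lambda r: (r[0], r[1]))

with_grp = add_grp(rows)
combined = collapse(rows)
for row in combined:
    print(row)
```

Consecutive spans share one grp value because the start date and the running total grow by the same number of days; a gap makes the start date jump ahead of the total.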
Finally we can GROUP BY grp to collapse the consecutive spans.
Use MIN and MAX to get the new startdate and enddate
To handle the NULL enddates we give them a large value so that MAX picks them up, then convert them back to NULL again
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
) inner2
) inner3
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
To get the desired result
+--------------+-------------------------+-------------------------+
| employmentid | startdate | enddate |
+--------------+-------------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 |
+--------------+-------------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
We can combine the inner queries to get the query at the start of this answer, which is shorter but harder to explain.
Limitations: all of this requires that
there are no overlaps of startdate and enddate within an employment. Overlaps could produce collisions in our grp.
startdate is not NULL. However, this could be overcome by replacing NULL start dates with small date values
future developers can decipher the window black magic you performed
A modified script for combining all overlapping periods. For example
01.01.2001-01.01.2010
05.05.2005-05.05.2015
will give one period:
01.01.2001-05.05.2015
tbl.enddate must be filled in (not NULL); use a far-future date such as 31.12.2099 for periods that have not ended
;WITH cte
AS(
SELECT
a.employmentid
,a.startdate
,a.enddate
from tbl a
left join tbl c on a.employmentid=c.employmentid
and a.startdate > c.startdate
and a.startdate <= dateadd(day, 1, c.enddate)
WHERE c.employmentid IS NULL
UNION all
SELECT
a.employmentid
,a.startdate
,c.enddate -- extend the range with the end date of the linked period
from cte a
inner join tbl c on a.employmentid=c.employmentid
and (c.startdate = dateadd(day, 1, a.enddate) or (c.enddate > a.enddate and c.startdate <= a.enddate))
)
select distinct employmentid,
startdate,
nullif(max(enddate),'31.12.2099') enddate
from cte
group by employmentid, startdate
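For comparison, merging overlapping or adjacent periods is often written procedurally as "sort by start date, then extend the current period while the next one starts no later than one day after its end". A minimal Python sketch of that idea follows. It swaps the recursive CTE for a single sorted pass, assumes every period has an end date, and uses a made-up function name (merge_periods).

```python
from datetime import date, timedelta

def merge_periods(periods):
    """Merge (start, end) periods that overlap or touch (end + 1 day = next start)."""
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1] + timedelta(days=1):
            # overlaps or is adjacent to the current period: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(p) for p in merged]

example = merge_periods([
    (date(2001, 1, 1), date(2010, 1, 1)),
    (date(2005, 5, 5), date(2015, 5, 5)),
])
print(example)
```

To apply it per employee, group the rows by employmentid first and call the function once per group.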