How to return same row multiple times with multiple conditions - sql

My SQL knowledge is pretty basic, so your help would be highly appreciated.
I'm trying to return the same row multiple times when it meets a condition (I only have permission to run SELECT queries).
I have a table of more than 500,000 records with Customer ID, Start Date and End Date, where End Date can be NULL.
I am trying to add a new column called Week_No and list the rows accordingly: if the date range spans more than one week, the row must be returned once per week, with the corresponding week number. I would also like to count the overlapping days for each week (never more than 7 per row) and then count the unavailable days using a second table.
Sample data:
t1
ID | Start_Date | End_Date
000001 | 12/12/2017 | 03/01/2018
000002 | 13/01/2018 |
000003 | 02/01/2018 | 11/01/2018
...
t2
ID | Unavailable
000002 | 14/01/2018
000003 | 03/01/2018
000003 | 04/01/2018
000003 | 08/01/2018
...
I can't get past the stage of adding the week number. I have tried using CASE and UNION ALL but keep getting errors.
declare @week01start datetime = '2018-01-01 00:00:00'
declare @week01end datetime = '2018-01-07 00:00:00'
declare @week02start datetime = '2018-01-08 00:00:00'
declare @week02end datetime = '2018-01-14 00:00:00'
...
SELECT
ID,
'01' as Week_No,
'2018' as YEAR,
Start_Date,
End_Date
FROM t1
WHERE (Start_Date <= @week01end and End_Date >= @week01start)
or (Start_Date <= @week01end and End_Date is null)
UNION ALL
SELECT
ID,
'02' as Week_No,
'2018' as YEAR,
Start_Date,
End_Date
FROM t1
WHERE (Start_Date <= @week02end and End_Date >= @week02start)
or (Start_Date <= @week02end and End_Date is null)
...
The new table should look like this:
ID | Week_No | Year | Start_Date | End_Date | Overlap | Unavail_Days
000001 | 01 | 2018 | 12/12/2017 | 03/01/2018 | 3 |
000002 | 02 | 2018 | 13/01/2018 | | 2 | 1
000003 | 01 | 2018 | 02/01/2018 | 11/01/2018 | 6 | 2
000003 | 02 | 2018 | 02/01/2018 | 11/01/2018 | 4 | 1
...
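For reference, here is a minimal Python sketch of the per-week arithmetic described above (an illustration, not a SQL solution): clip each range to the week, count the inclusive overlap, then count the unavailable dates inside the clipped window. Assumptions: the hypothetical WEEKS list stands in for the declared week variables and only covers weeks 01 and 02 of 2018, a None end date is treated as an open range, and rows with no unavailable days get 0 instead of a blank.

```python
from datetime import date

# Hypothetical week calendar standing in for the declared week variables:
# Monday-to-Sunday weeks, only weeks 01 and 02 of 2018.
WEEKS = [
    ("01", date(2018, 1, 1), date(2018, 1, 7)),
    ("02", date(2018, 1, 8), date(2018, 1, 14)),
]

# Sample data from the question (None = NULL end date).
T1 = [
    ("000001", date(2017, 12, 12), date(2018, 1, 3)),
    ("000002", date(2018, 1, 13), None),
    ("000003", date(2018, 1, 2), date(2018, 1, 11)),
]
T2 = [
    ("000002", date(2018, 1, 14)),
    ("000003", date(2018, 1, 3)),
    ("000003", date(2018, 1, 4)),
    ("000003", date(2018, 1, 8)),
]


def expand_by_week(t1, t2, weeks=WEEKS):
    """Return (ID, Week_No, Overlap, Unavail_Days) for every week a range touches."""
    unavailable = {}
    for cust_id, day in t2:
        unavailable.setdefault(cust_id, set()).add(day)

    rows = []
    for cust_id, start, end in t1:
        for week_no, week_start, week_end in weeks:
            # Same test as the WHERE clauses: the range must overlap the week.
            if start > week_end or (end is not None and end < week_start):
                continue
            lo = max(start, week_start)
            hi = week_end if end is None else min(end, week_end)
            overlap = (hi - lo).days + 1  # inclusive count, never more than 7
            unavail = sum(1 for d in unavailable.get(cust_id, ()) if lo <= d <= hi)
            rows.append((cust_id, week_no, overlap, unavail))
    return rows
```

Calling expand_by_week(T1, T2) yields the four rows of the expected table above, with 0 in place of the blank Unavail_Days.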

From a business perspective I cannot quite understand what you are trying to achieve, but you can use the following code as a starting point for calculating your overlapping days etc. I did it the way you asked, but I would recommend a separate table, such as a time (calendar) dimension, to produce a "cleaner" solution.
/*sample data set in temp table*/
select '000001' as id, '2017-12-12' as start_dt, '2018-01-03' as end_dt into #tmp union
select '000002' as id, '2018-01-13' as start_dt, null as end_dt union
select '000003' as id, '2018-01-02' as start_dt, '2018-01-11' as end_dt
/*calculate week numbers and week diff according to dates*/
select *,
DATEPART(WK,start_dt) as start_weekNumber,
DATEPART(WK,end_dt) as end_weekNumber,
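case
-- ">= 0" so that a range starting and ending in the same week gives 1 week (with "> 0" it fell into ELSE and gave 52)
when DATEPART(WK,end_dt) - DATEPART(WK,start_dt) >= 0 then (DATEPART(WK,end_dt) - DATEPART(WK,start_dt)) +1
else (52 - DATEPART(WK,start_dt)) + DATEPART(WK,end_dt)
end as WeekDiff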
into #tmp1
from
(
SELECT *,DATEADD(DAY, 2 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [start_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [start_dt_Week_End_Date],
DATEADD(DAY, 2 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_End_Date]
from #tmp
) s
/*cte used to create duplicates when week diff is over 1*/
;with x as
(
SELECT TOP (10) rn = ROW_NUMBER() --modify the max you want
OVER (ORDER BY [object_id])
FROM sys.all_columns
ORDER BY [object_id]
)
/*final query*/
select --*
ID,
start_weekNumber+ (r-1) as Week,
DATEPART(YY,start_dt) as [YEAR],
start_dt,
end_dt,
null as Overlap,
null as unavailable_days
from
(
select *,
ROW_NUMBER() over (partition by id order by id) r
from
(
select d.* from x
CROSS JOIN #tmp1 AS d
WHERE x.rn <= d.WeekDiff
union all
select * from #tmp1
where WeekDiff is null
) a
)a_ext
order by id, Week
--drop table #tmp1,#tmp
The above will produce the rows you want, except for the Overlap and unavailable-days columns. Instead of just counting weeks, I used the week number within the year, based on start_dt; you can change that if you don't like it:
ID Week YEAR start_dt end_dt Overlap unavailable_days
000001 50 2017 2017-12-12 2018-01-03 NULL NULL
000001 51 2017 2017-12-12 2018-01-03 NULL NULL
000001 52 2017 2017-12-12 2018-01-03 NULL NULL
000002 2 2018 2018-01-13 NULL NULL NULL
000003 1 2018 2018-01-02 2018-01-11 NULL NULL
000003 2 2018 2018-01-02 2018-01-11 NULL NULL
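To see why the WeekDiff CASE needs ">= 0", the week numbering and row duplication can be mirrored in a short Python sketch. sql_server_week is a hypothetical helper that approximates DATEPART(WK, ...) under the default SET DATEFIRST 7 (weeks start on Sunday, week 1 contains 1 January); like the answer, the sketch assumes 52 weeks per year, and it ignores the TOP (10) cap on the numbers CTE.

```python
from datetime import date


def sql_server_week(d):
    """Approximate DATEPART(WK, d) with SET DATEFIRST 7 (weeks start on Sunday).

    Week 1 is the (possibly partial) week that contains 1 January.
    """
    jan1 = date(d.year, 1, 1)
    # Days from the Sunday on or before 1 January to 1 January (Python: Monday=0 ... Sunday=6).
    offset = (jan1.weekday() + 1) % 7
    return ((d - jan1).days + offset) // 7 + 1


def week_numbers(start, end):
    """Week numbers produced for one row: WeekDiff CASE plus the CROSS JOIN to the numbers CTE."""
    start_wk = sql_server_week(start)
    if end is None:
        return [start_wk]  # WeekDiff is NULL, so the row comes through the UNION ALL once
    end_wk = sql_server_week(end)
    diff = end_wk - start_wk
    # With "> 0" a same-week range (diff == 0) fell into the ELSE branch and got 52 weeks.
    week_diff = diff + 1 if diff >= 0 else (52 - start_wk) + end_wk
    return [start_wk + i for i in range(week_diff)]
```

For the sample rows this gives [50, 51, 52], [2] and [1, 2], matching the output above, while a range such as 2018-01-02 to 2018-01-04 gives a single week.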

Related

SQL to get a row for start and end date for each year given a start date and number of years

I have the following data in a SQL table:
+------------------------------------+
| ID YEARS START_DATE |
+------------------------------------+
| ----------- ----------- ---------- |
| 1 5 2020-12-01 |
| 2 8 2020-12-01 |
+------------------------------------+
I'm trying to write a query that expands the above data and gives me a start and end date for each year, based on YEARS and START_DATE in the table above. Sample output:
+-----------------------------------------------+
| ID YEAR DATE_START DATE_END |
+-----------------------------------------------+
| ----------- ----------- ---------- ---------- |
| 1 1 2020-12-01 2021-11-30 |
| 1 2 2021-12-01 2022-11-30 |
| 1 3 2022-12-01 2023-11-30 |
| 1 4 2023-12-01 2024-11-30 |
| 1 5 2024-12-01 2025-11-30 |
| 2 1 2020-12-01 2021-11-30 |
| 2 2 2021-12-01 2022-11-30 |
| 2 3 2022-12-01 2023-11-30 |
| 2 4 2023-12-01 2024-11-30 |
| 2 5 2024-12-01 2025-11-30 |
| 2 6 2025-12-01 2026-11-30 |
| 2 7 2026-12-01 2027-11-30 |
| 2 8 2027-12-01 2028-11-30 |
+-----------------------------------------------+
I would use an inline tally for this, as it is far faster than a recursive CTE solution. Assuming your Years values are low (the VALUES list below covers up to 11 years):
WITH YourTable AS(
SELECT *
FROM (VALUES(1,5,CONVERT(date,'20201201')),
(2,8,CONVERT(date,'20201201')))V(ID,Years, StartDate))
SELECT ID,
V.I + 1 AS [Year],
DATEADD(YEAR, V.I, YT.StartDate) AS StartDate,
DATEADD(DAY, -1, DATEADD(YEAR, V.I+1, YT.StartDate)) AS EndDate
FROM YourTable YT
JOIN (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))V(I) ON YT.Years > V.I;
If you have more than about 10 years, you can either create a tally table or build a larger one inline in a CTE. This would start as:
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I --remove the -1 if you don't want to start from 0
FROM N N1, N N2) --100 rows, add more Ns for more rows
...
Of course, I doubt you have thousands of years of data.
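The inline tally join can be mirrored in Python to check the expected output. This is an illustration only: add_years is a hypothetical helper that mimics DATEADD(YEAR, ...) by clamping 29 February to 28 February, and the default tally of 0 to 10 reproduces the VALUES list, so rows with more than 11 years are silently truncated just as in the SQL.

```python
import calendar
from datetime import date, timedelta

# Sample rows from the question: (ID, YEARS, START_DATE).
SAMPLE = [(1, 5, date(2020, 12, 1)), (2, 8, date(2020, 12, 1))]


def add_years(d, n):
    """Mimic DATEADD(YEAR, n, d): 29 February becomes 28 February in non-leap years."""
    year = d.year + n
    return d.replace(year=year, day=min(d.day, calendar.monthrange(year, d.month)[1]))


def expand_years(rows, tally=range(11)):
    """Join each row to the tally 0..10 on Years > I, like the inline VALUES join."""
    out = []
    for row_id, years, start in rows:
        for i in tally:
            if years > i:
                out.append((row_id, i + 1, add_years(start, i),
                            add_years(start, i + 1) - timedelta(days=1)))
    return out
```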
You can use a recursive CTE:
with cte as (
select id, 1 as year, start_date,
dateadd(day, -1, dateadd(year, 1, start_date)) as end_date,
years as num_years
from t
union all
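select id, year + 1, dateadd(year, 1, start_date),
-- start_date is still the previous row's start here, so the new end date is 2 years on, minus 1 day
dateadd(day, -1, dateadd(year, 2, start_date)) as end_date,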
num_years
from cte
where year < num_years
)
select id, year, start_date, end_date
from cte;
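The recursive member has to derive each row from the previous one: the new start is one year after the previous start, and the new end is one day before the anniversary that follows the new start, i.e. two years after the previous start. A minimal Python sketch of that recursion, with a loop standing in for the recursive CTE and add_years as a hypothetical helper mimicking DATEADD(YEAR, ...):

```python
import calendar
from datetime import date, timedelta


def add_years(d, n):
    """Mimic DATEADD(YEAR, n, d): 29 February becomes 28 February in non-leap years."""
    year = d.year + n
    return d.replace(year=year, day=min(d.day, calendar.monthrange(year, d.month)[1]))


def expand_years_recursive(row_id, start_date, num_years):
    # Anchor member: year 1 runs from start_date to one day before the next anniversary.
    row = (row_id, 1, start_date, add_years(start_date, 1) - timedelta(days=1))
    rows = [row]
    # Recursive member: build the next row from the previous one while year < num_years.
    while row[1] < num_years:
        prev_start = row[2]
        row = (row_id, row[1] + 1, add_years(prev_start, 1),
               add_years(prev_start, 2) - timedelta(days=1))
        rows.append(row)
    return rows
```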
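If you only need the end date for each existing row, you can use the following in a query (DATEADD is also used for the "minus one day", because date - 1 is not allowed for the date type):
DATEADD(DAY, -1, DATEADD(YEAR, 1, DATE_START))
To add this to the table, create the extra column and set it to the value above, e.g.:
UPDATE MyTable
SET DATE_END = DATEADD(DAY, -1, DATEADD(YEAR, 1, DATE_START))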
If you are working with SQL Server, you can also use CROSS APPLY with the master.dbo.spt_values table to get a list of numbers and generate the dates (spt_values is undocumented; its type='P' rows hold the numbers 0 to 2047):
select ID,T.number+1 as YEAR,
--generate date_start using T.number
dateadd(year,T.number,START_DATE)date_start,
--generate end_date: adding 1 year to start date
dateadd(dd,-1,dateadd(year,1,dateadd(year,T.number,START_DATE)))date_end
from YourTable --Table is a reserved word; use your actual table name
cross apply
master.dbo.spt_values T
where T.type='P' and T.number<YEARS

SQL: Generate Record Per Month In Date Range

I have a table that describes a value that is valid for a certain period of days or months.
The table looks like this:
+----+------------+------------+-------+
| Id | From | To | Value |
+----+------------+------------+-------+
| 1 | 2018-01-01 | 2018-03-31 | ValA |
| 2 | 2018-01-16 | NULL | ValB |
| 3 | 2018-04-01 | 2018-05-12 | ValC |
+----+------------+------------+-------+
As you can see, the only value still valid on this day is ValB (To is nullable, From isn't).
I am trying to achieve a view on this table like this (assuming I render this view some day in July 2018):
+----------+------------+------------+-------+
| RecordId | From | To | Value |
+----------+------------+------------+-------+
| 1 | 2018-01-01 | 2018-01-31 | ValA |
| 1 | 2018-02-01 | 2018-02-28 | ValA |
| 1 | 2018-03-01 | 2018-03-31 | ValA |
| 2 | 2018-01-16 | 2018-01-31 | ValB |
| 2 | 2018-02-01 | 2018-02-28 | ValB |
| 2 | 2018-03-01 | 2018-03-31 | ValB |
| 2 | 2018-04-01 | 2018-04-30 | ValB |
| 2 | 2018-05-01 | 2018-05-31 | ValB |
| 2 | 2018-06-01 | 2018-06-30 | ValB |
| 3 | 2018-04-01 | 2018-04-30 | ValC |
| 3 | 2018-05-01 | 2018-05-12 | ValC |
+----------+------------+------------+-------+
This view creates, for each record in the table, one record per month, using the correct dates (especially where the start and end dates are not the first or last day of a month).
The one record without a To date (so it's still valid today) is rendered up to the last day of the month in which I render the view; at the time of writing, this is July 2018.
This is a simple example, but a solution will seriously help me along. I'll need this for multiple calculations, including proration of amounts.
Here's a table script and some insert statements that you can use:
CREATE TABLE [dbo].[Test]
(
[Id] INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
[From] SMALLDATETIME NOT NULL,
[To] SMALLDATETIME NULL,
[Value] NVARCHAR(100) NOT NULL
)
INSERT INTO dbo.Test ([From],[To],[Value])
VALUES
('2018-01-01','2018-03-31','ValA'),
('2018-01-16',null,'ValB'),
('2018-04-01','2018-05-12','ValC');
Thanks in advance!
Generate all the months that might appear in your values (with their start and end dates), then join them wherever a month overlaps the period of a value. Then adjust the result so that, if a month doesn't overlap fully, you display the limits of the period instead.
DECLARE @StartDate DATE = '2018-01-01'
DECLARE @EndDate DATE = '2020-01-01'
;WITH GeneratedMonths AS
(
SELECT
StartDate = @StartDate,
EndDate = EOMONTH(@StartDate)
UNION ALL
SELECT
StartDate = DATEADD(MONTH, 1, G.StartDate),
EndDate = EOMONTH(DATEADD(MONTH, 1, G.StartDate))
FROM
GeneratedMonths AS G
WHERE
DATEADD(MONTH, 1, G.StartDate) < @EndDate
)
SELECT
T.Id,
[From] = CASE WHEN T.[From] >= G.StartDate THEN T.[From] ELSE G.StartDate END,
[To] = CASE WHEN G.EndDate >= T.[To] THEN T.[To] ELSE G.EndDate END,
T.Value
FROM
dbo.Test AS T
INNER JOIN GeneratedMonths AS G ON
G.EndDate >= T.[From] AND
G.StartDate <= ISNULL(T.[To], GETDATE())
ORDER BY
T.Id,
G.StartDate
OPTION
(MAXRECURSION 3000)
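The generate-months-then-join idea translates directly into Python, which makes the overlap test and the clipping easy to check. This is a sketch only: month_end imitates EOMONTH, the today parameter stands in for GETDATE(), and generate_months assumes its start date is the first day of a month (as the start date variable is here). With a today value in June 2018, it reproduces the 11 rows of the question's expected table.

```python
from datetime import date, timedelta


def month_end(d):
    """Last day of d's month, like EOMONTH(d)."""
    first_of_next = date(d.year + d.month // 12, d.month % 12 + 1, 1)
    return first_of_next - timedelta(days=1)


def generate_months(start, end):
    """(first day, last day) of every month from start while the month starts before end."""
    months = []
    m = start
    while m < end:
        months.append((m, month_end(m)))
        m = month_end(m) + timedelta(days=1)
    return months


def split_by_month(rows, months, today):
    """Join rows to months where they overlap, clipping to the row's own From/To."""
    out = []
    for row_id, frm, to, value in rows:
        limit = today if to is None else to  # ISNULL(T.[To], GETDATE())
        for m_start, m_end in months:
            if m_end >= frm and m_start <= limit:
                out.append((row_id, max(frm, m_start),
                            m_end if to is None else min(to, m_end), value))
    return out


# Sample data from the question's INSERT statements (None = NULL To).
TEST_ROWS = [
    (1, date(2018, 1, 1), date(2018, 3, 31), "ValA"),
    (2, date(2018, 1, 16), None, "ValB"),
    (3, date(2018, 4, 1), date(2018, 5, 12), "ValC"),
]
```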
A recursive CTE is a very simple approach if you don't have a large dataset:
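with t as (
select id, cast([from] as date) as [from], [to], Value
from Test
union all
-- step to the first day of the next month, so later rows start on the 1st (anchor is cast to date so the types match)
select id, dateadd(day, 1, eomonth([from])), [to], value
from t
where dateadd(day, 1, eomonth([from])) <= coalesce([to], getdate())
)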
select id, [from], (case when eomonth([from]) <= coalesce([to], cast(getdate() as date))
then eomonth([from]) else coalesce([to], eomonth([from]))
end) as [To],
Value
from t
order by id;
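A recursive CTE of this kind is essentially a loop: the anchor row is the original period, and each step moves to the first day of the next month until the period's end (or today, for an open-ended row) is passed. A Python sketch of that loop, where month_end imitates EOMONTH and the today parameter stands in for GETDATE():

```python
from datetime import date, timedelta


def month_end(d):
    """Last day of d's month, like EOMONTH(d)."""
    first_of_next = date(d.year + d.month // 12, d.month % 12 + 1, 1)
    return first_of_next - timedelta(days=1)


def split_recursive(row_id, frm, to, value, today):
    stop = today if to is None else to  # COALESCE([to], GETDATE())
    rows = []
    current = frm  # anchor member: the period's own start date
    while True:
        # Final SELECT: end at EOMONTH, unless the period's To comes first.
        end = month_end(current) if to is None else min(to, month_end(current))
        rows.append((row_id, current, end, value))
        following = month_end(current) + timedelta(days=1)  # DATEADD(DAY, 1, EOMONTH([from]))
        if following > stop:
            break
        current = following
    return rows
```

Rendered on 2018-07-10, the open-ended ValB row runs from January to the end of July 2018, as the question describes.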
Using date functions and a recursive CTE:
with cte as
(
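Select Id, Cast([From] as date) as [From],
-- the first row ends at EOMONTH([From]) unless the whole period ends earlier in that month
CASE WHEN EOMONTH([From]) > COALESCE([To], EOMONTH(GETDATE())) THEN CAST(COALESCE([To], EOMONTH(GETDATE())) AS DATE)
ELSE EOMONTH([From]) END as [To1],
COALESCE([To],EOMONTH(GETDATE())) AS [TO],Value from test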
UNION ALL
Select Id, DATEADD(DAY,1,[To1]),
CASE when EOMONTH(DATEADD(DAY,1,[To1])) > [To] THEN CAST([To] AS DATE)
ELSE EOMONTH(DATEADD(DAY,1,[To1])) END as [To1],
[To],Value from cte where TO1 <> [To]
)
Select Id, [From],[To1] as [To], Value from cte order by Id
@EzLo your solution is good, but it requires setting two variables to fixed values.
To avoid this, you can run the recursive CTE over the real data:
WITH A AS(
SELECT
T.Id, CAST(T.[From] AS DATE) AS [From], CASE WHEN T.[To]<EOMONTH(T.[From], 0) THEN T.[To] ELSE EOMONTH(T.[From], 0) END AS [To], T.Value, CAST(0 AS INTEGER) AS ADD_M
FROM
TEST T
UNION ALL
SELECT
T.Id, DATEADD(DAY, 1, EOMONTH(T.[From], -1+(A.ADD_M+1))), CASE WHEN T.[To]<EOMONTH(T.[From], A.ADD_M+1) THEN T.[To] ELSE EOMONTH(T.[From], A.ADD_M+1) END AS [To], T.Value, A.ADD_M+1
FROM
TEST T
INNER JOIN A ON T.Id=A.Id AND DATEADD(MONTH, A.ADD_M+1, T.[From]) < CASE WHEN T.[To] IS NULL THEN CAST(GETDATE() AS DATE) ELSE T.[To] END
)
SELECT
A.[Id], A.[From], A.[To], A.[Value]
FROM
A
ORDER BY A.[Id], A.[From]

Impala SQL: Merging rows with overlapping dates. WHERE EXISTS and recursive CTE not supported

I am trying to merge rows with overlapping date intervals in a table in Impala SQL. However, the solutions I have found rely on features that Impala does not support, e.g. WHERE EXISTS and recursive CTEs.
How would I write a query for this in Impala?
Table: #T
ID StartDate EndDate
1 20170101 20170201
2 20170101 20170401
3 20170505 20170531
4 20170530 20170531
5 20170530 20170831
6 20171001 20171005
7 20171101 20171225
8 20171105 20171110
Required Output:
StartDate EndDate
20170101 20170401
20170505 20170831
20171001 20171005
Example of what I am trying to achieve that is not supported in Impala:
SELECT
s1.StartDate,
MIN(t1.EndDate) AS EndDate
FROM #T s1
INNER JOIN #T t1 ON s1.StartDate <= t1.EndDate
AND NOT EXISTS(SELECT * FROM #T t2
WHERE t1.EndDate >= t2.StartDate AND t1.EndDate < t2.EndDate)
WHERE NOT EXISTS(SELECT * FROM #T s2
WHERE s1.StartDate > s2.StartDate AND s1.StartDate <= s2.EndDate)
GROUP BY s1.StartDate
ORDER BY s1.StartDate
Similar questions:
Merge overlapping date intervals
Eliminate and reduce overlapping date ranges
https://gerireshef.wordpress.com/2010/05/02/packing-date-intervals/
https://www.sqlservercentral.com/Forums/Topic826031-8-1.aspx
select min(StartDate) as StartDate
,max(EndDate) as EndDate
from (select StartDate,EndDate
,count (is_gap) over
(
order by StartDate,ID
) as range_id
from (select ID,StartDate,EndDate
,case
when max (EndDate) over
(
order by StartDate,ID
rows between unbounded preceding
and 1 preceding
) < StartDate
then true
end as is_gap
from t
) t
) t
group by range_id
order by StartDate
;
+------------+------------+
| startdate | enddate |
+------------+------------+
| 2017-01-01 | 2017-04-01 |
| 2017-05-05 | 2017-08-31 |
| 2017-10-01 | 2017-10-05 |
| 2017-11-01 | 2017-12-25 |
+------------+------------+
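The answer's gaps-and-islands logic (flag a row as a gap when the maximum EndDate of all earlier rows is before its StartDate, number the islands with a running count of gaps, then take MIN/MAX per island) can be sketched in Python for a quick check against the sample data. Dates are kept as yyyymmdd integers as in the question, which compare correctly as numbers; like the SQL, only overlapping ranges are merged, not merely adjacent ones.

```python
# Sample rows from the question: (ID, StartDate, EndDate) as yyyymmdd integers.
SAMPLE = [
    (1, 20170101, 20170201), (2, 20170101, 20170401),
    (3, 20170505, 20170531), (4, 20170530, 20170531),
    (5, 20170530, 20170831), (6, 20171001, 20171005),
    (7, 20171101, 20171225), (8, 20171105, 20171110),
]


def merge_overlaps(rows):
    """Merge overlapping (id, start, end) rows into (start, end) islands."""
    islands = []
    max_end_so_far = None  # MAX(EndDate) over all preceding rows
    for _id, start, end in sorted(rows, key=lambda r: (r[1], r[0])):  # ORDER BY StartDate, ID
        is_gap = max_end_so_far is not None and max_end_so_far < start
        if not islands or is_gap:
            islands.append([start, end])  # running COUNT(is_gap) moves to a new range_id
        else:
            islands[-1][1] = max(islands[-1][1], end)  # MAX(EndDate) within the range
        max_end_so_far = end if max_end_so_far is None else max(max_end_so_far, end)
    return [tuple(island) for island in islands]
```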

How do I create a series of specific days in a SQL table?

I have a table called CLASSTIMES that has a column of type bool for each day of the week (SUNDAY,MONDAY,TUESDAY...etc) and also contains a STARTDATE and an ENDDATE column. For example, I would have a row that has MONDAY and WEDNESDAY selected with a STARTDATE of June 3, 2013 and an ENDDATE of June 26, 2013.
What I'd like to do is create a stored procedure (SPROC) that inserts a series of rows into a table called CLASSCALENDAR, containing all of the dates between the StartDate and EndDate for each selected day column.
For example, if the user selected Monday and Wednesday in the ClassTimes table, it would generate:
Row Date
1 2013-06-03
2 2013-06-10
3 2013-06-17
4 2013-06-24
5 2013-06-05
6 2013-06-12
7 2013-06-19
8 2013-06-26
I have tried to set this up but it is a bit over my head. Any help would be much appreciated.
Here's an SQL statement that might work for you.
SELECT course,
Convert(varchar(25), course_date, 101) AS course_date
FROM (SELECT course_no AS course,
DATEADD(dd, rn-1, startdate) AS course_date,
DATENAME(dw, DATEADD(dd, rn-1, startdate)) AS dow
FROM (SELECT row_number() OVER (ORDER BY c1) AS rn
FROM dummy) sub1,
classtimes
WHERE rn <= (DATEDIFF(dd, startdate, enddate)+1)
) list_of_dates,
(SELECT course_no,
(CASE monday WHEN 1 THEN 'Monday' END) AS dow
FROM classtimes
UNION
SELECT course_no,
(CASE tuesday WHEN 1 THEN 'Tuesday' END)
FROM classtimes
UNION
SELECT course_no,
(CASE wednesday WHEN 1 THEN 'Wednesday' END)
FROM classtimes
UNION
SELECT course_no,
(CASE thursday WHEN 1 THEN 'Thursday' END)
FROM classtimes
UNION
SELECT course_no,
(CASE friday WHEN 1 THEN 'Friday' END)
FROM classtimes
UNION
SELECT course_no,
(CASE saturday WHEN 1 THEN 'Saturday' END)
FROM classtimes
UNION
SELECT course_no,
(CASE sunday WHEN 1 THEN 'Sunday' END)
FROM classtimes
) class_days
WHERE list_of_dates.dow = class_days.dow
AND list_of_dates.course = class_days.course_no
ORDER BY course_no,
course_date
I used the query
SELECT course_no AS course,
DATEADD(dd, rn-1, startdate) AS course_date,
DATENAME(dw, DATEADD(dd, rn-1, startdate)) AS dow
FROM (SELECT row_number() OVER (ORDER BY c1) AS rn
FROM dummy) sub1,
classtimes
WHERE rn <= (DATEDIFF(dd, startdate, enddate)+1)
to generate a list of all dates between the startdate and enddate for each course. For this query to work correctly, the dummy table must contain at least as many rows as the number of days from startdate to enddate (inclusive) for each course. The result is a list of the days between the startdate and enddate for each course, along with the day of the week for each date.
| COURSE | COURSE_DATE | DOW |
------------------------------------
| MATH | 06/03/2013 | Monday |
| MATH | 06/04/2013 | Tuesday |
| MATH | 06/05/2013 | Wednesday |
| MATH | 06/06/2013 | Thursday |
| MATH | 06/07/2013 | Friday |
.........
.........
| MATH | 06/24/2013 | Monday |
| MATH | 06/25/2013 | Tuesday |
| MATH | 06/26/2013 | Wednesday |
I then have a subquery that uses a series of UNIONs to turn the day-of-week columns of the classtimes table into the days of the week on which the classes are held. Finally, I join the list_of_dates subquery with the class_days subquery to get the dates on which the classes are held.
| COURSE | COURSE_DATE |
-------------------------
| MATH | 06/03/2013 |
| MATH | 06/05/2013 |
| MATH | 06/10/2013 |
| MATH | 06/12/2013 |
| MATH | 06/17/2013 |
| MATH | 06/19/2013 |
| MATH | 06/24/2013 |
| MATH | 06/26/2013 |
I'm sure there's a more efficient/elegant way to generate the list of the days of the week for the class from the classtimes table (the class_days subquery), but I couldn't think of one.
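The same idea (list every date between startdate and enddate, then keep the dates whose weekday flag is set) fits in a few lines of Python, which may help when checking the output. This sketch takes the selected days as a hypothetical list of day names instead of the bool columns of CLASSTIMES, and uses Python's own weekday numbering, so it does not depend on the DATENAME language setting.

```python
from datetime import date, timedelta

# Python's date.weekday(): Monday = 0 ... Sunday = 6.
DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def class_dates(start, end, selected_days):
    """All dates from start to end (inclusive) that fall on one of selected_days."""
    wanted = {DAY_NAMES.index(name.lower()) for name in selected_days}
    all_dates = (start + timedelta(days=n) for n in range((end - start).days + 1))
    return [d for d in all_dates if d.weekday() in wanted]
```

For example, class_dates(date(2013, 6, 3), date(2013, 6, 26), ["Monday", "Wednesday"]) returns the eight dates from the question, in date order.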

Combine consecutive date ranges

Using SQL Server 2008 R2,
I'm trying to combine date ranges into the maximal date range whenever one range's end date is immediately followed by the next range's start date.
The data is about different employments. Some employees may have ended their employment and rejoined at a later time; those should count as two different employments (example ID 5). Some people have had different types of employment back to back (the enddate of one is the day before the startdate of the next); in this case they should be considered one employment in total (example ID 30).
An employment period that has not ended has an enddate that is NULL.
Some examples are probably enlightening:
declare @t as table (employmentid int, startdate datetime, enddate datetime)
insert into @t values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null)
-- expected outcome
EmploymentId StartDate EndDate
5 2007-12-03 2011-08-26
5 2013-05-02 NULL
30 2006-10-02 NULL
66 2007-09-24 NULL
I've been trying different "islands-and-gaps" techniques but haven't been able to crack this one.
The strange bit you see with my use of the date '32121231' is just a very large date to handle your "no end date" scenario. I have assumed you won't have many date ranges per employee, so I've used a simple recursive common table expression to combine the ranges.
To make it run faster, the starting anchor query keeps only those dates that will not link up to a prior range (per employee). The rest is just tree-walking the date ranges and growing the range. The final GROUP BY keeps only the largest date range built up per starting ANCHOR (employmentid, startdate) combination.
MS SQL Server 2008 Schema Setup:
create table Tbl (
employmentid int,
startdate datetime,
enddate datetime);
insert Tbl values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null);
/*
-- expected outcome
EmploymentId StartDate EndDate
5 2007-12-03 2011-08-26
5 2013-05-02 NULL
30 2006-10-02 NULL
66 2007-09-24 NULL
*/
Query 1:
;with cte as (
select a.employmentid, a.startdate, a.enddate
from Tbl a
left join Tbl b on a.employmentid=b.employmentid and a.startdate-1=b.enddate
where b.employmentid is null
union all
select a.employmentid, a.startdate, b.enddate
from cte a
join Tbl b on a.employmentid=b.employmentid and b.startdate-1=a.enddate
)
select employmentid,
startdate,
nullif(max(isnull(enddate,'32121231')),'32121231') enddate
from cte
group by employmentid, startdate
order by employmentid
Results:
| EMPLOYMENTID | STARTDATE | ENDDATE |
-----------------------------------------------------------------------------------
| 5 | December, 03 2007 00:00:00+0000 | August, 26 2011 00:00:00+0000 |
| 5 | May, 02 2013 00:00:00+0000 | (null) |
| 30 | October, 02 2006 00:00:00+0000 | (null) |
| 66 | September, 24 2007 00:00:00+0000 | (null) |
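The anchor-then-walk idea of the recursive CTE can be sketched in Python: an anchor is a range whose start is not the day after another range's end for the same employee, and the walk follows the range starting the day after the current end. Assumptions for this sketch: None stands for a NULL enddate, and there is at most one range per (employee, start date) and no overlaps, so a single chain per anchor replaces the final GROUP BY/MAX.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

# Sample rows from the question: (employmentid, startdate, enddate), None = NULL.
SAMPLE = [
    (5, date(2007, 12, 3), date(2011, 8, 26)),
    (5, date(2013, 5, 2), None),
    (30, date(2006, 10, 2), date(2011, 1, 16)),
    (30, date(2011, 1, 17), date(2012, 8, 12)),
    (30, date(2012, 8, 13), None),
    (66, date(2007, 9, 24), None),
]


def combine_consecutive(rows):
    end_by_start = {(emp, start): end for emp, start, end in rows}
    known_ends = {(emp, end) for emp, _start, end in rows if end is not None}
    result = []
    for emp, start, end in sorted(rows, key=lambda r: (r[0], r[1])):
        # Anchor: skip a range whose start is the day after another range's end.
        if (emp, start - ONE_DAY) in known_ends:
            continue
        # Walk: keep following the range that starts the day after the current end.
        while end is not None and (emp, end + ONE_DAY) in end_by_start:
            end = end_by_start[(emp, end + ONE_DAY)]
        result.append((emp, start, end))
    return result
```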
SET NOCOUNT ON
DECLARE @T TABLE(ID INT,FromDate DATETIME, ToDate DATETIME)
INSERT INTO @T(ID,FromDate,ToDate)
SELECT 1,'20090801','20090803' UNION ALL
SELECT 2,'20090802','20090809' UNION ALL
SELECT 3,'20090805','20090806' UNION ALL
SELECT 4,'20090812','20090813' UNION ALL
SELECT 5,'20090811','20090812' UNION ALL
SELECT 6,'20090802','20090802'
SELECT ROW_NUMBER() OVER(ORDER BY s1.FromDate) AS ID,
s1.FromDate,
MIN(t1.ToDate) AS ToDate
FROM @T s1
INNER JOIN @T t1 ON s1.FromDate <= t1.ToDate
AND NOT EXISTS(SELECT * FROM @T t2
WHERE t1.ToDate >= t2.FromDate
AND t1.ToDate < t2.ToDate)
WHERE NOT EXISTS(SELECT * FROM @T s2
WHERE s1.FromDate > s2.FromDate
AND s1.FromDate <= s2.ToDate)
GROUP BY s1.FromDate
ORDER BY s1.FromDate
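An alternative solution that uses window functions rather than recursive CTEs. Note that it needs SQL Server 2012 or later: SUM() OVER with ORDER BY and a ROWS frame is not available in SQL Server 2008 R2.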
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT
employmentid,
startdate,
enddate,
DATEADD(
DAY,
-COALESCE(
SUM(DATEDIFF(DAY, startdate, enddate)+1) OVER (PARTITION BY employmentid ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
0
),
startdate
) as grp
FROM @t
) withGroup
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
This works by calculating a grp value that will be the same for all consecutive rows. This is achieved by:
Determine the total days each span occupies (+1 because the dates are inclusive):
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
Take the cumulative sum of the days spanned for each employment, ordered by startdate. This gives the total days spanned by all the previous employment spans.
We coalesce with 0 to ensure we don't have NULLs in our cumulative sum of days spanned.
We do not include the current row in the cumulative sum, because we will apply the value to startdate rather than enddate (we can't use enddate because of the NULLs).
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
Subtract the cumulative days from the startdate to get our grp. This is the crux of the solution.
If the start date increases at the same rate as the days spanned then the days are consecutive, and subtracting the two will give us the same value.
If the startdate increases faster than the days spanned then there is a gap and we will get a new grp value greater than the previous one.
Although grp is a date, the date itself is meaningless; we use it only as a grouping value.
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
) inner2
With the results
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| employmentid | startdate | enddate | daysSpanned | cumulativeDaysSpanned | grp |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 | 1363 | 0 | 2007-12-03 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL | NULL | 1363 | 2009-08-08 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | 2011-01-16 00:00:00.000 | 1568 | 0 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2011-01-17 00:00:00.000 | 2012-08-12 00:00:00.000 | 574 | 1568 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2012-08-13 00:00:00.000 | NULL | NULL | 2142 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL | NULL | 0 | 2007-09-24 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
Finally, we GROUP BY grp to collapse the consecutive ranges.
Use MIN and MAX to get the new startdate and enddate.
To handle the NULL enddates, we give them a large value so they are picked up by MAX, then convert them back to NULL.
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
) inner2
) inner3
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
This gives the desired result:
+--------------+-------------------------+-------------------------+
| employmentid | startdate | enddate |
+--------------+-------------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 |
+--------------+-------------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
Combining the inner queries gives the query at the start of this answer, which is shorter but harder to explain.
Limitations: all of this requires that
there are no overlaps between the startdate and enddate of an employment's ranges, as overlaps could produce collisions in grp.
startdate is not NULL. However, this could be overcome by replacing NULL start dates with small date values.
future developers can decipher the window-function black magic you performed.
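The grp trick can be reproduced in Python to check the arithmetic step by step: for each employment, subtract the days spanned by all earlier rows from startdate, and rows that produce the same result form one range. As in the SQL, a None (NULL) enddate contributes nothing to the running total and is treated as the largest end date; the same limitations listed above apply.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (employmentid, startdate, enddate), None = NULL.
SAMPLE = [
    (5, date(2007, 12, 3), date(2011, 8, 26)),
    (5, date(2013, 5, 2), None),
    (30, date(2006, 10, 2), date(2011, 1, 16)),
    (30, date(2011, 1, 17), date(2012, 8, 12)),
    (30, date(2012, 8, 13), None),
    (66, date(2007, 9, 24), None),
]


def combine_by_grp(rows):
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for emp, spans in groupby(ordered, key=lambda r: r[0]):
        days_before = 0  # SUM(daysSpanned) over the preceding rows, COALESCEd to 0
        groups = {}  # grp -> [MIN(startdate), MAX(enddate)], None meaning open-ended
        for _emp, start, end in spans:
            grp = start - timedelta(days=days_before)
            span = groups.setdefault(grp, [start, end])
            if span[1] is not None and (end is None or end > span[1]):
                span[1] = end  # a NULL end wins, like the COALESCE/NULLIF trick
            if end is not None:
                days_before += (end - start).days + 1  # NULL spans are ignored by SUM
        result.extend((emp, s, e) for s, e in groups.values())
    return result
```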
A modified script for combining all overlapping (or adjacent) periods. For example
01.01.2001-01.01.2010
05.05.2005-05.05.2015
will give one period:
01.01.2001-05.05.2015
tbl.enddate must be populated (no NULLs); an open-ended period is expected to use 31.12.2099 as its enddate.
;WITH cte
AS(
SELECT
a.employmentid
,a.startdate
,a.enddate
from tbl a
left join tbl c on a.employmentid=c.employmentid
and a.startdate > c.startdate
and a.startdate <= dateadd(day, 1, c.enddate)
WHERE c.employmentid IS NULL
UNION all
SELECT
a.employmentid
,a.startdate
,c.enddate -- grow the period to the end of the adjacent/overlapping row
from cte a
inner join tbl c on a.employmentid=c.employmentid
and (c.startdate = dateadd(day, 1, a.enddate) or (c.enddate > a.enddate and c.startdate <= a.enddate))
)
select employmentid,
startdate,
nullif(max(enddate),'20991231') enddate -- 20991231 = 31.12.2099, the open-ended marker
from cte
group by employmentid, startdate
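The merge this script performs can be sketched in Python for comparison: sort each employment's periods by startdate and extend the current period whenever the next one starts on or before the day after its end. This assumes, as the script does, that every enddate is filled in; the employment id 1 in the sample is hypothetical, since the example periods don't give one.

```python
from datetime import date, timedelta
from itertools import groupby

# The two example periods from the answer; employment id 1 is hypothetical.
SAMPLE = [
    (1, date(2001, 1, 1), date(2010, 1, 1)),
    (1, date(2005, 5, 5), date(2015, 5, 5)),
]


def merge_overlapping(rows):
    """Merge each employment's periods that overlap or touch (start <= previous end + 1 day)."""
    result = []
    for emp, spans in groupby(sorted(rows), key=lambda r: r[0]):
        current = None
        for _emp, start, end in spans:
            if current is not None and start <= current[1] + timedelta(days=1):
                current[1] = max(current[1], end)  # extend the current period
            else:
                if current is not None:
                    result.append((emp, current[0], current[1]))
                current = [start, end]
        result.append((emp, current[0], current[1]))
    return result
```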