Selecting DateTime Values with intervals greater than 45 days - sql

I have a list of ID's and Dates ordered from oldest to newest. I'd like to select all ID's and Dates where the difference to the next previous date (with matching ID) is 45 days or greater.
202185, 2021-10-01 09:35:000
202185, 2021-10-02 09:36:000
202185, 2021-10-03 09:14:000
202185, 2022-02-01 09:22:000
202185, 2022-02-02 09:23:000
301133, 2021-11-01 09:35:000
301133, 2021-11-02 09:36:000
301133, 2021-11-03 09:14:000
301133, 2021-12-06 09:22:000
301133, 2022-01-25 09:23:000
SELECTION RETURNS:
202185, 2022-02-01 09:22:000
301133, 2022-01-25 09:23:000
Is there an efficient way to handle this using SQL Server?
Thanks!

select id, date
from (
select id, [date],
datediff(day, lag([date]) over (partition by id order by [date]), [date]) as daydiff
from [MyTable]
) t
where t.dayDiff >= 45
order by id, [date];
See it here:
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=69b5b976154d52a04333fbb1b771b3e1

SELECT A.* FROM MyTable A
WHERE A.date >
( SELECT DATEADD(day, 45, MAX(b.date))
FROM MyTable B
WHERE B.ID = A.ID
AND B.date < A.date
)
See it here:
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=038a3d02461f0ea54a14140e6be22370

RHere is a solution using LAG() in a CTE.
CREATE TABLE idz(
ID INT,
DAT DATETIME);
INSERT INTO idz VALUES
(202185, '2021-10-01 09:35:000'),
(202185, '2021-10-02 09:36:000'),
(202185, '2021-10-03 09:14:000'),
(202185, '2022-02-01 09:22:000'),
(202185, '2022-02-02 09:23:000'),
(301133, '2021-11-01 09:35:000'),
(301133, '2021-11-02 09:36:000'),
(301133, '2021-11-03 09:14:000'),
(301133, '2021-12-06 09:22:000'),
(301133, '2022-01-25 09:23:000');
WITH i AS
( SELECT
ID,
DAT,
LAG(DAT) OVER ( PARTITION BY ID
ORDER BY DAT) DAT_1
FROM idz)
SELECT
ID,
DAT,
DAT_1,
DATEDIFF(DAY,DAT_1,DAT) DD
FROM i
WHERE DATEDIFF(DAY,DAT_1,DAT)>= 45
ID | DAT | DAT_1 | DD
-----: | :---------------------- | :---------------------- | --:
202185 | 2022-02-01 09:22:00.000 | 2021-10-03 09:14:00.000 | 121
301133 | 2022-01-25 09:23:00.000 | 2021-12-06 09:22:00.000 | 50
db<>fiddle here
Credit due to shawnt00

Related

Group by date and find median of processing time

I select input date and output date from a database. I use a formula to indicate the processing time. Now, I would like the values ​​to be grouped according to the date of receipt and the median of the processing time to be output for all grouped dates of receipt. Something like this:
The data I select:
input date | output date | processing time
2022-01-03 | 2022-01-03 | 0
2022-01-03 | 2022-01-06 | 3
2022-01-03 | 2022-01-11 | 8
2022-01-05 | 2022-01-10 | 5
2022-01-05 | 2022-01-15 | 10
The output I want:
input date | processing time
2022-01-03 | 3
2022-01-05 | 7.5
My SQL Code:
SELECT [received_date]
,CONVERT(date, [exported_on])
,DATEDIFF(day, [received_date], [exported_on]) AS processing_time
FROM [request] WHERE YEAR (received_date) = 2022
GROUP BY received_date, [exported_on]
ORDER BY received_date
How can I do this? Do I need a temp table to do this, or can I modify my query?
You could try using PERCENTILE_CONT
with cte as (
select input_date,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY processing_time) OVER(PARTITION BY input_date) as Median_Process_Time
FROM tableA
)
SELECT *
FROM cte
GROUP BY input_date, Median_Process_Time
db fiddle
Also you check check out the discussion here How to find the SQL medians for a grouping
Here my solution. Thank you for your help.
SET NOCOUNT ON;
DECLARE #working TABLE(entry_date date, exit_date date, work_time int)
INSERT INTO #working
SELECT [received] AS date_of_entry
,CONVERT(date, [exported]) AS date_of_exit
,DATEDIFF(day, [received], [exported]) AS processing_time
FROM [zsdt].[dbo].[antrag] WHERE YEAR([received]) = 2022 AND scanner_name IS NOT NULL AND exportiert_am IS NOT NULL AND NOT scanner_name = 'AP99'
GROUP BY [received], [exported]
ORDER BY [received] ASC
;WITH CTE AS
( SELECT entry_date,
work_time,
[half1] = NTILE(2) OVER(PARTITION BY entry_date ORDER BY work_time),
[half2] = NTILE(2) OVER(PARTITION BY entry_date ORDER BY work_time DESC)
FROM #working
WHERE work_time IS NOT NULL
)
SELECT entry_date,
(MAX(CASE WHEN Half1 = 1 THEN work_time END) +
MIN(CASE WHEN Half2 = 1 THEN work_time END)) / 2.0
FROM CTE
GROUP BY entry_date;

Use CTE to capture volume counts by day and hour

I want to see current patient volumes by days of the week and by hour based off of their registered Start date and Discharge date.
Ex: John doe Start date: 01-01-2022 13:00:00 ; End date 01-01-2022 16:25:00
I would like the data to show each Hour John doe is in the Facility. So output would look like something like this:
John Doe 01-01-2022 ( Hour) 13
John Doe 01-01-2022 ( Hour) 14
John Doe 01-01-2022 ( Hour) 15
John Doe 01-01-2022 ( Hour) 16
I have my start date and discharge dates in a temp table and thought I could use a CTE to get this done, but not sure how to link the CTE results to my table.
How do I get the breakdown of volumes by hour so I can count how many people are in the facility each hour based off of the start and discharge dates?
DECLARE #minDateTime AS DATETIME;
DECLARE #maxDateTime AS DATETIME;
SET #minDateTime = '2022-05-01 05:28:05.000';
SET #maxDateTime = '2022-05-02 06:50:00.000';
;
WITH Dates_CTE
AS
(SELECT #minDateTime AS Dates
UNION ALL
SELECT Dateadd(hh, 1, Dates)
FROM Dates_CTE
WHERE Dates < Dateadd(hh, -1, #maxDateTime)
)
SELECT --Convert(VARCHAR,Year,Dates)
Dates
,Year(Dates) as 'Year'
,Month(Dates) as 'Month'
,Day(Dates) as 'day'
,Datename(DW,Dates) as 'DayName'
,DATEPART(HOUR,Dates) as 'hh'
FROM Dates_CTE
OPTION (MAXRECURSION 0)
Sample Data
AccountNumber ServiceDateTime RegistrationTypeDischargeDateTime
G111 2021-05-07 10:44:19.000 2021-05-07 14:30:00.000
G222 2021-05-08 09:59:00.000 2021-05-08 10:56:00.000
G333 2021-07-02 11:35:07.000 2021-07-02 11:53:00.000
G444 2021-07-07 07:57:16.000 2021-07-07 13:35:00.000
If we have the enter and leave datestamp for each patient in another table we can join you calendar table and group by hour to find the id's of patients present and count them.
create table inTreatment(
patientid int,
enter datetime,
leave datetime
);
insert into inTreatment values
(1,'2022-05-01 09:00:00','2022-05-01 18:00:00'),
(2,'2022-05-01 11:00:00','2022-05-01 14:00:00'),
(3,'2022-05-01 12:00:00','2022-05-02 15:00:00')
GO
3 rows affected
DECLARE #minDateTime AS DATETIME;
DECLARE #maxDateTime AS DATETIME;
SET #minDateTime = '2022-05-01 05:00:00.000';
SET #maxDateTime = '2022-05-02 06:00:00.000';
;
WITH Dates_CTE
AS
(SELECT #minDateTime AS Dates
UNION ALL
SELECT Dateadd(hh, 1, Dates)
FROM Dates_CTE
WHERE Dates < Dateadd(hh, -1, #maxDateTime)
)
SELECT --Convert(VARCHAR,Year,Dates)
string_agg(patientid,',') patients,
count(patientid) no_pats,
Dates
--,Year(Dates) as 'Year'
--,Month(Dates) as 'Month'
--,Day(Dates) as 'day'
----,Datename(DW,Dates) as 'DayName'
--,DATEPART(HOUR,Dates) as 'hh'
FROM Dates_CTE d
left join InTreatment i
on enter <= Dates and leave >= Dates
group by dates
OPTION (MAXRECURSION 0)
GO
patients | no_pats | Dates
:------- | ------: | :----------------------
null | 0 | 2022-05-01 05:00:00.000
null | 0 | 2022-05-01 06:00:00.000
null | 0 | 2022-05-01 07:00:00.000
null | 0 | 2022-05-01 08:00:00.000
1 | 1 | 2022-05-01 09:00:00.000
1 | 1 | 2022-05-01 10:00:00.000
1,2 | 2 | 2022-05-01 11:00:00.000
1,2,3 | 3 | 2022-05-01 12:00:00.000
1,2,3 | 3 | 2022-05-01 13:00:00.000
1,2,3 | 3 | 2022-05-01 14:00:00.000
1,3 | 2 | 2022-05-01 15:00:00.000
1,3 | 2 | 2022-05-01 16:00:00.000
1,3 | 2 | 2022-05-01 17:00:00.000
1,3 | 2 | 2022-05-01 18:00:00.000
3 | 1 | 2022-05-01 19:00:00.000
3 | 1 | 2022-05-01 20:00:00.000
3 | 1 | 2022-05-01 21:00:00.000
3 | 1 | 2022-05-01 22:00:00.000
3 | 1 | 2022-05-01 23:00:00.000
3 | 1 | 2022-05-02 00:00:00.000
3 | 1 | 2022-05-02 01:00:00.000
3 | 1 | 2022-05-02 02:00:00.000
3 | 1 | 2022-05-02 03:00:00.000
3 | 1 | 2022-05-02 04:00:00.000
3 | 1 | 2022-05-02 05:00:00.000
db<>fiddle here
Given this table and sample data:
CREATE TABLE dbo.Admissions
(
AccountNumber char(4),
ServiceDateTime datetime,
RegistrationTypeDischargeDateTime datetime
);
INSERT dbo.Admissions VALUES
('G111','20210507 10:44:19','20210507 14:30:00');
Here's how I would do it:
DECLARE #min datetime = '20210507 05:28:05',
#max datetime = '20210508 06:50:00';
DECLARE #d tinyint = DATEDIFF(HOUR, #min, #max),
#floor datetime = SMALLDATETIMEFROMPARTS
(YEAR(#min), MONTH(#min), DAY(#min), DATEPART(HOUR, #min), 0);
; -- see sqlblog.org/cte
WITH hours(h) AS
(
SELECT #floor UNION ALL
SELECT DATEADD(HOUR, 1, h)
FROM hours WHERE h <= #max
)
SELECT a.AccountNumber, Date = CONVERT(date, hours.h),
Hour = DATEPART(HOUR, hours.h)
FROM hours INNER JOIN dbo.Admissions AS a
ON a.ServiceDateTime < DATEADD(HOUR, 1, hours.h)
AND a.RegistrationTypeDischargeDateTime >= hours.h
OPTION (MAXRECURSION 32767);
Output:
AccountNumber
Date
Hour
G111
2021-05-07
10
G111
2021-05-07
11
G111
2021-05-07
12
G111
2021-05-07
13
G111
2021-05-07
14
Example db<>fiddle
You may need to tweak <=/</>=/> depending on how you want to handle edge cases (e.g. entry or exit right on the hour, or entry and exit < 1 hour).
For a fast, simple way (vs CTE) CROSS APPLY using a numbers table or a tally function. In this case I'm using dbo.fnTally
select a.AccountNumber, cast(a.ServiceDateTime as date) [Date],
datepart(hour, dateadd(hour, fn.N, cast(a.ServiceDateTime as time))) hr
from #Admissions a
cross apply dbo.fnTally(0, datediff(hour,
a.ServiceDateTime,
a.RegistrationTypeDischargeDateTime)) fn;
dbo.fnTally
CREATE FUNCTION [dbo].[fnTally]
/**********************************************************************************************************************
Jeff Moden Script on SSC: https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally
**********************************************************************************************************************/
(#ZeroOrOne BIT, #MaxN BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN WITH
H2(N) AS ( SELECT 1
FROM (VALUES
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
)V(N)) --16^2 or 256 rows
, H4(N) AS (SELECT 1 FROM H2 a, H2 b) --16^4 or 65,536 rows
, H8(N) AS (SELECT 1 FROM H4 a, H4 b) --16^8 or 4,294,967,296 rows
SELECT N = 0 WHERE #ZeroOrOne = 0 UNION ALL
SELECT TOP(#MaxN)
N = ROW_NUMBER() OVER (ORDER BY N)
FROM H8
;

SQL merge overlapping time intervals

I have a sqlite dataset like this
Startdate | Enddate | ID
2019-04-29 | 2019-05-04 | 12
2019-04-23 | 2019-04-25 | 533
2019-04-23 | 2019-04-24 | 44
2019-04-24 | 2019-04-25 | 79
I'm trying to get an output that is sorted in range from startdate to startdate plus day in a loop until last Endate.
The plan is to get all observation from min(Startdate) to min(Startdate) +1 and min(Startdate) plus +2 and so one
Range | ID
2019-04-23 2019-04-24 | 44
2019-04-23 2019-04-25 | 44
2019-04-23 2019-04-25 | 533
2019-04-23 2019-04-25 | 79
2019-04-23 2019-05-04 | 44
2019-04-23 2019-05-04 | 533
2019-04-23 2019-05-04 | 79
2019-04-23 2019-05-04 | 12
I'm not sure have to achieve this
I believe that the following will provide the results that you want :-
WITH cte(sdate,edate) AS
(
SELECT (SELECT min(startdate) FROM mytable),(SELECT min(startdate) FROM mytable)
UNION ALL
SELECT sdate,date(edate,'+1 days')
FROM cte
WHERE edate <= (
SELECT max(enddate) FROM mytable
)
LIMIT 1000 /* just in case limit to 1000 rows */
)
SELECT sdate||' '||edate AS Range, id
FROM cte
JOIN mytable ON startdate >= sdate
AND enddate <= edate
AND edate IN (SELECT enddate FROM mytable)
Example/Test/Demo
DROP TABLE IF EXISTS mytable;
CREATE TABLE IF NOT EXISTS mytable (startdate TEXT, enddate TEXT, id INTEGER PRIMARY KEY);
INSERT INTO mytable VALUES
('2019-04-29','2019-05-04',12),
('2019-04-23','2019-04-25',533),
('2019-04-23','2019-04-24',44),
('2019-04-24','2019-04-25',79)
;
WITH cte(sdate,edate) AS
(
SELECT (SELECT min(startdate) FROM mytable),(SELECT min(startdate) FROM mytable)
UNION ALL
SELECT sdate,date(edate,'+1 days')
FROM cte
WHERE edate <= (
SELECT max(enddate) FROM mytable
)
LIMIT 1000 /* just in case limit to 1000 rows */
)
SELECT sdate||' '||edate AS Range, id
FROM cte
JOIN mytable ON startdate >= sdate
AND enddate <= edate
AND edate IN (SELECT enddate FROM mytable)
;
DROP TABLE IF EXISTS mytable;
The result is :-
The only difference being the order of the ID column (I can't see any order from your expected result)
As per the comment LIMIT 1000 is not required

How to return same row multiple times with multiple conditions

My knowledge is pretty basic so your help would be highly appreciated.
I'm trying to return the same row multiple times when it meets the condition (I only have access to select query).
I have a table of more than 500000 records with Customer ID, Start Date and End Date, where end date could be null.
I am trying to add a new column called Week_No and list all rows accordingly. For example if the date range is more than one week, then the row must be returned multiple times with corresponding week number. Also I would like to count overlapping days, which will never be more than 7 (week) per row and then count unavailable days using second table.
Sample data below
t1
ID | Start_Date | End_Date
000001 | 12/12/2017 | 03/01/2018
000002 | 13/01/2018 |
000003 | 02/01/2018 | 11/01/2018
...
t2
ID | Unavailable
000002 | 14/01/2018
000003 | 03/01/2018
000003 | 04/01/2018
000003 | 08/01/2018
...
I cannot pass the stage of adding week no. I have tried using CASE and UNION ALL but keep getting errors.
declare #week01start datetime = '2018-01-01 00:00:00'
declare #week01end datetime = '2018-01-07 00:00:00'
declare #week02start datetime = '2018-01-08 00:00:00'
declare #week02end datetime = '2018-01-14 00:00:00'
...
SELECT
ID,
'01' as Week_No,
'2018' as YEAR,
Start_Date,
End_Date
FROM t1
WHERE (Start_Date <= #week01end and End_Date >= #week01start)
or (Start_Date <= #week01end and End_Date is null)
UNION ALL
SELECT
ID,
'02' as Week_No,
'2018' as YEAR,
Start_Date,
End_Date
FROM t1
WHERE (Start_Date <= #week02end and End_Date >= #week02start)
or (Start_Date <= #week02end and End_Date is null)
...
The new table should look like this
ID | Week_No | Year | Start_Date | End_Date | Overlap | Unavail_Days
000001 | 01 | 2018 | 12/12/2017 | 03/01/2018 | 3 |
000002 | 02 | 2018 | 13/01/2018 | | 2 | 1
000003 | 01 | 2018 | 02/01/2018 | 11/01/2018 | 6 | 2
000003 | 02 | 2018 | 02/01/2018 | 11/01/2018 | 4 | 1
...
business wise i cannot understand what you are trying to achieve. You can use the following code though to calculate your overlapping days etc. I did it the way you asked, but i would recommend a separate table, like a Time dimension to produce a "cleaner" solution
/*sample data set in temp table*/
select '000001' as id, '2017-12-12'as start_dt, ' 2018-01-03' as end_dt into #tmp union
select '000002' as id, '2018-01-13 'as start_dt, null as end_dt union
select '000003' as id, '2018-01-02' as start_dt, '2018-01-11' as end_dt
/*calculate week numbers and week diff according to dates*/
select *,
DATEPART(WK,start_dt) as start_weekNumber,
DATEPART(WK,end_dt) as end_weekNumber,
case
when DATEPART(WK,end_dt) - DATEPART(WK,start_dt) > 0 then (DATEPART(WK,end_dt) - DATEPART(WK,start_dt)) +1
else (52 - DATEPART(WK,start_dt)) + DATEPART(WK,end_dt)
end as WeekDiff
into #tmp1
from
(
SELECT *,DATEADD(DAY, 2 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [start_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [startdt_Week_End_Date],
DATEADD(DAY, 2 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_End_Date]
from #tmp
) s
/*cte used to create duplicates when week diff is over 1*/
;with x as
(
SELECT TOP (10) rn = ROW_NUMBER() --modify the max you want
OVER (ORDER BY [object_id])
FROM sys.all_columns
ORDER BY [object_id]
)
/*final query*/
select --*
ID,
start_weekNumber+ (r-1) as Week,
DATEPART(YY,start_dt) as [YEAR],
start_dt,
end_dt,
null as Overlap,
null as unavailable_days
from
(
select *,
ROW_NUMBER() over (partition by id order by id) r
from
(
select d.* from x
CROSS JOIN #tmp1 AS d
WHERE x.rn <= d.WeekDiff
union all
select * from #tmp1
where WeekDiff is null
) a
)a_ext
order by id,start_weekNumber
--drop table #tmp1,#tmp
The above will produce the results you want except the overlap and unavailable columns. Instead of just counting weeks, i added the number of week in the year using start_dt, but you can change that if you don't like it:
ID Week YEAR start_dt end_dt Overlap unavailable_days
000001 50 2017 2017-12-12 2018-01-03 NULL NULL
000001 51 2017 2017-12-12 2018-01-03 NULL NULL
000001 52 2017 2017-12-12 2018-01-03 NULL NULL
000002 2 2018 2018-01-13 NULL NULL NULL
000003 1 2018 2018-01-02 2018-01-11 NULL NULL
000003 2 2018 2018-01-02 2018-01-11 NULL NULL

Combine consecutive date ranges

Using SQL Server 2008 R2,
I'm trying to combine date ranges into the maximum date range given that one end date is next to the following start date.
The data is about different employments. Some employees may have ended their employment and have rejoined at a later time. Those should count as two different employments (example ID 5). Some people have different types of employment, running after each other (enddate and startdate neck-to-neck), in this case it should be considered as one employment in total (example ID 30).
An employment period that has not ended has an enddate that is null.
Some examples is probably enlightening:
declare #t as table (employmentid int, startdate datetime, enddate datetime)
insert into #t values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null)
-- expected outcome
EmploymentId StartDate EndDate
5 2007-12-03 2011-08-26
5 2013-05-02 NULL
30 2006-10-02 NULL
66 2007-09-24 NULL
I've been trying different "islands-and-gaps" techniques but haven't been able to crack this one.
The strange bit you see with my use of the date '31211231' is just a very large date to handle your "no-end-date" scenario. I have assumed you won't really have many date ranges per employee, so I've used a simple Recursive Common Table Expression to combine the ranges.
To make it run faster, the starting anchor query keeps only those dates that will not link up to a prior range (per employee). The rest is just tree-walking the date ranges and growing the range. The final GROUP BY keeps only the largest date range built up per starting ANCHOR (employmentid, startdate) combination.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
create table Tbl (
employmentid int,
startdate datetime,
enddate datetime);
insert Tbl values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null);
/*
-- expected outcome
EmploymentId StartDate EndDate
5 2007-12-03 2011-08-26
5 2013-05-02 NULL
30 2006-10-02 NULL
66 2007-09-24 NULL
*/
Query 1:
;with cte as (
select a.employmentid, a.startdate, a.enddate
from Tbl a
left join Tbl b on a.employmentid=b.employmentid and a.startdate-1=b.enddate
where b.employmentid is null
union all
select a.employmentid, a.startdate, b.enddate
from cte a
join Tbl b on a.employmentid=b.employmentid and b.startdate-1=a.enddate
)
select employmentid,
startdate,
nullif(max(isnull(enddate,'32121231')),'32121231') enddate
from cte
group by employmentid, startdate
order by employmentid
Results:
| EMPLOYMENTID | STARTDATE | ENDDATE |
-----------------------------------------------------------------------------------
| 5 | December, 03 2007 00:00:00+0000 | August, 26 2011 00:00:00+0000 |
| 5 | May, 02 2013 00:00:00+0000 | (null) |
| 30 | October, 02 2006 00:00:00+0000 | (null) |
| 66 | September, 24 2007 00:00:00+0000 | (null) |
SET NOCOUNT ON
DECLARE #T TABLE(ID INT,FromDate DATETIME, ToDate DATETIME)
INSERT INTO #T(ID,FromDate,ToDate)
SELECT 1,'20090801','20090803' UNION ALL
SELECT 2,'20090802','20090809' UNION ALL
SELECT 3,'20090805','20090806' UNION ALL
SELECT 4,'20090812','20090813' UNION ALL
SELECT 5,'20090811','20090812' UNION ALL
SELECT 6,'20090802','20090802'
SELECT ROW_NUMBER() OVER(ORDER BY s1.FromDate) AS ID,
s1.FromDate,
MIN(t1.ToDate) AS ToDate
FROM #T s1
INNER JOIN #T t1 ON s1.FromDate <= t1.ToDate
AND NOT EXISTS(SELECT * FROM #T t2
WHERE t1.ToDate >= t2.FromDate
AND t1.ToDate < t2.ToDate)
WHERE NOT EXISTS(SELECT * FROM #T s2
WHERE s1.FromDate > s2.FromDate
AND s1.FromDate <= s2.ToDate)
GROUP BY s1.FromDate
ORDER BY s1.FromDate
An alternative solution that uses window functions rather than recursive CTEs
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT
employmentid,
startdate,
enddate,
DATEADD(
DAY,
-COALESCE(
SUM(DATEDIFF(DAY, startdate, enddate)+1) OVER (PARTITION BY employmentid ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
0
),
startdate
) as grp
FROM #t
) withGroup
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
This works by calculating a grp value that will be the same for all consecutive rows. This is achieved by:
Determine totals days the span occupies (+1 as the dates are inclusive)
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM #t
Cumulative sum the days spanned for each employment, ordered by startdate. This gives us the total days spanned by all the previous employment spans
We coalesce with 0 to ensure we dont have NULLs in our cumulative sum of days spanned
We do not include current row in our cumulative sum, this is because we will use the value against startdate rather than enddate (we cant use it against enddate because of the NULLs)
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM #t
) inner1
Subtract the cumulative days from the startdate to get our grp. This is the crux of the solution.
If the start date increases at the same rate as the days spanned then the days are consecutive, and subtracting the two will give us the same value.
If the startdate increases faster than the days spanned then there is a gap and we will get a new grp value greater than the previous one.
Although grp is a date, the date itself is meaningless we are using just as a grouping value
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM #t
) inner1
) inner2
With the results
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| employmentid | startdate | enddate | daysSpanned | cumulativeDaysSpanned | grp |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 | 1363 | 0 | 2007-12-03 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL | NULL | 1363 | 2009-08-08 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | 2011-01-16 00:00:00.000 | 1568 | 0 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2011-01-17 00:00:00.000 | 2012-08-12 00:00:00.000 | 574 | 1568 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2012-08-13 00:00:00.000 | NULL | NULL | 2142 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL | NULL | 0 | 2007-09-24 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
Finally we can GROUP BY grp to get the get rid of the consecutive days.
Use MIN and MAX to get the new startdate and endate
To handle the NULL enddate we give them a large value to get picked up by MAX then convert them back to NULL again
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM #t
) inner1
) inner2
) inner3
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
To get the desired result
+--------------+-------------------------+-------------------------+
| employmentid | startdate | enddate |
+--------------+-------------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 |
+--------------+-------------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
We can combine the inner queries to get the query at the start of this answer. Which is shorter, but less explainable
Limitations of all this required that
there are no overlaps of startdate and enddate for an employment. This could produce collisions in our grp.
startdate is not NULL. However this could be overcome by replacing NULL start dates with small date values
Future developers can decipher the window black magic you performed
A modified script for combining all overlapping periods. For example
01.01.2001-01.01.2010
05.05.2005-05.05.2015
will give one period:
01.01.2001-05.05.2015
tbl.enddate must be completed
;WITH cte
AS(
SELECT
a.employmentid
,a.startdate
,a.enddate
from tbl a
left join tbl c on a.employmentid=c.employmentid
and a.startdate > c.startdate
and a.startdate <= dateadd(day, 1, c.enddate)
WHERE c.employmentid IS NULL
UNION all
SELECT
a.employmentid
,a.startdate
,a.enddate
from cte a
inner join tbl c on a.startdate=c.startdate
and (c.startdate = dateadd(day, 1, a.enddate) or (c.enddate > a.enddate and c.startdate <= a.enddate))
)
select distinct employmentid,
startdate,
nullif(max(enddate),'31.12.2099') enddate
from cte
group by employmentid, startdate