Select date ranges where periods do not overlap - sql

I have two tables each containing the start and end dates of several periods. I want an efficient way to find periods (date ranges) where dates are within the ranges of the first table but not within ranges of the second table.
For example, if this is my first table (with dates that I want)
start_date end_date
2001-01-01 2010-01-01
2012-01-01 2015-01-01
And this is my second table (with dates that I do not want)
start_date end_date
2002-01-01 2006-01-01
2003-01-01 2004-01-01
2005-01-01 2009-01-01
2014-01-01 2018-01-01
Then output looks like
start_date end_date
2001-01-01 2001-12-31
2009-01-02 2010-01-01
2012-01-01 2013-12-31
We can safely assume that periods in the first table are non-overlapping, but cannot assume that periods in the second table are non-overlapping.
I already have a method for doing this, but it is an order of magnitude slower than I can accept, so I am hoping someone can propose a faster approach.
My present method looks like:
(1) merge table 2 into non-overlapping periods
(2) find the inverse of table 2
(3) join overlapping periods from table 1 and inverted-table-2
I am sure there is a faster way if some of these steps can be merged together.
In more detail
/* (1) merge overlapping periods */
WITH
spell_starts AS (
SELECT [start_date], [end_date]
FROM table_2 s1
WHERE NOT EXISTS (
SELECT 1
FROM table_2 s2
WHERE s2.[start_date] < s1.[start_date]
AND s1.[start_date] <= s2.[end_date]
)
),
spell_ends AS (
SELECT [start_date], [end_date]
FROM table_2 t1
WHERE NOT EXISTS (
SELECT 1
FROM table_2 t2
WHERE t2.[start_date] <= t1.[end_date]
AND t1.[end_date] < t2.[end_date]
)
)
SELECT s.[start_date], MIN(e.[end_date]) as [end_date]
FROM spell_starts s
INNER JOIN spell_ends e
ON s.[start_date] <= e.[end_date]
GROUP BY s.[start_date]
/* (2) inverse table 2 */
SELECT [start_date], [end_date]
FROM (
/* all forward looking spells */
SELECT DATEADD(DAY, 1, [end_date]) AS [start_date]
,LEAD(DATEADD(DAY, -1, [start_date]), 1, '9999-01-01') OVER ( ORDER BY [start_date] ) AS [end_date]
FROM merge_table_2
UNION ALL
/* back looking spell (to 'origin of time') created separately */
SELECT '1900-01-01' AS [start_date]
,DATEADD(DAY, -1, MIN([start_date])) AS [end_date]
FROM merge_table_2
) k
WHERE [start_date] <= [end_date]
AND '1900-01-01' <= [start_date]
AND [end_date] <= '9999-01-01'
/* (3) overlap spells */
SELECT IIF(t1.start_date < t2.start_date, t2.start_date, t1.start_date) AS start_date
,IIF(t1.end_date < t2.end_date, t1.end_date, t2.end_date) AS end_date
FROM table_1 t1
INNER JOIN inverse_merge_table_2 t2
ON t1.start_date < t2.end_date
AND t2.start_date < t1.end_date

Hope this helps. I have commented the two CTEs I am using, for explanation purposes.
Here you go:
drop table table1
select cast('2001-01-01' as date) as start_date, cast('2010-01-01' as date) as end_date into table1
union select '2012-01-01', '2015-01-01'
drop table table2
select cast('2002-01-01' as date) as start_date, cast('2006-01-01' as date) as end_date into table2
union select '2003-01-01', '2004-01-01'
union select '2005-01-01', '2009-01-01'
union select '2014-01-01', '2018-01-01'
/***** Solution *****/
-- This cte puts all dates into one column
with cte as
(
select t
from
(
select start_date as t
from table1
union all
select end_date
from table1
union all
select dateadd(day,-1,start_date) -- for table 2 we bring the start date back one day to make sure we have nothing in the forbidden range
from table2
union all
select dateadd(day,1,end_date) -- for table 2 we bring the end date forward one day to make sure we have nothing in the forbidden range
from table2
)a
),
-- This one adds an end date using the lead function
cte2 as (select t as s, coalesce(LEAD(t,1) OVER ( ORDER BY t ),t) as e from cte a)
-- this query gets all intervals not in table2 but in table1
select s, e
from cte2 a
where not exists(select 1 from table2 b where s between dateadd(day,-1,start_date) and dateadd(day,1,end_date) and e between dateadd(day,-1,start_date) and dateadd(day,1,end_date) )
and exists(select 1 from table1 b where s between start_date and end_date and e between start_date and end_date)
and s <> e

If you want performance, then you want to use window functions.
The idea is to:
Combine the dates with flags of being in-and-out of the two tables.
Use cumulative sums to determine where dates start being in-and-out.
Then you have a gaps-and-islands problem where you want to combine the results.
Finally, filter on the particular periods you want.
This looks like:
with dates as (
select start_date as dte, 1 as in1, 0 as in2
from table1
union all
select dateadd(day, 1, end_date), -1, 0
from table1
union all
select start_date, 0, 1 as in2
from table2
union all
select dateadd(day, 1, end_date), 0, -1
from table2
),
d as (
select dte,
sum(sum(in1)) over (order by dte) as ins_1,
sum(sum(in2)) over (order by dte) as ins_2
from dates
group by dte
)
select min(dte), max(next_dte)
from (select d.*, dateadd(day, -1, lead(dte) over (order by dte)) as next_dte,
row_number() over (order by dte) as seqnum,
row_number() over (partition by case when ins_1 >= 1 and ins_2 = 0 then 'in' else 'out' end order by dte) as seqnum_2
from d
) d
group by (seqnum - seqnum_2)
having max(ins_1) > 0 and max(ins_2) = 0
order by min(dte);
Here is a db<>fiddle.
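For anyone who finds the steps above easier to follow procedurally, here is a minimal Python sketch of the same flag-and-running-sum idea. It is not the T-SQL above and is not tuned for performance; the function name `subtract_periods` and the tuple layout are made up for illustration, and dates are treated as inclusive, as in the question.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def subtract_periods(wanted, unwanted):
    """Return inclusive (start, end) periods covered by `wanted` but by no `unwanted` period."""
    # Each period adds +1 on its start day and -1 on the day after its end.
    # Index 0 of the pair counts `wanted` periods, index 1 counts `unwanted` ones.
    deltas = {}
    for column, periods in enumerate((wanted, unwanted)):
        for start, end in periods:
            deltas.setdefault(start, [0, 0])[column] += 1
            deltas.setdefault(end + ONE_DAY, [0, 0])[column] -= 1

    result = []
    in_wanted = in_unwanted = 0   # running sums: how many periods are open
    open_start = None             # start of the period currently being kept
    for day in sorted(deltas):
        in_wanted += deltas[day][0]
        in_unwanted += deltas[day][1]
        keep = in_wanted > 0 and in_unwanted == 0
        if keep and open_start is None:
            open_start = day
        elif not keep and open_start is not None:
            result.append((open_start, day - ONE_DAY))
            open_start = None
    return result


# Sample data from the question.
table_1 = [(date(2001, 1, 1), date(2010, 1, 1)),
           (date(2012, 1, 1), date(2015, 1, 1))]
table_2 = [(date(2002, 1, 1), date(2006, 1, 1)),
           (date(2003, 1, 1), date(2004, 1, 1)),
           (date(2005, 1, 1), date(2009, 1, 1)),
           (date(2014, 1, 1), date(2018, 1, 1))]

result = subtract_periods(table_1, table_2)
for start, end in result:
    print(start, end)
```

With the sample data this prints the three periods the question expects: 2001-01-01 to 2001-12-31, 2009-01-02 to 2010-01-01, and 2012-01-01 to 2013-12-31.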

Thanks to @zip and @Gordon for their answers. Both were superior to my initial approach. However, the following solution was faster than both of their approaches in my environment & context:
WITH acceptable_starts AS (
    -- table1 start dates that are not inside any table2 period
    SELECT a.[start_date] FROM table1 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE a.[start_date] BETWEEN b.[start_date] AND b.[end_date]
    )
    UNION ALL
    -- the day after each table2 period, unless another table2 period covers it
    SELECT DATEADD(DAY, 1, a.[end_date]) FROM table2 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE DATEADD(DAY, 1, a.[end_date]) BETWEEN b.[start_date] AND b.[end_date]
    )
),
acceptable_ends AS (
    -- table1 end dates that are not inside any table2 period
    SELECT a.[end_date] FROM table1 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE a.[end_date] BETWEEN b.[start_date] AND b.[end_date]
    )
    UNION ALL
    -- the day before each table2 period, unless another table2 period covers it
    SELECT DATEADD(DAY, -1, a.[start_date]) FROM table2 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE DATEADD(DAY, -1, a.[start_date]) BETWEEN b.[start_date] AND b.[end_date]
    )
)
-- pair each acceptable start with the nearest acceptable end after it
SELECT s.[start_date], MIN(e.[end_date]) AS [end_date]
FROM acceptable_starts AS s
INNER JOIN acceptable_ends AS e
ON s.[start_date] < e.[end_date]
GROUP BY s.[start_date]

Related

SQL Finding Missing Dates Between Ranges

I have a table of jobs that have run for different source systems. These have a "RunDate" and then the FromDate and ToDate.
I want to find the gaps where we are missing any dates in the FromDate and ToDate fields to make sure that we have covered the data in those periods.
Many examples I've looked at work where a single date misses in a single column of ranges, however I have a From and To range that I need to test and ultimately work out where a date may be missed.
CREATE TABLE #temptable ( [SourceSystem] nchar(3), [RunDate] datetime, [ResubmitCount] int, [FromDate] date, [ToDate] date )
INSERT INTO #temptable
VALUES
( N'ILG', N'2021-07-28T15:35:23.207', 0, N'2021-06-01T00:00:00', N'2021-06-01T00:00:00' ),
( N'ILG', N'2021-07-28T15:35:23.707', 0, N'2021-06-05T00:00:00', N'2021-06-06T00:00:00' ),
( N'AAP', N'2021-07-28T15:35:23.833', 0, N'2021-06-01T00:00:00', N'2021-06-02T00:00:00' ),
( N'AAP', N'2021-07-28T15:35:23.833', 0, N'2021-06-04T00:00:00', N'2021-06-04T00:00:00' ),
( N'ZZP', N'2021-07-28T15:35:23.897', 0, N'2021-06-05T00:00:00', N'2021-06-05T00:00:00' )
DROP TABLE #temptable
So obviously using the example above I should be able to ascertain that the period between 2021-06-02 and 2021-06-04 for SourceSystem ILG and period 2021-06-03 to 2021-06-03 is missing for SourceSystem AAP.
I am struggling to make it work for ranges. I can work with single dates, but the system doesn't log them in this fashion.
UPDATE
I took the accepted answer and then added some code to it to explode all the individual dates between the ranges specified.
I have included the code in case anyone needs it in the future.
WITH
a AS (
SELECT
SourceSystem, FromDate, ToDate,
LEAD(FromDate) OVER(
PARTITION BY SourceSystem
ORDER BY RunDate
) AS NextDate
FROM dbo.WDSubmission ws
),
gap_periods AS
(
SELECT
SourceSystem,
DATEADD(DAY, 1, ToDate) AS GapBeg,
DATEADD(DAY, -1, NextDate) AS GapFin
FROM a
WHERE
NextDate IS NOT NULL AND
DATEADD(DAY, -2, NextDate) >= ToDate
--AND a.SourceSystem = 'OGI'
) , E00(N) AS (SELECT 1 UNION ALL SELECT 1)
,E02(N) AS (SELECT 1 FROM E00 a, E00 b)
,E04(N) AS (SELECT 1 FROM E02 a, E02 b)
,E08(N) AS (SELECT 1 FROM E04 a, E04 b)
,E16(N) AS (SELECT 1 FROM E08 a, E08 b)
,E32(N) AS (SELECT 1 FROM E16 a, E16 b)
,cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E32)
,DateRange AS
(
SELECT ExplodedDate = DATEADD(DAY,N - 1,'2021-01-01')
FROM cteTally
WHERE N <= DATEDIFF(DAY,'2021-01-01',GETDATE())
)
SELECT *
FROM gap_periods eh
JOIN DateRange d ON d.ExplodedDate >= eh.GapBeg
AND d.ExplodedDate <= eh.GapFin;
Try this:
WITH
a AS (
SELECT
SourceSystem, FromDate, ToDate,
LEAD(FromDate) OVER(
PARTITION BY SourceSystem
ORDER BY RunDate
) AS NextDate
FROM #temptable
)
SELECT
SourceSystem,
DATEADD(DAY, 1, ToDate) AS GapBeg,
DATEADD(DAY, -1, NextDate) AS GapFin
FROM a
WHERE
NextDate IS NOT NULL AND
DATEADD(DAY, -2, NextDate) >= ToDate;
Result:
+--------------+------------+------------+
| SourceSystem | GapBeg | GapFin |
+--------------+------------+------------+
| AAP | 2021-06-03 | 2021-06-03 |
| ILG | 2021-06-02 | 2021-06-04 |
+--------------+------------+------------+
db-fiddle
It's hard to display the missing dates, and it would also be too much information to analyze. But you can get the number of days skipped in between using the query below:
select *,datediff(day,lag(ToDate)over(partition by sourcesystem order by ToDate),fromdate) from #temptable
The data would look something like this (last column tells the number of days skipped) :

SQL Union as Subquery to create Date Ranges from Start Date

I have three tables, each of which has a date column (the date column is an INT field and needs to stay that way). I need a UNION across all three tables so that I get the list of unique dates in ascending order like this:
20040602
20051215
20060628
20100224
20100228
20100422
20100512
20100615
Then I need to add a column to the result of the query where I subtract one from each date and place it one row above as the end date. Basically I need to generate the end date from the start date somehow and this is what I got so far (not working):
With Query1 As (
Select date_one As StartDate
From table_one
Union
Select date_two As StartDate
From table_two
Union
Select date_three e As StartDate
From table_three
Order By Date Asc
)
Select Query1.StartDate - 1 As EndDate
From Query1
Thanks a lot for your help!
Building on your existing union cte, we can use lead() in the outer query to get the start_date of the next record, and subtract one day from it.
with q as (
select date_one start_date from table_one
union select date_two from table_two
union select date_three from table_three
)
select
start_date,
dateadd(day, -1, lead(start_date) over(order by start_date)) end_date
from q
order by start_date
If the datatype of the original columns is numeric, then you need to do some casting before applying date functions:
with q as (
select cast(cast(date_one as varchar(8)) as date) start_date from table_one
union select cast(cast(date_two as varchar(8)) as date) from table_two
union select cast(cast(date_three as varchar(8)) as date) from table_three
)
select
start_date,
dateadd(day, -1, lead(start_date) over(order by start_date)) end_date
from q
order by start_date
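Since the question says the column has to stay an INT, here is a small Python sketch (not SQL) of the same "next start minus one day" logic that converts back to `yyyymmdd` integers at the end. It also shows why plain `StartDate - 1` on the integer is not enough: at a month boundary, 20100301 - 1 gives 20100300, which is not a date. The helper names are hypothetical, and the three lists stand in for the three table columns.

```python
from datetime import datetime, timedelta


def int_to_date(n):
    """Parse a yyyymmdd integer such as 20100301 into a date."""
    return datetime.strptime(str(n), "%Y%m%d").date()


def date_to_int(d):
    """Format a date back into a yyyymmdd integer."""
    return int(d.strftime("%Y%m%d"))


def date_ranges(*columns):
    """Union the columns, sort them, and pair each start with (next start - 1 day)."""
    starts = sorted(set().union(*columns))   # like UNION: duplicates removed
    ranges = []
    for current, following in zip(starts, starts[1:] + [None]):
        if following is None:
            end = None                        # last row has no next start, like LEAD() returning NULL
        else:
            end = date_to_int(int_to_date(following) - timedelta(days=1))
        ranges.append((current, end))
    return ranges


ranges = date_ranges([20040602, 20100301], [20051215], [20100224, 20040602])
for start, end in ranges:
    print(start, end)
```

The third range ends on 20100228, the day before 20100301, which is the case that integer subtraction gets wrong.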

Find missing date as compare to calendar

I will explain the problem in short.
select distinct DATE from #Table where DATE >='2016-01-01'
Output :
Date
2016-11-23
2016-11-22
2016-11-21
2016-11-19
2016-11-18
Now I need to find the dates that are missing compared to the calendar dates for the year 2016,
i.e. here the date '2016-11-20' is missing.
I want list of missing dates.
Thanks for reading this. Have nice day.
You need to generate the dates and then find the missing ones. Below I have done it with a recursive CTE:
;WITH CTE AS
(
SELECT CONVERT(DATE,'2016-01-01') AS DATE1
UNION ALL
SELECT DATEADD(DD,1,DATE1) FROM CTE WHERE DATE1<'2016-12-31'
)
SELECT DATE1 MISSING_ONE FROM CTE
EXCEPT
SELECT * FROM #TABLE1
option(maxrecursion 0)
Use a CTE to generate all the dates, then compare them with your table.
CREATE TABLE #yourTable(_Values DATE)
INSERT INTO #yourTable(_Values)
SELECT '2016-11-23' UNION ALL
SELECT '2016-11-22' UNION ALL
SELECT '2016-11-21' UNION ALL
SELECT '2016-11-19' UNION ALL
SELECT '2016-11-18'
DECLARE @DATE DATE = '2016-11-01'
;WITH CTEYear (_Date) AS
(
SELECT @DATE
UNION ALL
SELECT DATEADD(DAY,1,_Date)
FROM CTEYear
WHERE _Date < EOMONTH(@DATE,0)
)
SELECT * FROM CTEYear
WHERE NOT EXISTS(SELECT 1 FROM #yourTable WHERE _Date = _Values)
OPTION(maxrecursion 0)
You need to generate the dates and then find the missing ones. A recursive CTE is one way to generate a handful of dates. Another way is to use master..spt_values as a list of numbers:
with n as (
select row_number() over (order by (select null)) - 1 as n
from master..spt_values
),
d as (
select dateadd(day, n.n, cast('2016-01-01' as date)) as dte
from n
where n <= 365
)
select d.dte
from d left join
#table t
on d.dte = t.date
where t.date is null;
If you are happy enough with ranges of missing dates, you don't need a list of dates at all:
select date, (datediff(day, date, next_date) - 1) as num_missing
from (select t.*, lead(t.date) over (order by t.date) as next_date
from #table t
where t.date >= '2016-01-01'
) t
where next_date <> dateadd(day, 1, date);
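If you want to try the "generate a calendar, then subtract the dates you have" idea without a SQL Server instance, here is a sketch that runs the same pattern in SQLite through Python's built-in `sqlite3` module. Note that this is SQLite syntax (`date(d, '+1 day')` instead of `DATEADD`), and that the table name `observed` and the date bounds are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE observed (d TEXT);
    INSERT INTO observed VALUES
        ('2016-11-23'), ('2016-11-22'), ('2016-11-21'),
        ('2016-11-19'), ('2016-11-18');
""")

# The recursive CTE builds one row per calendar day in the range;
# EXCEPT then removes every day that is present in the table.
missing = [row[0] for row in conn.execute("""
    WITH RECURSIVE calendar(d) AS (
        SELECT '2016-11-18'
        UNION ALL
        SELECT date(d, '+1 day') FROM calendar WHERE d < '2016-11-23'
    )
    SELECT d FROM calendar
    EXCEPT
    SELECT d FROM observed
    ORDER BY d
""")]

print(missing)
```

For the sample dates from the question the only missing day in that range is 2016-11-20.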

Calculating per day in SQL

I have an sql table like that:
Id Date Price
1 21.09.09 25
2 31.08.09 16
1 23.09.09 21
2 03.09.09 12
So what I need is to get the min and max date for each id and the difference in days between them. That part is easy enough, using SQLite syntax:
SELECT id,
min(date),
max(date),
julianday(max(date)) - julianday(min(date)) as dif
from table group by id
Then the tricky one: how can I receive the price per day during this difference period. I mean something like this:
ID Date PricePerDay
1 21.09.09 25
1 22.09.09 0
1 23.09.09 21
2 31.08.09 16
2 01.09.09 0
2 02.09.09 0
2 03.09.09 12
I created a CTE with a calendar, as you mentioned, but I don't know how to get the desired result:
WITH RECURSIVE
cnt(x) AS (
SELECT 0
UNION ALL
SELECT x+1 FROM cnt
LIMIT (SELECT ((julianday('2015-12-31') - julianday('2015-01-01')) + 1)))
SELECT date(julianday('2015-01-01'), '+' || x || ' days') as date FROM cnt
P.S. If it could be in SQLite syntax, that would be awesome!
You can use a recursive CTE to calculate all the days between the min date and max date. The rest is just a left join and some logic:
with recursive cte as (
select t.id, min(date) as thedate, max(date) as maxdate
from t
group by id
union all
select cte.id, date(thedate, '+1 day') as thedate, cte.maxdate
from cte
where cte.thedate < cte.maxdate
)
select cte.id, cte.thedate,
coalesce(t.price, 0) as PricePerDay
from cte left join
t
on cte.id = t.id and cte.thedate = t.date;
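Here is a self-contained way to check this approach, run through Python's `sqlite3` module. It is a sketch under one assumption that differs from the question: the dates are stored as ISO `YYYY-MM-DD` text, because SQLite's `date()` function cannot parse `DD.MM.YY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, date TEXT, price INTEGER);
    INSERT INTO t VALUES
        (1, '2009-09-21', 25),
        (2, '2009-08-31', 16),
        (1, '2009-09-23', 21),
        (2, '2009-09-03', 12);
""")

# The anchor row holds each id's first and last date; the recursive part
# adds one day at a time until the last date is reached.
rows = conn.execute("""
    WITH RECURSIVE cte(id, thedate, maxdate) AS (
        SELECT id, MIN(date), MAX(date) FROM t GROUP BY id
        UNION ALL
        SELECT id, date(thedate, '+1 day'), maxdate
        FROM cte
        WHERE thedate < maxdate
    )
    SELECT cte.id, cte.thedate, COALESCE(t.price, 0) AS PricePerDay
    FROM cte
    LEFT JOIN t ON cte.id = t.id AND cte.thedate = t.date
    ORDER BY cte.id, cte.thedate
""").fetchall()

for row in rows:
    print(row)
```

Days with no matching row in `t` come back with a price of 0, which is the shape of output the question asks for.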
One method is to use a tally table to build a list of dates and join that with the table.
The date stamps in the DD.MM.YY format are first converted to the YYYY-MM-DD format, so that they can actually be used as dates in the SQL.
In the final select they are formatted back to the DD.MM.YY format.
First some test data:
create table testtable (Id int, [Date] varchar(8), Price int);
insert into testtable (Id,[Date],Price) values (1,'21.09.09',25);
insert into testtable (Id,[Date],Price) values (1,'23.09.09',21);
insert into testtable (Id,[Date],Price) values (2,'31.08.09',16);
insert into testtable (Id,[Date],Price) values (2,'03.09.09',12);
The SQL:
with Digits as (
select 0 as n
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9
),
t as (
select Id,
('20'||substr([Date],7,2)||'-'||substr([Date],4,2)||'-'||substr([Date],1,2)) as [Date],
Price
from testtable
),
Dates as (
select Id, date(MinDate,'+'||(d2.n*10+d1.n)||' days') as [Date]
from (
select Id, min([Date]) as MinDate, max([Date]) as MaxDate
from t
group by Id
) q
join Digits d1
join Digits d2
where date(MinDate,'+'||(d2.n*10+d1.n)||' days') <= MaxDate
)
select d.Id,
(substr(d.[Date],9,2)||'.'||substr(d.[Date],6,2)||'.'||substr(d.[Date],3,2)) as [Date],
coalesce(t.Price,0) as Price
from Dates d
left join t on (d.Id = t.Id and d.[Date] = t.[Date])
order by d.Id, d.[Date];
The recursive SQL below was totally inspired by the excellent answer from Gordon Linoff.
And a recursive SQL is probably more performant for this anyway.
(He should get the 15 points for the accepted answer).
The difference in this version is that the datestamps are first formatted to YYYY-MM-DD.
with t as (
select Id,
('20'||substr([Date],7,2)||'-'||substr([Date],4,2)||'-'||substr([Date],1,2)) as [Date],
Price
from testtable
),
cte as (
select Id, min([Date]) as [Date], max([Date]) as MaxDate from t
group by Id
union all
select Id, date([Date], '+1 day'), MaxDate from cte
where [Date] < MaxDate
)
select cte.Id,
(substr(cte.[Date],9,2)||'.'||substr(cte.[Date],6,2)||'.'||substr(cte.[Date],3,2)) as [Date],
coalesce(t.Price, 0) as PricePerDay
from cte
left join t
on (cte.Id = t.Id and cte.[Date] = t.[Date])
order by cte.Id, cte.[Date];

SQL calculate date segments within calendar year

What I need is to calculate the missing time periods within the calendar year given a table such as this in SQL:
DatesTable
|ID|DateStart |DateEnd |
1 NULL NULL
2 2015-1-1 2015-12-31
3 2015-3-1 2015-12-31
4 2015-1-1 2015-9-30
5 2015-1-1 2015-3-31
5 2015-6-1 2015-12-31
6 2015-3-1 2015-6-30
6 2015-7-1 2015-10-31
Expected return would be:
1 2015-1-1 2015-12-31
3 2015-1-1 2015-2-28
4 2015-10-1 2015-12-31
5 2015-4-1 2015-5-31
6 2015-1-1 2015-2-28
6 2015-11-1 2015-12-31
It's essentially work blocks. What I need to show is the part of the calendar year which was NOT worked. So for ID = 3, he worked from 3/1 through the rest of the year. But he did not work from 1/1 till 2/28. That's what I'm looking for.
You can do it using LEAD, LAG window functions available from SQL Server 2012+:
;WITH CTE AS (
SELECT ID,
LAG(DateEnd) OVER (PARTITION BY ID ORDER BY DateEnd) AS PrevEnd,
DateStart,
DateEnd,
LEAD(DateStart) OVER (PARTITION BY ID ORDER BY DateEnd) AS NextStart
FROM DatesTable
)
SELECT ID, DateStart, DateEnd
FROM (
-- Get interval right before current [DateStart, DateEnd] interval
SELECT ID,
CASE
WHEN DateStart IS NULL THEN '20150101'
WHEN DateStart > start THEN start
ELSE NULL
END AS DateStart,
CASE
WHEN DateStart IS NULL THEN '20151231'
WHEN DateStart > start THEN DATEADD(d, -1, DateStart)
ELSE NULL
END AS DateEnd
FROM CTE
CROSS APPLY (SELECT COALESCE(DATEADD(d, 1, PrevEnd), '20150101')) x(start)
-- If there is no next interval then get interval right after current
-- [DateStart, DateEnd] interval (up-to end of year)
UNION ALL
SELECT ID, DATEADD(d, 1, DateEnd) AS DateStart, '20151231' AS DateEnd
FROM CTE
WHERE DateStart IS NOT NULL -- Do not re-examine [Null, Null] interval
AND NextStart IS NULL -- There is no next [DateStart, DateEnd] interval
AND DateEnd < '20151231' -- Current [DateStart, DateEnd] interval
-- does not terminate on 31/12/2015
) AS t
WHERE t.DateStart IS NOT NULL
ORDER BY ID, DateStart
The idea behind the above query is simple: for every [DateStart, DateEnd] interval get 'not worked' interval right before it. If there is no interval following the current interval, then also get successive 'not worked' interval (if any).
Also note that I assume that if DateStart is NULL then DateEnd is also NULL for the same ID.
Demo here
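To make the idea concrete, here is a rough Python sketch (not the T-SQL above) that walks each ID's work blocks in date order and emits the gap before each block plus any trailing gap up to the end of the year. The function name `unworked_periods` and the tuple layout are invented for illustration, and a `[NULL, NULL]` row is represented as `(id, None, None)`.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def unworked_periods(blocks, year_start, year_end):
    """Return (id, gap_start, gap_end) for every stretch of the year not covered by a block."""
    by_id = {}
    for id_, start, end in blocks:
        by_id.setdefault(id_, [])
        if start is not None:                 # a (None, None) row means "no work at all"
            by_id[id_].append((start, end))

    gaps = []
    for id_ in sorted(by_id):
        cursor = year_start                   # first day not yet known to be worked
        for start, end in sorted(by_id[id_]):
            if start > cursor:                # gap right before this block
                gaps.append((id_, cursor, start - ONE_DAY))
            cursor = max(cursor, end + ONE_DAY)
        if cursor <= year_end:                # trailing gap up to the end of the year
            gaps.append((id_, cursor, year_end))
    return gaps


# Sample data from the question.
blocks = [
    (1, None, None),
    (2, date(2015, 1, 1), date(2015, 12, 31)),
    (3, date(2015, 3, 1), date(2015, 12, 31)),
    (4, date(2015, 1, 1), date(2015, 9, 30)),
    (5, date(2015, 1, 1), date(2015, 3, 31)),
    (5, date(2015, 6, 1), date(2015, 12, 31)),
    (6, date(2015, 3, 1), date(2015, 6, 30)),
    (6, date(2015, 7, 1), date(2015, 10, 31)),
]

gaps = unworked_periods(blocks, date(2015, 1, 1), date(2015, 12, 31))
for gap in gaps:
    print(*gap)
```

Tracking the running maximum of the end dates in `cursor` means overlapping blocks for the same ID would also be handled, which is a small difference from the LAG/LEAD query.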
If your data is not too big, this approach will work. It expands all the days and ids and then re-groups them:
with d as (
    -- one row per day of 2015; the column needs a name for the recursion to work
    select cast('2015-01-01' as date) as d
    union all
    select dateadd(day, 1, d)
    from d
    where d < cast('2015-12-31' as date)
),
td as (
    -- every (day, id) pair that is not covered by a work block of that id
    select *
    from d cross join
         (select distinct id from t) t
    where not exists (select 1
                      from t t2
                      where t2.id = t.id and
                            d.d between t2.startdate and t2.enddate
                     )
)
select id, min(d) as startdate, max(d) as enddate
from (select td.*,
             dateadd(day, - row_number() over (partition by id order by d), d) as grp
      from td
     ) td
group by id, grp
order by id, grp
option (maxrecursion 0);
An alternative method relies on cumulative sums and similar functionality that is much easier to express in SQL Server 2012+.
Somewhat simpler approach I think.
Basically create a list of dates for all work block ranges (A). Then create a list of dates for the whole year for each ID (B). Then remove the A from B. Compile the remaining list of dates into date ranges for each ID.
DECLARE @startdate DATETIME, @enddate DATETIME
SET @startdate = '2015-01-01'
SET @enddate = '2015-12-31'
--Build date ranges from remaining date list
;WITH dateRange(ID, dates, Grouping)
AS
(
SELECT dt1.id, dt1.Dates, dt1.Dates + row_number() over (order by dt1.id asc, dt1.Dates desc) AS Grouping
FROM
(
--Remove (A) from (B)
SELECT distinct dt.ID, tmp.Dates FROM DatesTable dt
CROSS APPLY
(
--GET (B) here
SELECT DATEADD(DAY, number, @startdate) [Dates]
FROM master..spt_values
WHERE type = 'P' AND DATEADD(DAY, number, @startdate) <= @enddate
) tmp
left join
(
--GET (A) here
SELECT DISTINCT T.Id,
D.Dates
FROM DatesTable AS T
INNER JOIN master..spt_values as N on N.number between 0 and datediff(day, T.DateStart, T.DateEnd)
CROSS APPLY (select dateadd(day, N.number, T.DateStart)) as D(Dates)
WHERE N.type ='P'
) dr
ON dr.Id = dt.Id and dr.Dates = tmp.Dates
WHERE dr.id is null
) dt1
)
SELECT ID, CAST(MIN(Dates) AS DATE) DateStart, CAST(MAX(Dates) AS DATE) DateEnd
FROM dateRange
GROUP BY ID, Grouping
ORDER BY ID
Here's the code:
http://sqlfiddle.com/#!3/f3615/1
I hope this helps!