SQL Server - Efficient generation of dates in a range - sql

Using SQL Server 2016.
I have a stored procedure that produces a list of options against a range of dates. Carriage options against days for clarity but unimportant to the specifics here.
The first step in the stored procedure generates a list of dates to store additional data against, and generating this list is taking substantially longer than the balance of the code. While this process is individual short, the number of calls means that this one piece of code is putting the system under more load than anything else.
With that in mind I have been testing efficiency of several options.
Iterative common table expression:
CREATE FUNCTION [dbo].[udf_DateRange_CTE] (#StartDate DATE,#EndDate DATE)
RETURNS #Return TABLE (Date DATE NOT NULL)
AS
BEGIN
WITH dates(date)
AS (SELECT #StartDate [Date]
UNION ALL
SELECT DATEADD(dd, 1, [Date])
FROM dates
WHERE [Date] < #EndDate
)
INSERT INTO #Return
SELECT date
FROM dates
OPTION (MAXRECURSION 0)
RETURN
END
A while loop:
CREATE FUNCTION [dbo].[udf_DateRange_While] (#StartDate DATE,#EndDate DATE)
RETURNS #Retun TABLE (Date DATE NOT NULL,PRIMARY KEY (Date))
AS
BEGIN
WHILE #StartDate <= #EndDate
BEGIN
INSERT INTO #Retun
VALUES (#StartDate)
SET #StartDate = DATEADD(DAY,1,#StartDate)
END
RETURN
END
A lookup from a pre-populated table of dates:
CREATE FUNCTION [dbo].[udf_DateRange_query] (#StartDate DATE,#EndDate DATE)
RETURNS #Return TABLE (Date DATE NOT NULL)
AS
BEGIN
INSERT INTO #Return
SELECT Date
FROM DateLookup
WHERE Date >= #StartDate
AND Date <= #EndDate
RETURN
END
In terms of efficiency I have test generating a years worth of dates, 1000 times and had the following results:
CTE: 10.0 Seconds
While: 7.7 Seconds
Query: 2.6 Seconds
From this the query is definitely the faster option but does require a permanent table of dates that needs to be created and maintained. This means that the query is no loner "self-contained" and it would be possible to request a date outside of the given date range.
Does anyone know of any more efficient ways of generating dates for a range, or any optimisation I can apply to the above?
Many thanks.

You can try like following. This should be fast compared CTE or WHILE loop.
DECLARE #StartDate DATETIME = Getdate() - 1000
DECLARE #EndTime DATETIME = Getdate()
SELECT *
FROM (SELECT #StartDate + RN AS DATE
FROM (SELECT ROW_NUMBER()
OVER (
ORDER BY (SELECT NULL)) RN
FROM master..[spt_values]) T) T1
WHERE T1.DATE <= #EndTime
ORDER BY DATE
Note: This will work for day difference <= 2537 days
If you want to support more range, you can use CROSS JOIN on master..[spt_values] to generate range between 0 - 6436369 days like following.
DECLARE #StartDate DATETIME = Getdate() - 10000
DECLARE #EndTime DATETIME = Getdate()
SELECT #StartDate + RN AS DATE FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM master..[spt_values] T1
CROSS JOIN master..[spt_values] T2
) T
WHERE RN <= DATEDIFF(DAY,#StartDate,#EndTime)

Related

Generate random records for datetime columns by stored procedure in SQL

I want to generate 5 random records from a field which is a datetime column and contains several records of (OrderDate) for a given date range using stored procedure for the table named Orders
CREATE PROCEDURE test
#StartDate DATETIME = NULL,
#EndDate DATETIME = NULL,
AS
BEGIN
SELECT OrderDate = DATEADD(......)
FROM Orders
END
May I get some help!
A while loop works ok for this purpose, especially if you're concerned with limiting your randomness to a bounded date range.
The downside is that potentially many insert queries get executed vs. a single insert for a recursive CTE as in the other answer.
create procedure dbo.spGenDates2
#MinDate datetime,
#MaxDate datetime,
#RecordCount int = 5
as
SET NOCOUNT ON;
DECLARE #Range int, #DayOffset int, #Cnt int
SET #Range = DATEDIFF(dd, #MinDate, #MaxDate)
SET #Cnt = 1
WHILE #Cnt <= #RecordCount
BEGIN
SET #DayOffset = RAND() * (#Range + 1)
INSERT INTO _test (Dt) VALUES(DATEADD(dd, #DayOffset, #MinDate))
SET #Cnt = #Cnt + 1
END
Based on your syntax I'm assuming you're using SQL Server...
Note that you cannot reliably use the sql random number generator function RAND() within the context of a single query because it does not get reseeded per row so you end up receiving the same, single random number for each row result. Instead, an approach using NEWID() converted into a numeric does the trick when generating random values within the execution of a single query.
Here's a procedure that will give you n number of sample dates in the near past.
create procedure dbo.spGenDates
#MaxDate datetime,
#RecordCount int = 5
as
WITH dates as (
SELECT DATEADD(MILLISECOND, ABS(CHECKSUM(NEWID())) * -1, #MaxDate) D,
1 as Cnt
UNION ALL
SELECT DATEADD(MILLISECOND, ABS(CHECKSUM(NEWID())) * -1, #MaxDate) D,
x.Cnt + 1 as Cnt
FROM dates x
WHERE x.Cnt < #RecordCount
)
INSERT INTO _test (Dt)
SELECT D
FROM dates
The wording of the question has been clarified (see comments on another answer) to be a desire to SELECT 5 random sample dates within a bounded range from a table.
A query like this will yield the desired result.
SELECT TOP (5) OrderDate
FROM Orders
WHERE OrderDate >= #StartDate
AND OrderDate < #EndDate
ORDER BY NEWID()

Selecting all dates from a table within a date range and including 1 row per empty date

I am trying to refactor some code in an ASP.Net website and having a problem with a stored procedure I am writing.
What I want to do is get a date range, then select all data within that range from a table BUT if a date is not present I need to still select a row.
My idea for this as you can see in the code below is to create a temporary table, and populate it with all the dates within my date range, then join this onto the table I am selecting from however this does not work. Am I doing something wrong here? The tempDate column is always null in this join however I have checked the table and it deffinately has data in it.
-- Parameters
DECLARE #DutyDate datetime='2012-01-01 00:00:00'
DECLARE #InstructorID nvarchar(2) = N'29'
DECLARE #datesTBL TABLE (tempDate DATETIME)
-- Variables
DECLARE #StartDate DATETIME
DECLARE #EndDate DATETIME
SELECT
#StartDate =StartDate,
#EndDate = EndDate
FROM
DutyPeriodTbl
WHERE
(StartDate <= #DutyDate)
AND
(EndDate >= #DutyDate)
DECLARE #d DATETIME = #StartDate
WHILE #d<=#EndDate
BEGIN
INSERT INTO #datesTBL VALUES (CONVERT(DATETIME, #d, 102))
SET #d=DATEADD(day,1,#d)
END
SELECT
dt.tempDate ,
InstructorID, EventStart,
EventEnd, cancelled,
cancelledInstructor,
EventType, DevName,
Room, SimLocation,
ClassLocation, Event,
Duration, TrainingDesc,
Crew, Notes,
LastAmended, InstLastAmended,
ChangeAcknowledged, Type,
OtherType, OtherTypeDesc,
CourseType
FROM
OpsInstructorEventsView iv
LEFT OUTER JOIN
#datesTBL dt
ON
CONVERT(DATETIME, iv.EventStart, 102) = CONVERT(DATETIME, dt.tempDate, 102)
WHERE
InstructorID = #InstructorID
AND
EventStart BETWEEN CONVERT(DATETIME, #StartDate, 102) AND CONVERT(DATETIME, #EndDate, 102)
ORDER BY
EventStart
There are several ways of dealing with missing rows, but all are about having another set of data to combine with your current results.
That could be derived from your results, created by a CTE or other process (such as your example), or (my preference) by using a permanent template to join against.
The template in your case could just be a table of dates, like your #datesTBL. The difference being that it's created in advance with, for example, 100 years worth of dates.
Your query may then be similar to your example, but I would try the following...
SELECT
dt.tempDate ,
InstructorID, EventStart,
EventEnd, cancelled,
cancelledInstructor,
EventType, DevName,
Room, SimLocation,
ClassLocation, Event,
Duration, TrainingDesc,
Crew, Notes,
LastAmended, InstLastAmended,
ChangeAcknowledged, Type,
OtherType, OtherTypeDesc,
CourseType
FROM
#datesTBL dt
LEFT OUTER JOIN
OpsInstructorEventsView iv
ON iv.EventStart >= dt.tempDate
AND iv.EventStart < dt.tempDate + 1
AND iv.InstructorID = #InstructorID
WHERE
dt.tempDate >= #StartDate
AND dt.tempDate <= #EndDate
ORDER BY
dt.tempDate,
iv.EventStart
This puts the calendar template on the LEFT, and so makes many queries easier as you know the calendar's date field is always populated, is always a date only (no time part) value, is in order, is simple to GROUP BY, etc.
Well, idea is the same, but i would write function, that returns table with all dates in period. Look at this:
Create Function [dbo].[Interval]
(
#DateFrom Date,
#DateTo Date
)
Returns #tab Table
(
MyDate DateTime
)
As
Begin
Declare #Days int
Declare #i int
Set #Days = DateDiff(Day, #DateFrom, #DateTo)
Set #i = 0;
While (#Days > #i)
Begin
Insert Into #tab(MyDate)
Values (DateAdd(Day, #i, #DateTo))
Set #i = #i + 1
End
return
End
And reuse the function whenever you need it..
Select *
From [dbo].[Interval]('2011-01-01', GETDATE())

How can I generate a temporary table filled with dates in SQL Server 2000?

I need to make a temporary table that holds of range of dates, as well as a couple of columns that hold placeholder values (0) for future use. The dates I need are the first day of each month between $startDate and $endDate where these variables can be several years apart.
My original sql statement looked like this:
select dbo.FirstOfMonth(InsertDate) as Month, 0 as Trials, 0 as Sales
into #dates
from customer
group by dbo.FirstOfMonth(InsertDate)
"FirstOfMonth" is a user-defined function I made that pretty much does what it says, returning the first day of the month for the provided date with the time at exactly midnight.
This produced almost exactly what I needed until I discovered there were occasionally gaps in my dates where I had a few months were there were no records insert dates. Since my result must still have the missing months I need a different approach.
I have added the following declarations to the stored procedure anticipating their need for the range of the dates I need ...
declare $startDate set $startDate = select min(InsertDate) from customer
declare $endDate set $endDate = select max(InsertDate) from customer
... but I have no idea what to do from here.
I know this question is similar to this question but, quite frankly, that answer is over my head (I don't often work with SQL and when I do it tends to be on older versions of SQL Server) and there are a few minor differences that are throwing me off.
I needed something similar, but all DAYS instead of all MONTHS.
Using the code from MatBailie as a starting point, here's the SQL for creating a permanent table with all dates from 2000-01-01 to 2099-12-31:
CREATE TABLE _Dates (
d DATE,
PRIMARY KEY (d)
)
DECLARE #dIncr DATE = '2000-01-01'
DECLARE #dEnd DATE = '2100-01-01'
WHILE ( #dIncr < #dEnd )
BEGIN
INSERT INTO _Dates (d) VALUES( #dIncr )
SELECT #dIncr = DATEADD(DAY, 1, #dIncr )
END
This will quickly populate a table with 170 years worth of dates.
CREATE TABLE CalendarMonths (
date DATETIME,
PRIMARY KEY (date)
)
DECLARE
#basedate DATETIME,
#offset INT
SELECT
#basedate = '01 Jan 2000',
#offset = 1
WHILE (#offset < 2048)
BEGIN
INSERT INTO CalendarMonths SELECT DATEADD(MONTH, #offset, date) FROM CalendarMonths
SELECT #offset = #offset + #offset
END
You can then use it by LEFT joining on to that table, for the range of dates you require.
I would probably use a Calendar table. Create a permanent table in your database and fill it with all of the dates. Even if you covered a 100 year range, the table would still only have ~36,525 rows in it.
CREATE TABLE dbo.Calendar (
calendar_date DATETIME NOT NULL,
is_weekend BIT NOT NULL,
is_holiday BIT NOT NULL,
CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED (calendar_date)
)
Once the table is created, just populate it once in a loop, so that it's always out there and available to you.
Your query then could be something like this:
SELECT
C.calendar_date,
0 AS trials,
0 AS sales
FROM
dbo.Calendar C
WHERE
C.calendar_date BETWEEN #start_date AND #end_date AND
DAY(C.calendar_date) = 1
You can join in the Customers table however you need to, outer joining on FirstOfMonth(InsertDate) = C.calendar_date if that's what you want.
You can also include a column for day_of_month if you want which would avoid the overhead of calling the DAY() function, but that's fairly trivial, so it probably doesn't matter one way or another.
This of course will not work in SQL-Server 2000 but in a modern database where you don't want to create a permanent table. You can use a table variable instead creating a table so you can left join the data try this. Change the DAY to HOUR etc to change the increment type.
declare #CalendarMonths table (date DATETIME, PRIMARY KEY (date)
)
DECLARE
#basedate DATETIME,
#offset INT
SELECT
#basedate = '01 Jan 2014',
#offset = 1
INSERT INTO #CalendarMonths SELECT #basedate
WHILE ( DATEADD(DAY, #offset, #basedate) < CURRENT_TIMESTAMP)
BEGIN
INSERT INTO #CalendarMonths SELECT DATEADD(HOUR, #offset, date) FROM #CalendarMonths where DATEADD(DAY, #offset, date) < CURRENT_TIMESTAMP
SELECT #offset = #offset + #offset
END
A starting point of a useful kludge to specify a range or specific list of dates:
SELECT *
FROM
(SELECT CONVERT(DateTime,'2017-1-1')+number AS [Date]
FROM master..spt_values WHERE type='P' AND number<370) AS DatesList
WHERE DatesList.Date IN ('2017-1-1','2017-4-14','2017-4-17','2017-12-25','2017-12-26')
You can get 0 to 2047 out of master..spt_values WHERE type='P', so that's five and a half year's worth of dates if you need it!
Tested below and it works, though it's a bit convoluted.
I assigned arbitrary values to the dates for the test.
DECLARE #SD smalldatetime,
#ED smalldatetime,
#FD smalldatetime,
#LD smalldatetime,
#Mct int,
#currct int = 0
SET #SD = '1/15/2011'
SET #ED = '2/02/2012'
SET #FD = (DATEADD(dd, -1*(Datepart(dd, #SD)-1), #sd))
SET #LD = (DATEADD(dd, -1*(Datepart(dd, #ED)-1), #ED))
SET #Mct = DATEDIFF(mm, #FD, #LD)
CREATE TABLE #MyTempTable (FoM smalldatetime, Trials int, Sales money)
WHILE #currct <= #Mct
BEGIN
INSERT INTO #MyTempTable (FoM, Trials, Sales)
VALUES
(DATEADD(MM, #currct, #FD), 0, 0)
SET #currct = #currct + 1
END
SELECT * FROM #MyTempTable
DROP TABLE #MyTempTable
For SQL Server 2000, this stackoverflow post looks promising for a way to temporarily generate dates calculated off of a start and end date. It's not exactly the same but quite similar. This post has a very in-depth answer on truncating dates, if needed.
In case anyone stumbles on this question and is working in PostgreSQL instead of SQL Server 2000, here is how you might do it there...
PostgreSQL has a nifty series generating function. For your example, you could use this series of all days instead of generating an entire calendar table, and then do groupings and matchups from there.
SELECT current_date + s.a AS dates FROM generate_series(0,14,7) AS s(a);
dates
------------
2004-02-05
2004-02-12
2004-02-19
(3 rows)
SELECT * FROM generate_series('2008-03-01 00:00'::timestamp,
'2008-03-04 12:00', '10 hours');
generate_series
---------------------
2008-03-01 00:00:00
2008-03-01 10:00:00
2008-03-01 20:00:00
2008-03-02 06:00:00
2008-03-02 16:00:00
2008-03-03 02:00:00
2008-03-03 12:00:00
2008-03-03 22:00:00
2008-03-04 08:00:00
(9 rows)
I would also look into date_trunc from PostgreSQL using 'month' for the truncator field to maybe refactor your original query to easily match with a date_trunc version of the calendar series.
select top (datediff(D,#start,#end)) dateadd(D,id-1,#start)
from BIG_TABLE_WITH_NO_JUMPS_IN_ID
declare #start datetime
set #start = '2016-09-01'
declare #end datetime
set #end = '2016-09-30'
create table #Date
(
table_id int identity(1,1) NOT NULL,
counterDate datetime NULL
);
insert into #Date select top (datediff(D,#start,#end)) NULL from SOME_TABLE
update #Date set counterDate = dateadd(D,table_id - 1, #start)
The code above should populate the table with all the dates between the start and end. You would then just join on this table to get all of the dates needed. If you only needed a certain day of each month, you could dateadd a month instead.
SELECT P.Id
, DATEADD ( DD, -P.Id, P.Date ) AS Date
FROM (SELECT TOP 1000 ROW_NUMBER () OVER (ORDER BY (SELECT NULL)) AS Id, CAST(GETDATE () AS DATE) AS Date FROM master.dbo.spt_values) AS P
This query returns a table calendar for the last 1000 days or so. It can be put in a temporary or other table.
Create a table variable containing a date for each month in a year:
declare #months table (reportMonth date, PRIMARY KEY (reportMonth));
declare #start date = '2018', #month int = 0; -- base 0 month
while (#month < 12)
begin
insert into #months select dateAdd(month, #month, #start);
select #month = #month + 1;
end
--verify
select * from #months;
This is by far the quickest method I have found (much quicker than inserting rows 1 by 1 in a WHILE loop):
DECLARE #startDate DATE = '1900-01-01'
DECLARE #endDate DATE = '2050-01-01'
SELECT DATEADD(DAY, sequenceNumber, #startDate) AS TheDate
INTO #TheDates
FROM (
SELECT ones.n + 10*tens.n + 100*hundreds.n + 1000*thousands.n + 10000*tenthousands.n AS sequenceNumber
FROM
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) ones(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tens(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) hundreds(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) thousands(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tenthousands(n)
WHERE ones.n + 10*tens.n + 100*hundreds.n + 1000*thousands.n + 10000*tenthousands.n <= DATEDIFF(day, #startDate, #endDate)
) theNumbers
SELECT *
FROM #TheDates
ORDER BY TheDate
The recursive answer:
DECLARE #startDate AS date = '20220315';
DECLARE #endDate AS date = '20230316'; -- inclusive
WITH cte_minutes(dt)
AS (
SELECT
DATEFROMPARTS(YEAR(#startDate), MONTH(#startDate), 1)
UNION ALL
SELECT
DATEADD(month, 1, dt)
FROM
cte_minutes
WHERE DATEADD(month, 1, dt) < #endDate
)
SELECT
dt
into #dates
FROM
cte_minutes
WHERE
dt >= #startDate
AND
dt <= #endDate
OPTION (MAXRECURSION 2000);
DROP TABLE dbo.#dates

SQL first of every month

Supposing that I wanted to write table valued function in SQL that returns a table with the first day of every month between the argument dates, what is the simplest way to do this?
For example fnFirstOfMonths('10/31/10', '2/17/11') would return a one-column table with 11/1/10, 12/1/10, 1/1/11, and 2/1/11 as the elements.
My first instinct is just to use a while loop and repeatedly insert first days of months until I get to before the start date. It seems like there should be a more elegant way to do this though.
Thanks for any help you can provide.
Something like this would work without being inside a function:
DECLARE #LowerDate DATE
SET #LowerDate = GETDATE()
DECLARE #UpperLimit DATE
SET #UpperLimit = '20111231'
;WITH Firsts AS
(
SELECT
DATEADD(DAY, -1 * DAY(#LowerDate) + 1, #LowerDate) AS 'FirstOfMonth'
UNION ALL
SELECT
DATEADD(MONTH, 1, f.FirstOfMonth) AS 'FirstOfMonth'
FROM
Firsts f
WHERE
DATEADD(MONTH, 1, f.FirstOfMonth) <= #UpperLimit
)
SELECT *
FROM Firsts
It uses a thing called CTE (Common Table Expression) - available in SQL Server 2005 and up and other database systems.
In this case, I start the recursive CTE by determining the first of the month for the #LowerDate date specified, and then I iterate adding one month to the previous first of month, until the upper limit is reached.
Or if you want to package it up in a stored function, you can do so, too:
CREATE FUNCTION dbo.GetFirstOfMonth(#LowerLimit DATE, #UpperLimit DATE)
RETURNS TABLE
AS
RETURN
WITH Firsts AS
(
SELECT
DATEADD(DAY, -1 * DAY(#LowerLimit) + 1, #LowerLimit) AS 'FirstOfMonth'
UNION ALL
SELECT
DATEADD(MONTH, 1, f.FirstOfMonth) AS 'FirstOfMonth'
FROM
Firsts f
WHERE
DATEADD(MONTH, 1, f.FirstOfMonth) <= #UpperLimit
)
SELECT * FROM Firsts
and then call it like this:
SELECT * FROM dbo.GetFirstOfMonth('20100522', '20100831')
to get an output like this:
FirstOfMonth
2010-05-01
2010-06-01
2010-07-01
2010-08-01
PS: by using the DATE datatype - which is present in SQL Server 2008 and newer - I fixed the two "bugs" that Richard commented about. If you're on SQL Server 2005, you'll have to use DATETIME instead - and deal with the fact you're getting a time portion, too.
create function dbo.fnFirstOfMonths(#d1 datetime, #d2 datetime)
returns table as return
select dateadd(m,datediff(m,0,#d1)+v.number,0) as FirstDay
from master..spt_values v
where v.type='P' and v.number between 0 and datediff(m, #d1, #d2)
and dateadd(m,datediff(m,0,#d1)+v.number,0) between #d1 and #d2
GO
Notes
master..spt_values is a source for general purpose sequence numbers in SQL Server
dateadd(m, datediff(m is a technique for working out the first day of month for any date
+v.number is used to increase it by one month each time
0 and datediff(m, #d1, #d2) this condition gives us all the numbers we need to generate a first-of-month date for each month between #d1 and #d2, inclusive of both months
and dateadd(m,datediff(m,0,#d1)+v.number,0) between #d1 and #d2 the final filter to verify that the first-of-month date generated is between #d1 and #d2
Performance comparison against marc_s's code
Summary
8220 ms (CTE)
4173 ms (master..spt_values)
Test
declare #t table (dt datetime)
declare #d datetime
declare #i int
set nocount on
set #d = GETDATE()
set #i = 0
while #i < 10000
begin
insert #t select * from dbo.getfirstofmonth('20090102', '20100506')
delete #t
set #i = #i + 1
end
print datediff(ms, #d, getdate())
set #d = GETDATE()
set #i = 0
while #i < 10000
begin
insert #t select * from dbo.fnfirstofmonths('20090102', '20100506')
delete #t
set #i = #i + 1
end
print datediff(ms, #d, getdate())
Performante
It will loop just between the months involved (4 times in the example):
set dateformat mdy;
declare #date1 smalldatetime,#date2 smalldatetime,#i int
set #date1= '10-31-2010'
set #date2= '02-17-2011'
set #i=1
while(#i<=DATEDIFF(mm,#date1,#date2))
begin
select convert(smalldatetime,CONVERT(varchar(6),DATEADD(mm,#i,#date1),112)+'01',112)
set #i=#i+1
end
I realize this isn't a function, but I'm going to throw this into the mix anyway.
select cal_date from calendar
where day_of_month = 1
and cal_date between '2011-01-01' and '2012-01-01'
This calendar table runs on a PostgreSQL server at work. I'll port it to SQL Server tonight, and run some speed comparisons. (Why? Because this stuff is fun, that's why.)
Just in case anybody is still reading this ...
I cannot imaging that any of the aforementioned functions is faster than this:
declare #DatFirst date = '20101031', #DatLast date = '21110217';
declare #DatFirstOfFirstMonth date = dateadd(day,1-day(#DatFirst),#DatFirst);
select DatFirstOfMonth = dateadd(month,n,#DatFirstOfFirstMonth)
from (
select top (datediff(month,#DatFirstOfFirstMonth,#DatLast)+1)
n=row_number() over (order by (select 1))-1
from (values (1),(1),(1),(1),(1),(1),(1),(1)) a (n)
cross join (values (1),(1),(1),(1),(1),(1),(1),(1)) b (n)
cross join (values (1),(1),(1),(1),(1),(1),(1),(1)) c (n)
cross join (values (1),(1),(1),(1),(1),(1),(1),(1)) d (n)
) x

What is a good way to find gaps in a set of datespans?

What is a way to find gaps in a set of date spans?
For example, I have these date spans:
1/ 1/11 - 1/10/11
1/13/11 - 1/15/11
1/20/11 - 1/30/11
Then I have a start and end date of 1/7/11 and 1/14/11.
I want to be able to tell that between 1/10/11 and 1/13/11 there is a gap so the start and end date is not possible. Or I want to return only the datespans up to the first gap encountered.
If this can be done in SQL server that would be good.
I was thinking to go through each date to find out if it lands in a datespan... if it does not then there's a gap on that day.
Jump to 2nd last code block for: *I want to be able to tell that
between 1/10/11 and 1/13/11 there is
a gap so the start and end date is*
not possible.
Jump to last code block for: *I want to return only
the datespans up to the first gap
encountered.*
First of all, here's a virtual table to discuss
create table spans (date1 datetime, date2 datetime);
insert into spans select '20110101', '20110110';
insert into spans select '20110113', '20110115';
insert into spans select '20110120', '20110130';
This is a query that will list, individually, all the dates in the calendar
declare #startdate datetime, #enddate datetime
select #startdate = '20110107', #enddate = '20110114'
select distinct a.date1+v.number
from spans A
inner join master..spt_values v
on v.type='P' and v.number between 0 and datediff(d, a.date1, a.date2)
-- we don't care about spans that don't intersect with our range
where A.date1 <= #enddate
and #startdate <= A.date2
Armed with this query, we can now test to see if there are any gaps, by
counting the days in the calendar against the expected number of days
declare #startdate datetime, #enddate datetime
select #startdate = '20110107', #enddate = '20110114'
select case when count(distinct a.date1+v.number)
= datediff(d,#startdate, #enddate) + 1
then 'No gaps' else 'Gap' end
from spans A
inner join master..spt_values v
on v.type='P' and v.number between 0 and datediff(d, a.date1, a.date2)
-- we don't care about spans that don't intersect with our range
where A.date1 <= #enddate
and #startdate <= A.date2
-- count only those dates within our range
and a.date1 + v.number between #startdate and #enddate
Another way to do this is to just build the calendar from #start
to #end up front and look to see if there is a span with this date
declare #startdate datetime, #enddate datetime
select #startdate = '20110107', #enddate = '20110114'
-- startdate+v.number is a day on the calendar
select #startdate + v.number
from master..spt_values v
where v.type='P' and v.number between 0
and datediff(d, #startdate, #enddate)
-- run the part above this line alone to see the calendar
-- the condition checks for dates that are not in any span (gap)
and not exists (
select *
from spans
where #startdate + v.number between date1 and date2)
The query returns ALL dates that are gaps in the date range #start - #end
A TOP 1 can be added to just see if there are gaps
To return all records that are before the gap, use the query as a
derived table in a larger query
declare #startdate datetime, #enddate datetime
select #startdate = '20110107', #enddate = '20110114'
select *
from spans
where date1 <= #enddate and #startdate <= date2 -- overlaps
and date2 < ( -- before the gap
select top 1 #startdate + v.number
from master..spt_values v
where v.type='P' and v.number between 0
and datediff(d, #startdate, #enddate)
and not exists (
select *
from spans
where #startdate + v.number between date1 and date2)
order by 1 ASC
)
Assuming MySQL, something like this would work:
select #olddate := null;
select start_date, end_date, datediff(end_date, #olddate) as diff, #olddate:=enddate
from table
order by start_date asc, end_date asc
having diff > 1;
Basically: cache the previous row's end_date in the #olddate variable, and then do a diff on that "old" value with the currel enddate. THe having clause will return only the records where the difference between two rows is greater than a day.
disclaimer: Haven't tested this, but the basic query construct should work.
I want to be able to tell that between
1/10/11 and 1/13/11 there is a gap so
the start and end date is not
possible.
I think you're asking this question: does the data in your table have a gap between the start date and the end date?
I created a one-column table, date_span, and inserted your date spans into it.
You can identify a gap by counting the number of days between start date and end date, and comparing that the the number of rows in date_span for the same range.
select
date '2011-01-14' - date '2011-01-07' + 1 as elapsed_days,
count(*) from date_span
where cal_date between '2011-01-07' and '2011-01-14';
returns
elapsed_days count
-- --
8 6
Since they're not equal, there's a gap in the table "date_span" between 2011-01-07 and 2011-01-14. I'll stop there for now, because I'm really not certain what you're trying to do.