Convert Dates range column into months - sql

I have table call students and it has three columns ID, DateFrom, DateTo now I wanted to convert two Dates columns range Datefrom and Dateto into the one column months as output.
The table structure is as below. Date format (yyyy-mm-dd)
ID DateFrom DateTo
123 2019-12-03 2020-02-03
456 2020-02-03 2020-02-21
Output Structures
ID Months
123 2019-12
123 2020-01
123 2020-02
456 2020-02

The easy way to do it would be to use an auxiliary calendar table - that would make a simple join between both tables:
select distinct a.id, convert(char(7), b.[Date], 120) as month
from yourTable as a
join Calendar as b
on a.DateFrom <= b.[Date]
and a.DateTo >= b.[Date]
;
If you don't have a calendar table, you might want to consider adding one.
If you can't (or don't want to) add a calendar table, you can get the same result using a numbers (tally) table.
If you don't have a tally table you can generate one on the fly, using a common table expression.
Here's how your SQL would look like in that case:
WITH N10 AS
(
SELECT n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9))V(n)
), Tally AS
(
SELECT TOP (SELECT MAX(DATEDIFF(MONTH, DateFrom, DateTo)) + 1 FROM yourTable) ROW_NUMBER() OVER(ORDER BY ##SPID) -1 As n
FROM N10 As Ten
CROSS JOIN N10 As Hundred
--CROSS JOIN N10 As Thousand
)
SELECT [ID], CONVERT(char(7), Months, 120) As Months
FROM yourTable
CROSS APPLY
(
SELECT N, DATEADD(MONTH, N, [DateFrom]) As Months
FROM Tally
) X
WHERE Months <= EOMONTH([DateTo])
ORDER BY Id, Months

let's say that you table is called "dataval"
try this
;WITH minmax AS (
SELECT Min(Eomonth(datefrom)) AS MinDate,
Max(Eomonth(dateto)) AS MaxDate
FROM dataval),
months (date) AS (
SELECT mindate
FROM minmax
UNION ALL
SELECT Dateadd(month, 1, date)
FROM months
WHERE Dateadd(month, 1, date) <= (SELECT maxdate FROM minmax))
SELECT ID, FORMAT (date, 'yyyy-MM') Months
FROM dataval
INNER JOIN months
ON months.date BETWEEN Eomonth(dataval.datefrom) AND Eomonth(dataval.dateto)
fiddle here

Related

How to insert values based on another column value

Below is a subset of my table (for the first id)
date
id
value
01/01/2022
1
5
08/01/2022
1
2
For each id, the dates are not consecutive (e.g., for id 1, the min date is 01/01/2022 and the max date is 08/01/2022)--there are 7 days in between both dates. I want to insert rows to make the dates for each id consecutive and contiguous - the value for the value field/column to be filled with 0s so that the updated table looks like:
date
id
value
01/01/2022
1
5
02/01/2022
1
0
03/01/2022
1
0
04/01/2022
1
0
05/01/2022
1
0
06/01/2022
1
0
07/01/2022
1
0
08/01/2022
1
2
Any SQL code on how to implement this would be highly appreciated. I have a calendar table but am unsure how to join it with the above table so that I fill in missing dates dynamically for each id with 0s.
My calendar table looks like:
date
01/01/2022
02/01/2022
03/01/2022
04/01/2022
Considering you state you have a calendar table, it seems what you need to do with JOIN to it with the MIN and MAX dates from your other table, and the LEFT JOIN back to your table:
WITH MinMax AS(
SELECT ID,
MIN(date) AS MinDate,
MAX(date) AS MaxDate
FROM dbo.YourTable
GROUP BY ID),
Dates AS(
SELECT MM.ID,
C.CalendarDate AS [Date]
FROM MinMax MM
JOIN dbo.CalendarTable C ON MM.MinDate <= C.CalendarDate
AND MM.MaxDate >= C.CalendarDate)
SELECT D.ID,
D.[Date],
ISNULL(YT.[Value],0) AS [Value]
FROM Dates D
LEFT JOIN dbo.YourTable YT ON D.ID = YT.ID
AND D.[Date] = YT.[Date];
SET DATEFORMAT DMY
-- CREATE A TABLE WITH OUR INPUT DATA
DROP TABLE IF EXISTS #TheData
GO
CREATE TABLE #TheData
(TheDate DATE, id INT, TheValue INT)
INSERT INTO #TheData
(TheDate,id,Thevalue)
VALUES
('01/01/2022',1,5),
('08/01/2022',1,2),
('17/01/2022',2,7),
('25/01/2022',2,7),
('15/02/2022',2,7)
-- CREATE A CALENDAR CTE
DECLARE #StartDate date = '20210101';
DECLARE #CutoffDate date = DATEADD(DAY, -1, DATEADD(YEAR, 2, #StartDate));
;WITH DateSeq(TheDate) AS
(
SELECT #StartDate
UNION ALL
SELECT DATEADD(dd,1,TheDate) FROM DateSeq
WHERE TheDate < #CutoffDate
)
-- CROSS JOIN OUR CALENDAR CTE TO OUR SOURCE DATA. DERIVED TABLE TO GET FIRST AND LAST OF EACH RANGE TO USE FOR JOIN
SELECT
ds.*
,SourceDataRangesByID.ID
,ISNULL(td.TheValue,0) AS TheValue
FROM
DateSeq ds
CROSS JOIN
(
SELECT
d.ID
,MIN(d.TheDate) AS MinDatePerID
,MAX(d.TheDate) AS MaxDatePerID
FROM #TheData d
GROUP BY d.ID
) SourceDataRangesByID
LEFT JOIN #TheData td ON td.id = SourceDataRangesByID.ID AND td.TheDate = ds.TheDate
WHERE ds.TheDate >= SourceDataRangesByID.MinDatePerID
AND ds.TheDate <= SourceDataRangesByID.MaxDatePerID
OPTION (MAXRECURSION 0);
try the generate_series to create a date table then right join with it and coalesce for the non null value
SELECT generate_series('2016-01-01', -- series start date
'2018-06-30', -- series end date
'1 day'::interval)::date AS day) AS daily_series
from mytable
See Generate_Series for TSQL
https://dba.stackexchange.com/questions/255165/does-ms-sql-server-have-generate-series-function
(Sql server 2022)
https://learn.microsoft.com/en-us/sql/t-sql/functions/generate-series-transact-sql?view=sql-server-ver16

Finding Active Clients By Date

I'm having trouble writing a recursive function that would count the number of active clients on any given day.
Say I have a table like this:
Client
Start Date
End Date
1
1-Jan-22
2
1-Jan-22
3-Jan-22
3
3-Jan-22
4
4-Jan-22
5-Jan-22
5
4-Jan-22
6-Jan-22
6
7-Jan-22
9-Jan-22
I want to return a table that would look like this:
Date
NumActive
1-Jan-22
2
2-Jan-22
2
3-Jan-22
3
4-Jan-22
4
5-Jan-22
4
6-Jan-22
3
7-Jan-22
3
8-Jan-22
3
9-Jan-22
4
Is there a way to do this? Ideally, I'd have a fixed start date and go to today's date.
Some pieces I have tried:
Creating a recursive date table
Truncated to Feb 1, 2022 for simplicity:
WITH DateDiffs AS (
SELECT DATEDIFF(DAY, '2022-02-02', GETDATE()) AS NumDays
)
, Numbers(Numbers) AS (
SELECT MAX(NumDays) FROM DateDiffs
UNION ALL
SELECT Numbers-1 FROM Numbers WHERE Numbers > 0
)
, Dates AS (
SELECT
Numbers
, DATEADD(DAY, -Numbers, CAST(GETDATE() -1 AS DATE)) AS [Date]
FROM Numbers
)
I would like to be able to loop over the dates in that table, such as by modifying the query below for each date, such as by #loopdate. Then UNION ALL it to a larger final query.
I'm now stuck as to how I can run the query to count the number of active users:
SELECT
COUNT(Client)
FROM clients
WHERE [Start Date] >= #loopdate AND ([End Date] <= #loopdate OR [End Date] IS NULL)
Thank you!
You don't need anything recursive in this particular case, you need as a minimum a list of dates in the range you want to report on, ideally a permanent calendar table.
for purposes of demonstration you can create something on the fly, and use it like so, with the list of dates something you outer join to:
with dates as (
select top(9)
Convert(date,DateAdd(day, -1 + Row_Number() over(order by (select null)), '20220101')) dt
from master.dbo.spt_values
)
select d.dt [Date], c.NumActive
from dates d
outer apply (
select Count(*) NumActive
from t
where d.dt >= t.StartDate and (d.dt <= t.EndDate or t.EndDate is null)
)c
See this Demo Fiddle

How do I get the month number with the maximum number of days from the date range?

I have a table with 10 million rows, where there are two columns that contain the start date and the end date of the range. For example, 2019-09-25 and 2019-10-20. I want to extract the month number with the maximum number of days, in this example it will be 10. In addition to dates that are separated by one month, there are also such examples: 2019-07-01 and 2019-07-29 (within one month), as well as 2019-07-01 and 2019-09-05 (more than one month). How can I implement this?
Seems like you could do something like this:
SELECT CASE WHEN DATEDIFF(DAY, DATEFROMPARTS(YEAR(EndDate),MONTH(EndDate),1),EndDate) >= DATEDIFF(DAY, StartDate, EOMONTH(StartDate)) THEN DATEPART(MONTH,EndDate)
ELSE DATEPART(MONTH,StartDate)
END
FROM (VALUES('20190925','20191020'))V(StartDate,EndDate);
Does the following fit your requirements?
You can build a table of days-in-month (this would be permanent ideally)
and then join to it using the month numbers of your min and max dates.
declare #start date='20190925', #end date='20191020';
--declare #start date='20190701', #end date='20190729';
--declare #start date='20190701', #end date='20190905';
with dim as (
select m,DAY(DATEADD(DD,-1,DATEADD(mm, DATEDIFF(mm, 0, DateFromParts(Year(GetDate()),m,1) )+1, 0)))d
from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12))m(m)
)
select top(1) with ties m
from dim
where m between Month(#start) and Month(#end)
order by d desc
You don't state how you determin the most days where there are several months with the same number of months, so with ties includes all qualifying months.
Edit
So I don't know if there is a requirement to span years - the sample data suggests not - however with a permanent list of dates and corresponding days in month values (this is often part of a calendar table) a slight tweak will accomodate it.
with dim as (
select Year(#start)*100 + m m, Day(DATEADD(DD,-1,DATEADD(mm, DATEDIFF(mm, 0, DateFromParts(Year(#start),m,1) )+1, 0)))d
from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12))m(m)
union all
select Year(#end)*100 + m m, Day(DATEADD(DD,-1,DATEADD(mm, DATEDIFF(mm, 0, DateFromParts(Year(#end),m,1) )+1, 0)))d
from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12))m(m)
)
select top(1) with ties m
from dim
where m between Year(#start)*100 + Month(#start) and Year(#end)*100 + Month(#end)
order by d desc
You could try something like this
with
l0(n) as (
select 1 n
from (values (1),(1),(1),(1),(1),(1),(1),(1)) as v(n))
select top(1) with ties
vTable.*, calc.dt month_with_most_days
from (values ('20190925','20191020'),
('20190925','20191120')) vTable(startdate, enddate)
cross apply (values (datediff(month, vTable.startdate, vTable.enddate))) diff(mo_count)
cross apply (select top (diff.mo_count+1)
row_number() over (order by (select null)) n
from l0 l1, l0 l2, l0 l3, l0 l4) tally /* 8^4 months possible */
cross apply (values (cast(case when tally.n=1 then startdate
when tally.n=diff.mo_count+1 then enddate
else eomonth(dateadd(month, tally.n-1, startdate)) end as date))) calc(dt)
order by row_number() over (partition by startdate, enddate
order by day(calc.dt) desc);
startdate enddate month_with_most_days
20190925 20191020 2019-09-25
20190925 20191120 2019-10-31

Find missing date as compare to calendar

I am explain problem in short.
select distinct DATE from #Table where DATE >='2016-01-01'
Output :
Date
2016-11-23
2016-11-22
2016-11-21
2016-11-19
2016-11-18
Now i need to find out missing date a compare to our calender dates from year '2016'
i.e. Here date '2016-11-20' is missing.
I want list of missing dates.
Thanks for reading this. Have nice day.
You need to generate dates and you have to find missing ones. Below with recursive cte i have done it
;WITH CTE AS
(
SELECT CONVERT(DATE,'2016-01-01') AS DATE1
UNION ALL
SELECT DATEADD(DD,1,DATE1) FROM CTE WHERE DATE1<'2016-12-31'
)
SELECT DATE1 MISSING_ONE FROM CTE
EXCEPT
SELECT * FROM #TABLE1
option(maxrecursion 0)
Using CTE and get all dates in CTE table then compare with your table.
CREATE TABLE #yourTable(_Values DATE)
INSERT INTO #yourTable(_Values)
SELECT '2016-11-23' UNION ALL
SELECT '2016-11-22' UNION ALL
SELECT '2016-11-21' UNION ALL
SELECT '2016-11-19' UNION ALL
SELECT '2016-11-18'
DECLARE #DATE DATE = '2016-11-01'
;WITH CTEYear (_Date) AS
(
SELECT #DATE
UNION ALL
SELECT DATEADD(DAY,1,_Date)
FROM CTEYear
WHERE _Date < EOMONTH(#DATE,0)
)
SELECT * FROM CTEYear
WHERE NOT EXISTS(SELECT 1 FROM #yourTable WHERE _Date = _Values)
OPTION(maxrecursion 0)
You need to generate the dates and then find the missing ones. A recursive CTE is one way to generate a handful of dates. Another way is to use master..spt_values as a list of numbers:
with n as (
select row_number() over (order by (select null)) - 1 as n
from master..spt_values
),
d as (
select dateadd(day, n.n, cast('2016-01-01' as date)) as dte
from n
where n <= 365
)
select d.date
from d left join
#table t
on d.dte = t.date
where t.date is null;
If you are happy enough with ranges of missing dates, you don't need a list of dates at all:
select date, (datediff(day, date, next_date) - 1) as num_missing
from (select t.*, lead(t.date) over (order by t.date) as next_date
from #table t
where t.date >= '2016-01-01'
) t
where next_date <> dateadd(day, 1, date);

Merge overlapping date intervals

Is there a better way of merging overlapping date intervals?
The solution I came up with is so simple that now I wonder if someone else has a better idea of how this could be done.
/***** DATA EXAMPLE *****/
DECLARE #T TABLE (d1 DATETIME, d2 DATETIME)
INSERT INTO #T (d1, d2)
SELECT '2010-01-01','2010-03-31' UNION SELECT '2010-04-01','2010-05-31'
UNION SELECT '2010-06-15','2010-06-25' UNION SELECT '2010-06-26','2010-07-10'
UNION SELECT '2010-08-01','2010-08-05' UNION SELECT '2010-08-01','2010-08-09'
UNION SELECT '2010-08-02','2010-08-07' UNION SELECT '2010-08-08','2010-08-08'
UNION SELECT '2010-08-09','2010-08-12' UNION SELECT '2010-07-04','2010-08-16'
UNION SELECT '2010-11-01','2010-12-31' UNION SELECT '2010-03-01','2010-06-13'
/***** INTERVAL ANALYSIS *****/
WHILE (1=1) BEGIN
UPDATE t1 SET t1.d2 = t2.d2
FROM #T AS t1 INNER JOIN #T AS t2 ON
DATEADD(day, 1, t1.d2) BETWEEN t2.d1 AND t2.d2
IF ##ROWCOUNT = 0 BREAK
END
/***** RESULT *****/
SELECT StartDate = MIN(d1) , EndDate = d2
FROM #T
GROUP BY d2
ORDER BY StartDate, EndDate
/***** OUTPUT *****/
/*****
StartDate EndDate
2010-01-01 2010-06-13
2010-06-15 2010-08-16
2010-11-01 2010-12-31
*****/
I was looking for the same solution and came across this post on Combine overlapping datetime to return single overlapping range record.
There is another thread on Packing Date Intervals.
I tested this with various date ranges, including the ones listed here, and it works correctly every time.
SELECT
s1.StartDate,
--t1.EndDate
MIN(t1.EndDate) AS EndDate
FROM #T s1
INNER JOIN #T t1 ON s1.StartDate <= t1.EndDate
AND NOT EXISTS(SELECT * FROM #T t2
WHERE t1.EndDate >= t2.StartDate AND t1.EndDate < t2.EndDate)
WHERE NOT EXISTS(SELECT * FROM #T s2
WHERE s1.StartDate > s2.StartDate AND s1.StartDate <= s2.EndDate)
GROUP BY s1.StartDate
ORDER BY s1.StartDate
The result is:
StartDate | EndDate
2010-01-01 | 2010-06-13
2010-06-15 | 2010-06-25
2010-06-26 | 2010-08-16
2010-11-01 | 2010-12-31
You asked this back in 2010 but don't specify any particular version.
An answer for people on SQL Server 2012+
WITH T1
AS (SELECT *,
MAX(d2) OVER (ORDER BY d1) AS max_d2_so_far
FROM #T),
T2
AS (SELECT *,
CASE
WHEN d1 <= DATEADD(DAY, 1, LAG(max_d2_so_far) OVER (ORDER BY d1))
THEN 0
ELSE 1
END AS range_start
FROM T1),
T3
AS (SELECT *,
SUM(range_start) OVER (ORDER BY d1) AS range_group
FROM T2)
SELECT range_group,
MIN(d1) AS d1,
MAX(d2) AS d2
FROM T3
GROUP BY range_group
Which returns
+-------------+------------+------------+
| range_group | d1 | d2 |
+-------------+------------+------------+
| 1 | 2010-01-01 | 2010-06-13 |
| 2 | 2010-06-15 | 2010-08-16 |
| 3 | 2010-11-01 | 2010-12-31 |
+-------------+------------+------------+
DATEADD(DAY, 1 is used because your desired results show you want a period ending on 2010-06-25 to be collapsed into one starting 2010-06-26. For other use cases this may need adjusting.
Here is a solution with just three simple scans. No CTEs, no recursion, no joins, no table updates in a loop, no "group by" — as a result, this solution should scale the best (I think).
I think number of scans can be reduced to two, if min and max dates are known in advance;
the logic itself just needs two scans — find gaps, applied twice.
declare #datefrom datetime, #datethru datetime
DECLARE #T TABLE (d1 DATETIME, d2 DATETIME)
INSERT INTO #T (d1, d2)
SELECT '2010-01-01','2010-03-31'
UNION SELECT '2010-03-01','2010-06-13'
UNION SELECT '2010-04-01','2010-05-31'
UNION SELECT '2010-06-15','2010-06-25'
UNION SELECT '2010-06-26','2010-07-10'
UNION SELECT '2010-08-01','2010-08-05'
UNION SELECT '2010-08-01','2010-08-09'
UNION SELECT '2010-08-02','2010-08-07'
UNION SELECT '2010-08-08','2010-08-08'
UNION SELECT '2010-08-09','2010-08-12'
UNION SELECT '2010-07-04','2010-08-16'
UNION SELECT '2010-11-01','2010-12-31'
select #datefrom = min(d1) - 1, #datethru = max(d2) + 1 from #t
SELECT
StartDate, EndDate
FROM
(
SELECT
MAX(EndDate) OVER (ORDER BY StartDate) + 1 StartDate,
LEAD(StartDate ) OVER (ORDER BY StartDate) - 1 EndDate
FROM
(
SELECT
StartDate, EndDate
FROM
(
SELECT
MAX(EndDate) OVER (ORDER BY StartDate) + 1 StartDate,
LEAD(StartDate) OVER (ORDER BY StartDate) - 1 EndDate
FROM
(
SELECT d1 StartDate, d2 EndDate from #T
UNION ALL
SELECT #datefrom StartDate, #datefrom EndDate
UNION ALL
SELECT #datethru StartDate, #datethru EndDate
) T
) T
WHERE StartDate <= EndDate
UNION ALL
SELECT #datefrom StartDate, #datefrom EndDate
UNION ALL
SELECT #datethru StartDate, #datethru EndDate
) T
) T
WHERE StartDate <= EndDate
The result is:
StartDate EndDate
2010-01-01 2010-06-13
2010-06-15 2010-08-16
2010-11-01 2010-12-31
The idea is to simulate the scanning algorithm for merging intervals. My solution makes sure it works across a wide range of SQL implementations. I've tested it on MySQL, Postgres, SQL-Server 2017, SQLite and even Hive.
Assuming the table schema is the following.
CREATE TABLE t (
a DATETIME,
b DATETIME
);
We also assume the interval is half-open like [a,b).
When (a,i,j) is in the table, it shows that there are j intervals covering a, and there are i intervals covering the previous point.
CREATE VIEW r AS
SELECT a,
Sum(d) OVER (ORDER BY a ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS i,
Sum(d) OVER (ORDER BY a ROWS UNBOUNDED PRECEDING) AS j
FROM (SELECT a, Sum(d) AS d
FROM (SELECT a, 1 AS d FROM t
UNION ALL
SELECT b, -1 AS d FROM t) e
GROUP BY a) f;
We produce all the endpoints in the union of the intervals and pair up adjacent ones. Finally, we produce the set of intervals by only picking the odd-numbered rows.
SELECT a, b
FROM (SELECT a,
Lead(a) OVER (ORDER BY a) AS b,
Row_number() OVER (ORDER BY a) AS n
FROM r
WHERE j=0 OR i=0 OR i is null) e
WHERE n%2 = 1;
I've created a sample DB-fiddle and SQL-fiddle. I also wrote a blog post on union intervals in SQL.
A Geometric Approach
Here and elsewhere I've noticed that date packing questions don't provide a geometric approach to this problem. After all, any range, date-ranges included, can be interpreted as a line. So why not convert them to a sql geometry type and utilize geometry::UnionAggregate to merge the ranges.
Why?
This has the advantage of handling all types of overlaps, including fully nested ranges. It also works like any other aggregate query, so it's a little more intuitive in that respect. You also get the bonus of a visual representation of your results if you care to use it. Finally, it is the approach I use for simultaneous range packing (you work with rectangles instead of lines in that case, and there are many more considerations). I just couldn't get the existing approaches to work in that scenario.
This has the disadvantage of requiring more recent versions of SQL Server. It also requires a numbers table and it's annoying to extract the individually produced lines from the aggregated shape. But hopefully in the future Microsoft adds a TVF that allows you to do this easily without a numbers table (or you can just build one yourself). Also, geometrical objects work with floats, so you have conversion annoyances and precision concerns to keep in mind.
Performance-wise I don't know how it compares, but I've done a few things (not shown here) to make it work for me even with large datasets.
Code Description
In 'numbers':
I build a table representing a sequence
Swap it out with your favorite way to make a numbers table.
For a union operation, you won't ever need more rows than in
your original table, so I just use it as the base to build it.
In 'mergeLines':
I convert the dates to floats and use those floats
to create geometrical points.
In this problem, we're working in
'integer space,' meaning there are no time considerations, and so
an begin date in one range that is one day apart from an end date
in another should be merged with that other. In order to make
that merge happen, we need to convert to 'real space.', so we
add 1 to the tail of all ranges (we undo this later).
I then connect these points via STUnion and STEnvelope.
Finally, I merge all these lines via UnionAggregate. The resulting
'lines' geometry object might contain multiple lines, but if they
overlap, they turn into one line.
In the outer query:
I use the numbers CTE to extract the individual lines inside 'lines'.
I envelope the lines which here ensures that the lines are stored
only as its two endpoints.
I read the endpoint x values and convert them back to their time
representations, ensuring to put them back into 'integer space'.
The Code
with
numbers as (
select row_number() over (order by (select null)) i
from #t
),
mergeLines as (
select lines = geometry::UnionAggregate(line)
from #t
cross apply (select line =
geometry::Point(convert(float, d1), 0, 0).STUnion(
geometry::Point(convert(float, d2) + 1, 0, 0)
).STEnvelope()
) l
)
select ap.StartDate,
ap.EndDate
from mergeLines ml
join numbers n on n.i between 1 and ml.lines.STNumGeometries()
cross apply (select line = ml.lines.STGeometryN(i).STEnvelope()) l
cross apply (select
StartDate = convert(datetime,l.line.STPointN(1).STX),
EndDate = convert(datetime,l.line.STPointN(3).STX) - 1
) ap
order by ap.StartDate;
In this solution, I created a temporary Calendar table which stores a value for every day across a range. This type of table can be made static. In addition, I'm only storing 400 some odd dates starting with 2009-12-31. Obviously, if your dates span a larger range, you would need more values.
In addition, this solution will only work with SQL Server 2005+ in that I'm using a CTE.
With Calendar As
(
Select DateAdd(d, ROW_NUMBER() OVER ( ORDER BY s1.object_id ), '1900-01-01') As [Date]
From sys.columns as s1
Cross Join sys.columns as s2
)
, StopDates As
(
Select C.[Date]
From Calendar As C
Left Join #T As T
On C.[Date] Between T.d1 And T.d2
Where C.[Date] >= ( Select Min(T2.d1) From #T As T2 )
And C.[Date] <= ( Select Max(T2.d2) From #T As T2 )
And T.d1 Is Null
)
, StopDatesInUse As
(
Select D1.[Date]
From StopDates As D1
Left Join StopDates As D2
On D1.[Date] = DateAdd(d,1,D2.Date)
Where D2.[Date] Is Null
)
, DataWithEariestStopDate As
(
Select *
, (Select Min(SD2.[Date])
From StopDatesInUse As SD2
Where T.d2 < SD2.[Date] ) As StopDate
From #T As T
)
Select Min(d1), Max(d2)
From DataWithEariestStopDate
Group By StopDate
Order By Min(d1)
EDIT The problem with using dates in 2009 has nothing to do with the final query. The problem is that the Calendar table is not big enough. I started the Calendar table at 2009-12-31. I have revised it start at 1900-01-01.
Try this
;WITH T1 AS
(
SELECT d1, d2, ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS R
FROM #T
), NUMS AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS R
FROM T1 A
CROSS JOIN T1 B
CROSS JOIN T1 C
), ONERANGE AS
(
SELECT DISTINCT DATEADD(DAY, ROW_NUMBER() OVER(PARTITION BY T1.R ORDER BY (SELECT 0)) - 1, T1.D1) AS ELEMENT
FROM T1
CROSS JOIN NUMS
WHERE NUMS.R <= DATEDIFF(DAY, d1, d2) + 1
), SEQUENCE AS
(
SELECT ELEMENT, DATEDIFF(DAY, '19000101', ELEMENT) - ROW_NUMBER() OVER(ORDER BY ELEMENT) AS rownum
FROM ONERANGE
)
SELECT MIN(ELEMENT) AS StartDate, MAX(ELEMENT) as EndDate
FROM SEQUENCE
GROUP BY rownum
The basic idea is to first unroll the existing data, so you get a separate row for each day. This is done in ONERANGE
Then, identify the relationship between how dates increment and the way the row numbers do.
The difference remains constant within an existing range/island. As soon as you get to a new data island, the difference between them increases because the date increments by more than 1, while the row number increments by 1.
This Solution is similar to the 1st solution with additional Deletion Condition.
This will sort the data in the main table itself instead of using different table to store the result.
DROP TABLE IF EXISTS #SampleTable;
CREATE TABLE #SampleTable (StartTime DATETIME NULL, EndTime DATETIME NULL);
INSERT INTO #SampleTable(StartTime, EndTime)
VALUES
(N'2010-01-01T00:00:00', N'2010-03-31T00:00:00'),
(N'2010-03-01T00:00:00', N'2010-06-13T00:00:00'),
(N'2010-04-01T00:00:00', N'2010-05-31T00:00:00'),
(N'2010-06-15T00:00:00', N'2010-06-25T00:00:00'),
(N'2010-06-26T00:00:00', N'2010-07-10T00:00:00'),
(N'2010-07-04T00:00:00', N'2010-08-16T00:00:00'),
(N'2010-08-01T00:00:00', N'2010-08-05T00:00:00'),
(N'2010-08-01T00:00:00', N'2010-08-09T00:00:00'),
(N'2010-08-02T00:00:00', N'2010-08-07T00:00:00'),
(N'2010-08-08T00:00:00', N'2010-08-08T00:00:00'),
(N'2010-08-09T00:00:00', N'2010-08-12T00:00:00'),
(N'2010-11-01T00:00:00', N'2010-12-31T00:00:00');
--
DECLARE #RowCount INT=0;
WHILE(1=1) --
BEGIN
SET #RowCount=0;
--
UPDATE T1
SET T1.EndTime=T2.EndTime
FROM dbo.#SampleTable AS T1
INNER JOIN dbo.#SampleTable AS T2 ON DATEADD(DAY, 1, T1.EndTime) BETWEEN T2.StartTime AND T2.EndTime;
--
SET #RowCount=#RowCount+##ROWCOUNT;
--
DELETE T1
FROM dbo.#SampleTable AS T1
INNER JOIN dbo.#SampleTable AS T2 ON T1.EndTime=T2.EndTime AND T1.StartTime>T2.StartTime;
--
SET #RowCount=#RowCount+##ROWCOUNT;
--
IF #RowCount=0 --
BREAK;
END;
SELECT * FROM #SampleTable
I was inspired by the Geometric Approach given by pwilcox, but wanted to try a different approach. This is using Trino, but I hope the functions used can also be found in other versions of SQL.
WITH Geo AS (
SELECT
transform( -- 6) See Below~
ST_Geometries( -- 5) Extracts an array of individual lines from the union.
geometry_union( -- 4) Returns the union of aggregated lines, melding all lines together into a single geometric multi-line.
array_agg( -- 3) Aggregation function that joins all lines together.
ST_LineString( -- 2) Makes the pairs of geometric points into lines.
ARRAY[ST_Point(0, to_unixtime(d1)), ST_Point(0, to_unixtime(d2))] -- 1) Takes unix time start and end values and makes them into an array of geometric points.
)
)
)
)
, x -> (ST_YMin(x), ST_Length(x))) AS timestamp_duration -- 6) From the array of lines, The minimum value and length of each line is extracted.
FROM #T -- The miniumum value is a timestamp, length is duration.
WHERE d1 <> d2 -- I had errors any time this was the case.
)
-- 7) Finally, I unnest the produced array and convert the values back into timestamps.
SELECT from_unixtime(timestamp) AS StartDate
, from_unixtime(timestamp + duration) AS EndDate
FROM Geo
CROSS JOIN UNNEST(timestamp_duration) AS t(timestamp, duration)
For reference, this took my company cluster about 2 minutes to make 400k start/end timestamps into 700 distinct start/end timestamps.
It also runs in just 2 stages.