Loop within id and combine dates between rows in SQL [duplicate] - sql

I have a table in the following format
Id StartDate EndDate Type
1 2012-02-18 2012-03-18 1
1 2012-03-17 2012-06-29 1
1 2012-06-27 2012-09-27 1
1 2014-08-23 2014-09-24 3
1 2014-09-23 2014-10-24 3
1 2014-10-23 2014-11-24 3
2 2015-07-04 2015-08-06 1
2 2015-08-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0
I found similar questions here, but not something that could help me solve my problem. I want to merge rows that has the same Id, Type and overlapping date periods.
The result from the above table should be
Id StartDate EndDate Type
1 2012-02-18 2012-09-27 1
1 2014-08-23 2014-11-24 3
2 2015-07-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0
In another server, I was able to do it with the following restrictions and the query below:
Didn't care about the Type column, but just the Id
Had a newer version of SQL Server (2012), but now I have 2008 which the code is not compatible
SELECT Id
, MIN(StartDate) AS StartDate
, MAX(EndDate) AS EndDate
FROM (
SELECT *
, SUM(CASE WHEN a.EndDate = a.StartDate THEN 0
ELSE 1
END
) OVER (ORDER BY Id, StartDate) sm
FROM (
SELECT Id
, StartDate
, EndDate
, LAG(EndDate, 1, NULL) OVER (PARTITION BY Id ORDER BY Id, EndDate) EndDate
FROM #temptable
) a
) b
GROUP BY Id, sm
Any advice how I can
Include Type on the process
Make it work on SQL Server 2008

This approach uses an additional temp table to identify the groups of overlapping dates, and then performs a quick aggregate based on the groupings.
SELECT *, ROW_NUMBER() OVER (ORDER BY Id, Type) AS UID,
ROW_NUMBER() OVER (ORDER BY Id, Type) AS GroupId INTO #G FROM #TempTable
WHILE ##ROWCOUNT <> 0 BEGIN
UPDATE T1 SET
GroupId = T2.GroupId
FROM #G T1
INNER JOIN (
SELECT T1.UID, CASE WHEN T1.GroupId < T2.GroupId THEN T1.GroupId ELSE T2.GroupId END
FROM #G T1
LEFT OUTER JOIN #G T2
ON T1.Id = T2.Id AND T1.Type = T2.Type AND T1.GroupId <> T2.GroupId
AND T1.StartDate <= T2.EndDate AND T2.StartDate <= T1.EndDate
) T2 (UID, GroupId)
ON T1.UID = T2.UID
WHERE T1.GroupId <> T2.GroupId
END
SELECT Id, MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate, Type
FROM #G G GROUP BY GroupId, Id, Type
This returns the expected values
Id StartDate EndDate Type
----------- ---------- ---------- -----------
1 2012-02-18 2012-09-27 1
1 2014-08-23 2014-11-24 3
2 2015-07-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0

This is 2008 compatible. A CTE really is the best way to link up all overlapping records in my opinion. The date overlap logic came from this thread: SO Date Overlap
I added extra data that's more complex to make sure that it's working as expected.
DECLARE #Data table (Id INT, StartDate DATE, EndDate DATE, Type INT)
INSERT INTO #data
SELECT 1,'2/18/2012' ,'3/18/2012', 1 UNION ALL
select 1,'3/17/2012','6/29/2012',1 UNION ALL
select 1,'6/27/2012','9/27/2012',1 UNION ALL
select 1,'8/23/2014','9/24/2014',3 UNION ALL
select 1,'9/23/2014','10/24/2014',3 UNION ALL
select 1,'10/23/2014','11/24/2014',3 UNION ALL
select 2,'7/4/2015','8/6/2015',1 UNION ALL
select 2,'8/4/2015','9/6/2015',1 UNION ALL
select 3,'11/1/2013','12/1/2013',0 UNION ALL
select 3,'1/9/2018','2/9/2018',0 UNION ALL
select 4,'1/1/2018','1/2/2018',0 UNION ALL --many non overlapping dates
select 4,'1/4/2018','1/5/2018',0 UNION ALL
select 4,'1/7/2018','1/9/2018',0 UNION ALL
select 4,'1/11/2018','1/13/2018',0 UNION ALL
select 4,'2/7/2018','2/8/2018',0 UNION ALL --many overlapping dates
select 4,'2/8/2018','2/9/2018',0 UNION ALL
select 4,'2/9/2018','2/10/2018',0 UNION all
select 4,'2/10/2018','2/11/2018',0 UNION all
select 4,'2/11/2018','2/12/2018',0 UNION all
select 4,'2/12/2018','2/13/2018',0 UNION all
select 4,'3/7/2018','3/8/2018',0 UNION ALL --many overlapping dates, second instance of id 4, type 0
select 4,'3/8/2018','3/9/2018',0 UNION ALL
select 4,'3/9/2018','3/10/2018',0 UNION all
select 4,'3/10/2018','3/11/2018',0 UNION all
select 4,'3/11/2018','3/12/2018',0 UNION all
select 4,'3/12/2018','3/13/2018',0
;
WITH cdata
AS (SELECT Id,
d.Type,
d.StartDate,
d.EndDate,
CurrentStart = d.StartDate
FROM #Data d
WHERE
NOT EXISTS (
SELECT * FROM #Data x WHERE x.StartDate < d.StartDate AND d.StartDate <= x.EndDate AND d.EndDate >= x.StartDate AND d.Id = x.Id AND d.Type = x.Type --get first records for overlapping ranges
)
UNION ALL
SELECT d.Id,
d.Type,
StartDate = CASE WHEN d2.StartDate < d.StartDate THEN d2.StartDate ELSE d.StartDate END,
EndDate = CASE WHEN d2.EndDate > d.EndDate THEN d2.EndDate ELSE d.EndDate END,
CurrentStart = d2.StartDate
FROM cdata d
INNER JOIN #Data d2
ON (
d.StartDate <= d2.EndDate
AND d.EndDate >= d2.StartDate
)
AND d2.Id = d.Id
AND d2.Type = d.Type
AND d2.StartDate > d.CurrentStart)
SELECT cdata.Id, cdata.Type, cdata.StartDate, EndDate = MAX(cdata.EndDate)
FROM cdata
GROUP BY cdata.Id, cdata.Type, cdata.StartDate

This looks like a Packing Intervals problem. See the post by Itzik Ben-Gan for all the details and what indexes he recommends to make it work efficiently. He presents a solution without recursive CTE.
Two notes.
The query below assumes that intervals are [closed; open), i.e. StartDate is inclusive and EndDate is exclusive. This way to represent such data is often the most convenient. (in the same sense as having arrays as zero-based instead of 1-based is usually more convenient in programming languages).
I added a RowID column to have unambiguous sorting.
Sample data
DECLARE #T TABLE
(
RowID int IDENTITY,
id int,
StartDate date,
EndDate date,
tp int
);
INSERT INTO #T(Id, StartDate, EndDate, tp) VALUES
(1, '2012-02-18', '2012-03-18', 1),
(1, '2012-03-17', '2012-06-29', 1),
(1, '2012-06-27', '2012-09-27', 1),
(1, '2014-08-23', '2014-09-24', 3),
(1, '2014-09-23', '2014-10-24', 3),
(1, '2014-10-23', '2014-11-24', 3),
(2, '2015-07-04', '2015-08-06', 1),
(2, '2015-08-04', '2015-09-06', 1),
(3, '2013-11-01', '2013-12-01', 0),
(3, '2018-01-09', '2018-02-09', 0);
-- Make EndDate an opened interval, make it exclusive
-- [Start; End)
UPDATE #T
SET EndDate = DATEADD(day, 1, EndDate)
;
Recommended indexes
-- indexes to support solutions
CREATE UNIQUE INDEX idx_start_id ON T(id, tp, StartDate, RowID);
CREATE UNIQUE INDEX idx_end_id ON T(id, tp, EndDate, RowID);
Query
Read the Itzik's post to understand what is going on. He has nice illustrations there. In short, each timestamp (start or end) is treated as an event. Each event has a + or - type. Each time we encounter a + event (some interval starts) we increase the running counter. Each time we encounter a - event (some interval ends) we decrease the running counter. When the running counter is 0 it means that the streak of overlapping intervals is over.
I took Itzik's query as is and simply changed the column names to match your names.
WITH C1 AS
-- let e = end ordinals, let s = start ordinals
(
SELECT
RowID, id, tp, StartDate AS ts, +1 AS EventType,
NULL AS e,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY StartDate, RowID) AS s
FROM #T
UNION ALL
SELECT
RowID, id, tp, EndDate AS ts, -1 AS EventType,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY EndDate, RowID) AS e,
NULL AS s
FROM #T
),
C2 AS
-- let se = start or end ordinal, namely, how many events (start or end) happened so far
(
SELECT C1.*,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY ts, EventType DESC, RowID) AS se
FROM C1
),
C3 AS
-- For start events, the expression s - (se - s) - 1 represents how many sessions were active
-- just before the current (hence - 1)
--
-- For end events, the expression (se - e) - e represents how many sessions are active
-- right after this one
--
-- The above two expressions are 0 exactly when a group of packed intervals
-- either starts or ends, respectively
--
-- After filtering only events when a group of packed intervals either starts or ends,
-- group each pair of adjacent start/end events
(
SELECT id, tp, ts,
((ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY ts) - 1) / 2 + 1)
AS grpnum
FROM C2
WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0
)
SELECT id, tp, MIN(ts) AS StartDate, DATEADD(day, -1, MAX(ts)) AS EndDate
FROM C3
GROUP BY id, tp, grpnum
ORDER BY id, tp, StartDate;
Result
+----+----+------------+------------+
| id | tp | StartDate | EndDate |
+----+----+------------+------------+
| 1 | 1 | 2012-02-18 | 2012-09-27 |
| 1 | 3 | 2014-08-23 | 2014-11-24 |
| 2 | 1 | 2015-07-04 | 2015-09-06 |
| 3 | 0 | 2013-11-01 | 2013-12-01 |
| 3 | 0 | 2018-01-09 | 2018-02-09 |
+----+----+------------+------------+

create table #table
(Id int,StartDate date, EndDate date, Type int)
insert into #table
values
('1','2012-02-18','2012-03-18','1'),('1','2012-03-19','2012-06-19','1'),
('1','2012-06-27','2012-09-27','1'),('1','2014-08-23','2014-09-24','3'),
('1','2014-09-23','2014-10-24','3'),('1','2014-10-23','2014-11-24','3'),
('2','2015-07-04','2015-08-06','1'),('2','2015-08-04','2015-09-06','1'),
('3','2013-11-01','2013-12-01','0'),('3','2018-01-09','2018-02-09','0')
select ID,MIN(startdate)sd,MAX(EndDate)ed,type from #table
group by ID,TYPE,YEAR(startdate),YEAR(EndDate)

this can be easily achieved by using some window-functions and CTE's. Here is the solution
DECLARE #table TABLE
(id INT,
StartDate DATE,
EndDate DATE,
[Type] INT
);
INSERT INTO #table(Id, StartDate, EndDate, [Type]) VALUES
(1, '2012-02-18', '2012-03-18', 1),
(1, '2012-03-17', '2012-06-29', 1),
(1, '2012-06-27', '2012-09-27', 1),
(1, '2014-08-23', '2014-09-24', 3),
(1, '2014-09-23', '2014-10-24', 3),
(1, '2014-10-23', '2014-11-24', 3),
(2, '2015-07-04', '2015-08-06', 1),
(2, '2015-08-04', '2015-09-06', 1),
(3, '2013-11-01', '2013-12-01', 0),
(3, '2018-01-09', '2018-02-09', 0);
WITH C1 AS
(
SELECT *,
MAX(EndDate) OVER(PARTITION BY Id, [Type]
ORDER BY StartDate, EndDate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PrevEnd
FROM #table
),
C2 AS
(
SELECT *,
SUM(StartFlag) OVER(PARTITION BY Id, [Type]
ORDER BY StartDate, EndDate
ROWS UNBOUNDED PRECEDING) AS GroupID
FROM C1
CROSS APPLY ( VALUES(CASE WHEN StartDate <= PrevEnd THEN NULL ELSE 1 END) ) AS A(StartFlag)
)
SELECT Id, [Type], MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate
FROM C2
GROUP BY Id, [Type], GroupID;

Related

Overlapping between first record enddate and next record start date in SQL Server

I have the below kind of data and I need below kind of output.
Input:
id startdate enddate
1 21/01/2019 23/01/2019
1 23/01/2019 24/01/2019
1 24/01/2029 27/01/2019
1 29/01/2019 02/02/2019
Output:
id startdate enddate
1 21/01/2019 27/01/2019
1 29/01/2019 02/02/2019
We need to use the logic of matching the first record enddate and nth record startdate.
This is a gaps-and-islands problem, where you want to group together "adjacent" dates. Here is one approach using window functions: the idea is to compare the current start date to the end date of the "previous" row, and use a window sum to define the groups:
select id, min(startdate) startdate, max(enddate) enddate
from (
select t.*,
sum(case when startdate = lag_enddate then 0 else 1 end) over(partition by id order by startdate) grp
from (
select t.*,
lag(enddate) over(partition by id order by startdate) lag_enddate
from mytable t
) t
) t
group by id, grp
Demo on DB Fiddle - with credits to Sander for creating the DDL statements in the first place:
id | startdate | enddate
-: | :--------- | :---------
1 | 2019-01-21 | 2019-01-27
1 | 2019-01-29 | 2019-02-02
have a look at
NEXT VALUE FOR method, works 2016 and later
Use a CTE or subquery (works in 2008) where you join on your own table using the previous value as a join. Here a sample script I use showing backup growth
declare #backupType char(1)
, #DatabaseName sysname
set #DatabaseName = db_name() --> Name of current database, null for all databaseson server
set #backupType ='D' /* valid options are:
D = Database
I = Database Differential
L = Log
F = File or Filegroup
G = File Differential
P = Partial
Q = Partial Differential
*/
select backup_start_date
, backup_finish_date
, DurationSec
, database_name,backup_size
, PreviouseBackupSize
, backup_size-PreviouseBackupSize as growth
,KbSec= format(KbSec,'N2')
FROM (
select backup_start_date
, backup_finish_date
, datediff(second,backup_start_date,b.backup_finish_date) as DurationSec
, b.database_name
, b.backup_size/1024./1024. as backup_size
,case when datediff(second,backup_start_date,b.backup_finish_date) >0
then ( b.backup_size/1024.)/datediff(second,backup_start_date,b.backup_finish_date)
else 0 end as KbSec
-- , b.compressed_backup_size
, (
select top (1) p.backup_size/1024./1024.
from msdb.dbo.backupset p
where p.database_name = b.database_name
and p.database_backup_lsn< b.database_backup_lsn
and type=#backupType
order by p.database_backup_lsn desc
) as PreviouseBackupSize
from msdb.dbo.backupset as b
where #DatabaseName IS NULL OR database_name =#DatabaseName
and type=#backupType
)as A
order by backup_start_date desc
using a "cursor local fast_forward" to loop over the data on a row-by-row and use a temporary table where you store & compaire prev value
Here is a solution with common table expressions that could work.
Sample data
create table data
(
id int,
startdate date,
enddate date
);
insert into data (id, startdate, enddate) values
(1, '2019-01-21', '2019-01-23'),
(1, '2019-01-23', '2019-01-24'),
(1, '2019-01-24', '2019-01-27'),
(1, '2019-01-29', '2019-02-02');
Solution
-- determine start dates
with cte_start as
(
select s.id,
s.startdate
from data s
where not exists ( select 'x'
from data e
where e.id = s.id
and e.enddate = s.startdate )
),
-- determine date boundaries
cte_startnext as
(
select s.id,
s.startdate,
lead(s.startdate) over (partition by s.id order by s.startdate) as startdate_next
from cte_start s
)
-- determine periods
select sn.id,
sn.startdate,
e.enddate
from cte_startnext sn
cross apply ( select top 1 e.enddate
from data e
where e.id = sn.id
and e.startdate >= sn.startdate
and (e.startdate < sn.startdate_next or sn.startdate_next is null)
order by e.enddate desc ) e
order by sn.id,
sn.startdate;
Result
id startdate enddate
-- ---------- ----------
1 2019-01-21 2019-01-27
1 2019-01-29 2019-02-02
Fiddle to see build up of solution and intermediate CTE results.

Merge rows if date columns are overlapping in TSQL

I have a table in the following format
Id StartDate EndDate Type
1 2012-02-18 2012-03-18 1
1 2012-03-17 2012-06-29 1
1 2012-06-27 2012-09-27 1
1 2014-08-23 2014-09-24 3
1 2014-09-23 2014-10-24 3
1 2014-10-23 2014-11-24 3
2 2015-07-04 2015-08-06 1
2 2015-08-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0
I found similar questions here, but not something that could help me solve my problem. I want to merge rows that has the same Id, Type and overlapping date periods.
The result from the above table should be
Id StartDate EndDate Type
1 2012-02-18 2012-09-27 1
1 2014-08-23 2014-11-24 3
2 2015-07-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0
In another server, I was able to do it with the following restrictions and the query below:
Didn't care about the Type column, but just the Id
Had a newer version of SQL Server (2012), but now I have 2008 which the code is not compatible
SELECT Id
, MIN(StartDate) AS StartDate
, MAX(EndDate) AS EndDate
FROM (
SELECT *
, SUM(CASE WHEN a.EndDate = a.StartDate THEN 0
ELSE 1
END
) OVER (ORDER BY Id, StartDate) sm
FROM (
SELECT Id
, StartDate
, EndDate
, LAG(EndDate, 1, NULL) OVER (PARTITION BY Id ORDER BY Id, EndDate) EndDate
FROM #temptable
) a
) b
GROUP BY Id, sm
Any advice how I can
Include Type on the process
Make it work on SQL Server 2008
This approach uses an additional temp table to identify the groups of overlapping dates, and then performs a quick aggregate based on the groupings.
SELECT *, ROW_NUMBER() OVER (ORDER BY Id, Type) AS UID,
ROW_NUMBER() OVER (ORDER BY Id, Type) AS GroupId INTO #G FROM #TempTable
WHILE ##ROWCOUNT <> 0 BEGIN
UPDATE T1 SET
GroupId = T2.GroupId
FROM #G T1
INNER JOIN (
SELECT T1.UID, CASE WHEN T1.GroupId < T2.GroupId THEN T1.GroupId ELSE T2.GroupId END
FROM #G T1
LEFT OUTER JOIN #G T2
ON T1.Id = T2.Id AND T1.Type = T2.Type AND T1.GroupId <> T2.GroupId
AND T1.StartDate <= T2.EndDate AND T2.StartDate <= T1.EndDate
) T2 (UID, GroupId)
ON T1.UID = T2.UID
WHERE T1.GroupId <> T2.GroupId
END
SELECT Id, MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate, Type
FROM #G G GROUP BY GroupId, Id, Type
This returns the expected values
Id StartDate EndDate Type
----------- ---------- ---------- -----------
1 2012-02-18 2012-09-27 1
1 2014-08-23 2014-11-24 3
2 2015-07-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0
This is 2008 compatible. A CTE really is the best way to link up all overlapping records in my opinion. The date overlap logic came from this thread: SO Date Overlap
I added extra data that's more complex to make sure that it's working as expected.
DECLARE #Data table (Id INT, StartDate DATE, EndDate DATE, Type INT)
INSERT INTO #data
SELECT 1,'2/18/2012' ,'3/18/2012', 1 UNION ALL
select 1,'3/17/2012','6/29/2012',1 UNION ALL
select 1,'6/27/2012','9/27/2012',1 UNION ALL
select 1,'8/23/2014','9/24/2014',3 UNION ALL
select 1,'9/23/2014','10/24/2014',3 UNION ALL
select 1,'10/23/2014','11/24/2014',3 UNION ALL
select 2,'7/4/2015','8/6/2015',1 UNION ALL
select 2,'8/4/2015','9/6/2015',1 UNION ALL
select 3,'11/1/2013','12/1/2013',0 UNION ALL
select 3,'1/9/2018','2/9/2018',0 UNION ALL
select 4,'1/1/2018','1/2/2018',0 UNION ALL --many non overlapping dates
select 4,'1/4/2018','1/5/2018',0 UNION ALL
select 4,'1/7/2018','1/9/2018',0 UNION ALL
select 4,'1/11/2018','1/13/2018',0 UNION ALL
select 4,'2/7/2018','2/8/2018',0 UNION ALL --many overlapping dates
select 4,'2/8/2018','2/9/2018',0 UNION ALL
select 4,'2/9/2018','2/10/2018',0 UNION all
select 4,'2/10/2018','2/11/2018',0 UNION all
select 4,'2/11/2018','2/12/2018',0 UNION all
select 4,'2/12/2018','2/13/2018',0 UNION all
select 4,'3/7/2018','3/8/2018',0 UNION ALL --many overlapping dates, second instance of id 4, type 0
select 4,'3/8/2018','3/9/2018',0 UNION ALL
select 4,'3/9/2018','3/10/2018',0 UNION all
select 4,'3/10/2018','3/11/2018',0 UNION all
select 4,'3/11/2018','3/12/2018',0 UNION all
select 4,'3/12/2018','3/13/2018',0
;
WITH cdata
AS (SELECT Id,
d.Type,
d.StartDate,
d.EndDate,
CurrentStart = d.StartDate
FROM #Data d
WHERE
NOT EXISTS (
SELECT * FROM #Data x WHERE x.StartDate < d.StartDate AND d.StartDate <= x.EndDate AND d.EndDate >= x.StartDate AND d.Id = x.Id AND d.Type = x.Type --get first records for overlapping ranges
)
UNION ALL
SELECT d.Id,
d.Type,
StartDate = CASE WHEN d2.StartDate < d.StartDate THEN d2.StartDate ELSE d.StartDate END,
EndDate = CASE WHEN d2.EndDate > d.EndDate THEN d2.EndDate ELSE d.EndDate END,
CurrentStart = d2.StartDate
FROM cdata d
INNER JOIN #Data d2
ON (
d.StartDate <= d2.EndDate
AND d.EndDate >= d2.StartDate
)
AND d2.Id = d.Id
AND d2.Type = d.Type
AND d2.StartDate > d.CurrentStart)
SELECT cdata.Id, cdata.Type, cdata.StartDate, EndDate = MAX(cdata.EndDate)
FROM cdata
GROUP BY cdata.Id, cdata.Type, cdata.StartDate
This looks like a Packing Intervals problem. See the post by Itzik Ben-Gan for all the details and what indexes he recommends to make it work efficiently. He presents a solution without recursive CTE.
Two notes.
The query below assumes that intervals are [closed; open), i.e. StartDate is inclusive and EndDate is exclusive. This way to represent such data is often the most convenient. (in the same sense as having arrays as zero-based instead of 1-based is usually more convenient in programming languages).
I added a RowID column to have unambiguous sorting.
Sample data
DECLARE #T TABLE
(
RowID int IDENTITY,
id int,
StartDate date,
EndDate date,
tp int
);
INSERT INTO #T(Id, StartDate, EndDate, tp) VALUES
(1, '2012-02-18', '2012-03-18', 1),
(1, '2012-03-17', '2012-06-29', 1),
(1, '2012-06-27', '2012-09-27', 1),
(1, '2014-08-23', '2014-09-24', 3),
(1, '2014-09-23', '2014-10-24', 3),
(1, '2014-10-23', '2014-11-24', 3),
(2, '2015-07-04', '2015-08-06', 1),
(2, '2015-08-04', '2015-09-06', 1),
(3, '2013-11-01', '2013-12-01', 0),
(3, '2018-01-09', '2018-02-09', 0);
-- Make EndDate an opened interval, make it exclusive
-- [Start; End)
UPDATE #T
SET EndDate = DATEADD(day, 1, EndDate)
;
Recommended indexes
-- indexes to support solutions
CREATE UNIQUE INDEX idx_start_id ON T(id, tp, StartDate, RowID);
CREATE UNIQUE INDEX idx_end_id ON T(id, tp, EndDate, RowID);
Query
Read the Itzik's post to understand what is going on. He has nice illustrations there. In short, each timestamp (start or end) is treated as an event. Each event has a + or - type. Each time we encounter a + event (some interval starts) we increase the running counter. Each time we encounter a - event (some interval ends) we decrease the running counter. When the running counter is 0 it means that the streak of overlapping intervals is over.
I took Itzik's query as is and simply changed the column names to match your names.
WITH C1 AS
-- let e = end ordinals, let s = start ordinals
(
SELECT
RowID, id, tp, StartDate AS ts, +1 AS EventType,
NULL AS e,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY StartDate, RowID) AS s
FROM #T
UNION ALL
SELECT
RowID, id, tp, EndDate AS ts, -1 AS EventType,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY EndDate, RowID) AS e,
NULL AS s
FROM #T
),
C2 AS
-- let se = start or end ordinal, namely, how many events (start or end) happened so far
(
SELECT C1.*,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY ts, EventType DESC, RowID) AS se
FROM C1
),
C3 AS
-- For start events, the expression s - (se - s) - 1 represents how many sessions were active
-- just before the current (hence - 1)
--
-- For end events, the expression (se - e) - e represents how many sessions are active
-- right after this one
--
-- The above two expressions are 0 exactly when a group of packed intervals
-- either starts or ends, respectively
--
-- After filtering only events when a group of packed intervals either starts or ends,
-- group each pair of adjacent start/end events
(
SELECT id, tp, ts,
((ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY ts) - 1) / 2 + 1)
AS grpnum
FROM C2
WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0
)
SELECT id, tp, MIN(ts) AS StartDate, DATEADD(day, -1, MAX(ts)) AS EndDate
FROM C3
GROUP BY id, tp, grpnum
ORDER BY id, tp, StartDate;
Result
+----+----+------------+------------+
| id | tp | StartDate | EndDate |
+----+----+------------+------------+
| 1 | 1 | 2012-02-18 | 2012-09-27 |
| 1 | 3 | 2014-08-23 | 2014-11-24 |
| 2 | 1 | 2015-07-04 | 2015-09-06 |
| 3 | 0 | 2013-11-01 | 2013-12-01 |
| 3 | 0 | 2018-01-09 | 2018-02-09 |
+----+----+------------+------------+
create table #table
(Id int,StartDate date, EndDate date, Type int)
insert into #table
values
('1','2012-02-18','2012-03-18','1'),('1','2012-03-19','2012-06-19','1'),
('1','2012-06-27','2012-09-27','1'),('1','2014-08-23','2014-09-24','3'),
('1','2014-09-23','2014-10-24','3'),('1','2014-10-23','2014-11-24','3'),
('2','2015-07-04','2015-08-06','1'),('2','2015-08-04','2015-09-06','1'),
('3','2013-11-01','2013-12-01','0'),('3','2018-01-09','2018-02-09','0')
select ID,MIN(startdate)sd,MAX(EndDate)ed,type from #table
group by ID,TYPE,YEAR(startdate),YEAR(EndDate)
this can be easily achieved by using some window-functions and CTE's. Here is the solution
DECLARE #table TABLE
(id INT,
StartDate DATE,
EndDate DATE,
[Type] INT
);
INSERT INTO #table(Id, StartDate, EndDate, [Type]) VALUES
(1, '2012-02-18', '2012-03-18', 1),
(1, '2012-03-17', '2012-06-29', 1),
(1, '2012-06-27', '2012-09-27', 1),
(1, '2014-08-23', '2014-09-24', 3),
(1, '2014-09-23', '2014-10-24', 3),
(1, '2014-10-23', '2014-11-24', 3),
(2, '2015-07-04', '2015-08-06', 1),
(2, '2015-08-04', '2015-09-06', 1),
(3, '2013-11-01', '2013-12-01', 0),
(3, '2018-01-09', '2018-02-09', 0);
WITH C1 AS
(
SELECT *,
MAX(EndDate) OVER(PARTITION BY Id, [Type]
ORDER BY StartDate, EndDate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PrevEnd
FROM #table
),
C2 AS
(
SELECT *,
SUM(StartFlag) OVER(PARTITION BY Id, [Type]
ORDER BY StartDate, EndDate
ROWS UNBOUNDED PRECEDING) AS GroupID
FROM C1
CROSS APPLY ( VALUES(CASE WHEN StartDate <= PrevEnd THEN NULL ELSE 1 END) ) AS A(StartFlag)
)
SELECT Id, [Type], MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate
FROM C2
GROUP BY Id, [Type], GroupID;

Get all overlapping date ranges when all overlap at the same time

I'm struggling with this for a few days... trying to write an SQL query to get all date ranges when all units overlap at the same time. It's better to see it graphically.
Here is the simplified table with the image for reference:
UnitId Start End
====== ========== ==========
1 05/01/2018 09/01/2018
1 10/01/2018 13/01/2018
2 04/01/2018 15/01/2018
2 19/01/2018 23/01/2018
3 06/01/2018 12/01/2018
3 14/01/2018 22/01/2018
Expected result:
Start End
====== ==========
06/01/2018 09/01/2018
10/01/2018 12/01/2018
What I currently have:
DECLARE #sourceTable TABLE (UnitId int, StartDate datetime, EndDate datetime);
INSERT INTO #sourceTable VALUES
(1, '2018-01-05', '2018-01-09')
,(1, '2018-01-10', '2018-01-13')
,(2, '2018-01-04', '2018-01-15')
,(2, '2018-01-19', '2018-01-23')
,(3, '2018-01-06', '2018-01-12')
,(3, '2018-01-14', '2018-01-22');
SELECT DISTINCT
(SELECT max(v) FROM (values(A.StartDate), (B.StartDate)) as value(v)) StartDate
,(SELECT min(v) FROM (values(A.EndDate), (B.EndDate)) as value(v)) EndDate
FROM #sourceTable A
JOIN #sourceTable B
ON A.startDate <= B.endDate AND A.endDate >= B.startDate AND A.UnitId != B.UnitId
I believe it is "count number of overlapping intervals" problem (this picture should help). Here is one solution to it:
DECLARE #t TABLE (UnitId INT, [Start] DATE, [End] DATE);
INSERT INTO #t VALUES
(1, '2018-01-05', '2018-01-09'),
(1, '2018-01-10', '2018-01-13'),
(2, '2018-01-04', '2018-01-15'),
(2, '2018-01-19', '2018-01-23'),
(3, '2018-01-06', '2018-01-12'),
(3, '2018-01-14', '2018-01-22');
WITH cte1(date, val) AS (
SELECT [Start], 1 FROM #t AS t
UNION ALL
SELECT [End], 0 FROM #t AS t
UNION ALL
SELECT DATEADD(DAY, 1, [End]), -1 FROM #t AS t
), cte2 AS (
SELECT date, SUM(val) OVER (ORDER BY date, val) AS usage
FROM cte1
)
SELECT date, MAX(usage) AS usage
FROM cte2
GROUP BY date
It will give you a list of all dates at which the use count (possibly) changed:
date usage
2018-01-04 1
2018-01-05 2
2018-01-06 3
2018-01-09 3
2018-01-10 3
2018-01-12 3
2018-01-13 2
2018-01-14 2
2018-01-15 2
2018-01-16 1
2018-01-19 2
2018-01-22 2
2018-01-23 1
2018-01-24 0
With this approach you do not need a calendar table or rCTE to build missing dates. Converting the above to ranges (2018-01-05 ... 2018-01-15, 2018-01-19 ... 2018-01-22 etc) is not very difficult.
DECLARE #t TABLE (UnitId INT, [Start] DATE, [End] DATE);
INSERT INTO #t VALUES
(1, '2018-01-05', '2018-01-09'),
(1, '2018-01-10', '2018-01-13'),
(2, '2018-01-04', '2018-01-15'),
(2, '2018-01-19', '2018-01-23'),
(3, '2018-01-06', '2018-01-12'),
(3, '2018-01-14', '2018-01-22');
WITH cte1(date, val) AS (
SELECT [Start], 1 FROM #t AS t -- starting date increments counter
UNION ALL
SELECT [End], 0 FROM #t AS t -- we need all edges in the result
UNION ALL
SELECT DATEADD(DAY, 1, [End]), -1 FROM #t AS t -- end date + 1 decrements counter
), cte2 AS (
SELECT date, SUM(val) OVER (ORDER BY date, val) AS usage -- running sum for counter
FROM cte1
), cte3 AS (
SELECT date, MAX(usage) AS usage -- group multiple events on same date together
FROM cte2
GROUP BY date
), cte4 AS (
SELECT date, usage, CASE
WHEN usage > 1 AND LAG(usage) OVER (ORDER BY date) > 1 THEN 0
WHEN usage < 2 AND LAG(usage) OVER (ORDER BY date) < 2 THEN 0
ELSE 1
END AS chg -- start new group if prev and curr usage are on opposite side of 1
FROM cte3
), cte5 AS (
SELECT date, usage, SUM(chg) OVER (ORDER BY date) AS grp -- number groups for each change
FROM cte4
)
SELECT MIN(date) date1, MAX(date) date2
FROM cte5
GROUP BY grp
HAVING MIN(usage) > 1
Result:
date1 date2
2018-01-05 2018-01-15
2018-01-19 2018-01-22
You are looking for date ranges where all units overlap. So look for start dates where all units exist and end dates where all units exist and then join the two.
I'm using ROW_NUMBER to join the first start date with the first end date, the second start date with the second end date and so on.
select s.startdate, e.enddate
from
(
select startdate, row_number() over (order by startdate) as rn
from #sourceTable s1
where
(
select count(*)
from #sourceTable s2
where s1.startdate between s2.startdate and s2.enddate
) = (select count(distinct unitid) from #sourceTable)
) s
join
(
select enddate, row_number() over (order by startdate) as rn
from #sourceTable s1
where
(
select count(*)
from #sourceTable s2
where s1.enddate between s2.startdate and s2.enddate
) = (select count(distinct unitid) from #sourceTable)
) e on e.rn = s.rn
order by s.startdate;
There may be more elegant ways to solve this, but I guess this query is at least easy to understand :-)
Rextester demo: https://rextester.com/GRRSW89045

How to aggregate (counting distinct items) over a sliding window in SQL Server?

I am currently using this query (in SQL Server) to count the number of unique item each day:
SELECT Date, COUNT(DISTINCT item)
FROM myTable
GROUP BY Date
ORDER BY Date
How can I transform this to get for each date the number of unique item over the past 3 days (including the current day)?
The output should be a table with 2 columns:
one columns with all dates in the original table. On the second column, we have the number of unique item per date.
for instance if original table is:
Date Item
01/01/2018 A
01/01/2018 B
02/01/2018 C
03/01/2018 C
04/01/2018 C
With my query above I currently get the unique count for each day:
Date count
01/01/2018 2
02/01/2018 1
03/01/2018 1
04/01/2018 1
and I am looking to get as result the unique count over 3 days rolling window:
Date count
01/01/2018 2
02/01/2018 3 (because items ABC on 1st and 2nd Jan)
03/01/2018 3 (because items ABC on 1st,2nd,3rd Jan)
04/01/2018 1 (because only item C on 2nd,3rd,4th Jan)
Using an apply provides a convenient way to form sliding windows
CREATE TABLE myTable
([DateCol] datetime, [Item] varchar(1))
;
INSERT INTO myTable
([DateCol], [Item])
VALUES
('2018-01-01 00:00:00', 'A'),
('2018-01-01 00:00:00', 'B'),
('2018-01-02 00:00:00', 'C'),
('2018-01-03 00:00:00', 'C'),
('2018-01-04 00:00:00', 'C')
;
CREATE NONCLUSTERED INDEX IX_DateCol
ON MyTable([Date])
;
Query:
select distinct
t1.dateCol
, oa.ItemCount
from myTable t1
outer apply (
select count(distinct t2.item) as ItemCount
from myTable t2
where t2.DateCol between dateadd(day,-2,t1.DateCol) and t1.DateCol
) oa
order by t1.dateCol ASC
Results:
| dateCol | ItemCount |
|----------------------|-----------|
| 2018-01-01T00:00:00Z | 2 |
| 2018-01-02T00:00:00Z | 3 |
| 2018-01-03T00:00:00Z | 3 |
| 2018-01-04T00:00:00Z | 1 |
There may be some performance gains by reducing the date column prior to using the apply, like so:
select
d.date
, oa.ItemCount
from (
select distinct t1.date
from myTable t1
) d
outer apply (
select count(distinct t2.item) as ItemCount
from myTable t2
where t2.Date between dateadd(day,-2,d.Date) and d.Date
) oa
order by d.date ASC
;
Instead of using select distinct in that subquery you could use group by instead but the execution plan will remain the same.
Demo at SQL Fiddle
The most straight forward solution is to join the table with itself based on dates:
SELECT t1.DateCol, COUNT(DISTINCT t2.Item) AS C
FROM testdata AS t1
LEFT JOIN testdata AS t2 ON t2.DateCol BETWEEN DATEADD(dd, -2, t1.DateCol) AND t1.DateCol
GROUP BY t1.DateCol
ORDER BY t1.DateCol
Output:
| DateCol | C |
|-------------------------|---|
| 2018-01-01 00:00:00.000 | 2 |
| 2018-01-02 00:00:00.000 | 3 |
| 2018-01-03 00:00:00.000 | 3 |
| 2018-01-04 00:00:00.000 | 1 |
GROUP BY should be faster then DISTINCT (make sure to have an index on your Date column)
DECLARE #tbl TABLE([Date] DATE, [Item] VARCHAR(100))
;
INSERT INTO #tbl VALUES
('2018-01-01 00:00:00', 'A'),
('2018-01-01 00:00:00', 'B'),
('2018-01-02 00:00:00', 'C'),
('2018-01-03 00:00:00', 'C'),
('2018-01-04 00:00:00', 'C');
SELECT t.[Date]
--Just for control. You can take this part away
,(SELECT DISTINCT t2.[Item] AS [*]
FROM #tbl AS t2
WHERE t2.[Date]<=t.[Date]
AND t2.[Date]>=DATEADD(DAY,-2,t.[Date]) FOR XML PATH('')) AS CountedItems
--This sub-select comes back with your counts
,(SELECT COUNT(DISTINCT t2.[Item])
FROM #tbl AS t2
WHERE t2.[Date]<=t.[Date]
AND t2.[Date]>=DATEADD(DAY,-2,t.[Date])) AS ItemCount
FROM #tbl AS t
GROUP BY t.[Date];
The result
Date CountedItems ItemCount
2018-01-01 AB 2
2018-01-02 ABC 3
2018-01-03 ABC 3
2018-01-04 C 1
This solution is different from other solutions. Can you check performance of this query on real data with comparison to other answers?
The basic idea is that each row can participate in the window for its own date, the day after, or the day after that. So this first expands the row out into three rows with those different dates attached and then it can just use a regular COUNT(DISTINCT) aggregating on the computed date. The HAVING clause is just to avoid returning results for dates that were solely computed and not present in the base data.
with cte(Date, Item) as (
select cast(a as datetime), b
from (values
('01/01/2018','A')
,('01/01/2018','B')
,('02/01/2018','C')
,('03/01/2018','C')
,('04/01/2018','C')) t(a,b)
)
select
[Date] = dateadd(dd, n, Date), [Count] = count(distinct Item)
from
cte
cross join (values (0),(1),(2)) t(n)
group by dateadd(dd, n, Date)
having max(iif(n = 0, 1, 0)) = 1
option (force order)
Output:
| Date | Count |
|-------------------------|-------|
| 2018-01-01 00:00:00.000 | 2 |
| 2018-01-02 00:00:00.000 | 3 |
| 2018-01-03 00:00:00.000 | 3 |
| 2018-01-04 00:00:00.000 | 1 |
It might be faster if you have many duplicate rows:
select
[Date] = dateadd(dd, n, Date), [Count] = count(distinct Item)
from
(select distinct Date, Item from cte) c
cross join (values (0),(1),(2)) t(n)
group by dateadd(dd, n, Date)
having max(iif(n = 0, 1, 0)) = 1
option (force order)
Use GETDATE() function to get current date, and DATEADD() to get the last 3 days
SELECT Date, count(DISTINCT item)
FROM myTable
WHERE [Date] >= DATEADD(day,-3, GETDATE())
GROUP BY Date
ORDER BY Date
SQL
SELECT DISTINCT Date,
(SELECT COUNT(DISTINCT item)
FROM myTable t2
WHERE t2.Date BETWEEN DATEADD(day, -2, t1.Date) AND t1.Date) AS count
FROM myTable t1
ORDER BY Date;
Demo
Rextester demo: http://rextester.com/ZRDQ22190
Since COUNT(DISTINCT item) OVER (PARTITION BY [Date]) is not supported you can use dense_rank to emulate that:
SELECT Date, dense_rank() over (partition by [Date] order by [item])
+ dense_rank() over (partition by [Date] order by [item] desc)
- 1 as count_distinct_item
FROM myTable
One thing to note is that dense_rank will count null as whereas COUNT will not.
Refer this post for more details.
Here is a simple solution that uses myTable itself as the source of grouping dates (edited for SQLServer dateadd). Note that this query assumes there will be at least one record in myTable for every date; if any date is absent, it will not appear in the query results, even if there are records for the 2 days prior:
select
date,
(select
count(distinct item)
from (select distinct date, item from myTable) as d2
where
d2.date between dateadd(day,-2,d.date) and d.date
) as count
from (select distinct date from myTable) as d
I solve this question with Math.
z (any day) = 3x + y (y is mode 3 value)
I need from 3 * (x - 1) + y + 1 to 3 * (x - 1) + y + 3
3 * (x- 1) + y + 1 = 3* (z / 3 - 1) + z % 3 + 1
In that case; I can use group by (between 3* (z / 3 - 1) + z % 3 + 1 and z)
SELECT iif(OrderDate between 3 * (cast(OrderDate as int) / 3 - 1) + (cast(OrderDate as int) % 3) + 1
and orderdate, Orderdate, 0)
, count(sh.SalesOrderID) FROM Sales.SalesOrderDetail shd
JOIN Sales.SalesOrderHeader sh on sh.SalesOrderID = shd.SalesOrderID
group by iif(OrderDate between 3 * (cast(OrderDate as int) / 3 - 1) + (cast(OrderDate as int) % 3) + 1
and orderdate, Orderdate, 0)
order by iif(OrderDate between 3 * (cast(OrderDate as int) / 3 - 1) + (cast(OrderDate as int) % 3) + 1
and orderdate, Orderdate, 0)
If you need else day group, you can use;
declare #n int = 4 (another day count)
SELECT iif(OrderDate between #n * (cast(OrderDate as int) / #n - 1) + (cast(OrderDate as int) % #n) + 1
and orderdate, Orderdate, 0)
, count(sh.SalesOrderID) FROM Sales.SalesOrderDetail shd
JOIN Sales.SalesOrderHeader sh on sh.SalesOrderID = shd.SalesOrderID
group by iif(OrderDate between #n * (cast(OrderDate as int) / #n - 1) + (cast(OrderDate as int) % #n) + 1
and orderdate, Orderdate, 0)
order by iif(OrderDate between #n * (cast(OrderDate as int) / #n - 1) + (cast(OrderDate as int) % #n) + 1
and orderdate, Orderdate, 0)

Condense Time Periods with SQL

I have a large data set which for the purpose of this question has 3 fields:
Group Identifier
From Date
To Date
On any given row the From Date will always be less than the To Date but within each group the time periods (which are in no particular order) represented by the date pairs could overlap, be contained one within another, or even be identical.
What I'd like to end up with is a query that condenses the results for each group down to just the continuous periods. For example a group that looks like this:
| Group ID | From Date | To Date |
--------------------------------------
| A | 01/01/2012 | 12/31/2012 |
| A | 12/01/2013 | 11/30/2014 |
| A | 01/01/2015 | 12/31/2015 |
| A | 01/01/2015 | 12/31/2015 |
| A | 02/01/2015 | 03/31/2015 |
| A | 01/01/2013 | 12/31/2013 |
Would result in this:
| Group ID | From Date | To Date |
--------------------------------------
| A | 01/01/2012 | 11/30/2014 |
| A | 01/01/2015 | 12/31/2015 |
I've read a number of articles on date packing but I can't quite figure out how to apply that to my data set.
How can construct a query that would give me those results?
The solution from book "Microsoft® SQL Server ® 2012 High-Performance T-SQL Using Window Functions"
;with C1 as(
select GroupID, FromDate as ts, +1 as type, 1 as sub
from dbo.table_name
union all
select GroupID, dateadd(day, +1, ToDate) as ts, -1 as type, 0 as sub
from dbo.table_name),
C2 as(
select C1.*
, sum(type) over(partition by GroupID order by ts, type desc
rows between unbounded preceding and current row) - sub as cnt
from C1),
C3 as(
select GroupID, ts, floor((row_number() over(partition by GroupID order by ts) - 1) / 2 + 1) as grpnum
from C2
where cnt = 0)
select GroupID, min(ts) as FromDate, dateadd(day, -1, max(ts)) as ToDate
from C3
group by GroupID, grpnum;
Create table:
if object_id('table_name') is not null
drop table table_name
create table table_name(GroupID varchar(100), FromDate datetime,ToDate datetime)
insert into table_name
select 'A', '01/01/2012', '12/31/2012' union all
select 'A', '12/01/2013', '11/30/2014' union all
select 'A', '01/01/2015', '12/31/2015' union all
select 'A', '01/01/2015', '12/31/2015' union all
select 'A', '02/01/2015', '03/31/2015' union all
select 'A', '01/01/2013', '12/31/2013'
I'd use a Calendar table. This table simply has a list of dates for several decades.
CREATE TABLE [dbo].[Calendar](
[dt] [date] NOT NULL,
CONSTRAINT [PK_Calendar] PRIMARY KEY CLUSTERED
(
[dt] ASC
))
There are many ways to populate such table.
For example, 100K rows (~270 years) from 1900-01-01:
INSERT INTO dbo.Calendar (dt)
SELECT TOP (100000)
DATEADD(day, ROW_NUMBER() OVER (ORDER BY s1.[object_id])-1, '19000101') AS dt
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);
Once you have a Calendar table, here is how to use it.
Each original row is joined with the Calendar table to return as many rows as there are dates between From and To.
Then possible duplicates are removed.
Then classic gaps-and-islands by numbering the rows in two sequences.
Then grouping found islands together to get the new From and To.
Sample data
I added a second group.
DECLARE #T TABLE (GroupID int, FromDate date, ToDate date);
INSERT INTO #T (GroupID, FromDate, ToDate) VALUES
(1, '2012-01-01', '2012-12-31'),
(1, '2013-12-01', '2014-11-30'),
(1, '2015-01-01', '2015-12-31'),
(1, '2015-01-01', '2015-12-31'),
(1, '2015-02-01', '2015-03-31'),
(1, '2013-01-01', '2013-12-31'),
(2, '2012-01-01', '2012-12-31'),
(2, '2013-01-01', '2013-12-31');
Query
WITH
CTE_AllDates
AS
(
SELECT DISTINCT
T.GroupID
,CA.dt
FROM
#T AS T
CROSS APPLY
(
SELECT dbo.Calendar.dt
FROM dbo.Calendar
WHERE
dbo.Calendar.dt >= T.FromDate
AND dbo.Calendar.dt <= T.ToDate
) AS CA
)
,CTE_Sequences
AS
(
SELECT
GroupID
,dt
,ROW_NUMBER() OVER(PARTITION BY GroupID ORDER BY dt) AS Seq1
,DATEDIFF(day, '2001-01-01', dt) AS Seq2
,DATEDIFF(day, '2001-01-01', dt) -
ROW_NUMBER() OVER(PARTITION BY GroupID ORDER BY dt) AS IslandNumber
FROM CTE_AllDates
)
SELECT
GroupID
,MIN(dt) AS NewFromDate
,MAX(dt) AS NewToDate
FROM CTE_Sequences
GROUP BY GroupID, IslandNumber
ORDER BY GroupID, NewFromDate;
Result
+---------+-------------+------------+
| GroupID | NewFromDate | NewToDate |
+---------+-------------+------------+
| 1 | 2012-01-01 | 2014-11-30 |
| 1 | 2015-01-01 | 2015-12-31 |
| 2 | 2012-01-01 | 2013-12-31 |
+---------+-------------+------------+
; with
cte as
(
select *, rn = row_number() over (partition by [Group ID] order by [From Date])
from tbl
),
rcte as
(
select rn, [Group ID], [From Date], [To Date], GrpNo = 1, GrpFrom = [From Date], GrpTo = [To Date]
from cte
where rn = 1
union all
select c.rn, c.[Group ID], c.[From Date], c.[To Date],
GrpNo = case when c.[From Date] between r.GrpFrom and dateadd(day, 1, r.GrpTo)
or c.[To Date] between r.GrpFrom and r.GrpTo
then r.GrpNo
else r.GrpNo + 1
end,
GrpFrom= case when c.[From Date] between r.GrpFrom and dateadd(day, 1, r.GrpTo)
or c.[To Date] between r.GrpFrom and r.GrpTo
then case when c.[From Date] > r.GrpFrom then c.[From Date] else r.GrpFrom end
else c.[From Date]
end,
GrpTo = case when c.[From Date] between r.GrpFrom and dateadd(day, 1, r.GrpTo)
or c.[To Date] between r.GrpFrom and dateadd(day, 1, r.GrpTo)
then case when c.[To Date] > r.GrpTo then c.[To Date] else r.GrpTo end
else c.[To Date]
end
from rcte r
inner join cte c on r.[Group ID] = c.[Group ID]
and r.rn = c.rn - 1
)
select [Group ID], min(GrpFrom), max(GrpTo)
from rcte
group by [Group ID], GrpNo
A Geometric Approach
Here and elsewhere I've noticed that date packing questions
don't provide a geometric approach to this problem. After all,
any range, date-ranges included, can be interpreted as a line.
So why not convert them to a sql geometry type and utilize
geometry::UnionAggregate to merge the ranges. So I gave a stab
at it with your post.
Code Description
In 'numbers':
I build a table representing a sequence
Swap it out with your favorite way to make a numbers table.
For a union operation, you won't ever need more rows than in
your original table, so I just use it as the base to build it.
In 'mergeLines':
I convert the dates to floats and use those floats
to create geometrical points.
In this problem, we're working in
'integer space,' meaning there are no time considerations, and so
an begin date in one range that is one day apart from an end date
in another should be merged with that other. In order to make
that merge happen, we need to convert to 'real space.', so we
add 1 to the tail of all ranges (we undo this later).
I then connect these points via STUnion and STEnvelope.
Finally, I merge all these lines via UnionAggregate. The resulting
'lines' geometry object might contain multiple lines, but if they
overlap, they turn into one line.
In the outer query:
I use the numbers CTE to extract the individual lines inside 'lines'.
I envelope the lines which here ensures that the lines are stored
only as its two endpoints.
I read the endpoint x values and convert them back to their time
representations, ensuring to put them back into 'integer space'.
The Code
with
numbers as (
select row_number() over (order by (select null)) i
from #spans -- Where I put your data
),
mergeLines as (
select groupId,
lines = geometry::UnionAggregate(line)
from #spans
cross apply (select
startP = geometry::Point(convert(float,fromDate), 0, 0),
stopP = geometry::Point(convert(float,toDate) + 1, 0, 0)
) pointify
cross apply (select line = startP.STUnion(stopP).STEnvelope()) lineify
group by groupId
)
select groupId, fromDate, toDate
from mergeLines ml
join numbers n on n.i between 1 and ml.lines.STNumGeometries()
cross apply (select line = ml.lines.STGeometryN(i).STEnvelope()) l
cross apply (select
fromDate = convert(datetime, l.line.STPointN(1).STX),
toDate = convert(datetime, l.line.STPointN(3).STX) - 1
) unprepare
order by groupId, fromDate;