Get all overlapping date ranges when all overlap at the same time

I've been struggling with this for a few days. I'm trying to write an SQL query that returns all date ranges during which all units overlap at the same time.
Here is the simplified table (dates are in dd/mm/yyyy format):
UnitId  Start       End
======  ==========  ==========
1       05/01/2018  09/01/2018
1       10/01/2018  13/01/2018
2       04/01/2018  15/01/2018
2       19/01/2018  23/01/2018
3       06/01/2018  12/01/2018
3       14/01/2018  22/01/2018
Expected result:
Start       End
==========  ==========
06/01/2018  09/01/2018
10/01/2018  12/01/2018
What I currently have:
DECLARE @sourceTable TABLE (UnitId int, StartDate datetime, EndDate datetime);
INSERT INTO @sourceTable VALUES
(1, '2018-01-05', '2018-01-09')
,(1, '2018-01-10', '2018-01-13')
,(2, '2018-01-04', '2018-01-15')
,(2, '2018-01-19', '2018-01-23')
,(3, '2018-01-06', '2018-01-12')
,(3, '2018-01-14', '2018-01-22');
SELECT DISTINCT
(SELECT max(v) FROM (values(A.StartDate), (B.StartDate)) as value(v)) StartDate
,(SELECT min(v) FROM (values(A.EndDate), (B.EndDate)) as value(v)) EndDate
FROM @sourceTable A
JOIN @sourceTable B
ON A.startDate <= B.endDate AND A.endDate >= B.startDate AND A.UnitId != B.UnitId

I believe this is a "count the number of overlapping intervals" problem. Here is one solution to it:
DECLARE @t TABLE (UnitId INT, [Start] DATE, [End] DATE);
INSERT INTO @t VALUES
(1, '2018-01-05', '2018-01-09'),
(1, '2018-01-10', '2018-01-13'),
(2, '2018-01-04', '2018-01-15'),
(2, '2018-01-19', '2018-01-23'),
(3, '2018-01-06', '2018-01-12'),
(3, '2018-01-14', '2018-01-22');
WITH cte1(date, val) AS (
SELECT [Start], 1 FROM @t AS t
UNION ALL
SELECT [End], 0 FROM @t AS t
UNION ALL
SELECT DATEADD(DAY, 1, [End]), -1 FROM @t AS t
), cte2 AS (
SELECT date, SUM(val) OVER (ORDER BY date, val) AS usage
FROM cte1
)
SELECT date, MAX(usage) AS usage
FROM cte2
GROUP BY date
It will give you a list of all dates at which the use count (possibly) changed:
date usage
2018-01-04 1
2018-01-05 2
2018-01-06 3
2018-01-09 3
2018-01-10 3
2018-01-12 3
2018-01-13 2
2018-01-14 2
2018-01-15 2
2018-01-16 1
2018-01-19 2
2018-01-22 2
2018-01-23 1
2018-01-24 0
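To make the event logic concrete, here is a small Python sketch (not T-SQL) that mirrors cte1 through cte3: a +1 event on each start date, a 0 event on each end date so that the edge appears in the output, and a -1 event on the day after each end. The function name `usage_by_date` is made up for illustration.

```python
from datetime import date, timedelta

# Rows from the question: (UnitId, Start, End), both ends inclusive.
INTERVALS = [
    (1, date(2018, 1, 5), date(2018, 1, 9)),
    (1, date(2018, 1, 10), date(2018, 1, 13)),
    (2, date(2018, 1, 4), date(2018, 1, 15)),
    (2, date(2018, 1, 19), date(2018, 1, 23)),
    (3, date(2018, 1, 6), date(2018, 1, 12)),
    (3, date(2018, 1, 14), date(2018, 1, 22)),
]


def usage_by_date(intervals):
    """Return {date: usage} for every date at which the usage count may change."""
    events = []
    for _unit, start, end in intervals:
        events.append((start, 1))                     # start date increments the counter
        events.append((end, 0))                       # end date is kept as an edge
        events.append((end + timedelta(days=1), -1))  # day after the end decrements it
    # Same order as ORDER BY date, val: on one date, -1 sorts before 0 before +1.
    events.sort()
    usage = {}
    running = 0
    for day, val in events:
        running += val
        # +1 events come last on each date, so the total after a date's final
        # event equals MAX(usage) for that date in the SQL version.
        usage[day] = running
    return usage


for day, count in usage_by_date(INTERVALS).items():
    print(day, count)
```

Iterating over the returned dictionary reproduces the date/usage list above.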
With this approach you do not need a calendar table or a recursive CTE to generate the missing dates. Converting the above into ranges where at least two units overlap (2018-01-05 ... 2018-01-15, 2018-01-19 ... 2018-01-22) is not very difficult:
DECLARE @t TABLE (UnitId INT, [Start] DATE, [End] DATE);
INSERT INTO @t VALUES
(1, '2018-01-05', '2018-01-09'),
(1, '2018-01-10', '2018-01-13'),
(2, '2018-01-04', '2018-01-15'),
(2, '2018-01-19', '2018-01-23'),
(3, '2018-01-06', '2018-01-12'),
(3, '2018-01-14', '2018-01-22');
WITH cte1(date, val) AS (
SELECT [Start], 1 FROM @t AS t -- starting date increments counter
UNION ALL
SELECT [End], 0 FROM @t AS t -- we need all edges in the result
UNION ALL
SELECT DATEADD(DAY, 1, [End]), -1 FROM @t AS t -- end date + 1 decrements counter
), cte2 AS (
SELECT date, SUM(val) OVER (ORDER BY date, val) AS usage -- running sum for counter
FROM cte1
), cte3 AS (
SELECT date, MAX(usage) AS usage -- group multiple events on same date together
FROM cte2
GROUP BY date
), cte4 AS (
SELECT date, usage, CASE
WHEN usage > 1 AND LAG(usage) OVER (ORDER BY date) > 1 THEN 0
WHEN usage < 2 AND LAG(usage) OVER (ORDER BY date) < 2 THEN 0
ELSE 1
END AS chg -- start new group if prev and curr usage are on opposite side of 1
FROM cte3
), cte5 AS (
SELECT date, usage, SUM(chg) OVER (ORDER BY date) AS grp -- number groups for each change
FROM cte4
)
SELECT MIN(date) date1, MAX(date) date2
FROM cte5
GROUP BY grp
HAVING MIN(usage) > 1
Result:
date1 date2
2018-01-05 2018-01-15
2018-01-19 2018-01-22
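The island step can also be sketched in Python (again only an illustration, with made-up function names). `ranges_with_usage` walks the change points in date order and keeps runs whose usage is at or above a threshold, which is what cte4, cte5 and the HAVING clause do for a threshold of 2. Making the threshold a parameter is an assumption added here, so the same code can also be asked for ranges where every unit is present.

```python
from datetime import date, timedelta

# Rows from the question: (UnitId, Start, End), both ends inclusive.
INTERVALS = [
    (1, date(2018, 1, 5), date(2018, 1, 9)),
    (1, date(2018, 1, 10), date(2018, 1, 13)),
    (2, date(2018, 1, 4), date(2018, 1, 15)),
    (2, date(2018, 1, 19), date(2018, 1, 23)),
    (3, date(2018, 1, 6), date(2018, 1, 12)),
    (3, date(2018, 1, 14), date(2018, 1, 22)),
]


def usage_by_date(intervals):
    """Running usage count at each date where it may change (cte1 to cte3)."""
    events = []
    for _unit, start, end in intervals:
        events += [(start, 1), (end, 0), (end + timedelta(days=1), -1)]
    usage, running = {}, 0
    for day, val in sorted(events):  # ORDER BY date, val
        running += val
        usage[day] = running         # last value per date equals MAX(usage)
    return usage


def ranges_with_usage(usage, min_usage):
    """Collapse consecutive change points with usage >= min_usage into
    (first date, last date) ranges, like cte4/cte5 plus the HAVING clause."""
    ranges, current = [], None
    for day, count in usage.items():  # dates arrive in ascending order
        if count >= min_usage:
            current = [day, day] if current is None else [current[0], day]
        elif current is not None:
            ranges.append(tuple(current))
            current = None
    if current is not None:
        ranges.append(tuple(current))
    return ranges


usage = usage_by_date(INTERVALS)
print(ranges_with_usage(usage, 2))  # threshold used by the answer (usage > 1)
print(ranges_with_usage(usage, 3))  # all three units present
```

With `min_usage=2` this returns the two ranges shown above. With `min_usage=3` it returns a single range, 2018-01-06 to 2018-01-12: the question's two expected ranges (06-09 and 10-12) sit on consecutive days, so a day-based count cannot tell them apart.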

You are looking for date ranges where all units overlap. So look for start dates on which all units are active and end dates on which all units are active, and then join the two.
I'm using ROW_NUMBER to join the first start date with the first end date, the second start date with the second end date, and so on.
select s.startdate, e.enddate
from
(
select startdate, row_number() over (order by startdate) as rn
from @sourceTable s1
where
(
select count(*)
from @sourceTable s2
where s1.startdate between s2.startdate and s2.enddate
) = (select count(distinct unitid) from @sourceTable)
) s
join
(
-- number the end dates by their own value so the n-th start pairs with the n-th end
select enddate, row_number() over (order by enddate) as rn
from @sourceTable s1
where
(
select count(*)
from @sourceTable s2
where s1.enddate between s2.startdate and s2.enddate
) = (select count(distinct unitid) from @sourceTable)
) e on e.rn = s.rn
order by s.startdate;
There may be more elegant ways to solve this, but I guess this query is at least easy to understand :-)
Rextester demo: https://rextester.com/GRRSW89045
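A Python sketch of the same start/end pairing (illustrative only; `all_units_ranges` is a made-up name). Like the SQL, it counts intervals rather than distinct units, so it assumes a unit's own intervals never overlap each other. Each list is sorted by its own date before pairing, so the n-th qualifying start lines up with the n-th qualifying end.

```python
from datetime import date

# Rows from the question: (UnitId, Start, End), both ends inclusive.
INTERVALS = [
    (1, date(2018, 1, 5), date(2018, 1, 9)),
    (1, date(2018, 1, 10), date(2018, 1, 13)),
    (2, date(2018, 1, 4), date(2018, 1, 15)),
    (2, date(2018, 1, 19), date(2018, 1, 23)),
    (3, date(2018, 1, 6), date(2018, 1, 12)),
    (3, date(2018, 1, 14), date(2018, 1, 22)),
]


def all_units_ranges(intervals):
    """Pair start dates and end dates on which every unit is active."""
    unit_count = len({unit for unit, _start, _end in intervals})

    def covering(day):
        # Like the correlated COUNT(*): intervals containing `day` (ends inclusive).
        return sum(1 for _unit, start, end in intervals if start <= day <= end)

    starts = sorted(s for _u, s, _e in intervals if covering(s) == unit_count)
    ends = sorted(e for _u, _s, e in intervals if covering(e) == unit_count)
    # ROW_NUMBER pairing: the n-th qualifying start goes with the n-th qualifying end.
    return list(zip(starts, ends))


for start, end in all_units_ranges(INTERVALS):
    print(start, end)
```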

Related

Loop within id and combine dates between rows in SQL [duplicate]

I have a table in the following format
Id StartDate EndDate Type
1 2012-02-18 2012-03-18 1
1 2012-03-17 2012-06-29 1
1 2012-06-27 2012-09-27 1
1 2014-08-23 2014-09-24 3
1 2014-09-23 2014-10-24 3
1 2014-10-23 2014-11-24 3
2 2015-07-04 2015-08-06 1
2 2015-08-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0
I found similar questions here, but nothing that could help me solve my problem. I want to merge rows that have the same Id and Type and overlapping date periods.
The result from the above table should be
Id StartDate EndDate Type
1 2012-02-18 2012-09-27 1
1 2014-08-23 2014-11-24 3
2 2015-07-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0
On another server, I was able to do it with the query below, under the following restrictions:
I didn't care about the Type column, only the Id
I had a newer version of SQL Server (2012), but now I have 2008, with which the code is not compatible
SELECT Id
, MIN(StartDate) AS StartDate
, MAX(EndDate) AS EndDate
FROM (
SELECT *
, SUM(CASE WHEN a.PrevEndDate >= a.StartDate THEN 0
ELSE 1
END
) OVER (ORDER BY Id, StartDate) sm
FROM (
SELECT Id
, StartDate
, EndDate
, LAG(EndDate, 1, NULL) OVER (PARTITION BY Id ORDER BY Id, EndDate) PrevEndDate
FROM #temptable
) a
) b
GROUP BY Id, sm
Any advice on how I can:
Include Type in the process
Make it work on SQL Server 2008

This approach uses an additional temp table to identify the groups of overlapping dates, and then performs a quick aggregate based on the groupings.
SELECT *, ROW_NUMBER() OVER (ORDER BY Id, Type) AS UID,
ROW_NUMBER() OVER (ORDER BY Id, Type) AS GroupId INTO #G FROM #TempTable
WHILE @@ROWCOUNT <> 0 BEGIN
UPDATE T1 SET
GroupId = T2.GroupId
FROM #G T1
INNER JOIN (
SELECT T1.UID, CASE WHEN T1.GroupId < T2.GroupId THEN T1.GroupId ELSE T2.GroupId END
FROM #G T1
LEFT OUTER JOIN #G T2
ON T1.Id = T2.Id AND T1.Type = T2.Type AND T1.GroupId <> T2.GroupId
AND T1.StartDate <= T2.EndDate AND T2.StartDate <= T1.EndDate
) T2 (UID, GroupId)
ON T1.UID = T2.UID
WHERE T1.GroupId <> T2.GroupId
END
SELECT Id, MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate, Type
FROM #G G GROUP BY GroupId, Id, Type
This returns the expected values
Id StartDate EndDate Type
----------- ---------- ---------- -----------
1 2012-02-18 2012-09-27 1
1 2014-08-23 2014-11-24 3
2 2015-07-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0
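Here is a Python sketch of the same idea (not T-SQL): every row starts in its own group, and the loop keeps copying the smaller group number across overlapping rows with the same Id and Type until a pass changes nothing, which is what `WHILE @@ROWCOUNT <> 0` checks. The in-memory list stands in for `#G`, and `merge_by_group_propagation` is a made-up name.

```python
from datetime import date

# Rows from the question: (Id, StartDate, EndDate, Type).
ROWS = [
    (1, date(2012, 2, 18), date(2012, 3, 18), 1),
    (1, date(2012, 3, 17), date(2012, 6, 29), 1),
    (1, date(2012, 6, 27), date(2012, 9, 27), 1),
    (1, date(2014, 8, 23), date(2014, 9, 24), 3),
    (1, date(2014, 9, 23), date(2014, 10, 24), 3),
    (1, date(2014, 10, 23), date(2014, 11, 24), 3),
    (2, date(2015, 7, 4), date(2015, 8, 6), 1),
    (2, date(2015, 8, 4), date(2015, 9, 6), 1),
    (3, date(2013, 11, 1), date(2013, 12, 1), 0),
    (3, date(2018, 1, 9), date(2018, 2, 9), 0),
]


def merge_by_group_propagation(rows):
    """Label overlapping rows (same Id and Type) with the smallest group number
    reachable through overlaps, then take MIN(start) / MAX(end) per group."""
    group = list(range(len(rows)))  # every row starts in its own group
    changed = True
    while changed:                  # like WHILE @@ROWCOUNT <> 0
        changed = False
        for a, (id_a, start_a, end_a, type_a) in enumerate(rows):
            for b, (id_b, start_b, end_b, type_b) in enumerate(rows):
                same_key = (id_a, type_a) == (id_b, type_b)
                overlaps = start_a <= end_b and start_b <= end_a
                if same_key and overlaps and group[b] < group[a]:
                    group[a] = group[b]  # adopt the smaller group number
                    changed = True
    merged = {}
    for (id_, start, end, type_), g in zip(rows, group):
        if g in merged:
            _, lo, hi, _ = merged[g]
            start, end = min(lo, start), max(hi, end)
        merged[g] = (id_, start, end, type_)
    return sorted(merged.values())


for row in merge_by_group_propagation(ROWS):
    print(row)
```

The nested loops are quadratic per pass, which is fine for a sketch; the SQL version relies on the join to find overlapping neighbours instead.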

This is SQL Server 2008 compatible. In my opinion, a recursive CTE really is the best way to link up all overlapping records. The date overlap logic came from this thread: SO Date Overlap
I added extra data that's more complex to make sure that it's working as expected.
DECLARE @Data table (Id INT, StartDate DATE, EndDate DATE, Type INT)
INSERT INTO @Data
SELECT 1,'2/18/2012' ,'3/18/2012', 1 UNION ALL
select 1,'3/17/2012','6/29/2012',1 UNION ALL
select 1,'6/27/2012','9/27/2012',1 UNION ALL
select 1,'8/23/2014','9/24/2014',3 UNION ALL
select 1,'9/23/2014','10/24/2014',3 UNION ALL
select 1,'10/23/2014','11/24/2014',3 UNION ALL
select 2,'7/4/2015','8/6/2015',1 UNION ALL
select 2,'8/4/2015','9/6/2015',1 UNION ALL
select 3,'11/1/2013','12/1/2013',0 UNION ALL
select 3,'1/9/2018','2/9/2018',0 UNION ALL
select 4,'1/1/2018','1/2/2018',0 UNION ALL --many non overlapping dates
select 4,'1/4/2018','1/5/2018',0 UNION ALL
select 4,'1/7/2018','1/9/2018',0 UNION ALL
select 4,'1/11/2018','1/13/2018',0 UNION ALL
select 4,'2/7/2018','2/8/2018',0 UNION ALL --many overlapping dates
select 4,'2/8/2018','2/9/2018',0 UNION ALL
select 4,'2/9/2018','2/10/2018',0 UNION all
select 4,'2/10/2018','2/11/2018',0 UNION all
select 4,'2/11/2018','2/12/2018',0 UNION all
select 4,'2/12/2018','2/13/2018',0 UNION all
select 4,'3/7/2018','3/8/2018',0 UNION ALL --many overlapping dates, second instance of id 4, type 0
select 4,'3/8/2018','3/9/2018',0 UNION ALL
select 4,'3/9/2018','3/10/2018',0 UNION all
select 4,'3/10/2018','3/11/2018',0 UNION all
select 4,'3/11/2018','3/12/2018',0 UNION all
select 4,'3/12/2018','3/13/2018',0
;
WITH cdata
AS (SELECT Id,
d.Type,
d.StartDate,
d.EndDate,
CurrentStart = d.StartDate
FROM @Data d
WHERE
NOT EXISTS (
SELECT * FROM @Data x WHERE x.StartDate < d.StartDate AND d.StartDate <= x.EndDate AND d.EndDate >= x.StartDate AND d.Id = x.Id AND d.Type = x.Type --get first records for overlapping ranges
)
UNION ALL
SELECT d.Id,
d.Type,
StartDate = CASE WHEN d2.StartDate < d.StartDate THEN d2.StartDate ELSE d.StartDate END,
EndDate = CASE WHEN d2.EndDate > d.EndDate THEN d2.EndDate ELSE d.EndDate END,
CurrentStart = d2.StartDate
FROM cdata d
INNER JOIN @Data d2
ON (
d.StartDate <= d2.EndDate
AND d.EndDate >= d2.StartDate
)
AND d2.Id = d.Id
AND d2.Type = d.Type
AND d2.StartDate > d.CurrentStart)
SELECT cdata.Id, cdata.Type, cdata.StartDate, EndDate = MAX(cdata.EndDate)
FROM cdata
GROUP BY cdata.Id, cdata.Type, cdata.StartDate

This looks like a Packing Intervals problem. See the post by Itzik Ben-Gan for all the details and what indexes he recommends to make it work efficiently. He presents a solution without recursive CTE.
Two notes.
The query below assumes that intervals are [closed; open), i.e. StartDate is inclusive and EndDate is exclusive. This is often the most convenient way to represent such data, in the same sense that zero-based arrays are usually more convenient than one-based arrays in programming languages.
I added a RowID column to have unambiguous sorting.
Sample data
DECLARE @T TABLE
(
RowID int IDENTITY,
id int,
StartDate date,
EndDate date,
tp int
);
INSERT INTO @T(Id, StartDate, EndDate, tp) VALUES
(1, '2012-02-18', '2012-03-18', 1),
(1, '2012-03-17', '2012-06-29', 1),
(1, '2012-06-27', '2012-09-27', 1),
(1, '2014-08-23', '2014-09-24', 3),
(1, '2014-09-23', '2014-10-24', 3),
(1, '2014-10-23', '2014-11-24', 3),
(2, '2015-07-04', '2015-08-06', 1),
(2, '2015-08-04', '2015-09-06', 1),
(3, '2013-11-01', '2013-12-01', 0),
(3, '2018-01-09', '2018-02-09', 0);
-- Make EndDate an open bound, i.e. exclusive
-- [Start; End)
UPDATE @T
SET EndDate = DATEADD(day, 1, EndDate)
;
Recommended indexes
-- indexes to support the solution; these assume a permanent table named T,
-- because CREATE INDEX cannot be run against a table variable such as @T
CREATE UNIQUE INDEX idx_start_id ON T(id, tp, StartDate, RowID);
CREATE UNIQUE INDEX idx_end_id ON T(id, tp, EndDate, RowID);
Query
Read Itzik's post to understand what is going on; he has nice illustrations there. In short, each timestamp (start or end) is treated as an event, and each event has a + or - type. Each time we encounter a + event (an interval starts), we increase the running counter; each time we encounter a - event (an interval ends), we decrease it. When the running counter drops to 0, the streak of overlapping intervals is over.
I took Itzik's query as is and simply changed the column names to match yours.
WITH C1 AS
-- let e = end ordinals, let s = start ordinals
(
SELECT
RowID, id, tp, StartDate AS ts, +1 AS EventType,
NULL AS e,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY StartDate, RowID) AS s
FROM @T
UNION ALL
SELECT
RowID, id, tp, EndDate AS ts, -1 AS EventType,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY EndDate, RowID) AS e,
NULL AS s
FROM @T
),
C2 AS
-- let se = start or end ordinal, namely, how many events (start or end) happened so far
(
SELECT C1.*,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY ts, EventType DESC, RowID) AS se
FROM C1
),
C3 AS
-- For start events, the expression s - (se - s) - 1 represents how many sessions were active
-- just before the current (hence - 1)
--
-- For end events, the expression (se - e) - e represents how many sessions are active
-- right after this one
--
-- The above two expressions are 0 exactly when a group of packed intervals
-- either starts or ends, respectively
--
-- After filtering only events when a group of packed intervals either starts or ends,
-- group each pair of adjacent start/end events
(
SELECT id, tp, ts,
((ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY ts) - 1) / 2 + 1)
AS grpnum
FROM C2
WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0
)
SELECT id, tp, MIN(ts) AS StartDate, DATEADD(day, -1, MAX(ts)) AS EndDate
FROM C3
GROUP BY id, tp, grpnum
ORDER BY id, tp, StartDate;
Result
+----+----+------------+------------+
| id | tp | StartDate | EndDate |
+----+----+------------+------------+
| 1 | 1 | 2012-02-18 | 2012-09-27 |
| 1 | 3 | 2014-08-23 | 2014-11-24 |
| 2 | 1 | 2015-07-04 | 2015-09-06 |
| 3 | 0 | 2013-11-01 | 2013-12-01 |
| 3 | 0 | 2018-01-09 | 2018-02-09 |
+----+----+------------+------------+
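The counter description above can be sketched directly in Python (not Itzik's T-SQL, just the same idea with an explicit counter; `pack_intervals` is a made-up name). End dates are shifted one day forward to make them exclusive, and on equal timestamps start events sort before end events, mirroring `EventType DESC` in C2, so intervals that merely touch are packed together.

```python
from datetime import date, timedelta
from itertools import groupby

# Rows from the question: (id, StartDate, EndDate, tp), EndDate inclusive.
ROWS = [
    (1, date(2012, 2, 18), date(2012, 3, 18), 1),
    (1, date(2012, 3, 17), date(2012, 6, 29), 1),
    (1, date(2012, 6, 27), date(2012, 9, 27), 1),
    (1, date(2014, 8, 23), date(2014, 9, 24), 3),
    (1, date(2014, 9, 23), date(2014, 10, 24), 3),
    (1, date(2014, 10, 23), date(2014, 11, 24), 3),
    (2, date(2015, 7, 4), date(2015, 8, 6), 1),
    (2, date(2015, 8, 4), date(2015, 9, 6), 1),
    (3, date(2013, 11, 1), date(2013, 12, 1), 0),
    (3, date(2018, 1, 9), date(2018, 2, 9), 0),
]


def pack_intervals(rows):
    """Pack overlapping or touching rows per (id, tp) with a running counter."""
    packed = []

    def key(row):
        return row[0], row[3]  # PARTITION BY id, tp

    for (id_, tp), part in groupby(sorted(rows, key=key), key=key):
        events = []
        for _id, start, end, _tp in part:
            events.append((start, 0))                    # start event (+1)
            events.append((end + timedelta(days=1), 1))  # exclusive end event (-1)
        events.sort()  # by timestamp; kind 0 (start) sorts before kind 1 (end) on ties
        active = 0
        group_start = None
        for ts, kind in events:
            if kind == 0:
                if active == 0:
                    group_start = ts  # counter leaves 0: a packed group begins
                active += 1
            else:
                active -= 1
                if active == 0:       # counter is back to 0: the group is over
                    packed.append((id_, tp, group_start, ts - timedelta(days=1)))
    return packed


for row in pack_intervals(ROWS):
    print(row)
```

As in the final SELECT above, one day is subtracted from each packed end to turn it back into an inclusive date.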

create table #table
(Id int,StartDate date, EndDate date, Type int)
insert into #table
values
('1','2012-02-18','2012-03-18','1'),('1','2012-03-19','2012-06-19','1'),
('1','2012-06-27','2012-09-27','1'),('1','2014-08-23','2014-09-24','3'),
('1','2014-09-23','2014-10-24','3'),('1','2014-10-23','2014-11-24','3'),
('2','2015-07-04','2015-08-06','1'),('2','2015-08-04','2015-09-06','1'),
('3','2013-11-01','2013-12-01','0'),('3','2018-01-09','2018-02-09','0')
select ID,MIN(startdate)sd,MAX(EndDate)ed,type from #table
group by ID,TYPE,YEAR(startdate),YEAR(EndDate)

This can easily be achieved with window functions and CTEs. Here is a solution (note that the ROWS window frames it uses require SQL Server 2012 or later, so it will not run on 2008):
DECLARE @table TABLE
(id INT,
StartDate DATE,
EndDate DATE,
[Type] INT
);
INSERT INTO @table(Id, StartDate, EndDate, [Type]) VALUES
(1, '2012-02-18', '2012-03-18', 1),
(1, '2012-03-17', '2012-06-29', 1),
(1, '2012-06-27', '2012-09-27', 1),
(1, '2014-08-23', '2014-09-24', 3),
(1, '2014-09-23', '2014-10-24', 3),
(1, '2014-10-23', '2014-11-24', 3),
(2, '2015-07-04', '2015-08-06', 1),
(2, '2015-08-04', '2015-09-06', 1),
(3, '2013-11-01', '2013-12-01', 0),
(3, '2018-01-09', '2018-02-09', 0);
WITH C1 AS
(
SELECT *,
MAX(EndDate) OVER(PARTITION BY Id, [Type]
ORDER BY StartDate, EndDate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PrevEnd
FROM @table
),
C2 AS
(
SELECT *,
SUM(StartFlag) OVER(PARTITION BY Id, [Type]
ORDER BY StartDate, EndDate
ROWS UNBOUNDED PRECEDING) AS GroupID
FROM C1
CROSS APPLY ( VALUES(CASE WHEN StartDate <= PrevEnd THEN NULL ELSE 1 END) ) AS A(StartFlag)
)
SELECT Id, [Type], MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate
FROM C2
GROUP BY Id, [Type], GroupID;
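For comparison, here is the same running-maximum logic as a plain Python loop (illustrative; `merge_with_running_max` is a made-up name). A row starts a new group when its StartDate is later than the largest EndDate seen so far in its (Id, Type) partition, which is the `StartFlag` test; otherwise it extends the current group.

```python
from datetime import date
from itertools import groupby

# Rows from the question: (Id, StartDate, EndDate, Type).
ROWS = [
    (1, date(2012, 2, 18), date(2012, 3, 18), 1),
    (1, date(2012, 3, 17), date(2012, 6, 29), 1),
    (1, date(2012, 6, 27), date(2012, 9, 27), 1),
    (1, date(2014, 8, 23), date(2014, 9, 24), 3),
    (1, date(2014, 9, 23), date(2014, 10, 24), 3),
    (1, date(2014, 10, 23), date(2014, 11, 24), 3),
    (2, date(2015, 7, 4), date(2015, 8, 6), 1),
    (2, date(2015, 8, 4), date(2015, 9, 6), 1),
    (3, date(2013, 11, 1), date(2013, 12, 1), 0),
    (3, date(2018, 1, 9), date(2018, 2, 9), 0),
]


def merge_with_running_max(rows):
    """Start a new group when StartDate is later than every earlier EndDate
    in the same (Id, Type) partition; otherwise extend the current group."""
    result = []
    # PARTITION BY Id, Type ... ORDER BY StartDate, EndDate
    ordered = sorted(rows, key=lambda r: (r[0], r[3], r[1], r[2]))
    for (id_, type_), part in groupby(ordered, key=lambda r: (r[0], r[3])):
        prev_end = None  # PrevEnd: MAX(EndDate) over the preceding rows
        for _id, start, end, _type in part:
            if prev_end is None or start > prev_end:  # StartFlag = 1: new group
                result.append([id_, start, end, type_])
            else:                                     # same GroupID: extend it
                result[-1][2] = max(result[-1][2], end)
            prev_end = end if prev_end is None else max(prev_end, end)
    return [tuple(r) for r in result]


for row in merge_with_running_max(ROWS):
    print(row)
```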

Calculate number of cycles available between dates and arrange row value as column value

I have vehicle information data in SQL in the following form:
S.No  Vehicle_ID  status  date_on
1     1           Start   2018-05-23
2     1           Start   2021-06-15
3     1           Failed  2020-08-10
4     2           Start   2019-06-23
5     3           Start   2010-04-20
6     3           Failed  2010-05-10
7     4           Start   2011-01-20
8     4           Failed  2015-01-14
9     4           Start   2016-02-25
10    4           Failed  2019-04-10
Vehicle ID: 1
1st start date = 2018-05-23
1st failed date = 2020-08-10
2nd start date = 2021-06-15
There is no failed date for the 2nd start date, so we take today's date as the failed date.
Vehicle ID: 2
1st start date = 2019-06-23
There is no failed date for the 1st start date, so we take today's date as the failed date.
Based on the above conditions, the required result is:
Vehicle_ID  Start       Failed/Running  Cycle
1           2018-05-23  2020-08-10      1
1           2021-06-15  Today's date    2
2           2019-06-23  Today's date    1
3           2010-04-20  2010-05-10      1
4           2011-01-20  2015-01-14      1
4           2016-02-25  2019-04-10      2
I tried the PIVOT function to switch rows to columns, but I am struggling to assign the Cycle and the date order.
Here is the code I tried:
SELECT * FROM (
SELECT
S_No,
Vehicle_ID,
date_on,
status
FROM vehicle_table where Vehicle_ID is not null Group by Vehicle_ID,status,S_No,date_on
) Vehicle_Detail
PIVOT (
MAX(date_on)
FOR status
IN (
[Start],
[Failed]
)
) AS PivotTable where [status] is not null order by [Vehicle_ID]

I would use an OUTER APPLY with TOP 1 here:
create table vehicle_detail (S_No int, Vehicle_ID int, status varchar(6), date_on date);
insert vehicle_detail values
(1, 1, 'Start', '2018-05-23'),
(2, 1, 'Start', '2021-06-15'),
(3, 1, 'Failed', '2020-08-10'),
(4, 2, 'Start', '2019-06-23'),
(5, 3, 'Start', '2010-04-20'),
(6, 3, 'Failed', '2010-05-10'),
(7, 4, 'Start', '2011-01-20'),
(8, 4, 'Failed', '2015-01-14'),
(9, 4, 'Start', '2016-02-25'),
(10, 4, 'Failed', '2019-04-10');
select starts.vehicle_id,
start = starts.date_on,
[failed/running] = isnull(fails.date_on, cast(getdate() as date)),
cycle = row_number() over
(
partition by starts.vehicle_id
order by starts.date_on asc
)
from vehicle_detail starts
outer apply (
select top 1 date_on
from vehicle_detail v
where v.vehicle_id = starts.vehicle_id
and v.status = 'Failed'
and v.date_on >= starts.date_on
order by date_on asc
) fails
where starts.status = 'Start'
order by starts.vehicle_id,
starts.date_on;
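The OUTER APPLY lookup can be sketched in Python as follows (illustrative; the function name and the `today` parameter, which stands in for `GETDATE()`, are assumptions). For every Start row it takes the earliest Failed date on or after the start and falls back to `today` when there is none.

```python
from datetime import date

# Rows from the question: (S_No, Vehicle_ID, status, date_on).
VEHICLE_DETAIL = [
    (1, 1, "Start", date(2018, 5, 23)),
    (2, 1, "Start", date(2021, 6, 15)),
    (3, 1, "Failed", date(2020, 8, 10)),
    (4, 2, "Start", date(2019, 6, 23)),
    (5, 3, "Start", date(2010, 4, 20)),
    (6, 3, "Failed", date(2010, 5, 10)),
    (7, 4, "Start", date(2011, 1, 20)),
    (8, 4, "Failed", date(2015, 1, 14)),
    (9, 4, "Start", date(2016, 2, 25)),
    (10, 4, "Failed", date(2019, 4, 10)),
]


def cycles_outer_apply(rows, today):
    """(Vehicle_ID, start, failed/running, cycle) for every Start row."""
    starts = sorted((r for r in rows if r[2] == "Start"), key=lambda r: (r[1], r[3]))
    cycles = {}
    result = []
    for _s_no, vehicle, _status, started in starts:
        # TOP 1 ... ORDER BY date_on: the earliest failure on or after this start
        fails = [d for _n, v, st, d in rows if v == vehicle and st == "Failed" and d >= started]
        ended = min(fails) if fails else today         # ISNULL(..., today)
        cycles[vehicle] = cycles.get(vehicle, 0) + 1  # ROW_NUMBER per vehicle
        result.append((vehicle, started, ended, cycles[vehicle]))
    return result


for row in cycles_outer_apply(VEHICLE_DETAIL, today=date.today()):
    print(row)
```

Like the SQL, if a vehicle had two starts with no failure between them, both starts would pick up the same later failure.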

Another approach might be a CTE and a simple LEFT JOIN:
DECLARE @t TABLE(
SNo INT
,Vehicle_ID int
,stat VARCHAR(10)
,date_on DATE
)
INSERT INTO @t VALUES
(1 ,1 ,'Start', '2018-05-23')
,(2 ,1 ,'Start', '2021-06-15')
,(3 ,1 ,'Failed', '2020-08-10')
,(4 ,2 ,'Start', '2019-06-23')
,(5 ,3 ,'Start', '2010-04-20')
,(6 ,3 ,'Failed', '2010-05-10')
,(7 ,4 ,'Start', '2011-01-20')
,(8 ,4 ,'Failed', '2015-01-14')
,(9 ,4 ,'Start', '2016-02-25')
,(10 ,4 ,'Failed', '2019-04-10');
WITH cte AS(
SELECT t.Vehicle_ID, t.stat, t.date_on AS StartDate, ISNULL(LEAD(DATEADD(d, -1, t.date_on)) OVER (PARTITION BY t.Vehicle_ID ORDER BY t.date_on), CAST('9999-12-31' AS DATE)) AS StartDateNext -- the last start has no upper bound
FROM @t t
WHERE t.stat = 'Start'
)
SELECT t.Vehicle_ID
,t.StartDate
,ISNULL(te.date_on, CAST(GETDATE() AS DATE)) AS FailedRunningDate
,ROW_NUMBER() OVER (PARTITION BY t.Vehicle_ID ORDER BY t.StartDate) AS Cycle
FROM cte t
LEFT JOIN @t te ON te.Vehicle_ID = t.Vehicle_ID
AND te.stat = 'Failed'
AND te.date_on >= t.StartDate
AND te.date_on <= t.StartDateNext

Provided there is no failure without a preceding start, you can match starts to failures by row_number():
with ordered as(
SELECT
S_No,
Vehicle_ID,
date_on,
status,
row_number() over(partition by Vehicle_ID, status order by date_on) rn
FROM vehicle_table
WHERE Vehicle_ID is not null
)
select s.Vehicle_ID, s.date_on start, s.rn cycle, coalesce(f.date_on, getdate()) [Failed/Running]
from ordered s
left join ordered f on s.Vehicle_ID = f.Vehicle_ID and s.rn = f.rn
and f.status ='Failed'
-- sanity check
and (s.date_on <= f.date_on or f.date_on is null)
where s.status ='Start'
order by s.Vehicle_ID, s.date_on, s.rn
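Here is the rank-pairing idea in Python (illustrative; `cycles_by_rank` and the `today` parameter, standing in for `getdate()`, are assumptions). Start and Failed dates are ranked separately per vehicle, the n-th start is paired with the n-th failure, and the answer's sanity check drops a pairing whose failure precedes its start.

```python
from datetime import date

# Rows from the question: (S_No, Vehicle_ID, status, date_on).
VEHICLE_DETAIL = [
    (1, 1, "Start", date(2018, 5, 23)),
    (2, 1, "Start", date(2021, 6, 15)),
    (3, 1, "Failed", date(2020, 8, 10)),
    (4, 2, "Start", date(2019, 6, 23)),
    (5, 3, "Start", date(2010, 4, 20)),
    (6, 3, "Failed", date(2010, 5, 10)),
    (7, 4, "Start", date(2011, 1, 20)),
    (8, 4, "Failed", date(2015, 1, 14)),
    (9, 4, "Start", date(2016, 2, 25)),
    (10, 4, "Failed", date(2019, 4, 10)),
]


def cycles_by_rank(rows, today):
    """(Vehicle_ID, start, cycle, failed/running): n-th start paired with n-th failure."""
    ranked = {}
    for _s_no, vehicle, status, day in sorted(rows, key=lambda r: r[3]):
        # row_number() over(partition by Vehicle_ID, status order by date_on)
        ranked.setdefault((vehicle, status), []).append(day)
    result = []
    for (vehicle, status), starts in sorted(ranked.items()):
        if status != "Start":
            continue
        fails = ranked.get((vehicle, "Failed"), [])
        for rn, started in enumerate(starts, start=1):
            failed = fails[rn - 1] if rn <= len(fails) else None
            if failed is not None and failed < started:  # the answer's sanity check
                failed = None
            result.append((vehicle, started, rn, failed if failed is not None else today))
    return result


for row in cycles_by_rank(VEHICLE_DETAIL, today=date.today()):
    print(row)
```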

Use ROW_NUMBER() to order your data the way you want:
WITH tSortRaw
AS (
SELECT ROW_NUMBER() OVER (
PARTITION BY vehicle_id, STATUS
ORDER BY date_on ASC -- oldest first, so the n-th start pairs with the n-th failure
) [R],
*
FROM Vehicle_Detail
),
tMerge
AS (
SELECT tStart.vehicle_id,
tStart.date_on [Start],
IIF(tFailed.date_on IS NULL, GETDATE(), tFailed.date_on) [Failed]
FROM (
SELECT *
FROM tSortRaw
WHERE STATUS = 'Start'
) [tStart]
LEFT JOIN (
SELECT *
FROM tSortRaw
WHERE STATUS = 'Failed'
) [tFailed]
ON tStart.vehicle_id = tFailed.vehicle_id
AND tStart.R = tFailed.R
)
SELECT vehicle_id,
Start,
Failed [Failed/Running],
ROW_NUMBER() OVER (
PARTITION BY Vehicle_ID
ORDER BY Start
) [Cycle]
FROM tMerge

Getting a date + 3 days ( using specific date) SQL

I'm trying to get the next available day after a result set.
This is the query I'm using, but it is totally wrong:
SELECT DateID = ROW_NUMBER() over (order by B.Date_Key) , B.ClosingDate, C.dates AS RecDay
FROM DIM_DATE B JOIN [dbo].[WorkDay_Calendar] C on C.dates = DATEADD(DAY,3, B.ClosingDate) WHERE YEAR(B.ClosingDate) >= '2018'
AND C.[Sentday] = 0 and C.[RecDay] = 0
This query retrieves the RecDay when ClosingDate + 3 days equals a Sentday. What I want is: take ClosingDate + 3 (Sentdays), then pick the next RecDay, something like C.dates = DATEADD(DAY, 3(Sentday), B.ClosingDate).
This is what my tables look like:
Dim_Date table
WorkDay_Calendar table
Note that Sentday and RecDay are valid when they are 0; a value of 1 means the day is not valid because it is a weekend or holiday.
Based on this, if for example I pick 2018-02-02 from the Dim_Date table as a ClosingDate, then the RecDay should be:
DateID RecDay
------------------------
1 2018-02-07
But the current query retrieves this instead, which is wrong:
DateID RecDay
-----------------------
1 2018-02-05
More output examples:
Using the dates below as ClosingDate:
Date_Key ClosingDate:
38284 2018-07-24
38287 2018-01-10
38290 2018-03-08
38291 2018-07-13
38293 2018-02-08
Using the same order of ClosingDates, these should be the outputs (I included the ClosingDate column so you can follow the order):
OUTPUTS:
DateID ClosingDate RecDay (output)
1 2018-07-24 2018-07-30
2 2018-01-10 2018-01-16
3 2018-03-08 2018-03-13
4 2018-07-13 2018-07-18
5 2018-02-08 2018-02-13

I'm not sure if I followed you correctly, but based on your condition, you want to check the date dimension table against the calendar table: if ClosingDate + 3 days equals SentDay, then you need to get the ReceiveDay. If that's what you need, try this:
UPDATED
SELECT
ROW_NUMBER() OVER (ORDER BY Date_key) DateID,
ClosingDateOLD,
C.Dates
FROM (
SELECT
Date_key,
ClosingDate AS ClosingDateOLD,
CASE
WHEN DATENAME(dw, DATEADD(DAY, 4, ClosingDate)) IN ('Saturday') THEN DATEADD(DAY, 6, ClosingDate)
WHEN DATENAME(dw, DATEADD(DAY, 4, ClosingDate)) IN ('Sunday') THEN DATEADD(DAY, 5, ClosingDate)
ELSE DATEADD(DAY, 5, ClosingDate)
END AS ClosingDate
FROM
#DIM_DATE
WHERE
ClosingDate IS NOT NULL
) D
JOIN #Calendar C ON C.Dates = ClosingDate

As I understand the requirements, it would be something like this.
I am posting a full working example in case somebody wants to take a crack at this.
create table #DIM_DATE
(
DateKey int
, ClosingDate date
)
insert #DIM_DATE values
(1, NULL)
, (2, '2018-01-02')
, (3, NULL)
, (4, NULL)
create table #CalendarTable
(
ID int
, SentDay date
, ReceiveDay date
)
insert #CalendarTable values
(1, '2018-01-03', '2018-01-02')
, (2, '2018-01-04', '2018-01-03')
, (3, '2018-01-05', '2018-01-08')
SELECT DateID = ROW_NUMBER() over (order by d.DateKey)
, ct.ReceiveDay
FROM #DIM_DATE d
join #CalendarTable ct on ct.SentDay = dateadd(day, 3, d.ClosingDate)
drop table #DIM_DATE
drop table #CalendarTable
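Neither query above counts working days. Assuming the rule behind the question's examples is "the n-th valid day after ClosingDate", here is a Python sketch of that rule (illustrative only; `nth_working_day_after` and the `holidays` set are hypothetical stand-ins for the WorkDay_Calendar flags). Weekends alone do not explain every example in the question: 2018-07-24 → 2018-07-30 needs extra invalid days, so the real calendar presumably marks more days as invalid.

```python
from datetime import date, timedelta


def nth_working_day_after(closing, n, holidays=frozenset()):
    """Return the n-th day after `closing` that is neither a weekend nor in `holidays`."""
    day, counted = closing, 0
    while counted < n:
        day += timedelta(days=1)
        if day.weekday() < 5 and day not in holidays:  # Monday=0 ... Friday=4
            counted += 1
    return day


for closing in (date(2018, 2, 2), date(2018, 3, 8), date(2018, 7, 13), date(2018, 2, 8)):
    print(closing, nth_working_day_after(closing, 3))
```

With weekends only, it gives 2018-02-07 for 2018-02-02 and 2018-03-13 for 2018-03-08; passing `{date(2018, 1, 15)}` as a holiday gives 2018-01-16 for 2018-01-10, matching the question's expected output.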

How to find the previous and current row difference

This table has two rows for each id, and never more than two:
id date amt
-------------------
001 01/01/2012 100
001 01/12/2011 200
002 01/01/2013 100
002 01/12/2012 200
003 12/08/2012 500
003 31/12/2011 200
...
For each id, I want to display the row with the max(date), along with the difference between the current and previous rows' amt (dates are dd/mm/yyyy):
Expected Output
id date amt
------------------
001 01/01/2012 100
002 01/01/2013 100
003 12/08/2012 300
...
How to do this?

You can try with this:
DECLARE @tbl TABLE (id VARCHAR(10), date DATE, amt INT)
INSERT @tbl VALUES
('001', CONVERT(DATE, '01/01/2012', 103), 100),
('001', CONVERT(DATE, '01/12/2011', 103), 200),
('002', CONVERT(DATE, '01/01/2013', 103), 100),
('002', CONVERT(DATE, '01/12/2012', 103), 200),
('003', CONVERT(DATE, '12/08/2012', 103), 500),
('003', CONVERT(DATE, '31/12/2011', 103), 200),
-- Added to display the one id - one row situation
('004', CONVERT(DATE, '14/02/2011', 103), 999),
('000', CONVERT(DATE, '02/02/2012', 103), 100),
('100', CONVERT(DATE, '09/09/2011', 103), 999)
;WITH a AS
(
SELECT id
, amt
, date
, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date ASC) num
, COUNT(*) OVER (PARTITION BY id) cnt
FROM @tbl
)
SELECT t1.id
, t1.date
, ABS(t1.amt - t2.amt)
FROM a t1
JOIN a t2 ON (t1.id = t2.id AND t1.num = t2.num + 1)
OR (t1.id = t2.id AND t2.cnt = 1)
For ids 001 and 002, the latest amt minus the previous amt is -100, so I added the ABS function, which returns the absolute value.
I also added sample data to show how the situation is handled when there is only one record per id.
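The same result in Python, as an illustration (`latest_with_difference` is a made-up name): group the rows by id, take the latest row, and subtract the previous row's amt, returning the absolute value; an id with a single row gets a difference of 0, as in the SQL above.

```python
from datetime import date

# Rows from the question plus the answer's extra single-row ids: (id, date, amt).
ROWS = [
    ("001", date(2012, 1, 1), 100),
    ("001", date(2011, 12, 1), 200),
    ("002", date(2013, 1, 1), 100),
    ("002", date(2012, 12, 1), 200),
    ("003", date(2012, 8, 12), 500),
    ("003", date(2011, 12, 31), 200),
    ("004", date(2011, 2, 14), 999),
    ("000", date(2012, 2, 2), 100),
    ("100", date(2011, 9, 9), 999),
]


def latest_with_difference(rows):
    """(id, latest date, |latest amt - previous amt|) for every id."""
    by_id = {}
    for id_, day, amt in rows:
        by_id.setdefault(id_, []).append((day, amt))
    result = []
    for id_, entries in sorted(by_id.items()):
        entries.sort()  # oldest first, like ROW_NUMBER() ... ORDER BY date ASC
        latest_day, latest_amt = entries[-1]
        previous_amt = entries[-2][1] if len(entries) > 1 else latest_amt
        result.append((id_, latest_day, abs(latest_amt - previous_amt)))
    return result


for row in latest_with_difference(ROWS):
    print(row)
```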

Try this:
WITH CTE
AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Date DESC) AS Rownum
FROM Table1
)
SELECT
c1.id,
c1.date,
-- latest amt minus the same id's previous amt (a missing previous row counts as 0)
ABS(c1.amt - ISNULL((SELECT c2.amt
FROM CTE c2
WHERE c2.id = c1.id AND c2.Rownum = 2), 0)) AS amt
FROM CTE c1
WHERE c1.Rownum = 1;
SQL Fiddle Demo
