How to Count Consecutive Entries for Users in SQL Server - sql

Given the table with the following columns:
UserId int,
DateEntered DateTime
And the data:
1 | 2016-02-24
1 | 2016-02-23
1 | 2016-02-22
1 | 2016-02-20
2 | 2016-02-24
2 | 2016-02-14
3 | 2016-02-23
3 | 2016-02-22
3 | 2016-02-21
2 | 2016-01-30
2 | 2016-01-29
2 | 2016-01-28
2 | 2016-01-27
2 | 2016-01-26
2 | 2016-01-25
I would like to return the latest streak of entries for each user, or more precisely for a specific user, counting back from today.
Case 1
Today = 2016-02-24
userid = 1
Return value = 3 // User missed Day 21 so streak is from 22-24
Case 2
Today = 2016-02-24
userid = 2
Return value = 1 // Even though user has a longer streak from 1/25 – 1/30, it is not his latest streak
Case 3
Today = 2016-02-24
userid = 3
Return value = 0 // User missed today. Therefore, he has no consecutive days counting today
Any ideas on how this can be done in T-SQL?
Update 1:
Based on the response, I've modified the example query as follows. However, the value returned in the second column is always either 1 or 0, even though the data shows that there are more consecutive days present.
select
a.UserId,
sum(case when dayseq = '2016-02-01' then 1 else 0 end)
from
(select
t.*,
dateadd(day, 1 - row_number() over (partition by UserId order by DateCreated), DateCreated) as dayseq
from
fa.User_Journal t) a
where
DateCreated <= '2016-02-01'
group by
a.UserId;
Update 2
The following query illustrates the problem further. The solution provided below almost resolves this.
In this query, I illustrate what "should" happen given the @EndDate values. By un-commenting the desired assignment to @EndDate, you can see that the query does not return the desired result according to the cases provided.
Any assistance would be greatly appreciated.
DECLARE @Temp TABLE
(
UserId nvarchar(128),
DateCreated Date
)
INSERT INTO @TEMP (UserId, DateCreated) values ('uid123', '2016-01-19');
INSERT INTO @TEMP (UserId, DateCreated) values ('uid123', '2016-01-24');
INSERT INTO @TEMP (UserId, DateCreated) values ('uid123', '2016-01-28');
INSERT INTO @TEMP (UserId, DateCreated) values ('uid123', '2016-01-29');
INSERT INTO @TEMP (UserId, DateCreated) values ('uid123', '2016-02-01');
INSERT INTO @TEMP (UserId, DateCreated) values ('uid123', '2016-02-02');
INSERT INTO @TEMP (UserId, DateCreated) values ('uid123', '2016-02-03');
INSERT INTO @TEMP (UserId, DateCreated) values ('uid123', '2016-02-07');
INSERT INTO @TEMP (UserId, DateCreated) values ('uid123', '2016-01-19');
DECLARE @EndDate Date
SET @EndDate = '2016-02-03' -- Should return 5, as they are 5 consecutive days since @EndDate
--SET @EndDate = '2016-02-02' -- Should return 4, as they are 4 consecutive days since @EndDate
--SET @EndDate = '2016-02-19' -- Should return 1, as they are 4 consecutive days since @EndDate
--SET @EndDate = '2016-02-18' -- Should return 0, as they are 0 consecutive days since @EndDate
SELECT a.UserId,
SUM(CASE WHEN dayseq <= @EndDate then 1 else 0 end) -1 as FitStreak
from (select t.*,
dateadd(day,
1 - row_number() over (partition by UserId order by DateCreated),
DateCreated) as dayseq
from @Temp t
) a
where DateCreated <= @EndDate
group by a.UserId;

I think the following does what you want:
select t.UserId,
sum(case when dayseq = '2016-02-24' then 1 else 0 end)
from (select t.*,
dateadd(day,
row_number() over (partition by UserId order by DateEntered desc) - 1,
DateEntered) as dayseq
from t
) t
where DateEntered <= '2016-02-24'
group by t.UserId;
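This adds a sequential number to each date, starting with 0 for the most recent date and increasing as the dates get older. It then compares the result to the "current date": for every row in the initial (most recent) sequence, the value is exactly the "current date", so counting those rows gives the length of the streak.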
The logic in the case statement can also be in the where. However, the results will then filter out users with no days.
Note two important assumptions:
There is no time component on the date
There are no duplicate dates
Both of these can be easily handled, but the resulting query is a bit more complicated.
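To check the offset trick against the three cases in the question, here is a minimal Python sketch of the same logic. It is not part of the answer's query: `current_streak` and the `entries` dictionary are hypothetical names, and unlike the SQL it drops duplicate dates before numbering the rows.

```python
from datetime import date, timedelta

def current_streak(dates, today):
    """Count consecutive days ending at `today` (0 if `today` itself is missing)."""
    # Mirror the query: ignore later dates, drop duplicates, sort newest first.
    ordered = sorted({d for d in dates if d <= today}, reverse=True)
    streak = 0
    for offset, d in enumerate(ordered):
        # Same test as "dayseq = today": date + (row_number - 1) days == today.
        if d + timedelta(days=offset) == today:
            streak += 1
    return streak

# Sample data from the question (hypothetical in-memory stand-in for the table).
entries = {
    1: [date(2016, 2, 24), date(2016, 2, 23), date(2016, 2, 22), date(2016, 2, 20)],
    2: [date(2016, 2, 24), date(2016, 2, 14)] + [date(2016, 1, d) for d in range(25, 31)],
    3: [date(2016, 2, 23), date(2016, 2, 22), date(2016, 2, 21)],
}
today = date(2016, 2, 24)
streaks = {user: current_streak(ds, today) for user, ds in entries.items()}
print(streaks)  # {1: 3, 2: 1, 3: 0}
```

Because the dates are distinct and sorted newest first, a row can only land on `today` after the offset is added if every newer row did too, which is why a plain count of matches is enough.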
EDIT:
With a time component, truncate the value:
select t.UserId,
sum(case when dayseq = '2016-02-24' then 1 else 0 end)
from (select t.*,
dateadd(day,
row_number() over (partition by UserId order by DateEntered desc) - 1,
cast(DateEntered as date)) as dayseq
from t
) t
where DateEntered < dateadd(day, 1, '2016-02-24')
group by t.UserId;
Here is a SQL Fiddle illustrating the fixed code.

Related

Loop within id and combine dates between rows in SQL [duplicate]

I have a table in the following format
Id StartDate EndDate Type
1 2012-02-18 2012-03-18 1
1 2012-03-17 2012-06-29 1
1 2012-06-27 2012-09-27 1
1 2014-08-23 2014-09-24 3
1 2014-09-23 2014-10-24 3
1 2014-10-23 2014-11-24 3
2 2015-07-04 2015-08-06 1
2 2015-08-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0
I found similar questions here, but nothing that helped me solve my problem. I want to merge rows that have the same Id, the same Type, and overlapping date periods.
The result from the above table should be
Id StartDate EndDate Type
1 2012-02-18 2012-09-27 1
1 2014-08-23 2014-11-24 3
2 2015-07-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0
In another server, I was able to do it with the following restrictions and the query below:
Didn't care about the Type column, but just the Id
Had a newer version of SQL Server (2012); now I have 2008, with which the code is not compatible
SELECT Id
, MIN(StartDate) AS StartDate
, MAX(EndDate) AS EndDate
FROM (
SELECT *
, SUM(CASE WHEN a.EndDate = a.StartDate THEN 0
ELSE 1
END
) OVER (ORDER BY Id, StartDate) sm
FROM (
SELECT Id
, StartDate
, EndDate
, LAG(EndDate, 1, NULL) OVER (PARTITION BY Id ORDER BY Id, EndDate) EndDate
FROM #temptable
) a
) b
GROUP BY Id, sm
Any advice on how I can:
Include Type in the process
Make it work on SQL Server 2008
This approach uses an additional temp table to identify the groups of overlapping dates, and then performs a quick aggregate based on the groupings.
SELECT *, ROW_NUMBER() OVER (ORDER BY Id, Type) AS UID,
ROW_NUMBER() OVER (ORDER BY Id, Type) AS GroupId INTO #G FROM #TempTable
WHILE @@ROWCOUNT <> 0 BEGIN
UPDATE T1 SET
GroupId = T2.GroupId
FROM #G T1
INNER JOIN (
SELECT T1.UID, CASE WHEN T1.GroupId < T2.GroupId THEN T1.GroupId ELSE T2.GroupId END
FROM #G T1
LEFT OUTER JOIN #G T2
ON T1.Id = T2.Id AND T1.Type = T2.Type AND T1.GroupId <> T2.GroupId
AND T1.StartDate <= T2.EndDate AND T2.StartDate <= T1.EndDate
) T2 (UID, GroupId)
ON T1.UID = T2.UID
WHERE T1.GroupId <> T2.GroupId
END
SELECT Id, MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate, Type
FROM #G G GROUP BY GroupId, Id, Type
This returns the expected values
Id StartDate EndDate Type
----------- ---------- ---------- -----------
1 2012-02-18 2012-09-27 1
1 2014-08-23 2014-11-24 3
2 2015-07-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0
This is 2008 compatible. A CTE really is the best way to link up all overlapping records in my opinion. The date overlap logic came from this thread: SO Date Overlap
I added extra data that's more complex to make sure that it's working as expected.
DECLARE @Data table (Id INT, StartDate DATE, EndDate DATE, Type INT)
INSERT INTO @Data
SELECT 1,'2/18/2012' ,'3/18/2012', 1 UNION ALL
select 1,'3/17/2012','6/29/2012',1 UNION ALL
select 1,'6/27/2012','9/27/2012',1 UNION ALL
select 1,'8/23/2014','9/24/2014',3 UNION ALL
select 1,'9/23/2014','10/24/2014',3 UNION ALL
select 1,'10/23/2014','11/24/2014',3 UNION ALL
select 2,'7/4/2015','8/6/2015',1 UNION ALL
select 2,'8/4/2015','9/6/2015',1 UNION ALL
select 3,'11/1/2013','12/1/2013',0 UNION ALL
select 3,'1/9/2018','2/9/2018',0 UNION ALL
select 4,'1/1/2018','1/2/2018',0 UNION ALL --many non overlapping dates
select 4,'1/4/2018','1/5/2018',0 UNION ALL
select 4,'1/7/2018','1/9/2018',0 UNION ALL
select 4,'1/11/2018','1/13/2018',0 UNION ALL
select 4,'2/7/2018','2/8/2018',0 UNION ALL --many overlapping dates
select 4,'2/8/2018','2/9/2018',0 UNION ALL
select 4,'2/9/2018','2/10/2018',0 UNION all
select 4,'2/10/2018','2/11/2018',0 UNION all
select 4,'2/11/2018','2/12/2018',0 UNION all
select 4,'2/12/2018','2/13/2018',0 UNION all
select 4,'3/7/2018','3/8/2018',0 UNION ALL --many overlapping dates, second instance of id 4, type 0
select 4,'3/8/2018','3/9/2018',0 UNION ALL
select 4,'3/9/2018','3/10/2018',0 UNION all
select 4,'3/10/2018','3/11/2018',0 UNION all
select 4,'3/11/2018','3/12/2018',0 UNION all
select 4,'3/12/2018','3/13/2018',0
;
WITH cdata
AS (SELECT Id,
d.Type,
d.StartDate,
d.EndDate,
CurrentStart = d.StartDate
FROM @Data d
WHERE
NOT EXISTS (
SELECT * FROM @Data x WHERE x.StartDate < d.StartDate AND d.StartDate <= x.EndDate AND d.EndDate >= x.StartDate AND d.Id = x.Id AND d.Type = x.Type --get first records for overlapping ranges
)
UNION ALL
SELECT d.Id,
d.Type,
StartDate = CASE WHEN d2.StartDate < d.StartDate THEN d2.StartDate ELSE d.StartDate END,
EndDate = CASE WHEN d2.EndDate > d.EndDate THEN d2.EndDate ELSE d.EndDate END,
CurrentStart = d2.StartDate
FROM cdata d
INNER JOIN @Data d2
ON (
d.StartDate <= d2.EndDate
AND d.EndDate >= d2.StartDate
)
AND d2.Id = d.Id
AND d2.Type = d.Type
AND d2.StartDate > d.CurrentStart)
SELECT cdata.Id, cdata.Type, cdata.StartDate, EndDate = MAX(cdata.EndDate)
FROM cdata
GROUP BY cdata.Id, cdata.Type, cdata.StartDate
This looks like a Packing Intervals problem. See the post by Itzik Ben-Gan for all the details and what indexes he recommends to make it work efficiently. He presents a solution without recursive CTE.
Two notes.
The query below assumes that intervals are [closed; open), i.e. StartDate is inclusive and EndDate is exclusive. This representation is often the most convenient (in the same way that zero-based arrays are usually more convenient than 1-based ones in programming languages).
I added a RowID column to have unambiguous sorting.
Sample data
DECLARE @T TABLE
(
RowID int IDENTITY,
id int,
StartDate date,
EndDate date,
tp int
);
INSERT INTO @T(Id, StartDate, EndDate, tp) VALUES
(1, '2012-02-18', '2012-03-18', 1),
(1, '2012-03-17', '2012-06-29', 1),
(1, '2012-06-27', '2012-09-27', 1),
(1, '2014-08-23', '2014-09-24', 3),
(1, '2014-09-23', '2014-10-24', 3),
(1, '2014-10-23', '2014-11-24', 3),
(2, '2015-07-04', '2015-08-06', 1),
(2, '2015-08-04', '2015-09-06', 1),
(3, '2013-11-01', '2013-12-01', 0),
(3, '2018-01-09', '2018-02-09', 0);
-- Make EndDate an open interval bound, i.e. make it exclusive
-- [Start; End)
UPDATE @T
SET EndDate = DATEADD(day, 1, EndDate)
;
Recommended indexes
-- indexes to support solutions
CREATE UNIQUE INDEX idx_start_id ON T(id, tp, StartDate, RowID);
CREATE UNIQUE INDEX idx_end_id ON T(id, tp, EndDate, RowID);
Query
Read Itzik's post to understand what is going on; he has nice illustrations there. In short, each timestamp (start or end) is treated as an event. Each event has a + or - type. Each time we encounter a + event (some interval starts) we increase the running counter. Each time we encounter a - event (some interval ends) we decrease the running counter. When the running counter is 0, it means that the streak of overlapping intervals is over.
I took Itzik's query as is and simply changed the column names to match your names.
WITH C1 AS
-- let e = end ordinals, let s = start ordinals
(
SELECT
RowID, id, tp, StartDate AS ts, +1 AS EventType,
NULL AS e,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY StartDate, RowID) AS s
FROM @T
UNION ALL
SELECT
RowID, id, tp, EndDate AS ts, -1 AS EventType,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY EndDate, RowID) AS e,
NULL AS s
FROM @T
),
C2 AS
-- let se = start or end ordinal, namely, how many events (start or end) happened so far
(
SELECT C1.*,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY ts, EventType DESC, RowID) AS se
FROM C1
),
C3 AS
-- For start events, the expression s - (se - s) - 1 represents how many sessions were active
-- just before the current (hence - 1)
--
-- For end events, the expression (se - e) - e represents how many sessions are active
-- right after this one
--
-- The above two expressions are 0 exactly when a group of packed intervals
-- either starts or ends, respectively
--
-- After filtering only events when a group of packed intervals either starts or ends,
-- group each pair of adjacent start/end events
(
SELECT id, tp, ts,
((ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY ts) - 1) / 2 + 1)
AS grpnum
FROM C2
WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0
)
SELECT id, tp, MIN(ts) AS StartDate, DATEADD(day, -1, MAX(ts)) AS EndDate
FROM C3
GROUP BY id, tp, grpnum
ORDER BY id, tp, StartDate;
Result
+----+----+------------+------------+
| id | tp | StartDate | EndDate |
+----+----+------------+------------+
| 1 | 1 | 2012-02-18 | 2012-09-27 |
| 1 | 3 | 2014-08-23 | 2014-11-24 |
| 2 | 1 | 2015-07-04 | 2015-09-06 |
| 3 | 0 | 2013-11-01 | 2013-12-01 |
| 3 | 0 | 2018-01-09 | 2018-02-09 |
+----+----+------------+------------+
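The running-counter idea described in this answer can also be sketched procedurally. The Python below is not Itzik Ben-Gan's query, only an illustration of the same event sweep under the same [start; end) convention; `pack_intervals` and the `rows` list are hypothetical names, and the data is the question's sample.

```python
from datetime import date, timedelta

def pack_intervals(rows):
    """Merge overlapping inclusive [start, end] ranges per (id, tp) key."""
    events = []
    for key, start, end in rows:
        # Treat the end as exclusive (end + 1 day), as the answer's UPDATE does.
        events.append((key, start, +1))
        events.append((key, end + timedelta(days=1), -1))
    # Order by key, then timestamp; at equal timestamps starts (+1) precede ends (-1).
    events.sort(key=lambda e: (e[0], e[1], -e[2]))

    packed, active, group_start = [], 0, None
    for key, ts, kind in events:
        if kind == +1 and active == 0:
            group_start = ts                      # a new packed group begins
        active += kind
        if active == 0:                           # counter back to 0: group is over
            packed.append((key, group_start, ts - timedelta(days=1)))
    return packed

rows = [
    ((1, 1), date(2012, 2, 18), date(2012, 3, 18)),
    ((1, 1), date(2012, 3, 17), date(2012, 6, 29)),
    ((1, 1), date(2012, 6, 27), date(2012, 9, 27)),
    ((1, 3), date(2014, 8, 23), date(2014, 9, 24)),
    ((1, 3), date(2014, 9, 23), date(2014, 10, 24)),
    ((1, 3), date(2014, 10, 23), date(2014, 11, 24)),
    ((2, 1), date(2015, 7, 4), date(2015, 8, 6)),
    ((2, 1), date(2015, 8, 4), date(2015, 9, 6)),
    ((3, 0), date(2013, 11, 1), date(2013, 12, 1)),
    ((3, 0), date(2018, 1, 9), date(2018, 2, 9)),
]
packed = pack_intervals(rows)
for key, start, end in packed:
    print(key, start, end)
```

Since every start has a matching end within its own key, the counter is always back at 0 when the sweep moves on to the next (id, tp) pair, so no explicit reset is needed.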
create table #table
(Id int,StartDate date, EndDate date, Type int)
insert into #table
values
('1','2012-02-18','2012-03-18','1'),('1','2012-03-19','2012-06-19','1'),
('1','2012-06-27','2012-09-27','1'),('1','2014-08-23','2014-09-24','3'),
('1','2014-09-23','2014-10-24','3'),('1','2014-10-23','2014-11-24','3'),
('2','2015-07-04','2015-08-06','1'),('2','2015-08-04','2015-09-06','1'),
('3','2013-11-01','2013-12-01','0'),('3','2018-01-09','2018-02-09','0')
select ID,MIN(startdate)sd,MAX(EndDate)ed,type from #table
group by ID,TYPE,YEAR(startdate),YEAR(EndDate)
This can be achieved easily with some window functions and CTEs. Here is the solution:
DECLARE @table TABLE
(id INT,
StartDate DATE,
EndDate DATE,
[Type] INT
);
INSERT INTO @table(Id, StartDate, EndDate, [Type]) VALUES
(1, '2012-02-18', '2012-03-18', 1),
(1, '2012-03-17', '2012-06-29', 1),
(1, '2012-06-27', '2012-09-27', 1),
(1, '2014-08-23', '2014-09-24', 3),
(1, '2014-09-23', '2014-10-24', 3),
(1, '2014-10-23', '2014-11-24', 3),
(2, '2015-07-04', '2015-08-06', 1),
(2, '2015-08-04', '2015-09-06', 1),
(3, '2013-11-01', '2013-12-01', 0),
(3, '2018-01-09', '2018-02-09', 0);
WITH C1 AS
(
SELECT *,
MAX(EndDate) OVER(PARTITION BY Id, [Type]
ORDER BY StartDate, EndDate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PrevEnd
FROM @table
),
C2 AS
(
SELECT *,
SUM(StartFlag) OVER(PARTITION BY Id, [Type]
ORDER BY StartDate, EndDate
ROWS UNBOUNDED PRECEDING) AS GroupID
FROM C1
CROSS APPLY ( VALUES(CASE WHEN StartDate <= PrevEnd THEN NULL ELSE 1 END) ) AS A(StartFlag)
)
SELECT Id, [Type], MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate
FROM C2
GROUP BY Id, [Type], GroupID;
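The two CTEs above boil down to a running maximum of earlier end dates (PrevEnd) and a flag raised whenever a row starts after that maximum. A minimal Python sketch of that logic follows; `merge_overlaps` and the sample rows are hypothetical, chosen to include a short range nested inside a longer one, which is the case where comparing against only the immediately preceding row would split a group too early.

```python
from datetime import date
from itertools import groupby

def merge_overlaps(rows):
    """rows: (id, type, start, end) tuples; returns merged ranges per (id, type)."""
    merged = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2], r[3]))
    for (id_, type_), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        prev_end = None                          # running MAX(EndDate) of earlier rows
        for _, _, start, end in group:
            if prev_end is None or start > prev_end:
                merged.append([id_, type_, start, end])     # StartFlag = 1: new group
            else:
                merged[-1][3] = max(merged[-1][3], end)     # same group: extend its end
            prev_end = end if prev_end is None else max(prev_end, end)
    return [tuple(m) for m in merged]

# Hypothetical data: the second range is nested inside the first.
sample = [
    (1, 1, date(2018, 1, 1), date(2018, 1, 31)),
    (1, 1, date(2018, 1, 5), date(2018, 1, 10)),
    (1, 1, date(2018, 1, 20), date(2018, 2, 5)),
    (1, 1, date(2018, 2, 10), date(2018, 2, 12)),
]
merged = merge_overlaps(sample)
for row in merged:
    print(row)
```

The third row starts on 2018-01-20, after the second row ends, yet it still belongs to the first group because the running maximum is 2018-01-31.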

Finding correct date pair and eliminate the overlapped one in T-SQL

I have dates stored in two columns, StartDate and EndDate. I need to find and eliminate continuous (overlapping) date ranges in SQL: find the overlapping rows and delete them. I am already using the code below to find overlapping rows, passing the start date and end date as parameters.
Code I am using to find overlaps:
Select * from #t
where
((cast(@StartDate as datetime2)>=StartDate and cast(@EndDate as datetime2)<=EndDate)
OR (StartDate>= cast(@StartDate as datetime2) and EndDate<= cast(@EndDate as datetime2))
OR (cast(@StartDate as datetime2)>=StartDate AND cast(@StartDate as datetime2)<=EndDate)
OR (cast(@EndDate as datetime2)>=StartDate AND cast(@EndDate as datetime2)<=EndDate))
The query above works for a simple overlap such as:
Id Startdate Enddate
1 01/01/2020 01/11/2020
2 01/01/2020 01/03/2021
In that case I delete one row and keep the other.
But it fails for data like the example below. Here id 1 overlaps with id 2, and id 2 overlaps with both 1 and 3, so the query shows both 1 and 2 as rows to delete. In my case, 1 and 3 should not be deleted; only 2 should be, since 2 overlaps both of the others while 1 and 3 already cover good date periods.
For example:
Id Startdate Enddate
1 01/01/2020 01/11/2020
2 01/01/2020 01/03/2021
3 02/11/2020 05/04/2022
In the example above there are three pairs of dates: ids 1 and 3 are in correct intervals, and id 2 overlaps both of them. I need to find either the overlapping rows or the non-overlapping rows; either result works for me.
My expected result is:
Id Startdate Enddate
2 01/01/2020 01/03/2021
Another example is:
Id Startdate Enddate
1 01/01/2020 01/11/2020
2 02/11/2020 06/05/2022
3 02/11/2020 05/04/2022
In the example above, ids 1 and 2 are in correct date periods, but id 3 overlaps with id 2. I want to find only that overlapping row; I don't need the other data.
Another example is
Id Startdate Enddate
3 02/11/2020 05/04/2022
I used the second set of data, but this should work for the first set as well. However, I have a doubt about the expected output for your first record set. If you can clear it up, I can check again.
Create table OverlapData_1
(
id int
, Startdate date
, EndDate date
)
insert into OverlapData_1 values(1, '01/01/2020','01/11/2020')
insert into OverlapData_1 values(2, '01/01/2020','01/03/2021')
insert into OverlapData_1 values(3, '02/11/2020','05/04/2022')
SELECT A.[id]
,A.Startdate
,A.EndDate FROM
(
SELECT
CASE WHEN Startdate between LAG(StartDate) OVER ( order by id) and LAG(EndDate) OVER ( order by id) THEN 1 else 0 end as [status_1]
, CASE WHEN EndDate between LAG(StartDate) OVER ( order by id) and LAG(EndDate) OVER ( order by id) THEN 1 else 0 end as [status_2]
, CASE WHEN StartDate between LEAD(StartDate) OVER ( order by id) and LEAD(EndDate) OVER ( order by id) THEN 1 else 0 end as [status_3]
, CASE WHEN EndDate between LEAD(StartDate) OVER ( order by id) and LEAD(EndDate) OVER ( order by id) THEN 1 else 0 end as [status_4]
,*
FROM OverlapData_1
) AS A
WHERE (A.status_1 = 1 AND A.status_2 = 1)
OR (A.status_1 = 1 AND A.status_4 = 1)
Create table OverlapData_2
(
id int
, Startdate date
, EndDate date
)
insert into OverlapData_2 values(1, '01/01/2020','01/11/2020')
insert into OverlapData_2 values(2, '01/01/2020','06/05/2022')
insert into OverlapData_2 values(3, '02/11/2020','05/04/2022')
SELECT A.[id]
,A.Startdate
,A.EndDate FROM
(
SELECT
CASE WHEN Startdate between LAG(StartDate) OVER ( order by id) and LAG(EndDate) OVER ( order by id) THEN 1 else 0 end as [status_1]
, CASE WHEN EndDate between LAG(StartDate) OVER ( order by id) and LAG(EndDate) OVER ( order by id) THEN 1 else 0 end as [status_2]
, CASE WHEN StartDate between LEAD(StartDate) OVER ( order by id) and LEAD(EndDate) OVER ( order by id) THEN 1 else 0 end as [status_3]
, CASE WHEN EndDate between LEAD(StartDate) OVER ( order by id) and LEAD(EndDate) OVER ( order by id) THEN 1 else 0 end as [status_4]
,*
FROM OverlapData_2
) AS A
WHERE (A.status_1 = 1 AND A.status_2 = 1)
OR (A.status_1 = 1 AND A.status_4 = 1)

Group rows by dense_rank() and loop through each sub-group and compare another column in next row of that sub group?

I have tried the following in LINQPad:
create table users
(
id int not null,
startdate datetime not null,
enddate datetime not null
)
go
insert into users(id, startdate, enddate) values(1, '01/01/2000', '01/02/2000')
insert into users(id, startdate, enddate) values(1, '01/03/2000', '01/04/2000')
insert into users(id, startdate, enddate) values(2, '01/01/2000', '01/02/2000')
insert into users(id, startdate, enddate) values(2, '01/03/2000', '01/04/2000')
insert into users(id, startdate, enddate) values(2, '01/05/2000', '01/06/2000')
insert into users(id, startdate, enddate) values(3, '01/01/2000', '01/02/2000')
insert into users(id, startdate, enddate) values(3, '01/03/2000', '01/04/2000')
insert into users(id, startdate, enddate) values(3, '01/06/2000', '01/07/2000')
insert into users(id, startdate, enddate) values(4, '01/01/2000', '01/02/2000')
go
select * from users
go
// This query gave the result seen in the image
select id, startdate, enddate, rownum = dense_rank() over(partition by id order by enddate) from users
I want to write a query which will return only the IDs 1 and 2 (not 3 and 4) because:
ID 1 - has more than 1 row, and the startdate of its rownum 2 is 1 day ahead of the enddate of its rownum 1
ID 2 - has more than 1 row, and the startdate of its rownum n + 1 is 1 day ahead of the enddate of its rownum n
ID 3 - THOUGH it has more than 1 row, the startdate of its rownum 3 is NOT 1 day ahead (but 2 days) of the enddate of its rownum 2. Hence, it is not qualified
ID 4 - DOES NOT HAVE more than 1 row. Hence, it is not qualified
Could you let me know how to get this result please?
You could use window function lag() to recover the previous enddate, then aggregation and filter in the having clause:
select id
from (
select
t.*,
lag(enddate) over(partition by id order by enddate) lag_enddate
from users t
) t
group by id
having
count(*) > 1
and max(case
when lag_enddate is null or startdate = dateadd(day, 1, lag_enddate)
then 0 else 1
end) = 0
Demo on DB Fiddle:
| id |
| -: |
| 1 |
| 2 |
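For readers who want to trace the lag-and-having logic step by step, here is a small Python sketch of the same check. It is an illustration only, not the answer's query: `ids_with_contiguous_ranges` and `users` are hypothetical names, and the rows are the question's sample data with the dates read as month/day/year.

```python
from collections import defaultdict
from datetime import date, timedelta

def ids_with_contiguous_ranges(rows):
    """Ids with more than one row where each start is the day after the previous end."""
    by_id = defaultdict(list)
    for id_, start, end in rows:
        by_id[id_].append((start, end))
    result = []
    for id_, ranges in sorted(by_id.items()):
        ranges.sort(key=lambda r: r[1])          # order by enddate, as lag() does
        gaps_ok = all(
            cur_start == prev_end + timedelta(days=1)
            for (_, prev_end), (cur_start, _) in zip(ranges, ranges[1:])
        )
        if len(ranges) > 1 and gaps_ok:          # count(*) > 1 and no bad gap
            result.append(id_)
    return result

users = [
    (1, date(2000, 1, 1), date(2000, 1, 2)),
    (1, date(2000, 1, 3), date(2000, 1, 4)),
    (2, date(2000, 1, 1), date(2000, 1, 2)),
    (2, date(2000, 1, 3), date(2000, 1, 4)),
    (2, date(2000, 1, 5), date(2000, 1, 6)),
    (3, date(2000, 1, 1), date(2000, 1, 2)),
    (3, date(2000, 1, 3), date(2000, 1, 4)),
    (3, date(2000, 1, 6), date(2000, 1, 7)),
    (4, date(2000, 1, 1), date(2000, 1, 2)),
]
qualified = ids_with_contiguous_ranges(users)
print(qualified)  # [1, 2]
```

The first row of each id has no predecessor, which corresponds to the `lag_enddate is null` branch of the case expression.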
In archaic versions of SQL Server, that do not support window functions, you can emulate lag() with a correlated subquery:
select id
from (
select
t.*,
(select max(enddate) from users t1 where t1.id = t.id and t1.enddate < t.enddate) lag_enddate
from users t
) t
group by id
having
count(*) > 1
and max(case when lag_enddate is null or startdate = dateadd(day, 1, lag_enddate) then 0 else 1 end) = 0
Demo on DB Fiddle

breakdown by weeks

Below is a simple query and the result. Is there a way to aggregate the total EVENTs by 7 days, then sum up the total EVENTs? Would a rollup function work? I am using SQL Server 2005 and 2008. Thanks again, folks.
SELECT DATE_SOLD, count(DISTINCT PRODUCTS) AS PRODUCT_SOLD
FROM PRODUCTS
WHERE DATE >='10/1/2009'
and DATE <'10/1/2010'
GROUP BY DATE_SOLD
RESULTS:
DATE_SOLD PRODUCT_SOLD
10/1/09 5
10/2/09 11
10/3/09 14
10/4/09 6
10/5/09 11
10/6/09 13
10/7/09 10
Total 70
10/8/09 4
10/9/09 11
10/10/09 8
10/11/09 4
10/12/09 7
10/13/09 4
10/14/09 9
Total 47
Not having your table design to work with here's what I think you are after (although I have to admit the output needs to be cleaned up). It should, at least, get you some way to the solution you are looking for.
CREATE TABLE MyTable(
event_date date,
event_type char(1)
)
GO
INSERT MyTable VALUES ('2009-1-01', 'A')
INSERT MyTable VALUES ('2009-1-11', 'B')
INSERT MyTable VALUES ('2009-1-11', 'C')
INSERT MyTable VALUES ('2009-1-20', 'N')
INSERT MyTable VALUES ('2009-1-20', 'N')
INSERT MyTable VALUES ('2009-5-23', 'D')
INSERT MyTable VALUES ('2009-5-23', 'E')
INSERT MyTable VALUES ('2009-5-10', 'F')
INSERT MyTable VALUES ('2009-5-10', 'F')
GO
WITH T AS (
SELECT DATEPART(MONTH, event_date) event_month, event_date, event_type
FROM MyTable
)
SELECT CASE WHEN (GROUPING(event_month) = 0)
THEN event_month ELSE '99' END AS event_month,
CASE WHEN (GROUPING(event_date) = 1)
THEN '9999-12-31' ELSE event_date END AS event_date,
COUNT(DISTINCT event_type) AS event_count
FROM T
GROUP BY event_month, event_date WITH ROLLUP
ORDER BY event_month, event_date
This gives the following output:
event_month event_date event_count
1 2009-01-01 1
1 2009-01-11 2
1 2009-01-20 1
1 9999-12-31 4
5 2009-05-10 1
5 2009-05-23 2
5 9999-12-31 3
99 9999-12-31 7
Here the '99' in event_month and the '9999-12-31' in event_date mark the total rows.
SELECT DATEDIFF(week, 0, DATE_SOLD) AS [Week],
DATEADD(week, DATEDIFF(week, 0, DATE_SOLD), 0) AS [From],
DATEADD(week, DATEDIFF(week, 0, DATE_SOLD), 0) + 6 AS [To],
COUNT(DISTINCT PRODUCTS) PRODUCT_SOLD
FROM dbo.PRODUCTS
WHERE DATE >= '2009-10-01'
AND DATE < '2010-10-01'
GROUP BY DATEDIFF(week, 0, DATE_SOLD) WITH ROLLUP
ORDER BY DATEDIFF(week, 0, DATE_SOLD)
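The query above buckets by calendar week, which may not line up with the 7-day blocks in the question's expected output, where the first block starts on the first day of the range. As a rough sketch of that alternative, the Python below counts 7-day buckets from a chosen start date. `weekly_totals` is a hypothetical helper, and it sums the per-day figures exactly as the question's "Total" rows do; a true distinct count per week would need the raw rows instead.

```python
from datetime import date, timedelta

def weekly_totals(daily_counts, range_start):
    """Sum per-day counts into 7-day buckets counted from range_start."""
    totals = {}
    for day, count in daily_counts.items():
        bucket = (day - range_start).days // 7   # 0 for days 1-7, 1 for days 8-14, ...
        bucket_start = range_start + timedelta(days=7 * bucket)
        totals[bucket_start] = totals.get(bucket_start, 0) + count
    return dict(sorted(totals.items()))

# Daily PRODUCT_SOLD figures from the question, starting 2009-10-01.
range_start = date(2009, 10, 1)
figures = [5, 11, 14, 6, 11, 13, 10, 4, 11, 8, 4, 7, 4, 9]
daily = {range_start + timedelta(days=i): n for i, n in enumerate(figures)}

totals = weekly_totals(daily, range_start)
print({d.isoformat(): n for d, n in totals.items()})
# {'2009-10-01': 70, '2009-10-08': 47}
```

In T-SQL the same bucket number could presumably be derived with an integer division of a day difference from the range start, but that variant is not part of either answer here.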