Count previous consecutive rows in SQL Server - sql

I have an attendance data list, shown below. I am trying to query a specific date range (01/05/2016 to 07/05/2016) with a Total Present column, which is calculated from the preceding present (P) rows. Suppose today is 04/05/2016: if a person has status 'P' on 01, 02, 03 and 04, then the row for 04-05-2016 should show a total present of 4.
Could you help me find the total present from this result set?

You can check this example, which has the logic to calculate a running total over the previous rows.
declare @t table (employeeid int, datecol date, status varchar(2) )
insert into @t values (10001, '01-05-2016', 'P'),
(10001, '02-05-2016', 'P'),
(10001, '03-05-2016', 'P'),
(10001, '04-05-2016', 'P'),
(10001, '05-05-2016', 'A'),
(10001, '06-05-2016', 'P'),
(10001, '07-05-2016', 'P'),
(10001, '08-05-2016', 'L'),
(10002, '07-05-2016', 'P'),
(10002, '08-05-2016', 'L')
--select * from @t
-- running count of 'P' rows per employee, in date order
select * ,
SUM(case when status = 'P' then 1 else 0 end) OVER (PARTITION BY employeeid ORDER BY employeeid, datecol
ROWS BETWEEN UNBOUNDED PRECEDING
AND current row) as TotalPresent
from
@t
Another twist on the same thing via a CTE (since you mentioned SQL Server 2012: the solution below only works in SQL Server 2012 and above)
;with cte as
(
-- number the 'P' rows per employee
select employeeid , datecol , ROW_NUMBER() over(partition by employeeid order by employeeid, datecol) rowno
from
@t where status = 'P'
)
-- rows that are not 'P' have no rowno, so they take the value from the previous row
select t.*, cte.rowno ,
case when ( isnull(cte.rowno, 0) = 0)
then LAG(cte.rowno) OVER (ORDER BY t.employeeid, t.datecol)
else cte.rowno
end LagValue
from @t t left join cte on t.employeeid = cte.employeeid and t.datecol = cte.datecol
order by t.employeeid, t.datecol

You could use a subquery to calculate TotalPresent for each row:
SELECT
main.EmployeeID,
main.[Date],
main.[Status],
(
SELECT SUM(CASE WHEN t.[Status] = 'P' THEN 1 ELSE 0 END)
FROM [TableName] t
WHERE t.EmployeeID = main.EmployeeID AND t.[Date] <= main.[Date]
) as TotalPresent
FROM [TableName] main
ORDER BY
main.EmployeeID,
main.[Date]
Here I used a subquery that sums over the records that have the same EmployeeID and a date less than or equal to the date of the current row. A record with status 'P' adds 1 to the sum and any other record adds 0, so only records with status P are counted.
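The title asks about consecutive rows, while the queries above keep a cumulative count of 'P' rows per employee that never resets. The difference between the two readings can be sketched in Python. This is only an illustration, not part of the original answers: the function names are made up, the dates are reduced to day numbers, and the rows are assumed to be already sorted by employee and date.

```python
from itertools import groupby

# Sample rows (employee_id, day, status), assumed sorted by employee and day.
rows = [
    (10001, 1, "P"), (10001, 2, "P"), (10001, 3, "P"), (10001, 4, "P"),
    (10001, 5, "A"), (10001, 6, "P"), (10001, 7, "P"), (10001, 8, "L"),
    (10002, 7, "P"), (10002, 8, "L"),
]

def cumulative_present(rows):
    """Running count of 'P' per employee; it never resets (the SQL above)."""
    out = []
    for _, group in groupby(rows, key=lambda r: r[0]):
        total = 0
        for emp, day, status in group:
            total += 1 if status == "P" else 0
            out.append((emp, day, status, total))
    return out

def consecutive_present(rows):
    """Length of the current unbroken run of 'P'; any other status resets it to 0."""
    out = []
    for _, group in groupby(rows, key=lambda r: r[0]):
        run = 0
        for emp, day, status in group:
            run = run + 1 if status == "P" else 0
            out.append((emp, day, status, run))
    return out

cumulative = [r[3] for r in cumulative_present(rows)]
consecutive = [r[3] for r in consecutive_present(rows)]
print(cumulative)
print(consecutive)
```

Both versions agree up to the first absence; after that the cumulative count keeps growing while the consecutive count starts again from 1.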

Interesting question, this should work:
select *
, (select count(retail) from p g
where g.date <= p.date and g.id = p.id and retail = 'P')
from p
order by ID, Date;
So I believe I understand correctly: you would like to count the occurrences of P per ID, date by date.
This makes a lot of sense. That is why the first occurrence of ID2 was L and the Total is 0. This query counts the P statuses so far for each row and does not increase on a non-P row for each ID.

Related

Collapse multiple rows into a single row based upon a break condition

I have a simple-sounding requirement that has had me stumped for a day or so now, so it's time to seek help from the experts.
My requirement is simply to roll up multiple rows into a single row based upon a break condition: when any of the columns Employee ID, Allowance Plan, Allowance Amount or To Date changes, the row is to be kept, if that makes sense.
An example source data set is shown below:
and the target data after collapsing the rows should look like this:
As you can see I don't need any type of running totals calculating I just need to collapse the rows into a single record per from date/to date combination.
So far I have tried the following SQL using a GROUP BY and MIN function
select [Employee ID], [Allowance Plan],
min([From Date]), max([To Date]), [Allowance Amount]
from [dbo].[#AllowInfo]
group by [Employee ID], [Allowance Plan], [Allowance Amount]
but that just gives me a single row and does not take the break condition into account.
What do I need to do so that the records are rolled up (correct me if that is not the right terminology) correctly, taking the break condition into account?
Any help is appreciated.
Thank you.
Note that your test data does not really exercise the algo that well - e.g. you only have one employee, one plan. Also, as you described it, you would end up with 4 rows as there is a change of todate between 7->8, 8->9, 9->10 and 10->11.
But I can see what you are trying to do, so this should at least get you on the right track, and returns the expected 3 rows. I have taken the end of a group to be where either employee/plan/amount has changed, or where todate is not null (or where we reach the end of the data)
CREATE TABLE #data
(
RowID INT,
EmployeeID INT,
AllowancePlan VARCHAR(30),
FromDate DATE,
ToDate DATE,
AllowanceAmount DECIMAL(12,2)
);
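-- the date literals below are day/month/year, so make sure they are parsed that way
SET DATEFORMAT dmy;
INSERT INTO #data(RowID, EmployeeID, AllowancePlan, FromDate, ToDate, AllowanceAmount)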
VALUES
(1,200690,'CarAllowance','30/03/2017', NULL, 1000.0),
(2,200690,'CarAllowance','01/08/2017', NULL, 1000.0),
(6,200690,'CarAllowance','23/04/2018', NULL, 1000.0),
(7,200690,'CarAllowance','30/03/2018', NULL, 1000.0),
(8,200690,'CarAllowance','21/06/2018', '01/04/2019', 1000.0),
(9,200690,'CarAllowance','04/11/2021', NULL, 1000.0),
(10,200690,'CarAllowance','30/03/2017', '13/05/2022', 1000.0),
(11,200690,'CarAllowance','14/05/2022', NULL, 850.0);
-- find where the break points are
WITH chg AS
(
SELECT *,
CASE WHEN LAG(EmployeeID, 1, -1) OVER(ORDER BY RowID) != EmployeeID
OR LAG(AllowancePlan, 1, 'X') OVER(ORDER BY RowID) != AllowancePlan
OR LAG(AllowanceAmount, 1, -1) OVER(ORDER BY RowID) != AllowanceAmount
OR LAG(ToDate, 1) OVER(ORDER BY RowID) IS NOT NULL
THEN 1 ELSE 0 END AS NewGroup
FROM #data
),
-- count the number of break points as we go to group the related rows
grp AS
(
SELECT chg.*,
ISNULL(
SUM(NewGroup)
OVER (ORDER BY RowID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
0) AS grpNum
FROM chg
)
SELECT MIN(grp.RowID) AS RowID,
MAX(grp.EmployeeID) AS EmployeeID,
MAX(grp.AllowancePlan) AS AllowancePlan,
MIN(grp.FromDate) AS FromDate,
MAX(grp.ToDate) AS ToDate,
MAX(grp.AllowanceAmount) AS AllowanceAmount
FROM grp
GROUP BY grpNum
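The two steps of the query above (flag each row that starts a new group, then take a running sum of the flags to get a group number) can be sketched in Python. This is a rough illustration under stated assumptions: the sample rows are hypothetical and simplified, the helper name starts_new_group is made up, and dates are ISO strings so that plain string comparison orders them correctly.

```python
from itertools import accumulate

# Hypothetical rows: (row_id, employee_id, plan, from_date, to_date, amount).
rows = [
    (1, 200690, "CarAllowance", "2017-03-30", None, 1000.0),
    (2, 200690, "CarAllowance", "2017-08-01", None, 1000.0),
    (3, 200690, "CarAllowance", "2018-06-21", "2019-04-01", 1000.0),
    (4, 200690, "CarAllowance", "2021-11-04", "2022-05-13", 1000.0),
    (5, 200690, "CarAllowance", "2022-05-14", None, 850.0),
]

def starts_new_group(prev, cur):
    """Mirror of the NewGroup flag: first row, changed key, or closed previous row."""
    if prev is None:
        return 1
    key_changed = (prev[1], prev[2], prev[5]) != (cur[1], cur[2], cur[5])
    prev_closed = prev[4] is not None
    return int(key_changed or prev_closed)

flags = [starts_new_group(p, c) for p, c in zip([None] + rows[:-1], rows)]
group_ids = list(accumulate(flags))  # running sum of the flags = group number

collapsed = {}
for gid, (row_id, emp, plan, start, end, amount) in zip(group_ids, rows):
    g = collapsed.setdefault(gid, [row_id, emp, plan, start, end, amount])
    g[3] = min(g[3], start)  # earliest from_date in the group
    if end is not None:      # latest non-null to_date, like MAX() skipping NULLs
        g[4] = end if g[4] is None else max(g[4], end)

result = [tuple(v) for v in collapsed.values()]
for line in result:
    print(line)
```

Because the group number only ever increases, rows can be collapsed in a single pass once they are in RowID order.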
One way is to give every row with an empty ToDate the next non-null ToDate, and then group on that:
select min(t.RowID) as RowID,
t.EmployeeID,
min(t.AllowancePlan) as AllowancePlan,
min(t.FromDate) as FromDate,
max(t.ToDate) as ToDate,
min(t.AllowanceAmount) as AllowanceAmount
from ( select t.RowID,
t.EmployeeID,
t.FromDate,
t.AllowancePlan,
t.AllowanceAmount,
case when t.ToDate is null then ( select top 1 t2.ToDate
from test t2
where t2.EmployeeID = t.EmployeeID
and t2.ToDate is not null
and t2.FromDate > t.FromDate -- t2.RowID > t.RowID
order by t2.RowID, t2.FromDate
)
else t.ToDate
end as todate
from test t
) t
group by t.EmployeeID, t.ToDate
order by t.EmployeeID, min(t.RowID)
See and test yourself in this DBFiddle
The result is:
RowID  EmployeeID  AllowancePlan  FromDate    ToDate      AllowanceAmount
1      200690      CarAllowance   2017-03-30  2019-04-01  1000
9      200690      CarAllowance   2021-11-04  2022-05-13  1000
11     200690      CarAllowance   2022-05-14  (null)      850

Find date of most recent overdue

I have the following problem: from the table of pays and dues, I need to find the date of the last overdue. Here is the table and data for example:
create table t (
Id int
, [date] date
, Customer varchar(6)
, Deal varchar(6)
, Currency varchar(3)
, [Sum] int
);
insert into t values
(1, '2017-12-12', '1110', '111111', 'USD', 12000)
, (2, '2017-12-25', '1110', '111111', 'USD', 5000)
, (3, '2017-12-13', '1110', '122222', 'USD', 10000)
, (4, '2018-01-13', '1110', '111111', 'USD', -10100)
, (5, '2017-11-20', '2200', '222221', 'USD', 25000)
, (6, '2017-12-20', '2200', '222221', 'USD', 20000)
, (7, '2017-12-31', '2201', '222221', 'USD', -10000)
, (8, '2017-12-29', '1110', '122222', 'USD', -10000)
, (9, '2017-11-28', '2201', '222221', 'USD', -30000);
If the value of "Sum" is positive, it means an overdue has begun; if "Sum" is negative, it means someone paid on this Deal.
In the example above, on Deal '122222' the overdue starts on 2017-12-13 and ends on 2017-12-29, so it shouldn't be in the result.
And for Deal '222221' the first overdue of 25000, which started on 2017-11-20, was completely paid on 2017-11-28, so the last date of current overdue (we are interested in) is 2017-12-31
I've made this selection to sum up all the payments, and got stuck here :(
WITH cte AS (
SELECT *,
SUM([Sum]) OVER(PARTITION BY Deal ORDER BY [Date]) AS Debt_balance
FROM t
)
Apparently I need to find (for each Deal) the minimum of the dates if there is no 0 or negative Debt_balance, and the next date after the last 0 balance otherwise.
I will be grateful for any tips and ideas on the subject.
Thanks!
UPDATE
My version of solution:
WITH cte AS (
SELECT ROW_NUMBER() OVER (ORDER BY Deal, [Date]) id,
Deal, [Date], [Sum],
SUM([Sum]) OVER(PARTITION BY Deal ORDER BY [Date]) AS Debt_balance
FROM t
)
SELECT a.Deal,
SUM(a.Sum) AS NET_Debt,
isnull(max(b.date), min(a.date)),
datediff(day, isnull(max(b.date), min(a.date)), getdate())
FROM cte as a
LEFT OUTER JOIN cte AS b
ON a.Deal = b.Deal AND a.Debt_balance <= 0 AND b.Id=a.Id+1
GROUP BY a.Deal
HAVING SUM(a.Sum) > 0
I believe you are trying to use running sum and keep track of when it changes to positive, and it can change to positive multiple times and you want the last date at which it became positive. You need LAG() in addition to running sum:
WITH cte1 AS (
-- running balance column
SELECT *
, SUM([Sum]) OVER (PARTITION BY Deal ORDER BY [Date], Id) AS RunningBalance
FROM t
), cte2 AS (
-- overdue begun column - set whenever running balance changes from l.t.e. zero to g.t. zero
SELECT *
, CASE WHEN LAG(RunningBalance, 1, 0) OVER (PARTITION BY Deal ORDER BY [Date], Id) <= 0 AND RunningBalance > 0 THEN 1 END AS OverdueBegun
FROM cte1
)
-- eliminate groups that are paid i.e. sum = 0
SELECT Deal, MAX(CASE WHEN OverdueBegun = 1 THEN [Date] END) AS RecentOverdueDate
FROM cte2
GROUP BY Deal
HAVING SUM([Sum]) <> 0
Demo on db<>fiddle
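The logic of the query above (keep a running balance per Deal and remember the last date on which it went from zero or below to above zero) can be traced in Python on the question's sample rows. This is a sketch for illustration only; the function name last_overdue_start is made up, and the final filter mirrors HAVING SUM([Sum]) <> 0.

```python
from collections import defaultdict

# (id, date, deal, amount) from the question's sample data.
rows = [
    (1, "2017-12-12", "111111", 12000),
    (2, "2017-12-25", "111111", 5000),
    (3, "2017-12-13", "122222", 10000),
    (4, "2018-01-13", "111111", -10100),
    (5, "2017-11-20", "222221", 25000),
    (6, "2017-12-20", "222221", 20000),
    (7, "2017-12-31", "222221", -10000),
    (8, "2017-12-29", "122222", -10000),
    (9, "2017-11-28", "222221", -30000),
]

def last_overdue_start(rows):
    """Per deal: last date the running balance crossed from <= 0 to > 0."""
    by_deal = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r[2], r[1], r[0])):
        by_deal[row[2]].append(row)

    result = {}
    for deal, items in by_deal.items():
        balance, started = 0, None
        for _id, date, _deal, amount in items:
            previous, balance = balance, balance + amount
            if previous <= 0 < balance:  # same test as the OverdueBegun flag
                started = date
        if balance != 0:                 # drop deals that are fully paid
            result[deal] = started
    return result

overdue = last_overdue_start(rows)
print(overdue)
```

Deal '122222' ends with a balance of 0 and is dropped, and for Deal '222221' the crossing on 2017-12-20 replaces the earlier one on 2017-11-20.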
You can use window functions. These can calculate intermediate values:
Last day when the sum is negative (i.e. last "good" record).
Last sum
Then you can combine these:
select deal, min(date) as last_overdue_start_date
from (select t.*,
first_value(sum) over (partition by deal order by date desc) as last_sum,
max(case when sum < 0 then date end) over (partition by deal order by date) as max_date_neg
from t
) t
where last_sum > 0 and date > max_date_neg
group by deal;
Actually, the value on the last date is not necessary. So this simplifies to:
select deal, min(date) as last_overdue_start_date
from (select t.*,
max(case when sum < 0 then date end) over (partition by deal order by date) as max_date_neg
from t
) t
where date > max_date_neg
group by deal;

SQL find average time difference between rows for a given category

I browsed SO but could not quite find the exact answer or maybe it was for a different language.
Let's say I have a table, where each row is a record of a trade:
trade_id customer trade_date
1 A 2013-05-01 00:00:00
2 B 2013-05-01 10:00:00
3 A 2013-05-02 00:00:00
4 A 2013-05-05 00:00:00
5 B 2013-05-06 12:00:00
I would like to have the average time between trades, in days or fractions of days, for each customer, and the number of days since the last trade. So for instance for customer A, the time between trades 1 and 3 is 1 day and between trades 3 and 4 is 3 days, for an average of 2. So the end table would look something like this (assuming today is 2013-05-10):
customer avg_time_btw_trades time_since_last_trade
A 2.0 5.0
B 5.08 3.5
If a customer has only got 1 trade I guess NULL is fine as output.
Not even sure SQL is the best way to do this (I am working with SQL server), but any help is appreciated!
SELECT
customer,
DATEDIFF(second, MIN(trade_date), MAX(trade_date)) / (NULLIF(COUNT(*), 1) - 1) / 86400.0,
DATEDIFF(second, MAX(trade_date), GETDATE() ) / 86400.0
FROM
yourTable
GROUP BY
customer
http://sqlfiddle.com/#!6/eb46e/7
EDIT: Added final field that I didn't notice, apologies.
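The query above works without looking at individual gaps because the gaps between consecutive trades add up to the span from the first trade to the last, so the average gap is (last - first) / (n - 1). A short Python check of that identity on the question's data follows; it is an illustration only, the function name trade_stats is made up, and "now" is fixed to 2013-05-10 as in the question.

```python
from datetime import datetime

trades = {
    "A": ["2013-05-01 00:00:00", "2013-05-02 00:00:00", "2013-05-05 00:00:00"],
    "B": ["2013-05-01 10:00:00", "2013-05-06 12:00:00"],
}
now = datetime(2013, 5, 10)  # fixed "today", as assumed in the question

def trade_stats(timestamps, now):
    """Return (average gap from gaps, average gap from end points, days since last)."""
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    day = 86400.0
    gaps = [(b - a).total_seconds() / day for a, b in zip(times, times[1:])]
    avg_from_gaps = sum(gaps) / len(gaps) if gaps else None
    avg_from_ends = (
        (times[-1] - times[0]).total_seconds() / day / (len(times) - 1)
        if len(times) > 1 else None  # a single trade has no gap, like NULLIF above
    )
    since_last = (now - times[-1]).total_seconds() / day
    return avg_from_gaps, avg_from_ends, since_last

stats = {customer: trade_stats(ts, now) for customer, ts in trades.items()}
for customer, values in stats.items():
    print(customer, values)
```

Both ways of computing the average give the same number, which is why MIN, MAX and COUNT are enough in the GROUP BY version.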
The following SQL script uses your data and gives the expected results.
DECLARE @temp TABLE
( trade_id INT,
customer CHAR(1),
trade_date DATETIME );
INSERT INTO @temp VALUES (1, 'A', '20130501');
INSERT INTO @temp VALUES (2, 'B', '20130501 10:00');
INSERT INTO @temp VALUES (3, 'A', '20130502');
INSERT INTO @temp VALUES (4, 'A', '20130505');
INSERT INTO @temp VALUES (5, 'B', '20130506 12:00');
DECLARE @getdate DATETIME
-- SET @getdate = getdate();
SET @getdate = '20130510';
SELECT s.customer
, AVG(s.days_btw_trades) AS avg_time_between_trades
, CAST(DATEDIFF(hour, MAX(s.trade_date), @getdate) AS float)
/ 24.0 AS time_since_last_trade
FROM (
-- pair every trade with the same customer's previous trade (NULL for the first one)
SELECT CAST(DATEDIFF(HOUR, t2.trade_date, t.trade_date) AS float)
/ 24.0 AS days_btw_trades
, t.customer
, t.trade_date
FROM @temp t
LEFT JOIN @temp t2 ON t2.customer = t.customer
AND t2.trade_date = ( SELECT MAX(t3.trade_date)
FROM @temp t3
WHERE t3.customer = t.customer
AND t3.trade_date < t.trade_date)
) s
GROUP BY s.customer
You need a date difference between every trade and average them.
select
a.customer
,avg(datediff(a.trade_date, b.trade_date))
,datediff(now(),max(a.trade_date))
from yourTable a, yourTable b
where a.customer = b.customer
and b.trade_date = (
select max(trade_date)
from yourTable c
where c.customer = a.customer
and a.trade_date > c.trade_date)
#gets the one earlier date for every trade
group by a.customer
Just for grins I added a solution that uses CTEs. You could probably use a temp table if the first query is too large. I used @MatBailie's creation script for the table:
CREATE TABLE customer_trades (
id INT IDENTITY(1,1),
customer_id INT,
trade_date DATETIME,
PRIMARY KEY (id),
INDEX ix_user_trades (customer_id, trade_date)
)
INSERT INTO
customer_trades (
customer_id,
trade_date
)
VALUES
(1, '2013-05-01 00:00:00'),
(2, '2013-05-01 10:00:00'),
(1, '2013-05-02 00:00:00'),
(1, '2013-05-05 00:00:00'),
(2, '2013-05-06 12:00:00')
;
;WITH CTE as(
select customer_id, trade_date, datediff(hour,trade_date,ISNULL(LEAD(trade_date,1) over (partition by customer_id order by trade_date),GETDATE())) Trade_diff
from customer_trades
)
, CTE2 as
(SELECT customer_id, trade_diff, LAST_VALUE(trade_diff) OVER(Partition by customer_id order by trade_date) Curr_Trade from CTE)
SELECT Customer_id, AVG(trade_diff) AV, Max(Curr_Trade) Curr_Trade
FROM CTE2
GROUP BY customer_id

What will be the best possible way to find date difference?

I have a table for operators in which I want to calculate the time difference between two statuses (10 and 20) over the whole day.
Here I want the time difference between "ActivityStatus" 10 and 20.
We have three 10-20 pairs in this picture. The last status 10 has no status 20 after it; in that case the last oa_createdDate (i.e. oa_id 230141) is used.
My expected output for this operator is the date diff between cl_id 230096 and 230102, the date diff between cl_id 230103 and 230107, and the date diff between cl_id 230109 and cl_id 230141. Once I have these differences I want to sum them to calculate the busy time for that operator.
Thanks in advance .
I have a sneaking suspicion that the DateDiff() function is the function that you seek
http://www.w3schools.com/sql/func_datediff.asp
There's an easy way to do what I assume you want done with outer apply, like so:
select tmin.*, t.oa_CreateDate oa_CreateDate_20
, datediff(minute, tmin.oa_CreateDate, t.oa_CreateDate) DiffInMinutes
from testtable t
cross apply
(select top 1 *
from testtable tmin
where tmin.oa_CreateDate < t.oa_CreateDate and tmin.oa_OperatorId = t.oa_OperatorId
order by tmin.oa_CreateDate asc) tmin
where t.ActivityStatus = 20
and t.oa_CreateDate < (select min(oa_CreateDate) from testtable where ActivityStatus = 10 and oa_OperatorId = 1960)
and t.oa_OperatorId = 1960
union all
select t.*
, coalesce(a.oa_CreateDate,ma.MaxDate) oa_CreateDate_20
, datediff(minute, t.oa_CreateDate, coalesce(a.oa_CreateDate,ma.MaxDate)) DiffInMinutes
from testtable t
outer apply
(select top 1 a.oa_CreateDate
from testtable a
where a.oa_OperatorId = t.oa_OperatorId and a.ActivityStatus = 20
and t.oa_CreateDate < a.oa_CreateDate order by a.oa_CreateDate asc) a
outer apply
(select max(a2.oa_CreateDate) maxDate
from testtable a2
where a2.oa_OperatorId = t.oa_OperatorId
and t.oa_CreateDate < a2.oa_CreateDate) ma
where oa_OperatorId = 1960
and ActivityStatus = 10
order by oa_CreateDate asc, oa_CreateDate_20 asc
You can see the fiddle here.
But of course, you have to give us the format / accuracy for the datediff comparison. And this assumes you will always have both Status 10 AND 20, and that their timestamp ranges never overlap.
EDIT: Updated the answer based on your comment, check the new script and fiddle. Now the script will find all Status 10 - 20 datediffs, and in case no Status 20 exists after the last 10, then the latest existing timestamp after that Status 10 will be used instead.
EDIT 2: Updated with your comment below. But at this point the script is getting rather ugly. Unfortunately I don't have the time to clean it up, so I ask that next time you post a question, please make it as clear cut and clean as possible, since there's a lot less effort involved to answer a question once instead of editing 3 different variations along the ride. :)
This should work anyhow, the new section before the UNION ALL in the script will return results only if there are any Status 20's without preceding 10's. Otherwise it'll return nothing, and move to the main portion of the script as before. Fiddle has been updated as well.
This is one way of doing it.
The first OUTER APPLY will retrieve the next row with a status of 20 that is after the current created datetime.
The second OUTER APPLY will retrieve the next row after the current created datetime where there is no status 20.
SELECT
o.*
, COALESCE(NextStatus.oa_CreateDate, NextStatusIsNull.oa_CreateDate) AS NextTimestamp
, COALESCE(NextStatus.ActivityStatus, NextStatusIsNull.ActivityStatus) AS NextStatus
, DATEDIFF(MINUTE, o.oa_CreateDate,
COALESCE(NextStatus.oa_CreateDate, NextStatusIsNull.oa_CreateDate))
AS DifferenceInMinutes
FROM
operators AS o
OUTER APPLY
(
SELECT TOP 1
oa_CreateDate
, ActivityStatus
FROM
operators
WHERE
ActivityStatus = 20
AND oa_CreateDate > o.oa_CreateDate
ORDER BY
oa_CreateDate
) AS NextStatus
OUTER APPLY
(
SELECT TOP 1
oa_CreateDate
, ActivityStatus
FROM
operators
WHERE
NextStatus.oa_CreateDate IS NULL
AND oa_CreateDate > o.oa_CreateDate
ORDER BY
oa_CreateDate
) AS NextStatusIsNull
WHERE
ActivityStatus = 10
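The pairing rule described in the question (each status 10 runs until the next status 20, and a status 10 with no later 20 runs until the operator's last recorded timestamp) can be sketched in Python. Note that this follows the question's fallback, the last timestamp, which is not necessarily the row the second OUTER APPLY above picks. The sample events and the function name busy_minutes are assumptions made for illustration, for a single operator with events sorted by time.

```python
from datetime import datetime

# Hypothetical events for one operator: (timestamp, status), sorted by timestamp.
events = [
    ("2015-08-10 10:55", 10), ("2015-08-10 11:55", 20),
    ("2015-08-10 14:55", 30),
    ("2015-08-10 15:55", 10), ("2015-08-10 16:20", 20),
    ("2015-08-10 16:30", 10), ("2015-08-10 17:20", 50),
]

def busy_minutes(events):
    """Sum the minutes from each status 10 to the next status 20 (or the last row)."""
    parsed = [(datetime.strptime(ts, "%Y-%m-%d %H:%M"), status) for ts, status in events]
    if not parsed:
        return 0
    last_seen = parsed[-1][0]
    total = 0
    for i, (start, status) in enumerate(parsed):
        if status != 10:
            continue
        # first status 20 after this row; fall back to the last timestamp
        end = next((ts for ts, s in parsed[i + 1:] if s == 20), last_seen)
        total += int((end - start).total_seconds() // 60)
    return total

total_busy = busy_minutes(events)
print(total_busy)
```

The three intervals here are 60, 25 and 50 minutes; the last one ends at the final row because no status 20 follows it.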
I have used some different test data because you used a picture from which I was unable to cut and paste. This should be easy to convert to your table.
Note this should also work with non-existent start and end dates.
Also note this was done without any joins to optimize performance.
Test table and data:
DECLARE @t table(ActivityStatus int, oa_createdate datetime, oa_operatorid int)
INSERT @t values
(30, '2015-07-23 08:20', 1960),(20, '2015-07-23 08:24', 1960),
(10, '2015-07-23 08:30', 1960),(20, '2015-07-23 08:40', 1960),
(10, '2015-07-23 08:50', 1960),(50, '2015-07-23 09:40', 1960)
Query:
;WITH cte as
(
SELECT
ActivityStatus,
oa_createdate,
oa_operatorid
FROM @t
WHERE ActivityStatus in (10,20)
UNION ALL
-- add a closing 20 at the last timestamp when the last 10 has no 20 after it
SELECT 20, max(oa_createdate), oa_operatorid
FROM @t
GROUP BY oa_operatorid
HAVING
max(case when ActivityStatus = 20 then oa_createdate end) <
max(case when ActivityStatus = 10 then oa_createdate end)
UNION ALL
-- add an opening 10 at the first timestamp when the first 20 has no 10 before it
SELECT 10, min(oa_createdate), oa_operatorid
FROM @t
GROUP BY oa_operatorid
HAVING
min(case when ActivityStatus = 20 then oa_createdate end) <
min(case when ActivityStatus = 10 then oa_createdate else '2999-01-01' end)
)
-- add every 20 and subtract every 10 to get the total time between them
SELECT
cast(cast(sum(case when activitystatus = 10 then -1 else 1 end
* cast(oa_createdate as float)) as datetime) as time(0)) as difference_in_time,
oa_operatorid
FROM cte
GROUP BY oa_operatorid
Result:
difference_in_time oa_operatorid
01:04:00 1960
Data
create table #Table2 (oa_id int, oa_OperatorId int, ActivityStatus int, oa_CreateDate datetime)
insert into #Table2
values (1, 1960,10,'2015-08-10 10:55:12.317')
,(2, 1960,20,'2015-08-10 11:55:12.317')
,(3, 1960,30,'2015-08-10 14:55:12.317')
,(4, 1960,50,'2015-08-10 14:58:12.317')
,(5, 1960,10,'2015-08-10 15:55:12.317')
,(6, 1960,20,'2015-08-10 16:20:12.317')
,(7, 1960,10,'2015-08-10 16:30:12.317')
,(8, 1960,50,'2015-08-10 17:20:12.317')
Populate target table with the rows we are interested in
select oa_id,
oa_operatorid,
ActivityStatus,
oa_createDate,
rn = row_number() over (order by oa_id desc)
into #Table
from #Table2
where ActivityStatus in (10, 20)
insert #Table
select top 1
oa_id,
oa_operatorid,
ActivityStatus,
oa_createDate,
0
from #Table2
order by oa_id desc
select * into #Table10 from #Table where ActivityStatus = 10
select * into #Table20 from #Table where ActivityStatus = 20
union
select * from #Table where rn = 0 /*add the last record*/
except
select * from #Table where rn = (select max(rn) from #Table) /**discard the first "20" record*/
/*free time info*/
select datediff(second, t10.oa_createDate, t20.oa_createDate) secondssincelast10,
t20.*
from #Table10 t10 join #Table20 t20
on t10.rn = t20.rn + 1
and t10.oa_OperatorId = t20.oa_OperatorId
/*Summarized info per operator*/
select sum(datediff(second, t10.oa_createDate, t20.oa_createDate)) totalbusytime,
t20.oa_OperatorId
from #Table10 t10 join #Table20 t20
on t10.rn = t20.rn + 1
and t10.oa_OperatorId = t20.oa_OperatorId
group by t20.oa_OperatorId
Best way
DATEDIFF(expr1,expr2)
Example:
CREATE TABLE pins
(`id` int, `time` datetime)
;
INSERT INTO pins
(`id`, `time`)
VALUES
(1, '2013-11-15 05:25:25')
;
SELECT DATEDIFF(CURDATE(), `time`)
FROM `pins`

Group records only if it have intersected periods

I have a table like this:
declare @data table
(
id int not null,
groupid int not null,
startDate datetime not null,
endDate datetime not null
)
insert into @data values
(1, 1, '20150101', '20150131'),
(2, 1, '20150114', '20150131'),
(3, 1, '20150201', '20150228');
and my current select statement is:
select groupid, 'some data', min(id), count(*)
from @data
group by groupid
But now I need to group records only if their periods intersect.
Desired result:
1, 'some data', 1, 2
1, 'some data', 3, 1
Does anyone know how to do this?
One method is to identify the beginning of each group: a row whose start does not fall inside any other row's period. Then the cumulative count of these beginnings serves as a group identifier.
with overlaps as (
select id
from @data d
where not exists (select 1
from @data d2
where d.groupid = d2.groupid and
d2.id <> d.id and -- do not compare a row with itself
d.startDate >= d2.startDate and
d.startDate < d2.endDate
)
),
groups as (
select d.*,
count(o.id) over (partition by groupid
order by d.startDate) as grpnum
from @data d left join
overlaps o
on d.id = o.id
)
select groupid, min(id), count(*),
min(startDate) as startDate, max(endDate) as endDate
from groups
group by grpnum, groupid;
Notes: This is using cumulative counts, which are available in SQL Server 2012+. You can do something similar with a correlated subquery or apply in earlier versions.
Also, this query assumes that the start dates are unique. If they are not, the query can be tweaked, but the logic becomes a bit more complicated.
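The same grouping can be sketched as a single pass in Python: sort by start date and open a new group whenever a row starts at or after the latest end date seen so far in the current group. This is an illustration under stated assumptions, not the query above; the function name merge_overlapping is made up, and the yyyymmdd strings are compared as plain strings, which orders them correctly.

```python
from itertools import groupby

# (id, groupid, start, end) from the question.
data = [
    (1, 1, "20150101", "20150131"),
    (2, 1, "20150114", "20150131"),
    (3, 1, "20150201", "20150228"),
]

def merge_overlapping(data):
    """Return (groupid, min id, row count, start, end) per run of overlapping periods."""
    result = []
    ordered = sorted(data, key=lambda r: (r[1], r[2]))
    for groupid, rows in groupby(ordered, key=lambda r: r[1]):
        current = None
        for row_id, _, start, end in rows:
            if current is None or start >= current[4]:
                # start is not covered by any earlier period: a new group begins
                current = [groupid, row_id, 1, start, end]
                result.append(current)
            else:
                current[1] = min(current[1], row_id)
                current[2] += 1
                current[4] = max(current[4], end)
    return [tuple(r) for r in result]

merged = merge_overlapping(data)
for line in merged:
    print(line)
```

On the sample data this yields one group holding ids 1 and 2 and a second group holding id 3, matching the desired result in the question.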