Get peak and non-peak hours from table - sql

I have a peak_hours table in which certain time ranges are defined as the 'peak hours':
id | start | end
1 | 05:00 | 09:00
2 | 12:00 | 15:00
3 | 17:00 | 22:00
I have a jobs table that keeps track of the start and end date of a job.
id | started_at | completed_at
1 | 2019-05-07 04:00 | 2019-05-07 16:00
I'm trying to get how long the job ran during peak hours and during non-peak hours.
Expected output:
peak_hours_total | non_peak_hours_total
7 | 5

As Harry mentioned in the comments, one way of doing this is to expand the single row holding the date range into multiple rows, each representing a value at the desired level of granularity (hour, minute, etc.). This is done because SQL Server is not very efficient when working with ranges, and because the data may extend over multiple days.
The following example expands the data to minute-level granularity and gives the desired result. Keep in mind that I spent no time optimizing the code, so there is definitely room for improvement:
-- Input
;with PeakHours as (
select 1 as id, '05:00' as [start], '09:00' as [end]
union all
select 2 as id, '12:00' as [start], '15:00' as [end]
union all
select 3 as id, '17:00' as [start], '22:00' as [end]
)
, data as (
select 1 as id, '2019-05-07 04:00' as started_at, '2019-05-07 16:00' as completed_at
)
-- Convert start and end to UNIX to be able to get ranges
, data2 as (
select *
,DATEDIFF(s, '1970-01-01', started_at) as started_at_unix
,DATEDIFF(s, '1970-01-01', completed_at) as completed_at_unix
from data
)
-- Find min start and max end to cover whole possible range
, data3 as (
select min(started_at_unix) as min_started_at_unix, max(completed_at_unix) as max_completed_at_unix
from data2
)
-- expand data using Tally table technique
,lv0 AS (SELECT 0 g UNION ALL SELECT 0)
,lv1 AS (SELECT 0 g FROM lv0 a CROSS JOIN lv0 b)
,lv2 AS (SELECT 0 g FROM lv1 a CROSS JOIN lv1 b)
,lv3 AS (SELECT 0 g FROM lv2 a CROSS JOIN lv2 b)
,lv4 AS (SELECT 0 g FROM lv3 a CROSS JOIN lv3 b)
,lv5 AS (SELECT 0 g FROM lv4 a CROSS JOIN lv4 b)
,Tally (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM lv5)
, data_expanded as (
SELECT TOP (select (max_completed_at_unix - min_started_at_unix) / 60 from data3) (n - 1) * 60 + d3.min_started_at_unix as unix_timestamp_min
from Tally as t
cross apply data3 as d3
)
-- Aggregate
select
1.0 * sum(case when ph.id is not null then 1 else 0 end) / 60 as peak_hours_total
,1.0 * sum(case when ph.id is null then 1 else 0 end) / 60 as non_peak_hours_total
from data_expanded as de
inner join data2 as d2
on de.unix_timestamp_min between d2.started_at_unix and d2.completed_at_unix
left join PeakHours as ph
on cast(dateadd(s, de.unix_timestamp_min, '1970-01-01') as time(0)) between ph.[start] and dateadd(SECOND, -1, cast(ph.[end] as time(0)))
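For comparison, the same totals can be reached without expanding to minutes, by intersecting the job interval with each peak window on each calendar day. Below is a minimal Python sketch of that arithmetic, not a translation of the query above. The function name `peak_split` and the hard-coded window list are illustrative assumptions, and the sketch assumes the peak windows do not overlap each other or cross midnight.

```python
from datetime import datetime, time, timedelta

# Peak windows copied from the peak_hours table in the question.
PEAK_WINDOWS = [(time(5, 0), time(9, 0)),
                (time(12, 0), time(15, 0)),
                (time(17, 0), time(22, 0))]

def peak_split(started_at, completed_at, windows=PEAK_WINDOWS):
    """Return (peak_hours, non_peak_hours) for one job."""
    peak = timedelta(0)
    day = started_at.date()
    # Walk every calendar day the job touches, so multi-day jobs work too.
    while day <= completed_at.date():
        for w_start, w_end in windows:
            lo = max(started_at, datetime.combine(day, w_start))
            hi = min(completed_at, datetime.combine(day, w_end))
            if hi > lo:  # the job overlaps this window on this day
                peak += hi - lo
        day += timedelta(days=1)
    total = completed_at - started_at
    return peak.total_seconds() / 3600, (total - peak).total_seconds() / 3600

print(peak_split(datetime(2019, 5, 7, 4, 0), datetime(2019, 5, 7, 16, 0)))
# (7.0, 5.0)
```

The work grows with the number of days times the number of windows rather than with the number of minutes, which may matter for long-running jobs.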

Related

Group time series by time intervals (e.g. days) with aggregate of duration

I have a table containing a time series with following information. Each record represents the event of "changing the mode".
Timestamp | Mode
------------------+------
2018-01-01 12:00 | 1
2018-01-01 18:00 | 2
2018-01-02 01:00 | 1
2018-01-02 02:00 | 2
2018-01-04 04:00 | 1
By using the LEAD function, I can create a query with the following result. Now each record contains the information, when and how long the "mode was active".
Please check the 2nd and the 4th record. They "belong" to multiple days.
StartDT | EndDT | Mode | Duration
------------------+------------------+------+----------
2018-01-01 12:00 | 2018-01-01 18:00 | 1 | 6:00
2018-01-01 18:00 | 2018-01-02 01:00 | 2 | 7:00
2018-01-02 01:00 | 2018-01-02 02:00 | 1 | 1:00
2018-01-02 02:00 | 2018-01-04 04:00 | 2 | 50:00
2018-01-04 04:00 | (NULL) | 1 | (NULL)
Now I would like to have a query that groups the data by day and mode and aggregates the duration.
This result table is needed:
Date | Mode | Total
------------+------+-------
2018-01-01 | 1 | 6:00
2018-01-01 | 2 | 6:00
2018-01-02 | 1 | 1:00
2018-01-02 | 2 | 23:00
2018-01-03 | 2 | 24:00
2018-01-04 | 2 | 04:00
I don't know how to handle the records that "belong" to multiple days. Any ideas?
create table ChangeMode ( ModeStart datetime2(7), Mode int )
insert into ChangeMode ( ModeStart, Mode ) values
( '2018-11-15T21:00:00.0000000', 1 ),
( '2018-11-16T17:18:19.1231234', 2 ),
( '2018-11-16T18:00:00.5555555', 1 ),
( '2018-11-16T18:00:01.1234567', 2 ),
( '2018-11-16T19:02:22.8888888', 1 ),
( '2018-11-16T20:00:00.9876543', 2 ),
( '2018-11-17T09:00:00.0000000', 1 ),
( '2018-11-17T23:23:23.0230450', 2 ),
( '2018-11-19T17:00:00.0172839', 1 ),
( '2018-11-20T03:07:00.7033077', 2 )
;
with
-- Determine the earliest and latest dates.
-- Cast to date to remove the time portion.
-- Cast results back to datetime because we're going to add hours later.
MinMaxDates
as
(select cast(min(cast(ModeStart as date))as datetime) as MinDate,
cast(max(cast(ModeStart as date))as datetime) as MaxDate from ChangeMode),
-- How many days have passed during that period
Dur
as
(select datediff(day,MinDate,MaxDate) as Duration from MinMaxDates),
-- Create a list of numbers.
-- These will be added to MinDate to get a list of dates.
NumList
as
( select 0 as Num
union all
select Num+1 from NumList,Dur where Num<Duration ),
-- Create a list of dates by adding those numbers to MinDate
DayList
as
( select dateadd(day,Num,MinDate)as ModeDate from NumList, MinMaxDates ),
-- Create a list of day periods
PeriodList
as
( select ModeDate as StartTime,
dateadd(day,1,ModeDate) as EndTime
from DayList ),
-- Use LEAD to get periods for each record
-- Final record would return NULL for ModeEnd
-- We replace that with end of last day
ModePeriodList
as
( select ModeStart,
coalesce( lead(ModeStart)over(order by ModeStart),
dateadd(day,1,MaxDate) ) as ModeEnd,
Mode from ChangeMode, MinMaxDates ),
ModeDayList
as
( select * from ModePeriodList, PeriodList
where ModeStart<=EndTime and ModeEnd>=StartTime
),
-- Keep the later of the mode start time, and the day start time
-- Keep the earlier of the mode end time, and the day end time
ModeDayPeriod
as
( select case when ModeStart>=StartTime then ModeStart else StartTime end as StartTime,
case when ModeEnd<=EndTime then ModeEnd else EndTime end as EndTime,
Mode from ModeDayList ),
SumDurations
as
( select cast(StartTime as date) as ModeDate,
Mode,
DateDiff_Big(nanosecond,StartTime,EndTime)
/3600000000000
as DurationHours from ModeDayPeriod )
-- List the results in order
-- Use MaxRecursion option in case there are more than 100 days
select ModeDate as [Date], Mode, sum(DurationHours) as [Total Duration Hours]
from SumDurations
group by ModeDate, Mode
order by ModeDate, Mode
option (maxrecursion 0)
Result is:
Date Mode Total Duration Hours
---------- ----------- ---------------------------------------
2018-11-15 1 3.00000000000000
2018-11-16 1 18.26605271947221
2018-11-16 2 5.73394728052777
2018-11-17 1 14.38972862361111
2018-11-17 2 9.61027137638888
2018-11-18 2 24.00000000000000
2018-11-19 1 6.99999519891666
2018-11-19 2 17.00000480108333
2018-11-20 1 3.11686202991666
2018-11-20 2 20.88313797008333
You could use a CTE to create a table of days, then join the time slots to it:
DECLARE @MAX as datetime2 = (SELECT MAX(CAST(Timestamp as date)) MX FROM process);
WITH StartEnd AS (select p1.Timestamp StartDT,
P2.Timestamp EndDT ,
p1.mode
from process p1
outer apply
(SELECT TOP 1 pOP.* FROM
process pOP
where pOP.Timestamp > p1.Timestamp
order by pOP.Timestamp asc) P2
),
CAL AS (SELECT (SELECT MIN(cast(StartDT as date)) MN FROM StartEnd) DT
UNION ALL
SELECT DATEADD(day,1,DT) DT FROM CAL WHERE CAL.DT < @MAX
),
TMS AS
(SELECT CASE WHEN S.StartDT > C.DT THEN S.StartDT ELSE C.DT END AS STP,
CASE WHEN S.EndDT < DATEADD(day,1,C.DT) THEN S.ENDDT ELSE DATEADD(day,1,C.DT) END AS STE
FROM StartEnd S JOIN CAL C ON NOT(S.EndDT <= C.DT OR S.StartDT>= DATEADD(day,1,C.dt))
)
SELECT *,datediff(MI ,TMS.STP, TMS.ste) as x from TMS
The following uses a recursive CTE to build a list of dates (a calendar or numbers table works equally well). It then intersects the dates with the datetime ranges so that missing dates are populated with matching data. The important bit is that, for each row, if the start datetime belongs to a previous day it is clamped to 00:00 of the current day; the end datetime is likewise clamped to 00:00 of the next day.
DECLARE @t TABLE (timestamp DATETIME, mode INT);
INSERT INTO @t VALUES
('2018-01-01 12:00', 1),
('2018-01-01 18:00', 2),
('2018-01-02 01:00', 1),
('2018-01-02 02:00', 2),
('2018-01-04 04:00', 1);
WITH cte1 AS (
-- the min and max dates in your data
SELECT
CAST(MIN(timestamp) AS DATE) AS mindate,
CAST(MAX(timestamp) AS DATE) AS maxdate
FROM @t
), cte2 AS (
-- build all dates between min and max dates using recursive cte
SELECT mindate AS day_start, DATEADD(DAY, 1, mindate) AS day_end, maxdate
FROM cte1
UNION ALL
SELECT DATEADD(DAY, 1, day_start), DATEADD(DAY, 2, day_start), maxdate
FROM cte2
WHERE day_start < maxdate
), cte3 AS (
-- pull end datetime from next row into current
SELECT
timestamp AS dt_start,
LEAD(timestamp) OVER (ORDER BY timestamp) AS dt_end,
mode
FROM @t
), cte4 AS (
-- join datetime with date using date overlap query
-- then clamp start datetime to 00:00 of the date
-- and clamp end datetime to 00:00 of next date
SELECT
IIF(dt_start < day_start, day_start, dt_start) AS dt_start_fix,
IIF(dt_end > day_end, day_end, dt_end) AS dt_end_fix,
mode
FROM cte2
INNER JOIN cte3 ON day_end > dt_start AND dt_end > day_start
)
SELECT dt_start_fix, dt_end_fix, mode, datediff(minute, dt_start_fix, dt_end_fix) / 60.0 AS total
FROM cte4
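The clamping step is easy to check outside the database. The Python sketch below is a hedged illustration of the same idea using the sample rows from the question; the function name `hours_per_day_and_mode` is hypothetical, and the events are assumed to be sorted by timestamp already.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Mode-change events from the question: (timestamp, mode).
events = [(datetime(2018, 1, 1, 12), 1), (datetime(2018, 1, 1, 18), 2),
          (datetime(2018, 1, 2, 1), 1), (datetime(2018, 1, 2, 2), 2),
          (datetime(2018, 1, 4, 4), 1)]

def hours_per_day_and_mode(events):
    totals = defaultdict(float)
    # Pair each event with the next one, like LEAD(timestamp). The last
    # event has no end and is skipped, like the NULL row in the question.
    for (start, mode), (end, _) in zip(events, events[1:]):
        day_start = datetime.combine(start.date(), datetime.min.time())
        while day_start < end:
            day_end = day_start + timedelta(days=1)
            lo = max(start, day_start)  # clamp start to 00:00 of this day
            hi = min(end, day_end)      # clamp end to 00:00 of the next day
            totals[(day_start.date(), mode)] += (hi - lo).total_seconds() / 3600
            day_start = day_end
    return dict(totals)

for (day, mode), hours in sorted(hours_per_day_and_mode(events).items()):
    print(day, mode, hours)
```

For the sample data this yields six rows, including 23 hours of mode 2 on 2018-01-02 and 24 hours on 2018-01-03, which matches the result table requested in the question.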
DB Fiddle
Thanks everybody!
The answer from Cato put me on the right track. Here is my final solution:
DECLARE @Start AS datetime;
DECLARE @End AS datetime;
DECLARE @Interval AS int;
SET @Start = '2018-01-01';
SET @End = '2018-01-05';
SET @Interval = 24 * 60 * 60;
WITH
cteDurations AS
(SELECT [Timestamp] AS StartDT,
LEAD ([Timestamp]) OVER (ORDER BY [Timestamp]) AS EndDT,
Mode
FROM tblLog
WHERE [Timestamp] BETWEEN @Start AND @End
),
cteTimeslots AS
(SELECT @Start AS StartDT,
DATEADD(SECOND, @Interval, @Start) AS EndDT
UNION ALL
SELECT EndDT,
DATEADD(SECOND, @Interval, EndDT)
FROM cteTimeSlots WHERE StartDT < @End
),
cteDurationsPerTimesplot AS
(SELECT CASE WHEN S.StartDT > C.StartDT THEN S.StartDT ELSE C.StartDT END AS StartDT,
CASE WHEN S.EndDT < C.EndDT THEN S.EndDT ELSE C.EndDT END AS EndDT,
C.StartDT AS Slot,
S.Mode
FROM cteDurations S
JOIN cteTimeslots C ON NOT(S.EndDT <= C.StartDT OR S.StartDT >= C.EndDT)
)
SELECT Slot,
Mode,
SUM(DATEDIFF(SECOND, StartDT, EndDT)) AS Duration
FROM cteDurationsPerTimesplot
GROUP BY Slot, Mode
ORDER BY Slot, Mode;
With the variable @Interval you can define the size of the timeslots.
The CTE cteDurations creates a subresult with the durations of all necessary entries by using the T-SQL function LEAD (available in SQL Server 2012 and later). This will be a lot faster than an OUTER APPLY.
The CTE cteTimeslots generates a list of timeslots with start time and end time.
The CTE cteDurationsPerTimesplot is a subresult with a JOIN between cteDurations and cteTimeslots. This is the magic JOIN statement from Cato!
And finally the SELECT statement will do the grouping and sum calculation per Slot and Mode.
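To see why the JOIN condition works, it can help to evaluate it by hand. The Python sketch below mirrors the predicate and the two CASE expressions; the helper names `overlaps` and `overlap_seconds` are made up for illustration and are not part of the solution above.

```python
from datetime import datetime

def overlaps(s_start, s_end, c_start, c_end):
    # Same predicate as the JOIN:
    # NOT (S.EndDT <= C.StartDT OR S.StartDT >= C.EndDT)
    return not (s_end <= c_start or s_start >= c_end)

def overlap_seconds(s_start, s_end, c_start, c_end):
    # Mirrors the two CASE expressions: keep the later start and the earlier end.
    if not overlaps(s_start, s_end, c_start, c_end):
        return 0
    return int((min(s_end, c_end) - max(s_start, c_start)).total_seconds())

slot = (datetime(2018, 1, 2), datetime(2018, 1, 3))             # one 24-hour slot
touching = (datetime(2018, 1, 1, 18), datetime(2018, 1, 2))     # ends where the slot starts
crossing = (datetime(2018, 1, 1, 18), datetime(2018, 1, 2, 1))  # runs 1 hour into the slot

print(overlaps(*touching, *slot), overlap_seconds(*touching, *slot))  # False 0
print(overlaps(*crossing, *slot), overlap_seconds(*crossing, *slot))  # True 3600
```

Because the comparisons use <= and >=, a duration that merely touches a slot boundary does not join to that slot, so no zero-length rows are produced.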
Once again: Thanks a lot to everybody! Especially to Cato! You saved my weekend!
Regards
Oliver

SQL Server Group by date and by time of day over a date range

I'm not even sure if this can/should be done in SQL but here goes.
I have a table that stores a start date and an end date like so
userPingId createdAt lastUpdatedAt
1 2017-10-17 11:31:52.160 2017-10-18 14:31:52.160
I want to return a result set that groups the results by date and shows whether they were active during different parts of the day between the two dates.
The different points are
Morning - Before 12pm
Afternoon - Between 12pm and 5pm
Evening - After 5pm
So for example I would get the following results
sessionDate morning afternoon evening
2017-10-17 1 1 1
2017-10-18 1 1 0
Here is what I have so far. I believe it's quite close, but the fact that I can't get the results I need makes me think that this might not be possible in SQL (by the way, I'm using a numbers lookup table in my query, which I saw in another tutorial):
DECLARE @s DATE = '2017-01-01', @e DATE = '2018-01-01';
;WITH d(sessionDate) AS
(
SELECT TOP (DATEDIFF(DAY, @s, @e) + 1) DATEADD(DAY, n-1, @s)
FROM dbo.Numbers ORDER BY n
)
SELECT
d.sessionDate,
sum(case when
(CONVERT(DATE, createdAt) = d.sessionDate AND datepart(hour, createdAt) < 12)
OR (CONVERT(DATE, lastUpdatedAt) = d.sessionDate AND datepart(hour, lastUpdatedAt) < 12)
then 1 else 0 end) as Morning,
sum(case when
(datepart(hour, createdAt) >= 12 and datepart(hour, createdAt) < 17)
OR (datepart(hour, lastUpdatedAt) >= 12 and datepart(hour, lastUpdatedAt) < 17)
OR (datepart(hour, createdAt) < 12 and datepart(hour, lastUpdatedAt) >= 17)
then 1 else 0 end) as Afternoon,
sum(case when datepart(hour, createdAt) >= 17 OR datepart(hour, lastUpdatedAt) >= 17 then 1 else 0 end) as Evening
FROM d
LEFT OUTER JOIN MYTABLE AS s
ON s.createdAt >= @s AND s.lastUpdatedAt <= @e
AND (CONVERT(DATE, s.createdAt) = d.sessionDate OR CONVERT(DATE, s.lastUpdatedAt) = d.sessionDate)
WHERE d.sessionDate >= @s AND d.sessionDate <= @e
AND userPingId = 49
GROUP BY d.sessionDate
ORDER BY d.sessionDate;
Building on what you started with the numbers table, you can add the time ranges to your ad hoc calendar table in another common table expression, using cross apply() and the table value constructor (values (...),(...)).
From there, you can use an inner join based on overlapping date ranges along with conditional aggregation to pivot the results:
declare @s datetime = '2017-01-01', @e datetime = '2018-01-01';
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
, d as ( /* adhoc date/numbers table */
select top (datediff(day, @s, @e)+1)
SessionDate=convert(datetime,dateadd(day,row_number() over(order by (select 1))-1,@s))
from n as deka cross join n as hecto cross join n as kilo
cross join n as tenK cross join n as hundredK
order by SessionDate
)
, h as ( /* add time ranges to date table */
select
SessionDate
, StartDateTime = dateadd(hour,v.s,SessionDate)
, EndDateTime = dateadd(hour,v.e,SessionDate)
, v.point
from d
cross apply (values
(0,12,'morning')
,(12,17,'afternoon')
,(17,24,'evening')
) v (s,e,point)
)
select
t.userPingId
, h.SessionDate
, morning = count(case when point = 'morning' then 1 end)
, afternoon = count(case when point = 'afternoon' then 1 end)
, evening = count(case when point = 'evening' then 1 end)
from t
inner join h
on t.lastupdatedat >= h.startdatetime
and h.enddatetime > t.createdat
group by t.userPingId, h.SessionDate
rextester demo: http://rextester.com/MVB77123
returns:
+------------+-------------+---------+-----------+---------+
| userPingId | SessionDate | morning | afternoon | evening |
+------------+-------------+---------+-----------+---------+
| 1 | 2017-10-17 | 1 | 1 | 1 |
| 1 | 2017-10-18 | 1 | 1 | 0 |
+------------+-------------+---------+-----------+---------+
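The overlap test in the join can be traced by hand for the sample row. The Python sketch below is only an illustration of that test, with a hypothetical function name `day_part_flags`; it handles a single session rather than counting across many rows.

```python
from datetime import datetime, timedelta

# (start hour, end hour, label), matching the VALUES list in the h CTE.
DAY_PARTS = [(0, 12, "morning"), (12, 17, "afternoon"), (17, 24, "evening")]

def day_part_flags(created_at, last_updated_at):
    """Return {date: {label: 0 or 1}} for every day the session touches."""
    rows = {}
    day = datetime.combine(created_at.date(), datetime.min.time())
    while day <= last_updated_at:
        flags = {}
        for start_h, end_h, label in DAY_PARTS:
            part_start = day + timedelta(hours=start_h)
            part_end = day + timedelta(hours=end_h)
            # Same test as the join: lastUpdatedAt >= start AND end > createdAt
            flags[label] = int(last_updated_at >= part_start and part_end > created_at)
        rows[day.date()] = flags
        day += timedelta(days=1)
    return rows

session = (datetime(2017, 10, 17, 11, 31, 52), datetime(2017, 10, 18, 14, 31, 52))
for day, flags in day_part_flags(*session).items():
    print(day, flags)
```

For the sample session this gives 1/1/1 on 2017-10-17 and 1/1/0 on 2017-10-18, the same flags as the result table above.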
Alternately, you could use pivot() instead of conditional aggregation in the final select:
select UserPingId, SessionDate, Morning, Afternoon, Evening
from (
select
t.userPingId
, h.SessionDate
, h.point
from t
inner join h
on t.lastupdatedat >= h.startdatetime
and h.enddatetime > t.createdat
) t
pivot (count(point) for point in ([Morning], [Afternoon], [Evening])) p
rextester demo: http://rextester.com/SKLRG63092
You can use PIVOT on CTEs to derive a solution to this problem.
Below is the test table
select * from ping
Below is the sql query
;with details as
(
select userPingId, createdAt as presenceDate , convert(date, createdAt) as
onlyDate,
datepart(hour, createdAt) as onlyHour
from ping
union all
select userPingId, lastUpdatedAt as presenceDate , convert(date,
lastUpdatedAt) as onlyDate,
datepart(hour, lastUpdatedAt) as onlyHour
from ping
)
, cte as
(
select onlyDate,count(*) as count,
case
when onlyHour between 0 and 11 then 'morning'
when onlyHour between 12 and 16 then 'afternoon'
when onlyHour >= 17 then 'evening'
end as 'period'
from details
group by onlyDate,onlyHour
)
select onlyDate, coalesce(morning,0) as morning,
coalesce(afternoon,0) as afternoon , coalesce(evening,0) as evening from
(
select onlyDate, count,period
from cte ) src
pivot
(
sum(count)
for period in ([morning],[afternoon],[evening])
) p
Below is the final result
This is a fairly similar answer to the one already posted, I just wanted the practice with PIVOT :)
I use a separate table with the time sections in it. This is then cross joined with the numbers table to create a date and time range for bucketing. I join this to the data and then pivot it (example: https://data.stackexchange.com/stackoverflow/query/750496/bucketing-data-into-date-am-pm-evening-and-pivoting-results)
SELECT
*
FROM (
SELECT
[userPingId],
dt,
[desc]
FROM (
SELECT
DATEADD(D, number, @s) AS dt,
CAST(DATEADD(D, number, @s) AS datetime) + CAST(s AS datetime) AS s,
CAST(DATEADD(D, number, @s) AS datetime) + CAST(e AS datetime) AS e,
[desc]
FROM #numbers
CROSS JOIN #times
WHERE number < DATEDIFF(D, @s, @e)
) ts
INNER JOIN #mytable AS m
ON m.createdat < ts.e
AND m.[lastUpdatedAt] >= ts.s
) src
PIVOT
(
COUNT([userPingId])
FOR [desc] IN ([am], [pm], [ev])
) piv;
the #times table is just:
s e desc
00:00:00.0000000 12:00:00.0000000 am
12:00:00.0000000 17:00:00.0000000 pm
17:00:00.0000000 23:59:59.0000000 ev

SQL Server : Gap / Island, datetime, contiguous block 365 day block

I have a table that looks like this:-
tblMeterReadings
id meter period_start period_end amount
1 1 2014-01-01 00:00 2014-01-01 00:29:59 100.3
2 1 2014-01-01 00:30 2014-01-01 00:59:59 50.5
3 1 2014-01-01 01:00 2014-01-01 01:29:59 70.7
4 1 2014-01-01 01:30 2014-01-01 01:59:59 900.1
5 1 2014-01-01 02:00 2014-01-01 02:29:59 400.0
6 1 2014-01-01 02:30 2014-01-01 02:59:59 200.3
7 1 2014-01-01 03:00 2014-01-01 03:29:59 100.8
8 1 2014-01-01 03:30 2014-01-01 03:59:59 140.3
This is a tiny "contiguous block" from '2014-01-01 00:00' to '2014-01-01 3:59:59'.
In the real table there are "contiguous blocks" of years in length.
I need to find the period_start and period_end of the most recent CONTINUOUS 365 COMPLETE DAYs (filtered by the meter column).
When I say COMPLETE DAYs I mean a day that has entries spanning 00:00 to 23:59.
When I say CONTINUOUS I mean there must be no days missing.
I would like to select all the rows that make up this block of CONTINUOUS COMPLETE DAYs.
I also need an output like:
block_start block_end total_amount_for_block
2013-02-26 00:00 2014-02-26 23:59:59 1034234.5
This is beyond me, so if someone can solve... I will be very impressed.
Since your granularity is 1 second, you need to expand your periods into all the date/times between the start and end at 1 second intervals. To do this you need to cross join with a numbers table (The numbers table is generated on the fly by ranking object ids from an arbitrary system view, I have limited it to TOP 86400 since this is the number of seconds in a day, and you have stated your time periods never span more than one day):
WITH Numbers AS
( SELECT TOP (86400)
Number = ROW_NUMBER() OVER(ORDER BY a.object_id) - 1
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
ORDER BY a.object_id
)
SELECT r.ID, r.meter, dt.[DateTime]
FROM tblMeterReadings r
CROSS JOIN Numbers n
OUTER APPLY
( SELECT [DateTime] = DATEADD(SECOND, n.Number, r.period_start)
) dt
WHERE dt.[DateTime] <= r.Period_End;
You then have your continuous range in which to perform the normal gaps and islands grouping:
WITH Numbers AS
( SELECT TOP (86400)
Number = ROW_NUMBER() OVER(ORDER BY a.object_id) - 1
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
ORDER BY a.object_id
), Grouped AS
( SELECT r.meter,
Amount = CASE WHEN Number = 1 THEN r.Amount ELSE 0 END,
dt.[DateTime],
GroupingSet = DATEADD(SECOND,
-DENSE_RANK() OVER(PARTITION BY r.Meter
ORDER BY dt.[DateTime]),
dt.[DateTime])
FROM tblMeterReadings r
CROSS JOIN Numbers n
OUTER APPLY
( SELECT [DateTime] = DATEADD(SECOND, n.Number, r.period_start)
) dt
WHERE dt.[DateTime] <= r.Period_End
)
SELECT meter,
PeriodStart = MIN([DateTime]),
PeriodEnd = MAX([DateTime]),
Amount = SUM(Amount)
FROM Grouped
GROUP BY meter, GroupingSet
HAVING DATEADD(YEAR, 1, MIN([DateTime])) < MAX([DateTime]);
N.B. Since the join to Number causes amounts to be duplicated, it is necessary to set all duplicates to 0 using CASE WHEN Number = 1 THEN r.Amount ELSE 0 END, i.e only include the amount for the first row for each ID
Removing the Having clause for your sample data will give:
meter | PeriodStart | PeriodEnd | Amount
------+---------------------+---------------------+----------
1 | 2014-01-01 00:00:00 | 2014-01-01 03:59:59 | 1963
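The GroupingSet expression relies on a standard gaps-and-islands property: within an unbroken run, each value minus its rank gives the same result. The Python sketch below shows that property on whole days instead of seconds to keep it short; the function name `islands` and the sample dates are illustrative assumptions, not data from the question.

```python
from datetime import date, timedelta
from itertools import groupby

def islands(days):
    """Group dates into runs of consecutive days: (first, last, length)."""
    days = sorted(set(days))
    # A date minus its rank is constant inside a run of consecutive dates.
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(days)]
    result = []
    for _, grp in groupby(keyed, key=lambda kv: kv[0]):
        run = [d for _, d in grp]
        result.append((run[0], run[-1], len(run)))
    return result

sample = [date(2014, 1, 1), date(2014, 1, 2), date(2014, 1, 3),
          date(2014, 1, 5), date(2014, 1, 6)]
for first, last, length in islands(sample):
    print(first, last, length)
# 2014-01-01 2014-01-03 3
# 2014-01-05 2014-01-06 2
```

Filtering the runs by length then plays the role of the HAVING clause in the query.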
Example on SQL Fiddle
You could try this:
Select MIN(period_start) as "block start"
, MAX(period_end) as "block end"
, SUM(amount) as "total amount"
FROM YourTable
GROUP BY datepart(year, period_start)
, datepart(month, period_start)
, datepart(day, period_start)
, datepart(year, period_end)
, datepart(month, period_end)
, datepart(day, period_end)
Having datepart(year, period_start) = datepart(year, period_end)
AND datepart(month, period_start) = datepart(month, period_end)
AND datepart(day, period_start) = datepart(day, period_end)
AND datepart(hour, MIN(period_start)) = 0
AND datepart(minute,MIN(period_start)) = 0
AND datepart(hour, MAX(period_end)) = 23
AND datepart(minute,MAX(period_end)) = 59

Group table into 15 minute intervals

T-SQL, SQL Server 2008 and up
Given a sample table of
StatusSetDateTime | UserID | Status | StatusEndDateTime | StatusDuration(in seconds)
============================================================================
2012-01-01 12:00:00 | myID | Available | 2012-01-01 13:00:00 | 3600
I need to break that down into a view that uses 15 minute intervals for example:
IntervalStart | UserID | Status | Duration
===========================================
2012-01-01 12:00:00 | myID | Available | 900
2012-01-01 12:15:00 | myID | Available | 900
2012-01-01 12:30:00 | myID | Available | 900
2012-01-01 12:45:00 | myID | Available | 900
2012-01-01 13:00:00 | myID | Available | 0
etc....
Now I've been able to search around and find some queries that will break down
I found something similar for MySql Here :
And something for T-SQL Here
But on the second example they are summing the results whereas I need to divide the total duration by the interval time (900 seconds) by user by status.
I was able to adapt the examples in the second link to split everything into intervals but the total duration time is returned and I cannot quite figure out how to get the Interval durations to split (and still sum up to the total original duration).
Thanks in advance for any insight!
edit : First Attempt
;with cte as
(select MIN(StatusDateTime) as MinDate
, MAX(StatusDateTime) as MaxDate
, convert(varchar(14),StatusDateTime, 120) as StartDate
, DATEPART(minute, StatusDateTime) /15 as GroupID
, UserID
, StatusKey
, avg(StateDuration) as AvgAmount
from AgentActivityLog
group by convert(varchar(14),StatusDateTime, 120)
, DATEPART(minute, StatusDateTime) /15
, Userid,StatusKey)
select dateadd(minute, 15*GroupID, CONVERT(datetime,StartDate+'00'))
as [Start Date]
, UserID, StatusKey, AvgAmount as [Average Amount]
from cte
edit : Second Attempt
;With cte As
(Select DateAdd(minute
, 15 * (DateDiff(minute, '20000101', StatusDateTime) / 15)
, '20000101') As StatusDateTime
, userid, statuskey, StateDuration
From AgentActivityLog)
Select StatusDateTime, userid,statuskey,Avg(StateDuration)
From cte
Group By StatusDateTime,userid,statuskey;
;with cte_max as
(
select dateadd(mi, -15, max(StatusEndDateTime)) as EndTime, min(StatusSetDateTime) as StartTime
from AgentActivityLog
), times as
(
select StartTime as Time from cte_max
union all
select dateadd(mi, 15, c.Time)
from times as c
cross join cte_max as cm
where c.Time <= cm.EndTime
)
select
t.Time, A.UserID, A.Status,
case
when t.Time = A.StatusEndDateTime then 0
else A.StatusDuration / (count(*) over (partition by A.StatusSetDateTime, A.UserID, A.Status) - 1)
end as Duration
from AgentActivityLog as A
left outer join times as t on t.Time >= A.StatusSetDateTime and t.Time <= A.StatusEndDateTime
sql fiddle demo
I've never been comfortable with using date math to split things up into partitions. It seems like there are all kinds of pitfalls to fall into.
What I prefer to do is to create a table (pre-defined, table-valued function, table variable) where there's one row for each date partition range. The table-valued function approach is particularly useful because you can build it for arbitrary ranges and partition sizes as you need. Then, you can join to this table to split things out.
partitionid starttime endtime
---------- ------------- -------------
1 8/1/2012 5:00 8/1/2012 5:15
2 8/1/2012 5:15 8/1/2012 5:30
...
I can't speak to the performance of this method, but I find the queries are much more intuitive.
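As a rough illustration of what joining to such a partition table computes, the Python sketch below splits one status row into 15-minute slots and clamps the first and last slot to the row's own start and end. The function name `split_into_slots` is hypothetical, and the sketch assumes the slot size divides a day evenly. Unlike the output requested in the question, it does not emit a trailing row with a duration of 0.

```python
from datetime import datetime, timedelta

def split_into_slots(start, end, slot_minutes=15):
    """Yield (slot_start, seconds_in_slot) for each slot the status touches."""
    slot = timedelta(minutes=slot_minutes)
    # Round the start down to the slot boundary at or before it.
    midnight = datetime.combine(start.date(), datetime.min.time())
    slot_start = midnight + ((start - midnight) // slot) * slot
    while slot_start < end:
        lo = max(start, slot_start)
        hi = min(end, slot_start + slot)
        yield slot_start, int((hi - lo).total_seconds())
        slot_start += slot

for slot_start, seconds in split_into_slots(datetime(2012, 1, 1, 13, 0, 0),
                                            datetime(2012, 1, 1, 13, 27, 56)):
    print(slot_start, seconds)
# 2012-01-01 13:00:00 900
# 2012-01-01 13:15:00 776
```

The slot durations always add up to the original duration, which is the property the question asks for.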
It is relatively simple if you have a helper table with every 15-minute timestamp, which you JOIN to your base table via BETWEEN. You can build the helper table on the fly or keep it permanently in your database. Simple for the next guy at your company to figure out too:
-- declare a table and a timestamp variable
declare @timetbl table(t datetime)
declare @t datetime
-- set the first timestamp
set @t = '2012-01-01 00:00:00'
-- set the last timestamp, can easily be extended to cover many years
while @t <= '2013-01-01'
begin
-- populate the table with a new row, every 15 minutes
insert into @timetbl values (@t)
set @t = dateadd(mi, 15, @t)
end
-- now the Select query:
select
tt.t, aal.UserID, aal.Status,
case when aal.StatusEndDateTime <= tt.t then 0 else 900 end as Duration
-- using a shortcut for Duration, based on your comment that Start/End are always on the quarter-hour, and thus always 900 seconds or zero
from
@timetbl tt
INNER JOIN AgentActivityLog aal
on tt.t between aal.StatusSetDateTime and aal.StatusEndDateTime
order by
aal.UserID, tt.t
You can use a recursive Common Table Expression, where you keep adding your duration while the StatusEndDateTime is greater than the IntervalStart e.g.
;with cte as (
select StatusSetDateTime as IntervalStart
,UserID
,Status
,StatusDuration/(datediff(mi, StatusSetDateTime, StatusEndDateTime)/15) as Duration
, StatusEndDateTime
From AgentActivityLog
Union all
Select DATEADD(ss, Duration, IntervalStart) as IntervalStart
, UserID
, Status
, case when DATEADD(ss, Duration, IntervalStart) = StatusEndDateTime then 0 else Duration end as Duration
, StatusEndDateTime
From cte
Where IntervalStart < StatusEndDateTime
)
select IntervalStart, UserID, Status, Duration from cte
Here's a query that will do the job for you without requiring helper tables. (I have nothing against helper tables, they are useful and I use them. It is also possible to not use them sometimes.) This query allows for activities to start and end at any times, even if not whole minutes ending in :00, :15, :30, :45. If there will be millisecond portions then you'll have to do some experimenting because, following your model, I only went to second resolution.
If you have a known hard maximum duration, then remove @MaxDuration and replace it with that value, in minutes. N <= @MaxDuration is crucial to the query performing well.
DECLARE @MaxDuration int;
SET @MaxDuration = (SELECT Max(StatusDuration) / 60 FROM #AgentActivityLog);
WITH
L0 AS(SELECT 1 c UNION ALL SELECT 1),
L1 AS(SELECT 1 c FROM L0, L0 B),
L2 AS(SELECT 1 c FROM L1, L1 B),
L3 AS(SELECT 1 c FROM L2, L2 B),
L4 AS(SELECT 1 c FROM L3, L3 B),
L5 AS(SELECT 1 c FROM L4, L4 B),
Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) n FROM L5)
SELECT
S.IntervalStart,
Duration = DateDiff(second, S.IntervalStart, E.IntervalEnd)
FROM
#AgentActivityLog L
CROSS APPLY (
SELECT N, Offset = (N.N - 1) * 900
FROM Nums N
WHERE N <= @MaxDuration
) N
CROSS APPLY (
SELECT Edge =
DateAdd(second, N.Offset, DateAdd(minute,
DateDiff(minute, '20000101', L.StatusSetDateTime)
/ 15 * 15, '20000101')
)
) G
CROSS APPLY (
SELECT IntervalStart = Max(T.BeginTime)
FROM (
SELECT L.StatusSetDateTime
UNION ALL SELECT G.Edge
) T (BeginTime)
) S
CROSS APPLY (
SELECT IntervalEnd = Min(T.EndTime)
FROM (
SELECT L.StatusEndDateTime
UNION ALL SELECT G.Edge + '00:15:00'
) T (EndTime)
) E
WHERE
N.Offset <= L.StatusDuration
ORDER BY
L.StatusSetDateTime,
S.IntervalStart;
Here is setup script if you want to try it:
CREATE TABLE #AgentActivityLog (
StatusSetDateTime datetime,
StatusEndDateTime datetime,
StatusDuration AS (DateDiff(second, 0, StatusEndDateTime - StatusSetDateTime))
);
INSERT #AgentActivityLog -- weird end times
SELECT '20120101 12:00:00', '20120101 13:00:00'
UNION ALL SELECT '20120101 13:00:00', '20120101 13:27:56'
UNION ALL SELECT '20120101 13:27:56', '20120101 13:28:52'
UNION ALL SELECT '20120101 13:28:52', '20120120 11:00:00'
INSERT #AgentActivityLog -- 15-minute quantized end times
SELECT '20120101 12:00:00', '20120101 13:00:00'
UNION ALL SELECT '20120101 13:00:00', '20120101 13:30:00'
UNION ALL SELECT '20120101 13:30:00', '20120101 14:00:00'
UNION ALL SELECT '20120101 14:00:00', '20120120 11:00:00'
Also, here's a version that expects ONLY times that have whole minutes ending in :00, :15, :30, or :45.
DECLARE @MaxDuration int;
SET @MaxDuration = (SELECT Max(StatusDuration) / 60 FROM #AgentActivityLog);
WITH
L0 AS(SELECT 1 c UNION ALL SELECT 1),
L1 AS(SELECT 1 c FROM L0, L0 B),
L2 AS(SELECT 1 c FROM L1, L1 B),
L3 AS(SELECT 1 c FROM L2, L2 B),
L4 AS(SELECT 1 c FROM L3, L3 B),
L5 AS(SELECT 1 c FROM L4, L4 B),
Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) n FROM L5)
SELECT
S.IntervalStart,
Duration = CASE WHEN Offset = StatusDuration THEN 0 ELSE 900 END
FROM
#AgentActivityLog L
CROSS APPLY (
SELECT N, Offset = (N.N - 1) * 900
FROM Nums N
WHERE N <= @MaxDuration
) N
CROSS APPLY (
SELECT IntervalStart = DateAdd(second, N.Offset, L.StatusSetDateTime)
) S
WHERE
N.Offset <= L.StatusDuration
ORDER BY
L.StatusSetDateTime,
S.IntervalStart;
It really seems like having the final 0 Duration row is not correct, because then you can't just order by IntervalStart as there are duplicate IntervalStart values. What is the benefit of having rows that add 0 to the total?

SQL issue - calculate max days sequence

There is a table with visits data:
uid (INT) | created_at (DATETIME)
I want to find how many days in a row a user has visited our app. So for instance:
SELECT DISTINCT DATE(created_at) AS d FROM visits WHERE uid = 123
will return:
d
------------
2012-04-28
2012-04-29
2012-04-30
2012-05-03
2012-05-04
There are 5 records and two intervals - 3 days (28 - 30 Apr) and 2 days (3 - 4 May).
My question is how to find the maximum number of days that a user has visited the app in a row (3 days in the example). Tried to find a suitable function in the SQL docs, but with no success. Am I missing something?
UPD:
Thank you guys for your answers! Actually, I'm working with the Vertica analytics database (http://vertica.com/); however, it is a rare product and only a few people have experience with it, although it supports the SQL-99 standard.
Most of the solutions work with slight modifications. In the end I created my own version of the query:
-- returns the starts of the visit series
SELECT t1.d as s FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', -1, t1.d))
WHERE t2.d is null GROUP BY t1.d
s
---------------------
2012-04-28 01:00:00
2012-05-03 01:00:00
-- returns the ends of the visit series
SELECT t1.d as f FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', 1, t1.d))
WHERE t2.d is null GROUP BY t1.d
f
---------------------
2012-04-30 01:00:00
2012-05-04 01:00:00
So now all we need to do is join them somehow, for instance by row index.
SELECT s, f, DATEDIFF(day, s, f) + 1 as seq FROM (
-- number the starts in date order so the row indexes pair up deterministically
SELECT t1.d as s, ROW_NUMBER() OVER (ORDER BY t1.d) as o1 FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', -1, t1.d))
WHERE t2.d is null GROUP BY t1.d
) tbl1 LEFT JOIN (
-- number the ends in the same order
SELECT t1.d as f, ROW_NUMBER() OVER (ORDER BY t1.d) as o2 FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', 1, t1.d))
WHERE t2.d is null GROUP BY t1.d
) tbl2 ON o1 = o2
Sample output:
s | f | seq
---------------------+---------------------+-----
2012-04-28 01:00:00 | 2012-04-30 01:00:00 | 3
2012-05-03 01:00:00 | 2012-05-04 01:00:00 | 2
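The start/end pairing above can be sketched outside the database to check the logic. This is only an illustration in Python, not part of the original query; the function name streaks_from_starts_and_ends and the sample list are assumptions made up for the example.

```python
from datetime import date, timedelta


def streaks_from_starts_and_ends(days):
    """Return (start, end, length) for each run of consecutive days.

    Hypothetical helper: mirrors the two LEFT JOIN queries plus the
    join on row index, using plain Python sets instead of SQL.
    """
    day_set = set(days)
    one_day = timedelta(days=1)
    # A series start has no visit on the previous day.
    starts = sorted(d for d in day_set if d - one_day not in day_set)
    # A series end has no visit on the next day.
    ends = sorted(d for d in day_set if d + one_day not in day_set)
    # Sorted starts and ends line up one-to-one, like joining on a row index.
    return [(s, e, (e - s).days + 1) for s, e in zip(starts, ends)]


visits = [date(2012, 4, 28), date(2012, 4, 29), date(2012, 4, 30),
          date(2012, 5, 3), date(2012, 5, 4)]
streaks = streaks_from_starts_and_ends(visits)
longest = max(length for _, _, length in streaks)
```

The pairing only works because both lists are sorted the same way, which is why the row index needs a defined order.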
Another approach, and the shortest one: do a self-join.
with grouped_result as
(
select
sr.d,
sum((fr.d is null)::int) over(order by sr.d) as group_number
from tbl sr
left join tbl fr on sr.d = fr.d + interval '1 day'
)
select d, group_number, count(d) over m as consecutive_days
from grouped_result
window m as (partition by group_number)
Output:
d | group_number | consecutive_days
---------------------+--------------+------------------
2012-04-28 08:00:00 | 1 | 3
2012-04-29 08:00:00 | 1 | 3
2012-04-30 08:00:00 | 1 | 3
2012-05-03 08:00:00 | 2 | 2
2012-05-04 08:00:00 | 2 | 2
(5 rows)
Live test: http://www.sqlfiddle.com/#!1/93789/1
sr = second row, fr = first row (or perhaps previous row? ツ). Basically we are backtracking: the join simulates LAG on a database that doesn't support it. (Postgres does support LAG, but that solution is very long, because window functions cannot be nested.) So this query uses a hybrid approach: simulate LAG via the join, then apply a windowed SUM over the result, which produces the group number.
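To make the group numbering concrete, here is a rough Python sketch of the same idea: flag each day whose previous day is missing, then take a running sum of the flags. The name group_numbers and the sample data are assumptions for illustration only.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import accumulate


def group_numbers(days):
    """Return (day, group_number) pairs; hypothetical helper."""
    ordered = sorted(set(days))
    present = set(ordered)
    # 1 when the previous day is missing, which is what
    # "fr.d is null" detects after the left join.
    is_header = [int(d - timedelta(days=1) not in present) for d in ordered]
    # The running sum of the header flags is the group number.
    return list(zip(ordered, accumulate(is_header)))


visits = [date(2012, 4, 28), date(2012, 4, 29), date(2012, 4, 30),
          date(2012, 5, 3), date(2012, 5, 4)]
numbered = group_numbers(visits)
# Counting rows per group gives the length of each run.
run_lengths = Counter(group for _, group in numbered)
```

Every header starts a new group, so all days in one run share a group number.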
UPDATE
I forgot to include the final query. The query above illustrates the underpinnings of the group numbering; it needs to be morphed into this:
with grouped_result as
(
select
sr.d,
sum((fr.d is null)::int) over(order by sr.d) as group_number
from tbl sr
left join tbl fr on sr.d = fr.d + interval '1 day'
)
select min(d) as starting_date, max(d) as end_date, count(d) as consecutive_days
from grouped_result
group by group_number
-- order by consecutive_days desc limit 1
STARTING_DATE END_DATE CONSECUTIVE_DAYS
April, 28 2012 08:00:00-0700 April, 30 2012 08:00:00-0700 3
May, 03 2012 08:00:00-0700 May, 04 2012 08:00:00-0700 2
UPDATE
I know why my other solution that uses window functions became long: it grew in my attempt to illustrate the logic of group numbering and counting over the group. If I'd cut to the chase as in my MySQL approach, the window-function version could be shorter. Having said that, here's my old window-function approach, now improved:
with headers as
(
select
d,lag(d) over m is null or d - lag(d) over m <> interval '1 day' as header
from tbl
window m as (order by d)
)
,sequence_group as
(
select d, sum(header::int) over (order by d) as group_number
from headers
)
select min(d) as starting_date,max(d) as ending_date,count(d) as consecutive_days
from sequence_group
group by group_number
-- order by consecutive_days desc limit 1
Live test: http://www.sqlfiddle.com/#!1/93789/21
In MySQL you could do this:
SET @NextDate = CURRENT_DATE;
SET @RowNum = 1;
SELECT MAX(RowNumber) AS ConsecutiveVisits
FROM ( SELECT @RowNum := IF(@NextDate = Created_At, @RowNum + 1, 1) AS RowNumber,
Created_At,
@NextDate := DATE_ADD(Created_At, INTERVAL 1 DAY) AS NextDate
FROM Visits
ORDER BY Created_At
) Visits
Example here:
http://sqlfiddle.com/#!2/6e035/8
However I am not 100% certain this is the best way to do it.
In Postgresql:
;WITH RECURSIVE VisitsCTE AS
( SELECT Created_At, 1 AS ConsecutiveDays
FROM Visits
UNION ALL
SELECT v.Created_At, ConsecutiveDays + 1
FROM Visits v
INNER JOIN VisitsCTE cte
ON 1 + cte.Created_At = v.Created_At
)
SELECT MAX(ConsecutiveDays) AS ConsecutiveDays
FROM VisitsCTE
Example here:
http://sqlfiddle.com/#!1/16c90/9
I know Postgresql has something similar to common table expressions as available in MSSQL. I'm not that familiar with Postgresql, but the code below works for MSSQL and does what you want.
create table #tempdates (
mydate date
)
insert into #tempdates(mydate) values('2012-04-28')
insert into #tempdates(mydate) values('2012-04-29')
insert into #tempdates(mydate) values('2012-04-30')
insert into #tempdates(mydate) values('2012-05-03')
insert into #tempdates(mydate) values('2012-05-04');
with maxdays (s, e, c)
as
(
select mydate, mydate, 1
from #tempdates
union all
select m.s, mydate, m.c + 1
from #tempdates t
inner join maxdays m on DATEADD(day, -1, t.mydate)=m.e
)
select MIN(o.s),o.e,max(o.c)
from (
select m1.s,max(m1.e) e,max(m1.c) c
from maxdays m1
group by m1.s
) o
group by o.e
drop table #tempdates
And here's the SQL fiddle: http://sqlfiddle.com/#!3/42b38/2
All are very good answers, but I think I should contribute by showing another approach utilizing an analytical capability specific to Vertica (after all it is part of what you paid for). And I promise the final query is short.
First, query using conditional_true_event(). From Vertica's documentation:
Assigns an event window number to each row, starting from 0, and
increments the number by 1 when the result of the boolean argument
expression evaluates true.
The example query looks like this:
select uid, created_at,
conditional_true_event( created_at - lag(created_at) > '1 day' )
over (partition by uid order by created_at) as seq_id
from visits;
And output:
uid created_at seq_id
--- ------------------- ------
123 2012-04-28 00:00:00 0
123 2012-04-29 00:00:00 0
123 2012-04-30 00:00:00 0
123 2012-05-03 00:00:00 1
123 2012-05-04 00:00:00 1
123 2012-06-04 00:00:00 2
123 2012-06-04 00:00:00 2
Now the final query becomes easy:
select uid, seq_id, count(1) num_days, min(created_at) s, max(created_at) f
from
(
select uid, created_at,
conditional_true_event( created_at - lag(created_at) > '1 day' )
over (partition by uid order by created_at) as seq_id
from visits
) as seq
group by uid, seq_id;
Final Output:
uid seq_id num_days s f
--- ------ -------- ------------------- -------------------
123 0 3 2012-04-28 00:00:00 2012-04-30 00:00:00
123 1 2 2012-05-03 00:00:00 2012-05-04 00:00:00
123 2 2 2012-06-04 00:00:00 2012-06-04 00:00:00
One final note:
num_days is actually number of rows of the inner query. If there are two '2012-04-28' visits in the original table (i.e. duplicates), you might want to work around that.
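As a rough illustration of that workaround, the Python sketch below mimics the running counter and de-duplicates to calendar days first. It is not Vertica code; conditional_true_event here is a hypothetical Python stand-in written from the documentation quote above, and streaks_per_seq is an invented helper name.

```python
from datetime import date, datetime, timedelta


def conditional_true_event(flags):
    """Running count of True flags, starting from 0 (Python stand-in)."""
    count = 0
    numbers = []
    for flag in flags:
        if flag:
            count += 1
        numbers.append(count)
    return numbers


def streaks_per_seq(visits):
    """Map seq_id -> (first_day, last_day, num_days); hypothetical helper."""
    days = sorted({v.date() for v in visits})  # drop duplicate visits per day
    # The first row has no previous row, so its flag is not true.
    flags = [False] + [(b - a) > timedelta(days=1)
                       for a, b in zip(days, days[1:])]
    result = {}
    for day, seq in zip(days, conditional_true_event(flags)):
        first, _, n = result.get(seq, (day, day, 0))
        result[seq] = (first, day, n + 1)
    return result


visits = [datetime(2012, 4, 28), datetime(2012, 4, 29), datetime(2012, 4, 30),
          datetime(2012, 5, 3), datetime(2012, 5, 4),
          datetime(2012, 6, 4), datetime(2012, 6, 4)]
summary = streaks_per_seq(visits)
```

With the duplicate 2012-06-04 visit removed up front, the last sequence counts as one day rather than two.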
The following should be Oracle friendly, and not require recursive logic.
;WITH
visit_dates (
visit_id,
date_id,
group_id
)
AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY TRUNC(created_at)),
TRUNC(SYSDATE) - TRUNC(created_at),
TRUNC(SYSDATE) - TRUNC(created_at) + ROW_NUMBER() OVER (ORDER BY TRUNC(created_at))
FROM
visits
GROUP BY
TRUNC(created_at)
)
,
group_duration (
group_id,
duration
)
AS
(
SELECT
group_id,
MAX(date_id) - MIN(date_id) + 1 AS duration
FROM
visit_dates
GROUP BY
group_id
)
SELECT
MAX(duration) AS max_duration
FROM
group_duration
Postgresql:
with headers as
(
select
d,
lag(d) over m is null or d - lag(d) over m <> interval '1 day' as header
from tbl
window m as (order by d)
)
,sequence_group as
(
select d, sum(header::int) over m as group_number
from headers
window m as (order by d)
)
,consecutive_list as
(
select d, group_number, count(d) over m as consecutive_count
from sequence_group
window m as (partition by group_number)
)
select * from consecutive_list
Divide-and-conquer approach: 3 steps
1st step, find headers:
with headers as
(
select
d,
lag(d) over m is null or d - lag(d) over m <> interval '1 day' as header
from tbl
window m as (order by d)
)
select * from headers
Output:
d | header
---------------------+--------
2012-04-28 08:00:00 | t
2012-04-29 08:00:00 | f
2012-04-30 08:00:00 | f
2012-05-03 08:00:00 | t
2012-05-04 08:00:00 | f
(5 rows)
2nd step, designate grouping:
with headers as
(
select
d,
lag(d) over m is null or d - lag(d) over m <> interval '1 day' as header
from tbl
window m as (order by d)
)
,sequence_group as
(
select d, sum(header::int) over m as group_number
from headers
window m as (order by d)
)
select * from sequence_group
Output:
d | group_number
---------------------+--------------
2012-04-28 08:00:00 | 1
2012-04-29 08:00:00 | 1
2012-04-30 08:00:00 | 1
2012-05-03 08:00:00 | 2
2012-05-04 08:00:00 | 2
(5 rows)
3rd step, count max days:
with headers as
(
select
d,
lag(d) over m is null or d - lag(d) over m <> interval '1 day' as header
from tbl
window m as (order by d)
)
,sequence_group as
(
select d, sum(header::int) over m as group_number
from headers
window m as (order by d)
)
,consecutive_list as
(
select d, group_number, count(d) over m as consecutive_count
from sequence_group
window m as (partition by group_number)
)
select * from consecutive_list
Output:
d | group_number | consecutive_count
---------------------+--------------+-----------------
2012-04-28 08:00:00 | 1 | 3
2012-04-29 08:00:00 | 1 | 3
2012-04-30 08:00:00 | 1 | 3
2012-05-03 08:00:00 | 2 | 2
2012-05-04 08:00:00 | 2 | 2
(5 rows)
This one is for MySQL. It is the shortest and uses only one variable:
select
min(d) as starting_date, max(d) as ending_date,
count(d) as consecutive_days
from
(
select
sr.d,
IF(fr.d is null,@group_number := @group_number + 1,@group_number)
as group_number
from tbl sr
left join tbl fr on sr.d = adddate(fr.d,interval 1 day)
cross join (select @group_number := 0) as grp
) as x
group by group_number
Output:
STARTING_DATE ENDING_DATE CONSECUTIVE_DAYS
April, 28 2012 08:00:00-0700 April, 30 2012 08:00:00-0700 3
May, 03 2012 08:00:00-0700 May, 04 2012 08:00:00-0700 2
Live test: http://www.sqlfiddle.com/#!2/65169/1
For PostgreSQL 8.4 or later, there is a short and clean way with window functions and no JOIN.
I'd expect this to be the fastest solution posted so far:
WITH x AS (
SELECT created_at AS d
, lag(created_at) OVER (ORDER BY created_at) = (created_at - 1) AS nu
FROM visits
WHERE uid = 1
)
, y AS (
SELECT d, count(NULLIF(nu, TRUE)) OVER (ORDER BY d) AS seq
FROM x
)
SELECT count(*) AS max_days, min(d) AS seq_from, max(d) AS seq_to
FROM y
GROUP BY seq
ORDER BY 1 DESC
LIMIT 1;
Returns:
max_days | seq_from | seq_to
---------+------------+-----------
3 | 2012-04-28 | 2012-04-30
This assumes that created_at is a date and unique.
In CTE x: for every day our user visits, check whether he was here yesterday, too.
To calculate "yesterday" just use created_at - 1. The first row is a special case and will produce NULL here.
In CTE y: calculate a running count of "days without yesterday so far" (seq) for every day. NULL values don't count, so count(NULLIF(nu, TRUE)) is the fastest and shortest way, and it also covers the special case.
Finally, group days per seq and count the days. While at it, I added the first and last day of the sequence.
ORDER BY the length of the sequence, and pick the longest one.
Upon seeing the OP's query approach for their Vertica database, I tried making the two joins run at the same time.
These PostgreSQL and SQL Server versions of the query should both work in Vertica.
Postgresql version:
select
min(gr.d) as start_date,
max(gr.d) as end_date,
date_part('day', max(gr.d) - min(gr.d))+1 as consecutive_days
from
(
select
cr.d, (row_number() over() - 1) / 2 as pair_number
from tbl cr
left join tbl pr on pr.d = cr.d - interval '1 day'
left join tbl nr on nr.d = cr.d + interval '1 day'
where (pr.d is null) <> (nr.d is null)
) as gr
group by pair_number
order by start_date
Regarding (pr.d is null) <> (nr.d is null): this is an XOR operation. It keeps a row only when exactly one of the previous and next days is missing. Dates inside a run have both neighbours, and isolated dates have neither, so both kinds are filtered out. What remains are the headers and footers of each run; note that single-day runs are dropped as a result.
Once we are left with only headers and footers, we can pair them via row_number:
(row_number() over() - 1) / 2 as pair_number
row_number() starts at 1, so we subtract 1 (adding 1 instead also works) and then divide by two using integer division; each header then gets the same pair_number as the footer that follows it.
Live test: http://www.sqlfiddle.com/#!1/fc440/7
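The XOR filter and the pairing can be sketched in Python to see which rows survive. This is an illustration under assumptions, not the query itself: streaks_by_pairing is an invented name, and the isolated date 2012-05-10 is added to the sample only to show that single-day runs fall out.

```python
from datetime import date, timedelta


def streaks_by_pairing(days):
    """Return (start, end, length) for runs of two or more days.

    Hypothetical helper mirroring the XOR filter and pair_number.
    """
    present = set(days)
    one_day = timedelta(days=1)
    # Keep days where exactly one neighbour is missing (XOR):
    # these are the headers and footers of each run.
    edges = sorted(
        d for d in present
        if (d - one_day not in present) != (d + one_day not in present)
    )
    # Stepping by two pairs each header with its footer,
    # like (row_number - 1) / 2 with integer division.
    return [
        (edges[i], edges[i + 1], (edges[i + 1] - edges[i]).days + 1)
        for i in range(0, len(edges), 2)
    ]


visits = [date(2012, 4, 28), date(2012, 4, 29), date(2012, 4, 30),
          date(2012, 5, 3), date(2012, 5, 4),
          date(2012, 5, 10)]  # isolated day, filtered out by the XOR
pairs = streaks_by_pairing(visits)
```

The sketch sorts the edges explicitly; the pairing depends on headers and footers arriving in date order.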
This is the Sql Server version:
select
min(gr.d) as start_date,
max(gr.d) as end_date,
datediff(day, min(gr.d),max(gr.d)) +1 as consecutive_days
from
(
select
cr.d, (row_number() over(order by cr.d) - 1) / 2 as pair_number
from tbl cr
left join tbl pr on pr.d = dateadd(day,-1,cr.d)
left join tbl nr on nr.d = dateadd(day,+1,cr.d)
where
case when pr.d is null then 1 else 0 end
<> case when nr.d is null then 1 else 0 end
) as gr
group by pair_number
order by start_date
Same logic as above, apart from superficial differences in the date functions. Also, SQL Server requires an ORDER BY clause in its OVER, while PostgreSQL's OVER can be left empty.
SQL Server has no first-class boolean type, which is why we cannot compare booleans directly:
(pr.d is null) <> (nr.d is null)
Instead, we must do this in SQL Server:
case when pr.d is null then 1 else 0 end
<> case when nr.d is null then 1 else 0 end
Live test: http://www.sqlfiddle.com/#!3/65df2/17
There have already been several answers to this question. However, the SQL statements all seem too complex. This can be accomplished with basic SQL, a way to enumerate rows, and some date arithmetic.
The key observation is that if you have a bunch of days and a parallel sequence of integers, then subtracting the integer from the day gives the same constant date for every day in a consecutive run.
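A quick way to convince yourself of this is a small Python sketch (illustrative only; longest_streak is an invented name): subtract each day's 1-based position from the day itself and count how often each resulting date occurs.

```python
from collections import Counter
from datetime import date, timedelta


def longest_streak(days):
    """Length of the longest run of consecutive days; hypothetical helper."""
    ordered = sorted(set(days))
    # Day minus its position is constant within a consecutive run.
    anchors = [d - timedelta(days=i) for i, d in enumerate(ordered, start=1)]
    counts = Counter(anchors)
    return max(counts.values(), default=0)


visits = [date(2012, 4, 28), date(2012, 4, 29), date(2012, 4, 30),
          date(2012, 5, 3), date(2012, 5, 4)]
best = longest_streak(visits)
```

For the sample data the first three days all map to 2012-04-27 and the last two to 2012-04-29, so the run lengths are 3 and 2.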
The following query uses this observation to answer the original question:
select uid, min(d) as startdate, count(*) as numdaysinseq
from
(
select uid, d, adddate(d, interval -offset day) as groupstart
from
(
select uid, d, row_number() over (partition by uid order by d) as offset
from
(
SELECT DISTINCT uid, DATE(created_at) AS d
FROM visits
) t
) t
) t
group by uid, groupstart
Alas, MySQL (before version 8.0) does not have the row_number() function. However, there is a work-around with variables (and most other databases do have this function).