Lag functions and SUM - sql

I need to get the list of users that have been offline for at least 20 minutes every day. Here's my data:
I have this starting query, but I am stuck on how to sum the differences in offline_mins, i.e. I need the equivalent of adding "and sum(offline_mins) >= 20" to the where clause.
SELECT
userid,
connected,
LAG(recordeddt) OVER(PARTITION BY userid
ORDER BY userid,
recordeddt) AS offline_period,
DATEDIFF(minute, LAG(recordeddt) OVER(PARTITION BY userid
ORDER BY userid,
recordeddt),recordeddt) offline_mins
FROM device_data where connected=0;
My expected results :
Thanks in advance.

This reads like a gaps-and-islands problem, where you want to group together adjacent rows that have the same userid and status.
As a starter, here is a query that computes the islands:
select userid, connected, min(recordeddt) startdt, max(lead_recordeddt) enddt,
datediff(minute, min(recordeddt), max(lead_recordeddt)) duration
from (
select dd.*,
row_number() over(partition by userid order by recordeddt) rn1,
row_number() over(partition by userid, connected order by recordeddt) rn2,
lead(recordeddt) over(partition by userid order by recordeddt) lead_recordeddt
from device_data dd
) dd
group by userid, connected, rn1 - rn2
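To make the row_number() difference trick easier to follow, here is a minimal Python sketch of the same idea. It is only an illustration, not part of the original answer: the sample rows and the function name islands are hypothetical. Within one user, rn1 counts all rows and rn2 counts rows per status, so rn1 - rn2 stays constant for as long as the status does not change.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical sample rows: (userid, recordeddt, connected)
rows = [
    (1, datetime(2020, 11, 2, 10, 0), 1),
    (1, datetime(2020, 11, 2, 10, 5), 0),
    (1, datetime(2020, 11, 2, 10, 10), 0),
    (1, datetime(2020, 11, 2, 10, 30), 1),
    (1, datetime(2020, 11, 2, 10, 35), 0),
]

def islands(rows):
    """Return (userid, connected, startdt, enddt) for each run of equal status."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for userid, user_rows in groupby(ordered, key=lambda r: r[0]):
        user_rows = list(user_rows)
        per_status = {}  # running count per status, plays the role of rn2
        keyed = []
        for rn1, (_, dt, connected) in enumerate(user_rows, start=1):
            per_status[connected] = per_status.get(connected, 0) + 1
            rn2 = per_status[connected]
            # lead(recordeddt): timestamp of the user's next row, if any
            lead_dt = user_rows[rn1][1] if rn1 < len(user_rows) else None
            keyed.append(((connected, rn1 - rn2), dt, lead_dt))
        # rows of one island are adjacent and share (connected, rn1 - rn2)
        for (connected, _), grp in groupby(keyed, key=lambda k: k[0]):
            grp = list(grp)
            leads = [lead for _, _, lead in grp if lead is not None]
            # like max() in SQL, ignore missing leads; None if there are none
            enddt = max(leads) if leads else None
            result.append((userid, connected, grp[0][1], enddt))
    return result

for island in islands(rows):
    print(island)
```

On these sample rows the first offline island runs from 10:05 to 10:30, i.e. 25 minutes. The last island has no following row, so its end is unknown (None), which corresponds to a NULL lead value in SQL.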
Now, say you want users that were offline for at least 20 minutes every day. You can break down the islands per day, and use a having clause for filtering:
select userid
from (
select recordedday, userid, connected,
datediff(minute, min(recordeddt), max(lead_recordeddt)) duration
from (
select dd.*, v.*,
row_number() over(partition by v.recordedday, userid order by recordeddt) rn1,
row_number() over(partition by v.recordedday, userid, connected order by recordeddt) rn2,
lead(recordeddt) over(partition by v.recordedday, userid order by recordeddt) lead_recordeddt
from device_data dd
cross apply (values (convert(date, recordeddt))) v(recordedday)
) dd
group by recordedday, userid, connected, rn1 - rn2
) dd
group by userid
having count(distinct case when connected = 0 and duration >= 20 then recordedday end) = count(distinct recordedday)

As noted, this is a gaps-and-islands problem. This is my take on it: use a simple lag function to create groups, filter out the connected rows, and then work on the date ranges.
CREATE TABLE #tmp(ID int, UserID int, dt datetime, connected int)
INSERT INTO #tmp VALUES
(1,1,'11/2/20 10:00:00',1),
(2,1,'11/2/20 10:05:00',0),
(3,1,'11/2/20 10:10:00',0),
(4,1,'11/2/20 10:15:00',0),
(5,1,'11/2/20 10:20:00',0),
(6,2,'11/2/20 10:00:00',1),
(7,2,'11/2/20 10:05:00',1),
(8,2,'11/2/20 10:10:00',0),
(9,2,'11/2/20 10:15:00',0),
(10,2,'11/2/20 10:20:00',0),
(11,2,'11/2/20 10:25:00',0),
(12,2,'11/2/20 10:30:00',0)
SELECT UserID, connected,DATEDIFF(minute,MIN(DT), MAX(DT)) OFFLINE_MINUTES
FROM
(
SELECT *, SUM(CASE WHEN connected <> LG THEN 1 ELSE 0 END) OVER (ORDER BY UserID,dt) grp
FROM
(
select *, LAG(connected,1,connected) OVER(PARTITION BY UserID ORDER BY UserID,dt) LG
from #tmp
) x
) y
WHERE connected <> 1
GROUP BY UserID,grp,connected
HAVING DATEDIFF(minute,MIN(DT), MAX(DT)) >= 20

Related

Getting category based on production shift

I have this query
with cte as(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY seq ORDER BY date_time) rn1,
ROW_NUMBER() OVER (PARTITION BY seq, output > 0
ORDER BY date_time) rn2
FROM myTable
)
select
seq,
date_time::date,
MIN(date_time) AS MinDatetime,
MAX(date_time) AS MaxDatetime,
SUM(output) AS sum_output
FROM cte cte
GROUP by
seq,
date_time::date ,
cntpr > 0,
rn1 - rn2
ORDER BY
seq,
MIN(date_time);
here's the result:
what I would like to do is to join my result to this master table
The expected result should match MinDatetime and MaxDatetime against the master table's start_shift and end_shift so that the shift information is shown, like this:
Any help would be much appreciated. Thank you!
This is the solution I came up with:
select seq, shift, start_shift, end_shift, MinDateTime, MaxDateTime
from
(
select
seq,
MIN(date_time) AS MinDatetime,
MAX(date_time) AS MaxDatetime,
SUM(output) AS sum_output
FROM cte cte
GROUP by
seq
ORDER BY
seq,
MIN(date_time::date)) t
join mstr
on
CASE
WHEN start_shift < end_shift THEN (MinDateTime::time between start_shift and end_shift) OR (MaxDateTime::time between start_shift and end_shift)
ELSE (MinDateTime::time >= start_shift) OR
(MaxDateTime::time >= start_shift) OR
(MinDateTime::time <= end_shift) OR
(MaxDateTime::time <= end_shift)
END
ORDER BY seq;
Fiddle: https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/4208
Explanation: I get the groups, join them with master table on interval matching.
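The interval matching in the join condition can be sketched in Python as follows. This is an illustration under assumptions: the function names in_shift and group_matches_shift and the shift times are hypothetical, and the logic only mirrors the CASE expression above (a group matches a shift if either of its endpoints falls inside the shift, with special handling for shifts that cross midnight).

```python
from datetime import time

def in_shift(t, start_shift, end_shift):
    """True if time-of-day t lies inside the shift (both ends inclusive)."""
    if start_shift < end_shift:
        # Daytime shift: a plain BETWEEN
        return start_shift <= t <= end_shift
    # Overnight shift (e.g. 22:00 to 06:00): inside if at or after the
    # start, or at or before the end
    return t >= start_shift or t <= end_shift

def group_matches_shift(min_t, max_t, start_shift, end_shift):
    """A group matches when either of its endpoints falls inside the shift."""
    return in_shift(min_t, start_shift, end_shift) or in_shift(max_t, start_shift, end_shift)

# Hypothetical shifts
day = (time(8, 0), time(16, 0))
night = (time(22, 0), time(6, 0))

print(group_matches_shift(time(7, 30), time(9, 0), *day))
print(group_matches_shift(time(23, 30), time(23, 45), *night))
print(group_matches_shift(time(12, 0), time(13, 0), *night))
```

Note that a group spanning a shift boundary matches both neighbouring shifts under this rule, just as it would in the SQL join.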

How to generate session_id by sql?

My tracking system does not generate session IDs.
I have user_id and event_date_time.
I need a new session_id for each user's session, where a new session starts 30 minutes or more after that user's last event_date_time.
My final goal is to calculate the median session time.
I tried to assign session_id=1 or session_id=2 depending on whether the gap to the next event exceeds 30 minutes for the same guid, but I'm stuck from here:
select a.*,
case when (a.next_event_date-a.event_date)*24*60<30 and userID=next_userID
then 1
when (a.next_event_date-a.event_date)*24*60>=30 and userID=next_userID then
2
end session_id
from
(select f.userID,
lead(f.userID) over (partition by f.guid order by f.event_date)
next_userID,
f.event_date,
lead(f.event_date) over (partition by f.guid order by f.event_date)
next_event_date
from event_table f
)a
where next_event_date is not null
If I understood correctly, you could generate IDs this way:
select id, guid, event_date,
sum(chg) over (partition by guid order by event_date) session_id
from (
select id, guid, event_date,
case when lag(guid) over (partition by guid order by event_date) = guid
and 24 * 60 * (event_date -lag(event_date)
over (partition by guid order by event_date) ) < 30
then 0 else 1
end chg
from event_table ) a
Compare neighbouring rows: if the guid differs or the time difference is 30 minutes or more, assign 1, otherwise assign 0. A running (analytic) sum of these flags per guid then gives the session ID.
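The flag-and-running-sum idea can be sketched in Python like this. It is a minimal illustration, not the answer's actual code: the function name assign_sessions and the sample events are hypothetical, and events are assumed to be (guid, event_date) pairs.

```python
from datetime import datetime, timedelta

def assign_sessions(events, gap=timedelta(minutes=30)):
    """events: iterable of (guid, event_date). Returns (guid, event_date, session_id) tuples."""
    out = []
    prev_guid, prev_dt, session_id = None, None, 0
    for guid, dt in sorted(events):
        if guid != prev_guid:
            session_id = 0  # the running sum restarts for each guid (partition by guid)
        # chg = 0 only when the previous row belongs to the same guid and
        # is less than 30 minutes older; otherwise a new session starts
        chg = 0 if (guid == prev_guid and dt - prev_dt < gap) else 1
        session_id += chg
        out.append((guid, dt, session_id))
        prev_guid, prev_dt = guid, dt
    return out

# Hypothetical sample events
events = [
    ("a", datetime(2020, 1, 1, 10, 0)),
    ("a", datetime(2020, 1, 1, 10, 10)),
    ("a", datetime(2020, 1, 1, 10, 45)),
    ("b", datetime(2020, 1, 1, 10, 0)),
]

for row in assign_sessions(events):
    print(row)
```

For guid "a" the first two events share session 1 and the third (35 minutes later) opens session 2; guid "b" starts again at session 1. The session ID is therefore only unique in combination with the guid.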
I think you're on the right track using lead or lag. My recommendation would be to break this into steps and create a temp table to work against:
With the first query, assign every record its own unique ID, either a sequence number or GUID. You could also capture some of the lagged data in this step.
With a second query, find the overlaps (< 30 minutes) and make the overlapping records all the same -- either the same as the earliest or latest in that grouping, doesn't matter as long as it's consistent.
Something like this:
create table events_temp as (
select f.*,
row_number() over (partition by f.userID order by f.event_date) as user_row,
lag(f.userID) over (partition by f.userID order by f.event_date) as prev_userID,
lag(f.event_date) over (partition by f.userID order by f.event_date) as prev_event_date
from event_table f
order by f.userId, f.event_date
)
select a.*,
case when prev_userID = userID
and 24 * 60 * (event_date - prev_event_date) < 30
then lag(user_row) over (partition by userID order by user_row)
else user_row
end as session_id
from events_temp a

Max dates for each sequence within partitions

I would like to see if somebody has an idea how to get the max and min dates within each 'id' using the 'row_num' column as an indicator when the sequence starts/ends in SQL Server 2016.
The screenshot below shows the desired output in columns 'min_date' and 'max_date'.
Any help would be appreciated.
You could use windowed MIN/MAX:
WITH cte AS (
SELECT *,SUM(CASE WHEN row_num > 1 THEN 0 ELSE 1 END)
OVER(PARTITION BY id, cat ORDER BY date_col) AS grp
FROM tab
)
SELECT *, MIN(date_col) OVER(PARTITION BY id, cat, grp) AS min_date,
MAX(date_col) OVER(PARTITION BY id, cat, grp) AS max_date
FROM cte
ORDER BY id, date_col, cat;
Try something like
SELECT
Q1.id, Q1.cat,
MIN(Q1.date) AS min_dat,
MAX(Q1.date) AS max_dat
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id, cat ORDER BY [date]) AS r1,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY [date]) AS r2
FROM your_table -- placeholder name: the question does not name the table
) AS Q1
GROUP BY
Q1.id, Q1.cat, Q1.r2 - Q1.r1

Return max value from a SQL selection

I have a table license_Usage which works like a log of the licenses used each day:
ID  User  license  date
1   1     A        22/2/2015
2   1     A        23/2/2015
3   1     B        22/2/2015
4   2     A        22/2/2015
I want to count how many licenses each user used in a day. The result should look like:
QuantityOfLicenses  User  date
2                   1     22/2/2015
1                   2     22/2/2015
For that I did the following query :
select count(license) as [Quantity of licenses],[user],[date]
From license_Usage
where date = '22/2/2015'
Group by [date], [user]
This works, but now I want to know which user has used the largest number of licenses. For that I wrote the following query:
select MAX(result.[Quantity of licenses])
From (
select count(license) as [Quantity of licenses],[user],[date]
From license_Usage
Group by [date], [user]
) as result
It returns the max value of 2, but when I want to know which user has used 2 licenses, I try this query with no success:
select result.user, MAX(result.[Quantity of licenses])
From (
select count(license) as [Quantity of licenses],[user],[date]
From license_Usage
Group by [date], [user]
) as result
Group by result.user
You can use something like this:
select top 1 *
From (
select count(license) as Quantity,[user],[date]
From license_Usage
Group by [date], [user]
) as result
order by Quantity desc
If several rows can tie for the maximum and you need to fetch all of them, you'll have to use the rank() window function instead.
Use RANK to rank the users by the number of licenses per day.
SELECT
LicPerDay.*,
RANK() OVER (PARTITION BY [date] ORDER BY Qty DESC) AS User_Rank
FROM (
SELECT
COUNT(license) AS Qty,
[User], -- USER is a reserved word in SQL Server, so it needs brackets
[date]
FROM license_usage
GROUP BY [User], [date]
) LicPerDay
Any user with User_Rank = 1 will have the most licenses for that day.
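To show what RANK() does with ties, here is a small Python sketch that reproduces the per-day ranking on the sample rows from the question. The function name rank_users_per_day is hypothetical and the code only imitates the SQL semantics: equal counts share a rank, and the following rank is skipped.

```python
from collections import Counter

# (user, license, date) rows from the question's sample table
usage = [
    (1, "A", "22/2/2015"),
    (1, "A", "23/2/2015"),
    (1, "B", "22/2/2015"),
    (2, "A", "22/2/2015"),
]

def rank_users_per_day(usage):
    """Return (date, user, qty, rank) tuples, ranked by qty descending within each date."""
    qty = Counter((date, user) for user, _license, date in usage)
    ranked = []
    for date in sorted({d for d, _ in qty}):
        day = sorted(
            ((q, u) for (d, u), q in qty.items() if d == date),
            key=lambda pair: -pair[0],
        )
        rank, prev_q = 0, None
        for pos, (q, u) in enumerate(day, start=1):
            if q != prev_q:
                # a new count gets its position as rank; ties keep the earlier rank
                rank, prev_q = pos, q
            ranked.append((date, u, q, rank))
    return ranked

for row in rank_users_per_day(usage):
    print(row)
```

Filtering the result on rank 1 keeps user 1 for 22/2/2015 (2 licenses) and user 1 for 23/2/2015 (1 license). If two users had the same count on a day, both would get rank 1.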
If you only want the top user for each day, wrap the query above as a subquery and filter on User_Rank = 1:
SELECT * FROM (
SELECT
LicPerDay.*,
RANK() OVER (PARTITION BY [date] ORDER BY Qty DESC) AS User_Rank
FROM (
SELECT
COUNT(license) AS Qty,
[User], -- USER is a reserved word in SQL Server, so it needs brackets
[date]
FROM license_usage
GROUP BY [User], [date]
) LicPerDay
) LicPerDayRanks
WHERE User_Rank = 1
Use a Windowed Aggregate Function, RANK, to get the highest count:
SELECT * FROM (
SELECT
[User], -- USER is a reserved word in SQL Server, so it needs brackets
[date],
COUNT(license) AS Qty,
-- rank by descending number for each day ??
--RANK() OVER (PARTITION BY [date] ORDER BY COUNT(license) DESC) AS rnk
-- rank by descending number
RANK() OVER (ORDER BY COUNT(license) DESC) AS rnk
FROM license_usage
GROUP BY [User], [date]
) dt
WHERE rnk = 1

Doing a comparison using the previous row?

I'm trying to work out an efficient way of comparing two rows in SQL Server 2008. I need to write a query which finds all rows in the Movement table which have Speed < 10 N consecutive times.
The structure of the table is:
EventTime
Speed
If the data were:
2012-02-05 13:56:36.980, 2
2012-02-05 13:57:36.980, 11
2012-02-05 13:57:46.980, 2
2012-02-05 13:59:36.980, 2
2012-02-05 14:06:36.980, 22
2012-02-05 15:56:36.980, 2
Then it would return rows 3/4 (13:57:46.980 / 13:59:36.980) if I looked for 2 consecutive rows, and would return nothing if I looked for three consecutive rows. The order of the data is EventTime/DateTime only.
Any help you could give me would be great. I'm considering using cursors but they're usually pretty inefficient. Also, this table is approximately 10m rows in size, so the more efficient the better! :)
Thanks!
DECLARE
@n INT,
@speed_limit INT
SELECT
@n = 5,
@speed_limit = 10
;WITH
partitioned AS
(
SELECT
*,
CASE WHEN speed < @speed_limit THEN 1 ELSE 0 END AS PartitionID
FROM
Movement
)
,
sequenced AS
(
SELECT
ROW_NUMBER() OVER ( ORDER BY EventTime) AS MasterSeqID,
ROW_NUMBER() OVER (PARTITION BY PartitionID ORDER BY EventTime) AS PartIDSeqID,
*
FROM
partitioned
)
,
filter AS
(
SELECT
MasterSeqID - PartIDSeqID AS GroupID,
MIN(MasterSeqID) AS GroupFirstMastSeqID,
MAX(MasterSeqID) AS GroupFinalMastSeqID
FROM
sequenced
WHERE
PartitionID = 1
GROUP BY
MasterSeqID - PartIDSeqID
HAVING
COUNT(*) >= @n
)
SELECT
sequenced.*
FROM
filter
INNER JOIN
sequenced
ON sequenced.MasterSeqID >= filter.GroupFirstMastSeqID
AND sequenced.MasterSeqID <= filter.GroupFinalMastSeqID
Alternative final steps (inspired by @t-clausen-dk), to avoid an additional JOIN. I would test both to see which is more performant.
,
filter AS
(
SELECT
MasterSeqID - PartIDSeqID AS GroupID,
COUNT(*) OVER (PARTITION BY MasterSeqID - PartIDSeqID) AS GroupSize,
*
FROM
sequenced
WHERE
PartitionID = 1
)
SELECT
*
FROM
filter
WHERE
GroupSize >= @n
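As a cross-check of what the queries above should return, here is a short Python sketch that finds runs of at least n consecutive slow rows in the question's sample data. The function name slow_runs is hypothetical, and this is only an in-memory illustration of the expected result, not a replacement for the set-based SQL on a 10m-row table.

```python
from itertools import groupby

# (EventTime, Speed) sample rows from the question
movement = [
    ("2012-02-05 13:56:36.980", 2),
    ("2012-02-05 13:57:36.980", 11),
    ("2012-02-05 13:57:46.980", 2),
    ("2012-02-05 13:59:36.980", 2),
    ("2012-02-05 14:06:36.980", 22),
    ("2012-02-05 15:56:36.980", 2),
]

def slow_runs(rows, n, speed_limit=10):
    """Return all rows that belong to a run of at least n consecutive rows with speed < speed_limit."""
    hits = []
    # ISO-style timestamp strings sort chronologically
    ordered = sorted(rows)
    for is_slow, run in groupby(ordered, key=lambda r: r[1] < speed_limit):
        run = list(run)
        if is_slow and len(run) >= n:
            hits.extend(run)
    return hits

print(slow_runs(movement, 2))
print(slow_runs(movement, 3))
```

With n = 2 this returns the rows at 13:57:46.980 and 13:59:36.980, and with n = 3 it returns nothing, which matches the expectation stated in the question.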
declare @t table(EventTime datetime, Speed int)
insert @t values('2012-02-05 13:56:36.980', 2)
insert @t values('2012-02-05 13:57:36.980', 11)
insert @t values('2012-02-05 13:57:46.980', 2)
insert @t values('2012-02-05 13:59:36.980', 2)
insert @t values('2012-02-05 14:06:36.980', 22)
insert @t values('2012-02-05 15:56:36.980', 2)
declare @N int = 1
;with a as
(
select EventTime, Speed, row_number() over (order by EventTime) rn from @t
), b as
(
select EventTime, Speed, 1 grp, rn from a where rn = 1
union all
select a.EventTime, a.Speed, case when a.speed < 10 and b.speed < 10 then grp else grp + 1 end, a.rn
from a join b on a.rn = b.rn+1
), c as
(
select EventTime, Speed, count(*) over (partition by grp) cnt from b
)
select * from c
where cnt > @N
OPTION (MAXRECURSION 0) -- Thx Dems
Almost the same idea as Dems, just a little bit different:
select * from (
select eventtime, speed, rnk, new_rnk,
rnk - new_rnk,
max(rnk) over (partition by speed, new_rnk-rnk) -
min(rnk) over (partition by speed, new_rnk-rnk) + 1 as no_consec
from (
select eventtime, rnk, speed,
row_number() over (partition by speed order by eventtime) as new_rnk
from (
select eventtime, speed,
row_number() over (order by eventtime) as rnk
from a
) a
where a.speed < 5
)
order by eventtime
)
where no_consec >= 2;
Here 5 is the speed limit and 2 is the minimum number of consecutive events.
For simplicity, I used numbers instead of dates when creating the test table.
EDIT:
To answer the comments, I've added three columns in the first inner query. To get only the first row of each group, add pos_in_group = 1 to the WHERE clause; the distance is then at your fingertips.
select eventtime, speed, min_date, max_date, pos_in_group
from (
select eventtime, speed, rnk, new_rnk,
rnk - new_rnk,
row_number() over (partition by speed, new_rnk-rnk order by eventtime) pos_in_group,
min(eventtime) over (partition by speed, new_rnk-rnk) min_date,
max(eventtime) over (partition by speed, new_rnk-rnk) max_date,
max(rnk) over (partition by speed, new_rnk-rnk) -
min(rnk) over (partition by speed, new_rnk-rnk) + 1 as no_consec
from (
select eventtime, rnk, speed,
row_number() over (partition by speed order by eventtime) as new_rnk
from (
select eventtime, speed,
row_number() over (order by eventtime) as rnk
from a
) a
where a.speed < 5
)
order by eventtime
)
where no_consec > 1;