How to find the max time difference in PostgreSQL that a value stayed in the same state?

I am working on a university project and ran into the following problem:
I have a table like this:
I want to get the max duration that an actuator stayed in one state. For example, cool0 was in state false for 18 minutes.
The result table should look like this:
NAME    STATE   DURATION
cool0   false   18
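
Since the input table itself was posted as an image, here is a minimal sketch of what its structure would presumably look like, using the column names from the answer below (the types are guesses):
create table t (
    actuator      text,        -- actuator name, e.g. 'cool0'
    state         boolean,     -- the state the actuator was in
    actuator_time timestamp    -- when the reading was taken
);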

This is a gaps-and-islands problem. Your data is a bit hard to follow, but I think:
select actuator, state, min(actuator_time), max(actuator_time)
from (select t.*,
             row_number() over (partition by actuator order by actuator_time) as seqnum,
             row_number() over (partition by actuator, state order by actuator_time) as seqnum_s
      from t
     ) t
group by actuator, state, (seqnum - seqnum_s)
For the maximum per actuator, use distinct on:
select distinct on (actuator) actuator, state, min(actuator_time), max(actuator_time)
from (select t.*,
             row_number() over (partition by actuator order by actuator_time) as seqnum,
             row_number() over (partition by actuator, state order by actuator_time) as seqnum_s
      from t
     ) t
group by actuator, state, (seqnum - seqnum_s)
order by actuator, max(actuator_time) - min(actuator_time) desc;
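Since the expected output reports the duration in minutes, here is a sketch of the same query with the island length computed explicitly (assuming actuator_time is a timestamp):
select distinct on (actuator)
       actuator, state,
       round(extract(epoch from max(actuator_time) - min(actuator_time)) / 60) as duration
from (select t.*,
             row_number() over (partition by actuator order by actuator_time) as seqnum,
             row_number() over (partition by actuator, state order by actuator_time) as seqnum_s
      from t
     ) t
group by actuator, state, (seqnum - seqnum_s)
order by actuator, max(actuator_time) - min(actuator_time) desc;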

Related

Lag functions and SUM

I need to get the list of users that have been offline for at least 20 minutes every day. Here's my data:
I have this starting query, but I am stuck on how to sum the difference in offline_mins, i.e. I need to add something like "and sum(offline_mins) >= 20" to the where clause:
SELECT
    userid,
    connected,
    LAG(recordeddt) OVER (PARTITION BY userid ORDER BY recordeddt) AS offline_period,
    DATEDIFF(minute,
             LAG(recordeddt) OVER (PARTITION BY userid ORDER BY recordeddt),
             recordeddt) AS offline_mins
FROM device_data
WHERE connected = 0;
My expected results:
Thanks in advance.
This reads like a gaps-and-islands problem, where you want to group together adjacent rows having the same userid and connected status.
As a starter, here is a query that computes the islands:
select userid, connected, min(recordeddt) startdt, max(lead_recordeddt) enddt,
       datediff(minute, min(recordeddt), max(lead_recordeddt)) duration
from (
    select dd.*,
           row_number() over(partition by userid order by recordeddt) rn1,
           row_number() over(partition by userid, connected order by recordeddt) rn2,
           lead(recordeddt) over(partition by userid order by recordeddt) lead_recordeddt
    from device_data dd
) dd
group by userid, connected, rn1 - rn2
Now, say you want users that were offline for at least 20 minutes every day. You can break down the islands per day and use a having clause for filtering:
select userid
from (
    select recordedday, userid, connected,
           datediff(minute, min(recordeddt), max(lead_recordeddt)) duration
    from (
        select dd.*, v.*,
               row_number() over(partition by v.recordedday, userid order by recordeddt) rn1,
               row_number() over(partition by v.recordedday, userid, connected order by recordeddt) rn2,
               lead(recordeddt) over(partition by v.recordedday, userid order by recordeddt) lead_recordeddt
        from device_data dd
        cross apply (values (convert(date, recordeddt))) v(recordedday)
    ) dd
    group by recordedday, userid, connected, rn1 - rn2
) dd
group by userid
having count(distinct case when connected = 0 and duration >= 20 then recordedday end) = count(distinct recordedday)
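The having clause works by counting, per user, the distinct days that contain at least one offline island of 20+ minutes and comparing that against the total number of distinct recorded days; only users for whom the two counts match, i.e. who qualify on every day, are kept.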
As noted, this is a gaps-and-islands problem. This is my take on it: use a simple LAG function to create groups, filter out the connected rows, and then work on the date ranges.
CREATE TABLE #tmp(ID int, UserID int, dt datetime, connected int)
INSERT INTO #tmp VALUES
(1,1,'11/2/20 10:00:00',1),
(2,1,'11/2/20 10:05:00',0),
(3,1,'11/2/20 10:10:00',0),
(4,1,'11/2/20 10:15:00',0),
(5,1,'11/2/20 10:20:00',0),
(6,2,'11/2/20 10:00:00',1),
(7,2,'11/2/20 10:05:00',1),
(8,2,'11/2/20 10:10:00',0),
(9,2,'11/2/20 10:15:00',0),
(10,2,'11/2/20 10:20:00',0),
(11,2,'11/2/20 10:25:00',0),
(12,2,'11/2/20 10:30:00',0)
SELECT UserID, connected, DATEDIFF(minute, MIN(dt), MAX(dt)) AS OFFLINE_MINUTES
FROM
(
    SELECT *, SUM(CASE WHEN connected <> LG THEN 1 ELSE 0 END) OVER (ORDER BY UserID, dt) AS grp
    FROM
    (
        SELECT *, LAG(connected, 1, connected) OVER (PARTITION BY UserID ORDER BY dt) AS LG
        FROM #tmp
    ) x
) y
WHERE connected <> 1
GROUP BY UserID, grp, connected
HAVING DATEDIFF(minute, MIN(dt), MAX(dt)) >= 20
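With the sample data above, user 1's offline island runs from 10:05 to 10:20 (15 minutes, filtered out by the HAVING clause) and user 2's from 10:10 to 10:30, so the query returns:
UserID  connected  OFFLINE_MINUTES
2       0          20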

SQL: how can I get the first and last date when they are in two columns on different rows (islands problem)?

I think this problem is called "islands", and I've been looking on the net but not getting it.
I have a table where I need to get the start date and end date (different columns) of each range.
The table has 100,000 rows, and I want to group it down so the result will be:
I have created a fiddle: http://sqlfiddle.com/#!18/f4800/1
From what I found on the internet, I think I need to create row numbers, so I have this now:
But I'm stuck thinking over what my next step should be.
You need row_number() instead of dense_rank(), and use the difference of sequences:
select [CodeID], min([DATE_START]) as DATE_START,
       max([DATE_FINISH]) as DATE_FINISH, [STATE]
from (select [CodeID], [DATE_START], [DATE_FINISH], [STATE],
             row_number() over(partition by [CodeID] order by [DATE_START]) as seq1,
             row_number() over(partition by [CodeID], [STATE] order by [DATE_START]) as seq2
      from Row_State
      --where codeid = 'code1'
     ) t
group by [CodeID], [STATE], (seq1 - seq2)
order by [CodeID], DATE_START;
Here is db fiddle.
If you know that the final result will be tiled in time with no gaps, then you can also use lag() and lead() like this:
select code_id, state, date_start,
       dateadd(day, -1, lead(date_start) over (partition by code_id order by date_start)) as day_end
from (select rs.*,
             lag(state) over (partition by code_id order by date_start) as prev_state
      from row_state rs
     ) rs
where prev_state is null or prev_state <> state;
The only issue with this version is that it does not correctly calculate the end date of each code's last island. For that:
select code_id, state, date_start,
       coalesce(dateadd(day, -1, lead(date_start) over (partition by code_id order by date_start)),
                max_date_end) as day_end
from (select rs.*,
             lag(state) over (partition by code_id order by date_start) as prev_state,
             max(date_finish) over (partition by code_id) as max_date_end
      from row_state rs
     ) rs
where prev_state is null or prev_state <> state;
This could be faster than an approach that uses aggregation.

Calculate percent changes in contiguous ranges in PostgreSQL

I need to calculate the price percent change in contiguous ranges. For example, if the price starts moving up or down and I have a sequence of increasing or decreasing values, I need to grab the first and last value of that sequence and calculate the change.
I'm using the window lag function to calculate the direction; my problem is that I can't generate a unique rank for the sequences to calculate the percent changes.
I tried combinations of RANK, ROW_NUMBER, etc. with no luck.
Here's my query
WITH partitioned AS (
SELECT
*,
lag(price, 1) over(ORDER BY time) AS lag_price
FROM prices
),
sequenced AS (
SELECT
*,
CASE
WHEN price > lag_price THEN 'up'
WHEN price < lag_price THEN 'down'
ELSE 'equal'
END
AS direction
FROM partitioned
),
ranked AS (
SELECT
*,
-- Here is the problem:
-- I need to calculate a unique rnk value for each specific sequence
DENSE_RANK() OVER ( PARTITION BY direction ORDER BY time) + ROW_NUMBER() OVER ( ORDER BY time DESC) AS rnk
-- DENSE_RANK() OVER ( PARTITION BY seq ORDER BY time),
-- ROW_NUMBER() OVER ( ORDER BY seq, time DESC),
-- ROW_NUMBER() OVER ( ORDER BY seq),
-- RANK() OVER ( ORDER BY seq)
FROM sequenced
),
changed AS (
SELECT *,
FIRST_VALUE(price) OVER(PARTITION BY rnk ) first_price,
LAST_VALUE(price) OVER(PARTITION BY rnk ) last_price,
(LAST_VALUE(price) OVER(PARTITION BY rnk ) / FIRST_VALUE(price) OVER(PARTITION BY rnk ) - 1) * 100 AS percent_change
FROM ranked
)
SELECT
*
FROM changed
ORDER BY time DESC;
and a SQLFiddle with sample data.
If anyone is interested, here's a solution from another forum:
with ct1 as /* detecting direction: up, down, equal */
(
select
price, time,
case
when lag(price) over (order by time) < price then 'up'
when lag(price) over (order by time) > price then 'down'
else 'equal'
end as dir
from
prices
)
, ct2 as /* setting reset points */
(
select
price, time, dir,
case
when coalesce(lag(dir) over (order by time), 'none') <> dir
then 1 else 0
end as rst
from
ct1
)
, ct3 as /* making groups */
(
select
price, time, dir,
sum(rst) over (order by time) as grp
from
ct2
)
select /* calculates min, max price per group */
price, time, dir,
min(price) over (partition by grp) as min_price,
max(price) over (partition by grp) as max_price
from
ct3
order by
time desc;
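If you also want the percent change itself (the original goal), the final SELECT above can be extended along these lines; note that once the window has an ORDER BY, last_value needs an explicit frame to see the whole group:
select /* percent change per group */
price, time, dir,
first_value(price) over w as first_price,
last_value(price) over w as last_price,
-- assumes price is numeric; cast first if it is an integer column
(last_value(price) over w / first_value(price) over w - 1) * 100 as percent_change
from
ct3
window w as (partition by grp order by time
             rows between unbounded preceding and unbounded following)
order by
time desc;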

Optimizing sum() over(order by...) clause throwing 'resources exceeded' error

I'm computing a sessions table from event data from our website in BigQuery. The events table has around 12 million events (pretty small). After I add in the logic to create sessions, I want to sum over all sessions and assign a global_session_id. I'm doing that using a sum() over (order by ...) clause, which is throwing a resources exceeded error. I know that the order by clause causes all the data to be processed on a single node, and that is what exceeds the compute resources, but I'm not sure what changes I can make to my code to achieve the same result. Any workarounds, advice, or explanations are greatly appreciated.
with sessions_1 as ( /* Tie a visitor's last event and last campaign to current event. */
select visitor_id as session_user_id,
sent_at,
context_campaign_name,
event,
id,
LAG(sent_at,1) OVER (PARTITION BY visitor_id ORDER BY sent_at) as last_event,
LAG(context_campaign_name,1) OVER (PARTITION BY visitor_id ORDER BY sent_at) as last_event_campaign_name
from tracks_2
),
sessions_2 as ( /* Flag events that begin a new session. */
select *,
case
when context_campaign_name != last_event_campaign_name
or context_campaign_name is null and last_event_campaign_name is not null
or context_campaign_name is not null and last_event_campaign_name is null
then 1
when unix_seconds(sent_at)
- unix_seconds(last_event) >= (60 * 30)
or last_event is null
then 1
else 0
end as is_new_session
from sessions_1
),
sessions_3 as ( /* Assign events sessions numbers for total sessions and total user sessions. */
select id as event_id,
sum(is_new_session) over (order by session_user_id, sent_at) as global_session_id
#sum(is_new_session) over (partition by session_user_id order by sent_at) as user_session_id
from materialized_result_of_sessions_2_query
)
select * from sessions_3
It might help if you defined a CTE with just the sessions, rather than working at the event level. If this works:
select session_user_id, sent_at,
       row_number() over (order by session_user_id, sent_at) as global_session_id
from materialized_result_of_sessions_2_query
where is_new_session = 1
group by session_user_id, sent_at;
You can join this session-level result back to the original event-level data and then use a max() window function to assign the id to all events. Something like:
select e.*,
max(s.global_session_id) over (partition by e.session_user_id order by e.event_at) as global_session_id
from events e left join
(<above query>) s
on s.session_user_id = e.session_user_id and s.sent_at = e.event_at;
If that doesn't work, you can construct the global id explicitly:
select us.*, us.user_session_id + s.session_offset as global_session_id
from (select session_user_id, sent_at,
             row_number() over (partition by session_user_id order by sent_at) as user_session_id
      from materialized_result_of_sessions_2_query
      where is_new_session = 1
     ) us join
     (select session_user_id, count(*) as cnt,
             -- 'offset' is a reserved word in BigQuery, hence the alias
             sum(count(*)) over (order by session_user_id) - count(*) as session_offset
      from materialized_result_of_sessions_2_query
      where is_new_session = 1
      group by session_user_id
     ) s
     on us.session_user_id = s.session_user_id;
This might still fail if almost all users are unique and the sessions are short.
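For intuition on the offset construction: if user A starts 3 sessions and user B starts 2, then A gets offset 0 (global ids 1-3) and B gets offset 3 (global ids 4-5); the running sum of per-user session counts, minus the user's own count, shifts each user's local numbering into a global one.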

Max dates for each sequence within partitions

I would like to see if somebody has an idea how to get the max and min dates within each id, using the row_num column as an indicator of where a sequence starts and ends, in SQL Server 2016.
The screenshot below shows the desired output in the min_date and max_date columns.
Any help would be appreciated.
You could use windowed MIN/MAX:
WITH cte AS (
SELECT *,SUM(CASE WHEN row_num > 1 THEN 0 ELSE 1 END)
OVER(PARTITION BY id, cat ORDER BY date_col) AS grp
FROM tab
)
SELECT *, MIN(date_col) OVER(PARTITION BY id, cat, grp) AS min_date,
MAX(date_col) OVER(PARTITION BY id, cat, grp) AS max_date
FROM cte
ORDER BY id, date_col, cat;
Rextester Demo
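The running SUM(CASE ...) works because row_num resets to 1 exactly where a new sequence starts; adding 1 at those rows and 0 elsewhere turns the running total into a stable group id (grp) shared by every row of the sequence.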
Try something like:
SELECT
    Q1.id, Q1.cat,
    MIN(Q1.[date]) AS min_date,
    MAX(Q1.[date]) AS max_date
FROM
    (SELECT
         *,
         ROW_NUMBER() OVER (PARTITION BY id, cat ORDER BY [date]) AS r1,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY [date]) AS r2
     FROM tab   -- table name as in the answer above
    ) AS Q1
GROUP BY
    Q1.id, Q1.cat, Q1.r2 - Q1.r1
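The r2 - r1 trick works because r2 advances on every row of an id while r1 advances only within the same (id, cat) sequence; while a sequence is uninterrupted the two counters move in lockstep, so their difference stays constant, and it changes as soon as another cat breaks the run.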