Postgres: Session duration per event (row)

I'm trying to write a query that computes the session duration at each event.
The database holds events from a web app, each with a session ID and a timestamp.
Each row represents one event.
I thought I could solve this with a recursive query, but every attempt runs for minutes without returning. It's driving me crazy.
This is what I have so far:
with recursive session_time as (
    select
        f.data->'sessionId' as session_id,
        f.ts,
        null::timestamp with time zone as prev_timestamp,
        0 as session_duration
    from arbiter_events as f
    union
    select
        n.data->'sessionId' as session_id,
        n.ts,
        st.ts as prev_timestamp,
        (EXTRACT(epoch from (n.ts - (
            select
                st.ts
            from arbiter_events p
            where p.ts < n.ts
            order by p.ts desc
            limit 1
        ))) + st.session_duration)::integer as session_duration
    from arbiter_events as n
    inner join session_time st on st.session_id = n.data->'sessionId'
)
SELECT
    ae.customer,
    ae.username,
    ae.data->'category' as category,
    ae.data->'subCategory' as subcategory,
    st.session_id,
    st.session_duration
from arbiter_events ae
left join session_time st on ae.data->'sessionId' = st.session_id;
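A commonly used alternative to a recursive CTE for a running duration like this is a pair of window functions: LAG gives the gap to the previous event in the same session, and MIN over the same ordered window gives the session's first timestamp. The following is a minimal sketch under stated assumptions. It runs against SQLite through Python's built-in sqlite3 module rather than Postgres. The table `events`, its columns, and the sample rows are hypothetical: they are a flattened stand-in for `arbiter_events` with the session ID already extracted. `julianday(...) * 86400` stands in for Postgres's `EXTRACT(EPOCH FROM ...)`. In Postgres, the gap would be roughly `EXTRACT(EPOCH FROM ts - LAG(ts) OVER (PARTITION BY data->>'sessionId' ORDER BY ts))`, where `->>` returns the session ID as text.

```python
import sqlite3

# Hypothetical sample data: a flattened stand-in for arbiter_events.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (session_id TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("s1", "2021-01-01 10:00:00"),
        ("s1", "2021-01-01 10:00:30"),
        ("s1", "2021-01-01 10:02:00"),
        ("s2", "2021-01-01 11:00:00"),
        ("s2", "2021-01-01 11:00:10"),
    ],
)

QUERY = """
SELECT session_id,
       ts,
       -- seconds since the previous event of the same session (0 for the first event)
       COALESCE(CAST(ROUND((julianday(ts) - julianday(
           LAG(ts) OVER (PARTITION BY session_id ORDER BY ts))) * 86400) AS INTEGER), 0) AS gap_s,
       -- running session duration: seconds since the first event of the session
       CAST(ROUND((julianday(ts) - julianday(
           MIN(ts) OVER (PARTITION BY session_id ORDER BY ts))) * 86400) AS INTEGER) AS session_duration
FROM events
ORDER BY session_id, ts
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Here `session_duration` is the time elapsed since the session's first event, which appears to be what the recursive attempt is trying to accumulate. A likely reason the recursive version never returns is that its recursive step joins every event of a session back onto every earlier row of that session, with no condition such as `n.ts > st.ts`. As a result, `session_duration` keeps growing and `union` keeps finding "new" rows.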

Related

Calculate time span between two specific statuses on the database for each ID

I have a table in the database that records status updates for each of my vehicles. I want to calculate how many days each vehicle spends between two specific statuses, 'Maintenance' and 'Ready'.
My table looks something like this:
and I want the result to look like this, showing only the number of days a vehicle spends in maintenance before becoming ready on a specific day:
The code I have written looks like this:
drop table if exists #temps1
select
VehicleId,
json_value(VehiclesHistoryStatusID.text,'$.en') as VehiclesHistoryStatus,
VehiclesHistory.CreationTime,
datediff(day, VehiclesHistory.CreationTime ,
lead(VehiclesHistory.CreationTime ) over (order by VehiclesHistory.CreationTime ) ) as days,
lag(json_value(VehiclesHistoryStatusID.text,'$.en')) over (order by VehiclesHistory.CreationTime) as PrevStatus,
case
when (lag(json_value(VehiclesHistoryStatusID.text,'$.en')) over (order by VehiclesHistory.CreationTime) <> json_value(VehiclesHistoryStatusID.text,'$.en')) THEN datediff(day, VehiclesHistory.CreationTime , (lag(VehiclesHistory.CreationTime ) over (order by VehiclesHistory.CreationTime ))) else 0 end as testing
into #temps1
from fleet.VehicleHistory VehiclesHistory
left join Fleet.Lookups as VehiclesHistoryStatusID on VehiclesHistoryStatusID.Id = VehiclesHistory.StatusId
where (year(VehiclesHistory.CreationTime) > 2021 and (VehiclesHistory.StatusId = 140 Or VehiclesHistory.StatusId = 144) )
group by VehiclesHistory.VehicleId ,VehiclesHistory.CreationTime , VehiclesHistoryStatusID.text
order by VehicleId desc
drop table if exists #temps2
select * into #temps2 from #temps1 where testing <> 0
select * from #temps2
Try this
SELECT innerQ.VehichleID,innerQ.CreationDate,innerQ.Status
,SUM(DATEDIFF(DAY,innerQ.PrevMaintenance,innerQ.CreationDate)) AS DayDuration
FROM
(
SELECT t1.VehichleID,t1.CreationDate,t1.Status,
(SELECT top(1) t2.CreationDate FROM dbo.Test t2
WHERE t1.VehichleID=t2.VehichleID
AND t2.CreationDate<t1.CreationDate
AND t2.Status='Maintenance'
ORDER BY t2.CreationDate Desc) AS PrevMaintenance
FROM
dbo.Test t1 WHERE t1.Status='Ready'
) innerQ
WHERE innerQ.PrevMaintenance IS NOT NULL
GROUP BY innerQ.VehichleID,innerQ.CreationDate,innerQ.Status
In this query, the innermost query first finds the most recent 'Maintenance' date before each 'Ready' date (if one exists). The outer query then calculates the time span with DATEDIFF and sums these spans for each vehicle and Ready date.
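Here is a runnable sketch of the same correlated-subquery idea. It assumes SQLite through Python's sqlite3 in place of SQL Server: `LIMIT 1` replaces `TOP(1)`, and a `julianday` difference replaces `DATEDIFF(DAY, ...)`. The table and its rows are hypothetical, and the column is spelled `VehicleID`. The answer's outer SUM and GROUP BY are left out because each Ready row has at most one preceding Maintenance date. In the answer, they only combine duplicate Ready rows for the same vehicle and date.

```python
import sqlite3

# Hypothetical status history modelled on the answer's dbo.Test table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (VehicleID INTEGER, Status TEXT, CreationDate TEXT)")
conn.executemany(
    "INSERT INTO Test VALUES (?, ?, ?)",
    [
        (1, "Maintenance", "2022-01-01"),
        (1, "Ready", "2022-01-05"),
        (1, "Maintenance", "2022-02-01"),
        (1, "Ready", "2022-02-03"),
        (2, "Ready", "2022-01-10"),  # no earlier Maintenance row, so it is dropped
        (2, "Maintenance", "2022-03-01"),
        (2, "Ready", "2022-03-11"),
    ],
)

QUERY = """
SELECT VehicleID,
       CreationDate,
       -- whole days between the preceding Maintenance row and this Ready row
       CAST(julianday(CreationDate) - julianday(PrevMaintenance) AS INTEGER) AS DayDuration
FROM (
    SELECT t1.VehicleID,
           t1.CreationDate,
           -- most recent Maintenance date for the same vehicle before this Ready row
           (SELECT t2.CreationDate
              FROM Test t2
             WHERE t2.VehicleID = t1.VehicleID
               AND t2.CreationDate < t1.CreationDate
               AND t2.Status = 'Maintenance'
             ORDER BY t2.CreationDate DESC
             LIMIT 1) AS PrevMaintenance
      FROM Test t1
     WHERE t1.Status = 'Ready'
) AS innerQ
WHERE PrevMaintenance IS NOT NULL
ORDER BY VehicleID, CreationDate
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```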

Determine cluster of access time within 10min intervals per user per day in SQL Server

How can I query the sample data in SQL so that it groups (clusters) the access_time values per user per day into 10-minute intervals?
This is a complete guess, based on reading between the lines, and is untested due to a lack of consumable sample data.
It, however, looks like you are after a triangular JOIN (these can perform poorly, especially as this won't be SARGable) and a DENSE_RANK:
SELECT YT.[date],
YT.User_ID,
YT2.AccessTime,
DENSE_RANK() OVER (PARTITION BY YT.[date], YT.User_ID ORDER BY YT.AccessTime) AS Cluster
FROM dbo.YourTable YT
JOIN dbo.YourTable YT2 ON YT.[date] = YT2.[date]
AND YT.User_ID = YT2.User_ID
AND YT.AccessTime <= YT2.AccessTime --This will join the row to itself
AND DATEADD(MINUTE,10,YT.AccessTime) >= YT2.AccessTime; --That is intentional
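To show what the triangular join produces, here is a sketch using the access times that appear in the next answer's output. It runs in SQLite through Python's sqlite3, which is an assumption since the answer targets SQL Server. `datetime(x, '+10 minutes')` replaces `DATEADD(MINUTE, 10, x)`, and the rank orders by `YT.AccessTime`, the access that starts each window. Every access starts its own candidate cluster, including one with nothing else in its window.

```python
import sqlite3

# Hypothetical access log built from the sample times in this thread.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE YourTable ("date" TEXT, User_ID TEXT, AccessTime TEXT)')
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        ("2020-09-19", "AA083P", "2020-09-19 18:15:00"),
        ("2020-09-19", "AA083P", "2020-09-19 18:22:00"),
        ("2020-09-19", "AA083P", "2020-09-19 18:28:00"),
        ("2020-09-20", "AB162Y", "2020-09-20 19:34:00"),
        ("2020-09-20", "AB162Y", "2020-09-20 19:37:00"),
    ],
)

QUERY = """
SELECT YT."date",
       YT.User_ID,
       YT2.AccessTime,
       -- one cluster number per starting access (YT), per user per day
       DENSE_RANK() OVER (PARTITION BY YT."date", YT.User_ID ORDER BY YT.AccessTime) AS Cluster
FROM YourTable YT
JOIN YourTable YT2
  ON YT."date" = YT2."date"
 AND YT.User_ID = YT2.User_ID
 AND YT.AccessTime <= YT2.AccessTime                            -- includes the row itself
 AND datetime(YT.AccessTime, '+10 minutes') >= YT2.AccessTime   -- at most 10 minutes later
ORDER BY YT."date", YT.User_ID, Cluster, YT2.AccessTime
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```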
If I have understood your problem, you want to group all of a user's accesses in a day so that every access in a group falls within a 10-minute interval. Single accesses are not counted, so an access more than 10 minutes away from every other access does not form a cluster.
You can identify the clusters by joining the access table with itself to get every pair of accesses less than 10 minutes apart, and numbering those pairs.
Finally, join the access table again to get the accesses in each cluster:
; with
user_clusters as (
select a1.date, a1.user_id, a1.access_time cluster_start, a2.access_time cluster_end,
ROW_NUMBER() over (partition by a1.date, a1.user_id order by a1.access_time) user_cluster_id
from ACCESS_TIMES a1
join ACCESS_TIMES a2 on a1.date = a2.date and a1.user_id = a2.user_id
and a1.access_time < a2.access_time
and datediff(minute, a1.access_time, a2.access_time)<10
)
select *
from user_clusters c
join ACCESS_TIMES a on a.date = c.date and a.user_id = c.user_id and a.access_time between c.cluster_start and cluster_end
order by a.date, a.user_id, c.user_cluster_id, a.access_time
output:
date user_id access_time user_cluster_id
'2020-09-19', 'AA083P', '2020-09-19 18:15:00', 1
'2020-09-19', 'AA083P', '2020-09-19 18:22:00', 1
'2020-09-19', 'AA083P', '2020-09-19 18:22:00', 2
'2020-09-19', 'AA083P', '2020-09-19 18:28:00', 2
'2020-09-20', 'AB162Y', '2020-09-20 19:34:00', 1
'2020-09-20', 'AB162Y', '2020-09-20 19:37:00', 1
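The same sample times can be used to check this query. This sketch again assumes SQLite through Python's sqlite3 instead of SQL Server. The minute difference is computed with `julianday`, which matches `DATEDIFF(minute, ...)` here only because these timestamps fall on whole minutes (DATEDIFF counts minute boundaries crossed). The table contents are hypothetical, reconstructed from the output above, and only the four output columns are selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE ACCESS_TIMES ("date" TEXT, user_id TEXT, access_time TEXT)')
conn.executemany(
    "INSERT INTO ACCESS_TIMES VALUES (?, ?, ?)",
    [
        ("2020-09-19", "AA083P", "2020-09-19 18:15:00"),
        ("2020-09-19", "AA083P", "2020-09-19 18:22:00"),
        ("2020-09-19", "AA083P", "2020-09-19 18:28:00"),
        ("2020-09-20", "AB162Y", "2020-09-20 19:34:00"),
        ("2020-09-20", "AB162Y", "2020-09-20 19:37:00"),
    ],
)

QUERY = """
WITH user_clusters AS (
    -- every pair of accesses by the same user on the same day less than 10 minutes apart
    SELECT a1."date" AS "date", a1.user_id AS user_id,
           a1.access_time AS cluster_start, a2.access_time AS cluster_end,
           ROW_NUMBER() OVER (PARTITION BY a1."date", a1.user_id
                              ORDER BY a1.access_time) AS user_cluster_id
    FROM ACCESS_TIMES a1
    JOIN ACCESS_TIMES a2
      ON a1."date" = a2."date" AND a1.user_id = a2.user_id
     AND a1.access_time < a2.access_time
     AND (julianday(a2.access_time) - julianday(a1.access_time)) * 1440 < 10
)
SELECT a."date", a.user_id, a.access_time, c.user_cluster_id
FROM user_clusters c
JOIN ACCESS_TIMES a
  ON a."date" = c."date" AND a.user_id = c.user_id
 AND a.access_time BETWEEN c.cluster_start AND c.cluster_end
ORDER BY a."date", a.user_id, c.user_cluster_id, a.access_time
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```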

Datetime SQL statement (Working in SQL Developer)

I'm new to SQL, but after learning a little about SQL Developer I've started gathering data that makes sense to me. However, I need help with a query.
My goal:
To keep my current criteria and select records only when the date-time value is within 5 minutes of the latest date-time. Here is my current SQL statement:
`SELECT ABAMS.T_WORKORDER_HIST.LINE_NO AS Line,
ABAMS.T_WORKORDER_HIST.STATE AS State,
ASMBLYTST.V_SEQ_SERIAL_ALL.BUILD_DATE,
ASMBLYTST.V_SEQ_SERIAL_ALL.SEQ_NO,
ASMBLYTST.V_SEQ_SERIAL_ALL.SEQ_NO_EXT,
ASMBLYTST.V_SEQ_SERIAL_ALL.UPD_REASON_CODE,
ABAMS.V_SERIAL_LINESET.LINESET_DATE AS "Lineset Time",
ABAMS.T_WORKORDER_HIST.SERIAL_NO AS ESN,
ABAMS.T_WORKORDER_HIST.ITEM_NO AS "Shop Order",
ABAMS.T_WORKORDER_HIST.CUST_NAME AS Customer,
ABAMS.T_ITEM_POLICY.PL_LOC_DROP_ZONE_NO AS PLDZ,
ABAMS.T_WORKORDER_HIST.CONFIG_NO AS Configuration,
ASMBLYTST.V_EDP_ENG_LAST_ABSN.LAST_ASMBLY_ABSN AS "Last Sta",
ASMBLYTST.V_LAST_ENG_LOCATION.LAST_ASMBLY_LOC,
ASMBLYTST.V_LAST_ENG_LOCATION.LAST_MES_LOC,
ASMBLYTST.V_LAST_ENG_LOCATION.LAST_ASMBLY_TIME,
ASMBLYTST.V_LAST_ENG_LOCATION.LAST_MES_TIME
FROM ABAMS.T_WORKORDER_HIST
LEFT JOIN ABAMS.V_SERIAL_LINESET
ON ABAMS.V_SERIAL_LINESET.SERIAL_NO = ABAMS.T_WORKORDER_HIST.SERIAL_NO
LEFT JOIN ASMBLYTST.V_EDP_ENG_LAST_ABSN
ON ASMBLYTST.V_EDP_ENG_LAST_ABSN.SERIAL_NO = ABAMS.T_WORKORDER_HIST.SERIAL_NO
LEFT JOIN ASMBLYTST.V_SEQ_SERIAL_ALL
ON ASMBLYTST.V_SEQ_SERIAL_ALL.SERIAL_NO = ABAMS.T_WORKORDER_HIST.SERIAL_NO
LEFT JOIN ABAMS.T_ITEM_POLICY
ON ABAMS.T_ITEM_POLICY.ITEM_NO = ABAMS.T_WORKORDER_HIST.ITEM_NO
LEFT JOIN ABAMS.T_CUR_STATUS
ON ABAMS.T_CUR_STATUS.SERIAL_NO = ABAMS.T_WORKORDER_HIST.SERIAL_NO
INNER JOIN ASMBLYTST.V_LAST_ENG_LOCATION
ON ASMBLYTST.V_LAST_ENG_LOCATION.SERIAL_NO = ABAMS.T_WORKORDER_HIST.SERIAL_NO
WHERE ABAMS.T_WORKORDER_HIST.LINE_NO = 10
AND (ABAMS.T_WORKORDER_HIST.STATE = 'PROD'
OR ABAMS.T_WORKORDER_HIST.STATE = 'SCHED')
AND ASMBLYTST.V_SEQ_SERIAL_ALL.BUILD_DATE BETWEEN TRUNC(SysDate) - 10 AND TRUNC(SysDate) + 1
AND (ABAMS.V_SERIAL_LINESET.LINESET_DATE IS NOT NULL
OR ABAMS.V_SERIAL_LINESET.LINESET_DATE IS NULL)
AND (ASMBLYTST.V_EDP_ENG_LAST_ABSN.LAST_ASMBLY_ABSN < '1800'
OR ASMBLYTST.V_EDP_ENG_LAST_ABSN.LAST_ASMBLY_ABSN IS NULL)
ORDER BY ASMBLYTST.V_EDP_ENG_LAST_ABSN.LAST_ASMBLY_ABSN DESC Nulls Last,
ABAMS.V_SERIAL_LINESET.LINESET_DATE Nulls Last,
ASMBLYTST.V_SEQ_SERIAL_ALL.BUILD_DATE,
ASMBLYTST.V_SEQ_SERIAL_ALL.SEQ_NO,
ASMBLYTST.V_SEQ_SERIAL_ALL.SEQ_NO_EXT`
Here are some of the records I get from the table
ASMBLYTST.V_LAST_ENG_LOCATION.LAST_ASMBLY_TIME
2018-06-14 01:28:25
2018-06-14 01:29:26
2018-06-14 01:27:30
2018-06-13 22:44:03
2018-06-14 01:28:45
2018-06-14 01:27:37
2018-06-14 01:27:41
What I essentially want is for
2018-06-13 22:44:03
to be excluded from the query because it is not within the 5-minute window of the latest record, which in this data set is
2018-06-14 01:29:26
The one complication is that the date-time values are constantly updating.
Any ideas?
Thank you!
Here are two different solutions, each of which uses a table called "ASET".
ASET contains 20 records 1 minute apart:
WITH
aset (ttime, cnt)
AS
(SELECT systimestamp AS ttime, 1 AS cnt
FROM DUAL
UNION ALL
SELECT ttime + INTERVAL '1' MINUTE AS ttime, cnt + 1 AS cnt
FROM aset
WHERE cnt < 20)
select * from aset;
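Now, using ASET as our data, the following query finds the maximum time in ASET and restricts the results to the six records within 5 minutes of that maximum (prefix it with the WITH clause above, because a CTE only exists for the statement that defines it):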
SELECT *
FROM aset
WHERE ttime >= (SELECT MAX (ttime)
FROM aset)
- INTERVAL '5' MINUTE;
An alternative is to use an analytic function:
with bset
AS
(SELECT ttime, cnt, MAX (ttime) OVER () - ttime AS delta
FROM aset)
SELECT *
FROM bset
WHERE delta <= INTERVAL '5' MINUTE
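Applied to the LAST_ASMBLY_TIME values listed in the question, both approaches keep the same six rows and drop 2018-06-13 22:44:03. The sketch below assumes SQLite through Python's sqlite3 rather than Oracle: `datetime(x, '-5 minutes')` stands in for `x - INTERVAL '5' MINUTE`, and the analytic version measures the gap in minutes with `julianday`. In the real query, `aset` would be the asker's SELECT wrapped in a subquery or WITH clause. Because MAX is evaluated each time the query runs, the window moves with the data as new times arrive.

```python
import sqlite3

# The question's sample times, loaded into a hypothetical one-column table.
times = [
    "2018-06-14 01:28:25",
    "2018-06-14 01:29:26",
    "2018-06-14 01:27:30",
    "2018-06-13 22:44:03",
    "2018-06-14 01:28:45",
    "2018-06-14 01:27:37",
    "2018-06-14 01:27:41",
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aset (ttime TEXT)")
conn.executemany("INSERT INTO aset VALUES (?)", [(t,) for t in times])

# Approach 1: scalar subquery for the latest time, minus 5 minutes.
SUBQUERY = """
SELECT ttime FROM aset
WHERE ttime >= datetime((SELECT MAX(ttime) FROM aset), '-5 minutes')
ORDER BY ttime
"""

# Approach 2: analytic MAX() OVER () and a per-row gap in minutes.
ANALYTIC = """
SELECT ttime FROM (
    SELECT ttime, (julianday(MAX(ttime) OVER ()) - julianday(ttime)) * 1440 AS delta_min
    FROM aset
)
WHERE delta_min <= 5
ORDER BY ttime
"""

by_subquery = [r[0] for r in conn.execute(SUBQUERY)]
by_analytic = [r[0] for r in conn.execute(ANALYTIC)]
print(by_subquery)
print(by_analytic)
```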

How can I include in schedules today's departures after midnight using GTFS?

I have just started with GTFS and immediately ran into a big problem with my SQL query:
SELECT *, ( some columns AS shortcuts )
FROM stop_times
LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
WHERE trips.max_sequence != stop_times.stop_sequence
AND stop_id IN( $incodes )
AND trips.service_id IN ( $service_ids )
AND ( departure_time >= $time )
AND ( trips.end_time >= $time )
AND ( trips.start_time <= $time_plus_3hrs )
GROUP BY t,l,sm
ORDER BY t ASC, l DESC
LIMIT 14
This should show departures from a given stop in the next 3 hours.
It works, but as midnight approaches (e.g. 23:50) it catches only today's departures. After midnight it catches only the new day's departures, and the previous day's departures are missing, because they are stored with a departure_time such as "24:05", which does not line up with a $time of 00:05.
Is it possible to use something lighter than a UNION of the same query for the next day?
If I do use UNION, how can I ORDER the departures so that LIMIT trims them correctly?
trips.start_time and trips.end_time are auxiliary columns I added to speed up the query: they hold the arrival_time at the first stop (sequence 1) and the departure_time at the last stop (max sequence) of each trip.
Using UNION to link together a query for each day is going to be your best bet, unless perhaps you want to issue two completely separate queries and then merge the results together in your application. The contortionism required to do all this with a single SELECT statement (assuming it's even possible) would not be worth the effort.
Part of the complexity here is that the set of active service IDs can vary between consecutive days, so a distinct set must be used for each one. (For a suggestion of how to build this set in SQL using a subquery and table join, see my answer to "How do I use calendar exceptions to generate accurate schedules using GTFS?".)
More complexity arises from the fact that the results for each day must be treated differently: for the result set to be ordered correctly, we need to subtract twenty-four hours from all of (and only) yesterday's times.
Try a query like this, following the "pseudo-SQL" in your question and assuming you are using MySQL/MariaDB:
(SELECT *, SUBTIME(departure_time, '24:00:00') AS t, ...
FROM stop_times
LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
WHERE trips.max_sequence != stop_times.stop_sequence
AND stop_id IN ( $incodes )
AND trips.service_id IN ( $yesterdays_service_ids )
AND ( departure_time >= ADDTIME($time, '24:00:00') )
AND ( trips.end_time >= ADDTIME($time, '24:00:00') )
AND ( trips.start_time <= ADDTIME($time_plus_3hrs, '24:00:00') )
GROUP BY t, l, sm)
UNION
(SELECT *, departure_time AS t, ...
FROM stop_times
LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
WHERE trips.max_sequence != stop_times.stop_sequence
AND stop_id IN ( $incodes )
AND trips.service_id IN ( $todays_service_ids )
AND ( departure_time >= $time )
AND ( trips.end_time >= $time )
AND ( trips.start_time <= $time_plus_3hrs )
GROUP BY t, l, sm)
ORDER BY t ASC, l DESC
LIMIT 14
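Here is a small sketch of the UNION pattern, with GTFS times stored as seconds relative to the start of the service day so that values past 24:00 stay sortable. Several assumptions apply. It runs in SQLite through Python's sqlite3 instead of MySQL. It uses a single service ID per day instead of IN lists. The trips and service IDs (`YDAY`, `TODAY`) are hypothetical. `to_seconds` is a hypothetical helper, not part of any GTFS library. At 00:01, the previous day's 24:05 departure comes back as 300 seconds (00:05) and sorts ahead of today's 00:20 trip. UNION ALL is used because it skips the duplicate-removal step that UNION performs.

```python
import sqlite3

DAY = 24 * 3600


def to_seconds(hms: str) -> int:
    """Hypothetical helper: convert a GTFS time such as '24:05:00' (may exceed 24 h) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip_id TEXT, service_id TEXT)")
conn.execute("CREATE TABLE stop_times (trip_id TEXT, departure_s INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("late_y", "YDAY"), ("night_y", "YDAY"), ("early_t", "TODAY"), ("day_t", "TODAY")],
)
conn.executemany(
    "INSERT INTO stop_times VALUES (?, ?)",
    [
        ("late_y", to_seconds("23:50:00")),
        ("night_y", to_seconds("24:05:00")),
        ("early_t", to_seconds("00:20:00")),
        ("day_t", to_seconds("09:00:00")),
    ],
)

QUERY = """
SELECT st.trip_id, st.departure_s - :day AS t          -- previous day: shift back 24 h
FROM stop_times st JOIN trips tr ON tr.trip_id = st.trip_id
WHERE tr.service_id = :yday
  AND st.departure_s BETWEEN :now + :day AND :now + :day + :window
UNION ALL
SELECT st.trip_id, st.departure_s AS t                 -- today: times used as-is
FROM stop_times st JOIN trips tr ON tr.trip_id = st.trip_id
WHERE tr.service_id = :today
  AND st.departure_s BETWEEN :now AND :now + :window
ORDER BY t                                             -- sorts the combined result
LIMIT 14
"""

params = {
    "day": DAY,
    "now": to_seconds("00:01:00"),
    "window": 3 * 3600,
    "yday": "YDAY",
    "today": "TODAY",
}
rows = conn.execute(QUERY, params).fetchall()
print(rows)
```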

How to get the maximum interim value of a parameter in a select statement in sql server?

Example:
I have a table userconnection that contains the login and logout time as below:
action, time, user
Login, 2013-24-11 13:00:00, a
Login, 2013-24-11 13:30:00, b
Login, 2013-24-11 14:00:00, c
Logout, 2013-24-11 14:10:00, b
...
...
...
Can anyone help me with the query below to show the maximum number of concurrent users at any time during the day (=3 from the above example set) and the number of users currently online (=2 from the above example set)?
[select DateAdd(day, 0, DateDiff(day, 0, time)) calanderday,
sum(case when action = 'Login' then 1 when action = 'Logout' then -1
else 0 end) concurrentuser,
max of(concurrentuser interim values) maxconcurrentuser
from userconnection
where time > sysdate - 1
group by DateAdd(day, 0, DateDiff(day, 0, time))
order by calanderday]
I would much appreciate any help with how to get
max of(concurrentuser interim values) maxconcurrentuser
in the above query without using user-defined functions, etc., just using inline queries.
I think that this will work, but obviously you've only given us minimal sample data to work from:
;With PairedEvents as (
    select a.[user], a.time as timeIn, b.time as TimeOut
    from userconnection a
    left join userconnection b
        on a.[user] = b.[user]
        and a.time < b.time
        and b.action = 'Logout'
    left join userconnection b_anti
        on a.[user] = b_anti.[user]
        and a.time < b_anti.time
        and b_anti.time < b.time
        and b_anti.action = 'Logout'
    where a.action = 'Login'
        and b_anti.action is null
), PossibleMaxima as (
    select pe.timeIn, COUNT(*) as Cnt
    from PairedEvents pe
    inner join PairedEvents pe_all
        on pe_all.timeIn <= pe.timeIn
        and (pe_all.timeOut > pe.timeIn or pe_all.timeOut is null)
    group by pe.timeIn
), Ranked as (
    select *, RANK() OVER (ORDER BY Cnt desc) as rnk
    from PossibleMaxima
)
select * from Ranked where rnk = 1
This assumes that all login events can be paired with logout events, and that you don't have stray extras (a logout without a login, or two logins in a row without a logout).
It works by generating three CTEs. The first, PairedEvents, associates each login row with its matching logout row (and relies on the above assumption).
Then, in PossibleMaxima, we take each login event and find all PairedEvents rows that overlap that moment. The number of matches is the number of users who were online concurrently.
Finally, we have the Ranked CTE that gives the maximum value the rank of 1. If there are multiple periods that achieve the maximum then they will each be ranked 1 and returned in the final result set.
If it's possible for multiple users to have identical login times then a slight tweak to PossibleMaxima may be required - but that's only if we need to.
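Here is a sketch that runs the same three CTEs in SQLite through Python's sqlite3 (an assumption, since the answer targets SQL Server), using the four sample rows from the question. The dates are rewritten in ISO order (2013-11-24) so that text comparison sorts them correctly. The column names are double-quoted, and 'Logout' is matched with the question's capitalization because SQLite's `=` is case-sensitive by default. A plain SUM over the login and logout rows gives the current count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE userconnection ("action" TEXT, "time" TEXT, "user" TEXT)')
conn.executemany(
    "INSERT INTO userconnection VALUES (?, ?, ?)",
    [
        ("Login", "2013-11-24 13:00:00", "a"),
        ("Login", "2013-11-24 13:30:00", "b"),
        ("Login", "2013-11-24 14:00:00", "c"),
        ("Logout", "2013-11-24 14:10:00", "b"),
    ],
)

MAX_QUERY = """
WITH PairedEvents AS (
    -- each login paired with the first later logout of the same user (NULL if none)
    SELECT a."user", a."time" AS timeIn, b."time" AS timeOut
    FROM userconnection a
    LEFT JOIN userconnection b
        ON a."user" = b."user" AND a."time" < b."time" AND b."action" = 'Logout'
    LEFT JOIN userconnection b_anti
        ON a."user" = b_anti."user" AND a."time" < b_anti."time"
       AND b_anti."time" < b."time" AND b_anti."action" = 'Logout'
    WHERE a."action" = 'Login' AND b_anti."action" IS NULL
), PossibleMaxima AS (
    -- number of sessions open at each login instant
    SELECT pe.timeIn, COUNT(*) AS Cnt
    FROM PairedEvents pe
    JOIN PairedEvents pe_all
        ON pe_all.timeIn <= pe.timeIn
       AND (pe_all.timeOut > pe.timeIn OR pe_all.timeOut IS NULL)
    GROUP BY pe.timeIn
), Ranked AS (
    SELECT timeIn, Cnt, RANK() OVER (ORDER BY Cnt DESC) AS rnk
    FROM PossibleMaxima
)
SELECT timeIn, Cnt FROM Ranked WHERE rnk = 1
"""

CURRENT_QUERY = """
SELECT SUM(CASE "action" WHEN 'Login' THEN 1 WHEN 'Logout' THEN -1 ELSE 0 END)
FROM userconnection
"""

maxima = conn.execute(MAX_QUERY).fetchall()
current = conn.execute(CURRENT_QUERY).fetchone()[0]
print(maxima, current)
```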