How to get time difference inside window - sql

I have a table like the following. I would like to get the duration from start to finish for each type. I can get the finish time with the aggregate function max on time.
type event attribute time
A start start 2019-04-21 23:58:33.0
A result process1 2019-04-22 23:58:33.0
A result process2 2019-04-23 23:58:33.0
A result process3 2019-04-24 23:58:33.0
B result process1 2019-04-26 23:58:33.0
B start start 2019-04-25 23:58:33.0
B result process2 2019-04-27 23:58:33.0
I created the following queries and joined them on type.
select tmp2.time - tmp1.time as duration
from (
select type,event,attribute,time
from table
where event in ('start')
) tmp1
join (
select type,event,attribute,max(time) as time
from table
where event in ('result')
group by type,event,attribute
) tmp2 on tmp2.type = tmp1.type
But I guess a window function would be useful in this situation. To simplify my query, I'd like to refactor it with a window function.
Is there a good way to achieve this?
Thanks

If you consider the start time to be the min value of time within each type, then you don't need a window function, only aggregate functions:
SELECT type
, min(time) AS start
, max(time) AS finish
FROM table
GROUP BY type ;
If you consider the start time to be the time of the start event, and the finish time to be the max time of the result events within each type, you still don't need window functions; aggregate functions with a FILTER clause are enough:
SELECT type
, min(time) FILTER (WHERE event = 'start') AS start
, max(time) FILTER (WHERE event = 'result') AS finish
, max(time) FILTER (WHERE event = 'result') - min(time) FILTER (WHERE event = 'start') AS duration
FROM table
GROUP BY type ;
PS: as stated in the PostgreSQL manual, ordinary aggregate functions can also be used as window functions:
any built-in or user-defined ordinary aggregate (i.e., not ordered-set
or hypothetical-set aggregates) can be used as a window function
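For anyone who wants to try the per-type duration without a PostgreSQL instance, here is a minimal sketch using Python's sqlite3 module. It is an illustration under assumptions, not the exact query above: the table name events is made up, the trailing .0 on the timestamps is dropped, CASE expressions stand in for FILTER (which older SQLite builds may not accept on plain aggregates), and the duration is computed in seconds with strftime('%s', ...) because SQLite has no interval type.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# The table name "events" is a placeholder chosen for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (type TEXT, event TEXT, attribute TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("A", "start", "start", "2019-04-21 23:58:33"),
        ("A", "result", "process1", "2019-04-22 23:58:33"),
        ("A", "result", "process2", "2019-04-23 23:58:33"),
        ("A", "result", "process3", "2019-04-24 23:58:33"),
        ("B", "result", "process1", "2019-04-26 23:58:33"),
        ("B", "start", "start", "2019-04-25 23:58:33"),
        ("B", "result", "process2", "2019-04-27 23:58:33"),
    ],
)

# Conditional aggregation: CASE yields NULL for non-matching rows,
# and MIN/MAX ignore NULLs, which mimics the FILTER clause.
query = """
SELECT type,
       MIN(CASE WHEN event = 'start'  THEN time END) AS start,
       MAX(CASE WHEN event = 'result' THEN time END) AS finish,
       CAST(strftime('%s', MAX(CASE WHEN event = 'result' THEN time END)) AS INTEGER)
     - CAST(strftime('%s', MIN(CASE WHEN event = 'start'  THEN time END)) AS INTEGER)
         AS duration_seconds
FROM events
GROUP BY type
ORDER BY type
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data, type A spans three days (259200 seconds) and type B two days (172800 seconds).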

Related

SQL Query Getting Lag Time

I have a table that has two fields in it called RunId and LastUpdated. I am trying to put together a query that can take the LastUpdated date time and get the time difference from the previous RunId but there may be a gap. Example:
RunId LastUpdated
110 2020-05-11 05:06:27.000
113 2020-05-11 05:06:31.000
Is there a way to get the RunId and time diff such as this:
RunId TimeDiff
113 0:00:04
Thanks for any info
You can get the last updated value for the previous run id using lag():
select t.*, lag(LastUpdated) over (order by RunId) as prev_lastupdated
from t;
Then you can use your database-specific date/time functions to get the difference. It might be as simple as the - operator, or it might require a special function.
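As a rough illustration of the lag() approach, here is a sketch that runs against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The SQLite dialect is an assumption made for the example; the question does not say which database is in use. The difference is taken in whole seconds via strftime('%s', ...), which is SQLite's way of getting epoch seconds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (RunId INTEGER, LastUpdated TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (110, "2020-05-11 05:06:27.000"),
        (113, "2020-05-11 05:06:31.000"),
    ],
)

# LAG() fetches LastUpdated from the previous row in RunId order,
# so gaps in the RunId sequence (110 -> 113) do not matter.
query = """
SELECT RunId,
       CAST(strftime('%s', LastUpdated) AS INTEGER)
     - CAST(strftime('%s', LAG(LastUpdated) OVER (ORDER BY RunId)) AS INTEGER)
         AS diff_seconds
FROM t
ORDER BY RunId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first run has no predecessor, so its difference is NULL (None in Python); run 113 comes out as 4 seconds.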

SELECT MIN from a subset of data obtained through GROUP BY

There is a database in place with hourly timeseries data, where every row in the DB represents one hour. Example:
TIMESERIES TABLE
id date_and_time entry_category
1 2017/01/20 12:00 type_1
2 2017/01/20 13:00 type_1
3 2017/01/20 12:00 type_2
4 2017/01/20 12:00 type_3
First I used the GROUP BY statement to find the latest date and time for each type of entry category:
SELECT MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category;
However, now I want to find the LEAST RECENT of the datetimes I obtained with the query above. I will somehow need to use SELECT MIN(date_and_time), but how do I let SQL know that I want to treat the output of my previous query as a "new table" to run a new SELECT query on? The output of my total query should be a single value; for the sample displayed above, date_and_time = 2017/01/20 12:00.
I've tried using aliases, but they don't seem to do the trick; they only rename existing columns or tables (or I'm misusing them...). There are many questions out there that try to list the MAX or MIN for a particular group (e.g. https://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ or Select max value of each group), which is what I have already achieved, but now I want to work on this list of obtained datetimes. My database structure is very simple, but I lack the knowledge to string these queries together.
Thanks, cheers!
You can use your first query as a subquery, which is what you describe: the first query's output becomes the input of the second query. This returns a single row with the min date, as required.
SELECT MIN(date_and_time)
FROM (SELECT MAX(date_and_time) as date_and_time, entry_category
FROM timeseries_table
GROUP BY entry_category) a;
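A quick way to check the subquery idea against the sample rows is the sketch below, which runs the same query shape in SQLite via Python's sqlite3 module. The SQLite dialect and the subquery alias latest are assumptions for the example; the dates are stored as text, which compares correctly here only because the year/month/day format sorts lexicographically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timeseries_table (id INTEGER, date_and_time TEXT, entry_category TEXT)"
)
conn.executemany(
    "INSERT INTO timeseries_table VALUES (?, ?, ?)",
    [
        (1, "2017/01/20 12:00", "type_1"),
        (2, "2017/01/20 13:00", "type_1"),
        (3, "2017/01/20 12:00", "type_2"),
        (4, "2017/01/20 12:00", "type_3"),
    ],
)

# Inner query: latest timestamp per category.
# Outer query: the earliest of those per-category maxima.
query = """
SELECT MIN(date_and_time)
FROM (
    SELECT MAX(date_and_time) AS date_and_time, entry_category
    FROM timeseries_table
    GROUP BY entry_category
) AS latest
"""
least_recent = conn.execute(query).fetchone()[0]
print(least_recent)
```

The per-category maxima are 13:00 for type_1 and 12:00 for type_2 and type_3, so the result is 2017/01/20 12:00.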
Is this what you want?
SELECT TOP 1 MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category
ORDER BY MAX(date_and_time) ASC;
If several categories tie for the earliest maximum, TOP 1 returns an arbitrary one of them. To make the result deterministic, include an additional sort key:
SELECT TOP 1 MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category
ORDER BY MAX(date_and_time) ASC, entry_category;

calculate time difference in minute between each record in group

I have following table:
Event Startdatetime Enddatetime Value
1 '23/09/2016 12:15:20' '23/09/2016 12:34:30' 50
1 '23/09/2016 14:10:40' '23/09/2016 14:30:25' 40
2 '25/10/2016 10:20:45' '25/10/2016 10:45:55' 80
2 '25/10/2016 11:27:35' '25/10/2016 11:48:55' 30
Each record has a startdatetime and an enddatetime, and I'd like to calculate the time difference in minutes between consecutive records in each group (by event).
Since that isn't a date format SQL Server recognizes, you need to convert it first and then do the math. You can use LAG to get the previous record, from SQL Server 2012 onward. DATEDIFF returns the end date minus the start date, so pass the previous Enddatetime first to get a positive gap, and order by the converted value rather than by the raw string.
select datediff(mi,convert(datetime,lag(Enddatetime) over (order by convert(datetime,Startdatetime,103)),103),convert(datetime,Startdatetime,103))
Or, if it doesn't need converting...
select datediff(mi,lag(Enddatetime) over (order by Startdatetime),Startdatetime)
If you only want the calculation within each event, partition by that as well...
select datediff(mi,lag(Enddatetime) over (partition by Event order by Startdatetime),Startdatetime)
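To see the partitioned LAG in action without SQL Server, here is a sketch in SQLite via Python's sqlite3 module. Several things are assumptions for the example: the table name records, the dates rewritten into ISO format (SQLite cannot parse dd/mm/yyyy), and the gap reported in seconds from the previous row's Enddatetime to the current row's Startdatetime. Note that dividing by 60 will not always match SQL Server's DATEDIFF(mi, ...), which counts minute boundaries crossed rather than elapsed minutes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (Event INTEGER, Startdatetime TEXT, Enddatetime TEXT, Value INTEGER)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, "2016-09-23 12:15:20", "2016-09-23 12:34:30", 50),
        (1, "2016-09-23 14:10:40", "2016-09-23 14:30:25", 40),
        (2, "2016-10-25 10:20:45", "2016-10-25 10:45:55", 80),
        (2, "2016-10-25 11:27:35", "2016-10-25 11:48:55", 30),
    ],
)

# PARTITION BY Event restarts the LAG for every event, so the first
# row of each event has no previous end time and yields NULL.
query = """
SELECT Event,
       Startdatetime,
       CAST(strftime('%s', Startdatetime) AS INTEGER)
     - CAST(strftime('%s', LAG(Enddatetime) OVER (
           PARTITION BY Event ORDER BY Startdatetime)) AS INTEGER) AS gap_seconds
FROM records
ORDER BY Event, Startdatetime
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For the sample data the gaps are 5770 seconds (about 96 minutes) for event 1 and 2500 seconds (about 42 minutes) for event 2.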

Query to find all timestamps more than a certain interval apart

I'm using postgres to run some analytics on user activity. I have a table of all requests (pageviews) made by every user and the timestamp of each request, and I'm trying to find the number of distinct sessions for every user. For the sake of simplicity, I'm considering every set of requests an hour or more apart from others as a distinct session. The data looks something like this:
id| request_time| user_id
1 2014-01-12 08:57:16.725533 1233
2 2014-01-12 08:57:20.944193 1234
3 2014-01-12 09:15:59.713456 1233
4 2014-01-12 10:58:59.713456 1234
How can I write a query to get the number of sessions per user?
To start a new session after every gap >= 1 hour:
SELECT user_id, count(*) AS distinct_sessions
FROM (
SELECT user_id
,(lag(request_time, 1, '-infinity') OVER (PARTITION BY user_id
ORDER BY request_time)
<= request_time - '1h'::interval) AS step -- start new session
FROM tbl
) sub
WHERE step
GROUP BY user_id
ORDER BY user_id;
This assumes request_time is NOT NULL.
Explanation:
In subquery sub, check for every row if a new session begins. Using the third parameter of lag() to provide the default -infinity, which is lower than any timestamp and therefore always starts a new session for the first row.
In the outer query count how many times new sessions started. Eliminate step = FALSE and count per user.
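The gap-based session count can be tried on the sample rows with the sketch below, which uses SQLite through Python's sqlite3 module instead of Postgres. That swap is an assumption for the sake of a self-contained example: SQLite has no '-infinity' timestamp or interval type, so the first row of each user is detected with IS NULL, and the one-hour threshold is compared in whole epoch seconds (sub-second precision is dropped).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, request_time TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, "2014-01-12 08:57:16.725533", 1233),
        (2, "2014-01-12 08:57:20.944193", 1234),
        (3, "2014-01-12 09:15:59.713456", 1233),
        (4, "2014-01-12 10:58:59.713456", 1234),
    ],
)

# A row starts a new session if it is the user's first request
# (no previous row) or if at least 3600 seconds have passed
# since that user's previous request.
query = """
SELECT user_id, COUNT(*) AS distinct_sessions
FROM (
    SELECT user_id,
           request_time,
           LAG(request_time) OVER (
               PARTITION BY user_id ORDER BY request_time) AS prev_time
    FROM tbl
) AS sub
WHERE prev_time IS NULL
   OR CAST(strftime('%s', request_time) AS INTEGER)
    - CAST(strftime('%s', prev_time) AS INTEGER) >= 3600
GROUP BY user_id
ORDER BY user_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

User 1233 has two requests about 19 minutes apart, so one session; user 1234 has two requests about two hours apart, so two sessions.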
Alternative interpretation
If you really wanted to count hours where at least one request happened (I don't think you do, but another answer assumes as much), you would:
SELECT user_id
, count(DISTINCT date_trunc('hour', request_time)) AS hours_with_req
FROM tbl
GROUP BY 1
ORDER BY 1;
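For comparison, here is the hours-with-requests interpretation as a sketch in SQLite via Python's sqlite3 module. SQLite has no date_trunc, so strftime('%Y-%m-%d %H', ...) is used as a stand-in that keeps the timestamp down to the hour; that substitution is an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, request_time TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, "2014-01-12 08:57:16.725533", 1233),
        (2, "2014-01-12 08:57:20.944193", 1234),
        (3, "2014-01-12 09:15:59.713456", 1233),
        (4, "2014-01-12 10:58:59.713456", 1234),
    ],
)

# Truncate each timestamp to its clock hour and count the
# distinct hours per user.
query = """
SELECT user_id,
       COUNT(DISTINCT strftime('%Y-%m-%d %H', request_time)) AS hours_with_req
FROM tbl
GROUP BY user_id
ORDER BY user_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data both users get 2, even though user 1233's requests are only about 19 minutes apart, which shows how this reading differs from counting gap-separated sessions.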

select overlapping datetime events with SQL

I have a SQL table Events (ID int, Event int, StartTime datetime, Duration int).
Event is event code (1=system running, 2=break)
Duration is the amount of seconds that the event was active.
I'd like to get the amount of seconds that event 1 was active, but subtract the duration of event 2.
E.g. event 1 was from 1:00 to 6:00, event 2 from 0:00 to 2:00 and event 2 from 5:00 to 6:00. The total time should be from 2:00 to 5:00 -> 3 hours.
There is a way I can think of: for each event 1 find all events 2 that can intersect with event 1, and for each event 2 in that set: trim its duration to get only the part that was active during its event 1.
e.g. for my event 1 (1:00 - 6:00) I'll find event 2 (0:00 - 2:00), get only the part that interests me (1:00-2:00); find another event 2(5:00-6:00), get the part that interests me (it's whole event 5:00-6:00) - that summed up are two hours. The total time of event 1 was 5 hours; 5 hrs - 2 hrs (event 2) is 3 hours.
But this won't work if there are thousands of events in the specified time frame, so I'd prefer a hint of solution without loops (cursors).
;WITH CTE AS (
SELECT
evnt2.id as ID,
sum(evnt1.duration) as Duration
from
#events evnt1
INNER JOIN #events evnt2
ON evnt1.id <> evnt2.id
WHERE
DATEADD(second, evnt1.duration, evnt1.starttime)
BETWEEN
evnt2.starttime AND DATEADD(second, evnt2.duration, evnt2.starttime)
GROUP BY evnt2.id
)
SELECT
#events.duration - CTE.duration,
*
FROM
#events
INNER JOIN CTE
ON #events.id = CTE.id
The simplest way I can think of to do this is with multiple self-joins. I say multiple because the Event 2 can start before or during Event 1.
Here's a bit of code that will answer your question if Event 2 always starts before Event 1.
select DateDiff(s,e1.StartTime, DateAdd(s,e2Before.Duration,e2Before.StartTime))
from events e1
join events e2Before
on (e1.StartTime between e2Before.StartTime and DateAdd(s,e2Before.duration,e2Before.StartTime))
and e1.event = 1
and e2Before.event = 2
To answer the question fully, you'll need to add another join with some of the DateAdd parameters swapped around a bit to cater for situations where Event 2 starts after Event 1 starts.
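A different way to cover both cases in one join is to clip each Event 2 interval to the Event 1 interval and subtract the clipped lengths. The sketch below is not the code from either answer; it is an alternative written for SQLite through Python's sqlite3 module, using SQLite's two-argument MIN/MAX as least/greatest. Assumptions: the date 2020-01-01 is invented to turn the question's clock times into timestamps, and Event 2 intervals are assumed not to overlap each other (otherwise the shared part would be subtracted twice).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Events (ID INTEGER, Event INTEGER, StartTime TEXT, Duration INTEGER)"
)
conn.executemany(
    "INSERT INTO Events VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2020-01-01 01:00:00", 18000),  # running 1:00 to 6:00
        (2, 2, "2020-01-01 00:00:00", 7200),   # break   0:00 to 2:00
        (3, 2, "2020-01-01 05:00:00", 3600),   # break   5:00 to 6:00
    ],
)

# iv turns every row into a [s, e) interval in epoch seconds.
# Two intervals overlap when each starts before the other ends;
# the overlap length is min(ends) - max(starts).
query = """
WITH iv AS (
    SELECT ID, Event,
           CAST(strftime('%s', StartTime) AS INTEGER) AS s,
           CAST(strftime('%s', StartTime) AS INTEGER) + Duration AS e
    FROM Events
)
SELECT e1.ID,
       (e1.e - e1.s)
     - COALESCE(SUM(MIN(e1.e, e2.e) - MAX(e1.s, e2.s)), 0) AS net_seconds
FROM iv AS e1
LEFT JOIN iv AS e2
       ON e2.Event = 2
      AND e2.s < e1.e
      AND e2.e > e1.s
WHERE e1.Event = 1
GROUP BY e1.ID, e1.s, e1.e
ORDER BY e1.ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For the example in the question, each break overlaps the running period by one hour, so 5 hours minus 2 hours leaves 10800 seconds (3 hours). The LEFT JOIN keeps Event 1 rows that have no overlapping break, with nothing subtracted.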