SQL Average Inter-arrival Time, Time Between Dates - sql

I have a table with sequential timestamps:
2011-03-17 10:31:19
2011-03-17 10:45:49
2011-03-17 10:47:49
...
I need to find the average time difference between each of these (there could be dozens), in seconds or whatever is easiest; I can work with it from there. For example, the inter-arrival time for only the first two times above would be 870 (14m 30s). For all three times it would be (870 + 120)/2 = 495 (8m 15s).
Note: I am using PostgreSQL 8.1.22.
EDIT: The table I mention above is from a different query that is literally just a one-column list of timestamps

Not sure I understood your question completely, but this might be what you are looking for:
SELECT avg(difference)
FROM (
SELECT timestamp_col - lag(timestamp_col) over (order by timestamp_col) as difference
FROM your_table
) t
The inner query calculates the distance between each row and the preceding row. The result is an interval for each row in the table.
The outer query simply does an average over all differences.
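To make the arithmetic concrete, here is a minimal Python sketch of the same idea, using the three sample timestamps from the question. The function name average_gap_seconds is made up for illustration. Pairing each value with its predecessor plays the role of lag(); the first row has no predecessor, which matches the NULL that lag() returns there and that avg() ignores.

```python
from datetime import datetime

# Sample timestamps from the question; in practice these come from the query result.
stamps = [
    datetime(2011, 3, 17, 10, 31, 19),
    datetime(2011, 3, 17, 10, 45, 49),
    datetime(2011, 3, 17, 10, 47, 49),
]

def average_gap_seconds(times):
    """Average the difference between each timestamp and the one before it."""
    ordered = sorted(times)
    # zip pairs each row with its predecessor, like lag() over (order by ...).
    gaps = [(cur - prev).total_seconds() for prev, cur in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

print(average_gap_seconds(stamps))  # 495.0, i.e. (870 + 120) / 2
```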

I think you want to find avg(timestamptz).
My solution is avg(current - min value). Since the result is an interval, add it back to the min value.
SELECT avg(target_col - (select min(target_col) from your_table))
+ (select min(target_col) from your_table)
FROM your_table

If you cannot upgrade to a version of PG that supports window functions, you
may compute your table's sequential steps "the slow way."
Assuming your table is "tbl" and your timestamp column is "ts":
SELECT AVG(t1 - t0)
FROM (
-- All this silliness would be moot if we could use
-- `` lead(ts) over (order by ts) ''
SELECT tbl.ts AS t0,
next.ts AS t1
FROM tbl
CROSS JOIN
tbl next
WHERE next.ts = (
SELECT MIN(ts)
FROM tbl subquery
WHERE subquery.ts > tbl.ts
)
) derived;
But don't do that. Its performance will be terrible. Please do what
a_horse_with_no_name suggests, and use window functions.
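One more observation that may help on versions without window functions: the consecutive differences telescope, so their sum is just the last timestamp minus the first. The Python sketch below checks this on the question's sample data; average_gap_by_endpoints is a hypothetical name. In SQL this would presumably correspond to something like (max(ts) - min(ts)) / (count(*) - 1), which is untested on 8.1.

```python
from datetime import datetime

def average_gap_by_endpoints(times):
    """(max - min) / (count - 1): consecutive gaps telescope, so only the endpoints matter."""
    if len(times) < 2:
        return None  # no gap exists with fewer than two rows
    return (max(times) - min(times)).total_seconds() / (len(times) - 1)

sample = [
    datetime(2011, 3, 17, 10, 31, 19),
    datetime(2011, 3, 17, 10, 45, 49),
    datetime(2011, 3, 17, 10, 47, 49),
]

print(average_gap_by_endpoints(sample))  # 495.0
```

This only gives the average; if you need the individual gaps, you still need one of the queries above.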

Related

Average time between two consecutive events in SQL

I have a table as shown below.
time                | Event
--------------------+------
2021-03-19T17:15:05 | A
2021-03-19T17:15:11 | B
2021-03-19T17:15:11 | C
2021-03-19T17:15:12 | A
2021-03-19T17:15:14 | C
I want to find the average time between event A and the event following it.
How do I find it using an SQL query?
Here the desired output is 4 seconds.
I really appreciate any help you can provide.
The basic idea is lead() to get the time from the next row. Then you need to calculate the difference. So for all rows:
select t.*,
(to_unix_timestamp(lead(time) over (order by time)) -
to_unix_timestamp(time)
) as diff_seconds
from t;
Use a subquery and filtering for just A and the average:
select avg(diff_seconds)
from (select t.*,
(to_unix_timestamp(lead(time) over (order by time)) -
to_unix_timestamp(time)
) as diff_seconds
from t
) t
where event = 'A';
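As a sanity check on the logic, here is a small Python sketch over the sample rows from the question (average_seconds_after is a made-up helper name). The important detail is the order of operations: the next row is looked up over all events first, and the filter on A is applied afterwards, just as the subquery computes lead() before the outer where.

```python
from datetime import datetime

rows = [
    (datetime(2021, 3, 19, 17, 15, 5), "A"),
    (datetime(2021, 3, 19, 17, 15, 11), "B"),
    (datetime(2021, 3, 19, 17, 15, 11), "C"),
    (datetime(2021, 3, 19, 17, 15, 12), "A"),
    (datetime(2021, 3, 19, 17, 15, 14), "C"),
]

def average_seconds_after(rows, event):
    """Average time from each `event` row to whichever row follows it."""
    ordered = sorted(rows, key=lambda r: r[0])
    diffs = [
        (nxt[0] - cur[0]).total_seconds()
        for cur, nxt in zip(ordered, ordered[1:])  # pair each row with the next one
        if cur[1] == event                         # filter only after pairing
    ]
    return sum(diffs) / len(diffs) if diffs else None

print(average_seconds_after(rows, "A"))  # 4.0, from gaps of 6 and 2 seconds
```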

How to compute window function for each nth row in Presto?

I am working with a table that contains timeseries data, with a row for each minute for each user.
I want to compute some aggregate functions on a rolling window of N calendar days.
This is achieved via
SELECT
SOME_AGGREGATE_FUN(col) OVER (
PARTITION BY user_id
ORDER BY timestamp
ROWS BETWEEN (60 * 24 * N) PRECEDING AND CURRENT ROW
) as my_col
FROM my_table
However, I am only interested in the result of this at a daily scale.
i.e. I want the window to be computed only at 00:00:00, but I want the window itself to contain all the minute-by-minute data to be passed into my aggregate function.
Right now I am doing this:
WITH agg_results AS (
SELECT
user_id,
timestamp_col,
SOME_AGGREGATE_FUN(col) OVER (
PARTITION BY user_id
ORDER BY timestamp_col
ROWS BETWEEN (60 * 24 * N) PRECEDING AND CURRENT ROW
) AS my_col
FROM my_table
)
SELECT * FROM agg_results
WHERE
timestamp_col = DATE_TRUNC('day', timestamp_col)
This works in theory, but it does 60 * 24 times more computations than necessary, which makes the query very slow.
Essentially, I am trying to find a way to make the right window bound skip rows based on a condition. Or, if it is simpler to implement, for every nth row (as I have a constant number of rows for each day).
I don't think that's possible with window functions. You could switch to a subquery instead, assuming that your aggregate function works as a regular aggregate function too (that is, without an OVER() clause):
select
timestamp_col,
(
select some_aggregate_fun(t1.col)
from my_table t1
where
t1.user_id = t.user_id
and t1.timestamp_col >= t.timestamp_col - interval '1' day
and t1.timestamp_col <= t.timestamp_col
)
from my_table t
where timestamp_col = date_trunc('day', timestamp_col)
I am unsure that this would perform better than your original query though; you might need to assess that against your actual dataset.
You can change interval '1' day to the actual interval you want to use.
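To illustrate what the subquery approach computes, here is a rough Python sketch for a single user. It uses sum as a stand-in for some_aggregate_fun, and the sample data (one row per hour rather than per minute) is invented to keep it short. The aggregate is evaluated only for rows that fall exactly on midnight, but each evaluation still sees every row in the trailing window, with both bounds inclusive as in the query.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

def daily_trailing_sums(samples, window=timedelta(days=1)):
    """samples: list of (timestamp, value) for one user, sorted by timestamp."""
    times = [t for t, _ in samples]
    results = {}
    for t in times:
        if t != t.replace(hour=0, minute=0, second=0, microsecond=0):
            continue  # only evaluate the window at midnight
        lo = bisect_left(times, t - window)   # first row with time >= t - window
        hi = bisect_right(times, t)           # one past the last row with time <= t
        results[t] = sum(v for _, v in samples[lo:hi])
    return results

# Hypothetical data: one sample per hour for two days, each with value 1.
start = datetime(2024, 1, 1)
samples = [(start + timedelta(hours=h), 1) for h in range(49)]

for day, total in daily_trailing_sums(samples).items():
    print(day, total)
```

Whether the equivalent SQL is actually faster depends on how the engine executes the correlated subquery, so it is worth measuring.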

20 Day moving average with joins alone

There are questions like this all over the place so let me specify where I specifically need help.
I have seen moving averages in SQL with Oracle Analytic functions, MSSQL apply, or a variety of other methods. I have also seen this done with self joins (one join for each day of the average, such as here How do you create a Moving Average Method in SQL? ).
I am curious whether there is a way to do this in SQL using only self joins (preferably in Oracle, but since my question is about joins alone, it should be possible in any RDBMS). The approach would have to be scalable (for a 20 or 100 day moving average), in contrast to the link above, which required one join for each day in the moving average.
My thoughts are
select customer, a.tradedate, a.shares, avg(b.shares)
from trades a, trades b
where b.tradedate between a.tradedate-20 and a.tradedate
group by customer, a.tradedate
But when I tried it in the past it didn't work. To be more specific, I am trying a smaller but similar example (5 day average instead of 20 day) with this fiddle demo and can't find out where I am going wrong. http://sqlfiddle.com/#!6/ed008/41
select a.ticker, a.dt_date, a.volume, avg(b.volume)
from yourtable a, yourtable b
where b.dt_date between a.dt_date-5 and a.dt_date
and a.ticker=b.ticker
group by a.ticker, a.dt_date, a.volume
I don't see anything wrong with your second query. I think the only reason it's not what you're expecting is that the volume field is an integer data type, so the calculated average will also be an integer. For an average you have to cast it, because the result won't necessarily be a whole number:
select a.ticker, a.dt_date, a.volume, avg(cast(b.volume as float))
from yourtable a
join yourtable b
on a.ticker = b.ticker
where b.dt_date between a.dt_date - 5 and a.dt_date
group by a.ticker, a.dt_date, a.volume
Fiddle:
http://sqlfiddle.com/#!6/ed008/48/0 (thanks to @DaleM for DDL)
I don't know why you would ever do this vs. an analytic function though, especially since you mention wanting to do this in Oracle (which has analytic functions). It would be different if your preferred database were MySQL or a database without analytic functions.
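The effect of the cast can be mimicked in a few lines of Python. This is only an analogy: the volumes are invented, and floor division stands in for the way AVG over an integer column drops the fractional part in databases such as SQL Server (the two agree for non-negative values).

```python
volumes = [10, 11, 12, 14]  # hypothetical integer volumes

def integer_average(values):
    """Mimic AVG over an integer column: the fractional part is dropped."""
    return sum(values) // len(values)

def float_average(values):
    """Mimic AVG(CAST(col AS float)): the fractional part is kept."""
    return sum(values) / len(values)

print(integer_average(volumes))  # 11
print(float_average(volumes))    # 11.75
```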
Just to add to the answer, this is how you would achieve the same result in Oracle using analytic functions. Notice how the PARTITION BY acts as the join you're using on ticker. That splits up the results so that rows for the same date under different tickers don't interfere with each other.
select ticker,
dt_date,
volume,
avg(cast(volume as decimal)) over( partition by ticker
order by dt_date
rows between 5 preceding
and current row ) as mov_avg
from yourtable
order by ticker, dt_date, volume
Fiddle:
http://sqlfiddle.com/#!4/0d06b/4/0
Analytic functions will likely run much faster.
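One caveat worth checking against your data: the self join selects rows by date range, while ROWS BETWEEN 5 PRECEDING AND CURRENT ROW counts rows. The two agree only when there is exactly one row per ticker per day. The Python sketch below, with invented data for a single ticker and hypothetical function names, shows where they diverge.

```python
from datetime import date, timedelta

# Hypothetical volumes for one ticker; note the gap between Jan 3 and Jan 10.
data = [
    (date(2024, 1, 1), 10),
    (date(2024, 1, 2), 20),
    (date(2024, 1, 3), 30),
    (date(2024, 1, 10), 40),
]

def avg_by_date_range(data, days):
    """Like the self join: average rows dated within `days` days back, inclusive."""
    out = []
    for d, _ in data:
        window = [v for d2, v in data if d - timedelta(days=days) <= d2 <= d]
        out.append(sum(window) / len(window))
    return out

def avg_by_row_count(data, rows):
    """Like ROWS BETWEEN n PRECEDING AND CURRENT ROW: average the last n + 1 rows."""
    ordered = sorted(data)
    out = []
    for i in range(len(ordered)):
        window = [v for _, v in ordered[max(0, i - rows): i + 1]]
        out.append(sum(window) / len(window))
    return out

print(avg_by_date_range(data, 5))  # [10.0, 15.0, 20.0, 40.0]
print(avg_by_row_count(data, 5))   # [10.0, 15.0, 20.0, 25.0]
```

The last value differs because the date-based window for Jan 10 contains only that day, while the row-based window still reaches back to Jan 1.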
http://sqlfiddle.com/#!6/ed008/45 would appear to be what you need.
select a.ticker,
a.dt_date,
a.volume,
(select avg(cast(b.volume as float))
from yourtable b
where b.dt_date between a.dt_date-5 and a.dt_date
and a.ticker=b.ticker)
from yourtable a
order by a.ticker, a.dt_date
Note that this uses a subquery, not a join.

Postgres SQL select a range of records spaced out by a given interval

I am trying to determine whether it is possible, using only SQL in Postgres, to select a range of time-ordered records at a given interval.
Let's say I have 60 records, one record for each minute in a given hour. I want to select records at 5 minute intervals for that hour. The result should be 12 records, each one 5 minutes apart.
This is currently accomplished by selecting the full range of records, then looping through the results and pulling out the records at the given interval. I am trying to see if I can do this purely in SQL, as our db is large and we may be dealing with tens of thousands of records.
Any thoughts?
Yes you can. It's really easy once you get the hang of it. I think it's one of the jewels of SQL, and it's especially easy in PostgreSQL because of its excellent temporal support. Often, complex functions can turn into very simple SQL queries that scale and can be indexed properly.
This uses generate_series to draw up sample time stamps that are spaced 1 minute apart. The outer query then extracts the minute and uses modulo to find the values that are 5 minutes apart.
select
ts,
extract(minute from ts)::integer as minute
from
( -- generate some time stamps - one minute apart
select
current_time + (n || ' minute')::interval as ts
from generate_series(1, 30) as n
) as timestamps
-- extract the minute check if its on a 5 minute interval
where extract(minute from ts)::integer % 5 = 0
-- only pick this hour
and extract(hour from ts) = extract(hour from current_time)
;
ts | minute
--------------------+--------
19:40:53.508836-07 | 40
19:45:53.508836-07 | 45
19:50:53.508836-07 | 50
19:55:53.508836-07 | 55
Notice that adding a computed index on the expression in the where clause (where the value of the expression makes up the index) could lead to major speed improvements. It may not be very selective in this case, but it is good to be aware of.
I wrote a reservation system once in PostgreSQL (which had lots of temporal logic where date intervals could not overlap) and never had to resort to iterative methods.
http://www.amazon.com/SQL-Design-Patterns-Programming-Focus/dp/0977671542 is an excellent book that has lots of interval examples. It is hard to find in book stores now but well worth it.
Extract the minutes, convert to int4, and check whether the remainder from dividing by 5 is 0:
select *
from TABLE
where int4 (date_part ('minute', COLUMN)) % 5 = 0;
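For the example in the question (60 records, one per minute), the modulo filter can be checked with a short Python sketch. The starting time is arbitrary and chosen only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical data: one record per minute for a single hour.
start = datetime(2011, 1, 1, 7, 0)
records = [start + timedelta(minutes=m) for m in range(60)]

# Keep only the records whose minute is a multiple of 5.
every_five = [ts for ts in records if ts.minute % 5 == 0]

print(len(every_five))               # 12
print(every_five[0], every_five[1])  # 07:00 and 07:05
```

Like the query, this keeps every record that falls within a matching minute, so two records in the same minute would both be returned.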
If the intervals are not time-based and you just want every 5th row, or if the times are regular and you always have one record per minute, the query below gives you one record out of every 5:
select *
from
(
select *, row_number() over (order by timecolumn) as rown
from tbl
) X
where mod(rown, 5) = 1
If your time records are not regular, then you need to generate a time series (given in another answer) and left join that into your table, group by the time column (from the series) and pick the MAX time from your table that is less than the time column.
Pseudocode:
select thetimeinterval, max(timecolumn)
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
And further join it back to the table for the full record (assuming unique times)
select tbl.* from
tbl inner join
(
select thetimeinterval, max(timecolumn) timecolumn
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
) y on tbl.timecolumn = y.timecolumn
How about this:
select min(ts), extract(minute from ts)::integer / 5 as bucket
from your_table -- placeholder table name
group by bucket order by bucket;
This has the advantage of doing the right thing if you have two readings for the same minute, or if your readings skip a minute. Instead of using min, it would be even better to use one of the first() aggregate functions, code for which you can find here:
http://wiki.postgresql.org/wiki/First_%28aggregate%29
This assumes that your five minute intervals are "on the fives", so to speak. That is, that you want 07:00, 07:05, 07:10, not 07:02, 07:07, 07:12. It also assumes you don't have two rows within the same minute, which might not be a safe assumption.
select your_timestamp
from your_table
where cast(extract(minute from your_timestamp) as integer) % 5 = 0;
If you might have two rows with timestamps within the same minute, like
2011-01-01 07:00:02
2011-01-01 07:00:59
then this version is safer.
select min(your_timestamp)
from your_table
group by (cast(extract(minute from your_timestamp) as integer) / 5)
Wrap either of those in a view, and you can join it to your base table.
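Here is a small Python sketch of the bucketing idea, with invented readings that include two rows in the same minute and a missing 07:05 row. The function name first_per_bucket is hypothetical. As with the SQL above, the bucket key is derived from the minute alone, so this is only meaningful when the rows span a single hour.

```python
from datetime import datetime

# Hypothetical readings: duplicates within 07:00, and nothing at 07:05.
readings = [
    datetime(2011, 1, 1, 7, 0, 2),
    datetime(2011, 1, 1, 7, 0, 59),
    datetime(2011, 1, 1, 7, 3, 10),
    datetime(2011, 1, 1, 7, 6, 0),
    datetime(2011, 1, 1, 7, 9, 30),
]

def first_per_bucket(times, width=5):
    """Return the earliest timestamp in each `width`-minute bucket."""
    buckets = {}
    for ts in sorted(times):
        key = ts.minute // width     # 0 for minutes 0-4, 1 for minutes 5-9, ...
        buckets.setdefault(key, ts)  # keep only the first (earliest) per bucket
    return [buckets[k] for k in sorted(buckets)]

for ts in first_per_bucket(readings):
    print(ts)
# 2011-01-01 07:00:02
# 2011-01-01 07:06:00
```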

Difference in time between records

I have a table that has (among others) a timestamp column (named timestamp; it's a standard Oracle DATE datatype). The records are about 4-11 minutes apart, about 7 or 8 records every hour, and I'm trying to determine if there is any pattern to them.
Is there an easy way to see each record, and the number of minutes that record occurred after the previous record?
Thanks,
AndyDan
This is Oracle 9i+, using the LAG function to get the previous timestamp value without needing to self join:
SELECT t.timestamp - LAG(t.timestamp) OVER (ORDER BY t.timestamp) AS diff
FROM YOUR_TABLE t
...but because whole numbers represent the number of days in the result, a difference of less than 24 hours will be a fraction. Also, the LAG will return NULL if there's no earlier value -- same as if having used an OUTER JOIN.
To see minutes, use the ROUND function:
SELECT ROUND((t.timestamp - LAG(t.timestamp) OVER (ORDER BY t.timestamp)) *1440) AS diff_in_minutes
FROM YOUR_TABLE t
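The factor 1440 is simply the number of minutes in a day (24 * 60). A rough Python equivalent, with made-up sample times and a hypothetical helper name, expresses the difference as a fraction of a day first, the way subtracting two Oracle DATE values does, and then scales it to minutes.

```python
from datetime import datetime

def minutes_between(earlier, later):
    """Express the difference in fractional days, then scale to whole minutes."""
    days = (later - earlier).total_seconds() / 86400  # fraction of a day
    return round(days * 1440)                         # 1440 minutes per day

a = datetime(2011, 3, 17, 12, 0, 0)
b = datetime(2011, 3, 17, 12, 7, 20)
print(minutes_between(a, b))  # 7
```

Note that Python's round and Oracle's ROUND can differ on exact halves, so treat this as an illustration of the unit conversion only.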
If the records have sequential IDs, you could do a self join like this:
SELECT t2.*, t2.timestamp - t1.timestamp AS timediff
FROM foo t1 inner join foo t2 on t1.id = t2.id-1
You'd probably need to tweak this to handle the first and last records, but that's the basics.