Not showing correct delta in minutes - postgresql-9.5

The column created_at has type timestamp without time zone.
I need to get the delta in minutes between the current date and the column created_at.
Query:
select id, created_at,
extract(minutes from (CURRENT_TIMESTAMP) - created_at) as delta
from shop_order order by created_at
And here is the result:
Why is the delta 19 for the record with id = 20?
The difference is 3 DAYS. Why does it show only 19 minutes?

An interval (which is the result of subtracting two timestamps) consists of several "parts" (similar to a date), and extract only extracts the named part, not the representation of the whole interval in that unit. If the result of the subtraction is e.g. 3 days 19 minutes, extract will return 19 minutes - similar to the way extract(year ...) or extract(month ...) work.
You can extract the number of seconds and then divide that by 60 to get the total duration in minutes:
select id,
       created_at,
       extract(epoch from CURRENT_TIMESTAMP - created_at) / 60 as delta
from shop_order
order by created_at
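To see the difference between the two approaches, here is a small illustration against a literal interval chosen to mirror the question's example:
select extract(minute from interval '3 days 19 minutes')      as minute_part,   -- 19: only the "minutes" field
       extract(epoch  from interval '3 days 19 minutes') / 60 as total_minutes; -- 4339: the whole duration in minutes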

Related

How can I aggregate time series data in postgres from a specific timestamp & fixed intervals (e.g. 1 hour, 1 day, 7 days) without using date_trunc()?

I have a postgres table "Generation" with half-hourly timestamps spanning 2009 - present with energy data:
I need to aggregate (average) the data across different intervals from specific time points, for example data from 2021-01-07T00:00:00.000Z for one year at 7-day intervals, or 3 months at 1-day intervals, or 7 days at 1-hour intervals, etc. date_trunc() partly solves this, but it truncates the weeks to the preceding Monday, e.g.
SELECT date_trunc('week', "DATETIME") AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2021-01-07T00:00:00.000Z' AND "DATETIME" <= '2022-01-06T23:59:59.999Z'
GROUP BY week
ORDER BY week ASC
;
returns the first time series interval as 2021-01-04 with an incorrect count:
week count gas coal
"2021-01-04 00:00:00" 192 18291.34375 2321.4427083333335
"2021-01-11 00:00:00" 336 14477.407738095239 2027.547619047619
"2021-01-18 00:00:00" 336 13947.044642857143 1152.047619047619
EDIT: the following will return the correct weekly intervals by checking the start date relative to the nearest Monday / start of week and adjusting the results accordingly:
WITH vars1 AS (
SELECT '2021-01-07T00:00:00.000Z'::timestamp as start_time,
'2021-01-28T00:00:00.000Z'::timestamp as end_time
),
vars2 AS (
SELECT
((select start_time from vars1)::date - (date_trunc('week', (select start_time from vars1)::timestamp))::date) as diff
)
SELECT date_trunc('week', "DATETIME" - ((select diff from vars2) || ' day')::interval)::date + ((select diff from vars2) || ' day')::interval AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= (select start_time from vars1) AND "DATETIME" < (select end_time from vars1)
GROUP BY week
ORDER BY week ASC
returns:
week count gas coal
"2021-01-07 00:00:00" 336 17242.752976190477 2293.8541666666665
"2021-01-14 00:00:00" 336 13481.497023809523 1483.0565476190477
"2021-01-21 00:00:00" 336 15278.854166666666 1592.7916666666667
And then for any daily or hourly (swap out day with hour) intervals you can use the following:
SELECT date_trunc('day', "DATETIME") AS day,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2022-01-07T00:00:00.000Z' AND "DATETIME" < '2022-01-10T23:59:59.999Z'
GROUP BY day
ORDER BY day ASC
;
In order to select the complete week, you should change the WHERE clause to something like:
WHERE "DATETIME" >= date_trunc('week','2021-01-07T00:00:00.000Z'::timestamp)
AND "DATETIME" < (date_trunc('week','2022-01-06T23:59:59.999Z'::timestamp) + interval '7' day)::date
This will effectively get the records from January 4, 2021 until (and including) January 9, 2022.
Note: I changed <= to < so the end date itself is not included!
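A quick sanity check of those boundaries (stock PostgreSQL, where date_trunc('week', ...) snaps back to Monday):
select date_trunc('week', timestamp '2021-01-07')                    as lower_bound, -- 2021-01-04 00:00:00
       date_trunc('week', timestamp '2022-01-06') + interval '7' day as upper_bound; -- 2022-01-10 00:00:00, excluded by <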
EDIT:
when you want your weeks to start on January 7, you can always group by:
(date_part('day',(d-'2021-01-07'))::int-(date_part('day',(d-'2021-01-07'))::int % 7))/7
(where d is the column containing the datetime-value.)
see: dbfiddle
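For illustration, a minimal sketch (dates chosen arbitrarily, with the anchor cast explicitly) of which bucket that expression assigns relative to 2021-01-07:
SELECT d::date AS dt,
       (date_part('day', d - '2021-01-07'::timestamp)::int
        - date_part('day', d - '2021-01-07'::timestamp)::int % 7) / 7 AS week_bucket -- 0 for Jan 7-13, 1 for Jan 14-20, ...
FROM generate_series('2021-01-07'::timestamp, '2021-01-27'::timestamp, interval '5 day') AS g(d);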
EDIT:
This will get the list from a given date and a specified interval.
see: dbfiddle
WITH vars AS (
SELECT
'2021-01-07T00:00:00.000Z'::timestamp AS qstart,
'2022-01-06T23:59:59.999Z'::timestamp AS qend,
7 as qint,
INTERVAL '1 DAY' as qinterval
)
SELECT
(SELECT date(qstart) FROM vars)
  + (SELECT qinterval FROM vars)
    * ((date_part('day', ("DATETIME" - (SELECT date(qstart) FROM vars)))::int
        - (date_part('day', ("DATETIME" - (SELECT date(qstart) FROM vars)))::int % (SELECT qint FROM vars)))::int) AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= (SELECT qstart FROM vars) AND "DATETIME" <= (SELECT qend FROM vars)
GROUP BY week
ORDER BY week
;
I added the WITH vars to do the variable stuff at the top, so there is no need to mess with the rest of the query. (Idea borrowed here)
I only tested with qint=7, qinterval='1 DAY' and qint=14, qinterval='1 DAY' (but other values should work too...)
Using the function EXTRACT, you can calculate the difference in days, weeks and hours between your timestamp ts and the start_date as follows.
Difference in Days
extract (day from ts - start_date)
Difference in Weeks
It is the difference in days divided by 7, truncated:
trunc(extract (day from ts - start_date)/7)
Difference in Hours
It is the difference in days times 24 plus the hour difference within the day:
extract (day from ts - start_date)*24 + extract (hour from ts - start_date)
The difference can be used in GROUP BY directly. E.g. for week grouping, the first group has difference 0 (the same week), the next group has difference 1 (the next week), and so on.
Example
I'm using a CTE for the start date to avoid multiple copies of the parameter:
with start_time as
(select DATE'2021-01-07' as start_ts),
prep as (
select
ts,
extract (day from ts - (select start_ts from start_time)) day_diff,
trunc(extract (day from ts - (select start_ts from start_time))/7) week_diff,
extract (day from ts - (select start_ts from start_time)) *24 + extract (hour from ts - (select start_ts from start_time)) hour_diff,
value
from test_table
where ts >= (select start_ts from start_time)
)
select week_diff, avg(value)
from prep
group by week_diff order by 1

Window function for average

I have this table timestamp_table and I'm using Presto SQL
timestamp | id
2021-01-01 10:00:00 | 2456
I would like to compute the number of unique IDs in the last 24 and 48 hours, and I thought this could be achieved with window functions, but I'm struggling. This is my proposed solution, but it needs work:
SELECT COUNT(id) OVER (PARTITION BY timestamp ORDER BY timestamp RANGE BETWEEN INTERVAL '24' HOUR PRECEDING AND CURRENT ROW)
You're probably having trouble due to the PARTITION BY clause, since the COUNT will only apply to rows within the same timestamp values.
Try something like this, as a starting point:
The fiddle
SELECT *
, COUNT(id) OVER (ORDER BY timestamp RANGE BETWEEN INTERVAL '24' HOUR PRECEDING AND CURRENT ROW)
, MIN(id) OVER (ORDER BY timestamp RANGE BETWEEN INTERVAL '24' HOUR PRECEDING AND CURRENT ROW)
FROM tbl
;
I think that you can't get data for both time intervals with one table scan, because a row that is in the last 24 hours must be counted in both groups: 24 hours and 48 hours. So you must do two requests, or union them:
select 'h24', count(distinct id)
from timestamp_table
where timestamp < current_timestamp and timestamp >= date_add('day', -1, current_timestamp)
union all
select 'h48', count(distinct id)
from timestamp_table
where timestamp < current_timestamp and timestamp >= date_add('day', -2, current_timestamp)

How can I create an hourly use profile that computes aggregates based on the hour of day and day of week, in 1-hour, 2-hour and 3-hour windows, in Postgres

Hi, I'm trying to create a usage profile for each hour of the week from three months of data in Postgres.
The raw data is 90 days of sensor_id, timestamp, value and the table should have these columns:
sensor_id, day_of_week, hour_of_day, avg_1hour, max_1hour, min_1hour, p95_1hour, max_2hour, avg_2hour min_2hour, p95_2hour, avg_3hour, max_3hour, min_3hour, p95_3hour
The *_1hour columns are the result of aggregate functions for each hour_of_day, day_of_week pair over data within that hour. This is not so bad, and I believe this query generates the desired result.
select
sensor_id,
extract(dow from ts) as day_of_week,
extract(hour from ts) as hour_of_day,
avg(val) as avg_1hour,
PERCENTILE_CONT(.95) within group (order by val) as p95_1hour,
max(val) as max_1hour,
min(val) as min_1hour
from timeseries_data
where ts between current_date - interval '91 day' and current_date - interval '1 day'
group by sensor_id, hour_of_day, day_of_week
order by day_of_week, hour_of_day asc
For example, avg_1hour should have a row where day_of_week is 1 (Monday) and hour_of_day is 6 (5am), and then avg_1hour would be the average of every reading at Monday 5am for the last 90 days.
The *_2hour and *_3hour columns are harder for me.
For the same day/hour pair, the *_2hour columns would also include the prior hour: for example, there would be a row where day_of_week is 1 (Monday) and hour_of_day is 6 (5am), and avg_2hour would be the average of vals from all rows where day_of_week is 1 (Monday) and hour_of_day is 6 (5am) or 5 (4am).
avg_3hour would be the average of vals from all rows where day_of_week is 1 (Monday), and hour_of_day would be 6 (5am), 5 (4am) or 4 (3am).
This is running on a TimescaleDB server with Postgres 13.3.
Thanks in advance.
Welcome, Gregory! Have you checked out window functions?
You can create windows over different timeframes to project the groups.
Timescale also offers counter aggregates that can be used with continuous aggregates to save aggregated data, so it can be reused over bigger timeframes without recalculating everything.
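To make the window-function idea a bit more concrete, here is a minimal sketch. It assumes the 1-hour aggregate from the question is computed first (the CTE name hourly is hypothetical) and approximates the 2-hour columns by windowing over those hourly rows; avg-of-avgs is only exact when every hour has the same number of readings, percentiles would still need the raw rows, and hour 0 would need extra handling to wrap to the previous weekday's hour 23:
with hourly as (
    select sensor_id,
           extract(dow from ts)  as day_of_week,
           extract(hour from ts) as hour_of_day,
           avg(val) as avg_1hour,
           max(val) as max_1hour,
           min(val) as min_1hour
    from timeseries_data
    where ts between current_date - interval '91 day' and current_date - interval '1 day'
    group by sensor_id, day_of_week, hour_of_day
)
select sensor_id, day_of_week, hour_of_day,
       avg_1hour, max_1hour, min_1hour,
       -- 2-hour window: this hour plus the previous hour of the same weekday
       avg(avg_1hour) over w2 as avg_2hour,
       max(max_1hour) over w2 as max_2hour,
       min(min_1hour) over w2 as min_2hour
from hourly
window w2 as (partition by sensor_id, day_of_week
              order by hour_of_day
              rows between 1 preceding and current row)
order by sensor_id, day_of_week, hour_of_day;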

How to extract the hour of day from an epoch and count each instance that occurs during that hour?

I have a question that I feel is pretty straightforward but is giving me some issues.
I have a column in table X called event_time, which is an epoch. I want to extract the hour of day out of that and count the number of rides that have occurred during that hour.
So the output will end up being a bar chart with x values 0-24 and the Y being the number of instances that occur (which is bike rides for example).
Here is what I have now, which isn't giving me the correct output:
select extract(hour from to_timestamp(start_time)::date) as hr,
count(*) as ct
from x
group by hr
order by hr asc
Any hints or help are appreciated.
Thanks
You can use arithmetic:
select floor( (start_time % (24 * 60 * 60)) / (60 * 60) ) as hour,
count(*)
from x
group by hour;
Or convert to a date/time and extract the hour:
select extract(hour from '1970-01-01'::date + start_time * interval '1 second') as hour, count(*)
from x
group by hour;
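For what it's worth, the reason the original query always returns hour 0 is the ::date cast, which truncates the time of day before extract runs. Dropping that cast also works; note that to_timestamp() returns a timestamptz, so the extracted hour depends on the session time zone unless you pin it explicitly (UTC assumed here):
select extract(hour from to_timestamp(start_time) at time zone 'UTC') as hr,
       count(*) as ct
from x
group by hr
order by hr asc;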

Get average duration per week-day from a list of records with start and end date

I have an input table with three columns:
id => string
start_date => timestamptz
end_date => timestamptz
I want to get the average duration in seconds (end_date - start_date) per week-day number over records.
My problem is: if a record's interval between start_date and end_date spans 4 days, I want the duration attributed to each day, not only to the start_date or end_date; and if there are no records for, say, 3 weeks, a weekday with no values should count as zero in the average.
Example:
id                     start_date                 end_date
1 (Friday to Sunday)   2021-03-12T01:00:00.000Z   2021-03-14T01:00:00.000Z
2 (Friday)             2021-03-12T01:00:00.000Z   2021-03-12T05:00:00.000Z
3 (Wed.)               2021-03-03T16:00:00.000Z   2021-03-03T17:00:00.000Z
Expected result (European weekday here, for example; Sunday is 7):
weekday   avg_duration_seconds
1         0
2         0
3         1800
4         0
5         48600
6         86400
7         3600
Thanks for your help!
Note: the following works on Postgres, as you tagged that as well. I have no idea whether it also works on CockroachDB.
You can "expand" the start/end timestamps to days by using generate_series(). To calculate the effective duration on each day, the full days need to be treated differently than the partial days at the start and end. Once those timestamps are calculated it's easy to get the duration per day. The do a left join on all weekdays and group by them:
select x.weekday,
avg(extract(epoch from real_end - real_start)) as duration
from generate_series(1,7) as x(weekday)
left join (
select t.id,
extract(isodow from g.dt) as weekday,
case
when start_date < g.dt then date_trunc('day', g.dt)
else start_date
end as real_start,
case
when end_date::date > g.dt then date_trunc('day', g.dt::date + 1)
else end_date
end as real_end
from the_table t
cross join generate_series(start_date, end_date, interval '1 day') as g(dt)
) t on x.weekday = t.weekday
group by x.weekday
order by x.weekday;
I am not 100% sure my expressions for "real_start" and "real_end" cover all corner cases, but it should be enough to get you started.
This gives a slightly different result than your expected one, because you have the weekdays wrong for 2021-03-02 and 2021-03-11.
Online example
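As a small, hypothetical illustration of the expansion step, this is what cross joining generate_series() does with record 1 from the question (one row per day the record touches; displayed times depend on the session time zone):
select g.dt
from (values (timestamptz '2021-03-12T01:00:00Z', timestamptz '2021-03-14T01:00:00Z')) as t(start_date, end_date)
cross join generate_series(t.start_date, t.end_date, interval '1 day') as g(dt);
-- 2021-03-12 01:00, 2021-03-13 01:00, 2021-03-14 01:00 (in UTC)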