I have a table with a timestamp for when an incident occurred and the downtime associated with that timestamp (in minutes). I want to break down this table by minute using Time_slice and show the minute associated with each slice. For example:
Time Duration
11:34 4.5
11:40 2
to:
Time Duration
11:34 1
11:35 1
11:36 1
11:37 1
11:38 0.5
11:39 1
11:40 1
How can I accomplish this?
If you are fine with the same minute being listed multiple times when the input time + duration ranges overlap, then you can do this:
WITH big_list_of_numbers AS (
SELECT
ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1 AS rn
FROM TABLE(GENERATOR(ROWCOUNT => 1000))
)
SELECT
DATEADD('minute', r.rn, t.time) AS time,
IFF(t.duration - r.rn < 1, t.duration - r.rn, 1) AS duration
FROM table AS t
JOIN big_list_of_numbers AS r
ON r.rn < t.duration
ORDER BY 1
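For illustration only, the per-minute splitting logic the query implements can be sketched outside Snowflake; this is a minimal Python model of the same idea (the function name is made up for the sketch):

```python
from datetime import datetime, timedelta

def split_by_minute(start, duration_minutes):
    """Split one (start, duration) row into per-minute slices:
    every full minute covered gets 1, and the trailing partial
    minute (if any) gets the fractional remainder."""
    slices = []
    rn = 0
    while rn < duration_minutes:  # mirrors ON r.rn < t.duration
        slices.append((start + timedelta(minutes=rn),
                       min(duration_minutes - rn, 1)))
        rn += 1
    return slices

rows = split_by_minute(datetime(2021, 1, 1, 11, 34), 4.5)
# five slices: 11:34 through 11:37 get 1 each, 11:38 gets 0.5
```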
If you want the total per minute, you can put a grouping on it like:
WITH big_list_of_numbers AS (
SELECT
ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1 AS rn
FROM TABLE(GENERATOR(ROWCOUNT => 1000))
)
SELECT
DATEADD('minute', r.rn, t.time) AS time,
SUM(IFF(t.duration - r.rn < 1, t.duration - r.rn, 1)) AS duration
FROM table AS t
JOIN big_list_of_numbers AS r
ON r.rn < t.duration
GROUP BY 1
ORDER BY 1
The GENERATOR needs a fixed input, so just use a number large enough to cover your longest duration; it's not that expensive. Also, the SEQx() functions can (and do) have gaps, so for data where you need continuous values (like this example) the SEQx() output needs to be fed through ROW_NUMBER() to force a gap-free allocation of numbers.
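The grouped variant, summing overlapping events per minute, can be modeled the same way (again a Python sketch for illustration, not Snowflake SQL):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def total_per_minute(events):
    """Sum per-minute downtime across (possibly overlapping) events,
    like the SUM(...) GROUP BY 1 variant: each event contributes 1 to
    every full minute it covers and the remainder to its last minute."""
    totals = defaultdict(float)
    for start, duration in events:
        rn = 0
        while rn < duration:
            totals[start + timedelta(minutes=rn)] += min(duration - rn, 1)
            rn += 1
    return dict(totals)

events = [
    (datetime(2021, 1, 1, 11, 34), 4.5),
    (datetime(2021, 1, 1, 11, 36), 2),  # overlaps the first event
]
totals = total_per_minute(events)
# 11:36 and 11:37 each total 2.0; 11:38 totals 0.5
```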
I use Google BigQuery. My query includes two different timestamps: start_at and end_at.
The goal of the query is to round these two timestamps down to the nearest 30-minute interval, which I manage using this: TIMESTAMP_TRUNC(TIMESTAMP_SUB(start_at, INTERVAL MOD(EXTRACT(MINUTE FROM start_at), 30) MINUTE), MINUTE), and the same goes for end_at.
Events occur (net_lost_orders) at each rounded timestamp.
The 2 problems that I encounter are:
First, as long as start_at and end_at fall in the same 30-minute interval, things work well, but when they do not (for example, start_at 19:15, whose nearest 30-minute interval is 19:00, and end_at 21:15, whose nearest is 21:00), the results are not as expected. Additionally, I need not only the two extreme intervals but all 30-minute intervals between start_at and end_at (19:00 / 19:30 / 20:00 / 20:30 / 21:00 in the example).
Secondly, I can't manage to create a condition that shows each interval on a separate row. I have tried to CAST, TRUNCATE, and EXTRACT the timestamps, and to use CASE WHEN and GROUP BY, without success.
Here's the final part of the query (timestamps rounded excluded):
...
-------LOST ORDERS--------
a AS (SELECT created_date, closure, zone_id, city_id, l.interval_start,
l.net as net_lost_orders, l.starts_at, CAST(DATETIME(l.starts_at, timezone)AS TIMESTAMP) as start_local_time
FROM `XXX`, UNNEST(lost_orders) as l),
b AS (SELECT city_id, city_name, zone_id, zone_name FROM `YYY`),
lost AS (SELECT DISTINCT created_date, closure, zone_name, city_name, start_local_time,
TIMESTAMP_TRUNC(TIMESTAMP_SUB(start_local_time, INTERVAL MOD(EXTRACT(MINUTE FROM start_local_time), 30) MINUTE),MINUTE) AS lost_order_30_interval,
net_lost_orders
FROM a LEFT JOIN b ON a.city_id=b.city_id AND a.zone_id=b.zone_id AND a.city_id=b.city_id
WHERE zone_name='Atlanta' AND created_date='2021-09-09'
ORDER BY rt ASC),
------PREPARATION CLOSURE START AND END INTERVALS------
f AS (SELECT
DISTINCT TIMESTAMP_TRUNC(TIMESTAMP_SUB(start_at, INTERVAL MOD(EXTRACT(MINUTE FROM start_at), 30) MINUTE),MINUTE) AS start_closure_30_interval,
TIMESTAMP_TRUNC(TIMESTAMP_SUB(end_at, INTERVAL MOD(EXTRACT(MINUTE FROM end_at), 30) MINUTE),MINUTE) AS end_closure_30_interval,
country_code,
report_date,
Day,
CASE
WHEN Day="Monday" THEN 1
WHEN Day="Tuesday" THEN 2
WHEN Day="Wednesday" THEN 3
WHEN Day="Thursday" THEN 4
WHEN Day="Friday" THEN 5
WHEN Day="Saturday" THEN 6
WHEN Day="Sunday" THEN 7
END AS Weekday_order,
report_week,
city_name,
events_mod.zone_name,
closure,
start_at,
end_at,
activation_threshold,
deactivation_threshold,
shrinkage_drive_time,
ROUND(duration/60,2) AS duration,
FROM events_mod
WHERE report_date="2021-09-09"
AND events_mod.zone_name="Atlanta"
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
ORDER BY report_date, start_at ASC)
------FINAL TABLE------
SELECT DISTINCT
start_closure_30_interval,end_closure_30_interval, report_date, Day, Weekday_order, report_week, f.city_name, f.zone_name, closure,
start_at, end_at, start_time,end_time, activation_threshold, deactivation_threshold, duration, net_lost_orders
FROM f
LEFT JOIN lost ON f.city_name=lost.city_name
AND f.zone_name=lost.zone_name
AND f.report_date=lost.created_date
AND f.start_closure_30_interval=lost.lost_order_30_interval
AND f.end_closure_30_interval=lost.lost_order_30_interval
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
Results:
Expected results:
I would be really grateful if you could help me and explain how to get all the rounded timestamps between start_at and end_at on separate rows. Thank you in advance. Best, Fabien
Spreadsheet here
Consider the approach below:
select intervals, any_value(t).*, sum(Nb_lost_orders) as Nb_lost_orders
from table1 t,
unnest(generate_timestamp_array(
timestamp_seconds(div(unix_seconds(starts_at), 1800) * 1800),
timestamp_seconds(div(unix_seconds(ends_at), 1800) * 1800),
interval 30 minute
)) intervals
left join (
select Nb_lost_orders,
timestamp_seconds(div(unix_seconds(Time_when_the_lost_order_occurred), 1800) * 1800) as intervals
from Table2
)
using(intervals)
group by intervals
If applied to the sample data in your question:
with Table1 as (
select 'Closure' Event, timestamp '2021-09-09 11:00:00' starts_at, timestamp '2021-09-09 11:45:00' ends_at union all
select 'Closure', '2021-09-09 12:05:00', '2021-09-09 14:10:00'
), Table2 as (
select 5 Nb_lost_orders, timestamp '2021-09-09 11:38:00' Time_when_the_lost_order_occurred
)
output is
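The two building blocks here can be sketched in Python to check the arithmetic: flooring a timestamp to its 30-minute boundary via div(unix_seconds(ts), 1800) * 1800, and generate_timestamp_array producing one row per interval start. This is a model only, not BigQuery code:

```python
from datetime import datetime, timedelta, timezone

def floor_to_30min(ts):
    """Mirrors timestamp_seconds(div(unix_seconds(ts), 1800) * 1800)."""
    secs = int(ts.timestamp()) // 1800 * 1800
    return datetime.fromtimestamp(secs, tz=timezone.utc)

def intervals_between(start, end):
    """Mirrors generate_timestamp_array(floor(start), floor(end),
    interval 30 minute): all 30-minute boundaries, inclusive."""
    cur, stop = floor_to_30min(start), floor_to_30min(end)
    out = []
    while cur <= stop:
        out.append(cur)
        cur += timedelta(minutes=30)
    return out

ivals = intervals_between(
    datetime(2021, 9, 9, 19, 15, tzinfo=timezone.utc),
    datetime(2021, 9, 9, 21, 15, tzinfo=timezone.utc))
# 19:00, 19:30, 20:00, 20:30, 21:00 -> five interval rows
```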
I have a dataset that looks like this:
id name value timestamp
1 Indicator1 5 "2021-07-06 20:28:59.999+03"
2 Indicator1 6 "2021-07-06 20:29:59.999+03"
3 Indicator1 14 "2021-07-06 20:30:59.999+03"
4 Indicator2 1 "2021-07-06 20:31:59.999+03"
5 Indicator2 3 "2021-07-06 20:32:59.999+03"
etc
The timestamps are 1 minute apart.
What I would like to get out of this data set is groups of rows which correspond to, let's say 5 minute intervals and while doing so get the first and last value/row in each group. I have to calculate some differences in values over fixed time intervals. It's sort of like a k-line.
What I managed so far is to get only the first interval and then repeat this query to get subsequent (older intervals) with a different where clause:
select r.name,
r.value_end - r.value_start as value_increase,
interval '5 min' as time_interval
from
(select k.name,
FIRST_VALUE(k.value) over w as value_start,
LAST_VALUE(k.value) over w as value_end,
ROW_NUMBER() over w as rownum
from dataset k
where k.timestamp >= (now() - interval '5 min')
window w as (partition by k.name order by k.timestamp RANGE BETWEEN
UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING)) as r
where r.rownum = 1
Is there a way of doing this with Postgres?
Basically, what you want to do is assign a grouping to timestamps in five-minute intervals. Instead of fiddling with timestamp arithmetic, one method is to use the epoch time and some integer arithmetic:
select name, min(timestamp), max(timestamp),
       min(case when seqnum = 1 then value end) as value_first,
       min(case when seqnum = cnt then value end) as value_last
from (select d.*, v.m,
             row_number() over (partition by d.name, v.m order by d.timestamp) as seqnum,
             count(*) over (partition by d.name, v.m) as cnt
      from dataset d cross join lateral
           (values (floor(extract(epoch from d.timestamp) / (5 * 60)))
           ) v(m)
     ) t
group by name, m;
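The bucket key is just integer arithmetic on the epoch; for five-minute buckets the divisor is 300 (5 * 60). A Python model of the grouping, for illustration only:

```python
from datetime import datetime, timezone

def bucket_5min(ts):
    """floor(extract(epoch from ts) / (5 * 60)) - the group key."""
    return int(ts.timestamp()) // 300

def first_last_per_bucket(rows):
    """rows: (name, value, ts) ordered by ts.  Returns
    {(name, bucket): (first_value, last_value)} per five-minute group."""
    out = {}
    for name, value, ts in rows:
        key = (name, bucket_5min(ts))
        first = out[key][0] if key in out else value
        out[key] = (first, value)
    return out

utc = timezone.utc
rows = [("Indicator1", 5, datetime(2021, 7, 6, 20, 28, 59, tzinfo=utc)),
        ("Indicator1", 6, datetime(2021, 7, 6, 20, 29, 59, tzinfo=utc)),
        ("Indicator1", 14, datetime(2021, 7, 6, 20, 30, 59, tzinfo=utc))]
groups = first_last_per_bucket(rows)
# 20:28 and 20:29 share a bucket (first 5, last 6); 20:30 starts a new one
```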
I have a User table, where there are the following fields.
| id | created_at | username |
I want to filter this table so that I can get the number of users who have been created in a datetime range, separated into N intervals. e.g. for users having created_at in between 2019-01-01T00:00:00 and 2019-01-02T00:00:00 separated into 2 intervals, I will get something like this.
_______________________________
| dt | count |
-------------------------------
| 2019-01-01T00:00:00 | 6 |
| 2019-01-01T12:00:00 | 7 |
-------------------------------
Is it possible to do so in one hit? I am currently using my Django ORM to create N date ranges and then making N queries, which isn't very efficient.
Generate the times you want and then use left join and aggregation:
select gs.ts, count(u.id)
from generate_series('2019-01-01T00:00:00'::timestamp,
'2019-01-01T12:00:00'::timestamp,
interval '12 hour'
) gs(ts) left join
users u
on u.created_at >= gs.ts and
u.created_at < gs.ts + interval '12 hour'
group by 1
order by 1;
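A quick model of what the generate_series + LEFT JOIN combination does (a Python sketch; the point is that every interval start appears in the result even when its count is zero):

```python
from datetime import datetime, timedelta

def interval_counts(created_ats, start, stop, step):
    """One row per interval start, counting users whose created_at
    falls in [ts, ts + step).  Zero-count intervals are kept, which
    is what the LEFT JOIN buys over a plain GROUP BY."""
    out = []
    ts = start
    while ts <= stop:
        out.append((ts, sum(1 for c in created_ats if ts <= c < ts + step)))
        ts += step
    return out

created = [datetime(2019, 1, 1, 3), datetime(2019, 1, 1, 15),
           datetime(2019, 1, 1, 16)]
counts = interval_counts(created,
                         datetime(2019, 1, 1, 0),
                         datetime(2019, 1, 1, 12),
                         timedelta(hours=12))
# [(00:00, 1), (12:00, 2)]
```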
EDIT:
If you want to specify the number of rows, you can use something similar:
select v.ts, count(u.id)
from generate_series(1, 10, 1) as gs(n) cross join lateral
     (values ('2019-01-01T00:00:00'::timestamp + (gs.n - 1) * interval '12 hour')
     ) v(ts) left join
     users u
     on u.created_at >= v.ts and
        u.created_at < v.ts + interval '12 hour'
group by v.ts
order by v.ts;
In Postgres, there is a dedicated function for this (several overloaded variants, really): width_bucket().
One additional difficulty: it does not work on type timestamp directly. But you can work with extracted epoch values like this:
WITH cte(min_ts, max_ts, buckets) AS ( -- interval and nr of buckets here
SELECT timestamp '2019-01-01T00:00:00'
, timestamp '2019-01-02T00:00:00'
, 2
)
SELECT width_bucket(extract(epoch FROM t.created_at)
, extract(epoch FROM c.min_ts)
, extract(epoch FROM c.max_ts)
, c.buckets) AS bucket
, count(*) AS ct
FROM tbl t
JOIN cte c ON t.created_at >= min_ts -- incl. lower
AND t.created_at < max_ts -- excl. upper
GROUP BY 1
ORDER BY 1;
Empty buckets (intervals with no rows in them) are not returned at all. Your comment seems to suggest you want those as well.
Notably, this accesses the table only once, as requested, as opposed to generating intervals first and then joining to the table (repeatedly).
See:
How to reduce result rows of SQL query equally in full range?
Aggregating (x,y) coordinate point clouds in PostgreSQL
That does not yet include effective bounds, just bucket numbers. Actual bounds can be added cheaply:
WITH cte(min_ts, max_ts, buckets) AS ( -- interval and nr of buckets here
SELECT timestamp '2019-01-01T00:00:00'
, timestamp '2019-01-02T00:00:00'
, 2
)
SELECT b.*
, min_ts + ((c.max_ts - c.min_ts) / c.buckets) * (bucket-1) AS lower_bound
FROM (
SELECT width_bucket(extract(epoch FROM t.created_at)
, extract(epoch FROM c.min_ts)
, extract(epoch FROM c.max_ts)
, c.buckets) AS bucket
, count(*) AS ct
FROM tbl t
JOIN cte c ON t.created_at >= min_ts -- incl. lower
AND t.created_at < max_ts -- excl. upper
GROUP BY 1
ORDER BY 1
) b, cte c;
Now you only change input values in the CTE to adjust results.
db<>fiddle here
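For reference, width_bucket()'s in-range behavior (equal-width buckets numbered from 1) can be modeled in a few lines; this sketch covers only lo <= x < hi, while the real function also returns 0 below the range and buckets + 1 at or above it:

```python
def width_bucket(x, lo, hi, buckets):
    """Model of Postgres width_bucket() for lo <= x < hi:
    splits [lo, hi) into `buckets` equal-width buckets, numbered from 1."""
    return int((x - lo) / (hi - lo) * buckets) + 1

# epoch span of 2019-01-01T00:00 .. 2019-01-02T00:00 split into 2 buckets:
day = 86400
# x in the first half-day -> bucket 1, second half-day -> bucket 2
```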
I have a TSQL query that is returning a list of variable names and their values at a point in time. Currently it is truncating the datetime column to give me a minute-by-minute result set.
It would be incredibly useful to me to be able to specify whatever interval of data I want. Every x seconds, every x minutes, or every x hours.
I cannot GROUP BY because I do not want to aggregate the selected values.
Here is my current query:
SELECT time, var_name, value
FROM (
SELECT time, var_name, value, ROW_NUMBER() over (partition by var_id, convert(varchar(16), time, 121) order by time desc) as seqnum
FROM var_values vv
JOIN var_names vn ON vn.id = vv.tag_id
WHERE ( var_id = 1 OR var_id = 2)
AND time >= '2013-06-04 00:00:00' AND time < '2013-06-04 16:20:17'
) k
WHERE seqnum = 1
ORDER BY time;
And the result set:
2013-06-04 00:20:52.847 Random.Boolean 0
2013-06-04 00:20:52.850 Random.Int1 76
2013-06-04 00:21:52.893 Random.Boolean 1
2013-06-04 00:21:52.897 Random.Int1 46
2013-06-04 00:22:52.920 Random.Boolean 1
2013-06-04 00:22:52.927 Random.Int1 120
Also just to be complete, I want to retain the ability to modify the WHERE clause to choose which var_id's I want in my result set.
You should be able to partition by the unix timestamp integer-divided by your required interval in seconds:
ROW_NUMBER() OVER (PARTITION BY var_id, DATEDIFF(SECOND, {d '1970-01-01'}, time) / 60 -- 60 seconds
                   ORDER BY time DESC) AS seqnum
The division gives the same result for all times within the same 60-second window, which places every row in that interval in the same partition.
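The same trick modeled in Python: integer-dividing seconds-since-epoch by the interval length yields one group key per window, and keeping seqnum = 1 under ORDER BY time DESC means keeping the latest row per (var, window). A sketch only, with invented names:

```python
from datetime import datetime, timezone

def interval_key(ts, seconds):
    """DATEDIFF(SECOND, '1970-01-01', time) / seconds (integer division)."""
    return int(ts.timestamp()) // seconds

def latest_per_interval(rows, seconds):
    """rows: (var_id, ts, value) in ascending time order.
    Keeps the last (latest) row per (var_id, interval) - the row the
    query's seqnum = 1 filter selects."""
    out = {}
    for var_id, ts, value in rows:
        out[(var_id, interval_key(ts, seconds))] = (ts, value)
    return out

utc = timezone.utc
rows = [(1, datetime(2013, 6, 4, 0, 20, 12, tzinfo=utc), 76),
        (1, datetime(2013, 6, 4, 0, 20, 52, tzinfo=utc), 46),   # same minute
        (1, datetime(2013, 6, 4, 0, 21, 52, tzinfo=utc), 120)]
kept = latest_per_interval(rows, 60)
# two intervals survive; the 00:20 window keeps value 46, not 76
```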