Query aggregated data with a given sampling time - sql

Suppose my raw data is:
Timestamp High Low Volume
10:24.22345 100 99 10
10:24.23345 110 97 20
10:24.33455 97 89 40
10:25.33455 60 40 50
10:25.93455 40 20 60
With a sample time of 1 second, the output data should be as follows (with an additional Count column):
Timestamp High Low Volume Count
10:24 110 89 70 3
10:25 60 20 110 2
The sampling unit can vary: 1 second, 5 seconds, 1 minute, 1 hour, 1 day, ...
How can I query the sampled data quickly from a PostgreSQL database with Rails?
I want to fill all the intervals, but I am getting the error
ERROR: JOIN/USING types bigint and timestamp without time zone cannot be matched
SQL
SELECT t.high, t.low
FROM (
    SELECT generate_series(
               date_trunc('second', min(ticktime))
             , date_trunc('second', max(ticktime))
             , interval '1 sec'
           )
    FROM czces AS g (time)
    LEFT JOIN (
        SELECT date_trunc('second', ticktime) AS time
             , max(last_price) OVER w AS high
             , min(last_price) OVER w AS low
        FROM czces
        WHERE product_type = 'TA'
          AND contract_month = '2014-08-01 00:00:00'::TIMESTAMP
        WINDOW w AS (
            PARTITION BY date_trunc('second', ticktime)
            ORDER BY ticktime ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        )
    ) t USING (time)
    ORDER BY 1
) AS t;

Simply use date_trunc() before you aggregate. This works for basic time units like 1 second, 1 minute, 1 hour, or 1 day - but not for 5 seconds. Arbitrary intervals are slightly more complex; see the links below.
SELECT date_trunc('second', timestamp) AS timestamp -- or minute ...
, max(high) AS high, min(low) AS low, sum(volume) AS vol, count(*) AS ct
FROM tbl
GROUP BY 1
ORDER BY 1;
If there are no rows for a sample point, you get no row in the result. If you need one row for every sample point:
SELECT g.timestamp, t.high, t.low, t.volume, t.ct
FROM  (
   SELECT generate_series(date_trunc('second', min(timestamp))
                        , date_trunc('second', max(timestamp))
                        , interval '1 sec') AS timestamp  -- or minute ...
   FROM   tbl
   ) g
LEFT JOIN (
   SELECT date_trunc('second', timestamp) AS timestamp  -- or minute ...
        , max(high) AS high, min(low) AS low, sum(volume) AS volume, count(*) AS ct
   FROM   tbl
   GROUP  BY 1
   ) t USING (timestamp)
ORDER BY 1;
The LEFT JOIN is essential.
For arbitrary intervals:
Best way to count records by arbitrary time intervals in Rails+Postgres
Retrieve aggregates for arbitrary time intervals
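To illustrate what those arbitrary-interval approaches boil down to, here is a small Python sketch (mine, not from the linked answers) of epoch-based bucketing: divide the epoch seconds by the step, floor, and multiply back. This handles the 5-second case that date_trunc() cannot.

```python
from datetime import datetime, timezone

def bucket_start(ts: datetime, step_seconds: int) -> datetime:
    """Truncate ts down to the start of its bucket: the Python analogue of
    to_timestamp(floor(extract(epoch FROM ts) / step) * step) in Postgres."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch // step_seconds * step_seconds, tz=timezone.utc)

ts = datetime(2014, 8, 1, 10, 24, 23, tzinfo=timezone.utc)
print(bucket_start(ts, 5))   # -> 2014-08-01 10:24:20+00:00 (5-second bucket)
print(bucket_start(ts, 60))  # -> 2014-08-01 10:24:00+00:00 (1-minute bucket)
```

Grouping on that computed bucket start then works for any step size, not just the named units date_trunc() supports.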
Aside: Don't use timestamp as column name. It's a basic type name and a reserved word in standard SQL. It's also misleading for data that's not actually a timestamp.

Related

Snowflake SQL Time Breakdown

I have a table with a timestamp for when an incident occurred and the downtime associated with that timestamp (in minutes). I want to break down this table by minute using Time_slice and show the minute associated with each slice. For example:
Time Duration
11:34 4.5
11:40 2
to:
time Duration
11:34 1
11:35 1
11:36 1
11:37 1
11:38 0.5
11:39 1
11:40 1
How can I accomplish this?
If you are fine with the same minute being listed multiple times when the input time + duration ranges overlap, then you can do this:
WITH big_list_of_numbers AS (
    SELECT ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1 AS rn
    FROM TABLE(generator(ROWCOUNT => 1000))
)
SELECT
    DATEADD('minute', r.rn, t.time) AS time,
    IFF(t.duration - r.rn < 1, t.duration - r.rn, 1) AS duration
FROM table AS t
JOIN big_list_of_numbers AS r
    ON r.rn < t.duration
ORDER BY 1;
If you want the total per minute, you can add a grouping:
WITH big_list_of_numbers AS (
    SELECT ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1 AS rn
    FROM TABLE(generator(ROWCOUNT => 1000))
)
SELECT
    DATEADD('minute', r.rn, t.time) AS time,
    SUM(IFF(t.duration - r.rn < 1, t.duration - r.rn, 1)) AS duration
FROM table AS t
JOIN big_list_of_numbers AS r
    ON r.rn < t.duration
GROUP BY 1
ORDER BY 1;
The GENERATOR needs fixed input, so just use a huge number; it's not that expensive. Also, the SEQx() functions can (and do) have gaps in them, so where you need continuous values (like this example) SEQx() needs to be fed into ROW_NUMBER() to force gap-free allocation of numbers.
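The intended per-minute split (one full minute per row, with the fractional remainder on the last row) can be illustrated outside Snowflake. A minimal Python sketch, with a hypothetical split_by_minute helper of my own naming:

```python
import math
from datetime import datetime, timedelta

def split_by_minute(start: datetime, duration_min: float):
    """One (minute, duration) row per started minute; the last partial
    minute receives the fractional remainder -- the same idea as joining
    a number series on rn < duration and clamping with IFF()."""
    rows = []
    for rn in range(math.ceil(duration_min)):
        remaining = duration_min - rn
        rows.append((start + timedelta(minutes=rn), min(remaining, 1)))
    return rows

for t, d in split_by_minute(datetime(2021, 1, 1, 11, 34), 4.5):
    print(t.strftime("%H:%M"), d)
# 11:34 through 11:37 get 1 each; 11:38 gets the remaining 0.5
```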

Condition and round to nearest 30 minutes interval multiple timestamps SQL BIG QUERY

I use Google BigQuery. My query includes 2 different timestamps: start_at and end_at.
The goal of the query is to round these 2 timestamps to the nearest 30 minutes interval, which I manage using this: TIMESTAMP_TRUNC(TIMESTAMP_SUB(start_at, INTERVAL MOD(EXTRACT(MINUTE FROM start_at), 30) MINUTE),MINUTE) and the same goes for end_at.
Events occur (net_lost_orders) at each rounded timestamp.
The 2 problems that I encounter are:
First, as long as start_at and end_at fall in the same 30-minute interval, things work well. When they do not (for example start_at 19:15, whose nearest interval is 19:00, and end_at 21:15, whose nearest interval is 21:00), the results are not as expected. Additionally, I need not only the two extreme intervals but every 30-minute interval between start_at and end_at (19:00 / 19:30 / 20:00 / 20:30 / 21:00 in the example).
Secondly, I have not managed to create a condition that shows each interval on a separate row. I have tried to CAST, TRUNCATE and EXTRACT the timestamps, and to use CASE WHEN and GROUP BY, without success.
Here's the final part of the query (timestamps rounded excluded):
...
-------LOST ORDERS--------
a AS (SELECT created_date, closure, zone_id, city_id, l.interval_start,
l.net as net_lost_orders, l.starts_at, CAST(DATETIME(l.starts_at, timezone) AS TIMESTAMP) as start_local_time
FROM `XXX`, UNNEST(lost_orders) as l),
b AS (SELECT city_id, city_name, zone_id, zone_name FROM `YYY`),
lost AS (SELECT DISTINCT created_date, closure, zone_name, city_name, start_local_time,
TIMESTAMP_TRUNC(TIMESTAMP_SUB(start_local_time, INTERVAL MOD(EXTRACT(MINUTE FROM start_local_time), 30) MINUTE),MINUTE) AS lost_order_30_interval,
net_lost_orders
FROM a LEFT JOIN b ON a.city_id=b.city_id AND a.zone_id=b.zone_id AND a.city_id=b.city_id
WHERE zone_name='Atlanta' AND created_date='2021-09-09'
ORDER BY rt ASC),
------PREPARATION CLOSURE START AND END INTERVALS------
f AS (SELECT
DISTINCT TIMESTAMP_TRUNC(TIMESTAMP_SUB(start_at, INTERVAL MOD(EXTRACT(MINUTE FROM start_at), 30) MINUTE),MINUTE) AS start_closure_30_interval,
TIMESTAMP_TRUNC(TIMESTAMP_SUB(end_at, INTERVAL MOD(EXTRACT(MINUTE FROM end_at), 30) MINUTE),MINUTE) AS end_closure_30_interval,
country_code,
report_date,
Day,
CASE
WHEN Day="Monday" THEN 1
WHEN Day="Tuesday" THEN 2
WHEN Day="Wednesday" THEN 3
WHEN Day="Thursday" THEN 4
WHEN Day="Friday" THEN 5
WHEN Day="Saturday" THEN 6
WHEN Day="Sunday" THEN 7
END AS Weekday_order,
report_week,
city_name,
events_mod.zone_name,
closure,
start_at,
end_at,
activation_threshold,
deactivation_threshold,
shrinkage_drive_time,
ROUND(duration/60,2) AS duration,
FROM events_mod
WHERE report_date="2021-09-09"
AND events_mod.zone_name="Atlanta"
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
ORDER BY report_date, start_at ASC)
------FINAL TABLE------
SELECT DISTINCT
start_closure_30_interval,end_closure_30_interval, report_date, Day, Weekday_order, report_week, f.city_name, f.zone_name, closure,
start_at, end_at, start_time,end_time, activation_threshold, deactivation_threshold, duration, net_lost_orders
FROM f
LEFT JOIN lost ON f.city_name=lost.city_name
AND f.zone_name=lost.zone_name
AND f.report_date=lost.created_date
AND f.start_closure_30_interval=lost.lost_order_30_interval
AND f.end_closure_30_interval=lost.lost_order_30_interval
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
Results:
Expected results:
I would be really grateful if you could help me and explain how to get all the rounded timestamps between start_at and end_at on separate rows. Thank you in advance. Best, Fabien
Spreadsheet here
Consider the approach below:
select intervals, any_value(t).*, sum(Nb_lost_orders) as Nb_lost_orders
from table1 t,
unnest(generate_timestamp_array(
timestamp_seconds(div(unix_seconds(starts_at), 1800) * 1800),
timestamp_seconds(div(unix_seconds(ends_at), 1800) * 1800),
interval 30 minute
)) intervals
left join (
select Nb_lost_orders,
timestamp_seconds(div(unix_seconds(Time_when_the_lost_order_occurred), 1800) * 1800) as intervals
from Table2
)
using(intervals)
group by intervals
Applied to the sample data in your question:
with Table1 as (
select 'Closure' Event, timestamp '2021-09-09 11:00:00' starts_at, timestamp '2021-09-09 11:45:00' ends_at union all
select 'Closure', '2021-09-09 12:05:00', '2021-09-09 14:10:00'
), Table2 as (
select 5 Nb_lost_orders, timestamp '2021-09-09 11:38:00' Time_when_the_lost_order_occurred
)
output is
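The div(unix_seconds(...), 1800) * 1800 rounding and the generate_timestamp_array expansion above boil down to integer arithmetic on epoch seconds. A small Python sketch of the same logic (function names are mine, not BigQuery's):

```python
from datetime import datetime, timezone, timedelta

HALF_HOUR = 1800  # seconds

def floor_30min(ts: datetime) -> datetime:
    # timestamp_seconds(div(unix_seconds(ts), 1800) * 1800)
    return datetime.fromtimestamp(int(ts.timestamp()) // HALF_HOUR * HALF_HOUR,
                                  tz=timezone.utc)

def intervals_between(start_at: datetime, end_at: datetime):
    """Every 30-minute slot from floor(start_at) to floor(end_at) inclusive,
    like generate_timestamp_array(..., interval 30 minute)."""
    out, cur = [], floor_30min(start_at)
    while cur <= floor_30min(end_at):
        out.append(cur)
        cur += timedelta(seconds=HALF_HOUR)
    return out
```

For start_at 19:15 and end_at 21:15 this yields the five slots 19:00, 19:30, 20:00, 20:30, 21:00 that the question asks for, each of which UNNEST then puts on its own row.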

Split data set in fixed interval batches, while getting the first and last row of the batch

I have a dataset that looks like this:
id name value timestamp
1 Indicator1 5 "2021-07-06 20:28:59.999+03"
2 Indicator1 6 "2021-07-06 20:29:59.999+03"
3 Indicator1 14 "2021-07-06 20:30:59.999+03"
4 Indicator2 1 "2021-07-06 20:31:59.999+03"
5 Indicator2 3 "2021-07-06 20:32:59.999+03"
etc
The timestamps are 1 minute apart.
What I would like to get out of this data set is groups of rows corresponding to, let's say, 5-minute intervals, and for each group the first and last value/row. I have to calculate some differences in values over fixed time intervals. It's sort of like a k-line.
What I managed so far is to get only the first interval; to get subsequent (older) intervals I have to repeat the query with a different WHERE clause:
select r.name,
r.value_end - r.value_start as value_increase,
interval '5 min' as time_interval
from
(select k.name,
FIRST_VALUE(k.value) over w as value_start,
LAST_VALUE(k.value) over w as value_end,
ROW_NUMBER() over w as rownum
from dataset k
where k.timestamp >= (now() - interval '5 min')
window w as (partition by k.name order by k.timestamp RANGE BETWEEN
UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING)) as r
where r.rownum = 1
Is there a way of doing this with Postgres?
Basically, what you want to do is assign a grouping to timestamps in five-minute intervals. Instead of fiddling with timestamp arithmetic, one method is to use the epoch time and some integer arithmetic:
select name, min(timestamp), max(timestamp),
       min(case when seqnum = 1 then value end) as value_first,
       min(case when seqnum = cnt then value end) as value_last
from (select d.*, v.m,
             row_number() over (partition by d.name, v.m order by d.timestamp) as seqnum,
             count(*) over (partition by d.name, v.m) as cnt
      from dataset d cross join lateral
           (values (floor(extract(epoch from d.timestamp) / (60 * 5)))  -- 300 sec = 5 min
           ) v(m)
     ) t
group by name, m;
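The grouping idea is the same as the epoch arithmetic above: bucket key = floor(epoch / 300), with 300 seconds per 5-minute interval, then take the first and last row per bucket. A Python sketch of that logic (helper name is mine):

```python
from datetime import datetime, timezone
from itertools import groupby

def first_last_per_bucket(rows, step_seconds=300):
    """rows: iterable of (timestamp, value) pairs. Groups by
    floor(epoch / step_seconds) and returns, per bucket,
    (min_ts, max_ts, first_value, last_value)."""
    bucket = lambda r: int(r[0].timestamp()) // step_seconds
    out = []
    for _, grp in groupby(sorted(rows), key=bucket):
        g = list(grp)
        out.append((g[0][0], g[-1][0], g[0][1], g[-1][1]))
    return out
```

The difference value_end - value_start per group is then just last_value - first_value of each tuple.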

PostgreSQL - Get count of items in a table grouped by a datetime column for N intervals

I have a User table, where there are the following fields.
| id | created_at | username |
I want to filter this table so that I can get the number of users who have been created in a datetime range, separated into N intervals. e.g. for users having created_at in between 2019-01-01T00:00:00 and 2019-01-02T00:00:00 separated into 2 intervals, I will get something like this.
_______________________________
| dt | count |
-------------------------------
| 2019-01-01T00:00:00 | 6 |
| 2019-01-01T12:00:00 | 7 |
-------------------------------
Is it possible to do so in one hit? I am currently using my Django ORM to create N date ranges and then making N queries, which isn't very efficient.
Generate the times you want and then use left join and aggregation:
select gs.ts, count(u.id)
from generate_series('2019-01-01T00:00:00'::timestamp,
'2019-01-01T12:00:00'::timestamp,
interval '12 hour'
) gs(ts) left join
users u
on u.created_at >= gs.ts and
u.created_at < gs.ts + interval '12 hour'
group by 1
order by 1;
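The pattern above (generate the sample points, LEFT JOIN the data onto them, count per point) can be mimicked in plain Python to see why empty intervals still appear with a count of 0. A sketch with a hypothetical counts_per_interval helper:

```python
from datetime import datetime, timedelta

def counts_per_interval(created_ats, start, end, step):
    """One row per interval from start to end (inclusive), counting
    created_at values with ts <= created_at < ts + step. Intervals with
    no matches get 0 -- which is what the LEFT JOIN guarantees."""
    out, ts = [], start
    while ts <= end:
        out.append((ts, sum(1 for c in created_ats if ts <= c < ts + step)))
        ts += step
    return out
```

Note that count(u.id) in the SQL (not count(*)) is what makes empty intervals report 0 rather than 1.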
EDIT:
If you want to specify the number of rows, you can use something similar:
select v.ts, count(u.id)
from generate_series(1, 10, 1) as gs(n) cross join lateral
     (values ('2019-01-01T00:00:00'::timestamp + (gs.n - 1) * interval '12 hour')
     ) v(ts) left join
     users u
     on u.created_at >= v.ts and
        u.created_at < v.ts + interval '12 hour'
group by 1
order by 1;
In Postgres, there is a dedicated function for this (several overloaded variants, really): width_bucket().
One additional difficulty: it does not work on type timestamp directly. But you can work with extracted epoch values like this:
WITH cte(min_ts, max_ts, buckets) AS ( -- interval and nr of buckets here
SELECT timestamp '2019-01-01T00:00:00'
, timestamp '2019-01-02T00:00:00'
, 2
)
SELECT width_bucket(extract(epoch FROM t.created_at)
, extract(epoch FROM c.min_ts)
, extract(epoch FROM c.max_ts)
, c.buckets) AS bucket
, count(*) AS ct
FROM tbl t
JOIN cte c ON t.created_at >= min_ts -- incl. lower
AND t.created_at < max_ts -- excl. upper
GROUP BY 1
ORDER BY 1;
Empty buckets (intervals with no rows in them) are not returned at all. Your comment seems to suggest you want them included.
Notably, this accesses the table once - as requested and as opposed to generating intervals first and then joining to the table (repeatedly).
See:
How to reduce result rows of SQL query equally in full range?
Aggregating (x,y) coordinate point clouds in PostgreSQL
That does not yet include effective bounds, just bucket numbers. Actual bounds can be added cheaply:
WITH cte(min_ts, max_ts, buckets) AS ( -- interval and nr of buckets here
SELECT timestamp '2019-01-01T00:00:00'
, timestamp '2019-01-02T00:00:00'
, 2
)
SELECT b.*
, min_ts + ((c.max_ts - c.min_ts) / c.buckets) * (bucket-1) AS lower_bound
FROM (
SELECT width_bucket(extract(epoch FROM t.created_at)
, extract(epoch FROM c.min_ts)
, extract(epoch FROM c.max_ts)
, c.buckets) AS bucket
, count(*) AS ct
FROM tbl t
JOIN cte c ON t.created_at >= min_ts -- incl. lower
AND t.created_at < max_ts -- excl. upper
GROUP BY 1
ORDER BY 1
) b, cte c;
Now you only change input values in the CTE to adjust results.
db<>fiddle here
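For reference, the equal-width semantics of the four-argument width_bucket() can be reproduced with plain arithmetic. A Python sketch mirroring the Postgres behavior (bucket numbers 1..n inside the range, 0 below it, n + 1 at or above the upper bound):

```python
def width_bucket(x, lo, hi, n):
    """Equal-width bucketing like Postgres width_bucket(operand, low, high, count):
    returns 1..n for lo <= x < hi, 0 for x < lo, n + 1 for x >= hi."""
    if x < lo:
        return 0
    if x >= hi:
        return n + 1
    return int((x - lo) * n / (hi - lo)) + 1
```

In the queries above, x, lo, and hi are the epoch values extracted from created_at, min_ts, and max_ts; the JOIN condition on min_ts/max_ts is what keeps rows out of the 0 and n + 1 overflow buckets.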

TSQL adjustable time interval

I have a TSQL query that is returning a list of variable names and their values at a point in time. Currently it is truncating the datetime column to give me a minute-by-minute result set.
It would be incredibly useful to me to be able to specify whatever interval of data I want. Every x seconds, every x minutes, or every x hours.
I cannot GROUP BY because I do not want to aggregate the selected values.
Here is my current query:
SELECT time, var_name, value
FROM (
SELECT time, var_name, value, ROW_NUMBER() over (partition by var_id, convert(varchar(16), time, 121) order by time desc) as seqnum
FROM var_values vv
JOIN var_names vn ON vn.id = vv.tag_id
WHERE ( var_id = 1 OR var_id = 2)
AND time >= '2013-06-04 00:00:00' AND time < '2013-06-04 16:20:17'
) k
WHERE seqnum = 1
ORDER BY time;
And the result set:
2013-06-04 00:20:52.847 Random.Boolean 0
2013-06-04 00:20:52.850 Random.Int1 76
2013-06-04 00:21:52.893 Random.Boolean 1
2013-06-04 00:21:52.897 Random.Int1 46
2013-06-04 00:22:52.920 Random.Boolean 1
2013-06-04 00:22:52.927 Random.Int1 120
Also just to be complete, I want to retain the ability to modify the WHERE clause to choose which var_id's I want in my result set.
You should be able to partition by the Unix timestamp divided by your required interval in seconds:
(PARTITION BY var_id, DATEDIFF(SECOND,{d '1970-01-01'}, time) / 60 -- 60 seconds
ORDER BY TIME DESC) AS seqnum
The integer division gives the same result for all rows within the same 60-second window, which puts every row of an interval into the same partition.
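The DATEDIFF(SECOND, '1970-01-01', time) / 60 key is just integer division on seconds since 1970; changing the divisor changes the interval (x seconds, minutes, or hours). A Python sketch of the same calculation:

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)  # the {d '1970-01-01'} date escape in the T-SQL

def partition_key(ts: datetime, interval_seconds: int) -> int:
    """DATEDIFF(SECOND, '1970-01-01', time) / interval with integer
    division: every row inside the same interval gets the same key."""
    return int((ts - EPOCH).total_seconds()) // interval_seconds
```

ROW_NUMBER() ... PARTITION BY var_id, partition_key ORDER BY time DESC then picks one row per variable per interval, without aggregating the values themselves.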