Generate step time-series in SQL (PostgreSQL)

We are storing data corresponding to rates (ex: electricity price) in a SQL table, such as:
Date | Value
2022-08-25 01:00 | 12.3
2022-09-23 06:12 | 14.5
2022-10-18 05:34 | 9.8
The date interval between two rows is not regular. In this table, 12.3 is the current rate until it's replaced by the new value on September 23rd, when the rate becomes 14.5
From there, we want to generate an hourly time-series, with each value corresponding to the correct rate, such as:
Date | Value
2022-08-25 01:00 | 12.3
2022-08-25 02:00 | 12.3
2022-08-25 03:00 | 12.3
2022-08-25 04:00 | 12.3
2022-08-25 05:00 | 12.3
... | 12.3
2022-09-23 06:12 | 14.5
2022-09-23 07:00 | 14.5
2022-09-23 08:00 | 14.5
... | 14.5
2022-10-18 05:34 | 9.8
... | 9.8
How would you generate such a time series in PostgreSQL?

You need to do two things: generate the time series at hourly intervals, then determine for each generated timestamp which value was active at that time.
In Postgres, I would also build a timestamp range for each row that spans the period during which the price is valid, from the row's date up to (but excluding) the next row's date. That range can be used in a join condition against the generated time series:
with time_series ("date") as (
  select g.*
  from (
    select min("date") as start_date, max("date") as end_date
    from the_table
  ) x
    cross join generate_series(x.start_date, x.end_date, interval '1 hour') as g
), ranges as (
  -- '[)' includes the row's own date and excludes the next row's date
  select tsrange("date", lead("date") over (order by "date"), '[)') as valid_during,
         value
  from the_table
)
select ts."date",
       r.value
from time_series ts
  join ranges r on r.valid_during @> ts."date"
If you don't really need a "dynamic time series", you can just use generate_series() with a hard-coded start and end which would simplify this a bit.
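The same idea (a regular grid plus a lookup of the half-open interval each grid point falls into) can be sketched outside the database. This is only an illustration in Python, not part of the original answer; `rate_changes` and `hourly_step_series` are hypothetical names, and the sample rows are the three from the question.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical sample data mirroring the question: (change time, new rate),
# sorted by change time.
rate_changes = [
    (datetime(2022, 8, 25, 1, 0), 12.3),
    (datetime(2022, 9, 23, 6, 12), 14.5),
    (datetime(2022, 10, 18, 5, 34), 9.8),
]


def hourly_step_series(changes, step=timedelta(hours=1)):
    """Yield (timestamp, rate) on a regular grid from the first to the last change."""
    starts = [ts for ts, _ in changes]
    current = starts[0]
    while current <= starts[-1]:
        # Index of the last change at or before `current`, i.e. the half-open
        # interval [start, next_start) that contains this grid point.
        idx = bisect_right(starts, current) - 1
        yield current, changes[idx][1]
        current += step


series = list(hourly_step_series(rate_changes))
lookup = dict(series)
print(lookup[datetime(2022, 9, 23, 6, 0)], lookup[datetime(2022, 9, 23, 7, 0)])
```

Like the query, this only emits grid points, so the exact change timestamps (06:12, 05:34) are not rows of their own, and the last rate only shows up if the grid is extended past the last change.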

This is a solution for Postgres. I think it's what you wanted: the generated rows fall on full hours, and wherever the rate changes, the row shows the exact timestamp from the original table instead (see the table). This is done by comparing each generated timestamp with the original date truncated to the hour. To make sure the last date appears in the result, I wrapped the LAG window function in COALESCE, so the NULL for the final row is replaced by that row's own date. Hope it doesn't look too hacky.
hourly_interval | value
2022-08-25 01:00:00 | 12.3
2022-08-25 02:00:00 | 12.3
... | ...
2022-09-23 06:00:00 | 12.3
2022-09-23 06:12:00 | 14.5
2022-09-23 07:00:00 | 14.5
... | ...
2022-10-18 05:00:00 | 14.5
2022-10-18 05:34:00 | 9.8
The result has 1303 rows
WITH cte AS (
  SELECT *,
         -- LAG with offset -1 looks one row ahead; COALESCE keeps the last row
         date_trunc('hour', generate_series(date,
                                            COALESCE((LAG(date, -1) OVER (ORDER BY date)), date),
                                            '1 hour')) hourly_interval
  FROM electricity
)
SELECT
  CASE WHEN hourly_interval = date_trunc('hour', date)
       THEN date
       ELSE hourly_interval
  END AS hourly_interval,
  value
FROM cte

Related

How to calculate range in 1 week using Postgres?

tanggal | product
2021-01-01 bag 1
2021-01-05 bag 5
2021-01-08 bag 8
2021-01-11 bag 11
2021-01-12 bag 12
2021-01-13 bag 13
2021-01-14 bag 14
Here I have a product table containing input dates and product names.
I want to calculate the products over a 1-week period. How should I write the query for a range of 7 days?
This is my query:
select tanggal, product from tbl_product
where tanggal > current_date + interval '7' day
You could solve this for arbitrary dates using a generated time series.
For example:
SELECT series::date
FROM generate_series(
  (now() - interval '1 week')::date,
  now()::date,
  '1 day'::interval
) series;
Would result in:
2021-05-26
2021-05-27
2021-05-28
2021-05-29
2021-05-30
2021-05-31
2021-06-01
2021-06-02
which you can join with other tables as you see fit.
For further information on generate_series() and other set-returning functions, check out the documentation.

Google Bigquery - Create time series of number of active records

I'm trying to create a timeseries in google bigquery SQL. My data is a series of time ranges covering the period of activity for that record. Here is an example:
Start End
2020-11-01 21:04:00 UTC 2020-11-02 07:15:00 UTC
2020-11-01 21:45:00 UTC 2020-11-02 04:00:00 UTC
2020-11-01 22:00:00 UTC 2020-11-02 09:48:00 UTC
2020-11-01 22:00:00 UTC 2020-11-02 06:00:00 UTC
I wish to create a new table that totals the number of active records within each 15-minute block. "21:00:00" would, for example, cover 21:00:00 to 21:14:59. My desired output for the above would be:
Period Active_Records
2020-11-01 21:00:00 1
2020-11-01 21:15:00 1
2020-11-01 21:30:00 1
2020-11-01 21:45:00 2
2020-11-01 22:00:00 4
2020-11-01 22:15:00 4
etc until the end of the last active range.
I would also like to be able to generate this on the fly by querying a date range and having it return every 15-minute block in the range and how many active records there were in that period.
Any assistance would be greatly appreciated.
Below is for BigQuery Standard SQL
#standardSQL
select ts as period, count(1) as Active_Records
from unnest((
  select generate_timestamp_array(timestamp_trunc(min(start), hour), max(`end`), interval 15 minute)
  from `project.dataset.table`
)) ts
join `project.dataset.table`
  on not (`end` < ts or start > timestamp_add(ts, interval 15 * 60 - 1 second))
group by ts
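The overlap test in the join condition (a record is active in a block unless it ends before the block starts or starts after the block ends) can be checked against the sample rows with a small Python sketch. This is an illustration under assumptions, not BigQuery code; `records` and `active_counts` are hypothetical names, and the timestamps are kept naive although the question shows UTC.

```python
from datetime import datetime, timedelta

# Sample ranges copied from the question (UTC in the source, naive here).
records = [
    (datetime(2020, 11, 1, 21, 4), datetime(2020, 11, 2, 7, 15)),
    (datetime(2020, 11, 1, 21, 45), datetime(2020, 11, 2, 4, 0)),
    (datetime(2020, 11, 1, 22, 0), datetime(2020, 11, 2, 9, 48)),
    (datetime(2020, 11, 1, 22, 0), datetime(2020, 11, 2, 6, 0)),
]


def active_counts(ranges, block=timedelta(minutes=15)):
    """Return {block_start: number of ranges overlapping that block}."""
    # Start the grid at the hour containing the earliest start time.
    first = min(start for start, _ in ranges).replace(minute=0, second=0, microsecond=0)
    last = max(end for _, end in ranges)
    counts = {}
    ts = first
    while ts <= last:
        block_end = ts + block - timedelta(seconds=1)  # e.g. 21:00:00 .. 21:14:59
        # A range overlaps the block unless it ends before it or starts after it.
        n = sum(1 for start, end in ranges if not (end < ts or start > block_end))
        if n:  # an inner join drops blocks without any matching record
            counts[ts] = n
        ts += block
    return counts


counts = active_counts(records)
for period in sorted(counts)[:5]:
    print(period, counts[period])
```

Because the comparison is inclusive, a record ending exactly at 04:00:00 is still counted in the 04:00 block.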

BigQuery - A way to generate timestamps based on hour/minute/seconds?

Is there a way to generate sequential timestamps in BigQuery that is focused on hours, minutes, and seconds?
In BigQuery you can generate sequential dates by:
select *
FROM UNNEST(GENERATE_DATE_ARRAY('2016-10-18', '2016-10-19', INTERVAL 1 DAY)) as day
This will generate the dates from 2016-10-18 to 2016-10-19 at one-day intervals:
Row day
1 2016-10-18
2 2016-10-19
But let's say I want intervals in 15 minutes or 5 minutes, is there a way to do that?
First, I would recommend "starring" the feature request for GENERATE_TIMESTAMP_ARRAY to express interest in having a function like this. Given GENERATE_ARRAY, though, the best option currently is to use a query of this form:
SELECT TIMESTAMP_ADD('2018-04-01', INTERVAL 15 * x MINUTE)
FROM UNNEST(GENERATE_ARRAY(0, 13)) AS x;
If you want a minute-based GENERATE_TIMESTAMP_ARRAY equivalent, you can use a UDF like this:
CREATE TEMP FUNCTION GenerateMinuteTimestampArray(
    t0 TIMESTAMP, t1 TIMESTAMP, minutes INT64) AS (
  ARRAY(
    SELECT TIMESTAMP_ADD(t0, INTERVAL minutes * x MINUTE)
    -- x counts whole steps, so the last timestamp does not run past t1
    FROM UNNEST(GENERATE_ARRAY(0, DIV(TIMESTAMP_DIFF(t1, t0, MINUTE), minutes))) AS x
  )
);
SELECT ts
FROM UNNEST(GenerateMinuteTimestampArray('2018-04-01', '2018-04-01 12:00:00', 15)) AS ts;
This returns a timestamp for each 15-minute interval between midnight and 12 PM on April 1.
Update: You can now use the GENERATE_TIMESTAMP_ARRAY function in BigQuery. If you want to generate timestamps at intervals of 15 minutes, for example, you can use:
SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-18', '2016-10-19', INTERVAL 15 MINUTE);
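The index arithmetic behind a minute-based generator is easy to get wrong, so here is a minimal Python sketch of it. It is only an illustration (the function name `generate_timestamps` is hypothetical): the index must run over the number of whole steps between the two bounds, not over the number of minutes.

```python
from datetime import datetime, timedelta


def generate_timestamps(t0, t1, minutes):
    """Return t0, t0 + minutes, t0 + 2 * minutes, ... without exceeding t1."""
    step = timedelta(minutes=minutes)
    # Number of whole steps that fit between t0 and t1 (floor division).
    whole_steps = (t1 - t0) // step
    return [t0 + i * step for i in range(whole_steps + 1)]


stamps = generate_timestamps(datetime(2018, 4, 1), datetime(2018, 4, 1, 12, 0), 15)
print(len(stamps), stamps[0], stamps[-1])
```

With a 15-minute step from midnight to noon this gives 49 timestamps, both endpoints included.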
Epochs seem like the way to go, but you need to convert the dates to epoch seconds first.
select TIMESTAMP_MICROS(CAST(day * 1000000 as INT64))
FROM UNNEST(GENERATE_ARRAY(1522540800, 1525132799, 900)) as day
Row f0_
1 2018-04-01 00:00:00.000 UTC
2 2018-04-01 00:15:00.000 UTC
3 2018-04-01 00:30:00.000 UTC
4 2018-04-01 00:45:00.000 UTC
5 2018-04-01 01:00:00.000 UTC
6 2018-04-01 01:15:00.000 UTC
7 2018-04-01 01:30:00.000 UTC
8 2018-04-01 01:45:00.000 UTC
9 2018-04-01 02:00:00.000 UTC
10 2018-04-01 02:15:00.000 UTC
11 2018-04-01 02:30:00.000 UTC
12 2018-04-01 02:45:00.000 UTC
13 2018-04-01 03:00:00.000 UTC

PostgreSQL: How do I join two tables based on same start and end time (timestamp without time zone)?

Okay, I came across this relevant question but it is slightly different than my case.
Problem
I have two similar tables, tbl1 and tbl2, in my PostgreSQL 9.5 database, both containing 1,274 rows. The structure and layout of table 1 is as follows:
Table 1:
id (integer) start_time end_time my_val1 (numeric)
51 1994-09-26 16:50:00 1994-10-29 13:30:00 3.7
52 1994-10-29 13:30:00 1994-11-27 12:30:00 2.4
53 1994-11-27 12:30:00 1994-12-29 09:25:00 7.6
54 1994-12-29 09:25:00 1994-12-31 23:59:59 2.9
54 1995-01-01 00:00:00 1995-02-05 13:50:00 2.9
55 1995-02-05 13:50:00 1995-03-12 11:10:00 1.6
56 1995-03-12 11:10:00 1995-04-11 09:05:00 2.2
171 1994-10-29 16:15:00 1994-11-27 19:10:00 6.9
172 1994-11-27 19:10:00 1994-12-29 11:40:00 4.2
173 1994-12-29 11:40:00 1994-12-31 23:59:59 6.7
173 1995-01-01 00:00:00 1995-02-05 15:30:00 6.7
174 1995-02-05 15:30:00 1995-03-12 09:45:00 3.2
175 1995-03-12 09:45:00 1995-04-11 11:30:00 1.2
176 1995-04-11 11:30:00 1995-05-11 15:30:00 2.7
321 1994-09-26 14:40:00 1994-10-30 14:30:00 0.2
322 1994-10-30 14:30:00 1994-11-27 14:45:00 7.8
323 1994-11-27 14:45:00 1994-12-29 14:20:00 4.6
324 1994-12-29 14:20:00 1994-12-31 23:59:59 4.1
324 1995-01-01 00:00:00 1995-02-05 14:35:00 4.1
325 1995-02-05 14:35:00 1995-03-12 11:30:00 8.2
326 1995-03-12 11:30:00 1995-04-11 09:45:00 1.2
.....
In some rows, start_time and end_time may look similar, but the whole time window may not be equal. For example,
id (integer) start_time end_time my_val1 (numeric)
54 1994-12-29 09:25:00 1994-12-31 23:59:59 2.9
173 1994-12-29 11:40:00 1994-12-31 23:59:59 6.7
start_time and end_time are of type timestamp without time zone. Each time window has to fall within a single year, so whenever a window crossed from 1994 into 1995, the row was split into two rows; that is why some IDs repeat in the id column. Table 2 (tbl2) contains similar start_time and end_time columns (timestamp without time zone) and a column my_val2 (numeric). For each row in table 1, I need to join the corresponding row of table 2 where start_time and end_time are similar.
What I have tried,
Select
a.id,
a.start_time, a.end_time,
a.my_val1,
b.my_val2
from tbl1 a
left join tbl2 b on
b.start_time = a.start_time
order by a.id;
The query returned 3,802 rows, which is not what I want. The desired result is the 1,274 rows of table 1 joined with my_val2. I am aware of the Postgres DISTINCT ON clause, but I need to keep all repeating ids of tbl1 and only join my_val2 from tbl2. Do I need to use a Postgres window function here? Can someone suggest how to join these two tables?
Why don't you add this condition to the ON clause?
ON b.start_time = a.start_time AND a.id = b.id
For each row in table 1 I need to join corresponding row of table 2
where start_time and end_time are similar.
So the join condition should include end_time as well:
SELECT a.id,
       a.start_time,
       a.end_time,
       a.my_val1,
       b.my_val2
FROM tbl1 a
  LEFT JOIN tbl2 b
    ON b.start_time = a.start_time
   AND b.end_time = a.end_time
ORDER BY a.id;
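Why joining on start_time alone multiplies rows can be seen on a tiny example. The Python sketch below is only an illustration: the miniature tables, the my_val2 values and the `left_join` helper are hypothetical, with window boundaries borrowed from the question.

```python
# Hypothetical miniature tables. tbl1 rows: (id, start_time, end_time, my_val1).
tbl1 = [
    (54, "1994-12-29 09:25:00", "1994-12-31 23:59:59", 2.9),
    (54, "1995-01-01 00:00:00", "1995-02-05 13:50:00", 2.9),
    (173, "1995-01-01 00:00:00", "1995-02-05 15:30:00", 6.7),
]
# tbl2 rows: (start_time, end_time, my_val2); the my_val2 values are made up.
tbl2 = [
    ("1994-12-29 09:25:00", "1994-12-31 23:59:59", 10.0),
    ("1995-01-01 00:00:00", "1995-02-05 13:50:00", 20.0),
    ("1995-01-01 00:00:00", "1995-02-05 15:30:00", 30.0),
]


def left_join(left, right, match_end_time):
    """Left join on start_time, optionally also requiring equal end_time."""
    result = []
    for row_id, start, end, val1 in left:
        matches = [
            val2
            for r_start, r_end, val2 in right
            if r_start == start and (not match_end_time or r_end == end)
        ]
        # A left join keeps unmatched rows, with NULL (None) on the right side.
        for val2 in matches or [None]:
            result.append((row_id, start, end, val1, val2))
    return result


print(len(left_join(tbl1, tbl2, match_end_time=False)))  # start_time only
print(len(left_join(tbl1, tbl2, match_end_time=True)))   # start_time and end_time
```

Two windows share the start 1995-01-01 00:00:00, so the start-only join pairs each of them with both tbl2 rows. If tbl2 itself held duplicate (start_time, end_time) pairs, even the two-column join would still multiply rows.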

How do I generate a series of hourly averages in MySQL?

I've got data in ten minutes intervals in my table:
2009-01-26 00:00:00 12
2009-01-26 00:10:00 1.1
2009-01-26 00:20:00 11
2009-01-26 00:30:00 0
2009-01-26 00:40:00 5
2009-01-26 00:50:00 3.4
2009-01-26 01:00:00 7
2009-01-26 01:10:00 7
2009-01-26 01:20:00 7.2
2009-01-26 01:30:00 3
2009-01-26 01:40:00 25
2009-01-26 01:50:00 4
2009-01-26 02:00:00 3
2009-01-26 02:10:00 4
etc.
Is it possible to formulate a single SQL-query for MySQL which will return a series of averages over each hour?
In this case it should return:
5.42
8.87
etc.
It's unclear whether you want the average to be aggregated over days or not.
If you want a different average for midnight on the 26th vs midnight on the 27th, then modify Mabwi's query thus:
SELECT AVG( value ) , thetime
FROM hourly_averages
GROUP BY DATE( thetime ), HOUR( thetime )
Note the additional DATE() in the GROUP BY clause. Without this, the query would average together all of the data from 00:00 to 00:59 without regard to the date on which it happened.
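The effect of grouping by both date and hour can be reproduced with the sample rows from the question. This Python sketch is only an illustration of the grouping key, not MySQL code; `readings` and `hourly_averages` are hypothetical names.

```python
from collections import defaultdict
from datetime import date, datetime

# Readings copied from the question: (timestamp text, value).
readings = [
    ("2009-01-26 00:00:00", 12), ("2009-01-26 00:10:00", 1.1),
    ("2009-01-26 00:20:00", 11), ("2009-01-26 00:30:00", 0),
    ("2009-01-26 00:40:00", 5), ("2009-01-26 00:50:00", 3.4),
    ("2009-01-26 01:00:00", 7), ("2009-01-26 01:10:00", 7),
    ("2009-01-26 01:20:00", 7.2), ("2009-01-26 01:30:00", 3),
    ("2009-01-26 01:40:00", 25), ("2009-01-26 01:50:00", 4),
    ("2009-01-26 02:00:00", 3), ("2009-01-26 02:10:00", 4),
]


def hourly_averages(rows):
    """Average values per (date, hour), like GROUP BY DATE(t), HOUR(t)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for text, value in rows:
        ts = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
        key = (ts.date(), ts.hour)  # dropping ts.date() would merge all days
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


averages = hourly_averages(readings)
print(round(averages[(date(2009, 1, 26), 0)], 2))
```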
This should work:
SELECT AVG( value ) , thetime
FROM hourly_averages
GROUP BY HOUR( thetime )
Here's the result
AVG(value) thetime
5.4166666865349 2009-01-26 00:00:00
8.8666666348775 2009-01-26 01:00:00
3.5 2009-01-26 02:00:00
There is also another possibility considering the fact that dates have a string representation in the database:
You can use SUBSTRING(thetime, 1, [len]), extracting the common part of your group. For the example with hourly averages you have the SQL query
SELECT SUBSTRING(thetime, 1, 13) AS hours, AVG(value) FROM hourly_averages GROUP BY hours
By the len parameter you can specify the aggregated time interval considering the MySQL date format yyyy-MM-dd HH:mm:ss[.SS...]:
len = 4: group by years
len = 7: group by months
len = 10: group by days
len = 13: group by hours
len = 16: group by minutes
len = 19: group by seconds
We observed better performance with this method than with date and time functions, especially when used in JOINs in MySQL 5.7. In MySQL 8, however, both approaches seem to take approximately the same time, at least for grouping.
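The prefix lengths listed above map directly onto string slicing. The following Python sketch only illustrates that mapping and assumes timestamps formatted as yyyy-MM-dd HH:mm:ss; `PREFIX_LENGTHS`, `average_by_prefix` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Prefix length per unit for the format "yyyy-MM-dd HH:mm:ss".
PREFIX_LENGTHS = {"year": 4, "month": 7, "day": 10, "hour": 13, "minute": 16, "second": 19}


def average_by_prefix(rows, unit):
    """Group (timestamp text, value) rows by a text prefix and average each group."""
    length = PREFIX_LENGTHS[unit]
    groups = defaultdict(list)
    for text, value in rows:
        groups[text[:length]].append(value)  # same role as SUBSTRING(thetime, 1, len)
    return {prefix: sum(values) / len(values) for prefix, values in groups.items()}


# Hypothetical sample rows.
rows = [
    ("2009-01-26 00:00:00", 2.0),
    ("2009-01-26 00:30:00", 4.0),
    ("2009-01-26 01:00:00", 6.0),
    ("2009-01-27 00:00:00", 8.0),
]
by_hour = average_by_prefix(rows, "hour")
by_day = average_by_prefix(rows, "day")
print(by_hour)
print(by_day)
```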