Is there a way to generate sequential timestamps in BigQuery at hour, minute, or second granularity?
In BigQuery, you can generate sequential dates with:
select *
FROM UNNEST(GENERATE_DATE_ARRAY('2016-10-18', '2016-10-19', INTERVAL 1 DAY)) as day
This generates the dates from 2016-10-18 to 2016-10-19 in one-day intervals:
Row day
1 2016-10-18
2 2016-10-19
But what if I want intervals of 15 minutes or 5 minutes? Is there a way to do that?
First, I would recommend "starring" the feature request for GENERATE_TIMESTAMP_ARRAY to express interest in having a function like this. In the meantime, the best option is to combine GENERATE_ARRAY with TIMESTAMP_ADD in a query of this form:
SELECT TIMESTAMP_ADD('2018-04-01', INTERVAL 15 * x MINUTE)
FROM UNNEST(GENERATE_ARRAY(0, 13)) AS x;
If you want a minute-based GENERATE_TIMESTAMP_ARRAY equivalent, you can use a UDF like this:
CREATE TEMP FUNCTION GenerateMinuteTimestampArray(
t0 TIMESTAMP, t1 TIMESTAMP, minutes INT64) AS (
ARRAY(
SELECT TIMESTAMP_ADD(t0, INTERVAL minutes * x MINUTE)
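-- Offsets run from 0 to the number of whole steps between t0 and t1, so the last value never passes t1.
FROM UNNEST(GENERATE_ARRAY(0, DIV(TIMESTAMP_DIFF(t1, t0, MINUTE), minutes))) AS x
-- ARRAY(SELECT ...) does not guarantee element order without ORDER BY.
ORDER BY x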
)
);
SELECT ts
FROM UNNEST(GenerateMinuteTimestampArray('2018-04-01', '2018-04-01 12:00:00', 15)) AS ts;
This returns a timestamp for each 15-minute interval between midnight and 12 PM on April 1.
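The offset arithmetic behind the UDF can be sketched in plain Python (an illustration of the logic, not BigQuery code; `generate_minute_timestamps` is a hypothetical name). The key point is that the offset range must end at the whole-minute difference divided by the step size; using the raw minute difference as the upper bound would overshoot `t1` by a factor of `minutes`.

```python
from datetime import datetime, timedelta


def generate_minute_timestamps(t0: datetime, t1: datetime, minutes: int) -> list:
    """Hypothetical helper mirroring the UDF: t0 + minutes * x for x = 0..steps."""
    # Whole minutes between the endpoints, like TIMESTAMP_DIFF(t1, t0, MINUTE).
    total_minutes = int((t1 - t0).total_seconds() // 60)
    # Number of whole steps that fit, so the last timestamp is <= t1.
    steps = total_minutes // minutes
    return [t0 + timedelta(minutes=minutes * x) for x in range(steps + 1)]


series = generate_minute_timestamps(datetime(2018, 4, 1), datetime(2018, 4, 1, 12), 15)
print(len(series), series[0], series[-1])
```

For midnight to noon in 15-minute steps this yields 49 values, from 00:00 through 12:00 inclusive.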
Update: You can now use the GENERATE_TIMESTAMP_ARRAY function in BigQuery. If you want to generate timestamps at intervals of 15 minutes, for example, you can use:
SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-18', '2016-10-19', INTERVAL 15 MINUTE);
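Note that this SELECT returns a single row holding an ARRAY; wrap the call in UNNEST (as in the earlier examples) to get one row per timestamp. The stepping behaviour, where the end point is included when a step lands on it exactly, can be sketched in Python as follows (an illustration only; `timestamp_range` is a hypothetical name):

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator


def timestamp_range(start: datetime, end: datetime, step: timedelta) -> Iterator[datetime]:
    """Yield start, start + step, ... up to and including end if a step reaches it."""
    current = start
    while current <= end:
        yield current
        current += step


ts = list(timestamp_range(
    datetime(2016, 10, 18, tzinfo=timezone.utc),
    datetime(2016, 10, 19, tzinfo=timezone.utc),
    timedelta(minutes=15),
))
print(len(ts))  # 97: 24 hours * 4 steps per hour, plus the inclusive end point
```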
Using epoch seconds also works,
but it requires converting the dates to epoch values first.
select TIMESTAMP_MICROS(CAST(day * 1000000 as INT64))
FROM UNNEST(GENERATE_ARRAY(1522540800, 1525132799, 900)) as day
Row f0_
1 2018-04-01 00:00:00.000 UTC
2 2018-04-01 00:15:00.000 UTC
3 2018-04-01 00:30:00.000 UTC
4 2018-04-01 00:45:00.000 UTC
5 2018-04-01 01:00:00.000 UTC
6 2018-04-01 01:15:00.000 UTC
7 2018-04-01 01:30:00.000 UTC
8 2018-04-01 01:45:00.000 UTC
9 2018-04-01 02:00:00.000 UTC
10 2018-04-01 02:15:00.000 UTC
11 2018-04-01 02:30:00.000 UTC
12 2018-04-01 02:45:00.000 UTC
13 2018-04-01 03:00:00.000 UTC
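The same epoch arithmetic in Python, as a sketch for checking the range bounds (in BigQuery, TIMESTAMP_SECONDS(day) would be a simpler equivalent of the TIMESTAMP_MICROS conversion). GENERATE_ARRAY includes its end value when a step reaches it, so the Python range is extended by one to match:

```python
from datetime import datetime, timezone

START = 1522540800  # 2018-04-01 00:00:00 UTC
END = 1525132799    # 2018-04-30 23:59:59 UTC
STEP = 900          # 15 minutes, in seconds

# range() excludes its stop value, so add 1 to mimic GENERATE_ARRAY's inclusive end.
epochs = range(START, END + 1, STEP)
stamps = [datetime.fromtimestamp(s, tz=timezone.utc) for s in epochs]
print(len(stamps), stamps[0], stamps[-1])
```

The query therefore returns 2880 rows, the last being 2018-04-30 23:45:00 UTC, since END itself is not on a 15-minute boundary.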
So my data looks like this:
DATE TEMPERATURE
2012-01-13 23:15:00 UTC 0
2012-01-14 01:35:00 UTC 5
2012-01-14 02:15:00 UTC 6
2012-01-14 03:15:00 UTC 8
2012-01-14 04:15:00 UTC 0
2012-01-14 04:55:00 UTC 0
2012-01-14 05:15:00 UTC -2
2012-01-14 05:35:00 UTC 0
I am trying to calculate how long the temperature for a zip code stays at 0 or below on any given day. On the 13th, it only happens for a very short time, so we don't really care. I want to know how to calculate the number of minutes this happens on the 14th, since it looks like a significantly (and consistently) cold day.
I want the query to add two more columns.
The first would be the time difference between consecutive rows, so row 3 - row 2 = 40 minutes and row 4 - row 3 = 60 minutes.
The second would be the running total, over the whole day, of the minutes during which the temperature has been 0 or below. Here rows 2-4 would be ignored. From rows 5-8, the total time the temperature was 0 or below would be about 90 minutes.
It should end up looking like this:
DATE TEMPERATURE MINUTES_DIFFERENCE TOTAL_MINUTES
2012-01-13 23:15:00 UTC 0 0 0
2012-01-14 01:35:00 UTC 5 140 0
2012-01-14 02:15:00 UTC 6 40 0
2012-01-14 03:15:00 UTC 8 60 0
2012-01-14 04:15:00 UTC 0 60 60
2012-01-14 04:55:00 UTC 0 40 100
2012-01-14 05:15:00 UTC -2 20 120
2012-01-14 05:35:00 UTC 0 20 140
Use the query below:
select *,
sum(minutes_difference) over(order by date) total_minutes
from (
select *,
ifnull(timestamp_diff(timestamp(date), lag(timestamp(date)) over(order by date), minute), 0) as minutes_difference
from your_table
)
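To see what the window functions compute, here is a plain-Python sketch of the same logic on the question's sample rows (an illustration, not BigQuery code): a LAG-style difference to the previous row, defaulting to 0 for the first row, followed by a running SUM ordered by time.

```python
from datetime import datetime
from itertools import accumulate

# Sample rows from the question: (reading time in UTC, temperature).
rows = [
    (datetime(2012, 1, 13, 23, 15), 0),
    (datetime(2012, 1, 14, 1, 35), 5),
    (datetime(2012, 1, 14, 2, 15), 6),
    (datetime(2012, 1, 14, 3, 15), 8),
    (datetime(2012, 1, 14, 4, 15), 0),
    (datetime(2012, 1, 14, 4, 55), 0),
    (datetime(2012, 1, 14, 5, 15), -2),
    (datetime(2012, 1, 14, 5, 35), 0),
]

# Like IFNULL(TIMESTAMP_DIFF(date, LAG(date) OVER (ORDER BY date), MINUTE), 0).
minutes_difference = [0] + [
    int((cur[0] - prev[0]).total_seconds() // 60) for prev, cur in zip(rows, rows[1:])
]
# Like SUM(minutes_difference) OVER (ORDER BY date): a running total over all rows.
total_minutes = list(accumulate(minutes_difference))
print(minutes_difference)
print(total_minutes)
```

This first version sums every gap regardless of temperature, which is why the answer was later updated.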
Update, to answer the updated question:
select * except(new_grp, grp),
sum(if(temperature > 0, 0, minutes_difference)) over(partition by grp order by date) total_minutes
from (
select *, countif(new_grp) over(order by date) as grp
from (
select *,
ifnull(timestamp_diff(timestamp(date), lag(timestamp(date)) over(order by date), minute), 0) as minutes_difference,
ifnull(((temperature <= 0) and (lag(temperature) over(order by date) > 0)) or
((temperature > 0) and (lag(temperature) over(order by date) <= 0)), true) as new_grp
from your_table
)
)
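The grouping trick in the updated query (flag each row where the temperature crosses between "above 0" and "0 or below", then take a running COUNTIF of the flags as a group id) can be sketched in Python on the same sample rows. This is an illustration of the logic only; like the query, it does not reset groups at midnight.

```python
from datetime import datetime

# Sample rows from the question: (reading time in UTC, temperature).
rows = [
    (datetime(2012, 1, 13, 23, 15), 0),
    (datetime(2012, 1, 14, 1, 35), 5),
    (datetime(2012, 1, 14, 2, 15), 6),
    (datetime(2012, 1, 14, 3, 15), 8),
    (datetime(2012, 1, 14, 4, 15), 0),
    (datetime(2012, 1, 14, 4, 55), 0),
    (datetime(2012, 1, 14, 5, 15), -2),
    (datetime(2012, 1, 14, 5, 35), 0),
]

# Minutes since the previous reading (0 for the first row).
minutes_difference = [0] + [
    int((cur[0] - prev[0]).total_seconds() // 60) for prev, cur in zip(rows, rows[1:])
]

# new_grp: first row, or the "0 or below" state differs from the previous row's.
grp = []
group_id = 0
for i, (_, temp) in enumerate(rows):
    if i == 0 or (temp <= 0) != (rows[i - 1][1] <= 0):
        group_id += 1
    grp.append(group_id)

# Running total per group, counting minutes only on rows at 0 or below.
total_minutes = []
running = {}
for g, (_, temp), diff in zip(grp, rows, minutes_difference):
    running[g] = running.get(g, 0) + (0 if temp > 0 else diff)
    total_minutes.append(running[g])
print(grp)
print(total_minutes)
```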
I am generating a time series using the query below:
SELECT date_trunc('day', dd):: TIMESTAMP WITHOUT TIME zone as time_ent
FROM generate_series (
CASE
WHEN MOD(EXTRACT(DAY FROM '2020-12-13 13:02:42'::timestamp)::INT, 4) = 0 THEN
'2020-12-13 13:02:42'::date
ELSE
'2020-12-13 13:02:42'::date + concat(MOD(EXTRACT(DAY FROM '2020-12-13 13:02:42'::timestamp)::INT, 4), ' day')::interval
END
, '2021-12-13 13:02:42'::date
, '5760 min'::INTERVAL
) dd
It gives me output like this:
2020-12-14 00:00:00.000
2020-12-18 00:00:00.000
2020-12-22 00:00:00.000
2020-12-26 00:00:00.000
2020-12-30 00:00:00.000
2021-01-03 00:00:00.000
But I need output like this:
2020-12-16 00:00:00.000
2020-12-20 00:00:00.000
2020-12-24 00:00:00.000
2020-12-28 00:00:00.000
2021-01-01 00:00:00.000
2021-01-05 00:00:00.000
Currently, the days in the series depend on the timestamp I pass. Above, it gives me days like 14, 18, 22, ..., but I want days like 16, 20, 24, i.e. multiples of 4. The days should not depend on the timestamp passed in the query. I have tried many things without success.
Try this:
SELECT date_trunc('day', dd):: TIMESTAMP WITHOUT TIME zone as time_ent
FROM generate_series ( date_trunc('month', '2020-12-13 13:02:42' :: timestamp) :: date + (ceiling (EXTRACT(DAY FROM '2020-12-13 13:02:42'::timestamp)/4)*4 - 1) :: integer
, '2021-12-13 13:02:42'::date
, '4 days' ::INTERVAL
) dd
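The start-date arithmetic in this answer (first of the month, plus the smallest multiple of 4 that is at least the day of month, minus one) can be checked with a small Python sketch. It is an illustration of the PostgreSQL expression, not PostgreSQL code, and `aligned_series` is a hypothetical name:

```python
import math
from datetime import date, timedelta


def aligned_series(start: date, end: date, step_days: int = 4) -> list:
    # First day of the month, like date_trunc('month', ...)::date.
    month_start = start.replace(day=1)
    # Smallest multiple of step_days >= day of month, like ceiling(day / 4) * 4.
    first_day = math.ceil(start.day / step_days) * step_days
    # month_start is day 1, so adding (first_day - 1) days lands on day first_day.
    current = month_start + timedelta(days=first_day - 1)
    out = []
    while current <= end:
        out.append(current)
        current += timedelta(days=step_days)
    return out


series = aligned_series(date(2020, 12, 13), date(2021, 12, 13))
print(series[:6])
```

After the first aligned day, the series simply steps by 4 days, so it continues into January as 2021-01-01, 2021-01-05, and so on.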
I'm trying to create a time series in Google BigQuery SQL. My data is a series of time ranges covering the period of activity for each record. Here is an example:
Start End
2020-11-01 21:04:00 UTC 2020-11-02 07:15:00 UTC
2020-11-01 21:45:00 UTC 2020-11-02 04:00:00 UTC
2020-11-01 22:00:00 UTC 2020-11-02 09:48:00 UTC
2020-11-01 22:00:00 UTC 2020-11-02 06:00:00 UTC
I wish to create a new table that totals the number of active records within each 15-minute block. "21:00:00", for example, would be 21:00:00 to 21:14:59. My desired output for the above would be:
Period Active_Records
2020-11-01 21:00:00 1
2020-11-01 21:15:00 1
2020-11-01 21:30:00 1
2020-11-01 21:45:00 2
2020-11-01 22:00:00 4
2020-11-01 22:15:00 4
and so on, until the end of the last active range.
I would also like to be able to generate this on the fly by querying a date range and having it return every 15-minute block in the range and how many records were active in that period.
Any assistance would be greatly appreciated.
The query below is for BigQuery Standard SQL:
#standardSQL
select ts as period, count(1) as Active_Records
from unnest((
select generate_timestamp_array(timestamp_trunc(min(start), hour), max(`end`), interval 15 minute)
from `project.dataset.table`
)) ts
join `project.dataset.table`
on not (`end` < ts or start > timestamp_add(ts, interval 15 * 60 - 1 second))
group by ts
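The overlap test in the ON clause can be checked with a plain-Python sketch on the question's sample ranges (an illustration of the logic, not BigQuery code). Like the inner join, it only records blocks that have at least one active record.

```python
from collections import Counter
from datetime import datetime, timedelta

# (start, end) ranges from the question, all in UTC.
ranges = [
    (datetime(2020, 11, 1, 21, 4), datetime(2020, 11, 2, 7, 15)),
    (datetime(2020, 11, 1, 21, 45), datetime(2020, 11, 2, 4, 0)),
    (datetime(2020, 11, 1, 22, 0), datetime(2020, 11, 2, 9, 48)),
    (datetime(2020, 11, 1, 22, 0), datetime(2020, 11, 2, 6, 0)),
]

BLOCK = timedelta(minutes=15)
# Like TIMESTAMP_TRUNC(MIN(start), HOUR): the first block starts on the hour.
first = min(s for s, _ in ranges).replace(minute=0, second=0, microsecond=0)
last = max(e for _, e in ranges)

counts = Counter()
period = first
while period <= last:
    # Last second of the block, e.g. 21:00:00 -> 21:14:59.
    block_end = period + BLOCK - timedelta(seconds=1)
    for start, end in ranges:
        # Same test as the ON clause: NOT (end < period OR start > block_end).
        if not (end < period or start > block_end):
            counts[period] += 1
    period += BLOCK

for p in sorted(counts)[:6]:
    print(p, counts[p])
```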
I have an hourly time series from 2017-01-01 00:00:00 to 2017-12-31 23:00:00. I need to repeat this one-year series 2400 times in the same column. I need help with this.
Row Date_time
1 2017-01-01 00:00:00 UTC
2 2017-01-01 01:00:00 UTC
3 2017-01-01 02:00:00 UTC
4 2017-01-01 03:00:00 UTC
5 2017-01-01 04:00:00 UTC
6 2017-01-01 05:00:00 UTC
7 2017-01-01 06:00:00 UTC
8 2017-01-01 07:00:00 UTC
...........................
...........................
You would do this in BigQuery by generating a timestamp array and then unnesting:
select ts
from unnest(generate_timestamp_array('2017-01-01 00:00:00', '2017-12-31 23:00:00', interval 1 hour)) ts
You can then repeat each timestamp 2400 times by cross joining with a second generated array:
select ts
from unnest(generate_timestamp_array('2017-01-01 00:00:00', '2017-12-31 23:00:00', interval 1 hour)
) ts cross join
unnest(generate_array(1, 2400)) n
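The row arithmetic of this cross join can be sketched in Python with `itertools.product` (an illustration, not BigQuery code; `cross_join` is a hypothetical name). 2017 is not a leap year, so the hourly series has 365 * 24 = 8760 rows, and repeating it 2400 times gives 8760 * 2400 = 21,024,000 rows.

```python
from datetime import datetime, timedelta
from itertools import product

# One timestamp per hour, 2017-01-01 00:00 through 2017-12-31 23:00 inclusive.
start, end = datetime(2017, 1, 1), datetime(2017, 12, 31, 23)
hours = int((end - start) / timedelta(hours=1)) + 1
series = [start + timedelta(hours=h) for h in range(hours)]


def cross_join(timestamps, copies):
    # Every (timestamp, n) pair, like CROSS JOIN UNNEST(GENERATE_ARRAY(1, copies)).
    return product(timestamps, range(1, copies + 1))


print(len(series), len(series) * 2400)
print(list(cross_join(series[:2], 3)))
```

The full product is computed lazily here; materializing all 21 million pairs in Python is possible but slow, whereas BigQuery handles it easily.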
I've got data in ten-minute intervals in my table:
2009-01-26 00:00:00 12
2009-01-26 00:10:00 1.1
2009-01-26 00:20:00 11
2009-01-26 00:30:00 0
2009-01-26 00:40:00 5
2009-01-26 00:50:00 3.4
2009-01-26 01:00:00 7
2009-01-26 01:10:00 7
2009-01-26 01:20:00 7.2
2009-01-26 01:30:00 3
2009-01-26 01:40:00 25
2009-01-26 01:50:00 4
2009-01-26 02:00:00 3
2009-01-26 02:10:00 4
etc.
Is it possible to formulate a single SQL query for MySQL that will return a series of averages over each hour?
In this case it should return:
5.42
8.87
etc.
It's unclear whether you want the average to be aggregated over days or not.
If you want a different average for midnight on the 26th vs midnight on the 27th, then modify Mabwi's query thus:
SELECT AVG( value ) , MIN( thetime ) AS thetime
FROM hourly_averages
GROUP BY DATE( thetime ), HOUR( thetime )
Note the additional DATE() in the GROUP BY clause. Without this, the query would average together all of the data from 00:00 to 00:59 without regard to the date on which it happened.
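The difference between the two groupings is just the grouping key. A Python sketch of the per-date, per-hour version on the question's sample data (an illustration of the logic, not MySQL code):

```python
from collections import defaultdict
from datetime import datetime

readings = [
    ("2009-01-26 00:00:00", 12), ("2009-01-26 00:10:00", 1.1),
    ("2009-01-26 00:20:00", 11), ("2009-01-26 00:30:00", 0),
    ("2009-01-26 00:40:00", 5), ("2009-01-26 00:50:00", 3.4),
    ("2009-01-26 01:00:00", 7), ("2009-01-26 01:10:00", 7),
    ("2009-01-26 01:20:00", 7.2), ("2009-01-26 01:30:00", 3),
    ("2009-01-26 01:40:00", 25), ("2009-01-26 01:50:00", 4),
    ("2009-01-26 02:00:00", 3), ("2009-01-26 02:10:00", 4),
]

groups = defaultdict(list)
for text, value in readings:
    ts = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    # Key on both date and hour, like GROUP BY DATE(thetime), HOUR(thetime).
    # Keying on ts.hour alone would merge the same hour across different days.
    groups[(ts.date(), ts.hour)].append(value)

averages = {key: sum(vals) / len(vals) for key, vals in groups.items()}
for key in sorted(averages):
    print(key, round(averages[key], 2))
```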
This should work:
SELECT AVG( value ) , MIN( thetime ) AS thetime
FROM hourly_averages
GROUP BY HOUR( thetime )
Here's the result
AVG(value) thetime
5.4166666865349 2009-01-26 00:00:00
8.8666666348775 2009-01-26 01:00:00
3.5 2009-01-26 02:00:00
Another possibility takes advantage of the fact that datetime values have a string representation in the database:
You can use SUBSTRING(thetime, 1, len) to extract the prefix that all rows in a group share. For hourly averages, the SQL query is:
SELECT SUBSTRING(thetime, 1, 13) AS hours, AVG(value) FROM hourly_averages GROUP BY hours
With the len parameter, you choose the aggregation interval, based on the MySQL datetime format yyyy-MM-dd HH:mm:ss[.SS...]:
len = 4: group by years
len = 7: group by months
len = 10: group by days
len = 13: group by hours
len = 16: group by minutes
len = 19: group by seconds
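The prefix lengths above map directly onto string slicing. A Python sketch of grouping by a fixed-length prefix of the datetime string (an illustration, not MySQL code; `average_by_prefix` is a hypothetical name). SUBSTRING(thetime, 1, len) is 1-based and corresponds to `text[:len]`:

```python
from collections import defaultdict

readings = [
    ("2009-01-26 00:00:00", 12), ("2009-01-26 00:10:00", 1.1),
    ("2009-01-26 00:20:00", 11), ("2009-01-26 00:30:00", 0),
    ("2009-01-26 00:40:00", 5), ("2009-01-26 00:50:00", 3.4),
    ("2009-01-26 01:00:00", 7), ("2009-01-26 01:10:00", 7),
    ("2009-01-26 01:20:00", 7.2), ("2009-01-26 01:30:00", 3),
    ("2009-01-26 01:40:00", 25), ("2009-01-26 01:50:00", 4),
    ("2009-01-26 02:00:00", 3), ("2009-01-26 02:10:00", 4),
]


def average_by_prefix(rows, length):
    # Group on the first `length` characters, like SUBSTRING(thetime, 1, length).
    groups = defaultdict(list)
    for text, value in rows:
        groups[text[:length]].append(value)
    return {prefix: sum(vals) / len(vals) for prefix, vals in groups.items()}


hourly = average_by_prefix(readings, 13)  # keys like "2009-01-26 00"
daily = average_by_prefix(readings, 10)   # keys like "2009-01-26"
print({k: round(v, 2) for k, v in hourly.items()})
print({k: round(v, 2) for k, v in daily.items()})
```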
We observed better performance with this method than with the date and time functions, especially when used in JOINs in MySQL 5.7. In MySQL 8, however, at least for grouping, both approaches seem to take about the same time.