I have a table and one of the columns is a timestamp. What I would like to do is a SQL query (BigQuery compatible) that rounds the timestamp of each line to the quarter of the hour previous to that time.
Examples:
2019-07-05 21:11:28 UTC -> 2019-07-05 21:00:00 UTC
2019-07-05 21:17:05 UTC -> 2019-07-05 21:15:00 UTC
2019-07-05 20:29:56 UTC -> 2019-07-05 20:15:00 UTC
2019-07-05 21:55:39 UTC -> 2019-07-05 21:45:00 UTC
I found TIMESTAMP_TRUNC, which can truncate to the minute, but that gives the timestamp's own minute, not the quarter of the hour.
Does anyone have an idea how I could do this?
Thanks in advance.
Below is for BigQuery Standard SQL
#standardSQL
SELECT ts,
TIMESTAMP_SECONDS(UNIX_SECONDS(ts) - MOD(UNIX_SECONDS(ts), 15 * 60)) ts_rounded_to_quarter_of_hour
FROM `project.dataset.table`
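To make the arithmetic concrete, here is a minimal Python sketch of the same idea (subtract the remainder of the epoch seconds modulo 15 * 60). The function name floor_to_quarter_hour is only illustrative and is not part of BigQuery.

```python
from datetime import datetime, timezone

QUARTER_SECONDS = 15 * 60  # length of a quarter of an hour in seconds


def floor_to_quarter_hour(ts: datetime) -> datetime:
    """Round a timezone-aware timestamp down to the previous quarter of an hour."""
    seconds = int(ts.timestamp())                   # plays the role of UNIX_SECONDS(ts)
    floored = seconds - seconds % QUARTER_SECONDS   # plays the role of MOD(..., 15 * 60)
    return datetime.fromtimestamp(floored, tz=timezone.utc)


# Sample timestamps taken from the question.
samples = [
    datetime(2019, 7, 5, 21, 11, 28, tzinfo=timezone.utc),
    datetime(2019, 7, 5, 21, 17, 5, tzinfo=timezone.utc),
    datetime(2019, 7, 5, 20, 29, 56, tzinfo=timezone.utc),
    datetime(2019, 7, 5, 21, 55, 39, tzinfo=timezone.utc),
]
for ts in samples:
    print(ts, "->", floor_to_quarter_hour(ts))
```

This works because every UTC quarter-hour boundary falls on a multiple of 900 seconds since the Unix epoch.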
Another, slightly refactored version is
#standardSQL
SELECT ts,
TIMESTAMP_SECONDS(ts_seconds_since_epoch - MOD(ts_seconds_since_epoch, 15 * 60)) ts_rounded_to_quarter_of_hour
FROM `project.dataset.table`, UNNEST([UNIX_SECONDS(ts)]) ts_seconds_since_epoch
And finally, my favorite version would be
#standardSQL
CREATE TEMP FUNCTION TIMESTAMP_TRUNC_TO_QUARTER_OF_HOUR(ts TIMESTAMP) AS ((
SELECT TIMESTAMP_SECONDS(ts_seconds_since_epoch - MOD(ts_seconds_since_epoch, 15 * 60))
FROM UNNEST([UNIX_SECONDS(ts)]) ts_seconds_since_epoch
));
SELECT ts,
TIMESTAMP_TRUNC_TO_QUARTER_OF_HOUR(ts) AS ts_rounded_to_quarter_of_hour
FROM `project.dataset.table`
You can test and play with the above using the sample data from your question, as in the example below
#standardSQL
CREATE TEMP FUNCTION TIMESTAMP_TRUNC_TO_QUARTER_OF_HOUR(ts TIMESTAMP) AS ((
SELECT TIMESTAMP_SECONDS(ts_seconds_since_epoch - MOD(ts_seconds_since_epoch, 15 * 60))
FROM UNNEST([UNIX_SECONDS(ts)]) ts_seconds_since_epoch
));
WITH `project.dataset.table` AS (
SELECT TIMESTAMP '2019-07-05 21:11:28 UTC' ts UNION ALL --> 2019-07-05 21:00:00 UTC
SELECT '2019-07-05 21:17:05 UTC' UNION ALL --> 2019-07-05 21:15:00 UTC
SELECT '2019-07-05 20:29:56 UTC' UNION ALL --> 2019-07-05 20:15:00 UTC
SELECT '2019-07-05 21:55:39 UTC' --> 2019-07-05 21:45:00 UTC
)
SELECT ts,
TIMESTAMP_TRUNC_TO_QUARTER_OF_HOUR(ts) AS ts_rounded_to_quarter_of_hour
FROM `project.dataset.table`
All three versions above return the same result:
Row ts ts_rounded_to_quarter_of_hour
1 2019-07-05 21:11:28 UTC 2019-07-05 21:00:00 UTC
2 2019-07-05 21:17:05 UTC 2019-07-05 21:15:00 UTC
3 2019-07-05 20:29:56 UTC 2019-07-05 20:15:00 UTC
4 2019-07-05 21:55:39 UTC 2019-07-05 21:45:00 UTC
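If you are happy using stored procedures, you can do something like this (T-SQL syntax, not BigQuery):
declare @d datetime = '2019-07-05 21:11:28'
select DATEADD(mi, DATEDIFF(mi, 0, @d)/15*15, 0)
This works because the integer 0 in the example above is implicitly treated as the start of the datetime epoch (1900-01-01), and the integer division by 15 discards the remainder before multiplying back.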
You can use timestamp_trunc() and some date arithmetic:
select timestamp_add(timestamp_trunc(current_timestamp, hour),
-- div() is integer division; cast(x / 15 as int64) would round to the nearest integer instead
interval div(extract(minute from current_timestamp), 15) * 15 minute
)
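As a rough illustration of this truncate-then-add approach, the Python sketch below truncates to the hour and adds back whole quarters. The function name trunc_to_quarter_hour is hypothetical. The point to notice is that the minute must be divided with integer (floor) division; rounding to the nearest integer would push 21:11 up to 21:15.

```python
from datetime import datetime, timedelta, timezone


def trunc_to_quarter_hour(ts: datetime) -> datetime:
    """Truncate to the hour, then add back the whole quarters already elapsed."""
    hour_start = ts.replace(minute=0, second=0, microsecond=0)
    quarter_minutes = (ts.minute // 15) * 15  # floor division: 11 -> 0, 17 -> 15, 55 -> 45
    return hour_start + timedelta(minutes=quarter_minutes)


ts = datetime(2019, 7, 5, 21, 11, 28, tzinfo=timezone.utc)
print(trunc_to_quarter_hour(ts))

# Rounding instead of flooring overshoots: 11 / 15 rounds to 1, giving 15 minutes.
rounded_minutes = round(11 / 15) * 15
print(rounded_minutes)
```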
I'm trying to create a timeseries in google bigquery SQL. My data is a series of time ranges covering the period of activity for that record. Here is an example:
Start End
2020-11-01 21:04:00 UTC 2020-11-02 07:15:00 UTC
2020-11-01 21:45:00 UTC 2020-11-02 04:00:00 UTC
2020-11-01 22:00:00 UTC 2020-11-02 09:48:00 UTC
2020-11-01 22:00:00 UTC 2020-11-02 06:00:00 UTC
I wish to create a new table that totals the number of active records within each 15-minute block. "21:00:00" would, for example, cover 21:00:00 to 21:14:59. My desired output for the above would be:
Period Active_Records
2020-11-01 21:00:00 1
2020-11-01 21:15:00 1
2020-11-01 21:30:00 1
2020-11-01 21:45:00 2
2020-11-01 22:00:00 4
2020-11-01 22:15:00 4
etc until the end of the last active range.
I would also like to be able to generate this on the fly by querying a date range and having it return every 15-minute block in the range and how many active records there were in that period.
Any assistance would be greatly appreciated.
Below is for BigQuery Standard SQL
#standardSQL
select ts as period, count(1) as Active_Records
from unnest((
select generate_timestamp_array(timestamp_trunc(min(start), hour), max(`end`), interval 15 minute)
from `project.dataset.table`
)) ts
join `project.dataset.table`
on not (`end` < ts or start > timestamp_add(ts, interval 15 * 60 - 1 second))
group by ts
Applied to the sample data from your question, this returns one row per 15-minute period together with its count of active records.
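The overlap condition in the join can be checked outside BigQuery with a small Python sketch. It is only an approximation of the query's logic under stated assumptions: timestamps are naive datetimes read as UTC, and active_counts is a made-up helper name. It uses the four sample ranges from the question.

```python
from datetime import datetime, timedelta

BLOCK = timedelta(minutes=15)

# (start, end) ranges from the question, read as UTC.
records = [
    (datetime(2020, 11, 1, 21, 4), datetime(2020, 11, 2, 7, 15)),
    (datetime(2020, 11, 1, 21, 45), datetime(2020, 11, 2, 4, 0)),
    (datetime(2020, 11, 1, 22, 0), datetime(2020, 11, 2, 9, 48)),
    (datetime(2020, 11, 1, 22, 0), datetime(2020, 11, 2, 6, 0)),
]


def active_counts(records, block=BLOCK):
    """Count the records overlapping each block between the first start and last end."""
    # Start at the top of the hour of the earliest start, like TIMESTAMP_TRUNC(..., HOUR).
    period = min(start for start, _ in records).replace(minute=0, second=0, microsecond=0)
    last = max(end for _, end in records)
    counts = []
    while period <= last:
        block_end = period + block - timedelta(seconds=1)  # e.g. 21:14:59 for 21:00:00
        active = sum(
            1 for start, end in records
            if not (end < period or start > block_end)      # same overlap test as the join
        )
        if active:  # an inner join drops periods that match no record
            counts.append((period, active))
        period += block
    return counts


for period, active in active_counts(records)[:6]:
    print(period, active)
```

The first six rows printed correspond to the periods 21:00 through 22:15 listed in the question.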
Using the data in mytable:
date value
2019-07-11 02:20:00 UTC 14.99
2019-07-11 02:30:00 UTC 12.53
2019-07-11 02:40:00 UTC 12.53
2019-07-11 02:50:00 UTC 14.99
2019-07-11 03:00:00 UTC 10.07
2019-07-11 03:10:00 UTC 7.61
2019-07-11 03:20:00 UTC 7.61
2019-07-11 03:30:00 UTC 10.07
2019-07-11 03:40:00 UTC 10.07
2019-07-11 03:50:00 UTC 7.61
2019-07-11 04:00:00 UTC 7.61
2019-07-11 04:10:00 UTC 7.61
I want to output MAX(value) over the following 30 minutes IF the current row's value is > 10 and the previous row's value is < 10.
For example, if value is >10, check previous row value is <10. If this is true, output MAX(value) over 30 minutes following current row. For the table above, the first value that this would output should be 10.07
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
CASE value > 10 AND prev_value < 10
WHEN TRUE THEN
MAX(value) OVER(ORDER BY UNIX_SECONDS(ts) RANGE BETWEEN CURRENT ROW AND 1800 FOLLOWING)
ELSE NULL
END max_value_next_30_min
FROM (
SELECT *, LAG(value) OVER(ORDER BY ts) prev_value
FROM `project.dataset.table`
)
-- ORDER BY ts
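For readers who want to trace the logic by hand, here is a minimal Python sketch of the same two steps (previous value via a lag, then a maximum over the rows within the next 1800 seconds, bounds inclusive). It uses the values from the question's table; max_next_30_min is an illustrative name, not a BigQuery function.

```python
from datetime import datetime, timedelta

# Values from the question, one row every 10 minutes starting at 02:20 UTC.
values = [14.99, 12.53, 12.53, 14.99, 10.07, 7.61, 7.61, 10.07, 10.07, 7.61, 7.61, 7.61]
rows = [(datetime(2019, 7, 11, 2, 20) + timedelta(minutes=10 * i), v)
        for i, v in enumerate(values)]


def max_next_30_min(rows, window=timedelta(minutes=30)):
    """Return (ts, value, max over the following window or None) for each row."""
    rows = sorted(rows)
    result = []
    for i, (ts, value) in enumerate(rows):
        prev_value = rows[i - 1][1] if i > 0 else None  # like LAG(value) OVER (ORDER BY ts)
        if prev_value is not None and value > 10 and prev_value < 10:
            # Like RANGE BETWEEN CURRENT ROW AND 1800 FOLLOWING: both ends inclusive.
            window_max = max(v for t, v in rows if ts <= t <= ts + window)
            result.append((ts, value, window_max))
        else:
            result.append((ts, value, None))
    return result


for ts, value, window_max in max_next_30_min(rows):
    if window_max is not None:
        print(ts, value, window_max)
```

With the question's data only the 03:30 row qualifies, and its 30-minute maximum is 10.07, matching the value the question expects.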
You can test and play with the above using the sample data from your question, as in the example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT TIMESTAMP '2019-07-11 02:20:00 UTC' ts, 14.99 value UNION ALL
SELECT '2019-07-11 02:30:00 UTC', 12.53 UNION ALL
SELECT '2019-07-11 02:40:00 UTC', 12.53 UNION ALL
SELECT '2019-07-11 02:50:00 UTC', 14.99 UNION ALL
SELECT '2019-07-11 03:00:00 UTC', 10.07 UNION ALL
SELECT '2019-07-11 03:10:00 UTC', 7.61 UNION ALL
SELECT '2019-07-11 03:20:00 UTC', 7.61 UNION ALL
SELECT '2019-07-11 03:30:00 UTC', 10.07 UNION ALL
SELECT '2019-07-11 03:40:00 UTC', 10.07 UNION ALL
SELECT '2019-07-11 03:50:00 UTC', 17.61 UNION ALL
SELECT '2019-07-11 04:00:00 UTC', 7.61 UNION ALL
SELECT '2019-07-11 04:10:00 UTC', 7.61
)
SELECT *,
CASE value > 10 AND prev_value < 10
WHEN TRUE THEN
MAX(value) OVER(ORDER BY UNIX_SECONDS(ts) RANGE BETWEEN CURRENT ROW AND 1800 FOLLOWING)
ELSE NULL
END max_value_next_30_min
FROM (
SELECT *, LAG(value) OVER(ORDER BY ts) prev_value
FROM `project.dataset.table`
)
-- ORDER BY ts
with output
Row ts value prev_value max_value_next_30_min
1 2019-07-11 02:20:00 UTC 14.99 null null
2 2019-07-11 02:30:00 UTC 12.53 14.99 null
3 2019-07-11 02:40:00 UTC 12.53 12.53 null
4 2019-07-11 02:50:00 UTC 14.99 12.53 null
5 2019-07-11 03:00:00 UTC 10.07 14.99 null
6 2019-07-11 03:10:00 UTC 7.61 10.07 null
7 2019-07-11 03:20:00 UTC 7.61 7.61 null
8 2019-07-11 03:30:00 UTC 10.07 7.61 17.61
9 2019-07-11 03:40:00 UTC 10.07 10.07 null
10 2019-07-11 03:50:00 UTC 17.61 10.07 null
11 2019-07-11 04:00:00 UTC 7.61 17.61 null
12 2019-07-11 04:10:00 UTC 7.61 7.61 null
Project: BIRT
Datasource: Amazon Redshift
I want to generate a Data Set with value of:
00:00:00
1:00:00
2:00:00
3:00:00
4:00:00
5:00:00
6:00:00
7:00:00
8:00:00
9:00:00
10:00:00
11:00:00
12:00:00
13:00:00
14:00:00
15:00:00
16:00:00
17:00:00
18:00:00
19:00:00
20:00:00
21:00:00
22:00:00
23:00:00
23:59:59 //the last value should display like this
I was able to generate a series covering 24 hours at a 1-hour interval, but I need the last value to be 23:59:59.
Query to generate 24 hours with 1 hour interval:
SELECT start_date + gs * interval '1 hour' as times
FROM (
SELECT '2019-05-21 00:00:00'::timestamp as start_date, generate_series(1,24, 1) as gs)
How can I do that?
Thanks
Updating your query, just adding a CASE for the last hour:
SELECT
start_date + gs * interval '1 hour'
- CASE WHEN gs = 24 THEN interval '1 second' ELSE interval '0 second' END as times
FROM (
SELECT
'2019-05-21 00:00:00'::timestamp as start_date
, generate_series(1,24, 1) as gs
)
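The adjustment for the last value can be sketched in Python as follows. This is an illustration only: day_marks is a hypothetical helper, and it assumes the series starts at 0 rather than 1 so that 00:00:00 is included, as in the desired list from the question.

```python
from datetime import datetime, timedelta


def day_marks(start_date: datetime) -> list:
    """Hourly marks for one day, with the 24th hour pulled back by one second."""
    marks = []
    for gs in range(0, 25):          # assumption: start at 0 so 00:00:00 is included
        ts = start_date + timedelta(hours=gs)
        if gs == 24:                 # only the last entry is adjusted
            ts -= timedelta(seconds=1)
        marks.append(ts)
    return marks


for mark in day_marks(datetime(2019, 5, 21)):
    print(mark.time())
```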
I was overthinking this. The simplest way to achieve it is to add a default value on the report parameter, if you're going to use the data set in a report parameter,
or to use this:
SELECT start_date + gs * interval '1 hour' as times
FROM (
SELECT '2020-01-01 00:00:00'::timestamp as start_date, generate_series(1,24, 1) as gs)
union
select '2020-01-01 23:59:59'::timestamp as start_date
I have a bunch of historic timestamp dates. Basically, I need to simulate a new date such that the historic dates are moved to within a 48 hour window of the current date.
This is an extract of the date column:
2019-05-07 17:46:57.733 UTC
2019-05-15 13:03:25.247 UTC
2019-05-07 13:27:49.453 UTC
2019-05-11 04:24:02.293 UTC
2019-04-18 08:00:54.660 UTC
2019-04-25 05:34:36.777 UTC
2019-05-14 16:48:07.863 UTC
Assuming the current date is 2019-10-03 15:00:00, the expected range of dates is between 2019-10-01 15:00:00 and 2019-10-03 15:00:00.
The expected results should be the following.
2019-10-02 17:46:57.733 UTC
2019-10-03 13:03:25.247 UTC
2019-10-03 13:27:49.453 UTC
2019-10-03 04:24:02.293 UTC
2019-10-02 08:00:54.660 UTC
2019-10-02 05:34:36.777 UTC
2019-10-01 16:48:07.863 UTC
Why not just construct two days of random timestamps?
select timestamp_sub(current_timestamp, interval cast(rand() * (60 * 60 * 24 * 2) as int64) second)
from t
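A minimal Python sketch of the random-offset idea, with the offset subtracted so that the results fall in the 48 hours before the current time. The fixed value of now comes from the question; the seeded random generator and the name random_recent_timestamp are assumptions made for reproducibility and illustration.

```python
import random
from datetime import datetime, timedelta, timezone

WINDOW_SECONDS = 60 * 60 * 24 * 2  # two days, as in the query


def random_recent_timestamp(now: datetime, rng: random.Random) -> datetime:
    """Pick a timestamp uniformly from the 48 hours before `now`."""
    offset = int(rng.random() * WINDOW_SECONDS)  # whole seconds in [0, WINDOW_SECONDS)
    return now - timedelta(seconds=offset)


now = datetime(2019, 10, 3, 15, 0, tzinfo=timezone.utc)  # "current date" from the question
rng = random.Random(0)                                   # seeded only to make runs repeatable
simulated = [random_recent_timestamp(now, rng) for _ in range(7)]
for ts in simulated:
    print(ts)
```

Note that this discards the original time of day; it only guarantees that each simulated value lies inside the window.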
It feels like you are looking for a random date function.
CREATE TEMP FUNCTION random_date()
RETURNS DATE
AS ( DATE_SUB(CURRENT_DATE(), INTERVAL CAST(FLOOR(RAND() * 29 / 10) AS INT64) DAY));
with data as (
select "2019-05-07 17:46:57.733 UTC" as date_time UNION ALL
select "2019-05-15 13:03:25.247 UTC" UNION ALL
select "2019-05-07 13:27:49.453 UTC" UNION ALL
select "2019-05-11 04:24:02.293 UTC" UNION ALL
select "2019-04-18 08:00:54.660 UTC" UNION ALL
select "2019-04-25 05:34:36.777 UTC" UNION ALL
select "2019-05-14 16:48:07.863 UTC" )
SELECT
CONCAT(FORMAT_DATE("%Y-%m-%d", random_date()), " ", SUBSTR(date_time, 12))
FROM data;
Output:
+-----------------------------+
| f0_ |
+-----------------------------+
| 2019-10-01 17:46:57.733 UTC |
| 2019-10-01 13:03:25.247 UTC |
| 2019-10-02 13:27:49.453 UTC |
| 2019-10-03 04:24:02.293 UTC |
| 2019-10-03 08:00:54.660 UTC |
| 2019-10-03 05:34:36.777 UTC |
| 2019-10-02 16:48:07.863 UTC |
+-----------------------------+
Is there a way to generate sequential timestamps in BigQuery that is focused on hours, minutes, and seconds?
In BigQuery you can generate sequential dates by:
select *
FROM UNNEST(GENERATE_DATE_ARRAY('2016-10-18', '2016-10-19', INTERVAL 1 DAY)) as day
This will generate the dates from 2016-10-18 to 2016-10-19 in one-day intervals:
Row day
1 2016-10-18
2 2016-10-19
But let's say I want intervals in 15 minutes or 5 minutes, is there a way to do that?
First, I would recommend "starring" the feature request for GENERATE_TIMESTAMP_ARRAY to express interest in having a function like this. Given GENERATE_ARRAY, though, the best option currently is to use a query of this form:
SELECT TIMESTAMP_ADD('2018-04-01', INTERVAL 15 * x MINUTE)
FROM UNNEST(GENERATE_ARRAY(0, 13)) AS x;
If you want a minute-based GENERATE_TIMESTAMP_ARRAY equivalent, you can use a UDF like this:
CREATE TEMP FUNCTION GenerateMinuteTimestampArray(
t0 TIMESTAMP, t1 TIMESTAMP, minutes INT64) AS (
ARRAY(
SELECT TIMESTAMP_ADD(t0, INTERVAL minutes * x MINUTE)
FROM UNNEST(GENERATE_ARRAY(0, DIV(TIMESTAMP_DIFF(t1, t0, MINUTE), minutes))) AS x
)
);
SELECT ts
FROM UNNEST(GenerateMinuteTimestampArray('2018-04-01', '2018-04-01 12:00:00', 15)) AS ts;
This returns a timestamp for each 15-minute interval between midnight and 12 PM on April 1.
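The indexing behind such a function can be sketched in Python: the index x runs from 0 up to the number of whole steps that fit between t0 and t1, and each element is t0 plus x steps. The name generate_minute_timestamps is illustrative and mirrors the UDF only loosely.

```python
from datetime import datetime, timedelta, timezone


def generate_minute_timestamps(t0: datetime, t1: datetime, minutes: int) -> list:
    """Timestamps from t0 to t1 (inclusive when it lands on a step), every `minutes` minutes."""
    step = timedelta(minutes=minutes)
    whole_steps = (t1 - t0) // step  # number of whole steps that fit in the range
    return [t0 + x * step for x in range(whole_steps + 1)]


t0 = datetime(2018, 4, 1, tzinfo=timezone.utc)
t1 = datetime(2018, 4, 1, 12, 0, tzinfo=timezone.utc)
stamps = generate_minute_timestamps(t0, t1, 15)
print(len(stamps), stamps[0], stamps[-1])
```

Between midnight and 12:00 there are 48 whole 15-minute steps, so the list holds 49 timestamps including both endpoints.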
Update: You can now use the GENERATE_TIMESTAMP_ARRAY function in BigQuery. If you want to generate timestamps at intervals of 15 minutes, for example, you can use:
SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-18', '2016-10-19', INTERVAL 15 MINUTE);
Epochs seem like the way to go,
but this requires converting the date to an epoch first.
select TIMESTAMP_MICROS(CAST(day * 1000000 as INT64))
FROM UNNEST(GENERATE_ARRAY(1522540800, 1525132799, 900)) as day
Row f0_
1 2018-04-01 00:00:00.000 UTC
2 2018-04-01 00:15:00.000 UTC
3 2018-04-01 00:30:00.000 UTC
4 2018-04-01 00:45:00.000 UTC
5 2018-04-01 01:00:00.000 UTC
6 2018-04-01 01:15:00.000 UTC
7 2018-04-01 01:30:00.000 UTC
8 2018-04-01 01:45:00.000 UTC
9 2018-04-01 02:00:00.000 UTC
10 2018-04-01 02:15:00.000 UTC
11 2018-04-01 02:30:00.000 UTC
12 2018-04-01 02:45:00.000 UTC
13 2018-04-01 03:00:00.000 UTC
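To see where the numbers in that query come from, the Python sketch below rebuilds the same series from the epoch bounds and the 900-second step. The constant names are made up for the example.

```python
from datetime import datetime, timezone

START_EPOCH = 1522540800  # 2018-04-01 00:00:00 UTC in seconds since the Unix epoch
END_EPOCH = 1525132799    # 2018-04-30 23:59:59 UTC
STEP = 900                # 15 minutes in seconds

# range() excludes its stop value, so add 1 to keep the upper bound inclusive
# like GENERATE_ARRAY.
timestamps = [
    datetime.fromtimestamp(seconds, tz=timezone.utc)
    for seconds in range(START_EPOCH, END_EPOCH + 1, STEP)
]

for ts in timestamps[:4]:
    print(ts)
print(len(timestamps))
```

Thirty days at four timestamps per hour gives 2880 rows, the last one being 2018-04-30 23:45:00 UTC.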