Repeating a timestamp series multiple times in BigQuery [duplicate] - sql

I have an hourly time series running from 2017-01-01 00:00:00 to 2017-12-31 23:00:00. I need to repeat this one-year series of timestamps 2400 times in the same column. How can I do this?
Row Date_time
1 2017-01-01 00:00:00 UTC
2 2017-01-01 01:00:00 UTC
3 2017-01-01 02:00:00 UTC
4 2017-01-01 03:00:00 UTC
5 2017-01-01 04:00:00 UTC
6 2017-01-01 05:00:00 UTC
7 2017-01-01 06:00:00 UTC
8 2017-01-01 07:00:00 UTC
...........................
...........................

You would do this in BigQuery by generating a timestamp array and then unnesting:
select ts
from unnest(generate_timestamp_array('2017-01-01 00:00:00', '2017-12-31 23:00:00', interval 1 hour)) ts
To repeat every timestamp 2400 times, cross join with a generated array of 2400 numbers (BigQuery provides generate_array for this; there is no generate_series):
select ts
from unnest(generate_timestamp_array('2017-01-01 00:00:00', '2017-12-31 23:00:00', interval 1 hour)) ts
cross join unnest(generate_array(1, 2400)) n
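As a sanity check on the result size: 2017 is not a leap year, so the hourly series holds 365 × 24 = 8,760 timestamps, and repeating it 2,400 times gives 8,760 × 2,400 = 21,024,000 rows. The cross-join logic can be sketched in Python as follows. The function names are made up for illustration, and only 3 copies are generated to keep the example fast.

```python
from datetime import datetime, timedelta, timezone
from itertools import product

def hourly_series(start, end):
    """Return every hourly timestamp from start to end, both inclusive."""
    out = []
    ts = start
    while ts <= end:
        out.append(ts)
        ts += timedelta(hours=1)
    return out

def replicate_series(series, copies):
    """Mimic the cross join: one output row per (timestamp, copy number) pair."""
    return [ts for ts, _ in product(series, range(1, copies + 1))]

start = datetime(2017, 1, 1, 0, tzinfo=timezone.utc)
end = datetime(2017, 12, 31, 23, tzinfo=timezone.utc)
series = hourly_series(start, end)
rows = replicate_series(series, 3)  # 3 copies here; the question asks for 2400

print(len(series))         # number of hourly timestamps in 2017
print(len(rows))           # timestamps * copies
print(len(series) * 2400)  # row count the full query would produce
```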

Related

How to find entry that is between two dates?

I have a table as:
Id start_timestamp end_timestamp
1 2021-07-12 03:00:00 2021-07-13 11:58:05
2 2021-07-13 04:00:00 2021-07-13 05:00:00
3 2021-07-13 04:00:00 2021-07-13 09:00:00
4 2021-07-13 04:00:00 NULL
5 2020-04-10 04:00:00 2020-04-10 04:01:00
....
I want to find all records that overlap a window between two specific timestamps. Basically, I want to understand which processes ran during a peak time of the day (it doesn't matter whether they spent one second or several hours in the window; any occurrence in the window is enough).
So if the timestamps are 2021-07-13 00:00:00 to 2021-07-13 04:30:00
The query will return
1
2
3
4
How can I do that with SQL? (Preferably Presto)
This is the overlapping range problem. You may use:
SELECT *
FROM yourTable
WHERE
(end_timestamp > '2021-07-13 00:00:00' OR end_timestamp IS NULL) AND
(start_timestamp < '2021-07-13 04:30:00' OR start_timestamp IS NULL);
My answer assumes that a missing start/end timestamp value in the table logically means that this value should not be considered. This seems to be the logic you want here.
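The overlap predicate can be checked against the sample rows from the question. The sketch below runs the same WHERE clause in an in-memory SQLite database from Python; SQLite is only a stand-in here (an assumption, not Presto), and the timestamps are stored as ISO-formatted text so that string comparison orders them correctly.

```python
import sqlite3

# Sample rows from the question; None stands for the NULL end_timestamp.
rows = [
    (1, "2021-07-12 03:00:00", "2021-07-13 11:58:05"),
    (2, "2021-07-13 04:00:00", "2021-07-13 05:00:00"),
    (3, "2021-07-13 04:00:00", "2021-07-13 09:00:00"),
    (4, "2021-07-13 04:00:00", None),
    (5, "2020-04-10 04:00:00", "2020-04-10 04:01:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (Id INTEGER, start_timestamp TEXT, end_timestamp TEXT)"
)
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)

# A record overlaps the window when it ends after the window starts
# and starts before the window ends; a NULL bound is treated as open-ended.
query = """
    SELECT Id
    FROM yourTable
    WHERE (end_timestamp > :win_start OR end_timestamp IS NULL)
      AND (start_timestamp < :win_end OR start_timestamp IS NULL)
    ORDER BY Id
"""
params = {"win_start": "2021-07-13 00:00:00", "win_end": "2021-07-13 04:30:00"}
matching_ids = [r[0] for r in conn.execute(query, params)]
print(matching_ids)
```

Record 5 is excluded because it ended in 2020, long before the window starts, while record 4 is kept because its missing end is treated as still running.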

Having trouble joining information from two tables using timestamps

I have two tables in BigQuery:
A - Has the exact start and end time of the processes
B - It has the cost per hour of several products consumed by the processes
I need to calculate an estimate of the cost of each process (table A) using the data in table B. I thought of doing this by summing the cost of all products (table B) included in the time period consumed by the process in table A.
So, here is some fake data for the two tables and the desired output:
Process metadata (Table A)
process_name timestamp_init timestamp_end
a 2021-04-01 11:15:44.888153 UTC 2021-04-01 12:25:44.888153 UTC
b 2021-04-01 13:50:17.033498 UTC 2021-04-01 14:50:17.033498 UTC
c 2008-04-02 20:19:36.983747 UTC 2008-04-02 20:58:20.983747 UTC
d 2010-04-02 22:06:10.348753 UTC 2010-04-02 23:08:28.348753 UTC
Platform costs (Table B)
product usage_start_time usage_end_time cost
ax 2021-04-01 11:00:00 UTC 2021-04-01 12:00:00 UTC 10
b4 2021-04-01 11:00:00 UTC 2021-04-01 12:00:00 UTC 9
cf 2021-04-01 11:00:00 UTC 2021-04-01 12:00:00 UTC 25
jw 2021-04-01 14:00:00 UTC 2021-04-01 15:00:00 UTC 125
ki 2021-04-01 20:00:00 UTC 2021-04-01 21:00:00 UTC 180
fr 2021-04-01 22:00:00 UTC 2021-04-01 23:00:00 UTC 250
Desired Results
process_name total_cost
a 44
b 125
c 180
d 250
I developed the following code:
SELECT a.process_name,
SUM(b.cost) as total_cost
FROM A a,
B b
WHERE b.usage_start_time >= timestamp_trunc(timestamp_add(a.timestamp_init, interval 30 minute), hour)
AND b.usage_end_time <= timestamp_trunc(timestamp_add(a.timestamp_end, interval 30 minute), hour)
GROUP BY a.process_name
Note that I'm rounding the timestamps from table A so it matches the format of table B.
But for some reason I don't know, it is not returning any results. What am I doing wrong?
I'm not sure where the 30 minutes is coming from. The logic for an overlap would be:
SELECT a.process_name,
SUM(b.cost) as total_cost
FROM A a JOIN
B b
ON b.usage_start_time < a.timestamp_end AND
b.usage_end_time >= a.timestamp_init
GROUP BY a.process_name
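To see what the overlap join returns on the fake data from the question, here is a small Python sketch that runs it in an in-memory SQLite database. SQLite and the table names `table_a` and `table_b` are assumptions made for the example (they stand in for BigQuery tables A and B), and the UTC suffix is dropped so the text timestamps compare correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_a (process_name TEXT, timestamp_init TEXT, timestamp_end TEXT)"
)
conn.execute(
    "CREATE TABLE table_b (product TEXT, usage_start_time TEXT, usage_end_time TEXT, cost INTEGER)"
)
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)", [
    ("a", "2021-04-01 11:15:44.888153", "2021-04-01 12:25:44.888153"),
    ("b", "2021-04-01 13:50:17.033498", "2021-04-01 14:50:17.033498"),
    ("c", "2008-04-02 20:19:36.983747", "2008-04-02 20:58:20.983747"),
    ("d", "2010-04-02 22:06:10.348753", "2010-04-02 23:08:28.348753"),
])
conn.executemany("INSERT INTO table_b VALUES (?, ?, ?, ?)", [
    ("ax", "2021-04-01 11:00:00", "2021-04-01 12:00:00", 10),
    ("b4", "2021-04-01 11:00:00", "2021-04-01 12:00:00", 9),
    ("cf", "2021-04-01 11:00:00", "2021-04-01 12:00:00", 25),
    ("jw", "2021-04-01 14:00:00", "2021-04-01 15:00:00", 125),
    ("ki", "2021-04-01 20:00:00", "2021-04-01 21:00:00", 180),
    ("fr", "2021-04-01 22:00:00", "2021-04-01 23:00:00", 250),
])

# Sum the cost of every usage row whose interval overlaps the process interval.
totals = conn.execute("""
    SELECT a.process_name, SUM(b.cost) AS total_cost
    FROM table_a a
    JOIN table_b b
      ON b.usage_start_time < a.timestamp_end
     AND b.usage_end_time >= a.timestamp_init
    GROUP BY a.process_name
    ORDER BY a.process_name
""").fetchall()
print(totals)
```

Processes a and b get the desired totals. Processes c and d do not appear at all, because their 2008 and 2010 timestamps overlap none of the 2021 cost rows in the sample, so the desired 180 and 250 cannot come from overlap logic on this data as given.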

Google Bigquery - Create time series of number of active records

I'm trying to create a timeseries in google bigquery SQL. My data is a series of time ranges covering the period of activity for that record. Here is an example:
Start End
2020-11-01 21:04:00 UTC 2020-11-02 07:15:00 UTC
2020-11-01 21:45:00 UTC 2020-11-02 04:00:00 UTC
2020-11-01 22:00:00 UTC 2020-11-02 09:48:00 UTC
2020-11-01 22:00:00 UTC 2020-11-02 06:00:00 UTC
I wish to create a new table that totals the number of active records within each 15-minute block. "21:00:00" would, for example, cover 21:00:00 to 21:14:59. My desired output for the above would be:
Period Active_Records
2020-11-01 21:00:00 1
2020-11-01 21:15:00 1
2020-11-01 21:30:00 1
2020-11-01 21:45:00 2
2020-11-01 22:00:00 4
2020-11-01 22:15:00 4
etc until the end of the last active range.
I would also like to be able to generate this on the fly by querying a date range and having it return every 15-minute block in the range and how many active records there were in that period.
Any assistance would be greatly appreciated.
Below is for BigQuery Standard SQL
#standardSQL
select ts as period, count(1) as Active_Records
from unnest((
select generate_timestamp_array(timestamp_trunc(min(start), hour), max(`end`), interval 15 minute)
from `project.dataset.table`
)) ts
join `project.dataset.table`
on not (`end` < ts or start > timestamp_add(ts, interval 15 * 60 - 1 second))
group by ts
When applied to the sample data from your question, the output is: (output image not preserved)
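The counting logic of the query can be reproduced on the four sample rows with a short Python sketch. The function name `active_counts` is hypothetical; it applies the same rule as the join condition: a record counts for a block unless it ended before the block starts or started after the block's last second.

```python
from datetime import datetime, timedelta

# Sample (start, end) ranges from the question, all UTC.
records = [
    (datetime(2020, 11, 1, 21, 4), datetime(2020, 11, 2, 7, 15)),
    (datetime(2020, 11, 1, 21, 45), datetime(2020, 11, 2, 4, 0)),
    (datetime(2020, 11, 1, 22, 0), datetime(2020, 11, 2, 9, 48)),
    (datetime(2020, 11, 1, 22, 0), datetime(2020, 11, 2, 6, 0)),
]

def active_counts(records, block=timedelta(minutes=15)):
    """Return (block_start, active_count) pairs, skipping empty blocks
    just as the inner join in the SQL does."""
    first = min(s for s, _ in records).replace(minute=0, second=0, microsecond=0)
    last = max(e for _, e in records)
    counts = []
    ts = first
    while ts <= last:
        block_end = ts + block - timedelta(seconds=1)  # e.g. 21:14:59 for 21:00:00
        n = sum(1 for s, e in records if not (e < ts or s > block_end))
        if n:
            counts.append((ts, n))
        ts += block
    return counts

counts = active_counts(records)
for period, n in counts[:6]:
    print(period, n)
```

The first six blocks match the desired output in the question (1, 1, 1, 2, 4, 4).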

BigQuery - A way to generate timestamps based on hour/minute/seconds?

Is there a way to generate sequential timestamps in BigQuery that is focused on hours, minutes, and seconds?
In BigQuery you can generate sequential dates by:
select *
FROM UNNEST(GENERATE_DATE_ARRAY('2016-10-18', '2016-10-19', INTERVAL 1 DAY)) as day
This will generate the dates from 2016-10-18 to 2016-10-19 in one-day intervals:
Row day
1 2016-10-18
2 2016-10-19
But let's say I want intervals in 15 minutes or 5 minutes, is there a way to do that?
First, I would recommend "starring" the feature request for GENERATE_TIMESTAMP_ARRAY to express interest in having a function like this. Given GENERATE_ARRAY, though, the best option currently is to use a query of this form:
SELECT TIMESTAMP_ADD('2018-04-01', INTERVAL 15 * x MINUTE)
FROM UNNEST(GENERATE_ARRAY(0, 13)) AS x;
If you want a minute-based GENERATE_TIMESTAMP_ARRAY equivalent, you can use a UDF like this:
CREATE TEMP FUNCTION GenerateMinuteTimestampArray(
t0 TIMESTAMP, t1 TIMESTAMP, minutes INT64) AS (
ARRAY(
-- x counts whole steps of `minutes` between t0 and t1, so the last value never exceeds t1
SELECT TIMESTAMP_ADD(t0, INTERVAL minutes * x MINUTE)
FROM UNNEST(GENERATE_ARRAY(0, DIV(TIMESTAMP_DIFF(t1, t0, MINUTE), minutes))) AS x
)
);
SELECT ts
FROM UNNEST(GenerateMinuteTimestampArray('2018-04-01', '2018-04-01 12:00:00', 15)) AS ts;
This returns a timestamp for each 15-minute interval between midnight and 12 PM on April 1.
Update: You can now use the GENERATE_TIMESTAMP_ARRAY function in BigQuery. If you want to generate timestamps at intervals of 15 minutes, for example, you can use:
SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-18', '2016-10-19', INTERVAL 15 MINUTE);
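For reference, the expected behaviour of such a generator can be sketched in Python. The function name `generate_minute_timestamps` is made up for this example; the step index runs from 0 to the number of whole `minutes`-sized steps that fit between the two endpoints, so the last value never goes past `t1`.

```python
from datetime import datetime, timedelta, timezone

def generate_minute_timestamps(t0, t1, minutes):
    """Return timestamps from t0 up to at most t1, spaced `minutes` apart."""
    total_minutes = int((t1 - t0).total_seconds() // 60)
    steps = total_minutes // minutes  # whole steps that fit in the range
    return [t0 + timedelta(minutes=minutes * x) for x in range(steps + 1)]

t0 = datetime(2018, 4, 1, 0, 0, tzinfo=timezone.utc)
t1 = datetime(2018, 4, 1, 12, 0, tzinfo=timezone.utc)
stamps = generate_minute_timestamps(t0, t1, 15)

print(len(stamps))            # 12 hours * 4 per hour, plus the inclusive end
print(stamps[0], stamps[-1])  # first and last generated timestamps
```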
Epochs seem like the way to go, but this requires converting the dates to epoch seconds first.
select TIMESTAMP_MICROS(CAST(day * 1000000 as INT64))
FROM UNNEST(GENERATE_ARRAY(1522540800, 1525132799, 900)) as day
Row f0_
1 2018-04-01 00:00:00.000 UTC
2 2018-04-01 00:15:00.000 UTC
3 2018-04-01 00:30:00.000 UTC
4 2018-04-01 00:45:00.000 UTC
5 2018-04-01 01:00:00.000 UTC
6 2018-04-01 01:15:00.000 UTC
7 2018-04-01 01:30:00.000 UTC
8 2018-04-01 01:45:00.000 UTC
9 2018-04-01 02:00:00.000 UTC
10 2018-04-01 02:15:00.000 UTC
11 2018-04-01 02:30:00.000 UTC
12 2018-04-01 02:45:00.000 UTC
13 2018-04-01 03:00:00.000 UTC
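The epoch bounds used above can be verified with a few lines of Python: 1522540800 is 2018-04-01 00:00:00 UTC, 1525132799 is the last second of April 2018, and a step of 900 seconds is 15 minutes. This is only a cross-check of the arithmetic, not BigQuery code.

```python
from datetime import datetime, timezone

start_epoch = 1522540800  # 2018-04-01 00:00:00 UTC
end_epoch = 1525132799    # 2018-04-30 23:59:59 UTC
step_seconds = 900        # 15 minutes

# range() excludes its stop value, so add 1 to mimic the inclusive GENERATE_ARRAY.
timestamps = [
    datetime.fromtimestamp(s, tz=timezone.utc)
    for s in range(start_epoch, end_epoch + 1, step_seconds)
]

print(timestamps[0])    # first generated timestamp
print(timestamps[-1])   # last generated timestamp
print(len(timestamps))  # 30 days * 96 quarter-hours per day
```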

How to group by hour in HANA

I have the following table in HANA :
vehicle_id time roaming_time parking_time
1 Sep 01,2016 3:09:03 AM 3 9
2 Sep 01,2016 3:12:03 AM 6 8
1 Sep 01,2016 9:10:03 AM 10 6
4 Sep 01,2016 10:09:03 AM 9 3
1 Sep 01,2016 10:10:03 AM 10 10
4 Sep 01,2016 12:09:03 AM 3 9
From this information I want to know the sum of roaming_time and the sum of parking_time for each hour across all vehicles, with the output in this format:
time roaming_time parking_time
____ _____________ ____________
2016-09-01 00:00:00 3 9
2016-09-01 01:00:00 6 8
2016-09-01 02:00:00 9 6
2016-09-01 03:00:00 3 6
2016-09-01 04:00:00 12 3
2016-09-01 05:00:00 15 8
2016-09-01 06:00:00 18 4
2016-09-01 07:00:00 8 3
2016-09-01 08:00:00 9 4
2016-09-01 09:00:00 6 6
2016-09-01 10:00:00 6 9
........
2016-09-01 23:00:00 3 12
I need to group the following query so that it returns the sums per hour and produces the expected result:
select sum(roaming_time) as roaming_time, sum(parking_time) as parking_time
from t
where time >= '2016-09-01 00:00:00'
and time <= '2016-09-01 23:59:59'
I do not know how to group by hour in HANA. Any help is appreciated.
Here is one method . . . it converts the time to a date and hour format:
select to_varchar(time, 'YYYY-MM-DD'), hour(time),
sum(roaming_time) as roaming_time, sum(parking_time) as parking_time from t
group by to_varchar(time, 'YYYY-MM-DD'), hour(time)
order by to_varchar(time, 'YYYY-MM-DD'), hour(time);
Use a group by clause with SERIES_ROUND(). Avoid date(), hour() and similar date/time functions on large data sets, as they tend to be slower. By default SERIES_ROUND rounds to the nearest interval boundary, so ROUND_DOWN is passed here to truncate each value to the start of its hour.
select SERIES_ROUND(time, 'INTERVAL 1 HOUR', ROUND_DOWN) as time,
sum(roaming_time) as roaming_time, sum(parking_time) as parking_time from t
group by SERIES_ROUND(time, 'INTERVAL 1 HOUR', ROUND_DOWN)
order by SERIES_ROUND(time, 'INTERVAL 1 HOUR', ROUND_DOWN);
Another approach is to convert it to a string, especially if no further time calculations are required.
This could look like this:
select to_varchar(time, 'DD.MM.YYYY HH24') as parking_hour,
sum(roaming_time) as roaming_time, sum(parking_time) as parking_time from t
group by to_varchar(time, 'DD.MM.YYYY HH24')
order by to_varchar(time, 'DD.MM.YYYY HH24');
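Whichever expression is used, the idea is the same: map every timestamp to the start of its hour and sum within each bucket. A minimal Python sketch of that logic on the six sample rows from the question (the function name `sum_by_hour` is hypothetical, and "12:09:03 AM" is read as nine minutes past midnight):

```python
from collections import defaultdict
from datetime import datetime

# (vehicle_id, time, roaming_time, parking_time) from the question's table.
rows = [
    (1, datetime(2016, 9, 1, 3, 9, 3), 3, 9),
    (2, datetime(2016, 9, 1, 3, 12, 3), 6, 8),
    (1, datetime(2016, 9, 1, 9, 10, 3), 10, 6),
    (4, datetime(2016, 9, 1, 10, 9, 3), 9, 3),
    (1, datetime(2016, 9, 1, 10, 10, 3), 10, 10),
    (4, datetime(2016, 9, 1, 0, 9, 3), 3, 9),
]

def sum_by_hour(rows):
    """Sum roaming_time and parking_time per hour, keyed by the hour's start."""
    totals = defaultdict(lambda: [0, 0])
    for _vehicle, ts, roaming, parking in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        totals[hour][0] += roaming
        totals[hour][1] += parking
    return {hour: tuple(sums) for hour, sums in sorted(totals.items())}

hourly = sum_by_hour(rows)
for hour, (roaming, parking) in hourly.items():
    print(hour, roaming, parking)
```

Only hours that contain at least one row appear in the result, which is also how the GROUP BY queries behave; filling the empty hours would need a generated series of hours to join against.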