Add TIME to DATETIME value - google-bigquery

I can see from the documentation that DATETIME_ADD only accepts INT64 interval values, not TIME objects.
I have a DATETIME that represents the starting point, and a duration stored in a TIME object:
WITH input AS (
SELECT
DATE(2018,03,05) AS start_date,
TIME(5,0,0) AS start_time,
TIME(8,0,0) AS duration
)
SELECT
*,
DATETIME(start_date,start_time) AS start_datetime,
DATETIME_ADD(
DATETIME_ADD(
DATETIME_ADD(
DATETIME(start_date,start_time),
INTERVAL EXTRACT(HOUR FROM duration) HOUR
),
INTERVAL EXTRACT(MINUTE FROM duration) MINUTE
),
INTERVAL EXTRACT(SECOND FROM duration) SECOND
) AS end_datetime
FROM input
Is there a nicer way to add the three components of the TIME object (hours, minutes, seconds) to the given DATETIME?

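Below is for BigQuery Standard SQL. The last column, end_datetime_nicer_way, converts the whole duration to seconds with DATETIME_DIFF and adds it in a single DATETIME_ADD call: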
WITH input AS (
SELECT
DATE(2018,03,05) AS start_date,
TIME(5,0,0) AS start_time,
TIME(8,0,0) AS duration
)
SELECT
*,
DATETIME(start_date,start_time) AS start_datetime,
DATETIME_ADD(
DATETIME_ADD(
DATETIME_ADD(
DATETIME(start_date,start_time),
INTERVAL EXTRACT(HOUR FROM duration) HOUR
),
INTERVAL EXTRACT(MINUTE FROM duration) MINUTE
),
INTERVAL EXTRACT(SECOND FROM duration) SECOND
) AS end_datetime,
DATETIME_ADD(
DATETIME(start_date,start_time),
INTERVAL DATETIME_DIFF(DATETIME(start_date,duration), DATETIME(start_date), SECOND) SECOND
) end_datetime_nicer_way
FROM input
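The same idea can be written without building an intermediate DATETIME from the duration. The following is a minimal sketch, assuming the duration is always below 24 hours (a TIME cannot hold more) and that whole-second precision is enough; the column values are made up for illustration. It uses TIME_DIFF against midnight to turn the TIME into a number of seconds.

```sql
-- Sketch: add a TIME-typed duration to a DATETIME in one step.
-- The sample values below are illustrative, not from the original post.
WITH input AS (
  SELECT
    DATETIME '2018-03-05 05:00:00' AS start_datetime,
    TIME '08:30:15' AS duration
)
SELECT
  start_datetime,
  duration,
  DATETIME_ADD(
    start_datetime,
    -- seconds elapsed between midnight and the TIME value
    INTERVAL TIME_DIFF(duration, TIME '00:00:00', SECOND) SECOND
  ) AS end_datetime  -- 2018-03-05T13:30:15
FROM input
```

If the duration carries fractional seconds, replacing both SECOND keywords with MICROSECOND should preserve them.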

Related

Aggregating length of time intervals and grouping to fixed time grid

I have some data consisting of shifts, logging the time periods taken as breaks during the shift.
start_ts              end_ts                shift_id
2022-01-01T08:31:37Z  2022-01-01T08:58:37Z  1
2022-01-01T08:37:37Z  2022-01-01T09:03:37Z  2
2022-01-01T08:46:37Z  2022-01-01T08:48:37Z  3
I want to map this data to a 15-minute grid, counting how many seconds in total (not per shift) are spent on break during that interval. A solution would look like this:
start_time            end_time              total_break_seconds
2022-01-01T08:30:00Z  2022-01-01T08:45:00Z  1246
2022-01-01T08:45:00Z  2022-01-01T09:00:00Z  1837
2022-01-01T09:00:00Z  2022-01-01T09:15:00Z  217
I know this is a gaps-and-islands style problem, but I'm not sure how to combine it with the mapping to a time grid. I've looked at using UNIX_SECONDS (time-to-epoch) to derive the 15-minute intervals, but can't work out how. I'll be working with pretty large tables, so ideally I would do as much work as possible before expanding each time interval to the 15-minute grid, but all solutions are welcome.
I'm working in BigQuery.
Here's a reproducible example to start with:
SELECT
TIMESTAMP("2022-01-01 08:31:37") AS start_ts,
TIMESTAMP("2022-01-01 08:58:37") AS end_ts,
1 as shift_id
UNION ALL (
SELECT
TIMESTAMP("2022-01-01 08:37:37") AS start_ts,
TIMESTAMP("2022-01-01 09:03:37") AS end_ts,
2 as shift_id
)
UNION ALL (
SELECT
TIMESTAMP("2022-01-01 08:46:37") AS start_ts,
TIMESTAMP("2022-01-01 08:48:37") AS end_ts,
3 as shift_id
)
Consider below
with grid as (
select start_time, timestamp_sub(timestamp_add(start_time, interval 15 minute), interval 1 second) end_time
from (
select max(end_ts) max_end,
timestamp_trunc(min(start_ts), hour) min_start
from your_table
), unnest(generate_timestamp_array(min_start, max_end, interval 15 minute)) start_time
), seconds as (
select ts from your_table,
unnest(generate_timestamp_array(start_ts, timestamp_sub(end_ts, interval 1 second), interval 1 second)) ts # this is the line with fix
)
select start_time, end_time, count(*) total_break_seconds
from grid
join seconds
on ts between start_time and end_time
group by start_time, end_time
If applied to the sample data in your question, the output is
With the query below:
WITH breaks AS (
SELECT *,
CASE
-- for the starting break (when start_ts and end_ts fall in the same break)
WHEN break <= start_ts AND end_ts < break + INTERVAL 15 MINUTE THEN TIMESTAMP_DIFF(end_ts, start_ts, SECOND)
WHEN break <= start_ts THEN 900 - TIMESTAMP_DIFF(start_ts, break, SECOND)
-- for remaining breaks (considering full break + partial break)
ELSE IF(DIV(diff, 900) > 0 AND break + INTERVAL 15 MINUTE < end_ts, 900, MOD(diff, 900))
END AS elapsed
FROM sample,
UNNEST(GENERATE_TIMESTAMP_ARRAY(
TIMESTAMP_TRUNC(start_ts, HOUR), TIMESTAMP_TRUNC(end_ts, HOUR) + INTERVAL 1 HOUR, INTERVAL 15 MINUTE
)) break,
UNNEST([TIMESTAMP_DIFF(end_ts, break, SECOND)]) diff
WHERE break + INTERVAL 15 MINUTE >= start_ts AND break < end_ts
)
SELECT break AS start_time, break + INTERVAL 15 MINUTE AS end_time, SUM(elapsed) total_break_seconds
FROM breaks
GROUP BY 1 ORDER BY 1;
Output will be:
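The per-slot totals can also be computed without expanding every break into individual seconds, which may matter on large tables. The following is a minimal sketch, not taken from either answer: each break is joined only to the 15-minute slots it touches, and the overlap is measured with LEAST and GREATEST. The CTE name sample and the alias slot_start are illustrative.

```sql
-- Sketch: overlap of each break with each 15-minute slot it touches.
WITH sample AS (
  SELECT TIMESTAMP '2022-01-01 08:31:37' AS start_ts,
         TIMESTAMP '2022-01-01 08:58:37' AS end_ts, 1 AS shift_id
  UNION ALL
  SELECT TIMESTAMP '2022-01-01 08:37:37', TIMESTAMP '2022-01-01 09:03:37', 2
  UNION ALL
  SELECT TIMESTAMP '2022-01-01 08:46:37', TIMESTAMP '2022-01-01 08:48:37', 3
)
SELECT
  slot_start AS start_time,
  TIMESTAMP_ADD(slot_start, INTERVAL 15 MINUTE) AS end_time,
  SUM(TIMESTAMP_DIFF(
    LEAST(end_ts, TIMESTAMP_ADD(slot_start, INTERVAL 15 MINUTE)),  -- overlap end
    GREATEST(start_ts, slot_start),                                -- overlap start
    SECOND)) AS total_break_seconds
FROM sample,
  UNNEST(GENERATE_TIMESTAMP_ARRAY(
    -- floor start_ts to the 15-minute (900 second) grid
    TIMESTAMP_SECONDS(DIV(UNIX_SECONDS(start_ts), 900) * 900),
    end_ts,
    INTERVAL 15 MINUTE)) AS slot_start
WHERE slot_start < end_ts  -- drop a slot that begins exactly when the break ends
GROUP BY slot_start
ORDER BY slot_start
```

Worked by hand on the three sample rows, this gives 1246, 1837 and 217 seconds for the slots starting at 08:30, 08:45 and 09:00, matching the totals requested in the question. Each break produces one row per slot it spans rather than one row per second.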

How to calculate average time when it is used TIME format in Bigquery?

I'm trying to get the average time, but the AVG function does not support the TIME type. I tried the CAST function, as explained in some posts, but it doesn't seem to work either. Thanks
WITH october_fall AS
(SELECT
start_station_name,
end_station_name,
start_station_id,
end_station_id,
EXTRACT (DATE FROM started_at) AS start_date,
EXTRACT(DAYOFWEEK FROM started_at) AS start_week_date,
EXTRACT (TIME FROM started_at) AS start_time,
EXTRACT (DATE FROM ended_at) AS end_date,
EXTRACT(DAYOFWEEK FROM ended_at) AS end_week_date,
EXTRACT (TIME FROM ended_at) AS end_time,
DATETIME_DIFF (ended_at,started_at, MINUTE) AS total_lenght,
member_casual
FROM
`ciclystic.cyclistic_seasonal_analysis.fall_202010` AS fall_analysis
ORDER BY
started_at DESC)
SELECT
COUNT (start_week_date) AS avg_start_1,
AVG (start_time) AS avg_start_time_1, ## this is where the problem starts
member_casual
FROM
october_fall
WHERE
start_week_date = 1
GROUP BY
member_casual
Try below
SELECT
COUNT (start_week_date) AS avg_start_1,
TIME(
EXTRACT(hour FROM AVG(start_time - '0:0:0')),
EXTRACT(minute FROM AVG(start_time - '0:0:0')),
EXTRACT(second FROM AVG(start_time - '0:0:0'))
) as avg_start_time_1,
member_casual
FROM
october_fall
WHERE
start_week_date = 1
GROUP BY
member_casual
Another option would be
SELECT
COUNT (start_week_date) AS avg_start_1,
PARSE_TIME('0-0 0 %H:%M:%E*S', '' || AVG(start_time - '0:0:0')) as avg_start_time_1,
member_casual
FROM
october_fall
WHERE
start_week_date = 1
GROUP BY
member_casual
BigQuery cannot calculate AVG on the TIME type, so you get an error message if you try.
Instead, you can calculate the average on numeric values.
In the example below, time_ts is a TIMESTAMP column.
I use time_diff to get the difference in seconds between each time and "00:00:00", average those values (AVG returns FLOAT64), and cast the result to INT64.
I also create a function, secondToTime, which splits the seconds into hours, minutes, and seconds and parses the result back into a TIME.
For the DATE type, I think you could do it in the same way.
create temp function secondToTime (seconds INT64)
returns time
as (
PARSE_TIME (
"%H:%M:%S",
concat(
div(seconds, 3600), -- whole hours; div truncates, whereas cast(x / 3600 as int) rounds
":",
div(mod(seconds, 3600), 60), -- whole minutes
":",
mod(seconds, 60) -- remaining seconds
)
)
);
with october_fall as (
select
extract (date from time_ts) as start_date,
extract (time from time_ts) as start_time
from `bigquery-public-data.hacker_news.comments`
limit 10
) SELECT
avg(time_diff(start_time, time '00:00:00', second)),
secondToTime(
cast(avg(time_diff(start_time, time '00:00:00', second)) as INT64)
),
secondToTime(0),
secondToTime(60),
secondToTime(3601),
secondToTime(7265)
FROM october_fall
I know a few months have passed, but maybe someone else will be facing the same issue.
As for the section where the problem occurred, something like this worked for me and gave the average ride_length:
FORMAT_TIMESTAMP
('%T',
TIMESTAMP_SECONDS(CAST(AVG(TIME_DIFF(ride_length, '00:00:00', SECOND)) AS
INT64)))
AS avg_ride_length
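The approaches above share one idea: convert each TIME to seconds since midnight, average the numbers, and convert back. A minimal sketch of that idea that keeps the result as a TIME value, using TIME_ADD instead of string parsing, could look like this. The rides CTE and its values are made up for illustration.

```sql
-- Sketch: average a TIME column via seconds since midnight.
WITH rides AS (
  SELECT TIME '08:00:00' AS start_time UNION ALL
  SELECT TIME '09:30:00' UNION ALL
  SELECT TIME '10:00:10'
)
SELECT
  TIME_ADD(
    TIME '00:00:00',
    -- AVG returns FLOAT64; CAST rounds it to the nearest whole second
    INTERVAL CAST(AVG(TIME_DIFF(start_time, TIME '00:00:00', SECOND)) AS INT64) SECOND
  ) AS avg_start_time  -- 09:10:03
FROM rides
```

Note that this is a plain arithmetic mean of clock times, so values on either side of midnight (for example 23:50 and 00:10) average to around noon rather than midnight.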

How to subtract 2 timestamp (one as a date, other in HH24:MI format) and get the result in HH24:MI format in OracleSQL

I have a table with two columns, Start_time and SLA.
Start_time is updated each day and is a date, e.g., 01-Jun-2021 19:15:38.
The SLA column holds a fixed value in HH24MI format, 2010.
I want 1915 - 2010 to come out as -00:55 (that is, in HH24:MI format).
SELECT TO_CHAR((TO_CHAR(START_TIME,'HH24')||TO_CHAR(START_TIME,'MI'))-2010,'0000')
FROM DUAL;
The above gives -0095, but I want it to be -00:55.
Store the start_time as a DATE data type and the sla as an INTERVAL DAY TO SECOND data type:
CREATE TABLE table_name (
start_time DATE,
sla INTERVAL DAY TO SECOND
);
Then your data would be:
INSERT INTO table_name ( start_time, sla ) VALUES (
TO_DATE('01-Jun-2021 19:15:38', 'DD-MON-YYYY HH24:MI:SS', 'NLS_DATE_LANGUAGE=American'),
INTERVAL '20:10:00' HOUR TO SECOND
);
And, to find the difference, you can use:
SELECT start_time,
sla,
(start_time - TRUNC(start_time)) DAY TO SECOND - sla AS difference
FROM table_name
Which outputs:
START_TIME           SLA                  DIFFERENCE
2021-06-01 19:15:38  +00 20:10:00.000000  -000000000 00:54:22.000000000
If you want the output as a formatted string, rather than as an interval, then:
SELECT start_time,
sla,
CASE WHEN difference < INTERVAL '0' HOUR THEN '-' END
|| TO_CHAR( ABS( EXTRACT( HOUR FROM difference ) ), 'FM00' )
|| TO_CHAR( ABS( EXTRACT( MINUTE FROM difference ) ), 'FM00' )
AS difference
FROM (
SELECT start_time,
sla,
(start_time - TRUNC(start_time)) DAY TO SECOND - sla AS difference
FROM table_name
)
Which outputs:
START_TIME           SLA                  DIFFERENCE
2021-06-01 19:15:38  +00 20:10:00.000000  -0054

BigQuery: extract SECOND from TIMESTAMP

How can I run this query?
Error Message: No matching signature for function EXTRACT for argument types: DATE_TIME_PART FROM INT64. Supported signatures: EXTRACT(DATE_TIME_PART FROM DATE); EXTRACT(DATE_TIME_PART FROM TIMESTAMP [AT TIME ZONE STRING]); EXTRACT(DATE_TIME_PART FROM DATETIME); EXTRACT(DATE_TIME_PART FROM TIME) at [12:12]
Both of the following WHERE clauses give the same error message:
WHERE EXTRACT( SECOND FROM event_timestamp )
- EXTRACT( SECOND FROM last_event) >= (60 * 10)
OR last_event IS NULL
WHERE EXTRACT( SECOND FROM event_timestamp AT TIME ZONE "UTC")
- EXTRACT( SECOND FROM last_event AT TIME ZONE "UTC") >= (60 * 10)
OR last_event IS NULL
Use TIMESTAMP_MICROS() to convert the INT64 microsecond value to a TIMESTAMP before calling EXTRACT:
WHERE EXTRACT( SECOND FROM TIMESTAMP_MICROS(event_timestamp))
- EXTRACT( SECOND FROM last_event) >= (60 * 10)
OR last_event IS NULL
If you want events that are more than 10 minutes from the previous timestamp, just use some arithmetic and comparisons:
where event_timestamp > last_event + (60 * 10 * 1000000) or
last_event is null
You are storing the timestamp as a microseconds value. You don't need to convert to another type.
If you really wanted to convert this to timestamp values, you could use:
where timestamp_micros(event_timestamp) > timestamp_add(timestamp_micros(last_event), interval 10 minute) or
last_event is null
In particular, you don't want to extract seconds. That value is always going to be between 0 and 59.
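To make the difference concrete, here is a minimal sketch, assuming both event_timestamp and last_event are INT64 microseconds since the Unix epoch, as the error message suggests. The events CTE and its values are made up. It compares the elapsed gap from TIMESTAMP_DIFF with the second-of-minute value that EXTRACT returns.

```sql
-- Sketch: elapsed time between events versus EXTRACT(SECOND ...).
WITH events AS (
  SELECT 1700000000000000 AS event_timestamp, CAST(NULL AS INT64) AS last_event
  UNION ALL
  SELECT 1700000060000000, 1700000000000000  -- 60 seconds after the previous event
  UNION ALL
  SELECT 1700000900000000, 1700000060000000  -- 840 seconds after the previous event
)
SELECT
  event_timestamp,
  -- true elapsed seconds between the two events
  TIMESTAMP_DIFF(TIMESTAMP_MICROS(event_timestamp),
                 TIMESTAMP_MICROS(last_event), SECOND) AS gap_seconds,
  -- only the seconds field of the clock time, always 0 to 59
  EXTRACT(SECOND FROM TIMESTAMP_MICROS(event_timestamp)) AS second_of_minute
FROM events
WHERE last_event IS NULL
   OR TIMESTAMP_DIFF(TIMESTAMP_MICROS(event_timestamp),
                     TIMESTAMP_MICROS(last_event), SECOND) >= 60 * 10
```

Only the first row (no previous event) and the third row (840-second gap) pass the filter. A comparison built on second_of_minute could never reach 600.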

Select Data From Multiple Days Between Certain Times (Spanning 2 days)

I need to know how many entries appear in my DB for the past 7 days with a timestamp between 23:00 and 01:00.
The issue I have is that the time window spans two days, and I'm unsure whether this is even possible in one query.
So far I have come up with the below:
select trunc(timestamp) as DTE, extract(hour from timestamp) as HR, count(COLUMN) as Total
from TABLE
where trunc(timestamp) >= '12-NOV-19' and
extract(hour from timestamp) in ('23','00','01')
group by trunc(timestamp), extract(hour from timestamp)
order by 1,2 desc;
The result I am hoping for is something like this:
DTE       | Total
20-NOV-19 | 5
19-NOV-19 | 4
18-NOV-19 | 4
17-NOV-19 | 6
Many thanks
Filter on the day first comparing it to TRUNC( SYSDATE ) - INTERVAL '7' DAY and then consider the hours by comparing the timestamp to itself truncated back to midnight with an offset of a number of hours.
select trunc(timestamp) as DTE,
extract(hour from timestamp) as HR,
count(COLUMN) as Total
from TABLE
WHERE timestamp >= TRUNC( SYSDATE ) - INTERVAL '7' DAY
AND ( timestamp <= TRUNC( timestamp ) + INTERVAL '01:00' HOUR TO MINUTE
OR timestamp >= TRUNC( timestamp ) + INTERVAL '23:00' HOUR TO MINUTE
)
group by trunc(timestamp), extract(hour from timestamp)
order by DTE, HR desc;
Subtract or add an hour to derive the date. I'm not sure what date you want to assign to each period, but the idea is:
select trunc(timestamp - interval '1' hour) as DTE,
count(*) as Total
from t
where trunc(timestamp - interval '1' hour) >= DATE '2019-11-12' and
extract(hour from timestamp) in (23, 0)
group by trunc(timestamp - interval '1' hour)
order by 1 desc;
Note: If you want times between 11:00 p.m. and 1:00 a.m., then you want the hour to be 23 or 0.