Having trouble joining information from two tables using timestamps

I have two tables in BigQuery:
A - Has the exact start and end time of the processes
B - It has the cost per hour of several products consumed by the processes
I need to calculate an estimate of the cost of each process (table A) using the data in table B. I thought of doing this by summing the cost of all products (table B) included in the time period consumed by the process in table A.
So, here is some fake data for the two tables and the desired output:
Process metadata (Table A)
process_name  timestamp_init                  timestamp_end
a             2021-04-01 11:15:44.888153 UTC  2021-04-01 12:25:44.888153 UTC
b             2021-04-01 13:50:17.033498 UTC  2021-04-01 14:50:17.033498 UTC
c             2008-04-02 20:19:36.983747 UTC  2008-04-02 20:58:20.983747 UTC
d             2010-04-02 22:06:10.348753 UTC  2010-04-02 23:08:28.348753 UTC
Platform costs (Table B)
product  usage_start_time         usage_end_time           cost
ax       2021-04-01 11:00:00 UTC  2021-04-01 12:00:00 UTC  10
b4       2021-04-01 11:00:00 UTC  2021-04-01 12:00:00 UTC  9
cf       2021-04-01 11:00:00 UTC  2021-04-01 12:00:00 UTC  25
jw       2021-04-01 14:00:00 UTC  2021-04-01 15:00:00 UTC  125
ki       2021-04-01 20:00:00 UTC  2021-04-01 21:00:00 UTC  180
fr       2021-04-01 22:00:00 UTC  2021-04-01 23:00:00 UTC  250
Desired Results
process_name  total_cost
a             44
b             125
c             180
d             250
I developed the following code:
SELECT a.process_name,
SUM(b.cost) as total_cost
FROM A a,
B b
WHERE b.usage_start_time >= timestamp_trunc(timestamp_add(a.timestamp_init, interval 30 minute), hour)
AND b.usage_end_time <= timestamp_trunc(timestamp_add(a.timestamp_end, interval 30 minute), hour)
GROUP BY a.process_name
Note that I'm rounding the timestamps from table A so it matches the format of table B.
But for some reason I don't know, it is not returning any results. What am I doing wrong?

I'm not sure where the 30 minutes is coming from. The logic for an overlap would be:
SELECT a.process_name,
SUM(b.cost) as total_cost
FROM A a JOIN
B b
ON b.usage_start_time < a.timestamp_end AND
b.usage_end_time >= a.timestamp_init
GROUP BY a.process_name
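To see what the overlap condition selects, here is a minimal Python sketch that applies the same test (usage_start_time < timestamp_end AND usage_end_time >= timestamp_init) to the sample rows from the question. Only processes a and b are included here; the helper names (ts, total_cost_per_process) are made up for illustration and are not part of BigQuery.

```python
from collections import defaultdict
from datetime import datetime


def ts(text):
    """Parse 'YYYY-MM-DD HH:MM:SS[.ffffff]' (all sample values are UTC)."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in text else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(text, fmt)


# Sample rows from Table A (processes a and b only).
processes = [
    ("a", ts("2021-04-01 11:15:44.888153"), ts("2021-04-01 12:25:44.888153")),
    ("b", ts("2021-04-01 13:50:17.033498"), ts("2021-04-01 14:50:17.033498")),
]

# Sample rows from Table B.
costs = [
    ("ax", ts("2021-04-01 11:00:00"), ts("2021-04-01 12:00:00"), 10),
    ("b4", ts("2021-04-01 11:00:00"), ts("2021-04-01 12:00:00"), 9),
    ("cf", ts("2021-04-01 11:00:00"), ts("2021-04-01 12:00:00"), 25),
    ("jw", ts("2021-04-01 14:00:00"), ts("2021-04-01 15:00:00"), 125),
    ("ki", ts("2021-04-01 20:00:00"), ts("2021-04-01 21:00:00"), 180),
    ("fr", ts("2021-04-01 22:00:00"), ts("2021-04-01 23:00:00"), 250),
]


def total_cost_per_process(processes, costs):
    """Sum the cost of every usage window that overlaps each process."""
    totals = defaultdict(int)
    for name, init, end in processes:
        for _product, usage_start, usage_end, cost in costs:
            # Same predicate as the JOIN ... ON clause.
            if usage_start < end and usage_end >= init:
                totals[name] += cost
    return dict(totals)


totals = total_cost_per_process(processes, costs)
print(totals)
```

Like the inner join, this only returns processes that overlap at least one cost row.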

Related

SQL query for getting data for the last 6 months grouped by month?

I know a basic query to get some results for the last 6 months. Let's say like this:
SELECT *
FROM RANDOM_TABLE
WHERE Date_Column >= DATEADD(MONTH, -6, GETDATE())
But what if I'd like to get results grouped by month - each month looking back 6 months into the past?
The first three rows of a result could ideally look like this (count of IDs is random):
Month_and_year  COUNT(ID)
January 2017    120
February 2017   160
March 2017      240
The last three rows:
Month_and_year  COUNT(ID)
November 2021   80
December 2021   350
January 2021    260
Hope it's understandable.
Thanks in advance!

EDIT:
Over the hours I made a few corrections. Most notably I corrected the self join query to reflect my intentions and also added more details to better explain what is going on.
To my knowledge there are two ways to go about it (which are probably the same under the hood).
Also, please note that these solutions assume you have a month field already in place. If you have a date or timestamp field, you should take one extra preparation step.
[Addendum] To be more precise, I'd say that the ideal would be to have a date/timestamp field that is truncated/flattened to the first day of the month.
As an example,
month       amount
2021-01-01  50
2021-02-01  20
2021-03-01  10
2021-04-01  100
2021-05-01  20
2021-06-01  40
2021-07-01  80
2021-08-01  50
The first is to use a "self-non-equi join"
SELECT
a.month,
SUM(b.amount) AS amount_over_6_months
FROM table AS a
INNER JOIN table AS b ON a.month BETWEEN b.month AND DATEADD(MONTH, 5, b.month)
WHERE a.month >= DATEADD(MONTH, -5, GETDATE())
GROUP BY a.month
What happens here is that you are joining the table with itself. Specifically, each row in the (a) alias is joined to up to six rows from the (b) alias: the row for the same month, plus the rows for up to five months before it. So...
a.month     b.month     a.amount  b.amount
2021-01-01  2021-01-01  50        50
2021-02-01  2021-01-01  20        50
2021-02-01  2021-02-01  20        20
2021-03-01  2021-01-01  10        50
2021-03-01  2021-02-01  10        20
2021-03-01  2021-03-01  10        10
2021-04-01  2021-01-01  100       50
2021-04-01  2021-02-01  100       20
2021-04-01  2021-03-01  100       10
2021-04-01  2021-04-01  100       100
2021-05-01  2021-01-01  20        50
2021-05-01  2021-02-01  20        20
2021-05-01  2021-03-01  20        10
2021-05-01  2021-04-01  20        100
2021-05-01  2021-05-01  20        20
2021-06-01  2021-01-01  40        50
2021-06-01  2021-02-01  40        20
2021-06-01  2021-03-01  40        10
2021-06-01  2021-04-01  40        100
2021-06-01  2021-05-01  40        20
2021-06-01  2021-06-01  40        40
2021-07-01  2021-02-01  80        20
2021-07-01  2021-03-01  80        10
2021-07-01  2021-04-01  80        100
2021-07-01  2021-05-01  80        20
2021-07-01  2021-06-01  80        40
2021-07-01  2021-07-01  80        80
...         ...         ...       ...
Then it's just a matter of grouping based on the month in the (a) alias, and summing the amounts coming from the (b) alias.
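If it helps to see the join-then-group logic outside of SQL, here is a rough Python sketch over the example table. It pairs each (a) month with every (b) month in the preceding six-month window and sums the (b) amounts. The add_months helper is my own and assumes every date is the first of a month; the GETDATE() filter from the query is left out.

```python
from datetime import date

# The example table: one row per month, dated the first of the month.
monthly = [
    (date(2021, 1, 1), 50),
    (date(2021, 2, 1), 20),
    (date(2021, 3, 1), 10),
    (date(2021, 4, 1), 100),
    (date(2021, 5, 1), 20),
    (date(2021, 6, 1), 40),
    (date(2021, 7, 1), 80),
    (date(2021, 8, 1), 50),
]


def add_months(d, n):
    """Shift a first-of-month date by n months (stand-in for DATEADD)."""
    index = d.year * 12 + (d.month - 1) + n
    return date(index // 12, index % 12 + 1, 1)


def trailing_six_month_sums(rows):
    """For each month, sum the amounts of that month and the five before it."""
    result = {}
    for a_month, _a_amount in rows:
        result[a_month] = sum(
            b_amount
            for b_month, b_amount in rows
            # Same condition as: a.month BETWEEN b.month AND DATEADD(MONTH, 5, b.month)
            if b_month <= a_month <= add_months(b_month, 5)
        )
    return result


sums = trailing_six_month_sums(monthly)
for month, amount in sums.items():
    print(month, amount)
```

For 2021-07-01 the window runs from February to July, so January's 50 drops out of the sum.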
The advantage of this approach is that it should be vendor and generation agnostic, save for the DATEADD() function.
The second solution would be to use window functions. I cannot comment on whether this would work with your vendor and the specific version.
SELECT
month,
SUM(amount) OVER (ORDER BY month ROWS BETWEEN 5 PRECEDING AND CURRENT ROW)
FROM table
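As a sketch of what ROWS BETWEEN 5 PRECEDING AND CURRENT ROW computes, here is the same rolling sum in Python over the example amounts, ordered by month. Note that the frame counts rows, not calendar months, so this assumes there is exactly one row per month with no gaps.

```python
from collections import deque

# Amounts from the example table, ordered by month (2021-01 .. 2021-08).
amounts = [50, 20, 10, 100, 20, 40, 80, 50]


def rolling_sum(values, window=6):
    """Sum of the current value and up to (window - 1) preceding values."""
    recent = deque(maxlen=window)  # oldest value falls out automatically
    out = []
    for value in values:
        recent.append(value)
        out.append(sum(recent))
    return out


rolling = rolling_sum(amounts)
print(rolling)
```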

Google Bigquery - Create time series of number of active records

I'm trying to create a timeseries in google bigquery SQL. My data is a series of time ranges covering the period of activity for that record. Here is an example:
Start End
2020-11-01 21:04:00 UTC 2020-11-02 07:15:00 UTC
2020-11-01 21:45:00 UTC 2020-11-02 04:00:00 UTC
2020-11-01 22:00:00 UTC 2020-11-02 09:48:00 UTC
2020-11-01 22:00:00 UTC 2020-11-02 06:00:00 UTC
I wish to create a new table that totals the number of active records within each 15-minute block. "21:00:00", for example, would cover 21:00:00 to 21:14:59. My desired output for the above would be:
Period Active_Records
2020-11-01 21:00:00 1
2020-11-01 21:15:00 1
2020-11-01 21:30:00 1
2020-11-01 21:45:00 2
2020-11-01 22:00:00 4
2020-11-01 22:15:00 4
etc until the end of the last active range.
I would also like to be able to generate this on the fly by querying a date range and having it return every 15-minute block in the range and how many active records there were in that period.
Any assistance would be greatly appreciated.

Below is for BigQuery Standard SQL
#standardSQL
select ts as period, count(1) as Active_Records
from unnest((
select generate_timestamp_array(timestamp_trunc(min(start), hour), max(`end`), interval 15 minute)
from `project.dataset.table`
)) ts
join `project.dataset.table`
on not (`end` < ts or start > timestamp_add(ts, interval 15 * 60 - 1 second))
group by ts
If applied to the sample data from your question, the output is: [output image not preserved]
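The overlap test in the query can be checked against the sample data with a short Python sketch. It builds the 15-minute blocks from the truncated hour of the earliest start up to the latest end, then counts the records that touch each block. The function name active_per_block is just for illustration.

```python
from datetime import datetime, timedelta

# Sample (start, end) ranges from the question, all UTC.
records = [
    (datetime(2020, 11, 1, 21, 4), datetime(2020, 11, 2, 7, 15)),
    (datetime(2020, 11, 1, 21, 45), datetime(2020, 11, 2, 4, 0)),
    (datetime(2020, 11, 1, 22, 0), datetime(2020, 11, 2, 9, 48)),
    (datetime(2020, 11, 1, 22, 0), datetime(2020, 11, 2, 6, 0)),
]


def active_per_block(records, step=timedelta(minutes=15)):
    """Count records overlapping each block, skipping empty blocks."""
    # Equivalent of timestamp_trunc(min(start), hour).
    first = min(start for start, _ in records).replace(
        minute=0, second=0, microsecond=0
    )
    last = max(end for _, end in records)
    counts = {}
    block = first
    while block <= last:
        block_last_second = block + step - timedelta(seconds=1)
        active = sum(
            1
            for start, end in records
            # Same predicate as the ON clause of the join.
            if not (end < block or start > block_last_second)
        )
        if active:  # the inner join drops blocks with no matching record
            counts[block] = active
        block += step
    return counts


counts = active_per_block(records)
for period, active in list(counts.items())[:6]:
    print(period, active)
```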

Multiplying a timestamp data for several times in BigQuery [duplicate]

I have a time series running from 2017-01-01 00:00:00 to 2017-12-31 23:00:00 at 1-hour intervals. I need to repeat this one-year series of timestamps 2400 times in the same column. Any help would be appreciated.
Row Date_time
1 2017-01-01 00:00:00 UTC
2 2017-01-01 01:00:00 UTC
3 2017-01-01 02:00:00 UTC
4 2017-01-01 03:00:00 UTC
5 2017-01-01 04:00:00 UTC
6 2017-01-01 05:00:00 UTC
7 2017-01-01 06:00:00 UTC
8 2017-01-01 07:00:00 UTC
...........................
...........................

You would do this in BigQuery by generating a timestamp array and then unnesting:
select ts
from unnest(generate_timestamp_array('2017-01-01 00:00:00', '2017-12-31 23:00:00', interval 1 hour)) ts
You can then get multiple rows with a similar construct:
select ts
from unnest(generate_timestamp_array('2017-01-01 00:00:00', '2017-12-31 23:00:00', interval 1 hour)
) ts cross join
unnest(generate_array(1, 2400)) n
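For intuition, the cross join simply pairs every hourly timestamp with every number from 1 to 2400 and keeps the timestamp. A small Python sketch of the same idea follows; it uses 3 copies instead of 2400 to keep the example light, and the function names are my own.

```python
from datetime import datetime, timedelta
from itertools import product


def hourly_timestamps(start, end):
    """All timestamps from start to end (inclusive) in 1-hour steps."""
    out = []
    current = start
    while current <= end:
        out.append(current)
        current += timedelta(hours=1)
    return out


def repeated(timestamps, copies):
    """Lazily yield each timestamp once per copy, like the cross join."""
    for stamp, _n in product(timestamps, range(1, copies + 1)):
        yield stamp


hours = hourly_timestamps(datetime(2017, 1, 1, 0), datetime(2017, 12, 31, 23))
sample = list(repeated(hours, 3))  # 3 copies here; the question needs 2400
print(len(hours), len(sample))
```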

BigQuery - A way to generate timestamps based on hour/minute/seconds?

Is there a way to generate sequential timestamps in BigQuery that is focused on hours, minutes, and seconds?
In BigQuery you can generate sequential dates by:
select *
FROM UNNEST(GENERATE_DATE_ARRAY('2016-10-18', '2016-10-19', INTERVAL 1 DAY)) as day
This will generate the dates from 2016-10-18 to 2016-10-19 in date intervals
Row day
1 2016-10-18
2 2016-10-19
But let's say I want intervals in 15 minutes or 5 minutes, is there a way to do that?

First, I would recommend "starring" the feature request for GENERATE_TIMESTAMP_ARRAY to express interest in having a function like this. Given GENERATE_ARRAY, though, the best option currently is to use a query of this form:
SELECT TIMESTAMP_ADD('2018-04-01', INTERVAL 15 * x MINUTE)
FROM UNNEST(GENERATE_ARRAY(0, 13)) AS x;
If you want a minute-based GENERATE_TIMESTAMP_ARRAY equivalent, you can use a UDF like this:
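CREATE TEMP FUNCTION GenerateMinuteTimestampArray(
    t0 TIMESTAMP, t1 TIMESTAMP, minutes INT64) AS (
  ARRAY(
    -- Step through the minute offsets from t0 to t1 in increments of `minutes`,
    -- so the generated timestamps never run past t1.
    SELECT TIMESTAMP_ADD(t0, INTERVAL x MINUTE)
    FROM UNNEST(GENERATE_ARRAY(0, TIMESTAMP_DIFF(t1, t0, MINUTE), minutes)) AS x
  )
);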
SELECT ts
FROM UNNEST(GenerateMinuteTimestampArray('2018-04-01', '2018-04-01 12:00:00', 15)) AS ts;
This returns a timestamp for each 15-minute interval between midnight and 12 PM on April 1.
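As a sanity check on what such a function should return, here is a minimal Python sketch of the same idea: start at t0 and keep adding a fixed number of minutes until t1 is passed. The function name is mine and only mirrors the intent of the UDF.

```python
from datetime import datetime, timedelta


def generate_minute_timestamps(t0, t1, minutes):
    """Timestamps from t0 to t1 (inclusive) spaced `minutes` apart."""
    step = timedelta(minutes=minutes)
    out = []
    current = t0
    while current <= t1:
        out.append(current)
        current += step
    return out


stamps = generate_minute_timestamps(
    datetime(2018, 4, 1), datetime(2018, 4, 1, 12, 0), 15
)
print(len(stamps), stamps[0], stamps[-1])
```

Midnight to noon in 15-minute steps gives 48 intervals, so 49 timestamps once both endpoints are included.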
Update: You can now use the GENERATE_TIMESTAMP_ARRAY function in BigQuery. If you want to generate timestamps at intervals of 15 minutes, for example, you can use:
SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-18', '2016-10-19', INTERVAL 15 MINUTE);

Epochs seem like the way to go.
But this requires converting the dates to epoch seconds first.
select TIMESTAMP_MICROS(CAST(day * 1000000 as INT64))
FROM UNNEST(GENERATE_ARRAY(1522540800, 1525132799, 900)) as day
Row f0_
1 2018-04-01 00:00:00.000 UTC
2 2018-04-01 00:15:00.000 UTC
3 2018-04-01 00:30:00.000 UTC
4 2018-04-01 00:45:00.000 UTC
5 2018-04-01 01:00:00.000 UTC
6 2018-04-01 01:15:00.000 UTC
7 2018-04-01 01:30:00.000 UTC
8 2018-04-01 01:45:00.000 UTC
9 2018-04-01 02:00:00.000 UTC
10 2018-04-01 02:15:00.000 UTC
11 2018-04-01 02:30:00.000 UTC
12 2018-04-01 02:45:00.000 UTC
13 2018-04-01 03:00:00.000 UTC
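The same epoch arithmetic can be checked in Python, which is handy for confirming that the start and end values are the dates you expect. This is only a sketch using the constants from the query above (900 seconds = 15 minutes).

```python
from datetime import datetime, timezone

START_EPOCH = 1522540800  # 2018-04-01 00:00:00 UTC
END_EPOCH = 1525132799    # 2018-04-30 23:59:59 UTC
STEP_SECONDS = 900        # 15 minutes

# GENERATE_ARRAY includes the end value, so the range goes to END_EPOCH + 1.
stamps = [
    datetime.fromtimestamp(seconds, tz=timezone.utc)
    for seconds in range(START_EPOCH, END_EPOCH + 1, STEP_SECONDS)
]

for stamp in stamps[:4]:
    print(stamp)
```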

Total time calculation in a sql query for a day where time in 24 hour format as hhmm

I have a table with date(date), left time(varchar2(4)) and arrival time(varchar2(4)). Time taken is in 24 hour format as hhmm. If a person travel 3 times a day, what will be the query to calculate total travel time in a day?
I am using oracle 11g. Kindly help. Thank you.

Convert the value to a number and report in minutes:
select to_number(substr(time, 1, 2))*60 + to_number(substr(time, 3, 2)) as minutes
Your query would look something like:
select person, sum(to_number(substr(time, 1, 2))*60 + to_number(substr(time, 3, 2))) as minutes
from t
group by person;
I see no reason to convert this back to a string -- or to even store the value as a string instead of as a number. But if you need to, you can reverse the process to get a string.
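To make the arithmetic concrete, here is a small Python sketch of the hhmm-to-minutes conversion applied to both columns, with the difference summed per day. The trip rows are made-up sample data, and the sketch assumes each arrival falls on the same day as the departure.

```python
from collections import defaultdict


def hhmm_to_minutes(hhmm):
    """Convert a 4-character 'hhmm' string to minutes since midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:4])


# Hypothetical rows: (date, left_time, arrival_time).
trips = [
    ("2017-06-30", "0800", "0930"),
    ("2017-06-30", "1200", "1245"),
    ("2017-06-30", "1730", "1815"),
    ("2017-07-01", "0900", "1000"),
]


def travel_minutes_per_day(trips):
    """Total travel time in minutes for each date."""
    totals = defaultdict(int)
    for day, left, arrival in trips:
        totals[day] += hhmm_to_minutes(arrival) - hhmm_to_minutes(left)
    return dict(totals)


per_day = travel_minutes_per_day(trips)
print(per_day)
```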

There are two answers. If you want to sum the time by date only, it can be done as:
select curr_date,
sum(24 * (to_date(arrival_time, 'HH24:mi:ss')- to_date(left_time, 'HH24:mi:ss'))) as difference
from sql_prac group by curr_date;
The data in the table is:
select curr_date,left_time,arrival_time from sql_prac;
CURR_DATE LEFT_TIME            ARRIVAL_TIME
--------- -------------------- --------------------
30-JUN-17 00:00:00             15:00:00
30-JUL-17 03:30:00             11:30:00
30-AUG-17 03:00:00             12:30:00
30-SEP-17 04:00:00             17:00:00
30-JUN-17 00:00:00             15:00:00
30-JUL-17 03:30:00             11:30:00
30-AUG-17 03:00:00             12:30:00
30-SEP-17 04:00:00             17:00:00
30-SEP-17 04:00:00             17:00:00
9 rows selected
The sample output (difference in hours) is as follows:
select curr_date,sum(24 * (to_date(arrival_time, 'HH24:mi:ss')- to_date(left_time, 'HH24:mi:ss'))) as difference
from sql_prac group by curr_date;
CURR_DATE DIFFERENCE
--------- ----------
30-JUN-17 30
30-JUL-17 16
30-SEP-17 39
30-AUG-17 19
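The per-date totals can be cross-checked outside Oracle. The Python sketch below parses the same sample rows, takes arrival minus departure in hours, and groups by date only; the helper names are my own.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from sql_prac: (curr_date, left_time, arrival_time).
rows = [
    ("30-JUN-17", "00:00:00", "15:00:00"),
    ("30-JUL-17", "03:30:00", "11:30:00"),
    ("30-AUG-17", "03:00:00", "12:30:00"),
    ("30-SEP-17", "04:00:00", "17:00:00"),
    ("30-JUN-17", "00:00:00", "15:00:00"),
    ("30-JUL-17", "03:30:00", "11:30:00"),
    ("30-AUG-17", "03:00:00", "12:30:00"),
    ("30-SEP-17", "04:00:00", "17:00:00"),
    ("30-SEP-17", "04:00:00", "17:00:00"),
]


def hours_between(left, arrival):
    """Hours from left to arrival, both 'HH:MM:SS' on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(arrival, fmt) - datetime.strptime(left, fmt)
    return delta.total_seconds() / 3600


def hours_per_date(rows):
    """Total travel hours grouped by date only."""
    totals = defaultdict(float)
    for curr_date, left, arrival in rows:
        totals[curr_date] += hours_between(left, arrival)
    return dict(totals)


per_date = hours_per_date(rows)
print(per_date)
```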

If you want to sum it by person and date, then it can be done as:
select dept,curr_date,sum(24 * (to_date(arrival_time, 'HH24:mi:ss')- to_date(left_time, 'HH24:mi:ss'))) as difference
from sql_prac group by dept,curr_date order by dept;
The data in the table is:
select dept,curr_date,left_time,arrival_time from sql_prac;
DEPT                 CURR_DATE LEFT_TIME            ARRIVAL_TIME
-------------------- --------- -------------------- --------------------
A                    30-SEP-17 04:00:00             17:00:00
B                    30-SEP-17 04:00:00             17:00:00
C                    30-AUG-17 03:00:00             12:30:00
D                    30-DEC-17 04:00:00             17:00:00
A                    30-SEP-17 04:00:00             17:00:00
B                    30-JUL-17 03:30:00             11:30:00
C                    30-AUG-17 03:00:00             12:30:00
D                    30-SEP-17 04:00:00             17:00:00
R                    30-SEP-17 04:00:00             17:00:00
The data fetched using the query is:
select dept,curr_date,sum(24 * (to_date(arrival_time, 'HH24:mi:ss')- to_date(left_time, 'HH24:mi:ss'))) as difference
from sql_prac group by dept,curr_date order by dept;
DEPT CURR_DATE DIFFERENCE
-------------------- --------- ----------
A 30-SEP-17 26
B 30-JUL-17 8
B 30-SEP-17 13
C 30-AUG-17 19
D 30-SEP-17 13
D 30-DEC-17 13
R 30-SEP-17 13
