Group By Timestamp_Trunc including empty rows with '0' count

I have a BigQuery query as follows:
SELECT
timestamp_trunc(timestamp,
hour) hour,
statusCode,
CAST(AVG(durationMs) as integer) averageDurationMs,
COUNT(*) count
FROM
`project.dataset.table`
WHERE timestamp > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY
hour,
statusCode
And it works great, returning results like this:
However, my charting component also needs rows for the empty hours, with a count of 0 (e.g. 18:00 should be 0, 19:00 should be 0, and so on).
Is there an elegant way to do this in BigQuery SQL, or do I have to do it in code before returning the results to my UI?

Try generating an array of the hours needed, cross joining it with all the status codes, and left joining that with your results:
with mytable as (
select timestamp '2021-10-18 19:00:00' as hour, 200 as statusCode, 1234 as averageDurationMs, 25 as count union all
select '2021-10-18 21:00:00', 500, 4978, 6015 union all
select '2021-10-18 21:00:00', 404, 4987, 5984 union all
select '2021-10-18 21:00:00', 200, 5048, 11971 union all
select '2021-10-18 21:00:00', 401, 4976, 6030
)
select myhour, allCodes.statusCode, IFNULL(mytable.averageDurationMs, 0) as averageDurationMs, IFNULL(mytable.count, 0) as count
from
UNNEST(GENERATE_TIMESTAMP_ARRAY(TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), HOUR), INTERVAL 23 HOUR), TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), HOUR), INTERVAL 1 HOUR)) as myhour
CROSS JOIN
(SELECT DISTINCT statusCode FROM mytable) as allCodes
LEFT JOIN mytable ON myHour = mytable.hour AND allCodes.statusCode = mytable.statusCode
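
If the zero-filling is done in application code instead (the other option raised in the question), a minimal Python sketch might look like this. The function name fill_missing_hours and the dictionary row shape are assumptions made for illustration; they are not part of any BigQuery client API.

```python
from datetime import datetime, timedelta


def fill_missing_hours(rows, start, end, status_codes):
    """Return one row per (hour, statusCode) between start and end inclusive.

    rows is an iterable of dicts with the keys hour, statusCode,
    averageDurationMs and count (an assumed shape for the query result).
    Missing combinations are filled with zeros.
    """
    by_key = {(r["hour"], r["statusCode"]): r for r in rows}
    filled = []
    hour = start
    while hour <= end:
        for code in status_codes:
            existing = by_key.get((hour, code))
            filled.append(existing or {
                "hour": hour,
                "statusCode": code,
                "averageDurationMs": 0,
                "count": 0,
            })
        hour += timedelta(hours=1)
    return filled


# Two rows taken from the sample data in the answer above.
rows = [
    {"hour": datetime(2021, 10, 18, 19), "statusCode": 200,
     "averageDurationMs": 1234, "count": 25},
    {"hour": datetime(2021, 10, 18, 21), "statusCode": 500,
     "averageDurationMs": 4978, "count": 6015},
]
result = fill_missing_hours(
    rows, datetime(2021, 10, 18, 19), datetime(2021, 10, 18, 21), [200, 500]
)
```

This mirrors the SQL approach: the nested loops play the role of the cross join between hours and status codes, and the dictionary lookup plays the role of the left join.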

Related

Get the last time before the call - SQL Query

I have a set of 2 tables that I need to combine.
The first table is the list of dates and times the customer contacted us; it is not unique per customer.
The next table is the escalated calls they made to us.
What I need to do is show the date and time of the last contact before the escalated call.
I can do a simple left join based on customer ID, but I am having trouble getting the last call.
I hope to get answers plus an explanation that I can use moving forward.
Here's my code so far:
Select a.customer id, a.contact_time, b.date of contact time as last_contact
from escalated_call a
left join all calls b on a.customer id = b.customer ID

Just use a WHERE clause to keep only the calls made before the escalated call:
Select a.customerid,
a.contact_time,
b.DateOfContactTime as last_contact
from escalated_call AS a LEFT JOIN Calls AS b on a.customerID = b.customerID
WHERE b.DateOfContactTime < a.contact_time

You just need an aggregate MAX here, restricted to the calls made at or before the escalated call. You could also do it with a correlated subquery, but it's probably not worth it.
You may need to correct your column names; I've just guessed that you have underscores instead of the spaces.
Select a.customer_id
,a.contact_time
,max(b.date_of_contact_time) as last_contact
from escalated_call a
left join all_calls b
on a.customer_id = b.customer_ID
and b.date_of_contact_time <= a.contact_time
Group by a.customer_id, a.contact_time

From Oracle 12, you can use a LATERAL join and return the FIRST ROW ONLY:
SELECT ec.*, ac.dt
FROM escalated_calls ec
LEFT OUTER JOIN LATERAL (
SELECT ac.dt
FROM all_calls ac
WHERE ac.customer_id = ec.customer_id
AND ac.dt <= ec.ct
ORDER BY ac.dt DESC
FETCH FIRST ROW ONLY
) ac
ON (1 = 1)
Which, for the sample data:
CREATE TABLE all_calls(customer_id, dt) AS
SELECT 1, DATE '2019-12-24' + INTERVAL '00:00' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 1, DATE '2019-12-24' + INTERVAL '00:15' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 1, DATE '2019-12-24' + INTERVAL '00:35' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 1, DATE '2019-12-24' + INTERVAL '01:00' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 2, DATE '2019-12-24' + INTERVAL '00:00' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 2, DATE '2019-12-24' + INTERVAL '00:15' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 2, DATE '2019-12-24' + INTERVAL '00:35' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 2, DATE '2019-12-24' + INTERVAL '01:00' HOUR TO MINUTE FROM DUAL;
CREATE TABLE escalated_calls (customer_id, ct) AS
SELECT 1, DATE '2019-12-24' + INTERVAL '00:45' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 2, DATE '2019-12-24' + INTERVAL '00:05' HOUR TO MINUTE FROM DUAL;
Outputs:
CUSTOMER_ID | CT                  | DT
1           | 2019-12-24 00:45:00 | 2019-12-24 00:35:00
2           | 2019-12-24 00:05:00 | 2019-12-24 00:00:00

You can also use a subquery in the select clause to solve this problem.
SELECT e.*
, ( SELECT max(a.Date_Of_Contact_Time)
FROM all_calls a
WHERE a.customer_id = e.customer_id
AND a.Date_Of_Contact_Time <= e.contact_time
) AS correct_answer
FROM escalated_calls e
;
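
The same "latest call at or before the escalated call" lookup can be sketched outside the database. The Python below is only an illustration under assumed names (last_contact_before and the tuple layouts are hypothetical); it sorts each customer's calls once and uses a binary search, with the sample rows from the Oracle answer above.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def last_contact_before(all_calls, escalated_calls):
    """For each (customer_id, escalated_time) return the latest call time
    that is at or before the escalated time, or None if there is none."""
    calls_by_customer = defaultdict(list)
    for customer_id, contact_time in all_calls:
        calls_by_customer[customer_id].append(contact_time)
    for times in calls_by_customer.values():
        times.sort()

    result = []
    for customer_id, escalated_time in escalated_calls:
        times = calls_by_customer.get(customer_id, [])
        # Index of the first call strictly after the escalated time.
        idx = bisect_right(times, escalated_time)
        last = times[idx - 1] if idx > 0 else None
        result.append((customer_id, escalated_time, last))
    return result


day = (2019, 12, 24)
all_calls = [
    (cid, datetime(*day, hour, minute))
    for cid in (1, 2)
    for hour, minute in ((0, 0), (0, 15), (0, 35), (1, 0))
]
escalated_calls = [
    (1, datetime(*day, 0, 45)),
    (2, datetime(*day, 0, 5)),
]
matches = last_contact_before(all_calls, escalated_calls)
```

Returning None when no earlier call exists corresponds to the NULL that the left join and the scalar subquery produce in the SQL versions.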

calculate each in and out duration for an id

I've data like below
Id | date       | time  | type
1  | 01-01-2022 | 08:00 | in
1  | 01-01-2022 | 11:30 | out
1  | 01-01-2022 | 11:35 | out
1  | 01-01-2022 | 12:45 | in
1  | 01-01-2022 | 17:30 | out
1  | 01-01-2022 | 01:00 | out
expected output :
Id | start | end   | totaltime
1  | 08:00 | 11:35 | 03:35:00
1  | 12:45 | 17:30 | 04:45:00
Here date is a DATE column and time is a VARCHAR column. I want to calculate every in/out duration for an Id, but I cannot work out the logic to get the duration for each in/out pair.
Can anyone suggest any ideas?

It's a typical gaps-and-islands type of question, and one option to resolve would be conditionally using SUM() OVER () analytic function such as
WITH t1 AS
(
SELECT t.*,
TO_TIMESTAMP(TO_CHAR("date",'yyyy-mm-dd ')||time,'yyyy-mm-dd hh24:mi:ss') AS dt
FROM t --> your table
), t2 AS
(
SELECT t1.*,
SUM(CASE WHEN type = 'in' THEN 1 ELSE 0 END) OVER (PARTITION BY id ORDER BY dt) AS rn_in,
SUM(CASE WHEN type = 'out' THEN 1 ELSE 0 END) OVER (PARTITION BY id ORDER BY dt) AS rn_out,
CASE WHEN type = 'in' THEN dt END AS "in",
CASE WHEN type = 'out' THEN dt END AS "out"
FROM t1
)
SELECT id, MIN("in") AS "start", MAX("out") AS "end", MAX("out")-MIN("in") AS "totaltime"
FROM t2
WHERE rn_in >= 1
GROUP BY id, rn_in
ORDER BY "start"
where "date" is considered to be a date type column, not an ordinary string as commented.
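
To make the grouping idea concrete, here is a rough Python sketch of the same running-count logic for a single id. The function name in_out_durations and the (timestamp, type) tuple layout are assumptions for illustration. Unlike the SQL above, this sketch drops an 'in' that never gets an 'out' instead of returning it with a NULL end.

```python
from datetime import datetime, timedelta


def in_out_durations(events):
    """events: iterable of (timestamp, type) for one id, type 'in' or 'out'.

    Every 'in' starts a new island (the running 'in' count goes up by one);
    every later 'out' overwrites the island's end, so the latest one wins.
    """
    islands = []
    for ts, kind in sorted(events):
        if kind == "in":
            islands.append([ts, None])
        elif kind == "out" and islands:   # ignore 'out' rows before the first 'in'
            islands[-1][1] = ts
    return [(start, end, end - start) for start, end in islands if end is not None]


events = [
    (datetime(2022, 1, 1, 8, 0), "in"),
    (datetime(2022, 1, 1, 11, 30), "out"),
    (datetime(2022, 1, 1, 11, 35), "out"),
    (datetime(2022, 1, 1, 12, 45), "in"),
    (datetime(2022, 1, 1, 17, 30), "out"),
    (datetime(2022, 1, 1, 1, 0), "out"),
]
durations = in_out_durations(events)
```

The 01:00 'out' row sorts before the first 'in', which is why it is skipped here, just as the rn_in >= 1 filter skips it in the query.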

In Oracle, a DATE data type is a binary data type consisting of 7-bytes representing: century, year-of-century, month, day, hour, minute and second and it ALWAYS has those components. Given that, there is no point in having separate date and time columns and you can combine your two columns into one and have the sample data:
CREATE TABLE table_name (Id, datetime, type) AS
SELECT 1, DATE '2022-01-01' + INTERVAL '08:00' HOUR TO MINUTE, 'in' FROM DUAL UNION ALL
SELECT 1, DATE '2022-01-01' + INTERVAL '11:30' HOUR TO MINUTE, 'out' FROM DUAL UNION ALL
SELECT 1, DATE '2022-01-01' + INTERVAL '11:35' HOUR TO MINUTE, 'out' FROM DUAL UNION ALL
SELECT 1, DATE '2022-01-01' + INTERVAL '12:45' HOUR TO MINUTE, 'in' FROM DUAL UNION ALL
SELECT 1, DATE '2022-01-01' + INTERVAL '17:30' HOUR TO MINUTE, 'out' FROM DUAL UNION ALL
SELECT 1, DATE '2022-01-01' + INTERVAL '01:00' HOUR TO MINUTE, 'out' FROM DUAL;
Note: Another option to create the sample data is to use TO_DATE('2022-01-01 12:45', 'YYYY-MM-DD HH24:MI').
Then, from Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row operations on the data:
SELECT m.*,
(end_dt - start_dt) DAY TO SECOND AS duration
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY id
ORDER BY datetime
MEASURES
FIRST(ins.datetime) AS start_dt,
LAST(outs.datetime) AS end_dt
PATTERN (ins+ outs+)
DEFINE
ins AS type = 'in',
outs AS type = 'out'
) m
Which outputs:
ID | START_DT            | END_DT              | DURATION
1  | 2022-01-01 08:00:00 | 2022-01-01 11:35:00 | +00 03:35:00.000000
1  | 2022-01-01 12:45:00 | 2022-01-01 17:30:00 | +00 04:45:00.000000
If you do keep separate date and time columns (you should not) then you can combine them into a single column before using MATCH_RECOGNIZE:
SELECT m.*,
(end_dt - start_dt) DAY TO SECOND AS duration
FROM (
SELECT id,
TO_DATE(TO_CHAR("DATE", 'YYYY-MM-DD') || time, 'YYYY-MM-DDHH24:MI') AS datetime,
type
FROM table_name
)
MATCH_RECOGNIZE (
PARTITION BY id
ORDER BY datetime
MEASURES
FIRST(ins.datetime) AS start_dt,
LAST(outs.datetime) AS end_dt
PATTERN (ins+ outs+)
DEFINE
ins AS type = 'in',
outs AS type = 'out'
) m
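
The PATTERN (ins+ outs+) clause behaves much like a regular expression over the ordered rows. As an analogy only (this is not how Oracle implements MATCH_RECOGNIZE), the Python sketch below encodes each row as a letter and lets the re module find the runs. The name match_in_out_runs and the one-letter encoding are assumptions for illustration.

```python
import re
from datetime import datetime, timedelta


def match_in_out_runs(events):
    """events: iterable of (timestamp, type) for one id.

    Encodes the ordered rows as a string ('i' for in, 'o' for out, 'x' for
    anything else) and finds every run of one or more 'in' rows followed by
    one or more 'out' rows.
    """
    ordered = sorted(events)
    letters = {"in": "i", "out": "o"}
    encoded = "".join(letters.get(kind, "x") for _, kind in ordered)
    runs = []
    for match in re.finditer(r"i+o+", encoded):
        start_dt = ordered[match.start()][0]      # FIRST(ins.datetime)
        end_dt = ordered[match.end() - 1][0]      # LAST(outs.datetime)
        runs.append((start_dt, end_dt, end_dt - start_dt))
    return runs


events = [
    (datetime(2022, 1, 1, 8, 0), "in"),
    (datetime(2022, 1, 1, 11, 30), "out"),
    (datetime(2022, 1, 1, 11, 35), "out"),
    (datetime(2022, 1, 1, 12, 45), "in"),
    (datetime(2022, 1, 1, 17, 30), "out"),
    (datetime(2022, 1, 1, 1, 0), "out"),
]
runs = match_in_out_runs(events)
```

Each character position in the encoded string maps back to one row, so the match boundaries give the first 'in' and the last 'out' of each run directly.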

Based on one comment, I assumed that every 'in' record has its matching 'out' record on the same day, so I solved it by getting the next 'in' on that day and finding the max 'out' in between:
with t1 as
(select ID, "date" , "time" as "start", LEAD("time",1,'24:00') over(partition by id, "date" order by "time") as "next"
from test
where "type" = 'in'),
t3 as
(select ID, "start", (select max("time") from test t2
where t1.id = t2.id and t1."date" = t2."date"
and t2."type" = 'out' and t2."time" between t1."start"
and t1."next" ) as "end"
from t1)
select ID, "start", "end", (case when "end" is null then null else (TO_DSINTERVAL('0 '||"end"||':00') - TO_DSINTERVAL('0 '||"start"||':00')) end) as totaltime
from t3

Dynamic LAG function (Standard SQL, BigQuery). Is it possible?

I am trying hard to find a solution for this. I've attached an image with an overview of what I want, but I will describe it here too.
In LAG function, is it possible to have a dynamic number in the syntax?
LAG(sessions, 3)
Instead of using 3, I need the value of the column minutosdelift, which is 3 in this example but will be different in each situation.
I've tried LAG(sessions, minutosdelift), but that is not possible. I've tried LAG(sessions, COUNT(minutosdelift)), and that is not possible either.
The final goal is to calculate the difference between 52 and 6: (52/6)-1, which gives me 767%. But to do that I need a dynamic number in the LAG function (or another way to do it).
I've tried using ROWS PRECEDING and ROWS UNBOUNDED PRECEDING, but again they need a literal number.
Please, any idea how to do it? Thanks!
A screenshot (not reproduced here) illustrated this.
My code: this is the last query I've tried, because I have 7 previous views
SELECT
DATE, HOUR, MINUTE, SESSIONS, PROGRAMA_2,
janela_lift_teste, soma_sessao_programa_2, minutosdelift,
CASE
WHEN minutosdelift != 0
THEN LAG(sessions, 3) OVER(ORDER BY DATE, HOUR, MINUTE ASC)
END AS lagtest,
CASE
WHEN programa_2 = "#N/A" OR programa_2 is null
THEN LAST_VALUE(sessions) OVER (PARTITION BY programa_2 ORDER BY DATE, HOUR, MINUTE ASC)
END AS firstvaluetest,
FROM
tbl8
GROUP BY
DATE, HOUR, MINUTE, SESSIONS, PROGRAMA_2,
janela_lift_teste, minutosdelift, soma_sessao_programa_2
ORDER BY
DATE, HOUR, MINUTE ASC

In BigQuery (as in some other databases), the argument to lag() has to be a constant.
One method to get around this uses a self join. I find it hard to follow your query, but the idea is:
with tt as (
select row_number() over (order by sessions) as seqnum,
t.*
from t
)
select t.*, tprev.*
from tt t join
tt tprev
on tprev.seqnum = t.seqnum - t.minutosdelift;
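
The row-number self join is equivalent to indexing back by a per-row offset. A small Python sketch of that idea follows; the function name dynamic_lag and the sample rows are hypothetical, and only the values 52, 6 and the offset 3 come from the question.

```python
def dynamic_lag(rows, value_key, offset_key):
    """Return copies of rows with a 'lagged' field added.

    rows must already be in the desired order. For row i the lagged value is
    taken from row i - rows[i][offset_key], or None if that index is out of
    range. An offset of 0 returns the row's own value.
    """
    result = []
    for i, row in enumerate(rows):
        j = i - row[offset_key]
        lagged = rows[j][value_key] if 0 <= j < len(rows) else None
        result.append({**row, "lagged": lagged})
    return result


# Hypothetical rows, one per minute, already ordered by time.
rows = [
    {"sessions": 6, "minutosdelift": 0},
    {"sessions": 10, "minutosdelift": 0},
    {"sessions": 20, "minutosdelift": 0},
    {"sessions": 52, "minutosdelift": 3},
]
lagged_rows = dynamic_lag(rows, "sessions", "minutosdelift")
last = lagged_rows[-1]
lift_percent = round((last["sessions"] / last["lagged"] - 1) * 100)
```

For the last row the lookup goes back 3 positions, so the lift is computed from 52 and 6 as in the question.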

Consider below example - hope you can apply this approach to your use case
#standardSQL
with `project.dataset.table` as (
select 1 session, timestamp '2021-01-01 00:01:00' ts, 10 minutosdelift union all
select 2, '2021-01-01 00:02:00', 1 union all
select 3, '2021-01-01 00:03:00', 2 union all
select 4, '2021-01-01 00:04:00', 3 union all
select 5, '2021-01-01 00:05:00', 4 union all
select 6, '2021-01-01 00:06:00', 5 union all
select 7, '2021-01-01 00:07:00', 3 union all
select 8, '2021-01-01 00:08:00', 1 union all
select 9, '2021-01-01 00:09:00', 2 union all
select 10, '2021-01-01 00:10:00', 8 union all
select 11, '2021-01-01 00:11:00', 6 union all
select 12, '2021-01-01 00:12:00', 4 union all
select 13, '2021-01-01 00:13:00', 2 union all
select 14, '2021-01-01 00:14:00', 1 union all
select 15, '2021-01-01 00:15:00', 11 union all
select 16, '2021-01-01 00:16:00', 1 union all
select 17, '2021-01-01 00:17:00', 8
)
select a.*, b.session as lagtest
from `project.dataset.table` a
left join `project.dataset.table` b
on b.ts = timestamp_sub(a.ts, interval a.minutosdelift minute)

use generate series

I'm writing a psql procedure to read a source table, then aggregate and write to an aggregate table.
My source table contains 2 columns, beg and end, which refer to the client connecting to the website and the client disconnecting.
I want to calculate, for each client, the time that they spend. The purpose of using generate_series is to handle events that span more than one day.
My pseudocode is below:
execute $$SELECT MAX(date_) FROM $$||aggregate_table INTO max_date;
IF max_date is not NULL THEN
execute $$DELETE FROM $$||aggregate_table||$$ WHERE date_ >= $$||quote_literal(max_date);
ELSE
max_date := 'XXXXXXX';
end if;
SELECT * from (
select
Id, gs.due_date,
(case
When TRIM(set) ~ '^OPT[0-9]{3}/MINUTE/$'
Then 'minute'
When TRIM(set) ~ '^OPT[0-9]{3}/SECOND/$'
Then 'second'
end) as TIME,
sum(extract(epoch from (least(s.end, gs.date_ + interval '1 day') -
greatest(s.beg, gs.date_)
)
) / 60) as Timing
from source s cross join lateral
generate_series(date_trunc('day', s.beg), date_trunc('day',
least(s.end,
CASE WHEN $$||quote_literal(max_date)||$$ = 'XXXXXXX'
THEN (current_date)
ELSE $$||quote_literal(max_date)||$$
END)
), interval '1 day') gs(date_)
where ( (beg, end) overlaps ($$||quote_literal(max_date)||$$'00:00:00', $$||quote_literal(max_date)||$$'23:59:59'))
group by id, gs.date_, TIME
) as X
where ($$||quote_literal(max_date)||$$ = X.date_ and $$||quote_literal(max_date)||$$ != 'XXXXXXX')
OR ($$||quote_literal(max_date)||$$ ='XXXXXXX')
Data of table source
number, beg, end, id, set
(10, '2019-10-25 13:00:00', '2019-10-25 13:30:00', 1234, 'OPT111/MINUTE/'),
(11, '2019-10-25 13:00:00', '2019-10-25 14:00:00', 1234, 'OPT111/MINUTE/'),
(12, '2019-11-04 09:19:00', '2019-11-04 09:29:00', 1124, 'OPT111/SECOND/'),
(13, '2019-11-04 22:00:00', '2019-11-05 02:00:00', 1124, 'OPT111/MINUTE/')
Expected output (aggregate table)
2019-10-25, 1234, MINUTE, 90(1h30)
2019-11-04, 1124, SECOND, 10
2019-11-04, 1124, MINUTE, 120
2019-11-05, 1124, MINUTE, 120
The problem with my code is that it doesn't work if a new row is added tomorrow, for example (14, '2019-11-06 12:00:00', '2019-11-06 13:00:00', 1124, 'OPT111/MINUTE/').
Can anyone help?
Thank you

Here is my solution. I have changed column names in order to avoid reserved words. You may need to touch the formatting of duration.
with mycte as
(
select -- the first / first and only days
id, col_beg,
case when col_beg::date = col_end::date then col_end else date_trunc('day', col_beg) + interval '1 day' end as col_end
from mytable
union all
select -- the last days of multi-day periods
id, date_trunc('day', col_end) as col_beg, col_end
from mytable
where col_end::date > col_beg::date
union all
select -- the middle days of multi-day periods
id, rd as col_beg, rd::date + 1 as col_end
from mytable
cross join lateral generate_series(col_beg::date + 1, col_end::date - 1, interval '1 day') g(rd)
where col_end::date > col_beg::date + 1
)
select
col_beg::date as start_time, id, sum(col_end - col_beg) as duration
from mycte group by 1, 2 order by 1;
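
The core step, cutting each interval at midnight and summing the pieces per day, can also be sketched in Python. The function name minutes_per_day and the (id, beg, end) tuple layout are assumptions for illustration. The sketch ignores the MINUTE/SECOND unit column from the question and always reports minutes.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def minutes_per_day(intervals):
    """intervals: iterable of (id, beg, end) -> {(date, id): minutes}.

    Each interval is walked day by day: the current piece ends at the
    interval's end or at the next midnight, whichever comes first.
    """
    totals = defaultdict(float)
    for id_, beg, end in intervals:
        cursor = beg
        while cursor < end:
            midnight = datetime.combine(cursor.date(), datetime.min.time())
            piece_end = min(end, midnight + timedelta(days=1))
            totals[(cursor.date(), id_)] += (piece_end - cursor).total_seconds() / 60
            cursor = piece_end
    return dict(totals)


# The MINUTE rows from the question's sample data.
intervals = [
    (1234, datetime(2019, 10, 25, 13, 0), datetime(2019, 10, 25, 13, 30)),
    (1234, datetime(2019, 10, 25, 13, 0), datetime(2019, 10, 25, 14, 0)),
    (1124, datetime(2019, 11, 4, 22, 0), datetime(2019, 11, 5, 2, 0)),
]
totals = minutes_per_day(intervals)
```

Because every interval is processed independently, a row added on a later date simply contributes new (date, id) keys without affecting the existing ones.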

SQL: Return '0' for a row if it doesn't exist

I have a SQL query which displays count, date, and time.
This is what the output looks like:
And this is my SQL query:
select
count(*),
to_char(timestamp, 'MM/DD/YYYY'),
to_char(timestamp, 'HH24')
from
MY_TABLE
where
timestamp >= to_timestamp('03/01/2016','MM/DD/YYYY')
group by
to_char(timestamp, 'MM/DD/YYYY'), to_char(timestamp, 'HH24')
Now, in the COUNT column, I want to display 0 when there are no rows for that hour. For example, on 3/2/2016 at 8am the count was 6, but at 9am the count was 0, so that row was not displayed. I want to display that row. The counts for 10am and 11am are displayed, and then the output just goes to the next day.
So how do I display a count of 0? I want a row for every hour of every day, whether the count is 0 or 6 or whatever. Thanks :)

Use a partition outer join:
SELECT m.day,
h.hr,
COALESCE( freq, 0 ) AS freq
FROM ( SELECT LEVEL - 1 AS hr
FROM DUAL
CONNECT BY LEVEL <= 24
) h
LEFT OUTER JOIN
( SELECT COUNT(*) AS freq,
TO_CHAR( "timestamp", 'mm/dd/yyyy' ) AS day,
EXTRACT( HOUR FROM "timestamp" ) AS hr
FROM MY_TABLE
WHERE "timestamp" >= TIMESTAMP '2016-03-01 00:00:00'
GROUP BY
TO_CHAR( "timestamp", 'mm/dd/yyyy' ),
EXTRACT( HOUR FROM "timestamp" )
) m
PARTITION BY ( m.day )
ON ( m.hr = h.hr );

Use a cte to generate numbers for all the hours in a day. Then cross join the result with all the possible dates from the table. Then left join on the cte which has all date and hour combinations, to get a 0 count when a row is absent for a particular hour.
with nums(n) as (select 0 from dual
union all
select n+1 from nums where n < 23)
,dateshrscomb as (select n,dt
from nums
cross join (select distinct trunc(timestamp) dt from my_table
where timestamp >= to_timestamp('03/01/2016','MM/DD/YYYY')
) alldates
)
select count(trunc(m.timestamp)), d.dt, d.n
from dateshrscomb d
left join MY_TABLE m on to_char(m.timestamp, 'HH24') = d.n
and trunc(m.timestamp) = d.dt
and m.timestamp >= to_timestamp('03/01/2016','MM/DD/YYYY')
group by d.dt, d.n

with cteHours(h) as (select 0 from dual
union all
select h+1 from cteHours where h < 23)
, cteDates(d) AS (
SELECT
trunc(MIN(timestamp)) as d
FROM
My_Table
WHERE
timestamp >= to_timestamp('03/01/2016','MM/DD/YYYY')
UNION ALL
SELECT
d + 1 as d
FROM
cteDates
WHERE
d + 1 <= (SELECT trunc(MAX(timestamp)) FROM MY_TABLE)
)
, datesNumsCross (d,h) AS (
SELECT
d, h
FROM
cteDates
CROSS JOIN cteHours
)
select count(m.timestamp), to_char(d.d, 'MM/DD/YYYY'), d.h
from datesNumsCross d
LEFT JOIN MY_TABLE m
ON d.d = trunc(m.timestamp)
AND d.h = to_char(m.timestamp, 'HH24')
group by d.d, d.h
@VPK is doing a good job at answering; I just happened to be writing this at the same time as his last edit to generate a date-hour cross join. This solution differs from his in that it returns every date between the minimum and maximum dates in your range, whereas his returns only the dates present in the table, so a day that is missing completely would not be represented in his result but would be in this one. I also did a little cleanup on the joins.
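
For comparison, the same date-by-hour grid can be built in application code. This Python sketch uses assumed names (hourly_counts is hypothetical) and hypothetical sample timestamps, apart from the count of 6 at 8am on 3/2/2016 mentioned in the question. It covers every day between the earliest and latest timestamp, including days with no rows at all.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def hourly_counts(timestamps):
    """Return (date, hour, count) for every hour of every day between the
    earliest and latest timestamp, with 0 where no rows exist."""
    counts = Counter((ts.date(), ts.hour) for ts in timestamps)
    if not counts:
        return []
    first = min(day for day, _ in counts)
    last = max(day for day, _ in counts)
    grid = []
    day = first
    while day <= last:
        for hour in range(24):            # hours 0..23, matching HH24
            grid.append((day, hour, counts.get((day, hour), 0)))
        day += timedelta(days=1)
    return grid


timestamps = (
    [datetime(2016, 3, 2, 8, minute) for minute in range(6)]   # six rows at 8am
    + [datetime(2016, 3, 2, 10, 15)]
    + [datetime(2016, 3, 4, 11, 0)]                            # 3/3/2016 has no rows
)
grid = hourly_counts(timestamps)
```

The Counter plays the role of the grouped subquery, and the two loops play the role of the cross join between dates and hours.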

Here is one way to do that.
Using Oracle's hierarchical query feature and the LEVEL pseudocolumn, generate the dates and hours.
Then do an outer join of the above with your data.
Adjust the value of level depending on your desired range (this example uses 120). The start date needs to be set as well; it is ( trunc(sysdate, 'hh24')-2/24 ) in this example.
select nvl(c1.cnt, 0), d1.date_part, d1.hour_part
from
(
select
to_char(s.dt - (c.lev)/24, 'mm/dd/yyyy') date_part,
to_char(s.dt - (c.lev)/24, 'hh24') hour_part
from
(select level lev from dual connect by level <= 120) c,
(select trunc(sysdate, 'hh24')-2/24 dt from dual) s
where (s.dt - (c.lev)/24) < trunc(sysdate, 'hh24')-2/24
) d1
full outer join
(
select
count(*) cnt,
to_char(timestamp, 'MM/DD/YYYY') date_part,
to_char(timestamp, 'HH24') hour_part
from
MY_TABLE
where
timestamp >= to_timestamp('03/01/2016','MM/DD/YYYY')
group by
to_char(timestamp, 'MM/DD/YYYY'), to_char(timestamp, 'HH24')
) c1
on d1.date_part = c1.date_part
and d1.hour_part = c1.hour_part
