Dynamic LAG function (Standard SQL, BigQuery). Is it possible? - sql

I am trying hard to find a solution for this. I've attached an image with an overview of what I want, but I will describe it here too.
In LAG function, is it possible to have a dynamic number in the syntax?
LAG(sessions, 3)
Instead of using 3, I need the value of the column minutosdelift, which is 3 in this example but will be different in each situation.
I've tried LAG(sessions, minutosdelift), but it is not possible. I've tried LAG(sessions, COUNT(minutosdelift)) and it is not possible either.
The final goal is to calculate the relative difference between 52 and 6, that is (52/6)-1, which gives me 767%. But to do that I need a dynamic number in the LAG function (or another way to do it).
I've tried using ROWS PRECEDING and ROWS UNBOUNDED PRECEDING, but again it needs a literal number.
Any idea how to do it? Thanks!
This screenshot might explain it: [image not included]
My code: this is the last query I've tried, because I have 7 previous views
SELECT
DATE, HOUR, MINUTE, SESSIONS, PROGRAMA_2,
janela_lift_teste, soma_sessao_programa_2, minutosdelift,
CASE
WHEN minutosdelift != 0
THEN LAG(sessions, 3) OVER(ORDER BY DATE, HOUR, MINUTE ASC)
END AS lagtest,
CASE
WHEN programa_2 = "#N/A" OR programa_2 is null
THEN LAST_VALUE(sessions) OVER (PARTITION BY programa_2 ORDER BY DATE, HOUR, MINUTE ASC)
END AS firstvaluetest,
FROM
tbl8
GROUP BY
DATE, HOUR, MINUTE, SESSIONS, PROGRAMA_2,
janela_lift_teste, minutosdelift, soma_sessao_programa_2
ORDER BY
DATE, HOUR, MINUTE ASC

In BigQuery (as in some other databases), the offset argument to lag() has to be a constant.
One method to get around this uses a self join. I find it hard to follow your query, but the idea is:
with tt as (
      select row_number() over (order by date, hour, minute) as seqnum,
             t.*
      from t
     )
select t.*, tprev.*
from tt t join
     tt tprev
     on tprev.seqnum = t.seqnum - t.minutosdelift;
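As a rough, runnable illustration of the self-join idea, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery. The table name tbl, the minute column and the intermediate session values are made up for illustration; only the values 6 and 52 and the offset 3 come from the question.

```python
import sqlite3

# SQLite stands in for BigQuery here; table and rows are hypothetical
# (only 6, 52 and the offset 3 are taken from the question).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (minute INTEGER, sessions INTEGER, minutosdelift INTEGER)")
con.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 6, 0), (2, 20, 0), (3, 35, 0), (4, 52, 3)],
)

rows = con.execute("""
    WITH tt AS (
        SELECT ROW_NUMBER() OVER (ORDER BY minute) AS seqnum, tbl.*
        FROM tbl
    )
    SELECT t.minute, t.sessions, tprev.sessions AS lagtest,
           t.sessions * 1.0 / tprev.sessions - 1 AS lift
    FROM tt AS t
    LEFT JOIN tt AS tprev
           ON tprev.seqnum = t.seqnum - t.minutosdelift  -- per-row offset
          AND t.minutosdelift != 0                       -- no lag when offset is 0
    ORDER BY t.minute
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN keeps rows that have no matching earlier row, so their lagtest and lift simply come back as NULL.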

Consider the example below. I hope you can apply this approach to your use case:
#standardSQL
with `project.dataset.table` as (
select 1 session, timestamp '2021-01-01 00:01:00' ts, 10 minutosdelift union all
select 2, '2021-01-01 00:02:00', 1 union all
select 3, '2021-01-01 00:03:00', 2 union all
select 4, '2021-01-01 00:04:00', 3 union all
select 5, '2021-01-01 00:05:00', 4 union all
select 6, '2021-01-01 00:06:00', 5 union all
select 7, '2021-01-01 00:07:00', 3 union all
select 8, '2021-01-01 00:08:00', 1 union all
select 9, '2021-01-01 00:09:00', 2 union all
select 10, '2021-01-01 00:10:00', 8 union all
select 11, '2021-01-01 00:11:00', 6 union all
select 12, '2021-01-01 00:12:00', 4 union all
select 13, '2021-01-01 00:13:00', 2 union all
select 14, '2021-01-01 00:14:00', 1 union all
select 15, '2021-01-01 00:15:00', 11 union all
select 16, '2021-01-01 00:16:00', 1 union all
select 17, '2021-01-01 00:17:00', 8
)
select a.*, b.session as lagtest
from `project.dataset.table` a
left join `project.dataset.table` b
on b.ts = timestamp_sub(a.ts, interval a.minutosdelift minute)
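To see what this timestamp-based join returns, the same sample rows can be replayed locally. This is only a sketch under the assumption that SQLite's datetime() modifier is an acceptable stand-in for BigQuery's timestamp_sub; the table name t is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (session INTEGER, ts TEXT, minutosdelift INTEGER)")

# Same 17 sample rows as the answer: session i happens at minute i.
offsets = [10, 1, 2, 3, 4, 5, 3, 1, 2, 8, 6, 4, 2, 1, 11, 1, 8]
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(i, f"2021-01-01 00:{i:02d}:00", off) for i, off in enumerate(offsets, start=1)],
)

result = con.execute("""
    SELECT a.session, a.minutosdelift, b.session AS lagtest
    FROM t AS a
    LEFT JOIN t AS b
           -- SQLite equivalent of timestamp_sub(a.ts, interval a.minutosdelift minute)
           ON b.ts = datetime(a.ts, '-' || a.minutosdelift || ' minutes')
    ORDER BY a.session
""").fetchall()

lagtest = [row[2] for row in result]
print(lagtest)
```

Note that this variant looks back by elapsed minutes rather than by row count, so it only matches LAG when there is exactly one row per minute.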

Related

Get the last time before the call - SQL Query

I have a set of 2 tables that I need to combine.
The first table is the list of dates and times the customer contacted us; it's not unique.
The next table is the escalated calls they made to us.
What I need to do is show the date and time of the last contact before the escalated call.
I can do a simple left join based on customer ID, but I'm having an issue getting the last call.
I hope to get answers plus an explanation that I can use moving forward.
Here's my code so far:
Select a.customer id, a.contact_time, b.date of contact time as last_contact
from escalated_call a
left join all calls b on a.customer id = b.customer ID
Just use a WHERE clause:
Select a.customerid,
a.contact_time,
b.DateOfContactTime as last_contact
from escalated_call AS a LEFT JOIN Calls AS b on a.customerID = b.customerID
WHERE b.DateOfContactTime < a.contact_time
You just need an aggregate max here; you can also do it with a correlated subquery, but it’s probably not worth it.
You may need to correct your column names; I’ve just guessed you have underscores instead of the spaces.
Select a.customer_id
,a.contact_time
,max(b.date_of_contact_time) as last_contact
from escalated_call a
left join all_calls b
on a.customer_id = b.customer_ID
and b.date_of_contact_time <= a.contact_time -- only contacts up to the escalated call
Group by a.customer_id, a.contact_time
From Oracle 12, you can use a LATERAL join and return the FIRST ROW ONLY:
SELECT ec.*, ac.dt
FROM escalated_calls ec
LEFT OUTER JOIN LATERAL (
SELECT ac.dt
FROM all_calls ac
WHERE ac.customer_id = ec.customer_id
AND ac.dt <= ec.ct
ORDER BY ac.dt DESC
FETCH FIRST ROW ONLY
) ac
ON (1 = 1)
Which, for the sample data:
CREATE TABLE all_calls(customer_id, dt) AS
SELECT 1, DATE '2019-12-24' + INTERVAL '00:00' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 1, DATE '2019-12-24' + INTERVAL '00:15' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 1, DATE '2019-12-24' + INTERVAL '00:35' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 1, DATE '2019-12-24' + INTERVAL '01:00' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 2, DATE '2019-12-24' + INTERVAL '00:00' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 2, DATE '2019-12-24' + INTERVAL '00:15' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 2, DATE '2019-12-24' + INTERVAL '00:35' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 2, DATE '2019-12-24' + INTERVAL '01:00' HOUR TO MINUTE FROM DUAL;
CREATE TABLE escalated_calls (customer_id, ct) AS
SELECT 1, DATE '2019-12-24' + INTERVAL '00:45' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 2, DATE '2019-12-24' + INTERVAL '00:05' HOUR TO MINUTE FROM DUAL;
Outputs:
CUSTOMER_ID  CT                   DT
1            2019-12-24 00:45:00  2019-12-24 00:35:00
2            2019-12-24 00:05:00  2019-12-24 00:00:00
You can also use a subquery in the select clause to solve this problem.
SELECT e.*
, ( SELECT max(a.Date_Of_Contact_Time)
FROM all_calls a
WHERE a.customer_id = e.customer_id
AND a.Date_Of_Contact_Time <= e.contact_time
) AS correct_answer
FROM escalated_calls e
;
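The correlated-subquery approach can be checked against the sample data from the LATERAL answer. The following is a minimal sketch that assumes SQLite (via Python's sqlite3) as a stand-in for Oracle and stores the timestamps as ISO text, which compares correctly as strings.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE all_calls (customer_id INTEGER, dt TEXT);
    CREATE TABLE escalated_calls (customer_id INTEGER, ct TEXT);
    INSERT INTO all_calls VALUES
        (1, '2019-12-24 00:00:00'), (1, '2019-12-24 00:15:00'),
        (1, '2019-12-24 00:35:00'), (1, '2019-12-24 01:00:00'),
        (2, '2019-12-24 00:00:00'), (2, '2019-12-24 00:15:00'),
        (2, '2019-12-24 00:35:00'), (2, '2019-12-24 01:00:00');
    INSERT INTO escalated_calls VALUES
        (1, '2019-12-24 00:45:00'), (2, '2019-12-24 00:05:00');
""")

last_contacts = con.execute("""
    SELECT e.customer_id, e.ct,
           (SELECT MAX(a.dt)               -- latest contact ...
              FROM all_calls AS a
             WHERE a.customer_id = e.customer_id
               AND a.dt <= e.ct            -- ... not after the escalated call
           ) AS last_contact
    FROM escalated_calls AS e
    ORDER BY e.customer_id
""").fetchall()

for row in last_contacts:
    print(row)
```

The key point is that the time filter sits inside the subquery, so the maximum is taken only over contacts up to each escalated call.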

Oracle SQL query to select only records with date earliest on Department Table

If the time difference in the 'Login_Date' column between records of the same Department is within 18 hours, then pick only the record with the earliest login date.
Below sample data:
Need query for below data:
From Oracle 12, you can use MATCH_RECOGNIZE to do row-by-row processing if you want to exclude all rows that are within 18 hours of the first row of the group:
SELECT *
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY department
ORDER BY login_date
ALL ROWS PER MATCH
PATTERN (first_row {- within_18_hours* -} )
DEFINE
within_18_hours AS login_date <= first_row.login_date + INTERVAL '18' HOUR
)
Which, for the sample data:
CREATE TABLE table_name (record_id, department, "USER", login_date) AS
SELECT 1, 'IT', 'xujk', DATE '2022-01-10' + INTERVAL '10' HOUR FROM DUAL UNION ALL
SELECT 2, 'IT', 'jkl', DATE '2022-01-10' + INTERVAL '15' HOUR FROM DUAL UNION ALL
SELECT 3, 'IT', 'xujk', DATE '2022-01-12' + INTERVAL '11' HOUR FROM DUAL UNION ALL
SELECT 4, 'FINANCE', 'mno', DATE '2022-01-10' + INTERVAL '01' HOUR FROM DUAL UNION ALL
SELECT 5, 'FINANCE', 'abc', DATE '2022-01-12' + INTERVAL '15' HOUR FROM DUAL UNION ALL
SELECT 6, 'FINANCE', 'def', DATE '2022-01-12' + INTERVAL '20' HOUR FROM DUAL;
Outputs:
DEPARTMENT  LOGIN_DATE  RECORD_ID  USER
FINANCE     10-JAN-22   4          mno
FINANCE     12-JAN-22   5          abc
IT          10-JAN-22   1          xujk
IT          12-JAN-22   3          xujk
If you want to exclude rows that are within 18 hours of the previous row (and not necessarily within 18 hours of the earliest row of the group) then you can use:
SELECT *
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY department
ORDER BY login_date
ALL ROWS PER MATCH
PATTERN (first_row {- within_18_hours* -} )
DEFINE
within_18_hours AS login_date <= PREV(login_date) + INTERVAL '18' HOUR
)
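For readers without access to MATCH_RECOGNIZE, the row-by-row logic of the two patterns can be approximated in plain Python. This is only a sketch of the matching behaviour, not Oracle's implementation; the helper name keep_first_logins and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the answer: (record_id, department, user, login_date).
rows = [
    (1, "IT", "xujk", datetime(2022, 1, 10, 10)),
    (2, "IT", "jkl", datetime(2022, 1, 10, 15)),
    (3, "IT", "xujk", datetime(2022, 1, 12, 11)),
    (4, "FINANCE", "mno", datetime(2022, 1, 10, 1)),
    (5, "FINANCE", "abc", datetime(2022, 1, 12, 15)),
    (6, "FINANCE", "def", datetime(2022, 1, 12, 20)),
]

def keep_first_logins(rows, window=timedelta(hours=18), relative_to_previous=False):
    """Hypothetical helper: keep only the row that starts each match."""
    kept = []
    ordered = sorted(rows, key=lambda r: (r[1], r[3]))  # department, login_date
    for _, group in groupby(ordered, key=lambda r: r[1]):
        reference = None  # login_date the next row is compared against
        for row in group:
            if reference is None or row[3] > reference + window:
                kept.append(row)    # outside the window: starts a new match
                reference = row[3]
            elif relative_to_previous:
                reference = row[3]  # PREV() variant: compare with the previous row
    return kept

kept_ids = [r[0] for r in keep_first_logins(rows)]
kept_ids_prev = [r[0] for r in keep_first_logins(rows, relative_to_previous=True)]
print(kept_ids, kept_ids_prev)
```

With this sample data both variants keep the same records; they differ only when a chain of logins, each within 18 hours of the one before, stretches past 18 hours from the first.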
Try this:
Select d.Record_ID, d.Department, d."USER", d.Login_Date
from data d
inner join ( Select Department, min(Login_Date) as Login_Date
from data
where EXTRACT(HOUR FROM CAST(Login_Date AS timestamp)) <= 18
group by Department, trunc(Login_Date) ) t
on d.Department = t.Department and d.Login_Date = t.Login_Date

Group By Timestamp_Trunc including empty rows with '0' count

I have a BigQuery query as follows:
SELECT
timestamp_trunc(timestamp,
hour) hour,
statusCode,
CAST(AVG(durationMs) as integer) averageDurationMs,
COUNT(*) count
FROM
`project.dataset.table`
WHERE timestamp > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY
hour,
statusCode
And it works great, returning results like this:
However, my charting component needs empty rows for empty 'hours' (e.g. 18:00 should be 0, 19:00 = 0 etc)
Is there an elegant way to do this in BigQuery SQL or do I have to do it in code before returning to my UI?
Try generating an array of the hours needed, cross joining it with all the status codes, and left joining it with your results:
with mytable as (
select timestamp '2021-10-18 19:00:00' as hour, 200 as statusCode, 1234 as averageDurationMs, 25 as count union all
select '2021-10-18 21:00:00', 500, 4978, 6015 union all
select '2021-10-18 21:00:00', 404, 4987, 5984 union all
select '2021-10-18 21:00:00', 200, 5048, 11971 union all
select '2021-10-18 21:00:00', 401, 4976, 6030
)
select myhour, allCodes.statusCode, IFNULL(mytable.averageDurationMs, 0) as averageDurationMs, IFNULL(mytable.count, 0) as count
from
UNNEST(GENERATE_TIMESTAMP_ARRAY(TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), HOUR), INTERVAL 23 HOUR), TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), HOUR), INTERVAL 1 HOUR)) as myhour
CROSS JOIN
(SELECT DISTINCT statusCode FROM mytable) as allCodes
LEFT JOIN mytable ON myHour = mytable.hour AND allCodes.statusCode = mytable.statusCode
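The same gap-filling pattern can be tried locally. The sketch below assumes SQLite (via Python's sqlite3), where a recursive CTE plays the role of GENERATE_TIMESTAMP_ARRAY; it uses a fixed three-hour range and a subset of the sample rows so the result is reproducible.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE mytable (hour TEXT, statusCode INTEGER,
                          averageDurationMs INTEGER, "count" INTEGER);
    INSERT INTO mytable VALUES
        ('2021-10-18 19:00:00', 200, 1234, 25),
        ('2021-10-18 21:00:00', 500, 4978, 6015),
        ('2021-10-18 21:00:00', 200, 5048, 11971);
""")

filled = con.execute("""
    WITH RECURSIVE hours(myhour) AS (      -- one row per hour in the range
        SELECT '2021-10-18 19:00:00'
        UNION ALL
        SELECT datetime(myhour, '+1 hour') FROM hours
        WHERE myhour < '2021-10-18 21:00:00'
    )
    SELECT h.myhour, c.statusCode,
           IFNULL(m.averageDurationMs, 0) AS averageDurationMs,
           IFNULL(m."count", 0) AS "count"
    FROM hours AS h
    CROSS JOIN (SELECT DISTINCT statusCode FROM mytable) AS c
    LEFT JOIN mytable AS m
           ON m.hour = h.myhour AND m.statusCode = c.statusCode
    ORDER BY h.myhour, c.statusCode
""").fetchall()

for row in filled:
    print(row)
```

Every hour and status code pair appears exactly once, with zeros where the original data had no row.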

use generate series

I'm writing a psql procedure to read a source table, then aggregate and write into an aggregate table.
My source table contains 2 columns, beg and end, which refer to the client connecting to the website and the client disconnecting.
I want to calculate, for each client, the time that he spends. The purpose of using generate_series is to handle an event that spans more than one day.
My pseudo code is below
execute $$SELECT MAX(date_) FROM $$||aggregate_table INTO max_date;
IF max_date is not NULL THEN
execute $$DELETE FROM $$||aggregate_table||$$ WHERE date_ >= $$||quote_literal(max_date);
ELSE
max_date := 'XXXXXXX';
end if;
SELECT * from (
select
Id, gs.date_,
(case
When TRIM(set) ~ '^OPT[0-9]{3}/MINUTE/$'
Then 'minute'
When TRIM(set) ~ '^OPT[0-9]{3}/SECOND/$'
Then 'second'
end) as TIME,
sum(extract(epoch from (least(s.end, gs.date_ + interval '1 day') -
greatest(s.beg, gs.date_)
)
) / 60) as Timing
from source s cross join lateral
generate_series(date_trunc('day', s.beg), date_trunc('day',
least(s.end,
CASE WHEN $$||quote_literal(max_date)||$$ = 'XXXXXXX'
THEN (current_date)
ELSE $$||quote_literal(max_date)||$$
END)
), interval '1 day') gs(date_)
where ( (beg, end) overlaps ($$||quote_literal(max_date)||$$'00:00:00', $$||quote_literal(max_date)||$$'23:59:59'))
group by id, gs.date_, TIME
) as X
where ($$||quote_literal(max_date)||$$ = X.date_ and $$||quote_literal(max_date)||$$ != 'XXXXXXX')
OR ($$||quote_literal(max_date)||$$ ='XXXXXXX')
Data of table source
number, beg, end, id, set
(10, '2019-10-25 13:00:00', '2019-10-25 13:30:00', 1234, 'OPT111/MINUTE/'),
(11, '2019-10-25 13:00:00', '2019-10-25 14:00:00', 1234, 'OPT111/MINUTE/'),
(12, '2019-11-04 09:19:00', '2019-11-04 09:29:00', 1124, 'OPT111/SECOND/'),
(13, '2019-11-04 22:00:00', '2019-11-05 02:00:00', 1124, 'OPT111/MINUTE/')
Expected output in the aggregate table
2019-10-25, 1234, MINUTE, 90(1h30)
2019-11-04, 1124, SECOND, 10
2019-11-04, 1124, MINUTE, 120
2019-11-05, 1124, MINUTE, 120
The problem with my code is that it doesn't work if a new row is added tomorrow, for example (14, '2019-11-06 12:00:00', '2019-11-06 13:00:00', 1124, 'OPT111/MINUTE/').
Can anyone help?
Thank you
Here is my solution. I have changed the column names in order to avoid reserved words. You may need to adjust the formatting of duration.
with mycte as
(
select -- the first / first and only days
id, col_beg,
case when col_beg::date = col_end::date then col_end else date_trunc('day', col_beg) + interval '1 day' end as col_end -- first day ends at the next midnight
from mytable
union all
select -- the last days of multi-day periods
id, date_trunc('day', col_end) as col_beg, col_end
from mytable
where col_end::date > col_beg::date
union all
select -- the middle days of multi-day periods
id, rd as col_beg, rd::date + 1 as col_end
from mytable
cross join lateral generate_series(col_beg::date + 1, col_end::date - 1, interval '1 day') g(rd)
where col_end::date > col_beg::date + 1
)
select
col_beg::date as start_time, id, sum(col_end - col_beg) as duration
from mycte group by 1, 2 order by 1;
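To sanity-check the day-splitting logic against the expected output in the question, here is a minimal pure-Python sketch. It is not the PostgreSQL solution itself; the helper split_by_day and the way the unit is parsed from the set column are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Rows from the question: (beg, end, id, set).
source = [
    ("2019-10-25 13:00:00", "2019-10-25 13:30:00", 1234, "OPT111/MINUTE/"),
    ("2019-10-25 13:00:00", "2019-10-25 14:00:00", 1234, "OPT111/MINUTE/"),
    ("2019-11-04 09:19:00", "2019-11-04 09:29:00", 1124, "OPT111/SECOND/"),
    ("2019-11-04 22:00:00", "2019-11-05 02:00:00", 1124, "OPT111/MINUTE/"),
]

def split_by_day(beg, end):
    """Yield (day, minutes) for each calendar day the interval touches."""
    cursor = beg
    while cursor < end:
        next_midnight = datetime.combine(cursor.date(), datetime.min.time()) + timedelta(days=1)
        piece_end = min(end, next_midnight)  # cut the interval at midnight
        yield cursor.date(), (piece_end - cursor).total_seconds() / 60
        cursor = piece_end

totals = defaultdict(float)
for beg, end, client_id, opt in source:
    unit = opt.strip("/").split("/")[-1]  # 'MINUTE' or 'SECOND'
    for day, minutes in split_by_day(datetime.fromisoformat(beg), datetime.fromisoformat(end)):
        totals[(day.isoformat(), client_id, unit)] += minutes

for key, minutes in sorted(totals.items()):
    print(key, minutes)
```

Because each interval is cut at every midnight, a row added for a later day simply produces its own (day, id, unit) key without touching earlier totals.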

Duplicate row based on date difference

Help needed. I have a set of data imported from Oracle into Tableau for calculation. To do that, I need to duplicate rows as shown in the table below. For example, if there is a date difference between start and end, then I need to duplicate the row and assign the codes 0, 1, ... depending on how many days of difference there are. The purpose is that I need to use this in Tableau for a time interval calculation. Thanks
Pregenerate codes up to the maximum possible value and join the original table to the code series, so that the number of row duplications is determined by the difference between the dates on each row:
with t (s,e) as (
select timestamp '2020-08-16 18:30:00', timestamp '2020-08-16 20:00:00' from dual union all
select timestamp '2020-08-17 08:00:00', timestamp '2020-08-18 08:00:00' from dual union all
select timestamp '2020-08-19 08:00:00', timestamp '2020-08-19 00:00:00' from dual union all
select timestamp '2020-08-20 10:00:00', timestamp '2020-08-22 03:00:00' from dual
), series (code) as (
select level - 1 from dual connect by level <= (select max(trunc(e) - trunc(s)) + 1 from t) -- codes 0 .. max day difference
)
select t.*, series.code
from t
join series on trunc(e) - trunc(s) >= series.code
order by s,code;
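The series-join idea can also be tried outside Oracle. The sketch below assumes SQLite (via Python's sqlite3), with a recursive CTE replacing CONNECT BY and julianday(date(...)) replacing trunc(); it uses three of the four sample rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (s TEXT, e TEXT);
    INSERT INTO t VALUES
        ('2020-08-16 18:30:00', '2020-08-16 20:00:00'),
        ('2020-08-17 08:00:00', '2020-08-18 08:00:00'),
        ('2020-08-20 10:00:00', '2020-08-22 03:00:00');
""")

duplicated = con.execute("""
    WITH RECURSIVE series(code) AS (   -- codes 0 .. largest day difference
        SELECT 0
        UNION ALL
        SELECT code + 1 FROM series
        WHERE code < (SELECT MAX(julianday(date(e)) - julianday(date(s))) FROM t)
    )
    SELECT t.s, t.e, series.code
    FROM t
    JOIN series
      ON julianday(date(t.e)) - julianday(date(t.s)) >= series.code
    ORDER BY t.s, series.code
""").fetchall()

codes = [row[2] for row in duplicated]
print(codes)
```

Each row is repeated once per calendar day it touches, so a same-day row gets only code 0 while a row spanning three dates gets codes 0, 1 and 2.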