Stack data structure (LIFO) equivalent in BigQuery SQL

Need help solving the use case below.
We have machines sending events that contain a machine ID, a session, a state (start, success, failure) for some task, and an event timestamp. Each task has the states start, success, and failure.
A success/failure event might be present without a corresponding start event. Similarly, a start event may be present without any success/failure.
WITH eventsource AS
( SELECT 'start' as state, '2023-01-01 10:00:00' as event_time, "a" as rowidentifier
UNION ALL SELECT 'success' as state, '2023-01-01 10:00:01' as event_time, "b" as rowidentifier
UNION ALL SELECT 'start' as state, '2023-01-01 10:00:02' as event_time, "c" as rowidentifier
UNION ALL SELECT 'start' as state, '2023-01-01 10:00:03' as event_time, "d" as rowidentifier
UNION ALL SELECT 'success' as state, '2023-01-01 10:00:04' as event_time, "e" as rowidentifier
UNION ALL SELECT 'success' as state, '2023-01-01 10:00:07' as event_time, "f" as rowidentifier
UNION ALL SELECT 'success' as state, '2023-01-01 10:00:08' as event_time, "g" as rowidentifier
UNION ALL SELECT 'start' as state, '2023-01-01 10:00:08' as event_time, "h" as rowidentifier
UNION ALL SELECT 'failure' as state, '2023-01-01 10:00:09' as event_time, "i" as rowidentifier
UNION ALL SELECT 'failure' as state, '2023-01-01 10:00:10' as event_time, "j" as rowidentifier
UNION ALL SELECT 'start' as state, '2023-01-01 10:00:11' as event_time, "k" as rowidentifier
)
select * from eventsource order by event_time
Problem: We need a stack-like structure. As soon as a start event comes, we push it onto the stack. When a success/failure event comes, we pop the 'start' event from the stack and compute the time difference between them (only for a success event; for a failure we still pop from the stack but do nothing). This calculation is per session.
In the above scenario - row identifiers were added for simplicity - the expected outcome is:
a,b events form one pair,
d,e events form one pair,
c,f events form one pair,
g is an out-of-order success - no start event is present for it,
h,i form one pair,
j is an out-of-order event,
k is an out-of-order event - no success/failure is present for it.
What is the best way to do this in SQL? I tried with an array variable in a stored procedure, but removing the last element and appending for each start event, row by row, is a performance hit; it takes a lot of time. I also tried a dynamic SQL approach with the EXECUTE IMMEDIATE command, creating a single-column table and adding/deleting rows in a FOR loop for each record. That performs even worse than the array-variable approach.
Data size: Around 2 million sessions per hour.
I explored different approaches - RANK/DENSE_RANK/ROW_NUMBER (could not see how to solve this use case with them), an array variable in a stored procedure, and a single-column temp table with rows added/deleted according to success/failure events - but nothing worked.
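To make the intended stack behaviour concrete, here is a minimal sketch of one possible approach: a JavaScript UDF that replays each session's events in order and does the LIFO matching in memory. The function name match_pairs, the returned columns, and the per-session grouping are assumptions for illustration, not something taken from the attempts above, and it is untested at the 2-million-sessions-per-hour volume.
CREATE TEMP FUNCTION match_pairs(
  events ARRAY<STRUCT<state STRING, event_time STRING, rowidentifier STRING>>)
RETURNS ARRAY<STRUCT<start_id STRING, end_id STRING, end_state STRING, start_time STRING, end_time STRING>>
LANGUAGE js AS """
  // Replay the session's events in time order and do the LIFO matching in memory.
  events.sort(function(a, b) { return a.event_time < b.event_time ? -1 : 1; });
  var stack = [];
  var pairs = [];
  for (var i = 0; i < events.length; i++) {
    var e = events[i];
    if (e.state === 'start') {
      stack.push(e);                      // push every start event
    } else if (stack.length > 0) {
      var s = stack.pop();                // pop the most recent unmatched start
      pairs.push({start_id: s.rowidentifier, end_id: e.rowidentifier, end_state: e.state,
                  start_time: s.event_time, end_time: e.event_time});
    }
    // success/failure with an empty stack (g, j) and leftover starts (k) are dropped
  }
  return pairs;
""";

WITH eventsource AS (
  SELECT 'start' AS state, '2023-01-01 10:00:00' AS event_time, 'a' AS rowidentifier UNION ALL
  SELECT 'success', '2023-01-01 10:00:01', 'b' UNION ALL
  SELECT 'start',   '2023-01-01 10:00:02', 'c' UNION ALL
  SELECT 'start',   '2023-01-01 10:00:03', 'd' UNION ALL
  SELECT 'success', '2023-01-01 10:00:04', 'e' UNION ALL
  SELECT 'success', '2023-01-01 10:00:07', 'f'
)
SELECT p.start_id, p.end_id,
       -- duration only for success pairs, as described above
       IF(p.end_state = 'success',
          TIMESTAMP_DIFF(TIMESTAMP(p.end_time), TIMESTAMP(p.start_time), SECOND),
          NULL) AS duration_seconds
FROM (
  SELECT match_pairs(ARRAY_AGG(STRUCT(state, event_time, rowidentifier))) AS pairs
  FROM eventsource
  -- on the real table this aggregation would be GROUP BY machineid, session
) t, UNNEST(t.pairs) AS p;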

Related

How to write logic for using two adjacent values in the column in bigquery?

I have a column called event. From that event column, I need to calculate how many minutes practice was stopped due to rain.
The rain and practice strings appear in adjacent rows. How can I build a single-line condition that uses both values in one statement (if rain started and practice stopped then 'stopped', or if rain stopped and practice started then 'started')? If I can classify them as started and stopped, I can get that particular session's time.
Using your logic, consider the query below. It uses LAG() to get the previous value.
Take note that you might have different values and ordering for the event field, so just replace them with your actual values.
with sample_data as (
  select 'rain started' as event, current_time() as time_stmp
  union all select 'practice stopped' as event, time_add(current_time(), interval 1 minute) as time_stmp
  union all select 'rain stopped' as event, time_add(current_time(), interval 10 minute) as time_stmp
  union all select 'practice started' as event, time_add(current_time(), interval 11 minute) as time_stmp
),
-- ######
-- query above is just to generate sample data
-- ######
get_prev as (
  select
    event,
    time_stmp,
    lag(event) over (order by time_stmp) as prev_event
  from sample_data
)
select
  event,
  time_stmp,
  case
    when prev_event = 'rain started' and event = 'practice stopped' then 'stopped'
    when prev_event = 'rain stopped' and event = 'practice started' then 'started'
    else null
  end as status
from get_prev
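If the end goal is the number of minutes practice was stopped, a possible follow-up on top of that query is sketched below. It is not part of the original answer; the fixed sample times are assumptions, and it simply pairs each 'stopped' row with the next 'started' row.
with sample_data as (
  select 'rain started' as event, time '10:00:00' as time_stmp union all
  select 'practice stopped', time '10:01:00' union all
  select 'rain stopped', time '10:10:00' union all
  select 'practice started', time '10:11:00'
),
labelled as (
  -- same labelling idea as above: 'stopped' / 'started' based on the previous row
  select
    event,
    time_stmp,
    case
      when lag(event) over (order by time_stmp) = 'rain started' and event = 'practice stopped' then 'stopped'
      when lag(event) over (order by time_stmp) = 'rain stopped' and event = 'practice started' then 'started'
    end as status
  from sample_data
)
select
  time_stmp as stopped_at,
  lead(time_stmp) over (order by time_stmp) as resumed_at,
  -- minutes between the 'stopped' row and the next 'started' row
  time_diff(lead(time_stmp) over (order by time_stmp), time_stmp, minute) as minutes_stopped
from labelled
where status is not null
qualify status = 'stopped'
  and lead(status) over (order by time_stmp) = 'started'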

Group Timestamps into intervals of 5 minutes, take value that's closest to timestamp and always give out a value

I'm new to SQL coding and would greatly appreciate help with a problem I'm facing. I have the following SQL script, which gives me the following output (see picture 1):
WITH speicher as(
select a.node as NODE_ID, d.name_0 as NODE_NAME, d.parent as PARENT_ID, c.time_stamp as ZEITSTEMPEL, c.value_num as WERT, b.DESCRIPTION_0 as Beschreibung, TO_CHAR(c.time_stamp, 'HH24:MI:SS') as Uhrzeit
from p_value_relations a, l_nodes d, p_values b, p_value_archive c
where a.node in (select sub_node from l_node_relations r where r.node in (
50028,
49989,
49848
))
and a.node = d.id
and (b."DESCRIPTION_0" like 'Name1' OR b."DESCRIPTION_0" like 'Name2')
and c.time_stamp between SYSDATE-30 AND SYSDATE-1
and a.value = b.id and b.id = c.value)
SELECT WERT as Value, NODE_NAME, ZEITSTEMPEL as Timestamp, Uhrzeit as Time, Beschreibung as Category
FROM speicher
I would like to create 5-minute time intervals and output a value for each one. It should always choose the value closest above the defined interval timestamp. If there is no value inside a given 5-minute interval, it should still output the last value it found, since the value has not changed in that case. To see what I mean, please see the following picture. Any help would be greatly appreciated. This data is from an Oracle database.
Result until now:
Result I would like:
Since I do not understand your data and can't test with it, I present something I could test with. My data has a table which tracks when folks log in to a system.
This is not intended as a complete answer, but as something to potentially point you in the right direction;
with time_range
as
(
select rownum, sysdate - (1/288)*rownum time_stamp
from dual
connect By Rownum <= 288*30
)
select time_stamp, min(LOGIN_TIME)
from time_range
left outer join WEB_SECURITY_LOGGED_IN on LOGIN_TIME >= time_stamp
group by time_stamp
order by 1;
Good luck...
Edit:
The WITH part of the query builds a time_stamp column with one row for every 5 minutes over the last 30 days. The query portion joins it to my login log table and takes the login with the smallest date/time greater than each time_stamp.
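A rough, untested extension of the same idea for the 5-minute carry-forward requirement is sketched below. It reuses the login table above as stand-in data; the bucket arithmetic and the LAST_VALUE ... IGNORE NULLS carry-forward would need to be adapted to the columns of your speicher query.
with buckets as (
  -- 288 five-minute buckets for today; bucket_no is the bucket index since midnight
  select rownum - 1 as bucket_no,
         trunc(sysdate) + (rownum - 1) * 5 / 1440 as bucket_start
  from dual
  connect by rownum <= 288
),
vals as (
  -- first value observed inside each bucket (seconds since midnight / 300)
  select floor(to_number(to_char(LOGIN_TIME, 'SSSSS')) / 300) as bucket_no,
         min(LOGIN_TIME) as first_value_in_bucket
  from WEB_SECURITY_LOGGED_IN
  where LOGIN_TIME >= trunc(sysdate)
  group by floor(to_number(to_char(LOGIN_TIME, 'SSSSS')) / 300)
)
select b.bucket_start,
       -- empty buckets repeat the most recent earlier value
       last_value(v.first_value_in_bucket ignore nulls)
         over (order by b.bucket_start) as value_for_bucket
from buckets b
left join vals v on v.bucket_no = b.bucket_no
order by b.bucket_start;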

Oracle Apex, Trying to pick value in Checkbox based on time

I've set up a dynamic action to run JavaScript code to pick either "A" or "B" based on what time it is. If the time is between 07:00 (hh24:mi) and 17:50 (hh24:mi) it should set the value to "A", else "B".
I've got a field (P4_TIME_RECORDED) on the form that pulls the current time using this code
select to_char(systimestamp,'hh24:mi') as timestamp from dual
and a field (P4_SHIFT) that is a checkbox with the two values A and B.
Any help would be awesome.
Based on the explanation in your question I'd go for a slightly different approach.
The value needs to be Shift A or Shift B. A radio group seems to be most appropriate there, since only one value can be selected at any time.
The value is based on sysdate, not on something that is changed after the page is rendered, so a dynamic action is overkill. Instead you can use a computation to get the value at page rendering time.
So this is what I'd do
Create a page item P4_SHIFT of type radio group (Static values, values A and B)
Add a computation on P4_SHIFT, with processing point Before Regions, of type Expression, with this source:
CASE
WHEN TO_CHAR(SYSDATE,'HH24MI') BETWEEN '0700' AND '1750'
THEN 'A'
ELSE 'B'
END
I'd create a DA and set it to fire on page load.
Set Value -> From SQL Query
My query would be something like:
select
  case
    when sysdate between time_1 and time_2 then 'A'
    else 'B'
  end as res
from (
  select
    trunc(sysdate) + interval '7' hour as time_1,
    trunc(sysdate) + interval '17' hour + interval '50' minute as time_2
  from dual)
And of course, the affected element has to be set to your checkbox item.

In SQL, Is there any way to construct a variable that tracks historical data within multiple groups?

I have inquiries about "variable construction" in SQL, more specifically BigQuery on GCP (Google Cloud Platform). I do not have a deep understanding of SQL, so I am having a hard time manipulating and constructing the variables I intend to make. Any comments would be very much appreciated.
I'm thinking of constructing two variables, which seems quite tricky to me. I'd like to briefly introduce the structure of the dataset before I ask how to construct them. The dataset is a historical record of game matches played by around 25,000 users, totaling around 100 million matches. 10 players participate in a single match, and each player chooses their own hero. Due to a technical issue, I can only manipulate and construct these two variables through BigQuery on GCP.
Constructing “Favorite Hero” Variable
First, I am planning to construct a "favorite hero" variable at the match-user level. As shown in the tables below, the baseline variables are 1) match_id (which identifies each match), 2) user_id (which identifies each user), 3) day (the date the match was played), and 4) hero_type (which hero each player/user chose).
Let me make clear what I intend to construct. As shown below, user 3258 (blue) played four times within the observation period. So, for the fourth match of user 3258, his/her favorite hero_type is 36, because his/her cumulative favorite hero_type up to that point is 36. Please note that "cumulative" does not include that very day. For example, user 2067 (red) played three times: 20190208, 20190209, 20190212. Each time, the player chose heroes 81, 81, and 34, respectively. So, the favorite_hero for the third match is 81, not 34. Also, I'd like to set the maximum number of favorite heroes to 2.
The important thing to note is that there are consecutive but split tables, as shown below. Although those tables are split, the timeline should not be broken but linked across them.
Constructing “Familiarity” Variable
I think the second variable I intend to make is even trickier than the previous one. I am planning to construct a "met_before" variable that counts the total number of times each user has previously met the other player(s). For example, in match_id 2, users 3258 (blue) and 2067 (red) previously met each other at match_id 1. So, each user has a value of 1 for the variable "met_before". The concept of "match_id" becomes more important for this variable than for the previous one, because this variable is primarily built on match_id. Another example: for match_id 5, user 3258 (blue) has a value of 4 for "met_before", because the player met user 2386 (green) twice (match_id 1 and 3) and user 2067 (red) twice (match_id 1 and 2).
Again, the important thing to note is that there are consecutive but split tables, as shown below. Although those tables are split, the timeline should not be broken but linked across them.
As stated in the comments, it would be better if you could provide sample data.
Also, there are 2 separate problems in the question. It would be better to create 2 different threads for them.
I prepared sample data from your screenshots and the code you need.
You can try the code and give feedback based on the output; if anything is wrong, we can iterate on it.
CREATE TEMP FUNCTION find_fav_hero(heroes ARRAY<INT64>) AS
((
SELECT STRING_AGG(CAST(hero as string) ORDER BY hero)
FROM (
SELECT *, max(cnt) over () as max_cnt
FROM (
SELECT hero, count(*) as cnt
FROM UNNEST(heroes) as hero
GROUP BY 1
)
)
WHERE cnt = max_cnt
));
WITH
rawdata as (
SELECT 2386 AS user_id, 20190208 as day, 30 as hero_type UNION ALL
SELECT 3268 AS user_id, 20190208 as day, 30 as hero_type UNION ALL
SELECT 2067 AS user_id, 20190208 as day, 81 as hero_type UNION ALL
SELECT 3268 AS user_id, 20190209 as day, 36 as hero_type UNION ALL
SELECT 2067 AS user_id, 20190209 as day, 81 as hero_type UNION ALL
SELECT 2386 AS user_id, 20190210 as day, 3 as hero_type UNION ALL
SELECT 3268 AS user_id, 20190210 as day, 36 as hero_type UNION ALL
SELECT 2386 AS user_id, 20190212 as day, 203 as hero_type UNION ALL
SELECT 3268 AS user_id, 20190212 as day, 36 as hero_type UNION ALL
SELECT 2067 AS user_id, 20190212 as day, 34 as hero_type
)
SELECT *,
count(*) over (partition by user_id order by day) - 1 as met_before,
find_fav_hero(array_agg(hero_type) over (partition by user_id order by day rows between unbounded preceding and 1 preceding )) as favourite_hero
from rawdata
order by day, user_id
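If met_before should count earlier meetings with the same co-players (as in the match_id 5 example), one possible sketch is to self-join the per-match participant rows and count prior co-occurrences per pair. The participants data below is made up, and the table and column names are assumptions:
WITH participants AS (
  SELECT 1 AS match_id, 20190208 AS day, user_id FROM UNNEST([3258, 2067, 2386]) AS user_id UNION ALL
  SELECT 2, 20190209, user_id FROM UNNEST([3258, 2067]) AS user_id UNION ALL
  SELECT 3, 20190210, user_id FROM UNNEST([3258, 2386]) AS user_id
),
pairs AS (
  -- one row per (user, co-player) combination inside each match
  SELECT a.match_id, a.day, a.user_id, b.user_id AS other_id
  FROM participants a
  JOIN participants b
    ON a.match_id = b.match_id AND a.user_id != b.user_id
)
SELECT match_id, user_id, SUM(times_met_before) AS met_before
FROM (
  SELECT match_id, user_id,
         -- how often this user already met this particular co-player in earlier matches
         COUNT(*) OVER (PARTITION BY user_id, other_id
                        ORDER BY day, match_id
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - 1 AS times_met_before
  FROM pairs
)
GROUP BY match_id, user_id
ORDER BY match_id, user_id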

Big Query Compute Average time between two Custom Events

I'm attempting to determine the average time between two events in my Firebase analytics using BigQuery. The table looks something like this:
I'd like to collect the timestamp_micros for the LOGIN_CALL and LOGIN_CALL_OK events, subtract LOGIN_CALL from LOGIN_CALL_OK, and compute the average of this across all rows.
#standardSQL
SELECT AVG(
(SELECT
event.timestamp_micros
FROM
`table`,
UNNEST(event_dim) AS event
where event.name = "LOGIN_CALL_OK") -
(SELECT
event.timestamp_micros
FROM
`table`,
UNNEST(event_dim) AS event
where event.name = "LOGIN_CALL"))
from `table`
I've managed to list either the low or the high numbers, but any time I try to do math on them I run into errors I'm struggling to pull apart. The approach above seems like it should work, but I get the following error:
Error: Scalar subquery produced more than one element
I read this error to mean that each of the UNNEST() functions is returning an array, not a single value, which is causing AVG to barf. I've tried to unnest once and apply "low" and "hi" names to the values, but can't figure out how to filter on event_dim.name correctly.
I couldn't fully test this one but maybe this might work for you:
WITH data AS(
SELECT STRUCT('1' as user_id) user_dim, ARRAY< STRUCT<date string, name string, timestamp_micros INT64> > [('20170610', 'EVENT1', 1497088800000000), ('20170610', 'LOGIN_CALL', 1498088800000000), ('20170610', 'LOGIN_CALL_OK', 1498888800000000), ('20170610', 'EVENT2', 159788800000000), ('20170610', 'LOGIN_CALL', 1599088800000000), ('20170610', 'LOGIN_CALL_OK', 1608888800000000)] event_dim union all
SELECT STRUCT('2' as user_id) user_dim, ARRAY< STRUCT<date string, name string, timestamp_micros INT64> > [('20170610', 'EVENT1', 1497688500400000), ('20170610', 'LOGIN_CALL', 1497788800000000)] event_dim UNION ALL
SELECT STRUCT('3' as user_id) user_dim, ARRAY< STRUCT<date string, name string, timestamp_micros INT64> > [('20170610', 'EVENT1', 1487688500400000), ('20170610', 'LOGIN_CALL', 1487788845000000), ('20170610', 'LOGIN_CALL_OK', 1498888807700000)] event_dim
)
SELECT
  AVG(time_diff) avg_time_diff
FROM (
  SELECT
    CASE
      WHEN e.name = 'LOGIN_CALL'
        AND LEAD(e.name, 1) OVER (PARTITION BY user_dim.user_id ORDER BY e.timestamp_micros ASC) = 'LOGIN_CALL_OK'
      THEN TIMESTAMP_DIFF(
        TIMESTAMP_MICROS(LEAD(e.timestamp_micros, 1) OVER (PARTITION BY user_dim.user_id ORDER BY e.timestamp_micros ASC)),
        TIMESTAMP_MICROS(e.timestamp_micros),
        DAY)
    END time_diff
  FROM data,
  UNNEST(event_dim) e
  WHERE e.name IN ('LOGIN_CALL', 'LOGIN_CALL_OK')
)
I've simulated 3 users with the same schema that you have in the Firebase schema.
Basically, I first applied the UNNEST operation so as to have each value of event_dim.name available. Then I applied a filter to get only the events you are interested in, that is, "LOGIN_CALL" and "LOGIN_CALL_OK".
As Mosha commented above, you do need some identification for these rows, as otherwise you won't know which event followed which; that's why the partitioning of the analytic functions also takes user_dim.user_id as input.
After that, it's just TIMESTAMP operations to get the differences where appropriate (when the leading event is "LOGIN_CALL_OK" and the current one is "LOGIN_CALL", take the difference; this is expressed in the CASE expression).
You can choose in the TIMESTAMP_DIFF function which part of the date you want to analyze, such as seconds, minutes, days and so on.