Firebase to BigQuery -> counting occurrences of nested custom event with specific values - google-bigquery

I'm having problems counting occurrences of a custom event grouped by event value (each event has several).
I'm currently trying to make this work:
SELECT count(*) as count,app_info.id,event_date,event_name,platform,events.value.string_value
FROM `api-6xxx.analytics_xxx.events_*`
CROSS JOIN UNNEST(event_params) as events
WHERE _table_suffix BETWEEN '20200225' AND '20200229'
AND event_name = 'ad_finished'
AND app_info.id = 'com.bundle.app'
AND platform = "ANDROID"
AND events.key = 'ad_type'
AND traffic_source.source = 'google'
AND events.value.string_value <> 'specific_thing'
GROUP BY 2,3,4,5,6
ORDER BY event_date
The problem is that the condition events.value.string_value <> 'specific_thing' does not seem to filter anything, and I have no idea why.

Related

UPDATE statement GA4 Big Query SQL with events and UNNEST

I'm trying to find a way to update records that have the event_name page_view and the key page_location, where the latter contains some pattern. The query below gives me the selection I'm after. The problem is that I cannot wrap my head around how to include an UPDATE statement that changes the values of page_location in that selection. Do you know how?
SELECT *
FROM (
SELECT (SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'page_location') AS page
FROM `project-name.analytics_299XXXXXX.events_*`
WHERE
_table_suffix BETWEEN '20220322'
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND
event_name = 'page_view'
LIMIT 1000
) x
WHERE x.page LIKE '%login%';
If I understand you correctly, I think you should use the following pattern. Google's docs don't list this update pattern, which is a shame as it's the most useful IMO:
update ds.targettable t
set t.targetfield = s.sourcefield
from (select keyfield, sourcefield
from ds.sourcetable
) s
where t.keyfield = s.keyfield
I apologize in advance, as I don't have access to the GA events table, so I've just coded this up off the top of my head. Here is some code that should get you close:
-- untested sketch; note that BigQuery DML cannot target a wildcard table such as events_*
update `project-name.analytics_299XXXXXX.events_*` tgt
set page_location = src.page
from (SELECT event_name, key, _table_suffix, (SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'page_location') AS page
FROM `project-name.analytics_299XXXXXX.events_*`
WHERE _table_suffix BETWEEN '20220322'
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
AND event_name = 'page_view'
and page LIKE '%login%') src
--join tables together on PK (or make one)
where farm_fingerprint(concat(tgt.key, tgt.event_name)) = farm_fingerprint(concat(src.key, src.event_name))
and src._table_suffix = tgt._table_suffix
and tgt.page LIKE '%login%'
I'm not exactly sure how to manage the table suffix, so you may need to play with that.
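For what it's worth, here is a rough sketch of how an in-place rewrite might look, under a few assumptions. BigQuery does not allow a wildcard table as the target of a DML statement, so the sketch targets a single daily table (the `events_20220322` suffix is just an example). In the GA4 export, page_location is not a top-level column but an entry in the event_params array, so the whole array is rebuilt. The 'login' to 'signin' replacement is a made-up example, and the four fields of the value struct are assumed to match the table's schema; adjust them if your export differs.

```sql
-- Sketch only: one daily table at a time, because DML cannot target events_*.
-- The date suffix and the 'login' -> 'signin' replacement are hypothetical.
UPDATE `project-name.analytics_299XXXXXX.events_20220322`
SET event_params = ARRAY(
  SELECT AS STRUCT
    p.key,
    STRUCT(
      -- rewrite only the page_location string; every other value is copied as-is
      IF(p.key = 'page_location',
         REPLACE(p.value.string_value, 'login', 'signin'),
         p.value.string_value) AS string_value,
      p.value.int_value    AS int_value,
      p.value.float_value  AS float_value,
      p.value.double_value AS double_value
    ) AS value
  FROM UNNEST(event_params) AS p
)
WHERE event_name = 'page_view'
  AND EXISTS (
    SELECT 1
    FROM UNNEST(event_params) AS p
    WHERE p.key = 'page_location'
      AND p.value.string_value LIKE '%login%'
  );
```

Running the SELECT from the question against the same daily table before and after is a cheap way to check that only the intended rows changed.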

User Life Cycle SQL Query Logic in Snowflake

I am working on building a query to track the life cycle of a user through the platform via events. The table EVENTS has 3 columns: USER_ID, DATE_TIME and EVENT_NAME.
My query should return, for each user, the first timestamp of the registered event, followed by the timestamp of the next log_in event after it, and finally the timestamp of the next landing_page event after that.
Below is my query:
WITH FIRST_STEP AS
(SELECT
USER_ID,
MIN(CASE WHEN EVENT_NAME = 'registered' THEN DATE_TIME ELSE NULL END) AS REGISTERED_TIMESTAMP
FROM EVENTS
GROUP BY 1
),
SECOND_STEP AS
(SELECT * FROM EVENTS
WHERE EVENT_NAME = 'log_in'
ORDER BY DATE_TIME
),
THIRD_STEP AS
(SELECT * FROM EVENTS
WHERE EVENT_NAME = 'landing_page'
ORDER BY DATE_TIME
)
SELECT
a.USER_ID,
a.REGISTERED_TIMESTAMP,
(SELECT
CASE WHEN b.DATE_TIME >= a.REGISTERED_TIMESTAMP THEN b.DATE_TIME END AS LOG_IN_TIMESTAMP
FROM SECOND_STEP
LIMIT 1
),
(SELECT
CASE WHEN c.DATE_TIME >= LOG_IN_TIMESTAMP THEN c.DATE_TIME END AS LANDING_PAGE_TIMESTAMP
FROM THIRD_STEP
LIMIT 1
)
FROM FIRST_STEP AS a
LEFT JOIN SECOND_STEP AS b ON a.USER_ID = b.USER_ID
LEFT JOIN THIRD_STEP AS c ON b.USER_ID = c.USER_ID;
Unfortunately, I am getting the "SQL compilation error: Unsupported subquery type cannot be evaluated" error when I try to run the query.
This is a perfect use case for MATCH_RECOGNIZE.
The pattern you are looking for is register anything* login anything* landing and the measures are the min(iff(event_name='x', date_time, null)) for each.
Check:
https://towardsdatascience.com/funnel-analytics-with-sql-match-recognize-on-snowflake-8bd576d9b7b1
https://docs.snowflake.com/en/user-guide/match-recognize-introduction.html
Set the output to one row per match.
Untested sample query:
select *
from events
match_recognize(
partition by user_id
order by date_time
measures min(iff(event_name='registered', date_time, null)) as t1
, min(iff(event_name='log_in', date_time, null)) as t2
, min(iff(event_name='landing_page', date_time, null)) as t3
one row per match
pattern(register anything* login anything* landing)
define
register as event_name = 'registered'
, login as event_name = 'log_in'
, landing as event_name = 'landing_page'
);
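If MATCH_RECOGNIZE is not an option, the same "next event after the previous step" logic can also be sketched with plain joins and MIN, which avoids the correlated scalar subqueries that trigger the compilation error. This is a different technique from the pattern-matching answer, not a rewrite of it. It assumes the EVENTS table and column names from the question; the CTE names are made up.

```sql
WITH reg AS (
  -- first registration per user
  SELECT user_id, MIN(date_time) AS registered_timestamp
  FROM events
  WHERE event_name = 'registered'
  GROUP BY user_id
),
login AS (
  -- earliest log_in at or after the registration
  SELECT r.user_id,
         r.registered_timestamp,
         MIN(e.date_time) AS log_in_timestamp
  FROM reg r
  LEFT JOIN events e
    ON  e.user_id = r.user_id
    AND e.event_name = 'log_in'
    AND e.date_time >= r.registered_timestamp
  GROUP BY r.user_id, r.registered_timestamp
)
-- earliest landing_page at or after that log_in
SELECT l.user_id,
       l.registered_timestamp,
       l.log_in_timestamp,
       MIN(e.date_time) AS landing_page_timestamp
FROM login l
LEFT JOIN events e
  ON  e.user_id = l.user_id
  AND e.event_name = 'landing_page'
  AND e.date_time >= l.log_in_timestamp
GROUP BY l.user_id, l.registered_timestamp, l.log_in_timestamp;
```

Users who never logged in or never reached the landing page still appear, with NULL in the later columns, because of the LEFT JOINs.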

How do I write a query to reference two keys and two values?

I am trying to write a query in Google BigQuery that pulls two keys and two values. The query should count distinct pseudo user IDs from one table where one event parameter has key = result and string value = success, and another has key = confirmationNumber with a value that is not null. The params have already been unnested. I'm SUPER new to SQL, so please dumb down any answers.
SELECT
*
FROM
`table_name`,
UNNEST(event_params) AS params
WHERE
(stream_id = '1168190076'
OR stream_id = '1168201031')
AND params.key = 'result'
AND params.value.string_value IN ('success',
'SUCCESS')
AND params.key = 'confirmationNumber' NOT NULL
I keep getting errors, and when I don't get errors, my numbers are off by a lot! I'm not sure where to go next.
Below is for BigQuery Standard SQL:
#standardSQL
SELECT *
FROM `project.dataset.table`
WHERE stream_id IN ('1168190076', '1168201031')
AND 2 = (
SELECT COUNT(1)
FROM UNNEST(event_params) param
WHERE (
param.key = 'result' AND
LOWER(param.value.string_value) = 'success'
) OR (
param.key = 'confirmationNumber' AND
NOT param.value.string_value IS NULL
)
)
I suspect that you want something more like this:
SELECT t.*
FROM `table_name` t
WHERE t.stream_id IN ('1168190076', '1168201031') AND
EXISTS (SELECT 1
FROM UNNEST(t.event_params) p
WHERE p.key = 'result' AND
p.value.string_value IN ('success', 'SUCCESS')
) AND
EXISTS (SELECT 1
FROM UNNEST(t.event_params) p
WHERE p.key = 'confirmationNumber'
);
That is, test each parameter independently. You don't need to unnest the result for the result set -- unless you really want to, of course.
I don't know what the lingering NOT NULL is for in your query, so I'm ignoring it. You might want to check the value, however.
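Since the original goal was a count of distinct pseudo user IDs rather than the rows themselves, the per-parameter test can be wrapped in an aggregate. A minimal sketch, assuming the ID column is called `user_pseudo_id` (the usual name in the GA4 export, but not shown in the question) and that the confirmation number is stored in string_value:

```sql
SELECT COUNT(DISTINCT t.user_pseudo_id) AS users_with_confirmed_success
FROM `table_name` AS t
WHERE t.stream_id IN ('1168190076', '1168201031')
  -- the event carries result = success (any letter case)
  AND EXISTS (
    SELECT 1
    FROM UNNEST(t.event_params) AS p
    WHERE p.key = 'result'
      AND LOWER(p.value.string_value) = 'success'
  )
  -- and the same event carries a non-null confirmation number
  AND EXISTS (
    SELECT 1
    FROM UNNEST(t.event_params) AS p
    WHERE p.key = 'confirmationNumber'
      AND p.value.string_value IS NOT NULL
  );
```

Because event_params is never joined into the outer FROM clause, each event row is considered once, which is probably why the earlier attempts with a cross join gave numbers that were off.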

Postgres: Subquery with GROUP BY

I'm trying to optimize a query (instead of repeating it many times) with the NON-FUNCTIONAL code below (it fails because a scalar subquery may only return one column):
SELECT
e.pageview_identifier,
e.created_at,
e.pageview_current_url,
e.pageview_mobile,
(
SELECT event_type, COUNT(event_identifier)
FROM events v
WHERE
v.company_identifier = e.company_identifier AND
v.user_identifier = e.user_identifier AND
v.pageview_identifier = e.pageview_identifier
GROUP BY v.event_type
)
FROM events e
WHERE
company_identifier = 'xyz' AND
user_identifier = '01CDQZVSJFBDA8W444JS2CS3BA' AND
event_type = 'page:view';
Basically, I want to retrieve the columns as
pageview_identifier, created_at, ..., event_type_a_count, event_type_b_count, ...
A FUNCTIONAL version that works is:
SELECT
e.pageview_identifier,
e.created_at,
e.pageview_current_url,
e.pageview_mobile,
(
SELECT COUNT(event_identifier)
FROM events v
WHERE
v.company_identifier = e.company_identifier AND
v.user_identifier = e.user_identifier AND
v.pageview_identifier = e.pageview_identifier AND
v.event_type = 'mouse:move'
) as mouse_move_count
FROM events e
WHERE
company_identifier = 'xyz' AND
user_identifier = '01CDQZVSJFBDA8W444JS2CS3BA' AND
event_type = 'page:view';
But in this case, I would need to repeat this subquery once for each kind of event_type.
Edit 1 - More information:
In my WHERE clause, I restrict the rows to event_type = 'page:view' only. There are several possible values for event_type, and for each page:view I need to count the related events (with a different event_type) based on the condition e.pageview_identifier = v.pageview_identifier.
Just use a window function:
SELECT e.pageview_identifier,
e.created_at,
e.pageview_current_url,
e.pageview_mobile,
COUNT(*) OVER (PARTITION BY e.company_identifier, e.user_identifier, e.pageview_identifier) as cnt
FROM events e
WHERE e.company_identifier = 'xyz' AND
e.user_identifier = '01CDQZVSJFBDA8W444JS2CS3BA' AND
e.event_type = 'page:view';
Note: This counts only 'page:view' events. If you want a count of each event, then one way is:
SELECT e.*
FROM (SELECT e.pageview_identifier,
e.created_at,
e.pageview_current_url,
e.pageview_mobile,
e.event_type,
COUNT(*) FILTER (WHERE e.event_type = 'mouse:move') OVER (PARTITION BY e.company_identifier, e.user_identifier, e.pageview_identifier) as cnt_mouse_move,
COUNT(*) FILTER (WHERE e.event_type = 'page:view') OVER (PARTITION BY e.company_identifier, e.user_identifier, e.pageview_identifier) as cnt_page_view,
. . .
FROM events e
WHERE e.company_identifier = 'xyz' AND
e.user_identifier = '01CDQZVSJFBDA8W444JS2CS3BA'
) e
WHERE e.event_type = 'page:view';
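Another way to get one count column per event type, closer in shape to the original correlated subquery, is a self-join with filtered aggregates. This is only a sketch: 'mouse:click' is a made-up second event type for illustration, and the query assumes there is a single 'page:view' row per pageview_identifier (otherwise rows that share all grouped columns collapse and their counts add up).

```sql
SELECT
  e.pageview_identifier,
  e.created_at,
  e.pageview_current_url,
  e.pageview_mobile,
  -- one filtered count per event type of interest
  COUNT(v.event_identifier) FILTER (WHERE v.event_type = 'mouse:move')  AS mouse_move_count,
  COUNT(v.event_identifier) FILTER (WHERE v.event_type = 'mouse:click') AS mouse_click_count
FROM events e
LEFT JOIN events v
  ON  v.company_identifier  = e.company_identifier
  AND v.user_identifier     = e.user_identifier
  AND v.pageview_identifier = e.pageview_identifier
WHERE e.company_identifier = 'xyz'
  AND e.user_identifier = '01CDQZVSJFBDA8W444JS2CS3BA'
  AND e.event_type = 'page:view'
GROUP BY
  e.pageview_identifier,
  e.created_at,
  e.pageview_current_url,
  e.pageview_mobile;
```

Adding another event type is then a single extra COUNT ... FILTER line rather than a whole new subquery.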

BigQuery select multiple key values

With a custom event in Firebase exported to BigQuery, multiple key-value params can exist within it. I can't seem to figure out how to select more than just one of these using "standard SQL".
Let's say that you wanted to select the string_value that corresponds with firebase_event_origin and the int_value associated with firebase_screen_id for all control_reading events. You could express the query as:
#standardSQL
SELECT
(SELECT param.value.string_value
FROM UNNEST(event_dim.params) AS param
WHERE param.key = 'firebase_event_origin') AS firebase_event_origin,
(SELECT param.value.int_value
FROM UNNEST(event_dim.params) AS param
WHERE param.key = 'firebase_screen_id') AS firebase_screen_id
FROM `your_dataset.your_table_*`
CROSS JOIN UNNEST(event_dim) AS event_dim
WHERE _TABLE_SUFFIX BETWEEN '20170501' AND '20170503' AND
event_dim.name = 'control_reading';
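The query above is written against the older export layout, where events are nested under event_dim. For the flat layout used by the other queries on this page (top-level event_name and event_params), the same idea might look like the sketch below. The table name and date range are placeholders, not taken from the question.

```sql
#standardSQL
SELECT
  -- one scalar subquery per parameter you want as a column
  (SELECT p.value.string_value
   FROM UNNEST(event_params) AS p
   WHERE p.key = 'firebase_event_origin') AS firebase_event_origin,
  (SELECT p.value.int_value
   FROM UNNEST(event_params) AS p
   WHERE p.key = 'firebase_screen_id') AS firebase_screen_id
FROM `your_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20200225' AND '20200229'
  AND event_name = 'control_reading';
```

Each subquery returns NULL when the event has no parameter with that key, so events are not dropped just because one parameter is missing.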