BigQuery select multiple key values - google-bigquery

With a custom event in Firebase exported to BigQuery, multiple key-value params can exist within it. I can't seem to figure out how to select more than just one of these using "standard SQL".

Let's say that you wanted to select the string_value that corresponds with firebase_event_origin and the int_value associated with firebase_screen_id for all control_reading events. You could express the query as:
#standardSQL
SELECT
  (SELECT param.value.string_value
   FROM UNNEST(event_dim.params) AS param
   WHERE param.key = 'firebase_event_origin') AS firebase_event_origin,
  (SELECT param.value.int_value
   FROM UNNEST(event_dim.params) AS param
   WHERE param.key = 'firebase_screen_id') AS firebase_screen_id
FROM `your_dataset.your_table_*`
CROSS JOIN UNNEST(event_dim) AS event_dim
WHERE _TABLE_SUFFIX BETWEEN '20170501' AND '20170503'
  AND event_dim.name = 'control_reading';

Related

BigQuery Google Analytics Sample: How to unnest with repetitive elements when CASE WHEN eliminates the multiple elements?

The part that is tripping me up is towards the end of the query:
CASE WHEN type='EVENT' THEN product.v2ProductName[SAFE_OFFSET(1)] ELSE NULL END.
Without this statement, the query works. I have tried several variations including unnesting again within the case statement (although it is already unnested in the orig temporary table) using OFFSET and SAFE_OFFSET both with and without UNNEST.
Also, "product" is a record and does not need to be unnested (same as page and eventInfo).
Here is one version:
WITH
a AS
(WITH
orig AS
(SELECT*
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`),
non_buyers AS
(SELECT fullVisitorId as visitor_id,
MAX(visitNumber) as last_visit,
FROM orig
GROUP BY fullVisitorId
HAVING COUNT(totals.transactions)=0)
SELECT non_buyers.visitor_id as visitor_id,
non_buyers.last_visit as last_visit,
(SELECT MAX(hitNumber) FROM UNNEST( hits ) ) as last_hit
FROM non_buyers LEFT JOIN orig ON non_buyers.visitor_id = orig.fullVisitorID AND non_buyers.last_visit = orig.visitNumber),
orig AS
(SELECT*
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`, UNNEST(hits) )
SELECT type,
page.pagePath,
CASE WHEN type='EVENT' THEN product.v2ProductName[SAFE_OFFSET(1)] ELSE NULL END,
eventInfo.eventAction,
eventInfo.eventLabel,
visitor_id,
last_visit,
last_hit
FROM a LEFT JOIN orig ON a.visitor_id = orig.fullVisitorID AND a.last_visit = orig.visitNumber AND a.last_hit = orig.hitNumber
I am trying to get a frame with the following fields:
visitor_id, last_visit, last_hit, type, page, product, action, label
for each user who never purchased anything.
Also important to note: the hits.product.v2ProductName field is nested and repeated, but it is not repeated when event.type = 'EVENT' (only when the type is social, page, etc.), so technically I should not have to use OFFSET to choose which element I want from the list, although I have tried with and without it. Any help is much appreciated. Thanks!
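One possible cause, sketched under an assumption: in the published Google Analytics export schema, hits.product is a repeated record and v2ProductName is a plain string inside each element. If that holds here, the array index belongs on product rather than on v2ProductName. The inline rows and product names below are made up for illustration, and SAFE_OFFSET(0) picks the first element (SAFE_OFFSET(1) would pick the second).

```sql
#standardSQL
-- Hypothetical inline rows mimicking the shape of unnested hits:
-- `product` is an ARRAY of STRUCTs, `v2ProductName` is a STRING in each element.
WITH hits AS (
  SELECT 'EVENT' AS type,
         [STRUCT('Example Tee' AS v2ProductName)] AS product
  UNION ALL
  SELECT 'PAGE',
         [STRUCT('Example Mug' AS v2ProductName),
          STRUCT('Example Cap' AS v2ProductName)]
)
SELECT
  type,
  -- Index the array first, then read the field of the chosen element.
  -- SAFE_OFFSET returns NULL instead of failing when the array is empty.
  IF(type = 'EVENT', product[SAFE_OFFSET(0)].v2ProductName, NULL) AS product_name
FROM hits;
```

In the original query the equivalent change would be replacing product.v2ProductName[SAFE_OFFSET(1)] with product[SAFE_OFFSET(0)].v2ProductName, assuming the schema matches.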

UPDATE statement GA4 Big Query SQL with events and UNNEST

I'm trying to find a way to update records that have an event_name of page_view and a page_location key whose value contains some pattern. The query below gives me the selection I'm after. The problem is that I cannot wrap my head around how to write an UPDATE statement that changes the values of page_location in that selection. Do you know?
SELECT *
FROM (
SELECT (SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'page_location') AS page
FROM `project-name.analytics_299XXXXXX.events_*`
WHERE
_table_suffix BETWEEN '20220322'
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND
event_name = 'page_view'
LIMIT 1000
) x
WHERE x.page LIKE '%login%';
If I understand you correctly, I think you should use the following pattern. Google's docs don't have this update pattern listed, which is a shame as it's the most useful IMO:
update ds.targettable t
set t.targetfield = s.sourcefield
from (select keyfield, sourcefield
from ds.sourcetable
) s
where t.keyfield = s.keyfield
I apologize in advance as I don't have access to the GA events table, so I've just coded this up from the top of my head; here is some code that should get you close:
update `project-name.analytics_299XXXXXX.events_*` tgt
set page_location = src.page
from (SELECT event_name, key, _table_suffix, (SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'page_location') AS page
FROM `project-name.analytics_299XXXXXX.events_*`
WHERE _table_suffix BETWEEN '20220322'
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
AND event_name = 'page_view'
and page LIKE '%login%') src
--join tables together on PK (or make one)
where farm_fingerprint(concat(tgt.key, tgt.event_name)) = farm_fingerprint(concat(src.key, src.event_name))
and src._table_suffix = tgt._table_suffix
and tgt.page LIKE '%login%'
I'm not exactly sure how to manage the table suffix, so you may need to play with that.
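A different approach that may be worth trying: since page_location lives inside the event_params array rather than in a top-level column, one option is to rebuild the whole array and swap only the matching entry. The sketch below shows the array rebuild as a plain SELECT over a made-up inline row, so it can be run without access to the export. The sample URL, the 'login' to 'signin' replacement, and the two-field value struct are assumptions for illustration; the real GA4 value record has more fields (for example float_value and double_value), and all of them would have to be listed in schema order.

```sql
#standardSQL
-- Hypothetical inline row standing in for one GA4 event.
WITH events AS (
  SELECT
    'page_view' AS event_name,
    [
      STRUCT('page_location' AS key,
             STRUCT('https://example.com/login?x=1' AS string_value,
                    CAST(NULL AS INT64) AS int_value) AS value),
      STRUCT('page_title' AS key,
             STRUCT('Login' AS string_value,
                    CAST(NULL AS INT64) AS int_value) AS value)
    ] AS event_params
)
SELECT
  event_name,
  -- Rebuild the array element by element, changing only page_location.
  ARRAY(
    SELECT AS STRUCT
      p.key,
      (SELECT AS STRUCT
         IF(p.key = 'page_location',
            REPLACE(p.value.string_value, 'login', 'signin'),
            p.value.string_value) AS string_value,
         p.value.int_value AS int_value) AS value
    FROM UNNEST(event_params) AS p WITH OFFSET AS pos
    ORDER BY pos  -- keep the original parameter order
  ) AS event_params
FROM events
WHERE event_name = 'page_view'
  AND EXISTS (
    SELECT 1
    FROM UNNEST(event_params) AS p
    WHERE p.key = 'page_location'
      AND p.value.string_value LIKE '%login%');
```

In an UPDATE, the same ARRAY(...) expression could serve as the right-hand side of SET event_params = ..., with the same WHERE clause. As far as I know, DML statements cannot target a wildcard table, so the update would have to be run against each daily events_YYYYMMDD table separately.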

BigQuery - UNNEST where event_params.key matches a certain value

I'm trying to pull a distinct count of users WHERE the traffic medium = 'referral'
On running the below query, I get this error:
Syntax error: Unexpected keyword UNNEST at [4:1]
I'm trying to UNNEST the event_params field so that I flatten the table. I'm also sharing a sample row of the data which has the event_params.key / value pairs.
Thanks.
SELECT
COUNT (DISTINCT(user_pseudo_id)) AS total_users
FROM `project-table`
UNNEST (event_params) AS event_params
WHERE event_name = 'page_view'
AND event_params.medium='referral'
Below is for BigQuery Standard SQL
#standardSQL
SELECT COUNT (DISTINCT(user_pseudo_id)) AS total_users
FROM `project.dataset.table`,
UNNEST (event_params) AS event_param
WHERE event_name = 'page_view'
AND event_param.key = 'medium'
AND event_param.value.string_value = 'referral'

firebase to bigquery -> counting occurrences of nested custom event with specific values

I'm having problems counting custom event occurrences by the event value (it has multiple).
I'm currently trying to make this work:
SELECT count(*) as count,app_info.id,event_date,event_name,platform,events.value.string_value
FROM `api-6xxx.analytics_xxx.events_*`
CROSS JOIN UNNEST(event_params) as events
WHERE _table_suffix BETWEEN '20200225' AND '20200229'
AND event_name = 'ad_finished'
AND app_info.id = 'com.bundle.app'
AND platform = "ANDROID"
AND events.key = 'ad_type'
AND traffic_source.source = 'google'
AND events.value.string_value <> 'specific_thing'
GROUP BY 2,3,4,5,6
ORDER BY event_date
The problem is that it does not filter by the AND events.value.string_value <> 'specific_thing', and I have no idea why.

How do I write a query to reference two keys and two values?

I am trying to write a query in Google BigQuery that pulls two keys and two values. The query should be: count distinct pseudo user IDs from one table where event_params.key = result and event_params.key = confirmation number (and is not null), and event_params.value.string_value = success. This has already been unnested. I'm SUPER new to SQL, so please dumb down any answers.
SELECT
*
FROM
`table_name`,
UNNEST(event_params) AS params
WHERE
(stream_id = '1168190076'
OR stream_id = '1168201031')
AND params.key = 'result'
AND params.value.string_value IN ('success',
'SUCCESS')
AND params.key = 'confirmationNumber' NOT NULL
I keep getting errors, and when I don't get errors, my numbers are off by a lot! I'm not sure where to go next.
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM `project.dataset.table`
WHERE stream_id IN ('1168190076', '1168201031')
AND 2 = (
SELECT COUNT(1)
FROM UNNEST(event_params) param
WHERE (
param.key = 'result' AND
LOWER(param.value.string_value) = 'success'
) OR (
param.key = 'confirmationNumber' AND
NOT param.value.string_value IS NULL
)
)
I suspect that you want something more like this:
SELECT t.*
FROM `table_name` t
WHERE t.stream_id IN ('1168190076', '1168201031') AND
EXISTS (SELECT 1
FROM UNNEST(t.event_params) p
WHERE p.key = 'result' AND
p.value.string_value IN ('success', 'SUCCESS')
) AND
EXISTS (SELECT 1
FROM UNNEST(t.event_params) p
WHERE p.key = 'confirmationNumber'
);
That is, test each parameter independently. You don't need to unnest the result for the result set -- unless you really want to, of course.
I don't know what the lingering NOT NULL is for in your query, so I'm ignoring it. You might want to check the value, however.
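To see why the two keys have to be tested independently, here is a small self-contained sketch with made-up inline rows in place of the real table. The user IDs, confirmation numbers, and the single-field value struct are assumptions for illustration. A single unnested row only ever carries one key, so a filter like params.key = 'result' AND params.key = 'confirmationNumber' can never match; each EXISTS instead scans the event's whole array on its own.

```sql
#standardSQL
-- Hypothetical sample events; only u1 has both a successful result
-- and a non-null confirmation number.
WITH events AS (
  SELECT 'u1' AS user_pseudo_id, '1168190076' AS stream_id,
    [STRUCT('result' AS key, STRUCT('SUCCESS' AS string_value) AS value),
     STRUCT('confirmationNumber' AS key, STRUCT('A-100' AS string_value) AS value)
    ] AS event_params
  UNION ALL
  SELECT 'u2', '1168201031',
    [STRUCT('result' AS key, STRUCT('failure' AS string_value) AS value),
     STRUCT('confirmationNumber' AS key, STRUCT('A-101' AS string_value) AS value)]
  UNION ALL
  SELECT 'u3', '1168190076',
    [STRUCT('result' AS key, STRUCT('success' AS string_value) AS value)]
)
SELECT COUNT(DISTINCT t.user_pseudo_id) AS matching_users
FROM events AS t
WHERE t.stream_id IN ('1168190076', '1168201031')
  -- Test 1: the event has a 'result' param equal to success (any casing).
  AND EXISTS (
    SELECT 1
    FROM UNNEST(t.event_params) AS p
    WHERE p.key = 'result'
      AND LOWER(p.value.string_value) = 'success')
  -- Test 2: the same event has a non-null 'confirmationNumber' param.
  AND EXISTS (
    SELECT 1
    FROM UNNEST(t.event_params) AS p
    WHERE p.key = 'confirmationNumber'
      AND p.value.string_value IS NOT NULL);
```

With this sample data only u1 passes both tests: u2 fails the result check and u3 has no confirmation number. Because the outer query never unnests event_params, each event stays a single row and the count is not inflated by the number of parameters.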