UPDATE statement in GA4 BigQuery SQL with events and UNNEST

I'm trying to find a way to update records that have an event_name of page_view and a page_location parameter whose value contains some pattern. The query below gives me the selection I'm after. The problem is that I can't wrap my head around how to include an UPDATE statement that changes the page_location values in that selection. Does anyone know how?
SELECT *
FROM (
SELECT (SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'page_location') AS page
FROM `project-name.analytics_299XXXXXX.events_*`
WHERE
_table_suffix BETWEEN '20220322'
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND
event_name = 'page_view'
LIMIT 1000
) x
WHERE x.page LIKE '%login%';

If I understand you correctly, I think you should use the following pattern. The Google docs don't make much of this UPDATE ... FROM pattern, which is a shame as it's the most useful one IMO:
update ds.targettable t
set t.targetfield = s.sourcefield
from (select keyfield, sourcefield
from ds.sourcetable
) s
where t.keyfield = s.keyfield
I apologize in advance: I don't have access to the GA events table, so I've just coded this up off the top of my head. Here is some code that should get you close:
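-- DML can't target a wildcard table, so point this at one daily table (events_YYYYMMDD)
update `project-name.analytics_299XXXXXX.events_20220322` tgt
-- page_location isn't a column: it's an entry in the event_params array, so rebuild that array
-- (no FROM/self-join needed here, because the new value only depends on the row itself)
set event_params = array(
  select as struct
    p.key,
    if(p.key = 'page_location' and p.value.string_value like '%login%',
       -- hypothetical rewrite: replace 'login' with whatever new value you need
       struct(replace(p.value.string_value, 'login', 'signin') as string_value,
              p.value.int_value as int_value,
              p.value.float_value as float_value,
              p.value.double_value as double_value),
       p.value) as value
  from unnest(tgt.event_params) p with offset as off
  order by off
)
-- only touch page_view rows whose page_location matches the pattern
where tgt.event_name = 'page_view'
  and exists (select 1 from unnest(tgt.event_params) p
              where p.key = 'page_location' and p.value.string_value like '%login%')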
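I'm not exactly sure how to manage the table suffix, but note that BigQuery DML can't target a wildcard table such as events_*, so you'll need to run the UPDATE once per daily events_YYYYMMDD table.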
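One way to handle the table suffix, sketched with BigQuery scripting (FOR ... IN and EXECUTE IMMEDIATE): first find every daily table in the date range that has a matching page_view, then run the UPDATE against each of those tables. The 'login' to 'signin' rewrite is a hypothetical placeholder, and the STRUCT assumes the standard GA4 value fields (string_value, int_value, float_value, double_value) in that order.

```sql
-- Sketch: apply the page_location rewrite to every daily table that has matching rows.
-- Assumption: 'login' -> 'signin' is a placeholder rewrite; swap in your own value.
FOR t IN (
  SELECT DISTINCT _TABLE_SUFFIX AS suffix
  FROM `project-name.analytics_299XXXXXX.events_*`, UNNEST(event_params) AS p
  WHERE _TABLE_SUFFIX BETWEEN '20220322'
      AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
    AND event_name = 'page_view'
    AND p.key = 'page_location'
    AND p.value.string_value LIKE '%login%'
)
DO
  -- inside FORMAT, %% is a literal % and %s is replaced with the table suffix
  EXECUTE IMMEDIATE FORMAT("""
    UPDATE `project-name.analytics_299XXXXXX.events_%s`
    SET event_params = ARRAY(
      SELECT AS STRUCT
        p.key,
        IF(p.key = 'page_location' AND p.value.string_value LIKE '%%login%%',
           STRUCT(REPLACE(p.value.string_value, 'login', 'signin') AS string_value,
                  p.value.int_value AS int_value,
                  p.value.float_value AS float_value,
                  p.value.double_value AS double_value),
           p.value) AS value
      FROM UNNEST(event_params) AS p WITH OFFSET AS off
      ORDER BY off)
    WHERE event_name = 'page_view'
      AND EXISTS (SELECT 1 FROM UNNEST(event_params) AS p
                  WHERE p.key = 'page_location' AND p.value.string_value LIKE '%%login%%')
  """, t.suffix);
END FOR;
```

Because the suffix list comes from a SELECT (which, unlike DML, is allowed on the wildcard table), only days that actually contain matching rows get an UPDATE.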

Related

How to get traffic source data of the 'current session' in the GA4 BigQuery export?

I linked my GA4 account with a BigQuery Project a month ago.
With the raw data, I'd like to get various data and insights, including many features that GA currently offers. One of them is the conversion path: I'd like to see which mediums/sources my users came through along the conversion path. For this, the traffic source or medium/campaign/source values are necessary. However, the Google BigQuery Export schema help page says that traffic source is the "Name of the traffic source that first acquired the user".
https://support.google.com/firebase/answer/7029846?hl=en
So for each user, the traffic source always appears to be the same, and I can't track the mediums/sources that converting users went through. I also tried extracting the UTM codes from page_location in the event_params column of the BigQuery schema, but the result doesn't seem to be correct.
For example, the first image shows the value counts of the last medium in GA4's conversion paths.
The second shows the value counts of the last medium that I extracted from page_location/page_referrer in the BigQuery data. The period is the same, so the total number of conversions is the same, but the counts per medium differ.
(Image 1: value counts of the last medium in the GA4 conversion path)
(Image 2: value counts of the last medium extracted from page_location/page_referrer in the BigQuery data)
My question: how can I get the traffic source of each session from the raw BigQuery data?
Any idea or clue would be appreciated.
Many thanks.
If you're still looking for the answer, here's how I've done it:
select
event_date,
user_pseudo_id,
(select value.string_value from unnest(event_params) where key = 'page_referrer') as page_referrer,
-- cast ga_session_id explicitly so both CONCAT arguments are strings
concat(user_pseudo_id, cast((select value.int_value from unnest(event_params) where key = 'ga_session_id') as string)) as session_id,
-- GA labels traffic without a source/medium as source '(direct)', medium '(none)'
coalesce((select value.string_value from unnest(event_params) where key = 'medium'), '(none)') as session_medium,
coalesce((select value.string_value from unnest(event_params) where key = 'source'), '(direct)') as session_source,
coalesce((select value.string_value from unnest(event_params) where key = 'campaign'), '(none)') as session_campaign,
coalesce((select value.string_value from unnest(event_params) where key = 'content'), '(none)') as session_content
from
`ga4.analytics_308612563.events_*`
where event_name = "page_view"
-- entrances = 1 marks the first page_view of the session (the landing page)
and (select value.int_value from unnest(event_params) where key = 'entrances') = 1
So basically you're looking for the first page view of each session by filtering for entrances = 1. For my use case, I used this query as a CTE, then wrote another CTE for the other events I was looking for, and then joined the two on a session ID built from user_pseudo_id and ga_session_id.
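The two-CTE approach described above can be sketched as follows. This is not the answerer's exact query: it assumes 'purchase' is the conversion event and that each session has a single entrance page_view, and it treats a missing source/medium as direct traffic using GA's (direct) / (none) labels.

```sql
-- Sketch: attribute each conversion to the source/medium of its session's landing page.
-- Assumptions: 'purchase' is your conversion event; one entrance page_view per session.
WITH landing AS (
  SELECT
    CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)) AS session_id,
    -- treat a missing source/medium as direct traffic, using GA's (direct) / (none) labels
    COALESCE((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source'), '(direct)') AS session_source,
    COALESCE((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium'), '(none)') AS session_medium
  FROM `ga4.analytics_308612563.events_*`
  WHERE event_name = 'page_view'
    AND (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'entrances') = 1
),
conversions AS (
  SELECT
    CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)) AS session_id
  FROM `ga4.analytics_308612563.events_*`
  WHERE event_name = 'purchase'
)
SELECT
  l.session_source,
  l.session_medium,
  COUNT(*) AS conversions
FROM conversions AS c
JOIN landing AS l USING (session_id)
GROUP BY l.session_source, l.session_medium
ORDER BY conversions DESC;
```

If a session can have more than one entrance page_view in your data, deduplicate the landing CTE first, otherwise the join will count those conversions more than once.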
Data from the GA4 export is event-based, not session-based, so it sounds like what you'd like to do is roll the event data up to the session level.
This query will do that:
SELECT
user_pseudo_id,
TIMESTAMP_MICROS(event_timestamp) AS session_start_ts,
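-- LEAD returns the same user's next session_start, so this is really the start of the
-- next session (NULL for the user's latest session) rather than a true session end
LEAD(TIMESTAMP_MICROS(event_timestamp), 1) OVER (PARTITION BY user_pseudo_id
ORDER BY
event_timestamp) AS session_end_ts,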
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') AS session_number,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_referrer') AS referrer,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS landing_page_path,
-- every row here is a session_start event, so also requiring event_name = 'page_view' would always give NULL
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS landing_page_title,
traffic_source.name,
traffic_source.medium,
traffic_source.source,
CASE
WHEN device.category = "desktop" THEN "desktop"
WHEN device.category = "tablet" AND app_info.id IS NULL THEN "tablet-web"
WHEN device.category = "mobile" AND app_info.id IS NULL THEN "mobile-web"
WHEN device.category = "tablet" AND app_info.id IS NOT NULL THEN "tablet-app"
WHEN device.category = "mobile" AND app_info.id IS NOT NULL THEN "mobile-app"
END AS device,
device.mobile_brand_name,
device.mobile_model_name,
device.mobile_marketing_name,
device.mobile_os_hardware_model,
device.operating_system,
device.operating_system_version,
device.vendor_id,
device.advertising_id,
device.language,
device.is_limited_ad_tracking,
device.time_zone_offset_seconds,
device.browser,
device.browser_version,
device.web_info.browser,
device.web_info.browser_version,
device.web_info.hostname
FROM
`[my_project].analytics_[my_id].events_*` -- modify to your project
WHERE
event_name = 'session_start'
order by 1,2
LIMIT 500
Full credit for this code goes to the following article on Rittman Analytics, which I think is well worth reading when you're starting out with the GA4 export to BigQuery:
https://rittmananalytics.com/blog/2021/7/25/event-based-analytics-and-bigquery-export-comes-to-google-analytics-4-how-does-it-worknbsp-and-whats-thenbspcatch
Ben
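A different way to roll events up to the session level, not taken from the Rittman Analytics article: group all events by user_pseudo_id and ga_session_id so that each output row is one session. This is a sketch; "session end" here is simply the last event seen in the session, which is not necessarily how GA itself defines it, and the table name is the same placeholder as in the query above.

```sql
-- Sketch: one row per session, keyed on user_pseudo_id + ga_session_id.
WITH session_events AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
    event_name,
    event_timestamp
  FROM `[my_project].analytics_[my_id].events_*` -- modify to your project
)
SELECT
  user_pseudo_id,
  ga_session_id,
  TIMESTAMP_MICROS(MIN(event_timestamp)) AS session_start_ts,
  -- timestamp of the last event seen in the session, not GA's own session end
  TIMESTAMP_MICROS(MAX(event_timestamp)) AS last_event_ts,
  -- earliest non-null page_location in the session, i.e. the landing page
  ARRAY_AGG(page_location IGNORE NULLS ORDER BY event_timestamp LIMIT 1)[SAFE_OFFSET(0)] AS landing_page,
  COUNTIF(event_name = 'page_view') AS page_views,
  COUNT(*) AS events
FROM session_events
WHERE ga_session_id IS NOT NULL
GROUP BY user_pseudo_id, ga_session_id
ORDER BY session_start_ts
LIMIT 500;
```

Grouping this way does not depend on a session_start event being present, at the cost of not reading the session_start row's own fields such as traffic_source.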

Firebase to BigQuery -> counting occurrences of nested custom event with specific values

I'm having problems counting custom event occurrences by event parameter value (the event has multiple parameters).
I'm currently trying to make this work:
SELECT count(*) as count,app_info.id,event_date,event_name,platform,events.value.string_value
FROM `api-6xxx.analytics_xxx.events_*`
CROSS JOIN UNNEST(event_params) as events
WHERE _table_suffix BETWEEN '20200225' AND '20200229'
AND event_name = 'ad_finished'
AND app_info.id = 'com.bundle.app'
AND platform = "ANDROID"
AND events.key = 'ad_type'
AND traffic_source.source = 'google'
AND events.value.string_value <> 'specific_thing'
GROUP BY 2,3,4,5,6
ORDER BY event_date
The problem is that the AND events.value.string_value <> 'specific_thing' condition doesn't seem to filter anything out, and I have no idea why.
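No answer to this question was captured here. As a sketch only, the same count can be written with the scalar-subquery pattern used elsewhere on this page, which keeps one row per event and pulls ad_type out into its own column. Removing the outer WHERE lists every ad_type value actually stored, which is a quick way to check whether 'specific_thing' appears in some other form (for example with different casing or whitespace; that cause is an assumption, not something the question confirms).

```sql
-- Sketch: one row per event, with ad_type extracted by a scalar subquery.
-- Note: ad_type <> 'specific_thing' is NULL (so the row is dropped) when ad_type is NULL,
-- e.g. if the parameter is missing or stored in int_value instead of string_value.
SELECT
  COUNT(*) AS count,
  app_id,
  event_date,
  event_name,
  platform,
  ad_type
FROM (
  SELECT
    app_info.id AS app_id,
    event_date,
    event_name,
    platform,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'ad_type') AS ad_type
  FROM `api-6xxx.analytics_xxx.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20200225' AND '20200229'
    AND event_name = 'ad_finished'
    AND app_info.id = 'com.bundle.app'
    AND platform = 'ANDROID'
    AND traffic_source.source = 'google'
)
WHERE ad_type <> 'specific_thing'
GROUP BY app_id, event_date, event_name, platform, ad_type
ORDER BY event_date;
```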

How do I write a query to reference two keys and two values?

I am trying to write a query in Google BigQuery that uses two keys and two values. The query should count distinct pseudo user IDs from one table where event_params.key = 'result' with event_params.value.string_value = 'success', and event_params.key = 'confirmationNumber' with a value that is not null. I've already unnested event_params. I'm SUPER new to SQL, so please dumb down any answers.
SELECT
*
FROM
`table_name`,
UNNEST(event_params) AS params
WHERE
(stream_id = '1168190076'
OR stream_id = '1168201031')
AND params.key = 'result'
AND params.value.string_value IN ('success',
'SUCCESS')
AND params.key = 'confirmationNumber' NOT NULL
I keep getting errors, and when I don't get errors, my numbers are off by a lot! I'm not sure where to go next.
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM `project.dataset.table`
WHERE stream_id IN ('1168190076', '1168201031')
AND 2 = (
SELECT COUNT(1)
FROM UNNEST(event_params) param
WHERE (
param.key = 'result' AND
LOWER(param.value.string_value) = 'success'
) OR (
param.key = 'confirmationNumber' AND
NOT param.value.string_value IS NULL
)
)
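The `2 = COUNT(1)` check relies on each key appearing at most once per event: if, say, 'result' = 'success' appeared twice in one event, the count would reach 2 without any confirmationNumber. A variant of the same query that tests the two conditions separately, so a repeated key can't satisfy both:

```sql
#standardSQL
SELECT *
FROM `project.dataset.table`
WHERE stream_id IN ('1168190076', '1168201031')
  AND (
    -- each condition is counted on its own and must match at least once
    SELECT COUNTIF(param.key = 'result' AND LOWER(param.value.string_value) = 'success') > 0
       AND COUNTIF(param.key = 'confirmationNumber' AND param.value.string_value IS NOT NULL) > 0
    FROM UNNEST(event_params) AS param
  )
```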
I suspect that you want something more like this:
SELECT t.*
FROM `table_name` t
WHERE t.stream_id IN ('1168190076', '1168201031') AND
      EXISTS (SELECT 1
              FROM UNNEST(t.event_params) p
              WHERE p.key = 'result' AND
                    p.value.string_value IN ('success', 'SUCCESS')
             ) AND
      EXISTS (SELECT 1
              FROM UNNEST(t.event_params) p
              WHERE p.key = 'confirmationNumber'
             );
That is, test each parameter independently. You don't need to unnest event_params in the FROM clause to produce the result set, unless you really want to, of course.
I don't know what the lingering NOT NULL in your query is for, so I'm ignoring it. You might want to check that the confirmationNumber value is not NULL, however.
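Putting the pieces together for what the question actually asks (a count of distinct users rather than the matching rows), a sketch that keeps the EXISTS tests and adds the NOT NULL check on the value. It assumes the user column is user_pseudo_id, as in the GA4 export, and that confirmationNumber is stored in string_value (use int_value if it is numeric).

```sql
SELECT COUNT(DISTINCT t.user_pseudo_id) AS users
FROM `table_name` t
WHERE t.stream_id IN ('1168190076', '1168201031')
  AND EXISTS (SELECT 1
              FROM UNNEST(t.event_params) p
              WHERE p.key = 'result'
                AND p.value.string_value IN ('success', 'SUCCESS'))
  AND EXISTS (SELECT 1
              FROM UNNEST(t.event_params) p
              WHERE p.key = 'confirmationNumber'
                -- assumption: the confirmation number is stored as a string
                AND p.value.string_value IS NOT NULL);
```

Since event_params is never unnested in the FROM clause, each event is one row and the distinct count isn't inflated by the number of parameters.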

Redshift: Layered correlated subquery pattern not supported

I have a manifest table that has the latest processed timestamp of account/version combinations. I want to filter a raw events table to give me only the newest unprocessed timestamps based on the account/version combinations.
-- ERROR: This type of correlated subquery pattern is not supported due to
-- internal error
FROM events e
WHERE
CASE WHEN (e.account_id, e.app_version, e.app_build)
IN (SELECT DISTINCT account_id, app_version, app_build FROM manifest)
THEN
tstamp > (SELECT last_processed_tstamp FROM manifest m
WHERE m.account_id = e.account_id
AND m.app_version = e.app_version
AND m.app_build = e.app_build)
ELSE
1=1
END
Oddly, if I only check one column in the CASE WHEN, it works:
-- Somehow this works
FROM events e
WHERE
CASE WHEN e.account_id IN (SELECT DISTINCT account_id FROM manifest)
THEN
tstamp > (SELECT last_processed_tstamp FROM manifest m
WHERE m.account_id = e.account_id
AND m.app_version = e.app_version
AND m.app_build = e.app_build)
ELSE
1=1
END
Unfortunately though this is the wrong logic since it isn't filtering by the correct account/version combination. Would appreciate any help. Thanks.
You could use an OR.
CASE WHEN
(e.account_id IN (SELECT DISTINCT account_id FROM manifest)
OR e.app_version IN (SELECT DISTINCT app_version FROM manifest)
OR e.app_build IN (SELECT DISTINCT app_build FROM manifest))
THEN ....
I'd break out the sub select to make sure you're only running it once.
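A different approach, not from the thread: a sketch that matches on the full account/version/build combination with a LEFT JOIN, so no correlated subquery is needed. It keeps events whose combination isn't in the manifest yet, plus events newer than that combination's last_processed_tstamp, which is what the CASE expression in the question is aiming for. It assumes manifest has at most one row per combination and no NULLs in those key columns.

```sql
-- Keep events for combinations not yet in the manifest,
-- and events newer than the last processed timestamp for known combinations.
SELECT e.*
FROM events e
LEFT JOIN manifest m
  ON  m.account_id  = e.account_id
  AND m.app_version = e.app_version
  AND m.app_build   = e.app_build
WHERE m.account_id IS NULL
   OR e.tstamp > m.last_processed_tstamp;
```

If manifest can hold several rows per combination, aggregate it first (for example MAX(last_processed_tstamp) grouped by the three columns) so each event joins to at most one row.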

BigQuery select multiple key values

With a custom event in Firebase exported to BigQuery, multiple key-value params can exist within it. I can't seem to figure out how to select more than just one of these using "standard SQL".
Let's say that you wanted to select the string_value that corresponds with firebase_event_origin and the int_value associated with firebase_screen_id for all control_reading events. You could express the query as:
#standardSQL
SELECT
(SELECT param.value.string_value
FROM UNNEST(event_dim.params) AS param
WHERE param.key = 'firebase_event_origin') AS firebase_event_origin,
(SELECT param.value.int_value
FROM UNNEST(event_dim.params) AS param
WHERE param.key = 'firebase_screen_id') AS firebase_screen_id
FROM `your_dataset.your_table_*`
CROSS JOIN UNNEST(event_dim) AS event_dim
WHERE _TABLE_SUFFIX BETWEEN '20170501' AND '20170503' AND
event_dim.name = 'control_reading';
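The query above targets the older Firebase Analytics export schema, where events are nested under event_dim. In the current GA4/Firebase export used by the other queries on this page, each row is a single event with event_name and event_params at the top level, so the same lookup can be sketched as follows (the dataset name and date range are placeholders):

```sql
#standardSQL
SELECT
  (SELECT param.value.string_value
   FROM UNNEST(event_params) AS param
   WHERE param.key = 'firebase_event_origin') AS firebase_event_origin,
  (SELECT param.value.int_value
   FROM UNNEST(event_params) AS param
   WHERE param.key = 'firebase_screen_id') AS firebase_screen_id
FROM `your_dataset.events_*`
-- placeholder date range: adjust to the daily tables you want
WHERE _TABLE_SUFFIX BETWEEN '20240501' AND '20240503'
  AND event_name = 'control_reading';
```

No CROSS JOIN is needed here because each row already is one event; each extra parameter is just another scalar subquery in the SELECT list.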