BigQuery unnest multiple params - sql

I need to unnest multiple param keys: event_date, page_location, page_title, user_pseudo_id. Two of them, page_location and page_title, I need to unnest and show separately. The code below returns either the location or the title value in the same column, with no way to tell which is which. I need them shown separately.
SELECT
event_date, value.string_value, user_pseudo_id,
FROM
`mydata.events20220909*`,
unnest(event_params)
WHERE
key = "page_title" OR key = "page_location"

You don't necessarily have to unnest all of event_params. With the following query, you can put any parameter in its own column:
select
event_date,
user_pseudo_id,
(select value.string_value from unnest(event_params) where key = 'page_location') as page_location,
(select value.string_value from unnest(event_params) where key = 'page_title') as page_title
from `mydata.events_20220909*`
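
If the values should instead stay on separate rows, one option is to keep the unnest from the question and also select the key, so each row says which parameter it holds. This is a minimal sketch, not part of the original answer; the events CTE and its sample values are hypothetical stand-ins for the GA4 export table.

```sql
-- Hypothetical sample row standing in for a GA4 export table.
WITH events AS (
  SELECT
    '20220909' AS event_date,
    'user_a' AS user_pseudo_id,
    [
      STRUCT('page_location' AS key, STRUCT('https://example.com/home' AS string_value) AS value),
      STRUCT('page_title' AS key, STRUCT('Home' AS string_value) AS value)
    ] AS event_params
)
SELECT
  event_date,
  user_pseudo_id,
  ep.key AS param_key,                  -- tells the two values apart
  ep.value.string_value AS param_value
FROM events
CROSS JOIN UNNEST(event_params) AS ep   -- one output row per matching parameter
WHERE ep.key IN ('page_location', 'page_title');
```

The scalar-subquery form gives one row per event with one column per parameter; this form gives one row per parameter.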

Related

Grouping by Session ID in BigQuery for GA4 data

I'm currently trying to build a query that groups all my GA4 event data by session ID, so that I can get information about all the events per session instead of analyzing each event separately.
The resulting output of my initial query is a new table that has session ID as its own column in the table, instead of being within an array for event parameters for a particular event.
The problem is that the session_id column has non-unique values: a session ID is repeated on every row that is a new event within that session. I am trying to combine (merge) those non-unique session IDs so that I can get ALL the events associated with a particular session_id.
I have tried this query which provides me with session_id as a new column, that is repeated for each event.
SELECT
*,
(
SELECT COALESCE(value.int_value, value.float_value, value.double_value)
FROM UNNEST(event_params)
WHERE key = 'ga_session_id'
) AS session_id,
(
SELECT COALESCE(value.string_value)
FROM UNNEST(event_params)
WHERE key = 'page_location'
) AS page_location
FROM
`digital-marketing-xxxxxx.analytics_xxxxxxx.events_intraday*`
gives me an output like (it has way more columns than this but just an example):
session_id    event_name
1234567       session_start
1234567       click_url
I need a way to basically merge the two session ids into a single cell. When I try this:
SELECT
*,
(
SELECT COALESCE(value.int_value, value.float_value, value.double_value)
FROM UNNEST(event_params)
WHERE key = 'ga_session_id'
) AS session_id,
(
SELECT COALESCE(value.string_value)
FROM UNNEST(event_params)
WHERE key = 'page_location'
) AS page_location
FROM
`digital-marketing-xxxxxxx.analytics_xxxxxxx.events_intraday*`
GROUP BY session_id
I get an error that tells me (if I understand correctly) that I can't aggregate certain values (like date) which is what the code is trying to do when attempting to group by session id.
Is there any way around this? I'm new to SQL, and the searches I've done so far haven't given me a clear answer on how to attempt this.
I use this code to understand the sequence of events. It might not be that efficient, as I have it set up to look at other things as well.
with _latest as (
SELECT
--create unique id
concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) as unique_session_id,
--create event id
concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id'),event_name) as session_ids,
event_name,
event_date,
TIMESTAMP_MICROS(event_timestamp) AS event_timestamp
FROM *******
where
-- change the date range by using static and/or dynamic dates
_table_suffix between '20221113' and '20221114'),
Exit_count as (
select *,
row_number() over (partition by session_ids order by event_timestamp desc) as Event_order
from _latest)
select
Event_order,
unique_session_id,
event_date,
event_name,
FROM
Exit_count
group by
Event_order,
event_name,
unique_session_id,
--pagepath,
event_date
--Country_site
order by
unique_session_id,
Event_order
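
To get one row per session with all of its events merged into a single cell, as the question asks, one possible approach is to group by the session key and collect the events with ARRAY_AGG. This is a minimal sketch rather than the answerer's method; the events CTE is hypothetical sample data in which session_id has already been extracted from event_params.

```sql
-- Hypothetical sample data: session_id is assumed to be already extracted.
WITH events AS (
  SELECT 'user_a' AS user_pseudo_id, 1234567 AS session_id,
         'session_start' AS event_name,
         TIMESTAMP '2022-11-13 10:00:00+00' AS event_timestamp
  UNION ALL
  SELECT 'user_a', 1234567, 'click_url', TIMESTAMP '2022-11-13 10:00:05+00'
  UNION ALL
  SELECT 'user_b', 7654321, 'session_start', TIMESTAMP '2022-11-13 11:00:00+00'
)
SELECT
  user_pseudo_id,
  session_id,
  COUNT(*) AS event_count,
  -- All events of the session, in time order, collected into one array cell.
  ARRAY_AGG(STRUCT(event_name, event_timestamp) ORDER BY event_timestamp) AS events
FROM events
GROUP BY user_pseudo_id, session_id;
```

Grouping by user_pseudo_id together with session_id mirrors the unique_session_id idea above, since ga_session_id on its own can repeat across users. Every selected column is either a grouping key or an aggregate, which avoids the GROUP BY error described in the question.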

When joining CTE's on generated uuid, returns empty data set

Table ga4-extract.analytics_1234567.events_20220831 has a record-type field, event_params. I want to select * from this table where event_params contains the following key/value condition: key = 'page_location' and value.string_value like '%example.dev%'.
If working with a json field in athena or postgres I would simply do something such as:
select jsonfield['event_id'] as event_id
from mytable
where jsonfield['page_location'] like '%example.dev%'
I want the equivalent of that, but I must be approaching it wrong, because this is way too long for what I want. And it doesn't work: I cannot rejoin the CTEs back onto each other.
My query:
with
main as (
select GENERATE_UUID() uuid, *
from `ga4-extract.analytics_1234567.events_20220831`
where event_name in ('trial', 'login')
limit 10000
),
dev as (
select uuid, event_name, e.key, e.value.string_value
from main, unnest(event_params) e
where e.key = 'page_location'
and e.value.string_value like '%example.dev%'
),
event_ids as (
select uuid, event_name, e.key, e.value.string_value as event_id
from main, unnest(event_params) e
where e.key = 'event_id'
)
select *
from main m
join dev d on d.uuid = m.uuid
join event_ids e on e.uuid = m.uuid
How can I filter all records in ga4-extract.analytics_1234567.events_20220831 where event_params.page_location is like 'example.dev' and then get the corresponding values of event_params.event_id?
SELECT t3.value.string_value,
t3.value.int_value,
t1.*,
FROM `ga4-extract.analytics_1234567.events_20220831` t1
inner join unnest(t1.event_params) t2 on t2.key = 'page_location'
and t2.value.string_value like '%example.dev%'
inner join unnest(t1.event_params) t3 on t3.key = 'ga_session_id' -- HERE SET YOUR PARAM NAME! In my analytics there is not "event_id" parameter, but there is ga_session_id.
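
The same filter can also be sketched without any join, using EXISTS for the condition and a scalar subquery for the value to return. This is an illustrative variant, not taken from the answers here; the events CTE and its values are hypothetical sample data, and 'event_id' is the parameter name assumed in the question.

```sql
-- Hypothetical sample rows standing in for the events_20220831 table.
WITH events AS (
  SELECT 'login' AS event_name,
    [STRUCT('page_location' AS key, STRUCT('https://example.dev/start' AS string_value) AS value),
     STRUCT('event_id' AS key, STRUCT('evt-1' AS string_value) AS value)] AS event_params
  UNION ALL
  SELECT 'trial',
    [STRUCT('page_location' AS key, STRUCT('https://example.com/pricing' AS string_value) AS value),
     STRUCT('event_id' AS key, STRUCT('evt-2' AS string_value) AS value)]
)
SELECT
  event_name,
  -- Pull the wanted parameter out of the array for each surviving row.
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'event_id') AS event_id
FROM events
WHERE EXISTS (
  -- Keep only events whose page_location matches the pattern.
  SELECT 1 FROM UNNEST(event_params)
  WHERE key = 'page_location' AND value.string_value LIKE '%example.dev%'
);
```

With the sample data, only the 'login' row passes the filter. No generated uuid is needed because each event row is filtered and projected in place.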
Each time you reference the main CTE, it is re-evaluated, and GENERATE_UUID() produces new values each time. So when you then join on uuid, nothing matches and you get an empty result!
Below is simple proof
with main as (
select 1 id, GENERATE_UUID() uuid
)
select * from main union all
select * from main
The two output rows carry different uuid values, which shows that main was evaluated once per reference.
BigQuery only materializes the results of recursive CTEs, but does not materialize the results of non-recursive CTEs inside the WITH clause. If a non-recursive CTE is referenced in multiple places in a query, then the CTE is executed once for each reference. The WITH clause with non-recursive CTEs is useful primarily for readability.
With the above in mind, below is a hacky "fix":
with recursive main as (
select 1 id, GENERATE_UUID() uuid
), temp as (
select * from main
union all
select t.* from temp t
join main on false
)
select * from temp union all
select * from temp
With this version, both output rows show the same uuid.
P.S. Please note - this answer addresses the question/issue expressed in the question's title - When joining CTE's on generated uuid, returns empty data set
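
A different way to pin the generated values, sketched here as an alternative to the recursive CTE trick, is to materialize the rows in a temporary table inside a multi-statement query. This swaps in a different technique from the one in the answer, and it assumes the query is run as a script, where CREATE TEMP TABLE is available.

```sql
-- The temp table is written once, so the uuid values are fixed from here on.
CREATE TEMP TABLE main AS
SELECT 1 AS id, GENERATE_UUID() AS uuid;

-- Both references read the same stored row, so the uuid is identical.
SELECT * FROM main
UNION ALL
SELECT * FROM main;
```

The trade-off is that this is no longer a single SELECT statement, which matters if the query has to live in a view.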

Google BigQuery and SQL

I'm used to working with SQL Server databases and now I need to query data from BigQuery.
What is a better way to query data from the table like this?
Where one column includes several columns...
BigQuery supports unnest() for turning array elements into rows. So, you can convert all of this into rows as:
select t.user_id, t.user_pseudo_id, up.*
from t cross join
unnest(user_properties) up;
You want a field per property. There are several ways to do this. If you want exactly one value per row, you can use a subquery and aggregation:
select t.user_id, t.user_pseudo_id, p.*
from t cross join
(select max(case when up.key = 'age' then up.value.string_value end) as age,
max(case when up.key = 'gender' then up.value.string_value end) as gender
from unnest(user_properties) up
) p
Usually subqueries are used like:
SELECT
user_id,
user_pseudo_id,
(SELECT value.string_value FROM UNNEST(user_properties) WHERE key = "age") AS age,
(SELECT value.string_value FROM UNNEST(user_properties) WHERE key = "gender") AS gender,
FROM dataset.table

SPLIT Bigquery Row Content into New Separate Columns [duplicate]

I have the following requirement, as per the attached screenshot.
I need the contents of a nested column split into new, separate columns. I only need three keys from event_params.key: 1. percentage, 2. seconds, 3. activity_id. Their values come from event_params.value.double_value.
Any ideas of how to achieve that?
To extract values from an array of structs (a nested struct) by a specific key in Google BigQuery, we need to unnest the array first and then pick out each key's value with a subquery:
with test_table as (
select
'20201014' as event_date,
'112321341234' as event_timestamp,
'spent_time' as event_name,
[ (select as struct 'percentage' as key, 0.0 as double_value),
(select as struct 'seconds' as key, 0.0 as double_value),
(select as struct 'activity_id' as key, 88.0 as double_value)
] as event_params
union all
select
'20201014999' as event_date,
'112321341234999' as event_timestamp,
'spent_time999' as event_name,
[ (select as struct 'percentage' as key, 9.0 as double_value),
(select as struct 'seconds' as key, 0.9 as double_value),
(select as struct 'activity_id' as key, 99.0 as double_value)
] as event_params
)
select
event_date,
event_timestamp,
event_name,
(select double_value from unnest(event_params) where key = 'percentage') as percentage,
(select double_value from unnest(event_params) where key = 'seconds') as seconds,
(select double_value from unnest(event_params) where key = 'activity_id') as activity_id
from test_table

Understanding self joins and flattening

I'll start with the fact that I'm a newbie and I managed to hack this original query together. I've looked over many examples but I'm just not wrapping my head around self joins and displaying the data I want to see.
I'm feeding BQ with mobile app data daily and thus am querying multiple tables. I'm trying to query for a count of fatal crashes by IMEI by date. This query does give me most of the output I want as it returns Date, IMEI and Count.
However, I want the output to be Date, IMEI, Branch, Truck and Count. user_dim.user_properties.key is a nested field, and in my query I'm specifically asking for user_dim.user_properties.key = 'imei_id' and getting its value from user_dim.user_properties.value.value.string_value.
I don't understand how I would perform the join to also get back the values where user_dim.user_properties.key = 'truck_id' and user_dim.user_properties.key = 'branch_id' and ultimately getting my output to be: Date, IMEI, Branch, Truck and Count in one row.
Thanks for your help.
SELECT
event_dim.date AS Date,
user_dim.user_properties.value.value.string_value AS IMEI,
COUNT(*) AS Count
FROM
FLATTEN( (
SELECT
*
FROM
TABLE_QUERY([smarttruck-6d137:com_usiinc_android_ANDROID],'table_id CONTAINS "app_events_"')), user_dim.user_properties)
WHERE
user_dim.user_properties.key = 'imei_id'
AND event_dim.name = 'app_exception'
AND event_dim.params.key = 'fatal'
AND event_dim.params.value.int_value = 1
AND event_dim.date = '20170807'
GROUP BY
Date,
IMEI
ORDER BY
Count DESC
Here is a query that should work for you, using standard SQL:
#standardSQL
SELECT
event_dim.date AS Date,
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'imei_id') AS IMEI,
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'branch_id') AS branch_id,
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'truck_id') AS truck_id,
COUNT(*) AS Count
FROM `smarttruck-6d137.com_usiinc_android_ANDROID.app_events_*`
CROSS JOIN UNNEST(event_dim) AS event_dim
WHERE
event_dim.name = 'app_exception' AND
EXISTS (
SELECT 1 FROM UNNEST(event_dim.params)
WHERE key = 'fatal' AND value.int_value = 1
) AND
event_dim.date = '20170807'
GROUP BY
Date,
IMEI,
branch_id,
truck_id
ORDER BY
Count DESC;
A couple of thoughts/suggestions, though:
To restrict how much data you scan, you probably want to filter on _TABLE_SUFFIX = '20170807' instead of event_dim.date = '20170807'. This will be cheaper and (if I understand correctly) will return the same results.
If the combinations of IMEI, branch_id, and truck_id are unique, there probably isn't a benefit to computing the count, so you can remove the COUNT(*) and also the GROUP BY/ORDER BY clauses.
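
The first suggestion can be sketched as follows. It reuses the wildcard table from the query above and assumes, as that answer does, that the daily tables are named app_events_YYYYMMDD, so the suffix is the date string. The query is trimmed to a count per date to keep the example short.

```sql
#standardSQL
SELECT
  event_dim.date AS Date,
  COUNT(*) AS Count
FROM `smarttruck-6d137.com_usiinc_android_ANDROID.app_events_*`
CROSS JOIN UNNEST(event_dim) AS event_dim
WHERE
  -- Limits the scan to the app_events_20170807 table instead of every daily table.
  _TABLE_SUFFIX = '20170807'
  AND event_dim.name = 'app_exception'
GROUP BY Date
ORDER BY Count DESC;
```

_TABLE_SUFFIX is a pseudo-column that only exists on wildcard tables, so this filter cannot be tested against an inline sample CTE.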