This question already has answers here:
How to unnest and pivot two columns in BigQuery
(3 answers)
Closed 2 years ago.
I have the following requirement, as per the screenshot attached.
I need the contents of a nested column split out into new, separate columns. I only need 3 values from the column event_params.key (1. percentage, 2. seconds, 3. activity_id), and each one's value is taken from the column event_params.value.double_value.
Any ideas of how to achieve that?
To extract a row from an array of structs (a nested struct) by a specific key/value in Google BigQuery, we need to unnest the array first and then extract the key-value pairs manually using a scalar subquery:
with test_table as (
select
'20201014' as event_date,
'112321341234' as event_timestamp,
'spent_time' as event_name,
[ (select as struct 'percentage' as key, 0.0 as double_value),
(select as struct 'seconds' as key, 0.0 as double_value),
(select as struct 'activity_id' as key, 88.0 as double_value)
] as event_params
union all
select
'20201014999' as event_date,
'112321341234999' as event_timestamp,
'spent_time999' as event_name,
[ (select as struct 'percentage' as key, 9.0 as double_value),
(select as struct 'seconds' as key, 0.9 as double_value),
(select as struct 'activity_id' as key, 99.0 as double_value)
] as event_params
)
select
event_date,
event_timestamp,
event_name,
(select double_value from unnest(event_params) where key = 'percentage') as percentage,
(select double_value from unnest(event_params) where key = 'seconds') as seconds,
(select double_value from unnest(event_params) where key = 'activity_id') as activity_id
from test_table
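If the project has BigQuery's PIVOT operator available (an assumption; it is a newer Standard SQL feature), the same reshaping can also be sketched by unnesting once and pivoting on the key:

```sql
-- Sketch: unnest once, then pivot the three known keys into columns.
-- Assumes the test_table CTE above and PIVOT support in your project.
select *
from (
  select event_date, event_timestamp, event_name, key, double_value
  from test_table, unnest(event_params)
)
pivot (any_value(double_value) for key in ('percentage', 'seconds', 'activity_id'))
```

Either form yields NULL when a key is absent from event_params; PIVOT just requires the key list to be spelled out as literals.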
I'm currently trying to build a query that allows me to group all my GA4 event data by session ID, in order to get information about all the events per session, as opposed to analyzing the data for each event separately.
The resulting output of my initial query is a new table that has session ID as its own column, instead of it sitting inside the event-parameters array of a particular event.
The problem is that the session_id column has non-unique values: a session ID is repeated on every row that is a new event (one that happens within that session). I am trying to combine (merge) those non-unique session IDs so that I can get ALL the events associated with a particular session_id.
I have tried this query, which gives me session_id as a new column that is repeated for each event:
SELECT
  *,
  (
    SELECT COALESCE(value.int_value, value.float_value, value.double_value)
    FROM UNNEST(event_params)
    WHERE key = 'ga_session_id'
  ) AS session_id,
  (
    SELECT COALESCE(value.string_value)
    FROM UNNEST(event_params)
    WHERE key = 'page_location'
  ) AS page_location
FROM
  `digital-marketing-xxxxxx.analytics_xxxxxxx.events_intraday*`
gives me an output like (it has way more columns than this but just an example):
session_id   event_name
1234567      session_start
1234567      click_url
I need a way to basically merge the two session ids into a single cell. When I try this:
SELECT
  *,
  (
    SELECT COALESCE(value.int_value, value.float_value, value.double_value)
    FROM UNNEST(event_params)
    WHERE key = 'ga_session_id'
  ) AS session_id,
  (
    SELECT COALESCE(value.string_value)
    FROM UNNEST(event_params)
    WHERE key = 'page_location'
  ) AS page_location
FROM
  `digital-marketing-xxxxxxx.analytics_xxxxxxx.events_intraday*`
GROUP BY session_id
I get an error telling me (if I understand correctly) that I can't aggregate certain values (like date), which is what the code is trying to do when attempting to group by session ID.
Is there any way around this? I'm new to SQL, but the searches I've done so far haven't given me a clear answer on how to attempt this.
I use this code to understand the sequence of events; it might not be that efficient, as I have it set up to look at other things as well:
with _latest as (
  SELECT
    -- create unique session id (cast the int session id so concat accepts it)
    concat(user_pseudo_id, cast((select value.int_value from unnest(event_params) where key = 'ga_session_id') as string)) as unique_session_id,
    -- create unique event id
    concat(user_pseudo_id, cast((select value.int_value from unnest(event_params) where key = 'ga_session_id') as string), event_name) as session_ids,
    event_name,
    event_date,
    TIMESTAMP_MICROS(event_timestamp) AS event_timestamp
  FROM *******
  where
    -- change the date range by using static and/or dynamic dates
    _table_suffix between '20221113' and '20221114'
),
Exit_count as (
  select
    *,
    row_number() over (partition by session_ids order by event_timestamp desc) as Event_order
  from _latest
)
select
  Event_order,
  unique_session_id,
  event_date,
  event_name
FROM Exit_count
group by
  Event_order,
  event_name,
  unique_session_id,
  --pagepath,
  event_date
  --Country_site
order by
  unique_session_id,
  Event_order
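If the goal is to literally merge all events for one session into a single cell, a hedged sketch (the table name and the separator are placeholders) is to extract the session id first and then aggregate with ARRAY_AGG or STRING_AGG:

```sql
-- Sketch: one row per session, with all its events collected into one cell.
-- The table name is a placeholder; adjust to your dataset.
with events as (
  select
    (select value.int_value from unnest(event_params)
     where key = 'ga_session_id') as session_id,
    event_name,
    event_timestamp
  from `project.analytics_XXXXXX.events_*`
)
select
  session_id,
  array_agg(event_name order by event_timestamp) as events_in_session,
  string_agg(event_name, ' > ' order by event_timestamp) as event_path
from events
group by session_id
```

This yields exactly one row per session_id, so no other non-aggregated columns collide with the GROUP BY.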
I need to unnest multiple param keys: event_date, page_location, page_title, user_pseudo_id. Two of them, page_location and page_title, I need to unnest and show separately. The code below just randomly shows either the location or the title value; I need each of them shown separately:
SELECT
  event_date, value.string_value, user_pseudo_id
FROM
  `mydata.events_20220909*`,
  unnest(event_params)
WHERE
  key = "page_title" OR key = "page_location"
You don't necessarily have to unnest all the event_params; with the following query, you can put any parameter in a separate column:
select
event_date,
user_pseudo_id,
(select value.string_value from unnest(event_params) where key = 'page_location') as page_location,
(select value.string_value from unnest(event_params) where key = 'page_title') as page_title
from `mydata.events_20220909*`
I have this table 1 in BigQuery, and I need to count the elements in the columns segments and category that correspond to a single user id. The desired outcome is presented in table 2. I haven't been able to figure out how to do it... maybe by transforming those elements to arrays?
TABLE 1
TABLE 2
Use below
select `desc`, count(distinct user_id) distinct_user_id
from (
select category as `desc`, user_id from your_table
union all
select segment, user_id from your_table,
unnest(split(segment, ';')) segment
)
where `desc` != ''
group by `desc`
I have this jsonb column named "demographics". It looks like this:
{
"genre":{"women":10,"men":5}
}
I am trying to create a new column which will pick the JSON key name of the entry with the highest value.
I tried this:
SELECT Greatest(demographics -> 'genre' ->> 'men',
demographics -> 'genre' ->> 'women')
AS greatest
FROM demographics;
This one gets the "value" of the highest one, but I want the key name, not the value.
Basically, I want it to return "women" for this row, as it has the higher value.
You can unpack the JSON object at genre into individual rows using jsonb_each, and then keep the greatest value using row_number (or any number of other methods):
with cte as (
  select *,
         genre.key as genre,
         row_number() over (partition by id order by value desc) as ord
  from demographics
  cross join lateral jsonb_each(demographics->'genre') genre
)
select id, genre, value
from cte
where ord = 1
Here's a working demo on dbfiddle
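As a Postgres-specific alternative sketch (assuming the same id column as in the query above), DISTINCT ON can pick the top key per row directly; ordering a jsonb value desc compares numbers numerically:

```sql
-- Sketch: pick the genre key with the highest value for each row.
-- Assumes each row of demographics has an id column, as above.
select distinct on (id)
       id,
       genre.key   as genre,
       genre.value as value
from demographics
cross join lateral jsonb_each(demographics->'genre') genre
order by id, genre.value desc
```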
I'm used to working with SQL Server databases, and now I need to query data from BigQuery.
What is a better way to query data from a table like this, where one column contains several nested fields?
BigQuery supports unnest() for turning array elements into rows. So, you can convert all of this into rows as:
select t.user_id, t.user_pseudo_id, up.*
from t cross join
unnest(user_properties) up;
You want a field per property. There are several ways to do this. If you want exactly one value per row, you can unnest the array and use conditional aggregation:
select t.user_id, t.user_pseudo_id,
       max(case when up.key = 'age' then up.value.string_value end) as age,
       max(case when up.key = 'gender' then up.value.string_value end) as gender
from t cross join
unnest(t.user_properties) up
group by t.user_id, t.user_pseudo_id
Usually, subqueries are used like this:
SELECT
  user_id,
  user_pseudo_id,
  (SELECT value.string_value FROM UNNEST(user_properties) WHERE key = "age") AS age,
  (SELECT value.string_value FROM UNNEST(user_properties) WHERE key = "gender") AS gender
FROM dataset.table