How do I exclude NULL values in a subquery? (BigQuery - SQL - GA4) - sql

Here is my query so far:
SELECT
event_date,
event_timestamp,
user_pseudo_id,
geo.country,
geo.region,
geo.city,
geo.sub_continent,
(
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
key = "search_term") AS search_term,
FROM `GoogleAnlyticsDatabase`
I am trying to exclude all NULL values in the 'search_term' column.
I am struggling to identify where I need to include IS NOT NULL in my code.
Everything I have tried so far has thrown up errors.
Does anyone have any ideas?

Is this query giving you the expected result besides the NULL problem?
If yes, you can just wrap your query in a CTE and filter that CTE,
like this:
WITH
SRC AS (
SELECT
event_date,
event_timestamp,
user_pseudo_id,
geo.country,
geo.region,
geo.city,
geo.sub_continent,
(
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
key = "search_term") AS search_term
FROM `GoogleAnlyticsDatabase`
)
SELECT * FROM SRC WHERE search_term IS NOT NULL
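The filter has to sit one level outside because BigQuery does not let a WHERE clause reference a SELECT-list alias such as search_term at the same query level. Below is a minimal sketch of the same wrap-and-filter pattern that can be run locally. It uses SQLite through Python, and since SQLite has no nested arrays, the events and event_params tables here are hypothetical stand-ins for the GA4 export, not its real schema.

```python
import sqlite3

# Hypothetical stand-in schema: event_params is modelled as a child table
# keyed by event_id, because SQLite has no ARRAY<STRUCT> columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_pseudo_id TEXT);
    CREATE TABLE event_params (event_id INTEGER, key TEXT, string_value TEXT);
    INSERT INTO events VALUES (1, 'u1'), (2, 'u2'), (3, 'u3');
    INSERT INTO event_params VALUES
        (1, 'search_term', 'shoes'),
        (2, 'page_title', 'Home'),
        (3, 'search_term', 'hats');
""")

rows = conn.execute("""
    WITH src AS (
        SELECT
            e.user_pseudo_id,
            -- scalar subquery: yields NULL when the event has no search_term
            (SELECT p.string_value
             FROM event_params p
             WHERE p.event_id = e.event_id
               AND p.key = 'search_term') AS search_term
        FROM events e
    )
    SELECT user_pseudo_id, search_term
    FROM src
    WHERE search_term IS NOT NULL   -- the filter lives outside the CTE
    ORDER BY user_pseudo_id
""").fetchall()

print(rows)
```

The event for u2 has no search_term parameter, so its scalar subquery evaluates to NULL and the outer filter drops the row.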

Related

Recreating a Google Analytics dashboard using data from BigQuery within Data Studio

I am currently trying to recreate my Google Analytics dashboard using the BigQuery connector within GA4. I am using a custom query to pull the data I need from BigQuery and display it in Data Studio. When I just calculate some KPIs from the data and pull those using the query, the information comes through correctly. But once I try to access custom event data, I have to unnest it before it is accessible. Once unnested, the data needs to be grouped or the query will throw an error. The grouping of rows seems to be messing up the KPIs I previously calculated and inflates the values. How do I query this data correctly? I tried to pull all the raw data in and use custom fields within Data Studio to calculate the fields I need, but ran into issues there too.
SELECT
distinct
event_date,
event_timestamp,
event_name,
user_pseudo_id,
device.category,
-- author
(
SELECT
distinct
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='author'
group by 1 ) AS author,
-- campaign
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='campaign'
group by 1 ) AS campaign,
-- categories
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='categories'
group by 1 ) AS categories,
-- clientid
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='clientid'
group by 1 ) AS clientid,
-- duration
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='duration'
group by 1 ) AS duration,
-- eventactions
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='eventactions'
group by 1 ) AS eventactions,
-- eventcategory
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='eventcategory'
group by 1 ) AS eventcategory,
-- eventlabel
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='eventlabel'
group by 1 ) AS eventlabel,
-- mediatype
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='mediatype'
group by 1 ) AS mediatype,
-- pagetitle
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='pagetitle'
group by 1 ) AS pagetitle,
-- pagetype
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='pagetype'
group by 1 ) AS pagetype,
-- source
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='source'
group by 1 ) AS SOURCE,
-- sourceurl
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='sourceurl'
group by 1 ) AS sourceurl,
-- srclink
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='srclink'
group by 1 ) AS srclink,
-- status
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='status'
group by 1 ) AS status,
-- title
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='title'
group by 1 ) AS title,
-- user_clientid
(
SELECT
params.value.string_value
FROM
UNNEST(event_params) AS params
WHERE
params.key='user_clientid'
group by 1 ) AS user_clientid,
traffic_source.source AS User_Source,
-- end groupby
COUNT(1) AS eventCount,
SAFE_DIVIDE(COUNT(DISTINCT
CASE
WHEN ( SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1' THEN CONCAT(user_pseudo_id,( SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'))
END
),COUNT(DISTINCT CONCAT(user_pseudo_id,(
SELECT
value.int_value
FROM
UNNEST(event_params)
WHERE
key = 'ga_session_id')))) AS engagement_rate,
COUNT(DISTINCT user_pseudo_id) AS Unique_Users,
COUNT(DISTINCT
CASE
WHEN ( SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') > 0 OR ( SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1' THEN user_pseudo_id
ELSE
NULL
END
) AS active_users,
COUNT(DISTINCT
CASE
WHEN ( SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') = 1 THEN user_pseudo_id
ELSE
NULL
END
) AS new_users,
COUNT(DISTINCT user_pseudo_id) AS users
FROM
`zngly-corporate.analytics_315869392.events_*`,
UNNEST(event_params) AS params
GROUP BY
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23
Above is the SQL query that I have tried. I'm not sure how I can achieve what I need correctly; if anyone has any insight to share, I would appreciate it.
The SQL for this is quite complex. You might try a query builder like Analytics Canvas to verify your result set: https://analyticscanvas.com/ga4-bigquery-query-builder/. There's a free trial, so you don't need to buy the software to verify your results and get you back on track.
Be super careful when connecting Looker Studio to BQ directly. It sends separate queries for each chart, table, scorecard and user of the report. If you connect directly, ensure you are creating date partitioned summary tables and hitting those, instead of the raw GA4 export tables.

Correlated subquery not working in Netezza

I have a query like this in Netezza, but not sure how I can rewrite it so it will work. Thanks
with dates as (
select distinct event_date from table
)
select event_date,
(select count(distinct id)
from table
where event_date < dates.event_date
)
from dates
This form of correlated query is not supported - consider rewriting
This would be more efficient using window functions anyway. I think the logic is:
select event_date,
sum(count(*)) over (order by event_date) - count(*) as events_before
from table
group by event_date
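Note that the window version counts rows before each date, while the original subquery counts distinct ids; the two agree only when every id appears once. The sketch below, run against SQLite from Python, shows both on a small made-up table (called events here as an assumption, because table is a reserved word). The first_seen variant is one possible way to recover the distinct count without a correlated subquery: each id is counted only on the first date it appears. It is an illustration, not something from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER, event_date TEXT);
    INSERT INTO events VALUES
        (1, '2024-01-01'), (2, '2024-01-01'),
        (1, '2024-01-02'), (3, '2024-01-02'),
        (4, '2024-01-03');
""")

# Rows strictly before each date: running total minus the current day's rows.
rows_before = conn.execute("""
    SELECT event_date,
           SUM(COUNT(*)) OVER (ORDER BY event_date) - COUNT(*) AS events_before
    FROM events
    GROUP BY event_date
    ORDER BY event_date
""").fetchall()

# Distinct ids strictly before each date: count each id once, on its first date.
ids_before = conn.execute("""
    WITH first_seen AS (
        SELECT id, MIN(event_date) AS first_date
        FROM events
        GROUP BY id
    ),
    per_day AS (
        SELECT d.event_date, COUNT(f.id) AS new_ids
        FROM (SELECT DISTINCT event_date FROM events) d
        LEFT JOIN first_seen f ON f.first_date = d.event_date
        GROUP BY d.event_date
    )
    SELECT event_date,
           SUM(new_ids) OVER (ORDER BY event_date) - new_ids AS ids_before
    FROM per_day
    ORDER BY event_date
""").fetchall()

print(rows_before)
print(ids_before)
```

Id 1 appears on two dates, so on 2024-01-03 the row count before that date is 4 while the distinct id count is 3.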

How to find sum of engagement_time_msec for different sessions performed by users in bigquery?

My data looks like this:
For a particular "event_params.key" = "ga_session_id", I want to find the sum of "event_params.value.int_value" when "event_params.key" = "engagement_time_msec".
This is to be done for every user (column - "user_pseudo_id").
The "engagement_time_msec" is present in only "event_name" = "screen_view" and "user_engagement" and can come multiple times for one particular "ga_session_id".
Basically, "ga_session_id" is the unique id for every session a "user_pseudo_id" creates. I want to find the average session duration for the users.
Please help me.
Your question is really hard to follow. You seem to want something like the sum of engagement_time_msec:
select t.user_pseudo_id,
sum(case when ep.key = 'engagement_time_msec' then ep.value.int_value end)
from t left join
unnest(event_params) ep
on 1=1
group by 1;
I have no idea what the other keys are for. This appears to have the data you want.
You can use UNNEST to extract data from array of key-value pairs in BigQuery:
WITH unnested_table AS (
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') as ga_session_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') as engagement_time_msec
FROM myDataset.myTable
WHERE event_name in ('screen_view', 'user_engagement')
),
session_duration_table AS (
SELECT
user_pseudo_id,
ga_session_id,
SUM(engagement_time_msec) as session_duration
FROM unnested_table
GROUP BY user_pseudo_id, ga_session_id
)
SELECT
user_pseudo_id,
AVG(session_duration) as avg_session_duration
FROM session_duration_table
GROUP BY user_pseudo_id
Below is for BigQuery Standard SQL
#standardSQL
select user_pseudo_id, avg(session_time_msec) as avg_session_time_msec
from (
select user_pseudo_id,
(select value.int_value from e.event_params where key = 'ga_session_id') as ga_session_id,
sum((select value.int_value from e.event_params where key = 'engagement_time_msec')) as session_time_msec
from `project.dataset.table` e
group by 1, 2
)
group by 1
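Both of the last two queries aggregate in two steps: first sum engagement_time_msec per (user, session), then average those session totals per user. The sketch below illustrates why the intermediate step matters. It runs on SQLite through Python and assumes the parameters have already been flattened into plain columns, as the unnested_table CTE above would produce; the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed shape: one row per event, params already extracted into columns.
    CREATE TABLE unnested_table (
        user_pseudo_id TEXT,
        ga_session_id INTEGER,
        engagement_time_msec INTEGER
    );
    INSERT INTO unnested_table VALUES
        ('u1', 1, 1000), ('u1', 1, 2000), ('u1', 2, 5000),
        ('u2', 3, 6000), ('u2', 3, NULL);
""")

# Step 1: total per session. Step 2: average of the session totals per user.
two_step = conn.execute("""
    SELECT user_pseudo_id, AVG(session_duration) AS avg_session_duration
    FROM (
        SELECT user_pseudo_id, ga_session_id,
               SUM(engagement_time_msec) AS session_duration
        FROM unnested_table
        GROUP BY user_pseudo_id, ga_session_id
    ) AS s
    GROUP BY user_pseudo_id
    ORDER BY user_pseudo_id
""").fetchall()

# For contrast: averaging the raw rows gives the mean per event, not per session.
per_event = conn.execute("""
    SELECT user_pseudo_id, AVG(engagement_time_msec)
    FROM unnested_table
    GROUP BY user_pseudo_id
    ORDER BY user_pseudo_id
""").fetchall()

print(two_step)
print(per_event)
```

For u1 the sessions total 3000 and 5000 msec, so the average session duration is 4000, whereas the per-event average is 8000 / 3, which understates it.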

Find sum of engagement_time_msec for users who have done an event named "yt_event" in BigQuery

My table looks like this:
There's one more column named "user_pseudo_id" which is unique id for users. I want to take sum of event_params.key = 'engagement_time_msec' for user_pseudo_id who have done event_name = 'yt_event'.
Also, event_params.key = 'engagement_time_msec' is only present in two events only, i.e. event_name = 'user_engagement' and 'screen_view'.
I have tried subqueries like this:
SELECT
user_pseudo_id,
(
select
sum(value.int_value/60000)
from unnest(event_params)
where
key = 'engagement_time_msec'
and
user_pseudo_id = 'yt_users') as eng_time_min
FROM
`Xyz.events_20201030`
where
event_name = 'yt_event'
But I am not able to get it.
Please help me. I will be highly obliged.
Thanks.
Below is for BigQuery Standard SQL
#standardSQL
select user_pseudo_id,
sum(( select value.int_value/60000
from t.event_params
where key = 'engagement_time_msec'
)) as eng_time_min
from `Xyz.events_20201030` t
where user_pseudo_id in (
select distinct user_pseudo_id
from `Xyz.events_20201030`
where event_name = 'yt_event'
)
group by user_pseudo_id
Hmmm . . . you can use aggregation and a having clause:
select e.user_pseudo_id,
sum(case when ep.key = 'engagement_time_msec' then ep.value.int_value / 60000 end) as etm
from `Xyz.events_20201030` e cross join
unnest(e.event_params) ep
-- no filter on ep.key here: it would drop the 'yt_event' rows that the having clause checks
group by 1
having countif(e.event_name = 'yt_event') > 0;

query with partition and count

Given the following table (it records users' item viewing history with session)
create table view_log (
server_time timestamp,
device char(2),
session_id char(10),
uid char(7),
item_id char(7)
);
I'm trying to understand what the following code does..
create table coo_cs as
select
item_id,
session_id,
count(distinct session_id) / (sum(count(distinct session_id)) over (partition by item_id)) cs
from view_log
group by item_id, session_id;
I've tried to break down the line with the partition to understand what it's doing, but then it fails with "DISTINCT is not implemented for window functions".
I understand basic partition and group by but can't make sense of the above sql..
edit
there's a rather large data for test...
http://pakdd2017.recobell.io/site_view_log_small.csv000.gz
Some databases do not (yet) support count(distinct) as a window function. For this query, the count(distinct) is not necessary, because you are aggregating by the same column used for the count(distinct). Hence, count(distinct session_id) is 1 on each row.
Your query is essentially:
select item_id, session_id,
1.0 / count(session_id) over (partition by item_id) as cs
from view_log
group by item_id, session_id;
I wouldn't be surprised if you wanted the ratios at the level of item_id, so the intended query is:
select item_id, count(distinct session_id),
count(distinct session_id) * 1.0 / sum(count(distinct session_id)) over () as cs
from view_log
group by item_id;
If so, the equivalent logic can use a subquery:
select vl.*, numsessions * 1.0 / sum(numsessions) over () as cs
from (select item_id, count(distinct session_id) as numsessions
from view_log vl
group by item_id
) vl;
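The first simplification above (that count(distinct session_id) is always 1 when grouping by session_id) can be checked on a small sample. This is a sketch on SQLite through Python, assuming a build with window function support (3.25 or later) and a cut-down view_log with made-up rows. A 1.0 factor is added to the original expression because SQLite would otherwise do integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE view_log (item_id TEXT, session_id TEXT);
    INSERT INTO view_log VALUES
        ('A', 's1'), ('A', 's1'), ('A', 's2'),
        ('B', 's3');
""")

# Original shape of the query: nested aggregate inside a window sum.
original = conn.execute("""
    SELECT item_id, session_id,
           1.0 * COUNT(DISTINCT session_id)
               / SUM(COUNT(DISTINCT session_id)) OVER (PARTITION BY item_id) AS cs
    FROM view_log
    GROUP BY item_id, session_id
    ORDER BY item_id, session_id
""").fetchall()

# Simplified shape: each grouped row contributes exactly 1 to the window.
simplified = conn.execute("""
    SELECT item_id, session_id,
           1.0 / COUNT(session_id) OVER (PARTITION BY item_id) AS cs
    FROM view_log
    GROUP BY item_id, session_id
    ORDER BY item_id, session_id
""").fetchall()

print(original)
print(simplified)
```

Item A was viewed in two sessions, so each of its (item, session) rows gets 0.5; item B has a single session and gets 1.0. The duplicate ('A', 's1') row does not change anything, since the grouping collapses it before the window is evaluated.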