BigQuery - UNNEST where event_params.key matches a certain value

I'm trying to pull a distinct count of users where the traffic medium is 'referral'.
When I run the query below, I get this error:
Syntax error: Unexpected keyword UNNEST at [4:1]
I'm trying to UNNEST the event_params field so that I flatten the table. I'm also sharing a sample row of the data which has the event_params.key / value pairs.
Thanks.
SELECT
COUNT (DISTINCT(user_pseudo_id)) AS total_users
FROM `project-table`
UNNEST (event_params) AS event_params
WHERE event_name = 'page_view'
AND event_params.medium='referral'

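The query below is for BigQuery Standard SQL. The syntax error comes from the missing comma (an implicit CROSS JOIN) between the table and UNNEST. Also, event_params is an array of key/value structs, so you filter on the key and on value.string_value rather than on event_params.medium: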
#standardSQL
SELECT COUNT (DISTINCT(user_pseudo_id)) AS total_users
FROM `project.dataset.table`,
UNNEST (event_params) AS event_param
WHERE event_name = 'page_view'
AND event_param.key = 'medium'
AND event_param.value.string_value = 'referral'
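To make the semantics of the comma join with UNNEST easier to picture, here is a rough Python analogy. It is only a sketch: the sample rows are made up for illustration and merely assume the usual GA4 shape (a repeated event_params field of key/value records).

```python
# Hypothetical sample rows shaped like a GA4 export (not real data).
rows = [
    {"user_pseudo_id": "u1", "event_name": "page_view",
     "event_params": [{"key": "medium", "value": {"string_value": "referral"}},
                      {"key": "source", "value": {"string_value": "example.com"}}]},
    {"user_pseudo_id": "u1", "event_name": "page_view",
     "event_params": [{"key": "medium", "value": {"string_value": "referral"}}]},
    {"user_pseudo_id": "u2", "event_name": "page_view",
     "event_params": [{"key": "medium", "value": {"string_value": "organic"}}]},
    {"user_pseudo_id": "u3", "event_name": "scroll",
     "event_params": [{"key": "medium", "value": {"string_value": "referral"}}]},
]

# FROM table, UNNEST(event_params): one output row per (event, param) pair.
flattened = [(row, param) for row in rows for param in row["event_params"]]

# WHERE clause, then COUNT(DISTINCT user_pseudo_id) expressed as a set.
matching_users = {
    row["user_pseudo_id"]
    for row, param in flattened
    if row["event_name"] == "page_view"
    and param["key"] == "medium"
    and param["value"]["string_value"] == "referral"
}
total_users = len(matching_users)
print(total_users)
```

Because the join multiplies each event by its number of params, the DISTINCT in the count is what keeps a user from being counted more than once.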

Related

UPDATE statement GA4 Big Query SQL with events and UNNEST

I'm trying to find a way to update records that have an event_name of page_view and a key of page_location, where the latter contains some pattern. The query below gives me the selection I'm after. The problem is that I cannot wrap my head around how to include an UPDATE statement to change the values of page_location in that selection. Do you know how?
SELECT *
FROM (
SELECT (SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'page_location') AS page
FROM `project-name.analytics_299XXXXXX.events_*`
WHERE
_table_suffix BETWEEN '20220322'
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND
event_name = 'page_view'
LIMIT 1000
) x
WHERE x.page LIKE '%login%';
If I understand you correctly, I think you should use the following pattern. The Google docs don't have this update pattern listed, which is a shame as it's the most useful IMO:
update ds.targettable t
set t.targetfield = s.sourcefield
from (select keyfield, sourcefield
from ds.sourcetable
) s
where t.keyfield = s.keyfield
I apologize in advance: I don't have access to the GA events table, so I've just coded this up off the top of my head. Here is some code that should get you close:
update `project-name.analytics_299XXXXXX.events_*` tgt
set page_location = src.page
from (SELECT event_name, key, _table_suffix, (SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'page_location') AS page
FROM `project-name.analytics_299XXXXXX.events_*`
WHERE _table_suffix BETWEEN '20220322'
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
AND event_name = 'page_view'
and page LIKE '%login%') src
--join tables together on PK (or make one)
where farm_fingerprint(concat(tgt.key, tgt.event_name)) = farm_fingerprint(concat(src.key, src.event_name))
and src._table_suffix = tgt._table_suffix
and tgt.page LIKE '%login%'
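I'm not exactly sure how to manage the table suffix, so you may need to experiment with that. Note that BigQuery does not accept a wildcard table as the target of a DML statement, so the UPDATE itself has to name one concrete events_ table at a time; the wildcard can only appear in the FROM subquery.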

PANDAS divide for a given value with groupby

I want to divide each 'Value' in this dataset by the Value at TIME=='1970-Q1' grouped by LOCATION.
This is how I'd implement the logic in SQL:
WITH first_year AS (
SELECT LOCATION, Value
FROM `table`
WHERE TIME = '1970-Q1'
)
SELECT t.LOCATION, t.TIME, ((t.Value / f.Value) * 100) normValue
FROM `table` t,
first_year f
WHERE t.LOCATION = f.LOCATION
ORDER BY LOCATION, TIME ASC
However, you can also assume that we can sort (ascending) the column TIME within the group and take the first value. It's always a string like 'YYYY-QX'
Expected result:
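Try transform: mask every Value except the one at '1970-Q1', then broadcast the first non-null value within each LOCATION:
df['normal'] = df['Value'] / df['Value'].where(df['TIME'] == '1970-Q1').groupby(df['LOCATION']).transform('first')
Multiply the result by 100 if you want it to match normValue in the SQL version.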
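As a small worked example of the transform approach, the sketch below runs it on a made-up frame. The locations and numbers are hypothetical; only the column names LOCATION, TIME and Value come from the question, and the result is scaled by 100 to mirror the SQL.

```python
import pandas as pd

# Hypothetical data; only the column names follow the question.
df = pd.DataFrame({
    "LOCATION": ["AUS", "AUS", "AUS", "USA", "USA"],
    "TIME": ["1970-Q1", "1970-Q2", "1970-Q3", "1970-Q1", "1970-Q2"],
    "Value": [50.0, 75.0, 100.0, 200.0, 250.0],
})

# Keep Value only on the 1970-Q1 rows (NaN elsewhere), then spread the
# first non-null value of each LOCATION group across the whole group.
base = (
    df["Value"]
    .where(df["TIME"] == "1970-Q1")
    .groupby(df["LOCATION"])
    .transform("first")
)

# Same arithmetic as the SQL version: (Value / base Value) * 100.
df["normValue"] = df["Value"] / base * 100
print(df)
```

A LOCATION with no '1970-Q1' row gets NaN for every normValue, which is comparable to the SQL join dropping those rows.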

GROUPING multiple LIKE string

Data:
2015478 warning occurred at 20201403021545
2020179 error occurred at 20201303021545
2025480 timeout occurred at 20201203021545
2025481 timeout occurred at 20201103021545
2020482 error occurred at 20201473021545
2020157 timeout occurred at 20201403781545
2020154 warning occurred at 20201407851545
2027845 warning occurred at 20201403458745
In the above data, there are 3 kinds of strings I am interested in: warning, error and timeout.
Can we have a single query that groups by string and gives the count of occurrences, as below?
Output:
timeout 3
warning 3
error 2
I know I can write separate queries to find count individually. But interested in a single query
Thanks
You can use filtered aggregation for that:
select count(*) filter (where the_column like '%timeout%') as timeout_count,
count(*) filter (where the_column like '%error%') as error_count,
count(*) filter (where the_column like '%warning%') as warning_count
from the_table;
This returns the counts in three columns rather than in three rows as you indicated.
If you do need this in separate rows, you can use regexp_replace() to cleanup the string, then group by that:
select regexp_replace(the_column, '(.*)(warning|error|timeout)(.*)', '\2') as what,
count(*)
from the_table
group by what;
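The extract-then-group idea behind the regexp_replace() query can be sketched outside the database as well. The Python below is only an illustration of that logic on the sample lines from the question, not a substitute for the SQL; the variable names are made up.

```python
import re
from collections import Counter

# The sample lines from the question.
lines = [
    "2015478 warning occurred at 20201403021545",
    "2020179 error occurred at 20201303021545",
    "2025480 timeout occurred at 20201203021545",
    "2025481 timeout occurred at 20201103021545",
    "2020482 error occurred at 20201473021545",
    "2020157 timeout occurred at 20201403781545",
    "2020154 warning occurred at 20201407851545",
    "2027845 warning occurred at 20201403458745",
]

# Same alternation as in the SQL pattern.
pattern = re.compile(r"warning|error|timeout")

counts = Counter()
for line in lines:
    match = pattern.search(line)
    if match:  # lines without any keyword are skipped
        counts[match.group(0)] += 1

for what, n in counts.most_common():
    print(what, n)
```

One difference worth knowing: the SQL version leaves a non-matching row unchanged and groups it under its full text, whereas this sketch skips such lines.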
Please use the query below, which avoids hard-coding the values by cutting the text between the first and last space with position():
select val, count(1) from
(select substring(column_name ,position(' ' in (column_name))+1,
length(column_name) - position(reverse(' ') in reverse(column_name)) -
position(' ' in (column_name))) as val from matching) qry
group by val; -- Provide the proper column name
If you want this on separate rows you can also use a lateral join:
select which, count(*)
from t cross join lateral
(values (case when col like '%error%' then 'error' end),
(case when col like '%warning%' then 'warning' end),
(case when col like '%timeout%' then 'timeout' end)
) v(which)
where which is not null
group by which;
On the other hand, if you simply want the second word -- but don't want to hardcode the values -- then you can use:
select split_part(col, ' ', 2) as which, count(*)
from t
group by which;
Here is a db<>fiddle.

Convert ARRAY<STRUCT> to multiple columns in BigQuery SQL

I'm trying to convert Array< struct > to multiple columns.
The data structure looks like:
column name: Parameter
[
  {
    key: "Publisher_name"
    value: "Rubicon"
  }
  {
    key: "device_type"
    value: "IDFA"
  }
  {
    key: "device_id"
    value: "AAAA-BBBB-CCCC-DDDD"
  }
]
What I want to get:
publisher_name device_type device_id
Rubicon IDFA AAAA-BBBB-CCCC-DDDD
I tried this, which duplicated the other columns:
select h from table, unnest(parameter) as h
BTW, I am very curious why we would want to use this kind of structure in BigQuery. Can't we just add the above 3 columns to the table?
Below is for BigQuery Standard SQL
#standardSQL
SELECT
(SELECT value FROM UNNEST(Parameter) WHERE key = 'Publisher_name') AS Publisher_name,
(SELECT value FROM UNNEST(Parameter) WHERE key = 'device_type') AS device_type,
(SELECT value FROM UNNEST(Parameter) WHERE key = 'device_id') AS device_id
FROM `project.dataset.table`
You can refactor the code further by using a SQL UDF, as below:
#standardSQL
CREATE TEMP FUNCTION getValue(k STRING, arr ANY TYPE) AS
((SELECT value FROM UNNEST(arr) WHERE key = k));
SELECT
getValue('Publisher_name', Parameter) AS Publisher_name,
getValue('device_type', Parameter) AS device_type,
getValue('device_id', Parameter) AS device_id
FROM `project.dataset.table`
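For readers who find the correlated subqueries hard to picture, here is a minimal Python sketch of the same key lookup. The helper name get_value is hypothetical and simply mirrors the getValue UDF; the data is the sample from the question.

```python
# One row's Parameter array, as shown in the question.
parameter = [
    {"key": "Publisher_name", "value": "Rubicon"},
    {"key": "device_type", "value": "IDFA"},
    {"key": "device_id", "value": "AAAA-BBBB-CCCC-DDDD"},
]


def get_value(k, arr):
    """Return the value of the first element whose key equals k, else None."""
    return next((item["value"] for item in arr if item["key"] == k), None)


# One output column per key of interest, one lookup per column.
wanted = ("Publisher_name", "device_type", "device_id")
row = {name: get_value(name, parameter) for name in wanted}
print(row)
```

Like the SQL, a key that is absent yields a null (None here). Unlike the SQL scalar subquery, which raises an error when a key appears more than once in the array, this sketch quietly returns the first match.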
To convert to multiple columns, you will need to aggregate, something like this:
select ?,
max(case when pv.key = 'Publisher_name' then pv.value end) as Publisher_name,
max(case when pv.key = 'device_type' then pv.value end) as device_type,
max(case when pv.key = 'device_id' then pv.value end) as device_id
from t cross join
unnest(parameter) pv
group by ?
You need to explicitly list the new columns that you want. The ? is for the columns that remain the same.

BigQuery select multiple key values

With a custom event in Firebase exported to BigQuery, multiple key-value params can exist within it. I can't seem to figure out how to select more than just one of these using "standard SQL".
Let's say that you wanted to select the string_value that corresponds with firebase_event_origin and the int_value associated with firebase_screen_id for all control_reading events. You could express the query as:
#standardSQL
SELECT
(SELECT param.value.string_value
FROM UNNEST(event_dim.params) AS param
WHERE param.key = 'firebase_event_origin') AS firebase_event_origin,
(SELECT param.value.int_value
FROM UNNEST(event_dim.params) AS param
WHERE param.key = 'firebase_screen_id') AS firebase_screen_id
FROM `your_dataset.your_table_*`
CROSS JOIN UNNEST(event_dim) AS event_dim
WHERE _TABLE_SUFFIX BETWEEN '20170501' AND '20170503' AND
event_dim.name = 'control_reading';