I am trying to find a way to update records in the BigQuery export of GA4 data. The field I want to change is the string value of the page_location event parameter.
To read that field I am using the following query:
select
pageLocation
from
(select
(select value.string_value from unnest(event_params) where key = 'page_location') as pageLocation
from `myTable`
)
My update statement currently looks like this:
update `myTable` t
set
t.event_params = (
select
array_agg(
struct(
(select value.string_value from unnest(t.event_params) where key = 'page_location') = 'REDACTED'
)
)
from
unnest(t.event_params) as ep
)
where
true
But I am getting the error "Value of type ARRAY<STRUCT> cannot be assigned to t.event_params, which has type ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64, float_value FLOAT64, ..."
So it looks like the whole array needs to be reconstructed, but as there are many different values for event_params.key, this does not seem to be the best way. Is there a way to directly update the corresponding field with BigQuery?
You might consider below:
CREATE TEMP TABLE `ga_events_20210131` AS
SELECT * FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`;
UPDATE `ga_events_20210131` t
SET event_params = ARRAY(
SELECT AS STRUCT
key,
STRUCT (
IF(key = 'page_location', 'REDACTED', value.string_value) AS string_value,
value.int_value, value.float_value, value.double_value
) AS value
FROM t.event_params
)
WHERE TRUE;
SELECT * FROM `ga_events_20210131` LIMIT 100;
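If listing every field of value by hand is a concern, a nested SELECT AS STRUCT * REPLACE may be an alternative, since it keeps all other fields as they are. The following is only a sketch on inline sample data: the sample CTE and its two-field value struct are made up for illustration, and the real GA4 value struct has more fields.

```sql
-- Hypothetical inline data standing in for the GA4 export table.
WITH sample AS (
  SELECT [
    STRUCT('page_location' AS key,
           STRUCT('https://example.com/a' AS string_value,
                  CAST(NULL AS INT64) AS int_value) AS value),
    STRUCT('engagement_time_msec' AS key,
           STRUCT(CAST(NULL AS STRING) AS string_value,
                  1200 AS int_value) AS value)
  ] AS event_params
)
SELECT
  ARRAY(
    -- Rebuild each array element, replacing only value.string_value
    -- for the page_location key; every other field is copied unchanged.
    SELECT AS STRUCT * REPLACE (
      (SELECT AS STRUCT value.* REPLACE (
         IF(key = 'page_location', 'REDACTED', value.string_value) AS string_value
       )) AS value
    )
    FROM UNNEST(event_params)
  ) AS event_params
FROM sample;
```

The same ARRAY(...) expression should be usable on the right-hand side of SET event_params = ... in an UPDATE, with the array referenced through the table alias.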
I am trying to create a function, and I don't understand what the problem is. The function is created correctly when I do either of the following:
I delete the pivot
I select from the table instead of from the unnest alone (so: from the table, unnest(a))
CREATE OR REPLACE FUNCTION `dataset.function_naming` (a ARRAY<STRUCT<ROW_ID STRING, KEY STRING, VALUE STRING>>, id_one STRING, id_two STRING, start_date DATE, end_date DATE) RETURNS INT64
AS (
with tmp1 as (
select ROW_ID,X,Y,Z,W
from
(
select prop.ROW_ID,prop.KEY, prop.VALUE
from unnest(a) prop
where prop.KEY in ('X','Y','Z','W')
)
PIVOT
(
MAX(VALUE)
FOR UPPER(KEY) in('X','Y','Z','W')
) as PIVOT
)
select case when X is not null then 1,
when Y is not null then 2,
when Z is not null then 2,
when W is not null then 2
else 0
from tmp1
);
Thanks all.
There are a few minor issues I see in your code:
missing extra (...) around the function body
extra commas (,) within the case statement
missing end to close the case statement
So, try below
CREATE OR REPLACE FUNCTION `dataset.function_naming` (
a ARRAY<STRUCT<ROW_ID STRING, KEY STRING, VALUE STRING>>,
id_one STRING,
id_two STRING,
start_date DATE,
end_date DATE
) RETURNS INT64
AS ((
with tmp1 as (
select ROW_ID,X,Y,Z,W
from
(
select prop.ROW_ID,prop.KEY, prop.VALUE
from unnest(a) prop
where prop.KEY in ('X','Y','Z','W')
)
PIVOT
(
MAX(VALUE)
FOR UPPER(KEY) in('X','Y','Z','W')
) as PIVOT
)
select case when X is not null then 1
when Y is not null then 2
when Z is not null then 2
when W is not null then 2
else 0
end
from tmp1
));
It seems there is an internal issue when using PIVOT together with UNNEST on the array argument. You can use the following, which executes the same logic, and also open a case on the issue tracker as a BigQuery issue with Google Cloud Support.
CREATE OR REPLACE FUNCTION `<dataset>.function_naming` (
a ARRAY<STRUCT<ROW_ID STRING, KEY STRING, VALUE STRING>>,
id_one STRING,
id_two STRING,
start_date DATE,
end_date DATE
) RETURNS INT64
AS (( WITH tmp AS (
SELECT
CASE
WHEN KEY="X" THEN 1
WHEN KEY="Y" THEN 2
WHEN KEY="Z" THEN 2
WHEN KEY="W" THEN 2
ELSE
0
END
teste_column
#-- FROM ( SELECT UPPER(prop.KEY) KEY, MAX(prop.VALUE) VALUE FROM -- following your query patern, but not really necessary
FROM ( SELECT UPPER(prop.KEY) KEY FROM
UNNEST(a) prop
WHERE
UPPER(key) IN ('X', 'Y', 'Z', 'W')
GROUP BY key )
ORDER BY teste_column DESC LIMIT 1 )
SELECT * FROM tmp
UNION ALL
SELECT 0 teste_column
FROM (SELECT 1)
LEFT JOIN tmp
ON FALSE
WHERE NOT EXISTS ( SELECT 1 FROM tmp)
));
#--- Testing the function:
select `<project>.<dataset>.function_naming`(
[STRUCT("1" AS ROW_ID, "x" AS KEY, "10" AS VALUE),
STRUCT("1" AS ROW_ID, "x" AS KEY, "20" AS VALUE),
STRUCT("1" AS ROW_ID, "w" AS KEY, "20" AS VALUE),
STRUCT("1" AS ROW_ID, "y" AS KEY, "20" AS VALUE)],
"1", "2", "2022-12-10", "2022-12-10")
I'm trying to use DML in BigQuery to update nested revenue fields.
The challenge is that I do not want to simply replace the revenue value, but multiply it by a specific factor instead.
For just replacing I've found:
UPDATE `project.dataset.table`
SET hits = ARRAY(
SELECT AS STRUCT * REPLACE (
(SELECT AS STRUCT transaction.* REPLACE ( 1 AS transactionRevenue)) AS transaction
)
FROM UNNEST(hits) as transactionRevenue
)
WHERE true
But I would like to have something like:
UPDATE `project.dataset.table`
SET hits = ARRAY(
SELECT AS STRUCT * REPLACE (
(SELECT AS STRUCT transaction.* REPLACE ( (transactionRevenue*5) AS transactionRevenue)) AS transaction
)
FROM UNNEST(hits) as transactionRevenue
)
WHERE true
This approach doesn't work.
Error Message: No matching signature for operator * for argument types: STRUCT, INT64. Supported signatures: INT64 * INT64; FLOAT64 * FLOAT64; NUMERIC * NUMERIC at [4:48]
Below should work
UPDATE `project.dataset.table` t
SET hits = ARRAY(
SELECT AS STRUCT * REPLACE(
(SELECT AS STRUCT * REPLACE(5 * transactionRevenue AS transactionRevenue)
FROM UNNEST([transaction])
) AS transaction
)
FROM t.hits
)
WHERE true
Our project has some events that record how long a user stays on a page. We add an event_params.key named time_ms, and its value holds the duration. How can I select the sum of 'time_ms'?
I tried to use SQL statements but failed.
SELECT *
FROM analytics_152426080.events_20190626
WHERE event_name = 'details_viewtime' AND
event_params.key = 'time_ms'
It shows the error message:
'Cannot access field key on a value with type ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64, float_value FLOAT64, ...>>> at [7:20]'.
I want to get the sum of 'time_ms', but I need to solve this error first.
I think you need unnest:
SELECT *
FROM analytics_152426080.events_20190626 e CROSS JOIN
UNNEST(event_params) ep
WHERE e.event_name = 'details_viewtime' AND
ep.key = 'time_ms';
I'm not sure where the actual value is located, but something like this:
SELECT SUM(ep.value.int_value)
FROM analytics_152426080.events_20190626 e CROSS JOIN
UNNEST(event_params) ep
WHERE e.event_name = 'details_viewtime' AND
ep.key = 'time_ms';
This assumes that the value you want to sum is stored as an integer in int_value. If it is stored in another field, you need to convert it to a number first.
Or, if you want to sum the value per row:
SELECT e.*,
(SELECT SUM(ep.value.int_value)
FROM UNNEST(event_params) ep
WHERE ep.key = 'time_ms'
) as sum_ms
FROM analytics_152426080.events_20190626 e
WHERE e.event_name = 'details_viewtime'
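If the duration is sometimes stored as a string instead of an integer, the conversion mentioned above could look roughly like this. This is a sketch on made-up inline rows, not the real export table: the events CTE and its values are assumptions, and SAFE_CAST returns NULL for strings that are not valid integers.

```sql
-- Hypothetical sample rows: one stores time_ms in int_value,
-- the other stores it as text in string_value.
WITH events AS (
  SELECT 'details_viewtime' AS event_name,
         [STRUCT('time_ms' AS key,
                 STRUCT(CAST(NULL AS STRING) AS string_value,
                        1500 AS int_value) AS value)] AS event_params
  UNION ALL
  SELECT 'details_viewtime' AS event_name,
         [STRUCT('time_ms' AS key,
                 STRUCT('2500' AS string_value,
                        CAST(NULL AS INT64) AS int_value) AS value)] AS event_params
)
SELECT
  -- Prefer int_value; fall back to the string converted to INT64.
  SUM(COALESCE(ep.value.int_value,
               SAFE_CAST(ep.value.string_value AS INT64))) AS total_time_ms
FROM events AS e
CROSS JOIN UNNEST(e.event_params) AS ep
WHERE e.event_name = 'details_viewtime'
  AND ep.key = 'time_ms';
```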
I am trying to write a query in Google BigQuery that pulls two keys and two values. The query should count distinct pseudo user IDs from one table where event_params.key = result and event_params.value.string_value = success, and where event_params.key = confirmation number (and is not null). This has already been unnested. I'm SUPER new to SQL, so please dumb down any answers.
SELECT
*
FROM
`table_name`,
UNNEST(event_params) AS params
WHERE
(stream_id = '1168190076'
OR stream_id = '1168201031')
AND params.key = 'result'
AND params.value.string_value IN ('success',
'SUCCESS')
AND params.key = 'confirmationNumber' NOT NULL
I keep getting errors, and when I don't get errors, my numbers are off by a lot! I'm not sure where to go next.
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM `project.dataset.table`
WHERE stream_id IN ('1168190076', '1168201031')
AND 2 = (
SELECT COUNT(1)
FROM UNNEST(event_params) param
WHERE (
param.key = 'result' AND
LOWER(param.value.string_value) = 'success'
) OR (
param.key = 'confirmationNumber' AND
NOT param.value.string_value IS NULL
)
)
I suspect that you want something more like this:
SELECT t.*
FROM `table_name` t
WHERE t.stream_id IN ('1168190076', '1168201031') AND
EXISTS (SELECT 1
FROM UNNEST(t.event_params) p
WHERE p.key = 'result' AND
p.value.string_value IN ('success', 'SUCCESS')
) AND
EXISTS (SELECT 1
FROM UNNEST(t.event_params) p
WHERE p.key = 'confirmationNumber'
);
That is, test each parameter independently. You don't need to unnest the result for the result set -- unless you really want to, of course.
I don't know what the lingering NOT NULL is for in your query, so I'm ignoring it. You might want to check the value, however.
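To get the distinct user count the question asks for, the same EXISTS tests can sit under a COUNT(DISTINCT ...). The sketch below runs on made-up inline rows; it assumes the user ID column is user_pseudo_id (the name used in the GA4 export) and that the confirmation number is stored in string_value.

```sql
-- Hypothetical sample rows standing in for the real table.
WITH events AS (
  SELECT 'u1' AS user_pseudo_id, '1168190076' AS stream_id,
         [STRUCT('result' AS key, STRUCT('success' AS string_value) AS value),
          STRUCT('confirmationNumber' AS key, STRUCT('ABC123' AS string_value) AS value)]
           AS event_params
  UNION ALL
  SELECT 'u2' AS user_pseudo_id, '1168201031' AS stream_id,
         [STRUCT('result' AS key, STRUCT('failed' AS string_value) AS value)]
           AS event_params
)
SELECT COUNT(DISTINCT t.user_pseudo_id) AS users
FROM events AS t
WHERE t.stream_id IN ('1168190076', '1168201031')
  -- The event must carry result = success (case-insensitive) ...
  AND EXISTS (SELECT 1
              FROM UNNEST(t.event_params) AS p
              WHERE p.key = 'result'
                AND LOWER(p.value.string_value) = 'success')
  -- ... and a non-null confirmation number.
  AND EXISTS (SELECT 1
              FROM UNNEST(t.event_params) AS p
              WHERE p.key = 'confirmationNumber'
                AND p.value.string_value IS NOT NULL);
```

Because the array is only unnested inside the EXISTS subqueries, each event row is tested once and is not multiplied by its number of parameters, which is one common reason for counts being far too high.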
I'm trying to convert an ARRAY<STRUCT> column to multiple columns.
The data structure looks like:
column name: Parameter
[
{
key: "Publisher_name"
value: "Rubicon"
}
{
key: "device_type"
value: "IDFA"
}
{
key: "device_id"
value: "AAAA-BBBB-CCCC-DDDD"
}
]
What I want to get:
publisher_name device_type device_id
Rubicon IDFA AAAA-BBBB-CCCC-DDDD
I have tried this, which duplicated the values of the other columns:
select h from table, unnest(parameter) as h
BTW, I am very curious why we would want to use this kind of structure in BigQuery. Can't we just add the above 3 columns to the table?
Below is for BigQuery Standard SQL
#standardSQL
SELECT
(SELECT value FROM UNNEST(Parameter) WHERE key = 'Publisher_name') AS Publisher_name,
(SELECT value FROM UNNEST(Parameter) WHERE key = 'device_type') AS device_type,
(SELECT value FROM UNNEST(Parameter) WHERE key = 'device_id') AS device_id
FROM `project.dataset.table`
You can further refactor the code by using a SQL UDF, as below
#standardSQL
CREATE TEMP FUNCTION getValue(k STRING, arr ANY TYPE) AS
((SELECT value FROM UNNEST(arr) WHERE key = k));
SELECT
getValue('Publisher_name', Parameter) AS Publisher_name,
getValue('device_type', Parameter) AS device_type,
getValue('device_id', Parameter) AS device_id
FROM `project.dataset.table`
To convert to multiple columns, you will need to aggregate, something like this:
select ?,
max(case when pv.key = 'Publisher_name' then pv.value end) as Publisher_name,
max(case when pv.key = 'device_type' then pv.value end) as device_type,
max(case when pv.key = 'device_id' then pv.value end) as device_id
from t cross join
unnest(parameter) pv
group by ?
You need to explicitly list the new columns that you want. The ? is for the columns that remain the same.
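As a concrete sketch of the aggregation approach, here it is on one made-up inline row. The id column is hypothetical and stands in for whatever columns take the place of the ? above.

```sql
-- Hypothetical single-row table with an id column and the
-- key/value array from the question.
WITH t AS (
  SELECT 1 AS id,
         [STRUCT('Publisher_name' AS key, 'Rubicon' AS value),
          STRUCT('device_type' AS key, 'IDFA' AS value),
          STRUCT('device_id' AS key, 'AAAA-BBBB-CCCC-DDDD' AS value)] AS parameter
)
SELECT
  t.id,
  -- Each MAX picks the single non-null value for its key within the group.
  MAX(IF(pv.key = 'Publisher_name', pv.value, NULL)) AS Publisher_name,
  MAX(IF(pv.key = 'device_type', pv.value, NULL)) AS device_type,
  MAX(IF(pv.key = 'device_id', pv.value, NULL)) AS device_id
FROM t
CROSS JOIN UNNEST(t.parameter) AS pv
GROUP BY t.id;
```

Grouping by the remaining columns collapses the rows produced by the unnest back into one row per original record, which avoids the duplicates seen with a plain unnest.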