Convert ARRAY<STRUCT> to multiple columns in BigQuery SQL - sql

I'm trying to convert an ARRAY<STRUCT> column into multiple columns.
The data structure looks like:
column name: Parameter
[
-{
key: "Publisher_name"
value: "Rubicon"
}
-{
key: "device_type"
value: "IDFA"
}
-{
key: "device_id"
value: "AAAA-BBBB-CCCC-DDDD"
}
]
What I want to get:
publisher_name device_type device_id
Rubicon IDFA AAAA-BBBB-CCCC-DDDD
I have tried this, which duplicated the values of the other columns:
select h from table, unnest(parameter) as h
BTW, I am curious why this kind of structure is used in BigQuery at all. Can't we just add the three columns above to the table?

Below is for BigQuery Standard SQL
#standardSQL
SELECT
(SELECT value FROM UNNEST(Parameter) WHERE key = 'Publisher_name') AS Publisher_name,
(SELECT value FROM UNNEST(Parameter) WHERE key = 'device_type') AS device_type,
(SELECT value FROM UNNEST(Parameter) WHERE key = 'device_id') AS device_id
FROM `project.dataset.table`
You can further refactor code by using SQL UDF as below
#standardSQL
CREATE TEMP FUNCTION getValue(k STRING, arr ANY TYPE) AS
((SELECT value FROM UNNEST(arr) WHERE key = k));
SELECT
getValue('Publisher_name', Parameter) AS Publisher_name,
getValue('device_type', Parameter) AS device_type,
getValue('device_id', Parameter) AS device_id
FROM `project.dataset.table`

To convert to multiple columns, you will need to aggregate, something like this:
select ?,
max(case when pv.key = 'Publisher_name' then pv.value end) as Publisher_name,
max(case when pv.key = 'device_type' then pv.value end) as device_type,
max(case when pv.key = 'device_id' then pv.value end) as device_id
from t cross join
unnest(t.parameter) pv
group by ?
You need to explicitly list the new columns that you want. The ? is for the columns that remain the same.
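As a minimal sketch of the conditional aggregation described above, the following runs the same MAX(CASE ...) pattern against an in-memory SQLite database from Python. The params table, its row_id column, and the second sample row are assumptions made up for illustration; the table stands in for the rows produced by UNNEST(parameter), and SQLite is only used here because it runs locally.

```python
import sqlite3

# Hypothetical long-format table: one row per (row_id, key, value) pair,
# i.e. roughly what the data looks like after UNNEST(parameter).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE params (row_id INTEGER, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO params VALUES (?, ?, ?)",
    [
        (1, "Publisher_name", "Rubicon"),
        (1, "device_type", "IDFA"),
        (1, "device_id", "AAAA-BBBB-CCCC-DDDD"),
        (2, "device_type", "GAID"),  # made-up row with two keys missing
    ],
)

# One MAX(CASE ...) per wanted column; GROUP BY collapses the pairs
# belonging to the same original row back into a single row.
pivot_rows = conn.execute(
    """
    SELECT row_id,
           MAX(CASE WHEN key = 'Publisher_name' THEN value END) AS Publisher_name,
           MAX(CASE WHEN key = 'device_type' THEN value END) AS device_type,
           MAX(CASE WHEN key = 'device_id' THEN value END) AS device_id
    FROM params
    GROUP BY row_id
    ORDER BY row_id
    """
).fetchall()

for row in pivot_rows:
    print(row)
```

Keys that are absent for a row come back as NULL (None in Python), which is the same behaviour the scalar-subquery version gives.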

Related

GA4 data in BigQuery: Update page_location field

I am trying to find a way to update records in the BigQuery export of GA4 data. This is the corresponding field:
To get that field I am using the following query:
select
pageLocation
from
(select
(select value.string_value from unnest(event_params) where key = 'page_location') as pageLocation
from `myTable`
)
My update statement currently looks like this:
update `myTable` t
set
t.event_params = (
select
array_agg(
struct(
(select value.string_value from unnest(t.event_params) where key = 'page_location') = 'REDACTED'
)
)
from
unnest(t.event_params) as ep
)
where
true
But I am getting the error "Value of type ARRAY<STRUCT> cannot be assigned to t.event_params, which has type ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64, float_value FLOAT64, ..."
So it looks like the whole array needs to be reconstructed, but because there are many different values for event_params.key, this does not seem to be the best way. Is there a way to update the corresponding field directly in BigQuery?
You might consider below:
CREATE TEMP TABLE `ga_events_20210131` AS
SELECT * FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`;
UPDATE `ga_events_20210131` t
SET event_params = ARRAY(
SELECT AS STRUCT
key,
STRUCT (
IF(key = 'page_location', 'REDACTED', value.string_value) AS string_value,
value.int_value, value.float_value, value.double_value
) AS value
FROM t.event_params
)
WHERE TRUE;
SELECT * FROM `ga_events_20210131` LIMIT 100;
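The idea behind the UPDATE above is that the array is rebuilt element by element, copying every entry unchanged except for the string_value of the one key being redacted. A rough Python sketch of that rebuild, using plain lists and dicts in place of ARRAY<STRUCT>, could look like this. The function name redact_param and the sample values are hypothetical.

```python
def redact_param(event_params, target_key, replacement):
    """Return a rebuilt list; only string_value of target_key is replaced."""
    rebuilt = []
    for param in event_params:
        value = dict(param["value"])  # copy so the input is not mutated
        if param["key"] == target_key:
            value["string_value"] = replacement
        rebuilt.append({"key": param["key"], "value": value})
    return rebuilt


# Made-up sample shaped like GA4 event_params (two of the value fields only).
sample_params = [
    {"key": "page_location",
     "value": {"string_value": "https://example.com/a", "int_value": None}},
    {"key": "ga_session_id",
     "value": {"string_value": None, "int_value": 123}},
]

redacted = redact_param(sample_params, "page_location", "REDACTED")
print(redacted)
```

As in the SQL version, entries with other keys pass through untouched, so the many different event_params keys never have to be listed.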

Flatten the Data in BigQuery

I have a dimensions.key_value field of RECORD type. Running the following query gives this output:
SELECT * from table;
event_id value dimensions
1 140 {"key_value": [{"key": "app", "value": "20"}]}
2 150 {"key_value": [{"key": "region", "value": "8"}, {"key": "loc", "value": "1"}]}
3 600 {"key_value": [{"key": "region", "value": "8"}, {"key": "loc", "value": "2"}]}
To unnest the data I have created the following view:
with temp as (
select
(select value from t.dimensions.key_value where key = 'region') as region,
(select value from t.dimensions.key_value where key = 'loc') as loc,
(select value from t.dimensions.key_value where key = 'app') as app,
value,
event_id
from table t
) select *
from temp;
My Output:
region loc app value event_id
null null 20 140 1
8 1 null 150 2
8 2 null 600 3
There are two things I need to verify:
Is my query correct?
How can I make the query generic, i.e. what if I don't know all the keys and other keys may also be present in our dataset?
Update:
My Schema :
My OutPut :
Problem: let's say a user wants to group by region and loc. There is no easy way of writing that query against the nested data, so I decided to create a view that lets the user group by these columns easily:
with temp as (
select
(select value from t.dimensions.key_value where key = 'region') as region,
(select value from t.dimensions.key_value where key = 'loc') as loc,
(select value from t.dimensions.key_value where key = 'store') as store,
value,
metric_name, event_time
from table t
) select *
from temp;
Based on this view the user can easily do a group by. So I wanted to check whether there is a way to create a generic view, since we don't know all the unique keys, or whether there is an easier way to do the group by.
How can I make the query generic, i.e. what if I don't know all the keys and other keys may also be present in our dataset?
Consider below
execute immediate (select
''' select event_id, value, ''' || string_agg('''
(select value from b.key_value where key = "''' || key_name || '''") as ''' || key_name , ''', ''')
|| '''
from (
select event_id, value,
array(
select as struct
json_extract_scalar(kv, '$.key') key,
json_extract_scalar(kv, '$.value') value
from a.kvs kv
) key_value
from `project.dataset.table`,
unnest([struct(json_extract_array(dimensions, '$.key_value') as kvs)]) a
) b
'''
from (
select distinct json_extract_scalar(kv, '$.key') key_name
from `project.dataset.table`,
unnest(json_extract_array(dimensions, '$.key_value')) as kv
)
)
If applied to the sample data in your question, the output has one column per distinct key.
As you can see, the query contains no explicit references to the actual key names. They are extracted dynamically, so there is no need to know them in advance, and there can be any number of them.
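The same two-step idea (first collect the distinct keys, then generate the pivot query text from them) can be sketched locally with Python and SQLite. This is an illustration under assumptions, not the BigQuery script itself: the kv table and its kv_value column are made up, the data is already flattened to one row per key, and the generated query uses MAX(CASE ...) rather than the scalar subqueries of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened form of the sample data: one row per key/value pair.
conn.execute(
    "CREATE TABLE kv (event_id INTEGER, value INTEGER, key TEXT, kv_value TEXT)"
)
conn.executemany(
    "INSERT INTO kv VALUES (?, ?, ?, ?)",
    [
        (1, 140, "app", "20"),
        (2, 150, "region", "8"),
        (2, 150, "loc", "1"),
        (3, 600, "region", "8"),
        (3, 600, "loc", "2"),
    ],
)


def quote_ident(name):
    # Double-quote an identifier so an arbitrary key is safe as a column alias.
    return '"' + name.replace('"', '""') + '"'


def quote_literal(text):
    # Single-quote a string literal, doubling any embedded quotes.
    return "'" + text.replace("'", "''") + "'"


# Step 1: discover the keys instead of hard-coding them.
key_names = [r[0] for r in conn.execute("SELECT DISTINCT key FROM kv ORDER BY key")]

# Step 2: generate one pivot expression per discovered key.
select_list = ", ".join(
    f"MAX(CASE WHEN key = {quote_literal(k)} THEN kv_value END) AS {quote_ident(k)}"
    for k in key_names
)
dynamic_sql = (
    f"SELECT event_id, value, {select_list} "
    "FROM kv GROUP BY event_id, value ORDER BY event_id"
)

# Step 3: run the generated statement (the role EXECUTE IMMEDIATE plays above).
cursor = conn.execute(dynamic_sql)
columns = [d[0] for d in cursor.description]
result = cursor.fetchall()
print(columns)
print(result)
```

One consequence of generating the column list at run time is that the output schema changes with the data, which is why this works as a script but not as a fixed view definition.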

how to convert jsonarray to multi column from hive

Example:
There is a JSON array column (type: string) in a Hive table, like:
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}......]'
How can I convert it into:
name age
alice 14
using Hive SQL?
I've tried lateral view explode but it's not working.
Thanks a lot!
This is a working example of how it can be parsed in Hive. Adapt it and debug it on real data; see the comments in the code:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field_map['field'] = 'name' then field_map['value'] end) as name,
max(case when field_map['field'] = 'age' then field_map['value'] end) as age --do the same for all fields
from
(
select t.id,
t.str as original_string,
str_to_map(regexp_replace(regexp_replace(trim(a.field),', +',','),'\\{|\\}|"','')) field_map --remove extra characters and convert to map
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [ and ], replace "}," with "}|", then split on | and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
One more approach using get_json_object:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field = 'name' then value end) as name,
max(case when field = 'age' then value end) as age --do the same for all fields
from
(
select t.id,
get_json_object(trim(a.field),'$.field') field,
get_json_object(trim(a.field),'$.value') value
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [ and ], replace "}," with "}|", then split on | and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
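To make the string manipulation in both Hive queries easier to follow, here is a small Python sketch that mirrors the same steps: strip the brackets, mark the boundaries between objects with |, split, and parse each piece. The function name parse_pairs is hypothetical, and the sketch deliberately copies the Hive logic rather than calling json.loads on the whole array.

```python
import json
import re

raw = (
    '[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, '
    '{"field":"something_else", "value":"somevalue"}]'
)


def parse_pairs(json_array_text):
    # Step 1: remove every [ and ] (same as the inner regexp_replace).
    without_brackets = re.sub(r"\[|\]", "", json_array_text)
    # Step 2: turn each "}," object separator into "}|".
    delimited = re.sub(r"\},", "}|", without_brackets)
    # Step 3: split on | (the explode step) and trim each piece.
    pieces = [piece.strip() for piece in delimited.split("|")]
    # Step 4: read field/value from each piece (the get_json_object step).
    return {obj["field"]: obj["value"] for obj in map(json.loads, pieces)}


pairs = parse_pairs(raw)
print(pairs["name"], pairs["age"])
```

The sketch also exposes the limitation of the approach: a value that itself contains [, ], | or the sequence }, would be split or altered in the wrong place, so it is worth checking real data for those characters.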

Reshape query results in pure PostgreSQL

I have a SQL query result table like this :
Date,metric,value
1,x,2
2,x,3
2,y,5
3,y,8
3,z,9
And I would like to get the sum by day for each metric ( filling with 0 when not present) :
Date,x,y,z
1,2,0,0
2,3,5,0
3,0,8,9
I do not know beforehand the names of the metrics. At the moment I'm loading the results in python and reshaping using pandas but surely there is a PostgreSQL way to do it.
How to achieve the above in PostgreSQL ?
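You can use conditional aggregation with a CASE expression. Use sum to total the values per day and coalesce to fill in 0 when a metric is not present:
select date,
coalesce(sum(case when metric='x' then value end), 0) as x,
coalesce(sum(case when metric='y' then value end), 0) as y,
coalesce(sum(case when metric='z' then value end), 0) as z
from tablename
group by date
order by date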
You can also use crosstab (from the tablefunc extension):
select * from crosstab('select date, metric, value from metric_table order by date, metric, value'
,'select metric from metric_table group by metric order by metric')
as ct(date integer, x integer, y integer, z integer);
But beware: the "as ct(date integer, x integer, y integer, z integer)" part must be created dynamically before running the query, based on the result set of "select metric from metric_table group by metric order by metric", and its columns must be listed in that same order. Missing combinations come back as NULL rather than 0.
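Since the metric names are not known beforehand, the column definition list has to be generated by client code before the crosstab query is sent. A minimal sketch of that generation step in Python follows. The function name build_crosstab_sql is hypothetical, the integer column types are taken from the answer above, and the metric names are assumed to be passed in exactly the order the category query returns them.

```python
def quote_ident(name):
    # Double-quote an identifier so any metric name is a valid column name.
    return '"' + name.replace('"', '""') + '"'


def build_crosstab_sql(metric_names, source_table="metric_table"):
    """Build the crosstab statement for metrics given in category-query order."""
    column_defs = ", ".join(f"{quote_ident(m)} integer" for m in metric_names)
    return (
        "select * from crosstab("
        f"'select date, metric, value from {source_table} order by date, metric, value', "
        f"'select metric from {source_table} group by metric order by metric'"
        f") as ct(date integer, {column_defs})"
    )


# In practice the names would first be fetched with
# "select metric from metric_table group by metric order by metric".
crosstab_sql = build_crosstab_sql(["x", "y", "z"])
print(crosstab_sql)
```

The generated text would then be executed as a second statement, so the reshape takes two round trips: one to read the metric names and one to run the crosstab.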

If more than one record in a column is populated with different values, then use the value from a different column

I am fairly new to SQL and I tried to google this but I couldn't find an answer, hence posting here. Any help is appreciated.
My table is:
My output should look like this:
My output should be based on the conditions below:
If more than one record is populated for the same NID with different (non-null) values in the "Val" column, then use Val where Typ = T.
For the same NID, if Val is null for Typ "T", then get Val where Typ = O.
For the same NID, if Val is null for Typ "O", then get Val where Typ = T.
I believe this is what you are looking for. Let me know if it does what you are looking for or not.
SELECT TT.NID
, COALESCE(TT.Val, OO.Val) AS Val
FROM
(
SELECT T.NID
, T.Val
FROM [SomeTable] T
WHERE T.Typ = 'T'
) TT
LEFT JOIN
(
SELECT O.NID
, O.Val
FROM [SomeTable] O
WHERE O.Typ = 'O'
) OO
ON TT.NID = OO.NID