Flatten the Data in BigQuery - sql

I have a column dimensions.key_value of RECORD type. When I run the following query I get the following output:
SELECT * from table;
event_id value dimensions
1 140 {"key_value": [{"key": "app", "value": "20"}]}
2 150 {"key_value": [{"key": "region", "value": "8"}, {"key": "loc", "value": "1"}]}
3 600 {"key_value": [{"key": "region", "value": "8"}, {"key": "loc", "value": "2"}]}
To unnest the data I have created the following view:
with temp as (
select
(select value from t.dimensions.key_value where key = 'region') as region,
(select value from t.dimensions.key_value where key = 'loc') as loc,
(select value from t.dimensions.key_value where key = 'app') as app,
value,
event_id
from table t
) select *
from temp;
My Output:
region loc app count event_id
null null 20 140 1
8 1 null 150 2
8 2 null 600 3
There are two things I need to verify. Is my query correct?
And how can I make the query generic, i.e. when I don't know all the keys? Other keys may also be present in our dataset.
Update:
Problem: Let's say a user wants to group by region and loc. There is no easy way to write that query, so I decided to create a view so users can easily group by:
with temp as (
select
(select value from t.dimensions.key_value where key = 'region') as region,
(select value from t.dimensions.key_value where key = 'loc') as loc,
(select value from t.dimensions.key_value where key = 'store') as store,
value,
metric_name, event_time
from table t
) select *
from temp;
Based on this view the user can easily group by. So I wanted to check whether there is a way to create a generic view, since we don't know all the unique keys, or whether there is an easier way to do the group by.

How can I make the query generic, i.e. when I don't know all the keys? Other keys may also be present in our dataset.
Consider the approach below:
execute immediate (select
''' select event_id, value, ''' || string_agg('''
(select value from b.key_value where key = "''' || key_name || '''") as ''' || key_name , ''', ''')
|| '''
from (
select event_id, value,
array(
select as struct
json_extract_scalar(kv, '$.key') key,
json_extract_scalar(kv, '$.value') value
from a.kvs kv
) key_value
from `project.dataset.table`,
unnest([struct(json_extract_array(dimensions, '$.key_value') as kvs)]) a
) b
'''
from (
select distinct json_extract_scalar(kv, '$.key') key_name
from `project.dataset.table`,
unnest(json_extract_array(dimensions, '$.key_value')) as kv
)
)
If applied to the sample data in your question, the output is:
As you can see, the query contains no explicit references to actual key names; they are extracted dynamically, so there is no need to know them in advance, and there can be any number of them.
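The two-step idea behind that dynamic query (discover the distinct keys, then build one column per key) can be sketched in Python. This is a hedged illustration only; the sample rows mirror the question's data and the names are illustrative:

```python
# Step 1: discover the distinct key names (the inner SELECT DISTINCT in the SQL).
# Step 2: build one output row per event, with one column per discovered key.
rows = [
    (1, 140, [{"key": "app", "value": "20"}]),
    (2, 150, [{"key": "region", "value": "8"}, {"key": "loc", "value": "1"}]),
    (3, 600, [{"key": "region", "value": "8"}, {"key": "loc", "value": "2"}]),
]

key_names = sorted({kv["key"] for _, _, key_value in rows for kv in key_value})

flattened = [
    {"event_id": event_id, "value": value,
     **{k: next((kv["value"] for kv in key_value if kv["key"] == k), None)
        for k in key_names}}
    for event_id, value, key_value in rows
]
```

Missing keys come out as None, matching the NULLs in the SQL output.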

Related

Convert struct values to row in big query

I want to convert the values of a struct to independent rows.
My table looks like
|id | details
| 1 | {d_0:{id:'1_0'},d_1:{id:'1_1'}}
| 2 | {d_0:{id:'2_0'},d_1:{id:'2_1'}}
Expected Result (will be flattening the inner struct here)
| id |
|'1_0'|
|'1_1'|
|'2_0'|
|'2_1'|
Since I don't know how many fields there will be in details, is there any way to convert all the individual fields of the struct into independent rows?
The schema for all values in the details.d_0, details.d_1,... will be the same.
Any help or pointer to resources is appreciated.
You may use this query, which iterates over the array, to achieve your desired output:
Creating table:
CREATE TABLE `<proj_id>.<dataset>.<table>` as
WITH data AS (
SELECT "1" AS id, STRUCT(STRUCT( '1_0' as id) as d_0, STRUCT( '1_1' as id) as d_1) as details,
union all SELECT "2" AS id, STRUCT(STRUCT( '2_0' as id) as d_0, STRUCT( '2_1' as id) as d_1) as details
),
tier_1 as (
select id,details.* from data
)
select * from tier_1
Actual Query:
DECLARE i INT64 DEFAULT 0;
DECLARE query_ary ARRAY<STRING> DEFAULT
ARRAY(
select concat(column_name,'.id') from `<dataset>.INFORMATION_SCHEMA.COLUMNS`
WHERE
table_name = '<your-table>' AND regexp_contains(column_name, r'd\_\d')
);
CREATE TEMP TABLE result(id STRING);
LOOP
SET i = i + 1;
IF i > ARRAY_LENGTH(query_ary) THEN
LEAVE;
END IF;
EXECUTE IMMEDIATE '''
INSERT result
SELECT ''' || query_ary[ORDINAL(i)] || ''' FROM `<proj_id>.<dataset>.<table>`
''';
END LOOP;
SELECT * FROM result;
Output:
Consider the approach below:
select id from your_table,
unnest(split(translate(format('%t', details), '()', ''), ', ')) id
If applied to the sample data in your question as:
with your_table as (
select "1" id, struct(struct('1_0' as id) as d_0, struct('1_1' as id) as d_1) details union all
select "2", struct(struct('2_0'), struct('2_1'))
)
the output is:
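The trick relies on FORMAT('%t', struct) rendering a struct as a parenthesized, comma-separated string, which TRANSLATE and SPLIT then pull apart. A hedged Python sketch of the same string manipulation (the struct rendering itself is simulated by hand here):

```python
# Simulating BigQuery's FORMAT('%t', details) -> "(1_0, 1_1)" by hand,
# then the TRANSLATE(..., '()', '') and SPLIT(..., ', ') steps.
details = {"d_0": {"id": "1_0"}, "d_1": {"id": "1_1"}}
formatted = "(" + ", ".join(d["id"] for d in details.values()) + ")"
ids = formatted.translate(str.maketrans("", "", "()")).split(", ")
```

Note this string-based approach is fragile if a value itself contains a comma or parenthesis, which is why it suits simple ids like these.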

How to convert an array of key values to columns in BigQuery / GoogleSQL?

I have an array in BigQuery that looks like the following:
SELECT params FROM mySource;
[
{
key: "name",
value: "apple"
},{
key: "color",
value: "red"
},{
key: "delicious",
value: "yes"
}
]
Which looks like this:
params
[{ key: "name", value: "apple" },{ key: "color", value: "red" },{ key: "delicious", value: "yes" }]
How do I change my query so that the table looks like this:
name color delicious
apple red yes
Currently I'm able to accomplish this with:
SELECT
(
SELECT p.value
FROM UNNEST(params) AS p
WHERE p.key = "name"
) as name,
(
SELECT p.value
FROM UNNEST(params) AS p
WHERE p.key = "color"
) as color,
(
SELECT p.value
FROM UNNEST(params) AS p
WHERE p.key = "delicious"
) as delicious,
FROM mySource;
But I'm wondering if there is a way to do this without manually specifying the key name for each. We may not know all the names of the keys ahead of time.
Thanks!
Consider the approach below:
select * except(id) from (
select to_json_string(t) id, param.*
from mySource t, unnest(params) param
)
pivot (min(value) for key in ('name', 'color', 'delicious'))
If applied to the sample data in your question, the output is like below:
As you can see, you still need to specify the key names, but the whole query is much simpler and more manageable.
Meantime, the above query can be enhanced with EXECUTE IMMEDIATE so that the list of key names is auto-generated. I have at least a few answers using that technique, so search for it here on SO if you want (I just do not want to create duplicates here).
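As a hedged illustration of that EXECUTE IMMEDIATE enhancement, here is a Python sketch that assembles the PIVOT statement's IN list from a discovered key set. The table and column names come from the question; in BigQuery the `keys` list would be produced by a SELECT DISTINCT query rather than hard-coded:

```python
# Building the pivot statement dynamically; in BigQuery the key list would be
# produced by "SELECT DISTINCT key FROM mySource, UNNEST(params)" and the
# resulting string fed to EXECUTE IMMEDIATE.
keys = ["name", "color", "delicious"]  # assumed to be discovered at run time
in_list = ", ".join(f"'{k}'" for k in keys)
sql = (
    "select * except(id) from (\n"
    "  select to_json_string(t) id, param.*\n"
    "  from mySource t, unnest(params) param\n"
    ")\n"
    f"pivot (min(value) for key in ({in_list}))"
)
```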
Here is my try, based on Mikhail's answer here:
--DDL for sample view
create or replace view sample.sampleview
as
with _data
as
(
select 1 as id,
array (
select struct(
"name" as key,
"apple" as value
)
union all
select struct(
"color" as key,
"red" as value
)
union all
select struct(
"delicious" as key,
"yes" as value
)
) as _arr
union all
select 2 as id,
array (
select struct(
"name" as key,
"orange" as value
)
union all
select struct(
"color" as key,
"orange" as value
)
union all
select struct(
"delicious" as key,
"yes" as value
)
)
)
select * from _data
Execute immediate
declare sql string;
set sql =
(
select
concat(
"select id,",
string_agg(
concat("max(if (key = '",key,"',value,NULL)) as ",key)
),
' from sample.sampleview,unnest(_arr) group by id'
)
from (
select key from
sample.sampleview,unnest(_arr)
group by key
)
);
execute immediate sql;

how to convert jsonarray to multi column from hive

example:
there is a json array column(type:string) from a hive table like:
"[{"filed":"name", "value":"alice"}, {"filed":"age", "value":"14"}......]"
How can I convert it into:
name age
alice 14
using Hive SQL?
I've tried lateral view explode, but it's not working.
Thanks a lot!
This is a working example of how it can be parsed in Hive. Customize it yourself and debug on real data; see the comments in the code:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field_map['field'] = 'name' then field_map['value'] end) as name,
max(case when field_map['field'] = 'age' then field_map['value'] end) as age --do the same for all fields
from
(
select t.id,
t.str as original_string,
str_to_map(regexp_replace(regexp_replace(trim(a.field),', +',','),'\\{|\\}|"','')) field_map --remove extra characters and convert to map
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with '}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
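To make that regex pipeline easier to follow, here is a hedged Python sketch of the same steps: strip the brackets, turn `},` into a `|` delimiter, explode, then strip braces and quotes and build a map per element:

```python
import re

# The same transformation the Hive query performs, step by step, on the
# sample string from the question (a sketch; values containing ':' or ','
# would need real JSON parsing instead).
s = '[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}]'
parts = re.sub(r'\},\s*\{', '}|{', s.strip('[]')).split('|')   # explode on '|'
row = {}
for p in parts:
    cleaned = re.sub(r'[{}"]', '', p).strip()                  # drop braces/quotes
    kv = dict(item.split(':', 1) for item in re.split(r',\s*', cleaned))
    row[kv['field']] = kv['value']                             # pivot into one row
```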
One more approach using get_json_object:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field = 'name' then value end) as name,
max(case when field = 'age' then value end) as age --do the same for all fields
from
(
select t.id,
get_json_object(trim(a.field),'$.field') field,
get_json_object(trim(a.field),'$.value') value
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with '}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14

Convert ARRAY<STRUCT> to multiple columns in BigQuery SQL

I'm trying to convert an ARRAY<STRUCT> to multiple columns.
The data structure looks like:
column name: Parameter
[
-{
key: "Publisher_name"
value: "Rubicon"
}
-{
key: "device_type"
value: "IDFA"
}
-{
key: "device_id"
value: "AAAA-BBBB-CCCC-DDDD"
}
]
What I want to get:
publisher_name device_type device_id
Rubicon IDFA AAAA-BBBB-CCCC-DDDD
I have tried this, which caused duplicates of the other columns:
select h from table, unnest(parameter) as h
BTW, I am very curious why we would want to use this kind of structure in BigQuery. Can't we just add the above 3 columns to the table?
Below is for BigQuery Standard SQL
#standardSQL
SELECT
(SELECT value FROM UNNEST(Parameter) WHERE key = 'Publisher_name') AS Publisher_name,
(SELECT value FROM UNNEST(Parameter) WHERE key = 'device_type') AS device_type,
(SELECT value FROM UNNEST(Parameter) WHERE key = 'device_id') AS device_id
FROM `project.dataset.table`
You can further refactor the code by using a SQL UDF, as below:
#standardSQL
CREATE TEMP FUNCTION getValue(k STRING, arr ANY TYPE) AS
((SELECT value FROM UNNEST(arr) WHERE key = k));
SELECT
getValue('Publisher_name', Parameter) AS Publisher_name,
getValue('device_type', Parameter) AS device_type,
getValue('device_id', Parameter) AS device_id
FROM `project.dataset.table`
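The UDF's lookup logic can be sketched in Python as a tiny helper. This is an illustrative analogue, not BigQuery code; the sample array is taken from the question:

```python
# A Python analogue of the getValue SQL UDF: return the value for the first
# matching key, or None (the SQL subquery's NULL) if the key is absent.
def get_value(k, arr):
    return next((p["value"] for p in arr if p["key"] == k), None)

parameter = [
    {"key": "Publisher_name", "value": "Rubicon"},
    {"key": "device_type", "value": "IDFA"},
    {"key": "device_id", "value": "AAAA-BBBB-CCCC-DDDD"},
]
```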
To convert to multiple columns, you will need to aggregate, something like this:
select ?,
max(case when pv.key = 'Publisher_name' then pv.value end) as Publisher_name,
max(case when pv.key = 'device_type' then pv.value end) as device_type,
max(case when pv.key = 'device_id' then pv.value end) as device_id
from t cross join
unnest(parameter) pv
group by ?
You need to explicitly list the new columns that you want. The ? is for the columns that remain the same.
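The aggregate/CASE pattern can be sketched in Python as a group-by over (grouping key, key, value) triples. A hedged illustration; the `row_id` grouping column here stands in for the `?` placeholders above:

```python
from collections import defaultdict

# Group-by pivot: one dict per grouping key. MAX(CASE WHEN ...) becomes a
# max() over the values seen for each key (each key appears once here, so
# max() simply keeps that single value).
rows = [
    ("r1", "Publisher_name", "Rubicon"),
    ("r1", "device_type", "IDFA"),
    ("r1", "device_id", "AAAA-BBBB-CCCC-DDDD"),
]
pivoted = defaultdict(dict)
for row_id, key, value in rows:
    current = pivoted[row_id].get(key)
    pivoted[row_id][key] = value if current is None else max(current, value)
```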

How do you make an Oracle SQL query to

This table is rather backwards from a normal schema, and I'm not sure how to get the data I need from it.
Here is some sample data,
Value (column) Info (column)
---------------------------------------------
Supplier_1 'Some supplier'
Supplier_1_email 'foo#gmail.com'
Supplier_1_rating '5'
Supplier_1_status 'Active'
Supplier_2 'Some other supplier'
Supplier_2_email 'bar#gmail.com'
Supplier_2_rating '4'
Supplier_2_status 'Active'
Supplier_3 'Yet another supplier'
...
I need a query to find the email of the supplier which has the highest rating and is currently of status 'Active'.
select
m.sup_email, r.sup_rating
from
(select substr(value, 1, length(value) - length('_email')) as sup_name, info as sup_email from table where value like '%_email') m
left join
(select substr(value, 1, length(value) - length('_rating')) as sup_name, info as sup_rating from table where value like '%_rating') r on m.sup_name = r.sup_name
order by
r.sup_rating desc
fetch first 1 row only;
For a single pass solution, try:
select "email" from
(select
substr("value", 1, 8 + instr(substr("value", 10, length("value")-9),'_')) "supplier",
max(case when "value" like '%_status' then "info" end) as "status",
max(case when "value" like '%_rating' then cast("info" as integer) end) as "rating",
max(case when "value" like '%_email' then "info" end) as "email"
from "table" t
where "value" like '%_rating' or "value" like '%_email' or "value" like '%_status'
group by substr("value", 1, 8 + instr(substr("value", 10, length("value")-9),'_'))
having max(case when "value" like '%_status' then "info" end) = 'Active'
order by 3 desc
) where rownum = 1
(Column names are all double-quoted as some are reserved words.)
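The grouping logic of that single-pass query can be sketched in Python. This is a hedged illustration with sample rows assumed from the question; the prefix split uses the last underscore rather than the query's INSTR arithmetic:

```python
# Group attribute rows by supplier prefix, then pick the active supplier
# with the highest rating -- the HAVING / ORDER BY / ROWNUM steps above.
rows = [
    ("Supplier_1_email", "foo#gmail.com"),
    ("Supplier_1_rating", "5"),
    ("Supplier_1_status", "Active"),
    ("Supplier_2_email", "bar#gmail.com"),
    ("Supplier_2_rating", "4"),
    ("Supplier_2_status", "Active"),
]
suppliers = {}
for value, info in rows:
    name, _, attr = value.rpartition("_")   # "Supplier_1_email" -> ("Supplier_1", "email")
    suppliers.setdefault(name, {})[attr] = info

best = max(
    (s for s in suppliers.values() if s.get("status") == "Active"),
    key=lambda s: int(s["rating"]),
)
```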
Expanding on Mike's excellent suggestion:
CREATE VIEW supplier_names AS
SELECT SUBSTR(Value,INSTR(Value,'_')+1) AS supplier_id
,Info AS supplier_name
FROM the_table
WHERE INSTR(Value,'_',1,2) = 0;
CREATE VIEW supplier_emails AS
SELECT SUBSTR(Value,INSTR(Value,'_')+1,INSTR(Value,'_',1,2)-INSTR(Value,'_')-1)
AS supplier_id
,Info AS supplier_email
FROM the_table
WHERE Value LIKE '%email';
CREATE VIEW supplier_ratings AS
SELECT SUBSTR(Value,INSTR(Value,'_')+1,INSTR(Value,'_',1,2)-INSTR(Value,'_')-1)
AS supplier_id
,Info AS supplier_rating
FROM the_table
WHERE Value LIKE '%rating';
CREATE VIEW supplier_statuses AS
SELECT SUBSTR(Value,INSTR(Value,'_')+1,INSTR(Value,'_',1,2)-INSTR(Value,'_')-1)
AS supplier_id
,Info AS supplier_status
FROM the_table
WHERE Value LIKE '%status';
The queries will perform like dogs, so I'd suggest you look into creating some virtual columns, or at least function-based indexes, to optimise these queries.