How can I only get keys from an array in BigQuery - sql

I have an event table in BigQuery. I have two columns: event_name and event_parameters. I need to export parameter keys.
I'm looking for something similar to dict.keys() in Python.
For example:
event_name | event_parameters
event_a | {"from": "main_page","ID":"123"}
event_b | {"ID":"242","custom_value":"true"}
Output table:
event_name | event_parameters
event_a | {"from","ID"}
event_b | {"ID","custom_value"}

You can extract the keys from a json column using the following SQL code as a template:
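WITH data AS (
  SELECT '{"from":"main_page","ID":"123"}' AS params
)
SELECT keys from (
  select
    params,
    -- keys only: the text before the first ':' in each pair
    array(select trim(split(kv, ':')[offset(0)]) from t.kv kv) as keys,
    -- key/value structs, kept from the original answer (not selected below)
    array(
      select as struct
        trim(split(kv, ':')[offset(0)]) as key,
        trim(split(kv, ':')[offset(1)]) as value
      from t.kv kv
    ) as key_value
  from data,
  -- strip braces and double quotes, then split the remainder on commas
  unnest([struct(split(translate(params, '{}"', '')) as kv)]) t
)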
Credit to @Mikhail Berlyant, who provided the original code in his answer to:
BigQuery: extract keys from json object, convert json from object to key-value array
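To check the string-splitting logic outside BigQuery, here is a minimal Python sketch of the same steps: strip the braces and quotes, split on commas, keep the text before each colon. The function name extract_keys is hypothetical. Like the SQL, it assumes flat JSON whose keys and values contain no commas or colons.

```python
def extract_keys(params: str) -> list:
    """Mimic the SQL above: drop {}" characters, then split on ',' and ':'."""
    # Remove braces and double quotes, as translate(params, '{}"', '') does.
    stripped = params.translate(str.maketrans("", "", '{}"'))
    # Each comma-separated piece is one key:value pair; keep only the key.
    return [kv.split(":")[0].strip() for kv in stripped.split(",")]


print(extract_keys('{"from": "main_page","ID":"123"}'))    # ['from', 'ID']
print(extract_keys('{"ID":"242","custom_value":"true"}'))  # ['ID', 'custom_value']
```

A value such as "a,b" or "12:30" would break both this sketch and the SQL template, so a real JSON parser is the safer choice for nested or free-text parameters.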

Related

How to get values from a string dict list in sql bigquery?

Hi I have a sql table with the following data
Outbound click
[{'action_type': 'outbound_click', 'value': '1025'}]
How do I get just the value, given that this is a string in BigQuery SQL?
I want the output to be
Outbound click
1025
Just UNNEST the array then extract the json value to get 1025. See approach below:
with sample_data as (
select array(select "{'action_type': 'outbound_click', 'value': '1025'}") as arr_data
)
select
JSON_VALUE(json_data, '$.value') as outbound_click
from sample_data,
unnest(arr_data) as json_data
Output: a single row with outbound_click = 1025.
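For comparison, the same extraction can be sketched in Python. The column value in the question uses single quotes, so it is a Python-style literal rather than strict JSON. This sketch assumes that format and parses it with ast.literal_eval. The function name extract_values is hypothetical.

```python
import ast


def extract_values(raw: str, key: str = "value") -> list:
    """Parse a string holding a list of dicts and return one field per dict."""
    # literal_eval safely evaluates literals only (lists, dicts, strings, numbers).
    records = ast.literal_eval(raw)
    return [record[key] for record in records]


raw = "[{'action_type': 'outbound_click', 'value': '1025'}]"
print(extract_values(raw))  # ['1025']
```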

BigQuery extract repeated JSON

I have the following data in one column in BigQuery:
{"id": "81", "type": ["{'id2': '12', 'type2': 'main'}", "{'id2': '15', 'type2': 'sub'}"]}
I would like to parse this and have 'id2' and 'type2' as nested fields. I tried using JSON_VALUE_ARRAY(data, "$.type"), which correctly creates the nested rows, but I couldn't extract 'id2' and 'type2' from them. I suspect the double quotes around each list element are the issue. How can I get past those?
UPDATE:
This is the format I would like to achieve.
Consider the approach below:
select
  json_value(json, '$.id') id,
  array(
    select as struct
      json_value(trim(type, '"'), '$.id2') as id2,
      json_value(trim(type, '"'), '$.type2') as type2
    from unnest(json_extract_array(json, '$.type')) type
  ) type
from your_table
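The two-level parsing in this thread (an outer JSON object whose "type" array holds stringified, single-quoted inner objects) can be sketched in Python as follows. The function name parse_record is hypothetical. The sketch assumes the inner strings are Python-style dict literals, as in the sample row.

```python
import ast
import json


def parse_record(raw: str) -> dict:
    """Parse the outer JSON, then each stringified inner element."""
    outer = json.loads(raw)  # the outer layer is valid JSON
    nested = []
    for element in outer["type"]:
        inner = ast.literal_eval(element)  # inner layer uses single quotes
        nested.append({"id2": inner["id2"], "type2": inner["type2"]})
    return {"id": outer["id"], "type": nested}


raw = """{"id": "81", "type": ["{'id2': '12', 'type2': 'main'}", "{'id2': '15', 'type2': 'sub'}"]}"""
print(parse_record(raw))
```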

How can I get the last element of an array? SQL Bigquery

I'm working on building a follow-network from GitHub's available data on Google BigQuery, e.g.: https://bigquery.cloud.google.com/table/githubarchive:day.20210606
The key data is contained in the "payload" field, STRING type. I managed to unnest the data contained in that field and convert it to an array, but how can I get the last element?
Here is what I have so far...
select type,
array(select trim(val) from unnest(split(trim(payload, '[]'))) val) payload
from `githubarchive.day.20210606`
where type = 'MemberEvent'
Which outputs:
How can I get only the last element, "Action":"added"} ?
I know that
select array_reverse(your_array)[offset(0)]
should do the trick, however I'm unsure how to combine that in my code. I've been trying different options without success, for example:
with payload as ( select array(select trim(val) from unnest(split(trim(payload, '[]'))) val) payload from `githubarchive.day.20210606`)
select type, ARRAY_REVERSE(payload)[ORDINAL(1)]
from `githubarchive.day.20210606` where type = 'MemberEvent'
The desired output should look like:
To get the last element of an array, you can use the approach below:
select array_reverse(your_array)[offset(0)]
As for "I'm unsure how to combine that in my code", apply it to your query like this:
select type, array_reverse(array(
  select trim(val)
  from unnest(split(trim(payload, '[]'))) val
))[offset(0)]
from `githubarchive.day.20210606`
where type = 'MemberEvent'
There is also a solution that does not reverse the array:
SELECT event[OFFSET(ARRAY_LENGTH(event) - 1)]
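Both approaches from this thread can be compared side by side in a small Python sketch. The payload string below is a made-up example, not real githubarchive data, and the function names are hypothetical.

```python
def payload_parts(payload: str) -> list:
    """Mirror trim(payload, '[]') followed by split(...) and trim(val)."""
    return [part.strip() for part in payload.strip("[]").split(",")]


def last_by_reverse(parts: list) -> str:
    """Reverse the array and take offset 0."""
    return list(reversed(parts))[0]


def last_by_offset(parts: list) -> str:
    """Index directly at length - 1, with no reversal."""
    return parts[len(parts) - 1]


# Hypothetical payload, shaped like the one described in the question.
parts = payload_parts('{"member":"octocat","action":"added"}')
print(last_by_reverse(parts))  # "action":"added"}
print(last_by_offset(parts))   # "action":"added"}
```

Both helpers raise an error on an empty list, just as OFFSET fails on an empty array in BigQuery. In SQL, SAFE_OFFSET returns NULL in that case instead.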

PostgreSQL: select from field with json format

Table has column, named "config" with following content:
{
"A":{
"B":[
{"name":"someName","version":"someVersion"},
{"name":"someName","version":"someVersion"}
]
}
}
The task is to select all name and version values. The expected output is a selection with two columns: name and version.
I successfully select the content of B:
select config::json -> 'A' -> 'B' as B
from my_table;
But when I'm trying to do something like:
select config::json -> 'A' -> 'B' ->> 'name' as name,
config::json -> 'A' -> 'B' ->> 'version' as version
from my_table;
I receive a selection with empty-value columns.
If the array size is fixed, you just need to say which element of the array you want to retrieve, e.g.:
SELECT config->'A'->'B'->0->>'name' AS name,
config->'A'->'B'->0->>'version' AS version
FROM my_table;
But as your array contains multiple elements, use the function jsonb_array_elements in a subquery or CTE, and parse each element individually in the outer query, e.g.:
SELECT rec->>'name', rec->>'version'
FROM (SELECT jsonb_array_elements(config->'A'->'B')
FROM my_table) j (rec);
First, you should use the jsonb data type instead of json. The documentation says:
"In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys."
Using jsonb, you can do the following:
SELECT DISTINCT ON (c) c->'name' AS name, c->'version' AS version
FROM my_table
CROSS JOIN LATERAL jsonb_path_query(config :: jsonb, '$.** ? (exists(@.name))') AS c
select e.value ->> 'name', e.value ->> 'version'
from
my_table cross join json_array_elements(config::json -> 'A' -> 'B') e
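The idea shared by these answers is to expand the array at A.B into one row per element and then read the fields of each element. A short Python sketch of that idea follows. The function name names_and_versions is hypothetical, and it assumes every element of B has both keys.

```python
import json


def names_and_versions(config_text: str) -> list:
    """Return one (name, version) tuple per element of config['A']['B']."""
    config = json.loads(config_text)
    # Iterating over the list plays the role of json_array_elements.
    return [(item["name"], item["version"]) for item in config["A"]["B"]]


config_text = """
{"A": {"B": [
    {"name": "someName", "version": "someVersion"},
    {"name": "someName", "version": "someVersion"}
]}}
"""
for name, version in names_and_versions(config_text):
    print(name, version)
```

This also shows why the query in the question returned empty columns: the ->> 'name' lookup was applied to the array itself rather than to each object inside it.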

Extract information from a json string in BigQuery

I am storing a table in BigQuery with the results of a classification algorithm. The table schema is INT, STRING and looks something like this:
ID | Output
1001 | {'Apple Cider': 0.7, 'Coffee' : 0.2, 'Juice' : 0.1}
1002 | {'Black Coffee':0.9, 'Tea':0.1}
The problem is how to fetch the first (or second, or any other) element of each string together with its score. It doesn't seem likely that JSON_EXTRACT can work, and most likely it can be done with JavaScript. I was wondering what an elegant solution would look like here.
Consider below
select ID,
  trim(split(kv, ':')[offset(0)], " '") element,
  cast(split(kv, ':')[offset(1)] as float64) score,
  element_position
from `project.dataset.table` t,
unnest(regexp_extract_all(trim(Output, '{}'), r"'[^':']+'\s?:\s?[^,]+")) kv with offset as element_position
If applied to sample data in your question - output is
Note: you can use a less verbose unnest statement if you wish:
unnest(split(trim(Output, '{}'))) kv with offset as element_position
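To see what the regex-based query produces without running it in BigQuery, the same steps can be sketched in Python. The pattern is copied from the answer. The function name ranked_elements is hypothetical, and the sketch assumes element names contain no quotes or colons.

```python
import re

# Same pattern as in the query: a quoted name, a colon, then text up to the next comma.
PAIR_PATTERN = re.compile(r"'[^':']+'\s?:\s?[^,]+")


def ranked_elements(output: str) -> list:
    """Return (position, element, score) tuples in the order they appear."""
    body = output.strip("{}")  # mirrors trim(Output, '{}')
    rows = []
    for position, kv in enumerate(PAIR_PATTERN.findall(body)):
        parts = kv.split(":")
        element = parts[0].strip(" '")  # mirrors trim(..., " '")
        score = float(parts[1])         # mirrors cast(... as float64)
        rows.append((position, element, score))
    return rows


print(ranked_elements("{'Apple Cider': 0.7, 'Coffee' : 0.2, 'Juice' : 0.1}"))
# [(0, 'Apple Cider', 0.7), (1, 'Coffee', 0.2), (2, 'Juice', 0.1)]
```

Filtering on the position (0 for the first element, 1 for the second, and so on) then picks the element in any given order.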