I have some data in a STRUCT in BigQuery. Below I have visualised an example of the data as JSON:
{
...
siblings: {
david: { a: 1 }
sarah: { b: 1, c: 1 }
}
...
}
I want to produce a field from a query that resembles ["david", "sarah"]. Essentially I just want to get the keys from the STRUCT (object). Note that every user will have different key names in the siblings STRUCT.
Is this possible in BigQuery?
Thanks,
A
Your structs schema must be consistent throughout the table. They can't change keys because they're part of the table schema. To get the keys you simply take a look at the table schema.
If values change, they're probably values in an array - I guess you might have something like this:
WITH t AS (
SELECT 1 AS id, [STRUCT('david' AS name, 33 as age), ('sarah', 42)] AS siblings
union all
SELECT 2, [('ken', 19), ('ryu',21), ('chun li',23)]
)
SELECT * FROM t
If you tried to introduce new keys in the second row or within the array, you'd get an error Array elements of types {...} do not have a common supertype at ....
The first element of the above example in json representation looks like this:
{
"id": "1",
"siblings": [
{
"name": "david",
"age": "33"
},
{
"name": "sarah",
"age": "42"
}
]
}
I have a JSON structure similar to this:
{
"id":"1234"
"feedback": {
"Features": []
}
}
I wish to find all the documents where Features is not an empty array.
This is what I have tried:
SELECT * FROM c where ARRAY_LENGTH([c.feedback.Features])> 0
I am not sure if this is the correct approach. Any suggestions are appreciated.
Your query would not work fine and still return the document in below JSON you provided:
{
"id":"1234"
"feedback": {
"Features": []
}
}
Would suggest to use the query like below. This will cover when there are zero items present in 'Feature' attribute as well as if 'Feature' attribute is missing altogether.
SELECT * FROM c where ARRAY_LENGTH([c.feedback.Features[0]]) > 0
It should work if you exclude the surrounding brackets from the property path:
SELECT * FROM c where ARRAY_LENGTH(c.feedback.Features) > 0
I'm trying to parse a the below nested JSON in Snowflake using the latteral function in Snowflake but I wanted to each nested column in "GoalTime" to show up as a column. For example,
GoalTime_InDoorOpen
2020-03-26T12:58:00-04:00
GoalTime_InLastOff
null
GoalTime_OutStartBoarding
2020-03-27T14:00:00-04:00
"GoalTime": [
{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
},
or if you have many rows (what appear to be flights) and thus you need to columns per flight this code be what you are after
with data as (
select flight_code, parse_json(json) as json from values ('nz101','{GoalTime:[{"GoalName": "GoalA", "GoalTime": "2020-03-26T12:58:00-04:00"}, {"GoalName": "GoalB"}]}'),
('nz201','{GoalTime:[{"GoalName": "GoalA"}, {"GoalName": "GoalB", "GoalTime": "2020-03-26T12:58:00-02:00"}]}')
j(flight_code, json)
), unrolled as (
select d.flight_code, f.value:GoalName as goal_name, f.value:GoalTime as goal_time
from data d,
lateral flatten (input => json:GoalTime) f
)
select *
from unrolled
pivot(min(goal_time) for goal_name in ('GoalA', 'GoalB'))
order by flight_code;
it gives the results:
FLIGHT_CODE 'GoalA' 'GoalB'
nz101 "2020-03-26T12:58:00-04:00" null
nz201 null "2020-03-26T12:58:00-02:00"
create or replace function JSON_STRING()
returns string
language javascript
as
$$
return `
[
{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
}
]
`;
$$;
select value:GoalName::string as GoalName, value:GoalTime::timestamp as GoalTime
from lateral flatten(input => parse_json(JSON_STRING()));
-- See how the lateral flatten combination works on a JSON variant:
select * from lateral flatten(input => parse_json(JSON_STRING()));
I wrote this to run in any Snowflake worksheet, no tables needed. The function on top simply allows the JSON to be written as a multi-line string in the SQL statement below it. It has no other use than representing a string holding your JSON.
Step 1 is to PARSE_JSON, which converts a string into a variant data type formatted as a JSON object.
Step 2 is the lateral flatten. If you do a select star on that, it will return a number of columns. One of them is "value".
Step 3 is to extract the properties you want using single : notation for the property name and dots to traverse down the nodes from there (if there are any).
Step 4 is to cast the property to the data type you want using double :: notation. This is especially important if you're doing comparisons on the column particularly in join keys.
Note that there's a slight invalid part of the JSON that did not allow it to parse. In the top level the array had a property, which did not parse. I removed that to allow parsing.
Probably close to what you seek is using a standard SQL UNION statement.
Given the following are true to recreate the solution:
Created a table 'JSON_GOALS' with one column for raw JSON called, GOALS_RAW
You have loaded JSON data into a table as the raw JSON, with compliant JSON object array syntax, and a parent, GoalTimeGroup, ex: {[{}]}, so
{
"GoalTimeGroup": [{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
}
]
}
Doing so allows you to write a fairly standard JSON retrieve in Snowflake with the following syntax:
SELECT GOALS_RAW:GoalTimeGroup[0].GoalName, GOALS_RAW:GoalTimeGroup[1].GoalName, GOALS_RAW:GoalTimeGroup[2].GoalName
FROM JSON_GOALS
UNION
SELECT GOALS_RAW:GoalTimeGroup[0].GoalTime, GOALS_RAW:GoalTimeGroup[1].GoalTime, GOALS_RAW:GoalTimeGroup[2].GoalName
FROM JSON_GOALS
;
This gives you closer to the answer you are looking for and seems to provide a simpler solution. You can also control how many rows you'd want based on your JSON object attributes for each GOAL object.
Recommendations to enhance this would be to create a function that could detect the depth of each nested element and perhaps auto generate the indexes for 'n' number of columns.
The library below provides a method called "ExecuteAll" which one of the params is "tags", so if you provide an array of tags and values, all of them will be parsed and validated plus keeping the features of the sql injection protection from Snowflake.
snowflake-multisql
I'm using following schema for the JSONB column of my table (named fields). There are several of these field entries.
{
"FIELD_NAME": {
"value" : "FIELD_VALUE",
"meta": {
"indexable": true
}
}
}
I need to find all the fields that contain this object
"meta": {
"indexable": true
}
Here is a naive attempt at having json_object_keys in where clause, which doesn't work, but illustrates what I'm trying to do.
with entry(fields) as (values('{
"login": {
"value": "fred",
"meta": {
"indexable": true
}
},
"password_hash": {
"value": "88a3d1c7463d428f0c44fb22e2d9dc06732d1a4517abb57e2b8f734ce4ef2010",
"meta": {
"indexable": false
}
}
}'::jsonb))
select * from entry where fields->jsonb_object_keys(fields) #> '{"meta": {"indexable": "true"}}'::jsonb;
How can I query on the value of nested object? Can I somehow join the result of json_object_keys with the table iself?
demo:db<>fiddle
First way: using jsonb_each()
SELECT
jsonb_build_object(elem.key, elem.value) -- 3
FROM
entry,
jsonb_each(fields) as elem -- 1
WHERE
elem.value #> '{"meta": {"indexable": true}}' -- 2
Expand all subobjects into one row per "field". This creates 2 columns: the key and the value (in your case login and {"meta": {"indexable": true}, "value": "fred"})
Filter the records by checking the value column for containing the meta object using the #> as you already mentioned
Recreate the JSON object (combining the key/value columns)
Second way: Using jsonb_object_keys()
SELECT
jsonb_build_object(keys, fields -> keys) -- 3
FROM
entry,
jsonb_object_keys(fields) as keys -- 1
WHERE
fields -> keys #> '{"meta": {"indexable": true}}' -- 2
Finding all keys as you did
and 3. are very similar to the first way
I have a table with a JSONB column containing multiple rows.
Each row holds a JSON array.
Now I need to put this into rows but I need the position of the elements as well.
e.g.
select *
from (
select jsonb_array_elements("values"::jsonb) from "myData"
) as x;
Gives me all elements from all arrays of all rows.
How can I get the array index as well?
data
----------------------------
{ text: "Hello" } | 0 <-- first array element from first row
{ text: "World" } | 1 <-- second array element from first row
{ text: "Hallo" } | 0 <-- first array element from second row
{ text: "Welt!" } | 1 <-- second array element from second row
Use a lateral join and with ordinality:
select *
from (
select t.*
from "myData", jsonb_array_elements("values") with ordinality as t(data, idx)
) as x;
If "values" is already a jsonb column there is no need to cast it to jsbon