Snowflake SQL: How to loop through an array of JSON objects to find the item that meets a condition

Breaking my head on this. In Snowflake, my field city_info looks like this (3 sample records):
[{"name": "age", "content": 35}, {"name": "city", "content": "Chicago"}]
[{"name": "age", "content": 20}, {"name": "city", "content": "Boston"}]
[{"name": "city", "content": "New York"}, {"name": "age", "content": 42}]
I am trying to extract a city column from this:
Chicago
Boston
New York
I tried to flatten this
select *
from lateral flatten(input =>
select city_info::VARIANT as event
from data
)
From there I can derive the value, but this only works for one row (so I have to add LIMIT 1, which doesn't make sense, as I need this for all my rows).
If I try to do it for the 3 rows, it tells me the subquery returns more than one row.
Any help is appreciated! Chris

You could write it as:
SELECT value:content::string AS city_name
FROM tab,
LATERAL FLATTEN(input => tab.city_info)
WHERE value:name::string = 'city'
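To see what this query does row by row, here is a minimal Python sketch of the same logic. It is not Snowflake code: the names rows and extract_cities are made up for illustration, and the data is the three sample records from the question.

```python
import json

# The three sample city_info values from the question, stored as JSON text.
rows = [
    '[{"name": "age", "content": 35}, {"name": "city", "content": "Chicago"}]',
    '[{"name": "age", "content": 20}, {"name": "city", "content": "Boston"}]',
    '[{"name": "city", "content": "New York"}, {"name": "age", "content": 42}]',
]


def extract_cities(rows):
    """Emulate LATERAL FLATTEN followed by the filter on name = 'city'."""
    cities = []
    for city_info in rows:                    # FROM tab: every table row
        for value in json.loads(city_info):   # FLATTEN: one output row per array element
            if value["name"] == "city":       # WHERE value:name::string = 'city'
                cities.append(str(value["content"]))  # value:content::string
    return cities


print(extract_cities(rows))
```

Because the flatten step runs once per table row, no subquery and no LIMIT 1 are needed, and the position of the city object inside each array does not matter.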

Related

How to expand a list of dict in presto db

I have a column in prestodb that is a list of dictionaries:
[{"id": 45238, "kind": "product", "name": "Ball", "category": "toy"}, {"id": 117852, "kind": "service", "name": "courier", "category": "transport"}]
Is there a way to expand this column to get something like this:
id kind name category
45238 product Ball toy
117852 service courier transport
Also, sometimes the keys can differ from the example above, and there can be more than the 4 keys shown.
I am trying:
with cte as ( select cast(divs as json) as json_field from table)
select m['id'] id,
m['kind'] kind,
m['name'] name,
m['category'] category
from cte
cross join unnest(cast(json_field as array(map(varchar, json)))) as t(m)
Error:
INVALID_CAST_ARGUMENT: Cannot cast to array(map(varchar, json)). Expected a json array, but got [{"id": 36112, "kind"....
Assuming your data contains json - you can cast it to array of maps from varchar to json (array(map(varchar, json))) and then use unnest to flatten the array:
WITH dataset (json_str) AS (
VALUES (json '[{"id": 45238, "kind": "product", "name": "Ball", "category": "toy"}, {"id": 117852, "kind": "service", "name": "courier", "category": "transport"}]')
)
select m['id'] id,
m['kind'] kind,
m['name'] name,
m['category'] category
from dataset
cross join unnest(cast(json_str as array(map(varchar, json)))) as t(m)
id kind name category
45238 product Ball toy
117852 service courier transport
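Update:
If the original column type is varchar, use json_parse to convert it to json. A plain cast(divs as json) turns the text into a single JSON string instead of parsing it, which is why the cast to an array fails in the question.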
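The parse-then-unnest idea can be sketched in Python as follows. This is only an illustration of the logic, not Presto code: divs holds the sample value from the question as text, and expand is a hypothetical helper. The sketch returns None for a key that is missing from an object, which corresponds to element_at(m, 'key') in Presto rather than m['key'] (the subscript form raises an error on a missing key), and that may matter since the question says the keys can vary.

```python
import json

# Sample column value from the question, stored as text (varchar).
divs = (
    '[{"id": 45238, "kind": "product", "name": "Ball", "category": "toy"}, '
    '{"id": 117852, "kind": "service", "name": "courier", "category": "transport"}]'
)


def expand(json_text, columns):
    """Parse the text (the role of json_parse), then emit one tuple per array element."""
    records = json.loads(json_text)
    # dict.get returns None for a missing key instead of failing.
    return [tuple(record.get(col) for col in columns) for record in records]


for row in expand(divs, ["id", "kind", "name", "category"]):
    print(row)
```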

Flattening a nested and repeated structure in BigQuery (standard SQL)

There are a lot of posts on unnesting repeated fields in BigQuery, but, being new to this environment, I have tried almost every code variation I found to flatten a data file and cannot seem to produce one without creating blanks in the id field. It seems like I need to unflatten a nested variable?
I'm using a COVID Dimensions data set that is part of the public collection. Here is some minimal code that produces my problem:
SELECT
id,
authors
FROM
`covid-19-dimensions-ai.data.publications`
CROSS JOIN
UNNEST(authors)
LIMIT 1000
And, here is the JSON structure after running this query. Everything is flattened with the structure I want, but I don't know how to fill in / avoid the blank id variables.
{
"id": "pub.1130234899",
"authors": {
"first_name": "Eric M",
"last_name": "Yoshida",
"initials": null,
"researcher_id": "ur.01071531321.03",
"grid_ids": [
"grid.17091.3e"
],
"corresponding": false,
"raw_affiliations": [
"Division of Gastroenterology, University of British Columbia, Vancouver, British Columbia, Canada"
],
"affiliations_address": [
{
"grid_id": "grid.17091.3e",
"city_id": "6173331",
"state_code": "CA-BC",
"country_code": "CA",
"raw_affiliation": "Division of Gastroenterology, University of British Columbia, Vancouver, British Columbia, Canada"
}
]
}
}
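See this small correction to your original query: give the UNNEST result an alias and select that alias (a single author per row) instead of the original repeated authors column.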
SELECT
id,
author
FROM
`covid-19-dimensions-ai.data.publications`
CROSS JOIN
UNNEST(authors) author
LIMIT 1000
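As a rough mental model of CROSS JOIN UNNEST, the Python sketch below pairs each parent row with each element of its array, so the id is repeated on every output row. The publications list is a made-up, heavily trimmed stand-in for the real table, and cross_join_unnest is a hypothetical helper name.

```python
# Hypothetical, trimmed stand-ins for rows of the publications table.
publications = [
    {"id": "pub.1", "authors": [{"last_name": "A"}, {"last_name": "B"}]},
    {"id": "pub.2", "authors": [{"last_name": "C"}]},
    {"id": "pub.3", "authors": []},
]


def cross_join_unnest(rows):
    """Emit one (id, author) pair per array element, repeating the parent id."""
    return [(row["id"], author) for row in rows for author in row["authors"]]


for pub_id, author in cross_join_unnest(publications):
    print(pub_id, author["last_name"])
```

Note that a row whose array is empty (pub.3 here) produces no output rows at all, which is also how a cross join against an empty UNNEST behaves.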

SQL query to return nested array of objects in JSON for SQLite

I have 2 simple tables in a SQLite db and a nodejs, express api endpoint that should get results by student and have the subjects as a nested array of objects.
Tables:
Student(id, name) and Subject(id, name, studentId)
This is what I need the result to look like:
{
"id": 1,
"name": "Student name",
"subjects":
[{
"id": 1,
"name": "Subject 1"
},
{
"id": 2,
"name": "Subject 2"
}]
}
How can I write a query to get this result?
If your version of sqlite was built with support for the JSON1 extension, it's easy to generate the JSON from the query itself:
SELECT json_object('id', id, 'name', name
, 'subjects'
, (SELECT json_group_array(json_object('id', subj.id, 'name', subj.name))
FROM subject AS subj
WHERE subj.studentid = stu.id)) AS record
FROM student AS stu
WHERE id = 1;
record
---------------------------------------------------------------------------------------------------
{"id":1,"name":"Student Name","subjects":[{"id":1,"name":"Subject 1"},{"id":2,"name":"Subject 2"}]}
It seems that all you need is a LEFT JOIN statement:
SELECT subject.id, subject.name, student.id, student.name
FROM subject
LEFT JOIN student ON subject.studentId = student.id
ORDER BY student.id;
Then just parse the rows of the response into the object structure you require.
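A minimal sketch of that row-parsing step, written in Python with the standard sqlite3 module rather than the Node.js stack from the question, might look like this. The in-memory data and the helper name student_with_subjects are assumptions for illustration. One deliberate difference from the query above: the join starts from student so that a student without any subjects still comes back, with an empty list.

```python
import sqlite3

# In-memory database with the two tables from the question and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subject (id INTEGER PRIMARY KEY, name TEXT, studentId INTEGER);
    INSERT INTO student VALUES (1, 'Student name');
    INSERT INTO subject VALUES (1, 'Subject 1', 1), (2, 'Subject 2', 1);
""")


def student_with_subjects(conn, student_id):
    """Fetch the joined rows and fold them into one nested dict."""
    rows = conn.execute(
        """
        SELECT student.id, student.name, subject.id, subject.name
        FROM student
        LEFT JOIN subject ON subject.studentId = student.id
        WHERE student.id = ?
        ORDER BY subject.id
        """,
        (student_id,),
    ).fetchall()
    if not rows:
        return None  # no such student
    result = {"id": rows[0][0], "name": rows[0][1], "subjects": []}
    for _, _, subject_id, subject_name in rows:
        if subject_id is not None:  # NULLs mean the student has no subjects
            result["subjects"].append({"id": subject_id, "name": subject_name})
    return result


print(student_with_subjects(conn, 1))
```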

How to query JSON column for unique object values in PostgreSQL

I'm looking to query a table for a distinct list of values in a given JSON column.
In the code snippet below, the Survey_Results table has 3 columns:
Name, Email, and Payload. Payload is the JSON array that I want to query.
Table Name: Survey_Results
Name Email Payload
Ying SmartStuff#gmail.com [
{"fieldName":"Product Name", "Value":"Calculator"},
{"fieldName":"Product Price", "Value":"$54.99"}
]
Kendrick MrTexas#gmail.com [
{"fieldName":"Food Name", "Value":"Texas Toast"},
{"fieldName":"Food Taste", "Value":"Delicious"}
]
Andy WhereTheBass#gmail.com [
{"fieldName":"Band Name", "Value":"MetalHeads"},
{"fieldName":"Valid Member", "Value":"TRUE"}
]
I am looking for a unique list of all fieldNames mentioned.
The ideal answer would be query giving me a list containing "Product Name", "Product Price", "Food Name", "Food Taste", "Band Name", and "Valid Member".
Is something like this possible in Postgres?
Use json_array_elements() (or jsonb_array_elements() if the column is of type jsonb) in a lateral join:
select distinct value->>'fieldName' as field_name
from survey_results
cross join json_array_elements(payload)
field_name
---------------
Product Name
Valid Member
Food Taste
Product Price
Food Name
Band Name
(6 rows)
How to find distinct Food Name values?
select distinct value->>'Value' as food_name
from survey_results
cross join json_array_elements(payload)
where value->>'fieldName' = 'Food Name'
food_name
-------------
Texas Toast
(1 row)
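For intuition, the two queries above can be mirrored in plain Python. This is only a sketch of the logic, not Postgres code: payloads holds the three sample payloads from the question as text (with the missing comma added), and the two function names are made up.

```python
import json

# The three Payload values from the question, as JSON text.
payloads = [
    '[{"fieldName":"Product Name", "Value":"Calculator"},'
    ' {"fieldName":"Product Price", "Value":"$54.99"}]',
    '[{"fieldName":"Food Name", "Value":"Texas Toast"},'
    ' {"fieldName":"Food Taste", "Value":"Delicious"}]',
    '[{"fieldName":"Band Name", "Value":"MetalHeads"},'
    ' {"fieldName":"Valid Member", "Value":"TRUE"}]',
]


def distinct_field_names(payloads):
    """One element per array entry (the lateral join), then DISTINCT via a set."""
    return {el["fieldName"] for p in payloads for el in json.loads(p)}


def distinct_values(payloads, field_name):
    """Same expansion, filtered on fieldName like the WHERE clause."""
    return {
        el["Value"]
        for p in payloads
        for el in json.loads(p)
        if el["fieldName"] == field_name
    }


print(sorted(distinct_field_names(payloads)))
print(distinct_values(payloads, "Food Name"))
```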
Important. Note that the json structure is illogical and thus unnecessarily large and complex. Instead of
[
{"fieldName":"Product Name", "Value":"Calculator"},
{"fieldName":"Product Price", "Value":"$54.99"}
]
use
{"Product Name": "Calculator", "Product Price": "$54.99"}
A proper JSON structure like this leads to simpler and faster queries.
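If the data is produced by application code, the conversion between the two shapes is small. The Python sketch below uses a hypothetical helper, to_object, and assumes each fieldName appears at most once per payload (with duplicates, the last value would win).

```python
# Sample payload from the question in its original array-of-pairs shape.
payload = [
    {"fieldName": "Product Name", "Value": "Calculator"},
    {"fieldName": "Product Price", "Value": "$54.99"},
]


def to_object(pairs):
    """Collapse [{"fieldName": k, "Value": v}, ...] into a single {k: v} object."""
    return {pair["fieldName"]: pair["Value"] for pair in pairs}


flat = to_object(payload)
print(flat)
print(flat["Product Name"])  # a direct key lookup replaces the search over the array
```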

Adding an ORDER BY statement to a query without flattening results leads to "Cannot query the cross product of repeated fields"

Query:
"SELECT * FROM [table] ORDER BY id DESC LIMIT 10"
AllowLargeResults = true
FlattenResults = false
table schema:
[
{
"name": "id",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "repeated_field_1",
"type": "STRING",
"mode": "REPEATED"
},
{
"name": "repeated_field_2",
"type": "STRING",
"mode": "REPEATED"
}
]
The query "SELECT * FROM [table] LIMIT 10" works just fine. I get this error when I add an order by clause, even though the order by does not mention either repeated field.
Is there any way to make this work?
The ORDER BY clause causes BigQuery to automatically flatten the output of a query, causing your query to attempt to generate a cross product of repeated_field_1 and repeated_field_2.
If you don't care about preserving the repeatedness of the fields, you could explicitly FLATTEN both of them, which will cause your query to generate the cross product that the original query is complaining about.
SELECT *
FROM FLATTEN(FLATTEN([table], repeated_field_1), repeated_field_2)
ORDER BY id DESC
LIMIT 10
Other than that, I don't have a good workaround for your query to both ORDER BY and also output repeated fields.
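To make the cross product concrete, here is a small Python sketch of what flattening both repeated fields produces before the ORDER BY and LIMIT are applied. The sample rows and the helper name flatten_both are made up, and the sketch assumes both repeated fields are non-empty in every row.

```python
from itertools import product

# Made-up rows matching the schema: one id plus two repeated string fields.
table = [
    {"id": "a", "repeated_field_1": ["x", "y"], "repeated_field_2": ["1", "2", "3"]},
    {"id": "b", "repeated_field_1": ["z"], "repeated_field_2": ["4"]},
]


def flatten_both(rows):
    """Emit one output row per combination of the two repeated fields."""
    out = []
    for row in rows:
        for v1, v2 in product(row["repeated_field_1"], row["repeated_field_2"]):
            out.append((row["id"], v1, v2))
    return out


flat = flatten_both(table)
# ORDER BY id DESC LIMIT 10
top = sorted(flat, key=lambda r: r[0], reverse=True)[:10]
for r in top:
    print(r)
```

Row "a" alone expands to 2 x 3 = 6 output rows, so the flattened result can grow quickly when both fields hold many values.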
See also: BigQuery flattens result when selecting into table with GROUP BY even with “noflatten_results” flag on