How to expand a list of dicts in Presto DB - sql

I have a column in prestodb that is a list of dictionaries:
[{"id": 45238, "kind": "product", "name": "Ball", "category": "toy"}, {"id": 117852, "kind": "service", "name": "courier", "category": "transport"}]
Is there a way to expand this column to get something like this:
id kind name category
45238 product Ball toy
117852 service courier transport
Also, the keys can sometimes differ from the example above, and there can be more keys than the 4 shown.
I am trying:
with cte as ( select cast(divs as json) as json_field from table)
select m['id'] id,
m['kind'] kind,
m['name'] name,
m['category'] category
from cte
cross join unnest(cast(json_field as array(map(varchar, json)))) as t(m)
Error:
INVALID_CAST_ARGUMENT: Cannot cast to array(map(varchar, json)). Expected a json array, but got [{"id": 36112, "kind"....

Assuming your data contains json - you can cast it to array of maps from varchar to json (array(map(varchar, json))) and then use unnest to flatten the array:
WITH dataset (json_str) AS (
VALUES (json '[{"id": 45238, "kind": "product", "name": "Ball", "category": "toy"}, {"id": 117852, "kind": "service", "name": "courier", "category": "transport"}]')
)
select m['id'] id,
m['kind'] kind,
m['name'] name,
m['category'] category
from dataset
cross join unnest(cast(json_str as array(map(varchar, json)))) as t(m)
id kind name category
45238 product Ball toy
117852 service courier transport
UPD
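If the original column type is varchar, use json_parse to convert it to json. cast(divs as json) does not parse the string; it wraps it in a single JSON string value, which is why the cast to array(map(varchar, json)) fails.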
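To make the parse-then-unnest logic concrete, here is a minimal Python sketch of what the query does to one column value. It only illustrates the semantics and is not Presto code; the names `expand_rows` and `raw` are made up for this example. Because the question says the keys can vary, the sketch collects every key it sees and fills gaps with None, which I believe is closest to using element_at(m, 'key') in Presto rather than m['key'].

```python
import json


def expand_rows(raw, columns=None):
    """Parse a JSON array of objects and return (columns, rows).

    `raw` is the varchar value; json.loads plays the role of json_parse.
    """
    items = json.loads(raw)
    if columns is None:
        # Keys may differ between elements, so collect them in first-seen order.
        columns = []
        for item in items:
            for key in item:
                if key not in columns:
                    columns.append(key)
    # One output row per array element, like CROSS JOIN UNNEST.
    rows = [tuple(item.get(col) for col in columns) for item in items]
    return columns, rows


raw = (
    '[{"id": 45238, "kind": "product", "name": "Ball", "category": "toy"}, '
    '{"id": 117852, "kind": "service", "name": "courier", "category": "transport"}]'
)
columns, rows = expand_rows(raw)
for row in rows:
    print(dict(zip(columns, row)))
```

Passing an explicit `columns` list corresponds to naming the keys in the SELECT list, as the query above does.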

Related

Snowflake SQL: How to loop through array with JSON objects, to find item that meets condition

Breaking my head on this. In Snowflake my field city_info looks like (for 3 sample records)
[{"name": "age", "content": 35}, {"name": "city", "content": "Chicago"}]
[{"name": "age", "content": 20}, {"name": "city", "content": "Boston"}]
[{"name": "city", "content": "New York"}, {"name": "age", "content": 42}]
I try to extract a column city from this
Chicago
Boston
New York
I tried to flatten this
select *
from lateral flatten(input =>
select city_info::VARIANT as event
from data
)
And from there I can derive the value, but this only works for 1 row (so I have to add limit 1, which doesn't make sense, as I need this for all my rows).
If I try to do it for the 3 rows it tells me subquery returns more than one row.
Any help is appreciated! Chris
You could write it as:
SELECT value:content::string AS city_name
FROM tab,
LATERAL FLATTEN(input => tab.city_info)
WHERE value:name::string = 'city'
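If it helps to see why this works for every row, here is a rough Python sketch of the same logic: the flatten runs once per table row, and the filter keeps only the element whose name is 'city'. This is only an illustration of the semantics, not Snowflake code; `records` and `city_names` are names I made up for the example.

```python
import json

# The three sample city_info values from the question.
records = [
    '[{"name": "age", "content": 35}, {"name": "city", "content": "Chicago"}]',
    '[{"name": "age", "content": 20}, {"name": "city", "content": "Boston"}]',
    '[{"name": "city", "content": "New York"}, {"name": "age", "content": 42}]',
]


def city_names(rows):
    result = []
    for city_info in rows:                    # every row of the table
        for value in json.loads(city_info):   # LATERAL FLATTEN: one row per element
            if value["name"] == "city":       # WHERE value:name::string = 'city'
                result.append(str(value["content"]))
    return result


print(city_names(records))
```

The position of the "city" element inside each array does not matter, because the filter is applied to every flattened element.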

SQL query to return nested array of objects in JSON for SQLite

I have 2 simple tables in a SQLite db and a nodejs, express api endpoint that should get results by student and have the subjects as a nested array of objects.
Tables:
Student(id, name) and Subject(id, name, studentId)
This is what I need the result to look like:
{
"id": 1,
"name": "Student name",
"subjects":
[{
"id": 1,
"name": "Subject 1"
},
{
"id": 2,
"name": "Subject 2"
}]
}
How can I write a query to get this result?
If your version of sqlite was built with support for the JSON1 extension, it's easy to generate the JSON from the query itself:
SELECT json_object('id', id, 'name', name
, 'subjects'
, (SELECT json_group_array(json_object('id', subj.id, 'name', subj.name))
FROM subject AS subj
WHERE subj.studentid = stu.id)) AS record
FROM student AS stu
WHERE id = 1;
record
---------------------------------------------------------------------------------------------------
{"id":1,"name":"Student Name","subjects":[{"id":1,"name":"Subject 1"},{"id":2,"name":"Subject 2"}]}
It seems that all you need is a LEFT JOIN statement:
SELECT subject.id, subject.name, student.id, student.name
FROM subject
LEFT JOIN student ON subject.studentId = student.id
ORDER BY student.id;
Then just parse the rows of the response into the object structure you require.
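The "parse the rows" step could look something like the sketch below. It uses Python's built-in sqlite3 module instead of the asker's nodejs stack, purely to keep the example self-contained; the function name `students_with_subjects` and the sample rows are assumptions. I also start the join from student rather than subject, so that a student without subjects still shows up with an empty list.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subject (id INTEGER PRIMARY KEY, name TEXT, studentId INTEGER);
    INSERT INTO student VALUES (1, 'Student name');
    INSERT INTO subject VALUES (1, 'Subject 1', 1), (2, 'Subject 2', 1);
""")


def students_with_subjects(conn):
    rows = conn.execute("""
        SELECT student.id, student.name, subject.id, subject.name
        FROM student
        LEFT JOIN subject ON subject.studentId = student.id
        ORDER BY student.id, subject.id
    """)
    students = {}
    for student_id, student_name, subject_id, subject_name in rows:
        # Create the student object the first time its id is seen.
        student = students.setdefault(
            student_id, {"id": student_id, "name": student_name, "subjects": []}
        )
        # A student with no subjects produces a single row with NULL subject columns.
        if subject_id is not None:
            student["subjects"].append({"id": subject_id, "name": subject_name})
    return list(students.values())


result = students_with_subjects(conn)
print(json.dumps(result[0], indent=2))
```

Compared with the JSON1 query, this moves the nesting into application code, which is the trade-off if the extension is not available.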

How to query and iterate over array of structures in Athena (Presto)?

I have a S3 bucket with 500,000+ json records, eg.
{
"userId": "00000000001",
"profile": {
"created": 1539469486,
"userId": "00000000001",
"primaryApplicant": {
"totalSavings": 65000,
"incomes": [
{ "amount": 5000, "incomeType": "SALARY", "frequency": "FORTNIGHTLY" },
{ "amount": 2000, "incomeType": "OTHER", "frequency": "MONTHLY" }
]
}
}
}
I created a new table in Athena
CREATE EXTERNAL TABLE profiles (
userId string,
profile struct<
created:int,
userId:string,
primaryApplicant:struct<
totalSavings:int,
incomes:array<struct<amount:int,incomeType:string,frequency:string>>
>
>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://profile-data'
I am interested in the incomeTypes, e.g. "SALARY", "PENSIONS", "OTHER", etc., and ran this query, changing jsonData.incometype each time:
SELECT jsonData
FROM "sampledb"."profiles"
CROSS JOIN UNNEST(sampledb.profiles.profile.primaryApplicant.incomes) AS la(jsonData)
WHERE jsonData.incometype='SALARY'
This worked fine with CROSS JOIN UNNEST which flattened the incomes array so that the data example above would span across 2 rows. The only idiosyncratic thing was that CROSS JOIN UNNEST made all the field names lowercase, eg. a row looked like this:
{amount=1520, incometype=SALARY, frequency=FORTNIGHTLY}
Now I have been asked how many users have two or more "SALARY" entries, eg.
"incomes": [
{ "amount": 3000, "incomeType": "SALARY", "frequency": "FORTNIGHTLY" },
{ "amount": 4000, "incomeType": "SALARY", "frequency": "MONTHLY" }
],
I'm not sure how to go about this.
How do I query the array of structures to look for duplicate incomeTypes of "SALARY"?
Do I have to iterate over the array?
What should the result look like?
UNNEST is a very powerful feature, and it's possible to solve this problem using it. However, I think using Presto's lambda functions is more straightforward:
SELECT COUNT(*)
FROM sampledb.profiles
WHERE CARDINALITY(FILTER(profile.primaryApplicant.incomes, income -> income.incomeType = 'SALARY')) > 1
This solution uses FILTER on the profile.primaryApplicant.incomes array to get only those with an incomeType of SALARY, and then CARDINALITY to extract the length of that result.
Case sensitivity is never easy with SQL engines. In general I think you should not expect them to respect case, and many don't. Athena in particular explicitly converts column names to lower case.
You can combine filter with cardinality to find rows where more than one array element has incomeType = 'SALARY'.
This can be further improved by using reduce, so that the intermediate array is not materialized (see examples in the docs; I'm not quoting them here, since they do not directly answer your question).
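To illustrate the difference between the two approaches, here is a small Python sketch. It is not Athena/Presto code; the second user in `profiles` is invented sample data, and the function names are hypothetical. The first function builds the filtered list and measures it, like CARDINALITY(FILTER(...)); the second folds the array into a counter, which is the idea behind using reduce.

```python
from functools import reduce

profiles = [
    {"userId": "00000000001", "incomes": [
        {"amount": 5000, "incomeType": "SALARY", "frequency": "FORTNIGHTLY"},
        {"amount": 2000, "incomeType": "OTHER", "frequency": "MONTHLY"},
    ]},
    # Hypothetical second user with two SALARY entries.
    {"userId": "00000000002", "incomes": [
        {"amount": 3000, "incomeType": "SALARY", "frequency": "FORTNIGHTLY"},
        {"amount": 4000, "incomeType": "SALARY", "frequency": "MONTHLY"},
    ]},
]


def salary_count_filter(incomes):
    # Builds an intermediate list, then takes its length.
    return len([income for income in incomes if income["incomeType"] == "SALARY"])


def salary_count_reduce(incomes):
    # Folds the array into a running count; no intermediate list is created.
    return reduce(
        lambda count, income: count + (1 if income["incomeType"] == "SALARY" else 0),
        incomes,
        0,
    )


users_with_multiple_salaries = sum(
    1 for profile in profiles if salary_count_filter(profile["incomes"]) > 1
)
print(users_with_multiple_salaries)
```

Both functions return the same count for every array; the choice only affects how much intermediate data is built along the way.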

How to query JSON column for unique object values in PostgreSQL

I'm looking to query a table for a distinct list of values in a given JSON column.
In the code snippet below, the Survey_Results table has 3 columns:
Name, Email, and Payload. Payload is the JSON object that I want to query.
Table Name: Survey_Results
Name Email Payload
Ying SmartStuff#gmail.com [
{"fieldName":"Product Name", "Value":"Calculator"},
{"fieldName":"Product Price", "Value":"$54.99"}
]
Kendrick MrTexas#gmail.com [
{"fieldName":"Food Name", "Value":"Texas Toast"},
{"fieldName":"Food Taste", "Value":"Delicious"}
]
Andy WhereTheBass#gmail.com [
{"fieldName":"Band Name", "Value":"MetalHeads"},
{"fieldName":"Valid Member", "Value":"TRUE"}
]
I am looking for a unique list of all fieldNames mentioned.
The ideal answer would be query giving me a list containing "Product Name", "Product Price", "Food Name", "Food Taste", "Band Name", and "Valid Member".
Is something like this possible in Postgres?
Use json_array_elements() (or jsonb_array_elements() if the column is jsonb) in a lateral join:
select distinct value->>'fieldName' as field_name
from survey_results
cross join json_array_elements(payload)
field_name
---------------
Product Name
Valid Member
Food Taste
Product Price
Food Name
Band Name
(6 rows)
How to find distinct Food Name values?
select distinct value->>'Value' as food_name
from survey_results
cross join json_array_elements(payload)
where value->>'fieldName' = 'Food Name'
food_name
-------------
Texas Toast
(1 row)
Important. Note that the json structure is illogical and thus unnecessarily large and complex. Instead of
[
{"fieldName":"Product Name", "Value":"Calculator"},
{"fieldName":"Product Price", "Value":"$54.99"}
]
use
{"Product Name": "Calculator", "Product Price": "$54.99"}
Open this db<>fiddle to see that proper json structure implies simpler and faster queries.
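As a rough illustration of why the flat structure is easier to work with, here is a Python sketch that answers both questions against each shape. It is not Postgres code; `payloads`, `to_flat` and the other names are made up for the example, and the sample data is taken from the question.

```python
import json

# Payload column values in the original array-of-pairs shape.
payloads = [
    '[{"fieldName":"Product Name", "Value":"Calculator"},'
    ' {"fieldName":"Product Price", "Value":"$54.99"}]',
    '[{"fieldName":"Food Name", "Value":"Texas Toast"},'
    ' {"fieldName":"Food Taste", "Value":"Delicious"}]',
    '[{"fieldName":"Band Name", "Value":"MetalHeads"},'
    ' {"fieldName":"Valid Member", "Value":"TRUE"}]',
]


def distinct_field_names(payloads):
    # Every element has to be visited to read its fieldName.
    return {elem["fieldName"] for p in payloads for elem in json.loads(p)}


def food_names(payloads):
    # Again a scan over all elements, plus a filter on fieldName.
    return {
        elem["Value"]
        for p in payloads
        for elem in json.loads(p)
        if elem["fieldName"] == "Food Name"
    }


def to_flat(payload):
    # Convert one array-of-pairs payload into the suggested flat object.
    return {elem["fieldName"]: elem["Value"] for elem in json.loads(payload)}


flat_payloads = [to_flat(p) for p in payloads]

# With the flat shape, field names are just the object keys,
# and a value is a direct key lookup instead of a scan.
flat_field_names = {key for obj in flat_payloads for key in obj}
flat_food_names = {obj["Food Name"] for obj in flat_payloads if "Food Name" in obj}
print(sorted(flat_field_names), flat_food_names)
```

The same simplification should carry over to SQL, where the flat shape lets you read keys and values directly instead of unnesting an array first.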

Adding an ORDER BY statement to a query without flattening results leads to "Cannot query the cross product of repeated fields"

Query:
"SELECT * FROM [table] ORDER BY id DESC LIMIT 10"
AllowLargeResults = true
FlattenResults = false
table schema:
[
{
"name": "id",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "repeated_field_1",
"type": "STRING",
"mode": "REPEATED"
},
{
"name": "repeated_field_2",
"type": "STRING",
"mode": "REPEATED"
}
]
The query "SELECT * FROM [table] LIMIT 10" works just fine. I get this error when I add an order by clause, even though the order by does not mention either repeated field.
Is there any way to make this work?
The ORDER BY clause causes BigQuery to automatically flatten the output of a query, causing your query to attempt to generate a cross product of repeated_field_1 and repeated_field_2.
If you don't care about preserving the repeatedness of the fields, you could explicitly FLATTEN both of them, which will cause your query to generate the cross product that the original query is complaining about.
SELECT *
FROM FLATTEN(FLATTEN([table], repeated_field_1), repeated_field_2)
ORDER BY id DESC
LIMIT 10
Other than that, I don't have a good workaround for your query to both ORDER BY and also output repeated fields.
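To show what that cross product means in practice, here is a small Python sketch. It only models the idea and is not BigQuery code; the sample rows and the name `flatten_both` are invented, and the sketch does not try to model how BigQuery treats empty repeated fields.

```python
from itertools import product

# Hypothetical rows matching the schema in the question.
table = [
    {"id": "a", "repeated_field_1": ["x", "y"], "repeated_field_2": ["1", "2", "3"]},
    {"id": "b", "repeated_field_1": ["z"], "repeated_field_2": ["9"]},
]


def flatten_both(row):
    # One output row per (repeated_field_1, repeated_field_2) pair.
    return [
        {"id": row["id"], "repeated_field_1": f1, "repeated_field_2": f2}
        for f1, f2 in product(row["repeated_field_1"], row["repeated_field_2"])
    ]


flat = [flat_row for row in table for flat_row in flatten_both(row)]

# ORDER BY id DESC LIMIT 10 on the flattened rows.
top = sorted(flat, key=lambda r: r["id"], reverse=True)[:10]
print(len(flat), top[0])
```

Row "a" alone becomes 2 x 3 = 6 rows, which is why flattening two repeated fields can blow up the result size.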
See also: BigQuery flattens result when selecting into table with GROUP BY even with “noflatten_results” flag on