JSON_EXTRACT from a list of JSON strings - SQL

I want to extract the values for particular keys from a table whose raw_data column contains JSON strings like the one below.
raw_data
...
{"label": "XXX", "lines":[{"amount":1000, "category": "A"}, {"amount":100, "category": "B"}, {"amount":10, "category": "C"}]}
...
I am expecting an outcome like:
label | amount          | category
XXX   | [1000, 100, 10] | ['A', 'B', 'C']
I am using the following SQL query to achieve that:
select
JSON_EXTRACT(raw_data, '$.lines[*].amount') AS amount,
JSON_EXTRACT(raw_data, '$.lines[*].category') AS category,
JSON_EXTRACT(raw_data, '$.label') AS label
from table
I can get a specific element of the list with [0], [1], etc., but the SQL doesn't work with [*]. I am getting the following error:
Invalid JSON path: '$.lines[*].amount'
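For reference, extracting a single element with an explicit index does run without error, e.g. (a sketch against the same placeholder table as above):
select
JSON_EXTRACT(raw_data, '$.lines[0].amount') AS first_amount, -- explicit index works
JSON_EXTRACT(raw_data, '$.label') AS label
from table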
Edit
I am using Presto

JSON path support in Presto is very limited, so you need to do some of the processing manually, for example with casts and array functions:
-- sample data
with dataset (raw_data) as (
values '{"label": "XXX", "lines":[{"amount":1000, "category": "A"}, {"amount":100, "category": "B"}, {"amount":10, "category": "C"}]}'
)
-- query
select label,
transform(lines, l -> l['amount']) amount,
transform(lines, l -> l['category']) category
from (
select JSON_EXTRACT(raw_data, '$.label') AS label,
cast(JSON_EXTRACT(raw_data, '$.lines') as array(map(varchar, json))) lines
from dataset
);
Output:
label | amount          | category
XXX   | [1000, 100, 10] | ["A", "B", "C"]
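If you want typed arrays rather than arrays of json values, one option (a sketch against the same sample data, not from the original answer) is to cast inside transform:
select label,
       transform(lines, l -> cast(l['amount'] as integer)) amount,
       transform(lines, l -> cast(l['category'] as varchar)) category
from (
    select JSON_EXTRACT(raw_data, '$.label') AS label,
           cast(JSON_EXTRACT(raw_data, '$.lines') as array(map(varchar, json))) lines
    from dataset
);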
In Trino, JSON path support has been vastly improved, so you can do the following:
-- query
select JSON_EXTRACT(raw_data, '$.label') label,
JSON_QUERY(raw_data, 'lax $.lines[*].amount' WITH ARRAY WRAPPER) amount,
JSON_QUERY(raw_data, 'lax $.lines[*].category' WITH ARRAY WRAPPER) category
from dataset;
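Note that JSON_QUERY returns the wrapped array as a varchar by default; if you need an actual SQL array, one option (a sketch, assuming Trino) is to parse and cast the result:
-- sketch: turn the varchar result of JSON_QUERY back into a typed array
select cast(json_parse(JSON_QUERY(raw_data, 'lax $.lines[*].amount' WITH ARRAY WRAPPER)) as array(integer)) amount
from dataset;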

You can use json_table and json_arrayagg:
select json_extract(t.raw_data, '$.label'),
(select json_arrayagg(t1.v) from json_table(t.raw_data, '$.lines[*]' columns (v int path '$.amount')) t1),
(select json_arrayagg(t1.v) from json_table(t.raw_data, '$.lines[*]' columns (v text path '$.category')) t1)
from tbl t
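For context, json_table and json_arrayagg appear to target MySQL 8+ rather than Presto; a minimal, hypothetical sample table to try the query above against could look like:
-- hypothetical sample data, assuming MySQL 8+ and a table named tbl as in the query
create table tbl (raw_data json);
insert into tbl values
('{"label": "XXX", "lines":[{"amount":1000, "category": "A"}, {"amount":100, "category": "B"}, {"amount":10, "category": "C"}]}');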

I was able to get the expected output using unnest to flatten and array_agg to aggregate in Presto. Below is the SQL used and output generated:
WITH dataset AS (
SELECT
* from sf_73535794
)
SELECT raw_data.label, array_agg(t.lines.amount) AS amount, array_agg(t.lines.category) AS category
FROM dataset
CROSS JOIN UNNEST(raw_data.lines) AS t(lines)
GROUP BY 1
Output:
label | amount          | category
XXX   | [1000, 100, 10] | [A, B, C]

Related

Get JSON object keys as array in Presto/Trino

I have JSON data like this in one of my columns
{"foo": 1, "bar": 2}
{"foo": 1}
and I would like to run a query that returns the keys as an array
foo,bar
foo
Convert your JSON into a MAP and then use map_keys():
-- sample data
WITH dataset(js) as (
VALUES (JSON '{"foo": 1, "bar": 2}'),
(JSON '{"foo": 1}')
)
-- query
SELECT array_join(map_keys(CAST(js AS MAP(VARCHAR, JSON))), ', ')
FROM dataset
Use json_parse() if your JSON column is of type VARCHAR:
SELECT array_join(map_keys(CAST(json_parse(js) AS MAP(VARCHAR, JSON))), ', ')
FROM dataset
Output:
_col0
bar, foo
foo
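If you want the keys as an actual array rather than a comma-separated string, drop the array_join (a sketch of the same approach):
SELECT map_keys(CAST(js AS MAP(VARCHAR, JSON))) AS keys
FROM dataset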
I'm not sure how to work well with JSON directly, but if you convert the JSON to a MAP, the process is simple using map_values:
WITH data as (SELECT * FROM (VALUES JSON '{"foo": 1, "bar": 2}', JSON '{"foo": 1}') AS t(json_col))
SELECT map_values(CAST(json_col AS MAP(VARCHAR, INTEGER))) json_col
FROM data
Output:
json_col
{2,1}
{1}

How to parse this array of dicts and extract key columns from a BigQuery external table

I have this giant array of dicts loaded from JSON into a date-partitioned BigQuery external table, with the table structure below:
Field name | Type    | Mode
meta       | Record  | Nullable
Messages   | String  | Repeated
date       | Integer | Nullable
Every "Messages" Field is in its own row/record in my Bigquery table (New_line_delimited_Json)
I am trying to parse the "messages" field/column to extract some fields Key1 and Key2 which happens to be inside an Array (of dicts). For sake of simplicity ,below is the snippet of json of which "messages" is a field that I am trying to unnest/explode.
Ignore this schema;updated schema below***
[
{
"meta": {
"table": "FEED",
"source": "CP1"
},
"Messages": [
"{
"Key1":"2022-01-10",
"Key2":"H21257061"
}"
],
"date": "20220110"
},
{
"meta": {
"table": "FEED",
"source": "CP1"
},
"Messages": [
"{
"Key1":"2022-01-11",
"Key2":"H21257062"
}"
],
"date": "20220111"
}
]
updated schema on 01/17
{
"meta": {
"table": "FEED",
"source": "CP1"
},
"Messages": [
"{
"Key1":"2022-01-10",
"Key2":"H21257061"
}",
"{
"Key1":"2022-01-10",
"Key2":"H21257062"
}"
],
"date": "20220110"
},
updated schema representation on 01/17:
So far I have tried this, but I am getting SQL output with Key1 and Key2 as NULLs:
WITH table AS (SELECT Messages as array_column FROM `project.dataset.table` )
SELECT
json_extract_scalar(flattened_array, '$.Messages.key1') as key1,
json_extract_scalar(flattened_array, '$.Messages.key2') as key2
FROM table t
CROSS JOIN UNNEST(t.array_column) AS flattened_array
Still a little ambiguous, so I assume the below correctly represents your table (at least it matches the structure/schema in your question).
If my assumption is correct, consider the below approach:
select * except(id) from (
select to_json_string(t) id, kv[offset(0)] as key, kv[safe_offset(1)] as value
from your_table t,
t.messages as message,
unnest([struct( split(translate(message, '"', ''), ':') as kv)])
)
pivot (min(value) for key in ('Key1', 'Key2'))
If/when applied to the above sample data, the output is:
Edit: Trying to help you further. OK, so it looks like your table is as below.
In this case, try the below (a fairly light modification of the previous version):
select * except(id) from (
select to_json_string(t) id, kv[offset(0)] as key, kv[safe_offset(1)] as value
from your_table t,
unnest(regexp_extract_all(messages, r'"[^"]+":"[^"]+"')) as message,
unnest([struct( split(translate(message, '"', ''), ':') as kv)])
)
pivot (min(value) for key in ('Key1', 'Key2'))
with output
But obviously, I would use the simplest approach below:
select
json_extract_scalar(messages, '$.Key1') as Key1,
json_extract_scalar(messages, '$.Key2') as Key2
from your_table
SELECT
JSON_QUERY(message,"$.Key1") as Key1,
JSON_QUERY(message,"$.Key2") as Key2
FROM
`project.dataset.table` as table
CROSS JOIN UNNEST(table.Messages) as message
CROSS JOIN UNNEST flattens the array, returning a row for each message.
After that, JSON_QUERY extracts the needed values from the JSON string.
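Note that JSON_QUERY returns the matched value as a JSON-formatted string (so string values keep their double quotes). If you want unquoted scalars, JSON_VALUE is an option (a sketch against the same hypothetical table):
SELECT
JSON_VALUE(message, "$.Key1") as Key1,
JSON_VALUE(message, "$.Key2") as Key2
FROM
`project.dataset.table` as t
CROSS JOIN UNNEST(t.Messages) as message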

Select from JSON Array postgresql JSON column

I have the following JSON stored in a PostgreSQL JSON column
{
"status": "Success",
"message": "",
"data": {
"serverIp": "XXXX",
"ruleId": 32321,
"results": [
{
"versionId": 555555,
"PriceID": "8abf35ec-3e0e-466b-a4e5-2af568e90eec",
"price": 550,
"Convert": 0.8922953080331764,
"Cost": 10
}
]
}
}
I would like to search for a specific PriceID across the entire JSON column (named info) and select the entire results element by its PriceID.
How do I do that with PostgreSQL JSON?
One option uses exists and json(b)_array_elements(). Assuming that your table is called mytable and that the jsonb column is mycol, this would look like:
select t.*
from mytable t
where exists (
select 1
from jsonb_array_elements(t.mycol -> 'data' -> 'results') x(elt)
where x.elt ->> 'PriceID' = '8abf35ec-3e0e-466b-a4e5-2af568e90eec'
)
In the subquery, jsonb_array_elements() unnests the JSON array located at the given path. Then the where clause ensures that at least one element in the array has the given PriceID.
If your data is of json datatype rather than jsonb, you need to use json_array_elements() instead of jsonb_array_elements(), as in the sketch below.
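For example, a sketch of the same query for a json (not jsonb) column, keeping the mytable / mycol names used above:
select t.*
from mytable t
where exists (
    select 1
    from json_array_elements(t.mycol -> 'data' -> 'results') x(elt)  -- json variant of the same function
    where x.elt ->> 'PriceID' = '8abf35ec-3e0e-466b-a4e5-2af568e90eec'
)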
If you want to display some information coming from the matching array element, then it is different. You can use a lateral join instead of exists. Keep in mind, though, that this will duplicate the rows if more than one array element matches:
select t.*, x.elt ->> 'price' price
from mytable t
cross join lateral jsonb_array_elements(t.mycol -> 'data' -> 'results') x(elt)
where x.elt ->> 'PriceID' = '8abf35ec-3e0e-466b-a4e5-2af568e90eec'

PostgreSQL: exclude complete jsonb array if one element fails the WHERE clause

Assume a table json_table with columns id (int), data (jsonb).
A sample jsonb value would be
{"a": [{"b":{"c": "xxx", "d": 1}},{"b":{"c": "xxx", "d": 2}}]}
When I use an SQL statement like the following:
SELECT data FROM json_table j, jsonb_array_elements(j.data#>'{a}') dt WHERE (dt#>>'{b,d}')::integer NOT IN (2,4,6,9) GROUP BY id;
... the two array elements are unnested and the one that satisfies the WHERE clause is still returned. This makes sense, since each array element is considered individually. In this example I will get back the complete row
{"a": [{"b":{"c": "xxx", "d": 1}},{"b":{"c": "xxx", "d": 2}}]}
I'm looking for a way to exclude the complete json_table row when any jsonb array element fails the condition.
You can move the condition to the WHERE clause and use NOT EXISTS:
SELECT data
FROM json_table j
WHERE NOT EXISTS (SELECT 1
FROM jsonb_array_elements(j.data#>'{a}') dt
WHERE (dt#>>'{b,d}')::integer IN (2, 4, 6, 9)
);
You can achieve it with the following query:
select data
from json_table
where jsonb_path_match(data, '!exists($.a[*].b.d ? (@ == 2 || @ == 4 || @ == 6 || @ == 9))')
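An equivalent sketch (assuming PostgreSQL 12 or later) that negates jsonb_path_exists instead:
select data
from json_table
where not jsonb_path_exists(data, '$.a[*].b.d ? (@ == 2 || @ == 4 || @ == 6 || @ == 9)')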

Select particular values from JSON column in Presto

I have a JSON column points of type VARCHAR in a table which I want to parse in Presto. For example:
points = {"0": 0.2, "1": 1.2, "2": 0.5, "15": 1.2, "20": 0.7}
I want to select only the values for keys "0", "2" and "20". How do I use the UNNEST functionality of Presto to get them? What I've done till now is:
select t.value from myTable CROSS JOIN UNNEST(points) AS t(key, value) limit 1
But this gives this error:
Cannot unnest type: varchar
Update:
I ran the following query and got a result, but it returns one arbitrary key-value pair from the JSON, whereas I need specific keys.
select key, value from myTable CROSS JOIN UNNEST(SPLIT_TO_MAP(points, ',', ':')) AS t(key, value) limit 1
You can unnest an Array or Map. So you first need to convert the JSON string into a MAP:
CAST(json_parse(str) AS MAP<BIGINT, DOUBLE>)
Here is an example:
presto> select tt.value
-> from (VALUES '{"0": 0.2, "1": 1.2, "2": 0.5, "15": 1.2, "20": 0.7}') as t(json)
-> CROSS JOIN UNNEST(CAST(json_parse(json) AS MAP<BIGINT, DOUBLE>)) AS tt(key, value)
-> ;
value
-------
0.2
1.2
1.2
0.5
0.7
(5 rows)
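To restrict the result to the keys asked about ("0", "2" and "20"), one option (a sketch against the question's myTable) is a WHERE clause on the unnested key:
select tt.key, tt.value
from myTable
CROSS JOIN UNNEST(CAST(json_parse(points) AS MAP<BIGINT, DOUBLE>)) AS tt(key, value)
WHERE tt.key IN (0, 2, 20)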
You may need to cast to the json datatype first, according to the docs:
UNNEST(CAST(points AS JSON))
Full query:
select t.value from myTable
CROSS JOIN UNNEST(CAST(points AS JSON)) AS t(key, value) limit 1