Get JSON object keys as array in Presto/Trino - sql

I have JSON data like this in one of my columns
{"foo": 1, "bar": 2}
{"foo": 1}
and I would like to run a query that returns the keys as an array
foo,bar
foo

Convert your JSON into a MAP and then use map_keys():
-- sample data
WITH dataset(js) as (
VALUES (JSON '{"foo": 1, "bar": 2}'),
(JSON '{"foo": 1}')
)
-- query
SELECT array_join(map_keys(CAST(js AS MAP(VARCHAR, JSON))), ', ')
FROM dataset
If your JSON column is of type VARCHAR, parse it with json_parse() first:
SELECT array_join(map_keys(CAST(json_parse(js) AS MAP(VARCHAR, JSON))), ', ')
FROM dataset
Output:
_col0
bar, foo
foo
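For anyone who wants to check the expected result outside the database, the same idea (parse the JSON object into a map, then take its keys) can be sketched in Python. This only illustrates the logic and is not Presto/Trino code; the function name object_keys is made up for the example.

```python
import json


def object_keys(js):
    """Parse a JSON object and return its top-level keys as a list."""
    parsed = json.loads(js)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    return list(parsed.keys())


rows = ['{"foo": 1, "bar": 2}', '{"foo": 1}']
for row in rows:
    # Join the keys the same way array_join(..., ', ') does in the query.
    print(", ".join(object_keys(row)))
```

Python keeps the keys in the order they appear in the document, while the query output above lists them as bar, foo, so compare the results as sets rather than relying on order.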

The same cast also gives you the values. Once the JSON is converted to a MAP, map_values returns them as an array (swap in map_keys to get the keys instead):
WITH data as (SELECT * FROM (VALUES JSON '{"foo": 1, "bar": 2}', JSON '{"foo": 1}') AS t(json_col))
SELECT map_values(CAST(json_col AS MAP(VARCHAR, INTEGER))) json_col
FROM data
Output:
json_col
{2,1}
{1}

Related

How do I Unnest varchar to json in Athena

I am crawling data from Google BigQuery and staging it in Athena.
One of the columns, crawled as a string, contains JSON:
{
"key": "Category",
"value": {
"string_value": "something"
}
}
I need to unnest and flatten these so I can use them in a query. I require the key and the string value (so in my query it will be where Category = something).
I have tried the following:
WITH dataset AS (
SELECT cast(json_column as json) as json_column
from "thedatabase"
LIMIT 10
)
SELECT
json_extract_scalar(json_column, '$.value.string_value') AS string_value
FROM dataset
which is returning null.
Casting json_column to json adds backslashes to it:
"[{\"key\":\"something\",\"value\":{\"string_value\":\"app\"}}
If I try to use replace on the result, it is rejected because the value is not a varchar.
So how do I extract the values from the json_column field?
Presto's json_extract_scalar can extract directly from a varchar (string) value, so no cast is needed:
-- sample data
WITH dataset(json_column) AS (
values ('{
"key": "Category",
"value": {
"string_value": "something"
}}')
)
--query
SELECT
json_extract_scalar(json_column, '$.value.string_value') AS string_value
FROM dataset;
Output:
string_value
something
Casting to json encodes the value as JSON rather than parsing it, so casting a string gives you a double-encoded JSON string. To parse, use json_parse instead (it is not needed in this particular case, but there are cases where you will want it):
-- query
SELECT
json_extract_scalar(json_parse(json_column), '$.value.string_value') AS string_value
FROM dataset;
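To see why the cast produces an escaped string, it may help to compare encoding and parsing in Python's json module, which draws the same distinction. This is an analogy only, not Presto code.

```python
import json

raw = '{"key": "Category", "value": {"string_value": "something"}}'

# Encoding a string as JSON wraps it in quotes and escapes the inner quotes,
# which is the "double encoded" result described above.
encoded = json.dumps(raw)

# Parsing the same string yields a structure that can be navigated.
parsed = json.loads(raw)

print(encoded)
print(parsed["value"]["string_value"])
```

The encoded value is a single JSON string, so a path such as $.value.string_value has nothing to navigate into, which is consistent with the null result seen in the question.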

JSON_Extract from list of json string

I want to extract some values for particular keys from a table with json string as below.
raw_data
...
{"label": "XXX", "lines":[{"amount":1000, "category": "A"}, {"amount":100, "category": "B"}, {"amount":10, "category": "C"}]}
...
I am expecting an outcome like
label  amount           category
XXX    [1000, 100, 10]  ['A', 'B', 'C']
I am using the following sql query to achieve that
select
JSON_EXTRACT(raw_data, '$.lines[*].amount') AS amount,
JSON_EXTRACT(raw_data, '$.lines[*].category') AS category,
JSON_EXTRACT(raw_data, '$.label') AS label
from table
I can get a specific element of the list with [0], [1], etc., but the query doesn't work with [*]. I get the following error:
Invalid JSON path: '$.lines[*].amount'
Edit
I am using Presto
JSON path support in Presto is very limited, so you need to do some of the processing manually, for example with casts and array functions:
-- sample data
with dataset (raw_data) as (
values '{"label": "XXX", "lines":[{"amount":1000, "category": "A"}, {"amount":100, "category": "B"}, {"amount":10, "category": "C"}]}'
)
-- query
select label,
transform(lines, l -> l['amount']) amount,
transform(lines, l -> l['category']) category
from (
select JSON_EXTRACT(raw_data, '$.label') AS label,
cast(JSON_EXTRACT(raw_data, '$.lines') as array(map(varchar, json))) lines
from dataset
);
Output:
label  amount           category
XXX    [1000, 100, 10]  ["A", "B", "C"]
In Trino, JSON path support is vastly improved, so you can do the following:
-- query
select JSON_EXTRACT(raw_data, '$.label') label,
JSON_QUERY(raw_data, 'lax $.lines[*].amount' WITH ARRAY WRAPPER) amount,
JSON_QUERY(raw_data, 'lax $.lines[*].category' WITH ARRAY WRAPPER) category
from dataset;
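The Presto cast-and-transform query above amounts to parsing the document and projecting one field out of every element of lines. A rough Python equivalent, useful only for checking the expected result outside the database, might look like this (the helper name pluck is hypothetical):

```python
import json


def pluck(items, field):
    """Return the given field from every object in a list (None if missing)."""
    return [item.get(field) for item in items]


raw_data = (
    '{"label": "XXX", "lines":[{"amount":1000, "category": "A"}, '
    '{"amount":100, "category": "B"}, {"amount":10, "category": "C"}]}'
)

doc = json.loads(raw_data)
label = doc["label"]
amount = pluck(doc["lines"], "amount")
category = pluck(doc["lines"], "category")

print(label, amount, category)
```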
If you are on MySQL 8 rather than Presto, you can use json_table and json_arrayagg:
select json_extract(t.raw_data, '$.label'),
(select json_arrayagg(t1.v) from json_table(t.raw_data, '$.lines[*]' columns (v int path '$.amount')) t1),
(select json_arrayagg(t1.v) from json_table(t.raw_data, '$.lines[*]' columns (v text path '$.category')) t1)
from tbl t
I was able to get the expected output in Presto by using UNNEST to flatten the array and array_agg to aggregate it again. Note that the dot access (raw_data.label, raw_data.lines) assumes raw_data is already a ROW type rather than a varchar:
WITH dataset AS (
SELECT * FROM sf_73535794
)
SELECT raw_data.label,
array_agg(t.lines.amount) AS amount,
array_agg(t.lines.category) AS category
FROM dataset
CROSS JOIN UNNEST(raw_data.lines) AS t(lines)
GROUP BY 1

How to select single field from array of json objects?

I have a JSONB column with values in following JSON structure
{
"a": "value1", "b": [{"b1": "value2", "b3": "value4"}, {"b1": "value5", "b3": "value6"}]
}
I need to select only b1 field in the result. So expected result would be
["value2", "value5"]
I can select complete array using query
select columnname->>'b' from tablename
SELECT
jsonb_agg(elements -> 'b1') -- 2
FROM mytable,
jsonb_array_elements(mydata -> 'b') as elements -- 1
1. (a) Get the JSON array from the b element, then (b) extract the array elements into one row each.
2. (a) Get the b1 values from the array elements, then (b) reaggregate these values into a new JSON array.
If you are using Postgres 12 or later, you can use a JSON path query:
select jsonb_path_query_array(the_column, '$.b[*].b1')
from the_table;

Json Array element extract from json file bigquery

I have a JSON file like the one below and am having trouble extracting values from it in BigQuery:
{
'c_fields': [
{
'id': 34605,
'value': None,
}
]
}
Expected output:
id value
34605 null
How can I extract the id and value fields?
If your JSON is in a table column, you can use BigQuery's JSON functions to do this (see the BigQuery documentation for details).
It could look something like this:
WITH test_table AS (
SELECT '{"c_fields":[{"id":34605,"value":"None"},{"id":34606,"value":"32"}]}' AS json_field
)
SELECT JSON_EXTRACT(json_value, '$.id') AS id, JSON_EXTRACT(json_value, '$.value') AS value
FROM test_table, UNNEST(JSON_EXTRACT_ARRAY(json_field, '$.c_fields')) AS json_value
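One thing worth checking first: the sample in the question uses single quotes and None, which is Python literal syntax rather than valid JSON, so JSON functions may not parse it as shown. Assuming the file really contains text like that, a minimal Python sketch for converting it to valid JSON before loading could look like this (the function name to_valid_json is made up for the example):

```python
import ast
import json


def to_valid_json(python_literal):
    """Convert a Python-literal dict/list string into a JSON string."""
    # ast.literal_eval only evaluates literals (dicts, lists, numbers,
    # strings, None, booleans); it does not execute arbitrary code.
    value = ast.literal_eval(python_literal)
    return json.dumps(value)


sample = """{
  'c_fields': [
    {
      'id': 34605,
      'value': None,
    }
  ]
}"""

converted = to_valid_json(sample)
print(converted)

# Each element of c_fields becomes one (id, value) row.
for field in json.loads(converted)["c_fields"]:
    print(field["id"], field["value"])
```

After conversion, None becomes a JSON null and the keys are double-quoted, which is the shape the query above expects.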

Remove a known subset of nodes from JSON, in SQLAzure

Once upon a time, there was one row of data (massively simplified, the actual json data is 10KB+) thus:
ID, json
1, '{
"a1.arr": [1,2,3],
"a1.boo": true,
"a1.str": "hello",
"a1.num": 123
}'
A process was supposed to write another record with predominantly different data:
ID, json
2, '{
"a1.arr": [1,2,3], //from ID 1
"a2.arr": [4,5,6], //new (and so are all below)
"a2.boo": false,
"a2.str": "goodbye",
"a2.num": 456
}'
But due to some external error, the original set of json from ID 1 ended up also being represented in ID 2, so now the table looks like this:
ID, json
1, '{
"a1.arr": [1,2,3],
"a1.boo": true,
"a1.str": "hello",
"a1.num": 123
}'
2, '{
"a1.arr": [1,2,3],
"a1.boo": true, //extraneous
"a1.str": "hello", //extraneous
"a1.num": 123, //extraneous
"a2.arr": [4,5,6],
"a2.boo": false,
"a2.str": "goodbye",
"a2.num": 456
}'
I'd like to know if there's a way to remove the extraneous lines from the ID 2 record.
I believe that the entire JSON string from ID 1 is represented in ID 2 as a contiguous block, so string replacement could work, but there's a chance that some reordering has taken place. It also gets a bit messy with the element that is supposed to remain.
There's also a chance that some of the a1.* nodes' values have been changed slightly (I didn't do a diff), but I'm happy to use just the node names, not their values, in assessing whether a node should be removed. One of the nodes (a1.arr) should be kept in ID 2. The result set should hence look like:
ID, json
1, '{
"a1.arr": [1,2,3],
"a1.boo": true,
"a1.str": "hello",
"a1.num": 123
}'
2, '{
"a1.arr": [1,2,3],
"a2.arr": [4,5,6],
"a2.boo": false,
"a2.str": "goodbye",
"a2.num": 456
}'
I've started playing about with https://dba.stackexchange.com/questions/168303/can-sql-server-2016-extract-node-names-from-json to get the list of node names from ID 1 that I want to remove from ID 2, just not sure how I then strip them out of ID 2's JSON - presumably a deserialize, reduce and reserialize sequence?
You can follow this approach:
1. Get the keys you want to remove by running openjson on the value of the row with id = 1.
2. Use cross apply to filter out those keys in the rows with id > 1.
3. Rebuild the JSON string without the unwanted keys using STRING_AGG and group by.
This code should work:
declare @tmp table ([id] int, jsonValue nvarchar(max))
declare @source_json nvarchar(max)
insert into @tmp values
(1, '{"a1.arr":[1,2,3],"a1.boo":true,"a1.str":"hello","a1.num":123}')
,(2, '{"a1.arr":[1,2,3],"a1.boo":true,"a1.str":"hello","a1.num":123,"a2.arr":[4,5,6],"a2.boo":false,"a2.str":"goodbye","a2.num":456}')
,(3, '{"a1.arr":[1,2,3],"a1.boo":true,"a1.str":"hello","a1.num":123,"a3.arr":[4,5,6],"a3.boo":false,"a3.str":"goodbye","a3.num":456}')
--get json string from id=1
select @source_json = jsonValue from @tmp where [id] = 1
--rebuild json string for id > 1 removing keys from id = 1 (a1.arr is kept)
--openjson type 1 = string (re-quote and escape), type 0 = null; other types are already valid JSON text
select t.[id],
'{' + STRING_AGG( '"' + STRING_ESCAPE(g.[key], 'json') + '":' +
case g.[type] when 1 then '"' + STRING_ESCAPE(g.[value], 'json') + '"' when 0 then 'null' else g.[value] end, ',') + '}' as [json]
from @tmp t cross apply openjson(jsonValue) g
where t.id > 1
and g.[key] not in (select [key] from openjson(@source_json) where [key] <> 'a1.arr')
group by t.id
Result:
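The "deserialize, reduce and reserialize" sequence mentioned in the question can also be sketched outside the database, which may be handy for verifying what the SQL should produce. This is a minimal Python illustration under that assumption; the function name strip_keys and its keep parameter are invented for the example.

```python
import json


def strip_keys(target_json, source_json, keep=()):
    """Remove from target every top-level key found in source, except those in keep."""
    # Only key names are compared, not values, as the question allows.
    unwanted = set(json.loads(source_json)) - set(keep)
    target = json.loads(target_json)
    reduced = {k: v for k, v in target.items() if k not in unwanted}
    # Compact separators keep the output close to the stored format.
    return json.dumps(reduced, separators=(",", ":"))


row1 = '{"a1.arr":[1,2,3],"a1.boo":true,"a1.str":"hello","a1.num":123}'
row2 = (
    '{"a1.arr":[1,2,3],"a1.boo":true,"a1.str":"hello","a1.num":123,'
    '"a2.arr":[4,5,6],"a2.boo":false,"a2.str":"goodbye","a2.num":456}'
)

cleaned = strip_keys(row2, row1, keep=("a1.arr",))
print(cleaned)
```

Because the values are parsed and written back as JSON, arrays, booleans and numbers keep their types instead of being turned into strings.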