Extract all JSON keys - sql

I have a JSON column j like:
{'a': 2, 'b': {'b1': 3, 'b2': 5}}
{'c': 3, 'a': 5}
{'d': 1, 'c': 7}
How can I get all distinct (top-level) key names from Presto? I.e. I something like
select distinct foo(j)
To return
['a', 'b', 'c', 'd']
(note that in this instance I'm not too concerned with the nested keys)
Presto documentation doesn't have any function that explicitly fits the bill. The only thing that looks close is mention of JSONPath syntax, but even this seems to be inaccurate. At least one of the following should return something but all failed in Presto for me:
select json_extract(j, '$.*')
select json_extract(j, '$..*')
select json_extract(j, '$[*]')
select json_extract(j, '*')
select json_extract(j, '..*')
select json_extract(j, '$*.*')
Further, I suspect this will return the values, not the keys, from j (i.e., [2, 3, 5, 3, 5, 1, 7]).

You can
extract JSON top-level keys with map_keys(cast(json_column as map<varchar,json>))
later "flatten" the key collections using CROSS JOIN UNNEST
then you can SELECT DISTINCT to get distinct top-level keys.
Example putting this together:
presto> SELECT DISTINCT m.key
-> FROM (VALUES JSON '{"a": 2, "b": {"b1": 3, "b2": 5}}', JSON '{"c": 3, "a": 5}')
-> example_table(json_column)
-> CROSS JOIN UNNEST (map_keys(CAST(json_column AS map<varchar,json>))) AS m(key);
key
-----
a
b
c
(3 rows)

Related

Sum values in Athena table where column having key/value pair json

I have an Athena table with one column having JSON and key/value pairs.
Ex:
Select test_client, test_column from ABC;
test_client, test_column
john, {"d":13, "e":210}
mark, {"a":1,"b":10,"c":1}
john, {"e":100,"a":110,"b":10, "d":10}
mark, {"a":56,"c":11,"f":9, "e": 10}
And I need to sum the values corresponding to keys and return in some sort like the below: return o/p format doesn't matter. I want to sum it up.
john: d: 23, e:310, a:110, b:10
mark: a:57, b:10, c:12, f:9, e:10
It is a combination of a few useful functions in Trino:
WITH example_table AS
(SELECT 'john' as person, '{"d":13, "e":210}' as _json UNION ALL
SELECT 'mark', ' {"a":1,"b":10,"c":1}' UNION ALL
SELECT 'john', '{"e":100,"a":110,"b":10, "d":10}' UNION ALL
SELECT 'mark', '{"a":56,"c":11,"f":9, "e": 10}')
SELECT person, reduce(
array_agg(CAST(json_parse(_json) AS MAP(VARCHAR, INTEGER))),
MAP(ARRAY['a'],ARRAY[0]),
(s, x) -> map_zip_with(
s,x, (k, v1, v2) ->
if(v1 is null, 0, v1) +
if(v2 is null, 0, v2)
),
s -> s
)
FROM example_table
GROUP BY person
json_parse - Parses the string to a JSON object
CAST ... AS MAP... - Creates a MAP from the JSON object
array_agg - Aggregates the maps for each Person based on the group by
reduce - steps through the aggregated array and reduce it to a single map
map_zip_with - applies a function on each similar key in two maps
if(... is null ...) - puts 0 instead of null if the key is not present

JSON_Extract from list of json string

I want to extract some values for particular keys from a table with json string as below.
raw_data
...
{"label": "XXX", "lines":[{"amount":1000, "category": "A"}, {"amount":100, "category": "B"}, {"amount":10, "category": "C"}]}
...
I am expecting an outcome like
label
amount
category
XXX
[1000, 100, 10]
['A', 'B', 'C']
I am using the following sql query to achieve that
select
JSON_EXTRACT(raw_data, '$.lines[*].amount') AS amount,
JSON_EXTRACT(raw_data, '$.lines[*].category') AS category,
JSON_EXTRACT(raw_data, '$.label') AS label
from table
I can get a specific element of the list with [0] , [1] etc. But the sql code doesn't work with [*]. I am getting the following error
Invalid JSON path: '$.lines[*].amount'
Edit
I am using Presto
Json path support in Presto is very limited, so you need to do some processing manually for example with casts and array functions:
-- sample data
with dataset (raw_data) as (
values '{"label": "XXX", "lines":[{"amount":1000, "category": "A"}, {"amount":100, "category": "B"}, {"amount":10, "category": "C"}]}'
)
-- query
select label,
transform(lines, l -> l['amount']) amount,
transform(lines, l -> l['category']) category
from (
select JSON_EXTRACT(raw_data, '$.label') AS label,
cast(JSON_EXTRACT(raw_data, '$.lines') as array(map(varchar, json))) lines
from dataset
);
Output:
label
amount
category
XXX
[1000, 100, 10]
["A", "B", "C"]
In Trino json path support was vastly improved, so you can do next:
-- query
select JSON_EXTRACT(raw_data, '$.label') label,
JSON_QUERY(raw_data, 'lax $.lines[*].amount' WITH ARRAY WRAPPER) amount,
JSON_QUERY(raw_data, 'lax $.lines[*].category' WITH ARRAY WRAPPER) category
from dataset;
You can use json_table and json_arrayagg:
select json_extract(t.raw_data, '$.label'),
(select json_arrayagg(t1.v) from json_table(t.raw_data, '$.lines[*]' columns (v int path '$.amount')) t1),
(select json_arrayagg(t1.v) from json_table(t.raw_data, '$.lines[*]' columns (v text path '$.category')) t1)
from tbl t
I was able to get the expected output using unnest to flatten and array_agg to aggregate in Presto. Below is the SQL used and output generated:
WITH dataset AS (
SELECT
* from sf_73535794
)
SELECT raw_data.label,array_agg(t.lines.amount) as amount,array_agg(t.lines.category) as category FROM dataset
CROSS JOIN UNNEST(raw_data.lines) as t(lines) group by 1
Output:

how to return "sparse" json (choose a number of attributes ) from PostgreSQL

MongoDB has a way of choosing the fields of a JSON documents that are returned as a result of query. I am looking for the same with PostgreSQL.
Let's assume that I've got a JSON like this:
{
a: valuea,
b: valueb,
c: valuec,
...
z: valuez
}
The particular values may be either simple values or subobjects with further nesting.
I want to have a way of returning JSON Documents containing only the atttributes I choose, something like:
SELECT json_col including_only a,b,c,g,n from table where...
I know that there is the "-" operator, allowing eliminating specific attributes, but is there an operator that does exactly the opposite?
In trivial cases you can use jsonb_to_record(jsonb)
with data(json_col) as (
values
('{"a": 1, "b": 2, "c": 3, "d": 4}'::jsonb)
)
select *, to_jsonb(rec) as result
from data
cross join jsonb_to_record(json_col) as rec(a int, d int)
json_col | a | d | result
----------------------------------+---+---+------------------
{"a": 1, "b": 2, "c": 3, "d": 4} | 1 | 4 | {"a": 1, "d": 4}
(1 row)
See JSON Functions and Operators.
If you need a more generic tool, the function does the job:
create or replace function jsonb_sparse(jsonb, text[])
returns jsonb language sql immutable as $$
select $1 - (
select array_agg(key)
from jsonb_object_keys($1) as key
where key <> all($2)
)
$$;
-- use:
select jsonb_sparse('{"a": 1, "b": 2, "c": 3, "d": 4}', '{a, d}')
Test it in db<>fiddle.

PostgreSQL: exclude complete jsonb array if one element fails the WHERE clause

Assume a table json_table with columns id (int), data (jsonb).
A sample jsonb value would be
{"a": [{"b":{"c": "xxx", "d": 1}},{"b":{"c": "xxx", "d": 2}}]}
When I use an SQL statement like the following:
SELECT data FROM json_table j, jsonb_array_elements(j.data#>'{a}') dt WHERE (dt#>>'{b,d}')::integer NOT IN (2,4,6,9) GROUP BY id;
... the two array elements are unnested and the one that qualifies the WHERE clause is still returned. This makes sense since each array element is considered individually. In this example I will get back the complete row
{"a": [{"b":{"c": "xxx", "d": 1}},{"b":{"c": "xxx", "d": 2}}]}
I'm looking for a way to exclude the complete json_table row when any jsonb array element fails the condition
You can move the condition to the WHERE clause and use NOT EXISTS:
SELECT data
FROM json_table j
WHERE NOT EXISTS (SELECT 1
FROM jsonb_array_elements(j.data#>'{a}') dt
WHERE (dt#>>'{b,d}')::integer IN (2, 4, 6, 9)
);
You can achieve it with the following query:
select data
from json_table
where jsonb_path_match(data, '!exists($.a[*].b.d ? ( # == 2 || # == 4 || # == 6 || # == 9))')

Why is this code giving me error (postgresql JSONB )?

SELECT *
FROM goods
WHERE jsonb_exists_any(params->'sex', array[1, 2, 3, 4, 5])
ERROR: function jsonb_exists_any(jsonb, integer[]) does not exist
LINE 1: SELECT * FROM goods WHERE jsonb_exists_any(params->'sex',
ar...
Call functions that exist, jsonb_exists_any does not exist. Why did you think jsonb_exists_any existed? Was it just a typo?
SELECT *
FROM goods
WHERE jsonb_exists_any(params->'sex', array[1, 2, 3, 4, 5])
Find the functions that exist in the latest version here
I'm guessing you want this..
SELECT *
FROM goods
WHERE params->'sex' = ANY(ARRAY[1, 2, 3, 4, 5]);