Hierarchically aggregate JSON depending on value in row using PostgreSQL 10 - sql

I have a PostgreSQL 10 table that works as a "dictionary" and is structured as follows:
key                     | value
------------------------+------
style_selection_color   | ...
style_selection_weight  | ...
style_line_color        | ...
style_line_weight       | ...
...                     | ...
Now I was wondering if there is a way of building a JSON value from this table that forms a hierarchy based on the contents of "key"?
Something like:
style --> selection --> color and
style --> line --> color
Ending up with a JSON:
{
style: [
selection: {
color: "...",
weight: "..."
},
line: {
color: "...",
weight: "..."
}
]
}
Is such a feat achievable? If so, how would I go about it?
Could it be done so that regardless of what keys I have in my table it always returns the JSON properly built?
Thanks in advance

Working solution for Postgres 10 and above
I propose a generic solution that converts the key data into the text[] type so that it can be used as a path inside the standard jsonb_set() function.
But as we will iterate on the jsonb_set() function, we first need to create an aggregate function based on it:
CREATE AGGREGATE jsonb_set_agg(p text[], z jsonb, b boolean)
( sfunc = jsonb_set
, stype = jsonb
, initcond = '{}'
);
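To see what one step of this aggregate does, here is jsonb_set() applied by hand (a minimal sketch; the aggregate simply repeats this, feeding each call's result into the next call as the new state):

-- one manual step: create the key "style" in an empty object
SELECT jsonb_set('{}'::jsonb, '{style}', '{}'::jsonb, true);
-- {"style": {}}

-- a second step can then go one level deeper
SELECT jsonb_set('{"style": {}}'::jsonb, '{style,selection}', '{}'::jsonb, true);
-- {"style": {"selection": {}}}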
Then we convert the key data into text[] and automatically generate the list of paths that allows us to build the final jsonb value progressively and iteratively:
SELECT i.id
, max(i.id) OVER (PARTITION BY t.key) AS id_max
, p.path[1 : i.id] AS jsonpath
, to_jsonb(t.value) AS value
FROM mytable AS t
CROSS JOIN LATERAL string_to_array(t.key, '_') AS p(path)
CROSS JOIN LATERAL generate_series(1, array_length(p.path, 1)) AS i(id)
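To try these snippets, a minimal mytable can be set up as follows (a sketch: the values 'A' to 'D' are assumptions chosen to match the output shown further down):

CREATE TABLE mytable (key text, value text);
INSERT INTO mytable (key, value) VALUES
  ('style_selection_color' , 'A')
, ('style_selection_weight', 'B')
, ('style_line_color'      , 'C')
, ('style_line_weight'     , 'D');

For the key 'style_selection_color', the query above then produces three rows with the paths {style}, {style,selection} and {style,selection,color}; only the longest path (the row where id = id_max) will receive the actual value.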
The final query looks like this:
WITH list AS
( SELECT i.id
, max(i.id) OVER (PARTITION BY t.key) AS id_max
, p.path[1 : i.id] AS jsonpath
, to_jsonb(t.value) AS value
FROM mytable AS t
CROSS JOIN LATERAL string_to_array(t.key, '_') AS p(path)
CROSS JOIN LATERAL generate_series(1, array_length(p.path, 1)) AS i(id)
)
SELECT jsonb_set_agg( l.jsonpath
, CASE
WHEN l.id = l.id_max THEN l.value
ELSE '{}' :: jsonb
END
, true
ORDER BY l.id
)
FROM list AS l
And the result is slightly different from your expectation (the top-level json array is replaced by a json object), but it seems more logical to me:
{"style": {"line": {"color": "C"
, "weight": "D"
}
, "selection": {"color": "A"
, "weight": "B"
}
}
}
Full test result in dbfiddle.

Well, I am not sure about your Postgres version; hoping this works on yours, I tried it on version 11.
WITH dtbl as (
select split_part(tbl.col, '_', 1) as style,
split_part(tbl.col, '_', 2) as cls,
split_part(tbl.col, '_', 3) as property_name,
tbl.val
from (
select 'style_selection_color' as col, 'red' as val
union all
select 'style_selection_weight', '1rem'
union all
select 'style_line_color', 'gray'
union all
select 'style_line_weight', '200'
union all
select 'stil_line_weight', '200'
) as tbl
),
classes as (
select dtbl.style,
dtbl.cls,
(
SELECT json_object_agg(
nested_props.property_name, nested_props.val
)
FROM (
SELECT dtbl2.property_name,
dtbl2.val
FROM dtbl dtbl2
where dtbl2.style = dtbl.style
and dtbl2.cls = dtbl.cls
) AS nested_props
) AS properties
from dtbl
group by dtbl.style, dtbl.cls),
styles as (
select style
from dtbl
group by style
)
,
class_obj as (
select classes.style,
classes.cls,
json_build_object(
classes.cls, classes.properties) as cls_json
from styles
join classes on classes.style = styles.style
)
select json_build_object(
class_obj.style,
json_agg(class_obj.cls_json)
)
from class_obj
group by style
;
If you change the first part of the query to match your table and column names, this should work.
The idea is to build the JSON objects nested, but you cannot do this in one pass, as Postgres does not let you nest json_agg calls; this is why we have to use more than one query: first build the selection and line objects, then aggregate them into the style objects (see the sketch below).
Sorry for the naming, this is the best I could do.
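To illustrate that restriction: trying to do everything in one aggregate pass fails (a sketch against the dtbl CTE defined above):

-- SELECT json_agg(json_object_agg(property_name, val))
-- FROM dtbl
-- GROUP BY style;
-- ERROR:  aggregate function calls cannot be nested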
EDIT1:
This is the output of that query.
"{""stil"" : [{""line"" : [{""weight"" : ""200""}]}]}"
"{""style"" : [{""selection"" : [{""color"" : ""red""}, {""weight"" : ""1rem""}]}, {""line"" : [{""color"" : ""gray""}, {""weight"" : ""200""}]}]}"
Looking at this output, it is not exactly what you wanted: you got an array of objects for the properties :)
You wanted {"color":"red", "weight": "1rem"} but the output is
[{"color":"red"}, {"weight": "1rem"}]
EDIT2:
Well, json_object_agg is the solution, so I used it to build the property objects; now I am thinking this might be made even simpler.
This is the new output from the query.
"{""stil"" : [{""line"" : { ""weight"" : ""200"" }}]}"
"{""style"" : [{""selection"" : { ""color"" : ""red"", ""weight"" : ""1rem"" }}, {""line"" : { ""color"" : ""gray"", ""weight"" : ""200"" }}]}"

This is the trimmed-down version; json_object_agg made things a bit simpler, so I got rid of some subselects. Tested on Postgres 10.
https://www.db-fiddle.com/f/tjzNBoQ3LTbECfEWb9Nrcp/0
WITH dtbl as (
select split_part(tbl.col, '_', 1) as style,
split_part(tbl.col, '_', 2) as cls,
split_part(tbl.col, '_', 3) as property_name,
tbl.val
from (
select 'style_selection_color' as col, 'red' as val
union all
select 'style_selection_weight', '1rem'
union all
select 'style_line_color', 'gray'
union all
select 'style_line_weight', '200'
union all
select 'stil_line_weight', '200'
) as tbl
),
result as (
select dtbl.style,
dtbl.cls,
json_build_object(dtbl.cls,
(
SELECT json_object_agg(
nested_props.property_name, nested_props.val
)
FROM (
SELECT dtbl2.property_name,
dtbl2.val
FROM dtbl dtbl2
where dtbl2.style = dtbl.style
and dtbl2.cls = dtbl.cls
) AS nested_props
)) AS cls_json
from dtbl
group by dtbl.style, dtbl.cls)
select json_build_object(
result.style,
json_agg(result.cls_json)
)
from result
group by style
;
You can think of dtbl as your main table; I just added a bonus row keyed stil, similar to the other rows, to make sure the grouping is correct.
Here is the output:
{"style":
[{"line":{"color":"gray", "weight":"200"}},
{"selection":{"color":"red","weight":"1rem"}}]
}
{"stil":[{"line":{"weight":"200"}}]}

Related

How to convert an array of key values to columns in BigQuery / GoogleSQL?

I have an array in BigQuery that looks like the following:
SELECT params FROM mySource;
[
{
key: "name",
value: "apple"
},{
key: "color",
value: "red"
},{
key: "delicious",
value: "yes"
}
]
Which looks like this:
params
[{ key: "name", value: "apple" },{ key: "color", value: "red" },{ key: "delicious", value: "yes" }]
How do I change my query so that the table looks like this:
name  | color | delicious
------+-------+----------
apple | red   | yes
Currently I'm able to accomplish this with:
SELECT
(
SELECT p.value
FROM UNNEST(params) AS p
WHERE p.key = "name"
) as name,
(
SELECT p.value
FROM UNNEST(params) AS p
WHERE p.key = "color"
) as color,
(
SELECT p.value
FROM UNNEST(params) AS p
WHERE p.key = "delicious"
) as delicious
FROM mySource;
But I'm wondering if there is a way to do this without manually specifying the key name for each. We may not know all the names of the keys ahead of time.
Thanks!
Consider below approach
select * except(id) from (
select to_json_string(t) id, param.*
from mySource t, unnest(params) param
)
pivot (min(value) for key in ('name', 'color', 'delicious'))
If applied to the sample data in your question, the output is:
name  | color | delicious
apple | red   | yes
As you can see, you still need to specify the key names, but the whole query is much simpler and more manageable.
Meantime, the above query can be enhanced with EXECUTE IMMEDIATE so that the list of key names is auto-generated. I have at least a few answers using that technique, so search for it here on SO if you want (I just do not want to create duplicates here).
Here is my try, based on Mikhail's answer above.
--DDL for sample view
create or replace view sample.sampleview
as
with _data
as
(
select 1 as id,
array (
select struct(
"name" as key,
"apple" as value
)
union all
select struct(
"color" as key,
"red" as value
)
union all
select struct(
"delicious" as key,
"yes" as value
)
) as _arr
union all
select 2 as id,
array (
select struct(
"name" as key,
"orange" as value
)
union all
select struct(
"color" as key,
"orange" as value
)
union all
select struct(
"delicious" as key,
"yes" as value
)
)
)
select * from _data
Execute immediate
declare sql string;
set sql =
(
select
concat(
"select id,",
string_agg(
concat("max(if (key = '",key,"',value,NULL)) as ",key)
),
' from sample.sampleview,unnest(_arr) group by id'
)
from (
select key from
sample.sampleview,unnest(_arr)
group by key
)
);
execute immediate sql;
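For the sample view above, the statement assembled into sql should look roughly like this (a sketch; the column order depends on string_agg, which does not guarantee any particular key order):

select id,
  max(if (key = 'name', value, NULL)) as name,
  max(if (key = 'color', value, NULL)) as color,
  max(if (key = 'delicious', value, NULL)) as delicious
from sample.sampleview, unnest(_arr)
group by id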

Running nested CTEs

On the BigQuery SELECT syntax page it gives the following:
query_statement:
query_expr
query_expr:
[ WITH cte[, ...] ]
{ select | ( query_expr ) | set_operation }
[ ORDER BY expression [{ ASC | DESC }] [, ...] ]
[ LIMIT count [ OFFSET skip_rows ] ]
I understand how the (second line) select could be either:
{ select | set_operation }
But what is the ( query_expr ) in the middle for? For example, if it can refer to itself, wouldn't it make it possible to construct a lisp-like query such as:
with x as (select 1 a)
(with y as (select 2 b)
(with z as (select 3 c)
select * from x, y, z))
Actually, I just tested it and the answer is yes. If so, what would be an actual use case of the above construction where you can use ( query_expr )?
And, is there ever a case where using a nested CTE can do something that multiple CTEs cannot? (For example, the current answer is just a verbose way of writing what would more properly be written as a single WITH expression with multiple CTEs.)
The following construction might be useful to model how an ETL flow might work and to encapsulate certain 'steps' that you don't want the outer query to have access to. But this is quite a stretch...
WITH output AS (
WITH step_csv_query AS (
SELECT * FROM 'Sales.CSV'),
step_filter_csv AS (
SELECT * FROM step_csv_query WHERE Country='US'),
step_mysql_query AS (
SELECT * FROM MySQL1 LEFT OUTER JOIN MySQL2...),
step_join_queries AS (
SELECT * FROM step_filter_csv INNER JOIN step_mysql_query USING (id)
) SELECT * FROM step_join_queries -- output is last step
) SELECT * FROM output -- or whatever we want to do with the output...
This construction might be useful in the case where a CTE is referred to by subsequent CTEs.
For instance, you can use this if you want to join two tables, use expressions and query the resulting table.
with x as (select '2' as id, 'sample' as name )
(with y as ( select '2' as number, 'customer' as type)
(with z as ( select CONCAT('C00',id), name, type from x inner join y on x.id=y.number)
Select * from z))
The above query gives the following output (one row):
C002 | sample | customer
Though there are other ways to achieve the same, the above method would be much easier for debugging.
In nested CTEs, the same CTE alias can be reused, which is not possible among multiple CTEs in a single WITH clause. For example, in the following query the inner CTE overrides the outer CTEs with the same alias:
with x as (select '1')
(with x as (select '2' as id, 'sample' as name )
(with x as ( select '2' as number, 'customer' as type)
select * from x))
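Running the query above confirms this: it returns a single row with number = '2' and type = 'customer', i.e. the columns of the innermost x; the two outer definitions are shadowed.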

BigQuery - Correlated subquery unnesting array not working

I'm trying to join array elements in BigQuery but I am getting the following error message: Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
Imagine I have two mapping tables:
CREATE OR REPLACE TABLE `test.field_id_name` (
id STRING,
name STRING
) AS (
SELECT * FROM UNNEST(
[STRUCT("s1", "string1"),
STRUCT("s2", "string2"),
STRUCT("s3", "string3")]
)
);
CREATE OR REPLACE TABLE `test.field_values` (
id STRING,
name STRING
) AS (
SELECT * FROM UNNEST(
[STRUCT("v1", "val1"),
STRUCT("v2", "val2"),
STRUCT("v3", "val3")]
)
);
And I have the following as input:
CREATE OR REPLACE TABLE `test.input` AS
SELECT [
STRUCT<id STRING, value ARRAY<STRING>>("s1", ["v1"]),
STRUCT("s2", ["v1"]),
STRUCT("s3", ["v1"])
] records
UNION ALL
SELECT [
STRUCT("s1", ["v1", "v2"]),
STRUCT("s2", ["v1", "v2"]),
STRUCT("s3", ["v1", "v2"])
]
UNION ALL
SELECT [
STRUCT("s1", ["v1", "v2", "v3"]),
STRUCT("s2", ["v1", "v2", "v3"]),
STRUCT("s3", ["v1", "v2", "v3"])
]
I am trying to produce this output:
SELECT [
STRUCT<id_mapped STRING, value_mapped ARRAY<STRING>>("string1", ["val1"]),
STRUCT("string2", ["val1"]),
STRUCT("string3", ["val1"])
] records
UNION ALL
SELECT [
STRUCT("string1", ["val1", "val2"]),
STRUCT("string2", ["val1", "val2"]),
STRUCT("string3", ["val1", "val2"])
]
UNION ALL
SELECT [
STRUCT("string1", ["val1", "val2", "val3"]),
STRUCT("string2", ["val1", "val2", "val3"]),
STRUCT("string3", ["val1", "val2", "val3"])
]
However the following query is failing with the correlated subqueries error.
SELECT
ARRAY(
SELECT
STRUCT(fin.name, ARRAY(SELECT fv.name FROM UNNEST(value) v JOIN test.field_values fv ON (v = fv.id)))
FROM UNNEST(records) r
JOIN test.field_id_name fin ON (fin.id = r.id)
)
FROM test.input
Below is for BigQuery Standard SQL
#standardSQL
SELECT ARRAY_AGG(STRUCT(id AS id_mapped, val AS value_mapped)) AS records
FROM (
SELECT fin.name AS id, ARRAY_AGG(fv.name) AS val, FORMAT('%t', t) id1, FORMAT('%t', RECORD) id2
FROM `test.input` t,
UNNEST(records) record,
UNNEST(value) val
JOIN `test.field_id_name` fin ON record.id = fin.id
JOIN `test.field_values` fv ON val = fv.id
GROUP BY id, id1, id2
)
GROUP BY id1
If applied to the sample data from your question, it returns the exact output you are expecting.
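A note on the FORMAT('%t', ...) calls, since they are the trick here: %t renders any value, including a whole row or struct, as text, so id1 and id2 serve as fingerprints that let the query group the unnested elements back into their original rows. A quick way to see this (a sketch; the exact string layout is not guaranteed):

SELECT FORMAT('%t', t) AS row_fingerprint
FROM `test.input` t
LIMIT 1;
-- returns one long text rendering of the entire row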

Redshift Postgresql - How to Parse Nested JSON

I am trying to parse a JSON text using JSON_EXTRACT_PATH_TEXT() function.
JSON sample:
{
"data":[
{
"name":"ping",
"idx":0,
"cnt":27,
"min":16,
"max":33,
"avg":24.67,
"dev":5.05
},
{
"name":"late",
"idx":0,
"cnt":27,
"min":8,
"max":17,
"avg":12.59,
"dev":2.63
}
]
}
I tried JSON_EXTRACT_PATH_TEXT(event, '{"name":"late"}', 'avg') to get 'avg' where name = "late", but it returns blank.
Can anyone help, please?
Thanks
This is a rather complicated task in Redshift, which, unlike Postgres, has poor support for managing JSON and no function to unnest arrays.
Here is one way to do it using a number table; you need to populate the table with incrementing numbers starting at 0, like:
create table nums as
select 0 as i
union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9
;
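If ten numbers are not enough (the JSON arrays may hold more elements), one common Redshift workaround is to derive the numbers from any existing table that has enough rows; some_big_table below is a placeholder for such a table:

create table nums as
select row_number() over () - 1 as i
from some_big_table   -- placeholder: any table with at least as many rows as your longest array
limit 1000;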
Once the table is created, you can use it to walk the JSON array using json_extract_array_element_text(), and check its content with json_extract_path_text():
select json_extract_path_text(item, 'avg') as my_avg
from (
select json_extract_array_element_text(t.items, n.i, true) as item
from (
select json_extract_path_text(mycol, 'data', true ) as items
from mytable
) t
inner join nums n on n.i < json_array_length(t.items, true)
) t
where json_extract_path_text(item, 'name') = 'late';
In Postgres itself, you can use json_array_elements for that:
select obj->'avg'
from foo f, json_array_elements(f.event->'data') obj
where obj->>'name' = 'late';
Working example
create table foo (id int, event json);
insert into foo values (1,'{
"data":[
{
"name":"ping",
"idx":0,
"cnt":27,
"min":16,
"max":33,
"avg":24.67,
"dev":5.05
},
{
"name":"late",
"idx":0,
"cnt":27,
"min":8,
"max":17,
"avg":12.59,
"dev":2.63
}]}');
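With this sample row, the query above should return a single value:

 ?column?
----------
 12.59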

Why is Snowflake changing the order of JSON values when converting into a flattened list?

I have JSON objects stored in the table and I am trying to write a query to get the first element from that JSON.
Replication Script
create table staging.par.test_json (id int, val varchar(2000));
insert into staging.par.test_json values (1, '{"list":[{"element":"Plumber"},{"element":"Craft"},{"element":"Plumbing"},{"element":"Electrics"},{"element":"Electrical"},{"element":"Tradesperson"},{"element":"Home services"},{"element":"Housekeepings"},{"element":"Electrical Goods"}]}');
insert into staging.par.test_json values (2,'
{
"list": [
{
"element": "Wholesale jeweler"
},
{
"element": "Fashion"
},
{
"element": "Industry"
},
{
"element": "Jewelry store"
},
{
"element": "Business service"
},
{
"element": "Corporate office"
}
]
}');
with cte_get_cats AS
(
select id,
val as category_list
from staging.par.test_json
),
cats_parse AS
(
select id,
parse_json(category_list) as c
from cte_get_cats
),
distinct_cats as
(
select id,
INDEX,
UPPER(cast(value:element AS varchar)) As c
from
cats_parse,
LATERAL flatten(INPUT => c:"list")
order by 1,2
) ,
cat_array AS
(
SELECT
id,
array_agg(DISTINCT c) AS sds_categories
FROM
distinct_cats
GROUP BY 1
),
sds_cats AS
(
select id,
cast(sds_categories[0] AS varchar) as sds_primary_category
from cat_array
)
select * from sds_cats;
The stored value (categories):
{"list":[{"element":"Plumber"},{"element":"Craft"},{"element":"Plumbing"},{"element":"Electrics"},{"element":"Electrical"},{"element":"Tradesperson"},{"element":"Home services"},{"element":"Housekeepings"},{"element":"Electrical Goods"}]}
Flattening it to a list gives me:
["Plumber","Craft","Plumbing","Electrics","Electrical","Tradesperson","Home services","Housekeepings","Electrical Goods"]
Issue:
The order of this is not always the same: Snowflake sometimes changes the ordering, apparently alphabetically.
How can I make this deterministic? I do not want the order to change.
The problem is the way you're using ARRAY_AGG:
array_agg(DISTINCT c) AS sds_categories
Specifying it like that gives Snowflake no guidance on how the contents of the array should be arranged. You should not assume that the arrays will be created in the same order as their input records; they might be, but it is not guaranteed. So you probably want to do:
array_agg(DISTINCT c) within group (order by index) AS sds_categories
But that won't work: if you use DISTINCT c, the value of index for each c is unknown. Perhaps you don't need DISTINCT; then this will work:
array_agg(c) within group (order by index) AS sds_categories
If you do need DISTINCT, you need to somehow associate an index with each distinct c value. One way is to use a MIN function on index in the input. Here's the full query:
with cte_get_cats AS
(
select id,
val as category_list
from staging.par.test_json
),
cats_parse AS
(
select id,
parse_json(category_list) as c
from cte_get_cats
),
distinct_cats as
(
select id,
MIN(INDEX) AS index,
UPPER(cast(value:element AS varchar)) As c
from
cats_parse,
LATERAL flatten(INPUT => c:"list")
group by 1,3
) ,
cat_array AS
(
SELECT
id,
array_agg(c) within group (order by index) AS sds_categories
FROM
distinct_cats
GROUP BY 1
),
sds_cats AS
(
select id,
cast(sds_categories[0] AS varchar) as sds_primary_category
from cat_array
)
select * from sds_cats;
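With the two sample rows, this should return (derived from the query above; UPPER() has been applied, and the element with the lowest original index wins):

ID | SDS_PRIMARY_CATEGORY
1  | PLUMBER
2  | WHOLESALE JEWELER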