BigQuery - Correlated subquery unnesting array not working

I'm trying to join array elements in BigQuery but I am getting the following error message: Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
Imagine I have two mapping tables:
CREATE OR REPLACE TABLE `test.field_id_name` (
id STRING,
name STRING
) AS (
SELECT * FROM UNNEST(
[STRUCT("s1", "string1"),
STRUCT("s2", "string2"),
STRUCT("s3", "string3")]
)
)
CREATE OR REPLACE TABLE `test.field_values` (
id STRING,
name STRING
) AS (
SELECT * FROM UNNEST(
[STRUCT("v1", "val1"),
STRUCT("v2", "val2"),
STRUCT("v3", "val3")]
)
)
And I have the following as input:
CREATE OR REPLACE TABLE `test.input` AS
SELECT [
STRUCT<id STRING, value ARRAY<STRING>>("s1", ["v1"]),
STRUCT("s2", ["v1"]),
STRUCT("s3", ["v1"])
] records
UNION ALL
SELECT [
STRUCT("s1", ["v1", "v2"]),
STRUCT("s2", ["v1", "v2"]),
STRUCT("s3", ["v1", "v2"])
]
UNION ALL
SELECT [
STRUCT("s1", ["v1", "v2", "v3"]),
STRUCT("s2", ["v1", "v2", "v3"]),
STRUCT("s3", ["v1", "v2", "v3"])
]
I am trying to produce this output:
SELECT [
STRUCT<id_mapped STRING, value_mapped ARRAY<STRING>>("string1", ["val1"]),
STRUCT("string2", ["val1"]),
STRUCT("string3", ["val1"])
] records
UNION ALL
SELECT [
STRUCT("string1", ["val1", "val2"]),
STRUCT("string2", ["val1", "val2"]),
STRUCT("string3", ["val1", "val2"])
]
UNION ALL
SELECT [
STRUCT("string1", ["val1", "val2", "val3"]),
STRUCT("string2", ["val1", "val2", "val3"]),
STRUCT("string3", ["val1", "val2", "val3"])
]
However the following query is failing with the correlated subqueries error.
SELECT
ARRAY(
SELECT
STRUCT(fin.name, ARRAY(SELECT fv.name FROM UNNEST(value) v JOIN test.field_values fv ON (v = fv.id)))
FROM UNNEST(records) r
JOIN test.field_id_name fin ON (fin.id = r.id)
)
FROM test.input

Below is for BigQuery Standard SQL
#standardSQL
SELECT ARRAY_AGG(STRUCT(id AS id_mapped, val AS value_mapped)) AS records
FROM (
SELECT fin.name AS id, ARRAY_AGG(fv.name) AS val, FORMAT('%t', t) id1, FORMAT('%t', RECORD) id2
FROM `test.input` t,
UNNEST(records) record,
UNNEST(value) val
JOIN `test.field_id_name` fin ON record.id = fin.id
JOIN `test.field_values` fv ON val = fv.id
GROUP BY id, id1, id2
)
GROUP BY id1
If applied to the sample data from your question, this returns exactly the output you're expecting.
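A different way around the correlated-subquery restriction (a sketch only, not part of the answer above, and untested) is to pre-aggregate the two small mapping tables into arrays once; the ARRAY() subqueries then reference only those arrays and the current row, so no correlated table reference is left:
#standardSQL
WITH maps AS (
  SELECT
    (SELECT ARRAY_AGG(STRUCT(id, name)) FROM `test.field_id_name`) AS fin,
    (SELECT ARRAY_AGG(STRUCT(id, name)) FROM `test.field_values`) AS fv
)
SELECT
  ARRAY(
    SELECT AS STRUCT
      (SELECT name FROM UNNEST(m.fin) WHERE id = r.id) AS id_mapped,
      ARRAY(
        SELECT fv.name
        FROM UNNEST(r.value) v
        JOIN UNNEST(m.fv) fv ON v = fv.id
      ) AS value_mapped
    FROM UNNEST(t.records) r
  ) AS records
FROM `test.input` t
CROSS JOIN maps m
This assumes both mapping tables are small enough to hold as in-memory arrays.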

Related

Joining tables and creating a json out of the joined information

Is there a way to join two tables with a single query to the DB so that the matching records from one table end up as an array value in a 'new' column of the other table?
(It's clear how to do it with two queries, one per table, and then processing the results in code, but is there a way to use just one SELECT that joins the tables "during" the query?)
So, here is a simple example:
Table 1:
id | value
1  | v1
2  | v2
Table 2:
id | id_t1 | value
1  | 1     | v3
2  | 1     | v4
3  | 2     | v5
Selecting all the values from Table 1 joined with Table 2 should produce the following array of objects (to make the example more general, id_t1 from Table 2 is filtered out of the joined results):
[
{
id: 1,
value: v1,
newColumnForJoinedValuesFromTable2: [ { id: 1, value: v3 }, { id: 2, value: v4} ]
},
{
id: 2,
value: v2,
newColumnForJoinedValuesFromTable2: [ { id: 3, value: v5 } ]
}
]
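For reference, the sample tables can be created like this (a sketch only; the demos in the answers below use slightly different names such as tab1/tab2 with id_/value_, so adjust accordingly):
CREATE TABLE t1 (id int, value text);
CREATE TABLE t2 (id int, id_t1 int, value text);
INSERT INTO t1 VALUES (1, 'v1'), (2, 'v2');
INSERT INTO t2 VALUES (1, 1, 'v3'), (2, 1, 'v4'), (3, 2, 'v5');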
You can achieve your JSON by stacking the following two functions twice (once per nesting level):
JSON_BUILD_OBJECT, to build your JSON objects from <key, value> pairs
JSON_AGG, to aggregate them into arrays
WITH tab2_agg AS (
SELECT id_t1,
JSON_AGG(
JSON_BUILD_OBJECT('id' , id_,
'value', value_)
) AS tab2_json
FROM tab2
GROUP BY id_t1
)
SELECT JSON_AGG(
JSON_BUILD_OBJECT('id' , id_,
'value' , value_,
'newColumnForJoinedValuesFromTable2', tab2_json)
) AS your_json
FROM tab1
INNER JOIN tab2_agg
ON tab1.id_ = tab2_agg.id_t1
Check the demo here.
Use json_agg(json_build_object(...)) and group by.
select json_agg(to_json(t)) as json_result from
(
select t1.id, t1.value,
json_agg(json_build_object('id',t2.id,'value',t2.value)) as "JoinedValues"
from t1 join t2 on t2.id_t1 = t1.id
group by t1.id, t1.value
) as t;
See demo

Hierarchically aggregate JSON depending on value in row using PostgreSQL10

I have a PostgreSQL 10 table that works as a "dictionary" and is structured as follows:
key                    | value
style_selection_color  |
style_selection_weight |
style_line_color       |
style_line_weight      |
...
Now I was wondering if there is a way of building a JSON from the values in the table, where the hierarchy is derived from the value of "key"?
Something like:
style --> selection --> color and
style --> line --> color
Ending up with a JSON:
{
style: [
selection: {
color: "...",
weight: "..."
},
line: {
color: "...",
weight: "..."
}
]
}
Is such a feat achievable? If so, how would I go about it?
Could it be done so that, regardless of what keys I have in my table, it always returns a properly built JSON?
Thanks in advance
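For reference, a sample table the answers below can run against might look like this (a sketch; the A-D values are placeholders chosen only to match the result shown in the first answer, since the question lists just the keys):
CREATE TABLE mytable (key text, value text);
INSERT INTO mytable VALUES
  ('style_selection_color', 'A'),
  ('style_selection_weight', 'B'),
  ('style_line_color', 'C'),
  ('style_line_weight', 'D');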
Working solution for Postgres 10 and above
I propose a generic solution which converts the key data into the text[] type, so that it can be used as the path argument of the standard jsonb_set() function.
But as we will iterate jsonb_set() over the rows, we first need to create an aggregate function based on it:
CREATE AGGREGATE jsonb_set_agg(p text[], z jsonb, b boolean)
( sfunc = jsonb_set
, stype = jsonb
, initcond = '{}'
)
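To see what this aggregate does, here is how three successive jsonb_set() calls fold one key into nested objects (a worked illustration, not part of the original answer):
SELECT jsonb_set('{}'::jsonb, '{style}', '{}'::jsonb, true);
-- {"style": {}}
SELECT jsonb_set('{"style": {}}'::jsonb, '{style,selection}', '{}'::jsonb, true);
-- {"style": {"selection": {}}}
SELECT jsonb_set('{"style": {"selection": {}}}'::jsonb, '{style,selection,color}', '"A"'::jsonb, true);
-- {"style": {"selection": {"color": "A"}}}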
Then we convert the key data into text[] and automatically generate the list of paths that allow the final jsonb data to be built progressively and iteratively:
SELECT i.id
, max(i.id) OVER (PARTITION BY t.key) AS id_max
, p.path[1 : i.id] AS jsonpath
, to_jsonb(t.value) AS value
FROM mytable AS t
CROSS JOIN LATERAL string_to_array(t.key, '_') AS p(path)
CROSS JOIN LATERAL generate_series(1, array_length(p.path, 1)) AS i(id)
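For instance, with the key 'style_selection_color' (value 'A'), this intermediate query yields rows like the following (illustration only):
id | id_max | jsonpath                | value
1  | 3      | {style}                 | "A"
2  | 3      | {style,selection}       | "A"
3  | 3      | {style,selection,color} | "A"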
The final query looks like this:
WITH list AS
( SELECT i.id
, max(i.id) OVER (PARTITION BY t.key) AS id_max
, p.path[1 : i.id] AS jsonpath
, to_jsonb(t.value) AS value
FROM mytable AS t
CROSS JOIN LATERAL string_to_array(t.key, '_') AS p(path)
CROSS JOIN LATERAL generate_series(1, array_length(p.path, 1)) AS i(id)
)
SELECT jsonb_set_agg( l.jsonpath
, CASE
WHEN l.id = l.id_max THEN l.value
ELSE '{}' :: jsonb
END
, true
ORDER BY l.id
)
FROM list AS l
And the result is slightly different from your expectation (the top-level json array is replaced by a json object), but that seems more logical to me:
{"style": {"line": {"color": "C"
, "weight": "D"
}
, "selection": {"color": "A"
, "weight": "B"
}
}
}
full test result in dbfiddle.
Well, I am not sure about your Postgres version; I tried this on version 11 and hope it works on yours.
;WITH dtbl as (
select split_part(tbl.col, '_', 1) as style,
split_part(tbl.col, '_', 2) as cls,
split_part(tbl.col, '_', 3) as property_name,
tbl.val
from (
select 'style_selection_color' as col, 'red' as val
union all
select 'style_selection_weight', '1rem'
union all
select 'style_line_color', 'gray'
union all
select 'style_line_weight', '200'
union all
select 'stil_line_weight', '200'
) as tbl
),
classes as (
select dtbl.style,
dtbl.cls,
(
SELECT json_object_agg(
nested_props.property_name, nested_props.val
)
FROM (
SELECT dtbl2.property_name,
dtbl2.val
FROM dtbl dtbl2
where dtbl2.style = dtbl.style
and dtbl2.cls = dtbl.cls
) AS nested_props
) AS properties
from dtbl
group by dtbl.style, dtbl.cls),
styles as (
select style
from dtbl
group by style
)
,
class_obj as (
select classes.style,
classes.cls,
json_build_object(
classes.cls, classes.properties) as cls_json
from styles
join classes on classes.style = styles.style
)
select json_build_object(
class_obj.style,
json_agg(class_obj.cls_json)
)
from class_obj
group by style
;
If you change the first part of the query to match your table and column names, this should work.
The idea is to build the JSON objects nested, but you cannot do this in one pass, as Postgres does not let you nest json_agg functions; this is why we need more than one query: first build the line and selection objects, then aggregate them into the style objects.
Sorry for the naming, this is the best I could do.
EDIT1:
This is the output of that query.
"{""stil"" : [{""line"" : [{""weight"" : ""200""}]}]}"
"{""style"" : [{""selection"" : [{""color"" : ""red""}, {""weight"" : ""1rem""}]}, {""line"" : [{""color"" : ""gray""}, {""weight"" : ""200""}]}]}"
Looking at this output, it is not exactly what you wanted: you got an array of objects for the properties.
You wanted {"color":"red", "weight": "1rem"} but the output is
[{"color":"red"}, {"weight": "1rem"}]
EDIT2:
Well, json_object_agg is the solution, so I used json_object_agg to build the property objects; now I am thinking this could be made even simpler.
This is the new output from the query.
"{""stil"" : [{""line"" : { ""weight"" : ""200"" }}]}"
"{""style"" : [{""selection"" : { ""color"" : ""red"", ""weight"" : ""1rem"" }}, {""line"" : { ""color"" : ""gray"", ""weight"" : ""200"" }}]}"
This is a trimmed-down version; since json_object_agg made things a bit simpler, I got rid of some subselects. Tested on Postgres 10.
https://www.db-fiddle.com/f/tjzNBoQ3LTbECfEWb9Nrcp/0
;
WITH dtbl as (
select split_part(tbl.col, '_', 1) as style,
split_part(tbl.col, '_', 2) as cls,
split_part(tbl.col, '_', 3) as property_name,
tbl.val
from (
select 'style_selection_color' as col, 'red' as val
union all
select 'style_selection_weight', '1rem'
union all
select 'style_line_color', 'gray'
union all
select 'style_line_weight', '200'
union all
select 'stil_line_weight', '200'
) as tbl
),
result as (
select dtbl.style,
dtbl.cls,
json_build_object(dtbl.cls,
(
SELECT json_object_agg(
nested_props.property_name, nested_props.val
)
FROM (
SELECT dtbl2.property_name,
dtbl2.val
FROM dtbl dtbl2
where dtbl2.style = dtbl.style
and dtbl2.cls = dtbl.cls
) AS nested_props
)) AS cls_json
from dtbl
group by dtbl.style, dtbl.cls)
select json_build_object(
result.style,
json_agg(result.cls_json)
)
from result
group by style
;
You can think of dtbl as your main table; I just added a bonus row called stil, similar to the other rows, to make sure the grouping is correct.
Here is the output:
{"style":
[{"line":{"color":"gray", "weight":"200"}},
{"selection":{"color":"red","weight":"1rem"}}]
}
{"stil":[{"line":{"weight":"200"}}]}

How to convert an array of key values to columns in BigQuery / GoogleSQL?

I have an array in BigQuery that looks like the following:
SELECT params FROM mySource;
[
{
key: "name",
value: "apple"
},{
key: "color",
value: "red"
},{
key: "delicious",
value: "yes"
}
]
Which looks like this:
params
[{ key: "name", value: "apple" },{ key: "color", value: "red" },{ key: "delicious", value: "yes" }]
How do I change my query so that the table looks like this:
name  | color | delicious
apple | red   | yes
Currently I'm able to accomplish this with:
SELECT
(
SELECT p.value
FROM UNNEST(params) AS p
WHERE p.key = "name"
) as name,
(
SELECT p.value
FROM UNNEST(params) AS p
WHERE p.key = "color"
) as color,
(
SELECT p.value
FROM UNNEST(params) AS p
WHERE p.key = "delicious"
) as delicious
FROM mySource;
But I'm wondering if there is a way to do this without manually specifying the key name for each. We may not know all the names of the keys ahead of time.
Thanks!
Consider the approach below
select * except(id) from (
select to_json_string(t) id, param.*
from mySource t, unnest(params) param
)
pivot (min(value) for key in ('name', 'color', 'delicious'))
If applied to the sample data in your question, the output is the table you are after.
As you can see, you still need to specify the key names, but the whole query is much simpler and more manageable.
Meanwhile, the above query can be enhanced with EXECUTE IMMEDIATE so that the list of key names is auto-generated. I have at least a few answers using that technique, so search for it here on SO if you want (I just do not want to create duplicates here).
Here is my attempt, based on Mikhail's answer here.
--DDL for sample view
create or replace view sample.sampleview
as
with _data
as
(
select 1 as id,
array (
select struct(
"name" as key,
"apple" as value
)
union all
select struct(
"color" as key,
"red" as value
)
union all
select struct(
"delicious" as key,
"yes" as value
)
) as _arr
union all
select 2 as id,
array (
select struct(
"name" as key,
"orange" as value
)
union all
select struct(
"color" as key,
"orange" as value
)
union all
select struct(
"delicious" as key,
"yes" as value
)
)
)
select * from _data
Execute immediate
declare sql string;
set sql =
(
select
concat(
"select id,",
string_agg(
concat("max(if (key = '",key,"',value,NULL)) as ",key)
),
' from sample.sampleview,unnest(_arr) group by id'
)
from (
select key from
sample.sampleview,unnest(_arr)
group by key
)
);
execute immediate sql;
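For the sample view above, the generated statement should look roughly like this (illustration only; the key order depends on the order in which the GROUP BY returns the keys):
select id,
  max(if (key = 'name', value, NULL)) as name,
  max(if (key = 'color', value, NULL)) as color,
  max(if (key = 'delicious', value, NULL)) as delicious
from sample.sampleview, unnest(_arr)
group by id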

recursive tree as a list (array items) in a new attribute value child

How can I get hierarchy data (a recursive tree) as a new column like below? (If a node has no children, its child array should be empty.)
rows: [{
'id' : 1,
'parent_id': null,
'name': a
'child': [{
'id' : 2,
'parent_id': 1,
'name': a1,
'child': ...
}]
}]
WITH RECURSIVE t AS (
SELECT t.id AS id FROM category AS t WHERE parent_id is null
UNION ALL
SELECT child.id FROM category AS child JOIN t ON t.id = child.parent_id
)
SELECT * FROM category WHERE id IN (SELECT * FROM t);
https://www.db-fiddle.com/f/ufnG1WpBX4Z8jsBEg6bsLs/4
UPDATE
Because I am doing this with node-postgres, which already returns JSON, I made another version using row_to_json to make my question easier to understand.
WITH RECURSIVE t AS (
SELECT t.id AS id FROM category AS t WHERE parent_id = 1
UNION ALL
SELECT child.id FROM category AS child JOIN t ON t.id = child.parent_id
)
SELECT row_to_json(row) FROM (
SELECT * FROM category WHERE id IN (SELECT * FROM t)
) row;
https://www.db-fiddle.com/f/ufnG1WpBX4Z8jsBEg6bsLs/5
It returns data like below:
[
{"id":3,"parent_id":1,"name":"a1"},
{"id":4,"parent_id":3,"name":"a2"}
]
expected output
{"id":3,"parent_id":1,"name":"a1", "child": [{"id":4,"parent_id":3,"name":"a2"}]}
SELECT row_to_json(row) FROM (
SELECT * FROM category WHERE id IN (SELECT * FROM t::json as something)
) row;
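The nesting itself is still open in this thread; one possible approach (a sketch only, not taken from any answer above, assuming the category(id, parent_id, name) table from the fiddle) is a recursive function that aggregates each level's children into a json array:
CREATE OR REPLACE FUNCTION category_tree(p_parent_id int)
RETURNS json
LANGUAGE plpgsql STABLE AS $$
BEGIN
  RETURN (
    SELECT coalesce(json_agg(row_to_json(sub)), '[]'::json)
    FROM (
      SELECT c.id, c.parent_id, c.name,
             category_tree(c.id) AS child  -- recurse into the children of this node
      FROM category c
      WHERE c.parent_id IS NOT DISTINCT FROM p_parent_id
    ) sub
  );
END;
$$;
SELECT category_tree(NULL) AS rows;  -- whole tree, starting from the roots
Leaves get an empty child array thanks to the coalesce, matching the expected shape above.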

Why is Snowflake changing the order of JSON values when converting into a flattened list?

I have JSON objects stored in the table and I am trying to write a query to get the first element from that JSON.
Replication Script
create table staging.par.test_json (id int, val varchar(2000));
insert into staging.par.test_json values (1, '{"list":[{"element":"Plumber"},{"element":"Craft"},{"element":"Plumbing"},{"element":"Electrics"},{"element":"Electrical"},{"element":"Tradesperson"},{"element":"Home services"},{"element":"Housekeepings"},{"element":"Electrical Goods"}]}');
insert into staging.par.test_json values (2,'
{
"list": [
{
"element": "Wholesale jeweler"
},
{
"element": "Fashion"
},
{
"element": "Industry"
},
{
"element": "Jewelry store"
},
{
"element": "Business service"
},
{
"element": "Corporate office"
}
]
}');
with cte_get_cats AS
(
select id,
val as category_list
from staging.par.test_json
),
cats_parse AS
(
select id,
parse_json(category_list) as c
from cte_get_cats
),
distinct_cats as
(
select id,
INDEX,
UPPER(cast(value:element AS varchar)) As c
from
cats_parse,
LATERAL flatten(INPUT => c:"list")
order by 1,2
) ,
cat_array AS
(
SELECT
id,
array_agg(DISTINCT c) AS sds_categories
FROM
distinct_cats
GROUP BY 1
),
sds_cats AS
(
select id,
cast(sds_categories[0] AS varchar) as sds_primary_category
from cat_array
)
select * from sds_cats;
Values: Categories
{"list":[{"element":"Plumber"},{"element":"Craft"},{"element":"Plumbing"},{"element":"Electrics"},{"element":"Electrical"},{"element":"Tradesperson"},{"element":"Home services"},{"element":"Housekeepings"},{"element":"Electrical Goods"}]}
Flattening it to a list gives me
["Plumber","Craft","Plumbing","Electrics","Electrical","Tradesperson","Home services","Housekeepings","Electrical Goods"]
Issue:
The order of this is not always the same. Snowflake seems to change the ordering; sometimes it reorders the elements alphabetically.
How can I make this stable? I do not want the order to be changed.
The problem is the way you're using ARRAY_AGG:
array_agg(DISTINCT c) AS sds_categories
Specifying it like that gives Snowflake no guidance on how the contents of the array should be arranged. You should not assume that the arrays will be created in the same order as their input records; they might be, but it's not guaranteed. So you probably want to do
array_agg(DISTINCT c) within group (order by index) AS sds_categories
But that won't work: if you use DISTINCT c, the value of index for each c is unknown. Perhaps you don't need DISTINCT, in which case this will work
array_agg(c) within group (order by index) AS sds_categories
If you do need DISTINCT, you need to somehow associate an index with a distinct c value. One way is to use a MIN function on index in the input. Here's a full query
with cte_get_cats AS
(
select id,
val as category_list
from staging.par.test_json
),
cats_parse AS
(
select id,
parse_json(category_list) as c
from cte_get_cats
),
distinct_cats as
(
select id,
MIN(INDEX) AS index,
UPPER(cast(value:element AS varchar)) As c
from
cats_parse,
LATERAL flatten(INPUT => c:"list")
group by 1,3
) ,
cat_array AS
(
SELECT
id,
array_agg(c) within group (order by index) AS sds_categories
FROM
distinct_cats
GROUP BY 1
),
sds_cats AS
(
select id,
cast(sds_categories[0] AS varchar) as sds_primary_category
from cat_array
)
select * from cat_array;
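With this fix, the first sample row should consistently return sds_categories in the original (uppercased) order, e.g. ["PLUMBER","CRAFT","PLUMBING","ELECTRICS","ELECTRICAL","TRADESPERSON","HOME SERVICES","HOUSEKEEPINGS","ELECTRICAL GOODS"], so sds_primary_category in the final CTE would be PLUMBER.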