Is there a way to join two tables in a single query to the DB so that, for each row of one table, the matching records from the other table are put as an array value in a 'new' column?
(It's clear how to do it with two queries and processing the results in code, but is there a way to use only one SELECT that joins the tables "during" the query?)
So, here is a simple example:
Table 1:
id | value
1  | v1
2  | v2
Table 2:
id | id_t1 | value
1  | 1     | v3
2  | 1     | v4
3  | 2     | v5
Selecting all the values from Table 1 joined with Table 2 should produce the following array of objects (to keep the example general, id_t1 from Table 2 is filtered out of the joined results):
[
{
id: 1,
value: v1,
newColumnForJoinedValuesFromTable2: [ { id: 1, value: v3 }, { id: 2, value: v4} ]
},
{
id: 2,
value: v2,
newColumnForJoinedValuesFromTable2: [ { id: 3, value: v5 } ]
}
]
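For contrast, the two-query approach the question mentions (fetch parents, fetch children, stitch them together in code) can be sketched in Python with an in-memory sqlite3 database; the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, value TEXT);
    CREATE TABLE t2 (id INTEGER, id_t1 INTEGER, value TEXT);
    INSERT INTO t1 VALUES (1, 'v1'), (2, 'v2');
    INSERT INTO t2 VALUES (1, 1, 'v3'), (2, 1, 'v4'), (3, 2, 'v5');
""")

# Query 1: the parent rows, each starting with an empty child array.
parents = [{"id": i, "value": v, "children": []}
           for i, v in conn.execute("SELECT id, value FROM t1 ORDER BY id")]
by_id = {p["id"]: p for p in parents}

# Query 2: the child rows, appended to their parent in application code.
for cid, id_t1, v in conn.execute("SELECT id, id_t1, value FROM t2 ORDER BY id"):
    by_id[id_t1]["children"].append({"id": cid, "value": v})

print(parents)
```

The single-SELECT answers below push exactly this grouping step into the database.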
You can achieve your JSON by stacking the following two functions twice:
JSON_BUILD_OBJECT, to build your JSON objects from <key, value> pairs
JSON_AGG, to aggregate rows into arrays
WITH tab2_agg AS (
SELECT id_t1,
JSON_AGG(
JSON_BUILD_OBJECT('id' , id_,
'value', value_)
) AS tab2_json
FROM tab2
GROUP BY id_t1
)
SELECT JSON_AGG(
JSON_BUILD_OBJECT('id'  , tab1.id_,
'value' , tab1.value_,
'newColumnForJoinedValuesFromTable2', tab2_json)
) AS your_json
FROM tab1
INNER JOIN tab2_agg
ON tab1.id_ = tab2_agg.id_t1
Check the demo here.
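The query above is Postgres-specific, but SQLite ships analogous functions (json_group_array, json_object), so the same two-level aggregation can be verified locally. A sketch, assuming a SQLite build with the JSON functions enabled; table and column names follow the answer:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (id_ INTEGER, value_ TEXT);
    CREATE TABLE tab2 (id_ INTEGER, id_t1 INTEGER, value_ TEXT);
    INSERT INTO tab1 VALUES (1, 'v1'), (2, 'v2');
    INSERT INTO tab2 VALUES (1, 1, 'v3'), (2, 1, 'v4'), (3, 2, 'v5');
""")

row = conn.execute("""
    WITH tab2_agg AS (
        SELECT id_t1,
               json_group_array(json_object('id', id_, 'value', value_)) AS tab2_json
        FROM tab2
        GROUP BY id_t1
    )
    SELECT json_group_array(
               json_object('id', tab1.id_,
                           'value', tab1.value_,
                           -- json() keeps the nested array as JSON, not a quoted string
                           'newColumnForJoinedValuesFromTable2', json(tab2_json)))
    FROM tab1
    JOIN tab2_agg ON tab1.id_ = tab2_agg.id_t1
""").fetchone()

result = json.loads(row[0])
print(result)
```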
Use json_agg(json_build_object(...)) and group by.
select json_agg(to_json(t)) as json_result from
(
select t1.id, t1.value,
json_agg(json_build_object('id',t2.id,'value',t2.value)) as "JoinedValues"
from t1 join t2 on t2.id_t1 = t1.id
group by t1.id, t1.value
) as t;
See demo
Related
I have the following query against a BigQuery instance:
CREATE TABLE my_dataset.PRODUCT AS (
SELECT "1,2,3" AS PRODUCT_DESCRIPTION_IDS UNION ALL
SELECT "2,3" AS PRODUCT_DESCRIPTION_IDS UNION ALL
SELECT "1" AS PRODUCT_DESCRIPTION_IDS
);
CREATE TABLE my_dataset.DESCRIPTION AS (
SELECT "1" AS DESCRIPTION_ID, "VALUE1" AS DESCRIPTION_VALUE UNION ALL
SELECT "2" AS DESCRIPTION_ID, "VALUE2" AS DESCRIPTION_VALUE UNION ALL
SELECT "3" AS DESCRIPTION_ID, "VALUE3" AS DESCRIPTION_VALUE
);
SELECT
FORMAT('%T', ARRAY_AGG(ELEMENT)) AS desc_ids,
FORMAT('%T', ARRAY_AGG((SELECT DESCRIPTION_VALUE FROM my_dataset.DESCRIPTION WHERE DESCRIPTION_ID = ELEMENT))) AS desc_values,
FROM UNNEST((
SELECT
SPLIT(PRODUCT_DESCRIPTION_IDS, ',') as arr
FROM my_dataset.PRODUCT
limit 1
)) AS ELEMENT
It executes fine but only when I have limit 1 specified, otherwise I receive an exception:
Scalar subquery produced more than one element
How should I update my query so that it returns all of the resulting rows, not just one?
Consider below
select
format('%T', array_agg(ELEMENT)) as desc_ids,
format('%T', array_agg(DESCRIPTION_VALUE)) as desc_values
from PRODUCT t, unnest(split(PRODUCT_DESCRIPTION_IDS)) as ELEMENT
left join DESCRIPTION
on ELEMENT = DESCRIPTION_ID
group by format('%T',t)
If applied to the sample data in your question, it returns the expected output.
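The core of the fix, unnesting each product's comma-separated ID list and looking every element up in DESCRIPTION rather than running one scalar subquery per product, can be mirrored in plain Python over the question's sample data:

```python
products = ["1,2,3", "2,3", "1"]
description = {"1": "VALUE1", "2": "VALUE2", "3": "VALUE3"}

rows = []
for ids in products:
    elements = ids.split(",")                        # UNNEST(SPLIT(...))
    values = [description.get(e) for e in elements]  # LEFT JOIN on ELEMENT = DESCRIPTION_ID
    rows.append((elements, values))

print(rows)
```

Each product yields one output row, which is exactly what grouping per product row achieves in the SQL above.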
Here's my json doc:
[
{
"ID":1,
"Label":"Price",
"Value":399
},
{
"ID":2,
"Label":"Company",
"Value":"Apple"
},
{
"ID":3,
"Label":"Model",
"Value":"iPhone SE"
}
]
Here's my table:
+----+------------------------------------------------------------------------------------------------------------------------------------+
| ID | Properties |
+----+------------------------------------------------------------------------------------------------------------------------------------+
| 1 | [{"ID":1,"Label":"Price","Value":399},{"ID":2,"Label":"Company","Value":"Apple"},{"ID":3,"Label":"Model","Value":"iPhone SE"}] |
| 2 | [{"ID":1,"Label":"Price","Value":499},{"ID":2,"Label":"Company","Value":"Apple"},{"ID":3,"Label":"Model","Value":"iPhone X"}] |
| 3 | [{"ID":1,"Label":"Price","Value":699},{"ID":2,"Label":"Company","Value":"Apple"},{"ID":3,"Label":"Model","Value":"iPhone 11"}] |
| 4 | [{"ID":1,"Label":"Price","Value":999},{"ID":2,"Label":"Company","Value":"Apple"},{"ID":3,"Label":"Model","Value":"iPhone 11 Pro"}] |
+----+------------------------------------------------------------------------------------------------------------------------------------+
Here's what I want to search on search query:
SELECT *
FROM mobiles
WHERE ($.Label = "Price" AND $.Value < 400)
AND ($.Label = "Model" AND $.Value = "iPhone SE")
The query above is for illustration purposes only; I just wanted to convey what I want to perform.
I also know the table could be normalized into two. But this table is a placeholder table, and let's just say it is going to stay the same.
I need to know if it's possible to query the given json structure for following operators: >, >=, <, <=, BETWEEN AND, IN, NOT IN, LIKE, NOT LIKE, <>
Since MariaDB does not support JSON_TABLE(), and its JSON path syntax supports only member/object selectors, filtering this JSON is not straightforward. You can try this query, which works around those limitations:
with a as (
select 1 as id, '[{"ID":1,"Label":"Price","Value":399},{"ID":2,"Label":"Company","Value":"Apple"},{"ID":3,"Label":"Model","Value":"iPhone SE"}]' as properties union all
select 2 as id, '[{"ID":1,"Label":"Price","Value":499},{"ID":2,"Label":"Company","Value":"Apple"},{"ID":3,"Label":"Model","Value":"iPhone X"}]' as properties union all
select 3 as id, '[{"ID":1,"Label":"Price","Value":699},{"ID":2,"Label":"Company","Value":"Apple"},{"ID":3,"Label":"Model","Value":"iPhone 11"}]' as properties union all
select 4 as id, '[{"ID":1,"Label":"Price","Value":999},{"ID":2,"Label":"Company","Value":"Apple"},{"ID":3,"Label":"Model","Value":"iPhone 11 Pro"}]' as properties
)
select *
from a
where json_value(a.properties,
/*Get path to Price property and replace property name to Value*/
replace(replace(json_search(a.properties, 'one', 'Price'), '"', ''), 'Label', 'Value')
) < 400
and json_value(a.properties,
/*And the same for Model name*/
replace(replace(json_search(a.properties, 'one', 'Model'), '"', ''), 'Label', 'Value')
) = "iPhone SE"
| id | properties
+----+------------
| 1 | [{"ID":1,"Label":"Price","Value":399},{"ID":2,"Label":"Company","Value":"Apple"},{"ID":3,"Label":"Model","Value":"iPhone SE"}]
db<>fiddle here.
I would not use string functions. What is missing in MariaDB is the ability to unnest the array to rows - but it has all the JSON accessors we need to access to the data. Using these methods rather than string methods avoids edge cases, for example when the values contain embedded double quotes.
You would typically unnest the array with the help of a table of numbers that has at least as many rows as there are elements in the biggest array. One method to generate that on the fly is row_number() against a table with sufficient rows - say sometable.
You can unnest the arrays as follows:
select t.id,
json_unquote(json_extract(t.properties, concat('$[', n.rn, '].Label'))) as label,
json_unquote(json_extract(t.properties, concat('$[', n.rn, '].Value'))) as value
from mytable t
inner join (select row_number() over() - 1 as rn from sometable) n
on n.rn < json_length(t.properties)
The rest is just aggregation:
select t.id
from (
select t.id,
json_unquote(json_extract(t.properties, concat('$[', n.rn, '].Label'))) as label,
json_unquote(json_extract(t.properties, concat('$[', n.rn, '].Value'))) as value
from mytable t
inner join (select row_number() over() - 1 as rn from sometable) n
on n.rn < json_length(t.properties)
) t
group by id
having
max(label = 'Price' and value + 0 < 400) = 1
and max(label = 'Model' and value = 'iPhone SE') = 1
Demo on DB Fiddle
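The having max(...) = 1 trick keeps a group when at least one of its unnested rows satisfies each predicate. The same relational-division logic, applied directly to the question's JSON in Python (two sample rows shown):

```python
import json

rows = [
    (1, '[{"ID":1,"Label":"Price","Value":399},{"ID":2,"Label":"Company","Value":"Apple"},'
        '{"ID":3,"Label":"Model","Value":"iPhone SE"}]'),
    (2, '[{"ID":1,"Label":"Price","Value":499},{"ID":2,"Label":"Company","Value":"Apple"},'
        '{"ID":3,"Label":"Model","Value":"iPhone X"}]'),
]

matches = []
for rec_id, props_json in rows:
    props = json.loads(props_json)  # "unnest" the array into label/value pairs
    # any(...) plays the role of max(condition) = 1 over the group
    has_price = any(p["Label"] == "Price" and p["Value"] < 400 for p in props)
    has_model = any(p["Label"] == "Model" and p["Value"] == "iPhone SE" for p in props)
    if has_price and has_model:
        matches.append(rec_id)

print(matches)
```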
I'm trying to join array elements in BigQuery but I am getting the following error message: Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
Imagine I have two mapping tables:
CREATE OR REPLACE TABLE `test.field_id_name` (
id STRING,
name STRING
) AS (
SELECT * FROM UNNEST(
[STRUCT("s1", "string1"),
STRUCT("s2", "string2"),
STRUCT("s3", "string3")]
)
)
CREATE OR REPLACE TABLE `test.field_values` (
id STRING,
name STRING
) AS (
SELECT * FROM UNNEST(
[STRUCT("v1", "val1"),
STRUCT("v2", "val2"),
STRUCT("v3", "val3")]
)
)
And I have the following as input:
CREATE OR REPLACE TABLE `test.input` AS
SELECT [
STRUCT<id STRING, value ARRAY<STRING>>("s1", ["v1"]),
STRUCT("s2", ["v1"]),
STRUCT("s3", ["v1"])
] records
UNION ALL
SELECT [
STRUCT("s1", ["v1", "v2"]),
STRUCT("s2", ["v1", "v2"]),
STRUCT("s3", ["v1", "v2"])
]
UNION ALL
SELECT [
STRUCT("s1", ["v1", "v2", "v3"]),
STRUCT("s2", ["v1", "v2", "v3"]),
STRUCT("s3", ["v1", "v2", "v3"])
]
I am trying to produce this output:
SELECT [
STRUCT<id_mapped STRING, value_mapped ARRAY<STRING>>("string1", ["val1"]),
STRUCT("string2", ["val1"]),
STRUCT("string3", ["val1"])
] records
UNION ALL
SELECT [
STRUCT("string1", ["val1", "val2"]),
STRUCT("string2", ["val1", "val2"]),
STRUCT("string3", ["val1", "val2"])
]
UNION ALL
SELECT [
STRUCT("string1", ["val1", "val2", "val3"]),
STRUCT("string2", ["val1", "val2", "val3"]),
STRUCT("string3", ["val1", "val2", "val3"])
]
However the following query is failing with the correlated subqueries error.
SELECT
ARRAY(
SELECT
STRUCT(fin.name, ARRAY(SELECT fv.name FROM UNNEST(value) v JOIN test.field_values fv ON (v = fv.id)))
FROM UNNEST(records) r
JOIN test.field_id_name fin ON (fin.id = r.id)
)
FROM test.input
Below is for BigQuery Standard SQL
#standardSQL
SELECT ARRAY_AGG(STRUCT(id AS id_mapped, val AS value_mapped)) AS records
FROM (
SELECT fin.name AS id, ARRAY_AGG(fv.name) AS val, FORMAT('%t', t) id1, FORMAT('%t', RECORD) id2
FROM `test.input` t,
UNNEST(records) record,
UNNEST(value) val
JOIN `test.field_id_name` fin ON record.id = fin.id
JOIN `test.field_values` fv ON val = fv.id
GROUP BY id, id1, id2
)
GROUP BY id1
If applied to the sample data from your question, it returns the exact output you are expecting.
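Stripped of the grouping keys, the essence of the transformation is replacing every id and value through the two mapping tables while preserving the nested shape; a Python sketch over one record from the input:

```python
# The two mapping tables from the question, as dictionaries.
field_id_name = {"s1": "string1", "s2": "string2", "s3": "string3"}
field_values = {"v1": "val1", "v2": "val2", "v3": "val3"}

# One element of the `records` array: a struct with an id and an array of value ids.
record = [{"id": "s1", "value": ["v1", "v2"]}, {"id": "s2", "value": ["v1"]}]

mapped = [
    {"id_mapped": field_id_name[r["id"]],                       # JOIN on field_id_name
     "value_mapped": [field_values[v] for v in r["value"]]}     # JOIN on field_values
    for r in record
]
print(mapped)
```

The SQL answer does the same lookups, but has to flatten with UNNEST first and then rebuild the arrays with ARRAY_AGG, using FORMAT('%t', ...) keys to keep rows grouped per original record.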
I am trying to parse a JSON text using JSON_EXTRACT_PATH_TEXT() function.
JSON sample:
{
"data":[
{
"name":"ping",
"idx":0,
"cnt":27,
"min":16,
"max":33,
"avg":24.67,
"dev":5.05
},
{
"name":"late",
"idx":0,
"cnt":27,
"min":8,
"max":17,
"avg":12.59,
"dev":2.63
}
]
}
I tried JSON_EXTRACT_PATH_TEXT(event, '{"name":"late"}', 'avg') to get 'avg' for name = "late", but it returns blank.
Can anyone help, please?
Thanks
This is a rather complicated task in Redshift, which, unlike Postgres, has poor support for managing JSON and no function to unnest arrays.
Here is one way to do it using a number table; you need to populate the table with incrementing numbers starting at 0, like:
create table nums as
select 0 as i union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9
;
Once the table is created, you can use it to walk the JSON array using json_extract_array_element_text(), and check its content with json_extract_path_text():
select json_extract_path_text(item, 'avg') as my_avg
from (
select json_extract_array_element_text(t.items, n.i, true) as item
from (
select json_extract_path_text(mycol, 'data', true ) as items
from mytable
) t
inner join nums n on n.i < json_array_length(t.items, true)
) t
where json_extract_path_text(item, 'name') = 'late';
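The number table substitutes for the missing unnest: each n.i indexes one element of the array, and the join condition on json_array_length() stops the walk at the end. The same indexed walk over the sample JSON in Python:

```python
import json

doc = json.loads(
    '{"data":[{"name":"ping","avg":24.67},{"name":"late","avg":12.59}]}'
)

items = doc["data"]
nums = range(10)  # stand-in for the nums table

# "join nums n on n.i < json_array_length(...)" then filter on name = 'late'
avg_late = [items[i]["avg"]
            for i in nums
            if i < len(items) and items[i]["name"] == "late"]
print(avg_late)
```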
You'll need to use json_array_elements for that:
select obj->'avg'
from foo f, json_array_elements(f.event->'data') obj
where obj->>'name' = 'late';
Working example
create table foo (id int, event json);
insert into foo values (1,'{
"data":[
{
"name":"ping",
"idx":0,
"cnt":27,
"min":16,
"max":33,
"avg":24.67,
"dev":5.05
},
{
"name":"late",
"idx":0,
"cnt":27,
"min":8,
"max":17,
"avg":12.59,
"dev":2.63
}]}');
I have the following table :
row | query_params | query_values
1   | foo          | bar
    | param        | val
2   | foo          | baz
JSON :
{
"query_params" : [ "foo", "param"],
"query_values" : [ "bar", "val" ]
}, {
"query_params" : [ "foo" ],
"query_values" : [ "baz" ]
}
Using legacy SQL I want to filter repeated field on their value, something like
SELECT * FROM table WHERE query_params = 'foo'
Which would output
row | query_params | query_values
1   | foo          | bar
2   | foo          | baz
PS: this question is related to the same question for standard SQL, answered here.
I can't think of any better ideas for legacy SQL aside from using a JOIN after flattening each array separately. If you have a table T with the contents indicated above, you can do:
SELECT
[t1.row],
t1.query_params,
t2.query_values
FROM
FLATTEN((SELECT [row], query_params, POSITION(query_params) AS pos
FROM T WHERE query_params = 'foo'), query_params) AS t1
JOIN
FLATTEN((SELECT [row], query_values, POSITION(query_values) AS pos
FROM T), query_values) AS t2
ON [t1.row] = [t2.row] AND
t1.pos = t2.pos;
The idea is to associate the elements of the two arrays by row and position after filtering for query_params that are equal to 'foo'.
Try the version below:
SELECT [row], query_params, query_values
FROM (
SELECT [row], query_params, param_pos, query_values, POSITION(query_values) AS value_pos
FROM FLATTEN((
SELECT [row], query_params, POSITION(query_params) AS param_pos, query_values
FROM YourTable
), query_params)
WHERE query_params = 'foo'
)
WHERE param_pos = value_pos
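Both answers rely on positions: element k of query_params pairs with element k of query_values, which is what the POSITION() equality join reconstructs after FLATTEN. That pairing is just zip; a Python sketch over the question's two rows:

```python
rows = [
    {"row": 1, "query_params": ["foo", "param"], "query_values": ["bar", "val"]},
    {"row": 2, "query_params": ["foo"], "query_values": ["baz"]},
]

result = [
    (r["row"], p, v)
    for r in rows
    for p, v in zip(r["query_params"], r["query_values"])  # join on position
    if p == "foo"                                          # WHERE query_params = 'foo'
]
print(result)
```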