How to select a field of a JSON object coming from the WHERE condition - sql

I have this table
id name json
1 alex {"type": "user", "items": [ {"name": "banana", "color": "yellow"}, {"name": "apple", "color": "red"} ] }
2 peter {"type": "user", "items": [ {"name": "watermelon", "color": "green"}, {"name": "pepper", "color": "red"} ] }
3 john {"type": "user", "items": [ {"name": "tomato", "color": "red"} ] }
4 carl {"type": "user", "items": [ {"name": "orange", "color": "orange"}, {"name": "nut", "color": "brown"} ] }
Important: each JSON object can have a different number of "items", but what I need is the "product name" of JUST the object that matched the WHERE condition.
My desired output would be the first two columns plus just the name of the item, WHERE the color is like %red%:
id name fruit
1 alex apple
2 peter pepper
3 john tomato
select id, name, ***** (this is what I don't know) FROM table
where JSON_EXTRACT(json, "$.items[*].color") like '%red%'

I would recommend json_table() if you are running MySQL 8.0:
select t.id, t.name, x.name as fruit
from mytable t
cross join json_table(
    t.json,
    '$.items[*]' columns (name varchar(50) path '$.name', color varchar(50) path '$.color')
) x
where x.color = 'red'
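Note that json_table() is lateral by nature: it can reference columns of tables listed earlier in the FROM clause, which is why t.json is visible inside the call. For the sample data this returns exactly the desired output (apple, pepper, tomato).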
This function is not implemented in MariaDB (it arrived only in MariaDB 10.6). We can unnest manually with the help of a numbers table:
select t.id, t.name,
    json_unquote(json_extract(t.json, concat('$.items[', x.num, '].name'))) as fruit
from mytable t
inner join (select 0 as num union all select 1 union all select 2 ...) x
    on x.num < json_length(t.json, '$.items')
where json_unquote(json_extract(t.json, concat('$.items[', x.num, '].color'))) = 'red'
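If your MariaDB build ships the SEQUENCE storage engine (bundled by default in modern MariaDB), one of its virtual seq_* tables can stand in for the hand-rolled numbers table. A sketch of that variant, assuming no items array holds more than ten entries:
select t.id, t.name,
    json_unquote(json_extract(t.json, concat('$.items[', s.seq, '].name'))) as fruit
from mytable t
inner join seq_0_to_9 s
    on s.seq < json_length(t.json, '$.items')
where json_unquote(json_extract(t.json, concat('$.items[', s.seq, '].color'))) = 'red'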

You can use the JSON_EXTRACT() function along with a recursive common table expression in order to generate rows dynamically, such as:
WITH RECURSIVE cte AS
(
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1
    FROM cte
    WHERE cte.n < (SELECT MAX(JSON_LENGTH(json, '$.items')) FROM t)
)
SELECT id, name,
       JSON_UNQUOTE(JSON_EXTRACT(json, CONCAT('$.items[', n - 1, '].name'))) AS fruit
FROM cte
JOIN t
WHERE JSON_EXTRACT(json, CONCAT('$.items[', n - 1, '].color')) = "red"

Is there a way to filter rows in BigQuery by the contents of an array?

I have data in a BigQuery table that looks like this:
[
{ "id": 1, "labels": [{"key": "a", "value": 1}, {"key": "b", "value": 2}] },
{ "id": 2, "labels": [{"key": "a", "value": 1}, {"key": "b", "value": 3}] },
// a lot more rows
]
My question is, how can I find all rows where "key" = "a", "value" = 1, but also "key" = "b" and "value" = 3?
I've tried various forms of using UNNEST but I haven't been able to get it right. The CROSS JOIN leaves me with one row for every object in the labels array, leaving me unable to query by both of them.
Try this:
select *
from mytable
where exists (select 1 from unnest(labels) where key = "a" and value=1)
and exists (select 1 from unnest(labels) where key = "b" and value=3)
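A self-contained sketch of this approach against inline dummy data (the table and labels values here are assumptions mirroring the question):
select *
from (
  select 1 as id, [struct('a' as key, 1 as value), struct('b' as key, 2 as value)] as labels
  union all
  select 2, [struct('a' as key, 1 as value), struct('b' as key, 3 as value)]
) mytable
where exists (select 1 from unnest(labels) where key = "a" and value = 1)
  and exists (select 1 from unnest(labels) where key = "b" and value = 3)
Only the row with id 2 survives, since it is the only one carrying both required pairs.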
Assuming there are no duplicate entries in the labels array, you can use the below; it counts how many of the required (key, value) pairs appear in labels and keeps rows where both are present:
select *
from `project.dataset.table` t
where 2 = (
select count(1)
from t.labels kv
where kv in (('a', 1), ('b', 3))
)
You can try parsing the JSON and then applying different filter conditions to it as per your requirements. In the following query I first identify which records have key = 'a', then which records have value = 30, and then join the two through id:
WITH data1 AS (
SELECT '{ "id": 1, "labels": [{"key": "a", "value": 11}, {"key": "b", "value": 22}] }' as c1
union all
SELECT '{ "id": 2, "labels": [{"key": "a", "value": 10}, {"key": "b", "value": 30}] }' AS C1
)
select T1.id, T1.C1, T1.key, T2.value from
(SELECT JSON_EXTRACT_SCALAR(c1 , "$.id") AS id,
json_extract_scalar(curSection, '$.value') as value, json_extract_scalar(curSection, '$.key') as key,
c1
FROM data1 tbl LEFT JOIN unnest(json_extract_array(tbl.c1, '$.labels')
) curSection
where json_extract_scalar(curSection, '$.key')='a') T1,
(
SELECT JSON_EXTRACT_SCALAR(c1 , "$.id") AS id,
json_extract_scalar(curSection, '$.value') as value, json_extract_scalar(curSection, '$.key') as key,
c1
FROM data1 tbl LEFT JOIN unnest(json_extract_array(tbl.c1, '$.labels')
) curSection
where json_extract_scalar(curSection, '$.value')='30') T2
where T1.id = T2.id
Hopefully this works for you.

Only extract json if field not null

I want to extract a key value from a (nullable) JSONB field. If the field is NULL, I want the record still present in my result set, but with a null field.
customer table:
id, name, phone_num, address
1, "john", 983, [ {"street":"23, johnson ave", "city":"Los Angeles", "state":"California", "current":true}, {"street":"12, marigold drive", "city":"Davis", "state":"California", "current":false}]
2, "jane", 9389, null
3, "sally", 352, [ "street":"90, park ave", "city":"Los Angeles", "state":"California", "current":true} ]
Current PostgreSQL query:
select id, name, phone_num, items.city
from customer,
jsonb_to_recordset(customer) as items(city str, current bool)
where items.current=true
It returns:
id, name, phone_num, city
1, "john", 983, "Los Angeles"
3, "sally", 352, "Los Angeles"
Required Output:
id, name, phone_num, city
1, "john", 983, "Los Angeles"
2, "jane", 9389, null
3, "sally", 352, "Los Angeles"
How do I achieve the above output?
Use a left join lateral instead of an implicit lateral join:
select c.id, c.name, c.phone_num, i.city
from customer c
left join lateral jsonb_to_recordset(c.address) as i(city text, current bool)
    on i.current = true
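A minimal, self-contained sketch (the inline VALUES rows are stand-ins for the customer table):
select c.id, c.name, i.city
from (values
    (1, 'john', '[{"city": "Los Angeles", "current": true}]'::jsonb),
    (2, 'jane', null)
) as c(id, name, address)
left join lateral jsonb_to_recordset(c.address) as i(city text, current bool)
    on i.current = true;
Here jane is kept with a null city: jsonb_to_recordset produces no rows for her null address, and the left join preserves the customer row anyway.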

PL/SQL: remove unwanted double quotes in a JSON

I have a JSON like this (it is contained in a CLOB variable):
{"id": "33", "type": "abc", "val": "2", "cod": "", "sg1": "1", "sg2": "1"}
{"id": "359", "type": "abcef", "val": "52", "cod": "aa", "sg1": "", "sg2": "0"}
…
I need to remove " from values of: id, val, sg1, sg2
Is it possible?
For example, I need to obtain this:
{"id": 33, "type": "abc", "val": 2, "cod": "", "sg1": 1, "sg2": 1}
{"id": 359, "type": "abcef", "val": 52, "cod": "aa", "sg1": , "sg2": 0}
…
If you are using Oracle 12 (R2?) or later then you can convert your JSON to the appropriate data types and then convert it back to JSON.
Oracle 18 Setup:
CREATE TABLE test_data ( value CLOB );
INSERT INTO test_data ( value )
VALUES ( '{"id": "33", "type": "abc", "val": "2", "cod": "", "sg1": "1", "sg2": "1"}' );
INSERT INTO test_data ( value )
VALUES ( '{"id": "359", "type": "abcef", "val": "52", "cod": "aa", "sg1": "", "sg2": "0"}' );
Query:
SELECT JSON_OBJECT(
'id' IS j.id,
'type' IS j.typ,
'val' IS j.val,
'cod' IS j.cod,
'sg1' IS j.sg1,
'sg2' IS j.sg2
) AS JSON
FROM test_data t
CROSS JOIN
JSON_TABLE(
t.value,
'$'
COLUMNS
id NUMBER(5,0) PATH '$.id',
typ VARCHAR2(10) PATH '$.type',
val NUMBER(5,0) PATH '$.val',
cod VARCHAR2(10) PATH '$.cod',
sg1 NUMBER(5,0) PATH '$.sg1',
sg2 NUMBER(5,0) PATH '$.sg2'
) j
Output:
| JSON |
| :--------------------------------------------------------------- |
| {"id":33,"type":"abc","val":2,"cod":null,"sg1":1,"sg2":1} |
| {"id":359,"type":"abcef","val":52,"cod":"aa","sg1":null,"sg2":0} |
Or, if you want to use regular expressions (you shouldn't, if you have the choice; use a proper JSON parser instead):
Query 2:
SELECT REGEXP_REPLACE(
REGEXP_REPLACE(
value,
'"(id|val|sg1|sg2)": ""',
'"\1": "null"'
),
'"(id|val|sg1|sg2)": "(\d+|null)"',
'"\1": \2'
) AS JSON
FROM test_data
Output:
| JSON |
| :-------------------------------------------------------------------------- |
| {"id": 33, "type": "abc", "val": 2, "cod": "", "sg1": 1, "sg2": 1} |
| {"id": 359, "type": "abcef", "val": 52, "cod": "aa", "sg1": null, "sg2": 0} |

folding over large BigQuery result

Is there any easy way for me to do something like OCaml's fold_left on the result of a BigQuery query, where each iteration corresponds to one row in the result?
What product or approach would be the easiest way? It would be great if:
all I need to do is to supply the initial state and the 'folder' function
preferably, I'd like to write the 'folder' function in a functional language
I don't need to install any GCP package
Since I don't know which product or language would work, I cannot be more specific, but pseudocode would be like:
let my_init = []
let my_folder = fun state row ->
// append for now, but it will be complicated. I need to do some set operations here. The point is that I need some way of transferring "state" across rows, when I iterate over rows in a predefined order.
row.col1 :: state
let query = "SELECT col1, col2, col3 FROM table1 ORDER BY timestamp"
query |> List.fold my_folder my_init
The result that I want to get from this simplified example is the final "state".
--- UPDATED ---
There is no bound on the number of rows---if we receive more, we get more rows. Typically, the number is more than a few millions but it can be larger than that.
Here's a simplified example that shows the major problem I'm encountering. We have a table with a few columns:
timestamp
user_id: a string id
operation_json: a stringified JSON array of operations, each of which corresponds to either:
add user_id to a set
remove user_id from a set
For example, the followings are valid rows:
----------+---------+----------------------------------------------
timestamp | user_id | operation_json
----------+---------+----------------------------------------------
1 | id1 | [ { "op": "add", "set": "set1" } ]
2 | id2 | [ { "op": "add", "set": "set1" } ]
3 | id1 | [ { "op": "add", "set": "set2" } ]
4 | id3 | [ { "op": "add", "set": "set2" } ]
5 | id1 | [ { "op": "remove", "set": "set1" } ]
----------+---------+----------------------------------------------
As a result, I'd like to get sets of users; i.e.,
set1 |-> { id2 }
set2 |-> { id1, id3 }
I thought a fold_left-like operation would be convenient. The state would be a map from set name to set of user ids (map<string, set<string>>), and the initial state would be an empty map.
Below is a quick and simple example for BigQuery Standard SQL:
#standardSQL
CREATE TEMP FUNCTION fold(arr ARRAY<INT64>, init INT64)
RETURNS FLOAT64
LANGUAGE js AS """
const reducer = (accumulator, currentValue) => accumulator + parseInt(currentValue);
// seed the reduction with the init parameter rather than a hard-coded value
return arr.reduce(reducer, parseInt(init));
""";
WITH `project.dataset.table` AS (
SELECT 1 id, [1, 2, 3, 4] arr, 5 initial_state UNION ALL
SELECT 2, [1, 2, 3, 4, 5, 6, 7], 10
)
SELECT id, fold(arr, initial_state) result
FROM `project.dataset.table`
The output is:
Row  id  result
1    1   15.0
2    2   38.0
I think it is self-explanatory enough
See the BigQuery documentation for more on JS UDFs.
Folding a list of rows
See below for an extension of the above.
Here you assemble an array from the result's rows before applying the fold function (of course, keep in mind the limits on UDFs here, how large your ARRAY of rows can grow, etc.).
#standardSQL
CREATE TEMP FUNCTION fold(arr ARRAY<INT64>, init INT64)
RETURNS FLOAT64
LANGUAGE js AS """
const reducer = (accumulator, currentValue) => accumulator + parseInt(currentValue);
return arr.reduce(reducer, parseInt(init));
""";
WITH `project.dataset.table` AS (
SELECT 1 id, 1 item UNION ALL
SELECT 1, 2 UNION ALL
SELECT 1, 3 UNION ALL
SELECT 1, 4 UNION ALL
SELECT 2, 1 UNION ALL
SELECT 2, 2 UNION ALL
SELECT 2, 3 UNION ALL
SELECT 2, 4 UNION ALL
SELECT 2, 5 UNION ALL
SELECT 2, 6 UNION ALL
SELECT 2, 7
)
SELECT id, fold(ARRAY_AGG(item), 5) result
FROM `project.dataset.table`
GROUP BY id
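One caveat: ARRAY_AGG does not guarantee element order unless you specify it, and the whole point of a fold is that rows are visited in a predefined order. A variant that pins the order (the ts column is hypothetical, standing in for your timestamp):
SELECT id, fold(ARRAY_AGG(item ORDER BY ts), 5) result
FROM `project.dataset.table`
GROUP BY id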
Note: if you need to include more than one field from each row, you can use an ARRAY of STRUCTs, as in the example below:
ARRAY_AGG(STRUCT(id, item) ORDER BY id)
Of course, you will need to adjust the signature of the fold UDF accordingly.
For example:
#standardSQL
CREATE TEMP FUNCTION fold(arr ARRAY<STRUCT<id INT64, item INT64>>, init INT64)
RETURNS FLOAT64
LANGUAGE js AS """
const reducer = (accumulator, currentValue) => accumulator + parseInt(currentValue.item);
return arr.reduce(reducer, parseInt(init));
""";
WITH `project.dataset.table` AS (
SELECT 1 id, 1 item UNION ALL
SELECT 1, 2 UNION ALL
SELECT 1, 3 UNION ALL
SELECT 1, 4 UNION ALL
SELECT 2, 1 UNION ALL
SELECT 2, 2 UNION ALL
SELECT 2, 3 UNION ALL
SELECT 2, 4 UNION ALL
SELECT 2, 5 UNION ALL
SELECT 2, 6 UNION ALL
SELECT 2, 7
)
SELECT id, fold(ARRAY_AGG(t), 5) result
FROM `project.dataset.table` t
GROUP BY id
The approach below has nothing to do with folding per se; rather, it attempts to translate your challenge into a set-based one (which is more natural when you are dealing with SQL) by identifying the latest op action for each user per set: if it is "remove", that user is simply eliminated from further consideration; if it is "add", the latest "add" for that user / set is used. This assumes there cannot be multiple consecutive "add" actions for the same user / set - rather, they alternate add / remove / add and so on. Of course, this can be further adjusted based on the real use case.
With the above in mind, below is an example for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 ts, 'id1' user_id, '[ { "op": "add", "set": "set1" } ]' operation_json UNION ALL
SELECT 2, 'id2', '[ { "op": "add", "set": "set1" } ]' UNION ALL
SELECT 3, 'id1', '[ { "op": "add", "set": "set2" } ]' UNION ALL
SELECT 4, 'id3', '[ { "op": "add", "set": "set2" } ]' UNION ALL
SELECT 5, 'id1', '[ { "op": "remove", "set": "set1" } ]'
)
SELECT bin, STRING_AGG(user_id, ',' ORDER BY ts) result
FROM (
SELECT user_id, bin, ARRAY_AGG(ts ORDER BY ts DESC LIMIT 1)[OFFSET(0)] ts
FROM (
SELECT ts, user_id, op, bin, LAST_VALUE(op) OVER(win) fin
FROM (
SELECT ts, user_id,
JSON_EXTRACT_SCALAR(REGEXP_REPLACE(operation_json, r'^\[|\]$', ''), '$.op') op,
JSON_EXTRACT_SCALAR(REGEXP_REPLACE(operation_json, r'^\[|\]$', ''), '$.set') bin
FROM `project.dataset.table`
)
WINDOW win AS (
PARTITION BY user_id, bin
ORDER BY ts
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
)
WHERE fin = 'add'
GROUP BY user_id, bin
)
GROUP BY bin
-- ORDER BY bin
The output is:
Row  bin   result
1    set1  id2
2    set2  id1,id3
If applied to the dummy data below:
WITH `project.dataset.table` AS (
SELECT 1 ts, 'id1' user_id, '[ { "op": "add", "set": "set1" } ]' operation_json UNION ALL
SELECT 2, 'id2', '[ { "op": "add", "set": "set1" } ]' UNION ALL
SELECT 3, 'id1', '[ { "op": "add", "set": "set2" } ]' UNION ALL
SELECT 4, 'id3', '[ { "op": "add", "set": "set2" } ]' UNION ALL
SELECT 5, 'id1', '[ { "op": "remove", "set": "set1" } ]' UNION ALL
SELECT 6, 'id1', '[ { "op": "add", "set": "set1" } ]' UNION ALL
SELECT 7, 'id1', '[ { "op": "remove", "set": "set1" } ]' UNION ALL
SELECT 8, 'id1', '[ { "op": "add", "set": "set1" } ]' UNION ALL
SELECT 9, 'id1', '[ { "op": "remove", "set": "set2" } ]' UNION ALL
SELECT 10, 'id1', '[ { "op": "add", "set": "set2" } ]'
)
the result will be:
Row  bin   result
1    set1  id2,id1
2    set2  id3,id1

Postgres Build Complex JSON Object from Wide Column Like Design to Key Value

I could really use some help here before my mind explodes...
Given the following data structure:
SELECT * FROM (VALUES (1, 1, 1, 1), (2, 2, 2, 2)) AS t(day, apple, banana, orange);
day | apple | banana | orange
-----+-------+--------+--------
1 | 1 | 1 | 1
2 | 2 | 2 | 2
I want to construct a JSON object which looks like the following:
{
"data": [
{
"day": 1,
"fruits": [
{
"key": "apple",
"value": 1
},
{
"key": "banana",
"value": 1
},
{
"key": "orange",
"value": 1
}
]
}
]
}
Maybe I am not so far away from my goal:
SELECT json_build_object(
'data', json_agg(
json_build_object(
'day', t.day,
'fruits', t)
)
) FROM (VALUES (1, 1, 1, 1), (2, 2, 2, 2)) AS t(day, apple, banana, orange);
Results in:
{
"data": [
{
"day": 1,
"fruits": {
"day": 1,
"apple": 1,
"banana": 1,
"orange": 1
}
}
]
}
I know that there is json_each which may do the trick. But I am struggling to apply it to the query.
Edit:
This is my updated query which, I guess, is pretty close. I have dropped the idea of solving it with json_each. Now I only have to return an array of fruits instead of appending to the fruits object:
SELECT json_build_object(
'data', json_agg(
json_build_object(
'day', t.day,
'fruits', json_build_object(
'key', 'apple',
'value', t.apple,
'key', 'banana',
'value', t.banana,
'key', 'orange',
'value', t.orange
)
)
)
) FROM (VALUES (1, 1, 1, 1), (2, 2, 2, 2)) AS t(day, apple, banana, orange);
Would I need to add a subquery to prevent a nested aggregate function?
Use the function jsonb_each() to get (key, value) pairs, so you do not have to know the number of columns or their names to get a proper output (to_jsonb(t) - 'day' strips the day key, leaving only the fruit columns to unpivot):
select jsonb_build_object('data', jsonb_agg(to_jsonb(s) order by day))
from (
select day, jsonb_agg(jsonb_build_object('key', key, 'value', value)) as fruits
from (
values (1, 1, 1, 1), (2, 2, 2, 2)
) as t(day, apple, banana, orange),
jsonb_each(to_jsonb(t) - 'day')
group by 1
) s;
The above query gives this object:
{
"data": [
{
"day": 1,
"fruits": [
{
"key": "apple",
"value": 1
},
{
"key": "banana",
"value": 1
},
{
"key": "orange",
"value": 1
}
]
},
{
"day": 2,
"fruits": [
{
"key": "apple",
"value": 2
},
{
"key": "banana",
"value": 2
},
{
"key": "orange",
"value": 2
}
]
}
]
}