extract nested values as a list - sql

I have a table with a column "TAGS". Each row in this column has a bunch of dictionaries separated by commas. It looks like this:
{
"id": "334",
"name": "A",
"synonyms": "tul",
"path": [
"179",
"1689",
]
},
{
"id": "8999",
"name": "B",
"synonyms": "hh",
"path": [
"1098",
"167",
]
}
I want to create a new column "NAMES" that contains a list of all names. For example:
NAMES
["A", "B"]
Select * from TAGS_TABLE
How can I do this?

Well, your data is "almost" JSON: wrap it in square brackets so it becomes a JSON array, then parse it and flatten it:
with data as (
select parse_json('['||column1||']') as json from values
('{
"id": "334",
"name": "A",
"synonyms": "tul",
"path": [
"179",
"1689",
]
},
{
"id": "8999",
"name": "B",
"synonyms": "hh",
"path": [
"1098",
"167",
]
}'),
('{
"id": "334",
"name": "C",
"synonyms": "tul",
"path": [
"179",
"1689",
]
},
{
"id": "8999",
"name": "D",
"synonyms": "hh",
"path": [
"1098",
"167",
]
}')
)
select array_agg(f.value:name) within group (order by f.index) as output
from data d,
lateral flatten(input=>d.json) f
group by f.seq
order by f.seq
gives:
OUTPUT
[ "A", "B" ]
[ "C", "D" ]
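The same wrap-parse-flatten idea can be checked outside Snowflake. Below is a minimal Python sketch, assuming each TAGS value is a comma-separated run of objects like the sample above. Python's json module rejects trailing commas such as "1689",], so the sketch strips them with a regex first; that cleanup step is an assumption about the input and not part of the SQL answer. The function name tags_to_names is hypothetical.

```python
import json
import re
from typing import List


def tags_to_names(tags: str) -> List[str]:
    """Return the "name" values of a comma-separated run of JSON objects."""
    # json.loads rejects trailing commas such as `"1689",]`, so drop any comma
    # followed only by whitespace and a closing bracket or brace. This is naive:
    # it would also rewrite such a sequence inside a string value.
    cleaned = re.sub(r",\s*([\]}])", r"\1", tags)
    # Same trick as '['||column1||']': wrap the objects in a JSON array.
    objects = json.loads("[" + cleaned + "]")
    # Keep document order, as ORDER BY f.index does in the SQL.
    return [obj["name"] for obj in objects]


rows = [
    '{"id": "334", "name": "A", "path": ["179", "1689",]},\n'
    '{"id": "8999", "name": "B", "path": ["1098", "167",]}',
    '{"id": "334", "name": "C", "path": ["179", "1689",]},\n'
    '{"id": "8999", "name": "D", "path": ["1098", "167",]}',
]
for row in rows:
    print(tags_to_names(row))
```

Each input row yields one list of names, matching the one-array-per-row shape of the SQL output.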
REGEXP_SUBSTR_ALL
As already given in your other question:
select regexp_substr_all(tags, '"name"\\s*:\\s*"([^"]*)"', 1, 1, 'e') as answer
from tags_table;
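The 'e' parameter makes REGEXP_SUBSTR_ALL return the captured group instead of the whole match. Python's re.findall does the same when the pattern has exactly one group, so the pattern can be tested locally with this sketch. Note the single backslashes: the doubled \\s in the SQL is only escaping inside the Snowflake string literal. The helper name extract_names is hypothetical.

```python
import re
from typing import List

# Same pattern as the SQL answer, written with one backslash per escape.
NAME_PATTERN = re.compile(r'"name"\s*:\s*"([^"]*)"')


def extract_names(tags: str) -> List[str]:
    # With a single capture group, findall returns only the captured text,
    # which mirrors the 'e' (extract sub-match) parameter.
    return NAME_PATTERN.findall(tags)


sample = '''{
"id": "334",
"name": "A",
"path": ["179", "1689",]
},
{
"id": "8999",
"name": "B",
"path": ["1098", "167",]
}'''
print(extract_names(sample))
```

Unlike the parse-based approach, the regex never parses the text, so the trailing commas do not matter; the trade-off is that a "name" key nested inside some other object would also be matched.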

Related

Karate - How count number of instances of element in JSON response with embedded elements

I want to work out the total number of occurrences of 'id' in the following JSON string.
Does Karate have a quick way of doing this?
If they were at the top level I could use response.result.length, but they are inside the embedded 'test' elements. I could do this in JavaScript, but I'm wondering whether Karate has a quicker method.
{
"result": [
{
"test": [
{
"id": "x",
"price": "£5.00"
},
{
"id": "y",
"price": "£10.00"
},
{
"id": "z",
"price": "£10.00"
},
{
"id": "a",
"price": "£10.00"
}
]
},
{
"test": [
{
"id": "b",
"price": "£5.00"
},
{
"id": "c",
"price": "£10.00"
}
]
}
]
}
Here you go:
* def ids = $..id
* assert ids.length == 6
Do take some time to read about JsonPath in the docs.
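$..id is JsonPath's deep scan: it collects every id value at any depth. To check the expected count outside Karate, here is a minimal Python sketch of the same recursive descent (deep_scan is a hypothetical helper, not a Karate API; the prices are trimmed from the sample response).

```python
from typing import Any, List


def deep_scan(node: Any, key: str) -> List[Any]:
    """Collect the values of `key` at any depth, like JsonPath's $..key."""
    found: List[Any] = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending, since deep scan also looks inside matched values.
            found.extend(deep_scan(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(deep_scan(item, key))
    return found


response = {
    "result": [
        {"test": [{"id": "x"}, {"id": "y"}, {"id": "z"}, {"id": "a"}]},
        {"test": [{"id": "b"}, {"id": "c"}]},
    ]
}
ids = deep_scan(response, "id")
print(len(ids))
```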

BigQuery select rows with two (or more / less) matches in a repeated field

I have a schema that looks like this:
[
{
"name": "name",
"type": "STRING",
"mode": "REQUIRED"
},
{
"name": "frm",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "c",
"type": "STRING",
"mode": "REQUIRED"
},
{
"name": "n",
"type": "STRING",
"mode": "REQUIRED"
}
]
},
{
"name": "",
"type": "STRING",
"mode": "NULLABLE"
}
]
With a sample record that looks like this:
I am trying to write a query that selects this row when frm contains one element with c = 'X' and another element with c = 'Z'. Only when both conditions are true do I want to select the "name" of the parent row. I have no clue how to achieve this. Any suggestions?
E.g. this works, but it unnests frm twice, so I guess there must be a more efficient way.
SELECT name FROM `t2`
WHERE 'X' IN (SELECT c FROM UNNEST(frm)) AND 'Z' IN (SELECT c FROM UNNEST(frm))
Consider the approach below:
select name
from your_table t
where 2 = (
select count(distinct c)
from t.frm
where c in ('X', 'Z')
)
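The subquery counts how many of the wanted values appear among a row's frm entries. Requiring that count to equal 2 means both 'X' and 'Z' are present, and COUNT(DISTINCT ...) stops duplicates (two 'X' entries, say) from faking a match. A minimal Python sketch of the same predicate, on hypothetical in-memory rows whose names and values are made up for illustration:

```python
from typing import Dict, List, Set

REQUIRED: Set[str] = {"X", "Z"}


def has_all(frm: List[Dict[str, str]], required: Set[str] = REQUIRED) -> bool:
    # Same predicate as: 2 = (SELECT COUNT(DISTINCT c) FROM t.frm WHERE c IN ('X', 'Z'))
    matched = {entry["c"] for entry in frm if entry["c"] in required}
    return len(matched) == len(required)


rows = [
    {"name": "both", "frm": [{"c": "X", "n": "1"}, {"c": "Z", "n": "2"}]},
    {"name": "only_x_twice", "frm": [{"c": "X", "n": "1"}, {"c": "X", "n": "2"}]},
    {"name": "neither", "frm": [{"c": "Y", "n": "1"}]},
]
print([r["name"] for r in rows if has_all(r["frm"])])
```

To require a different set of values, change the IN list and the constant 2 together so they stay in sync.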

Filtering out objects from multiple arrays in a JSONB column

I have a JSON structure with two arrays saved in a JSONB column. Slightly simplified, it looks like this:
{
"prop1": "abc",
"prop2": "xyz",
"items": [
{
"itemId": "123",
"price": "10.00"
},
{
"itemId": "124",
"price": "9.00"
},
{
"itemId": "125",
"price": "8.00"
}
],
"groups": [
{
"groupId": "A",
"discount": "20",
"discountId": "1"
},
{
"groupId": "B",
"discount": "30",
"discountId": "2"
},
{
"groupId": "B",
"discount": "20",
"discountId": "3"
},
{
"groupId": "C",
"discount": "40",
"discountId": "4"
}
]
}
Schema:
CREATE TABLE campaign
(
id TEXT PRIMARY KEY,
data JSONB
);
Since each row (data column) can be fairly large, I'm trying to filter out matching item objects and group objects from the items and groups arrays.
My current query is:
SELECT * FROM campaign
WHERE
(data -> 'items' @> '[{"itemId": "123"}]') OR
(data -> 'groups' @> '[{"groupId": "B"}]')
which returns rows containing either the matching group or the matching item. However, depending on the row, the data column can be a fairly large JSON object (there may be hundreds of objects in items and tens in groups, and I've omitted several keys/properties for brevity in this example), which hurts query performance. (I've added GIN indexes on the items and groups arrays, so missing indexes are not the reason it's slow.)
How can I filter out the items and groups arrays to only contain matching elements?
Given this matching row:
{
"prop1": "abc",
"prop2": "xyz",
"items": [
{
"itemId": "123",
"price": "10.00"
},
{
"itemId": "124",
"price": "9.00"
},
{
"itemId": "125",
"price": "8.00"
}
],
"groups": [
{
"groupId": "A",
"discount": "20",
"discountId": "1"
},
{
"groupId": "B",
"discount": "30",
"discountId": "2"
},
{
"groupId": "B",
"discount": "20",
"discountId": "3"
},
{
"groupId": "C",
"discount": "40",
"discountId": "4"
}
]
}
I'd like the result to be something like this (the matching item/group could be in different columns from the rest of the data column; it doesn't have to be returned as a single JSON object with two arrays like this, but I would prefer that if it doesn't hurt performance or lead to a really hairy query):
{
"prop1": "abc",
"prop2": "xyz",
"items": [
{
"itemId": "123",
"price": "10.00"
}
],
"groups": [
{
"groupId": "B",
"discount": "20",
"discountId": "3"
}
]
}
What I've managed to do so far is unwrap and match an object in the items array using this query, which removes the 'items' array from the data column and filters out the matching item object to a separate column, but I'm struggling to join this with matches in the groups array.
SELECT data - 'items', o.obj
FROM campaign c
CROSS JOIN LATERAL jsonb_array_elements(c.data #> '{items}') o(obj)
WHERE o.obj ->> 'itemId' = '124'
How can I filter both arrays in one query?
Bonus question: For the groups array I also want to return the object with the lowest discount value if possible. Or else the result would need to be an array of matching group objects instead of a single matching group.
Related questions: How to filter jsonb array elements and How to join jsonb array elements in Postgres?
If your PostgreSQL version is 12 or later, you can use the jsonpath language and functions. The query below returns the expected result, keeping only the items and groups that match the given criteria. You can then wrap this query in an SQL function so that the search criteria become input parameters.
SELECT jsonb_set(jsonb_set( data
, '{items}'
, jsonb_path_query_array(data, '$.items[*] ? (@.itemId == "123" && @.price == "10.00")'))
, '{groups}'
, jsonb_path_query_array(data, '$.groups[*] ? (@.groupId == "B" && @.discount == "20" && @.discountId == "3")'))
FROM (SELECT
'{
"prop1": "abc",
"prop2": "xyz",
"items": [
{
"itemId": "123",
"price": "10.00"
},
{
"itemId": "124",
"price": "9.00"
},
{
"itemId": "125",
"price": "8.00"
}
],
"groups": [
{
"groupId": "A",
"discount": "20",
"discountId": "1"
},
{
"groupId": "B",
"discount": "30",
"discountId": "2"
},
{
"groupId": "B",
"discount": "20",
"discountId": "3"
},
{
"groupId": "C",
"discount": "40",
"discountId": "4"
}
]
}' :: jsonb) AS d(data)
WHERE jsonb_path_exists(data, '$.items[*] ? (@.itemId == "123" && @.price == "10.00")')
AND jsonb_path_exists(data, '$.groups[*] ? (@.groupId == "B" && @.discount == "20" && @.discountId == "3")')
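Each jsonb_path_query_array call keeps only the array elements whose filter is true, and jsonb_set writes the filtered array back under the same key. A minimal Python sketch of that transformation, handy for checking expected output; filter_arrays is a hypothetical helper, and the lambdas are stand-ins for the jsonpath filters, not PostgreSQL APIs.

```python
import copy
from typing import Any, Callable, Dict

Predicate = Callable[[Dict[str, Any]], bool]


def filter_arrays(data: Dict[str, Any], filters: Dict[str, Predicate]) -> Dict[str, Any]:
    """Copy `data`, keeping only the matching elements of each filtered array."""
    result = copy.deepcopy(data)  # jsonb_set returns a new value; the input is untouched
    for key, keep in filters.items():
        # Plays the role of jsonb_path_query_array(data, '$.<key>[*] ? (...)')
        result[key] = [elem for elem in result.get(key, []) if keep(elem)]
    return result


data = {
    "prop1": "abc",
    "prop2": "xyz",
    "items": [
        {"itemId": "123", "price": "10.00"},
        {"itemId": "124", "price": "9.00"},
        {"itemId": "125", "price": "8.00"},
    ],
    "groups": [
        {"groupId": "A", "discount": "20", "discountId": "1"},
        {"groupId": "B", "discount": "30", "discountId": "2"},
        {"groupId": "B", "discount": "20", "discountId": "3"},
        {"groupId": "C", "discount": "40", "discountId": "4"},
    ],
}
filters = {
    "items": lambda e: e["itemId"] == "123" and e["price"] == "10.00",
    "groups": lambda e: (e["groupId"] == "B" and e["discount"] == "20"
                         and e["discountId"] == "3"),
}
filtered = filter_arrays(data, filters)
# The WHERE clause's jsonb_path_exists checks: both filtered arrays must be non-empty.
matched = all(filtered[key] for key in filters)
print(matched, filtered)
```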

Extract array from varchar in PrestoSQL

I have a VARCHAR field like this:
[
{
"config": 0,
"type": "0
},
{
"config": x,
"type": "1"
},
{
"config": "",
"type": ""
},
{
"config": [
{
"address": {},
"category": "",
"merchant": {
"data": [
10,12,23
],
"file": 0
},
"range_id": 1,
"shop_id_info": null
}
],
"type": "new"
}
]
I need to extract the merchant data from this. The desired output is:
10
12
23
Please advise. I keep getting the error "Cannot cast VARCHAR to array/unnest type VARCHAR".
You can try the JSON path $.*.config.*.merchant.data.*, but if it does not work for you (it did not for me on the Athena version I use, where arrays in JSON paths are not well supported), you can cast your JSON to ARRAY(JSON) and manipulate it from there (I needed to fix your JSON a little bit):
Test data:
WITH dataset AS (
SELECT * FROM (VALUES
(JSON '[
{
"config": {},
"type": "0"
},
{
"config": "x",
"type": "1"
},
{
"config": "",
"type": ""
},
{
"config": [
{
"address": {},
"category": "",
"merchant": {
"data": [
10,12,23
],
"file": 0
},
"range_id": 1,
"shop_id_info": null
}
],
"type": "new"
}
]')
) AS t (json_value))
And the query:
SELECT flatten(
transform(
flatten(
transform(
CAST(json_value AS ARRAY(JSON))
, json_object -> try(CAST(json_extract(json_object, '$.config') AS ARRAY(JSON))))),
json_config -> CAST(json_extract(json_config, '$.merchant.data') as ARRAY(INTEGER))))
FROM dataset
This gives you an array of numbers:
_col0
[10, 12, 23]
And from there you can continue with unnest and so on if needed.
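The nested transform/flatten calls work in two steps: keep each element's config only when it is itself an array (try(CAST(...)) turns the other configs into NULL), then pull merchant.data out of each config object and flatten. A Python sketch of the same steps on the fixed test data, ending with one value per line as UNNEST would produce. Skipping non-list configs is how this sketch stands in for try(); merchant_data is a hypothetical helper name.

```python
import json
from typing import Any, List

raw = '''[
  {"config": {}, "type": "0"},
  {"config": "x", "type": "1"},
  {"config": "", "type": ""},
  {"config": [{"address": {}, "category": "",
               "merchant": {"data": [10, 12, 23], "file": 0},
               "range_id": 1, "shop_id_info": null}],
   "type": "new"}
]'''


def merchant_data(doc: str) -> List[Any]:
    elements = json.loads(doc)  # like CAST(json_value AS ARRAY(JSON))
    # Step 1: keep configs that are arrays and flatten them; the other configs
    # correspond to the NULLs produced by try(CAST(... AS ARRAY(JSON))).
    configs = [c for e in elements if isinstance(e.get("config"), list) for c in e["config"]]
    # Step 2: like json_extract(json_config, '$.merchant.data'), then flatten again.
    return [n for c in configs for n in c["merchant"]["data"]]


values = merchant_data(raw)
print(values)
for v in values:  # one row per value, like UNNEST
    print(v)
```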

PostgreSQL (v9.6) query that filters JSON array elements by key/value

We have a jsonb column with data like this:
"basket": {
"total": 6,
"items": [
{ "type": "A", "name": "A", "price": 1 },
{ "type": "A", "name": "B", "price": 2 },
{ "type": "C", "name": "C", "price": 3 }
]
}
We need to construct a few queries that filter specific elements of the items[] array for SELECT and SUM.
We're on PG 9.6, so jsonb_path_query_array (added in PostgreSQL 12) is not available.
Using basket->'items' @> '[{"type":"A"}]' works to find all rows that have an item of type A.
But how do we write a subquery to:
- select only the basket items of type A
- sum the prices of the items of type A
Thank you!
This will select the required items:
select * from jsonb_array_elements('{"basket":
{
"total": 6,
"items": [
{ "type": "A", "name": "A", "price": 1 },
{ "type": "A", "name": "B", "price": 2 },
{ "type": "C", "name": "C", "price": 3 }
]
}}'::jsonb#>'{basket,items}') e(it)
where it->>'type' = 'A';
and this computes the sum of the prices:
select sum(cast(it->>'price' as numeric)) from jsonb_array_elements('{"basket":
{
"total": 6,
"items": [
{ "type": "A", "name": "A", "price": 1 },
{ "type": "A", "name": "B", "price": 2 },
{ "type": "C", "name": "C", "price": 3 }
]
}}'::jsonb#>'{basket,items}') e(it)
where it->>'type' = 'A';
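jsonb_array_elements turns the items array into one row per element, so the type filter and the SUM become ordinary row operations. The same logic as a short Python sketch, handy for checking expected results (it models the SQL, it does not replace it); Decimal stands in for PostgreSQL's numeric.

```python
from decimal import Decimal

basket = {
    "total": 6,
    "items": [
        {"type": "A", "name": "A", "price": 1},
        {"type": "A", "name": "B", "price": 2},
        {"type": "C", "name": "C", "price": 3},
    ],
}

# WHERE it->>'type' = 'A'
type_a = [it for it in basket["items"] if it["type"] == "A"]
# SUM(CAST(it->>'price' AS numeric)); str() mirrors ->> returning text
total_a = sum(Decimal(str(it["price"])) for it in type_a)
print([it["name"] for it in type_a], total_a)
```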