MS SQL JSON query/where clause on nested array items

I have JSON data that I can query using CROSS APPLY OPENJSON(...), which gets slow once you start adding multiple cross applies or once your JSON document gets too large. So I wanted to add an index on the data I'm trying to filter on, but I can't get the syntax for nested array items to work without using a cross apply, and as such I can't create an index, since you can't use a cross apply when making an index. According to the MS docs I should just be able to do
JSON_QUERY(my_column, '$.parentItem.nestedItemsArray1.nestedItemsArray2')
That should give me all the values of the nested array items, which I could then query on and improve performance by adding an index, something like this:
ALTER TABLE mytable
ADD vdata AS JSON_QUERY(my_column,
    '$.parentItem.nestedItemsArray1.nestedItemsArray2')

CREATE INDEX idx_json_my_column ON mytable(vdata)
but the above $.array.arrayItems syntax doesn't work?
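For contrast, the pattern the docs describe does work when the path resolves to a single scalar per row. A minimal sketch reusing mytable/my_column from above (the j_id column and index name are made up):

ALTER TABLE mytable
ADD j_id AS JSON_VALUE(my_column, '$.Id')

CREATE INDEX idx_json_id ON mytable(j_id)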
On a side note, I can't help but think in relational terms, where normally in SQL you would index a column of data like so:
col
---
1
2
3
But JSON data seems to get flattened, so when I use JSON_QUERY as per the MS example I get "1,2,3". I assume I want to index an array of values rather than a flattened version, unless the index will return the inner data of the flattened data?
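To illustrate the difference, a standalone T-SQL sketch: JSON_QUERY returns the whole array as a single text fragment, while OPENJSON expands it into rows.

SELECT JSON_QUERY('{"a": [1,2,3]}', '$.a')            -- returns the text '[1,2,3]'
SELECT [value] FROM OPENJSON('{"a": [1,2,3]}', '$.a') -- returns rows 1, 2, 3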
My plug-and-play working example:
declare @mydata table (
    ID int NOT NULL,
    jsondata varchar(max) NOT NULL
)
INSERT INTO @mydata (ID, jsondata)
VALUES (789, '{ "Id": "12345", "FinanceProductResults": [ { "Term": 12, "AnnualMileage": 5000, "Deposits": 0, "ProductResults": [] }, { "Term": 18, "AnnualMileage": 30000, "Deposits": 15000, "ProductResults": [] }, { "Term": 24, "AnnualMileage": 5000, "Deposits": 0, "ProductResults": [ { "Key": "HP", "Payment": 460.28 } ] }, { "Term": 24, "AnnualMileage": 10000, "Deposits": 0, "ProductResults": [ { "Key": "HP", "Payment": 500.32 } ] }]}')
SELECT
    j_Id
    ,JSON_VALUE (c.value, '$.Term') as Term
    ,JSON_VALUE (c.value, '$.AnnualMileage') as AnnualMileage
    ,JSON_VALUE (c.value, '$.Deposits') as Deposits
    ,JSON_VALUE (p.value, '$.Key') as [Key]
    ,JSON_VALUE (p.value, '$.Payment') as Payment
    --,c.value
FROM @mydata f
CROSS APPLY OPENJSON(f.jsondata)
    WITH (j_Id nvarchar(100) '$.Id')
CROSS APPLY OPENJSON(f.jsondata, '$.FinanceProductResults') AS c
CROSS APPLY OPENJSON(c.value, '$."ProductResults"') AS p
where
    ID = 789
    AND JSON_VALUE (p.value, '$.Payment') = '460.28'
I'm using these MS docs to guide me:
How to create an index
How to get data
Update
I was able to improve performance slightly using the WITH method:
SELECT
j_Id,
FinanceDetails.Term,
FinanceDetails.AnnualMileage,
FinanceDetails.Deposits,
Payments.Payment
FROM @mydata f
CROSS APPLY OPENJSON(f.jsondata)
WITH (j_Id nvarchar(100) '$.Id')
OUTER APPLY OPENJSON (f.jsondata, '$.FinanceProductResults' )
WITH (
Term INT '$.Term',
AnnualMileage INT '$.AnnualMileage',
Deposits INT '$.Deposits',
ProductResults NVARCHAR(MAX) '$.ProductResults' AS JSON
) AS FinanceDetails
OUTER APPLY OPENJSON(ProductResults, '$')
WITH (
Payment DECIMAL(19, 4) '$.Payment'
) AS Payments
WHERE
Payments.Payment = 460.28
but I'd still like to add an index on the sub-array data to aid in improving performance.

Currently, you cannot index nested properties.
Is full-text search a possible option? You could create FTS on the JSON column and add a predicate:
WHERE ....
AND CONTAINS(jsondata, 'NEAR((Payment, 460), 1)')
Since JSON is text, this predicate will filter out all records that don't have something like "Payment" and 460 near each other (this will identify key:value pairs), and you can then apply CROSS APPLY on the reduced set of rows.
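Creating the full-text index itself might look like the sketch below; PK_mytable stands in for whatever unique key index your table has, and the catalog name is made up:

CREATE FULLTEXT CATALOG jsonCatalog AS DEFAULT

CREATE FULLTEXT INDEX ON mytable (jsondata)
    KEY INDEX PK_mytable
    ON jsonCatalog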

Related

Add data type to JSON values query

I am building the following view on SQL Server. The data is extracted from a SOAP API via Data Factory and stored in a SQL table. I union two pieces of the same code because the API returns the data either as objects or as arrays.
This query works; however, when I try to order or filter the view I get this error:
JSON text is not properly formatted. Unexpected character '.' is found at position 12.
/*Object - invoiceDetails*/
SELECT
XML.[CustomerID],
XML.[SiteId],
XML.[Date],
JSON_VALUE(m.[value], '$.accountNbr') AS [AccountNumber],
JSON_VALUE(m.[value], '$.actualUsage') AS [ActualUsage]
FROM stage.Bill XML
CROSS APPLY openjson(XML.xmldata) AS n
CROSS APPLY openjson(n.value, '$.invoiceDetails') AS m
WHERE XML.XmlData IS NOT NULL
AND ISJSON (XML.xmldata) > 0
AND n.type = 5
AND m.type = 5
UNION ALL
/*Array - invoiceDetails*/
SELECT
XML.[CustomerID],
XML.[SiteId],
XML.[Date],
JSON_VALUE(o.[value], '$.accountNbr') AS [AccountNumber],
JSON_VALUE(o.[value], '$.actualUsage') AS [ActualUsage]
FROM stage.Bill XML
CROSS APPLY openjson(XML.xmldata) AS n
CROSS APPLY openjson(n.value) AS m
CROSS APPLY openjson(m.value, '$.invoiceDetails') AS o
WHERE XML.XmlData IS NOT NULL
AND ISJSON (XML.xmldata) > 0
AND n.type = 4
I just did a small exercise using the WITH clause to specify data types for the values, and noticed I am then able to order and filter the view. So I believe the way forward is to add data types to this query. My problem now is that I don't know how to add the WITH clause to my query to make it work.
Any advice?
My apologies, you might find the XML naming and prefixes confusing. In my first tests I expected to receive XML data from Data Factory, so my table and columns include XML prefixes, until I noticed Data Factory delivers JSON data. I then started querying the data as JSON but have not changed the table name and prefixes.
Thank you for your comments to enrich this question. I took the example below from a YouTube channel (not sure if I am allowed to mention the channel's name: https://www.youtube.com/watch?v=yl9jKGgASTY&t=474s). I believe it is a similar approach. Could you please help with how I could add the WITH clause in order to specify data types?
DECLARE @json NVARCHAR(MAX)
SET @json =
N'{
"OrderHeader": [
{
"OrderID": 100,
"CustomerID": 2000,
"OrderDetail": [
{
"ProductID": 2000,
"UnitPrice": 350
},
{
"ProductID": 5000,
"UnitPrice": 800
},
{
"ProductID": 9000,
"UnitPrice": 200
}
]
}
]
}'
SELECT
JSON_VALUE(a.value, '$.OrderID') AS OrderID,
JSON_VALUE(a.value, '$.CustomerID') AS CustomerID,
JSON_VALUE(a.value, '$.ProductID') AS ProductID,
JSON_VALUE(a.value, '$.UnitPrice') AS UnitPrice
FROM OPENJSON(@json, '$.OrderHeader') AS a
CROSS APPLY OPENJSON(a.value, 'OrderDetail') AS b
You have a couple of typos.
The second OPENJSON should have the path starting $.
The third and fourth SELECT columns should use b.value not a.value
DECLARE @json NVARCHAR(MAX)
SET @json =
N'{
"OrderHeader": [
{
"OrderID": 100,
"CustomerID": 2000,
"OrderDetail": [
{
"ProductID": 2000,
"UnitPrice": 350
},
{
"ProductID": 5000,
"UnitPrice": 800
},
{
"ProductID": 9000,
"UnitPrice": 200
}
]
}
]
}'
SELECT
JSON_VALUE(a.value, '$.OrderID') AS OrderID,
JSON_VALUE(a.value, '$.CustomerID') AS CustomerID,
JSON_VALUE(b.value, '$.ProductID') AS ProductID,
JSON_VALUE(b.value, '$.UnitPrice') AS UnitPrice
FROM OPENJSON(@json, '$.OrderHeader') AS a
CROSS APPLY OPENJSON(a.value, '$.OrderDetail') AS b;
An alternative syntax is to use OPENJSON on each object with an explicit schema
SELECT
oh.OrderID,
oh.CustomerID,
od.ProductID,
od.UnitPrice
FROM OPENJSON(#json, '$.OrderHeader')
WITH (
OrderID int,
CustomerID int,
OrderDetail nvarchar(max) AS JSON
) AS oh
CROSS APPLY OPENJSON(oh.OrderDetail)
WITH (
ProductID int,
UnitPrice decimal(18,9)
) AS od;
db<>fiddle
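Applying the same explicit-schema idea back to the original stage.Bill view might look like the sketch below. It is a sketch only: it covers just the object branch of the UNION, the alias b replaces XML for readability, and the data types are assumptions.

SELECT
    b.[CustomerID],
    b.[SiteId],
    b.[Date],
    d.AccountNumber,
    d.ActualUsage
FROM stage.Bill b
CROSS APPLY OPENJSON(b.xmldata) AS n
CROSS APPLY OPENJSON(n.value, '$.invoiceDetails')
WITH (
    AccountNumber nvarchar(50) '$.accountNbr',
    ActualUsage decimal(18, 4) '$.actualUsage'
) AS d
WHERE b.xmldata IS NOT NULL
  AND ISJSON(b.xmldata) > 0
  AND n.type = 5;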

Extract complex json with random key field

I am trying to extract the following JSON into its own rows, like the table below, in a Presto query. The issue here is that the key/AV engine name is different for each row, and I am stuck on how I can extract and iterate over the keys without knowing their values in advance.
The JSON is the value of a table row:
{
"Bkav":
{
"detected": false,
"result": null
},
"Lionic":
{
"detected": true,
"result": "Trojan.Generic.3611249"
},
...
}
AV Engine Name | Detected Virus | Result
---------------|----------------|------------------------
Bkav           | false          | null
Lionic         | true           | Trojan.Generic.3611249
I have tried to use json_extract following the documentation here https://teradata.github.io/presto/docs/141t/functions/json.html, but there is no mention of extraction when we don't know the key :( I am trying to find a solution that works in both Presto and Hive; is there a common query that is applicable to both?
You can cast your json to map(varchar, json) and process it with unnest to flatten:
-- sample data
WITH dataset (json_str) AS (
    VALUES (
        '{"Bkav":{"detected": false,"result": null},"Lionic":{"detected": true,"result": "Trojan.Generic.3611249"}}'
    )
)
-- query
select
    k "AV Engine Name",
    json_extract_scalar(v, '$.detected') "Detected Virus",
    json_extract_scalar(v, '$.result') "Result"
from (
    select cast(json_parse(json_str) as map(varchar, json)) as m
    from dataset
)
cross join unnest (map_keys(m), map_values(m)) t(k, v)
Output:

AV Engine Name | Detected Virus | Result
---------------|----------------|------------------------
Bkav           | false          |
Lionic         | true           | Trojan.Generic.3611249
The Presto query suggested by @Guru works, but for Hive there is no easy way. I had to:
1. extract the JSON,
2. parse it with replace to remove some characters and brackets,
3. then convert it back to a map, and repeat once more to get the nested values out.
SELECT
av_engine,
str_to_map(regexp_replace(engine_result, '\\}', ''),',', ':') AS output_map
FROM (
SELECT
str_to_map(regexp_replace(regexp_replace(get_json_object(raw_response, '$.scans'), '\"', ''), '\\{',''),'\\},', ':') AS key_val_map
FROM restricted_antispam.abuse_malware_scanning
) AS S
LATERAL VIEW EXPLODE(key_val_map) temp AS av_engine, engine_result

Extracting Array elements in Presto w/o using unnest function

I have a requirement around this data where I need to extract array elements while keeping them grouped, which means I cannot use the unnest function. Below is the sample data:
[
{ "emp_id": 8291828, "name": "bruce" },
{ "emp_id": 8291823, "name": "Rolli" }
]
My data is in the same format as above, i.e. (array(row(emp_id varchar, name varchar))). What I need is to get rid of the array, so that the data looks like
{ "emp_id": 8291828, "name": "bruce", },
{ "emp_id": 8291823, "name": "Rolli" }
I would appreciate it if anyone could help me with this.
You could use element_at if you have a sequence table (1, 2, 3, ...).
with numbers as
(
select * from
(
Values
(1),(2),(3)
) as x(i)
)
,emp as
(
select *
from (
values
(ARRAY[cast(ROW(8291828,'bruce') as row(emp_id bigint, name varchar)), cast(row(8291823,'Rolli') as row(emp_id bigint, name varchar))])
) as emp (records)
)
select
element_at(emp.records,i) record
from numbers n
cross join emp
where n.i <= cardinality(emp.records);
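As a quick sanity check on element_at semantics (Presto arrays are 1-based), a standalone one-liner:

SELECT element_at(ARRAY['a', 'b', 'c'], 2); -- returns 'b'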

Select from JSON Array postgresql JSON column

I have the following JSON stored in a PostgreSQL JSON column
{
"status": "Success",
"message": "",
"data": {
"serverIp": "XXXX",
"ruleId": 32321,
"results": [
{
"versionId": 555555,
"PriceID": "8abf35ec-3e0e-466b-a4e5-2af568e90eec",
"price": 550,
"Convert": 0.8922953080331764,
"Cost": 10
}
]
}
}
I would like to search for a specific PriceID across the entire JSON column (named info) and select the entire results element by that PriceID.
How do I do that in PostgreSQL?
One option uses exists and json(b)_array_elements(). Assuming that your table is called mytable and that the jsonb column is mycol, this would look like:
select t.*
from mytable t
where exists (
select 1
from jsonb_array_elements(t.mycol -> 'data' -> 'results') x(elt)
where x.elt ->> 'PriceID' = '8abf35ec-3e0e-466b-a4e5-2af568e90eec'
)
In the subquery, jsonb_array_elements() unnests the JSON array located at the given path. Then, the where clause ensures that at least one element in the array has the given PriceID.
If your data is of json datatype rather than jsonb, you need to use json_array_elements() instead of jsonb_array_elements().
If you want to display some information coming from the matching array element, then it is different. You can use a lateral join instead of exists. Keep in mind, though, that this will duplicate the rows if more than one array element matches:
select t.*, x.elt ->> 'price' price
from mytable t
cross join lateral jsonb_array_elements(t.mycol -> 'data' -> 'results') x(elt)
where x.elt ->> 'PriceID' = '8abf35ec-3e0e-466b-a4e5-2af568e90eec'
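If the column is jsonb (not json), a containment predicate is another possible sketch, using the same mytable/mycol names as above:

select t.*
from mytable t
where t.mycol -> 'data' -> 'results' @> '[{"PriceID": "8abf35ec-3e0e-466b-a4e5-2af568e90eec"}]';

-- A GIN expression index could support this predicate:
-- create index on mytable using gin ((mycol -> 'data' -> 'results'));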

Using Postgres JSON Functions on table columns

I have searched extensively (in Postgres docs and on Google and SO) to find examples of JSON functions being used on actual JSON columns in a table.
Here's my problem: I am trying to extract key values from an array of JSON objects in a column using jsonb_to_recordset(), but I get syntax errors. When I pass the object literally to the function, it works fine:
Passing JSON literally:
select *
from jsonb_to_recordset('[
{ "id": 0, "name": "400MB-PDF.pdf", "extension": ".pdf",
"transferId": "ap31fcoqcajjuqml6rng"},
{ "id": 0, "name": "1000MB-PDF.pdf", "extension": ".pdf",
"transferId": "ap31fcoqcajjuqml6rng"}
]') as f(name text);
results in:
400MB-PDF.pdf
1000MB-PDF.pdf
It extracts the value of the key "name".
Here's the JSON in the column, being extracted using:
select journal.data::jsonb#>>'{context,data,files}'
from journal
where id = 'ap32bbofopvo7pjgo07g';
resulting in:
[ { "id": 0, "name": "400MB-PDF.pdf", "extension": ".pdf",
"transferId": "ap31fcoqcajjuqml6rng"},
{ "id": 0, "name": "1000MB-PDF.pdf", "extension": ".pdf",
"transferId": "ap31fcoqcajjuqml6rng"}
]
But when I try to pass jsonb#>>'{context,data,files}' to jsonb_to_recordset() like this:
select id,
journal.data::jsonb#>>::jsonb_to_recordset('{context,data,files}') as f(name text)
from journal
where id = 'ap32bbofopvo7pjgo07g';
I get a syntax error. I have tried different ways but each time it complains about a syntax error:
Version:
PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
The expressions after select must evaluate to a single value. Since jsonb_to_recordset returns a set of rows and columns, you can't use it there.
The solution is a cross join lateral, which allows you to expand one row into multiple rows using a function. That gives you single rows that select can act on. For example:
select *
from journal j
cross join lateral
jsonb_to_recordset(j.data#>'{context, data, files}') as d(id int, name text)
where j.id = 'ap32bbofopvo7pjgo07g'
Note that the #>> operator returns type text, and the #> operator returns type jsonb. As jsonb_to_recordset expects jsonb as its first parameter I'm using #>.
See it working at rextester.com
jsonb_to_recordset is a set-valued function and can only be invoked in specific places. The FROM clause is one such place, which is why your first example works, but the SELECT clause is not.
In order to turn your JSON array into a "table" that you can query, you need to use a lateral join. The effect is rather like a foreach loop on the source recordset, and that's where you apply the jsonb_to_recordset function. Here's a sample dataset:
create table jstuff (id int, val jsonb);
insert into jstuff
values
(1, '[{"outer": {"inner": "a"}}, {"outer": {"inner": "b"}}]'),
(2, '[{"outer": {"inner": "c"}}]');
A simple lateral join query:
select id, r.*
from jstuff
join lateral jsonb_to_recordset(val) as r("outer" jsonb) on true;
id | outer
----+----------------
1 | {"inner": "a"}
1 | {"inner": "b"}
2 | {"inner": "c"}
(3 rows)
That's the hard part. Note that you have to define what your new recordset looks like in the AS clause -- since each element in our val array is a JSON object with a single field named "outer", that's what we give it. If your array elements contain multiple fields you're interested in, you declare those in a similar manner. Be aware also that your JSON schema needs to be consistent: if an array element doesn't contain a key named "outer", the resulting value will be null.
From here, you just need to pull the specific value you need out of each JSON object using the traversal operator as you were. If I wanted only the "inner" value from the sample dataset, I would specify select id, r.outer->>'inner'. Since it's already JSONB, it doesn't require casting.
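Putting that last sentence into a full statement, against the same sample dataset as above:

select id, r."outer"->>'inner' as inner_value
from jstuff
join lateral jsonb_to_recordset(val) as r("outer" jsonb) on true;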