How to query nested object in kusto? - kql

I have a Kusto table where one of the columns is of dynamic type and contains nested JSON. The dynamic object is a two-dimensional (nested) array:
{
"OtherField": "Unknown",
"First": [
{
"Id": "",
"Second": [
{
"ConfidenceLevel": "Low",
"Count": 3
}
]
},
{
"Id": "",
"Second": [
{
"ConfidenceLevel": "High",
"Count": 2
},
{
"ConfidenceLevel": "Low",
"Count": 2
}
]
}
]
}
Previously we used "tostring(ColumnName) has_cs '"Level":"High"'" to select rows where "Level" matched, but now I want to select rows where Level == 'High' and Count > 0.
For this two-dimensional array, a row should be selected if any one item matches. How do I implement a nested object query in Kusto?
I tried regex,
tostring(ColumnName) matches regex '"Level":"High","Count":[^0]'
but during review, regex was not allowed.
Then I tried "mv-expand" and "mv-apply", but it seems that passing a column name to the toscalar() function is not allowed.
How can I pass a column name to toscalar()?
let T = datatable(ColumnName:dynamic)
[
dynamic({"OtherField": "Unknown","First": [{"Id": "","Second": [{"ConfidenceLevel": "Low","Count": 3}]},{"Id": "","Second":[{"ConfidenceLevel": "High","Count": 0}]}]}),
dynamic({"OtherField": "Unknown","First": [{"Id": "","Second": [{"ConfidenceLevel": "Low","Count": 3}]},{"Id": "","Second":[{"ConfidenceLevel": "High","Count": 2}]}]})
];
let result = T
// The following line works, but regex is not allowed during review.
// | where tostring(ColumnName) matches regex '"ConfidenceLevel":"High","Count":[^0]'
| where isnotnull(toscalar(
// print s = '{"OtherField": "Unknown","First": [{"Id": "","Second": [{"ConfidenceLevel": "Low","Count": 3}]},{"Id": "","Second":[{"ConfidenceLevel": "High","Count": 0}]}]}'
print s = tostring(ColumnName) // Error here: The name 'ColumnName' does not refer to any column, table, variable or function.
| project obj0 = parse_json(s)
| mv-expand obj1 = obj0.First
| mv-expand obj2 = obj1.Second
| where obj2.ConfidenceLevel == "High" and obj2.Count > 0)
)
;
result
I tried to use mv-expand but got the error "The name 'ColumnName' does not refer to any column, table, variable or function."
Expected result (the second row should be selected):
ColumnName
{"OtherField":"Unknown","First":[{"Id":"","Second":[{"ConfidenceLevel":"Low","Count":3}]},{"Id":"","Second":[{"ConfidenceLevel":"High","Count":2}]}]}

Use a nested mv-apply to deal with the nested arrays. (toscalar() is evaluated once, as a constant, before the query runs, so it cannot reference a per-row column; that is why the attempt above fails.)
let T = datatable(ColumnName:dynamic)
[
dynamic({"OtherField": "Unknown","First": [{"Id": "","Second": [{"ConfidenceLevel": "Low","Count": 3}]},{"Id": "","Second":[{"ConfidenceLevel": "High","Count": 0}]}]}),
dynamic({"OtherField": "Unknown","First": [{"Id": "","Second": [{"ConfidenceLevel": "Low","Count": 3}]},{"Id": "","Second":[{"ConfidenceLevel": "High","Count": 2}]}]})
];
T
| mv-apply ColumnName.First on
(
mv-apply ColumnName_First.Second on
(
where ColumnName_First_Second.ConfidenceLevel == "High"
and ColumnName_First_Second.Count > 0
)
)
| project ColumnName
ColumnName
{"OtherField":"Unknown","First":[{"Id":"","Second":[{"ConfidenceLevel":"Low","Count":3}]},{"Id":"","Second":[{"ConfidenceLevel":"High","Count":2}]}]}

Related

LIKE in Array of Objects in JSONB column

I have JSONB data in a Postgres column like this:
{
"Id": "5c6d3210-1def-489b-badd-2bcc4a1cda28",
"Name": "Jane Doe",
"Tags": [
{
"Key": "Project",
"Value": "1004345"
}
]
}
How can I query data where Name contains "Jane" or "Tags.Key" contains "4345"?
I tried this, but it only matches the exact value:
select * from documents where data -> 'Tags' @> '[{ "Value":"1004345"}]';
You can use a JSON path operator with like_regex:
select *
from documents
where data @@ '$.Tags[*].Value like_regex "4345"'
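If both conditions from the question are needed, they can be combined. A minimal sketch, assuming PostgreSQL 12 or later for the jsonpath operator:
select *
from documents
where data ->> 'Name' like '%Jane%'                 -- Name contains "Jane"
   or data @@ '$.Tags[*].Value like_regex "4345"';  -- any tag value containing "4345"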
You can also do it this way, expanding the Tags array and comparing the value exactly:
select *
from documents d, jsonb_array_elements(d.data -> 'Tags') as tag
where tag ->> 'Value' = '1004345';

Can't find record filter by zero array item

my_column has type jsonb. Here is the JSON:
{
"id": 4107,
"states": [
{
"dt": "2020-11-06T10:24:30.277+0000",
"id": "order.new"
}
]
}
I need to find all records where states[0].id = "order.new" (the zero-index item in the array).
I tried this:
SELECT * FROM history WHERE my_column @> '{states,0, id}'= 'order.new'
limit 10
But I get an error:
ERROR: invalid input syntax for type json
LINE 1: SELECT * FROM history WHERE my_column @> '{states,0, id}'= 'or...
The @> operator tests whether the right operand is contained in the left operand. The right operand must be a valid JSON document or literal, which is why you're getting a syntax error.
The operator you want is #>, which uses the array as the path for extraction:
# SELECT * FROM history WHERE my_column #> '{states,0, id}' = '"order.new"'
limit 10;
id | my_column
----+-------------------------------------------------------------------------------------
1 | {"id": 4107, "states": [{"dt": "2020-11-06T10:24:30.277+0000", "id": "order.new"}]}
(1 row)
With the @> operator you could do the following, but it would check for matching elements in any array position, not only index 0:
# SELECT * FROM history WHERE my_column @> '{"states": [{ "id": "order.new" }]}';
id | my_column
----+-------------------------------------------------------------------------------------------------------------------------------------------------
1 | {"id": 4107, "states": [{"dt": "2020-11-06T10:24:30.277+0000", "id": "order.new"}]}
4 | {"id": 4107, "states": [{"dt": "2020-11-06T10:24:30.333+0000", "id": "order.test"}, {"dt": "2020-11-06T10:24:33.333+0000", "id": "order.new"}]}

postgresql search jsonb array items

I want to find rows based on data in a jsonb column in PostgreSQL. Is this the correct syntax to search within array items?
SELECT * FROM table
WHERE diff_data @> '{"rfc6902": [{"op": "replace", "path": "/status/"}, "value": "0"]}';
The expected output is the first row of the table below.
table
id | diff_data
1 | {"rfc6902": [{"op": "replace", "path": "/status", "value": "0"}]}
2 | {"rfc6902": [{"op": "replace", "path": "/status", "value": "1"}]}

karate match throws an error com.intuit.karate.exception.KarateException: Character '.' on position 3 is not valid

I am trying to perform a match on this JSON array. The scenario goes like this:
Scenario : match lob
* def op =
"""
[
{
"_id": "1",
"_class":
"com.xxx.versionone.tir.enterprise.persistence.model.xxx",
"lobName": "abc",
"changeDate": "2016-11-04T11:41:40",
"changedBy": "abc",
"createdDate": "2014-07-01T11:47:23",
"lastSdpPublishDate": "2018-10-31T00:00:00"
},
{
"_id": {
"$oid": "57883a41e4b076d23a82e9e7"
},
"_class":
"com.xxx.versionone.tir.enterprise.persistence.model.xxx",
"lobName": "asda",
"changeDate": "2016-07-14T21:20:54",
"changedBy": "TXA858",
"createdDate": "2016-07-14T21:20:01",
"createdBy": "TXA858",
"lastSdpPublishDate": "2018-10-31T00:00:00"
}
]
"""
* match $op...lobName contains ["abc"]
I get this error:
com.intuit.karate.exception.KarateException: Character '.' on position 3 is not valid.
at com.intuit.karate.StepDefs.matchNamed(StepDefs.java:540)
at com.intuit.karate.StepDefs.matchContains(StepDefs.java:532)
JSON path deep scan is ".." not "...".
This should work:
* match $op..lobName contains ["abc"]
See the JSON path operators documentation.

BigQuery: Create column of JSON datatype

I am trying to load json with the following schema into BigQuery:
{
key_a:value_a,
key_b:{
key_c:value_c,
key_d:value_d
}
key_e:{
key_f:value_f,
key_g:value_g
}
}
The keys under key_e are dynamic, i.e. in one response key_e will contain key_f and key_g, and in another response it will instead contain key_h and key_i. New keys can be created at any time, so I cannot create a record with nullable fields for all possible keys.
Instead I want to create a column with a JSON datatype that can then be queried using the JSON_EXTRACT() function. I have tried loading key_e as a column with the STRING datatype, but its value is analysed as JSON and so the load fails.
How can I load a section of JSON into a single BigQuery column when there is no JSON datatype?
Having your JSON as a single string column inside BigQuery is definitely an option. If you have a large volume of data, this can end up with a high query price, since all your data ends up in one column, and the querying logic can become quite messy.
If you have the luxury of slightly changing your "design", I would recommend considering the one below, where you can employ the REPEATED mode.
Table schema:
[
{ "name": "key_a",
"type": "STRING" },
{ "name": "key_b",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{ "name": "key",
"type": "STRING"},
{ "name": "value",
"type": "STRING"}
]
},
{ "name": "key_e",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{ "name": "key",
"type": "STRING"},
{ "name": "value",
"type": "STRING"}
]
}
]
Example of JSON to load
{"key_a": "value_a1", "key_b": [{"key": "key_c", "value": "value_c"}, {"key": "key_d", "value": "value_d"}], "key_e": [{"key": "key_f", "value": "value_f"}, {"key": "key_g", "value": "value_g"}]}
{"key_a": "value_a2", "key_b": [{"key": "key_x", "value": "value_x"}, {"key": "key_y", "value": "value_y"}], "key_e": [{"key": "key_h", "value": "value_h"}, {"key": "key_i", "value": "value_i"}]}
Please note: it should be newline-delimited JSON, so each row must be on one line.
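With this schema the dynamic keys under key_e can be filtered by unnesting the repeated field. A minimal sketch, assuming standard SQL and a hypothetical table mydataset.mytable:
SELECT key_a, e.value
FROM mydataset.mytable, UNNEST(key_e) AS e   -- one output row per key/value pair
WHERE e.key = 'key_f'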
You can't do this directly with BigQuery, but you can make it work in two passes:
(1) Import your JSON data as a CSV file with a single string column.
(2) Transform each row to pack your "any-type" field into a string. Write a UDF that takes a string and emits the final set of columns you would like. Append the output of this query to your target table.
Example
I'll start with some JSON:
{"a": 0, "b": "zero", "c": { "woodchuck": "a"}}
{"a": 1, "b": "one", "c": { "chipmunk": "b"}}
{"a": 2, "b": "two", "c": { "squirrel": "c"}}
{"a": 3, "b": "three", "c": { "chinchilla": "d"}}
{"a": 4, "b": "four", "c": { "capybara": "e"}}
{"a": 5, "b": "five", "c": { "housemouse": "f"}}
{"a": 6, "b": "six", "c": { "molerat": "g"}}
{"a": 7, "b": "seven", "c": { "marmot": "h"}}
{"a": 8, "b": "eight", "c": { "badger": "i"}}
Import it into BigQuery as a CSV with a single STRING column (I called it 'blob'). I had to set the delimiter character to something arbitrary and unlikely (thorn -- 'þ') or it tripped over the default ','.
Verify your table imported correctly. You should see your simple one-column schema and the preview should look just like your source file.
Next, we write a query to transform it into your desired shape. For this example, we'd like the following schema:
a (INTEGER)
b (STRING)
c (STRING -- packed JSON)
We can do this with a UDF:
// Map a JSON string column ('blob') => { a (integer), b (string), c (json-string) }
bigquery.defineFunction(
'extractAndRepack', // Name of the function exported to SQL
['blob'], // Names of input columns
[{'name': 'a', 'type': 'integer'}, // Output schema
{'name': 'b', 'type': 'string'},
{'name': 'c', 'type': 'string'}],
function (row, emit) {
var parsed = JSON.parse(row.blob);
var repacked = JSON.stringify(parsed.c);
emit({a: parsed.a, b: parsed.b, c: repacked});
}
);
And a corresponding query:
SELECT a, b, c FROM extractAndRepack(JsonAnyKey.raw)
Now you just need to run the query (selecting your desired target table) and you'll have your data in the form you like.
Row a b c
1 0 zero {"woodchuck":"a"}
2 1 one {"chipmunk":"b"}
3 2 two {"squirrel":"c"}
4 3 three {"chinchilla":"d"}
5 4 four {"capybara":"e"}
6 5 five {"housemouse":"f"}
7 6 six {"molerat":"g"}
8 7 seven {"marmot":"h"}
9 8 eight {"badger":"i"}
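The packed column can then be queried with JSON_EXTRACT, as the question intended. A minimal sketch, assuming the repacked rows were appended to a hypothetical table mydataset.target; JSON_EXTRACT_SCALAR returns NULL for rows where the key is absent:
SELECT a, b, JSON_EXTRACT_SCALAR(c, '$.woodchuck') AS woodchuck
FROM mydataset.target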
One way to do it is to load this file as CSV instead of JSON (and quote the values or eliminate newlines in the middle); then it will become a single STRING column inside BigQuery.
P.S. You are right that having a native JSON data type would have made this scenario much more natural, and the BigQuery team is well aware of it.