Is the following a full list of all value types as they're passed to json in BigQuery? I've gotten this by trial and error but haven't been able to find this in the documentation:
select
NULL as NullValue,
FALSE as BoolValue,
DATE '2014-01-01' as DateValue,
INTERVAL 1 year as IntervalValue,
DATETIME '2014-01-01 01:02:03' as DatetimeValue,
TIMESTAMP '2014-01-01 01:02:03' as TimestampValue,
"Hello" as StringValue,
B"abc" as BytesValue,
123 as IntegerValue,
NUMERIC '3.14' as NumericValue,
3.14 as FloatValue,
TIME '12:30:00.45' as TimeValue,
[1,2,3] as ArrayValue,
STRUCT('Mark' as first, 'Thomas' as last) as StructValue,
[STRUCT(1 as x, 2 as y), STRUCT(5 as x, 6 as y)] as ArrayStructValue,
STRUCT(1 as x, [1,2,3] as y, ('a','b','c') as z) as StructNestedValue
{
"NullValue": null,
"BoolValue": "false", // why not just false without quotes?
"DateValue": "2014-01-01",
"IntervalValue": "1-0 0 0:0:0",
"DatetimeValue": "2014-01-01T01:02:03",
"TimestampValue": "2014-01-01T01:02:03Z",
"StringValue": "Hello",
"BytesValue": "YWJj",
"IntegerValue": "123",
"NumericValue": "3.14",
"FloatValue": "3.14",
"TimeValue": "12:30:00.450000",
"ArrayValue": ["1", "2", "3"],
"StructValue": {
"first": "Mark",
"last": "Thomas"
},
"ArrayStructValue": [
{"x": "1", "y": "2"},
{"x": "5", "y": "6"}
],
"StructNestedValue": {
"x": "1",
"y": ["1", 2", "3"],
"z": {"a": "a", b": "b", "c": "c"}
}
}
Honestly, it looks to me like everything other than the null value and the array [] or struct {} containers is string-enclosed, which seems a bit odd.
According to this document, JSON is built on two structures:

A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.

An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
The result of the SELECT query is in JSON format, wherein [] denotes an array, {} denotes an object, and double quotes (" ") denote a string value, just as in the query itself.
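For contrast, here is a minimal sketch (BigQuery Standard SQL) showing that serializing inside the query with TO_JSON_STRING does not quote numbers or booleans, which suggests the quoting in the output above is applied by the export/API layer rather than being a property of the values themselves:

-- TO_JSON_STRING emits native JSON types for scalars
select TO_JSON_STRING(STRUCT(
  FALSE as BoolValue,
  123 as IntegerValue,
  3.14 as FloatValue
)) as json_row
-- json_row: {"BoolValue":false,"IntegerValue":123,"FloatValue":3.14}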
I have a list containing the keys and another list containing values (obtained from splitting a log line). How can I combine the two to make a property bag in Kusto?
let headers = pack_array("A", "B", "C");
datatable(RawData:string)
[
"1,2,3",
"4,5,6",
]
| mv-expand fields = split(RawData, ",")
| expand dict = ???
Expected:
dict
-----
{"A": 1, "B": 2, "C": 3}
{"A": 4, "B": 5, "C": 6}
Here's one option that uses a combination of:
mv-apply: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/mv-applyoperator
pack(): https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/packfunction
make_bag(): https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/make-bag-aggfunction
let keys = pack_array("A", "B", "C");
datatable(RawData:string)
[
    "1,2,3",
    "4,5,6",
]
| project values = split(RawData, ",")
// mv-apply walks over 'keys' together with its index 'i',
// pairing each key with the value at the same position,
// and make_bag() aggregates the pairs into a single property bag
| mv-apply with_itemindex = i key = keys to typeof(string) on (
    summarize dict = make_bag(pack(key, values[i]))
)
| values | dict |
| :-- | :-- |
| ["1", "2", "3"] | {"A": "1", "B": "2", "C": "3"} |
| ["4", "5", "6"] | {"A": "4", "B": "5", "C": "6"} |
I have data in a BigQuery table that looks like this:
[
{ "id": 1, "labels": [{"key": "a", "value": 1}, {"key": "b", "value": 2}] },
{ "id": 2, "labels": [{"key": "a", "value": 1}, {"key": "b", "value": 3}] },
// a lot more rows
]
My question is, how can I find all rows where "key" = "a", "value" = 1, but also "key" = "b" and "value" = 3?
I've tried various forms of UNNEST, but I haven't been able to get it right. A CROSS JOIN leaves me with one row for every object in the labels array, so I can't filter on both labels at once.
Try this:
select *
from mytable
where exists (select 1 from unnest(labels) where key = "a" and value=1)
and exists (select 1 from unnest(labels) where key = "b" and value=3)
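A self-contained way to try this out (a minimal sketch; the inline mytable and its ARRAY<STRUCT<key STRING, value INT64>> column are assumptions standing in for your real table):

with mytable as (
  select 1 as id, [struct('a' as key, 1 as value), struct('b' as key, 2 as value)] as labels
  union all
  select 2 as id, [struct('a' as key, 1 as value), struct('b' as key, 3 as value)] as labels
)
select *
from mytable
where exists (select 1 from unnest(labels) where key = "a" and value = 1)
  and exists (select 1 from unnest(labels) where key = "b" and value = 3)
-- returns only the row with id = 2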
Assuming there are no duplicate entries in the labels array, you can use the below:
select *
from `project.dataset.table` t
where 2 = (
select count(1)
from t.labels kv
where kv in (('a', 1), ('b', 3))
)
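To see why the no-duplicates assumption matters, here is a hedged counterexample (hypothetical row): with a repeated ('a', 1) entry the count still reaches 2, so the row is returned even though ('b', 3) is absent; the EXISTS approach above does not have this problem:

with t as (
  select 3 as id, [struct('a' as key, 1 as value), struct('a' as key, 1 as value)] as labels
)
select *
from t
where 2 = (
  select count(1)
  from t.labels kv
  where kv in (('a', 1), ('b', 3))
)
-- the row is (incorrectly) returned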
You can try parsing the JSON and then applying different filter conditions on it as per your requirements. In the following query I first identify which records have key = 'a', then which record has value = 30, and then join the two sets on id:
WITH data1 AS (
  SELECT '{ "id": 1, "labels": [{"key": "a", "value": 11}, {"key": "b", "value": 22}] }' AS c1
  UNION ALL
  SELECT '{ "id": 2, "labels": [{"key": "a", "value": 10}, {"key": "b", "value": 30}] }' AS c1
)
SELECT t1.id, t1.c1, t1.key, t2.value
FROM
  (
    SELECT
      JSON_EXTRACT_SCALAR(c1, '$.id') AS id,
      JSON_EXTRACT_SCALAR(curSection, '$.value') AS value,
      JSON_EXTRACT_SCALAR(curSection, '$.key') AS key,
      c1
    FROM data1 tbl
    LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(tbl.c1, '$.labels')) curSection
    WHERE JSON_EXTRACT_SCALAR(curSection, '$.key') = 'a'
  ) t1,
  (
    SELECT
      JSON_EXTRACT_SCALAR(c1, '$.id') AS id,
      JSON_EXTRACT_SCALAR(curSection, '$.value') AS value,
      JSON_EXTRACT_SCALAR(curSection, '$.key') AS key,
      c1
    FROM data1 tbl
    LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(tbl.c1, '$.labels')) curSection
    WHERE JSON_EXTRACT_SCALAR(curSection, '$.value') = '30'
  ) t2
WHERE t1.id = t2.id
Hopefully this works for you.
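A hedged, more compact variant of the same JSON-parsing idea, mirroring the EXISTS approach from the first answer (still assuming each row is a JSON string shaped like data1 above):

select json_extract_scalar(c1, '$.id') as id, c1
from data1
where exists (
  select 1
  from unnest(json_extract_array(c1, '$.labels')) lbl
  where json_extract_scalar(lbl, '$.key') = 'a'
)
and exists (
  select 1
  from unnest(json_extract_array(c1, '$.labels')) lbl
  where json_extract_scalar(lbl, '$.value') = '30'
)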
The problem is appending a new JSON array to an existing JSON array:
Suppose I have the following JSON Array
[{"id": 1, "data": "One"}, {"id": 2, "data": "Two"}]
How do I append [{"id": 3, "data": "Three"}, {"id": 4, "data": "Four"}] to it using JSON_MODIFY?
Resulting value for the updated column:
[{"id": 1, "data": "One"}, {"id": 2, "data": "Two"}, {"id": 3, "data": "Three"}, {"id": 4, "data": "Four"}]
I don't think that you can merge two JSON arrays with one JSON_MODIFY() call, but the following statement (using JSON_MODIFY()) is a possible solution:
Statement:
DECLARE @json NVARCHAR(500)='[{"id": 1, "data": "One"}, {"id": 2, "data": "Two"}]'
DECLARE @new NVARCHAR(500)='[{"id": 3, "data": "Three"}, {"id": 4, "data": "Four"}]'
-- append each element of @new to the array in @json, one row at a time
SELECT @json = JSON_MODIFY(
   @json,
   'append $',
   JSON_QUERY([value])
)
FROM OPENJSON(@new)
SELECT @json
Result:
[{"id": 1, "data": "One"}, {"id": 2, "data": "Two"},{"id": 3, "data": "Three"},{"id": 4, "data": "Four"}]
You can use "JSON_MODIFY" function and append key to modify JSON object like below:
SQL-FIDDLE
This one merges two JSON arrays:
DECLARE @json1 NVARCHAR(500)='[{"id": 1, "data": "One"}, {"id": 2, "data": "Two"}]';
DECLARE @json2 NVARCHAR(500)='[{"id": 3, "data": "Three"}, {"id": 4, "data": "Four"}]';
SELECT t.id, t.[data]
FROM
(
    SELECT * FROM OPENJSON(@json1) WITH(id int, [data] NVARCHAR(MAX))
    UNION ALL
    SELECT * FROM OPENJSON(@json2) WITH(id int, [data] NVARCHAR(MAX))
) t
FOR JSON PATH;
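Running the statement above should return the merged array (hedged; exact whitespace may differ):

[{"id":1,"data":"One"},{"id":2,"data":"Two"},{"id":3,"data":"Three"},{"id":4,"data":"Four"}]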
This one appends individual JSON objects:
DECLARE @info NVARCHAR(500)='[{"id": 1, "data": "One"}, {"id": 2, "data": "Two"}]';
PRINT @info;
SET @info = JSON_MODIFY(@info, 'append $', JSON_QUERY('{"id": 3, "data": "Three"}'))
SET @info = JSON_MODIFY(@info, 'append $', JSON_QUERY('{"id": 4, "data": "Four"}'))
PRINT @info;
Workaround I found for my project
I have two tables, t1 and t2, which are identical in structure. Table t1 keeps records of supplier certificates. Table t2 receives, via an API, new certificates obtained by the supplier, so table t1 shall be updated with the new certificates from table t2. The certificate data is placed in a JSON array of objects, similar to the example of the topic starter.
Task
The JSON array in t1 column JSON_t1 shall be appended with the JSON array from t2 column JSON_t2. Here's the structure, simplified for example purposes:
Table "t1"
recordId
JSON_t1
1
[{"id": 1, "data": "One"}, {"id": 2, "data": "Two"}]
Table "t2"
recordId
JSON_t2
1
[{"id": 3, "data": "Three"}, {"id": 4, "data": "Four"}]
RESULT
appended t1.JSON_t1
[{"id": 1, "data": "One"}, {"id": 2, "data": "Two"},{"id": 3, "data": "Three"}, {"id": 4, "data": "Four"}]
SQL method
SELECT
t1.JSON_t1,
t2.JSON_t2,
concat('[', replace(replace(json_modify(t1.JSON_t1, 'append $', json_query(t2.JSON_t2)), '[', ''), ']', ''), ']') as "appended t1.JSON_t1"
FROM t1
INNER JOIN t2 ON t1.recordId = t2.recordId
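If the goal is to persist the merged array rather than just select it, the same expression should work in an UPDATE (a hedged sketch against the t1/t2 tables above):

UPDATE t1
SET JSON_t1 = concat('[', replace(replace(json_modify(t1.JSON_t1, 'append $', json_query(t2.JSON_t2)), '[', ''), ']', ''), ']')
FROM t1
INNER JOIN t2 ON t1.recordId = t2.recordId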
Method explained
JSON_t2 is passed through json_query(t2.JSON_t2) so that it is treated as JSON and its characters are not escaped
JSON_t1 is appended with JSON_t2 via json_modify(t1.JSON_t1, 'append $', json_query(t2.JSON_t2)), which results in the following output: [{"id": 1, "data": "One"}, {"id": 2, "data": "Two"},[{"id": 3, "data": "Three"}, {"id": 4, "data": "Four"}]]. Note the extra inner square brackets, which must be fixed, as this is not yet a correct array of objects.
The intermediate JSON is stripped of all square brackets with replace applied two times, for "[" and for "]": replace(replace(json_modify(t1.JSON_t1, 'append $', json_query(t2.JSON_t2)), '[', ''), ']', '')
Square brackets are then added back at the start and the end with concat to make a valid JSON array: concat('[', replace(replace(json_modify(t1.JSON_t1, 'append $', json_query(t2.JSON_t2)), '[', ''), ']', ''), ']')
You can test if the final JSON is valid with ISJSON()
Points to note
If you don't use json_query, you get the following result:
[{"id": 1, "data": "One"}, {"id": 2, "data": "Two"},"[{\"id\": 3, \"data\": \"Three\"}, {\"id\": 4, \"data\": \"Four\"}]"]. See more on this here.
I tried stripping the square brackets from JSON_t2 only and using json_modify like this: json_modify(t1.JSON_t1, 'append $', json_query('{"id": 3, "data": "Three"}, {"id": 4, "data": "Four"}')), but this results in appending only the first item from JSON_t2, like this: [{"id": 1, "data": "One"}, {"id": 2, "data": "Two"},{"id": 3, "data": "Three"}]
NB: if you have nested arrays, this method is not suitable. In my case it works well, since I have a simple array of objects (certificates with various key/value pairs like validity, type, issue date, etc.)
I have JSON like this (it is contained in a CLOB variable):
{"id": "33", "type": "abc", "val": "2", "cod": "", "sg1": "1", "sg2": "1"}
{"id": "359", "type": "abcef", "val": "52", "cod": "aa", "sg1": "", "sg2": "0"}
…
I need to remove the " characters from the values of: id, val, sg1, sg2.
Is it possible?
For example, I need to obtain this:
{"id": 33, "type": "abc", "val": 2, "cod": "", "sg1": 1, "sg2": 1}
{"id": 359, "type": "abcef", "val": 52, "cod": "aa", "sg1": , "sg2": 0}
…
If you are using Oracle 12 (R2?) or later then you can convert your JSON to the appropriate data types and then convert it back to JSON.
Oracle 18 Setup:
CREATE TABLE test_data ( value CLOB );
INSERT INTO test_data ( value )
VALUES ( '{"id": "33", "type": "abc", "val": "2", "cod": "", "sg1": "1", "sg2": "1"}' );
INSERT INTO test_data ( value )
VALUES ( '{"id": "359", "type": "abcef", "val": "52", "cod": "aa", "sg1": "", "sg2": "0"}' );
Query:
SELECT JSON_OBJECT(
'id' IS j.id,
'type' IS j.typ,
'val' IS j.val,
'cod' IS j.cod,
'sg1' IS j.sg1,
'sg2' IS j.sg2
) AS JSON
FROM test_data t
CROSS JOIN
JSON_TABLE(
t.value,
'$'
COLUMNS
id NUMBER(5,0) PATH '$.id',
typ VARCHAR2(10) PATH '$.type',
val NUMBER(5,0) PATH '$.val',
cod VARCHAR2(10) PATH '$.cod',
sg1 NUMBER(5,0) PATH '$.sg1',
sg2 NUMBER(5,0) PATH '$.sg2'
) j
Output:
| JSON |
| :--------------------------------------------------------------- |
| {"id":33,"type":"abc","val":2,"cod":null,"sg1":1,"sg2":1} |
| {"id":359,"type":"abcef","val":52,"cod":"aa","sg1":null,"sg2":0} |
Or, if you want to use regular expressions (you shouldn't, if you have the choice; use a proper JSON parser instead), then:
Query 2:
SELECT REGEXP_REPLACE(
REGEXP_REPLACE(
value,
'"(id|val|sg1|sg2)": ""',
'"\1": "null"'
),
'"(id|val|sg1|sg2)": "(\d+|null)"',
'"\1": \2'
) AS JSON
FROM test_data
Output:
| JSON |
| :-------------------------------------------------------------------------- |
| {"id": 33, "type": "abc", "val": 2, "cod": "", "sg1": 1, "sg2": 1} |
| {"id": 359, "type": "abcef", "val": 52, "cod": "aa", "sg1": null, "sg2": 0} |
db<>fiddle here
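To make the two-pass regex logic concrete, here is just the first REGEXP_REPLACE on its own; it rewrites empty strings to "null" so that the second pass can strip the quotes from digits and null alike:

SELECT REGEXP_REPLACE(
         value,
         '"(id|val|sg1|sg2)": ""',
         '"\1": "null"'
       ) AS step1
FROM   test_data

-- second row after this step (hedged):
-- {"id": "359", "type": "abcef", "val": "52", "cod": "aa", "sg1": "null", "sg2": "0"}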