BigQuery JSON Field Extraction

BigQuery JSON Field Extraction - google-bigquery

I have the following JSON payload stored in a single string column in a BQ table.
{
"customer" : "ABC Ltd",
"custom_fields" : [
{
"name" : "DOB",
"value" : "2000-01-01"
},
{
"name" : "Account_Open_Date",
"value" : "2019-01-01"
}
]
}
I am trying to figure out how I can extract the custom_fields name value pairs as columns?
Something like follows.
| Customer.name | Customer.DOB | Customer.Account_Open_Date |
| ABC Ltd | 2000-01-01 | 2019-01-01 |

You can use json-functions , such as
JSON_EXTRACT(json_string_expr, json_path_string_literal)
In your case will be
SELECT
JSON_EXTRACT(json_text, '$.customer') as Customer.Name,
JSON_EXTRACT(json_text, '$.custom_fields[0].value') as Customer.DOB,
JSON_EXTRACT(json_text, '$.custom_fields[1].value') as Customer.Account_Open_Date

Related

Splunk : Extracting the elements from JSON structure as separate fields

In Splunk, I'm trying to extract the key value pairs inside that "tags" element of the JSON structure so each one of the become a separate column so I can search through them.
for example :
| spath data | rename data.tags.EmailAddress AS Email
This does not help though and Email field comes as empty.I'm trying to do this for all the tags. Any thoughts/pointers?
{
"timestamp": "2021-10-26T18:23:05.180707Z",
"data": {
"tags": [
{
"key": "Email",
"value": "john.doe#example.com"
},
{
"key": "ProjectCode",
"value": "ABCD"
},
{
"key": "Owner",
"value": "John Doe"
}
]
},
"field1": "random1",
"field2": "random2"
}

I think does what you want:
| spath data.tags{}
| mvexpand data.tags{}
| spath input=data.tags{}
| table key value
| transpose header_field=key
| fields - column
How it works:
| spath data.tags{} takes the json and creates a multi value field that contains each item in the tags array
| mvexpand data.tags{} splits the multi value field into individual events - each one contains one of the items in the tags array
| spath input=data.tags{} takes the json in each event and makes a field for each KVP in that item (key and value in this case)
| table key value limits further commands to these two fields
| transpose header_field=key makes a field for each value of the key field (including one for the field named column)`
| fields - column removes the column field from the output
Here is a fully runnable example:
| makeresults
| eval _raw="
{
\"timestamp\": \"2021-10-26T18:23:05.180707Z\",
\"data\": {
\"tags\": [
{\"key\": \"Email\", \"value\": \"john.doe#example.com\"},
{\"key\": \"ProjectCode\", \"value\": \"ABCD\"},
{\"key\": \"Owner\", \"value\": \"John Doe\"}
]
},
\"field1\": \"random1\",
\"field2\": \"random2\"
}
"
| spath data.tags{}
| mvexpand data.tags{}
| spath input=data.tags{}
| table key value
| transpose header_field=key
It creates this output:
+----------------------+-------------+----------+
| Email | ProjectCode | Owner |
+----------------------+-------------+----------+
| john.doe#example.com | ABCD | John Doe |
+----------------------+-------------+----------+

SQL select from array in JSON

I have a table with a json field with the following json:
[
{
productId: '1',
other : [
otherId: '2'
]
},
{
productId: '3',
other : [
otherId: '4'
]
}
]
I am trying to select the productId and otherId for every array element like this:
select JSON_EXTRACT(items, $.items[].productId) from order;
But this is completely wrong since it takes only the first element in the array
Do I need to write a loop or something?

First of all, the data you show is not valid JSON. It has multiple mistakes that make it invalid.
Here's a demo using valid JSON:
mysql> create table orders ( items json );
mysql> insert into orders set items = '[ { "productId": "1", "other": { "otherId": "2" } }, { "productId": "3", "other" : { "otherId": "4" } } ]'
mysql> SELECT JSON_EXTRACT(items, '$[*].productId') AS productIds FROM orders;
+------------+
| productIds |
+------------+
| ["1", "3"] |
+------------+
If you want each productId on a row by itself as a scalar value instead of a JSON array, you'd have to use JSON_TABLE() in MySQL 8.0:
mysql> SELECT j.* FROM orders CROSS JOIN JSON_TABLE(items, '$[*]' COLUMNS(productId INT PATH '$.productId')) AS j;
+-----------+
| productId |
+-----------+
| 1 |
| 3 |
+-----------+
This is tested in MySQL 8.0.23.
You also tagged your question MariaDB. I don't use MariaDB, and MariaDB has its own incompatible implementation of JSON support, so I can't predict how it will work.

How to parse JSON metrics array in Splunk

I receive JSON from API in the following format:
[
{
"scId": "000DD2",
"sensorId": 2,
"metrics": [
{
"s": 5414,
"dateTime": "2018-02-02T13:03:30+01:00"
},
{
"s": 5526,
"dateTime": "2018-02-02T13:04:56+01:00"
},
{
"s": 5631,
"dateTime": "2018-02-02T13:06:22+01:00"
}
}, .... ]
Currently trying to display these metrics on the linear chart with dateTime for the X-axis and "s" for Y.
I use the following search query:
index="main" source="rest://test3" | spath input=metrics{}.s| mvexpand metrics{}.s
| mvexpand metrics{}.dateTime | rename metrics{}.s as s
| rename metrics{}.dateTime as dateTime| table s,dateTime
And I receive the data in the following format which is not applicable for linear chart. The point is - how to correctly parse the JSON to apply date-time from dateTime field in JSON to _time in Splunk.
Query results

#Max Zhylochkin,
Can you please try following search?
index="main" source="rest://test3"
| spath input=metrics{}.s
| mvexpand metrics{}.s
| mvexpand metrics{}.dateTime
| rename metrics{}.s as s
| rename metrics{}.dateTime as dateTime
| table s,dateTime
| eval _time = strptime(dateTime,"%Y-%m-%dT%H:%M:%S.%3N")
Thanks

How to extract a repeated nested field from json string and join with existing repeated nested field in bigquery

I have a table with one nested repeated field called article_id and a string field that contains a json string.
Here is the schema of my table:
Here is an example row of the table:
[
{
"article_id": "2732930586",
"author_names": [
{
"AuN": "h kanahashi",
"AuId": "2591665239",
"AfN": null,
"AfId": null,
"S": "1"
},
{
"AuN": "t mukai",
"AuId": "2607493793",
"AfN": null,
"AfId": null,
"S": "2"
},
{
"AuN": "y yamada",
"AuId": "2606624579",
"AfN": null,
"AfId": null,
"S": "3"
},
{
"AuN": "k shimojima",
"AuId": "2606600298",
"AfN": null,
"AfId": null,
"S": "4"
},
{
"AuN": "m mabuchi",
"AuId": "2606138976",
"AfN": null,
"AfId": null,
"S": "5"
},
{
"AuN": "t aizawa",
"AuId": "2723380540",
"AfN": null,
"AfId": null,
"S": "6"
},
{
"AuN": "k higashi",
"AuId": "2725066679",
"AfN": null,
"AfId": null,
"S": "7"
}
],
"extra_informations": "{
\"DN\": \"Experimental study for improvement of crashworthiness in AZ91 magnesium foam controlling its microstructure.\",
\"S\":[{\"Ty\":1,\"U\":\"https://shibaura.pure.elsevier.com/en/publications/experimental-study-for-improvement-of-crashworthiness-in-az91-mag\"}],
\"VFN\":\"Materials Science and Engineering\",
\"FP\":283,
\"LP\":287,
\"RP\":[{\"Id\":2024275625,\"CoC\":5},{\"Id\":2035451257,\"CoC\":5}, {\"Id\":2141952446,\"CoC\":5},{\"Id\":2126566553,\"CoC\":6}, {\"Id\":2089573897,\"CoC\":5},{\"Id\":2069241702,\"CoC\":7}, {\"Id\":2000323790,\"CoC\":6},{\"Id\":1988924750,\"CoC\":16}],
\"ANF\":[
{\"FN\":\"H.\",\"LN\":\"Kanahashi\",\"S\":1},
{\"FN\":\"T.\",\"LN\":\"Mukai\",\"S\":2},
{\"FN\":\"Y.\",\"LN\":\"Yamada\",\"S\":3},
{\"FN\":\"K.\",\"LN\":\"Shimojima\",\"S\":4},
{\"FN\":\"M.\",\"LN\":\"Mabuchi\",\"S\":5},
{\"FN\":\"T.\",\"LN\":\"Aizawa\",\"S\":6},
{\"FN\":\"K.\",\"LN\":\"Higashi\",\"S\":7}
],
\"BV\":\"Materials Science and Engineering\",\"BT\":\"a\"}"
}
]
In the extra_information.ANF I have an nested array that contains some more author name information.
The nested repeated author_name field has a sub-field author_name.S which can be mapped into extra_informations.ANF.S for a join. Using this mapping I am trying to achieve the following table:
| article_id | author_names.AuN | S | extra_information.ANF.FN | extra_information.ANF.LN|
| 2732930586 | h kanahashi | 1 | H. | Kanahashi |
| 2732930586 | t mukai | 2 | T. | Mukai |
| 2732930586 | y yamada | 3 | Y. | Yamada. |
| 2732930586 | k shimojima | 4 | K. | Shimojima |
| 2732930586 | m mabuchi | 5 | M. | Mabuchi |
| 2732930586 | t aizawa | 6 | T. | Aizawa |
| 2732930586 | k higashi | 7 | K. | Higashi |
The primary problem I faced is that when I convert a json_string using JSON_EXTRACT(extra_information,"$.ANF"), it does not give me an array, instead it gives me the string format of the nested repeated array, which I could not convert into an array.
Is it possible to produce such table using standards-sql in bigquery?

Option 1
This is based on REGEXP_REPLACE function and few more functions (REPLACE, SPLIT, etc.) to manipulate with result. Note - we need extra manipulation because wildcards and filters are not supported in JsonPath expressions in BigQuery?
#standard SQL
SELECT
article_id, author.AuN, author.S,
REPLACE(SPLIT(extra, '","')[OFFSET(0)], '"FN":"', '') FirstName,
REPLACE(SPLIT(extra, '","')[OFFSET(1)], 'LN":"', '') LastName
FROM `table` , UNNEST(author_names) author
LEFT JOIN UNNEST(SPLIT(REGEXP_REPLACE(JSON_EXTRACT(extra_informations, '$.ANF'), r'\[{|}\]', ''), '},{')) extra
ON author.S = CAST(REPLACE(SPLIT(extra, '","')[OFFSET(2)], 'S":', '') AS INT64)
Option 2
To overcome BigQuery "limitation" for JsonPath, you can use custom function as the example below shows:
Note : it uses jsonpath-0.8.0.js that can be downloaded from https://code.google.com/archive/p/jsonpath/downloads and assumed to be uploaded to Google Cloud Storage - gs://your_bucket/jsonpath-0.8.0.js
#standard SQL
CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
RETURNS STRING
LANGUAGE js AS """
try { var parsed = JSON.parse(json);
return jsonPath(parsed, json_path);
} catch (e) { return null }
"""
OPTIONS (
library="gs://your_bucket/jsonpath-0.8.0.js"
);
SELECT
article_id, author.AuN, author.S,
CUSTOM_JSON_EXTRACT(extra_informations, CONCAT('$.ANF[?(#.S==', CAST(author.S AS STRING), ')].FN')) FirstName,
CUSTOM_JSON_EXTRACT(extra_informations, CONCAT('$.ANF[?(#.S==', CAST(author.S AS STRING), ')].LN')) LastName
FROM `table`, UNNEST(author_names) author
As you can see - now you can do all magic in one simple JsonPath

Linq to XML query to SQL

UPDATE:
I've turned my xml into a query table in coldfusion, so this may help to solve this.
So my data is:
[id] | [code] | [desc] | [supplier] | [name] | [price]
------------------------------------------------------
1 | ABCDEF | "Tst0" | "XYZ" | "Test" | 123.00
2 | ABCDXY | "Tst1" | "XYZ" | "Test" | 130.00
3 | DCBAZY | "Tst2" | "XYZ" | "Tst2" | 150.00
Now what I need is what the linq to xml query outputs below. Output should be something like (i'll write it in JSON so it's easier for me to type) this:
[{
"code": "ABCD",
"name": "Test",
"products":
{
"id": 1,
"code": "ABCDEF",
"desc": "Tst0",
"price": 123.00
},
{
"id": 2,
"code": "ABCDXY",
"desc": "Tst1",
"price": 130.00
}
},
{
"code": "DCBA",
"name": "Tst2",
"products":
{
"id": 3,
"code": "DCBAZY",
"desc": "Tst2",
"price": 150.00
}
}]
As you can see, Group by the first 4 characters of 'CODE' and 'Supplier' code.
Thanks
How would i convert the following LINQ to XML query to SQL?
from q in query
group q by new { Code = q.code.Substring(0, 4), Supplier = q.supplier } into g
select new
{
code = g.Key.Code,
fullcode = g.FirstOrDefault().code,
supplier = g.Key.Supplier,
name = g.FirstOrDefault().name,
products = g.Select(x => new Product { id = x.id, c = x.code, desc = string.IsNullOrEmpty(x.desc) ? "Description" : x.desc, price = x.price })
}
Best i could come up with:
SELECT c, supplier, n
FROM products
GROUP BY C, supplier, n
Not sure how to get the subquery in there or get the substring of code.
ps: this is for coldfusion, so I guess their version of sql might be different to ms sql..

The easiest way is to attache a profiler to you database and see what query is generate by the linq-to-SQL engine.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

BigQuery JSON Field Extraction - google-bigquery

Related

Splunk : Extracting the elements from JSON structure as separate fields

SQL select from array in JSON

How to parse JSON metrics array in Splunk

How to extract a repeated nested field from json string and join with existing repeated nested field in bigquery

Linq to XML query to SQL

Categories

Resources