How to extract a repeated nested field from json string and join with existing repeated nested field in bigquery - google-bigquery

I have a table with a nested repeated field called author_names and a string field called extra_informations that contains a JSON string.
Here is an example row of the table:
[
{
"article_id": "2732930586",
"author_names": [
{
"AuN": "h kanahashi",
"AuId": "2591665239",
"AfN": null,
"AfId": null,
"S": "1"
},
{
"AuN": "t mukai",
"AuId": "2607493793",
"AfN": null,
"AfId": null,
"S": "2"
},
{
"AuN": "y yamada",
"AuId": "2606624579",
"AfN": null,
"AfId": null,
"S": "3"
},
{
"AuN": "k shimojima",
"AuId": "2606600298",
"AfN": null,
"AfId": null,
"S": "4"
},
{
"AuN": "m mabuchi",
"AuId": "2606138976",
"AfN": null,
"AfId": null,
"S": "5"
},
{
"AuN": "t aizawa",
"AuId": "2723380540",
"AfN": null,
"AfId": null,
"S": "6"
},
{
"AuN": "k higashi",
"AuId": "2725066679",
"AfN": null,
"AfId": null,
"S": "7"
}
],
"extra_informations": "{
\"DN\": \"Experimental study for improvement of crashworthiness in AZ91 magnesium foam controlling its microstructure.\",
\"S\":[{\"Ty\":1,\"U\":\"https://shibaura.pure.elsevier.com/en/publications/experimental-study-for-improvement-of-crashworthiness-in-az91-mag\"}],
\"VFN\":\"Materials Science and Engineering\",
\"FP\":283,
\"LP\":287,
\"RP\":[{\"Id\":2024275625,\"CoC\":5},{\"Id\":2035451257,\"CoC\":5}, {\"Id\":2141952446,\"CoC\":5},{\"Id\":2126566553,\"CoC\":6}, {\"Id\":2089573897,\"CoC\":5},{\"Id\":2069241702,\"CoC\":7}, {\"Id\":2000323790,\"CoC\":6},{\"Id\":1988924750,\"CoC\":16}],
\"ANF\":[
{\"FN\":\"H.\",\"LN\":\"Kanahashi\",\"S\":1},
{\"FN\":\"T.\",\"LN\":\"Mukai\",\"S\":2},
{\"FN\":\"Y.\",\"LN\":\"Yamada\",\"S\":3},
{\"FN\":\"K.\",\"LN\":\"Shimojima\",\"S\":4},
{\"FN\":\"M.\",\"LN\":\"Mabuchi\",\"S\":5},
{\"FN\":\"T.\",\"LN\":\"Aizawa\",\"S\":6},
{\"FN\":\"K.\",\"LN\":\"Higashi\",\"S\":7}
],
\"BV\":\"Materials Science and Engineering\",\"BT\":\"a\"}"
}
]
In extra_informations.ANF I have a nested array that contains some more author-name information.
The nested repeated author_names field has a sub-field author_names.S, which can be mapped onto extra_informations.ANF.S for a join. Using this mapping I am trying to produce the following table:
| article_id | author_names.AuN | S | extra_information.ANF.FN | extra_information.ANF.LN|
| 2732930586 | h kanahashi | 1 | H. | Kanahashi |
| 2732930586 | t mukai | 2 | T. | Mukai |
| 2732930586 | y yamada | 3 | Y. | Yamada |
| 2732930586 | k shimojima | 4 | K. | Shimojima |
| 2732930586 | m mabuchi | 5 | M. | Mabuchi |
| 2732930586 | t aizawa | 6 | T. | Aizawa |
| 2732930586 | k higashi | 7 | K. | Higashi |
The primary problem I faced is that when I extract from the JSON string using JSON_EXTRACT(extra_informations, "$.ANF"), it does not give me an array; instead it gives me the string representation of the nested repeated array, which I could not convert into an array.
Is it possible to produce such a table using standard SQL in BigQuery?

Option 1
This option is based on the REGEXP_REPLACE function plus a few more functions (REPLACE, SPLIT, etc.) to manipulate the result. Note: we need the extra manipulation because wildcards and filters are not supported in JsonPath expressions in BigQuery.
#standard SQL
SELECT
article_id, author.AuN, author.S,
REPLACE(SPLIT(extra, '","')[OFFSET(0)], '"FN":"', '') FirstName,
REPLACE(SPLIT(extra, '","')[OFFSET(1)], 'LN":"', '') LastName
FROM `table` , UNNEST(author_names) author
LEFT JOIN UNNEST(SPLIT(REGEXP_REPLACE(JSON_EXTRACT(extra_informations, '$.ANF'), r'\[{|}\]', ''), '},{')) extra
ON author.S = CAST(REPLACE(SPLIT(extra, '","')[OFFSET(2)], 'S":', '') AS INT64)
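The string surgery this query performs can be sketched outside SQL. Below is a minimal Python illustration of what the REGEXP_REPLACE/SPLIT/REPLACE pipeline does to the string that JSON_EXTRACT returns for $.ANF (the anf literal is an assumed, abbreviated stand-in for that output):

```python
import re

# String as JSON_EXTRACT(extra_informations, '$.ANF') would return it (abbreviated)
anf = '[{"FN":"H.","LN":"Kanahashi","S":1},{"FN":"T.","LN":"Mukai","S":2}]'

# Strip the outer [{ and }] delimiters, then split on the object boundary },{
stripped = re.sub(r'\[{|}\]', '', anf)
rows = stripped.split('},{')

authors = []
for row in rows:
    # Each row looks like: "FN":"H.","LN":"Kanahashi","S":1
    parts = row.split('","')
    first = parts[0].replace('"FN":"', '')
    last = parts[1].replace('LN":"', '')
    s = int(parts[2].replace('S":', ''))
    authors.append((s, first, last))
```

The CAST in the SQL ON clause corresponds to the int() call here: author_names.S is a string while the S inside ANF is a bare JSON number, so one side has to be converted before the equality join.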
Option 2
To overcome BigQuery "limitation" for JsonPath, you can use custom function as the example below shows:
Note: it uses jsonpath-0.8.0.js, which can be downloaded from https://code.google.com/archive/p/jsonpath/downloads and is assumed to be uploaded to Google Cloud Storage - gs://your_bucket/jsonpath-0.8.0.js
#standard SQL
CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
RETURNS STRING
LANGUAGE js AS """
try { var parsed = JSON.parse(json);
return jsonPath(parsed, json_path);
} catch (e) { return null }
"""
OPTIONS (
library="gs://your_bucket/jsonpath-0.8.0.js"
);
SELECT
article_id, author.AuN, author.S,
CUSTOM_JSON_EXTRACT(extra_informations, CONCAT('$.ANF[?(@.S==', CAST(author.S AS STRING), ')].FN')) FirstName,
CUSTOM_JSON_EXTRACT(extra_informations, CONCAT('$.ANF[?(@.S==', CAST(author.S AS STRING), ')].LN')) LastName
FROM `table`, UNNEST(author_names) author
As you can see, now you can do all the magic in one simple JsonPath expression.
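What the $.ANF[?(@.S==...)] filter selects can be sketched in plain Python - a hedged stand-in for the jsonpath-0.8.0.js behaviour, not the UDF itself (the extra literal is an assumed, abbreviated extra_informations value):

```python
import json

# extra_informations as stored in the table, abbreviated to the ANF part
extra = '{"ANF":[{"FN":"H.","LN":"Kanahashi","S":1},{"FN":"T.","LN":"Mukai","S":2}]}'

def anf_lookup(json_str, s):
    """Mimic $.ANF[?(@.S==s)]: return FN/LN of the entry whose S matches."""
    for entry in json.loads(json_str)["ANF"]:
        if entry["S"] == s:
            return entry["FN"], entry["LN"]
    return None, None
```

In the UDF version this filtering happens inside jsonPath() per row, which is why no explicit JOIN is needed: each author.S is pushed down into the path expression.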

Related

Splunk : Extracting the elements from JSON structure as separate fields

In Splunk, I'm trying to extract the key-value pairs inside the "tags" element of the JSON structure so that each one becomes a separate column I can search through.
for example :
| spath data | rename data.tags.EmailAddress AS Email
This does not help, though, and the Email field comes back empty. I'm trying to do this for all the tags. Any thoughts/pointers?
{
"timestamp": "2021-10-26T18:23:05.180707Z",
"data": {
"tags": [
{
"key": "Email",
"value": "john.doe@example.com"
},
{
"key": "ProjectCode",
"value": "ABCD"
},
{
"key": "Owner",
"value": "John Doe"
}
]
},
"field1": "random1",
"field2": "random2"
}
I think this does what you want:
| spath data.tags{}
| mvexpand data.tags{}
| spath input=data.tags{}
| table key value
| transpose header_field=key
| fields - column
How it works:
| spath data.tags{} takes the json and creates a multi value field that contains each item in the tags array
| mvexpand data.tags{} splits the multi value field into individual events - each one contains one of the items in the tags array
| spath input=data.tags{} takes the json in each event and makes a field for each KVP in that item (key and value in this case)
| table key value limits further commands to these two fields
| transpose header_field=key makes a field for each value of the key field (including one for the field named column)
| fields - column removes the column field from the output
Here is a fully runnable example:
| makeresults
| eval _raw="
{
\"timestamp\": \"2021-10-26T18:23:05.180707Z\",
\"data\": {
\"tags\": [
{\"key\": \"Email\", \"value\": \"john.doe@example.com\"},
{\"key\": \"ProjectCode\", \"value\": \"ABCD\"},
{\"key\": \"Owner\", \"value\": \"John Doe\"}
]
},
\"field1\": \"random1\",
\"field2\": \"random2\"
}
"
| spath data.tags{}
| mvexpand data.tags{}
| spath input=data.tags{}
| table key value
| transpose header_field=key
It creates this output:
+----------------------+-------------+----------+
| Email | ProjectCode | Owner |
+----------------------+-------------+----------+
| john.doe@example.com | ABCD        | John Doe |
+----------------------+-------------+----------+
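For comparison, the same expand-then-transpose idea can be sketched in plain Python - one column per tag key, using the sample payload above (not Splunk itself):

```python
import json

raw = json.loads("""
{
  "timestamp": "2021-10-26T18:23:05.180707Z",
  "data": {"tags": [
    {"key": "Email", "value": "john.doe@example.com"},
    {"key": "ProjectCode", "value": "ABCD"},
    {"key": "Owner", "value": "John Doe"}
  ]},
  "field1": "random1",
  "field2": "random2"
}
""")

# mvexpand + transpose in one step: each tag's key becomes a column name
row = {tag["key"]: tag["value"] for tag in raw["data"]["tags"]}
```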

Pulling text out of JSON using VARCHAR

Trying to pull a text value out of a column containing JSON, casting to varchar, but I get an invalid argument error on Snowflake while running in Mode. This JSON has a bit of a different structure than what I'm used to seeing.
Have tried these to pull out the text:
changes:comment:new_value::varchar
changes:new_value::varchar
changes:comment::varchar
JSON looks like this:
{
"comment":
{
"new_value": "Hello there. Welcome to our facility.",
"old_value": ""
}
}
Wish to pull out the data in this column so the output reads:
Hello there. Welcome to our facility.
You can't extract fields from a VARCHAR. If your string is JSON, you have to convert it to the VARIANT type, e.g. through the PARSE_JSON function.
Example below:
create or replace table x(v varchar) as select * from values('{
"comment":
{
"new_value": "Hello there. Welcome to our facility.",
"old_value": ""
}
}');
select v, parse_json(v):comment.new_value::varchar from x;
--------------------------------------------------------------+------------------------------------------+
V | PARSE_JSON(V):COMMENT.NEW_VALUE::VARCHAR |
--------------------------------------------------------------+------------------------------------------+
{ | Hello there. Welcome to our facility. |
"comment": | |
{ | |
"new_value": "Hello there. Welcome to our facility.", | |
"old_value": "" | |
} | |
} | |
--------------------------------------------------------------+------------------------------------------+
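The two-step logic - deserialize the string first, then navigate the result - corresponds to this minimal Python sketch (json.loads standing in for PARSE_JSON, dictionary access for the :comment.new_value path):

```python
import json

v = '{"comment": {"new_value": "Hello there. Welcome to our facility.", "old_value": ""}}'

# PARSE_JSON(v) -> a navigable structure; :comment.new_value -> path access
parsed = json.loads(v)
new_value = parsed["comment"]["new_value"]
```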

BigQuery JSON Field Extraction

I have the following JSON payload stored in a single string column in a BQ table.
{
"customer" : "ABC Ltd",
"custom_fields" : [
{
"name" : "DOB",
"value" : "2000-01-01"
},
{
"name" : "Account_Open_Date",
"value" : "2019-01-01"
}
]
}
I am trying to figure out how I can extract the custom_fields name value pairs as columns?
Something like follows.
| Customer.name | Customer.DOB | Customer.Account_Open_Date |
| ABC Ltd | 2000-01-01 | 2019-01-01 |
You can use JSON functions, such as
JSON_EXTRACT(json_string_expr, json_path_string_literal)
In your case it will be
SELECT
JSON_EXTRACT(json_text, '$.customer') AS Customer_Name,
JSON_EXTRACT(json_text, '$.custom_fields[0].value') AS Customer_DOB,
JSON_EXTRACT(json_text, '$.custom_fields[1].value') AS Customer_Account_Open_Date
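One caveat: indexing custom_fields by position breaks as soon as the array order changes. A sketch of pivoting by field name instead, in plain Python to illustrate the logic rather than BigQuery SQL:

```python
import json

json_text = '''{
  "customer": "ABC Ltd",
  "custom_fields": [
    {"name": "DOB", "value": "2000-01-01"},
    {"name": "Account_Open_Date", "value": "2019-01-01"}
  ]
}'''

payload = json.loads(json_text)
row = {"name": payload["customer"]}
# Pivot the name/value pairs into columns keyed by name, not by position
row.update({f["name"]: f["value"] for f in payload["custom_fields"]})
```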

Match each using Examples from Scenario Outline

I find using * match each response.xyz super powerful for merging object structure testing with content testing. Is there a way to use it with Examples tables and <placeholder>?
I've got something like this, that I want to use an Examples table on:
* match each response.Services ==
"""
{
"ServiceId" : #present,
"Name" : <Name>,
"Description" : #present,
"InActive" : <Inactive>,
}
"""
Examples:
| ClientId | Name | Status | ErrorCode | Inactive |
| 400152 | "Foxtrot" | 200 | 0 | false |
| 400152 | "Waltz" | 200 | 0 | false |
I get
"Services": [
{
"ServiceId": 3,
"Name": "Waltz",
"Description": "Waltzing like Matilda",
"InActive": false
},
{
"ServiceId": 4,
"Name": "Foxtrot",
"Description": "",
"InActive": false
}
]
back as a response.
Obviously, when I'm using multiple lines in Examples:, it results in several tests.
What I'm looking for is to test each object in the array against predefined values, but without knowing what order they'll show up in - while still using the ordered, table-driven approach that Examples tables give me.
Instead of each, try this:
* match response.Services contains
"""
{
"ServiceId" : #present,
"Name" : <Name>,
"Description" : #present,
"InActive" : <Inactive>,
}
"""
EDIT: okay, an alternate option. By the way there are at least 5 different ways I can think of :P
Scenario:
* table data
| ClientId | Name | Status | ErrorCode | Inactive |
| 400152 | "Foxtrot" | 200 | 0 | false |
| 400152 | "Waltz" | 200 | 0 | false |
* def expected = karate.map(data, function(x){ return { ServiceId: '#present', Name: x.Name, Description: '#present', InActive: x.Inactive} })
* match response.Services contains expected
EDIT2: if you can control the whole table:
Scenario:
* table expected
| Name | InActive | ServiceId | Description |
| "Foxtrot" | false | '#present' | '#present' |
| "Waltz" | false | '#present' | '#present' |
* match response.Services contains expected
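The contains semantics - every expected row matches some response entry, in any order - can be sketched as follows. This is a simplified Python stand-in for Karate's matcher, supporting only the '#present' marker used above:

```python
def contains(response, expected):
    """True if every expected dict matches some response entry.
    '#present' means the key must exist with any value (Karate-style marker)."""
    def matches(entry, exp):
        for k, v in exp.items():
            if v == "#present":
                if k not in entry:
                    return False
            elif entry.get(k) != v:
                return False
        return True
    # Order-independent: each expected row just needs *some* matching entry
    return all(any(matches(e, exp) for e in response) for exp in expected)

services = [
    {"ServiceId": 3, "Name": "Waltz", "Description": "Waltzing like Matilda", "InActive": False},
    {"ServiceId": 4, "Name": "Foxtrot", "Description": "", "InActive": False},
]
expected = [
    {"ServiceId": "#present", "Name": "Foxtrot", "Description": "#present", "InActive": False},
    {"ServiceId": "#present", "Name": "Waltz", "Description": "#present", "InActive": False},
]
```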

Linq to XML query to SQL

UPDATE:
I've turned my xml into a query table in coldfusion, so this may help to solve this.
So my data is:
[id] | [code] | [desc] | [supplier] | [name] | [price]
------------------------------------------------------
1 | ABCDEF | "Tst0" | "XYZ" | "Test" | 123.00
2 | ABCDXY | "Tst1" | "XYZ" | "Test" | 130.00
3 | DCBAZY | "Tst2" | "XYZ" | "Tst2" | 150.00
Now what I need is what the LINQ to XML query outputs below. The output should be something like this (I'll write it in JSON so it's easier for me to type):
[{
"code": "ABCD",
"name": "Test",
"products": [
{
"id": 1,
"code": "ABCDEF",
"desc": "Tst0",
"price": 123.00
},
{
"id": 2,
"code": "ABCDXY",
"desc": "Tst1",
"price": 130.00
}
]
},
{
"code": "DCBA",
"name": "Tst2",
"products": [
{
"id": 3,
"code": "DCBAZY",
"desc": "Tst2",
"price": 150.00
}
]
}]
As you can see, the grouping is by the first 4 characters of code plus the supplier code.
Thanks
How would i convert the following LINQ to XML query to SQL?
from q in query
group q by new { Code = q.code.Substring(0, 4), Supplier = q.supplier } into g
select new
{
code = g.Key.Code,
fullcode = g.FirstOrDefault().code,
supplier = g.Key.Supplier,
name = g.FirstOrDefault().name,
products = g.Select(x => new Product { id = x.id, c = x.code, desc = string.IsNullOrEmpty(x.desc) ? "Description" : x.desc, price = x.price })
}
Best i could come up with:
SELECT c, supplier, n
FROM products
GROUP BY C, supplier, n
Not sure how to get the subquery in there or get the substring of code.
ps: this is for coldfusion, so I guess their version of sql might be different to ms sql..
The easiest way is to attach a profiler to your database and see what query is generated by the LINQ-to-SQL engine.
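Independent of the profiler route, the grouping itself - first four characters of code plus supplier, with a nested product list and the IsNullOrEmpty fallback - can be sketched in Python on the sample rows from the question:

```python
from itertools import groupby

rows = [
    {"id": 1, "code": "ABCDEF", "desc": "Tst0", "supplier": "XYZ", "name": "Test", "price": 123.00},
    {"id": 2, "code": "ABCDXY", "desc": "Tst1", "supplier": "XYZ", "name": "Test", "price": 130.00},
    {"id": 3, "code": "DCBAZY", "desc": "Tst2", "supplier": "XYZ", "name": "Tst2", "price": 150.00},
]

# Group key mirrors the LINQ: q.code.Substring(0, 4) plus q.supplier
key = lambda r: (r["code"][:4], r["supplier"])
groups = []
for k, g in groupby(sorted(rows, key=key), key=key):
    members = list(g)
    groups.append({
        "code": k[0],
        "supplier": k[1],
        "fullcode": members[0]["code"],      # g.FirstOrDefault().code
        "name": members[0]["name"],          # g.FirstOrDefault().name
        "products": [{"id": m["id"], "code": m["code"],
                      "desc": m["desc"] or "Description",  # IsNullOrEmpty fallback
                      "price": m["price"]}
                     for m in members],
    })
```

In SQL terms the key is GROUP BY SUBSTRING(code, 1, 4), supplier; the nested product list has no direct GROUP BY equivalent and usually ends up as a second query or a string aggregate.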