Transform Array into columns in BigQuery - google-bigquery

I have a JSON string stored in a string column in BigQuery. It contains an array, and I would like to pick some fields from that array and write their values to BQ columns.
For example, consider the JSON below stored in BQ:
{
  "pool": "mypool",
  "statusCode": "0",
  "payloads": [
    {
      "name": "request",
      "fullpath": "com.gcp.commontools.edlpayload.EDLPayloadManagerTest$Request",
      "jsonPayload": {
        "body": "{\"data\":\"foo\"}"
      },
      "orientation": "REQUEST",
      "httpTransport": {
        "httpMethod": "POST",
        "headers": {
          "headers": {
            "a": "1"
          }
        },
        "sourceEndpoint": "/v1/foobar"
      }
    },
    {
      "name": "response",
      "fullpath": "com.gcp.commontools.edlpayload.EDLPayloadManagerTest$Response",
      "jsonPayload": {
        "body": "{\"data\":\"bar\"}"
      },
      "orientation": "RESPONSE",
      "httpTransport": {
        "headers": {
          "headers": {
            "b": "2"
          }
        },
        "httpResponseCode": 200
      }
    },
    {
      "name": "attributes",
      "fullpath": "java.util.HashMap",
      "nameValuePairs": {
        "data": {
          "one": "1"
        }
      },
      "orientation": "TRANSITORY"
    }
  ],
  "uuid": "11EC-C714-8ADE2390-9619-1B80E63968CC",
  "payloadName": "my-overall-name"
}
Consider the following target BQ table schema:
pool, requestFullPath, requestPayload, responseFullPath, responsePayload
From the above JSON, I would like to pick a few JSON elements and map their values to columns in BQ. Please note, the payloads array is dynamic in nature: there can be just one payload in it, or there can be several, and their order is not fixed. For example, the request payload can come at the [0]th position, the 1st position, etc.

Consider below
select * from (
  select
    json_value(json_col, '$.pool') as pool,
    json_value(payload, '$.name') as name,
    json_value(payload, '$.fullpath') as FullPath,
    json_value(payload, '$.jsonPayload.body') as Payload
  from your_table t,
  unnest(json_extract_array(json_col, '$.payloads')) payload
)
pivot (any_value(FullPath) as FullPath, any_value(Payload) as Payload for name in ('request', 'response'))
If applied to the sample data in your question, the output is:
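For the single sample row above, the pivot should yield one row, with BigQuery naming the pivoted columns <aggregate alias>_<pivot value>:
pool: mypool
FullPath_request: com.gcp.commontools.edlpayload.EDLPayloadManagerTest$Request
Payload_request: {"data":"foo"}
FullPath_response: com.gcp.commontools.edlpayload.EDLPayloadManagerTest$Response
Payload_response: {"data":"bar"}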

Related

Deserialise multiple objects into a select statement

In a table, I store multiple JSON documents as strings, across several records.
declare @x nvarchar(max) = N'{
  "totalSize": 1000,
  "done": true,
  "records": [
    {
      "attributes": {
        "type": "Contract",
        "url": ""
      },
      "Name": "Harpy",
      "Job_Schedule_Date__c": null,
      "EndDate": "2021-03-24",
      "Account": {
        "attributes": {
          "type": "Account",
          "url": ""
        },
        "Name": "Madison"
      },
      "ContractNumber": "12345",
      "Related_Site__r": {
        "attributes": {
          "type": "Site__c",
          "url": ""
        },
        "Name": "Jackson"
      }
    },
    ...
  ]
}'
select * from openjson(@x, '$.records')
I am trying to use OPENJSON to unpack the records.
I am able to unpack a single record, but it doesn't unpack the fields into columns, and I need to unpack multiple records and combine them.
Since each document only stores 1000 records, I need to join the batches up.
What I want is output like the below from a SELECT:
Name, Job_Schedule_Date__c, EndDate, AccountName, ContractNumber, RelatedSiteName
Harpy, null, 2021-03-24, Madison, 12345, Jackson
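OPENJSON's WITH clause can project each record's fields straight into the desired columns; below is a minimal sketch against the @x variable above (dbo.ApiResults and json_col in the second query are hypothetical names for wherever the documents are stored):
select j.*
from openjson(@x, '$.records')
with (
    Name nvarchar(200) '$.Name',
    Job_Schedule_Date__c nvarchar(50) '$.Job_Schedule_Date__c',
    EndDate date '$.EndDate',
    AccountName nvarchar(200) '$.Account.Name',
    ContractNumber nvarchar(50) '$.ContractNumber',
    RelatedSiteName nvarchar(200) '$.Related_Site__r.Name'
) as j;
-- to unpack and combine every stored 1000-record document at once,
-- apply the same WITH clause per row:
select j.*
from dbo.ApiResults t
cross apply openjson(t.json_col, '$.records')
with (
    Name nvarchar(200) '$.Name',
    Job_Schedule_Date__c nvarchar(50) '$.Job_Schedule_Date__c',
    EndDate date '$.EndDate',
    AccountName nvarchar(200) '$.Account.Name',
    ContractNumber nvarchar(50) '$.ContractNumber',
    RelatedSiteName nvarchar(200) '$.Related_Site__r.Name'
) as j;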

Get the value from the response based on a condition and store it to a variable

I would like to get the value from the response based on a condition and store it to a variable.
In the below JSON, I would like to store the value when the name matches to something I prefer. Is there a way to achieve this using Karate API?
{
  "results": [
    {
      "name": "Sample1",
      "email": "sample1@text.com",
      "id": "U-123"
    },
    {
      "name": "Sample2",
      "email": "sample2@text.com",
      "id": "U-456"
    },
    {
      "name": "Sample3",
      "email": "sample3@text.com",
      "id": "U-789"
    }
  ]
}
So after reading the comment, my interpretation is to "find" the id where name is Sample2. Easy, just use a filter() operation; refer to the docs: https://github.com/karatelabs/karate#jsonpath-filters
Instead of using a filter, I'm using the JS Array find() method as a very concise example below:
* def response =
"""
{ "results": [
{ "name": "Sample1", "email": "sample1#text.com", "id": "U-123" },
{ "name": "Sample2", "email": "sample2#text.com", "id": "U-456" },
{ "name": "Sample3", "email": "sample3#text.com", "id": "U-789" }
]
}
"""
* def id = response.results.find(x => x.name == 'Sample2').id
* match id == 'U-456'
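If you prefer the JsonPath filter from the docs linked above, an equivalent one-liner (a sketch, not tested here) would be:
* def id = karate.jsonPath(response, "$.results[?(@.name=='Sample2')].id")[0]
* match id == 'U-456'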
Take some time to understand how it works. Talk to someone who knows JS if needed.

Use Athena SQL to get a value from JSON key

I need to get the email address from this 'facets' table I created from my firehose logs (JSON).
Now, I am using Athena to get particular information.
I need to get the email addresses from this:
This is my output from 'facets' when I run:
SELECT * FROM "sampledb"."facets" limit 10
{email_channel={mail_event={mail={message_id=oadfosadu6237864237615, message_send_timestamp=1622696691764, from_address=abcd@jk.com, destination=[abcd@jk.com], headers_truncated=false, headers=[{name=From, value=abcd@jk.com}, {name=To, value=abcd@jk.com}, {name=MIME-Version, value=1.0}], common_headers={from=ghjk@li.com, to=[abcd@jk.com]}}, send={}, rendering_failure=null}}}
Assuming you have one column which stores the JSON in the provided format, you can use json_extract with the needed paths (and maybe some casts):
with dataset1 as (
  select * from (values(JSON
  '{
    "email_channel": {
      "mail_event": {
        "mail": {
          "message_id": "oadfosadu6237864237615",
          "message_send_timestamp": 1622696691764,
          "from_address": "abcd@jk.com",
          "destination": [
            "abcd@jk.com"
          ],
          "headers_truncated": false,
          "headers": [
            {
              "name": "From",
              "value": "abcd@jk.com"
            },
            {
              "name": "To",
              "value": "abcd@jk.com"
            },
            {
              "name": "MIME-Version",
              "value": "1.0"
            }
          ],
          "common_headers": {
            "from": "ghjk@li.com",
            "to": [
              "abcd@jk.com"
            ]
          }
        },
        "send": {},
        "rendering_failure": null
      }
    }
  }')) as facets(facet))
select
  json_extract(facet, '$.email_channel.mail_event.mail.from_address') mail_from,
  CAST(json_extract(facet, '$.email_channel.mail_event.mail.destination') AS ARRAY(VARCHAR)) destination
from dataset1
And the output:
mail_from       destination
"abcd@jk.com"   {abcd@jk.com}
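Note the quotes around mail_from: json_extract returns a JSON value, so the string stays quoted. If you want the bare address, json_extract_scalar should return it as a plain VARCHAR:
select
  json_extract_scalar(facet, '$.email_channel.mail_event.mail.from_address') mail_from
from dataset1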

Unexpected behavior of ARRAY_SLICE in Cosmos Db SQL API

I have a Cosmos DB collection (called sample) containing the following documents:
[
  {
    "id": "id1",
    "messages": [
      {
        "messageId": "message1",
        "Text": "Value1"
      },
      {
        "messageId": "message2",
        "Text": "Value2"
      }
    ]
  },
  {
    "id": "id2",
    "messages": [
      {
        "messageId": "message3",
        "Text": "Value3"
      },
      {
        "messageId": "message4",
        "Text": "Value1"
      }
    ]
  },
  {
    "id": "id3",
    "messages": [
      {
        "messageId": "message5",
        "Text": "Value1"
      },
      {
        "messageId": "message6",
        "Text": "Value2"
      }
    ]
  },
  {
    "id": "id4",
    "messages": [
      {
        "messageId": "message7",
        "Text": "Value5"
      },
      {
        "messageId": "message8",
        "Text": "Value2"
      }
    ]
  }
]
I am trying to retrieve all the documents having messages where the first message has the field "Text" = 'Value1'.
In this sample, the documents with the ids 'id1' and 'id3' would be retrieved. Please notice that the document with id='id2' wouldn't be retrieved,
since the Text value of its first message is 'Value3'.
The collection, as mentioned, is called sample, and I am running the following query:
"select sample.id, sample.messages, ARRAY_SLICE(sample.messages, 0, 1)[0].Text as valueOfText from sample"
As you can see in the first two images, I retrieve all Documents and every one of them have the field "valueOfText" set to value of the first message, as expected.
Now when I filter the collection (the third image), I retrieve no results at all.
Is this an expected behavior?
Following your SQL, I got the same results. But why do you have to use ARRAY_SLICE? It is used to return a truncated array. Since your requirement is specific:
trying to retrieve all the Documents, having messages and the first
message has the field "Text"= 'Value1'
just use this SQL:
SELECT c.id,c.messages,c.messages[0].Text as valueOfText FROM c
where c.messages[0].Text = 'Value1'
Output:
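For the four sample documents, this query should return the documents with ids 'id1' and 'id3', each with valueOfText = 'Value1':
[
  {
    "id": "id1",
    "messages": [
      { "messageId": "message1", "Text": "Value1" },
      { "messageId": "message2", "Text": "Value2" }
    ],
    "valueOfText": "Value1"
  },
  {
    "id": "id3",
    "messages": [
      { "messageId": "message5", "Text": "Value1" },
      { "messageId": "message6", "Text": "Value2" }
    ],
    "valueOfText": "Value1"
  }
]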

How to prepare Google Natural Language Processing output (json) for Big Query

I'm trying to query the output of a Natural Language Processing (NLP) call in Big Query (BQ), but I'm struggling to get the output into the right format for BQ.
I understand that BQ takes JSON files (as newline-delimited JSON), but I'm just not sure that (a) the output of NLP is newline-delimited JSON and (b) my schema is correct.
Here's the json output I'm working with:
{
  "entities": [
    {
      "name": "Rowling",
      "type": "PERSON",
      "metadata": {
        "wikipedia_url": "http://en.wikipedia.org/wiki/J._K._Rowling"
      },
      "salience": 0.65751493,
      "mentions": [
        {
          "text": {
            "content": " J.",
            "beginOffset": -1
          }
        },
        {
          "text": {
            "content": "K. Rowl",
            "beginOffset": -1
          }
        }
      ]
    },
    {
      "name": "LONDON",
      "type": "LOCATION",
      "metadata": {
        "wikipedia_url": "http://en.wikipedia.org/wiki/London"
      },
      "salience": 0.14284456,
      "mentions": [
        {
          "text": {
            "content": "\ufeffLON",
            "beginOffset": -1
          }
        }
      ]
    },
    {
      "name": "Harry Potter",
      "type": "WORK_OF_ART",
      "metadata": {
        "wikipedia_url": "http://en.wikipedia.org/wiki/Harry_Potter"
      },
      "salience": 0.0726779,
      "mentions": [
        {
          "text": {
            "content": "th Harry Pot",
            "beginOffset": -1
          }
        },
        {
          "text": {
            "content": "‘Harry Pot",
            "beginOffset": -1
          }
        }
      ]
    },
    {
      "name": "Deathly Hallows",
      "type": "WORK_OF_ART",
      "metadata": {
        "wikipedia_url": "http://en.wikipedia.org/wiki/Harry_Potter_and_the_Deathly_Hallows"
      },
      "salience": 0.022565609,
      "mentions": [
        {
          "text": {
            "content": "he Deathly Hall",
            "beginOffset": -1
          }
        }
      ]
    }
  ],
  "language": "en"
}
Is there a way to send the output directly to big query via the command line in Google Cloud shell?
Any information would be greatly appreciated!
Thanks
Glad you found my Harry Potter blog post! I'd recommend storing the NL API's JSON response as a string in BigQuery and then using a user-defined function to query it. You should be able to run the following (the table is publicly viewable) to get a count of how often each entity appears in the JSON you posted:
SELECT
  COUNT(*) as entity_count, entity
FROM
  JS(
    (SELECT entities FROM [sara-bigquery:samples.hp_udf]),
    entities,
    "[{ name: 'entity', type: 'string'}]",
    "function(row, emit) {
      try {
        x = JSON.parse(row.entities);
        entities = x['entities'];
        entities.forEach(function(data) {
          emit({ entity: data.name });
        });
      } catch (e) {}
    }"
  )
GROUP BY entity
ORDER BY entity_count DESC
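That query relies on the legacy-SQL JS() table function. In today's standard SQL the same entity count can be written without a UDF using JSON_EXTRACT_ARRAY; a sketch, assuming the same entities string column (your_table is a placeholder):
SELECT
  entity,
  COUNT(*) AS entity_count
FROM (
  -- explode the entities array and pull each entity's name
  SELECT JSON_VALUE(e, '$.name') AS entity
  FROM your_table,
  UNNEST(JSON_EXTRACT_ARRAY(entities, '$.entities')) AS e
)
GROUP BY entity
ORDER BY entity_count DESC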
"send the output directly to big query via the command line in Google Cloud shell"
Look at this page and search for "bq load": https://cloud.google.com/bigquery/bq-command-line-tool
Here they have some examples about JSON schema:
Schema to load json data to google big query
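Since the NL API response is one JSON object rather than newline-delimited JSON, here is a hedged sketch of the whole pipeline from Cloud Shell, using jq to emit one entity per line and bq load with schema auto-detection (the file, dataset, and table names are placeholders):
# flatten the response: one entity object per line (NDJSON)
jq -c '.entities[]' nl_response.json > entities.ndjson
# load the NDJSON into BigQuery, letting it infer the schema
bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect \
    mydataset.nl_entities ./entities.ndjson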