Use Athena SQL to get a value from JSON key - sql

I need to get the email address from this 'facets' table I created from my firehose logs (JSON).
Now, I am using Athena to get particular information.
I need to get the email addresses from this:
This is my out of 'facets' when I pass-
SELECT * FROM "sampledb"."facets" limit 10
{email_channel={mail_event={mail={message_id=oadfosadu6237864237615, message_send_timestamp=1622696691764, from_address=abcd#jk.com, destination=[abcd#jk.com], headers_truncated=false, headers=[{name=From, value=abcd#jk.com}, {name=To, value=abcd#jk.com}, {name=MIME-Version, value=1.0}], common_headers={from=ghjk#li.com, to=[abcd#jk.com]}}, send={}, rendering_failure=null}}}

Assuming you have one column which stores json in provided format you can use json_extract with needed paths (and maybe some casts):
with dataset1 as (
select * from (values(JSON
'{
"email_channel": {
"mail_event": {
"mail": {
"message_id": "oadfosadu6237864237615",
"message_send_timestamp": 1622696691764,
"from_address": "abcd#jk.com",
"destination": [
"abcd#jk.com"
],
"headers_truncated": false,
"headers": [
{
"name": "From",
"value": "abcd#jk.com"
},
{
"name": "To",
"value": "abcd#jk.com"
},
{
"name": "MIME-Version",
"value": "1.0"
}
],
"common_headers": {
"from": "ghjk#li.com",
"to": [
"abcd#jk.com"
]
}
},
"send": {},
"rendering_failure": null
}
}
}')) as facets(facet))
select
json_extract(facet, '$.email_channel.mail_event.mail.from_address') mail_from,
CAST(json_extract(facet, '$.email_channel.mail_event.mail.destination') AS ARRAY(VARCHAR)) destination
from dataset1
And output:
mail_from
destination
"abcd#jk.com"
{abcd#jk.com}

Related

select node value from json column type

A table I called raw_data with three columns: ID, timestamp, payload, the column paylod is a json type having values such as:
{
"data": {
"author_id": "1461871206425108480",
"created_at": "2022-08-17T23:19:14.000Z",
"geo": {
"coordinates": {
"type": "Point",
"coordinates": [
-0.1094,
51.5141
]
},
"place_id": "3eb2c704fe8a50cb"
},
"id": "1560043605762392066",
"text": " ALWAYS # London, United Kingdom"
},
"matching_rules": [
{
"id": "1560042248007458817",
"tag": "london-paris"
}
]
}
From this I want to select rows where the coordinates is available, such as [-0.1094,51.5141]in this case.
SELECT *
FROM raw_data, json_each(payload)
WHERE json_extract(json_each.value, '$.data.geo.') IS NOT NULL
LIMIT 20;
Nothing was returned.
EDIT
NOT ALL json objects have the coordinates node. For example this value:
{
"data": {
"author_id": "1556031969062010881",
"created_at": "2022-08-18T01:42:21.000Z",
"geo": {
"place_id": "006c6743642cb09c"
},
"id": "1560079621017796609",
"text": "Dear Desperate sister say husband no dey oo."
},
"matching_rules": [
{
"id": "1560077018183630848",
"tag": "kaduna-kano-katsina-dutse-zaria"
}
]
}
The correct path is '$.data.geo.coordinates.coordinates' and there is no need for json_each():
SELECT *
FROM raw_data
WHERE json_extract(payload, '$.data.geo.coordinates.coordinates') IS NOT NULL;
See the demo.

Transform Array into columns in BigQuery

I have a json string stored in a string column in BigQuery. There is an Array in it. I would like to pick some fields from array and write its value to BQ columns.
For example - Consider a below json stored in BQ
{
"pool": "mypool",
"statusCode": "0",
"payloads": [
{
"name": "request",
"fullpath": "com.gcp.commontools.edlpayload.EDLPayloadManagerTest$Request",
"jsonPayload": {
"body": "{\"data\":\"foo\"}"
},
"orientation": "REQUEST",
"httpTransport": {
"httpMethod": "POST",
"headers": {
"headers": {
"a": "1"
}
},
"sourceEndpoint": "/v1/foobar"
}
},
{
"name": "response",
"fullpath": "com.gcp.commontools.edlpayload.EDLPayloadManagerTest$Response",
"jsonPayload": {
"body": "{\"data\":\"bar\"}"
},
"orientation": "RESPONSE",
"httpTransport": {
"headers": {
"headers": {
"b": "2"
}
},
"httpResponseCode": 200
}
},
{
"name": "attributes",
"fullpath": "java.util.HashMap",
"nameValuePairs": {
"data": {
"one": "1"
}
},
"orientation": "TRANSITORY"
}
],
"uuid": "11EC-C714-8ADE2390-9619-1B80E63968CC",
"payloadName": "my-overall-name"
}
Consider a target BQ table schema is
pool, requestFullPath, requestPayload, responseFullPath, responsePayload
From the above json, i would like to pick few json elements and map there value to a column in BQ. Please note, array of payload will be dynamic in nature. There can be only 1 payload in the payloads array or there can be multiple. And the order of them is not fixed. For example, request payload can come at [0]th position, 1st position etc.
Consider below
select * from (
select
json_value(json_col, '$.pool') as pool,
json_value(payload, '$.name') as name,
json_value(payload, '$.fullpath') as FullPath,
json_value(payload, '$.jsonPayload.body') as Payload,
from your_table t
, unnest(json_extract_array(json_col, '$.payloads')) payload
)
pivot (any_value(FullPath) as FullPath, any_value(Payload) as Payload for name in ('request', 'response') )
if applied to sample data in your question - output is

Extract array from varchar in PrestoSQL

I have a VARCHAR field like this:
[
{
"config": 0,
"type": "0
},
{
"config": x,
"type": "1"
},
{
"config": "",
"type": ""
},
{
"config": [
{
"address": {},
"category": "",
"merchant": {
"data": [
10,12,23
],
"file": 0
},
"range_id": 1,
"shop_id_info": null
}
],
"type": "new"
}
]
And I need to extract merchant data from this. Desirable output is:
10
12
23
Please advise. I keep getting Cannot cast VARCHAR to array/unnest type VARCHAR
You can try using json path $.*.config.*.merchant.data.* but if it does not work for you (as for me in Athena version, where arrays in json path are not supported well) you can cast your json to ARRAY(JSON) and do some manipultaions from there (needed to fix your JSON a little bit):
Test data:
WITH dataset AS (
SELECT * FROM (VALUES
(JSON '[
{
"config": {},
"type": "0"
},
{
"config": "x",
"type": "1"
},
{
"config": "",
"type": ""
},
{
"config": [
{
"address": {},
"category": "",
"merchant": {
"data": [
10,12,23
],
"file": 0
},
"range_id": 1,
"shop_id_info": null
}
],
"type": "new"
}
]')
) AS t (json_value))
And query:
SELECT flatten(
transform(
flatten(
transform(
CAST(json_value AS ARRAY(JSON))
, json_object -> try(CAST(json_extract(json_object, '$.config') AS ARRAY(JSON))))),
json_config -> CAST(json_extract(json_config, '$.merchant.data') as ARRAY(INTEGER))))
FROM dataset
Which will give you array of numbers:
_col0
[10, 12, 23]
And from there you can continue with unnest and so on if needed.

Unexpected behavior of ARRAY_SLICE in Cosmos Db SQL API

I have Cosmos DB collection (called sample) containing the following documents:
[
{
"id": "id1",
"messages": [
{
"messageId": "message1",
"Text": "Value1"
},
{
"messageId": "message2",
"Text": "Value2"
}
]
},
{
"id": "id2",
"messages": [
{
"messageId": "message3",
"Text": "Value3"
},
{
"messageId": "message4",
"Text": "Value1"
}
]
},
{
"id": "id3",
"messages": [
{
"messageId": "message5",
"Text": "Value1"
},
{
"messageId": "message6",
"Text": "Value2"
}
]
},
{
"id": "id4",
"messages": [
{
"messageId": "message7",
"Text": "Value5"
},
{
"messageId": "message8",
"Text": "Value2"
}
]
},
]
I am trying to retrieve all the Documents, having messages and the first message has the field "Text"= 'Value1'.
In this sample the documents with the ids '1' and '3' would be retrieved. Please notice that the document with id='id2' wouldn't be retrieved,
since the value of the text of the first message is 'Value3'.
The collection as mentioned is called sample and I am running the following Query:
"select sample.id, sample.messages, ARRAY_SLICE(sample.messages, 0, 1)[0].Text as valueOfText from sample"
As you can see in the first two images, I retrieve all Documents and every one of them have the field "valueOfText" set to value of the first message, as expected.
Now when I filter the collection (the third image), I retrieve no results at all.
Is this an expected behavior?
Following your sql, got same results:
But why you have to use ARRAY_SLICE,it is used to return truncated array.Since your requirement is specific:
trying to retrieve all the Documents, having messages and the first
message has the field "Text"= 'Value1'
Just use sql:
SELECT c.id,c.messages,c.messages[0].Text as valueOfText FROM c
where c.messages[0].Text = 'Value1'
Output:

Apache Nifi: UpdateRecord replace child values

I'm trying to use UpdateRecord 1.9.0 processor to modify a JSON but it does not replace the values as I want.
this is the source message
{
"type": "A",
"ids": [{
"id": "1",
"value": "abc"
}, {
"id": "2",
"value": "def"
}, {
"id": "3",
"value": "ghi"
}
]
}
and the wanted output
{
"ids": [{
"userId": "1",
}, {
"userId": "2",
}, {
"userId": "3",
}
]
}
I have configured the processor as follows
processor config
Reader:
reader
Schema registry:
schema
writer:
writer
And it works, the output is a JSON without the field 'type' and the ids have the field 'userId' instead 'id' and 'value'.
To fill the value of userId, I defined the replace strategy and the property to replace:
strategy
But the output is wrong. The userId is always filled with the id of the last element in the array:
{
"ids": [{
"userId": "3"
}, {
"userId": "3"
}, {
"userId": "3"
}
]
}
I think the value of the expression is ok because if I try to replace only one record it works fine (/ids[0]/userId, ..id)
Nifi docs has a really similar example (example 3):
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.7.1/org.apache.nifi.processors.standard.UpdateRecord/additionalDetails.html
But it does not work for me.
What am I doing wrong?
thanks
Finally I have used JoltJSONTransform processor instead UpdateRecord
JoltJSONTransform
template:
[
{
"operation": "shift",
"spec": {
"ids":{
"*":{
"id": "ids[&1].userId"
}
}
}
}
]
Easier than UpdateRecord