BigQuery: Convert a JSON stored as string to a struct dynamically - sql

I have JSONs stored in a string field in BQ in the following manner:
select 12345 some_id, '{"163": {"sign": 1, "14": {"20": {"sign": 0}}, "13": {"28": {"sign": 1}}},"154": {"sign": 1, "12": {"21": {"sign": 1}}}}' as x
I want to export this data to a JSON file, but the JSON string is exported as follows:
[
  {
    "some_id": "12345",
    "x": "{\"163\": {\"sign\": 1, \"14\": {\"20\": {\"sign\": 0}}, \"13\": {\"28\": {\"sign\": 1}}},\"154\": {\"sign\": 1, \"12\": {\"21\": {\"sign\": 1}}}}"
  }
]
I tried the BQ JSON functions, as well as the JS temp function that makes use of stringify and appears in many threads, but none of them handles dynamic JSON keys. Creative (or even simple) solution, anyone?
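To illustrate the problem: after export, the x field has to be re-parsed outside BigQuery to get nested JSON back, e.g. with a minimal Python sketch like this (export.json is only a placeholder for the exported file, not part of the question):
import json

# Post-process the exported file shown above: the escaped "x" string is parsed
# back into a nested object without needing to know its keys in advance.
with open("export.json") as f:       # placeholder name for the exported file
    rows = json.load(f)              # the export above is a JSON array of rows

for row in rows:
    row["x"] = json.loads(row["x"])  # '{"163": ...}' -> {"163": {...}, "154": {...}}

print(json.dumps(rows, indent=2))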

Related

How to extract a field from an array of JSON objects in AWS Athena?

I have the following JSON data structure in a column in AWS Athena:
[
  {
    "event_type": "application_state_transition",
    "data": {
      "event_id": "-3368023833341021830"
    }
  },
  {
    "event_type": "application_state_transition",
    "data": {
      "event_id": "5692882176024811076"
    }
  }
]
I would like to somehow extract the values of the event_id field, e.g. in the form of a list:
["-3368023833341021830", "5692882176024811076"]
(Though I don't insist on exactly this as long as I can get my event IDs.)
I wanted to use the JSON_EXTRACT function and thought it uses the very same syntax as jq. In jq, I can easily get what I want using the following query syntax:
.[].data.event_id
However, in AWS Athena this results in an error, as apparently the syntax is not entirely compatible with jq. Is there an alternative way to achieve the result I want?
JSON_EXTRACT supports quite a limited set of JSON paths. Depending on the Athena engine version, you can either process the column by casting it to an array of maps and transforming that array via array functions:
-- sample data
with dataset(json_col) as (
    values ('[
      {
        "event_type": "application_state_transition",
        "data": {
          "event_id": "-3368023833341021830"
        }
      },
      {
        "event_type": "application_state_transition",
        "data": {
          "event_id": "5692882176024811076"
        }
      }
    ]')
)
-- query
select transform(
    cast(json_parse(json_col) as array(map(varchar, json))),
    m -> json_extract(m['data'], '$.event_id'))
from dataset;
Output:
_col0
["-3368023833341021830", "5692882176024811076"]
Or, on Athena engine version 3, you can try Trino's json_query:
-- query
select JSON_QUERY(json_col, 'lax $[*].data.event_id' WITH ARRAY WRAPPER)
from dataset;
Note that the return types of the two differ: in the first case you get array(json), and in the second just varchar.

AWS IoT SQL: Parsing string to JSON

I am writing an AWS IoT Core rule where the incoming message object has a property that contains JSON in an escaped string. Is there a way to convert this to JSON in the result?
Example
Input message
{
  "Value": "{\"x\": 1, \"y\": 2}",
  "Timestamp": "2022-09-09T13:44:37.000Z"
}
Desired output
{
  "x": 1,
  "y": 2,
  "Timestamp": "2022-09-09T13:44:37.000Z"
}
I am aware that it is possible to write a Lambda to do this, but I was hoping it would be possible with just SQL.

Extract element from output array in a Copy Data activity

I have a copy data activity that dynamically adds a datetime suffix to the sink file name, which is based on utcnow(). This corresponds to the start datetime in the copy data activity. I am looking to extract the 'start' element from the executionDetails array in the output:
{
  "dataRead": 0,
  "dataWritten": 86,
  "filesWritten": 1,
  "sourcePeakConnections": 1,
  "sinkPeakConnections": 1,
  "rowsRead": 0,
  "rowsCopied": 0,
  "copyDuration": 4,
  "throughput": 0,
  "errors": [],
  "effectiveIntegrationRuntime": "FXL",
  "usedParallelCopies": 1,
  "executionDetails": [
    {
      "source": {
        "type": "SqlServer"
      },
      "sink": {
        "type": "AzureBlobFS"
      },
      "status": "Succeeded",
      "start": "2019-08-06T12:29:20.477586Z",
      "duration": 4,
      "usedParallelCopies": 1,
      "detailedDurations": {
        "queuingDuration": 3,
        "transferDuration": 1
      }
    }
  ]
}
Assuming the activity is called CopyData, I want to set the value of start to a variable. I am struggling to get this: a simple #activity('CopyData').output.executionDetails.start does not work, telling me to supply an integer index into the executionDetails array. However, trying #activity('CopyData').output.executionDetails[3] errors, telling me the range is (0,0). I am looking for a way to extract the datetime stamp into a string variable.
I can store executionDetails in an array variable, but am still unable to extract the start value from it afterwards.
Already worked it out: the range is (0,0) because executionDetails contains only one element holding the various values. So I just need to index the array with [0] and then access the start value:
#activity('CopyData').output.executionDetails[0].start

AWS boto3 page_iterator.search can't compare datetime.datetime to str

I am trying to capture delta files (files created after the last processing run) sitting on S3. To do that, I want to use the boto3 filter iterator to query by the LastModified value rather than returning the whole list of files and filtering on the client side.
According to http://jmespath.org/, the query below is valid and filters the following JSON response:
filtered_iterator = page_iterator.search(
    "Contents[?LastModified>='datetime.datetime(2016, 12, 27, 8, 5, 37, tzinfo=tzutc())'].Key")
for key_data in filtered_iterator:
    print(key_data)
However, it fails with:
RuntimeError: xxxxxxx has failed: can't compare datetime.datetime to str
Sample paginator response:
{
  "Contents": [{
    "LastModified": "datetime.datetime(2016, 12, 28, 8, 5, 31, tzinfo=tzutc())",
    "ETag": "1022dad2540da33c35aba123476a4622",
    "StorageClass": "STANDARD",
    "Key": "blah1/blah11/abc.json",
    "Owner": {
      "DisplayName": "App-AWS",
      "ID": "bfc77ae78cf43fd1b19f24f99998cb86d6fd8220dbfce0ce6a98776253646656"
    },
    "Size": 623
  }, {
    "LastModified": "datetime.datetime(2016, 12, 28, 8, 5, 37, tzinfo=tzutc())",
    "ETag": "1022dad2540da33c35abacd376a44444",
    "StorageClass": "STANDARD",
    "Key": "blah2/blah22/xyz.json",
    "Owner": {
      "DisplayName": "App-AWS",
      "ID": "bfc77ae78cf43fd1b19f24f99998cb86d6fd8220dbfce0ce6a81234e632c5a8c"
    },
    "Size": 702
  }]
}
The boto3 JMESPath implementation does not support date filtering (it will flag them as incompatible types, "unicode" and "datetime", in your example). But because of the way dates are parsed by Amazon, you can perform a lexicographical comparison on them using JMESPath's to_string() function.
Something like this:
"Contents[?to_string(LastModified)>='\"2015-01-01 01:01:01+00:00\"']"
But keep in mind that it is a lexicographical comparison and not a date comparison. It works most of the time, though.
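For completeness, a minimal sketch of how that to_string() filter plugs into a paginator search (the bucket name and date literal are placeholders taken from the question and the expression above):
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects")
page_iterator = paginator.paginate(Bucket="mytestbucket")

# Lexicographically compare the stringified LastModified against a quoted
# ISO-style date literal, then project the matching keys.
filtered_iterator = page_iterator.search(
    "Contents[?to_string(LastModified)>='\"2015-01-01 01:01:01+00:00\"'].Key"
)
for key in filtered_iterator:
    print(key)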
After spending a few minutes on the boto3 paginator documentation, I realised it is actually a syntax problem, which I had overlooked by treating the value as a string.
The quotes that wrap the comparison value on the right-hand side must be backquotes/backticks [ ` ]. You cannot use single quotes [ ' ] for the comparison values/objects.
After inspecting the JMESPath examples, I noticed they use backquotes around the comparison value, so the boto3 paginator implementation does indeed comply with the JMESPath standard.
Here is the code I ran without error using the backquote:
import boto3
s3 = boto3.client("s3")
s3_paginator = s3.get_paginator('list_objects')
s3_iterator = s3_paginator.paginate(Bucket='mytestbucket')
filtered_iterator = s3_iterator.search(
    "Contents[?LastModified >= `datetime.datetime(2016, 12, 27, 8, 5, 37, tzinfo=tzutc())`].Key"
)
for key_data in filtered_iterator:
    print(key_data)

How to load json data in form of map in spark sql?

I have JSON data as shown below:
"vScore": {
"300x600": {
"v1": "0.50",
"v2": "0.67",
"v3": "ATF",
"v4": "H2",
"v5": "0.11"
},
"728x90": {
"v1": "0.48",
"v2": "0.57",
"v3": "Unknown",
"v4": "H2",
"v5": "0.51"
},
"300x250": {
"v1": "0.64",
"v2": "0.77",
"v3": "ATF",
"v4": "H2",
"v5": "0.70"
},
I want to load this JSON data in the form of a map, i.e. I want to load vScore into a map so that 300x250 becomes the key and the nested v1...v5 become the value of the map.
How can I do this in Spark SQL in Scala?
You need to load your data using:
data = sqlContext.read.json("file")
You can check how your data was loaded with:
data.printSchema()
and then get your data with a select query, using:
data.select(...)
More:
How to parse jsonfile with spark
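Note that the steps above load vScore as a struct, since read.json infers one struct field per key. To get an actual map (so that 300x250, 728x90, etc. become keys), one option is to supply an explicit schema with a MapType. A minimal PySpark sketch, assuming the file path "file" from the answer above (the same approach works from Scala):
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import StructType, StructField, MapType, StringType

spark = SparkSession.builder.getOrCreate()

# vScore is declared as map<string, map<string, string>>: the outer keys are
# the dynamic sizes ("300x600", "728x90", ...), the inner maps hold v1..v5.
schema = StructType([
    StructField("vScore", MapType(StringType(), MapType(StringType(), StringType())))
])

df = spark.read.schema(schema).json("file")   # "file" as in the answer above
df.printSchema()
df.select("vScore").show(truncate=False)

# Pull one size's v1..v5 map out of the vScore map column:
df.select(col("vScore").getItem("300x250")).show(truncate=False)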