JSON Parse in Snowflake SQL - sql

I'm trying to convert the original json to the desired results below using SQL in Snowflake. How can I accomplish this?
I've tried parse_json(newFutureAllocations[0]:fundId) but this only brings back the first fundId element.
ORIGINAL
"newFutureAllocations": [
{
"fundId": 1,
"percentAllocation": 2500
},
{
"fundId": 5,
"percentAllocation": 7500
}
]
DESIRED
"newFutureAllocations": {
"1": 2500,
"5": 7500
}

You need to use flatten to turn your array elements in to rows, then use object_agg() to aggregate them back up again, as an object rather than an array. The exact syntax depends on the rest of your query, data, etc, and you haven't provided enough details about that.

The challenge here is to re-construct the object, I used string concatenations:
with data as (
select parse_json(
'[ { "fundId": 1, "percentAllocation": 2500 }
, { "fundId": 5, "percentAllocation": 7500 } ]') j
)
select parse_json('{'||
listagg('"'||x.value:fundId ||'"'||':'|| x.value:percentAllocation, ',')
||'}')
from data, table(flatten(j)) x
group by seq

Related

How to extract a field from an array of JSON objects in AWS Athena?

I have the following JSON data structure in a column in AWS Athena:
[
{
"event_type": "application_state_transition",
"data": {
"event_id": "-3368023833341021830"
}
},
{
"event_type": "application_state_transition",
"data": {
"event_id": "5692882176024811076"
}
}
]
I would like to somehow extract the values of event_id field, e.g. in the form of a list:
["-3368023833341021830", "5692882176024811076"]
(Though I don't insist on exactly this as long as I can get my event IDs.)
I wanted to use the JSON_EXTRACT function and thought it uses the very same syntax as jq. In jq, I can easily get what I want using the following query syntax:
.[].data.event_id
However, in AWS Athena this results in an error, as apparently the syntax is not entirely compatible with jq. Is there an alternative way to achieve the result I want?
JSON_EXTRACT supports quite limited set of json paths. Depending on Athena engine version you can either process column by casting it to array of maps and processing this array via array functions:
-- sample data
with dataset(json_col) as (
values ('[
{
"event_type": "application_state_transition",
"data": {
"event_id": "-3368023833341021830"
}
},
{
"event_type": "application_state_transition",
"data": {
"event_id": "5692882176024811076"
}
}
]')
)
-- query
select transform(
cast(json_parse(json_col) as array(map(varchar, json))),
m -> json_extract(m['data'], '$.event_id'))
from dataset;
Output:
_col0
["-3368023833341021830", "5692882176024811076"]
Or for 3rd Athena engine version you can try using Trino's json_query:
-- query
select JSON_QUERY(json_col, 'lax $[*].data.event_id' WITH ARRAY WRAPPER)
from dataset;
Note that return type of two will differ - in first case you will have array(json) and in the second one - just varchar.

Parsing JSON in Snowflake

I'm trying to parse a the below nested JSON in Snowflake using the latteral function in Snowflake but I wanted to each nested column in "GoalTime" to show up as a column. For example,
GoalTime_InDoorOpen
2020-03-26T12:58:00-04:00
GoalTime_InLastOff
null
GoalTime_OutStartBoarding
2020-03-27T14:00:00-04:00
"GoalTime": [
{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
},
or if you have many rows (what appear to be flights) and thus you need to columns per flight this code be what you are after
with data as (
select flight_code, parse_json(json) as json from values ('nz101','{GoalTime:[{"GoalName": "GoalA", "GoalTime": "2020-03-26T12:58:00-04:00"}, {"GoalName": "GoalB"}]}'),
('nz201','{GoalTime:[{"GoalName": "GoalA"}, {"GoalName": "GoalB", "GoalTime": "2020-03-26T12:58:00-02:00"}]}')
j(flight_code, json)
), unrolled as (
select d.flight_code, f.value:GoalName as goal_name, f.value:GoalTime as goal_time
from data d,
lateral flatten (input => json:GoalTime) f
)
select *
from unrolled
pivot(min(goal_time) for goal_name in ('GoalA', 'GoalB'))
order by flight_code;
it gives the results:
FLIGHT_CODE 'GoalA' 'GoalB'
nz101 "2020-03-26T12:58:00-04:00" null
nz201 null "2020-03-26T12:58:00-02:00"
create or replace function JSON_STRING()
returns string
language javascript
as
$$
return `
[
{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
}
]
`;
$$;
select value:GoalName::string as GoalName, value:GoalTime::timestamp as GoalTime
from lateral flatten(input => parse_json(JSON_STRING()));
-- See how the lateral flatten combination works on a JSON variant:
select * from lateral flatten(input => parse_json(JSON_STRING()));
I wrote this to run in any Snowflake worksheet, no tables needed. The function on top simply allows the JSON to be written as a multi-line string in the SQL statement below it. It has no other use than representing a string holding your JSON.
Step 1 is to PARSE_JSON, which converts a string into a variant data type formatted as a JSON object.
Step 2 is the lateral flatten. If you do a select star on that, it will return a number of columns. One of them is "value".
Step 3 is to extract the properties you want using single : notation for the property name and dots to traverse down the nodes from there (if there are any).
Step 4 is to cast the property to the data type you want using double :: notation. This is especially important if you're doing comparisons on the column particularly in join keys.
Note that there's a slight invalid part of the JSON that did not allow it to parse. In the top level the array had a property, which did not parse. I removed that to allow parsing.
Probably close to what you seek is using a standard SQL UNION statement.
Given the following are true to recreate the solution:
Created a table 'JSON_GOALS' with one column for raw JSON called, GOALS_RAW
You have loaded JSON data into a table as the raw JSON, with compliant JSON object array syntax, and a parent, GoalTimeGroup, ex: {[{}]}, so
{
"GoalTimeGroup": [{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
}
]
}
Doing so allows you to write a fairly standard JSON retrieve in Snowflake with the following syntax:
SELECT GOALS_RAW:GoalTimeGroup[0].GoalName, GOALS_RAW:GoalTimeGroup[1].GoalName, GOALS_RAW:GoalTimeGroup[2].GoalName
FROM JSON_GOALS
UNION
SELECT GOALS_RAW:GoalTimeGroup[0].GoalTime, GOALS_RAW:GoalTimeGroup[1].GoalTime, GOALS_RAW:GoalTimeGroup[2].GoalName
FROM JSON_GOALS
;
This gives you closer to the answer you are looking for and seems to provide a simpler solution. You can also control how many rows you'd want based on your JSON object attributes for each GOAL object.
Recommendations to enhance this would be to create a function that could detect the depth of each nested element and perhaps auto generate the indexes for 'n' number of columns.
The library below provides a method called "ExecuteAll" which one of the params is "tags", so if you provide an array of tags and values, all of them will be parsed and validated plus keeping the features of the sql injection protection from Snowflake.
snowflake-multisql

Querying Deep JSONb Information - PostgreSQL

I have the following JSON array stored on a row:
{
"openings": [
{
"visibleFormData": {
"productName": "test"
}
}
]
}
I'm trying to get the value of productName. So far I've tried something like this:
SELECT tbl.column->'openings'->'0'->'visibleFormData'->>'productName'
The theory being that this would grab the first object (index 0) in the openings array and then grab the productName attribute from that object's visibleFormData object.
All I'm getting is null, though. I've tried multiple configurations of this. I'm thinking it has to do with the grabbing of index zero, but I am unsure. I am not a regular PSQL user, so it's proving a tad tricky to debug.
The json array index is integer, so use 0 instead of '0':
with tbl(col) as (
values
('{
"openings": [
{
"visibleFormData": {
"productName": "test"
}
}
]
}'::jsonb)
)
SELECT tbl.col->'openings'->0->'visibleFormData'->>'productName'
FROM tbl
?column?
----------
test
(1 row)

SQL Server - "for json path" statement does not return more than 2984 lines of JSON string

I'm trying to generate huge amount of data in a complex and nested JSON string using "for json path" statement, and I'm using multiple functions to create different parts of this JSON string, as follow:
declare #queue nvarchar(max)
select #queue = (
select x.ID as layoutID
, l.Title as layoutName
, JSON_QUERY(queue_objects (#productID, x.ID)) as [objects]
from Layouts x
inner join LayoutLanguages l on l.LayoutID = x.ID
where x.ID = #layoutid
group by x.ID, l.Title
for json path
)
select #queue as JSON
Thus far, JSON would be:
{
"root": [{
"layouts": [{
"layoutID": 5
, "layoutName": "foo"
, "objects": []
}]
}]
}
and the "queue_objects" function then would be called to fill out 'objects' array:
queue_objects
select 0 as objectID
, case when (select inherited_counter(#layoutID,0)) > 0 then 'false' else 'true' end as editable
, JSON_QUERY(queue_properties (p.Table2ID)) as propertyObjects
, JSON_QUERY('[]') as inherited
from productList p
where p.Table1ID = #productID
group by p.Table2ID
for json path
And then JSON would be:
{
"root": [{
"layouts": [{
"layoutID": 5
, "layoutName": "foo"
, "objects": [{
"objectID": 1000
, "editable": "true"
, "propertyObjects": []
, "inherited": []
}, {
"objectID": 2000
, "editable": "false"
, "propertyObjects": []
, "inherited": []
}]
}]
}]
}
Also "inherited_counter" and "queue_properties" functions would be called to fill corresponding keys.
This is just a sample, the code won't work as I'm not putting functions here.
But my question is: is it the functions that simultaneously call each other, makes the server return broken JSON string? or it's the server itself that can't handle JSON strings more than 2984 lines?
EDIT: what I mean by 2984 lines, is that I use beautifier on JSON, the server won't return the string line by line, it returns JSON broken, but after beautifying it happens to be 2984 lines of string.
As I wrote in my comment to the OP, this is probably due to SSMS has a limit of how many characters to display in a column in the result grid. It has no impact on the actual result, e.g. the result has all data, it is just that SSMS doesn't display it all.
To fix this, you can increase the number of characters SSMS retrieves:
I would not recommend that - "how long is a piece of string", but instead select the result into a nvarchar(max) variable, and PRINT that variable. That should give you the whole text.
Hope this helps!

In Azure Stream Analytics Query I am getting an error when using Timestamp by

I have some data coming in and I need to calculate the parts per min value in the stream. The following is my query
WITH first AS (
SELECT
TS.ArrayIndex,
TS.ArrayValue.FQN,
TS.ArrayValue.vqts
FROM
[EventHubInput] as hub
CROSS APPLY GetArrayElements(hub.timeseries) AS TS )
SELECT first.FQN ,( max(cast(vqt.arrayvalue.v as BIGINT))-(min(cast(vqt.arrayvalue.v as BIGINT)))) AS PPM
FROM first
CROSS APPLY GetArrayElements(first.vqts) AS vqt where first.FQN like '%Production%' and vqt.arrayvalue.q = 192
timestamp by
vqt.arrayvalue.t
group by
first.FQN, TumblingWindow(minute, 1)
I get an error when I add timestamp by
vqt.arrayvalue.t
My input data looks like this..
{
"timeSeries": [
{
"fqn":"MyEnterprise.Gateways.GatewayE.CLX.Tags.StateBasic",
"vqts":[
{
"v": "" ,
"q": 192 ,
"t":"2016-06-24T16:39:45.683+0000"
}
]
}, {
"fqn":"MyEnterprise.Gateways.GatewayE.CLX.Tags.ProductionCount",
"vqts":[
{
"v": 264 ,
"q": 192 ,
"t":"2016-06-24T16:39:45.683+0000"
}
]
}, {
"fqn":".Gateways.GatewayE.CLX.Tags.StateDetailed",
"vqts":[
{
"v": "" ,
"q": 192 ,
"t":"2016-06-24T16:39:45.683+0000"
}
]
} ]
}
It would be great if anybody could help!!
A single event from the stream can only have one timestamp field. What you are currently doing is assigning timestamp for each individual array value from this event. Unfortunately this is currently unsupported.
Possible workaround is to split your ASA job into two. The first one will not use TIMESTAMP BY but will do CROSS APPLY and send array values as individual event to intermediate Event Hub. The second job will read from there, use TIMESTAMP BY and apply the rest of the logic.