In an Azure Stream Analytics query I am getting an error when using TIMESTAMP BY - azure-stream-analytics

I have some data coming in and I need to calculate the parts-per-minute value in the stream. The following is my query:
WITH first AS (
    SELECT
        TS.ArrayIndex,
        TS.ArrayValue.FQN,
        TS.ArrayValue.vqts
    FROM [EventHubInput] AS hub
    CROSS APPLY GetArrayElements(hub.timeseries) AS TS
)
SELECT
    first.FQN,
    (MAX(CAST(vqt.ArrayValue.v AS BIGINT)) - MIN(CAST(vqt.ArrayValue.v AS BIGINT))) AS PPM
FROM first
CROSS APPLY GetArrayElements(first.vqts) AS vqt
WHERE first.FQN LIKE '%Production%' AND vqt.ArrayValue.q = 192
TIMESTAMP BY vqt.ArrayValue.t
GROUP BY first.FQN, TumblingWindow(minute, 1)
I get an error when I add TIMESTAMP BY vqt.ArrayValue.t.
My input data looks like this:
{
    "timeSeries": [
        {
            "fqn": "MyEnterprise.Gateways.GatewayE.CLX.Tags.StateBasic",
            "vqts": [
                {
                    "v": "",
                    "q": 192,
                    "t": "2016-06-24T16:39:45.683+0000"
                }
            ]
        },
        {
            "fqn": "MyEnterprise.Gateways.GatewayE.CLX.Tags.ProductionCount",
            "vqts": [
                {
                    "v": 264,
                    "q": 192,
                    "t": "2016-06-24T16:39:45.683+0000"
                }
            ]
        },
        {
            "fqn": ".Gateways.GatewayE.CLX.Tags.StateDetailed",
            "vqts": [
                {
                    "v": "",
                    "q": 192,
                    "t": "2016-06-24T16:39:45.683+0000"
                }
            ]
        }
    ]
}
It would be great if anybody could help!

A single event from the stream can only have one timestamp field. What you are currently doing is assigning a timestamp to each individual array value from the event; unfortunately, this is currently unsupported.
A possible workaround is to split your ASA job in two. The first job will not use TIMESTAMP BY but will do the CROSS APPLY and send each array value as an individual event to an intermediate Event Hub. The second job will read from there, use TIMESTAMP BY, and apply the rest of the logic.
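A rough sketch of that two-job split, assuming an intermediate Event Hub wired up as an output named [IntermediateHub] in the first job and as an input with the same name in the second (both names are placeholders):
-- Job 1: flatten the arrays, no TIMESTAMP BY; emit one event per array value.
SELECT
    TS.ArrayValue.FQN AS fqn,
    vqt.ArrayValue.v AS v,
    vqt.ArrayValue.q AS q,
    vqt.ArrayValue.t AS t
INTO [IntermediateHub]
FROM [EventHubInput] AS hub
CROSS APPLY GetArrayElements(hub.timeseries) AS TS
CROSS APPLY GetArrayElements(TS.ArrayValue.vqts) AS vqt

-- Job 2: each event now carries a single timestamp field, so TIMESTAMP BY works.
SELECT
    fqn,
    MAX(CAST(v AS BIGINT)) - MIN(CAST(v AS BIGINT)) AS PPM
FROM [IntermediateHub] TIMESTAMP BY t
WHERE fqn LIKE '%Production%' AND q = 192
GROUP BY fqn, TumblingWindow(minute, 1)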

Related

How to extract a field from an array of JSON objects in AWS Athena?

I have the following JSON data structure in a column in AWS Athena:
[
    {
        "event_type": "application_state_transition",
        "data": {
            "event_id": "-3368023833341021830"
        }
    },
    {
        "event_type": "application_state_transition",
        "data": {
            "event_id": "5692882176024811076"
        }
    }
]
I would like to somehow extract the values of event_id field, e.g. in the form of a list:
["-3368023833341021830", "5692882176024811076"]
(Though I don't insist on exactly this as long as I can get my event IDs.)
I wanted to use the JSON_EXTRACT function and thought it uses the very same syntax as jq. In jq, I can easily get what I want using the following query syntax:
.[].data.event_id
However, in AWS Athena this results in an error, as apparently the syntax is not entirely compatible with jq. Is there an alternative way to achieve the result I want?
JSON_EXTRACT supports quite a limited set of JSON paths. Depending on the Athena engine version, you can either process the column by casting it to an array of maps and working on that array with array functions:
-- sample data
with dataset(json_col) as (
    values ('[
        {
            "event_type": "application_state_transition",
            "data": {
                "event_id": "-3368023833341021830"
            }
        },
        {
            "event_type": "application_state_transition",
            "data": {
                "event_id": "5692882176024811076"
            }
        }
    ]')
)
-- query
select transform(
    cast(json_parse(json_col) as array(map(varchar, json))),
    m -> json_extract(m['data'], '$.event_id'))
from dataset;
Output:
_col0
["-3368023833341021830", "5692882176024811076"]
Or, for Athena engine version 3, you can try using Trino's json_query:
-- query
select JSON_QUERY(json_col, 'lax $[*].data.event_id' WITH ARRAY WRAPPER)
from dataset;
Note that the return types of the two differ: in the first case you get array(json), and in the second just varchar.
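If you'd rather get plain strings out of the first approach too, one small variation is to use json_extract_scalar (which returns varchar) instead of json_extract, giving array(varchar):
-- variation on the first query: json_extract_scalar returns varchar
select transform(
    cast(json_parse(json_col) as array(map(varchar, json))),
    m -> json_extract_scalar(m['data'], '$.event_id'))
from dataset;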

AWS mqtt SQL query

I have the following mqtt message:
{
    "sensors": [
        {
            "lsid": 412618,
            "data": [
                {
                    "temp_in": 72.3,
                    "heat_index_in": 72,
                    "dew_point_in": 55.9,
                    "ts": 1652785241,
                    "hum_in": 56.3
                }
            ],
            "sensor_type": 243,
            "data_structure_type": 12
        },
        {
            "lsid": 421195,
}
I can get the "sensors,0.lsid" value and the entire "data" array using this query:
select get(sensors,0).lsid as ls, get(sensors, 0).data as data1 from "topic"
but what I really need is to get "temp_in": 72.3, i.e. the values from the second-level array.
I've tried using this: AWS Doc., but unless I'm misreading it, it doesn't seem to work.
Any help would be greatly appreciated.
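One untested possibility: get() is the documented AWS IoT SQL function the query above already uses, and the assumption here is that its calls can be nested to step into the inner "data" array before dereferencing a member.
-- assumption: nested get() calls are accepted by the AWS IoT SQL parser
select get(get(sensors, 0).data, 0).temp_in as temp_in,
       get(sensors, 0).lsid as ls
from "topic"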

JSON Parse in Snowflake SQL

I'm trying to convert the original json to the desired results below using SQL in Snowflake. How can I accomplish this?
I've tried parse_json(newFutureAllocations[0]:fundId) but this only brings back the first fundId element.
ORIGINAL
"newFutureAllocations": [
    {
        "fundId": 1,
        "percentAllocation": 2500
    },
    {
        "fundId": 5,
        "percentAllocation": 7500
    }
]
DESIRED
"newFutureAllocations": {
    "1": 2500,
    "5": 7500
}
You need to use flatten to turn your array elements into rows, then use object_agg() to aggregate them back up again, as an object rather than an array. The exact syntax depends on the rest of your query, data, etc., and you haven't provided enough details about that.
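As a starting point, here is a minimal sketch of that flatten-plus-object_agg route, using only the sample values from the question (untested; adapt the names to your real table):
with data as (
    select parse_json(
        '[ { "fundId": 1, "percentAllocation": 2500 }
         , { "fundId": 5, "percentAllocation": 7500 } ]') j
)
-- flatten to one row per fund, then fold the pairs back into one object
select object_agg(x.value:fundId::string, x.value:percentAllocation) as newFutureAllocations
from data, table(flatten(j)) x;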
The challenge here is to re-construct the object; I used string concatenation:
with data as (
select parse_json(
'[ { "fundId": 1, "percentAllocation": 2500 }
, { "fundId": 5, "percentAllocation": 7500 } ]') j
)
select parse_json('{'||
listagg('"'||x.value:fundId ||'"'||':'|| x.value:percentAllocation, ',')
||'}')
from data, table(flatten(j)) x
group by seq

Set a JSON object's values from an array in PostgreSQL

I have an array of JSON objects in PostgreSQL inside data.json that looks like this:
[
{ "10" : [1,2,3,4,5] },
{ "8" : [5,4,3,1,2] },
{ "11" : [1,3,5,4,2] }
]
The data is taken from a select statement
SELECT
    ARRAY((
        SELECT json_build_object(
            station_id,
            ARRAY((
                SELECT COALESCE((
                    SELECT SUM(prodtakt_takt)
                    FROM psproductivity.prodtakt
                    WHERE prodtakt_start::date = generate_series.shipping_day_sort::date
                      AND station_id = prodtakt_station_id
                ), 0)
                FROM generate_series
            ))
        )
        FROM psproductivity.station
        WHERE (SELECT COALESCE(SUM(prodtakt_takt), 0)
               FROM psproductivity.prodtakt
               WHERE station_id = prodtakt_station_id) > 0
    )) AS json, ...
Where generate_series is just a series of dates.
Now I need to take that and turn it into this format of JSON object:
{
    "x": "x",
    "jsondata": {
        "10": [1,2,3,4,5],
        "8": [5,4,3,1,2],
        "11": [1,3,5,4,2]
    }
}
The software I am working on uses c3.js to process data into graphs, so I have to change this format. I am thinking that I need to start with something like
json_build_object('jsondata', (SELECT FROM json_each(unnest(data.json))))
but I can think of no route with that logic. Adding the x into the JSON object is easy; I am confident I can do that part if I can just reorganize the array.
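A hedged, untested sketch of one possible route, keeping the data.json naming from the question (assuming data is a relation whose json column is a json[] array): unnest the array, explode each one-key object with json_each, and fold every key/value pair into a single object with json_object_agg.
-- assumption: data.json is a json[] column holding one-key objects
SELECT json_build_object(
           'x', 'x',
           'jsondata', (
               SELECT json_object_agg(kv.key, kv.value)
               FROM unnest(d.json) AS elem,
                    json_each(elem) AS kv
           )
       )
FROM data d;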

Parsing JSON in Snowflake

I'm trying to parse the below nested JSON in Snowflake using the lateral flatten function, but I want each nested column in "GoalTime" to show up as a column. For example,
GoalTime_InDoorOpen        GoalTime_InLastOff  GoalTime_OutStartBoarding
2020-03-26T12:58:00-04:00  null                2020-03-27T14:00:00-04:00
"GoalTime": [
{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
},
Or, if you have many rows (what appear to be flights) and thus need columns per flight, this code may be what you are after:
with data as (
    select flight_code, parse_json(json) as json
    from values
        ('nz101', '{"GoalTime":[{"GoalName": "GoalA", "GoalTime": "2020-03-26T12:58:00-04:00"}, {"GoalName": "GoalB"}]}'),
        ('nz201', '{"GoalTime":[{"GoalName": "GoalA"}, {"GoalName": "GoalB", "GoalTime": "2020-03-26T12:58:00-02:00"}]}')
        j(flight_code, json)
), unrolled as (
    select d.flight_code, f.value:GoalName as goal_name, f.value:GoalTime as goal_time
    from data d,
         lateral flatten(input => json:GoalTime) f
)
select *
from unrolled
    pivot(min(goal_time) for goal_name in ('GoalA', 'GoalB'))
order by flight_code;
it gives the results:
FLIGHT_CODE  'GoalA'                      'GoalB'
nz101        "2020-03-26T12:58:00-04:00"  null
nz201        null                         "2020-03-26T12:58:00-02:00"
create or replace function JSON_STRING()
returns string
language javascript
as
$$
return `
[
{
"GoalName": "GoalTime_InDoorOpen",
"GoalTime": "2020-03-26T12:58:00-04:00"
},
{
"GoalName": "GoalTime_InLastOff"
},
{
"GoalName": "GoalTime_InReadyToTow"
},
{
"GoalName": "GoalTime_OutTowAtGate"
},
{
"GoalName": "GoalTime_OutStartBoarding",
"GoalTime": "2020-03-27T14:00:00-04:00"
}
]
`;
$$;
select value:GoalName::string as GoalName, value:GoalTime::timestamp as GoalTime
from lateral flatten(input => parse_json(JSON_STRING()));
-- See how the lateral flatten combination works on a JSON variant:
select * from lateral flatten(input => parse_json(JSON_STRING()));
I wrote this to run in any Snowflake worksheet, no tables needed. The function on top simply allows the JSON to be written as a multi-line string in the SQL statement below it. It has no other use than representing a string holding your JSON.
Step 1 is to PARSE_JSON, which converts a string into a variant data type formatted as a JSON object.
Step 2 is the lateral flatten. If you do a select star on that, it will return a number of columns. One of them is "value".
Step 3 is to extract the properties you want using single : notation for the property name and dots to traverse down the nodes from there (if there are any).
Step 4 is to cast the property to the data type you want using double :: notation. This is especially important if you're doing comparisons on the column particularly in join keys.
Note that there's a slight invalid part of the JSON that did not allow it to parse. In the top level the array had a property, which did not parse. I removed that to allow parsing.
Probably close to what you seek is using a standard SQL UNION statement.
Given the following are true to recreate the solution:
You created a table 'JSON_GOALS' with one column for raw JSON, called GOALS_RAW
You have loaded JSON data into that table as raw JSON, with compliant JSON object array syntax and a parent, GoalTimeGroup, ex: {[{}]}, so:
{
    "GoalTimeGroup": [
        {
            "GoalName": "GoalTime_InDoorOpen",
            "GoalTime": "2020-03-26T12:58:00-04:00"
        },
        {
            "GoalName": "GoalTime_InLastOff"
        },
        {
            "GoalName": "GoalTime_InReadyToTow"
        },
        {
            "GoalName": "GoalTime_OutTowAtGate"
        },
        {
            "GoalName": "GoalTime_OutStartBoarding",
            "GoalTime": "2020-03-27T14:00:00-04:00"
        }
    ]
}
Doing so allows you to write a fairly standard JSON retrieve in Snowflake with the following syntax:
SELECT GOALS_RAW:GoalTimeGroup[0].GoalName, GOALS_RAW:GoalTimeGroup[1].GoalName, GOALS_RAW:GoalTimeGroup[2].GoalName
FROM JSON_GOALS
UNION
SELECT GOALS_RAW:GoalTimeGroup[0].GoalTime, GOALS_RAW:GoalTimeGroup[1].GoalTime, GOALS_RAW:GoalTimeGroup[2].GoalTime
FROM JSON_GOALS
;
This gives you closer to the answer you are looking for and seems to provide a simpler solution. You can also control how many rows you'd want based on your JSON object attributes for each GOAL object.
Recommendations to enhance this would be to create a function that could detect the depth of each nested element and perhaps auto generate the indexes for 'n' number of columns.
The library below provides a method called "ExecuteAll", one of whose params is "tags"; if you provide an array of tags and values, all of them will be parsed and validated, while keeping Snowflake's SQL injection protection features.
snowflake-multisql