Azure Stream Analytics implementation or the best approach - azure-stream-analytics

I am new to Stream Analytics and I need help here to achieve a specific task.
I have telemetry data coming from IoT Hub in the format below.
Basically, I will be getting machine telemetry data and the stage of the operation on that machine streamed to IoT Hub.
The stages are indicated with a tag, e.g. "stageid":"stage1". I need to calculate the time taken for each stage using Stream Analytics, based on the timestamp and the stage tag.
Example packets:
[{
    "Payload": {
        "devid": "01",
        "locid": "loc01",
        "machid": "mac01",
        "stageid": "stage1",
        "timestamp": "2020-01-24T09:22:00.3270000Z"
    }
}, {
    "Payload": {
        "devid": "02",
        "locid": "loc01",
        "machid": "mac01",
        "stageid": "stage1",
        "timestamp": "2020-01-24T09:22:00.3270000Z"
    }
}]
[{
    "Payload": {
        "devid": "01",
        "locid": "loc01",
        "machid": "mac01",
        "stageid": "stage2",
        "timestamp": "2020-01-24T09:26:00.3270000Z"
    }
}, {
    "Payload": {
        "devid": "02",
        "locid": "loc01",
        "machid": "mac01",
        "stageid": "stage2",
        "timestamp": "2020-01-26T09:24:00.3270000Z"
    }
}]
Please help me: can we achieve this with a query, and if so what would the query look like, or what is the best alternative approach?
Thanks,

To my knowledge, your requirement can't be implemented with ASA built-in features alone. ASA is a real-time data collection and analytics service; in other words, data needs to be processed as it arrives, and the current event can't wait for a later event in order to calculate or merge anything. Even if you could use windowing functions and GROUP BY, I believe the frequency of messages pushed by the devices is variable, so a fixed window won't reliably capture consecutive stages.
As a workaround, my idea is to use an Azure Function with an IoT Hub trigger. Inside the trigger, you can parse the message and save the key columns (stageid, timestamp, devid) into some storage, for example Azure Table Storage. Before every insert, grab the latest row for the current device and calculate the time taken relative to the current message, then store that duration wherever you need it. Finally, update the latest row for each device.
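A minimal sketch of that workaround, assuming the Python programming model for Azure Functions (the eventHubTrigger binding configuration is omitted) and the azure-data-tables package; the table name StageTimes and the StorageConnectionString setting are placeholders:

import json
import logging
import os
from datetime import datetime

import azure.functions as func
from azure.core.exceptions import ResourceNotFoundError
from azure.data.tables import TableClient

TABLE_NAME = "StageTimes"                         # placeholder table name
CONN_STR = os.environ["StorageConnectionString"]  # placeholder app setting


def parse_ts(raw):
    # e.g. "2020-01-24T09:22:00.3270000Z" -> trim to microseconds, drop the Z
    return datetime.strptime(raw[:26], "%Y-%m-%dT%H:%M:%S.%f")


def main(event: func.EventHubEvent):
    body = json.loads(event.get_body().decode("utf-8"))
    # The question shows batched arrays; handle both a list and a single object.
    records = body if isinstance(body, list) else [body]
    table = TableClient.from_connection_string(CONN_STR, table_name=TABLE_NAME)

    for record in records:
        payload = record["Payload"]
        devid, stageid = payload["devid"], payload["stageid"]
        ts = parse_ts(payload["timestamp"])
        try:
            # One "latest" row per device: PartitionKey = devid, RowKey = "latest".
            latest = table.get_entity(partition_key=devid, row_key="latest")
            elapsed = (ts - parse_ts(latest["timestamp"])).total_seconds()
            logging.info("Device %s: stage %s took %s seconds",
                         devid, latest["stageid"], elapsed)
            # Persist the duration wherever you need it (another table, SQL, ...).
        except ResourceNotFoundError:
            pass  # first message for this device, nothing to compare against yet

        # Overwrite the latest row so the next message can compute its delta.
        table.upsert_entity({
            "PartitionKey": devid,
            "RowKey": "latest",
            "stageid": stageid,
            "timestamp": payload["timestamp"],
        })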

Related

How do I ingest tide gauge data from the NOAA HTTP API into Thingsboard Professional?

NOAA provides tidal and weather data through their own HTTP API, and I would like to be able to use their API to get data into ThingsBoard (Professional) every six minutes to overlay with my device data (their data are updated every 6 minutes). Can someone walk me through the details of using the correct Integrations or Rule chains to get the time series data added to the database? It would also be nice to only use the metadata once. Below you can see how to get the most recent tide gauge level (water level) using their API.
For example, to see the latest tide gauge water level for a tide gauge (in this case, tide gauge 8638610), the API allows for getting the most recent water level information -- https://api.tidesandcurrents.noaa.gov/api/prod/datagetter?date=latest&station=8638610&product=water_level&datum=navd&units=metric&time_zone=lst_ldt&application=web_services&format=json
That call produces the following JSON: {"metadata":{"id":"8638610","name":"Sewells Point","lat":"36.9467","lon":"-76.3300"},"data":[{"t":"2022-02-08 22:42", "v":"-0.134", "s":"0.003", "f":"1,0,0,0", "q":"p"}]}
The Data Converter was fairly easy to construct (except maybe the noaa_data.data[0, 0] used in the code below):
//function Decoder(payload, metadata)

// Helper functions
function decodeToString(payload) {
    return String.fromCharCode.apply(String, payload);
}

function decodeToJson(payload) {
    var str = decodeToString(payload);
    return JSON.parse(str);
}

var noaa_data = decodeToJson(payload);
var deviceName = noaa_data.metadata.id;
var dataType = 'water_level';
var latitude = noaa_data.metadata.lat;
var longitude = noaa_data.metadata.lon;
// data is an array of readings; take the first (most recent) one.
// (The original noaa_data.data[0, 0] only worked because of the comma operator.)
var waterLevelData = noaa_data.data[0];

var result = {
    deviceName: deviceName,
    dataType: dataType,
    time: waterLevelData.t,
    waterLevel: waterLevelData.v,
    waterLevelStDev: waterLevelData.s,
    latitude: latitude,
    longitude: longitude
};

return result;
which produces this output:
{
"deviceName": "8638610",
"dataType": "water_level",
"time": "2022-02-08 22:42",
"waterLevel": "-0.134",
"waterLevelStDev": "0.003",
"latitude": "36.9467",
"longitude": "-76.3300"
}
I am not sure what process to use to get the data into ThingsBoard to be displayed as a device alongside my other device data.
Thank you for your help.
If you have a specific (and small) number of stations to grab, then you can do the following:
Create the devices in Thingsboard manually
Go into rule chains, create a water stations rule chain
For each water station place a 'Generator' node, selecting the originator as required.
Route these into an external "Rest API" node.
Route the result of the post into a blue script node and put your decoder in there
Route result to telemetry
Example rule chain
A more complex but more scalable solution:
Use a single Generator node
Route the message into a blue script node. This script will contain a list of station IDs that you want to pull info for. By setting the output of the script to the following shape you can make it send out multiple messages in sequence:
return [{msg: {...}, metadata: {...}, msgType: "..."}, ...];
Route the blue script into the REST API call node and get the station data
Do some post-processing with another blue script node if you need to. Don't decode the data here, though.
Route all of this into another REST API node and POST the data back to your HTTP integration endpoint (if you don't have one you will need to create it. Fairly simple)
Connect your data converter to this integration.
Finally, modify your output so that it matches what the converter expects:
{
"deviceName": "8638610",
"deviceType": "water-station",
"telemetry": {
"dataType": "water_level",
"time": "2022-02-08 22:42",
"waterLevel": "-0.134",
"waterLevelStDev": "0.003",
"latitude": "36.9467",
"longitude": "-76.3300"
}
}
Rough example
The above is how I would do it if I didn't want to use any external services. If you're AWS savvy, I'd say set up a cron schedule to trigger a Lambda function every 6 minutes and post into your platform. Either will work.
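A minimal sketch of that external poller (plain Python here; the same body would work inside an AWS Lambda handler on a 6-minute schedule). The ThingsBoard host and HTTP integration routing key are assumptions, and the output keys follow the converter output shown above:

import requests

# NOAA endpoint from the question (latest water level for station 8638610).
NOAA_URL = (
    "https://api.tidesandcurrents.noaa.gov/api/prod/datagetter"
    "?date=latest&station=8638610&product=water_level&datum=navd"
    "&units=metric&time_zone=lst_ldt&application=web_services&format=json"
)
# Hypothetical ThingsBoard HTTP integration endpoint (host and routing key are placeholders).
TB_INTEGRATION_URL = "https://thingsboard.example.com/api/v1/integrations/http/YOUR_ROUTING_KEY"


def poll_once():
    noaa = requests.get(NOAA_URL, timeout=10).json()
    reading = noaa["data"][0]
    payload = {
        "deviceName": noaa["metadata"]["id"],
        "deviceType": "water-station",
        "telemetry": {
            "dataType": "water_level",
            "time": reading["t"],
            "waterLevel": reading["v"],
            "waterLevelStDev": reading["s"],
            "latitude": noaa["metadata"]["lat"],
            "longitude": noaa["metadata"]["lon"],
        },
    }
    # The HTTP integration forwards this body to the uplink data converter.
    requests.post(TB_INTEGRATION_URL, json=payload, timeout=10).raise_for_status()


if __name__ == "__main__":
    poll_once()  # schedule every 6 minutes (cron, CloudWatch Events, etc.)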

Wit AI response for API requests

I'm using Wit.ai for a bot and I think it's amazing. However, I must provide the customer with screens in my web app to train and manage the app, and here I found a big problem (or maybe I'm just lost). The documentation of the REST API is not enough to design a client that acts like the Wit console (not even close). It's like a tutorial of which endpoints you can hit and an overview of the parameters, but with no clean explanation of the structure of the response.
For example, there is no endpoint to get the insights edge. Also, and most importantly, there is no clear documentation of the response structure when hitting the message endpoint (i.e. the structure of the returned entities: are they prebuilt or not, and if they are, is the value a string, an object or an array, and what might the object contain [e.g. datetime]). There is also the problem of the deprecated guide versus the new guide (the new guide should be done and complete by now). I'm building parts of the code based on my own testing. Sometimes when I test something new (like adding a range in the datetime entity instead of just a value), I get an error when I try to set the values for the user because I haven't parsed the response right, and the new information sometimes makes me modify the DB structure at my end.
So, the bottom line: is there a complete reference from which I can implement a complete client in my web app (my web app is in Java, by the way, and I couldn't find a client library that handles the latest version of the API)? Again, the tool is AWESOME but the documentation is not enough, or maybe I'm missing something.
The documentation is not enough, of course, but I think it's pretty straightforward. From what I read, the response structure is documented under "Return the meaning of a sentence".
The response is in JSON format, so you need to decode the response first.
Example Request:
$ curl -XGET 'https://api.wit.ai/message?v=20170307&q=how%20many%20people%20between%20Tuesday%20and%20Friday' \
-H "Authorization: Bearer $TOKEN"
Example Response:
{
    "msg_id": "387b8515-0c1d-42a9-aa80-e68b66b66c27",
    "_text": "how many people between Tuesday and Friday",
    "entities": {
        "metric": [{
            "metadata": "{'code': 324}",
            "value": "metric_visitor",
            "confidence": 0.9231
        }],
        "datetime": [{
            "value": {
                "from": "2014-07-01T00:00:00.000-07:00",
                "to": "2014-07-02T00:00:00.000-07:00"
            },
            "confidence": 1
        }, {
            "value": {
                "from": "2014-07-04T00:00:00.000-07:00",
                "to": "2014-07-05T00:00:00.000-07:00"
            },
            "confidence": 1
        }]
    }
}
You can read more about the response structure under "Return the meaning of a sentence".
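As an illustration, here is a minimal sketch of calling the message endpoint and walking the entities (shown in Python for brevity; the same parsing logic applies in Java). It assumes the v=20170307 response shape shown above, and the token is a placeholder:

import requests

WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"  # placeholder


def query_wit(text):
    resp = requests.get(
        "https://api.wit.ai/message",
        params={"v": "20170307", "q": text},
        headers={"Authorization": "Bearer " + WIT_TOKEN},
        timeout=10,
    )
    resp.raise_for_status()
    body = resp.json()

    # Each entity key maps to a list of candidate resolutions.
    for name, candidates in body.get("entities", {}).items():
        for candidate in candidates:
            value = candidate.get("value")
            if isinstance(value, dict):
                # Range-style values (e.g. datetime intervals) come back as objects.
                print(name, value.get("from"), "->", value.get("to"))
            else:
                # Simple values are plain strings.
                print(name, value, candidate.get("confidence"))
    return body


query_wit("how many people between Tuesday and Friday")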

Correct JSON to POST to Pub/Sub - Dataflow - BigQuery? Correct data schema?

I'm taking my first experimental steps with Google's pre-built templates in Google Cloud (the Cloud Pub/Sub to BigQuery template).
As a milestone towards my final goal (having physical gadgets reporting a data stream to Google Cloud Pub/Sub), I wish to achieve something like this:
POSTMAN (make an authenticated POST request with a JSON message to a Google Cloud Platform, GCP, endpoint) --> GCP Pub/Sub --> GCP Dataflow --> GCP BigQuery.
Right now I am following the tutorial found in Executing Templates, https://cloud.google.com/dataflow/docs/templates/executing-templates, "Example 2: Custom template, streaming job". This section states:
...This example projects.templates.launch request creates a streaming job
from a template that reads from a Pub/Sub topic and writes to a
BigQuery table. The BigQuery table must already exist with the
appropriate schema. If successful, the response body contains an
instance of LaunchTemplateResponse. ...
and furthermore how to do a POST:
https://dataflow.googleapis.com/v1b3/projects/[YOUR_PROJECT_ID]/templates:launch?gcsPath=gs://[YOUR_BUCKET_NAME]/templates/TemplateName
{
"jobName": "[JOB_NAME]",
"parameters": {
"topic": "projects/[YOUR_PROJECT_ID]/topics/[YOUR_TOPIC_NAME]",
"table": "[YOUR_PROJECT_ID]:[YOUR_DATASET].[YOUR_TABLE_NAME]"
},
"environment": {
"tempLocation": "gs://[YOUR_BUCKET_NAME]/temp",
"zone": "us-central1-f"
}
}
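For reference, the same launch request can be issued from code. A rough sketch with the Python requests library, assuming an OAuth access token (e.g. from gcloud auth print-access-token) and the topic/table parameter names used in the tutorial snippet above; the job, dataset and table names are placeholders:

import requests

PROJECT = "YOUR_PROJECT_ID"
BUCKET = "YOUR_BUCKET_NAME"
TOKEN = "YOUR_OAUTH_ACCESS_TOKEN"  # e.g. gcloud auth print-access-token

url = (
    "https://dataflow.googleapis.com/v1b3/projects/" + PROJECT
    + "/templates:launch?gcsPath=gs://" + BUCKET + "/templates/TemplateName"
)
body = {
    "jobName": "vehicle-status-to-bq",   # placeholder job name
    "parameters": {
        "topic": "projects/" + PROJECT + "/topics/VEHICLE_STATUS",
        "table": PROJECT + ":my_dataset.vehicle_status",  # placeholder dataset/table
    },
    "environment": {
        "tempLocation": "gs://" + BUCKET + "/temp",
        "zone": "us-central1-f",
    },
}

resp = requests.post(url, json=body,
                     headers={"Authorization": "Bearer " + TOKEN}, timeout=30)
resp.raise_for_status()
print(resp.json())  # LaunchTemplateResponse with the created job's details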
There are two things that confuse me. Let's, for the sake of a simple example, say that I have multiple vehicles that should continuously report their current status. I have already created my MQTT topic: VEHICLE_STATUS. Each of my vehicles should be able to report its:
Position [String]
Speed [Float]
Time [String]
VehicleID [Integer]
=======
I'm aware of the prototype for a PubsubMessage:
{
"data": string,
"attributes": {
string: string,
...
},
"messageId": string,
"publishTime": string,
}
My questions:
How should my BigQuery table schema look (which columns do I need to create)?
How should the entire corresponding JSON message look? What should my vehicle report to the endpoint each time?
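For reference, this is roughly how I picture publishing a single reading (a sketch using the google-cloud-pubsub Python client; the project ID and sample values are placeholders, and I assume the BigQuery columns would mirror these JSON keys):

import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("YOUR_PROJECT_ID", "VEHICLE_STATUS")

# One vehicle reading; the keys would presumably match the BigQuery column names.
reading = {
    "position": "57.7089,11.9746",   # placeholder values
    "speed": 42.5,
    "time": "2020-01-24T09:22:00Z",
    "vehicleID": 17,
}

# The Pub/Sub "data" field carries the JSON document as bytes; the
# Pub/Sub-to-BigQuery template parses it and writes one row per message.
future = publisher.publish(topic_path, data=json.dumps(reading).encode("utf-8"))
print(future.result())  # message ID assigned by Pub/Sub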

Checking that an Azure SQL Database is available from C# code

I scale up an Azure SQL Database with code like this:
ALTER DATABASE [MYDATABASE] Modify (SERVICE_OBJECTIVE = 'S1');
How is it possible to know from C# code when Azure has completed the job and the database is available again?
Checking the SERVICE_OBJECTIVE value is not enough; the process still continues further after it changes.
Instead of performing this task in T-SQL, I would perform it from C# using a call to the REST API; you can find all of the details on MSDN.
Specifically, you should look at the Get Create or Update Database Status API method which allows you to call the following URL:
GET https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}/operationResults/{operation-id}?api-version={api-version}
The JSON body allows you to pass the following parameters:
{
"id": "{uri-of-database}",
"name": "{database-name}",
"type": "{database-type}",
"location": "{server-location}",
"tags": {
"{tag-key}": "{tag-value}",
...
},
"properties": {
"databaseId": "{database-id}",
"edition": "{database-edition}",
"status": "{database-status}",
"serviceLevelObjective": "{performance-level}",
"collation": "{collation-name}",
"maxSizeBytes": {max-database-size},
"creationDate": "{date-create}",
"currentServiceLevelObjectiveId":"{current-service-id}",
"requestedServiceObjectiveId":"{requested-service-id}",
"defaultSecondaryLocation": "{secondary-server-location}"
}
}
In the properties section, the serviceLevelObjective property is the one you can use to resize the database. To finish off, you can then perform a GET on the Get Database API method, where you can compare the currentServiceLevelObjectiveId and requestedServiceObjectiveId properties to ensure your command has been successful.
Note: Don't forget to pass all of the common parameters required to make API calls in Azure.
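A rough polling sketch of that approach (shown in Python with the requests library for brevity; the C# HttpClient equivalent is straightforward). The api-version value and the Azure AD bearer token are assumptions you will need to adapt:

import time
import requests

SUBSCRIPTION = "YOUR_SUBSCRIPTION_ID"
RESOURCE_GROUP = "YOUR_RESOURCE_GROUP"
SERVER = "YOUR_SERVER_NAME"
DATABASE = "MYDATABASE"
API_VERSION = "2014-04-01"            # assumption; use the version valid for your subscription
TOKEN = "YOUR_AAD_BEARER_TOKEN"       # placeholder Azure AD access token

url = (
    "https://management.azure.com/subscriptions/{}/resourceGroups/{}"
    "/providers/Microsoft.Sql/servers/{}/databases/{}?api-version={}"
).format(SUBSCRIPTION, RESOURCE_GROUP, SERVER, DATABASE, API_VERSION)

headers = {"Authorization": "Bearer " + TOKEN}

# Poll the Get Database method until the requested and current service
# level objectives match, i.e. the scale operation has completed.
while True:
    props = requests.get(url, headers=headers, timeout=30).json()["properties"]
    if props["currentServiceLevelObjectiveId"] == props["requestedServiceObjectiveId"]:
        print("Scale operation finished; database is at", props["serviceLevelObjective"])
        break
    time.sleep(10)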

Pentaho BI Server - Charting live data

I have a URL that produces JSON,
{
"status": "success",
"totalRecords": 55,
"records": [
{
"timestamp": 1393418044341,
"load": 40,
"deviceId": 285
},
{
"timestamp": 1393418104337,
"load": 42,
"deviceId": 285
},
{
"timestamp": 1393418164328,
"load": 24.5,
"deviceId": 285
},
{
"timestamp": 1393418224322,
"load": 42.5,
"deviceId": 285
},
It goes on and on, producing data every 30 seconds or so.
I have used Pentaho Data Integration to parse the JSON and extract each of the fields into individual columns: timestamp, load and deviceId.
When I saved this it produced a .ktr file.
From this I have used Report Designer to upload the .ktr file and make charts with the data, and then I uploaded the charts to the BI Server.
BUT
Can I just take the data, feed it into the BI Server and produce charts, bypassing Report Designer?
Yes, you can do this, and using Report Designer would definitely be the wrong way.
However, you've inadvertently made the right choice in building the first bit in PDI! That's a good move.
The next step is to install CTools, add your .ktr as a CDA data source (within CDE), then use CDE to define your charts and, finally, set a refresh interval on the dashboard.
There are lots of good CTools tutorials around if you haven't used it yet; it is also easily installed from the Marketplace, or via ctools-installer.sh.