Correct JSON to POST to PubSub - Dataflow - BiqQuery? Correct dataschema?

Correct JSON to POST to PubSub - Dataflow - BiqQuery? Correct dataschema? - google-bigquery

I'm taking my first experimental steps with google-pre-setup templates in a Google Cloud Template (Cloud Pub/Sub to BigQuery).
As a milestone to my final goal (having physical gadgets reporting a data stream to Google Cloud Pub/Bub), my wish is to achieve something like this:
POSTMAN (make authenticated POST request with JSON message to an Google Cloud Platform, GPC, endpoint) --> GPC Pub/Sub --> GPC DataFlow --> GPC BigQuery.
Right now I am following the tutorial found in Executing Templates, https://cloud.google.com/dataflow/docs/templates/executing-templates, "Example 2: Custom template, streaming job". This section states:
...This example projects.templates.launch request creates a streaming job
from a template that reads from a Pub/Sub topic and writes to a
BigQuery table. The BigQuery table must already exist with the
appropriate schema. If successful, the response body contains an
instance of LaunchTemplateResponse. ...
and further more how to do a POST:
https://dataflow.googleapis.com/v1b3/projects/[YOUR_PROJECT_ID]/templates:launch?gcsPath=gs://[YOUR_BUCKET_NAME]/templates/TemplateName
{
"jobName": "[JOB_NAME]",
"parameters": {
"topic": "projects/[YOUR_PROJECT_ID]/topics/[YOUR_TOPIC_NAME]",
"table": "[YOUR_PROJECT_ID]:[YOUR_DATASET].[YOUR_TABLE_NAME]"
},
"environment": {
"tempLocation": "gs://[YOUR_BUCKET_NAME]/temp",
"zone": "us-central1-f"
}
}
There are two things that confuses me. Let's for the sake of a simple example say that I have multiple vehicles who continuously should report their current status. I have already created my MQTT topic: VEHICLE_STATUS. Each och my vehicles should be able to report its:
Position [String]
Speed [Float]
Time [String]
VehicleID [Integer]
=======
I'm aware of the prototype for a PubsubMessage:
{
"data": string,
"attributes": {
string: string,
...
},
"messageId": string,
"publishTime": string,
}
My questions:
How should my BigQuery table schema look (which columns do I need to create)?
How should the entire corresponding JSON message look? What should my vehicle report to the endpoint each time?

Related

How do I ingest tide gauge data from the NOAA HTTP API into Thingsboard Professional?

NOAA provides tidal and weather data through their own http API, and I would like to be able to use their API to get data into ThingsBoard (Professional) every six minutes to overlay with my device data (their data are updated every 6 minutes). Can someone walk me through the details of using the correct Integrations or Rule chains to get the time series data added to the database? It would also be nice to only use the metadata once. Below you can see how to get the most recent tide gauge level (water level) using their API.
For example, to see the latest tide gauge water level for a tide gauge (in this case, tide gauge 8638610), the API allows for getting the most recent water level information -- https://api.tidesandcurrents.noaa.gov/api/prod/datagetter?date=latest&station=8638610&product=water_level&datum=navd&units=metric&time_zone=lst_ldt&application=web_services&format=json
That call produces the following JSON: {"metadata":{"id":"8638610","name":"Sewells Point","lat":"36.9467","lon":"-76.3300"},"data":[{"t":"2022-02-08 22:42", "v":"-0.134", "s":"0.003", "f":"1,0,0,0", "q":"p"}]}
The Data Converter was fairly easy to construct (except maybe the noaa_data.data[0, 0] used in the code below):
//function Decoder(payload,metadata)
var noaa_data = decodeToJson(payload);
var deviceName = noaa_data.metadata.id;
var dataType = 'water_level';
var latitude = noaa_data.metadata.lat;
var longitude = noaa_data.metadata.lon;
var waterLevelData = noaa_data.data[0, 0];
//functions
function decodeToString(payload) {
return String.fromCharCode.apply(String, payload);
}
var result = {
deviceName: deviceName,
dataType: dataType,
time: waterLevelData.t,
waterLevel: waterLevelData.v,
waterLevelStDev: waterLevelData.s,
latitude: latitude,
longitude: longitude
}
function decodeToJson(payload) {
var str = decodeToString(payload);
var data = JSON.parse(str);
return data;
}
return result;
which has an Output:
{
"deviceName": "8638610",
"dataType": "water_level",
"time": "2022-02-08 22:42",
"waterLevel": "-0.134",
"waterLevelStDev": "0.003",
"latitude": "36.9467",
"longitude": "-76.3300"
}
I am not sure what process to use to get the data into ThingsBoard to be displayed as a device alongside my other device data.
Thank you for your help.

If you have a specific(and small) number of stations to grab then you can do the following:
Create the devices in Thingsboard manually
Go into rule chains, create a water stations rule chain
For each water station place a 'Generator' node, selecting the originator as required.
Route these into an external "Rest API" node.
Route the result of the post into a blue script node and put your decoder in there
Route result to telemetry
Example rule chain
More complex solution but more scalable:
Use a single generator node
Route the message into a blue script. This will contain a list of station id's that you want to pull info for. By setting the output of a script to the following you can make it send out multiple messages in sequence:
return [{msg:{}, metadata:{}, msgType{}, ...etc...]
Route the blue script into the rest api call and get the station data
Do some post processing with another blue script node if you need to. Don't decode the data here though.
Route all this into another rest api node and POST the data back to your HTTP integration endpoint (if you don't have one you will need to create it. Fairly simple)
Connect your data converter to this integration.
Finally, modify your output so that it is accepted by the converter output
{
"deviceName": "8638610",
"deviceType": "water-station",
"telemetry": {
"dataType": "water_level",
"time": "2022-02-08 22:42",
"waterLevel": "-0.134",
"waterLevelStDev": "0.003",
"latitude": "36.9467",
"longitude": "-76.3300"
}
}
Rough example
Above is how I would do it if I didn't want to use any external services. If you're AWS savvy I'd say set up a CRON job to trigger a lambda function every 6 minutes and post into your platform. Either will work.

Data Factory copy pipeline from API

We use Azure Data Factory copy pipeline to transfer data from REST api's to a Azure SQL Database and it is doing some strange things. Because we loop over a set of API's that need to be transferred the mapping is empty from the copy activity.
But for one API the automatic mapping is going wrong, the destination table is created with all the needed columns and correct datatypes based on the received metadata. When we run the pipeline for that specific API, the following message is showed.
{ "errorCode": "2200", "message": "ErrorCode=SchemaMappingFailedInHierarchicalToTabularStage,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to process hierarchical to tabular stage, error message: Ticks must be between DateTime.MinValue.Ticks and DateTime.MaxValue.Ticks.\r\nParameter name: ticks,Source=Microsoft.DataTransfer.ClientLibrary,'", "failureType": "UserError", "target": "Copy data1", "details": [] }
As a test we did do the mapping for that API manually by using the "Import Schema" option on the Mapping page. there we see that all the fields are correctly mapped. We execute the pipeline again using the mapping and everything is working fine.
But of course, we don't want to use a manually mapping because it is used in a loop for different API's also.

azure stream analytics implementation or the best approach

I am new to Steam analytics and I need help here to achieve a specific task.
I have telemetry data coming from iot hub in this format.
Basically i will be getting machines telemetry data and the stage of the operations on that machine streamed to iot hub.
The stages will be indicated with tag ex:"stageid":"stage1". I need to calculate the time taken for each stage using stream analytics based on timestamp and stage tag.
packet Ex:
[{
"Payload": {
"devid": "01",
"locid": "loc01",
"machid": "mac01",
"stageid": "stage1",
"timestamp": "2020-01-24T09:22:00.3270000Z"
},
"Payload": {
"devid": "02",
"locid": "loc01",
"machid": "mac01",
"stageid": "stage1",
"timestamp": "2020-01-24T09:22:00.3270000Z"
}
}]
[{
"Payload": {
"devid": "01",
"locid": "loc01",
"machid": "mac01",
"stageid": "stage2",
"timestamp": "2020-01-24T09:26:00.3270000Z"
},
"Payload": {
"devid": "02",
"locid": "loc01",
"machid": "mac01",
"stageid": "stage2",
"timestamp": "2020-01-26T09:24:00.3270000Z"
}
}]
pls help me can we achieve this with query and what could be the query or what is the other best approach?
Thanks,

Per my knowledge,your needs can't be implemented by ASA built-in features. ASA is a real-time collect data and analytics service.In other words,data need to be processed in the real-time.Current data can't wait for next dataset to do some calculate or merge things. Even if you could use windows function and group by,i believe the frequency of messages pushed by the device is also variable.
As a workaround,my idea is using iot-hub azure function trigger.Inside trigger,you could use code to parse message and save key columns(stageid,timestamp,devid) into some storage,maybe azure table storage. Before every insert,just grab latest row of current device to calculate the time taken with current message so that you could produce that time to store in other residence.In the end, update the latest row for every device.

Are there examples of itineraries that are compatible with the Amadeus self-service trip parse API?

I've tried booking references from a dozen providers (which I don't want to post for privacy reasons) and every time the API returns 'Unable to parse' but with no additional diagnostic information.
As a self-service API they don't offer support through any channel other than Stack Overflow, but I'm hoping someone has successfully used the endpoint.
I'm mostly using GMail to access sample flight booking emails, then selecting "View Original" to download the original MIME format email
This is what I use to read the .eml file into code:
function base64_encode(file) {
// read binary data
var bitmap = fs.readFileSync(file);
// convert binary data to base64 encoded string
return new Buffer(bitmap).toString('base64');
}
However every single email I submit to the endpoint eventually returns:
data:
{ data:
{ type: 'trip-parser-job',
id: 'REDACTED',
self: [Object],
status: 'ERROR',
detail: 'Unable to parse' } } }
and at this point, I'm starting to think that either the API is broken, or they haven't correctly documented what data should be submitted as content. I've decoded the sample document they provide and can't see any major difference between that and my inputs.
Does someone have either some working samples that the API was able to process, or some NodeJS code which seems to reliably get a result from the API?

smartREST template: custom measurement not visible as Datapoint

I created a smartREST template for measurement via "device management -> smartREST templates". I send the reading via MQTT:
s/uc/mytemplateID
777,123,stringValue
The message arrives because I can see it through the API:
{
"time":"2018-07-03T15:36:13.237+01:00",
"id":"47638",
"self":"https://myDomain.mydomain/measurement/measurements/47638",
"source":{
"id":"20018",
"self":"https://myDomain.mydomain/inventory/managedObjects/20018"
},
"type":"myType",
"myStrValue":"stringValue",
"myNumberValue":123
}
But I can not see it as a data point.
I also can not see it under: "device management -> All devices -> myDevice -> Measurements"
If the cause is that the incoming message does not have the expected format, then the question is, how can I use MQTT to send custom measurments with the expected format?
Thank you

To be able to use Cumulocity standard features on your measurements, they must adhere to a certain standard. Transform your template to create measurements like this:
{
"time":"2018-07-03T15:36:13.237+01:00",
"id":"47638",
"self":"https://myDomain.mydomain/measurement/measurements/47638",
"source":{
"id":"20018",
"self":"https://myDomain.mydomain/inventory/managedObjects/20018"
},
"type":"myType",
"myFragment":{
"mySeries":{
"value":123,
"unit":"aUnit"
},
"myOtherSeries":{
"value":321,
"unit":"anotherUnit"
}
}
}
Note that measurement values are always numeric, using string based values here may cause unwanted behavior again.
If you want to communicate string based status variables sending events or alarms is usually a better approach.
The template configuration to send such measurements should look like this:

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Correct JSON to POST to PubSub - Dataflow - BiqQuery? Correct dataschema? - google-bigquery

Related

How do I ingest tide gauge data from the NOAA HTTP API into Thingsboard Professional?

Data Factory copy pipeline from API

azure stream analytics implementation or the best approach

Are there examples of itineraries that are compatible with the Amadeus self-service trip parse API?

smartREST template: custom measurement not visible as Datapoint

Categories

Resources