Is it possible to use Azure Stream Analytics with Modbus? - azure-stream-analytics

I am using a Modbus simulator and sending data to Azure IoT Hub. I am using Azure Stream Analytics, but I am not able to apply any query to the Modbus data. Is it possible to apply an Azure Stream Analytics query to Modbus or OPC UA simulated data? The only query that works is SELECT *. Also, is it possible to extract an object's value through a Stream Analytics query? I am providing the JSON value for reference:
{
"body": {
"PublishTimestamp": "2020-06-23 05:34:22",
"Content": [
{
"HwId": "PowerMeter-0a:01:01:01:01:01",
"Data": [
{
"CorrelationId": "DefaultCorrelationId",
"SourceTimestamp": "2020-06-23 05:34:21",
"Values": [
{
"DisplayName": "humidity",
"Address": "400002",
"Value": "47"
},
{
"DisplayName": "Temperature",
"Address": "400001",
"Value": "78"
}
]
}
]
}
]
}
}
I need to extract the Value data through a Stream Analytics query.
This is my query; I am able to get the data down to Content:
SELECT
body.Content
FROM temperature
and when I try this query:
select
body.Content.data.Values.Value
FROM temperature
I am getting null as output.
Thanks in advance,
Avi

This is the only question I could find about this topic. I am working on basically the same thing: I am using the Modbus module from Azure Marketplace in an IoT Edge application, and I wanted to parse the incoming Modbus data from the IoT Hub so I can view it in a table, with each value referencing its associated HwId and Timestamp.
I think I found something that might work for you. You need to expand each of the nested JSON arrays using GetArrayElements:
SELECT
ncontent.ArrayValue.HwId as HwId,
ndata.ArrayValue.SourceTimestamp as [Timestamp],
nvalues.ArrayValue.DisplayName as DisplayName,
nvalues.ArrayValue.Address as Address,
nvalues.ArrayValue.Value as Value
INTO
<output>
FROM
<input> i
CROSS APPLY GetArrayElements(i.Content) as ncontent
CROSS APPLY GetArrayElements(ncontent.ArrayValue.Data) as ndata
CROSS APPLY GetArrayElements(ndata.ArrayValue.[Values]) as nvalues
Here is the output that I got in the Test results:
HwId Timestamp DisplayName Address Value
"HwId1" "2020-06-26 19:16:31" "HREG0002" "40002" "32019"
"HwId2" "2020-06-26 19:16:31" "HREG0005" "40005" "17506"
"HwId3" "2020-06-26 19:16:31" "HREG0008" "40008" "33352"

Related

How can I provide metrics to Splunk via HTTP?

I have been reading through Splunk Enterprise documentation and it appears I can provide metrics in JSON format over HTTP/HTTPS: https://docs.splunk.com/Documentation/Splunk/8.1.1/Metrics/GetMetricsInOther#Get_metrics_in_from_clients_over_HTTP_or_HTTPS
However I can't see a reference what exactly this JSON format looks like, beyond one example. I'm also not clear from the docs if Splunk can be configured to poll this endpoint on my process, or if I must push the data to Splunk.
Splunk's HEC interface is receive-only. It does not poll.
Any time you find a Splunk documentation page that is unclear, submit feedback on it. Splunk's Docs team is great about updating the documents in response to feedback.
Let's look at the example payload from the documentation.
{
"time": 1486683865,
"source": "metrics",
"sourcetype": "perflog",
"host": "host_1.splunk.com",
"fields": {
"region": "us-west-1",
"datacenter": "dc2",
"rack": "63",
"os": "Ubuntu16.10",
"arch": "x64",
"team": "LON",
"service": "6",
"service_version": "0",
"service_environment": "test",
"path": "/dev/sda1",
"fstype": "ext3",
"metric_name:cpu.usr": 11.12,
"metric_name:cpu.sys": 12.23,
"metric_name:cpu.idle": 13.34
}
}
The time field is in *nix epoch form and says when the metric was collected.
The source field identifies this as a metric. The value is free-text.
The sourcetype field tells Splunk how to parse the payload. Your system may have a different source type configured for metrics.
The host field identifies the server that generated the metrics. This is free-text.
The fields section is where the metrics data goes. The measurements themselves are noted by the "metric_name:" prefix. The name of the metric is free-text. Splunk treats dots within the metric name as hierarchy separators.
Anything that does not begin with "metric_name:" is a dimension rather than a metric. Dimensions describe metrics and are optional.
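If it helps, pushing such a payload from a client is a plain HTTPS POST. Here is a minimal sketch in Python (the host, port and token are placeholders; /services/collector is the standard HEC endpoint):
import requests

# Placeholders: replace with your HEC host, port and token.
HEC_URL = "https://host_1.splunk.com:8088/services/collector"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

payload = {
    "time": 1486683865,
    "source": "metrics",
    "sourcetype": "perflog",
    "host": "host_1.splunk.com",
    "fields": {
        "region": "us-west-1",
        "metric_name:cpu.usr": 11.12,
        "metric_name:cpu.sys": 12.23,
        "metric_name:cpu.idle": 13.34,
    },
}

# HEC authenticates with an "Authorization: Splunk <token>" header.
resp = requests.post(HEC_URL, json=payload,
                     headers={"Authorization": f"Splunk {HEC_TOKEN}"},
                     timeout=10)
resp.raise_for_status()
print(resp.json())  # e.g. {"text": "Success", "code": 0}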

Selecting the latest document for each "Group"

I am using the Azure Cosmos DB SQL API to try to achieve the following:
We have device data stored within a collection and would like to retrieve the latest event data per device serial efficiently, without having to do N separate queries (one per device).
SELECT *
FROM c
WHERE c.serial IN ('V55555555','synap-aim-g1') ORDER BY c.EventEnqueuedUtcTime DESC
I'm assuming I would need to use GROUP BY - https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-group-by
Any assistance would be greatly appreciated
Rough example of data :
[
{
"temperature": 25.22063251827873,
"humidity": 71.54208429695204,
"serial": "V55555555",
"testid": 1,
"location": {
"type": "Point",
"coordinates": [
30.843687,
-29.789895
]
},
"EventProcessedUtcTime": "2020-09-07T12:04:34.5861918Z",
"PartitionId": 0,
"EventEnqueuedUtcTime": "2020-09-07T12:04:34.4700000Z",
"IoTHub": {
"MessageId": null,
"CorrelationId": null,
"ConnectionDeviceId": "V55555555",
"ConnectionDeviceGenerationId": "637323979596346475",
"EnqueuedTime": "2020-09-07T12:04:34.0000000"
},
"Name": "admin",
"id": "6dac491e-1f28-450d-bf97-3a15a0efaad8",
"_rid": "i2UhAI7ofAo3AQAAAAAAAA==",
"_self": "dbs/i2UhAA==/colls/i2UhAI7ofAo=/docs/i2UhAI7ofAo3AQAAAAAAAA==/",
"_etag": "\"430131c1-0000-0100-0000-5f5621d80000\"",
"_attachments": "attachments/",
"_ts": 1599480280
}
]
UPDATE:
Doing the following returns the correct data, but sadly you can only return columns that are in your GROUP BY or in an aggregate function (i.e. you can't do SELECT *):
SELECT c.serial, MAX(c.EventProcessedUtcTime)
FROM c
WHERE c.serial IN ('V55555555','synap-aim-g1')
GROUP BY c.serial
[
{
"serial": "synap-aim-g1",
"$1": "2020-09-09T06:29:42.6812629Z"
},
{
"serial": "V55555555",
"$1": "2020-09-07T12:04:34.5861918Z"
}
]
Thanks to @AnuragSharma-MSFT for the help:
I am afraid there is no direct way to achieve it using a query in
cosmos db. However you can refer to below link for the same topic. If
you are using any sdk, this would help in achieving the desired
functionality: https://learn.microsoft.com/en-us/answers/questions/38454/index.html
We're glad that you resolved it in this way; thanks for sharing the update above.
If the question is really about an efficient approach to this particular query scenario, we can consider denormalization in cases where the query language itself doesn't offer an efficient solution. This guide on partitioning and modeling has a relevant section on getting the latest items in a feed.
We just need to get the 100 most recent posts, without the need to
paginate through the entire data set.
So to optimize this last request, we introduce a third container to
our design, entirely dedicated to serving this request. We denormalize
our posts to that new feed container.
Following this approach, you could create a "Feed" or "LatestEvent" container dedicated to the "latest" query. It would use the device serial as the id (and as a single partition key) to guarantee that there is only one (the most recent) event item per device, and that it can be fetched by device serial or listed at the least possible cost using a simple query:
SELECT *
FROM c
WHERE c.serial IN ('V55555555','synap-aim-g1')
The change feed could be used to upsert the latest event, so that the latest event is created/overwritten in the "LatestEvent" container as its source item is created in the main container.
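As a rough sketch of that write path (the Python SDK is used here only for illustration; the endpoint, key, database and container names are placeholders), upserting with the serial as the document id keeps exactly one latest event per device:
from azure.cosmos import CosmosClient

# Placeholders: replace with your account endpoint, key and names.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
latest = client.get_database_client("telemetry").get_container_client("LatestEvent")

def upsert_latest(event: dict) -> None:
    # Use the device serial as the id (and partition key value) so that each
    # device has exactly one document, holding its most recent event.
    doc = dict(event, id=event["serial"])
    latest.upsert_item(doc)
The same upsert could run inside a change feed processor or an Azure Function triggered by the main container, so the "LatestEvent" container stays in sync as new events arrive.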

Cumulocity Inventory API filter by Creation Date

I'm currently trying to implement a simple date filter for the Inventory API using the query language. The filter should return a list of managed objects which were created after a given date. For some reason I always receive an empty list as the result, although the example in the query language documentation looks the same as my query:
GET {{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.351Z'
gives me
{
"managedObjects": [],
"next": "{{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.351Z'&pageSize=5&currentPage=2",
"statistics": {
"currentPage": 1,
"pageSize": 5
},
"self": "{{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.351Z'&pageSize=5&currentPage=1"
}
And if I try this structure for the timestamp I even receive an error:
GET {{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.351%2B01:00'
{
"error": "inventory/Invalid Data",
"info": "https://www.cumulocity.com/guides/reference-guide/#error_reporting",
"message": "Find by filter query failed : Query 'creationTime gt '2018-12-01T09:00:00'' could not be understood. Please try again."
}
Try to filter by
creationTime.date
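With the query from the question, that would look something like:
GET {{url}}/inventory/managedObjects?query=creationTime.date+gt+'2018-12-01T09:00:53.351Z'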
The background is that the timestamps are stored as MongoDB dates.
You can also check the device list filter in Device Management, which filters on creationTime as well.

Azure Data Factory Source Dataset value from Parameter

I have a dataset in Azure Data Factory backed by a CSV file. I added an additional column to the dataset and want to pass its value from a dataset parameter, but the value never gets copied to the column:
"type": "AzureBlob",
"structure":
[
{
"name": "MyField",
"type": "String"
}
]
I have defined a parameter as well:
"parameters": {
"MyParameter": {
"type": "String",
"defaultValue": "ABC"
}
}
How can I copy the parameter value to the column? I tried the following:
"type": "AzureBlob",
"structure":
[
{
"name": "MyField",
"type": "String",
"value": "#dataset().MyParameter"
}
]
But this does not work; I am getting NULL in the destination although the parameter value is set.
Based on the document Expressions and functions in Azure Data Factory, #dataset().XXX is not supported in Azure Data Factory so far. So you can't use a parameter value as a custom column in the source or sink with the native copy activity directly.
However, you could adopt the workarounds below:
1. You could create a custom activity and write code to do whatever you need (see the sketch below this list).
2. You could stage the CSV file in an Azure Data Lake, then execute a U-SQL script that reads the data from the file, appends the new column with the pipeline runId, and outputs it to a new area in the data lake so that the data can be picked up by the rest of your pipeline. To do this, you simply need to pass a parameter to U-SQL from ADF. Please refer to the U-SQL Activity.
In this thread: use adf pipeline parameters as source to sink columns while mapping, the customer used the second way.
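As a rough illustration of the first workaround, a custom activity could run something like the following (plain Python; the file names and argument names are hypothetical) to append the parameter value as a new column while copying the CSV:
import argparse
import csv

# Hypothetical arguments that the pipeline would pass to the custom activity.
parser = argparse.ArgumentParser()
parser.add_argument("--input", default="input.csv")
parser.add_argument("--output", default="output.csv")
parser.add_argument("--my-parameter", default="ABC")
args = parser.parse_args()

with open(args.input, newline="") as src, open(args.output, "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader) + ["MyField"])      # add the new column header
    for row in reader:
        writer.writerow(row + [args.my_parameter])   # fill it with the parameter value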

RESTful API to safeguard server and client from large datasets

I am working on designing a RESTful API and need a second opinion on the design. I have abstracted the problem statement for better understanding.
Consider a URI /search?key1=value1&key2=value2, which can potentially return a huge result set for a given search criteria for key1 and key2.
My mandate is to make sure that the server and client are bounded by limits to prevent performance degradation. If that limit is reached and the intended data is not found in the result set, the user will be asked to refine the search query to narrow it down. (I am not thinking of pagination; that is for a different problem set.)
The approach is to allow the client to specify a limit that it can comfortably handle, and to let the server set a limit for itself to prevent it from generating huge result sets that affect performance.
The client can do /search?key1=value1&key2=value2&maxresults=xxxx to specify its limit.
The server can set its own limit as a configuration parameter for the search URI. While serving a request, the server takes the minimum of the client's limit and the server's limit, and generates a result set satisfying the effective limit.
The generated JSON will have a metadata part which mentions whether the result was truncated, and the effective limit that was applied. The client can inspect this part and ask the user to refine the search if "truncated" is "true". The problem domain actually allows the user to refine down to a single item.
{
"result": {
"truncated": "true",
"limit": "2000",
"data": [
{
"id": "1"
},
{
"id": "2"
}
...
{
"id": "2000"
}
]
}
}
The questions I am trying to answer are:
Is this violating any REST principles?
Is there a standard convention to do the same that I might follow?
Are there good examples on public APIs that you can quote? (Jira RESTful API has a couple of examples)
Is there any gotcha in this design which may affect us in the future?
Any view on this will be appreciated ...
Thanks!
From my point of view this fits REST principles quite well. I would suggest not adding the result-size metadata values to the response payload, but sending them as HTTP headers instead. So instead of
{
"result": {
"truncated": "true",
"limit": "2000",
"data": [
{
"id": "1"
},
{
"id": "2"
}
...
{
"id": "2000"
}
]
}
}
The service would send
{
"data": [
{
"id": "1"
},
{
"id": "2"
}
...
{
"id": "2000"
}
]
}
and add additional custom HTTP headers
x-result-truncated:1
x-result-limit:1000
This approach has the benefit that metadata values which are not part of the payload from the client's perspective are sent in the metadata section of your response, where, for example, Content-Type is also transmitted.
An additional benefit is that packing the metadata into HTTP headers is reusable for other services as well, and you do not have to change the schema of the returned payload, which means clients keep working as expected (except that some results may be truncated).
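To make this concrete, here is a minimal sketch of the server side (Flask is used purely for illustration; run_search, SERVER_LIMIT and the parameter names are placeholders for your own query and configuration):
from flask import Flask, jsonify, request

app = Flask(__name__)
SERVER_LIMIT = 1000  # server-side configuration parameter for the search URI

def run_search(key1, key2, limit):
    # Stand-in for the real query; returns at most `limit` matching items.
    return [{"id": str(i + 1)} for i in range(limit)]

@app.route("/search")
def search():
    client_limit = request.args.get("maxresults", default=SERVER_LIMIT, type=int)
    effective_limit = min(client_limit, SERVER_LIMIT)

    # Fetch one extra row so we can tell whether the result was truncated.
    rows = run_search(request.args.get("key1"), request.args.get("key2"),
                      limit=effective_limit + 1)
    truncated = len(rows) > effective_limit

    response = jsonify({"data": rows[:effective_limit]})
    response.headers["x-result-truncated"] = "1" if truncated else "0"
    response.headers["x-result-limit"] = str(effective_limit)
    return response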