Left join not working properly in Azure Stream Analytics - azure-stream-analytics

I'm trying to create a simple left join between two inputs (event hubs); the source of the inputs is a function app that processes a RabbitMQ queue and sends the messages to an event hub.
In my eventhub1 I have this data:
[{
"user": "user_aa_1"
}, {
"user": "user_aa_2"
}, {
"user": "user_aa_3"
}, {
"user": "user_cc_1"
}]
In my eventhub2 I have this data:
[{
"user": "user_bb_1"
}, {
"user": "user_bb_2"
}, {
"user": "user_bb_3
}, {
"user": "user_cc_1"
}]
I use this SQL to create my left join:
select hub1.[user] h1, hub2.[user] h2
into thirdTestDataset
from hub1
left join hub2
on hub2.[user] = hub1.[user]
and datediff(hour, hub1, hub2) between 0 and 5
and the test result looks OK...
The problem is when I try it with the job running... I get this result in my Power BI dataset...
Any idea why my left join isn't working like a regular SQL query?

I tested your SQL query and it works well for me too. Since you can't get the expected output after executing the ASA job, I suggest you follow the troubleshooting steps in this document.
Based on your output, it seems that HUB2 becomes the left table. You could use the diagnostic logs in ASA to locate the actual output of the job execution.

I tested this end-to-end using blob storage for inputs 1 and 2 with your sample data, and a Power BI dataset as output, and observed the expected result.
I think there are a few things that can go wrong with your query:
First, your join has a 5-hour window: basically that means it looks at EH1 and EH2 for matches during that large window, so live results will be different from the sample input, for which you have only 1 row. Can you validate that you had no match during this 5-hour window?
Additionally, by default PBI streaming datasets are "hybrid datasets", so they accumulate results without a good way to know when a result was emitted, since there is no timestamp in your output schema. So you may also be seeing previous data here. I'd suggest a few things:
In Power BI, change the options of your dataset: disable "Historic data analysis" to remove caching of data
Add a timestamp column to make sure you can identify when the data was generated (the first line of your query will become: select System.Timestamp() as time, hub1.[user] h1, hub2.[user] h2), as shown below.
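For reference, the full query with the timestamp column would then look like this (a sketch; only the first line changes from the query above):
-- System.Timestamp() records when each result row was emitted
select System.Timestamp() as time, hub1.[user] h1, hub2.[user] h2
into thirdTestDataset
from hub1
left join hub2
on hub2.[user] = hub1.[user]
and datediff(hour, hub1, hub2) between 0 and 5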
Let me know if it works for you.
Thanks,
JS (Azure Stream Analytics)

Related

Writing the correct SQL statement in AWS IoT rule

I am working on AWS IoT to develop my custom solution based on a set of sensors, and I have a problem with writing the SQL statement for the kind of data I receive from a Zigbee sensor.
An example of what I receive from my sensor is reported here:
{
"type": "reportAttribute",
"from": "WIFI",
"deviceCode": "aws_device_code",
"to": "CLOUD",
"mac": "30:ae:7b:e2:e1:e6",
"time": 1668506014,
"data": {...}
}
What I would like to do is select the messages whose from field equals GREENPOWER, something along the lines of SELECT * FROM 'test' WHERE from = 'GREENPOWER', but from is also a keyword in SQL, hence my problem. I am no expert whatsoever in SQL, so I am not sure how this can be done. I am also looking for a way to modify the received data, but solving this problem on AWS would be much easier.
Thank you very much for your help!
There are quite a lot of SQL functions available in AWS IoT rules. You can find them here: https://docs.aws.amazon.com/iot/latest/developerguide/iot-sql-functions.html
In your case, something like this should work:
SELECT * FROM 'test' WHERE get(*, "from") = "GREENPOWER"
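If you only need some of the fields, the same get() workaround composes with a narrower projection (a sketch; the field names are taken from the sample message above):
-- get(*, "from") reads the reserved-word field without tripping the SQL parser
SELECT deviceCode, mac, data FROM 'test' WHERE get(*, "from") = "GREENPOWER"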

Querying an IoT Hub Object with number key

I am trying to retrieve information from a device twin in a third-party IoT Hub instance.
The data I'm trying to access has the following format:
{
"properties": {
"reported": {
"softwareLoad": {
"0": {
"systemSWVer": "1.2.0",
"picSWVer": "0.0.42",
"bootStatus": "inactive",
"partitionId": 0
},
"1": {
"systemSWVer": "1.2.0",
"picSWVer": "0.0.42",
"bootStatus": "active",
"partitionId": 1
},
"beamTableCRC": "0x5454"
}
}
}
}
I was trying to reach the variable systemSWVer to use it in a WHERE clause, but each time I try to access the information under 0, IoT Hub returns a Bad Request error.
I tried with this query
SELECT properties.reported.softwareLoad FROM devices WHERE properties.reported.softwareLoad.0.systemSWVer in ["1.2.0", "2.1.0"]
Is there a way to use it as a where clause in my query?
Note: I don't have access to the resource to change the format of the information.
I tried to recreate this case and found the same result: always an error when you add a number in the expression. I also found a workaround; instead of using 0 or 1, you can add them as escaped Unicode characters. It's important to add them as ASCII Unicode characters, though. 0 becomes \u0030 and 1 becomes \u0031.
Try this query:
SELECT properties.reported.softwareLoad FROM devices
WHERE properties.reported.softwareLoad.\u0030.systemSWVer in ['1.2.0', '2.1.0']
Please note: in any case, you need to use single quotes.
Edit: in my excitement, I forgot to test if just escaping the integer would also work. It does.
SELECT properties.reported.softwareLoad FROM devices
WHERE properties.reported.softwareLoad.\0.systemSWVer in ['1.2.0', '2.1.0']
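If you need to match either partition, the same escaping should extend with an OR (a sketch built on the workaround above):
SELECT properties.reported.softwareLoad FROM devices
WHERE properties.reported.softwareLoad.\0.systemSWVer in ['1.2.0', '2.1.0']
OR properties.reported.softwareLoad.\1.systemSWVer in ['1.2.0', '2.1.0']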

Stream Analytics doesn't produce output to SQL table when reference data is used

I have been working with ASA lately and I am trying to insert an ASA stream directly into a SQL table using reference data. I based my development on this MS article: https://msdn.microsoft.com/en-us/azure/stream-analytics/reference/reference-data-join-azure-stream-analytics.
Overview of data flow - telemetry:
I have many devices of different types (Heat Pumps, Batteries, Water Pumps, AirCon...). Each of these devices has a different JSON schema for its telemetry data. I can distinguish the JSONs by an attribute in the message (e.g.: "DeviceType":"HeatPump" or "DeviceType":"AirCon"...)
All of these devices are sending their telemetry to a single Event Hub
Behind the Event Hub, there is a single Stream Analytics job where I redirect streams to different outputs based on the DeviceType attribute. For example, I redirect telemetry from heat pumps with the query SELECT * INTO output-sql-table FROM input-event-hub WHERE DeviceType = 'HeatPump'
I would like to use some reference data to "enrich" the ASA stream with some IDKeys before I insert the stream into the SQL table.
What I've already done:
Successfully inserted the ASA stream directly into a SQL table using the ASA query SELECT * INTO [sql-table] FROM Input WHERE DeviceType = 'HeatPump', where [sql-table] has the same schema as the JSON message plus the standard columns (EventProcessedUtcTime, PartitionID, EventEnqueueUtcTime)
Successfully inserted the ASA stream directly into a SQL table using the ASA query SELECT Column1, Column2, Column3... INTO [sql-table] FROM Input WHERE DeviceType = 'HeatPump' - basically the same query as above, only this time with named columns in the select statement.
Generated a JSON file of reference data and put it in BLOB storage
Created new static (not using {date} and {time} placeholders) reference data input in ASA pointing to the file in BLOB storage.
Then I joined the reference data to the data stream in the ASA query using the same statement with named columns.
Result: no output rows in the SQL table.
When debugging the problem I used the Test functionality in the ASA query editor:
I sampled data from the Event Hub - stream data.
I uploaded sample data from a file - reference data.
After sampling the data from the Event Hub finished, I tested the query -> the output produced some rows -> it's not a problem with the query.
Yet... if I run the ASA job, no output rows are inserted into the SQL table.
Some other ideas I tried:
Used the TRY_CAST function to cast fields from the reference data to the appropriate data types before joining them with fields in the stream data
Used the TRY_CAST function to cast fields in the SELECT before inserting them into the SQL table
I really don't know what to do now. Any suggestions?
EDIT: added data stream JSON, reference data JSON, ASA query, ASA input configuration, BLOB storage configuration and ASA test output result
Data Stream JSON - single message
[
{
"Activation": 0,
"AvailablePowerNegative": 6.0,
"AvailablePowerPositive": 1.91,
"DeviceID": 99999,
"DeviceIsAvailable": true,
"DeviceOn": true,
"Entity": "HeatPumpTelemetry",
"HeatPumpMode": 3,
"Power": 1.91,
"PowerCompressor": 1.91,
"PowerElHeater": 0.0,
"Source": "<omitted>",
"StatusToPowerOff": 1,
"StatusToPowerOn": 9,
"Timestamp": "2018-08-29T13:34:26.0Z",
"TimestampDevice": "2018-08-29T13:34:09.0Z"
}
]
Reference data JSON - single message
[
{
"SourceID": 1,
"Source": "<ommited>",
"DeviceID": 10,
"DeviceSourceCode": 99999,
"DeviceName": "NULL",
"DeviceType": "Heat Pump",
"DeviceTypeID": 1
}
]
ASA Query
WITH HeatPumpTelemetry AS
(
SELECT
*
FROM
[input-eh]
WHERE
source='<omitted>'
AND entity = 'HeatPumpTelemetry'
)
SELECT
e.Activation,
e.AvailablePowerNegative,
e.AvailablePowerPositive,
e.DeviceID,
e.DeviceIsAvailable,
e.DeviceOn,
e.Entity,
e.HeatPumpMode,
e.Power,
e.PowerCompressor,
e.PowerElHeater,
e.Source,
e.StatusToPowerOff,
e.StatusToPowerOn,
e.Timestamp,
e.TimestampDevice,
e.EventProcessedUtcTime,
e.PartitionId,
e.EventEnqueuedUtcTime
INTO
[out-SQL-HeatPumpTelemetry]
FROM
HeatPumpTelemetry e
LEFT JOIN [input-json-devices] d ON
TRY_CAST(d.DeviceSourceCode as BIGINT) = TRY_CAST(e.DeviceID AS BIGINT)
ASA Reference Data Input configuration (screenshot)
BLOB storage directory tree (screenshot)
ASA test query output (screenshot)
matejp, I could not reproduce your issue; you could refer to my steps.
reference data in blob storage:
{
"a":"aaa",
"reference":"www.bing.com"
}
stream data in blob storage
[
{
"id":"1",
"name":"DeIdentified 1",
"DeviceType":"aaa"
},
{
"id":"2",
"name":"DeIdentified 2",
"DeviceType":"No"
}
]
query statement:
SELECT
inputStream.*, inputRefer.*
into sqloutput
FROM
inputStream
Join inputRefer on inputStream.DeviceType = inputRefer.a
Output: (screenshot)
Hope it helps you. Any concerns, let me know.
I think I found the error. In the past few days I tested nearly every possible combination when configuring inputs in Azure Stream Analytics.
I've started with this example as baseline: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-build-an-iot-solution-using-stream-analytics
I tried the solution without any changes to be sure that the example with a reference data input works -> it worked
Then I changed the ASA output from CosmosDB to a SQL table without changing anything else -> it worked
Then I changed my initial ASA job to be as similar as possible to the ASA job in the example (writing into a SQL table) -> it worked
Then I started playing with the BLOB directory names -> and here I found the error.
I think the problem I encountered is due to using the character "-" in the folder name.
In my case I created a folder named "reference-data" and uploaded a file named "devices.json" (folder structure "/reference-data/devices.json") -> the ASA output to the SQL table didn't work.
As soon as I changed the folder name to "refdata" (folder structure "/refdata/devices.json") -> the ASA output to the SQL table worked.
I tried switching the reference data input 3 times between a folder name containing "-" and one without it => every time, the ASA output to SQL Server stopped working when "-" was in the folder name.
To recap:
I recommend not using "-" in BLOB folder names for static reference data inputs in ASA jobs.

How to extract this json into a table?

I have a SQL column filled with JSON documents, one per row:
[{
"ID":"TOT",
"type":"ABS",
"value":"32.0"
},
{
"ID":"T1",
"type":"ABS",
"value":"9.0"
},
{
"ID":"T2",
"type":"ABS",
"value":"8.0"
},
{
"ID":"T3",
"type":"ABS",
"value":"15.0"
}]
How is it possible to transform it into tabular form? I tried Redshift's json_extract_path_text and JSON_EXTRACT_ARRAY_ELEMENT_TEXT functions, and I also tried json_each and json_each_text (on Postgres), but didn't get what I expected... any suggestions?
The desired results should appear like this:
T1 T2 T3 TOT
9.0 8.0 15.0 32.0
I assume you printed 4 rows. In PostgreSQL,
SELECT this_column->'ID'
FROM that_table;
will return a column with JSON strings. Use ->> if you want a text column. More info here: https://www.postgresql.org/docs/current/static/functions-json.html
In case you are using some old PostgreSQL (before 9.3), this gets harder : )
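If you want the pivoted single-row layout from the question, here is a minimal sketch (assuming PostgreSQL 9.4+ for FILTER, a jsonb column, and a hypothetical primary key id so each row's array is pivoted separately):
-- Unnest each row's JSON array, then pivot the four IDs into columns
SELECT
max(elem->>'value') FILTER (WHERE elem->>'ID' = 'T1') AS t1,
max(elem->>'value') FILTER (WHERE elem->>'ID' = 'T2') AS t2,
max(elem->>'value') FILTER (WHERE elem->>'ID' = 'T3') AS t3,
max(elem->>'value') FILTER (WHERE elem->>'ID' = 'TOT') AS tot
FROM that_table
CROSS JOIN LATERAL jsonb_array_elements(this_column) AS elem
GROUP BY id;  -- id is a hypothetical primary key of that_table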
Your best option is to use COPY from JSON Format. This will load the JSON directly into a normal table format. You then query it as normal data.
However, I suspect that you will need to slightly modify the format of the file by removing the outer [...] square brackets and also the commas between records, eg:
{
"ID": "TOT",
"type": "ABS",
"value": "32.0"
}
{
"ID": "T1",
"type": "ABS",
"value": "9.0"
}
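Once the file is reformatted like that, a COPY along these lines should load it (a sketch; the table name, S3 path and IAM role are hypothetical, and the target table needs columns matching the JSON keys):
-- 'auto ignorecase' maps the JSON keys (ID, type, value) onto like-named columns
COPY my_json_table
FROM 's3://my-bucket/data.json'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS JSON 'auto ignorecase';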
If, however, your data is already loaded and you cannot re-load the data, you could either extract the data into a new table, or add additional columns to the existing table and use an UPDATE command to extract each field into a new column.
Or, in the very worst case, you can use one of the JSON Functions to access the information in a JSON field, but this is very inefficient for large requests (eg in a WHERE clause).

Azure HBASE REST - simple get failing for colfam:col

This should be a very simple one (I've been searching for a solution all day - read a thousand and a half posts).
I put a test row in my HBASE table in hbase shell:
put 'iEngine','testrow','SVA:SourceName','Journal of Fun'
I can get the value for a column family using the REST API in DHC Chrome:
https://ienginemaster.azurehdinsight.net/hbaserest/iEngine/testrow/SVA
I can't seem to get it for the specific cell: https://ienginemaster.azurehdinsight.net/hbaserest/iEngine/testrow/SVA:SourceName
I get back a 400 error.
When successfully asking for just the family, I get back:
{
"Row": [{
"key": "dGVzdHJvdw==",
"Cell": [{
"column": "U1ZBOlNvdXJjZU5hbWU=",
"timestamp": 1440602453975,
"$": "Sm91cm5hbCBvZiBGdW4="
}]
}]
}
I tried replacing the encoded value for SVA:SourceName, and a thousand other things. I'm assuming I'm missing something simple.
Also, the following works:
hbase(main):012:0> get 'iEngine', 'testrow', 'SVA:SourceName'
COLUMN CELL
SVA:SourceName timestamp=1440602453975, value=Journal of Fun
1 row(s) in 0.0120 seconds
hbase(main):013:0>
I opened a case with Microsoft support. I received confirmation that it is a bug (IIS and the colon separator not working together). They are working on a fix - they are slightly delayed as they decide on the "best" way to fix it.