Azure Stream Analytics Query - get last request for all devices

I have an IoT Hub that collects messages from many devices. The data from IoT Hub is sent to Stream Analytics, and I would now like Stream Analytics to produce a list of all devices along with each device's last request. That is, a table with, say, 10 devices and, next to each device, its last request.
My current code:
SELECT
    deviceId,
    param1 AS humidity,
    param2 AS temperature,
    datetime AS data
FROM hubMessage
GROUP BY deviceId, data, temperature, humidity,
    TumblingWindow(minute, 5)
With this query I get an error on deviceId:
GROUP BY with no aggregate expressions is not supported.
I have no idea how to resolve this unsupported expression and get the last request for every device.

GROUP BY with no aggregate expressions is not supported.
According to this error message, you need to use GROUP BY with an aggregate expression. All aggregate expressions supported by ASA are listed in the documentation.
If you want to get the latest request, I think TopOne is suitable for you (see the sketch below).
If you want to select all the data matching a filter, maybe you could use COUNT with GROUP BY.
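A minimal sketch of the TopOne approach, reusing the question's column names (the 5-minute window size is simply carried over from the question):
-- Keep only the most recent event per device within each 5-minute window;
-- TopOne() returns the entire top-ranked record.
SELECT
    deviceId,
    TopOne() OVER (ORDER BY datetime DESC) AS lastRequest
FROM hubMessage
GROUP BY deviceId, TumblingWindow(minute, 5)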

Related

Hybrid Query Example in AgensGraph

I am using AgensGraph but I don't know how to write a hybrid query; any examples of hybrid queries in AgensGraph would help a lot.
In AgensGraph you can write hybrid queries in two ways:
Let's say you create the following:
CREATE GRAPH AG;
CREATE VLABEL dev;
CREATE (:dev {name: 'someone', year: 2015});
CREATE (:dev {name: 'somebody', year: 2016});
CREATE TABLE history (year, event)
AS VALUES (1996, 'PostgreSQL'), (2016, 'AgensGraph');
1- Cypher in SQL
Syntax:
SELECT [column_name]
FROM ({table_name|SQL-query|CYPHERquery})
WHERE [column_name operator value];
Example:
SELECT n->>'name' as name
FROM history, (MATCH (n:dev) RETURN n) as dev
WHERE history.year > (n->>'year')::int;
Result:
  name
---------
 someone
(1 row)
2- SQL in Cypher
Syntax:
MATCH [table_name]
WHERE (column_name operator {value|SQLquery|CYPHERquery})
RETURN [column_name];
Example:
MATCH (n:dev)
WHERE n.year < (SELECT year FROM history WHERE event =
'AgensGraph')
RETURN properties(n) AS n;
Result:
                 n
------------------------------------
 {"name": "someone", "year": 2015}
(1 row)
You can find more information in the AgensGraph documentation.
I found more information on the hybrid query language in these slides. Every other bit of information I have been able to find is just the same example that Eya posted, in different places.
I agree that more information about hybrid queries in AgensGraph would be great, as it seems like a killer feature of the software.
Let's assume we have a network management system: we keep our network topology in the graph part of AgensGraph (graph format) and our time-series data (such as date/time information for specific devices) in the relational part (table format). So we have a graph and tables, and if we want, we can write a hybrid query that fetches data from both models.
In our graph we have different devices connected to each other, such as a modem and IoT sensors. For each of these devices we also have related information stored in tables, such as download speed, upload speed, or CPU usage.
In the following hybrid queries, our goal is to collect information about specific devices by querying the graph and the tables simultaneously.
Cypher in SQL
In this hybrid query we look for modem devices that are having issues and whose abnormality type is 2 (2 indicates that the device has issues with its download and upload speed); once we find those devices, we return their id, download speed, and upload speed to investigate the issue. As you can see in the following query, the inner query is Cypher and the outer query is SQL.
SELECT id, sysdnbps, sysupbps
FROM public.modemrdb
WHERE to_jsonb(id) IN
    (SELECT id
     FROM (MATCH (m:modem) WHERE m.abnormaltype = 2
           RETURN m.name) AS s(id));
SQL in Cypher
In this hybrid query we look for modem devices whose CPU usage is above 80 (outside the threshold range), which indicates that something is wrong with them; once we find those devices, we return the modems and any IoT devices connected to them. As you can see in the following example, the inner query is SQL and the outer query is Cypher.
MATCH p = (n:modem)-[r*1..2]->(iot)
WHERE n.name IN
    (SELECT to_jsonb(id)
     FROM public.modemrdb
     WHERE syscpuusage >= 80)
RETURN p;
This can be another example of a hybrid query.

Sigfox or Lora devices with Azure-Digital-Twins

I have a couple of questions about setting up Digital Twins with LoRa and Sigfox devices, whose data is encoded:
how do we get the iothubowner connection string to create the callback to the LoRa or Sigfox backend?
how do we deal with mandatory properties, especially HardwareId?
what is the best practice for decoding and then processing the messages, given that we have to cascade the processing: decoding, then normalization, then telemetry analytics (monitoring room conditions, for example)?
Here are the answers:
1. The IoT Hub connection string (iothubowner) will be exposed in the API in a couple of months.
2. For a device, the unique identifier on the client side is HardwareId; we recommend using the device's MAC address. For SensorId.HardwareId you have several recommended options: Device.HardwareId + SensorName, just SensorName if it is unique per device, or just a GUID. It is important to set SensorId.HardwareId because this value must match the telemetry message header property DigitalTwins-SensorHardwareId in order for the UDF to kick off. See https://learn.microsoft.com/en-us/azure/digital-twins/concepts-device-ingress#device-to-cloud-message
3. You'd have to create a matcher that associates the right UDF with the code that decodes the byte array for a given type of sensor. For example, if you have sensors of type LoRa with various data types, you'd create a matcher against the Type to match "LoRa" and then against the various data types. For now, you have to handle all of that in one UDF. In the future we might support chaining, so you could have a separate UDF for each step, but until then it is all in one.

How to get the lag size of a consumer group in redis stream?

Suppose I have a stream mystream and a group mygroup; how do I get the number of unconsumed messages?
No, there is no way to do that directly, as far as I know.
It is possible to get the last message ID delivered in a group and in a stream with the XINFO GROUPS and XINFO STREAM commands, respectively.
However, there is no command that returns the length of a stream subrange. Such a command, were it to exist, would probably require linear time complexity, and in that case it would probably not be implemented.
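For reference, here is where the stream's last generated ID appears (redis-cli; the exact fields and their order vary by Redis version, and all values below are illustrative):
> XINFO STREAM mystream
 1) "length"
 2) (integer) 23
 3) "radix-tree-keys"
 4) (integer) 1
 5) "radix-tree-nodes"
 6) (integer) 2
 7) "last-generated-id"
 8) "1638125133432-0"
 ...
The group's last-delivered-id appears in the XINFO GROUPS reply, as in the example shown in the next answer.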
Use XINFO GROUPS
The command XINFO GROUPS mystream (note that it takes the stream key, not the group name) lists every consumer group of the stream, and since Redis 7.0 each group's entry includes a lag field.
According to the documentation:
lag: the number of entries in the stream that are still waiting to be delivered to the group's consumers, or a NULL when that number can't be determined.
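On Redis 7.0 or later the reply looks roughly like this (values illustrative):
> XINFO GROUPS mystream
1) 1) "name"
   2) "mygroup"
   3) "consumers"
   4) (integer) 2
   5) "pending"
   6) (integer) 2
   7) "last-delivered-id"
   8) "1638126030001-0"
   9) "entries-read"
  10) (integer) 2
  11) "lag"
  12) (integer) 0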
If you are wondering why lag can be null:
There are two special cases in which this mechanism is unable to report the lag:
A consumer group is created or set with an arbitrary last delivered ID (the XGROUP CREATE and XGROUP SETID commands, respectively). An arbitrary ID is any ID that isn't the ID of the stream's first entry, its last entry or the zero ("0-0") ID.
One or more entries between the group's last-delivered-id and the stream's last-generated-id were deleted (with XDEL or a trimming operation).
In both cases, the group's read counter is considered invalid, and the returned value is set to NULL to signal that the lag isn't currently available.
More details can be found at https://redis.io/commands/xinfo-groups/

Azure Stream Analytics - long living calculation

I'm using Azure Stream Analytics for real-time analytics and I have a basic problem: I have a field by which I would like to count the number of messages.
The json is in the following format:
{ "categoryId": 100, "name": "hello" }
I would like to see the message count per category, so I assume the query in Azure Stream Analytics should be:
SELECT
    categoryId,
    COUNT(*) AS categoryCount
INTO
    categoriesCount
FROM
    categoriesInput
GROUP BY
    categoryId
The problem is that I have to add a TumblingWindow or SlidingWindow to the GROUP BY clause. Is there a way to avoid that and have the calculation run indefinitely? I also need to make sure the output is written to SQL Server.
How about a sliding window with a length of 1? That way it would act like a pointer, and every time it changes you can do the calculation.
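A minimal sketch of that idea (the second-granularity window unit is an assumption; pick whatever fits your data):
-- Recompute the per-category count whenever an event enters or leaves
-- the 1-second sliding window.
SELECT
    categoryId,
    COUNT(*) AS categoryCount
INTO
    categoriesCount
FROM
    categoriesInput
GROUP BY
    categoryId,
    SlidingWindow(second, 1)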
Hope this helps!
Mert

Trouble Looking For Events WITHIN a Session In BigQuery or WITHIN Multiple Sessions

I wanted a second pair of eyes and some help confirming the best way to look within a session at the hit level in BigQuery. I have read the BigQuery developer documentation thoroughly, which provides insight on working WITHIN a session. My challenge is this: let us assume I write the high-level query below to count the number of sessions and group them by device.deviceCategory:
SELECT device.deviceCategory,
       COUNT(DISTINCT CONCAT(fullVisitorId, STRING(visitId)), 10000000) AS sessions
FROM (TABLE_DATE_RANGE([XXXXXX.ga_sessions_], TIMESTAMP('2015-01-01'), TIMESTAMP('2015-06-30')))
GROUP EACH BY device.deviceCategory
ORDER BY sessions DESC
I then run a follow-up query like the following to find the number of distinct users (client IDs):
SELECT device.deviceCategory,
       COUNT(DISTINCT fullVisitorId) AS users
FROM (TABLE_DATE_RANGE([XXXXXX.ga_sessions_], TIMESTAMP('2015-01-01'), TIMESTAMP('2015-06-30')))
GROUP EACH BY device.deviceCategory
ORDER BY users DESC
(Note that I broke those up because of the sheer size of the data I am working with, which produces runs greater than 5 TB in some cases.)
My challenge is the following: I feel like I have the wrong approach and have not had success with the WITHIN function. For every user ID (fullVisitorId), I want to look within all of their sessions to find out how many were desktop and how many were mobile; basically, these are the cross-device users. I want to collect a table of these users. I started here:
SELECT COUNT(DISTINCT CONCAT(fullVisitorId, STRING(visitId)), 10000000) AS sessions
FROM (TABLE_DATE_RANGE([XXXXXX.ga_sessions_], TIMESTAMP('2015-01-01'), TIMESTAMP('2015-06-30')))
WHERE device.deviceCategory = 'desktop' AND device.deviceCategory = 'mobile'
This is not correct, though; moreover, any version of a WITHIN query I write gives me nonsense results or zeros. Does anyone have strategies or tips for a way forward? What is the best way to use the WITHIN function to look for sessions that have multiple events happening WITHIN the session, my goal being to collect the user IDs that meet certain requirements within a session or across various sessions? Two days ago I did this very manually, working through the steps and saving intermediate data frames to generate counts. That said, I wanted to see if there is guidance on doing this quickly with a single query.
I'm not sure if this question is still open on your end, but I believe I see your problem, and it is not a misuse of the WITHIN function: it is a data-understanding problem.
When dealing with GA and cross-device identification, you cannot reliably use any combination of fullVisitorId and visitId to identify users, as these are derived from the cookie that GA places on the user's browser. The fullVisitorId therefore identifies a specific browser on a specific device more accurately than it identifies a specific user.
In order to truly track users across devices, you must leverage GA's userId functionality. This requires having the user sign in in some way, giving them an identifier that you can use across all of their devices to tie their behavior together.
After you implement some type of user identification that you control, rather than relying on GA's cookie assignment, you can use it to look at details across sessions and within individual sessions.
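As a sketch of the mechanical shape of such a query, in the question's legacy SQL dialect and assuming the identifier has been exported into a hypothetical userId field (substitute wherever you actually store it):
-- Count desktop and mobile sessions per user; users with both counts > 0
-- are the cross-device users the question is after.
SELECT
  userId,
  SUM(IF(device.deviceCategory = 'desktop', 1, 0)) AS desktop_sessions,
  SUM(IF(device.deviceCategory = 'mobile', 1, 0)) AS mobile_sessions
FROM (TABLE_DATE_RANGE([XXXXXX.ga_sessions_], TIMESTAMP('2015-01-01'), TIMESTAMP('2015-06-30')))
GROUP EACH BY userId
HAVING desktop_sessions > 0 AND mobile_sessions > 0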
Hope that helps!