How to detect if no data is ingested for any IoT Hub device using Stream Analytics?

Basically I have multiple devices in an IoT Hub with 30 partitions. I want to detect, using Stream Analytics, when no data has been ingested from a device for 10 minutes. Once detected, I want to identify which device it is and send that information to an Azure Function for alerting.
The query is a bit tricky given that I'm new to Stream Analytics. Here is what I came up with so far, but the output is not what I expected.
SELECT
    t1.[IoTHub].[ConnectionDeviceId] AS DeviceId
INTO
    [NoDataAlertFunctionOutput]
FROM
    [iot-hub-data-flow] t1 TIMESTAMP BY Timestamp
    LEFT OUTER JOIN [iot-hub-data-flow] t2 TIMESTAMP BY Timestamp
    ON t1.[IoTHub].[ConnectionDeviceId] = t2.[IoTHub].[ConnectionDeviceId]
    AND DATEDIFF(minute, t1, t2) BETWEEN 1 AND 10
WHERE t2.[IoTHub].[ConnectionDeviceId] IS NULL
I will greatly appreciate any suggestion or comment.
Here are the references I'm trying to follow:
Stream Analytics Common Query
Process real-time IoT Data
On the other hand, is there built-in functionality in IoT Hub to detect when no data has been ingested for a certain time period?

From the ASA perspective, I would try the following pattern instead. I'm not 100% sure that it meets all your requirements, but I think it's an interesting direction to explore.
Using a hopping window, every minute we scan the last 20 minutes
First we find the last event in that window
Then we check whether that event was sent more than 10 minutes ago
WITH CTE AS (
    SELECT
        System.Timestamp() AS Window_end,
        DATEADD(minute, -10, System.Timestamp()) AS Window_10,
        TopOne() OVER (ORDER BY Timestamp DESC) AS Last_event
    FROM
        [iot-hub-data-flow] t TIMESTAMP BY Timestamp
    GROUP BY
        HOPPINGWINDOW(minute, 20, 1)
        -- Every 1 minute, look back over the last 20 minutes
)
SELECT
    Last_event.IoTHub.ConnectionDeviceId AS DeviceId
FROM CTE
WHERE Last_event.Timestamp < Window_10
Note that once the last event ages out of the 20-minute window, the device no longer appears in any window at all, so after 20 minutes of silence the alert stops being emitted.
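Also, since TopOne() here runs over the whole stream, this reports only the single most recent event across all devices. To report every silent device, one could additionally group by the device id; a hedged, untested variant of the same pattern, with field names taken from the question:
WITH CTE AS (
    SELECT
        t.IoTHub.ConnectionDeviceId AS DeviceId,
        System.Timestamp() AS Window_end,
        DATEADD(minute, -10, System.Timestamp()) AS Window_10,
        TopOne() OVER (ORDER BY Timestamp DESC) AS Last_event
    FROM
        [iot-hub-data-flow] t TIMESTAMP BY Timestamp
    GROUP BY
        t.IoTHub.ConnectionDeviceId,
        HOPPINGWINDOW(minute, 20, 1)
)
SELECT DeviceId
INTO [NoDataAlertFunctionOutput]
FROM CTE
WHERE Last_event.Timestamp < Window_10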

Related

Firebird time-sequence data with gaps

I'm using Firebird 2.5 through Delphi 10.2/FireDAC. I have a table of historical data from industrial devices, one row per device at one-minute intervals. I need to be able to retrieve an arbitrary time window of records on an arbitrary interval span (every minute, every 3 minutes, every 15 minutes, etc.), which then gets used to generate a trending grid display or a report. This example query, pulling records at 3-minute intervals over a two-day window for 2 devices, does most of what I need:
select *
from avc_history
where (avcid in (11,12))
and (tstamp >= '6/26/2022') and (tstamp < '6/27/2022')
and (mod(extract(minute from tstamp), 3) = 0)
order by tstamp, avcid;
The problem with this is that if device #11 has no rows for part of the time span, there is a mismatch in the number of rows returned for each device. I need to return a row for each device in each minute period regardless of whether a row with that timestamp actually exists in the table; the other fields of the row can be null as long as the tstamp field has the timestamp and the avcid field has the device's ID.
I can fix this up in code, but that's a memory and time hog, so I want to avoid that if possible. Is there a way to structure a query (or if necessary, a stored procedure) to do what I need? Would some capability in the FireDAC library that I'm not aware of make this easier?
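One common approach, sketched below, is to generate the wanted timestamps with a recursive CTE, cross join them with the device IDs, and LEFT JOIN the history table against that grid. Table and column names follow the question; it assumes Firebird 2.5's recursive CTE support, and note the engine's recursion depth limit of 1024 (the one-day window below at 3-minute steps is 480 steps, so it fits):
with recursive ticks as (
    -- anchor: start of the window
    select timestamp '2022-06-26 00:00:00' as tstamp
    from rdb$database
    union all
    -- step: advance 3 minutes until the end of the window
    select dateadd(3 minute to tstamp)
    from ticks
    where tstamp < timestamp '2022-06-26 23:57:00'
),
devices as (
    select 11 as avcid from rdb$database
    union all
    select 12 from rdb$database
)
select t.tstamp, d.avcid, h.*
from ticks t
cross join devices d
left join avc_history h
    on h.avcid = d.avcid
   and h.tstamp = t.tstamp
order by t.tstamp, d.avcid;
For larger windows, a selectable stored procedure that loops over the interval avoids the recursion limit.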

Left Join of two tables based on a range of dates

I am a newbie to SQL coding and am trying to figure out how to create a LEFT JOIN statement based on a date range. The database is analytics from a smartphone app that sends messages to users. The two tables are messageLog (which describes the messages sent to each user) and messageOpenLog (which describes the messages that are opened). Both tables are linked to the message table, but not to each other. To complicate matters, there are a couple of other rules we have developed about when messages can be sent:
If a message is not opened within 7 days, the message can be resent on day 8.
If a message is opened, then the message can be resent within 60 days.
So, what I want to do is join the two tables together based on the following pseudocode (as I have no idea where to start with actual code):
LEFT JOIN
If (messageOpenLog.DateOpened is within 7 days of messageLog.DateSent)
and messageLog.message_id = messageOpenLog.message_id and
messageLog.user_id = messageOpenLog.user_id
Note: the date format is yyyy-mm-dd hh:mm:ss in both tables.
Any help you can provide would be greatly appreciated.
I am unable to comment on shn's answer, but there is a chance that the user has never opened the message, so no messageOpenLog record was created. In that case you could add messageOpenLog.message_id IS NULL to the WHERE clause and get those unopened messages with no corresponding messageOpenLog record as well.
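A minimal sketch of that variant, reusing the join suggested below and the column names from the question (date arithmetic syntax will vary by database, as noted in the next answer):
SELECT ml.*
FROM messageLog ml
LEFT JOIN messageOpenLog mol
    ON mol.message_id = ml.message_id
    AND mol.user_id = ml.user_id
    AND mol.DateOpened >= ml.DateSent
    AND mol.DateOpened < ml.DateSent + interval '7 day'
-- keep only messages with no open within 7 days of sending
WHERE mol.message_id IS NULL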
I would suggest:
messagelog ml LEFT JOIN
messageOpenLog mol
ON mol.message_id = ml.message_id AND
mol.user_id = ml.user_id AND
mol.DateOpened >= ml.DateSent AND -- probably not needed
mol.DateOpened < ml.DateSent + interval '7 day'
Note that date arithmetic varies a lot among databases. The exact syntax for adding seven days may be a bit different in your database.
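For example, the same "within 7 days" upper bound in a few common dialects (each line is a sketch for the named dialect, not tested against your schema):
mol.DateOpened < ml.DateSent + interval '7 days'        -- PostgreSQL
mol.DateOpened < DATEADD(day, 7, ml.DateSent)           -- SQL Server
mol.DateOpened < DATE_ADD(ml.DateSent, INTERVAL 7 DAY)  -- MySQL
mol.DateOpened < ml.DateSent + INTERVAL '7' DAY         -- Oracle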
From what I understand from your question and the query below, you are finding all the individual messages that have a time difference of longer than 7 days between their send date and open date.
To do the time difference I would recommend using the DATEDIFF() function that is built into SQL (you may need to format the timestamps into date format with DATE()).
Since the two tables are not directly related you could do something like this:
SELECT
    messageOpenLog.*,
    messageLog.*
FROM
    messageLog
    LEFT JOIN messageOpenLog
        ON messageLog.message_id = messageOpenLog.message_id
        -- keep the user_id condition in the ON clause: putting it in the
        -- WHERE clause would discard the unmatched rows and turn the
        -- LEFT JOIN into an inner join
        AND messageLog.user_id = messageOpenLog.user_id
WHERE
    DATEDIFF(
        day,
        DATE(messageLog.timestamp),
        DATE(messageOpenLog.timestamp)
    ) > 7
The structure of this query is dependent on the construction of your tables.
Notice I used the .timestamp column, but in your tables this may be named differently.
Also, I'm not sure this is actually what you want; if you want to see whether the message is more than 7 days old, a different query is required.
Assuming that there is only one messageSent row, this query will get all of the messageOpen rows for the same message that are more than 7 days old.
It's very difficult to give you an exact query based on the information presented, such as the potential number of rows with the same message_id; as #amac mentions, there could also be cases where one of the tables has no rows with a certain message_id.

Oracle SQL to check online performance based on log in and log out

I have a table that logs records of a device's log-ins and log-outs. A log in means the device is working at that time, and a log out means the device is down.
DEVICE_LOG table
I want to create a query that would check how long the Device is working for a certain period (i.e from 09/15/2013 00:00:00 to 09/16/2013 00:00:00).
Use the LAG function, for example:
select t.*,
       -- minutes elapsed since the previous record for this device
       (record_date - lag(record_date) over (partition by dev_id order by record_date)) * 24 * 60 as minutes_since_prev
from device_log t
Adjust the PARTITION BY to whatever grouping you need.
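To answer the original question (total working time over a period), a rough sketch: pair each log-out with the preceding log-in and sum the gaps. It assumes DEVICE_LOG has a status column holding 'IN'/'OUT' and that record_date is an Oracle DATE; both are assumptions, since the table definition is only shown as an image:
select dev_id,
       -- a gap that ends in a log-out is time the device was working
       sum(case when status = 'OUT'
                then (record_date - prev_date) * 24 * 60
           end) as minutes_working
from (
    select dev_id,
           status,
           record_date,
           lag(record_date) over (partition by dev_id
                                  order by record_date) as prev_date
    from device_log
    where record_date >= date '2013-09-15'
      and record_date <  date '2013-09-16'
) t
group by dev_id;
Sessions that straddle the period boundaries would need extra handling.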

How to find periods without activity in BigQuery

I have a table of some type of activity in BigQuery, currently about 40 MB of data. The activity date is stored in one of the fields (a string in the format YYYY-MM-DD HH:MM:SS). I need a way to determine periods of inactivity (with some predefined threshold) that runs in a reasonable amount of time.
The query I built has already been running for an hour. Here it is:
SELECT t1.date, MIN(PARSE_UTC_USEC(t1.date) - PARSE_UTC_USEC(t2.date)) AS mintime
FROM logs t1
JOIN (SELECT date, http_error FROM logs) t2 ON t1.http_error = t2.http_error
WHERE PARSE_UTC_USEC(t1.date) > PARSE_UTC_USEC(t2.date)
GROUP BY t1.date
HAVING mintime > 1000;
The idea is:
1. Take the Cartesian product of the table with itself (http_error is a field that almost never changes value, so it does the trick)
2. Keep only pairs where date1 > date2
3. For every date1, take the date2 with the minimal difference
4. Restrict the result to cases where this minimal difference is more than the threshold.
I admit that the real query I use is burdened a bit by fixes for invalid data (this adds extra operations), but I really need a better idea for doing this. I'd be glad to hear other ideas.
I don't know the granularity of inactivity you are looking for, but why not try bucketing by your timestamp, then counting the relative frequency of activities in each bucket:
SELECT
    UTC_USEC_TO_HOUR(PARSE_UTC_USEC(date)) AS hour_bucket,
    COUNT(*) AS activity_count
FROM logs
GROUP BY
    hour_bucket
ORDER BY
    activity_count ASC;
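The same problem is more direct with a window function, which avoids the self-join entirely. A sketch in today's BigQuery standard SQL (which postdates this question), assuming the table logs and its string column date from the question; the 600-second threshold is an example value:
SELECT
    prev_ts,
    ts,
    TIMESTAMP_DIFF(ts, prev_ts, SECOND) AS gap_seconds
FROM (
    SELECT
        PARSE_TIMESTAMP('%Y-%m-%d %H:%M:%S', `date`) AS ts,
        LAG(PARSE_TIMESTAMP('%Y-%m-%d %H:%M:%S', `date`))
            OVER (ORDER BY `date`) AS prev_ts
    FROM logs
)
-- keep only gaps longer than the threshold (here: 10 minutes)
WHERE TIMESTAMP_DIFF(ts, prev_ts, SECOND) > 600
ORDER BY gap_seconds DESC;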

SQL: Calculating system load statistics

I have a table like this that stores messages coming through a system:
Message
-------
ID (bigint)
CreateDate (datetime)
Data (varchar(255))
I've been asked to calculate the messages saved per second at peak load. The only data I really have to work with is the CreateDate. The load on the system is not constant, there are times when we get a ton of traffic, and times when we get little traffic. I'm thinking there are two parts to this problem: 1. Determine ranges of time that are considered peak load, 2. Calculate the average messages per second during these times.
Is this the right approach? Are there things in SQL that can help with this? Any tips would be greatly appreciated.
I agree; you have to figure out what peak load is before you can start to create reports on it.
The first thing I would do is figure out how I am going to define peak load, e.g. am I going to look at an hour-by-hour breakdown?
Next I would do a GROUP BY on the CreateDate formatted to seconds (no milliseconds), and as part of the group by I would take an average based on the number of records.
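A minimal sketch of that approach in SQL Server syntax, matching the question's table (note that seconds with zero messages produce no bucket, so the average covers active seconds only):
SELECT
    LEFT(SecondBucket, 13) AS HourBucket,
    AVG(1.0 * CountPerSecond) AS AvgPerActiveSecond,
    MAX(CountPerSecond) AS PeakPerSecond
FROM (
    -- count messages in each one-second bucket
    SELECT CONVERT(char(19), CreateDate, 120) AS SecondBucket,
           COUNT(*) AS CountPerSecond
    FROM Message
    GROUP BY CONVERT(char(19), CreateDate, 120)
) s
GROUP BY LEFT(SecondBucket, 13)
ORDER BY PeakPerSecond DESC;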
I don't think you'd need to know the peak hours; you can generate them with SQL, wrapping the full query and selecting the top 20 entries, for example:
select top 20 *
from (
[...load query here...]
) qry
order by LoadPerSecond desc
This answer had a good lesson about averages. You can calculate the load per second by looking at the load per hour, and dividing by 3600.
To get a first glimpse of the load for the last week, you could try (SQL Server syntax):
select datepart(dy,createdate) as DayOfYear,
       datepart(hour,createdate) as Hour,
       count(*)/3600.0 as LoadPerSecond
from message
where CreateDate > dateadd(day,-7,getdate())
group by datepart(dy,createdate), datepart(hour,createdate)
To find the peak load per minute:
select max(MessagesPerMinute)
from (
    select count(*) as MessagesPerMinute
    from message
    where CreateDate > dateadd(day,-7,getdate())
    group by datepart(dy,createdate), datepart(hour,createdate), datepart(minute,createdate)
) qry
Grouping by datepart(dy,...) is an easy way to distinguish between days without worrying about month borders. It works until you select more than a year back, but that would be unusual for performance queries.
Warning: these will run slowly!
This will group your data into one-second buckets and list them from the most activity to the least:
SELECT
CONVERT(char(19),CreateDate,120) AS CreateDateBucket,COUNT(*) AS CountOf
FROM Message
GROUP BY CONVERT(Char(19),CreateDate,120)
ORDER BY 2 Desc
This will group your data into one-minute buckets and list them from the most activity to the least:
SELECT
LEFT(CONVERT(char(19),CreateDate,120),16) AS CreateDateBucket,COUNT(*) AS CountOf
FROM Message
GROUP BY LEFT(CONVERT(char(19),CreateDate,120),16)
ORDER BY 2 Desc
I'd take those values and calculate what was asked for.