Query Redis data after elapsed time - redis

I am currently in the midst of a POC where I plan to store some IOT data in my Redis.
Here's my question:
I would like to monitor the data sent by multiple IOT devices and raise alarms if a device fails to report telemetry under a certain time threshold.
For Example:
Device 1: Booting: 09:00am : expected turn around time 2min
After 02 min, 01 sec
Device 1 has failed to report back in the given time.
Is there a way to use Redis to query, in order for it to return back the data which has passed a certain time threshold?
Any references will be appreciated, thanks!

Related

Kusto Query limitation gets timeout error

I am running a Kusto Query in my Azure Diagnostics where I am querying logs of last 1 week and the query times out after 10 mins. Is there a way I can increase the timeout limits? if yes can someone please guide me the steps. I downloaded Kusto explorer but couldnt see any easy way of connecting my Azure cluster. Need help as how can i increase this timeout duration from inside Azure portal for query I am running?
It seems like 10 minutes are the max value for timeout.
https://learn.microsoft.com/en-us/azure/azure-monitor/service-limits
Query API
Category
Limit
Comments
Maximum records returned in a single query
500,000
Maximum size of data returned
~104 MB (~100 MiB)
The API returns up to 64 MB of compressed data, which translates to up to 100 MB of raw data.
Maximum query running time
10 minutes
See Timeouts for details.
Maximum request rate
200 requests per 30 seconds per Azure AD user or client IP address
See Log queries and language.
https://learn.microsoft.com/en-us/azure/azure-monitor/logs/api/timeouts
Timeouts
Query execution times can vary widely based on:
The complexity of the query
The amount of data being analyzed
The load on the system at the time of the query
The load on the workspace at the time of the query
You may want to customize the timeout for the query.
The default timeout is 3 minutes, and the maximum timeout is 10 minutes.

Azure Stream Analytics takes too long to process the events

I'm trying to configure an Azure Stream Analytics job, but consistently getting bad performance. I received data from a client system that pushes data into an Event Hub. And the ASA queries that into an Azure SQL database.
A few days ago I noticed that it was generating large amount of InputEventLateBeyondThreshold errors. Here an example out of the ASA. The Timestamp element is set by the client system.
{
"Tag": "MANG_POWER_5",
"Value": 1.08411181,
"ValueType": "Analogue",
"Timestamp": "2022-02-01T09:00:00.0000000Z",
"EventProcessedUtcTime": "2022-02-01T09:36:05.1482308Z",
"PartitionId": 0,
"EventEnqueuedUtcTime": "2022-02-01T09:00:00.8980000Z"
}
You can see that the event arrives pretty quickly, but takes more than 30 mins to process it. To try and avoid InputEventLateBeyondThreshold errors, I have increased the late event threshold. This may be contributing to the increased processing time, but having it too low also increases number of InputEventLateBeyondThreshold errors.
The Watermark Delay is consistently high, and yet SU usage is around 5%. I have increased the SU to as high as I can for this query.
I'm trying to figure out, why it takes so long to process the events once they have arrived.
This is the query I'm using:
WITH PIDataSet AS (SELECT * FROM [<event-hub>] TIMESTAMP BY timestamp)
--Write data to SQL joining with a lookup
SELECT
i.Timestamp as timestamp,
i.Value as value,
INTO [<sql-database>]
FROM PIDataSet as i
INNER JOIN [tagmapping-ref-alias] tm ON tm.sourcename = i.Tag
----Write data to AzureTable joining with a lookup
SELECT
DATEDIFF(second,CAST('1970-01-01' as DateTime), I1.Timestamp) As Rowkey,
I2.TagId as PartitionKey,
I1.Value as Value,
UDF.formatTime(I1.Timestamp) as DeviceTimeStamp
into [<azure-table>]
FROM PIDataSet as I1
JOIN [tagmapping-ref-alias] as I2 on I2.Sourcename = I1.Tag
--Get an hourly count into a SQL Table.
SELECT
I2.TagId,
System.Timestamp() as WindowEndTime, COUNT(I2.TagId) AS InputCount
into [tagmeta-ref-alias]
FROM PIDataSet as I1
JOIN [tagmapping-ref-alias] as I2 on I2.Sourcename = I1.Tag
GROUP BY I2.TagId, TumblingWindow(Duration(hour, 1))
When you set up a 59 minutes out-of-order window, what you do is that you set up a 59 minutes buffer for that input. When records land in that buffer, they will wait 59 minutes until they get out. What you get in exchange, is that we have the opportunity to re-order these events so they will look in order to the job.
Using it at 1h is an extreme setting that will automatically give you 59minute of watermark, by definition. This is very surprising and I'm wondering why you need a value so high.
Edit
Now looking at the late arrival policy.
You are using an event time (TIMESTAMP BY timestamp) which means that your events can now be late, see this doc and this one.
What this means is that when a record comes later than 1h (so timestamp is older than 1h from the wall clock on our servers - in UTC), then we adjust its timestamp to our wall clock minus 1h, and send it to the query. It also means that your tumbling window always has to wait an additional hour to be sure it's not missing those late records.
Here what I would do is restore the default settings (no out of order, 5 seconds late events, adjust events). Then when you get InputEventLateBeyondThreshold, it means that that the job received a timestamp that was later in the past than 5 seconds. You're not losing the data, we are adjusting its system.timestamp to a more recent value (but not the timestamp field, we don't change it).
What we then need to understand is why does it take more than 5 seconds for a record in your pipeline to go from production to consumption. Is it because you have big delays in your ingestion pipeline, or because you have a time skew on your producer clock? Do you know?

Storing time intervals efficiently in redis

I am trying to track server uptimes using redis.
So the approach I have chosen is as follows:
server xyz will keep on sending my service ping indicating that it was alive and working in the last 30 seconds.
My service will store a list of all time intervals during which the server was active. This will be done by storing a list of {startTime, endTime} in redis, with key as name of the server (xyz)
Depending on a user query, I will use this list to generate server uptime metrics. Like % downtime in between times (T1, T2)
Example:
assume that the time is T currently.
at T+30, server sends a ping.
xyz:["{start:T end:T+30}"]
at T+60, server sends another ping
xyz:["{start:T end:T+30}", "{start:T+30 end:T+60}"]
and so on for all pings.
This works fine , but an issue is that over a large time period this list will get a lot of elements. To avoid this currently, on a ping, I pop the last element of the list, check if it can be merged with the latest time interval. If it can be merged, I coalesce and push a single time interval into the list. if not then 2 time intervals are pushed.
So with this my list becomes like this after step 2 : xyz:["{start:T end:T+60}"]
Some problems I see with this approach is:
the merging is being done in my service, and not redis.
incase my service is distributed, The list ordering might get corrupted due to multiple readers and writers.
Is there a more efficient/elegant way to handle this , like maybe handling merging of time intervals in redis itself ?

High precision queue statistics from RabbitMQ

I need to log with the highest possible precision the rate with which messages enter and leave a particular queue in Rabbit. I know the API already provides publishing and delivering rates, but I am interested in capturing raw incoming and outgoing values in a known period of time, so that I can estimate rates with higher precision and time periods of my choice.
Ideally, I would check on-demand (i.e. on a schedule of my choice) e.g. the current cumulative count of messages that have entered the queue so far ("published" messages), and the current cumulative count of messages consumed ("delivered" messages).
With these types of cumulative counts, I could:
Compute my own deltas of messages entering or exiting the queue, e.g. doing Δ_count = cumulative_count(t) - cumulative_count(t-1)
Compute throughput rates doing throughput = Δ_count / Δ_time
Potentially infer how long messages stay on the queue throughout the day.
The last two would ideally rely on the precise timestamps when those cumulative counts were calculated.
I am trying to solve this problem directly using RabbitMQ’s API, but I’m encountering a problem when doing so. When I calculate the message cumulative count in the queue, I get a number that I don’t expect.
For example consider the screenshot below.
The Δ_message_count between entries 90 and 91 is 1810-1633 = 177. So, as I stated, I suppose that the difference between published and delivered messages should be 177 as well (in particular, 177 more messages published than delivered).
However, when I calculate these differences, I see that the difference is not 177:
Δ of published (incoming) messages: 13417517652009 - 13417517651765 = 244
Δ of delivered (outgoing) messages: 1341751765667 - 1341751765450 = 217
so we get 244 - 217 =27 messages. This suggests that there are 177 - 27 = 150 messages "unaccounted" for.
Why?
I tried taking into account the redelivered messages given by the API but they were constant when I run my tests, suggesting that there were no redelivered messages, so I wouldn't expect that to play a role.

What is the maximum LIMIT DURATION in the LAG function in ASA?

I am streaming data from devices and I want to use the LAG function to identify the last value received from a particular device. The data is not streamed at a regular period and in rare cases it could be days between receiving data from a device.
Is there a maximum period for the LIMIT DURATION clause?
Is there any down-side to having long LIMIT DURATION periods?
There is no maximum period for LIMIT DURATION in the language. However it is limited by amount of data the input source can hold - e.g. 1 day is default retention policy for Event Hub (can be increased in configuration).
When job is being started, Azure Stream Analytics reads up to LIMIT DURATION amount of data from the source to make sure it has correct value for the LAG at job start time. If data volume is high, this can increase job start time.
If you need to use data that is more than several days old, it may make more sense to use it as a reference data (which can be updated at daily intervals for example)