Optimize SQLite queries for number of records in time intervals

I have a table which stores a set of events whose schema can be simplified for this example to
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    time INTEGER NOT NULL,
    data BLOB
);
CREATE INDEX by_time ON events(time);
Given a time interval min to max, I want to get the number of events in each 1-hour interval between min and max (concrete example below).
The most obvious way to achieve this that I can think of is to compute the required intervals in my code and then for each one run the query
SELECT count(*) FROM events WHERE ? <= time AND time < ?;
Is there a faster way to achieve this by making SQLite deal with splitting the interval into smaller chunks?
If this makes the solution simpler, we can assume min and max are exactly at the start/end of an hour interval.
Example
Suppose events contains events with times
100, 200, 1600, 3000,
3800, 4000,
7400,
15000, 15200, 17000,
20400,
22300, 23000
Then I would want a query with min = 3600, max = 21600 to return something like
start | end   | count
------+-------+------
 3600 |  7200 |     2
 7200 | 10800 |     1
10800 | 14400 |     0
14400 | 18000 |     3
18000 | 21600 |     1
It doesn't matter exactly what the format of the output is as long as it contains the required counts and a way to identify which interval they refer to.

You can use a recursive CTE to get the time intervals and then LEFT join the table to aggregate:
WITH RECURSIVE
  cte(min, max) AS (SELECT 3600, 21600),
  intervals AS (
    SELECT min AS from_time, min + 3600 AS to_time, max
    FROM cte
    WHERE min + 3600 <= max
    UNION ALL
    SELECT to_time, to_time + 3600, max
    FROM intervals
    WHERE to_time + 3600 <= max
  )
SELECT i.from_time, i.to_time,
       COUNT(e.id) AS count
FROM intervals i
LEFT JOIN events e
  ON e.time >= i.from_time AND e.time < i.to_time
GROUP BY i.from_time, i.to_time;
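A single-pass alternative, as a sketch using the example's min = 3600 and max = 21600: bucket each event directly with integer division. Note that this returns only non-empty intervals (the 10800 row with count 0 would be missing), so the recursive CTE above is still the way to go if empty buckets are needed:
SELECT (time / 3600) * 3600 AS from_time,
       (time / 3600) * 3600 + 3600 AS to_time,
       COUNT(*) AS count
FROM events
WHERE time >= 3600 AND time < 21600
GROUP BY time / 3600;
This reads the matching rows once instead of issuing one count query per interval.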

Related

SQL query compare value with average of similiar records

The table has 3 columns : Category, Value(int), Date
What I want the SQL query to do is check, for each record belonging to a specific category, whether the value lies within a specific tolerance range (say t) of the average value over the last 100 records that have the same weekday (Monday, Tuesday, etc.) and the same category as the record in question.
I was able to implement this partially, as I know the category beforehand, but the weekday depends on the record being queried. Also, I am currently just checking whether the value is greater than the average, whereas I need to check whether it lies within a certain tolerance.
SELECT Value, Date,
    CASE WHEN value > (SELECT AVG(value) FROM Table
                       WHERE Category = 'CategoryX'
                         AND Date BETWEEN current_date - 700 AND current_date - 1) THEN 1
    ELSE 0
    END AS check_avg
FROM Table
WHERE Category = 'CategoryX'
Sample:
Category   Value  Date
CategoryX  5000   2022-06-29
CategoryX  4500   2022-06-27
CategoryX  1000   2022-06-22
CategoryY  4500   2022-06-15
CategoryX  2000   2022-06-15
CategoryX  3000   2022-06-08
Expected result:
Value in the record with today's date: 5000.
Average of the values in records with the same weekday and same category: (1000 + 2000 + 3000) / 3 = 2000.
If the tolerance is 50%, the allowed value should be between 1000 and 3000.
So the result should be 0.
Validate that in both queries you are evaluating the same category and the same weekday. Then sort the values used to compute the average by date, taking only the immediately preceding 100 records. Finally, check that the difference between the current value and the average is below the tolerance epsilon.
SELECT Value, Date,
    CASE WHEN ABS(Value - (SELECT AVG(Value)
                           FROM (SELECT TOP 100 Value
                                 FROM Table
                                 WHERE Category = t.Category
                                   AND DATEPART(WEEKDAY, Date) = DATEPART(WEEKDAY, t.Date)
                                   AND Date < t.Date
                                 ORDER BY Date DESC) AS last100)) < epsilon THEN 1
    ELSE 0
    END AS check_avg
FROM Table t
WHERE Category = 'CategoryX'
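Since the question's example defines the tolerance as a percentage of the average (50% gives an allowed range of 1000 to 3000), here is a sketch of the same idea with a relative tolerance, using CROSS APPLY; the 0.5 factor and the bracketed [Table] placeholder name are illustrative:
SELECT t.Value, t.Date,
    CASE WHEN ABS(t.Value - a.AvgValue) <= 0.5 * a.AvgValue THEN 1
    ELSE 0
    END AS check_avg
FROM [Table] t
CROSS APPLY (
    -- average of the 100 most recent prior records with the same
    -- category and weekday; 1.0 * avoids integer truncation
    SELECT AVG(1.0 * last100.Value) AS AvgValue
    FROM (SELECT TOP 100 Value
          FROM [Table]
          WHERE Category = t.Category
            AND DATEPART(WEEKDAY, Date) = DATEPART(WEEKDAY, t.Date)
            AND Date < t.Date
          ORDER BY Date DESC) AS last100
) a
WHERE t.Category = 'CategoryX';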

How do I calculate spikes in data over the last x amount of minutes in QuestDB

I have a table cpu with some metrics coming in over InfluxDB line protocol. How can I run a query that tells me whether data has exceeded a certain threshold in the last x minutes?
I am trying to do something like:
select usage_system, timestamp
from cpu
Where usage_system > 20
But I get all records returned. For the time range, I don't want to hardcode timestamps or have to replace them with the current time by hand to get a relative query.
usage            timestamp
27.399999999906  2021-04-14T12:02:30.000000Z
26.400000000139  2021-04-14T12:02:30.000000Z
25.666666666899  2021-04-14T12:02:30.000000Z
...
Ideally I would like to be able to check if usage is above an average, but above a hardcoded value is fine for now.
The following query will dynamically filter the last 5 minutes on the timestamp field and will return rows based on a hardcoded value (when usage_system is above 20):
SELECT *
FROM cpu
WHERE usage_system > 20
AND timestamp > dateadd('m', -5, now());
If you want to add some more details to the query, you can cross join an aggregate value:
WITH avg_usage AS (SELECT avg(usage_system) average FROM cpu)
SELECT timestamp, cpu.usage_system usage, average, cpu.usage_system > avg_usage.average above_average
FROM cpu CROSS JOIN avg_usage
WHERE timestamp > dateadd('m', -5, now());
This adds a boolean column above_average, which will be true or false depending on whether the row's usage_system is above the aggregate:
timestamp                    usage  average  above_average
2021-04-14T13:30:00.000000Z  20     10       true
2021-04-14T13:30:00.000000Z  5      10       false
If you want all columns, it might be useful to move this filter down into the WHERE clause:
WITH avg_usage AS (select avg(usage_system) average FROM cpu)
SELECT *
FROM cpu CROSS JOIN avg_usage
WHERE timestamp > dateadd('m', -5, now())
AND cpu.usage_system > avg_usage.average;
This then allows more complex filtering, such as returning all rows above a certain percentile. The following returns rows from the last 5 minutes where usage_system is above 80% of the recorded maximum (i.e. the highest CPU usage):
WITH max_usage AS (select max(usage_system) maximum FROM cpu)
SELECT *
FROM cpu CROSS JOIN max_usage
WHERE timestamp > dateadd('m', -5, now())
AND cpu.usage_system > (max_usage.maximum / 100 * 80);
Edit: the last query was based on the example in the QuestDB WITH keyword documentation.
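The averages above are computed over the whole table. To compare each reading against the average of the same 5-minute window instead, the aggregate can be filtered too; a sketch using the same cpu table:
WITH avg_usage AS (
    -- average restricted to the last 5 minutes only
    SELECT avg(usage_system) average
    FROM cpu
    WHERE timestamp > dateadd('m', -5, now())
)
SELECT *
FROM cpu CROSS JOIN avg_usage
WHERE timestamp > dateadd('m', -5, now())
AND cpu.usage_system > avg_usage.average;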

Query using group by with steps/range over large data

I have a table that stores sensor temperature readings every few seconds.
Sample data looks like this
nId nOperationId strDeviceIp nIfIndex nValue nTimestamp
97 2 192.168.99.252 1 26502328 1593828551
158 2 192.168.99.252 1 26501704 1593828667
256 2 192.168.99.252 1 26501860 1593828788
354 2 192.168.99.250 1 26501704 1593828908
452 2 192.168.99.250 1 26501692 1593829029
I want to have the average temperature per device so I ran the following query
select strDeviceIp, AVG(CAST(nValue as bigint)) as val1
from myTable
where nOperationId = 2 and nTimestamp >= 1593828600 and nTimestamp <= 1593838600
group by strDeviceIp;
Where I can pass the time range I want.
My issue is that this gives me the total average, but I want it in steps/ranges:
instead of one line, I want a row for each range/step of, say, 5 minutes.
P.S. I'm trying to show a graph.
Running the following query I get
strDeviceIp     average
192.168.99.252  26501731
But I would like to get
strDeviceIp     average   timestamp
192.168.99.252  26201731  1593828600
192.168.99.252  26532731  1593828900
192.168.99.252  24501721  1593829200
192.168.99.252  26506531  1593829500
In this example I would like to get a row every 300 seconds, per device.
Since your nTimestamp is a number of seconds, you can simply add it to the GROUP BY. Division by 300 gives you 300-second (5-minute) intervals. In SQL Server, / is integer division, which discards the fractional part.
select
    strDeviceIp
    ,AVG(CAST(nValue as bigint)) as val1
    ,(nTimestamp / 300) * 300 AS Timestamp
from myTable
where
    nOperationId = 2 and nTimestamp >= 1593828600 and nTimestamp <= 1593838600
group by
    strDeviceIp
    ,nTimestamp / 300
;
nTimestamp / 300 gives an integer: the number of 5-minute intervals since 1970 (/ discards the fractional part here).
When this number is multiplied back by 300, it again becomes a number of seconds since 1970, but "rounded" down to the start of its 5-minute interval, just as you showed in the expected result in the question.
For example:
1593828667 / 300 = 5312762.2233333333333333333333333
discard fractional part
1593828667 / 300 = 5312762
5312762 * 300 = 1593828600
So, all timestamps between 1593828600 and 1593828899 become 1593828600 and all values for these timestamps are grouped into one row and averaged.
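If the graph needs a readable datetime rather than epoch seconds, the bucket start can be converted back with DATEADD; a sketch in SQL Server (these timestamps fit in an int, so a second-level offset from the Unix epoch works):
select
    strDeviceIp
    ,AVG(CAST(nValue as bigint)) as val1
    -- convert the bucket start back to a datetime
    ,DATEADD(second, (nTimestamp / 300) * 300, '1970-01-01T00:00:00') AS BucketStart
from myTable
where
    nOperationId = 2 and nTimestamp >= 1593828600 and nTimestamp <= 1593838600
group by
    strDeviceIp
    ,nTimestamp / 300
;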
You can also use ROW_NUMBER with a partition, for example to keep every 5th reading per device instead of averaging:
select strDeviceIp, nValue, nTimestamp
from (
    select strDeviceIp, nValue, nTimestamp,
        ROW_NUMBER() over (partition by strDeviceIp order by nTimestamp desc) as ROW_NO
    from myTable
) q
where q.ROW_NO % 5 = 0;

Selecting records with total greater than specified value over specified time interval

This is for a check-cashing business.
I have a table of checks cashed:
CustomerID, CustomerName, DateTimeCashed, CheckAmount, CheckFee, CheckPaypot
00100 John Doe 01/01/2017 12:40:30 1000 20 980
00200 John Smith 01/02/2017 13:24:45 2000 40 1960
..................
There are thousands of records like this.
I need to build a query which would return all records where total CheckPaypot for each Customer cashed in any 24 hour period exceeds 10000.
I know how to do this if a 24-hour interval is defined as a day from 12:00 AM to 11:59 PM.
Select * from (
    Select CustomerID, CustomerName, DateTimeCashed, CheckAmount, CheckFee, CheckPaypot,
        (Select sum(ch.CheckPaypot) from Checks ch
         where ch.CustomerID = c.CustomerID
           and CONVERT(date, ch.DateTimeCashed) = CONVERT(date, c.DateTimeCashed)
        ) as Total
    from Checks c) x
where x.Total > 10000
But the requirement is that the time interval is floating, meaning that the beginning and end can be anything as long as the length of the interval is 24 hours. So if the customer cashed 3 checks (1 check in the afternoon and 2 checks before noon of the next day) and the total of these checks is over $10,000, they all must be included in the result.
Thank you,
lenkost.
SELECT
    CustomerID,
    SUM(CheckPaypot)
FROM
    Checks
WHERE
    -- only covers the single 24-hour window ending now
    DateTimeCashed > CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY
    CustomerID
HAVING
    SUM(CheckPaypot) > 10000;
Unfortunately, you'll have to use a correlated subquery:
SELECT *
FROM (
    SELECT outer_ch.*,
        (SELECT SUM(inner_ch.CheckPaypot)
         FROM Checks inner_ch
         WHERE DATEDIFF(HOUR, inner_ch.DateTimeCashed, outer_ch.DateTimeCashed)
               BETWEEN 0 AND 23
           AND inner_ch.CustomerID = outer_ch.CustomerID) AS running_sum_paypot
    FROM Checks outer_ch
) x
WHERE running_sum_paypot > 10000
I say "unfortunately" because correlated subqueries are necessarily inefficient as they execute a separate subquery for each row in the result set. If this doesn't perform well enough, try to avoid doing a full table scan for each of these subqueries, e.g. by putting an index on customerid.

Group DateTime into 5,15,30 and 60 minute intervals

I am trying to group some records into 5-, 15-, 30- and 60-minute intervals:
SELECT AVG(value) as "AvgValue",
sample_date/(5*60) as "TimeFive"
FROM DATA
WHERE id = 123 AND sample_date >= 3/21/2012
I want to run several queries, each of which would group my average values into the desired time increments. So the 5-min query would return results like this:
AvgValue TimeFive
6.90 1995-01-01 00:05:00
7.15 1995-01-01 00:10:00
8.25 1995-01-01 00:15:00
The 30-min query would result in this:
AvgValue TimeThirty
6.95 1995-01-01 00:30:00
7.40 1995-01-01 01:00:00
The datetime column is in yyyy-mm-dd hh:mm:ss format.
I am getting implicit conversion errors on my datetime column. Any help is much appreciated!
Using
datediff(minute, '1990-01-01T00:00:00', yourDatetime)
will give you the number of minutes since 1990-1-1 (you can use the desired base date).
Then you can divide by 5, 15, 30 or 60, and group by the result of this division.
I've checked that it will be evaluated as integer division, so you'll get an integer number you can use to group by.
i.e.
group by datediff(minute, '1990-01-01T00:00:00', yourDatetime) /5
UPDATE: As the original question was edited to require the data to be shown in date-time format after the grouping, I've added this simple query that will do what the OP wants:
-- This convert the period to date-time format
SELECT
-- note the 5, the "minute", and the starting point to convert the
-- period back to original time
DATEADD(minute, AP.FiveMinutesPeriod * 5, '2010-01-01T00:00:00') AS Period,
AP.AvgValue
FROM
-- this groups by the period and gets the average
(SELECT
P.FiveMinutesPeriod,
AVG(P.Value) AS AvgValue
FROM
-- This calculates the period (five minutes in this instance)
(SELECT
-- note the division by 5 and the "minute" to build the 5 minute periods
-- the '2010-01-01T00:00:00' is the starting point for the periods
datediff(minute, '2010-01-01T00:00:00', T.Time)/5 AS FiveMinutesPeriod,
T.Value
FROM Test T) AS P
GROUP BY P.FiveMinutesPeriod) AP
NOTE: I've divided this into 3 subqueries for clarity. You should read it from the inside out. It could, of course, be written as a single, compact query.
NOTE: if you change the period and the starting date-time you can get any interval you need, like weeks starting from a given day; see the sketch below.
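For example, a sketch of 1-week buckets that start on Mondays, using '2010-01-04' (a Monday) as an illustrative base date:
SELECT
    -- DATEDIFF(day, ...) / 7 numbers the weeks since the base Monday
    -- (assumes Time is on or after the base date)
    DATEADD(week, DATEDIFF(day, '2010-01-04T00:00:00', T.Time) / 7, '2010-01-04T00:00:00') AS WeekStart,
    AVG(T.Value) AS AvgValue
FROM Test T
GROUP BY DATEDIFF(day, '2010-01-04T00:00:00', T.Time) / 7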
If you want to generate test data for this query use this:
CREATE TABLE Test
( Id INT IDENTITY PRIMARY KEY,
Time DATETIME,
Value FLOAT)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:00:22', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:03:22', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:04:45', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:07:21', 20)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:10:25', 30)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:11:22', 30)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:14:47', 30)
The result of executing the query is this:
Period AvgValue
2012-03-22 00:00:00.000 10
2012-03-22 00:05:00.000 20
2012-03-22 00:10:00.000 30
Building on #JotaBe's answer (on which I cannot comment, otherwise I would), you could also try something like this, which does not require a subquery.
SELECT
AVG(value) AS 'AvgValue',
    -- Add the rounded minutes back onto the base date to get the rounded time
DATEADD(
MINUTE,
(DATEDIFF(MINUTE, '1990-01-01T00:00:00', your_date) / 30) * 30,
'1990-01-01T00:00:00'
) AS 'TimeThirty'
FROM YourTable
-- WHERE your_date > some max lookback period
GROUP BY
(DATEDIFF(MINUTE, '1990-01-01T00:00:00', your_date) / 30)
This removes temp tables and subqueries. It uses the same core logic for grouping by 30-minute intervals, but when presenting the data in the result I'm just reversing the interval calculation to get the rounded date and time.
So, in case you googled this but need to do it in MySQL, which was my case:
In MySQL you can do
GROUP BY
CONCAT(
DATE_FORMAT(`timestamp`,'%m-%d-%Y %H:'),
FLOOR(DATE_FORMAT(`timestamp`,'%i')/5)*5
)
In the new SQL Server 2022, you can use DATE_BUCKET, which rounds the date down to the start of the specified interval.
SELECT
DATE_BUCKET(minute, 5, d.sample_date) AS TimeFive,
AVG(d.value) AS AvgValue
FROM DATA d
WHERE d.id = 123
AND d.sample_date >= '20121203'
GROUP BY
DATE_BUCKET(minute, 5, d.sample_date);
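DATE_BUCKET also accepts an optional origin argument to anchor the buckets somewhere other than the default origin of 1900-01-01. A sketch with 15-minute buckets anchored at the start of 2012 (assuming sample_date is datetime2; the origin should be of a matching date type):
SELECT
    DATE_BUCKET(minute, 15, d.sample_date, CAST('2012-01-01' AS datetime2)) AS TimeFifteen,
    AVG(d.value) AS AvgValue
FROM DATA d
GROUP BY
    DATE_BUCKET(minute, 15, d.sample_date, CAST('2012-01-01' AS datetime2));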
You can use the following statement: it removes the seconds component, calculates the number of minutes away from the five-minute mark, and uses this to round down to the time block. This is ideal if you want to change your window: simply change the mod value.
select dateadd(minute, - datepart(minute, [YOURDATE]) % 5, dateadd(minute, datediff(minute, 0, [YOURDATE]), 0)) as [TimeBlock]
This does exactly what you want: replace dt with your datetime column, c with your value column, and astro_transit1 with your table; 300 means a 5-minute gap, so add 300 each time to increase the gap.
SELECT FROM_UNIXTIME(300 * FLOOR(UNIX_TIMESTAMP(r.dt) / 300)) AS 5datetime,
    (SELECT ra.c
     FROM astro_transit1 ra
     WHERE ra.dt = r.dt
     ORDER BY ra.dt DESC
     LIMIT 1) AS first_val
FROM astro_transit1 r
GROUP BY UNIX_TIMESTAMP(r.dt) DIV 300
LIMIT 0, 30