Query using group by with steps/range over large data - sql

I have a table that stores sensor temperature readings every few seconds.
Sample data looks like this:
nId nOperationId strDeviceIp nIfIndex nValue nTimestamp
97 2 192.168.99.252 1 26502328 1593828551
158 2 192.168.99.252 1 26501704 1593828667
256 2 192.168.99.252 1 26501860 1593828788
354 2 192.168.99.250 1 26501704 1593828908
452 2 192.168.99.250 1 26501692 1593829029
I want the average temperature per device, so I ran the following query:
select strDeviceIp, AVG(CAST(nValue as bigint)) as val1
from myTable
where nOperationId = 2 and nTimestamp >= 1593828600 and nTimestamp <= 1593838600
group by strDeviceIp;
I can pass in whatever time range I want.
My issue is that this gives me a single average over the whole range, but I want it broken into steps.
Instead of one row, I'd like a row for each step of, say, 5 minutes within the range.
P.S. I'm trying to plot a graph.
Running the query above, I get:
strDeviceIp average
192.168.99.252 26501731
But I would like to get:
strDeviceIp average timestamp
192.168.99.252 26201731 1593828600
192.168.99.252 26532731 1593828900
192.168.99.252 24501721 1593829200
192.168.99.252 26506531 1593829500
In this example I would like to get a row every 300 seconds, per device.

Since nTimestamp is a number of seconds, you can simply add nTimestamp / 300 to the GROUP BY. Dividing by 300 gives you 300-second (5-minute) intervals. In SQL Server, / between two integers is integer division, which discards the fractional part.
select
strDeviceIp
,AVG(CAST(nValue as bigint)) as val1
,(nTimestamp / 300) * 300 AS Timestamp
from myTable
where
nOperationId = 2 and nTimestamp >= 1593828600 and nTimestamp <= 1593838600
group by
strDeviceIp
,nTimestamp / 300
;
nTimestamp / 300 gives an integer: the number of whole 5-minute intervals since 1970. Here / discards the fractional part.
When this number is multiplied back by 300, it again becomes a number of seconds since 1970, but rounded down to the start of its 5-minute interval, just as in the expected result in the question.
For example:
1593828667 / 300 = 5312762.2233333333333333333333333
Discard the fractional part (integer division):
1593828667 / 300 = 5312762
5312762 * 300 = 1593828600
So, all timestamps between 1593828600 and 1593828899 become 1593828600 and all values for these timestamps are grouped into one row and averaged.
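To check the bucketing logic outside SQL Server, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's sqlite3 module, loaded with the question's sample rows. Using SQLite is an assumption: like SQL Server, it does integer division when both operands are integers, and its AVG works on 64-bit integers without the CAST. The parameter names step, lo and hi are illustrative.

```python
import sqlite3

# Hypothetical stand-in for the SQL Server table, holding the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (nId INTEGER, nOperationId INTEGER, strDeviceIp TEXT,"
    " nIfIndex INTEGER, nValue INTEGER, nTimestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (97, 2, "192.168.99.252", 1, 26502328, 1593828551),
        (158, 2, "192.168.99.252", 1, 26501704, 1593828667),
        (256, 2, "192.168.99.252", 1, 26501860, 1593828788),
        (354, 2, "192.168.99.250", 1, 26501704, 1593828908),
        (452, 2, "192.168.99.250", 1, 26501692, 1593829029),
    ],
)

# nTimestamp / :step is the bucket number (integer division);
# multiplying it back by :step gives the bucket's start timestamp.
rows = conn.execute(
    """
    SELECT strDeviceIp,
           AVG(nValue) AS val1,
           (nTimestamp / :step) * :step AS bucket_start
    FROM myTable
    WHERE nOperationId = 2 AND nTimestamp BETWEEN :lo AND :hi
    GROUP BY strDeviceIp, nTimestamp / :step
    ORDER BY strDeviceIp, bucket_start
    """,
    {"step": 300, "lo": 1593828600, "hi": 1593838600},
).fetchall()

for row in rows:
    print(row)
```

The first sample row (1593828551) falls before the range and is filtered out, so each device ends up with one 5-minute bucket here; a longer range simply yields more rows per device.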

You can use a partition to keep every fifth reading per device (this samples rows rather than averaging over a time window):
select * from (
select strDeviceIp, nValue, nTimestamp,
ROW_NUMBER() over(partition by strDeviceIp order by nTimestamp desc) as ROW_NO from myTable) Q where Q.ROW_NO % 5 = 0
....

Related

SQL query compare value with average of similar records

The table has 3 columns: Category, Value (int), Date.
What I want the SQL query to do is check, for each record belonging to a specific category, whether the value lies within a specific tolerance range (say t) of the average value over the last 100 records that have the same weekday (Monday, Tuesday, etc.) and the same category as the record in question.
I was able to implement this partially, since I know the Category beforehand, but the weekday depends on the record being queried. Also, currently I am only checking whether the value is greater than the average, whereas I need to check whether it lies within a certain tolerance.
SELECT Value, Date,
CASE WHEN
value > (SELECT AVG(value) FROM Table WHERE Category = 'CategoryX' and Date BETWEEN current_date - 700 and current_date - 1) THEN 1
ELSE 0
END AS check_avg
FROM Table
WHERE Category = 'CategoryX'
Sample:
Category Value Date
CategoryX 5000 2022-06-29
CategoryX 4500 2022-06-27
CategoryX 1000 2022-06-22
CategoryY 4500 2022-06-15
CategoryX 2000 2022-06-15
CategoryX 3000 2022-06-08
Expected result:
Value in the record with today's date: 5000.
Average of values in records with the same weekday and same category: (1000 + 2000 + 3000) / 3 = 2000.
If the tolerance is 50%, the allowed value is between 1000 and 3000.
So the result should be 0.
Make sure the subquery matches both the same category and the same weekday as the outer record. Then sort the values used to compute the average by date and keep only the 100 immediately preceding records. Finally, check that the absolute difference between the current value and the average is below the tolerance epsilon.
SELECT Value, Date,
CASE WHEN
ABS(Value - (SELECT AVG(h.Value) FROM (SELECT TOP 100 Value FROM Table WHERE Category = t.Category AND DATEPART(WEEKDAY, Date) = DATEPART(WEEKDAY, t.Date) AND Date < t.Date ORDER BY Date DESC) AS h)) < epsilon THEN 1
ELSE 0
END AS check_avg
FROM Table t
WHERE Category = 'CategoryX'
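Here is a minimal sketch of the same idea in SQLite, run through Python's sqlite3 module with the question's sample rows. Assumptions: the table is named readings (Table is a reserved word), strftime('%w', ...) stands in for DATEPART(WEEKDAY, ...), LIMIT 100 stands in for TOP 100, and the tolerance is relative (50% of the historical average, as in the question's example) rather than an absolute epsilon. Only strictly earlier records are averaged, so a record is never compared with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "readings" is a hypothetical table name standing in for the question's Table.
conn.execute("CREATE TABLE readings (Category TEXT, Value INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("CategoryX", 5000, "2022-06-29"),
        ("CategoryX", 4500, "2022-06-27"),
        ("CategoryX", 1000, "2022-06-22"),
        ("CategoryY", 4500, "2022-06-15"),
        ("CategoryX", 2000, "2022-06-15"),
        ("CategoryX", 3000, "2022-06-08"),
    ],
)

QUERY = """
SELECT Value, Date, hist_avg,
       CASE WHEN hist_avg IS NOT NULL
             AND ABS(Value - hist_avg) <= :tolerance * hist_avg
            THEN 1 ELSE 0 END AS check_avg
FROM (
    SELECT t.Value, t.Date,
           -- average of up to 100 most recent earlier rows with the same
           -- category and weekday (strftime('%w') gives 0 = Sunday .. 6)
           (SELECT AVG(h.Value)
            FROM (SELECT r.Value
                  FROM readings r
                  WHERE r.Category = t.Category
                    AND strftime('%w', r.Date) = strftime('%w', t.Date)
                    AND r.Date < t.Date
                  ORDER BY r.Date DESC
                  LIMIT 100) AS h) AS hist_avg
    FROM readings t
    WHERE t.Category = 'CategoryX'
)
ORDER BY Date DESC
"""

rows = conn.execute(QUERY, {"tolerance": 0.5}).fetchall()
for row in rows:
    print(row)
```

For 2022-06-29 the historical average is 2000 and 5000 lies outside 1000 to 3000, so check_avg is 0, as in the expected result. Records with no earlier same-weekday rows get a NULL average and are reported as 0 here; whether that is the right default is a design choice.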

Optimize SQLite queries for number of records in time intervals

I have a table that stores a set of events, whose schema can be simplified for this example to:
CREATE TABLE events (
id INTEGER PRIMARY KEY,
time INTEGER NOT NULL,
data BLOB
);
CREATE INDEX by_time ON events(time);
Given a time interval min to max, I want to get the number of events in each 1-hour interval between min and max (concrete example below).
The most obvious way to achieve this that I can think of is to compute the required intervals in my code and then for each one run the query
SELECT count(*) FROM events WHERE ? <= time AND time < ?;
Is there a faster way to achieve this by making SQLite deal with splitting the interval into smaller chunks?
If this makes the solution simpler, we can assume min and max are exactly at the start/end of an hour interval.
Example
Suppose events contains events with times
100, 200, 1600, 3000,
3800, 4000,
7400,
15000, 15200, 17000,
20400,
22300, 23000
Then I would want a query with min = 3600, max = 21600 to return something like
start | end | count
-------------------
3600 | 7200 | 2
7200 | 10800 | 1
10800 | 14400 | 0
14400 | 18000 | 3
18000 | 21600 | 1
It doesn't matter exactly what the format of the output is as long as it contains the required counts and a way to identify which interval they refer to.
You can use a recursive CTE to get the time intervals and then LEFT join the table to aggregate:
WITH
cte(min, max) AS (SELECT 3600, 21600),
intervals AS (
SELECT min from_time, min + 3600 to_time, max
FROM cte
WHERE min + 3600 <= max
UNION ALL
SELECT to_time, to_time + 3600, max
FROM intervals
WHERE to_time + 3600 <= max
)
SELECT i.from_time, i.to_time,
COUNT(e.id) count
FROM intervals i LEFT JOIN events e
ON e.time >= i.from_time AND e.time < i.to_time
GROUP BY i.from_time, i.to_time;
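A runnable sketch of the recursive CTE above, using Python's sqlite3 module and the example events. It adds the RECURSIVE keyword, an ORDER BY, and bound parameters for the range and interval width (lo, hi and step are illustrative names); otherwise the query is the one from the answer. The LEFT JOIN is what keeps empty hours, such as 10800 to 14400, in the output with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (id INTEGER PRIMARY KEY, time INTEGER NOT NULL, data BLOB);
    CREATE INDEX by_time ON events(time);
    """
)
times = [100, 200, 1600, 3000, 3800, 4000, 7400,
         15000, 15200, 17000, 20400, 22300, 23000]
conn.executemany("INSERT INTO events(time) VALUES (?)", [(t,) for t in times])

QUERY = """
WITH RECURSIVE
  cte(min, max) AS (SELECT :lo, :hi),
  intervals AS (
    -- first interval, then keep stepping forward until max is reached
    SELECT min AS from_time, min + :step AS to_time, max FROM cte
    WHERE min + :step <= max
    UNION ALL
    SELECT to_time, to_time + :step, max FROM intervals
    WHERE to_time + :step <= max
  )
SELECT i.from_time, i.to_time, COUNT(e.id) AS count
FROM intervals i LEFT JOIN events e
  ON e.time >= i.from_time AND e.time < i.to_time
GROUP BY i.from_time, i.to_time
ORDER BY i.from_time
"""

rows = conn.execute(QUERY, {"lo": 3600, "hi": 21600, "step": 3600}).fetchall()
for start, end, count in rows:
    print(f"{start} | {end} | {count}")
```

Because the join condition is a range on time, the by_time index can serve each interval's lookup instead of a full scan per interval.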

Selecting records with total greater than specified value over specified time interval

This is for a check-cashing business.
I have a table of checks cashed:
CustomerID, CustomerName, DateTimeCashed, CheckAmount, CheckFee, CheckPaypot
00100 John Doe 01/01/2017 12:40:30 1000 20 980
00200 John Smith 01/02/2017 13:24:45 2000 40 1960
..................
There are thousands of records like this.
I need to build a query that returns all records where the total CheckPaypot cashed by a customer in any 24-hour period exceeds 10000.
I know how to do this if a 24-hour interval is defined as a calendar day, from 12:00 AM to 11:59 PM:
Select * from (
Select CustomerID, CustomerName, DateTimeCashed, CheckAmount, CheckFee, CheckPaypot,
(Select sum(ch.CheckPaypot) from Checks ch
where
ch.CustomerID = c.CustomerID and CONVERT(date, ch.DateTimeCashed) = CONVERT(date, c.DateTimeCashed)
) as Total from Checks c) x
where x.Total > 10000
But the requirement is that the time interval is floating, meaning that its beginning and end can be anything
as long as the interval is 24 hours long. So if the customer cashed 3 checks, 1 check in the afternoon
and 2 checks before noon the next day, and the total of these checks is over $10000, they must all be included in the result.
Thank you,
lenkost.
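SELECT
c.CustomerID,
c.DateTimeCashed,
SUM(w.CheckPaypot)
FROM
Checks c
JOIN Checks w
ON w.CustomerID = c.CustomerID
AND w.DateTimeCashed > c.DateTimeCashed - INTERVAL '1' DAY
AND w.DateTimeCashed <= c.DateTimeCashed
GROUP BY
c.CustomerID, c.DateTimeCashed
HAVING
SUM(w.CheckPaypot) > 10000;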
Unfortunately, you'll have to use a correlated subquery:
SELECT *
FROM (
SELECT outer_ch.*,
(SELECT SUM(inner_ch.CheckPaypot)
FROM Checks inner_ch
WHERE inner_ch.CustomerID = outer_ch.CustomerID
AND inner_ch.DateTimeCashed > DATEADD(HOUR, -24, outer_ch.DateTimeCashed)
AND inner_ch.DateTimeCashed <= outer_ch.DateTimeCashed) AS running_sum_checkpayout
FROM Checks outer_ch
) AS x
WHERE running_sum_checkpayout > 10000
I say "unfortunately" because correlated subqueries can be inefficient: conceptually, they execute a separate subquery for each row. If this doesn't perform well enough, avoid a full table scan for each of these subqueries, e.g. by putting an index on (CustomerID, DateTimeCashed).
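The correlated subquery above computes, for each check, the customer's total over the 24 hours ending at that check, so it returns only the checks whose trailing window exceeds 10000 (in the question's example, only the last of the three checks). The question asks for every check in such a window. One way to get that, sketched here in SQLite through Python's sqlite3 module, is to find those window-ending checks first and then return every check inside one of their windows. This works because any qualifying 24-hour window can be slid forward until it ends at its last check. Assumptions: SQLite's datetime(..., '-24 hours') stands in for DATEADD, the sample rows are made up for illustration, and flagged_ends and max_total are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Checks (CustomerID TEXT, DateTimeCashed TEXT, CheckPaypot INTEGER)"
)
conn.execute("CREATE INDEX ix_checks_cust_time ON Checks(CustomerID, DateTimeCashed)")
conn.executemany(
    "INSERT INTO Checks VALUES (?, ?, ?)",
    [
        ("00100", "2017-01-01 15:00:00", 4000),  # afternoon
        ("00100", "2017-01-02 09:00:00", 3500),  # next morning
        ("00100", "2017-01-02 11:30:00", 3000),  # next morning, 10500 in 24 h
        ("00200", "2017-01-02 13:24:45", 6000),
        ("00200", "2017-01-04 13:00:00", 6000),  # more than 24 h later
    ],
)

QUERY = """
WITH flagged_ends AS (
    -- checks whose trailing 24-hour total (ending at that check) exceeds the limit
    SELECT e.CustomerID, e.DateTimeCashed AS end_time
    FROM Checks e
    WHERE (SELECT SUM(i.CheckPaypot)
           FROM Checks i
           WHERE i.CustomerID = e.CustomerID
             AND i.DateTimeCashed > datetime(e.DateTimeCashed, '-24 hours')
             AND i.DateTimeCashed <= e.DateTimeCashed) > :max_total
)
-- every check that falls inside one of those flagged windows
SELECT c.CustomerID, c.DateTimeCashed, c.CheckPaypot
FROM Checks c
WHERE EXISTS (
    SELECT 1 FROM flagged_ends f
    WHERE f.CustomerID = c.CustomerID
      AND c.DateTimeCashed > datetime(f.end_time, '-24 hours')
      AND c.DateTimeCashed <= f.end_time
)
ORDER BY c.CustomerID, c.DateTimeCashed
"""

rows = conn.execute(QUERY, {"max_total": 10000}).fetchall()
for row in rows:
    print(row)
```

The text comparisons work because every DateTimeCashed value uses the same 'YYYY-MM-DD HH:MM:SS' format, which sorts chronologically.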

Access query to partition data and sum each partition?

I have a query with the fields date, hour and value.
It looks something like this:
date hour value
xx/xx/xx 15 100
xx/xx/xx 30 122
xx/xx/xx 45 50
... 100 100
... 115 23
... ... ...
... ... ...
... 2400 400
... 15 23
Basically, date is the date, hour is the hour, and value is the value for that particular 15-minute interval. What I have been trying to figure out is a way to take each hour (so 15, 30, 45, and 100) or (1015, 1030, 1045, 1100) [as you can see, hours are military-style: 1:00 pm is 1300 and midnight is 2400], and sum their values together. So I am looking to return something like this:
xx/xx/xx 100 372
xx/xx/xx 200 23 + (130 data) + (145 data) + (200 data)
And so on...
The table has on average around 100 days and they all start from 15 to 2400 incrementing by 15 with varying numbers for the value column.
I have thought about using a partition, GROUP BY, etc., but have no real idea how to tackle it. Essentially I have to take 4 rows (an hour), sum their values, output the date, hour, and summed value, then repeat for every day. I am not asking for code, just some help with what I should be using, since this seems like a simple problem minus the key to solving it.
Any help is greatly appreciated, Thank you!
Grouping by Hour / 100 will almost get you there. Subtracting 1 from the hour makes 1 AM (100) fall to 99, so it is grouped together with 15, 30 and 45. This gives a query like this:
SELECT Table1.Dte, Int(([tme]-1)/100) AS Hr, Sum(Table1.Val) AS TotVal
FROM Table1
GROUP BY Table1.Dte, Int(([tme]-1)/100);
In Access, use Int to truncate the hour (CInt rounds to the nearest integer, which would move 100 into the 200 bucket):
Select
[date],
100 * (1 + Int(([Hour] - 1) / 100)),
Sum(Value)
From
Query
Group By
[date],
100 * (1 + Int(([Hour] - 1) / 100))
Order By
[date], 100 * (1 + Int(([Hour] - 1) / 100))
SELECT
DateCol,
Int(HourCol / -100) * -100 AS Hr,
Sum(Value) AS SumValue
FROM
YourTable
GROUP BY
DateCol,
Int(HourCol / -100) * -100
Or you can use ((HourCol + 99) \ 100) * 100.
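A quick way to check the ((HourCol + 99) \ 100) * 100 mapping outside Access is to run the equivalent SQLite query through Python's sqlite3 module. SQLite's / on two integers truncates, which matches Access's \ for non-negative values. The table and column names (readings, dte, hr, val) and the dates are hypothetical; the hour/value pairs come from the question's sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (dte TEXT, hr INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2024-01-01", 15, 100), ("2024-01-01", 30, 122),
        ("2024-01-01", 45, 50), ("2024-01-01", 100, 100),
        ("2024-01-01", 115, 23), ("2024-01-01", 2400, 400),
        ("2024-01-02", 15, 23),
    ],
)

# (hr + 99) / 100 rounds each quarter-hour up to the hour it ends in:
# 15, 30, 45, 100 -> 1;  115 .. 200 -> 2;  2315 .. 2400 -> 24.
rows = conn.execute(
    """
    SELECT dte, ((hr + 99) / 100) * 100 AS hour_end, SUM(val) AS total
    FROM readings
    GROUP BY dte, (hr + 99) / 100
    ORDER BY dte, hour_end
    """
).fetchall()

for row in rows:
    print(row)
```

The first row reproduces the question's expected 100 / 372, and grouping on the date as well keeps each day's hours separate.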

Group DateTime into 5,15,30 and 60 minute intervals

I am trying to group some records into 5-, 15-, 30- and 60-minute intervals:
SELECT AVG(value) as "AvgValue",
sample_date/(5*60) as "TimeFive"
FROM DATA
WHERE id = 123 AND sample_date >= 3/21/2012
I want to run several queries, each grouping my average values into the desired time increments. So the 5-minute query would return results like this:
AvgValue TimeFive
6.90 1995-01-01 00:05:00
7.15 1995-01-01 00:10:00
8.25 1995-01-01 00:15:00
The 30-min query would result in this:
AvgValue TimeThirty
6.95 1995-01-01 00:30:00
7.40 1995-01-01 01:00:00
The datetime column is in yyyy-mm-dd hh:mm:ss format
I am getting implicit conversion errors of my datetime column. Any help is much appreciated!
Using
datediff(minute, '1990-01-01T00:00:00', yourDatetime)
will give you the number of minutes since 1990-1-1 (you can use the desired base date).
Then you can divide by 5, 15, 30 or 60, and group by the result of this division.
I've checked that it is evaluated as integer division, so you'll get an integer you can group by.
For example:
group by datediff(minute, '1990-01-01T00:00:00', yourDatetime) /5
UPDATE: As the original question was edited to require the data to be shown in date-time format after grouping, I've added this simple query that does what the OP wants:
-- This converts the period back to date-time format
SELECT
-- note the 5, the "minute", and the starting point to convert the
-- period back to original time
DATEADD(minute, AP.FiveMinutesPeriod * 5, '2010-01-01T00:00:00') AS Period,
AP.AvgValue
FROM
-- this groups by the period and gets the average
(SELECT
P.FiveMinutesPeriod,
AVG(P.Value) AS AvgValue
FROM
-- This calculates the period (five minutes in this instance)
(SELECT
-- note the division by 5 and the "minute" to build the 5 minute periods
-- the '2010-01-01T00:00:00' is the starting point for the periods
datediff(minute, '2010-01-01T00:00:00', T.Time)/5 AS FiveMinutesPeriod,
T.Value
FROM Test T) AS P
GROUP BY P.FiveMinutesPeriod) AP
NOTE: I've split this into three levels (two subqueries and an outer query) for clarity; read it from the inside out. It could, of course, be written as a single, compact query.
NOTE: If you change the period length and the starting date-time, you can get any interval you need, like weeks starting from a given day.
If you want to generate test data for this query use this:
CREATE TABLE Test
( Id INT IDENTITY PRIMARY KEY,
Time DATETIME,
Value FLOAT)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:00:22', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:03:22', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:04:45', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:07:21', 20)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:10:25', 30)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:11:22', 30)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:14:47', 30)
The result of executing the query is this:
Period AvgValue
2012-03-22 00:00:00.000 10
2012-03-22 00:05:00.000 20
2012-03-22 00:10:00.000 30
Building on @JotaBe's answer (which I can't comment on, otherwise I would), you could also try something like this, which doesn't require a subquery:
SELECT
AVG(value) AS 'AvgValue',
-- Add the rounded minutes back onto the base date to get the rounded time
DATEADD(
MINUTE,
(DATEDIFF(MINUTE, '1990-01-01T00:00:00', your_date) / 30) * 30,
'1990-01-01T00:00:00'
) AS 'TimeThirty'
FROM YourTable
-- WHERE your_date > some max lookback period
GROUP BY
(DATEDIFF(MINUTE, '1990-01-01T00:00:00', your_date) / 30)
This removes the subqueries. It uses the same core logic for grouping by 30-minute intervals, but when presenting the result it simply reverses the interval calculation to get the rounded date and time.
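The same "divide by the interval, multiply back" idea can be tried in SQLite, which has no DATEDIFF or DATEADD. This sketch, run through Python's sqlite3 module, uses strftime('%s', ...) to get seconds since 1970-01-01 (that date plays the role of the base date) and datetime(..., 'unixepoch') to turn the rounded number back into a date-time. It loads the test rows from the answer above; the function name bucket_averages is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (Id INTEGER PRIMARY KEY, Time TEXT, Value REAL)")
conn.executemany(
    "INSERT INTO Test (Time, Value) VALUES (?, ?)",
    [
        ("2012-03-22T00:00:22", 10), ("2012-03-22T00:03:22", 10),
        ("2012-03-22T00:04:45", 10), ("2012-03-22T00:07:21", 20),
        ("2012-03-22T00:10:25", 30), ("2012-03-22T00:11:22", 30),
        ("2012-03-22T00:14:47", 30),
    ],
)


def bucket_averages(conn, minutes):
    """Average Value per `minutes`-long bucket, labelled by the bucket start."""
    step = minutes * 60  # bucket width in seconds
    return conn.execute(
        """
        SELECT datetime((CAST(strftime('%s', Time) AS INTEGER) / :step) * :step,
                        'unixepoch') AS Period,
               AVG(Value) AS AvgValue
        FROM Test
        GROUP BY CAST(strftime('%s', Time) AS INTEGER) / :step
        ORDER BY Period
        """,
        {"step": step},
    ).fetchall()


for period, avg in bucket_averages(conn, 5):
    print(period, avg)
```

Calling it with 15, 30 or 60 gives the other groupings, since 1970-01-01 00:00 falls on a boundary of each of those intervals.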
In case you landed here but need to do this in MySQL (as I did), you can use:
GROUP BY
CONCAT(
DATE_FORMAT(`timestamp`,'%m-%d-%Y %H:'),
FLOOR(DATE_FORMAT(`timestamp`,'%i')/5)*5
)
In SQL Server 2022 and later, you can use DATE_BUCKET, which rounds a date-time down to the start of the specified interval:
SELECT
DATE_BUCKET(minute, 5, d.sample_date) AS TimeFive,
AVG(d.value) AS AvgValue
FROM DATA d
WHERE d.id = 123
AND d.sample_date >= '20120321'
GROUP BY
DATE_BUCKET(minute, 5, d.sample_date);
You can use the following expression. It removes the seconds component, calculates how many minutes the time is past the last five-minute mark, and subtracts that to round down to the time block. This makes it easy to change the window: simply change the modulo value (to a divisor of 60).
select dateadd(minute, - datepart(minute, [YOURDATE]) % 5, dateadd(minute, datediff(minute, 0, [YOURDATE]), 0)) as [TimeBlock]
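The two steps of that expression can be mirrored with Python's datetime module, which makes the rounding easy to test. This is a sketch of the same arithmetic, not of SQL Server itself; floor_to_block is a hypothetical name.

```python
from datetime import datetime, timedelta


def floor_to_block(ts: datetime, block_minutes: int = 5) -> datetime:
    """Round ts down to the start of its block_minutes-long block.

    Step 1 drops seconds and fractions (like dateadd/datediff on minutes);
    step 2 subtracts the minutes past the last block boundary
    (like datepart(minute, ...) % 5). As in the T-SQL, only the
    minute-of-hour is used, so block_minutes should divide 60.
    """
    truncated = ts.replace(second=0, microsecond=0)
    return truncated - timedelta(minutes=truncated.minute % block_minutes)


print(floor_to_block(datetime(2012, 3, 22, 0, 7, 21)))       # 2012-03-22 00:05:00
print(floor_to_block(datetime(2012, 3, 22, 0, 14, 47), 15))  # 2012-03-22 00:00:00
```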
This should do exactly what you want.
Replace dt with your datetime column, c with your value column and astro_transit1 with your table. 300 means 5 minutes (300 seconds); change it to 900 for 15 minutes, 1800 for 30, and so on.
SELECT FROM_UNIXTIME( 300 * FLOOR( UNIX_TIMESTAMP( r.dt ) /300 ) ) AS 5datetime, ( SELECT r.c FROM astro_transit1 ra WHERE ra.dt = r.dt ORDER BY ra.dt DESC LIMIT 1 ) AS first_val FROM astro_transit1 r GROUP BY UNIX_TIMESTAMP( r.dt ) DIV 300 LIMIT 0 , 30