Group DateTime into 5-, 15-, 30- and 60-minute intervals - SQL

I am trying to group some records into 5-, 15-, 30- and 60-minute intervals:
SELECT AVG(value) as "AvgValue",
sample_date/(5*60) as "TimeFive"
FROM DATA
WHERE id = 123 AND sample_date >= 3/21/2012
I want to run several queries, each of which would group my average values into the desired time increments. So the 5-minute query would return results like this:
AvgValue TimeFive
6.90 1995-01-01 00:05:00
7.15 1995-01-01 00:10:00
8.25 1995-01-01 00:15:00
The 30-min query would result in this:
AvgValue TimeThirty
6.95 1995-01-01 00:30:00
7.40 1995-01-01 01:00:00
The datetime column is in yyyy-mm-dd hh:mm:ss format.
I am getting implicit conversion errors on my datetime column. Any help is much appreciated!

Using
datediff(minute, '1990-01-01T00:00:00', yourDatetime)
will give you the number of minutes since 1990-1-1 (you can use the desired base date).
Then you can divide by 5, 15, 30 or 60, and group by the result of this division.
I've checked that it will be evaluated as an integer division, so you'll get an integer number you can use to group by.
i.e.
group by datediff(minute, '1990-01-01T00:00:00', yourDatetime) /5
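Applied to the question's table, a sketch might look like this (assuming SQL Server and the DATA table and columns from the question):
SELECT DATEDIFF(minute, '1990-01-01T00:00:00', sample_date) / 5 AS TimeFive,
       AVG(value) AS AvgValue
FROM DATA
WHERE id = 123
  AND sample_date >= '2012-03-21'  -- quoted literal avoids the implicit conversion error
GROUP BY DATEDIFF(minute, '1990-01-01T00:00:00', sample_date) / 5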
UPDATE: As the original question was edited to require the data to be shown in date-time format after the grouping, I've added this simple query that does what the OP wants:
-- This converts the period back to date-time format
SELECT
-- note the 5, the "minute", and the starting point to convert the
-- period back to original time
DATEADD(minute, AP.FiveMinutesPeriod * 5, '2010-01-01T00:00:00') AS Period,
AP.AvgValue
FROM
-- this groups by the period and gets the average
(SELECT
P.FiveMinutesPeriod,
AVG(P.Value) AS AvgValue
FROM
-- This calculates the period (five minutes in this instance)
(SELECT
-- note the division by 5 and the "minute" to build the 5 minute periods
-- the '2010-01-01T00:00:00' is the starting point for the periods
datediff(minute, '2010-01-01T00:00:00', T.Time)/5 AS FiveMinutesPeriod,
T.Value
FROM Test T) AS P
GROUP BY P.FiveMinutesPeriod) AP
NOTE: I've split this into 3 subqueries for clarity. You should read it from the inside out. It could, of course, be written as a single, compact query.
NOTE: if you change the period and the starting date-time you can get any interval you need, like weeks starting from a given day, or whatever else you need.
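For reference, a compact single-query sketch of the same logic (same Test table; swapping the unit and base date gives other intervals, e.g. day-based buckets of 7 starting from a Monday for weeks):
SELECT DATEADD(minute, DATEDIFF(minute, '2010-01-01T00:00:00', T.Time) / 5 * 5,
               '2010-01-01T00:00:00') AS Period,
       AVG(T.Value) AS AvgValue
FROM Test T
GROUP BY DATEDIFF(minute, '2010-01-01T00:00:00', T.Time) / 5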
If you want to generate test data for this query use this:
CREATE TABLE Test
( Id INT IDENTITY PRIMARY KEY,
Time DATETIME,
Value FLOAT)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:00:22', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:03:22', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:04:45', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:07:21', 20)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:10:25', 30)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:11:22', 30)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:14:47', 30)
The result of executing the query is this:
Period AvgValue
2012-03-22 00:00:00.000 10
2012-03-22 00:05:00.000 20
2012-03-22 00:10:00.000 30

Building on @JotaBe's answer (on which I cannot comment; otherwise I would), you could also try something like this, which does not require a subquery.
SELECT
AVG(value) AS 'AvgValue',
-- Add the rounded minutes back onto the base date to get the rounded time
DATEADD(
MINUTE,
(DATEDIFF(MINUTE, '1990-01-01T00:00:00', your_date) / 30) * 30,
'1990-01-01T00:00:00'
) AS 'TimeThirty'
FROM YourTable
-- WHERE your_date > some max lookback period
GROUP BY
(DATEDIFF(MINUTE, '1990-01-01T00:00:00', your_date) / 30)
This change removes temp tables and subqueries. It uses the same core logic for grouping by 30-minute intervals, but when presenting the data in the result it simply reverses the interval calculation to get the rounded date & time.

In case you googled this but need to do it in MySQL, which was my case:
In MySQL you can do
GROUP BY
CONCAT(
DATE_FORMAT(`timestamp`,'%m-%d-%Y %H:'),
FLOOR(DATE_FORMAT(`timestamp`,'%i')/5)*5
)
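Applied to the question's table, the full query might look like this (a sketch; assumes MySQL and the table and column names from the question):
SELECT AVG(`value`) AS AvgValue,
       CONCAT(
           DATE_FORMAT(`sample_date`, '%m-%d-%Y %H:'),
           FLOOR(DATE_FORMAT(`sample_date`, '%i') / 5) * 5
       ) AS TimeFive  -- minutes below 10 render as e.g. '08:5'; buckets still group correctly
FROM DATA
WHERE id = 123
GROUP BY TimeFive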

In the new SQL Server 2022, you can use DATE_BUCKET, which rounds the value down to the nearest specified interval.
SELECT
DATE_BUCKET(minute, 5, d.sample_date) AS TimeFive,
AVG(d.value) AS AvgValue
FROM DATA d
WHERE d.id = 123
AND d.sample_date >= '20121203'
GROUP BY
DATE_BUCKET(minute, 5, d.sample_date);
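The same pattern covers the question's other intervals; only the bucket width and unit change:
DATE_BUCKET(minute, 15, d.sample_date)  -- 15-minute buckets
DATE_BUCKET(minute, 30, d.sample_date)  -- 30-minute buckets
DATE_BUCKET(hour, 1, d.sample_date)     -- 60-minute buckets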

You can use the following statement. It removes the seconds component, calculates the number of minutes past the last five-minute mark, and uses this to round down to the time block. This is ideal if you want to change your window: simply change the mod value.
select dateadd(minute, - datepart(minute, [YOURDATE]) % 5, dateadd(minute, datediff(minute, 0, [YOURDATE]), 0)) as [TimeBlock]
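To use it for the aggregation in the question, you might group on the same expression (a sketch using the question's table and columns):
SELECT DATEADD(minute, -DATEPART(minute, sample_date) % 5,
               DATEADD(minute, DATEDIFF(minute, 0, sample_date), 0)) AS TimeBlock,
       AVG(value) AS AvgValue
FROM DATA
WHERE id = 123
GROUP BY DATEADD(minute, -DATEPART(minute, sample_date) % 5,
                 DATEADD(minute, DATEDIFF(minute, 0, sample_date), 0))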

This will do exactly what you want.
Replace dt with your datetime column, c with your value field, and astro_transit1 with your table. 300 is the 5-minute gap in seconds; add 300 each time to increase the time gap. Note that ROUND rounds to the nearest 5-minute boundary; use FLOOR if you always want to round down.
SELECT FROM_UNIXTIME(300 * ROUND(UNIX_TIMESTAMP(r.dt) / 300)) AS `5datetime`,
(SELECT r.c
FROM astro_transit1 ra
WHERE ra.dt = r.dt
ORDER BY ra.dt DESC
LIMIT 1) AS first_val
FROM astro_transit1 r
GROUP BY UNIX_TIMESTAMP(r.dt) DIV 300
LIMIT 0, 30

Related

Retrieve data 60 days prior to their retest date

I have a requirement where I need to retrieve rows 60 days prior to their "Retest Date", which is a column present in the table (u_retest below).
reagentlotid   reagentlotdesc   u_retest
RL-0000004     NULL             2021-09-30 17:00:00.00
RL-0000005     NULL             2021-09-29 04:21:00.00
RL-0000006     NULL             2021-09-29 04:22:00.00
RL-0000007     Y-T4             2021-08-28 05:56:00.00
RL-0000008     NULL             2021-09-30 05:56:00.00
RL-0000009     NULL             2021-09-28 04:23:00.00
This is what I was trying to do in SQL Server:
select r.reagentlotid, r.reagentlotdesc, r.u_retestdt
from reagentlot r
where u_retestdt = DATEADD(DD,60,GETDATE());
But it didn't work; the above query returns 0 rows.
Could please someone help me with this query?
Use a range if you want all data from the day 60 days hence:
select r.reagentlotid, r.reagentlotdesc, r.u_retestdt
from reagentlot r
where
u_retestdt >= CAST(DATEADD(DD, 60, GETDATE()) AS DATE) AND
u_retestdt < CAST(DATEADD(DD, 61, GETDATE()) AS DATE)
Dates are like numbers; the time is like a decimal part. 12:00:00 is halfway through a day, so it's like x.5. SQL Server even lets you manipulate datetime types by adding fractions of days, etc. (adding 0.5 adds 12 hours).
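For instance (a quick sketch; this arithmetic works with the legacy datetime type, not datetime2):
SELECT GETDATE() + 0.5  -- half a day later, i.e. 12 hours from now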
If you had a column of numbers like 1.1, 1.5, 2.4 and you wanted all the one-point-somethings, you couldn't get any of them by saying score = 1; you'd say score >= 1 and score < 2.
Generally, you should try to avoid manipulating table data in a query's WHERE clause because it usually makes indexes unusable: if you want "all numbers between 1 and 2", use a range; don't chop the decimal off the table data in order to compare it to 1. Same with dates: don't chop the time off; use a range:
--yes
WHERE score >= 1 and score < 2
--no
WHERE CAST(score as INTEGER) = 1
--yes
WHERE birthdatetime >= '1970-01-01' and birthdatetime < '1970-01-02'
--no
WHERE CAST(birthdatetime as DATE) = '1970-01-01'
Note that I am using a CAST to cut the time off in my recommendation to you, but that's to establish a pair of constants, "midnight on the day 60 days in the future" and "midnight on the day 61 days in the future", that will be used in the range check.
Follow the rule of thumb of "avoid calling functions on columns in a where clause" and generally, you'll be fine :)
Try something like this. 60 days back may fall in the current or the previous year, so both the day-of-year and the year are taken from the shifted date. HTH
;with doy1 as (
select datepart(dayofyear, dateadd(day, -60, getdate())) as doy
, year(dateadd(day, -60, getdate())) as yr
)
select r.reagentlotid
, r.reagentlotdesc
, cast(r.u_retestdt as date) as u_retestdt
from reagentlot r
inner join doy1 d on datepart(dayofyear, r.u_retestdt) = d.doy
and year(r.u_retestdt) = d.yr

Data aggregation by sliding time periods

[Query and question edited and fixed thanks to comments from @Gordon Linoff and @shawnt00]
I recently inherited a SQL query that calculates the number of some events in time windows of 30 days from a log database. It uses a CTE (Common Table Expression) to generate 30-day ranges from '2019-01-01' to now, and then it counts the cases in those 30/60/90-day intervals. I am not sure this is the best method. All I know is that it takes a long time to run, and I do not understand 100% how exactly it works. So I am trying to rebuild it in an efficient way (maybe the way it is now is the most efficient, I do not know).
I have several questions:
One of the things I notice is that instead of using DATEDIFF the query simply subtracts a number of days from the date. Is that a good practice at all?
Is there a better way of doing the time comparisons?
Is there a better way to do the whole thing? The bottom line is: I need to aggregate data by number of occurrences in time periods of 30, 60 and 90 days.
Note: LogDate original format is like 2019-04-01 18:30:12.000.
DECLARE @dt1 Datetime = '2019-01-01'
DECLARE @dt2 Datetime = getDate();
WITH ctedaterange
AS (SELECT [Dates] = @dt1
UNION ALL
SELECT [dates] + 30
FROM ctedaterange
WHERE [dates] + 30 <= @dt2)
SELECT
[dates],
lt.Activity, COUNT(*) as Total,
SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 90 THEN 1 ELSE 0 END) AS Activity90days,
SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 60 THEN 1 ELSE 0 END) AS Activity60days,
SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 30 THEN 1 ELSE 0 END) AS Activity30days
FROM ctedaterange AS cte
JOIN (SELECT Activity, CONVERT(DATE, LogDate) as LogDate FROM LogTable) AS lt
ON cte.[dates] = lt.LogDate
group by [dates], lt.Activity
OPTION (maxrecursion 0)
Sample dataset (LogTable):
LogDate, Activity
2020-02-25 01:10:10.000, Activity01
2020-04-14 01:12:10.000, Activity02
2020-08-18 02:03:53.000, Activity02
2019-10-29 12:25:55.000, Activity01
2019-12-24 18:11:11.000, Activity03
2019-04-02 03:33:09.000, Activity01
Expected Output (the output does not reflect the data shown above, as I would need too many lines in the sample set for this post)
As I said above, the bottom line is: I need to aggregate data by number of occurrences in time periods of 30, 60 and 90 days.
Activity, Activity90days, Activity60days, Activity30days
Activity01, 3, 0, 1
Activity02, 1, 10, 2
Activity03, 5, 1, 3
Thank you for any suggestion.
SQL Server doesn't yet have the option to range over values of the window frame of an analytic function. Since you've generated all possible dates though and you've already got the counts by date, it's very easy to look back a specific number of (aggregated) rows to get the right totals. Here is my suggested expression for 90 days:
sum(count(LogDate)) over (
partition by Activity order by [dates]
rows between 89 preceding and current row
)
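A hedged sketch of how this could slot into the query, assuming the CTE is changed to step one day at a time and every Activity has a row for every date (gaps zero-filled), so 89 preceding rows plus the current row span exactly 90 days:
SELECT cte.[dates],
       lt.Activity,
       SUM(COUNT(lt.LogDate)) OVER (
           PARTITION BY lt.Activity ORDER BY cte.[dates]
           ROWS BETWEEN 89 PRECEDING AND CURRENT ROW) AS Activity90days
FROM ctedaterange AS cte
JOIN (SELECT Activity, CONVERT(DATE, LogDate) AS LogDate FROM LogTable) AS lt
    ON cte.[dates] = lt.LogDate
GROUP BY cte.[dates], lt.Activity
OPTION (MAXRECURSION 0)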

How to get count of average calls every 5 minutes using datetime sql

I am running the following query
select DateTime
from Calls
where DateTime > '17 Oct 2018 00:00:00.000' and
DialedNumberID = '1234'
What this would give me is a list of all the times that this number was dialled on the specific date.
Essentially what I am looking for is a query that would give me the average number of calls that take place every X minutes, and I would like to run the query for the whole year.
Thanks
I guess you have a table named Calls with the columns DateTime and DialedNumberID.
You can summarize the information in that table year-by-year using this kind of pattern:
SELECT YEAR(`DateTime`),
DialedNumberID,
COUNT(*) call_count
FROM Calls
GROUP BY YEAR(`DateTime`), DialedNumberID
The trick in this pattern is to GROUP BY f(date). The function f() reduces any date to the year in which it occurs.
To summarize by five-minute intervals, you need an f(date) that reduces datetimes to five-minute intervals. That function is a good deal more complex than YEAR():
DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') + INTERVAL (MINUTE(datestamp) - MINUTE(datestamp) MOD 5) MINUTE
Given, for example, 2001-09-11 08:43:00, this gives back 2001-09-11 08:40:00.
So, here's your summary by five minute intervals.
SELECT DATE_FORMAT(`DateTime`,'%Y-%m-%d %H:00') + INTERVAL (MINUTE(`DateTime`) - MINUTE(`DateTime`) MOD 5) MINUTE AS interval_beginning,
DialedNumberID,
COUNT(*) AS call_count
FROM Calls
GROUP BY DATE_FORMAT(`DateTime`,'%Y-%m-%d %H:00') + INTERVAL (MINUTE(`DateTime`) - MINUTE(`DateTime`) MOD 5) MINUTE,
DialedNumberID
You can make this query clearer and less repetitive by defining a stored function for that ugly DATE_FORMAT() expression. But that's a topic for another day.
Finally, append
WHERE YEAR(`DateTime`) = YEAR(NOW())
AND DialedNumberID = '1234'
to the query to filter by the current year and a particular id.
This query will need work to make it as efficient as possible. That too is a topic for another day.
Pro tip: DATETIME is a keyword in MySQL, and column names are generally case-insensitive. Avoid naming your columns, as in this case DateTime, the same as a keyword.
The average number of calls per interval is the number of calls (COUNT(*)) divided by the minutes between the start and end of the monitored period (TIMESTAMPDIFF(minute, period_start, period_end)), multiplied by the number of minutes in the desired interval (five in your example).
For MySQL:
select count(*) / timestampdiff(minute, date '2018-01-01', now()) * 5 as avg_calls
from calls
where `datetime` >= date '2018-01-01'
and dialednumberid = 1234;
For SQL Server:
select count(*) * 1.0 / datediff(minute, '20180101', getdate()) * 5 as avg_calls
from calls
where [datetime] >= '20180101'
and dialednumberid = 1234;
This forces the call time into 5-minute intervals. Use COUNT and GROUP BY on these intervals. Using DateTime as a column name is confusing.
SELECT DATEADD(MINUTE, CAST(DATEPART(MINUTE, [DateTime]) AS INTEGER) % 5 * -1, CAST(FORMAT([DateTime], 'MM/dd/yyyy HH:mm') AS DATETIME)) AS CallInterval, COUNT(*)
FROM Calls
GROUP BY DATEADD(MINUTE, CAST(DATEPART(MINUTE, [DateTime]) AS INTEGER) % 5 * -1, CAST(FORMAT([DateTime], 'MM/dd/yyyy HH:mm') AS DATETIME))

calculating average with grouping based on time intervals

In a Postgres table I store the speed of an object at a 10-second interval. Values are not available for every 10 seconds during the day, so there may be no row for, say, today 16:39:40.
What would the query look like to get a relation containing the average speed for 1-minute (or 30-second, or n-second) intervals for a given day, assuming the non-existing rows mean a speed of 0?
speed_table
id (int, pk)
ts (timestamp)
speed (numeric)
I've built this query but am getting stuck on some important parts:
SELECT
date_trunc('minute', ts) AS truncated,
avg(speed)
FROM speed_table AS t
WHERE ts >= '2014-06-21 00:00:00'
AND ts <= '2014-06-21 23:59:59'
AND condition2 = 'something'
GROUP BY date_trunc('minute', ts)
ORDER BY truncated
How can I alter the interval to something other than what the date_trunc function offers, e.g. 5 minutes or 30 seconds?
How can I add the missing rows for the remainder of the day?
Simple and fast solution for this particular example:
SELECT date_trunc('minute', ts) AS minute
, sum(speed)/6 AS avg_speed
FROM speed_table AS t
WHERE ts >= '2014-06-21 0:0'
AND ts < '2014-06-22 0:0' -- exclude dangling corner case
AND condition2 = 'something'
GROUP BY 1
ORDER BY 1;
You need to factor in missing rows as "0 speed". Since a minute has 6 samples, just sum and divide by 6. Missing rows evaluate to 0 implicitly.
This returns no row for minutes with no samples at all; avg_speed for those missing result rows would be 0.
General query for arbitrary intervals
Works for any interval listed in the manual for date_trunc():
SELECT date_trunc('minute', g.ts) AS ts_start
, avg(COALESCE(speed, 0)) AS avg_speed
FROM (SELECT generate_series('2014-06-21 0:0'::timestamp
, '2014-06-22 0:0'::timestamp
, '10 sec'::interval) AS ts) g
LEFT JOIN speed_table t USING (ts)
WHERE (t.condition2 = 'something' OR
t.condition2 IS NULL) -- depends on actual condition!
AND g.ts <> '2014-06-22 0:0'::timestamp -- exclude dangling corner case
GROUP BY 1
ORDER BY 1;
The problematic part is the additional unknown condition. You would need to define that. And decide whether missing rows supplied by generate_series should pass the test or not (which can be tricky!).
I let them pass in my example (and all other rows with a NULL values).
Compare:
PostgreSQL: running count of rows for a query 'by minute'
Arbitrary intervals:
Truncate timestamp to arbitrary intervals
For completely arbitrary intervals consider @Clodoaldo's math based on epoch values, or use the often overlooked function width_bucket(). Example:
Aggregating (x,y) coordinate point clouds in PostgreSQL
If you had provided some data it would have been possible to test, so this may contain errors. Point them out, including the error message, so I can fix them.
select
to_timestamp(
(extract(epoch from ts)::integer / (60 * 2)) * (60 * 2)
) as truncated,
avg(coalesce(speed, 0)) as avg_speed
from
generate_series (
'2014-06-21 00:00:00'::timestamp,
'2014-06-22'::timestamp - interval '1 second',
'10 seconds'
) ts (ts)
left join
speed_table t on ts.ts = t.ts and condition2 = 'something'
group by 1
order by 1
The example groups by 2 minutes: the timestamp is converted to the number of seconds since 1970-01-01 00:00:00 (the epoch) and integer-divided by 120 (60 * 2), then multiplied back. When you want to group by 5 minutes, divide by 300 (60 * 5) instead.
The generate_series in the example is generating timestamps at a 10-second interval. It is left outer joined to the speed table so it fills the gaps. When the speed is null, coalesce returns 0.
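For instance, the 5-minute version of the truncation expression would be:
to_timestamp(
    (extract(epoch from ts)::integer / (60 * 5)) * (60 * 5)
) as truncated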

SQL Query Count and Math Operations

I have a unique query request.
I run this query:
select * from documentationissues
where dateAdded is not null
and dateAdded >= '2013-10-09 10:37:15.483'
This will return however many rows have been inserted since the dateAdded timestamp. What I am trying to do is do all of my math in the query as well.
I need to figure out how many minutes have passed since the dateAdded timestamp.
I need to get a count of how many rows are returned.
I then need to figure out how many rows are being done on average per minute and then per hour.
And then, say there were 6,000,000 files to be done: how many days would it take to process all of the files at the average daily rate?
If I ran the query right now, it would return 2100 results as of today at 10:56:15 am.
So 19 minutes have passed, which is about 110 rows per minute and about 6600 per hour.
I'm not sure how to do all of the math in the select statement with grouping etc.
Here is another option that also includes all of the fields you were asking for:
SELECT M.RowsReturned, M.MinutesPassed,
M.RowsReturned / M.MinutesPassed AS AvgPerMinute,
M.RowsReturned / M.MinutesPassed * 60 AS AvgPerHour,
6000000 / (M.RowsReturned / M.MinutesPassed) / 1440 AS DaysToProcess
FROM (
SELECT COUNT(*) AS RowsReturned,
DATEDIFF(minute, '2013-10-09 10:37:15.483', CURRENT_TIMESTAMP) AS MinutesPassed
FROM documentationissues
WHERE dateAdded is NOT NULL
AND dateAdded >= '2013-10-09 10:37:15.483'
) AS M
Try this:
SELECT COUNT(*)/DATEDIFF(minute, '2013-10-09 10:37:15.483', GETDATE()) AS AvgPerMin,
COUNT(*)/DATEDIFF(minute, '2013-10-09 10:37:15.483', GETDATE()) * 60 AS AvgPerHr
from documentationissues
where dateAdded is not null
and dateAdded >= '2013-10-09 10:37:15.483'
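One caveat worth noting (my addition, not part of the answers above): COUNT(*) and DATEDIFF both return integers, so the divisions above truncate. To get fractional averages, force decimal arithmetic:
SELECT COUNT(*) * 1.0 / DATEDIFF(minute, '2013-10-09 10:37:15.483', GETDATE()) AS AvgPerMin,
       COUNT(*) * 60.0 / DATEDIFF(minute, '2013-10-09 10:37:15.483', GETDATE()) AS AvgPerHr
FROM documentationissues
WHERE dateAdded IS NOT NULL
  AND dateAdded >= '2013-10-09 10:37:15.483'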