Calculating the AVG value per GROUP in the GROUP BY Clause

I'm working on a query in SQL Server 2005 that looks at a table of recorded phone calls, groups them by the hour of the day, and computes the average wait time for each hour in the day.
I have a query that I think works, but I'm having trouble convincing myself it's right.
SELECT
DATEPART(HOUR, CallTime) AS Hour,
(AVG(calls.WaitDuration) / 60) AS WaitingTimesInMinutes
FROM (
SELECT
CallTime,
WaitDuration
FROM Calls
WHERE DATEADD(day, DATEDIFF(Day, 0, CallTime), 0) = DATEADD(day, DATEDIFF(Day, 0, GETDATE()), 0)
AND DATEPART(HOUR, CallTime) BETWEEN 6 AND 18
) AS calls
GROUP BY DATEPART(HOUR, CallTime)
ORDER BY DATEPART(HOUR, CallTime);
To clarify what I think is happening: this query looks at all calls made on the same day as today where the hour of the call is between 6 and 18. The times are recorded and SELECTed in 24-hour time, so this hour range is meant to get calls between 6am and 6pm.
Then, the outer query computes the average of the WaitDuration column (and converts seconds to minutes) and then groups each average by the hour.
What I'm uncertain of is this: are the reported BY HOUR averages only for the calls made in that hour's timeframe? Or does it compute each reported average using all the calls made on the day and between the hours? I know the AVG function has an optional OVER/PARTITION clause, and it's been a while since I used the AVG group function. What I would like is for each result grouped by an hour to show ONLY the average wait time for that specific hour of the day.
Thanks for your time on this.

The grouping happens on the values that get spit out of datepart(hour, ...). You're already filtering on that value so you know they're going to range between 6 and 18. That's all that the grouping is going to see.
Now of course the datepart() function does what you're looking for in that it looks at the clock and gives the hour component of the time. If you want your group to coincide with HH:00:00 to HH:59:59.997 then you're in luck.
I've already noted in comments that you probably meant to filter your range from 6 to 17 (BETWEEN 6 AND 18 also includes every call from 18:00 to 18:59), and that your query will probably perform better if you compare your raw CallTime value against a static range instead. Your reasoning looks correct to me. And because your reasoning is correct, you don't need the inner query (derived table) at all.
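Also, if WaitDuration is an integer, you're going to be doing integer division in your output, which truncates the minutes. You'd need to cast to decimal in that case or change the divisor to a decimal value like 60.00. Note that in SQL Server AVG over an int column also returns an int, so if you want to keep the fractional part of the average, cast before averaging, e.g. AVG(CAST(WaitDuration AS decimal(10, 2))) / 60.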
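A minimal sketch of the rewrite suggested above, using Python's built-in sqlite3 module as a stand-in for SQL Server. This is an assumption for illustration: SQLite has no DATEPART, so strftime('%H', ...) plays that role, and the Calls rows are made up. It drops the derived table, filters the raw CallTime against a fixed 06:00 to 18:00 window (so the 18:00 hour is excluded), and divides by 60.0 rather than 60.

```python
import sqlite3

# Hypothetical sample rows: (CallTime, WaitDuration in seconds).
ROWS = [
    ("2024-05-01 06:10:00", 60),
    ("2024-05-01 06:50:00", 90),   # hour 6: average 75 s
    ("2024-05-01 07:05:00", 30),   # hour 7: average 30 s
    ("2024-05-01 18:30:00", 600),  # 18:30 falls outside the 06:00-18:00 window
    ("2024-04-30 06:15:00", 900),  # a different day
]


def hourly_wait_minutes(conn, day):
    """Average wait in minutes per hour for calls on `day` from 06:00 up to 18:00."""
    sql = """
        SELECT CAST(strftime('%H', CallTime) AS INTEGER) AS Hour,
               AVG(WaitDuration) / 60.0 AS WaitingTimeInMinutes
        FROM Calls
        WHERE CallTime >= ? AND CallTime < ?  -- static range on the raw column
        GROUP BY strftime('%H', CallTime)
        ORDER BY Hour
    """
    # Dividing by 60.0 keeps the minutes fractional; SQLite's AVG already returns
    # a float, whereas SQL Server's AVG over an int column would return an int.
    return conn.execute(sql, (f"{day} 06:00:00", f"{day} 18:00:00")).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Calls (CallTime TEXT, WaitDuration INTEGER)")
conn.executemany("INSERT INTO Calls VALUES (?, ?)", ROWS)
print(hourly_wait_minutes(conn, "2024-05-01"))  # [(6, 1.25), (7, 0.5)]
```

Each hour's average only uses the rows that passed the WHERE clause and fell into that hour, which is the behaviour the question asks about.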

Yes, if you use the AVG function with a GROUP BY, only the items in that group are averaged, just as COUNT with a GROUP BY only counts the items in that group.
You can use windowing functions (OVER/PARTITION) to conceptually perform GROUP BYs on different criteria for a single function.
e.g.
AVG(zed) OVER (PARTITION BY DATEPART(YEAR, CallTime)) as YEAR_AVG
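To see the difference between GROUP BY and a windowed aggregate, here is a small sketch, again using Python's sqlite3 module with made-up rows (window functions need SQLite 3.25 or newer). GROUP BY collapses each year to one row, while AVG(...) OVER (PARTITION BY ...) keeps every row and attaches its year's average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Calls (CallTime TEXT, WaitDuration INTEGER)")
conn.executemany("INSERT INTO Calls VALUES (?, ?)", [
    ("2023-03-01 09:00:00", 10),
    ("2023-03-01 10:00:00", 20),
    ("2024-03-01 09:00:00", 40),
])

# GROUP BY collapses the rows: one output row per year.
grouped = conn.execute("""
    SELECT strftime('%Y', CallTime) AS Yr, AVG(WaitDuration)
    FROM Calls
    GROUP BY strftime('%Y', CallTime)
    ORDER BY Yr
""").fetchall()

# OVER (PARTITION BY ...) keeps every row and adds its partition's average.
windowed = conn.execute("""
    SELECT CallTime, WaitDuration,
           AVG(WaitDuration) OVER (PARTITION BY strftime('%Y', CallTime)) AS YEAR_AVG
    FROM Calls
    ORDER BY CallTime
""").fetchall()

print(grouped)   # [('2023', 15.0), ('2024', 40.0)]
print(windowed)  # three rows; the 2023 rows carry 15.0, the 2024 row 40.0
```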

Are the reported BY HOUR averages only for the calls made in that hour's timeframe?
Yes. The WHERE clause is applied before the grouping and aggregation, so each aggregate is computed only over the records that pass the WHERE clause and fall into that group.

Related

How can I output values for time intervals with no data in QuestDB

I am using QuestDB to get the number of events we are receiving every 500 milliseconds. Everything works as expected and I can use SAMPLE BY 500T to aggregate in half-second intervals.
However, for the intervals where we don't have any data, we are not getting any rows. I guess this is expected, but it would be good to have some way of getting a row for those intervals just with null or empty values.
Luckily in QuestDB you have the FILL keyword to do exactly that. Take this query running at the public QuestDB demo:
SELECT
timestamp, count()
FROM trades
WHERE timestamp > dateadd('d', -1, now())
SAMPLE BY 500T ALIGN TO CALENDAR;
In this case I am aggregating every 500 milliseconds and getting results only for the intervals where I have data. I am limiting to only the past day. You can run this on the demo site as it is a live dataset and you should see gaps for some intervals.
Now, by using FILL, I can add rows for the periods with no values:
SELECT
timestamp, count()
FROM trades
WHERE timestamp > dateadd('d', -1, now())
SAMPLE BY 500T FILL(NULL) ALIGN TO CALENDAR;
Note that you could also fill with LINEAR (linear interpolation of previous and next rows), PREV for the value of the row before, or with a constant value.
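To make the fill modes concrete, here is a plain-Python sketch of what SAMPLE BY with FILL conceptually produces for a count. It is not QuestDB's implementation: the function name sample_counts and its fill values are hypothetical, timestamps are integer milliseconds, and buckets are aligned to multiples of the step as a simplification of ALIGN TO CALENDAR.

```python
from collections import Counter


def sample_counts(ts_ms, step_ms=500, fill="null", const=0):
    """Count events per step_ms bucket, with one row for every bucket in range.

    fill: "null" (None for empty buckets), "prev", "linear" or "const".
    """
    if not ts_ms:
        return []
    counts = Counter(t - t % step_ms for t in ts_ms)  # bucket start -> event count
    buckets = range(min(counts), max(counts) + step_ms, step_ms)
    rows = [(b, counts.get(b)) for b in buckets]  # None marks an empty bucket
    if fill == "null":
        return rows
    if fill == "const":
        return [(b, const if v is None else v) for b, v in rows]
    if fill == "prev":
        out, prev = [], None
        for b, v in rows:
            prev = prev if v is None else v
            out.append((b, prev))
        return out
    if fill == "linear":
        known = [(b, v) for b, v in rows if v is not None]
        out = []
        for b, v in rows:
            if v is None:
                # First and last buckets always have data, so both neighbours exist.
                lo_b, lo_v = max((k for k in known if k[0] < b), key=lambda k: k[0])
                hi_b, hi_v = min((k for k in known if k[0] > b), key=lambda k: k[0])
                v = lo_v + (hi_v - lo_v) * (b - lo_b) / (hi_b - lo_b)
            out.append((b, v))
        return out
    raise ValueError(f"unknown fill mode: {fill}")


print(sample_counts([0, 100, 1600]))
# [(0, 2), (500, None), (1000, None), (1500, 1)]
print(sample_counts([0, 100, 1600], fill="prev"))
# [(0, 2), (500, 2), (1000, 2), (1500, 1)]
```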

Why is the result of datediff year in Firebird too high?

I have a question about the datediff function in Firebird. When I try to diff two dates such as 15.12.1999 and 30.06.2000 in SQL like this:
SELECT
SUM(datediff (YEAR, W.FROM, W.TO)),
SUM(datediff (MONTH, W.FROM, W.TO)),
SUM(datediff (DAY, W.FROM, W.TO))
FROM WORKERS W
WHERE W.ID=1
I get 1 year, 6 months and 198 days as the result, but the year value is wrong (of course the result should be 0). How do I have to write my query to get the correct number of years? The documentation at https://firebirdsql.org/refdocs/langrefupd21-intfunc-datediff.html mentions this case, but it does not say how to solve the problem.
The documentation's wording is not very clear, but I'm pretty sure that datediff() counts the number of boundaries crossed between the two dates. (This is how the very similar function in SQL Server works.) Hence, for YEAR, it counts the number of "Dec 31st/Jan 1st" boundaries, and 15.12.1999 to 30.06.2000 crosses exactly one. That is the case the linked documentation describes.
If you want a more accurate count, you can use a smaller unit and convert it. The following is pretty close:
(datediff(day, w.from, w.to) / 365.25) as years_diff
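A plain-Python model of the boundary counting described above, applied to the two dates from the question and assuming datediff() behaves as the answer describes. full_years is a swapped-in alternative (a hypothetical helper, not a Firebird function) that counts only completed years by comparing month and day.

```python
from datetime import date


def boundaries_crossed(unit, start, end):
    """Count unit boundaries between two dates, as datediff() is described above."""
    if unit == "year":
        return end.year - start.year
    if unit == "month":
        return (end.year - start.year) * 12 + (end.month - start.month)
    if unit == "day":
        return (end - start).days
    raise ValueError(unit)


def full_years(start, end):
    """Completed years between start and end (0 until the anniversary is reached)."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


start, end = date(1999, 12, 15), date(2000, 6, 30)
print(boundaries_crossed("year", start, end))   # 1
print(boundaries_crossed("month", start, end))  # 6
print(boundaries_crossed("day", start, end))    # 198
print(round(boundaries_crossed("day", start, end) / 365.25, 3))  # 0.542
print(full_years(start, end))  # 0
```

The day-based division gives a fractional approximation, while the anniversary comparison gives whole completed years; which one fits depends on what "years" should mean in the report.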

SQL Average of total days in DATA per month

I have a SQL question.
I am trying to find the average injection volume per month. Currently my code takes the sum of all days of injection, and divides them by the TOTAL DAYS in the month.
Sum(W1."INJECTION_VOLUME" /
EXTRACT(DAY FROM LAST_DAY(W1."INJECTION_DATE"))) AS "AVGINJ"
This is not what I wanted.
I need to take the injection_volume and divide by the total number of days in the DATA.
i.e. right now the data has only 8 days of injection volume; let's say the total is 3000.
So right now the SQL computes 3000/31.
I need it to be 3000/8 (the total days in the data for the current month).
Also, this should only be for the current month. All other completed months should be divided by the total days in the month.
Use
SELECT
SUM(W1.INJECTION_VOLUME) / COUNT(DISTINCT MyDateField)
FROM MyTable W1
WHERE X=Value
This gives you what you're after:
SUM(W1.INJECTION_VOLUME) is the total volume for the dataset.
COUNT(DISTINCT MyDateField) gives you the number of days, no matter how many records you have.
So if you have 100 records but only 5 actual unique days in this time, this expression gives you 5
Note that this kind of calc is normally worked out with
SUM(A) / SUM(B)
not
SUM(A/B)
They give you completely different answers.
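A small sqlite3 sketch of both points, with a hypothetical table and values: dividing the total by the number of distinct days, and how SUM(A) / SUM(B) differs from SUM(A / B). REAL columns are used so neither division is integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (INJECTION_DATE TEXT, INJECTION_VOLUME REAL)")
# Hypothetical data: three records, but only two distinct days.
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", [
    ("2024-05-01", 100.0),
    ("2024-05-01", 200.0),
    ("2024-05-02", 300.0),
])
per_day = conn.execute(
    "SELECT SUM(INJECTION_VOLUME) / COUNT(DISTINCT INJECTION_DATE) FROM MyTable"
).fetchone()[0]
print(per_day)  # 300.0 (600 over 2 days, not over 3 records)

# SUM(A) / SUM(B) versus SUM(A / B) on the same two rows.
conn.execute("CREATE TABLE T (A REAL, B REAL)")
conn.executemany("INSERT INTO T VALUES (?, ?)", [(1.0, 1.0), (9.0, 3.0)])
ratio_of_sums, sum_of_ratios = conn.execute(
    "SELECT SUM(A) / SUM(B), SUM(A / B) FROM T"
).fetchone()
print(ratio_of_sums, sum_of_ratios)  # 2.5 4.0
```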
In order to get the average of the data for the current month, you will need to divide by the number of days with data in that month. Aggregates can't be nested inside SUM(), so divide one aggregate by the other:
SUM(`W1`.`INJECTION_VOLUME`) / COUNT(DISTINCT DATE(`W1`.`INJECTION_DATE`))
To treat all other months as full months, you'll need to combine this with your original code:
SUM(`W1`.`INJECTION_VOLUME` / EXTRACT(DAY FROM LAST_DAY(`W1`.`INJECTION_DATE`)))
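Do that with an IF and group by month, so something like this (replace MyTable with your table):
SELECT
EXTRACT(YEAR_MONTH FROM `W1`.`INJECTION_DATE`) AS `INJ_MONTH`,
SUM(`W1`.`INJECTION_VOLUME`) / IF(
MAX(EXTRACT(YEAR_MONTH FROM `W1`.`INJECTION_DATE`)) = EXTRACT(YEAR_MONTH FROM NOW()),
COUNT(DISTINCT DATE(`W1`.`INJECTION_DATE`)),
MAX(EXTRACT(DAY FROM LAST_DAY(`W1`.`INJECTION_DATE`)))
) AS `AVGINJ`
FROM `MyTable` `W1`
GROUP BY EXTRACT(YEAR_MONTH FROM `W1`.`INJECTION_DATE`)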
Note: this is untested and I'm not sure which RDBMS you are using, so you may need to change the code slightly to make it work.
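A runnable sketch of the current-month rule, using Python's sqlite3 module instead of MySQL. SQLite has no LAST_DAY or EXTRACT, so the month length comes from strftime with the 'start of month', '+1 month', '-1 day' modifiers, and CASE replaces IF. The table name injections is hypothetical, dates are assumed to be stored as 'YYYY-MM-DD', and the current month is passed in as a parameter rather than read from NOW() so the result is deterministic.

```python
import sqlite3


def monthly_avg_injection(conn, current_month):
    """Average daily injection volume per month ('YYYY-MM').

    The current month is divided by the number of distinct days that have data;
    every other month is divided by its calendar length.
    """
    sql = """
        SELECT strftime('%Y-%m', W1.INJECTION_DATE) AS ym,
               SUM(W1.INJECTION_VOLUME) * 1.0 /
               CASE WHEN MAX(strftime('%Y-%m', W1.INJECTION_DATE)) = ?
                    THEN COUNT(DISTINCT W1.INJECTION_DATE)
                    -- last day of the month: first of next month minus one day
                    ELSE MAX(CAST(strftime('%d', W1.INJECTION_DATE, 'start of month',
                                           '+1 month', '-1 day') AS INTEGER))
               END AS AVGINJ
        FROM injections AS W1
        GROUP BY strftime('%Y-%m', W1.INJECTION_DATE)
        ORDER BY ym
    """
    return conn.execute(sql, (current_month,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE injections (INJECTION_DATE TEXT, INJECTION_VOLUME REAL)")
rows = [("2024-04-01", 300.0), ("2024-04-15", 300.0)]       # a completed month
rows += [(f"2024-05-{d:02d}", 375.0) for d in range(1, 9)]  # 8 days, 3000 total
conn.executemany("INSERT INTO injections VALUES (?, ?)", rows)
print(monthly_avg_injection(conn, "2024-05"))  # [('2024-04', 20.0), ('2024-05', 375.0)]
```

With "2024-05" as the current month, May is 3000 / 8; once May is no longer current it falls back to 3000 / 31, which is the behaviour the question describes.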

SQL Select statement Where time is *:00

I'm attempting to make a filtered table based off an existing table. The current table has rows for every minute of every hour of 24 days based off of locations (tmcs).
I want to filter this table into another table that has just one row per hour for each of the 24 days, per location (tmc).
Here is the SQL statement that I thought would do it...
SELECT
Time_Format(t.time, '%H:00') as time, ROUND(AVG(t.avg), 0) as avg,
tmc, Date, Date_Time FROM traffic t
GROUP BY time, tmc, Date
The problem is that I still get about 247,000 rows affected, and according to simple math I should only have:
Locations (TMCS): 14
Hours in a day: 24
Days tracked: 24
Total = 14 * 24 * 24 = 12,096
My original table has 477,277 rows
When I make a new table from this query I get around 247,000 rows, which makes no sense, so my query must be wrong.
The reason I used this method instead of a WHERE clause is that I wanted to find the average speed (avg) per hour. This is not mandatory, so I'd be fine with using a WHERE clause on time, but I just don't know how to filter on *:00.
Any help would be much appreciated
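Fix the GROUP BY so it's standard, rather than relying on MySQL's nonstandard extension. Part of the problem is that, for GROUP BY, MySQL looks up an unqualified name among the table's columns before the SELECT aliases, so GROUP BY time groups by the raw per-minute t.time column rather than by your hourly alias. Group by the expression itself: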
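SELECT
Time_Format(t.time, '%H:00') as time,
ROUND(AVG(t.avg), 0) as avg,
tmc, Date
FROM traffic t
GROUP BY
Time_Format(t.time, '%H:00'), tmc, Date
Date_Time is left out here: if it holds the full per-minute timestamp, grouping by it would split every hour back into one group per minute.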
Run your queries after SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY'; to see the errors that other RDBMSs would give you. In that mode MySQL rejects a nonstandard GROUP BY instead of silently picking values for the ungrouped columns.
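A sketch of the hourly grouping using Python's sqlite3 module in place of MySQL: strftime('%H:00', ...) stands in for Time_Format, and the table contents (two hypothetical tmc codes, two days, three hours of per-minute readings) are made up. Grouping by the hour expression gives one row per hour, tmc and day, while grouping by the raw per-minute time keeps one group per minute. Date_Time is left out of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (tmc TEXT, Date TEXT, time TEXT, avg REAL)")
rows = [
    (tmc, day, f"{hour:02d}:{minute:02d}:00", 40.0 + hour)
    for tmc in ("tmc_a", "tmc_b")  # hypothetical location codes
    for day in ("2024-05-01", "2024-05-02")
    for hour in (7, 8, 9)
    for minute in range(60)
]
conn.executemany("INSERT INTO traffic VALUES (?, ?, ?, ?)", rows)

# Group by the hour expression: 2 tmcs * 2 days * 3 hours = 12 rows.
hourly = conn.execute("""
    SELECT strftime('%H:00', t.time) AS hour, ROUND(AVG(t.avg), 0) AS avg_speed,
           t.tmc, t.Date
    FROM traffic t
    GROUP BY strftime('%H:00', t.time), t.tmc, t.Date
    ORDER BY t.tmc, t.Date, hour
""").fetchall()

# Grouping by the raw per-minute column leaves one group per minute instead.
per_minute = conn.execute(
    "SELECT COUNT(*) FROM (SELECT 1 FROM traffic t GROUP BY t.time, t.tmc, t.Date)"
).fetchone()[0]

print(len(hourly), hourly[0])  # 12 ('07:00', 47.0, 'tmc_a', '2024-05-01')
print(per_minute)              # 720
```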

Query to find a weekly average

I have an SQLite database with the following fields for example:
date (yyyymmdd format)
total (0.00 format)
There are typically 2 months of records in the database. Does anyone know an SQL query to find a weekly average?
I could easily just execute:
SELECT COUNT(1) as total_records, SUM(total) as total FROM stats_adsense
Then just divide the total by 7, but unless the number of days in the db is exactly divisible by 7, I don't think it will be very accurate, especially if there are fewer than 7 days of records.
To get a daily summary it's obviously just total / total_records.
Can anyone help me out with this?
You could try something like this:
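SELECT strftime('%Y-%W', substr(date, 1, 4) || '-' || substr(date, 5, 2) || '-' || substr(date, 7, 2)) AS theweek, avg(total) AS theaverage
FROM stats_adsense GROUP BY theweek
(strftime() only understands dates such as YYYY-MM-DD, so the yyyymmdd value is rebuilt first, and %Y keeps week numbers from different years apart.)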
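A runnable sketch of week-based grouping in SQLite, assuming the date column holds yyyymmdd text as in the question; the rows are made up. SQLite's date functions only understand formats such as YYYY-MM-DD, so the value is rebuilt with substr() first. Prefixing %Y stops week 01 of one year from merging with week 01 of another, though a week that straddles 31 December still ends up split in two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats_adsense (date TEXT, total REAL)")
conn.executemany("INSERT INTO stats_adsense VALUES (?, ?)", [
    ("20240101", 10.0),  # Monday, week 01 of 2024
    ("20240102", 20.0),
    ("20240108", 30.0),  # Monday, week 02
])

# Rebuild an ISO 'YYYY-MM-DD' string, which SQLite's date functions understand.
ISO = "substr(date, 1, 4) || '-' || substr(date, 5, 2) || '-' || substr(date, 7, 2)"
WEEK = f"strftime('%Y-%W', {ISO})"  # %W: week of year, weeks start on Monday

weekly = conn.execute(f"""
    SELECT {WEEK} AS theweek,
           AVG(total) AS theaverage,
           SUM(total) AS weektotal
    FROM stats_adsense
    GROUP BY {WEEK}
    ORDER BY theweek
""").fetchall()
print(weekly)  # [('2024-01', 15.0, 30.0), ('2024-02', 30.0, 30.0)]
```

AVG gives the average daily total within each week; SUM gives each week's total if that is the figure you want to average across weeks.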
I'm not sure how the syntax would work in SQLite, but one way would be to parse out the date parts of each [date] field, specify which WEEK and DAY boundaries you want in your WHERE clause, and then GROUP BY the week. This will give you a true average regardless of whether there are rows for every day or not.
Something like this (using T-SQL):
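SELECT DATEPART(year, theDate) AS Yr, DATEPART(wk, theDate) AS Wk, Avg(theAmount) as Average
FROM [Table]
GROUP BY DATEPART(year, theDate), DATEPART(wk, theDate)
(In DATEPART, w is the abbreviation for weekday, not week, so use wk or week here; grouping by the year as well keeps weeks from different years apart.)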
This will return a row for every week. You could filter it in your WHERE clause to restrict it to a given date range.
Hope this helps.
Your weekly average is your daily average times 7:
daily * 7
Obviously this doesn't take into account specific weeks, but you can get that by narrowing the result set to a date range.
You'll have to leave out of the sum those records which don't belong to a full week. So, before summing up, find the min and max of the dates, adjust them so that they form "whole" weeks, and then run your original query with a WHERE that limits the date values to the new range. Maybe you can even put all of this into one query. I'll leave that up to you. ;-)
The values that get "truncated" are not used then, obviously. If there aren't enough values for a whole week, there's no result at all. But there's no way around that, apparently.
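The trimming step can be sketched in plain Python. This assumes weeks run Monday to Sunday and that the data arrives as a hypothetical mapping of date to daily total; days missing inside the trimmed range simply contribute nothing to the sum.

```python
from datetime import date, timedelta


def full_week_average(daily):
    """Average weekly total over the complete Monday-to-Sunday weeks in `daily`.

    `daily` maps datetime.date -> daily total. Days before the first Monday and
    after the last Sunday are dropped; returns None if no full week remains.
    """
    if not daily:
        return None
    first, last = min(daily), max(daily)
    start = first + timedelta(days=(7 - first.weekday()) % 7)  # first Monday on/after `first`
    end = last - timedelta(days=(last.weekday() + 1) % 7)     # last Sunday on/before `last`
    if end < start:
        return None
    weeks = ((end - start).days + 1) // 7
    total = sum(v for d, v in daily.items() if start <= d <= end)
    return total / weeks


# Wednesday 2024-01-03 through Sunday 2024-01-21, one unit per day: the five days
# before Monday 2024-01-08 are trimmed, leaving two full weeks of 7 each.
daily = {date(2024, 1, 3) + timedelta(days=i): 1.0 for i in range(19)}
print(full_week_average(daily))  # 7.0
```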