I have a SQL question.
I am trying to find the average injection volume per month. Currently my code takes the sum of all days of injection, and divides them by the TOTAL DAYS in the month.
Sum(W1."INJECTION_VOLUME" /
EXTRACT(DAY FROM LAST_DAY(W1."INJECTION_DATE"))) AS "AVGINJ"
This is not what I want.
I need to divide the injection volume by the number of days that are actually in the DATA.
E.g. right now the data has only 8 days of injection volume for the current month; let's say the total is 3000.
So right now the SQL computes 3000/31.
I need it to be 3000/8 (the total days in the data for the current month).
Also, this should only apply to the current month. All other, completed months should be divided by the total days in the month.
Use
SELECT
SUM(W1.INJECTION_VOLUME) / COUNT(DISTINCT MyDateField)
FROM MyTable
WHERE X=Value
This gives you what you're after.
SUM(W1.INJECTION_VOLUME) is the total volume for the dataset.
COUNT(DISTINCT MyDateField)
gives you the number of days, no matter how many records you have.
So if you have 100 records but only 5 unique days in that period, this expression gives you 5.
Note that this kind of calc is normally worked out with
SUM(A) / SUM(B)
not
SUM(A/B)
They give you completely different answers.
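To make both points concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is used only because it runs anywhere; the `injections` and `pairs` tables and all values are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical table: three records, but only two distinct days.
con.execute("CREATE TABLE injections (injection_date TEXT, injection_volume REAL)")
con.executemany(
    "INSERT INTO injections VALUES (?, ?)",
    [("2024-01-01", 100.0), ("2024-01-01", 200.0), ("2024-01-02", 300.0)],
)

# Total volume divided by the number of distinct days in the data.
avg_per_day = con.execute(
    "SELECT SUM(injection_volume) / COUNT(DISTINCT injection_date) FROM injections"
).fetchone()[0]

# SUM(A) / SUM(B) versus SUM(A / B) on two made-up rows.
con.execute("CREATE TABLE pairs (a REAL, b REAL)")
con.executemany("INSERT INTO pairs VALUES (?, ?)", [(10.0, 2.0), (20.0, 8.0)])
ratio_of_sums, sum_of_ratios = con.execute(
    "SELECT SUM(a) / SUM(b), SUM(a / b) FROM pairs"
).fetchone()

print(avg_per_day, ratio_of_sums, sum_of_ratios)
```

The first query divides 600 by 2 days rather than by 3 records, and the second shows the two forms disagreeing (30/10 against 10/2 + 20/8).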
To get the average of the data for the current month, divide the month's total by the number of days that actually have data:
SUM(`W1`.`INJECTION_VOLUME`) / COUNT(DISTINCT `W1`.`INJECTION_DATE`)
To get all other months as the full month, divide the total by the number of days in the month:
SUM(`W1`.`INJECTION_VOLUME`) / EXTRACT(DAY FROM LAST_DAY(MAX(`W1`.`INJECTION_DATE`)))
Combine the two with an IF, grouping the rows by month. An aggregate cannot be nested inside SUM, so the IF picks the divisor instead. Something like this:
SUM(`W1`.`INJECTION_VOLUME`) /
IF(
EXTRACT(YEAR_MONTH FROM MAX(`W1`.`INJECTION_DATE`)) = EXTRACT(YEAR_MONTH FROM NOW()),
COUNT(DISTINCT `W1`.`INJECTION_DATE`),
EXTRACT(DAY FROM LAST_DAY(MAX(`W1`.`INJECTION_DATE`)))
)
Note: this is untested and I'm not sure which RDBMS you are using, so you may need to change the code slightly to make it work.
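As a rough, runnable illustration of the same idea, the sketch below uses SQLite through Python's sqlite3 module instead of the asker's database. The `injections` table, the `monthly_averages` helper and the sample values are assumptions, and the current month is passed in as a parameter rather than read from the clock so the result is reproducible.

```python
import sqlite3

def monthly_averages(rows, current_month):
    """Return {'YYYY-MM': average}. The current month is divided by the days
    present in the data, every other month by its calendar length."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE injections (injection_date TEXT, injection_volume REAL)")
    con.executemany("INSERT INTO injections VALUES (?, ?)", rows)
    query = """
        SELECT strftime('%Y-%m', injection_date) AS month,
               SUM(injection_volume) /
               CASE
                   WHEN strftime('%Y-%m', MAX(injection_date)) = :current
                       THEN COUNT(DISTINCT injection_date)
                   -- last day of the month = start of month + 1 month - 1 day
                   ELSE CAST(strftime('%d', MAX(injection_date),
                                      'start of month', '+1 month', '-1 day') AS INTEGER)
               END AS avg_inj
        FROM injections
        GROUP BY strftime('%Y-%m', injection_date)
        ORDER BY month
    """
    result = dict(con.execute(query, {"current": current_month}).fetchall())
    con.close()
    return result

# Made-up data: a completed January (total 310) and 8 days of February (total 3000).
sample = [("2024-01-10", 155.0), ("2024-01-20", 155.0)]
sample += [(f"2024-02-{day:02d}", 375.0) for day in range(1, 9)]
averages = monthly_averages(sample, current_month="2024-02")
print(averages)
```

With February treated as the current month, its 3000 is divided by 8, while January's 310 is divided by 31.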
Related
I want to calculate the difference in percent for the number of visitors in the last 4 weeks (week on week) for a restaurant.
My code lets me group days into weeks and sum the number of visitors into each week, then I used lag and over to try and get the percent difference but it's giving me rubbish for that column.
Here's my code
SELECT
to_char(visit_date, 'IW') AS weeks, SUM(reserve_visitors) AS total_visitors,
((SUM(reserve_visitors)/lag(SUM(reserve_visitors), 1) OVER (ORDER BY to_char(visit_date, 'IW'))) -1) * 100 AS percentage_change
FROM res_visitors
WHERE visit_date BETWEEN '02/01/2017' AND '28/05/2017'
GROUP BY weeks
ORDER BY weeks DESC
LIMIT 4
This is what I get
Does anyone know where the error might be?
Ideally I'd like to have a percentage which shows how much the number of visitors grew/shrank from one week to the next one
Thanks in advance and sorry if it might seem trivial to most of you here, I've gone around trying to figure it out but I just can't seem to find it
Turns out the data type was wrong: I was dividing integers, and integer division truncates, so any ratio under 1 becomes 0 and any ratio from 1 up to 2 becomes 1.
I multiplied SUM(reserve_visitors) by 1.0 to force non-integer division and the problem was solved.
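The effect is easy to reproduce. The sketch below uses SQLite via Python's sqlite3 module, which also truncates integer division; the weekly totals (80, then 100 or 60 visitors) are made-up numbers.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Growth from 80 to 100 visitors: integer division versus the * 1.0 fix.
growth_truncated, growth_exact = con.execute(
    "SELECT (100 / 80 - 1) * 100, (100 * 1.0 / 80 - 1) * 100"
).fetchone()

# Drop from 80 to 60 visitors.
drop_truncated, drop_exact = con.execute(
    "SELECT (60 / 80 - 1) * 100, (60 * 1.0 / 80 - 1) * 100"
).fetchone()

print(growth_truncated, growth_exact, drop_truncated, drop_exact)
```

With integer operands every increase below 100% shows as 0 and every decrease shows as -100, which matches the "rubbish" column described in the question.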
I have a question about a rolling distinct count. I'm trying to calculate a distinct count over the latest 30 weeks (210 days) counted back from a specific date (e.g. for the specific date 18-02-2019, the distinct count from 23-07-2018 onward).
I've found a blog post where this is explained, https://radacad.com/datesinperiod-vs-datesbetween-dax-time-intelligence-for-power-bi. But for some reason my calculation is not working.
My DAX expression:
Aantal mutaties afgelopen 30 weken:=
CALCULATE(
DISTINCTCOUNT(FCT_KlantReis_Mutatie[Mutatie]);
DATESINPERIOD(
FCT_KlantReis_Mutatie[Mutatiedatum];
LASTDATE(FCT_KlantReis_Mutatie[Mutatiedatum]) ;-210;DAY)
)
But in Excel (and PowerBI) I get the following result:
The table is linked to a date dimension. My guess is that it must be possible, but how...
Thanks in advance for the help.
I'm working on a query in SQL Server 2005 that looks at a table of recorded phone calls, groups them by the hour of the day, and computes the average wait time for each hour in the day.
I have a query that I think works, but I'm having trouble convincing myself it's right.
SELECT
DATEPART(HOUR, CallTime) AS Hour,
(AVG(calls.WaitDuration) / 60) AS WaitingTimesInMinutes
FROM (
SELECT
CallTime,
WaitDuration
FROM Calls
WHERE DATEADD(day, DATEDIFF(Day, 0, CallTime), 0) = DATEADD(day, DATEDIFF(Day, 0, GETDATE()), 0)
AND DATEPART(HOUR, CallTime) BETWEEN 6 AND 18
) AS calls
GROUP BY DATEPART(HOUR, CallTime)
ORDER BY DATEPART(HOUR, CallTime);
To clarify what I think is happening, this query looks at all calls made on the same day as today, and where the hour of the call is between 6 and 18 -- the times are recorded and SELECTed in 24-hour time, so this between hours is to get calls between 6am and 6pm.
Then, the outer query computes the average of the WaitDuration column (and converts seconds to minutes) and then groups each average by the hour.
What I'm uncertain of is this: Are the reported BY HOUR averages only for the calls made in that hour's timeframe? Or does it compute each reported average using all the calls made on the day and between the hours? I know the AVG function has an optional OVER/PARTITION clause, and it's been a while since I used the AVG group function. What I would like is that each result grouped by an hour shows ONLY the average wait time for that specific hour of the day.
Thanks for your time in this.
The grouping happens on the values that get spit out of datepart(hour, ...). You're already filtering on that value so you know they're going to range between 6 and 18. That's all that the grouping is going to see.
Now of course the datepart() function does what you're looking for in that it looks at the clock and gives the hour component of the time. If you want your group to coincide with HH:00:00 to HH:59:59.997 then you're in luck.
I've already noted in comments that you probably meant to filter your range from 6 to 17 and that your query will probably perform better if you change that and compare your raw CallTime value against a static range instead. Your reasoning looks correct to me. And because your reasoning is correct, you don't need the inner query (derived table) at all.
Also, if WaitDuration is an integer then you're going to be doing integer division in your output, which truncates the result. You'd need to cast to decimal in that case, or change the divisor to a decimal value like 60.00.
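A minimal sketch of the simplified query, run on SQLite through Python's sqlite3 module rather than SQL Server: no derived table, a static half-open time range on the raw column, and a decimal divisor. The `calls` table layout, the fixed day and the sample rows are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE calls (call_time TEXT, wait_duration INTEGER)")
con.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [
        ("2024-03-05 05:59:00", 999),  # before 06:00, filtered out
        ("2024-03-05 06:10:00", 60),
        ("2024-03-05 06:50:00", 120),
        ("2024-03-05 17:30:00", 30),
        ("2024-03-05 18:00:00", 999),  # 18:00 is outside the half-open range
        ("2024-03-06 09:00:00", 999),  # a different day
    ],
)

hourly = con.execute(
    """
    SELECT CAST(strftime('%H', call_time) AS INTEGER) AS hour,
           AVG(wait_duration) / 60.0 AS wait_minutes
    FROM calls
    WHERE call_time >= :day || ' 06:00:00'
      AND call_time <  :day || ' 18:00:00'
    GROUP BY hour
    ORDER BY hour
    """,
    {"day": "2024-03-05"},
).fetchall()

print(hourly)
```

Each output row averages only the calls that fall in that hour: the two 06:xx calls produce one figure and the single 17:30 call another.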
Yes, if you use the AVG function with a GROUP BY, only the items in that group are averaged. Just like if you use the COUNT function with a GROUP BY, only the items in that group are counted.
You can use windowing functions (OVER/PARTITION) to conceptually perform GROUP BYs on different criteria for a single function.
eg
AVG(zed) OVER (PARTITION BY DATEPART(YEAR, CallTime)) as YEAR_AVG
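Here is a small sketch of that windowed form, again on SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer). The `calls` table and its values are hypothetical, and `strftime('%Y', ...)` stands in for DATEPART(YEAR, ...).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE calls (call_time TEXT, wait_duration REAL)")
con.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [
        ("2023-05-01 10:00:00", 10.0),
        ("2023-06-01 11:00:00", 30.0),
        ("2024-01-15 10:00:00", 50.0),
    ],
)

# Every row is kept; the yearly average is repeated on each row of that year.
windowed = con.execute(
    """
    SELECT call_time,
           wait_duration,
           AVG(wait_duration) OVER (PARTITION BY strftime('%Y', call_time)) AS year_avg
    FROM calls
    ORDER BY call_time
    """
).fetchall()

for row in windowed:
    print(row)
```

Unlike GROUP BY, the window version does not collapse rows, so a per-year average can sit next to each individual call.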
Are the reported BY HOUR averages only for the calls made in that hour's timeframe?
Yes. The WHERE clause is applied before the grouping and aggregation, so each average is computed only over the records that pass the WHERE clause and fall within that group.
I need to write a SQL query in SQL Server, and I hope I can explain what I am after. I have several years of data. Here is a sample of some of the database.
What I am wanting to do is get the total number of Bovine for values New_Zealand per week.
So for week 2013-01-12 the total value would be 36080 (Sum of New Zealand where Animal is Bovine for that week). Repeat this for all weeks in the database.
I currently have this SQL select statement.
SELECT SUM(New_Zealand)
FROM Slaughter_Data
WHERE Animal = 'Bovine'
AND WeekEndingDate BETWEEN '2010-01-04' AND '2015-01-03'
AND New_Zealand IS NOT NULL
GROUP BY DATEPART(wk, WeekEndingDate)
This is wrong because it gives me 52 weeks, with all New_Zealand values summed where Animal = Bovine, and summed for that week across all years. I want it for each year. Can anyone tell me what I need to change?
I hope I have made myself clear and let me know if I haven't so I can clarify.
How about grouping by WeekEndingDate? Since the date includes the year, it will get you the number for that week in that year.
SELECT WeekEndingDate, SUM(New_Zealand)
FROM Slaughter_Data
WHERE Animal = 'Bovine'
AND WeekEndingDate BETWEEN '2010-01-04' AND '2015-01-03'
AND New_Zealand IS NOT NULL
GROUP BY WeekEndingDate
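The difference between the two groupings can be seen in a small sketch on SQLite via Python's sqlite3 module. The lower-case table and column names and the individual row values are made up (only the 36080 total for week 2013-01-12 comes from the question), and `strftime('%W', ...)` stands in for DATEPART(wk, ...).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE slaughter_data (animal TEXT, week_ending_date TEXT, new_zealand INTEGER)"
)
con.executemany(
    "INSERT INTO slaughter_data VALUES (?, ?, ?)",
    [
        ("Bovine", "2013-01-12", 20000),
        ("Bovine", "2013-01-12", 16080),
        ("Ovine", "2013-01-12", 500),    # excluded by the animal filter
        ("Bovine", "2014-01-11", 30000),  # same week number, different year
    ],
)

# Grouping by the full date keeps each year's week separate.
per_week = con.execute(
    """
    SELECT week_ending_date, SUM(new_zealand)
    FROM slaughter_data
    WHERE animal = 'Bovine' AND new_zealand IS NOT NULL
    GROUP BY week_ending_date
    ORDER BY week_ending_date
    """
).fetchall()

# Grouping by week number alone merges the same week of different years.
per_week_number = con.execute(
    """
    SELECT strftime('%W', week_ending_date), SUM(new_zealand)
    FROM slaughter_data
    WHERE animal = 'Bovine' AND new_zealand IS NOT NULL
    GROUP BY strftime('%W', week_ending_date)
    """
).fetchall()

print(per_week, per_week_number)
```

If a week number is still wanted in the output, grouping by both the year and the week number would be another way to keep the years apart.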
I have an SQLite database with the following fields for example:
date (yyyymmdd format)
total (0.00 format)
There are typically 2 months of records in the database. Does anyone know a SQL query to find a weekly average?
I could easily just execute:
SELECT COUNT(1) as total_records, SUM(total) as total FROM stats_adsense
Then just divide total by 7, but unless the number of days in the db is an exact multiple of 7 I don't think it will be very accurate, especially if there are fewer than 7 days of records.
To get a daily average it's obviously just total / total_records.
Can anyone help me out with this?
You could try something like this:
SELECT strftime('%W', thedate) theweek, avg(total) theaverage
FROM table GROUP BY strftime('%W', thedate)
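One detail to watch with this approach: SQLite's strftime expects an ISO-style time string such as YYYY-MM-DD, so a compact yyyymmdd value is not read as that calendar date and has to be reshaped first. The sketch below, using Python's sqlite3 module, assumes the `stats_adsense` table from the question with `date` stored as yyyymmdd text and made-up totals; it also groups by year and week so the same week number in different years stays separate.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stats_adsense (date TEXT, total REAL)")
con.executemany(
    "INSERT INTO stats_adsense VALUES (?, ?)",
    [
        ("20240101", 1.0),   # Monday
        ("20240102", 3.0),   # Tuesday, same week
        ("20240108", 10.0),  # Monday of the following week
    ],
)

weekly = con.execute(
    """
    WITH iso AS (
        -- Rebuild 'YYYY-MM-DD' from 'yyyymmdd' so strftime can parse it.
        SELECT substr(date, 1, 4) || '-' || substr(date, 5, 2) || '-' || substr(date, 7, 2) AS d,
               total
        FROM stats_adsense
    )
    SELECT strftime('%Y-W%W', d) AS week,
           AVG(total) AS daily_average,
           SUM(total) AS weekly_total
    FROM iso
    GROUP BY week
    ORDER BY week
    """
).fetchall()

print(weekly)
```

Note that AVG(total) per group is the average daily value within each week; SUM(total) is that week's total.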
I'm not sure what the exact syntax would be in SQLite, but one way would be to extract the date parts of each [date] field, restrict the rows to the WEEK and DAY boundaries you want in your WHERE clause, and then GROUP BY the week. This gives you a true average per week, however many rows each week contains.
Something like this (using T-SQL; note that the week datepart is wk, while w is the weekday):
SELECT DATEPART(wk, theDate) AS TheWeek, AVG(theAmount) AS Average
FROM MyTable
GROUP BY DATEPART(wk, theDate)
This will return a row for every week. You could filter it in your WHERE clause to restrict it to a given date range.
Hope this helps.
Your weekly average is
daily * 7
Obviously this doesn't take into account specific weeks, but you can get that by narrowing the result set to a date range.
You'll have to omit from the sum those records which don't belong to a full week. So, prior to summing up, you'll have to find the min and max of the dates, adjust them so that they bound "whole" weeks, and then run your original query with a WHERE that limits the date values to the new range. Maybe you can even put all this into one query. I'll leave that up to you. ;-)
The values which are "truncated" this way are not used, obviously. If there aren't enough values for even one whole week, there's no result at all. But there's no way around that, apparently.
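One possible way to put the whole-week trimming into a single query is sketched below with Python's sqlite3 module. It assumes ISO-formatted dates, weeks running Monday to Sunday, and a hypothetical `stats` table and `whole_week_average` helper; days with no row simply contribute nothing to the sum.

```python
import sqlite3
from datetime import date, timedelta

def whole_week_average(days):
    """Average weekly total over complete Monday-Sunday weeks only.
    Returns None when the data does not cover a single whole week."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE stats (d TEXT, total REAL)")
    con.executemany("INSERT INTO stats VALUES (?, ?)", days)
    row = con.execute(
        """
        WITH bounds AS (
            SELECT date(MIN(d), 'weekday 1') AS first_monday,            -- Monday on/after min
                   date(MAX(d), '-6 days', 'weekday 0') AS last_sunday  -- Sunday on/before max
            FROM stats
        )
        SELECT SUM(total) * 7.0
               / (julianday(last_sunday) - julianday(first_monday) + 1)
        FROM stats, bounds
        WHERE d BETWEEN first_monday AND last_sunday
        """
    ).fetchone()
    con.close()
    return row[0]

# Made-up data: 21 consecutive days from Wed 2024-01-03, each with total 1.0.
sample = [((date(2024, 1, 3) + timedelta(days=n)).isoformat(), 1.0) for n in range(21)]
weekly_avg = whole_week_average(sample)
print(weekly_avg)
```

Here only 2024-01-08 through 2024-01-21 (two whole weeks) is used, and the partial weeks at either end are dropped, as described above.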