How to take average of two columns row by row in SQL? - sql

I have a table match which looks like this (please see attached image). I wanted to retrieve a dataset that had a column of average values for home_goal and away_goal using this code
SELECT
m.country_id,
m.season,
m.home_goal,
m.away_goal,
AVG(m.home_goal + m.away_goal) AS avg_goal
FROM match AS m;
However, I got this error
column "m.country_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 3: m.country_id,
My question is: why was GROUP BY clause required? Why couldn't SQL know how to take average of two columns row by row?
Thank you.

try this:
SELECT
m.country_id,
m.season,
m.home_goal,
m.away_goal,
(m.home_goal + m.away_goal)/2 AS avg_goal
FROM match AS m;
You have been asked for the group_by as avg() much like sum() work on multiple values of one column where you classify all columns that are not a columns wise operation in the group by
You are looking to average two distinct columns - it is a row-wise operations instead of column-wise

how to take average of two columns row by row?
You don't use AVG() for this; it is an aggregate function, that operates over a set of rows. Here, it seems like you just want a simple math computation:
SELECT
m.country_id,
m.season,
m.home_goal,
m.away_goal,
(m.home_goal + m.away_goal) / 2.0 AS avg_goal
FROM match AS m;
Note the decimal denominator (2.0): this avoids integer division in databases that implement it.

Avg in the context of the function mentioned above is calculating the average of the values of the columns and not the average of the two values in the same row. It is an aggregate function and that’s why the group by clause is required.
In order to take the average of two columns in the same row you need to divide by 2.

Let's consider the following table:
CREATE TABLE Numbers([x] int, [y] int, [category] nvarchar(10));
INSERT INTO Numbers ([x], [y], [category])
VALUES
(1, 11, 'odd'),
(2, 22, 'even'),
(3, 33, 'odd'),
(4, 44, 'even');
Here is an example of using two aggregate functions - AVG and SUM - with GROUP BY:
SELECT
Category,
AVG(x) as avg_x,
AVG(x+y) as avg_xy,
SUM(x) as sum_x,
SUM(x+y) as sum_xy
FROM Numbers
GROUP BY Category
The result has two rows:
Category avg_x avg_xy sum_x sum_xy
even 3 36 6 72
odd 2 24 4 48
Please note that Category is available in the SELECT part because the results are GROUP BY'ed by it. If a GROUP BY is not specified then the result would be 1 row and Category is not available (which value should be displayed if we have sums and averages for multiple rows with different caetories?).
What you want is to compute a new column and for this you don't use aggregate functions:
SELECT
(x+y)/2 as avg_xy,
(x+y) as sum_xy
FROM Numbers
This returns all rows:
avg_xy sum_xy
6 12
12 24
18 36
24 48
If your columns are integers don't forget to handle rounding, if needed. For example (CAST(x AS DECIMAL)+y)/2 as avg_xy,

The simple arithmetic calculation:
(m.home_goal + m.away_goal) / 2.0
is not exactly equivalent to AVG(), because NULL values mess it up. Databases that support lateral joins provide a pretty easy (and efficient) way to actually use AVG() within a row.
The safe version looks like:
(coalesce(m.home_goal, 0) + coalesce(m.away_goal, 0)) /
nullif( (case when m.home_goal is not null then 1 else 0 end +
case when m.away_goal is not null then 1 else 0 end
), 0
)
Some databases have syntax extensions that allow the expression to be simplified.

Related

Counting number of null values in a row to divide by (unknown) number of columns

I'm using SQL Server 14 and I need to count the number of null values in a row to create a new column where a "% of completeness" for each row will be stored. For example, if 9 out of 10 columns contain values for a given row, the % for that row would be 90%.
I know this can be done via a number of Case expressions, but the thing is, this data will be used for a live dashboard and won't be under my supervision after completion.
I would like for this % to be calculated every time a function (or procedure? not sure what is used in this case) is run and need to know the number of columns that exist in my table in order to count the null values in a row and then divide by the number of columns to find the "% of completeness".
Any help is greatly appreciated!
Thank you
One method uses cross apply to unpivot the columns to rows and count the ratio of non-null values.
Assuming that your table has columns col1 to col4, you would write this as:
select t.*, x.*
from mytable t
cross apply (
select avg(case when col is not null then 1.0 else 0 end) completeness_ratio
from (values (col1), (col2), (col3), (col4)) x(col)
) x

Two different condition for two different colums using case statement in SQL

Given a table of random numbers as follows:
** Person table schema **
Name
Marks1
Marks2
I want to return a table with similar structure and headings, where if the sum of a column is odd, the column shows the maximum value for that column, and when the sum is even, it shows the minimum value by using a case statement.
** output table schema **
Marks1
Marks2
I've tried the following code.
select Marks1,Marks2 ,
(case
when mod(sum(Marks1),2)=0 then
min(Marks1)
else max(Marks1)
end) as Marks1 ,
(case
when mod(sum(Marks2),2)=0 then
min(Marks2)
else max(Marks2)
end) as Marks2
from numbers
group by Marks1;
Sample output -
TABLE
Ash 56 45
David 45 35
Output -
56 35
As 56+45 = 101 odd number so output 56(max number). Whereas in marks2 column, 45+35 =80, even number so output 35(min number).
Can anyone tell me what's wrong with it? Thanks in advance.
Use a CTE to get your min(), max(), and sum() values. Then use case to determine what values to display.
Since your problem statement and sample results do not match, I followed your sample results to return max() on an odd sum(). You can switch this by changing the two case statements from 1 to 0.
Working fiddle
with totals as (
select sum(marks1) as marks1sum,
min(marks1) as marks1min,
max(marks1) as marks1max,
sum(marks2) as marks2sum,
min(marks2) as marks2min,
max(marks2) as marks2max
from numbers
)
select case mod(marks1sum, 2)
when 1 then marks1max
else marks1min
end as marks1,
case mod(marks2sum, 2)
when 1 then marks2max
else marks2min
end as marks2
from totals;
You are reusing marks1 and marks2 when aliasing your third and fourth column which is colliding. Try using different name.

Average Distinct Values in a single column in Power Pivot

I have a column in PowerPivot that basically goes:
1
1
2
3
4
3
5
4
If I =AVERAGE([Column]), it's going to average all 8 values in the sample column. I just need the average of the distinct values (i.e., in the example above I want the average of (1,2,3,4,5).
Any thoughts on how to go about doing this? I tried a combination of =(DISTINCT(AVERAGE)) but it gives a formula error.
Thanks!!
Kevin
There must be a cleaner way of doing this but here is one method which uses a measure to get the sum of the values divided by the number of times it appears (to basically give the original value) then uses an iterative function to do it for each unique value.
Apologies for the uninspired measure names:
[m1] = SUM(table1[theValue]) / COUNTROWS(Table1)
[m2] = AVERAGEX(VALUES(Tables1[theValue]), [m1])
Assuming your table is caled table1 and the column is called theValue

How to get three count values from same column using SQL in Access?

I have a table that has an integer column from which I am trying to get a few counts from. Basically I need four separate counts from the same column. The first value I need returned is the count of how many records have an integer value stored in this column between two values such as 213 and 9999, including the min and max values. The other three count values I need returned are just the count of records between different values of this column. I've tried doing queries like...
SELECT (SELECT Count(ID) FROM view1 WHERE ((MyIntColumn BETWEEN 213 AND 9999));)
AS Value1, (SELECT Count(ID) FROM FROM view1 WHERE ((MyIntColumn BETWEEN 500 AND 600));) AS Value2 FROM view1;
So there are for example, ten records with this column value between 213 and 9999. The result returned from this query gives me 10, but it gives me the same value of 10, 618 times which is the number of total records in the table. How would it be possible for me to only have it return one record of 10 instead?
Use the Iif() function instead of CASE WHEN
select Condition1: iif( ), condition2: iif( ), etc
P.S. : What I used to do when working with Access was have the iif() resolve to 1 or 0 and then do a SUM() to get the counts. Roundabout but it worked better with aggregation since it avoided nulls.
SELECT
COUNT(CASE
WHEN MyIntColumn >= 213 AND MyIntColumn <= 9999
THEN MyIntColumn
ELSE NULL
END) AS FirstValue
, ??? AS SecondValue
, ??? AS ThirdValue
, ??? AS FourthValue
FROM Table
This doesn't need nesting or CTE or anything. Just define via CASE your condition within COUNTs argument.
I dont really understand what You want in the second, third an fourth column. Sounds to me, its very similar to the first one.
Reformatted, your query looks like:
SELECT (
SELECT Count(ID)
FROM view1
WHERE MyIntColumn BETWEEN 213 AND 9999
) AS Value1
FROM view1;
So you are selecting a subquery expression that is not related to the outer query. For each row in view1, you calculate the number of rows in view1.
Instead, try to do the calculation once. You just have to remove your outer query:
SELECT Count(ID)
FROM view1
WHERE MyIntColumn BETWEEN 213 AND 9999;
OLEDB Connection in MS Access does not support key words CASE and WHEN .
You can only use iif() function to count two, three.. values in same columns
SELECT Attendance.StudentName, Count(IIf([Attendance]![Yes_No]='Yes',1,Null)) AS Yes, Count(IIf([Attendance]![Yes_No]='No',1,Null)) AS [No], Count(IIf([Attendance]![Yes_No]='Not',1,Null)) AS [Not], Count(IIf([Attendance]![Yes_No],1,Null)) AS Total
FROM Attendance
GROUP BY Attendance.StudentName;

SQL Server SQL Select: How do I select rows where sum of a column is within a specified multiple?

I have a process that needs to select rows from a Table (queued items) each row has a quantity column and I need to select rows where the quantities add to a specific multiple. The mulitple is the order of between around 4, 8, 10 (but could in theory be any multiple. (odd or even)
Any suggestions on how to select rows where the sum of a field is of a specified multiple?
My first thought would be to use some kind of MOD function which I believe in SQL server is the % sign. So the criteria would be something like this
WHERE MyField % 4 = 0 OR MyField % 8 = 0
It might not be that fast so another way might be to make a temp table containing say 100 values of the X times table (where X is the multiple you are looking for) and join on that