What's wrong with the following SQL query?

I have the following SQL query which finds the name and age of the oldest snowboarders.
SELECT c.cname, MAX(c.age)
FROM Customers c
WHERE c.type = 'snowboard';
But the guide I am reading says that query is wrong because the non-aggregate columns in the SELECT clause must come from the attributes in the GROUP BY clause.
I think this query serves its purpose because the aggregate MAX(c.age) corresponds to a single value. It does find the age of the oldest snowboarder.

You need to group by the c.cname column. Whenever you aggregate a column with SUM, COUNT, etc., any other column you list in the SELECT clause (here c.cname) is a column you are aggregating by. Those same columns must be listed in the GROUP BY clause, otherwise most databases will reject the query with an error.
The form should be
SELECT A, B, C, SUM(D)
FROM TABLE_NAME
GROUP BY A, B, C;
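Your query should look like this (note that it returns every snowboarder's name with the maximum age for that name, not only the oldest snowboarder):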
SELECT c.cname, MAX(c.age)
FROM Customers c
WHERE c.type = 'snowboard'
GROUP BY c.cname;

If you want to display the name and age of the oldest snowboarder(s), you have to do this:
SELECT c.cname, c.age
FROM Customers c
WHERE c.type = 'snowboard'
AND c.age = (SELECT MAX(age)
FROM Customers
WHERE type = 'snowboard')
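To see how the two suggested queries differ, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows (Ann, Bob, Cid, Dee) are made up for illustration and are not from the question.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (cname TEXT, age INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [("Ann", 40, "snowboard"), ("Bob", 40, "snowboard"),
     ("Cid", 25, "snowboard"), ("Dee", 60, "ski")],
)

# GROUP BY cname: one row per name, each with that name's maximum age.
per_name = conn.execute(
    """
    SELECT c.cname, MAX(c.age)
    FROM Customers c
    WHERE c.type = 'snowboard'
    GROUP BY c.cname
    ORDER BY c.cname
    """
).fetchall()

# Subquery: only the snowboarders whose age equals the overall maximum.
oldest = conn.execute(
    """
    SELECT c.cname, c.age
    FROM Customers c
    WHERE c.type = 'snowboard'
      AND c.age = (SELECT MAX(age) FROM Customers WHERE type = 'snowboard')
    ORDER BY c.cname
    """
).fetchall()

print(per_name)  # every snowboarder appears
print(oldest)    # only the tied oldest snowboarders appear
```

With this data the grouped query lists all three snowboarders, while the subquery version keeps only Ann and Bob, who share the maximum age.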

Whenever other table attributes are selected along with an aggregate function, they have to be accompanied by a GROUP BY clause; otherwise the database may not know which row to select. Say that in your example two customers share the same maximum age: the database can return that age, but it cannot tell which name to pick. This is where the GROUP BY clause comes in: it tells the database to return one row per customer name, each with that customer's maximum age.
Thus, your query should look like this:
SELECT c.cname, MAX(c.age)
FROM Customers c
WHERE c.type = 'snowboard'
GROUP BY c.cname;

Related

Confused with the Group By function in SQL

Q1: After using the Group By function, why does it only output one row of each group at most? Does this mean that having is supposed to filter the group rather than filter the records in each group?
Q2: I want to find the records in each group whose ages are greater than the average age of that group. I tried the following, but it returns nothing. How should I fix this?
SELECT *, avg(age) FROM Mytable Group By country Having age > avg(age)
Thanks!!!!
You can calculate the average age for each country in a subquery and join that to your table for filtering:
SELECT mt.*, MTAvg.AvgAge
FROM Mytable mt
INNER JOIN
(
    SELECT mtavgs.country
         , avg(mtavgs.age) AS AvgAge
    FROM Mytable mtavgs
    GROUP BY mtavgs.country
) MTAvg
    ON MTAvg.country = mt.country
   AND mt.age > MTAvg.AvgAge
GROUP BY always returns one row per unique combination of values in the listed GROUP BY columns (provided the group is not removed by a HAVING clause). The subquery in our example (alias: MTAvg) calculates a single row per country. We use its results to filter the main table's rows through the condition in the INNER JOIN clause, and we also report the calculated average age.
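The join against per-country averages can be tried out with a small sketch like the following. It uses Python's sqlite3 module, and the `name` column and the sample rows are assumptions added for illustration.

```python
import sqlite3

# Hypothetical sample data: two countries with different average ages.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Mytable (name TEXT, country TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Mytable VALUES (?, ?, ?)",
    [("a", "US", 20), ("b", "US", 40),
     ("c", "FR", 30), ("d", "FR", 30), ("e", "FR", 60)],
)

# The subquery yields one row per country; the join keeps only the
# people who are older than their own country's average.
above_avg = conn.execute(
    """
    SELECT mt.name, mt.country, mt.age, MTAvg.AvgAge
    FROM Mytable mt
    INNER JOIN (
        SELECT country, avg(age) AS AvgAge
        FROM Mytable
        GROUP BY country
    ) MTAvg
        ON MTAvg.country = mt.country
       AND mt.age > MTAvg.AvgAge
    ORDER BY mt.name
    """
).fetchall()

print(above_avg)
```

Here the US average is 30 and the FR average is 40, so only "b" and "e" are returned.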
GROUP BY is a clause that is used together with aggregate functions. For further reading, see the SQL Group By tutorial.
What it does is lump all the rows of a group together into one row. In your example it would lump all the rows with the same country together.
I'm not quite sure what exactly your query needs to be to solve your exact problem. I would, however, look into what are called window functions in SQL. I believe what you first need to do is write a window function to find the average age in each group. Then you can write a query to return the results you need.
Depending on your DBMS type and version, you may be able to use a "window function" that calculates the average per country and makes the result available on every row. Once that data is present as a "derived table", you can simply use a WHERE clause to filter for the ages that are greater than the calculated average per country.
SELECT mt.*
FROM (
SELECT *
, avg(age) OVER(PARTITION BY country) AS AvgAge
FROM Mytable
) mt
WHERE mt.Age > mt.AvgAge
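A minimal sketch of the window-function version, run through Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or later (the first release with window functions); the `name` column and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Mytable (name TEXT, country TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Mytable VALUES (?, ?, ?)",
    [("a", "US", 20), ("b", "US", 40),
     ("c", "FR", 30), ("d", "FR", 30), ("e", "FR", 60)],
)

# The inner query attaches the per-country average to every row;
# the outer WHERE then filters on it like on any ordinary column.
older_than_avg = conn.execute(
    """
    SELECT mt.name, mt.country, mt.age
    FROM (
        SELECT *, avg(age) OVER (PARTITION BY country) AS AvgAge
        FROM Mytable
    ) mt
    WHERE mt.age > mt.AvgAge
    ORDER BY mt.name
    """
).fetchall()

print(older_than_avg)
```

Unlike GROUP BY, the window function does not collapse the rows, which is why every person is still available for the outer filter.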

Why does MAX statement require a Group By?

I understand why the first query needs a GROUP BY, as it doesn't know which date to apply the sum to, but I don't understand why this is the case with the second query. The value that ultimately is the max amount is already contained in the table - it is not calculated like SUM is. thank you
-- First Query
select
sum(OrderSales),OrderDates
From Orders
-- Second Query
select
max(FilmOscarWins),FilmName
From tblFilm
It is not the SUM and MAX that require the GROUP BY, it is the unaggregated column.
If you just write this, you will get a single row, for the maximum value of the FilmOscarWins column across the whole table:
select
max(FilmOscarWins)
From
tblFilm
If the most Oscars any film won was 12, that one row will say 12. But there could be multiple films, all of which won 12 Oscars, so if we ask for the FilmName alongside that 12, there is no single answer.
By adding the Group By, we fundamentally change the query: instead of returning one number for the whole table, it will return one row for each group - which in this case, means one row for each film.
If you do want to get a list of all those films which had the maximum 12 Oscars, you have to do something more complicated, such as using a sub-query to first find that single number (12) and then find all the rows matching it:
select
FilmOscarWins,
FilmName
From
tblFilm
Where FilmOscarWins = (
select
max(FilmOscarWins)
From
tblFilm
)
If you want the film with the most Oscar wins, then use select top:
select top (1) f.*
From tblFilm f
order by FilmOscarWins desc;
In an aggregation query, the select columns need to be consistent with the group by columns -- the unaggregated columns in the select must match the group by.
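The difference between the bare MAX and the sub-query form can be checked with a small sketch. It uses Python's sqlite3 module with made-up film names and win counts, so treat the data as purely illustrative.

```python
import sqlite3

# Hypothetical films: two of them tie for the most wins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblFilm (FilmName TEXT, FilmOscarWins INTEGER)")
conn.executemany(
    "INSERT INTO tblFilm VALUES (?, ?)",
    [("Film A", 11), ("Film B", 11), ("Film C", 4)],
)

# Aggregate alone: a single row for the whole table.
max_only = conn.execute(
    "SELECT max(FilmOscarWins) FROM tblFilm"
).fetchall()

# Sub-query: every film whose win count equals that single maximum.
winners = conn.execute(
    """
    SELECT FilmOscarWins, FilmName
    FROM tblFilm
    WHERE FilmOscarWins = (SELECT max(FilmOscarWins) FROM tblFilm)
    ORDER BY FilmName
    """
).fetchall()

print(max_only)
print(winners)
```

The first query has no single FilmName it could attach to its one row, while the second returns both tied films.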

Is there a way to use group by together with a total sum of values in group by?

Very new to SQL and hopefully someone can help me with following concept:
SELECT name, sum(amount) as balance
FROM table
GROUP BY name
If the 'Name' column contains A, B and C, the above statement will of course give me the balance per A, B and C.
However, if I want the total balance of all 'Names', I can remove the GROUP BY clause and 'Name' from the SELECT, but my question is whether this can be solved in one script. Basically, can I in some way add a row under 'Name' called Total which gives the balance of A, B and C? The result should then contain A, B, C and Total under 'Name' with their respective balances.
Thanks,
SELECT name, sum(amount) as balance
FROM table
GROUP BY name
UNION ALL
SELECT 'Total', sum(amount) as balance
FROM table
This might be able to help you out
Most databases support this functionality as a GROUP BY modifier. The standard functionality uses GROUPING SETS:
SELECT name, sum(amount) as balance
FROM table
GROUP BY GROUPING SETS ( (name), () );
In this example (with one key), you can also use ROLLUP. The exact syntax varies by database.
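As a rough illustration of the UNION ALL variant, here is a sketch using Python's sqlite3 module. The table name `ledger` and its rows are assumptions (TABLE itself is a reserved word), and the GROUPING SETS form is not shown because, to my knowledge, SQLite does not implement it.

```python
import sqlite3

# Hypothetical ledger with a few amounts per name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?)",
    [("A", 10), ("A", 5), ("B", 7), ("C", 3)],
)

# Per-name balances followed by one extra row holding the grand total.
rows = conn.execute(
    """
    SELECT name, sum(amount) AS balance
    FROM ledger
    GROUP BY name
    UNION ALL
    SELECT 'Total', sum(amount) AS balance
    FROM ledger
    """
).fetchall()

balances = dict(rows)
print(balances)
```

The row order of a UNION ALL is not guaranteed without an ORDER BY, which is why the result is collected into a dictionary here.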

SQL QUERY, what can't appear in the having clause?

I have the following relation:
R(A, B, C, D, E)
and the following query
SELECT ...
FROM R
WHERE ...
GROUP BY B, E
HAVING ???
What can't appear in the having clause?: MAX(C), COUNT(A), D,B
I believe all of them work, but I am a little bit hesitant about B. MAX(C) works because we can bound the maximum value of column C in a group. The same goes for COUNT(A). D also works; it's just an attribute. But it looks weird to bound a member of the GROUP BY clause.
According to standard SQL, when aggregating records (which is what you do with GROUP BY), you can select
the fields you group by. (Of course, if I group by employee and department, i.e. want a result row per employee and department, I can show the two in the result.)
aggregations, such as sums, counts, etc. (If I group by department, I can count the employees or sum up their salaries and show these in the result.)
fields that are functionally dependent on the grouped by columns. (If I group by unique employee number, I can also show the employee's name, for instance.)
Many DBMS don't support the latter case, however (and force you to redundantly put the employee name in the GROUP BY clause or to clumsily pseudo-aggregate, e.g. MIN(name) though there only is one name per employee number of course).
What can appear in the HAVING clause? All columns and expressions that you can select.
MAX(C). Yes you find the maximum C per B and E.
COUNT(A). Yes. You count all records per B and E where A is not null.
D. Only if D is dependent on B and E. The DBMS will see this from unique constraints on the table, i.e. if B or E or B+E is unique for the table, you can select D. (Provided the DBMS is standard-compliant here.)
B. Yes. That is the B that you group by. Of course you can show it or use it in HAVING.
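The three uncontroversial cases can be tried with a short sketch in Python's sqlite3 module. The rows in R are invented for illustration. The D case is deliberately left out, because SQLite accepts bare columns in aggregate queries and so would not show the standard's restriction.

```python
import sqlite3

# Hypothetical rows for R(A, B, C, D, E); one A value is NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B TEXT, C INTEGER, D TEXT, E TEXT)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?, ?, ?)",
    [(1, "x", 5, "d1", "p"), (None, "x", 9, "d2", "p"),
     (3, "y", 2, "d3", "q"), (4, "y", 1, "d4", "q")],
)

def groups_having(condition):
    """Return the (B, E) groups that satisfy the given HAVING condition."""
    sql = f"SELECT B, E FROM R GROUP BY B, E HAVING {condition} ORDER BY B, E"
    return conn.execute(sql).fetchall()

by_max = groups_having("MAX(C) > 4")      # aggregate over each group
by_count = groups_having("COUNT(A) >= 2")  # counts only non-NULL A values
by_key = groups_having("B = 'y'")          # a grouping column itself

print(by_max, by_count, by_key)
```

The (x, p) group has two rows but only one non-NULL A, so it fails the COUNT(A) >= 2 condition.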

How does GROUP BY use COUNT(*)

I have this query which finds the number of properties handled by each staff member along with their branch number:
SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount
FROM Staff s, PropertyForRent p
WHERE s.staffNo=p.staffNo
GROUP BY s.branchNo, s.staffNo
The two relations are:
Staff{staffNo, fName, lName, position, sex, DOB, salary, branchNO}
PropertyForRent{propertyNo, street, city, postcode, type, rooms, rent, ownerNo, staffNo, branchNo}
How does SQL know what COUNT(*) is referring to? Why does it count the number of properties and not (say for example), the number of staff per branch?
This is a bit long for a comment.
COUNT(*) is counting the number of rows in each group. It is not specifically counting any particular column. Instead, what is happening is that the join is producing multiple properties, because the properties are what cause multiple rows for given values of s.branchNo and s.staffNo.
It gets even a little more "confusing" if you include a column name. The following would all typically return the same value:
COUNT(*)
COUNT(s.branchNo)
COUNT(s.staffNo)
COUNT(p.propertyNo)
With a column name, COUNT() determines the number of rows that do not have a NULL value in the column.
And finally, you should learn to use proper, explicit join syntax in your queries. Put join conditions in the on clause, not the where clause:
SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount
FROM Staff s JOIN
PropertyForRent p
ON s.staffNo = p.staffNO
GROUP BY s.branchNo, s.staffNo;
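To make the row-counting behaviour concrete, here is a sketch using Python's sqlite3 module. The tables are cut down to a few columns and the rows are hypothetical; one property has a NULL `rooms` value so that the difference between COUNT(*) and COUNT(column) shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, branchNo TEXT)")
conn.execute(
    "CREATE TABLE PropertyForRent (propertyNo TEXT, rooms INTEGER, staffNo TEXT)"
)
conn.executemany("INSERT INTO Staff VALUES (?, ?)",
                 [("S1", "B1"), ("S2", "B1")])
conn.executemany("INSERT INTO PropertyForRent VALUES (?, ?, ?)",
                 [("P1", 3, "S1"), ("P2", None, "S1"), ("P3", 2, "S2")])

# The join yields one row per property; GROUP BY then counts those rows.
counts = conn.execute(
    """
    SELECT s.branchNo, s.staffNo,
           COUNT(*)            AS allRows,
           COUNT(p.propertyNo) AS withPropertyNo,
           COUNT(p.rooms)      AS withRooms
    FROM Staff s
    JOIN PropertyForRent p
      ON s.staffNo = p.staffNo
    GROUP BY s.branchNo, s.staffNo
    ORDER BY s.staffNo
    """
).fetchall()

print(counts)
```

S1 has two joined rows, so COUNT(*) is 2, but COUNT(p.rooms) is 1 because one of those rows has a NULL in `rooms`.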
GROUP BY clauses partition your result set. These partitions are all the sql engine needs to know - it simply counts their sizes.
Try your query with only count(*) in the select part.
In particular, COUNT(*) does not produce the number of distinct rows or columns in your result set!
Some people might think that COUNT(*) really counts all the columns; however, the SQL optimizer is smarter than that.
COUNT(*) returns the number of rows in the specified table without getting rid of duplicates, which means that you can't use DISTINCT with COUNT(*).
COUNT(*) returns the cardinality (the number of rows) of the specified table or group.
What you have to remember is that when counting a specific column, rows where that column is NULL are not counted, while COUNT(*) counts every row, NULLs included.
How does SQL know what COUNT(*) is referring to?
I'm pretty sure, though not 100% sure as I can't find it in the docs, that the SQL optimizer simply does a count on the primary key (which is never NULL) instead of trying to handle NULLs in the rows.