SQL query: what can't appear in the HAVING clause?

I have the following relation:
R(A, B, C, D, E)
and the following query
SELECT ...
FROM R
WHERE ...
GROUP BY B, E
HAVING ???
Which of the following can't appear in the HAVING clause: MAX(C), COUNT(A), D, B?
I believe all of them work, but I am a little hesitant about B. MAX(C) works because we can put a bound on the maximum value of column C in a group. The same goes for COUNT(A). D also works; it's just an attribute. B is an attribute too, but it looks weird to put a bound on a member of the GROUP BY clause.

According to standard SQL, when aggregating records (which is what you do with GROUP BY), you can select:
- the fields you group by. (Of course, if I group by employee and department, i.e. want a result row per employee and department, I can show the two in the result.)
- aggregations, such as sums, counts, etc. (If I group by department, I can count the employees or sum up their salaries and show these in the result.)
- fields that are functionally dependent on the grouped-by columns. (If I group by a unique employee number, I can also show the employee's name, for instance.)
Many DBMSs don't support the latter case, however, and force you either to redundantly put the employee name in the GROUP BY clause or to clumsily pseudo-aggregate it, e.g. MIN(name), even though there is of course only one name per employee number.
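To make the two workarounds concrete, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The employee and sale tables, their columns and the rows are all hypothetical; the point is only that grouping redundantly by name and pseudo-aggregating with MIN(name) give the same result when name depends on emp_no.

```python
import sqlite3

# Hypothetical schema: emp_no is the primary key, so name depends functionally on it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE sale (emp_no INTEGER REFERENCES employee(emp_no), amount INTEGER);
    INSERT INTO employee VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO sale VALUES (1, 100), (1, 50), (2, 70);
""")

# Workaround 1: redundantly add name to the GROUP BY clause.
redundant_group_by = conn.execute("""
    SELECT e.emp_no, e.name, SUM(s.amount)
    FROM employee e JOIN sale s ON s.emp_no = e.emp_no
    GROUP BY e.emp_no, e.name
    ORDER BY e.emp_no
""").fetchall()

# Workaround 2: pseudo-aggregate name; every row in a group has the same name,
# so MIN() simply returns it.
pseudo_aggregate = conn.execute("""
    SELECT e.emp_no, MIN(e.name), SUM(s.amount)
    FROM employee e JOIN sale s ON s.emp_no = e.emp_no
    GROUP BY e.emp_no
    ORDER BY e.emp_no
""").fetchall()

print(redundant_group_by)  # [(1, 'Alice', 150), (2, 'Bob', 70)]
print(pseudo_aggregate)    # [(1, 'Alice', 150), (2, 'Bob', 70)]
```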
What can appear in the HAVING clause? All columns and expressions that you can select.
MAX(C): Yes. You find the maximum C per (B, E) group.
COUNT(A): Yes. You count all records per (B, E) group where A is not null.
D: Only if D is functionally dependent on B and E. The DBMS can see this from unique constraints on the table: if B, E, or the combination (B, E) is unique for the table, you can use D. (Provided the DBMS is standard-compliant here.)
B: Yes. That is the B that you group by. Of course you can show it or use it in HAVING.
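A minimal sketch of HAVING over the R(A, B, C, D, E) relation, run with Python's sqlite3 module; the rows are invented for illustration. D is deliberately left out: SQLite accepts bare non-grouped columns without checking functional dependencies, so it cannot demonstrate the standard rule for D.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B TEXT, C INTEGER, D TEXT, E TEXT)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?, ?, ?)",
    [
        (1, "x", 3, "d1", "e1"),
        (2, "x", 7, "d2", "e1"),
        (None, "y", 9, "d3", "e1"),  # NULL A: not counted by COUNT(A)
        (4, "y", 2, "d4", "e1"),
        (5, "y", 8, "d5", "e2"),
        (6, "y", 6, "d6", "e2"),
    ],
)

# HAVING filters whole (B, E) groups, using aggregates and the grouped column B.
rows = conn.execute("""
    SELECT B, E, MAX(C), COUNT(A)
    FROM R
    GROUP BY B, E
    HAVING MAX(C) > 5 AND COUNT(A) >= 2 AND B = 'y'
""").fetchall()
print(rows)  # [('y', 'e2', 8, 2)]
```

The (x, e1) group fails the condition on B, and the (y, e1) group fails COUNT(A) >= 2 because its NULL A is not counted.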

Related

What's wrong with the following SQL query?

I have the following SQL query which finds the name and age of the oldest snowboarders.
SELECT c.cname, MAX(c.age)
FROM Customers c
WHERE c.type = 'snowboard';
But the guide I am reading says that query is wrong because the non-aggregate columns in the SELECT clause must come from the attributes in the GROUP BY clause.
I think this query serves its purpose because the aggregate MAX(c.age) corresponds to a single value. It does find the age of the oldest snowboarder.
You need to group by the c.cname column. Whenever you select non-aggregated columns alongside an aggregate such as SUM or COUNT, those columns define the groups you aggregate by (here c.cname). The same columns must be listed in the GROUP BY clause, or else you will get an error.
The form should be
SELECT A, B, C, SUM(D)
FROM TABLE_NAME
GROUP BY A, B, C;
Your query should look like this:
SELECT c.cname, MAX(c.age)
FROM Customers c
WHERE c.type = 'snowboard'
GROUP BY c.cname;
If you want to display the name and age of the oldest snowboarder(s), you have to do this:
SELECT c.cname, c.age
FROM Customers c
WHERE c.type = 'snowboard'
AND c.age = (SELECT MAX(age)
FROM Customers
WHERE type = 'snowboard')
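The difference between the fixes is easiest to see with ties. The sketch below uses Python's sqlite3 module with made-up customers (two snowboarders share the maximum age): the subquery returns only the oldest snowboarders, while GROUP BY c.cname returns one row for every snowboarder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (cname TEXT, age INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [("Ann", 30, "snowboard"), ("Bob", 45, "snowboard"),
     ("Cid", 45, "snowboard"), ("Dee", 50, "ski")],
)

# Subquery approach: keep only snowboarders whose age equals the maximum.
oldest = conn.execute("""
    SELECT c.cname, c.age
    FROM Customers c
    WHERE c.type = 'snowboard'
      AND c.age = (SELECT MAX(age) FROM Customers WHERE type = 'snowboard')
    ORDER BY c.cname
""").fetchall()

# GROUP BY cname approach: one row per name, i.e. every snowboarder.
per_name = conn.execute("""
    SELECT c.cname, MAX(c.age)
    FROM Customers c
    WHERE c.type = 'snowboard'
    GROUP BY c.cname
    ORDER BY c.cname
""").fetchall()

print(oldest)    # [('Bob', 45), ('Cid', 45)]
print(per_name)  # [('Ann', 30), ('Bob', 45), ('Cid', 45)]
```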
Whenever other table attributes are selected along with an aggregate function, the query must have a GROUP BY clause; otherwise the database may not know which row's value to show. Say, in your example, two customers share the same maximum age: the database finds that maximum age, but it cannot tell which of the two names to pick. This is where the GROUP BY clause comes in: it tells the database to display a separate row for each customer name, so the two customers with the same maximum age appear as two different rows.
Thus, your query should look like this:
SELECT c.cname, MAX(c.age)
FROM Customers c
WHERE c.type = 'snowboard'
GROUP BY c.cname;

How does GROUP BY use COUNT(*)

I have this query which finds the number of properties handled by each staff member along with their branch number:
SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount
FROM Staff s, PropertyForRent p
WHERE s.staffNo=p.staffNo
GROUP BY s.branchNo, s.staffNo
The two relations are:
Staff{staffNo, fName, lName, position, sex, DOB, salary, branchNO}
PropertyToRent{propertyNo, street, city, postcode, type, rooms, rent, ownerNo, staffNo, branchNo}
How does SQL know what COUNT(*) is referring to? Why does it count the number of properties and not (say for example), the number of staff per branch?
This is a bit long for a comment.
COUNT(*) is counting the number of rows in each group. It is not specifically counting any particular column. What is happening is that the join produces one row per property, so the properties are what cause multiple rows for given values of s.branchNo and s.staffNo.
It gets even a little more "confusing" if you include a column name. The following would all typically return the same value:
COUNT(*)
COUNT(s.branchNo)
COUNT(s.staffNo)
COUNT(p.propertyNo)
With a column name, COUNT() determines the number of rows that do not have a NULL value in the column.
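A small sketch, using Python's sqlite3 module, of where COUNT(*) and COUNT(column) diverge. The Staff and PropertyForRent tables are cut down to the columns needed, and the data is invented. A LEFT JOIN is used instead of the question's inner join so that a staff member with no properties appears: COUNT(*) still counts that row, while COUNT(p.propertyNo) skips the NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, branchNo TEXT);
    CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, staffNo TEXT);
    INSERT INTO Staff VALUES ('SG14', 'B003'), ('SG37', 'B003'), ('SL21', 'B005');
    INSERT INTO PropertyForRent VALUES ('PG4', 'SG14'), ('PG16', 'SG14'), ('PG36', 'SG37');
""")

# LEFT JOIN keeps SL21, who manages no property; its p.* columns are NULL.
rows = conn.execute("""
    SELECT s.branchNo, s.staffNo, COUNT(*), COUNT(p.propertyNo)
    FROM Staff s LEFT JOIN PropertyForRent p ON s.staffNo = p.staffNo
    GROUP BY s.branchNo, s.staffNo
    ORDER BY s.staffNo
""").fetchall()
print(rows)
# [('B003', 'SG14', 2, 2), ('B003', 'SG37', 1, 1), ('B005', 'SL21', 1, 0)]
```

With the inner join from the question, the SL21 row disappears and the two counts agree for every group.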
And finally, you should learn to use proper, explicit join syntax in your queries. Put join conditions in the on clause, not the where clause:
SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount
FROM Staff s JOIN
PropertyForRent p
ON s.staffNo = p.staffNO
GROUP BY s.branchNo, s.staffNo;
GROUP BY clauses partition your result set. These partitions are all the SQL engine needs to know: it simply counts their sizes.
Try your query with only COUNT(*) in the SELECT list.
In particular, COUNT(*) does not produce the number of distinct rows/columns in your result set!
Some people might think that COUNT(*) really counts all the columns; however, the SQL optimizer is smarter than that.
COUNT(*) returns the number of rows in a specified table without eliminating duplicates, which means that you can't use DISTINCT with COUNT(*).
COUNT(*) returns the cardinality (number of rows) of the specified table or group.
What you have to remember is that COUNT over a specific column skips rows where that column is NULL, while COUNT(*) counts every row, including rows that contain NULLs, since it does not look at any particular field.
How does SQL know what COUNT(*) is referring to?
I'm fairly sure, though not 100% sure since I can't find it in the documentation, that the SQL optimizer simply counts on the primary key (which is NOT NULL) instead of trying to handle NULLs in rows.

Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause [duplicate]

I got an error:
Column 'Employee.EmpID' is invalid in the select list because it is
not contained in either an aggregate function or the GROUP BY clause.
select loc.LocationID, emp.EmpID
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by loc.LocationID
This situation fits into the answer given by Bill Karwin.
Correction to the above: it fits the answer by ExactaBox:
select loc.LocationID, count(emp.EmpID) -- not count(*), don't want to count nulls
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by loc.LocationID
ORIGINAL QUESTION:
For the SQL query:
select *
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by (loc.LocationID)
I don't understand why I get this error. All I want to do is join the tables and then group all the employees in a particular location together.
I think I have a partial explanation for my own question. Tell me if it's OK:
To group all employees that work in the same location we have to first mention the LocationID.
Then, we cannot/do not mention each employee ID next to it. Rather, we mention the total number of employees in that location, i.e. we should SUM() the employees working in that location. Why we do it the latter way, I am not sure.
So, this explains the "it is not contained in either an aggregate function" part of the error.
What is the explanation for the GROUP BY clause part of the error?
Suppose I have the following table T:
a b
--------
1 abc
1 def
1 ghi
2 jkl
2 mno
2 pqr
And I do the following query:
SELECT a, b
FROM T
GROUP BY a
The output should have two rows, one row where a=1 and a second row where a=2.
But what should the value of b show on each of these two rows? There are three possibilities in each case, and nothing in the query makes it clear which value to choose for b in each group. It's ambiguous.
This demonstrates the single-value rule, which prohibits the undefined results you would get if you ran a GROUP BY query and included columns in the select list that are neither part of the grouping criteria nor used in aggregate functions (SUM, MIN, MAX, etc.).
Fixing it might look like this:
SELECT a, MAX(b) AS x
FROM T
GROUP BY a
Now it's clear that you want the following result:
a x
--------
1 ghi
2 pqr
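The table T example can be reproduced with Python's sqlite3 module. Note that SQLite, like MySQL with ONLY_FULL_GROUP_BY disabled, accepts the ambiguous SELECT a, b FROM T GROUP BY a without an error, so this sketch only shows fixed forms: each aggregate makes the choice of b explicit and deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [(1, "abc"), (1, "def"), (1, "ghi"), (2, "jkl"), (2, "mno"), (2, "pqr")],
)

# MAX(b) picks the alphabetically largest b in each group.
fixed = conn.execute("SELECT a, MAX(b) AS x FROM T GROUP BY a ORDER BY a").fetchall()

# A different aggregate is a different, but equally well-defined, choice.
first = conn.execute("SELECT a, MIN(b) AS x FROM T GROUP BY a ORDER BY a").fetchall()

print(fixed)  # [(1, 'ghi'), (2, 'pqr')]
print(first)  # [(1, 'abc'), (2, 'jkl')]
```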
Your query would work in MySQL if you disable the ONLY_FULL_GROUP_BY server mode (it is disabled by default before MySQL 5.7.5 and enabled by default from 5.7.5 on). But here you are using a different RDBMS, so to make your query work, add all non-aggregated columns to your GROUP BY clause, e.g.
SELECT col1, col2, SUM(col3) totalSUM
FROM tableName
GROUP BY col1, col2
Non-aggregated columns are columns that are not passed into aggregate functions such as SUM, MAX, COUNT, etc.
Basically, what this error is saying is that if you use the GROUP BY clause, your result is going to be a relation/table with one row per group. So in your SELECT statement you can only select the columns you are grouping by, plus aggregate functions over the other columns, because the other columns do not appear as individual values in the resulting table.
"All I want to do is join the tables and then group all the employees
in a particular location together."
It sounds like what you want is for the output of the SQL statement to list every employee in the company, but first all the people in the Anaheim office, then the people in the Buffalo office, then the people in the Cleveland office (A, B, C, get it? Obviously I don't know what locations you have).
In that case, lose the GROUP BY clause. All you need is ORDER BY loc.LocationID.
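A sketch of the ORDER BY suggestion using Python's sqlite3 module. The City column and the sample rows are hypothetical, and a LEFT JOIN from Employee is used in place of the question's FULL JOIN (SQLite versions before 3.39.0 do not support FULL JOIN); it still keeps every employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Location (LocationID INTEGER PRIMARY KEY, City TEXT);
    CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, LocationID INTEGER);
    INSERT INTO Location VALUES (1, 'Anaheim'), (2, 'Buffalo'), (3, 'Cleveland');
    INSERT INTO Employee VALUES (10, 2), (11, 1), (12, 2), (13, 3);
""")

# No GROUP BY: every employee row is kept, just sorted by location.
rows = conn.execute("""
    SELECT loc.LocationID, loc.City, emp.EmpID
    FROM Employee AS emp LEFT JOIN Location AS loc
      ON emp.LocationID = loc.LocationID
    ORDER BY loc.LocationID, emp.EmpID
""").fetchall()
print(rows)
# [(1, 'Anaheim', 11), (2, 'Buffalo', 10), (2, 'Buffalo', 12), (3, 'Cleveland', 13)]
```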

How does GROUP BY work?

Suppose I have a table Tab1 with attributes a1, a2, etc. None of the attributes are unique.
What will be the nature of the following query? Will it return a single row always?
SELECT a1, a2, sum(a3) FROM Tab1 GROUP BY a1, a2
GROUP BY returns a single row for each unique combination of the GROUP BY fields. So in your example, every distinct combination of (a1, a2) occurring in rows of Tab1 results in one row in the query result, representing the group of rows with that combination of GROUP BY field values. Aggregate functions like SUM() are computed over the members of each group.
GROUP BY returns one row for each unique combination of fields in the GROUP BY clause. To ensure only one row, you would have to use an aggregate function - COUNT, SUM, MAX - without a GROUP BY clause.
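A short sketch with Python's sqlite3 module and invented rows for Tab1: the GROUP BY query returns one row per distinct (a1, a2) pair, while the aggregate without GROUP BY collapses the whole table into a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tab1 (a1 INTEGER, a2 TEXT, a3 INTEGER)")
conn.executemany(
    "INSERT INTO Tab1 VALUES (?, ?, ?)",
    [(1, "x", 10), (1, "x", 5), (1, "y", 2), (2, "x", 7)],
)

# One row per distinct (a1, a2) combination; SUM(a3) is computed within each group.
grouped = conn.execute(
    "SELECT a1, a2, SUM(a3) FROM Tab1 GROUP BY a1, a2 ORDER BY a1, a2"
).fetchall()

# Aggregate without GROUP BY: the whole table is one group, so exactly one row.
total = conn.execute("SELECT SUM(a3) FROM Tab1").fetchall()

print(grouped)  # [(1, 'x', 15), (1, 'y', 2), (2, 'x', 7)]
print(total)    # [(24,)]
```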
GROUP BY groups together all records that have identical values in the grouped-by columns.
SELECT COUNT(ItemID), City
FROM Orders
GROUP BY City;
----------------------------------------
13 Sacramento
23 Dallas
87 Los Angeles
5 Phoenix
If you don't group by City (and drop City from the select list), the query just returns the total count of ItemID.
As an analogy (not technically accurate, but helpful for remembering the logic), you can think of the rows for each distinct value of the grouped field as being put into a separate table; the aggregate function is then applied to each of those tables individually.
Ben Forta states it clearly:
The GROUP BY clause instructs the DBMS to group the data and then perform the aggregate (function) on each group rather than on the entire result set.
Aside from the aggregate calculation statements, every column in your SELECT statement must be present in the GROUP BY clause.
The GROUP BY clause must come after any WHERE clause and before any ORDER BY clause.
My understanding, based on his statement, is the following:
As with the DISTINCT keyword, the fields specified in GROUP BY are grouped and, in the end, each combination appears only once. The aggregate function is carried out over each group, as happened in SuL's answer.
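To illustrate the DISTINCT analogy, the sketch below (Python's sqlite3 module, with made-up Orders rows) shows that SELECT DISTINCT City and GROUP BY City return the same set of cities, and that GROUP BY additionally lets COUNT run per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (ItemID INTEGER, City TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, "Dallas"), (2, "Phoenix"), (3, "Dallas"), (4, "Sacramento"), (5, "Dallas")],
)

# Both queries return each City exactly once.
distinct_cities = conn.execute(
    "SELECT DISTINCT City FROM Orders ORDER BY City"
).fetchall()
grouped_cities = conn.execute(
    "SELECT City FROM Orders GROUP BY City ORDER BY City"
).fetchall()

# GROUP BY additionally lets an aggregate run over each group.
counts = conn.execute(
    "SELECT City, COUNT(ItemID) FROM Orders GROUP BY City ORDER BY City"
).fetchall()

print(distinct_cities == grouped_cities)  # True
print(counts)  # [('Dallas', 3), ('Phoenix', 1), ('Sacramento', 1)]
```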

COUNT(*) in SQL

I understand how COUNT(*) works in SQL on a single table, but how does it work with inner joins?
e.g.
SELECT branch, staffNo, Count(*)
FROM Staff s, Properties p
WHERE s.staffNo = p.staffNo
GROUP BY s.staffNo, p.staffNo
Staff contains staffNo, staffName.
Properties contains property management details (i.e. which staff member manages which property).
This returns the number of properties managed by staff, but how does the count work? As in how does it know what to count?
It's an aggregate function, so it is governed by your GROUP BY clause: each result row corresponds to a unique grouping (i.e. staffNo), and COUNT(*) returns the number of records in the join that match that grouping.
So for example:
SELECT branch, grade, Count(*)
FROM Staff s, Properties p
WHERE s.staffNo = p.staffNo
GROUP BY branch, grade
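would return the number of properties managed by staff members of a given grade at each branch. (Because of the join, each row is a staff-property pair; to count the staff members themselves, use COUNT(DISTINCT s.staffNo).)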
SELECT branch, Count(*)
FROM Staff s, Properties p
WHERE s.staffNo = p.staffNo
GROUP BY branch
would return the total number of properties managed by the staff at each branch
SELECT grade, Count(*)
FROM Staff s, Properties p
WHERE s.staffNo = p.staffNo
GROUP BY grade
would return the total number of properties managed by staff at each grade
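Because the join produces one row per (staff member, property) pair, COUNT(*) in these queries counts pairs rather than staff. A minimal sketch with Python's sqlite3 module and invented data compares COUNT(*) with COUNT(DISTINCT s.staffNo) per branch; the two-column table layouts are a simplified assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, branch TEXT);
    CREATE TABLE Properties (propertyNo TEXT PRIMARY KEY, staffNo TEXT);
    INSERT INTO Staff VALUES ('S1', 'B1'), ('S2', 'B1'), ('S3', 'B2');
    INSERT INTO Properties VALUES ('P1', 'S1'), ('P2', 'S1'), ('P3', 'S1'),
                                  ('P4', 'S2'), ('P5', 'S3');
""")

# Each joined row is one (staff member, property) pair.
rows = conn.execute("""
    SELECT s.branch, COUNT(*), COUNT(DISTINCT s.staffNo)
    FROM Staff s JOIN Properties p ON s.staffNo = p.staffNo
    GROUP BY s.branch
    ORDER BY s.branch
""").fetchall()
print(rows)  # [('B1', 4, 2), ('B2', 1, 1)]
```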
The aggregate function (whether it's count(), sum(), avg(), etc.) is computed on the rows in each group: that group is then collapsed/summarized/aggregated to a single row according to the select-list defined in the query.
The conceptual model for the execution of a SELECT query is this:
1. Compute the cartesian product of all tables referenced in the FROM clause (as if a cross join were being performed).
2. Apply the join criteria.
3. Filter according to the criteria defined in the WHERE clause.
4. Partition into groups, based on the criteria defined in the GROUP BY clause.
5. Reduce each group to a single row, computing the values of each aggregate function on the rows in that group.
6. Filter according to the criteria defined in the HAVING clause.
7. Sort according to the criteria defined in the ORDER BY clause.
This conceptual model omits any COMPUTE or COMPUTE ... BY clauses.
Note that this is not how anything but a very naive SQL engine would actually execute a query, but the results should be identical to what you'd [eventually] get if you did it this way.
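The conceptual model can be sketched as a deliberately naive evaluator in plain Python, checked against SQLite running the equivalent query. The data, the WHERE filter (p.propertyNo <> 'P1') and the HAVING threshold are arbitrary choices for illustration, and the evaluator only handles this one query shape.

```python
import sqlite3
from itertools import product

staff = [("S1", "B1"), ("S2", "B1"), ("S3", "B2"), ("S4", "B3")]  # (staffNo, branch)
properties = [("P1", "S1"), ("P2", "S1"), ("P3", "S2"),
              ("P4", "S3"), ("P5", "S4"), ("P6", "S4")]           # (propertyNo, staffNo)

def naive_query():
    # 1. Cartesian product of the tables in FROM.
    rows = product(staff, properties)
    # 2. Apply the join criterion s.staffNo = p.staffNo.
    rows = [(s, p) for s, p in rows if s[0] == p[1]]
    # 3. WHERE: drop property P1 (an arbitrary example filter).
    rows = [(s, p) for s, p in rows if p[0] != "P1"]
    # 4. Partition into groups keyed by the GROUP BY column (branch).
    groups = {}
    for s, p in rows:
        groups.setdefault(s[1], []).append((s, p))
    # 5. Reduce each group to one row, computing COUNT(*).
    reduced = [(branch, len(members)) for branch, members in groups.items()]
    # 6. HAVING COUNT(*) >= 2.
    reduced = [r for r in reduced if r[1] >= 2]
    # 7. ORDER BY branch.
    return sorted(reduced)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, branch TEXT)")
conn.execute("CREATE TABLE Properties (propertyNo TEXT, staffNo TEXT)")
conn.executemany("INSERT INTO Staff VALUES (?, ?)", staff)
conn.executemany("INSERT INTO Properties VALUES (?, ?)", properties)
sql_result = conn.execute("""
    SELECT s.branch, COUNT(*)
    FROM Staff s, Properties p
    WHERE s.staffNo = p.staffNo AND p.propertyNo <> 'P1'
    GROUP BY s.branch
    HAVING COUNT(*) >= 2
    ORDER BY s.branch
""").fetchall()

print(naive_query())  # [('B1', 2), ('B3', 2)]
print(sql_result)     # [('B1', 2), ('B3', 2)]
```

Branch B2 disappears at the HAVING step, not at the WHERE step: its group exists after step 4 but has only one row.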
Your query is invalid.
You have an ambiguous column name staffno.
You are selecting branch but not grouping by it: prepare for an error (in most DBMSs other than MySQL) or an arbitrary branch being selected for you (MySQL).
I think what you want to know, though, is that it will return a count for each "set" of your grouped-by fields, so for each combination of s.staffno, p.staffno how many rows belong in that set.
COUNT(*) simply counts the number of rows in the query result or, with GROUP BY, in each group.
In your query, it will return the number of rows per staffNo. (It is redundant to group by both s.staffNo and p.staffNo; either will suffice.)
It counts the number of rows for each distinct staffNo in the cartesian product after it has been filtered by the WHERE condition.
Also, you should group by Branch, StaffNo.