Why Is This Column Name Invalid? [duplicate] - invalid-object-name

This question already has answers here:
GROUP BY / aggregate function confusion in SQL
(5 answers)
Closed 3 years ago.
I got an error -
Column 'Employee.EmpID' is invalid in the select list because it is
not contained in either an aggregate function or the GROUP BY clause.
select loc.LocationID, emp.EmpID
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by loc.LocationID
This situation fits into the answer given by Bill Karwin.
correction for above, fits into answer by ExactaBox -
select loc.LocationID, count(emp.EmpID) -- not count(*), don't want to count nulls
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by loc.LocationID
ORIGINAL QUESTION -
For the SQL query -
select *
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by (loc.LocationID)
I don't understand why I get this error. All I want to do is join the tables and then group all the employees in a particular location together.
I think I have a partial explanation for my own question. Tell me if it's OK -
To group all employees that work in the same location we have to first mention the LocationID.
Then, we cannot/do not mention each employee ID next to it. Rather, we mention the total number of employees in that location, i.e. we should COUNT() the employees working in that location. Why we do it the latter way, I am not sure.
So, this explains the "it is not contained in either an aggregate function" part of the error.
What is the explanation for the GROUP BY clause part of the error?

Suppose I have the following table T:
a b
--------
1 abc
1 def
1 ghi
2 jkl
2 mno
2 pqr
And I do the following query:
SELECT a, b
FROM T
GROUP BY a
The output should have two rows, one row where a=1 and a second row where a=2.
But what should the value of b show on each of these two rows? There are three possibilities in each case, and nothing in the query makes it clear which value to choose for b in each group. It's ambiguous.
This demonstrates the single-value rule, which prohibits the undefined results you get when a GROUP BY query includes columns in the select list that are neither part of the grouping criteria nor arguments of aggregate functions (SUM, MIN, MAX, etc.).
Fixing it might look like this:
SELECT a, MAX(b) AS x
FROM T
GROUP BY a
Now it's clear that you want the following result:
a x
--------
1 ghi
2 pqr
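The table T and the MAX(b) fix above can be tried end to end. The sketch below uses Python's built-in sqlite3 module purely as a convenient test bed; that choice is an assumption, since the question itself concerns a different RDBMS.

```python
import sqlite3

# In-memory database holding the example table T from above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO T (a, b) VALUES (?, ?)",
    [(1, "abc"), (1, "def"), (1, "ghi"), (2, "jkl"), (2, "mno"), (2, "pqr")],
)

# MAX(b) tells the engine which single value of b to report for each group.
rows = conn.execute(
    "SELECT a, MAX(b) AS x FROM T GROUP BY a ORDER BY a"
).fetchall()
print(rows)  # [(1, 'ghi'), (2, 'pqr')]
```

Note that SQLite accepts the ambiguous SELECT a, b ... GROUP BY a form and picks an arbitrary b per group, so this test bed demonstrates the fix rather than the error.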

Your query will work in MySQL if the ONLY_FULL_GROUP_BY SQL mode is disabled (it was disabled by default before MySQL 5.7.5 and is enabled by default from that version on). But in this case you are using a different RDBMS. So to make your query work, add all non-aggregated columns to your GROUP BY clause, e.g.
SELECT col1, col2, SUM(col3) totalSUM
FROM tableName
GROUP BY col1, col2
A non-aggregated column is one that is not passed into an aggregate function such as SUM, MAX, COUNT, etc.

Basically, what this error is saying is that if you use the GROUP BY clause, your result is a relation/table with one row for each group. In your SELECT statement you can therefore only select the columns you are grouping by, plus aggregate functions applied to the other columns, because an ungrouped column has no single value to show for a whole group.

"All I want to do is join the tables and then group all the employees
in a particular location together."
It sounds like what you want is for the output of the SQL statement to list every employee in the company, but first all the people in the Anaheim office, then the people in the Buffalo office, then the people in the Cleveland office (A, B, C, get it, obviously I don't know what locations you have).
In that case, drop the GROUP BY clause. All you need is ORDER BY loc.LocationID.
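To make the difference between listing and counting concrete, here is a minimal sketch using Python's sqlite3 module as a test bed. The City column and the sample rows are made up for illustration, and a LEFT JOIN from Location stands in for the FULL JOIN of the question because older SQLite versions lack FULL JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Location (LocationID INTEGER PRIMARY KEY, City TEXT);
    CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, LocationID INTEGER);
    INSERT INTO Location VALUES (1, 'Anaheim'), (2, 'Buffalo'), (3, 'Cleveland');
    INSERT INTO Employee VALUES (10, 2), (11, 1), (12, 2);
""")

# One row per employee, sorted so each location's employees sit together.
listing = conn.execute("""
    SELECT loc.LocationID, emp.EmpID
    FROM Location AS loc
    LEFT JOIN Employee AS emp ON emp.LocationID = loc.LocationID
    ORDER BY loc.LocationID, emp.EmpID
""").fetchall()

# One row per location; COUNT(emp.EmpID) ignores the NULL of the empty location.
counts = conn.execute("""
    SELECT loc.LocationID, COUNT(emp.EmpID)
    FROM Location AS loc
    LEFT JOIN Employee AS emp ON emp.LocationID = loc.LocationID
    GROUP BY loc.LocationID
    ORDER BY loc.LocationID
""").fetchall()

print(listing)  # [(1, 11), (2, 10), (2, 12), (3, None)]
print(counts)   # [(1, 1), (2, 2), (3, 0)]
```

ORDER BY keeps every employee row and only arranges them, while GROUP BY collapses each location into a single row.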

Related

Confused with the Group By function in SQL

Q1: After using the Group By function, why does it only output one row of each group at most? Does this mean that having is supposed to filter the group rather than filter the records in each group?
Q2: I want to find the records in each group whose ages are greater than the average age of that group. I tried the following, but it returns nothing. How should I fix this?
SELECT *, avg(age) FROM Mytable Group By country Having age > avg(age)
Thanks!!!!
You can calculate the average age for each country in a subquery and join that to your table for filtering:
SELECT mt.*, MTAvg.AvgAge
FROM Mytable mt
INNER JOIN
(
    SELECT mtavgs.country
         , avg(mtavgs.age) AS AvgAge
    FROM Mytable mtavgs
    GROUP BY mtavgs.country
) MTAvg
    ON MTAvg.country = mt.country
    AND mt.age > MTAvg.AvgAge
GROUP BY always returns one row per unique combination of values in the listed GROUP BY columns (provided the row is not removed by a HAVING clause). The subquery in our example (alias: MTAvg) calculates a single row per country. We use its results to filter the rows of the main table through the condition in the INNER JOIN clause, and we also report the calculated average age in the output.
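A small runnable sketch of this subquery-join approach, using Python's sqlite3 module as a test bed. The name column and the sample rows are hypothetical; only country and age come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Mytable (name TEXT, country TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Mytable VALUES (?, ?, ?)",
    [("Ann", "US", 30), ("Bob", "US", 50),
     ("Cho", "KR", 20), ("Dae", "KR", 40), ("Eun", "KR", 30)],
)

# The subquery yields one row per country (US -> 40, KR -> 30);
# the join keeps only the people older than their country's average.
above_average = conn.execute("""
    SELECT mt.name, mt.country, mt.age, MTAvg.AvgAge
    FROM Mytable mt
    INNER JOIN (
        SELECT country, AVG(age) AS AvgAge
        FROM Mytable
        GROUP BY country
    ) MTAvg
        ON MTAvg.country = mt.country
        AND mt.age > MTAvg.AvgAge
    ORDER BY mt.name
""").fetchall()

print(above_average)  # [('Bob', 'US', 50, 40.0), ('Dae', 'KR', 40, 30.0)]
```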
GROUP BY is a clause that works together with aggregate functions; it is not an aggregate function itself. See the SQL Group By tutorial for further reading.
What it does is lump all the rows of each group together into one row. In your example it would lump all the rows with the same country together.
I am not quite sure what exactly your query needs to be to solve your exact problem. I would, however, look into what are called window functions in SQL. I believe what you first need to do is write a window function to find the average age in each group. Then you can write a query to return the results you need.
Depending on your DBMS type and version, you may be able to use a "window function" that calculates the average per country and makes that value available on every row. Once that data is present as a "derived table", you can simply use a WHERE clause to filter for the ages that are greater than the calculated average per country.
SELECT mt.*
FROM (
SELECT *
, avg(age) OVER(PARTITION BY country) AS AvgAge
FROM Mytable
) mt
WHERE mt.Age > mt.AvgAge
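The window-function query can be exercised with Python's sqlite3 module, as a rough sketch. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the name column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Mytable (name TEXT, country TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Mytable VALUES (?, ?, ?)",
    [("Ann", "US", 30), ("Bob", "US", 50),
     ("Cho", "KR", 20), ("Dae", "KR", 40), ("Eun", "KR", 30)],
)

# AVG(...) OVER (PARTITION BY country) attaches the country average to every
# row without collapsing rows, so the outer WHERE can compare row by row.
# Requires SQLite >= 3.25 (an assumption about the local installation).
rows = conn.execute("""
    SELECT mt.name
    FROM (
        SELECT name, age, AVG(age) OVER (PARTITION BY country) AS AvgAge
        FROM Mytable
    ) mt
    WHERE mt.age > mt.AvgAge
    ORDER BY mt.name
""").fetchall()

names = [name for (name,) in rows]
print(names)  # ['Bob', 'Dae']
```

Unlike GROUP BY, the window function leaves all five input rows in the derived table, which is why the filter can still see individual ages.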

Why PostgreSQL is not accepting while group by on one table and selecting towards another tables [duplicate]

This question already has an answer here:
PGError: ERROR: column "p.name" must appear in the GROUP BY clause or be used in an aggregate function
(1 answer)
Closed 8 years ago.
I am using PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-22ubuntu5) 4.7.2, 64-bit.
My question: I need to join two tables (call them temp1 and temp2).
Table structure is
marks_map
    marks int
    stud_id int
student
    stud_id int
    class_id int
here my query
select class_id,stud_id,count(marks)
from student as s
inner join marks_map as m on (s.stud_id=m.stud_id) group by stud_id
Here i get error as
ERROR: column "s.class_id" must appear in the GROUP BY clause or be used in an aggregate function
Why does this error happen? If I use class_id in group by it's running successfully.
You have to add the class_id attribute to your GROUP BY clause because the select list applies no aggregate function to this attribute.
In GROUP BY statements, every attribute that you select without aggregating must be listed in the GROUP BY clause.
For example:
SELECT
non-aggregating-attr-1, non-aggregating-attr2, non-aggregating-attr3, sum(attr4)
FROM
table
GROUP BY
non-aggregating-attr-1, non-aggregating-attr2, non-aggregating-attr3
That's the way GROUP BY works.
You can check your data like this:
select
array_agg(s.class_id) as arr_class_id,
s.stud_id, count(m.marks)
from student as s
inner join marks_map as m on (s.stud_id=m.stud_id)
group by s.stud_id
and see how many class_id values you have for each group. Sometimes class_id is functionally dependent on stud_id (the array holds only one distinct value for each group), in which case you can use a dummy aggregate like:
select
max(s.class_id) as class_id,
s.stud_id, count(m.marks)
from student as s
inner join marks_map as m on (s.stud_id=m.stud_id)
group by s.stud_id
You should be able to understand the problem on a simplified case that doesn't even involve a JOIN.
The query SELECT x,[other columns] GROUP BY x expresses the fact that for every distinct value of x, the [other columns] must be output with only one row for every x.
Now looking at a simplified example where the student table has two entries:
stud_id=1, class_id=1
stud_id=1, class_id=2
And we ask for SELECT stud_id,class_id FROM student GROUP BY stud_id.
There is only one distinct value of stud_id, which is 1.
So we're telling the SQL engine, give me one row with stud_id=1 and the value of class_id that comes with it. And the problem is that there is not one, but two such values, 1 and 2. So which one to choose? Instead of choosing randomly, the SQL engine yields an error saying the question is conceptually bogus in the first place, because there's no rule that says each distinct value of stud_id has its own corresponding class_id.
On the other hand, if the non-GROUP'ed output columns are aggregate functions that transform a series of values into just one, like min, max, or count, then they provide the missing rules that say how to get only one value from several. That's why the SQL engine is OK with, for instance: SELECT stud_id,count(class_id) FROM student GROUP BY stud_id;.
Also, when faced with the error column "somecolumn" must appear in the GROUP BY clause, you don't want to just add columns to the GROUP BY until the error goes away, as if it was purely a syntax problem. It's a semantic problem, and each column added to the GROUP BY changes the sense of the question submitted to the SQL engine.
That is, GROUP BY x,y means for each distinct value of the (x,y) couple. It does not mean GROUP BY x, and hey, since it leads to an error, let's throw in the y as well!
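The point that adding a column to the GROUP BY changes the question can be seen on a tiny student table. This is a sketch using Python's sqlite3 module as a test bed, with made-up rows; the original question targets PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stud_id INTEGER, class_id INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1)],
)

# Question 1: for each student, how many class rows are there?
per_student = conn.execute("""
    SELECT stud_id, COUNT(class_id)
    FROM student
    GROUP BY stud_id
    ORDER BY stud_id
""").fetchall()

# Question 2: for each (student, class) pair, how many rows are there?
per_pair = conn.execute("""
    SELECT stud_id, class_id, COUNT(*)
    FROM student
    GROUP BY stud_id, class_id
    ORDER BY stud_id, class_id
""").fetchall()

print(per_student)  # [(1, 2), (2, 1)]
print(per_pair)     # [(1, 1, 1), (1, 2, 1), (2, 1, 1)]
```

Both queries are valid, but they return a different number of rows because they group on different things.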

Select and Group by together

I have my query like this:
Select
a.abc,
a.cde,
a.efg,
a.agh,
c.dummy,
p.test,
max(b.this),
sum(b.sugar),
sum(b.bucket),
sum(b.something)
followed by some outer and inner joins. With the following group by,
group by
a.abc,
a.cde,
a.efg,
a.agh,
c.dummy,
p.test
The query works fine. But if I remove any one of them from group by it gives:
SQLSTATE: 42803
Can anyone explain the cause of this error?
Generally, any column that isn't in the group by section can only be included in the select section if it has an aggregating function applied to it. Or, another way, any non-aggregated data in the select section must be grouped on.
Otherwise, how would the DBMS know what you want done with it? For example, if you group on a.abc, there can only be one thing that a.abc can be for that grouped row (since all other values of a.abc will come out in a different row). Here's a short example, with a table containing:
LastName FirstName Salary
-------- --------- ------
Smith John 123456
Smith George 111111
Diablo Pax 999999
With the query select LastName, Salary from Employees group by LastName, you would expect to see:
LastName Salary
-------- ------
Smith ??????
Diablo 999999
The salary for the Smiths is incalculable since you don't know what function to apply to it, which is what's causing that error. In other words, the DBMS doesn't know what to do with 123456 and 111111 to get a single value for the grouped row.
If you instead used select LastName, sum(Salary) from Employees group by LastName (or max() or min() or avg() or any other aggregating function), the DBMS would know what to do. For sum(), it will simply add them and give you 234567.
In your query, the equivalent of trying to use Salary without an aggregating function is to change max(b.this) to just b.this but not include it in the group by section. Or alternatively, remove one of the group by columns without changing it to an aggregation in the select section.
In both cases, you'll have one row that has multiple possible values for the column.
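The Smith/Diablo example can be run as follows. This is a sketch that uses Python's sqlite3 module as a stand-in test bed (an assumption; the question is about DB2) with the three rows shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (LastName TEXT, FirstName TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("Smith", "John", 123456), ("Smith", "George", 111111),
     ("Diablo", "Pax", 999999)],
)

# sum() and max() each reduce the two Smith salaries to one value per group.
totals = conn.execute("""
    SELECT LastName, SUM(Salary), MAX(Salary)
    FROM Employees
    GROUP BY LastName
    ORDER BY LastName
""").fetchall()

print(totals)  # [('Diablo', 999999, 999999), ('Smith', 234567, 123456)]
```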
The DB2 docs at publib for sqlstate 42803 describe your problem:
A column reference in the SELECT or HAVING clause is invalid, because it is not a grouping column; or a column reference in the GROUP BY clause is invalid.
SQL will insist that any column in the SELECT section is either included in the GROUP BY section or has an aggregate function applied to it in the SELECT section.
This article gives a nice explanation of why this is the case. The article is SQL Server specific, but the principle should be roughly similar for all RDBMSs.

JOIN on another table after GROUP BY and COUNT

I'm trying to make sense of the right way to use JOIN, COUNT(*), and GROUP BY to do a pretty simple query. I've actually gotten it to work (see below) but from what I've read, I'm using an extra GROUP BY that I shouldn't be.
(Note: The problem below isn't my actual problem (which deals with more complicated tables), but I've tried to come up with an analogous problem)
I have two tables:
Table: Person
-------------
key name cityKey
1 Alice 1
2 Bob 2
3 Charles 2
4 David 1
Table: City
-------------
key name
1 Albany
2 Berkeley
3 Chico
I'd like to do a query on Person (with some WHERE clause) that returns:
the number of matching people in each city,
the key for the city,
the name of the city.
If I do
SELECT COUNT(Person.key) AS count, City.key AS cityKey, City.name AS cityName
FROM Person
LEFT JOIN City ON Person.cityKey = City.key
GROUP BY Person.cityKey, City.name
I get the result that I want
count cityKey cityName
2 1 Albany
2 2 Berkeley
However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
So what's the right way to do this? I've been trying to google for an answer, but I feel like there's something fundamental that I'm just not getting.
I don't think that it's "wrong" in this case, because you've got a one-to-one relationship between city name and city key. You could rewrite it such that you join to a sub-select to get the count of persons to cities by key, to the city table again for the name, but it's debatable that that'd be better. It's a matter of style and opinion I guess.
select PC.ct, City.key, City.name
from City
join (select count(Person.key) ct, cityKey key from Person group by cityKey) PC
on City.key = PC.key
if my SQL isn't too rusty :-)
...I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
You misunderstand, you got it backwards.
Standard SQL requires you to specify in the GROUP BY all the columns mentioned in the SELECT that are not wrapped in aggregate functions. If you don't want certain columns in the GROUP BY, wrap them in aggregate functions. Depending on the database, you could use the analytic/windowing function OVER...
However, MySQL and SQLite provide the "feature" where you can omit these columns from the GROUP BY, which leads to no end of "why doesn't this port from MySQL to fill_in_the_blank database?!" questions on Stack Overflow and numerous other sites and forums.
However, I've read that throwing in
that last part of the GROUP BY clause
(City.name) just to make it work is
wrong.
It's not wrong. You have to understand how the Query Optimizer sees your query. The order in which it is parsed is what requires you to "throw the last part in." The optimizer sees your query in something akin to this order:
1. the required tables are joined
2. the composite dataset is filtered through the WHERE clause
3. the remaining rows are chopped into groups by the GROUP BY clause, and aggregated
4. they are then filtered again, through the HAVING clause
5. finally operated on, by SELECT / ORDER BY, UPDATE or DELETE.
The point here is that it's not that the GROUP BY has to name all the columns in the SELECT, but in fact it is the opposite - the SELECT cannot include any columns not already in the GROUP BY.
Your query would only work on a lenient database such as MySQL, because you group on Person.cityKey but select City.key. Most other databases would require you to use an aggregate like min(city.key), or to add City.key to the GROUP BY clause.
Because the combination of city name and city key is unique, the following are equivalent:
select count(person.key), min(city.key), min(city.name)
...
group by person.citykey
Or:
select count(person.key), city.key, city.name
...
group by person.citykey, city.key, city.name
Or:
select count(person.key), city.key, max(city.name)
...
group by city.key
All rows in the group will have the same city name and key, so it doesn't matter if you use the max or min aggregate.
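The equivalence of the three variants can be checked against the Person and City tables from the question. This sketch uses Python's sqlite3 module as a test bed; the column name key is double-quoted as a precaution because KEY is an SQL keyword, and the LEFT JOIN is taken from the question's own query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES
        (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Charles', 2), (4, 'David', 1);
""")

# Shared FROM/JOIN part of all three variants.
FROM_JOIN = 'FROM Person LEFT JOIN City ON Person.cityKey = City."key"'

variants = [
    # Aggregate both city columns, group on the person's city key only.
    f'SELECT COUNT(Person."key"), MIN(City."key"), MIN(City.name) {FROM_JOIN} '
    'GROUP BY Person.cityKey ORDER BY Person.cityKey',
    # Put every selected non-aggregated column into the GROUP BY.
    f'SELECT COUNT(Person."key"), City."key", City.name {FROM_JOIN} '
    'GROUP BY Person.cityKey, City."key", City.name ORDER BY Person.cityKey',
    # Group on the city key, aggregate the name.
    f'SELECT COUNT(Person."key"), City."key", MAX(City.name) {FROM_JOIN} '
    'GROUP BY City."key" ORDER BY City."key"',
]

results = [conn.execute(sql).fetchall() for sql in variants]
for result in results:
    print(result)  # [(2, 1, 'Albany'), (2, 2, 'Berkeley')] each time
```

Chico does not appear because the join starts from Person and nobody lives there.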
P.S. If you'd like to count only different persons, even if they have multiple rows, try:
count(DISTINCT person.key)
instead of
count(person.key)