Why is a group by clause required when rows are limited in where clause? - sql

fiId is a primary key of Table1. Why does this query return as many rows as there are fiId in table1. The fiId is being limited to 1 row in the where clause. The query performs properly when a group by Table1.fiId is added, surely this should not be needed? Thanks.
SELECT
Table1.fiId,
SUM(CASE Table2.type IN (4,7) THEN Table2.valueToSum ELSE 0 END),
FROM
Table1 INNER JOIN Table3 ON Table1.fiId = Table3.parentId
INNER JOIN Table2 ON Table2.leId = Table3.fiId
WHERE
Table1.fiId = 76813 AND
Table2.insId = 431144

When using aggregate functions in your SELECT such as SUM and COUNT when selecting other columns as well, a GROUP BY including those additional columns is required. While I don't know the exact reason behind this, it definitely helps to put the results in context.
Consider the following query:
SELECT Name, Count(Product) as NumOrders
FROM CustomerOrders
GROUP BY Name
Here, we assume that we will get results like this:
Name NumOrders
------------------
Joe 15
Sally 5
Jim 23
Now, if SQL did not require the GROUP BY, then what would you expect the output to be? My best guess would be something like this:
Name NumOrders
------------------
Joe 43
Sally 43
Jim 43
In that case, while there may in fact be 43 order records in the table, including Name doesn't really provide any useful data. Instead, we just have a bunch of names out of context.
For more on this, see a similar question here: Why do I need to explicitly specify all columns in a SQL "GROUP BY" clause - why not "GROUP BY *"?

Related

Why Is This Column Name Invalid? [duplicate]

This question already has answers here:
GROUP BY / aggregate function confusion in SQL
(5 answers)
Closed 3 years ago.
I got an error -
Column 'Employee.EmpID' is invalid in the select list because it is
not contained in either an aggregate function or the GROUP BY clause.
select loc.LocationID, emp.EmpID
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by loc.LocationID
This situation fits into the answer given by Bill Karwin.
correction for above, fits into answer by ExactaBox -
select loc.LocationID, count(emp.EmpID) -- not count(*), don't want to count nulls
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by loc.LocationID
ORIGINAL QUESTION -
For the SQL query -
select *
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by (loc.LocationID)
I don't understand why I get this error. All I want to do is join the tables and then group all the employees in a particular location together.
I think I have a partial explanation for my own question. Tell me if its ok -
To group all employees that work in the same location we have to first mention the LocationID.
Then, we cannot/do not mention each employee ID next to it. Rather, we mention the total number of employees in that location, ie we should SUM() the employees working in that location. Why do we do it the latter way, i am not sure.
So, this explains the "it is not contained in either an aggregate function" part of the error.
What is the explanation for the GROUP BY clause part of the error ?
Suppose I have the following table T:
a b
--------
1 abc
1 def
1 ghi
2 jkl
2 mno
2 pqr
And I do the following query:
SELECT a, b
FROM T
GROUP BY a
The output should have two rows, one row where a=1 and a second row where a=2.
But what should the value of b show on each of these two rows? There are three possibilities in each case, and nothing in the query makes it clear which value to choose for b in each group. It's ambiguous.
This demonstrates the single-value rule, which prohibits the undefined results you get when you run a GROUP BY query, and you include any columns in the select-list that are neither part of the grouping criteria, nor appear in aggregate functions (SUM, MIN, MAX, etc.).
Fixing it might look like this:
SELECT a, MAX(b) AS x
FROM T
GROUP BY a
Now it's clear that you want the following result:
a x
--------
1 ghi
2 pqr
Your query will work in MYSQL if you set to disable ONLY_FULL_GROUP_BY server mode (and by default It is). But in this case, you are using different RDBMS. So to make your query work, add all non-aggregated columns to your GROUP BY clause, eg
SELECT col1, col2, SUM(col3) totalSUM
FROM tableName
GROUP BY col1, col2
Non-Aggregated columns means the column is not pass into aggregated functions like SUM, MAX, COUNT, etc..
Basically, what this error is saying is that if you are going to use the GROUP BY clause, then your result is going to be a relation/table with a row for each group, so in your SELECT statement you can only "select" the column that you are grouping by and use aggregate functions on that column because the other columns will not appear in the resulting table.
"All I want to do is join the tables and then group all the employees
in a particular location together."
It sounds like what you want is for the output of the SQL statement to list every employee in the company, but first all the people in the Anaheim office, then the people in the Buffalo office, then the people in the Cleveland office (A, B, C, get it, obviously I don't know what locations you have).
In that case, lose the GROUP BY statement. All you need is ORDER BY loc.LocationID

Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause [duplicate]

This question already has answers here:
GROUP BY / aggregate function confusion in SQL
(5 answers)
Closed 3 years ago.
I got an error -
Column 'Employee.EmpID' is invalid in the select list because it is
not contained in either an aggregate function or the GROUP BY clause.
select loc.LocationID, emp.EmpID
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by loc.LocationID
This situation fits into the answer given by Bill Karwin.
correction for above, fits into answer by ExactaBox -
select loc.LocationID, count(emp.EmpID) -- not count(*), don't want to count nulls
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by loc.LocationID
ORIGINAL QUESTION -
For the SQL query -
select *
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by (loc.LocationID)
I don't understand why I get this error. All I want to do is join the tables and then group all the employees in a particular location together.
I think I have a partial explanation for my own question. Tell me if its ok -
To group all employees that work in the same location we have to first mention the LocationID.
Then, we cannot/do not mention each employee ID next to it. Rather, we mention the total number of employees in that location, ie we should SUM() the employees working in that location. Why do we do it the latter way, i am not sure.
So, this explains the "it is not contained in either an aggregate function" part of the error.
What is the explanation for the GROUP BY clause part of the error ?
Suppose I have the following table T:
a b
--------
1 abc
1 def
1 ghi
2 jkl
2 mno
2 pqr
And I do the following query:
SELECT a, b
FROM T
GROUP BY a
The output should have two rows, one row where a=1 and a second row where a=2.
But what should the value of b show on each of these two rows? There are three possibilities in each case, and nothing in the query makes it clear which value to choose for b in each group. It's ambiguous.
This demonstrates the single-value rule, which prohibits the undefined results you get when you run a GROUP BY query, and you include any columns in the select-list that are neither part of the grouping criteria, nor appear in aggregate functions (SUM, MIN, MAX, etc.).
Fixing it might look like this:
SELECT a, MAX(b) AS x
FROM T
GROUP BY a
Now it's clear that you want the following result:
a x
--------
1 ghi
2 pqr
Your query will work in MYSQL if you set to disable ONLY_FULL_GROUP_BY server mode (and by default It is). But in this case, you are using different RDBMS. So to make your query work, add all non-aggregated columns to your GROUP BY clause, eg
SELECT col1, col2, SUM(col3) totalSUM
FROM tableName
GROUP BY col1, col2
Non-Aggregated columns means the column is not pass into aggregated functions like SUM, MAX, COUNT, etc..
Basically, what this error is saying is that if you are going to use the GROUP BY clause, then your result is going to be a relation/table with a row for each group, so in your SELECT statement you can only "select" the column that you are grouping by and use aggregate functions on that column because the other columns will not appear in the resulting table.
"All I want to do is join the tables and then group all the employees
in a particular location together."
It sounds like what you want is for the output of the SQL statement to list every employee in the company, but first all the people in the Anaheim office, then the people in the Buffalo office, then the people in the Cleveland office (A, B, C, get it, obviously I don't know what locations you have).
In that case, lose the GROUP BY statement. All you need is ORDER BY loc.LocationID

Retrieving alternate attribute values in GROUP BY query?

Let me explain what I mean with that question:
Lets say I have to tables like these:
customers
id customer location
1 Adam UK
2 Pete US
values
id value
1 10
1 7
2 3
2 41
Let's ignore here for a moment that that (and the following query) wouldn't make a lot of sense. It's meant as a simplified example.
Now, if I run this query
SELECT id, customer, value FROM customers INNER JOIN values GROUP BY id
I should get this result (distinct by id)
id customer value
1 Adam 10
2 Pete 3
What I would like to be able to do is get that to use it in a search result list, but for actual displaying of the results I'd like to do something like this:
Customer: Adam
Values: 10, 7
So, basically, while I need to have a result set that's distinct for the ID, I'd still like to somehow save the rows dropped by the GROUP BY to show the values list like above. What is the best way to do this?
Look at http://mysql.com/group_concat - which only will work in MySql.
Better link: http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_group-concat
Technically, the following is not valid SQL even though MySQL allows it:
Select customers.id, customers.customer, values.value
From customers
Inner Join values
On values.id = customers.id
Group By customers.id
The SQL spec requires that every column in the Select clause be referenced in the Group By or in an aggregate function. However, given what you said later in your post, what I think you want is GROUP_CONCAT as first mentioned by Erik (+1) which is a function specific to MySQL:
Select customers.customer, Group_Concat(values.value)
From customers
Inner Join values
On values.id = customers.id
Group By customers.customer

JOIN on another table after GROUP BY and COUNT

I'm trying to make sense of the right way to use JOIN, COUNT(*), and GROUP BY to do a pretty simple query. I've actually gotten it to work (see below) but from what I've read, I'm using an extra GROUP BY that I shouldn't be.
(Note: The problem below isn't my actual problem (which deals with more complicated tables), but I've tried to come up with an analogous problem)
I have two tables:
Table: Person
-------------
key name cityKey
1 Alice 1
2 Bob 2
3 Charles 2
4 David 1
Table: City
-------------
key name
1 Albany
2 Berkeley
3 Chico
I'd like to do a query on the People (with some WHERE clause) that returns
the number of matching people in each city
the key for the city
the name of the city.
If I do
SELECT COUNT(Person.key) AS count, City.key AS cityKey, City.name AS cityName
FROM Person
LEFT JOIN City ON Person.cityKey = City.key
GROUP BY Person.cityKey, City.name
I get the result that I want
count cityKey cityName
2 1 Albany
2 2 Berkeley
However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
So what's the right way to do this? I've been trying to google for an answer, but I feel like there's something fundamental that I'm just not getting.
I don't think that it's "wrong" in this case, because you've got a one-to-one relationship between city name and city key. You could rewrite it such that you join to a sub-select to get the count of persons to cities by key, to the city table again for the name, but it's debatable that that'd be better. It's a matter of style and opinion I guess.
select PC.ct, City.key, City.name
from City
join (select count(Person.key) ct, cityKey key from Person group by cityKey) PC
on City.key = PC.key
if my SQL isn't too rusty :-)
...I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
You misunderstand, you got it backwards.
Standard SQL requires you to specify in the GROUP BY all the columns mentioned in the SELECT that are not wrapped in aggregate functions. If you don't want certain columns in the GROUP BY, wrap them in aggregate functions. Depending on the database, you could use the analytic/windowing function OVER...
However, MySQL and SQLite provide the "feature" where you can omit these columns from the group by - which leads to no end of "why doesn't this port from MySQL to fill_in_the_blank database?!" Stackoverflow and numerous other sites & forums.
However, I've read that throwing in
that last part of the GROUP BY clause
(City.name) just to make it work is
wrong.
It's not wrong. You have to understand how the Query Optimizer sees your query. The order in which it is parsed is what requires you to "throw the last part in." The optimizer sees your query in something akin to this order:
the required tables are joined
the composite dataset is filtered through the WHERE clause
the remaining rows are chopped into groups by the GROUP BY clause, and aggregated
they are then filtered again, through the HAVING clause
finally operated on, by SELECT / ORDER BY, UPDATE or DELETE.
The point here is that it's not that the GROUP BY has to name all the columns in the SELECT, but in fact it is the opposite - the SELECT cannot include any columns not already in the GROUP BY.
Your query would only work on MySQL, because you group on Person.cityKey but select city.key. All other databases would require you to use an aggregate like min(city.key), or to add City.key to the group by clause.
Because the combination of city name and city key is unique, the following are equivalent:
select count(person.key), min(city.key), min(city.name)
...
group by person.citykey
Or:
select count(person.key), city.key, city.name
...
group by person.citykey, city.key, city.name
Or:
select count(person.key), city.key, max(city.name)
...
group by city.key
All rows in the group will have the same city name and key, so it doesn't matter if you use the max or min aggregate.
P.S. If you'd like to count only different persons, even if they have multiple rows, try:
count(DISTINCT person.key)
instead of
count(person.key)

Query a list of names from one table that appear in a field in a different table

I want to query a list of names from one table that appear in a field in a different table.
Example:
table1.title>Tiger Woods Cheats, Tiger Woods Crashes, Brad Pitt is Great, Madonna Adopts, Brad Pitt Makes a Movie
table2.names>Tiger Woods, Brad Pitt, Madonna
So those are the two tables and values. I would like to write a query that counts which names from table2.names appear most often in table1.title
Someone suggested using inner join, but I could not get it to work... I appreciate the help!! Thanks.
Use:
SELECT a.names,
COUNT(b.titles) AS num
FROM TABLE_2 a
JOIN TABLE_1 b ON INSTR(b.title, a.names) > 0
GROUP BY a.names
ORDER BY num DESC
See the documentation about INSTR() - checking for a value greater than 0 means the name occurred in the title, otherwise it would be zero.
The AS num is a column alias, which you can reference in the ORDER BY to sort either in ASCending or DESCending order.
You do want to use a join, however your join condition will be a pattern match, not an equality. This assumes you want to look for exact name matches (i.e. the exact value from table2.names being included in the table1.title column somewhere/anywhere within the string and not partial matches):
select t2.names, count(*)
from table2 t2
join table1 t1
on t1.title like('%' + t2.names + '%')
group by t2.names
order by count(*) desc;