Not a GROUP BY expression error [duplicate] - sql

This question already has answers here:
ORA-00979 not a group by expression
(10 answers)
Closed 8 years ago.
I'm relatively new to databases. I am using Oracle and I'm trying to implement this query to find the number of personal training sessions the member has had.
The tables are:
MEMBERS
MEMBER_ID(NUMBER),
MEMBERSHIP_TYPE_CODE(VARCHAR),
ADDRESS_ID(NUMBER),
CLUB_ID(NUMBER),
MEMBER_NAME(VARCHAR),
MEMBER_PHONE(VARCHAR),
MEMBER_EMAIL(VARCHAR)
PERSONAL_TRAINING_SESSIONS
SESSION_ID(VARCHAR),
MEMBER_ID(NUMBER),
STAFF_ID(VARCHAR),
SESSION_DATETIME(DATE)
My query is returning this error:
ORA-00979: not a GROUP BY expression
00979. 00000 - "not a GROUP BY expression"
*Cause:
*Action: Error at Line: 1 Column: 8
SELECT MEMBERS.MEMBER_ID,MEMBERS.MEMBER_NAME, COUNT(personal_training_sessions.session_id)
FROM MEMBERS JOIN personal_training_sessions
ON personal_training_sessions.member_id=members.member_id
GROUP BY personal_training_sessions.session_id;
Can anyone point me in the right direction? I have looked around; do I need to separate the count query?

The error says it all: you're selecting MEMBERS.MEMBER_ID and MEMBERS.MEMBER_NAME without grouping by them.
SELECT MEMBERS.MEMBER_ID, MEMBERS.MEMBER_NAME
, COUNT(personal_training_sessions.session_id)
FROM MEMBERS
JOIN personal_training_sessions
ON personal_training_sessions.member_id = members.member_id
GROUP BY MEMBERS.MEMBER_ID, MEMBERS.MEMBER_NAME
You want the count of personal sessions per member, so you need to group by the member information.
The basic (of course it can get a lot more complex) GROUP BY, SELECT query is:
SELECT <column 1>, <column n>
, <aggregate function 1>, <aggregate function n>
FROM <table_name>
GROUP BY <column 1>, <column n>
An aggregate function being, as Ken White says, something like MIN(), MAX(), COUNT() etc. You GROUP BY all the columns that are not aggregated.
This will only work as intended if your MEMBERS table is unique on MEMBER_ID, but based on your query I suspect it is. To clarify what I mean, if your table is not unique on MEMBER_ID then you're not counting the number of sessions per MEMBER_ID but the number of sessions per MEMBER_ID and per MEMBER_NAME. If they're in a 1:1 relationship then it's effectively the same thing but if you can have multiple MEMBER_NAMEs per MEMBER_ID then it's not.
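As a rough check of the corrected query, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite is only a stand-in for Oracle here, the tables are cut down to the columns the query touches, and the member names and session rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Oracle; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (member_id INTEGER PRIMARY KEY, member_name TEXT);
    CREATE TABLE personal_training_sessions (
        session_id TEXT PRIMARY KEY,
        member_id  INTEGER REFERENCES members(member_id)
    );
    INSERT INTO members VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO personal_training_sessions VALUES
        ('S1', 1), ('S2', 1), ('S3', 2);
""")

# Group by every selected column that is not inside an aggregate.
session_counts = conn.execute("""
    SELECT m.member_id, m.member_name, COUNT(p.session_id) AS session_count
    FROM members m
    JOIN personal_training_sessions p ON p.member_id = m.member_id
    GROUP BY m.member_id, m.member_name
    ORDER BY m.member_id
""").fetchall()

print(session_counts)  # [(1, 'Ann', 2), (2, 'Bob', 1)]
```

Because member_id is the primary key in this sketch, grouping by (member_id, member_name) produces the same groups as grouping by member_id alone, which is the 1:1 case described above.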

SELECT MEMBERS.MEMBER_ID,
MEMBERS.MEMBER_NAME,
COUNT(personal_training_sessions.session_id)
FROM MEMBERS JOIN personal_training_sessions
ON personal_training_sessions.member_id=members.member_id
GROUP BY personal_training_sessions.session_id;
You are using a COUNT function, thus the other columns, MEMBER_ID & MEMBER_NAME, must be included in the group by clause.


BigQuery - Given a table with 1623 rows (and all are distinct), how to query to get results with a new average column? (i.e. avoid aggregation)

New to BigQuery and SQL (and Stack Overflow), spent hours and couldn't find a solution on the web and couldn't figure it out myself. Would really appreciate if someone could shed some light:
Data Source from BigQuery: bigquery-public-data.new_york.citibike_stations
Screenshot of the table named "citibike_stations" showing 1623 rows.
The next screenshot shows that the table has a column named "num_bikes_available", which I used in my query.
I queried the following:
SELECT
station_id,
num_bikes_available,
AVG(num_bikes_available) AS avg_num_bikes_available
FROM
bigquery-public-data.new_york.citibike_stations;
Error Message: SELECT list expression references column station_id which is neither grouped nor aggregated at [2:3]
So I added a "GROUP BY" clause at the end:
SELECT
station_id,
num_bikes_available,
AVG(num_bikes_available) AS avg_num_bikes_available
FROM
bigquery-public-data.new_york.citibike_stations
GROUP BY
station_id, num_bikes_available;
The result I got is not what I wanted, which is shown in the following screenshot:
Screenshot of the query result, but not the desired result.
Someone else did the following query and was able to get it right (using a subquery):
SELECT
station_id,
num_bikes_available,
(SELECT AVG(num_bikes_available)
FROM bigquery-public-data.new_york.citibike_stations) AS avg_num_bikes_available
FROM
bigquery-public-data.new_york.citibike_stations;
Screenshot of the correct result ↓:
Questions:
Why doesn't it work when "AVG(num_bikes_available) AS avg_num_bikes_available" appears directly in the SELECT list, as in the first query?
Why does it work when "(SELECT AVG(num_bikes_available) FROM bigquery-public-data.new_york.citibike_stations) AS avg_num_bikes_available" is nested in the SELECT list, as in the last query? Why does the nested form not require a GROUP BY?
SELECT
station_id,
num_bikes_available,
AVG(num_bikes_available) OVER() AS avg_num_bikes_available
FROM
bigquery-public-data.new_york.citibike_stations;
You are confusing how aggregation works. I think it would be easier if you tried to express the query in a sentence.
What you want is 'A list of all the available stations along with information about the number of bikes in each station and the average number of bikes available in all stations'.
Note that the average is just one value, the total number of available bikes in all stations divided by the number of stations. This is the reason why the independent query actually works for what you want.
The 'group by' indicates which rows of the initial table will create each group/row of the result. In your case you want the average of the whole table, so when the query is expressed properly there are no groups to be defined.
If your table had multiple entries per city for example, you could do the average grouped by city to find the average number of available bikes in the stations of each city. Note that the select would be able to return the city and the average, but not any other attribute, e.g. the station id as they wouldn't make sense within each group.
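The difference between the three forms can be seen on a tiny example. The sketch below uses an in-memory SQLite table from Python instead of BigQuery; the three stations and their bike counts are invented, and the window-function form assumes SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical three-row stand-in for the citibike_stations table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE citibike_stations (
        station_id INTEGER PRIMARY KEY,
        num_bikes_available INTEGER
    );
    INSERT INTO citibike_stations VALUES (1, 10), (2, 20), (3, 30);
""")

# GROUP BY both columns: every row is its own group, so each "average"
# is just that row's own value.
grouped = conn.execute("""
    SELECT station_id, num_bikes_available, AVG(num_bikes_available)
    FROM citibike_stations
    GROUP BY station_id, num_bikes_available
    ORDER BY station_id
""").fetchall()

# Scalar subquery: the average is computed once over the whole table.
with_subquery = conn.execute("""
    SELECT station_id, num_bikes_available,
           (SELECT AVG(num_bikes_available) FROM citibike_stations)
    FROM citibike_stations
    ORDER BY station_id
""").fetchall()

# Window function with an empty OVER(): same result, no grouping of rows.
with_window = conn.execute("""
    SELECT station_id, num_bikes_available,
           AVG(num_bikes_available) OVER ()
    FROM citibike_stations
    ORDER BY station_id
""").fetchall()

print(grouped)        # [(1, 10, 10.0), (2, 20, 20.0), (3, 30, 30.0)]
print(with_subquery)  # [(1, 10, 20.0), (2, 20, 20.0), (3, 30, 20.0)]
print(with_window)    # [(1, 10, 20.0), (2, 20, 20.0), (3, 30, 20.0)]
```

The grouped query is valid but answers a different question: it averages within each (station_id, num_bikes_available) group, and each group holds a single row.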

Very basic SQL script with aggregate [duplicate]

This question already has answers here:
"You tried to execute a query that does not include the specified aggregate function"
(3 answers)
Closed 6 years ago.
select Customers.cust_id, count(Orders.cust_id)
from Customers left outer join Orders
on Customers.cust_id=Orders.cust_id
group by Customers.cust_id
This correctly displays everything.
select Customers.cust_id, Customers.cust_name, count(Orders.cust_id)
from Customers left outer join Orders
on Customers.cust_id=Orders.cust_id
group by Customers.cust_id
"Your query does not include the specified expression 'cust_name' as part of an aggregate function."
Why is that? Each cust_id in Customers has a name in cust_name. Why do I get this error message?
When you use an aggregate function such as count(), all other fields (those not used with an aggregate function) must appear in the Group By clause.
Here is my explanation as to why:
Aggregate functions operate across groups.
(That is, unless no groups or other fields are specified, in which case they operate across the whole recordset by default. For example, SELECT Sum(Salary) FROM Staff works.)
If you group by cust_id then it knows what to output, a count for each cust_id. But what would it do with the cust_name's? Which cust_name would it, or should it, display for each cust_id output? What if there are several cust_name's for a cust_id? It will only display one row for each cust_id, so what name should it display alongside it? It won't make the assumption that there is exactly one cust_name to correspond to one cust_id.
If there is one cust_name per cust_id then grouping by both will produce the same number of rows (as for cust_id alone) and provide consistent, and reliable, behaviour.
select Customers.cust_id, Customers.cust_name, count(Orders.cust_id)
from Customers left outer join Orders
on Customers.cust_id=Orders.cust_id
group by Customers.cust_id, Customers.cust_name
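For a quick sanity check of the corrected query, the following sketch runs it on an in-memory SQLite database from Python. SQLite stands in for the original database engine, and the customers and orders are invented; the order_id column is an assumption added only to give the Orders table a key.

```python
import sqlite3

# Made-up data: Ann has two orders, Bob has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (cust_id INTEGER PRIMARY KEY, cust_name TEXT);
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Orders VALUES (100, 1), (101, 1);
""")

order_counts = conn.execute("""
    SELECT Customers.cust_id, Customers.cust_name, COUNT(Orders.cust_id)
    FROM Customers LEFT OUTER JOIN Orders
      ON Customers.cust_id = Orders.cust_id
    GROUP BY Customers.cust_id, Customers.cust_name
    ORDER BY Customers.cust_id
""").fetchall()

# COUNT(Orders.cust_id) skips the NULL produced by the outer join,
# so a customer without orders is reported with a count of 0.
print(order_counts)  # [(1, 'Ann', 2), (2, 'Bob', 0)]
```

With one cust_name per cust_id, the result has exactly one row per customer, the same number of rows as grouping by cust_id alone.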

Why PostgreSQL is not accepting while group by on one table and selecting towards another tables [duplicate]

This question already has an answer here:
PGError: ERROR: column "p.name" must appear in the GROUP BY clause or be used in an aggregate function
(1 answer)
Closed 8 years ago.
I am using PostgreSQL version
PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-22ubuntu5) 4.7.2, 64-bit.
My question is about joining two tables (call them temp1 and temp2).
The table structure is:
marks_map
marks int
stud_id int
student
stud_id int
class_id int
Here is my query:
select class_id,stud_id,count(marks)
from student as s
inner join marks_map as m on (s.stud_id=m.stud_id) group by stud_id
Here I get this error:
ERROR: column "s.class_id" must appear in the GROUP BY clause or be used in an aggregate function
Why does this error happen? If I use class_id in group by it's running successfully.
You have to add the class_id attribute to your GROUP BY clause because the SELECT part of your statement has no aggregate function over this attribute.
In GROUP BY statements, every attribute that you have not aggregated must be listed in the GROUP BY clause.
For example:
SELECT
non-aggregating-attr-1, non-aggregating-attr2, non-aggregating-attr3, sum(attr4)
FROM
table
GROUP BY
non-aggregating-attr-1, non-aggregating-attr2, non-aggregating-attr3
That's the way GROUP BY works.
You can check your data like this:
select
array_agg(s.class_id) as arr_class_id,
s.stud_id, count(m.marks)
from student as s
inner join marks_map as m on (s.stud_id=m.stud_id)
group by s.stud_id
and see how many class_id values you have for each group. Sometimes class_id is functionally dependent on stud_id (the array holds only one distinct element for each group), so you can use a dummy aggregate like:
select
max(s.class_id) as class_id,
s.stud_id, count(m.marks)
from student as s
inner join marks_map as m on (s.stud_id=m.stud_id)
group by s.stud_id
You should be able to understand the problem on a simplified case that doesn't even involve a JOIN.
The query SELECT x,[other columns] GROUP BY x expresses the fact that for every distinct value of x, the [other columns] must be output with only one row for every x.
Now looking at a simplified example where the student table has two entries:
stud_id=1, class_id=1
stud_id=1, class_id=2
And we ask for SELECT stud_id,class_id FROM student GROUP BY stud_id.
There is only one distinct value of stud_id, which is 1.
So we're telling the SQL engine, give me one row with stud_id=1 and the value of class_id that comes with it. And the problem is that there is not one, but two such values, 1 and 2. So which one to choose? Instead of choosing randomly, the SQL engine yields an error saying the question is conceptually bogus in the first place, because there's no rule that says each distinct value of stud_id has its own corresponding class_id.
On the other hand, if the non-GROUP'ed output columns are aggregate functions that transform a series of values into just one, like min, max, or count, then they provide the missing rules that say how to get only one value from several. That's why the SQL engine is OK with, for instance: SELECT stud_id,count(class_id) FROM student GROUP BY stud_id;.
Also, when faced with the error column "somecolumn" must appear in the GROUP BY clause, you don't want to just add columns to the GROUP BY until the error goes away, as if it was purely a syntax problem. It's a semantic problem, and each column added to the GROUP BY changes the sense of the question submitted to the SQL engine.
That is, GROUP BY x,y means for each distinct value of the (x,y) couple. It does not mean GROUP BY x, and hey, since it leads to an error, let's throw in the y as well!
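To make the point about changing the question concrete, here is a small sketch on the two-row student table from the example above, using an in-memory SQLite database from Python. SQLite is a stand-in for PostgreSQL, and it is more lenient about bare columns in grouped queries, so the sketch does not try to reproduce the error; it only compares the two valid groupings.

```python
import sqlite3

# The two-row student table from the simplified example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (stud_id INTEGER, class_id INTEGER);
    INSERT INTO student VALUES (1, 1), (1, 2);
""")

# One group per distinct stud_id: a single row, with classes counted.
per_student = conn.execute(
    "SELECT stud_id, COUNT(class_id) FROM student GROUP BY stud_id"
).fetchall()

# One group per distinct (stud_id, class_id) pair: a different question.
per_pair = conn.execute("""
    SELECT stud_id, class_id, COUNT(*)
    FROM student
    GROUP BY stud_id, class_id
    ORDER BY stud_id, class_id
""").fetchall()

print(per_student)  # [(1, 2)]
print(per_pair)     # [(1, 1, 1), (1, 2, 1)]
```

Adding class_id to the GROUP BY makes the error go away, but the result now has one row per (student, class) pair instead of one row per student.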

ORA-00979: not a GROUP BY expression [duplicate]

This question already has answers here:
ORA-00979 not a group by expression
(10 answers)
Closed 7 years ago.
I am trying to show all the different companies for which students work. However, only companies where more than four students are employed should be displayed.
This is what I have so far:
SELECT EMPLOYER, COUNT (STUDENT_ID)
FROM STUDENT
GROUP BY STUDENT_ID
HAVING COUNT (STUDENT_ID) >4;
I keep getting this message:
ERROR at line 1:
ORA-00979: not a GROUP BY expression
I don't get it. I also tried this earlier:
SELECT STUDENT.EMPLOYER, COUNT (STUDENT.STUDENT_ID)
FROM STUDENT
GROUP BY STUDENT.STUDENT_ID
HAVING COUNT (STUDENT.STUDENT_ID) >4;
but nothing seems to work. Any help is appreciated. I am on SQL*Plus if that helps.
Try:
SELECT EMPLOYER, COUNT (STUDENT_ID)
FROM STUDENT
GROUP BY EMPLOYER
HAVING COUNT (STUDENT_ID) >4;
- this will return a list of all employers with more than 4 students.
When grouping or including aggregated fields, your select statement should only include fields that are either aggregated or included in the group by clause - in your existing select, you are including EMPLOYER in your select clause, but not grouping by it or aggregating it.
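A minimal sketch of the corrected query, run on an in-memory SQLite database from Python as a stand-in for Oracle. The employer names and student rows are invented: five students at one employer and two at another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, employer TEXT)"
)

# Hypothetical data: students 1-5 work at Acme, students 6-7 at Globex.
rows = [(i, "Acme") for i in range(1, 6)] + [(i, "Globex") for i in range(6, 8)]
conn.executemany("INSERT INTO student VALUES (?, ?)", rows)

# Group by the column being reported, then keep only the large groups.
employers = conn.execute("""
    SELECT employer, COUNT(student_id)
    FROM student
    GROUP BY employer
    HAVING COUNT(student_id) > 4
""").fetchall()

print(employers)  # [('Acme', 5)]
```

Grouping by STUDENT_ID instead would create one group per student, so no group could ever contain more than four rows.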

Unable to comprehend why a WHERE clause is being accepted [duplicate]

This question already has answers here:
SQL - HAVING vs. WHERE
(9 answers)
Closed 9 years ago.
I am trying to understand the difference between HAVING and WHERE. I understand that HAVING is used with GROUP BY statements. However, I cannot understand why the following statement is accepted:
select SUM(child_id) from children WHERE child_ID = 5 GROUP BY Child_ID
Shouldn't the correct statement be select SUM(child_id) from children GROUP BY Child_ID HAVING child_ID = 5 ?
WHERE clauses are executed before the grouping process has occurred, and only have access to fields in the input table. HAVING is performed after the grouping process occurs, and can filter results based on the aggregate values computed in the grouping process.
The WHERE clause can be used even if a HAVING is being used. They mean very different things. The way to think about it is as follows:
The WHERE clause acts as a filter at the record level
Anything that gets through is then put into groups specified by your GROUP BY
Then, the HAVING clause filters out groups, based on aggregate (SUM, COUNT, MIN, etc.) condition
So, if I have a table: (STORE_ID, STATE, SALES)
Select STATE, SUM(SALES)
from MyTable
Where SALES > 100
Group By STATE
Having Sum(Sales) > 1000
This will first filter to read only the store records with Sales over 100. For each group (by State) it will sum the Sales of only those stores with Sales over 100. Then, it will drop any State unless the State-level summation is more than 1000. [Note: The state summation excludes any store with sales of 100 or less.]
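The order of evaluation can be checked with a small sketch on an in-memory SQLite database from Python. The table name my_table, the two states, and the sales figures are all invented for illustration; the second query drops the WHERE clause to show how the row-level filter changes what HAVING sees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (store_id INTEGER PRIMARY KEY, state TEXT, sales INTEGER)"
)
# Hypothetical stores: NY totals 1150, CA totals 1050 before any filtering.
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, "NY", 600), (2, "NY", 500), (3, "NY", 50),
    (4, "CA", 950), (5, "CA", 100),
])

# WHERE removes stores with sales of 100 or less before the groups are summed.
filtered_first = conn.execute("""
    SELECT state, SUM(sales) FROM my_table
    WHERE sales > 100
    GROUP BY state
    HAVING SUM(sales) > 1000
    ORDER BY state
""").fetchall()

# Without WHERE, every store contributes to its state's total.
having_only = conn.execute("""
    SELECT state, SUM(sales) FROM my_table
    GROUP BY state
    HAVING SUM(sales) > 1000
    ORDER BY state
""").fetchall()

print(filtered_first)  # [('NY', 1100)]
print(having_only)     # [('CA', 1050), ('NY', 1150)]
```

CA passes the HAVING test only when its store with sales of exactly 100 is included, which is why the two clauses are not interchangeable.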