SQL group by, what am I doing wrong? - sql

The situation is as follows:
Find the top 5 Community Areas by average College Enrollment.
The DB is stored as SCHOOLS.
%sql SELECT COLLEGE_ENROLLMENT, COMMUNITY_AREA_NAME FROM SCHOOLS GROUP BY COLLEGE_ENROLLMENT;
I expected this to give me the college enrollment by community area, but I get this error message:
(ibm_db_dbi.ProgrammingError) ibm_db_dbi::ProgrammingError: Exception('SQLNumResultCols failed: [IBM][CLI Driver][DB2/LINUXX8664] SQL0119N An expression starting with "COMMUNITY_AREA_NAME" specified in a SELECT clause, HAVING clause, or ORDER BY clause is not specified in the GROUP BY clause or it is in a SELECT clause, HAVING clause, or ORDER BY clause with a column function and no GROUP BY clause is specified. SQLSTATE=42803\r SQLCODE=-119')
Can anyone give me a lead on what I'm doing wrong here?
Thank you!

When you use GROUP BY, every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function such as SUM(). In your case you would need to add COMMUNITY_AREA_NAME to the GROUP BY clause or remove it from the SELECT list to make the error go away. That said, I don't think this query is quite what you want. I would do something like this:
SELECT COMMUNITY_AREA_NAME, SUM(COLLEGE_ENROLLMENT) AS TOTAL_ENROLLED FROM SCHOOLS GROUP BY COMMUNITY_AREA_NAME ORDER BY TOTAL_ENROLLED DESC;
Explanation:
SUM(COLLEGE_ENROLLMENT): Total up the enrollment of all schools that are in a single COMMUNITY_AREA_NAME.
AS TOTAL_ENROLLED: Give the result from SUM() a name so we can easily refer to it later in the ORDER BY clause.
ORDER BY TOTAL_ENROLLED DESC: Sort the output by TOTAL_ENROLLED and put the biggest numbers at the top.

Try the following; it should work.
Find the top 5 Community Areas by average College Enrollment.
SELECT
COMMUNITY_AREA_NAME,
AVG(COLLEGE_ENROLLMENT) AS AVG_ENROLL
FROM SCHOOLS
GROUP BY
COMMUNITY_AREA_NAME
ORDER BY
AVG(COLLEGE_ENROLLMENT) DESC
LIMIT 5
;
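To see the grouping rule in action without a Db2 instance, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. The rows are invented for illustration, and SQLite is only a stand-in: it is more lenient than Db2 about non-grouped columns, so it will not reproduce the SQL0119N error, but a query that follows the rule behaves the same way.

```python
import sqlite3

# In-memory stand-in for the SCHOOLS table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SCHOOLS (COMMUNITY_AREA_NAME TEXT, COLLEGE_ENROLLMENT INTEGER)"
)
conn.executemany(
    "INSERT INTO SCHOOLS VALUES (?, ?)",
    [
        ("UPTOWN", 400),
        ("UPTOWN", 600),
        ("LOOP", 900),
        ("AUSTIN", 300),
        ("AUSTIN", 500),
    ],
)

# Every selected column is either grouped (COMMUNITY_AREA_NAME)
# or aggregated (AVG(COLLEGE_ENROLLMENT)).
top_areas = conn.execute(
    """
    SELECT COMMUNITY_AREA_NAME, AVG(COLLEGE_ENROLLMENT) AS AVG_ENROLL
    FROM SCHOOLS
    GROUP BY COMMUNITY_AREA_NAME
    ORDER BY AVG_ENROLL DESC
    LIMIT 5
    """
).fetchall()

for name, avg_enroll in top_areas:
    print(name, avg_enroll)
```

With only three areas in the sample data, LIMIT 5 simply returns all three, sorted from the highest average down.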

Related

In DB2 SELECT MOST Frequent occurrence and link with other table

I have two tables, Census and Crime.
From the Crime table, I need to find the most frequently occurring community_area_number
and then link Crime's community_area_number to Census's community_area_number to get the community_area_name.
I am able to do the first step, but I fail at linking to the other table. Please advise where I have gone wrong. Thanks
%%sql
SELECT COUNT(CR.COMMUNITY_AREA_NUMBER) AS MOST_FREQ, CR.COMMUNITY_AREA_NUMBER, CE.COMMUNITY_AREA_NAME from CRIME AS CR, CENSUS AS CE
WHERE CR.COMMUNITY_AREA_NUMBER = CE.COMMUNITY_AREA_NUMBER
GROUP BY CR.COMMUNITY_AREA_NUMBER
ORDER BY COUNT(CR.COMMUNITY_AREA_NUMBER) DESC LIMIT 1
Expected output
MOST_FREQ  COMMUNITY_AREA_NUMBER  COMMUNITY_AREA_NAME
43         25                     Uptown
Sample CENSUS
SAMPLE CRIME
You should be writing the query like this:
SELECT COUNT(*) AS MOST_FREQ,
CR.COMMUNITY_AREA_NUMBER, CE.COMMUNITY_AREA_NAME
FROM CRIME CR JOIN
CENSUS CE
ON CR.COMMUNITY_AREA_NUMBER = CE.COMMUNITY_AREA_NUMBER
GROUP BY CR.COMMUNITY_AREA_NUMBER, CE.COMMUNITY_AREA_NAME
ORDER BY COUNT(*) DESC
LIMIT 1;
Note the use of proper, explicit, standard, readable JOIN syntax. Never use commas in the FROM clause.
The relevant change, though, is to include CE.COMMUNITY_AREA_NAME in the GROUP BY. All non-aggregated columns should be in the GROUP BY as a general rule.
Also, COUNT(*) is simpler for counting matches, so this query uses that instead of counting the non-NULL values of a column.
You are using an aggregate function, COUNT(CR.COMMUNITY_AREA_NUMBER) AS MOST_FREQ,
so all other (non-aggregated) returned columns need to be in the GROUP BY clause.
For your query, that means adding CE.COMMUNITY_AREA_NAME to the GROUP BY.
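As a rough check of that fix, the sketch below runs the corrected join against an in-memory SQLite database from Python. The table contents and area names are invented, and SQLite stands in for Db2 here, so treat it as an illustration of the query shape, not of Db2's exact behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature versions of the CENSUS and CRIME tables.
conn.executescript(
    """
    CREATE TABLE CENSUS (COMMUNITY_AREA_NUMBER INTEGER, COMMUNITY_AREA_NAME TEXT);
    CREATE TABLE CRIME (ID INTEGER, COMMUNITY_AREA_NUMBER INTEGER);
    INSERT INTO CENSUS VALUES (25, 'Area A'), (3, 'Area B');
    INSERT INTO CRIME VALUES (1, 25), (2, 25), (3, 25), (4, 3);
    """
)

# Both non-aggregated columns appear in the GROUP BY clause.
most_frequent = conn.execute(
    """
    SELECT COUNT(*) AS MOST_FREQ,
           CR.COMMUNITY_AREA_NUMBER,
           CE.COMMUNITY_AREA_NAME
    FROM CRIME CR
    JOIN CENSUS CE
      ON CR.COMMUNITY_AREA_NUMBER = CE.COMMUNITY_AREA_NUMBER
    GROUP BY CR.COMMUNITY_AREA_NUMBER, CE.COMMUNITY_AREA_NAME
    ORDER BY COUNT(*) DESC
    LIMIT 1
    """
).fetchone()

print(most_frequent)
```

Because each area number maps to exactly one name, adding the name to the GROUP BY does not change how the rows are grouped; it only makes the name legal to select.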

SQL GROUP BY usages

I am doing the SQL transformation lesson from Codecademy. I am not sure why they use those numbers after the GROUP BY clause or what the numbers do. Could anyone who has passed the course be so kind as to let me know?
SELECT dep_month,
dep_day_of_week,
dep_date,
COUNT(*) AS flight_count
FROM flights
GROUP BY 1,2,3
The numbers in the GROUP BY clause simply refer to the columns in the SELECT list, from left to right. Hence, your query is identical to the following:
SELECT
dep_month,
dep_day_of_week,
dep_date,
COUNT(*) AS flight_count
FROM flights
GROUP BY
dep_month,
dep_day_of_week,
dep_date
The query above is what I would use in practice, because GROUP BY 1,2,3 refers to positions rather than columns. If someone later refactors the SELECT list, they risk breaking your query.
Obviously these are position numbers. So this is a GROUP BY on the first three columns:
GROUP BY 1,2,3
means
GROUP BY dep_month, dep_day_of_week, dep_date
here.
This is not compliant with the SQL standard, because the GROUP BY clause is logically evaluated before the SELECT clause, so the positions cannot be known at that point. They are only known in the ORDER BY clause, because that is evaluated after the SELECT clause. Only some DBMSs (PostgreSQL, MySQL, and SQLite, for example) make an exception and allow this positional notation in GROUP BY. It is therefore bad practice to show this in a tutorial.
It's basically group by column 1, column 2 and column 3 from your select query.
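A small sketch to confirm that the positional and the named forms group identically. It uses Python's sqlite3 module because SQLite is one of the engines that accepts positions in GROUP BY; the flights rows are invented, and an ORDER BY is added to both queries only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (dep_month INTEGER, dep_day_of_week TEXT, dep_date TEXT)"
)
# Made-up sample rows: two flights on one date, one on another.
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [
        (1, "Mon", "2000-01-03"),
        (1, "Mon", "2000-01-03"),
        (1, "Tue", "2000-01-04"),
    ],
)

# Positional form: 1, 2, 3 refer to the first three SELECT columns.
positional = conn.execute(
    """
    SELECT dep_month, dep_day_of_week, dep_date, COUNT(*) AS flight_count
    FROM flights
    GROUP BY 1, 2, 3
    ORDER BY dep_date
    """
).fetchall()

# Named form: the same grouping, spelled out by column name.
named = conn.execute(
    """
    SELECT dep_month, dep_day_of_week, dep_date, COUNT(*) AS flight_count
    FROM flights
    GROUP BY dep_month, dep_day_of_week, dep_date
    ORDER BY dep_date
    """
).fetchall()

print(positional == named)
```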

I am getting: "You tried to execute a query that does not include the specified expression 'OrdID' as part of an aggregate function." How do I bypass this?

My code is as follows:
SELECT Last, OrderLine.OrdID, OrdDate, SUM(Price*Qty) AS total_price
FROM ((Cus INNER JOIN Orders ON Cus.CID=Orders.CID)
INNER JOIN OrderLine
ON Orders.OrdID=OrderLine.OrdID)
INNER JOIN ProdFabric
ON OrderLine.PrID=ProdFabric.PrID
AND OrderLine.Fabric=ProdFabric.Fabric
GROUP BY Last
ORDER BY Last DESC, OrderLine.OrdID DESC;
This has been answered before, but only vaguely, and I can't see where I am going wrong.
You tried to execute a query that does not include the specified expression 'OrdID' as part of an aggregate function.
That is the error message I keep getting, no matter what I change. Yes, I know it is written as SQL-92, but how do I make this a legal query?
For almost every DBMS (MySQL is the only exception I'm aware of, but there could be others), every column in a SELECT that is not aggregated needs to be in the GROUP BY clause. In the case of your query, that would be everything but the columns in the SUM():
SELECT Last, OrderLine.OrdID, OrdDate, SUM(Price*Qty) AS total_price
...
GROUP BY Last, OrderLine.OrdID, OrdDate
ORDER BY Last DESC, OrderLine.OrdID DESC;
If you have to keep your GROUP BY intact (and not add non-aggregated fields to the list), then you need to decide which values you want for OrderLine.OrdID and OrdDate. For example, you may choose to take the MAX or MIN of these values.
So it's either, as bernie suggested, GROUP BY Last, OrderLine.OrdID, OrdDate, or something like this (if it makes sense for your business logic):
SELECT Last, MAX(OrderLine.OrdID), MAX(OrdDate), SUM(Price*Qty) AS total_price
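The two options return different shapes of result, which the following sketch tries to make concrete. It is a simplification under stated assumptions: the four joined tables are flattened into one hypothetical `sales` table, the column names (`last_name`, `ord_id`, `ord_date`) and rows are invented, and it runs on SQLite through Python, not on Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened result of the joins in the question.
conn.execute(
    """
    CREATE TABLE sales (
        last_name TEXT, ord_id INTEGER, ord_date TEXT, price REAL, qty INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("Smith", 1, "2020-01-01", 10.0, 2),
        ("Smith", 1, "2020-01-01", 5.0, 1),
        ("Smith", 2, "2020-02-01", 20.0, 1),
        ("Jones", 3, "2020-01-15", 7.0, 3),
    ],
)

# Option 1: group by every non-aggregated column -> one row per order.
per_order = conn.execute(
    """
    SELECT last_name, ord_id, ord_date, SUM(price * qty) AS total_price
    FROM sales
    GROUP BY last_name, ord_id, ord_date
    ORDER BY last_name DESC, ord_id DESC
    """
).fetchall()

# Option 2: keep GROUP BY last_name and aggregate the rest -> one row per customer.
per_customer = conn.execute(
    """
    SELECT last_name, MAX(ord_id), MAX(ord_date), SUM(price * qty) AS total_price
    FROM sales
    GROUP BY last_name
    ORDER BY last_name DESC
    """
).fetchall()

print(per_order)
print(per_customer)
```

Option 1 yields a total per order, while option 2 collapses each customer to a single row whose order ID and date are just the largest values seen, so pick the one that matches the question you are actually asking.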

pgSQL query error

I tried using this query:
"SELECT * FROM guests WHERE event_id=".$id." GROUP BY member_id;"
and I'm getting this error:
ERROR: column "guests.id" must appear in the GROUP BY clause or be used in an aggregate function
Can anyone explain how I can work around this?
You can't Group By without letting the Select know what to take, and how to group.
Try
SELECT guests.member_id FROM guests WHERE event_id=".$id." GROUP BY member_id;
If you need to get more info from this table about the guests, you'll need to add those columns to the GROUP BY.
Plus, it seems like your select should actually be
SELECT guests.id FROM guests WHERE event_id=".$id." GROUP BY id;
Each of the columns used in a GROUP BY query needs to be specifically called out (i.e., don't do SELECT * FROM ...), because each one must either be used in some sort of aggregate function (min/max/sum/avg/count/etc.) or be part of the GROUP BY clause.
For example:
SELECT instrument, detector, min(date_obs), max(date_obs)
FROM observations
WHERE observatory='SOHO'
GROUP BY instrument, detector;

Is it possible to have a WHERE clause after a HAVING clause?

Is it possible to use a WHERE clause after a HAVING clause?
The first thing that comes to my mind is sub queries, but I'm not sure.
P.S. If the answer is affirmative, could you give some examples?
No, not in the same query.
The WHERE clause goes before the GROUP BY and the HAVING. If you want to filter out records before the grouping, the condition goes in the WHERE clause, and if you want to filter out grouped records, the condition goes in the HAVING clause:
select ...
from ...
where ...
group by ...
having ...
If neither of those is possible for some odd reason, you have to make the query a subquery so that you can put the WHERE clause in the outer query:
select ...
from (
select ...
from ...
where ...
group by ...
having ...
) x
where ...
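The skeletons above can be filled in with a small runnable sketch. The `payments` table, its columns, and the thresholds are all hypothetical, and the queries run on an in-memory SQLite database from Python purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (customer TEXT, amount INTEGER)")
# Invented sample data.
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [("ann", 50), ("ann", 70), ("bob", 5), ("bob", 200), ("cy", 30), ("cy", 20)],
)

# WHERE filters individual rows before grouping;
# HAVING filters whole groups after aggregation.
inner_sql = """
    SELECT customer, SUM(amount) AS total
    FROM payments
    WHERE amount >= 10
    GROUP BY customer
    HAVING SUM(amount) >= 100
"""
grouped = conn.execute(inner_sql + " ORDER BY customer").fetchall()

# A WHERE "after" the HAVING is only possible one level up,
# in an outer query that wraps the grouped query as a subquery.
filtered = conn.execute(
    f"""
    SELECT customer, total
    FROM ({inner_sql}) x
    WHERE total < 150
    ORDER BY customer
    """
).fetchall()

print(grouped)
print(filtered)
```

Here the inner WHERE drops bob's payment of 5 before any sums are computed, the HAVING drops cy's group (total 50), and the outer WHERE then removes bob's group (total 200).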
A HAVING clause is just a WHERE clause after a GROUP BY. Why not put your WHERE conditions in the HAVING clause?
If it's a trick question, it's possible if the WHERE and the HAVING are not at the same level, as you mentioned, with a subquery.
I guess something like this would work:
HAVING value=(SELECT max(value) FROM
foo WHERE crit=123)
p.s.: why were you asking?
Do you have a specific problem?
p.s.s: OK silly me, I missed the "interview*" tag...
From the SELECT help:
Processing Order of WHERE, GROUP BY, and HAVING Clauses
The following steps show the processing order for a SELECT statement with a WHERE clause, a GROUP BY clause, and a HAVING clause:
1. The FROM clause returns an initial result set.
2. The WHERE clause excludes rows not meeting its search condition.
3. The GROUP BY clause collects the selected rows into one group for each unique value in the GROUP BY clause.
4. Aggregate functions specified in the select list calculate summary values for each group.
5. The HAVING clause additionally excludes rows not meeting its search condition.
So, no you can not.
Within the same scope, the answer is no. If subqueries are allowed, then you can avoid using HAVING entirely.
I think HAVING is an anachronism. Hugh Darwen refers to HAVING as "The Folly of Structured Queries":
In old SQL, the WHERE clause could not be used on results of aggregation, so they had to invent HAVING (with same meaning as WHERE):
SELECT D#, AVG(Salary) AS Avg_Sal
FROM Emp
GROUP BY D#
HAVING AVG(Salary) > 999;
But would we ever have had HAVING if in 1979 one could write:
SELECT *
FROM (
SELECT D#, AVG(Sal) AS Avg_Sal
FROM Emp
GROUP BY D#
)
AS dummy
WHERE Avg_Sal > 999;
I strongly suspect the answer to Darwen's question is no.
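For anyone who wants to try the equivalence Darwen describes, here is a minimal sketch using Python's sqlite3 module. The column names `dept` and `salary` are stand-ins chosen for this example (an unquoted `D#` is not a portable identifier), and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (dept TEXT, salary INTEGER)")
# Invented rows: D1 averages 1100, D2 averages 850.
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?)",
    [("D1", 1000), ("D1", 1200), ("D2", 800), ("D2", 900)],
)

# Filtering the aggregate with HAVING.
with_having = conn.execute(
    """
    SELECT dept, AVG(salary) AS avg_sal
    FROM Emp
    GROUP BY dept
    HAVING AVG(salary) > 999
    """
).fetchall()

# The same filter written as a WHERE on a derived table.
with_derived_table = conn.execute(
    """
    SELECT *
    FROM (
        SELECT dept, AVG(salary) AS avg_sal
        FROM Emp
        GROUP BY dept
    ) AS dummy
    WHERE avg_sal > 999
    """
).fetchall()

print(with_having == with_derived_table)
```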