I have the following code (see below) that finds the minimum date of birth of an employee.
I don't understand why it throws this error:
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
The following code throws the previous error:
SELECT *
FROM [TerpsConsultant.Employee] e1
WHERE EXISTS (
SELECT *
FROM [TerpsConsultant.Employee] e2
WHERE e1.empDateOfBirth = MIN(e2.empDateOfBirth)
)
After searching for similar questions on here, I tried this code and it worked fine:
SELECT *
FROM [TerpsConsultant.Employee] e1
WHERE EXISTS (
SELECT *
FROM [TerpsConsultant.Employee] e2
WHERE e1.empDateOfBirth = (SELECT MIN(e2.empDateOfBirth) FROM [TerpsConsultant.Employee] e2)
)
Would you help me understand why the first version of the code is not working? What's the difference after all?
As others already said in the comments, you cannot use an aggregate function directly in a WHERE clause.
As specified in MSDN Aggregate Functions, you can use aggregate functions as expressions only in the following situations:
The select list of a SELECT statement (either a subquery or an outer query).
A HAVING clause.
So your first statement is invalid because it doesn't meet the requirements for using aggregates. In the second statement, the aggregate sits in the select list of a subquery; the subquery is evaluated to a single value, and only then is that value compared in the WHERE condition.
As Sergey Kalinichenko answered in this older post:
WHERE clause introduces a condition on individual rows; HAVING clause introduces a condition on aggregations, i.e. results of selection where a single result, such as count, average, min, max, or sum, has been produced from multiple rows. Your query calls for a second kind of condition (i.e. a condition on an aggregation) hence HAVING works correctly.
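The difference between the two statements can be reproduced with a small sketch. It assumes Python's built-in sqlite3 module rather than SQL Server, a simplified employee table, and made-up rows; the exact error text differs between databases, so only the fact that an error is raised is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for [TerpsConsultant.Employee].
conn.execute("CREATE TABLE employee (empName TEXT, empDateOfBirth TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ann", "1975-03-02"), ("Bob", "1968-11-20"), ("Cy", "1990-07-14")],
)

# Aggregate used directly in WHERE: rejected, because WHERE is a per-row test.
try:
    conn.execute(
        "SELECT * FROM employee WHERE empDateOfBirth = MIN(empDateOfBirth)"
    ).fetchall()
    direct_aggregate_failed = False
except sqlite3.OperationalError:
    direct_aggregate_failed = True

# Aggregate in the select list of a subquery: allowed. The subquery yields
# one value, and each row is then compared against that value.
oldest = conn.execute(
    "SELECT empName FROM employee "
    "WHERE empDateOfBirth = (SELECT MIN(empDateOfBirth) FROM employee)"
).fetchall()

print(direct_aggregate_failed, oldest)
```

Note that the EXISTS wrapper from the question is not needed once the scalar subquery is in place.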
Columns that are not used in GROUP BY are not usable in SELECT, but they are usable in WHERE. This makes sense since WHERE comes before GROUP BY, but shouldn't HAVING also be able to access the "other" columns of the row?
Example
Below is valid.
select fid, count(*)
from class
inner join faculty using (fid)
group by fid
having every(class.room = 'R128')
But can't do this.
select fid, count(*)
from class
inner join faculty using (fid)
group by fid
having class.room = 'R128' -- Changed Line
Error message of above snippet:
ERROR: column "class.room" must appear in the GROUP BY clause or be used in an aggregate function
LINE 7: having class.room = 'R128'
^
SQL state: 42803
Character: 86
This is not an XY problem; I want to know why this is impossible. (The query is accepted with every(); the second version is also semantically wrong for what I am asking.)
having is used to filter the result of the grouping.
However the room column is neither part of an aggregate nor part of the GROUP BY.
every() is an aggregate function and thus it's allowed in the having clause.
In a HAVING clause you can only reference grouped columns or aggregate functions, and "every" is an aggregate function.
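As a rough illustration of what the every() condition computes per group, here is a sketch using Python's built-in sqlite3 module. SQLite has no every() aggregate, so the sketch swaps in MIN over the 0/1 comparison result. It does not reproduce the PostgreSQL error, because SQLite tolerates bare columns in aggregate queries. The table layout and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical class table; fid identifies the faculty member.
conn.execute("CREATE TABLE class (name TEXT, room TEXT, fid INTEGER)")
conn.executemany(
    "INSERT INTO class VALUES (?, ?, ?)",
    [
        ("db", "R128", 1),
        ("os", "R128", 1),
        ("ai", "R128", 2),
        ("ml", "R12", 2),
    ],
)

# (room = 'R128') evaluates to 1 or 0 for each row. MIN(...) over a group
# is 1 only when the comparison is true for every row of that group,
# which mimics every(class.room = 'R128').
rows = conn.execute(
    "SELECT fid, COUNT(*) FROM class "
    "GROUP BY fid "
    "HAVING MIN(room = 'R128') = 1"
).fetchall()

print(rows)
```

The condition is evaluated once per group, which is why it has to be phrased as an aggregate: a group of several rows has no single room value to compare.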
SELECT
"employees"."FIRST_NAME",
"employees"."LAST_NAME",
"employees"."SALARY"
FROM
'employees'
WHERE
(("employees"."SALARY" > (SELECT "employees"."SALARY"
FROM 'employees'
WHERE (("employees"."FIRST_NAME" = "Alexander")))))
The subquery returns 2 values. How can they be compared with "employees"."salary" ? That is, there are 2 employees with first name "Alexander"... Replacing the subquery with a 2 element tuple gives a query that is not accepted from the SQL client... That is this query should be equivalent to the one above but it does not execute correctly:
SELECT
"employees"."FIRST_NAME", "employees"."LAST_NAME",
"employees"."SALARY"
FROM
'employees'
WHERE
(("employees"."SALARY" > (3500, 9000)))
What is going on?
From SQL Language Expressions/11. Subquery Expressions:
A SELECT statement enclosed in parentheses is a subquery. All types of
SELECT statement, including aggregate and compound SELECT queries
(queries with keywords like UNION or EXCEPT) are allowed as scalar
subqueries. The value of a subquery expression is the first row of
the result from the enclosed SELECT statement. The value of a
subquery expression is NULL if the enclosed SELECT statement returns
no rows.
Your query would not run in any other database than SQLite.
But SQLite, as you can see from the documentation, does not throw an error such as "subquery returns more than 1 row"; instead it allows the subquery and keeps only the first of the rows that it returns.
This is one of SQLite's non-standard sql features and in your case it leads to wrong results.
What you would want, I believe, is to compare each employee's salary to the max salary of all employees named 'Alexander'.
You can do this by changing the subquery to:
SELECT MAX(SALARY) FROM employees WHERE FIRST_NAME = 'Alexander'
This is a non-correlated scalar subquery, so there is no need for any aliases.
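The first-row behaviour and the MAX() fix can be checked with a small sketch using Python's built-in sqlite3 module. The rows are made up (only the two Alexander salaries, 3500 and 9000, come from the question), and an ORDER BY is added inside the first subquery purely so that "the first row" is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Alexander", 3500),
        ("Alexander", 9000),
        ("Bruce", 6000),
        ("Steven", 24000),
    ],
)

# Two rows match, but SQLite uses only the first one as the scalar value.
# ORDER BY is an addition here, to pin down which row comes first.
first_row = conn.execute(
    "SELECT (SELECT salary FROM employees "
    "WHERE first_name = 'Alexander' ORDER BY salary)"
).fetchone()[0]

# With MAX() the subquery returns exactly one row by construction.
above_max = conn.execute(
    "SELECT first_name FROM employees "
    "WHERE salary > (SELECT MAX(salary) FROM employees "
    "WHERE first_name = 'Alexander')"
).fetchall()

print(first_row, above_max)
```

With these rows the unaggregated subquery silently compares against 3500, while the MAX() version compares against 9000.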
Note: this assumes that the SQLite tag is correct. There are many flavours of SQL, so the database manager being used is important and relevant.
the subquery returns 2 values. How can they be compared with "employees"."salary" ?
You compare multiple values using a function that can take multiple values such as max, which could be what you require.
e.g.
SELECT "employees"."FIRST_NAME", "employees"."LAST_NAME", "employees"."SALARY" FROM 'employees' WHERE (("employees"."SALARY" > max(3500, 9000)))
What is going on?
The first is using a WHERE clause that is a valid expression that is either true or false. The second is misusing values i.e. a list of values is provided where a single value is expected.
First, when writing your queries, you should not have to quote every identifier; it gets cluttered and bloated. You can also use aliases to help readability, as you'll see soon. If you use quotes, use single quotes around specific values such as a date, like > '2022-02-22'.
Now on to your query. Your query is looking for salaries greater than that of a given person (Alexander), but there are multiple people by that name. To get ONE answer, you might need the MAX() salary for the criteria. So this essentially becomes TWO queries, one relying upon the other.
So, to get you an answer, the outer query is what you will get as the results, the WHERE query is pre-qualifying that one salary you are interested in.
Select
e.first_name,
e.last_name,
e.salary
from
employees e
where
e.salary > ( select max( e2.salary )
from employees e2
where e2.first_name = 'Alexander' )
Notice the where clause is getting whatever the MAX() salary value is from the employee table for the employee 'Alexander'. So now, that ONE value comes back as the basis for the outer query.
Notice the indentation, you can better see how the outer query is reliant on that. Also makes for more readable queries.
I am looking for clarification on this. I am writing two queries below:
We have a table of employee name with columns ID , name , salary
1. Select name from employee
where sum(salary) > 1000 ;
2. Select name from employee
where substring_index(name,' ',1) = 'nishant' ;
Query 1 doesn't work but Query 2 does work. From my development experience, I feel the possible explanation to this is:
The sum() works on a set of values specified in the argument. Here
'salary' column is passed , so it must add up all the values of this
column. But inside where clause, the records are checked one by one ,
like first record 1 is checked for the test and so on. Thus
sum(salary) will not be computed as it needs access to all the column
values and then only it will return a value.
Query 2 works as substring_index() works on a single value and hence here it works on the value supplied to it.
Can you please validate my understanding.
The reason you can't use SUM() in the WHERE clause is the order of evaluation of clauses.
FROM tells you where to read rows from. Right as rows are read from disk to memory, they are checked for the WHERE conditions. (Actually in many cases rows that fail the WHERE clause will not even be read from disk. "Conditions" are formally known as predicates and some predicates are used - by the query execution engine - to decide which rows are read from the base tables. These are called access predicates.) As you can see, the WHERE clause is applied to each row as it is presented to the engine.
On the other hand, aggregation is done only after all rows (that verify all the predicates) have been read.
Think about this: SUM() applies ONLY to the rows that satisfy the WHERE conditions. If you put SUM() in the WHERE clause, you are asking for circular logic. Does a new row pass the WHERE clause? How would I know? If it will pass, then I must include it in the SUM, but if not, it should not be included in the SUM. So how do I even evaluate the SUM condition?
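The evaluation order described above can be observed in a minimal sketch, assuming Python's built-in sqlite3 module and a few invented rows: the WHERE predicate removes rows first, and SUM() only ever sees the rows that survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "nishant kumar", 700), (2, "nishant rao", 400), (3, "asha", 900)],
)

# No WHERE clause: every row reaches the aggregate.
total_all = conn.execute("SELECT SUM(salary) FROM employee").fetchone()[0]

# WHERE is checked row by row first; the 400 row never reaches SUM().
total_filtered = conn.execute(
    "SELECT SUM(salary) FROM employee WHERE salary > 500"
).fetchone()[0]

print(total_all, total_filtered)
```

Because the sum depends on which rows pass WHERE, a WHERE condition that itself depended on the sum would have no well-defined starting point.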
Why can't we use aggregate function in where clause
Aggregate functions work on sets of data. A WHERE clause doesn't have access to the entire set, but only to the row that it is currently working on.
You can of course use HAVING clause:
select name from employee
group by name having sum(salary) > 1000;
If you must use WHERE, you can use a subquery:
select name from (
select name, sum(salary) total_salary from employee
group by name
) t where total_salary > 1000;
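The two forms above should return the same rows. A quick sketch to check that, assuming Python's built-in sqlite3 module and hypothetical data in which names repeat so that grouping has something to do:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "asha", 800), (2, "asha", 600), (3, "ravi", 900), (4, "ravi", 50)],
)

# Filter groups directly with HAVING.
with_having = conn.execute(
    "SELECT name FROM employee "
    "GROUP BY name HAVING SUM(salary) > 1000 ORDER BY name"
).fetchall()

# Aggregate in a derived table, then filter its rows with an ordinary WHERE.
with_subquery = conn.execute(
    "SELECT name FROM ("
    "  SELECT name, SUM(salary) AS total_salary FROM employee GROUP BY name"
    ") t WHERE total_salary > 1000 ORDER BY name"
).fetchall()

print(with_having, with_subquery)
```

In the derived-table form the outer WHERE is still a per-row test; it works because each row of t already holds a finished total.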
sum() is an aggregation function. In general, you would expect it to work with group by. Hence, your first query is missing a group by. In a group by query, having is used for filtering after the aggregation:
Select name
from employee
group by name
having sum(salary) > 1000 ;
Using having works because it is evaluated after the rows have been grouped, when the sum for each group is already known, while where fails because it is evaluated for each row individually, before any sum has been computed.
What is the difference between HAVING and WHERE in an SQL SELECT statement?
EDIT: I have marked Steven's answer as the correct one as it contained the key bit of information on the link:
When GROUP BY is not used, HAVING behaves like a WHERE clause
The situation I had seen the WHERE in did not have GROUP BY and is where my confusion started. Of course, until you know this you can't specify it in the question.
HAVING: is used to check conditions after the aggregation takes place.
WHERE: is used to check conditions before the aggregation takes place.
This code:
select City, CNT=Count(1)
From Address
Where State = 'MA'
Group By City
Gives you a table of all cities in MA and the number of addresses in each city.
This code:
select City, CNT=Count(1)
From Address
Where State = 'MA'
Group By City
Having Count(1)>5
Gives you a table of cities in MA with more than 5 addresses and the number of addresses in each city.
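Both queries can be tried side by side in a short sketch. It assumes Python's built-in sqlite3 module, so the T-SQL `CNT=Count(1)` alias is written as `COUNT(1) AS CNT`, and the address rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Address (City TEXT, State TEXT)")
# Hypothetical data: 6 Boston and 2 Salem addresses in MA, 7 Albany in NY.
rows = [("Boston", "MA")] * 6 + [("Salem", "MA")] * 2 + [("Albany", "NY")] * 7
conn.executemany("INSERT INTO Address VALUES (?, ?)", rows)

# WHERE drops the NY rows before grouping.
grouped = conn.execute(
    "SELECT City, COUNT(1) AS CNT FROM Address "
    "WHERE State = 'MA' GROUP BY City ORDER BY City"
).fetchall()

# HAVING then drops whole groups after the counts are known.
filtered = conn.execute(
    "SELECT City, COUNT(1) AS CNT FROM Address "
    "WHERE State = 'MA' GROUP BY City HAVING COUNT(1) > 5 ORDER BY City"
).fetchall()

print(grouped, filtered)
```

Albany has more than 5 addresses but never appears, because its rows were removed by WHERE before any counting happened.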
HAVING specifies a search condition for a
group or an aggregate function used in SELECT statement.
Source
Number one difference for me: if HAVING was removed from the SQL language then life would go on more or less as before. Certainly, a minority of queries would need to be rewritten using a derived table, CTE, etc. but they would arguably be easier to understand and maintain as a result. Maybe vendors' optimizer code would need to be rewritten to account for this, again an opportunity for improvement within the industry.
Now consider for a moment removing WHERE from the language. This time the majority of queries in existence would need to be rewritten without an obvious alternative construct. Coders would have to get creative e.g. inner join to a table known to contain exactly one row (e.g. DUAL in Oracle) using the ON clause to simulate the prior WHERE clause. Such constructions would be contrived; it would be obvious that something was missing from the language and the situation would be worse as a result.
TL;DR we could lose HAVING tomorrow and things would be no worse, possibly better, but the same cannot be said of WHERE.
From the answers here, it seems that many folk don't realize that a HAVING clause may be used without a GROUP BY clause. In this case, the HAVING clause is applied to the entire table expression, treated as a single group, and requires that only constants and aggregates appear in the SELECT clause. Typically the HAVING clause will involve aggregates.
This is more useful than it sounds. For example, consider this query to test whether the name column is unique for all values in T:
SELECT 1 AS result
FROM T
HAVING COUNT( DISTINCT name ) = COUNT( name );
There are only two possible results: if the HAVING clause is true then the result will be a single row containing the value 1, otherwise the result will be the empty set.
The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions.
Check out this w3schools link for more information
Syntax:
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value
A query such as this:
SELECT column_name, COUNT( column_name ) AS column_name_tally
FROM table_name
WHERE column_name < 3
GROUP
BY column_name
HAVING COUNT( column_name ) >= 3;
...may be rewritten using a derived table (and omitting the HAVING) like this:
SELECT column_name, column_name_tally
FROM (
SELECT column_name, COUNT(column_name) AS column_name_tally
FROM table_name
WHERE column_name < 3
GROUP
BY column_name
) pointless_range_variable_required_here
WHERE column_name_tally >= 3;
The difference between the two is in the relationship to the GROUP BY clause:
WHERE comes before GROUP BY; SQL evaluates the WHERE clause before it groups records.
HAVING comes after GROUP BY; SQL evaluates HAVING after it groups records.
References
SQLite SELECT Statement Syntax/Railroad Diagram
Informix SELECT Statement Syntax/Railroad Diagram
HAVING is used when you are aggregating, for example with GROUP BY.
SELECT edc_country, COUNT(*)
FROM Ed_Centers
GROUP BY edc_country
HAVING COUNT(*) > 1
ORDER BY edc_country;
WHERE is applied as a limitation on the set returned by SQL; it uses SQL's built-in set operations and indexes and therefore is the fastest way to filter result sets. Always use WHERE whenever possible.
HAVING is necessary for some aggregate filters. It filters the query AFTER sql has retrieved, assembled, and sorted the results. Therefore, it is much slower than WHERE and should be avoided except in those situations that require it.
SQL Server will let you get away with using HAVING even when WHERE would be much faster. Don't do it.
The WHERE clause does not work with aggregate functions.
That means you should not write a query like this
(bonus is the table name):
SELECT name
FROM bonus
GROUP BY name
WHERE sum(salary) > 200
Here, instead of the WHERE clause, you have to use HAVING.
Without a GROUP BY clause, the HAVING clause works just like a WHERE clause.
SELECT name
FROM bonus
GROUP BY name
HAVING sum(salary) > 200
Difference between the WHERE and HAVING clauses:
The main difference is that WHERE filters individual rows, while HAVING filters groups of rows after aggregation.
Why do we need the HAVING clause?
Aggregate functions are computed over a set of rows rather than a single row, so we cannot use them in the WHERE clause. Therefore, we use aggregate functions in the HAVING clause.
One way to think of it is that the having clause is an additional filter to the where clause.
A WHERE clause is used to filter records from a result. The filter occurs before any groupings are made. A HAVING clause is used to filter values from a group.
In an aggregate query (any query where an aggregate function is used), predicates in a WHERE clause are evaluated before the aggregated intermediate result set is generated.
Predicates in a HAVING clause are applied to the aggregate result set AFTER it has been generated. That's why predicate conditions on aggregate values must be placed in the HAVING clause, not in the WHERE clause, and why some databases (MySQL, for example) let you use aliases defined in the SELECT clause in a HAVING clause but not in a WHERE clause.
I had a problem and found out another difference between WHERE and HAVING. It does not act the same way on indexed columns.
WHERE my_indexed_row = 123 will show rows and automatically perform a "ORDER ASC" on other indexed rows.
HAVING my_indexed_row = 123 shows everything from the oldest "inserted" row to the newest one, no ordering.
When GROUP BY is not used, the WHERE and HAVING clauses are essentially equivalent.
However, when GROUP BY is used:
The WHERE clause is used to filter records from a result. The
filtering occurs before any groupings are made.
The HAVING clause is used to filter values from a group (i.e., to
check conditions after aggregation into groups has been performed).
Resource from Here
From here.
the SQL standard requires that HAVING
must reference only columns in the
GROUP BY clause or columns used in
aggregate functions
as opposed to the WHERE clause which is applied to database rows
While working on a project, this was also my question. As stated above, HAVING checks the condition on the query result that has already been computed, whereas WHERE checks the condition on each row while the query runs.
Let me give an example to illustrate this. Suppose you have a database table like this.
usertable{ int userid, date datefield, int dailyincome }
Suppose, the following rows are in table:
1, 2011-05-20, 100
1, 2011-05-21, 50
1, 2011-05-30, 10
2, 2011-05-30, 10
2, 2011-05-20, 20
Now, we want to get the userid and sum(dailyincome) for each user whose sum(dailyincome)>100
If we write:
SELECT userid, sum(dailyincome) FROM usertable WHERE
sum(dailyincome)>100 GROUP BY userid
This will be an error. The correct query would be:
SELECT userid, sum(dailyincome) FROM usertable GROUP BY userid HAVING
sum(dailyincome)>100
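To try both queries against the rows listed above, here is a minimal sketch using Python's built-in sqlite3 module. The column types are an assumption, and the error message itself varies by database, so only the failure is recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usertable (userid INTEGER, datefield TEXT, dailyincome INTEGER)"
)
conn.executemany(
    "INSERT INTO usertable VALUES (?, ?, ?)",
    [
        (1, "2011-05-20", 100),
        (1, "2011-05-21", 50),
        (1, "2011-05-30", 10),
        (2, "2011-05-30", 10),
        (2, "2011-05-20", 20),
    ],
)

# The WHERE version is rejected: no per-user sum exists yet at that stage.
try:
    conn.execute(
        "SELECT userid, SUM(dailyincome) FROM usertable "
        "WHERE SUM(dailyincome) > 100 GROUP BY userid"
    ).fetchall()
    where_version_failed = False
except sqlite3.OperationalError:
    where_version_failed = True

# The HAVING version filters the per-user totals after grouping.
having_rows = conn.execute(
    "SELECT userid, SUM(dailyincome) FROM usertable "
    "GROUP BY userid HAVING SUM(dailyincome) > 100"
).fetchall()

print(where_version_failed, having_rows)
```

User 1 totals 160 and is kept; user 2 totals 30 and is filtered out.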
WHERE clause is used for comparing values in the base table, whereas the HAVING clause can be used for filtering the results of aggregate functions in the result set of the query
I use HAVING for constraining a query based on the results of an aggregate function, e.g. select * from blahblahblah group by SOMETHING having count(SOMETHING)>0