Is it possible to have a WHERE clause after a HAVING clause? - sql

Is it possible to use a WHERE clause after a HAVING clause?
The first thing that comes to my mind is sub queries, but I'm not sure.
P.S. If the answer is affirmative, could you give some examples?

No, not in the same query.
The where clause goes before the having and the group by. If you want to filter out records before the grouping the condition goes in the where clause, and if you want to filter out grouped records the condition goes in the having clause:
select ...
from ...
where ...
group by ...
having ...
If neither of those is possible for some odd reason, you have to make the query a subquery so that you can put the where clause in the outer query:
select ...
from (
select ...
from ...
where ...
group by ...
having ...
) x
where ...
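A minimal sketch of the subquery pattern above, run through Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are hypothetical, chosen only so that each filter visibly acts at its own stage.

```python
import sqlite3

# Hypothetical sample data, not from the original question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('ann', 50), ('ann', 70),
        ('bob', 200), ('bob', -20),
        ('cid', 10),
        ('dee', 90), ('dee', 40);
""")

rows = conn.execute("""
    SELECT customer, total
    FROM (
        SELECT customer, SUM(amount) AS total
        FROM orders
        WHERE amount > 0          -- row filter, before grouping
        GROUP BY customer
        HAVING SUM(amount) > 100  -- group filter, after grouping
    ) x
    WHERE total < 150             -- outer filter, applied to the grouped result
    ORDER BY customer
""").fetchall()

print(rows)  # [('ann', 120), ('dee', 130)]
```

The outer where sees only the rows the inner query produced, so it can refer to the total alias like an ordinary column.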

A HAVING clause is just a WHERE clause after a GROUP BY. Why not put your WHERE conditions in the HAVING clause?

If it's a trick question: it is possible if the WHERE and the HAVING are not at the same level, as you mentioned, by using a subquery.
I guess something like this would work:
HAVING value = (SELECT max(value) FROM foo WHERE crit = 123)
P.S.: Why were you asking?
Do you have a specific problem?
P.P.S.: OK, silly me, I missed the "interview*" tag...

From SELECT help:
Processing Order of WHERE, GROUP BY, and HAVING Clauses
The following steps show the processing order for a SELECT statement with a WHERE clause, a GROUP BY clause, and a HAVING clause:
1. The FROM clause returns an initial result set.
2. The WHERE clause excludes rows not meeting its search condition.
3. The GROUP BY clause collects the selected rows into one group for each unique value in the GROUP BY clause.
4. Aggregate functions specified in the select list calculate summary values for each group.
5. The HAVING clause additionally excludes rows not meeting its search condition.
So no, you cannot.
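The processing order quoted above can be observed directly. The following is only a sketch using Python's built-in sqlite3 module; the emp table and its rows are made up for illustration. The same threshold gives different answers depending on whether it is applied before grouping (WHERE) or after (HAVING).

```python
import sqlite3

# Hypothetical sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (dept TEXT, salary INTEGER);
    INSERT INTO emp VALUES
        ('a', 500), ('a', 1500), ('b', 2000), ('b', 3000), ('c', 400);
""")

# WHERE runs before grouping: low salaries are dropped, then averages are computed.
where_first = conn.execute("""
    SELECT dept, AVG(salary) FROM emp
    WHERE salary > 1000
    GROUP BY dept
    ORDER BY dept
""").fetchall()

# HAVING runs after grouping: averages use every row, then whole groups are dropped.
having_last = conn.execute("""
    SELECT dept, AVG(salary) FROM emp
    GROUP BY dept
    HAVING AVG(salary) > 1000
    ORDER BY dept
""").fetchall()

print(where_first)  # [('a', 1500.0), ('b', 2500.0)]
print(having_last)  # [('b', 2500.0)]
```

Department a survives the WHERE version because its one high salary remains, but it is dropped by the HAVING version because its average over all rows is exactly 1000.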

Within the same scope, the answer is no. If subqueries are allowed, then you can avoid using HAVING entirely.
I think HAVING is an anachronism. Hugh Darwen refers to HAVING as "The Folly of Structured Queries":
In old SQL, the WHERE clause could not be used on results of aggregation, so they had to invent HAVING (with same meaning as WHERE):
SELECT D#, AVG(Salary) AS Avg_Sal
FROM Emp
GROUP
BY D#
HAVING AVG(Salary) > 999;
But would we ever have had HAVING if in 1979 one could write:
SELECT *
FROM (
SELECT D#, AVG(Sal) AS Avg_Sal
FROM Emp
GROUP
BY D#
)
AS dummy
WHERE Avg_Sal > 999;
I strongly suspect the answer to Darwen's question is no.
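To check that the two forms Darwen contrasts really return the same rows, here is a small sketch with Python's built-in sqlite3 module. The column name dept stands in for D# (which would need quoting as an identifier), and the sample rows are invented.

```python
import sqlite3

# Hypothetical data; "dept" replaces the D# column of the quoted example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Emp (dept TEXT, Salary INTEGER);
    INSERT INTO Emp VALUES ('d1', 900), ('d1', 1200), ('d2', 800), ('d3', 2000);
""")

# Filtering the aggregate with HAVING.
with_having = conn.execute("""
    SELECT dept, AVG(Salary) AS Avg_Sal
    FROM Emp
    GROUP BY dept
    HAVING AVG(Salary) > 999
    ORDER BY dept
""").fetchall()

# The same filter as a plain WHERE over a derived table.
with_derived_table = conn.execute("""
    SELECT *
    FROM (
        SELECT dept, AVG(Salary) AS Avg_Sal
        FROM Emp
        GROUP BY dept
    ) AS dummy
    WHERE Avg_Sal > 999
    ORDER BY dept
""").fetchall()

print(with_having == with_derived_table)  # True
```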

Related

Still confusing the rules around selecting columns, group by, and joins

I am still confused by the syntax rules of using GROUP BY. I understand we use GROUP BY when there is some aggregate function. If I have even one aggregate function in a SQL statement, do I need to put all of my selected columns into my GROUP BY statement? I don't have a specific query to ask about but when I try to do joins, I get errors. In particular, when I use a count(*) in a statement and/or a join, I just seem to mess it up.
I use BigQuery at my job. I am regularly floored by strange gaps in knowledge.
Thank you!
This is a little complicated.
First, no aggregation functions are needed in an aggregation query. So this is allowed:
select a
from t
group by a;
This is equivalent, by the way, to:
select distinct a
from t;
If there are aggregation functions, then no group by is needed. So, this is allowed:
select max(a)
from t;
Such an aggregation query -- with no group by -- always returns one row. This is true even if the table is empty or a where clause filters out all the rows. In that case, most aggregation functions return NULL, with the notable exception of count() that returns 0.
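Both claims so far can be checked with a short sketch using Python's built-in sqlite3 module. This runs on SQLite rather than BigQuery, and the tables t and empty_t with their contents are assumptions made for the example.

```python
import sqlite3

# Hypothetical tables; SQLite is used here, not BigQuery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER);
    INSERT INTO t VALUES (1), (1), (2), (3), (3);
    CREATE TABLE empty_t (a INTEGER);
""")

# GROUP BY with no aggregate function behaves like SELECT DISTINCT.
grouped = conn.execute("SELECT a FROM t GROUP BY a ORDER BY a").fetchall()
distinct = conn.execute("SELECT DISTINCT a FROM t ORDER BY a").fetchall()

# Aggregates with no GROUP BY return exactly one row, even with no input rows.
empty_result = conn.execute("SELECT MAX(a), COUNT(*) FROM empty_t").fetchall()
filtered_out = conn.execute("SELECT MAX(a), COUNT(*) FROM t WHERE a > 100").fetchall()

print(grouped == distinct)  # True
print(empty_result)         # [(None, 0)]
print(filtered_out)         # [(None, 0)]
```

MAX comes back as NULL (None in Python) while COUNT comes back as 0, matching the exception noted above.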
Next, if you mix aggregation functions and non-aggregation expressions in the select, then in general you want the non-aggregation, non-constant expressions in the group by. I should note that you can do:
select a, concat(a, 'bcd'), count(*)
from t
group by a;
This should work, but sometimes BigQuery gets confused and will want the expression in the group by.
Finally, the SQL standard supports a query like this:
select t.*, count(*)
from t join
u
using (foo)
group by t.a;
This works when a is the primary key (or equivalent) in t. However, BigQuery does not have primary keys, so this is not relevant to that database.

SQL GROUP BY usages

I am doing the SQL transformation lesson from Codecademy. I am not sure why they are using those numbers after the GROUP BY clause and what those numbers are doing. Can anyone who has passed the course be so kind as to let me know?
SELECT dep_month,
dep_day_of_week,
dep_date,
COUNT(*) AS flight_count
FROM flights
GROUP BY 1,2,3
The numbers in the GROUP BY clause simply refer to the columns in the SELECT list, from left to right. Hence, your query is identical to the following:
SELECT
dep_month,
dep_day_of_week,
dep_date,
COUNT(*) AS flight_count
FROM flights
GROUP BY
dep_month,
dep_day_of_week,
dep_date
The above query which I wrote is what I would use in practice. The reason for this is that GROUP BY 1,2,3 refers to positions rather than columns. If someone refactors the SELECT later, he runs the risk of breaking your query.
Obviously these are position numbers. So this is a GROUP BY on the first three columns:
GROUP BY 1,2,3
means
GROUP BY dep_month, dep_day_of_week, dep_date
here.
This is not compliant with the SQL standard, because the GROUP BY clause is supposed to be executed before the SELECT clause, so the positions cannot be known. They are only known in the ORDER BY clause, because that occurs after the SELECT clause. Only a few DBMSs make an exception and allow this positional notation in GROUP BY. Hence it is bad to show this in a tutorial.
It's basically group by column 1, column 2 and column 3 from your select query.
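As a quick check, the positional and named forms can be compared with Python's built-in sqlite3 module. SQLite happens to be one of the systems that accepts positions in GROUP BY; the flights rows below are made up for the example.

```python
import sqlite3

# Hypothetical rows for the flights table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flights (dep_month TEXT, dep_day_of_week TEXT, dep_date TEXT);
    INSERT INTO flights VALUES
        ('2000-01', 'Mon', '2000-01-03'),
        ('2000-01', 'Mon', '2000-01-03'),
        ('2000-01', 'Tue', '2000-01-04');
""")

select_list = (
    "SELECT dep_month, dep_day_of_week, dep_date, COUNT(*) AS flight_count "
    "FROM flights "
)

# Positions 1, 2, 3 refer to the first three items of the select list.
by_position = conn.execute(
    select_list + "GROUP BY 1, 2, 3 ORDER BY dep_date"
).fetchall()

# The same grouping spelled out with column names.
by_name = conn.execute(
    select_list + "GROUP BY dep_month, dep_day_of_week, dep_date ORDER BY dep_date"
).fetchall()

print(by_position == by_name)  # True
```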

Why are aggregate functions not allowed in where clause

I am looking for clarification on this. I am writing two queries below:
We have a table employee with columns ID, name, salary
1. Select name from employee
where sum(salary) > 1000 ;
2. Select name from employee
where substring_index(name,' ',1) = 'nishant' ;
Query 1 doesn't work but Query 2 does work. From my development experience, I feel the possible explanation to this is:
The sum() works on a set of values specified in the argument. Here the 'salary' column is passed, so it must add up all the values of this column. But inside the where clause, the records are checked one by one: first record 1 is checked against the test, and so on. Thus sum(salary) cannot be computed, as it needs access to all the column values and only then will it return a value.
Query 2 works as substring_index() works on a single value and hence here it works on the value supplied to it.
Can you please validate my understanding.
The reason you can't use SUM() in the WHERE clause is the order of evaluation of clauses.
FROM tells you where to read rows from. Right as rows are read from disk to memory, they are checked for the WHERE conditions. (Actually in many cases rows that fail the WHERE clause will not even be read from disk. "Conditions" are formally known as predicates and some predicates are used - by the query execution engine - to decide which rows are read from the base tables. These are called access predicates.) As you can see, the WHERE clause is applied to each row as it is presented to the engine.
On the other hand, aggregation is done only after all rows (that verify all the predicates) have been read.
Think about this: SUM() applies ONLY to the rows that satisfy the WHERE conditions. If you put SUM() in the WHERE clause, you are asking for circular logic. Does a new row pass the WHERE clause? How would I know? If it will pass, then I must include it in the SUM, but if not, it should not be included in the SUM. So how do I even evaluate the SUM condition?
Why can't we use aggregate function in where clause
Aggregate functions work on sets of data. A WHERE clause doesn't have access to entire set, but only to the row that it is currently working on.
You can of course use HAVING clause:
select name from employee
group by name having sum(salary) > 1000;
If you must use WHERE, you can use a subquery:
select name from (
select name, sum(salary) total_salary from employee
group by name
) t where total_salary > 1000;
sum() is an aggregation function. In general, you would expect it to work with group by. Hence, your first query is missing a group by. In a group by query, having is used for filtering after the aggregation:
Select name
from employee
group by name
having sum(salary) > 1000 ;
Using having works because it is evaluated after the rows have been grouped, when sum(salary) is already known for each group. Using where fails because it is checked row by row, before any aggregation has taken place.
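The behaviour discussed in this thread can be reproduced with Python's built-in sqlite3 module. This is a sketch on SQLite, so the exact error text will differ from MySQL; the employee rows are invented for the example.

```python
import sqlite3

# Hypothetical rows for the employee table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER);
    INSERT INTO employee VALUES
        (1, 'nishant', 600), (2, 'nishant', 700), (3, 'maria', 300);
""")

# An aggregate in WHERE is rejected before any row is read.
try:
    conn.execute("SELECT name FROM employee WHERE SUM(salary) > 1000")
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# HAVING filters after grouping, when each group's sum is known.
having_rows = conn.execute("""
    SELECT name FROM employee
    GROUP BY name
    HAVING SUM(salary) > 1000
""").fetchall()

# The same filter written as WHERE over a derived table.
subquery_rows = conn.execute("""
    SELECT name FROM (
        SELECT name, SUM(salary) AS total_salary FROM employee
        GROUP BY name
    ) t WHERE total_salary > 1000
""").fetchall()

print(where_error is not None)       # True
print(having_rows == subquery_rows)  # True
```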

Difference between "HAVING ... GROUP BY" and "GROUP BY ... HAVING"

I have got the table MYTABLE with 2 columns: A and B
I have got the following pieces of the code:
SELECT MYTABLE.A FROM MYTABLE
HAVING SUM(MYTABLE.B) > 100
GROUP BY MYTABLE.A
and
SELECT MYTABLE.A FROM MYTABLE
GROUP BY MYTABLE.A
HAVING SUM(MYTABLE.B) > 100
Are they the same? Is it possible that these two queries will return different sets of results?
Thank you in advance
As documented, there is no difference. People are just used to seeing HAVING after GROUP BY.
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_10002.htm#SQLRF20040
Specify GROUP BY and HAVING after the where_clause and hierarchical_query_clause. If you specify both GROUP BY and HAVING, then they can appear in either order.
http://sqlfiddle.com/#!4/66e33/1
I originally wrote:
I am not sure your 1st query is valid. As far as I know, HAVING should always come after GROUP BY.
I was corrected by David Aldridge, the Oracle docs state that the order does not matter. Although I don't recommend using HAVING before GROUP for readability reasons (and to prevent confusion with a WHERE clause), it is technically correct. So that makes the answer to your question 'yes, it's the same'.
You can't have a HAVING before a GROUP BY, the HAVING is like the "WHERE" but for the GROUP BY condition.
The clauses are evaluated in order. You can have a HAVING clause following immediately the FROM clause. In this case, the HAVING clause will apply to the entire rows of the result set. The select list may only contain, in this case, one/more/all of the aggregation functions contained in the HAVING clause.
So, your first query is not valid because of the above. A valid query would be
SELECT SUM(MYTABLE.B) AS s FROM MYTABLE
HAVING SUM(MYTABLE.B) > 100
The above query will return one or no row, depending on whether the condition SUM(MYTABLE.B) > 100 is verified or not.
Still, there is one more reason for which your first query is not valid. The GROUP BY clause may refer only to columns in the data set to which it applies. So going on with my valid query above, you can write the following valid query (though it will be useless and nonsense, as it is applied to either one or no rows):
SELECT SUM(s)
FROM
(
SELECT SUM(MYTABLE.B) s
FROM MYTABLE
HAVING SUM(MYTABLE.B) > 100
) q
GROUP BY s
So, just to answer: no, they're not the same. One of them is not even valid.
Both WHERE and HAVING allow conditions to be imposed in the query. The difference:
We use WHERE to filter the records that the select reads from the table.
We use HAVING to filter the groups produced by a group by query.

What is the difference between HAVING and WHERE in SQL?

What is the difference between HAVING and WHERE in an SQL SELECT statement?
EDIT: I have marked Steven's answer as the correct one as it contained the key bit of information on the link:
When GROUP BY is not used, HAVING behaves like a WHERE clause
The situation I had seen the WHERE in did not have GROUP BY and is where my confusion started. Of course, until you know this you can't specify it in the question.
HAVING: is used to check conditions after the aggregation takes place.
WHERE: is used to check conditions before the aggregation takes place.
This code:
select City, CNT=Count(1)
From Address
Where State = 'MA'
Group By City
Gives you a table of all cities in MA and the number of addresses in each city.
This code:
select City, CNT=Count(1)
From Address
Where State = 'MA'
Group By City
Having Count(1)>5
Gives you a table of cities in MA with more than 5 addresses and the number of addresses in each city.
HAVING specifies a search condition for a
group or an aggregate function used in SELECT statement.
Number one difference for me: if HAVING was removed from the SQL language then life would go on more or less as before. Certainly, a minority of queries would need to be rewritten using a derived table, CTE, etc., but they would arguably be easier to understand and maintain as a result. Maybe vendors' optimizer code would need to be rewritten to account for this, again an opportunity for improvement within the industry.
Now consider for a moment removing WHERE from the language. This time the majority of queries in existence would need to be rewritten without an obvious alternative construct. Coders would have to get creative, e.g. inner join to a table known to contain exactly one row (e.g. DUAL in Oracle) using the ON clause to simulate the prior WHERE clause. Such constructions would be contrived; it would be obvious that something was missing from the language and the situation would be worse as a result.
TL;DR we could lose HAVING tomorrow and things would be no worse, possibly better, but the same cannot be said of WHERE.
From the answers here, it seems that many folk don't realize that a HAVING clause may be used without a GROUP BY clause. In this case, the HAVING clause is applied to the entire table expression and requires that only constants appear in the SELECT clause. Typically the HAVING clause will involve aggregates.
This is more useful than it sounds. For example, consider this query to test whether the name column is unique for all values in T:
SELECT 1 AS result
FROM T
HAVING COUNT( DISTINCT name ) = COUNT( name );
There are only two possible results: if the HAVING clause is true then the result will be a single row containing the value 1, otherwise the result will be the empty set.
The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions.
Check out this w3schools link for more information
Syntax:
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value
A query such as this:
SELECT column_name, COUNT( column_name ) AS column_name_tally
FROM table_name
WHERE column_name < 3
GROUP
BY column_name
HAVING COUNT( column_name ) >= 3;
...may be rewritten using a derived table (and omitting the HAVING) like this:
SELECT column_name, column_name_tally
FROM (
SELECT column_name, COUNT(column_name) AS column_name_tally
FROM table_name
WHERE column_name < 3
GROUP
BY column_name
) pointless_range_variable_required_here
WHERE column_name_tally >= 3;
The difference between the two is in the relationship to the GROUP BY clause:
WHERE comes before GROUP BY; SQL evaluates the WHERE clause before it groups records.
HAVING comes after GROUP BY; SQL evaluates HAVING after it groups records.
HAVING is used when you are aggregating, for example with GROUP BY.
SELECT edc_country, COUNT(*)
FROM Ed_Centers
GROUP BY edc_country
HAVING COUNT(*) > 1
ORDER BY edc_country;
WHERE is applied as a limitation on the set returned by SQL; it uses SQL's built-in set operations and indexes and therefore is the fastest way to filter result sets. Always use WHERE whenever possible.
HAVING is necessary for some aggregate filters. It filters the query AFTER sql has retrieved, assembled, and sorted the results. Therefore, it is much slower than WHERE and should be avoided except in those situations that require it.
SQL Server will let you get away with using HAVING even when WHERE would be much faster. Don't do it.
The WHERE clause does not work with aggregate functions, which means you should not write a query like this (bonus is the table name):
SELECT name
FROM bonus
GROUP BY name
WHERE sum(salary) > 200
Here, instead of the WHERE clause, you have to use HAVING.
Without a GROUP BY clause, the HAVING clause just works like a WHERE clause.
SELECT name
FROM bonus
GROUP BY name
HAVING sum(salary) > 200
Difference between the WHERE and HAVING clauses:
The main difference between the WHERE and HAVING clauses is that WHERE filters individual rows, while HAVING filters groups of rows.
Why do we need the HAVING clause?
Aggregate functions operate on a set of rows rather than on a single row, so we cannot use them in the WHERE clause. Therefore, we use aggregate functions in the HAVING clause.
One way to think of it is that the having clause is an additional filter to the where clause.
A WHERE clause is used to filter records from a result. The filter occurs before any groupings are made. A HAVING clause is used to filter values from a group.
In an aggregate query (any query where an aggregate function is used), predicates in a WHERE clause are evaluated before the aggregated intermediate result set is generated.
Predicates in a HAVING clause are applied to the aggregate result set after it has been generated. That's why predicate conditions on aggregate values must be placed in the HAVING clause, not in the WHERE clause, and why some DBMSs (MySQL, for example) let you use aliases defined in the SELECT clause in a HAVING clause, but not in a WHERE clause.
I had a problem and found out another difference between WHERE and HAVING. It does not act the same way on indexed columns.
WHERE my_indexed_row = 123 will show rows and automatically perform a "ORDER ASC" on other indexed rows.
HAVING my_indexed_row = 123 shows everything from the oldest "inserted" row to the newest one, no ordering.
When GROUP BY is not used, the WHERE and HAVING clauses are essentially equivalent.
However, when GROUP BY is used:
The WHERE clause is used to filter records from a result. The
filtering occurs before any groupings are made.
The HAVING clause is used to filter values from a group (i.e., to
check conditions after aggregation into groups has been performed).
From here.
the SQL standard requires that HAVING
must reference only columns in the
GROUP BY clause or columns used in
aggregate functions
as opposed to the WHERE clause which is applied to database rows
While working on a project, this was also my question. As stated above, the HAVING checks the condition on the query result already found. But WHERE is for checking condition while query runs.
Let me give an example to illustrate this. Suppose you have a database table like this.
usertable{ int userid, date datefield, int dailyincome }
Suppose, the following rows are in table:
1, 2011-05-20, 100
1, 2011-05-21, 50
1, 2011-05-30, 10
2, 2011-05-30, 10
2, 2011-05-20, 20
Now, we want to get each userid and its sum(dailyincome) where sum(dailyincome) > 100.
If we write:
SELECT userid, sum(dailyincome) FROM usertable WHERE
sum(dailyincome)>100 GROUP BY userid
This will be an error. The correct query would be:
SELECT userid, sum(dailyincome) FROM usertable GROUP BY userid HAVING
sum(dailyincome)>100
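Running the corrected query against the five rows listed above gives a concrete result. This sketch uses Python's built-in sqlite3 module; storing datefield as TEXT is an assumption made to fit SQLite.

```python
import sqlite3

# The rows from the example above; datefield is stored as TEXT for SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usertable (userid INTEGER, datefield TEXT, dailyincome INTEGER);
    INSERT INTO usertable VALUES
        (1, '2011-05-20', 100), (1, '2011-05-21', 50), (1, '2011-05-30', 10),
        (2, '2011-05-30', 10),  (2, '2011-05-20', 20);
""")

# Totals for every user, before any group filter.
totals = conn.execute("""
    SELECT userid, SUM(dailyincome) FROM usertable
    GROUP BY userid
    ORDER BY userid
""").fetchall()

# Only the users whose total exceeds 100.
over_100 = conn.execute("""
    SELECT userid, SUM(dailyincome) FROM usertable
    GROUP BY userid
    HAVING SUM(dailyincome) > 100
""").fetchall()

print(totals)    # [(1, 160), (2, 30)]
print(over_100)  # [(1, 160)]
```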
WHERE clause is used for comparing values in the base table, whereas the HAVING clause can be used for filtering the results of aggregate functions in the result set of the query
I use HAVING for constraining a query based on the results of an aggregate function, e.g. select SOMETHING, count(SOMETHING) from blahblahblah group by SOMETHING having count(SOMETHING) > 0