Difference between "HAVING ... GROUP BY" and "GROUP BY ... HAVING"

I have got the table MYTABLE with 2 columns: A and B
I have got the following pieces of the code:
SELECT MYTABLE.A FROM MYTABLE
HAVING SUM(MYTABLE.B) > 100
GROUP BY MYTABLE.A
and
SELECT MYTABLE.A FROM MYTABLE
GROUP BY MYTABLE.A
HAVING SUM(MYTABLE.B) > 100
Are they the same? Is it possible that these 2 queries will return different sets of results?
Thank you in advance

As documented, there is no difference. People are just used to seeing HAVING after GROUP BY.
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_10002.htm#SQLRF20040
Specify GROUP BY and HAVING after the where_clause and hierarchical_query_clause. If you specify both GROUP BY and HAVING, then they can appear in either order.
http://sqlfiddle.com/#!4/66e33/1

I originally wrote:
I am not sure your 1st query is valid. As far as I know, HAVING should always come after GROUP BY.
I was corrected by David Aldridge: the Oracle docs state that the order does not matter. Although I don't recommend using HAVING before GROUP BY for readability reasons (and to prevent confusion with a WHERE clause), it is technically correct. So that makes the answer to your question 'yes, it's the same'.

You can't have a HAVING before a GROUP BY, the HAVING is like the "WHERE" but for the GROUP BY condition.

The clauses are evaluated in order. You can have a HAVING clause immediately following the FROM clause. In this case, the HAVING clause applies to all rows of the result set as a single group. The select list may then contain only one, several, or all of the aggregate functions used in the HAVING clause.
So, your first query is not valid because of the above. A valid query would be
SELECT SUM(MYTABLE.B) AS s FROM MYTABLE
HAVING SUM(MYTABLE.B) > 100
The above query will return one or no row, depending on whether the condition SUM(MYTABLE.B) > 100 is verified or not.
Still, there is one more reason for which your first query is not valid. The GROUP BY clause may refer only to columns in the data set to which it applies. So going on with my valid query above, you can write the following valid query (though it will be useless and nonsense, as it is applied to either one or no rows):
SELECT SUM(s)
FROM
(
SELECT SUM(MYTABLE.B) s
FROM MYTABLE
HAVING SUM(MYTABLE.B) > 100
) q
GROUP BY s
So, just to answer: no, they're not the same. One of them is not even valid.

Both WHERE and HAVING allow you to impose conditions in the query. The difference:
We use WHERE to filter the records that the select reads from the table,
We use HAVING to filter the groups produced by GROUP BY in the select query.
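To make the row-versus-group distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are invented for illustration, and SQLite is an assumption (it only accepts the usual GROUP BY ... HAVING order, so this does not demonstrate Oracle's flexible clause order).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MYTABLE (A TEXT, B INTEGER)")
# Hypothetical sample data, not taken from the question.
conn.executemany(
    "INSERT INTO MYTABLE (A, B) VALUES (?, ?)",
    [("x", 30), ("x", 40), ("y", 150), ("z", 80), ("z", 50)],
)

# WHERE filters individual rows before grouping:
# only rows with B > 50 take part in the sums.
where_rows = conn.execute(
    "SELECT A, SUM(B) FROM MYTABLE WHERE B > 50 GROUP BY A ORDER BY A"
).fetchall()

# HAVING filters whole groups after aggregation:
# every row is summed, then groups with a total <= 100 are dropped.
having_rows = conn.execute(
    "SELECT A, SUM(B) FROM MYTABLE GROUP BY A HAVING SUM(B) > 100 ORDER BY A"
).fetchall()

print(where_rows)
print(having_rows)
```

Group z shows the difference: WHERE keeps only its 80 row, while HAVING keeps the group with its full total of 130.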

Related

In SQL, does groupby on an ordered query behave the same as doing both in the same query?

Are the following queries identical, or might I get different results (in any major DB system, e.g. MSSQL, MySQL, Postgres, SQLite):
Doing both in the same query:
SELECT group, some_agg_func(some_value)
FROM my_table
GROUP BY group
ORDER BY some_other_value
vs. ordering in a subquery:
SELECT group, some_agg_func(some_value)
FROM (
SELECT group, some_value
FROM my_table
ORDER BY some_other_value
) as alias
GROUP BY group
Looking at the first sample:
SELECT group, some_agg_func(some_value)
FROM my_table
GROUP BY group
ORDER BY some_other_value
Let's think about what GROUP BY does by looking at this imaginary sample data:
A B
- -
1 1
1 2
Then think about this query:
SELECT A
FROM SampleData
GROUP BY A
ORDER BY B
The GROUP BY clause puts the two rows into a single group. Then we want to order by B... but the two rows in the group have different values for B. Which should it use?
Obviously in this situation it doesn't really matter: there's only one row in the results, so the order is not relevant. But generally, how does the database know what to do?
The database could guess which one you want, or just take the first value, or the last (whatever those mean in a setting where the data is unordered by definition). And in fact this is what MySQL will try to do for you: it will try to guess at your meaning. But this response is really inappropriate. You specified an inexact query; the only correct thing to do is throw an error, which is what most databases will do.
Now let's look at the second sample:
SELECT group, some_agg_func(some_value)
FROM (
SELECT group, some_value
FROM my_table
ORDER BY some_other_value
) as alias
GROUP BY group
Here it is important to remember databases have their roots in relational set theory, and what we think of as "tables" are more formally described as Unordered Relations. Again: the idea of being "unordered" is baked into the very nature of a table at the deepest level.
In this case the inner query can run and create results in the specified order, and then the outer query can use that with GROUP BY to create a new set... but just like tables, query results are unordered relations. Without an ORDER BY clause the final result is also unordered by definition.
Now you might tend to get results in the order you want, but the reality is all bets are off. In fact, the databases that run this query will tend to give you results in the order in which they first encountered each group, which will not tend to match the ORDER BY because the GROUP BY expression is looking at completely different columns. Other databases (SQL Server is in this group) will not even allow the query to run, though I might prefer a warning here.
So now we come to the final section, where we must re-think the question, like this:
How can I use GROUP BY on the one group column, while also ordering by some_other_column not in the group?
The answer is each group can contain multiple rows, and so you must tell the database which row to look at to get the correct (specific) some_other_column value. The typical way to do this is with another aggregate function, which might look like this:
SELECT group, some_agg_func(some_value)
FROM my_table
GROUP BY group
ORDER BY some_other_agg_func(some_other_column)
That code will run without error on pretty much any database.
Just be careful here. On one hand, when people want to do this it's often for the common case where they know every record for some_other_column in each group will have the same value. For example, you might GROUP BY UserID, but ORDER BY Email, where of course every record with the same UserID should have the same Email address. As humans, we have the ability to make that kind of inference. Computers, however, don't handle that kind of thinking as well, and so we help them out with an extra aggregate function like MIN() or MAX().
On the other hand, if you're not careful sometimes the two different aggregate functions don't match up, and you end up showing the value from one row in the group, while using a completely different row from the group for the ORDER BY expression in a way that is not good.
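As a rough illustration of ordering grouped results by an aggregate of a non-grouped column, the sketch below runs such a query in SQLite through Python's sqlite3 module. The column is named grp rather than group because GROUP is a reserved word, SUM and MIN stand in for some_agg_func and some_other_agg_func, and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (grp TEXT, some_value INTEGER, some_other_value INTEGER)"
)
# Hypothetical rows: group 'a' has two different some_other_value entries.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [("a", 1, 30), ("a", 2, 10), ("b", 5, 20), ("c", 7, 5)],
)

# MIN() tells the database which value of some_other_value
# represents each group when sorting.
rows = conn.execute(
    """
    SELECT grp, SUM(some_value)
    FROM my_table
    GROUP BY grp
    ORDER BY MIN(some_other_value)
    """
).fetchall()

print(rows)
```

The per-group minimums are 10 for a, 20 for b and 5 for c, so the groups come back in the order c, a, b.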
Tables are unordered sets of data. A query result is a table. So if you select from a subquery that contains an ORDER BY clause, that clause means nothing; the data set is unordered by definition. The DBMS is free to ignore the ORDER BY clause. Some DBMS may even issue a warning or error, but I suppose it's more common that the ORDER BY clause just has no effect - at least not guaranteed.
In this query
SELECT group, some_agg_func(some_value)
FROM my_table
GROUP BY group
ORDER BY some_other_value
you try to order your results by some_other_value. If this is meant to be a column, you can't, because that other column is not part of your results. You'll get a syntax error. If some_other_value is a fixed value, then there is nothing ordered, because you'd have the same sort key for every row. But it can be an expression based on your result data (group key and aggregation results) and you can order your result rows by that.
In this query
SELECT group, some_agg_func(some_value)
FROM (
SELECT group, some_value
FROM my_table
ORDER BY some_other_value
) as alias
GROUP BY group
the ORDER BY clause has no effect. You could just as well just select FROM my_table directly:
SELECT group, some_agg_func(some_value)
FROM my_table as alias
GROUP BY group
This gets the results unordered (or at least the order you see is not guaranteed to be thus every time you run that query), because your query doesn't have an ORDER BY clause.

"group by" needed in count(*) SQL statement?

The following statement works in my database:
select column_a, count(*) from my_schema.my_table group by 1;
but this one doesn't:
select column_a, count(*) from my_schema.my_table;
I get the error:
ERROR: column "my_table.column_a" must appear in the GROUP BY clause
or be used in an aggregate function
Helpful note: This thread: What does SQL clause "GROUP BY 1" mean? discusses the meaning of "group by 1".
Update:
The reason why I am confused is because I have often seen count(*) as follows:
select count(*) from my_schema.my_table
where there is no group by statement. Is COUNT always required to be followed by group by? Is the group by statement implicit in this case?
This error makes perfect sense. COUNT is an "aggregate" function. So you need to tell it which field to aggregate by, which is done with the GROUP BY clause.
The one which probably makes most sense in your case would be:
SELECT column_a, COUNT(*) FROM my_schema.my_table GROUP BY column_a;
If you only use COUNT(*), you are asking for the total number of rows, instead of aggregating by another condition. Your question of whether GROUP BY is implicit in that case could be answered with "sort of": not specifying anything is a bit like asking to "group by nothing", which means you get one huge aggregate covering the whole table.
As an example, executing:
SELECT COUNT(*) FROM table;
will show you the number of rows in that table, whereas:
SELECT col_a, COUNT(*) FROM table GROUP BY col_a;
will show you the number of rows per value of col_a. Something like:
col_a | COUNT(*)
---------+----------------
value1 | 100
value2 | 10
value3 | 123
You should also take into account that the * means to count every row, including rows that contain NULLs! If you want to count only the non-NULL values of a specific expression, you should use COUNT(expression)! See the docs about aggregate functions for more details on this topic.
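The three behaviours described in this answer can be checked with a small sketch in SQLite via Python's sqlite3 module. The table contents are invented, and SQLite is an assumption here (the error in the question comes from a different database).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column_a TEXT)")
# Hypothetical data, including one NULL.
conn.executemany(
    "INSERT INTO my_table (column_a) VALUES (?)",
    [("value1",), ("value1",), ("value2",), (None,)],
)

# No GROUP BY: the whole table is one big group.
total_rows = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]

# COUNT(expression) skips rows where the expression is NULL.
non_null_rows = conn.execute("SELECT COUNT(column_a) FROM my_table").fetchone()[0]

# GROUP BY: one count per distinct value of column_a (NULL forms its own group).
per_value = dict(
    conn.execute("SELECT column_a, COUNT(*) FROM my_table GROUP BY column_a").fetchall()
)

print(total_rows, non_null_rows, per_value)
```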
If you don't use the Group by clause at all then all that will be returned is a count of 1 for each row, which is already assumed anyway and therefore redundant data. By adding GROUP BY 1 you have categorized the information thereby making it non-redundant even though it returns the same result in theory as the statement that creates an error.
When you have a function like count, sum etc. you need to group the other columns. This would be equivalent to your query:
select column_a, count(*) from my_schema.my_table group by column_a;
When you use count(*) with no other column, you are counting all rows from SELECT * from the table. When you use count(*) alongside another column, you are counting the number of rows for each different value of that other column. So in this case you need to group the results, in order to show each value and its count only once.
group by 1 in this case refers to column_a which has the column position 1 in your query.
This is why it works on your server. However, this is not good practice in SQL.
You should mention the column name, because the column order in the select list may change, which makes this code hard to maintain.
The best solution is:
select column_a, count(*) from my_schema.my_table group by column_a;

Clarification on "rownum"

I have a table Table1
Name Date
A 01-jun-2010
B 03-dec-2010
C 12-may-2010
When I query this table with the following query
select * from table1 where rownum=1
I get this output:
Name Date
A 01-jun-2010
But when I use the following queries in the same way, I do not get any output.
select * from table1 where rownum=2
select * from table1 where rownum=3
Someone please give me guidance why it works like that, and how to use the rownum.
Tom has an answer for many Oracle related questions
In short, rownum is available after the where clause has been applied and before the order by clause is applied.
In the case of RowNum=2, the predicate in the where clause will never evaluate to true as RowNum starts at 1 and only increases if records matching the predicate can be found.
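The rule above can be modelled in a few lines of plain Python. This is only a simplified sketch of the described behaviour, not Oracle's implementation; the function name filter_with_rownum and the predicate signature are made up for illustration.

```python
def filter_with_rownum(rows, predicate):
    """Simplified model: the candidate rownum starts at 1 and only
    advances when a row passes the predicate and is emitted."""
    result = []
    rownum = 1
    for row in rows:
        if predicate(rownum, row):
            result.append((rownum, row))
            rownum += 1  # the next row becomes a candidate for the next number
        # A rejected row does not consume its number; the next row
        # is tested with the same candidate rownum again.
    return result


table1 = [("A", "01-jun-2010"), ("B", "03-dec-2010"), ("C", "12-may-2010")]

equals_one = filter_with_rownum(table1, lambda rownum, row: rownum == 1)
equals_two = filter_with_rownum(table1, lambda rownum, row: rownum == 2)
at_most_two = filter_with_rownum(table1, lambda rownum, row: rownum <= 2)

print(equals_one)
print(equals_two)
print(at_most_two)
```

With rownum = 2, every row is tested as candidate number 1, fails, and is discarded, so the counter never reaches 2 and nothing is returned.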
Adding rownums is one of the last things done after the result set has been fetched from the database. This means that the first row will always have rownum 1. Rownum is better used when you want to limit the result set, for instance when doing paging.
See this for more: http://www.orafaq.com/wiki/ROWNUM
(Not an Oracle expert by any means)
From what I understand, rownum numbers the rows in a result set.
So, in your example:
select * from table1 where rownum=2
How many rows are there going to be in the result set? Therefore, what rownum would be assigned to such a row? Can you see now why no result is actually returned?
In general, you should avoid relying on rownum, or any features that imply an order to results. Try to think about working with the entire set of results.
With that being said, I believe the following would work:
select * from (select rownum as rn, table1.* from table1) t where t.rn = 2
Because in that case, you're numbering the rows within the subquery.

It's possible to have a WHERE clause after a HAVING clause?

Is it possible to use a WHERE clause after a HAVING clause?
The first thing that comes to my mind is sub queries, but I'm not sure.
P.S. If the answer is affirmative, could you give some examples?
No, not in the same query.
The where clause goes before the having and the group by. If you want to filter out records before the grouping the condition goes in the where clause, and if you want to filter out grouped records the condition goes in the having clause:
select ...
from ...
where ...
group by ...
having ...
If neither of those are possible to use for some odd reason, you have to make the query a subquery so that you can put the where clause in the outer query:
select ...
from (
select ...
from ...
where ...
group by ...
having ...
) x
where ...
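A filled-in version of that subquery shape might look like the sketch below, run in SQLite through Python's sqlite3 module. The orders table, its columns and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
# Hypothetical data.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 60), ("ann", 70), ("bob", 600), ("cid", 20), ("cid", -5)],
)

rows = conn.execute(
    """
    SELECT customer, total
    FROM (
        SELECT customer, SUM(amount) AS total
        FROM orders
        WHERE amount > 0          -- filters rows before grouping
        GROUP BY customer
        HAVING SUM(amount) > 100  -- filters groups
    ) x
    WHERE total < 500             -- filters the already grouped result
    ORDER BY customer
    """
).fetchall()

print(rows)
```

The inner query keeps ann (130) and bob (600); the outer WHERE then removes bob.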
A HAVING clause is just a WHERE clause after a GROUP BY. Why not put your WHERE conditions in the HAVING clause?
If it's a trick question, it's possible if the WHERE and the HAVING are not at the same level, as you mentioned, with a subquery.
I guess something like that would work
HAVING value=(SELECT max(value) FROM
foo WHERE crit=123)
p.s.: why were you asking?
Do you have a specific problem?
p.s.s: OK silly me, I missed the "interview*" tag...
From SELECT help
Processing Order of WHERE, GROUP BY, and HAVING Clauses
The following steps show the processing order for a SELECT statement with a WHERE clause, a GROUP BY clause, and a HAVING clause:
1. The FROM clause returns an initial result set.
2. The WHERE clause excludes rows not meeting its search condition.
3. The GROUP BY clause collects the selected rows into one group for each unique value in the GROUP BY clause.
4. Aggregate functions specified in the select list calculate summary values for each group.
5. The HAVING clause additionally excludes rows not meeting its search condition.
So, no you can not.
Within the same scope, the answer is no. If subqueries are allowed, then you can avoid using HAVING entirely.
I think HAVING is an anachronism. Hugh Darwen refers to HAVING as "The Folly of Structured Queries":
In old SQL, the WHERE clause could not be used on results of aggregation, so they had to invent HAVING (with same meaning as WHERE):
SELECT D#, AVG(Salary) AS Avg_Sal
FROM Emp
GROUP
BY D#
HAVING AVG(Salary) > 999;
But would we ever have had HAVING if in 1979 one could write:
SELECT *
FROM (
SELECT D#, AVG(Sal) AS Avg_Sal
FROM Emp
GROUP
BY D#
)
AS dummy
WHERE Avg_Sal > 999;
I strongly suspect the answer to Darwen's question is no.

What is the difference between HAVING and WHERE in SQL?

What is the difference between HAVING and WHERE in an SQL SELECT statement?
EDIT: I have marked Steven's answer as the correct one as it contained the key bit of information on the link:
When GROUP BY is not used, HAVING behaves like a WHERE clause
The situation I had seen the WHERE in did not have GROUP BY and is where my confusion started. Of course, until you know this you can't specify it in the question.
HAVING: is used to check conditions after the aggregation takes place.
WHERE: is used to check conditions before the aggregation takes place.
This code:
select City, CNT=Count(1)
From Address
Where State = 'MA'
Group By City
Gives you a table of all cities in MA and the number of addresses in each city.
This code:
select City, CNT=Count(1)
From Address
Where State = 'MA'
Group By City
Having Count(1)>5
Gives you a table of cities in MA with more than 5 addresses and the number of addresses in each city.
HAVING specifies a search condition for a group or an aggregate function used in a SELECT statement.
Source
Number one difference for me: if HAVING was removed from the SQL language then life would go on more or less as before. Certainly, a minority of queries would need to be rewritten using a derived table, CTE, etc. but they would arguably be easier to understand and maintain as a result. Maybe vendors' optimizer code would need to be rewritten to account for this, again an opportunity for improvement within the industry.
Now consider for a moment removing WHERE from the language. This time the majority of queries in existence would need to be rewritten without an obvious alternative construct. Coders would have to get creative, e.g. inner join to a table known to contain exactly one row (e.g. DUAL in Oracle) using the ON clause to simulate the prior WHERE clause. Such constructions would be contrived; it would be obvious that something was missing from the language and the situation would be worse as a result.
TL;DR we could lose HAVING tomorrow and things would be no worse, possibly better, but the same cannot be said of WHERE.
From the answers here, it seems that many folk don't realize that a HAVING clause may be used without a GROUP BY clause. In this case, the HAVING clause is applied to the entire table expression and requires that only constants appear in the SELECT clause. Typically the HAVING clause will involve aggregates.
This is more useful than it sounds. For example, consider this query to test whether the name column is unique for all values in T:
SELECT 1 AS result
FROM T
HAVING COUNT( DISTINCT name ) = COUNT( name );
There are only two possible results: if the HAVING clause is true then the result will be a single row containing the value 1, otherwise the result will be the empty set.
The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions.
Check out this w3schools link for more information
Syntax:
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value
A query such as this:
SELECT column_name, COUNT( column_name ) AS column_name_tally
FROM table_name
WHERE column_name < 3
GROUP
BY column_name
HAVING COUNT( column_name ) >= 3;
...may be rewritten using a derived table (and omitting the HAVING) like this:
SELECT column_name, column_name_tally
FROM (
SELECT column_name, COUNT(column_name) AS column_name_tally
FROM table_name
WHERE column_name < 3
GROUP
BY column_name
) pointless_range_variable_required_here
WHERE column_name_tally >= 3;
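As a quick sanity check of that rewrite, the sketch below runs both forms in SQLite through Python's sqlite3 module and compares the results. The sample values are invented, and an ORDER BY is added to both queries only so the comparison does not depend on row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_name INTEGER)")
# Hypothetical data: 1 appears three times, 2 twice, 5 three times.
conn.executemany(
    "INSERT INTO table_name VALUES (?)",
    [(1,), (1,), (1,), (2,), (2,), (5,), (5,), (5,)],
)

with_having = conn.execute(
    """
    SELECT column_name, COUNT(column_name) AS column_name_tally
    FROM table_name
    WHERE column_name < 3
    GROUP BY column_name
    HAVING COUNT(column_name) >= 3
    ORDER BY column_name
    """
).fetchall()

with_derived_table = conn.execute(
    """
    SELECT column_name, column_name_tally
    FROM (
        SELECT column_name, COUNT(column_name) AS column_name_tally
        FROM table_name
        WHERE column_name < 3
        GROUP BY column_name
    ) t
    WHERE column_name_tally >= 3
    ORDER BY column_name
    """
).fetchall()

print(with_having)
print(with_derived_table)
```

The value 5 is removed by the inner WHERE and the value 2 has too few rows, so both forms return only the value 1 with a tally of 3.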
The difference between the two is in the relationship to the GROUP BY clause:
WHERE comes before GROUP BY; SQL evaluates the WHERE clause before it groups records.
HAVING comes after GROUP BY; SQL evaluates HAVING after it groups records.
References
SQLite SELECT Statement Syntax/Railroad Diagram
Informix SELECT Statement Syntax/Railroad Diagram
HAVING is used when you are aggregating, for example with GROUP BY.
SELECT edc_country, COUNT(*)
FROM Ed_Centers
GROUP BY edc_country
HAVING COUNT(*) > 1
ORDER BY edc_country;
WHERE is applied as a limitation on the set returned by SQL; it uses SQL's built-in set operations and indexes and therefore is the fastest way to filter result sets. Always use WHERE whenever possible.
HAVING is necessary for some aggregate filters. It filters the query AFTER SQL has retrieved, assembled, and sorted the results. Therefore, it is much slower than WHERE and should be avoided except in those situations that require it.
SQL Server will let you get away with using HAVING even when WHERE would be much faster. Don't do it.
The WHERE clause does not work with aggregate functions, which means you should not write a query like this (bonus is the table name):
SELECT name
FROM bonus
GROUP BY name
WHERE sum(salary) > 200
Without a GROUP BY clause, the HAVING clause works just like a WHERE clause.
Here, instead of the WHERE clause, you have to use HAVING:
SELECT name
FROM bonus
GROUP BY name
HAVING sum(salary) > 200
Difference between the WHERE and HAVING clauses:
The main difference is that WHERE filters individual rows, whereas HAVING filters groups of rows after aggregation.
Why do we need the HAVING clause?
Aggregate functions are computed over groups of rows, and those groups do not exist yet when WHERE is evaluated, so we cannot use aggregate functions in the WHERE clause. Therefore, we use aggregate functions in the HAVING clause.
One way to think of it is that the having clause is an additional filter to the where clause.
A WHERE clause is used to filter records from a result. The filter occurs before any groupings are made. A HAVING clause is used to filter values from a group.
In an aggregate query (any query where an aggregate function is used), predicates in a WHERE clause are evaluated before the aggregated intermediate result set is generated.
Predicates in a HAVING clause are applied to the aggregate result set AFTER it has been generated. That's why predicate conditions on aggregate values must be placed in the HAVING clause, not in the WHERE clause, and why some databases let you use aliases defined in the SELECT clause in a HAVING clause, but not in a WHERE clause.
I had a problem and found out another difference between WHERE and HAVING. It does not act the same way on indexed columns.
WHERE my_indexed_row = 123 will show rows and automatically perform a "ORDER ASC" on other indexed rows.
HAVING my_indexed_row = 123 shows everything from the oldest "inserted" row to the newest one, no ordering.
When GROUP BY is not used, the WHERE and HAVING clauses are essentially equivalent.
However, when GROUP BY is used:
The WHERE clause is used to filter records from a result. The filtering occurs before any groupings are made.
The HAVING clause is used to filter values from a group (i.e., to check conditions after aggregation into groups has been performed).
From here.
The SQL standard requires that HAVING must reference only columns in the GROUP BY clause or columns used in aggregate functions
as opposed to the WHERE clause, which is applied to database rows.
While working on a project, this was also my question. As stated above, the HAVING checks the condition on the query result already found. But WHERE is for checking condition while query runs.
Let me give an example to illustrate this. Suppose you have a database table like this.
usertable{ int userid, date datefield, int dailyincome }
Suppose, the following rows are in table:
1, 2011-05-20, 100
1, 2011-05-21, 50
1, 2011-05-30, 10
2, 2011-05-30, 10
2, 2011-05-20, 20
Now, we want to get the userids and sum(dailyincome) whose sum(dailyincome)>100
If we write:
SELECT userid, sum(dailyincome) FROM usertable WHERE sum(dailyincome)>100 GROUP BY userid
This will be an error. The correct query would be:
SELECT userid, sum(dailyincome) FROM usertable GROUP BY userid HAVING sum(dailyincome)>100
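Both queries can be tried against the sample rows with Python's sqlite3 module, as sketched below. SQLite is an assumption here: other databases reject the first query too, but with a different error message and exception type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usertable (userid INTEGER, datefield TEXT, dailyincome INTEGER)"
)
conn.executemany(
    "INSERT INTO usertable VALUES (?, ?, ?)",
    [
        (1, "2011-05-20", 100),
        (1, "2011-05-21", 50),
        (1, "2011-05-30", 10),
        (2, "2011-05-30", 10),
        (2, "2011-05-20", 20),
    ],
)

# An aggregate in WHERE is rejected: rows are filtered before any groups exist.
where_failed = False
try:
    conn.execute(
        "SELECT userid, sum(dailyincome) FROM usertable "
        "WHERE sum(dailyincome) > 100 GROUP BY userid"
    )
except sqlite3.OperationalError as exc:
    where_failed = True
    print("WHERE version failed:", exc)

# The HAVING version filters the groups after the sums are computed.
having_rows = conn.execute(
    "SELECT userid, sum(dailyincome) FROM usertable "
    "GROUP BY userid HAVING sum(dailyincome) > 100"
).fetchall()
print(having_rows)
```

User 1 totals 160 and user 2 totals 30, so only user 1 is returned.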
WHERE clause is used for comparing values in the base table, whereas the HAVING clause can be used for filtering the results of aggregate functions in the result set of the query
I use HAVING for constraining a query based on the results of an aggregate function, e.g. select SOMETHING, count(SOMETHING) from blahblahblah group by SOMETHING having count(SOMETHING) > 0