This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
SQL: What's the difference between HAVING and WHERE?
What is the difference between the HAVING clause and the WHERE clause? Could anyone explain in detail?
HAVING filters grouped elements,
WHERE filters ungrouped elements.
Example 1:
SELECT col1, col2 FROM table
WHERE col1 = @id
Example 2:
SELECT SUM(col1), col2 FROM table
GROUP BY col2
HAVING SUM(col1) > 10
Because the HAVING condition can only be applied in the second example AFTER the grouping has occurred, you could not rewrite it as a WHERE clause.
Example 3:
SELECT SUM(col1), col2 FROM table
WHERE col1 = @id
GROUP BY col2
HAVING SUM(col1) > 10
demonstrates how you might use both WHERE and HAVING together:
The table data is first filtered by col1 = @id
then the filtered data is grouped
then the grouped data is filtered again by SUM(col1) > 10
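The three steps above can be tried out with a small sketch. It uses Python's built-in sqlite3 module; the sales table, its columns and its rows are made up for illustration and are not from the question.

```python
import sqlite3

# Hypothetical table and data, invented only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, status TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "paid", 8),
        ("north", "paid", 7),
        ("north", "void", 100),  # removed by WHERE before any grouping
        ("south", "paid", 4),
        ("south", "paid", 5),
    ],
)

rows = conn.execute(
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE status = 'paid'      -- 1. filter individual rows
    GROUP BY region            -- 2. group what is left
    HAVING SUM(amount) > 10    -- 3. filter the groups
    ORDER BY region
    """
).fetchall()

print(rows)  # [('north', 15)]
```

The void row never reaches the SUM, and the south group (total 9) is dropped only after it has been summed.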
WHERE filters rows before they are grouped in GROUP BY clause
while HAVING filters the aggregate values after GROUP BY takes place
HAVING specifies a search condition on a group or on an aggregate computed by the SELECT statement.
In other words:
HAVING applies to groups.
WHERE applies to rows.
Without a GROUP BY, the whole result is treated as a single group, so there is rarely a reason to use HAVING (and it looks strange there)
With a GROUP BY
HAVING is for testing condition on the aggregate (MAX, SUM, COUNT etc)
HAVING column = 1 is the same as WHERE column = 1 when column is one of the grouped columns (no aggregate on column)
WHERE COUNT(*) = 1 is not allowed.
HAVING COUNT(*) = 1 is allowed
HAVING is for use with an aggregate such as SUM. WHERE is for all other cases.
HAVING specifies a search condition for a group or an aggregate. HAVING can be used only with the SELECT statement and is typically used together with a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause. In short, the HAVING clause is applied to the groups produced by GROUP BY, whereas the WHERE clause is applied to each row before the rows are grouped.
As others already said, HAVING is used with GROUP BY. The reason is the order of execution: WHERE is executed before GROUP BY, HAVING is executed after it.
Think of it as a matter of where the filtering happens.
When you specify a WHERE clause you filter the input rows to your aggregate function (e.g. I only want the average age of persons living in a specific city). When you specify a HAVING constraint you say that you only want a certain subset of the averages (e.g. I only want to see cities with an average age of 70 years or above).
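A small sketch of the two cases, using Python's built-in sqlite3 module. The people table, the city names and the ages are hypothetical and chosen only to make the difference visible.

```python
import sqlite3

# Hypothetical data: (name, city, age). None of it comes from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Ann", "Springfield", 72),
        ("Bob", "Springfield", 80),
        ("Cid", "Shelbyville", 30),
        ("Dee", "Shelbyville", 40),
        ("Eve", "Ogdenville", 70),
        ("Fay", "Ogdenville", 70),
    ],
)

# WHERE: choose which rows go INTO the average.
one_city = conn.execute(
    "SELECT city, AVG(age) FROM people WHERE city = 'Springfield' GROUP BY city"
).fetchall()

# HAVING: choose which averages come OUT.
old_cities = conn.execute(
    """
    SELECT city, AVG(age)
    FROM people
    GROUP BY city
    HAVING AVG(age) >= 70
    ORDER BY city
    """
).fetchall()

print(one_city)    # [('Springfield', 76.0)]
print(old_cities)  # [('Ogdenville', 70.0), ('Springfield', 76.0)]
```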
Having is for aggregate functions, e.g.
SELECT baz, COUNT(*)
FROM foo
GROUP BY baz
HAVING COUNT(*) > 8
Related
I want to get the number of results for each day of the past week. Unfortunately, I got this error for the query:
An expression starting with "APP_ID" specified in a SELECT clause,
HAVING clause, or ORDER BY clause is not specified in the GROUP BY
clause or it is in a SELECT clause, HAVING clause, or ORDER BY clause
with a column function and no GROUP BY clause is specified..
SQLCODE=-119, SQLSTATE=42803, DRIVER=3.67.27
The query:
SELECT DAYNAME(created), app_id
FROM Annotation
WHERE app_id = 1 AND (created < CURRENT DATE - 7 DAYS)
GROUP BY DAYNAME(created) ORDER BY created
The problem has something to do with the GROUP BY statement. What is wrong with it?
I think the error is pretty clear: app_id is in the SELECT but not in the GROUP BY. The solution is that you need an aggregation function. I would expect something like this:
SELECT DAYNAME(created), COUNT(*)
FROM Annotation a
WHERE app_id = 1 AND (created < CURRENT DATE - 7 DAYS)
GROUP BY DAYNAME(created)
ORDER BY MIN(created);
If you want to use group by, then every column must either be in your group by statement, or be aggregated.
select col1, col2, 'same-for-every-row', sum(col3) as col3_sum, avg(col4) as col4_avg
from schema.table
group by col1, col2
This works because col1 and col2 are grouped by, and every other column has an aggregation that says how to combine all the values in a group.
Your current statement won't work because, although you've grouped by date, you haven't specified how to combine the app_id values of all the rows in each group. You need to say that they should be summed, averaged, reduced to their minimum, or aggregated in some other way.
The exception is a column built from a constant such as the string 'same-for-every-row'. It doesn't need to be aggregated because it is the same for every row.
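As a rough illustration of that rule, here is the same shape of query run with Python's built-in sqlite3 module. The table name t and its rows are made up. SQLite itself is lenient and tolerates ungrouped, unaggregated columns, so it cannot reproduce the DB2 error; the sketch only shows the valid form.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, col3 INTEGER, col4 REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("a", "x", 1, 10.0),
        ("a", "x", 2, 20.0),
        ("b", "y", 5, 30.0),
    ],
)

rows = conn.execute(
    """
    SELECT col1, col2,              -- grouped columns
           'same-for-every-row',    -- constant, needs no aggregate
           SUM(col3) AS col3_sum,   -- aggregated
           AVG(col4) AS col4_avg    -- aggregated
    FROM t
    GROUP BY col1, col2
    ORDER BY col1, col2
    """
).fetchall()

for row in rows:
    print(row)
# ('a', 'x', 'same-for-every-row', 3, 15.0)
# ('b', 'y', 'same-for-every-row', 5, 30.0)
```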
Can someone please tell me what is wrong with the following query?
select 1
from table1 a,
table2 b
where a.pdate=max(b.pdate)
It does not compile.
The other way to write this query is
set @pdate=pdate from table2
select 1
from table1 a,
table2 b
where a.pdate=max(b.pdate)
But I want to understand what is wrong with the first query.
Thanks
But I want to understand what is wrong with the first query.
The error message tells you something that could be of value to you.
An aggregate may not appear in the WHERE clause unless it is in a
subquery contained in a HAVING clause or a select list, and the column
being aggregated is an outer reference.
The max() function is an aggregate that returns the max value for a set of rows. The where clause is used to filter rows. So if you use an aggregate in the place where you are doing the filtering it is not clear what rows you actually want the max value for.
A rewrite could look like this:
select 1
from dbo.table1 as a
where a.pdate = (
select max(b.pdate)
from dbo.table2 as b
);
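To see both behaviours side by side, here is a sketch using Python's built-in sqlite3 module rather than SQL Server, so the exact error text will differ from the message quoted above. The table names follow the question; the dates are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (pdate TEXT)")
conn.execute("CREATE TABLE table2 (pdate TEXT)")
# Hypothetical dates, for illustration only.
conn.executemany("INSERT INTO table1 VALUES (?)", [("2020-01-01",), ("2020-03-01",)])
conn.executemany("INSERT INTO table2 VALUES (?)", [("2020-02-01",), ("2020-03-01",)])

# An aggregate directly in WHERE is rejected before any row is read.
try:
    conn.execute("SELECT 1 FROM table1 a, table2 b WHERE a.pdate = MAX(b.pdate)")
    aggregate_in_where_failed = False
except sqlite3.Error as exc:
    aggregate_in_where_failed = True
    print("rejected:", exc)

# The subquery computes the maximum first, then WHERE compares row by row.
matches = conn.execute(
    """
    SELECT a.pdate
    FROM table1 AS a
    WHERE a.pdate = (SELECT MAX(b.pdate) FROM table2 AS b)
    """
).fetchall()

print(matches)  # [('2020-03-01',)]
```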
Even the second query is wrong.
The correct way:
Select @pdate=max(pdate) from table2
select 1
from table1 a where a.pdate=@pdate
or,
select 1
from table1 a where a.pdate=(Select max(pdate) from table2)
If you select another column besides the aggregated one, then you have to use GROUP BY.
I believe GROUP BY in SQL would make DISTINCT unnecessary because if you group by a column then there will only be one of each value in the result, but I want to make sure I am understanding the keywords correctly. Is it the case that you would not need to do this:
SELECT DISTINCT a_uuid
FROM table
GROUP BY a_uuid
HAVING NOT bool_or(type = 'Purchase')
because you could just drop the DISTINCT completely?
You do not need the distinct in this query. In general, you don't need distinct with group by. There are actually some queries where distinct and group by go together, but they are very rare.
You need group by in this query, because you are using an aggregation function in the having clause. So, use:
SELECT a_uuid
FROM table
GROUP BY a_uuid
HAVING NOT bool_or(type = 'Purchase')
As long as aggregate functions aren't involved you can use DISTINCT instead of GROUP BY.
Use either DISTINCT or GROUP BY - not both!
Use DISTINCT if you just want to remove duplicates. Use GROUP BY if you want to apply aggregate operators (MAX, SUM, GROUP_CONCAT, ..., or a HAVING clause).
If you use DISTINCT with multiple columns, the result set won't be grouped as it will with GROUP BY, and you can't use aggregate functions with DISTINCT.
Overall, the two differ in functionality, but for a particular set of data they can happen to return the same output.
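A quick way to check this is a sketch with Python's built-in sqlite3 module. The events table and its rows are hypothetical. SQLite has no bool_or, so SUM over the boolean expression is used here as a stand-in for the PostgreSQL query in the question; that substitution is an assumption of this sketch.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (a_uuid TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("u1", "View"),
        ("u1", "Purchase"),
        ("u2", "View"),
        ("u2", "View"),
        ("u3", "Click"),
    ],
)

# Without aggregates, DISTINCT and GROUP BY return the same rows.
with_distinct = conn.execute(
    "SELECT DISTINCT a_uuid FROM events ORDER BY a_uuid"
).fetchall()
with_group_by = conn.execute(
    "SELECT a_uuid FROM events GROUP BY a_uuid ORDER BY a_uuid"
).fetchall()

# With a condition on the whole group, only GROUP BY ... HAVING can express it.
# SUM(type = 'Purchase') = 0 stands in for NOT bool_or(type = 'Purchase').
never_purchased = conn.execute(
    """
    SELECT a_uuid
    FROM events
    GROUP BY a_uuid
    HAVING SUM(type = 'Purchase') = 0
    ORDER BY a_uuid
    """
).fetchall()

print(with_distinct == with_group_by)  # True
print(never_purchased)                 # [('u2',), ('u3',)]
```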
The following statement works in my database:
select column_a, count(*) from my_schema.my_table group by 1;
but this one doesn't:
select column_a, count(*) from my_schema.my_table;
I get the error:
ERROR: column "my_table.column_a" must appear in the GROUP BY clause
or be used in an aggregate function
Helpful note: This thread: What does SQL clause "GROUP BY 1" mean? discusses the meaning of "group by 1".
Update:
The reason why I am confused is because I have often seen count(*) as follows:
select count(*) from my_schema.my_table
where there is no group by statement. Is COUNT always required to be followed by group by? Is the group by statement implicit in this case?
This error makes perfect sense. COUNT is an "aggregate" function. So you need to tell it which field to aggregate by, which is done with the GROUP BY clause.
The one which probably makes most sense in your case would be:
SELECT column_a, COUNT(*) FROM my_schema.my_table GROUP BY column_a;
If you only use COUNT(*), you are asking for the total number of rows instead of aggregating by another condition. Your question, whether GROUP BY is implicit in that case, could be answered with "sort of": not specifying anything is a bit like asking to "group by nothing", which means you get one huge aggregate covering the whole table.
As an example, executing:
SELECT COUNT(*) FROM table;
will show you the number of rows in that table, whereas:
SELECT col_a, COUNT(*) FROM table GROUP BY col_a;
will show you the number of rows per value of col_a. Something like:
col_a | COUNT(*)
---------+----------------
value1 | 100
value2 | 10
value3 | 123
You should also take into account that * means count every row, including rows that contain NULLs. If you want to count only the rows where a particular expression is not NULL, use COUNT(expression). See the docs about aggregate functions for more details on this topic.
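The difference between COUNT(*) and COUNT(expression) is easy to see on a tiny table. This sketch uses Python's built-in sqlite3 module; the table t, the note column and the rows are made up for illustration.

```python
import sqlite3

# Hypothetical table; the note column is NULL in two of the three rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col_a TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("x", "a"), ("x", None), ("y", None)],
)

# No GROUP BY: one aggregate over the whole table.
whole_table = conn.execute("SELECT COUNT(*), COUNT(note) FROM t").fetchone()

# With GROUP BY: one aggregate per value of col_a.
per_value = conn.execute(
    "SELECT col_a, COUNT(*), COUNT(note) FROM t GROUP BY col_a ORDER BY col_a"
).fetchall()

print(whole_table)  # (3, 1)
print(per_value)    # [('x', 2, 1), ('y', 1, 0)]
```

COUNT(*) counts rows, while COUNT(note) skips the rows where note is NULL.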
If you don't use the Group by clause at all then all that will be returned is a count of 1 for each row, which is already assumed anyway and therefore redundant data. By adding GROUP BY 1 you have categorized the information thereby making it non-redundant even though it returns the same result in theory as the statement that creates an error.
When you have a function like count, sum etc. you need to group the other columns. This would be equivalent to your query:
select column_a, count(*) from my_schema.my_table group by column_a;
When you use count(*) with no other column, you are counting all rows from SELECT * from the table. When you use count(*) alongside another column, you are counting the number of rows for each different value of that other column. So in this case you need to group the results, in order to show each value and its count only once.
group by 1 in this case refers to column_a which has the column position 1 in your query.
This is why it works on your server. However, it is not good practice in SQL.
You should name the column instead, because the order of the columns in the SELECT list may change, which makes positional references hard to maintain.
The best solution is:
select column_a, count(*) from my_schema.my_table group by column_a;
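The equivalence of the positional and the named form can be checked with a short sketch using Python's built-in sqlite3 module. SQLite also accepts a column position in GROUP BY; the table contents are made up, and the schema prefix is dropped because an in-memory SQLite database has no my_schema.

```python
import sqlite3

# Hypothetical rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column_a TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?)", [("a",), ("a",), ("b",)])

# Positional: 1 refers to the first item in the SELECT list.
by_position = conn.execute(
    "SELECT column_a, COUNT(*) FROM my_table GROUP BY 1 ORDER BY column_a"
).fetchall()

# Named: survives a reordering of the SELECT list.
by_name = conn.execute(
    "SELECT column_a, COUNT(*) FROM my_table GROUP BY column_a ORDER BY column_a"
).fetchall()

print(by_position)             # [('a', 2), ('b', 1)]
print(by_position == by_name)  # True
```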
I have to find distinct count of combination of 2 variables. I used the following 2 queries to find the count:
select count(*) from
( select V1, V2
from table1
group by 1,2
) a
select count(distinct catx('-', V1, V2))
from table1
Logically, both the above queries should give the same count but I am getting different counts. Note that
both V1 and V2 are integers
Both variables can have null values, though there are no null values in my table
There are no negative values
Any idea why I might be getting different outputs? And which is the best way to find the count of distinct combinations of 2 or more columns?
Thanks.
The SAS log gives the answer when you run the first sql code. Using 'group by' requires a summary function, otherwise it is ignored. The count will therefore return the overall number of rows instead of a distinct count of the 2 variables combined.
Just add count(*) to the subquery and you will get the same answer with both methods.
select count(*) from
( select V1, V2, count(*)
from table1
group by 1,2
) a
Use distinct in the subquery for the first query.
When you do a group by but don't include any aggregate function, SAS discards the group by,
so you will still have duplicate combinations of v1 and v2.
It seems that GROUP BY doesn't work that way in SAS. You can't use it to remove duplicates unless you have an aggregate function in your query. I found this in the log of my query output -
NOTE: A GROUP BY clause has been discarded because neither the SELECT
clause nor the optional HAVING clause of the associated
table-expression referenced a summary function.
This answers the question.
You can also drop the group by part and just add a distinct in the sub-query. Also, the second query you wrote is more efficient.
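The DISTINCT-in-subquery approach can be sketched with Python's built-in sqlite3 module. This is SQLite, not SAS PROC SQL: SQLite does not discard a GROUP BY that lacks a summary function, so all three variants agree here, and the || concatenation is a stand-in for catx. The rows are made up.

```python
import sqlite3

# Hypothetical rows with three distinct (V1, V2) combinations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (V1 INTEGER, V2 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 2), (1, 2), (1, 3), (2, 3), (2, 3)],
)

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# DISTINCT in the subquery: behaves the same in any SQL dialect.
via_distinct = scalar(
    "SELECT COUNT(*) FROM (SELECT DISTINCT V1, V2 FROM table1)"
)

# GROUP BY in the subquery: fine in SQLite, discarded by SAS without a summary function.
via_group_by = scalar(
    "SELECT COUNT(*) FROM (SELECT V1, V2 FROM table1 GROUP BY V1, V2)"
)

# Concatenation: relies on the delimiter never appearing in the values.
via_concat = scalar("SELECT COUNT(DISTINCT V1 || '-' || V2) FROM table1")

print(via_distinct, via_group_by, via_concat)  # 3 3 3
```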