SQL - using 'HAVING' with 'EXISTS' without using 'GROUP BY' - sql

Using 'HAVING' without 'GROUP BY' is not allowed:
SELECT *
FROM products
HAVING unitprice > avg(unitprice)
Column 'products.UnitPrice' is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.
But when placing the same code under 'EXISTS' - no problems:
SELECT *
FROM products p
WHERE EXISTS (SELECT 1
FROM products
HAVING p.unitprice > avg(unitprice))
Can you please explain why?

well the error is clear in first query UnitPrice is not part of aggregation nor group by
whereas in your second query you are comparing p.unitprice from table "products p" which doesn't need to be part of aggregation or group by , your second query is equivalent to :
select * from products p
where p.unitprice > (select avg(unitprice) FROM products)
which maybe this is more clear , that sql caculate avg(unitprice) then compares it with unitprice column from product.

HAVING filters after aggregation according to the SQL standard and in most databases.
Without a GROUP BY, there is still aggregation.
But in your case, you simply want a subquery and WHERE:
SELECT p.*
FROM products p
WHERE p.unitprice > (SELECT AVG(p2.unitprice) FROM products p2);

The problem comes from the columns you select :
SELECT *
and
SELECT 1
Unlike ordinary functions that are evaluated at each row, aggregate functions are computed once the whole dataset is processed, which means that in theory (at least without a GROUP BY statement), you can't select both aggregate and regular functions in a same column set (even if some DBMS still tolerate this).
It's easier to see when considering SUM(). You're not supposed to have an access to the total of a column before all rows have been returned, which prevents you to write something like SELECT price,SUM(price), for instance.
GROUP BY, now, enables you to regroup your rows according to a given criteria (actually, a bunch of columns), which makes these aggregate functions to be computed at the end of each of these groups instead of the whole dataset. Therefore, since all the column specified in GROUP BY are supposed to be the same for a given group, you're allowed to include them in your global SELECT statement.
This leads us to the actual failure cause: on first query, you select all columns. On the second one, you select none: only the constant 1, which is not part of the table itself.

Related

Does it has to be that in SQL, aggregate function goes hand in hand with grouping?

Question in the title. Thanks for the time.
EXAMPLE v
SELECT
customer_id,
SUM(unit_price * quantity) AS total_price
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_id
GROUP BY customer_id
Yes, grouping does go hand in hand with aggregate functions, for any resulting column not contained within an aggregate function
Grouping and the operations of aggregation are typically linked to three keywords:
any aggregation function
the GROUP BY clause
the DISTINCT (ON) modifier after the SELECT keyword
You can use:
aggregation functions alone when they are used on every field found inside the SELECT statement
the GROUP BY clause alone - an allowed bad practice as you're intending to do an aggregation on your field but you're not specifying any aggregation function inside the SELECT statement (really you should go with DISTINCT ON)
the DISTINCT modifier alone to select distinct rows (not fields)
the DISTINCT ON modifier only when accompanied by an ORDER BY clause that defines the order for extracting the rows
aggregation functions + the GROUP BY clause, that is forced to contain all fields found in the SELECT statement that are not object of aggregation
the DISTINCT modifier + the GROUP BY clause - you can do it but the GROUP BY clause really is superfluous as it is already implied by the DISTINCT keyword itself.
You can't use:
aggregation functions + the GROUP BY clause when non-aggregated fields included in the SELECT statement are not found within the GROUP BY clause
aggregation functions alone when in the SELECT statement they are found together with non-aggregated fields.
When you can't aggregate because you need to select multiple fields, typically window functions + filtering operations (with a WHERE clause) can come in handy for computing values and row by row and removing unneeded records.

what is aggregate function in sql?

I have two queries both are working fine when they executed separately:
select distinct
style_ref
from
tbl_Size
where
order_ref='123'
select
sum(quantity)
from
tbl_size
where
order_ref='123'
But if I try to combine them it does not work
select distinct
style_ref, sum(quantity)
from
tbl_size
where
order_ref='123'
ERROR appears:
Column 'tbl_Size.style_ref' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
An aggregate function is one that combines several records into a single one. In your case, SUM. You're taking the sum, clearly, of more than one row at a time. Another example might be AVG, to get the average of several values.
You can't run aggregate functions, as your error says, alongside ungrouped columns, because that introduces multiple "layers" of data. In one row, you'd have something that described the entire dataset, and you'd have something else that described only a single record. This would be confusing, not to mention inefficient.
Rather than using DISTINCT in your example, you're probably looking to GROUP BY your column:
SELECT style_ref, sum(quantity)
FROM tbl_size
WHERE order_ref='123'
GROUP BY style_ref
This will group up every set of records, based on their style_ref value, then tell you the sum of the quantities. Thus, assuming your schema naming is accurate, it will tell you how many orders were present for each style_ref.
The above query is equivalent in meaning to the following:
SELECT DISTINCT style_ref, (SELECT SUM(quantity)
FROM tbl_size AS B
WHERE B.order_ref = '123'
AND B.style_ref = tbl_size.style_ref)
FROM tbl_size
WHERE order_ref = '123'
As you can see, the GROUP BY solution is much, much cleaner and better to use. But I included this just to describe what it returns in a arguably a bit more of a readable way. You can see here how the aggregate function (SUM) could be described as working on a separate plane from the style_ref column, so it'd be hard to combine those into a single one without GROUP BY.
An aggregate function is a function that returns one result for many rows - like sum in your example.
You can use them in conjunction with the group by clause in order to get one result per group:
select style_ref, sum(quantity)
from tbl_size
where order_ref='123'
group by style_ref

Select all columns on a group by throws error

I ran a query against Northwind database Products Table like below
select * from Northwind.dbo.Products GROUP BY CategoryID and i was hit with a error. I am sure you will also be hit by same error. So what is the correct statement that i need to execute to group all products with respect to their category id's.
edit: this like really helped understand a lot
http://weblogs.sqlteam.com/jeffs/archive/2007/07/20/but-why-must-that-column-be-contained-in-an-aggregate.aspx
You need to use an Aggregate function and then group by any non-aggregated columns.
I recommend reading up on GROUP BY.
If you're using GROUP BY in a query, all items in your SELECT statement must either be contained as part of an aggregate function, e.g. Sum() or Count(), else they will also need to be included in the GROUP BY clause.
Because you are using SELECT *, this is equivalent to listing ALL columns in your SELECT.
Therefore, either list them all in the GROUP BY too, use aggregating functions for the rest where possible, or only select the CategoryID.

GROUP BY / aggregate function confusion in SQL

I need a bit of help straightening out something, I know it's a very easy easy question but it's something that is slightly confusing me in SQL.
This SQL query throws a 'not a GROUP BY expression' error in Oracle. I understand why, as I know that once I group by an attribute of a tuple, I can no longer access any other attribute.
SELECT *
FROM order_details
GROUP BY order_no
However this one does work
SELECT SUM(order_price)
FROM order_details
GROUP BY order_no
Just to concrete my understanding on this.... Assuming that there are multiple tuples in order_details for each order that is made, once I group the tuples according to order_no, I can still access the order_price attribute for each individual tuple in the group, but only using an aggregate function?
In other words, aggregate functions when used in the SELECT clause are able to drill down into the group to see the 'hidden' attributes, where simply using 'SELECT order_no' will throw an error?
In standard SQL (but not MySQL), when you use GROUP BY, you must list all the result columns that are not aggregates in the GROUP BY clause. So, if order_details has 6 columns, then you must list all 6 columns (by name - you can't use * in the GROUP BY or ORDER BY clauses) in the GROUP BY clause.
You can also do:
SELECT order_no, SUM(order_price)
FROM order_details
GROUP BY order_no;
That will work because all the non-aggregate columns are listed in the GROUP BY clause.
You could do something like:
SELECT order_no, order_price, MAX(order_item)
FROM order_details
GROUP BY order_no, order_price;
This query isn't really meaningful (or most probably isn't meaningful), but it will 'work'. It will list each separate order number and order price combination, and will give the maximum order item (number) associated with that price. If all the items in an order have distinct prices, you'll end up with groups of one row each. OTOH, if there are several items in the order at the same price (say £0.99 each), then it will group those together and return the maximum order item number at that price. (I'm assuming the table has a primary key on (order_no, order_item) where the first item in the order has order_item = 1, the second item is 2, etc.)
The order in which SQL is written is not the same order it is executed.
Normally, you would write SQL like this:
SELECT
FROM
JOIN
WHERE
GROUP BY
HAVING
ORDER BY
Under the hood, SQL is executed like this:
FROM
JOIN
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
Reason why you need to put all the non-aggregate columns in SELECT to the GROUP BY is the top-down behaviour in programming. You cannot call something you have not declared yet.
Read more: https://sqlbolt.com/lesson/select_queries_order_of_execution
SELECT *
FROM order_details
GROUP BY order_no
In the above query you are selecting all the columns because of that its throwing an error not group by something like..
to avoid that you have to mention all the columns whichever in select statement all columns must be in group by clause..
SELECT *
FROM order_details
GROUP BY order_no,order_details,etc
etc it means all the columns from order_details table.
To use group by clause you have to mention all the columns from select statement in to group by clause but not the column from aggregate function.
TO do this instead of group by you can use partition by clause you can use only one port to group as a partition by.
you can also make it as partition by 1
use Common table expression(CTE) to avoid this issue.
multiple CTes also come handy, pasting a case where I have used...maybe helpful
with ranked_cte1 as
( select r.mov_id,DENSE_RANK() over ( order by r.rev_stars desc )as rankked from ratings r ),
ranked_cte2 as ( select * from movie where mov_id=(select mov_id from ranked_cte1 where rankked=7 ) ) select * from ranked_cte2
select * from movie where mov_id=902

What is the difference between HAVING and WHERE in SQL?

What is the difference between HAVING and WHERE in an SQL SELECT statement?
EDIT: I have marked Steven's answer as the correct one as it contained the key bit of information on the link:
When GROUP BY is not used, HAVING behaves like a WHERE clause
The situation I had seen the WHERE in did not have GROUP BY and is where my confusion started. Of course, until you know this you can't specify it in the question.
HAVING: is used to check conditions after the aggregation takes place.
WHERE: is used to check conditions before the aggregation takes place.
This code:
select City, CNT=Count(1)
From Address
Where State = 'MA'
Group By City
Gives you a table of all cities in MA and the number of addresses in each city.
This code:
select City, CNT=Count(1)
From Address
Where State = 'MA'
Group By City
Having Count(1)>5
Gives you a table of cities in MA with more than 5 addresses and the number of addresses in each city.
HAVING specifies a search condition for a
group or an aggregate function used in SELECT statement.
Source
Number one difference for me: if HAVING was removed from the SQL language then life would go on more or less as before. Certainly, a minority queries would need to be rewritten using a derived table, CTE, etc but they would arguably be easier to understand and maintain as a result. Maybe vendors' optimizer code would need to be rewritten to account for this, again an opportunity for improvement within the industry.
Now consider for a moment removing WHERE from the language. This time the majority of queries in existence would need to be rewritten without an obvious alternative construct. Coders would have to get creative e.g. inner join to a table known to contain exactly one row (e.g. DUAL in Oracle) using the ON clause to simulate the prior WHERE clause. Such constructions would be contrived; it would be obvious there was something was missing from the language and the situation would be worse as a result.
TL;DR we could lose HAVING tomorrow and things would be no worse, possibly better, but the same cannot be said of WHERE.
From the answers here, it seems that many folk don't realize that a HAVING clause may be used without a GROUP BY clause. In this case, the HAVING clause is applied to the entire table expression and requires that only constants appear in the SELECT clause. Typically the HAVING clause will involve aggregates.
This is more useful than it sounds. For example, consider this query to test whether the name column is unique for all values in T:
SELECT 1 AS result
FROM T
HAVING COUNT( DISTINCT name ) = COUNT( name );
There are only two possible results: if the HAVING clause is true then the result with be a single row containing the value 1, otherwise the result will be the empty set.
The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions.
Check out this w3schools link for more information
Syntax:
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value
A query such as this:
SELECT column_name, COUNT( column_name ) AS column_name_tally
FROM table_name
WHERE column_name < 3
GROUP
BY column_name
HAVING COUNT( column_name ) >= 3;
...may be rewritten using a derived table (and omitting the HAVING) like this:
SELECT column_name, column_name_tally
FROM (
SELECT column_name, COUNT(column_name) AS column_name_tally
FROM table_name
WHERE column_name < 3
GROUP
BY column_name
) pointless_range_variable_required_here
WHERE column_name_tally >= 3;
The difference between the two is in the relationship to the GROUP BY clause:
WHERE comes before GROUP BY; SQL evaluates the WHERE clause before it groups records.
HAVING comes after GROUP BY; SQL evaluates HAVING after it groups records.
References
SQLite SELECT Statement Syntax/Railroad Diagram
Informix SELECT Statement Syntax/Railroad Diagram
HAVING is used when you are using an aggregate such as GROUP BY.
SELECT edc_country, COUNT(*)
FROM Ed_Centers
GROUP BY edc_country
HAVING COUNT(*) > 1
ORDER BY edc_country;
WHERE is applied as a limitation on the set returned by SQL; it uses SQL's built-in set oeprations and indexes and therefore is the fastest way to filter result sets. Always use WHERE whenever possible.
HAVING is necessary for some aggregate filters. It filters the query AFTER sql has retrieved, assembled, and sorted the results. Therefore, it is much slower than WHERE and should be avoided except in those situations that require it.
SQL Server will let you get away with using HAVING even when WHERE would be much faster. Don't do it.
WHERE clause does not work for aggregate functions
means : you should not use like this
bonus : table name
SELECT name
FROM bonus
GROUP BY name
WHERE sum(salary) > 200
HERE Instead of using WHERE clause you have to use HAVING..
without using GROUP BY clause, HAVING clause just works as WHERE clause
SELECT name
FROM bonus
GROUP BY name
HAVING sum(salary) > 200
Difference b/w WHERE and HAVING clause:
The main difference between WHERE and HAVING clause is, WHERE is used for row operations and HAVING is used for column operations.
Why we need HAVING clause?
As we know, aggregate functions can only be performed on columns, so we can not use aggregate functions in WHERE clause. Therefore, we use aggregate functions in HAVING clause.
One way to think of it is that the having clause is an additional filter to the where clause.
A WHERE clause is used filters records from a result. The filter occurs before any groupings are made. A HAVING clause is used to filter values from a group
In an Aggregate query, (Any query Where an aggregate function is used) Predicates in a where clause are evaluated before the aggregated intermediate result set is generated,
Predicates in a Having clause are applied to the aggregate result set AFTER it has been generated. That's why predicate conditions on aggregate values must be placed in Having clause, not in the Where clause, and why you can use aliases defined in the Select clause in a Having Clause, but not in a Where Clause.
I had a problem and found out another difference between WHERE and HAVING. It does not act the same way on indexed columns.
WHERE my_indexed_row = 123 will show rows and automatically perform a "ORDER ASC" on other indexed rows.
HAVING my_indexed_row = 123 shows everything from the oldest "inserted" row to the newest one, no ordering.
When GROUP BY is not used, the WHERE and HAVING clauses are essentially equivalent.
However, when GROUP BY is used:
The WHERE clause is used to filter records from a result. The
filtering occurs before any groupings are made.
The HAVING clause is used to filter values from a group (i.e., to
check conditions after aggregation into groups has been performed).
Resource from Here
From here.
the SQL standard requires that HAVING
must reference only columns in the
GROUP BY clause or columns used in
aggregate functions
as opposed to the WHERE clause which is applied to database rows
While working on a project, this was also my question. As stated above, the HAVING checks the condition on the query result already found. But WHERE is for checking condition while query runs.
Let me give an example to illustrate this. Suppose you have a database table like this.
usertable{ int userid, date datefield, int dailyincome }
Suppose, the following rows are in table:
1, 2011-05-20, 100
1, 2011-05-21, 50
1, 2011-05-30, 10
2, 2011-05-30, 10
2, 2011-05-20, 20
Now, we want to get the userids and sum(dailyincome) whose sum(dailyincome)>100
If we write:
SELECT userid, sum(dailyincome) FROM usertable WHERE
sum(dailyincome)>100 GROUP BY userid
This will be an error. The correct query would be:
SELECT userid, sum(dailyincome) FROM usertable GROUP BY userid HAVING
sum(dailyincome)>100
WHERE clause is used for comparing values in the base table, whereas the HAVING clause can be used for filtering the results of aggregate functions in the result set of the query
Click here!
When GROUP BY is not used, the WHERE and HAVING clauses are essentially equivalent.
However, when GROUP BY is used:
The WHERE clause is used to filter records from a result. The
filtering occurs before any groupings are made.
The HAVING clause is
used to filter values from a group (i.e., to check conditions after
aggregation into groups has been performed).
I use HAVING for constraining a query based on the results of an aggregate function. E.G. select * in blahblahblah group by SOMETHING having count(SOMETHING)>0