Difference between HAVING and WHERE in SQL

I've seen in other questions that the difference between HAVING and WHERE in SQL is that HAVING is used post-aggregation whereas WHERE is used pre-aggregation. However, I am still unsure about when to use pre-aggregation filtering or post-aggregation filtering.
As a concrete example, why don't these two queries yield the same result (the second sums quantity prematurely in a way that squashes the GROUP BY call)?
Using WHERE to obtain number of condo sales of each real estate agent.
SELECT agentId, SUM(quantity) total_sales
FROM sales s, houses h
WHERE s.houseId = h.houseId AND h.type = "condo"
GROUP BY agentId
ORDER BY total_sales;
Attempted use of HAVING to obtain the same quantity as above.
SELECT agentId, SUM(quantity) total_sales
FROM sales s, houses h
GROUP BY agentId
HAVING s.houseId = h.houseId AND h.type = "condo"
ORDER BY total_sales;
Note: these were written/tested/executed in sqlite3.

The simple way to think about it is to consider the order in which the steps are applied.
Step 1: Where clause filters data
Step 2: Group by is implemented (SUM / MAX / MIN / ETC)
Step 3: Having clause filters the results
So in your 2 examples:
SELECT agentId, SUM(quantity) total_sales
FROM sales s, houses h
WHERE s.houseId = h.houseId AND h.type = "condo"
GROUP BY agentId
ORDER BY total_sales;
Step 1: Filter by HouseId and Condo
Step 2: Add up the results
(number of houses that match the houseid and condo)
SELECT agentId, SUM(quantity) total_sales
FROM sales s, houses h
GROUP BY agentId
HAVING s.houseId = h.houseId AND h.type = "condo"
ORDER BY total_sales;
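Step 1: No filter, and no join condition either, so every sale is paired with every house
Step 2: Add up the quantity across all of those pairs
Step 3: Filter the results by houseid and condo, which is too late because the individual rows have already been collapsed into groups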
Hopefully this clears up what is happening.
The easiest way to decide which you should use is:
- Use WHERE to filter the data
- Use HAVING to filter the results of an aggregation (SUM / MAX / MIN / ETC)
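To make the order of the steps concrete, here is a minimal sketch using Python's built-in sqlite3 module (the question mentions sqlite3). The sample rows are made up for illustration; only the table and column names come from the question. WHERE removes the non-condo rows first, and HAVING then filters on the summed value.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE houses (houseId INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE sales (agentId INTEGER, houseId INTEGER, quantity INTEGER);
    INSERT INTO houses VALUES (1, 'condo'), (2, 'condo'), (3, 'villa');
    INSERT INTO sales VALUES (10, 1, 2), (10, 3, 5), (20, 1, 1), (20, 2, 4);
""")

# Steps 1 and 2: WHERE drops the villa sale, then SUM runs per agent.
condo_totals = conn.execute("""
    SELECT s.agentId, SUM(s.quantity) AS total_sales
    FROM sales s JOIN houses h ON s.houseId = h.houseId
    WHERE h.type = 'condo'
    GROUP BY s.agentId
    ORDER BY total_sales
""").fetchall()

# Step 3: HAVING keeps only the groups whose aggregate passes the test.
big_condo_totals = conn.execute("""
    SELECT s.agentId, SUM(s.quantity) AS total_sales
    FROM sales s JOIN houses h ON s.houseId = h.houseId
    WHERE h.type = 'condo'
    GROUP BY s.agentId
    HAVING SUM(s.quantity) >= 3
    ORDER BY total_sales
""").fetchall()

print(condo_totals)      # one row per agent, condos only
print(big_condo_totals)  # only agents with at least 3 condo sales
```

The threshold of 3 is arbitrary; the point is that the condition on h.type belongs in WHERE, while the condition on SUM(...) can only go in HAVING.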

WHERE filters rows from the database. Then, if the query has aggregation, the aggregation is run based on the aggregate functions and the GROUP BY clause in the query. After that point, HAVING is applied to filter the grouped results. The only filtering that HAVING allows is on GROUP BY columns or calculated aggregates.
I must assume that you're using MySQL for your example query since, as other answers have noted, your HAVING clause doesn't make sense and MySQL has some default behaviors which are occasionally problematic and confusing.

First, learn to use proper, explicit, standard JOIN syntax.
Second, your query should look like:
SELECT s.agentId, SUM(s.quantity) as total_sales
FROM sales s JOIN
houses h
ON s.houseId = h.houseId
WHERE h.type = 'condo'
GROUP BY s.agentId
ORDER BY total_sales;
Your version of the query should generate an error in any reasonable database, because the HAVING clause has columns that are neither GROUP BY keys nor aggregation functions.
Additional notes:
- The delimiter for a string is single quotes. If you use double quotes, things may not work as you expect.
- You should qualify all column references, especially when your query references more than one table.
- JOIN conditions belong in the ON clause, not in a WHERE clause.
- Filtering on h.type after the aggregation makes no sense. If it did work, the sum() would include non-condos because the filtering is happening too late.

Related

Subtotal levels for ROLLUP on four columns in Oracle SQL

I'm trying to construct an Oracle SQL query using the ROLLUP operator. The query needs to summarize sales by year, quarter, store state, and store city over two years, with the result including subtotals for the two hierarchical dimensions of year/quarter and state/city. Here's my attempt thus far:
SELECT storestate, storecity, calyear, calquarter, SUM(sales) AS Sales
FROM store_dim, time_dim, sales_fact
WHERE sales_fact.storeid = store_dim.storeid
AND sales_fact.timenum = time_dim.timenum
AND (calyear BETWEEN 2011 AND 2012)
GROUP BY ROLLUP(calyear, calquarter, storestate, storecity);
I'm trying to figure out if, as it's currently written, the query is showing subtotals for the two hierarchies I'm looking for, rather than treating them as one big one. Attempting to map out the subtotal levels by hand didn't help, and I haven't been able to find any examples of a single ROLLUP with four columns from two dimensions, or an example of two ROLLUP operators in a single GROUP BY clause, like below:
GROUP BY ROLLUP(calyear, calquarter), ROLLUP(storestate, storecity)
A breakdown of the subtotal levels produced by the two GROUP BY clauses would be hugely helpful.
Edit: I'm specifically required to use ROLLUP here. GROUPING SETS would generally be the first choice for this kind of query otherwise.
Use grouping sets . . . and proper, explicit, standard join syntax:
select s.storestate, s.storecity, t.calyear, t.calquarter,
sum(sf.sales) AS Sales
from sales_fact sf join
store_dim s
on s.storeid = sf.storeid join
time_dim t
on sf.timenum = t.timenum
where calyear between 2011 and 2012
group by grouping sets ( (calyear, calquarter, storestate, storecity),
(calyear, calquarter), (storestate, storecity)
);
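For the breakdown of subtotal levels the question asks about, the grouping sets can be enumerated by hand. The sketch below is plain Python, not SQL: rollup and combine are hypothetical helper names that model the standard rules, namely that ROLLUP(a, b, ...) yields every prefix of its column list down to the grand total, and that several ROLLUPs listed in one GROUP BY are cross-multiplied.

```python
from itertools import product

def rollup(*cols):
    """Grouping sets of ROLLUP(cols): every prefix, longest first, down to ()."""
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]

def combine(*rollups):
    """Several ROLLUPs in one GROUP BY: cross product of their grouping sets."""
    return [sum(parts, ()) for parts in product(*rollups)]

# GROUP BY ROLLUP(calyear, calquarter, storestate, storecity)
single = rollup("calyear", "calquarter", "storestate", "storecity")

# GROUP BY ROLLUP(calyear, calquarter), ROLLUP(storestate, storecity)
double = combine(
    rollup("calyear", "calquarter"),
    rollup("storestate", "storecity"),
)

for name, sets in (("single ROLLUP", single), ("two ROLLUPs", double)):
    print(name, len(sets))
    for grouping_set in sets:
        print("   ", grouping_set or "(grand total)")
```

Under these rules the single four-column ROLLUP treats the columns as one hierarchy and never produces a (storestate, storecity) subtotal on its own, while the two-ROLLUP form produces every combination of a time level with a store level.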

GROUP by ITEM number and the sum of Quantity to Stock

I am trying to get my head around how I can GROUP BY the item number and sum the qty to stock.
My current code is:
SELECT dbo.InventoryItems.ItemNo, dbo.InventoryItems.DescriptionMed AS Description
, SUM(dbo.InventoryItems.QtyToStock) AS QtytoStock, dbo.Locations.LocationCode
FROM dbo.InventoryItems INNER JOIN dbo.Locations
ON dbo.InventoryItems.LocationID = dbo.Locations.LocationID
GROUP BY dbo.InventoryItems.ItemNo, dbo.InventoryItems.QtyToStock
, dbo.InventoryItems.DescriptionMed, dbo.Locations.LocationCode
HAVING (dbo.InventoryItems.ItemNo LIKE 'CL10%')
and I am getting the below result:
But my expected output is below: CL1000 should appear in just two rows (one per location) with the quantities summed.
ItemNo  Description           QTYtoStock  LocationCode
CL1000  Square Seat Legs      4           CREST
CL1000  Square Seat Legs      93          DZ
CL1002  Square Low Back Sofa  5           DZ
Please help!
You clearly just need the right GROUP BY:
SELECT ii.ItemNo, ii.DescriptionMed AS Description,
SUM(ii.QtyToStock) AS QtytoStock, l.LocationCode
FROM dbo.InventoryItems ii INNER JOIN
dbo.Locations l
ON ii.LocationID = l.LocationID
WHERE ii.ItemNo LIKE 'CL10%'
GROUP BY ii.ItemNo, ii.DescriptionMed, l.LocationCode;
All the unaggregated columns (or expressions) should be in the GROUP BY. QtyToStock is being aggregated, so it does not belong there.
Further advice:
- Use table aliases. These should be abbreviations for the tables, so they are easy to follow.
- Qualify column names with the shortened aliases. Much, much easier to write and read.
- The HAVING clause is on a GROUP BY key. This is better handled (usually) using WHERE. The WHERE will reduce the number of rows that need to be aggregated, which is usually a performance win.
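The effect of the stray GROUP BY column can be reproduced in a small sketch with Python's sqlite3 module. The table is a simplified, hypothetical stand-in (the location code is stored directly on the item row instead of being joined in), and the sample quantities are invented so that the DZ rows add up to 93.

```python
import sqlite3

# Simplified, hypothetical version of the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InventoryItems (ItemNo TEXT, QtyToStock INTEGER, LocationCode TEXT);
    INSERT INTO InventoryItems VALUES
        ('CL1000', 4, 'CREST'),
        ('CL1000', 90, 'DZ'),
        ('CL1000', 3, 'DZ'),
        ('CL1002', 5, 'DZ');
""")

# Grouping by the column being summed: each distinct quantity is its own group.
too_fine = conn.execute("""
    SELECT ItemNo, SUM(QtyToStock) AS TotalQty, LocationCode
    FROM InventoryItems
    GROUP BY ItemNo, QtyToStock, LocationCode
    ORDER BY ItemNo, LocationCode, TotalQty
""").fetchall()

# Grouping only by the unaggregated columns: one row per item and location.
correct = conn.execute("""
    SELECT ItemNo, SUM(QtyToStock) AS TotalQty, LocationCode
    FROM InventoryItems
    WHERE ItemNo LIKE 'CL10%'
    GROUP BY ItemNo, LocationCode
    ORDER BY ItemNo, LocationCode
""").fetchall()

print(too_fine)
print(correct)
```

With QtyToStock in the GROUP BY, the two DZ rows for CL1000 stay separate; without it they collapse into a single summed row.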

SQL Work out average from joined column

I have 3 columns I need to display and I need to join on another column that calculates the AVG from the CLUB_FEE column. My code does not work; it throws a "not a single-group group function" error. Can someone please help? Here is my SQL:
SELECT S.MEMBER_ID, S.CLUB_ID, C.CLUB_FEE, AVG(C.CLUB_FEE) AVGINCOME
FROM SUBSCRIPTION S, CLUB C
WHERE S.CLUB_ID = C.CLUB_ID;
I suggest using an INNER JOIN. Also, when you include an aggregate function (like AVG or SUM) alongside other columns in your query, you must group by all of the non-aggregated columns:
SELECT S.MEMBER_ID, S.CLUB_ID, C.CLUB_FEE, AVG(C.CLUB_FEE) as AVGINCOME
FROM SUBSCRIPTION S INNER JOIN CLUB C
ON S.CLUB_ID = C.CLUB_ID
GROUP BY
S.MEMBER_ID, S.CLUB_ID, C.CLUB_FEE ;
Learn to use explicit JOIN syntax. Simple rule: Never use commas in the FROM clause. Always use explicit JOIN syntax.
In your case, you need to remove columns from the SELECT and the GROUP BY. If you want the average fee paid by any member, then you don't need the GROUP BY at all:
SELECT AVG(C.CLUB_FEE) as AVGINCOME
FROM SUBSCRIPTION S JOIN
CLUB C
ON S.CLUB_ID = C.CLUB_ID;
If you want to control the formatting, either use to_char():
SELECT TO_CHAR(AVG(C.CLUB_FEE), '999.99') as AVGINCOME
(check the documentation for other formats).
Or, cast to a decimal:
SELECT CAST(AVG(C.CLUB_FEE) AS DECIMAL(10, 2)) as AVGINCOME
If you need to display the three columns and the average, not just the average alone, you can do something like this:
SELECT S.MEMBER_ID, S.CLUB_ID, C.CLUB_FEE, A.AVGINCOME
FROM SUBSCRIPTION S INNER JOIN CLUB C
ON S.CLUB_ID = C.CLUB_ID
CROSS JOIN (SELECT AVG(CLUB_FEE) AS AVGINCOME FROM CLUB) A
;
If you need the average rounded to two decimal places, use ROUND(AVG(CLUB_FEE), 2) in the subquery.
A fancier solution, which doesn't require the extra join to a subquery (so it doesn't scan the CLUB table twice), uses AVG as an analytic function without partitioning by anything. The OVER clause is what indicates that AVG is used as an analytic function rather than as an aggregate; PARTITION BY NULL puts every row into a single partition.
SELECT S.MEMBER_ID, S.CLUB_ID, C.CLUB_FEE,
ROUND(AVG(C.CLUB_FEE) OVER (PARTITION BY NULL)) AS AVGINCOME
FROM SUBSCRIPTION S INNER JOIN CLUB C
ON S.CLUB_ID = C.CLUB_ID
;
Even fancier (although functionally identical) - the keyword OVER is needed to indicate analytic function, but you can also write it as OVER() (no need to even mention PARTITION BY NULL).
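Here is a rough sketch of both approaches side by side, run through Python's sqlite3 module rather than Oracle. It assumes the underlying SQLite library is version 3.25 or later (needed for window functions), and the sample rows are made up.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CLUB (CLUB_ID INTEGER PRIMARY KEY, CLUB_FEE REAL);
    CREATE TABLE SUBSCRIPTION (MEMBER_ID INTEGER, CLUB_ID INTEGER);
    INSERT INTO CLUB VALUES (1, 10.0), (2, 20.0);
    INSERT INTO SUBSCRIPTION VALUES (100, 1), (101, 1), (102, 2);
""")

# Analytic AVG with an empty OVER (): averages over every joined row.
windowed = conn.execute("""
    SELECT S.MEMBER_ID, S.CLUB_ID, C.CLUB_FEE,
           ROUND(AVG(C.CLUB_FEE) OVER (), 2) AS AVGINCOME
    FROM SUBSCRIPTION S INNER JOIN CLUB C ON S.CLUB_ID = C.CLUB_ID
    ORDER BY S.MEMBER_ID
""").fetchall()

# CROSS JOIN to a one-row subquery: averages over the CLUB table alone.
cross_joined = conn.execute("""
    SELECT S.MEMBER_ID, S.CLUB_ID, C.CLUB_FEE, A.AVGINCOME
    FROM SUBSCRIPTION S INNER JOIN CLUB C ON S.CLUB_ID = C.CLUB_ID
    CROSS JOIN (SELECT AVG(CLUB_FEE) AS AVGINCOME FROM CLUB) A
    ORDER BY S.MEMBER_ID
""").fetchall()

print(windowed)
print(cross_joined)
```

One thing worth checking against your own data: the windowed average is taken over the joined rows, so a club with more subscribers counts more often, whereas the subquery averages each club once. In this sample the two give different numbers, so pick the one that matches the average you actually want.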

SQL: How to use sum in group by

SELECT idteam,
job,
price,
COUNT('X') as INFORMS,
SUM(COUNT('X') * price) as TOTAL
FROM REP
JOIN COSTS ON (job = categ AND to_number(to_char(REP,'YYYY')) = year)
GROUP BY idteam, job, price, TOTAL
ORDER BY IDTEAM;
I don't know why SQL gives me an error (invalid identifier) when I write TOTAL in the GROUP BY.
I don't know how to resolve that.
Thanks.
The column "TOTAL" is an alias for SUM(COUNT('X') * price).
It cannot be used in the GROUP BY clause, because the alias is not yet known at the time of grouping. Writing out the expression there would not help either: an aggregate cannot be a grouping key, so simply remove TOTAL from the GROUP BY.
After grouping, some databases let you refer to "TOTAL" in a HAVING clause, but that depends on the type and version of SQL you are using.
Additionally, why are you COUNTing 'X'? That X is a fixed value, and does not depend on any of your columns. If you would like to count each row, just use Count(1) or Count(*). Also, you don't need to SUM a COUNT. A COUNT is already summed.
You should post the structure of both REP and COSTS. Your linked image doesn't have enough info to support the query you wrote.
select
idteam,
-- job, /* not selected since it would need to be grouped*/
sum(price) as theSUM
from REP
join COSTS
on REP.categ = COSTS.job
and COSTS.year = 2016
group by idteam
order by idteam
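If the goal is still one row per team, job and price with the number of reports and their total cost, one possible shape is COUNT(*) * price with only the plain columns in the GROUP BY. The sketch below runs it through Python's sqlite3 module; since the real structure of REP and COSTS was not posted, the columns here are guesses (the year condition is left out) and the rows are invented.

```python
import sqlite3

# Guessed, simplified table structures; the originals were not posted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE REP (idteam INTEGER, job TEXT);
    CREATE TABLE COSTS (categ TEXT, price INTEGER);
    INSERT INTO REP VALUES (1, 'A'), (1, 'A'), (1, 'B'), (2, 'A');
    INSERT INTO COSTS VALUES ('A', 10), ('B', 25);
""")

# Group only by the unaggregated columns; the total is derived from COUNT(*).
rows = conn.execute("""
    SELECT r.idteam, r.job, c.price,
           COUNT(*) AS informs,
           COUNT(*) * c.price AS total
    FROM REP r JOIN COSTS c ON r.job = c.categ
    GROUP BY r.idteam, r.job, c.price
    ORDER BY r.idteam, r.job
""").fetchall()

for row in rows:
    print(row)
```

Because price is a grouping key, it is constant within each group, so multiplying it by the count needs no extra SUM.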

Use of the HAVING clause when using multiple sums

I was having a problem getting multiple sums from multiple tables. Short story, my problem was solved in the "sql sum data from multiple tables" thread on this site. Where it came up short is that I'd now like to show only sums that are greater than a certain amount. So while I have sub-selects in my select, I think I need to use a HAVING clause to filter out the summed amounts that are too low.
Example, using the code specified in the link above (more specifically the answer that the owner has chosen as correct), I would only like to see a query result if SUM(AP2.Value) > 1500. Any thoughts?
If you need to filter on the results of ANY aggregate function, you MUST use a HAVING clause. WHERE is applied at the row level as the DB scans the tables for matching rows. HAVING is applied basically immediately before the result set is sent out to the client. At the time WHERE operates, the aggregate function results are not (and cannot be) available, so you have to use a HAVING clause, which is applied after the main query is complete and all aggregate results are available.
So... long story short, yes, you'll need to do
SELECT ...
FROM ...
WHERE ...
HAVING (SUM_AP > 1500)
Note that some databases (MySQL, for example) let you use column aliases in the HAVING clause; others require you to repeat the aggregate expression. In technical terms, HAVING on a query as above works basically the same as wrapping the initial query in another query and applying another WHERE clause on the wrapper:
SELECT *
FROM (
SELECT ...
) AS child
WHERE (SUM_AP > 1500)
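As a quick check of that equivalence, here is a small sketch using Python's sqlite3 module with a made-up AP table (the column names PROJECT and Value follow the question, the rows are invented). HAVING repeats the aggregate expression rather than the alias so that it does not rely on alias support.

```python
import sqlite3

# Hypothetical sample data for the AP table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AP (PROJECT TEXT, Value INTEGER);
    INSERT INTO AP VALUES ('P1', 1000), ('P1', 900), ('P2', 500), ('P2', 400);
""")

# Filter on the aggregate directly with HAVING.
with_having = conn.execute("""
    SELECT PROJECT, SUM(Value) AS SUM_AP
    FROM AP
    GROUP BY PROJECT
    HAVING SUM(Value) > 1500
""").fetchall()

# Same filter expressed as a WHERE on a wrapping query.
with_wrapper = conn.execute("""
    SELECT *
    FROM (
        SELECT PROJECT, SUM(Value) AS SUM_AP
        FROM AP
        GROUP BY PROJECT
    ) AS child
    WHERE SUM_AP > 1500
""").fetchall()

print(with_having)
print(with_wrapper)
```

Both forms return the same rows here; the wrapper is mainly useful when the summed column comes from a sub-select, as in the linked thread, where there is no aggregate in the outer query for HAVING to test.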
You could wrap that query as a subselect and then specify your criteria in the WHERE clause:
SELECT
PROJECT,
SUM_AP,
SUM_INV
FROM (
SELECT
AP1.[PROJECT],
(SELECT SUM(AP2.Value) FROM AP AS AP2 WHERE AP2.PROJECT = AP1.PROJECT) AS SUM_AP,
(SELECT SUM(INV2.Value) FROM INV AS INV2 WHERE INV2.PROJECT = AP1.PROJECT) AS SUM_INV
FROM AP AS AP1
INNER JOIN INV AS INV1 ON
AP1.[PROJECT] = INV1.[PROJECT]
WHERE
AP1.[PROJECT] = 'XXXXX'
GROUP BY
AP1.[PROJECT]
) SQ
WHERE
SQ.SUM_AP > 1500