I am trying to understand the difference between HAVING and WHERE. I understand that HAVING is used with GROUP BY statements. However, I cannot understand why the following statement is accepted:
select SUM(child_id) from children WHERE child_ID = 5 GROUP BY Child_ID
Shouldn't the correct statement be select SUM(child_id) from children GROUP BY Child_ID HAVING child_ID = 5 ?
WHERE clauses are executed before the grouping process has occurred, and only have access to fields in the input table. HAVING is performed after the grouping process occurs, and can filter results based on the aggregate values computed in the grouping process.
The WHERE clause can be used even if a HAVING is being used. They mean very different things. The way to think about it is as follows:
The WHERE clause acts as a filter at the record level
Anything that gets through is then put into groups specified by your GROUP BY
Then, the HAVING clause filters out groups, based on an aggregate (SUM, COUNT, MIN, etc.) condition
So, if I have a table : ( STORE_ID, STATE_CODE, SALES)
Select STATE_CODE, SUM(SALES)
from MyTable
Where SALES > 100
Group By STATE_CODE
Having Sum(SALES) > 1000
This will first filter to read only the Store records with Sales over 100. For each Group (by State) it will sum the Sales of only those stores with Sales over 100. Then, it will drop any State unless the State-level summation is more than 1000. [Note: The state summation excludes any store with sales of 100 or less.]
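The order of operations described above can be checked with a small sketch. It uses Python's built-in sqlite3 module only as a convenient way to run the SQL; the sample rows are invented for illustration, and the column names follow the table definition given above.

```python
import sqlite3

# Hypothetical sample data; only the column names come from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (STORE_ID INTEGER, STATE_CODE TEXT, SALES INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, "CA", 900),
        (2, "CA", 200),
        (3, "CA", 50),    # removed by WHERE (not over 100)
        (4, "NY", 600),
        (5, "NY", 100),   # removed by WHERE (not over 100)
        (6, "TX", 2000),
    ],
)

state_totals = conn.execute(
    """
    SELECT STATE_CODE, SUM(SALES)
    FROM MyTable
    WHERE SALES > 100            -- row-level filter, applied before grouping
    GROUP BY STATE_CODE
    HAVING SUM(SALES) > 1000     -- group-level filter, applied after grouping
    ORDER BY STATE_CODE
    """
).fetchall()
print(state_totals)
```

With this data, NY is dropped by HAVING because only one of its stores survives the WHERE filter and its remaining total is 600.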
How can I retrieve rows where BID comes up multiple times in AID
In the sample below, the AID and BID columns sit under the PrimaryID, and BIDs sit under AIDs. I want to come up with an output that keeps only the records where a BID has a one-to-many relationship with records in the AID column. Example output below.
I provided a small sample of data; the real query retrieves 20+ columns and joins 4 tables. I have unique PrimaryIDs, and under those I have multiple unique AIDs. Under these AIDs, however, I can have multiple non-unique BIDs that can come up repeatedly under different AIDs.
Hive supports window functions. A window function can associate every row in a group with an attribute of the group, and Count() is one of the supported functions. In your case you can use that and select the rows for which the count is > 1.
In the partition by clause you specify which columns define the group, the same way that you would in the more familiar group by clause.
Something like this:
select * from
(
Select *,
count(*) over (partition by primaryID,AID) counts
from mytable
) x
Where counts>1
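As a rough check of the pattern above, here is a sketch run through Python's sqlite3 module rather than Hive. It assumes the bundled SQLite is version 3.25 or later (the first release with window functions), and the rows in mytable are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (primaryID INTEGER, AID TEXT, BID TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "A1", "B1"),
        (1, "A1", "B2"),   # second row in group (1, A1)
        (1, "A2", "B1"),   # only row in group (1, A2)
        (2, "A3", "B3"),   # only row in group (2, A3)
    ],
)

# Every row is tagged with the size of its (primaryID, AID) group,
# then the outer query keeps rows whose group has more than one member.
multi_rows = conn.execute(
    """
    SELECT primaryID, AID, BID FROM (
        SELECT *, COUNT(*) OVER (PARTITION BY primaryID, AID) AS counts
        FROM mytable
    ) x
    WHERE counts > 1
    ORDER BY BID
    """
).fetchall()
print(multi_rows)
```

Changing the PARTITION BY columns changes what counts as a group, so the same shape can be adapted if the group should be defined by BID instead.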
I have a table that consists of an order, items on that order, and then the quantity of the item ordered.
What I would like to do is create an additional column for 'Order quantity' which is the sum of item quantities grouped by order (see graphic of table below... order B has 30 total quantity split across three lines)
I can easily accomplish this using sum and partition:
SUM(quantity) OVER (PARTITION BY order_id) order_qty
However, what I need to do is then filter to only those orders having quantity > 20. When I try to add that criterion to the WHERE or HAVING clauses, I get this error:
ORA-30483: window functions are not allowed here
One solution appears to be to wrap the whole SQL block inside of another SELECT FROM statement, and then add a WHERE clause to filter by order_qty. Overall that seems sloppy and non-intuitive... Is there a better solution to filter based on an aggregated value that is partitioned at a higher level?
Replace with
SUM(quantity) OVER (PARTITION BY order_id order by 1) order_qty
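For reference, the wrapping approach mentioned in the question (compute the windowed sum in an inner query, filter it in an outer one) can be sketched as follows. This runs on SQLite 3.25+ through Python's sqlite3 module rather than Oracle; the table name order_lines, the item column, and the rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_lines (order_id TEXT, item TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [
        ("A", "x", 5), ("A", "y", 10),                   # order A totals 15
        ("B", "x", 10), ("B", "y", 10), ("B", "z", 10),  # order B totals 30
        ("C", "x", 25),                                  # order C totals 25
    ],
)

# The window function is evaluated in the inner SELECT; by the time the
# outer WHERE runs, order_qty is an ordinary column and can be filtered.
big_orders = conn.execute(
    """
    SELECT order_id, item, quantity, order_qty
    FROM (
        SELECT order_id, item, quantity,
               SUM(quantity) OVER (PARTITION BY order_id) AS order_qty
        FROM order_lines
    ) t
    WHERE order_qty > 20
    ORDER BY order_id, item
    """
).fetchall()
print(big_orders)
```

The extra level of nesting is needed because window functions are computed after WHERE and HAVING have already been applied to the query they appear in.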
I currently have a single table with a large amount of data in Access; due to its size I could no longer work with it easily in Excel.
I'm partially there on a query to pull data from this table.
7 Column table
One column GL_GL_NUM contains a transaction number. ~ 75% of these numbers are pairs. I'm trying to pull the records (all columns information) for each unique transaction number in this column.
I have put together some code from googling that hypothetically should work, but I think I'm either missing something in the syntax or simply asking Access to do something it cannot.
See below:
SELECT SOURCE_FUND, GLType, Contract, Status, Debit, Credit, GL_GL_NUM
FROM Suspense
JOIN (
SELECT TC_TXN_NUM TXN_NUM, COUNT(GL_GL_NUM) GL_NUM
FROM Suspense
GROUP BY TC_TXN_NUM HAVING COUNT(GL_GL_NUM) > 1 ) SUB ON GL_GL_NUM = GL_NUM
Hey Beth is this the suggested code? It says there is a syntax error in the FROM clause. Thanks.
SELECT * from SuspenseGL
JOIN (
SELECT TC_TXN_NUM, COUNT(GL_GL_NUM) GL_NUM
FROM Suspense
GROUP BY TC_TXN_NUM
HAVING COUNT(GL_GL_NUM) > 1
Do you want detailed results (all rows and columns) or aggregate results, with one row per tx number?
If you want an aggregate result, like the count of distinct transaction numbers, then you need to apply one or more aggregate functions to any other columns you include.
If you run
SELECT TC_TXN_NUM, COUNT(GL_GL_NUM) GL_NUM
FROM Suspense
GROUP BY TC_TXN_NUM
HAVING COUNT(GL_GL_NUM) > 1
you'll get one row for each distinct txn, but if you then join those results back with your original table, you'll have the same number of rows as if you didn't join them with distinct txns at all.
Is there a column you don't want included in your results? If not, then the only query you need to work with is
select * from suspense
Considering your column names, what you may want is:
SELECT SOURCE_FUND, GLType, Contract, Status, sum(Debit) as sum_debit,
sum(Credit) as sum_credit, count(*) as txCount
FROM Suspense
group by
SOURCE_FUND, GLType, Contract, Status
Based on your comments, if you can't work with aggregate results, you need to work with all the rows:
Select * from suspense
What's not working? It doesn't matter if 75% of the txns are duplicates, you need to send out every column in every row.
OK, let's say
Select * from suspense
returns 8 rows, and
select GL_GL_NUM from suspense group by GL_GL_NUM
returns 5 rows, because 3 of them have duplicate GL_GL_NUMs and 2 of them don't.
How many rows do you want in your result set? If you want fewer than 8 rows back, you need to perform some sort of aggregate function on each column you want returned.
You could do something like the following:
SELECT S.* FROM
SUSPENSE AS S
INNER JOIN (SELECT GL_GL_NUM, MIN(ID) AS ID FROM SUSPENSE
GROUP BY GL_GL_NUM) AS S2
ON S.ID = S2.ID
AND S.GL_GL_NUM = S2.GL_GL_NUM
This would return a single row for each unique gl_gl_num. However, if the other rows with that number hold different data, it will not be shown. To keep that data you would instead have to aggregate it, using SUM(Credit) and SUM(Debit) and then GROUP BY the gl_gl_num.
I have attached a SQL Fiddle to demonstrate my results and make this clearer.
http://sqlfiddle.com/#!3/8284f/2
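The MIN(ID) join can also be tried locally. The sketch below uses Python's sqlite3 module instead of Access; it assumes the table has a unique ID column (as the query above does), and the 8 rows are invented to match the earlier "8 rows, 5 distinct numbers" example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Suspense (ID INTEGER PRIMARY KEY, GL_GL_NUM INTEGER, Debit REAL, Credit REAL)"
)
conn.executemany(
    "INSERT INTO Suspense VALUES (?, ?, ?, ?)",
    [
        (1, 100, 10.0, 0.0), (2, 100, 0.0, 10.0),   # pair
        (3, 200, 20.0, 0.0), (4, 200, 0.0, 20.0),   # pair
        (5, 300, 30.0, 0.0), (6, 300, 0.0, 30.0),   # pair
        (7, 400, 40.0, 0.0),                        # unpaired
        (8, 500, 0.0, 50.0),                        # unpaired
    ],
)

# The derived table picks the lowest ID per GL_GL_NUM; joining back on
# that ID returns the full row for one representative of each number.
first_rows = conn.execute(
    """
    SELECT S.ID, S.GL_GL_NUM
    FROM Suspense AS S
    INNER JOIN (SELECT GL_GL_NUM, MIN(ID) AS ID
                FROM Suspense
                GROUP BY GL_GL_NUM) AS S2
      ON S.ID = S2.ID AND S.GL_GL_NUM = S2.GL_GL_NUM
    ORDER BY S.ID
    """
).fetchall()
print(first_rows)
```

Note that the second row of each pair (IDs 2, 4 and 6 here) is silently discarded, which is the trade-off described above.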
Given scenario:
table fd
(cust_id, fd_id) primary-key and amount
table loan
(cust_id, l_id) primary-key and amount
I want to list all customers who have a fixed deposit with an amount less than the sum of all their loans.
Query:
SELECT cust_id
FROM fd
WHERE amount
<
(SELECT sum(amount)
FROM loan
WHERE fd.cust_id = loan.cust_id);
OR should we use
SELECT cust_id
FROM fd
WHERE amount
<
(SELECT sum(amount)
FROM loan
WHERE fd.cust_id = loan.cust_id group by cust_id);
A customer can have multiple loans but one FD is considered at a time.
GROUP BY can be omitted in this case, because the SELECT list contains only an aggregate function and all rows are guaranteed to belong to the same cust_id group (by the WHERE clause).
The aggregation will be over all rows with matching cust_id in both cases. So both queries are correct.
This would be another, cleaner way to implement the same thing:
SELECT fd.cust_id
FROM fd
JOIN loan USING (cust_id)
GROUP BY fd.cust_id, fd.amount
HAVING fd.amount < sum(loan.amount)
There is one difference: rows with identical (cust_id, amount) in fd only appear once in the result of my query, while they would appear multiple times in the original.
Either way, if there is no matching row with a non-null amount in table loan, you get no rows at all. I assume you are aware of that.
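A quick way to convince yourself that the variants agree is to run them on toy data. The sketch below uses Python's sqlite3 module; the rows are invented, and customer 3 has no loans so as to illustrate the "no rows at all" case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fd   (cust_id INTEGER, fd_id INTEGER, amount INTEGER, PRIMARY KEY (cust_id, fd_id));
    CREATE TABLE loan (cust_id INTEGER, l_id  INTEGER, amount INTEGER, PRIMARY KEY (cust_id, l_id));
    INSERT INTO fd   VALUES (1, 1, 100), (1, 2, 500), (2, 1, 50), (3, 1, 10);
    INSERT INTO loan VALUES (1, 1, 200), (1, 2, 100), (2, 1, 20);
    """
)

# Correlated subquery without GROUP BY.
without_group = conn.execute(
    """
    SELECT cust_id FROM fd
    WHERE amount < (SELECT SUM(amount) FROM loan WHERE fd.cust_id = loan.cust_id)
    """
).fetchall()

# Same subquery with GROUP BY; the WHERE already restricts it to one customer.
with_group = conn.execute(
    """
    SELECT cust_id FROM fd
    WHERE amount < (SELECT SUM(amount) FROM loan
                    WHERE fd.cust_id = loan.cust_id
                    GROUP BY loan.cust_id)
    """
).fetchall()

# Join plus HAVING, as in the alternative above.
join_having = conn.execute(
    """
    SELECT fd.cust_id
    FROM fd JOIN loan USING (cust_id)
    GROUP BY fd.cust_id, fd.amount
    HAVING fd.amount < SUM(loan.amount)
    """
).fetchall()
print(without_group, with_group, join_having)
```

Only customer 1's deposit of 100 is smaller than that customer's loan total of 300, so each query returns that single row; customer 3 never appears because the comparison against a missing loan total is not true.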
There is no need for GROUP BY since you have already filtered the data by cust_id. In either case the inner query will return the same result.
No, it isn't, because you calculate sum(amount) for customer with id = fd.cust_id, so for a single customer.
However, if your subquery somehow calculated the sum for more than one customer, the group by would cause the subquery to generate more than one row; the comparison (<) would then fail, and with it the whole query.
A query with an aggregate like sum but without a group by will output one group. The aggregates will be computed over all matching rows.
A subquery in a condition clause is only allowed to return one row. If the subquery returned multiple rows, what would the following expression mean?
where 1 > (... subquery ...)
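So the group by is best omitted; a grouped subquery that returned more than one row would raise an error. (In your second query the WHERE clause restricts the subquery to a single cust_id, so it still yields at most one row.)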
N.B. When you specify all, any, or some a subquery can return multiple rows:
where 1 > ALL (... subquery ...)
But it's easy to see why that doesn't make sense in your case; you'd compare one customer's data to that of another.
I am running the following queries against a database:
CREATE TEMPORARY TABLE med_error_third_party_tmp
SELECT `med_error_category`.description AS category, `med_error_third_party_category`.error_count AS error_count
FROM
`med_error_category` INNER JOIN `med_error_third_party_category` ON med_error_category.`id` = `med_error_third_party_category`.`category`
WHERE
year = 2003
GROUP BY `med_error_category`.id;
The only problem is that when I create the temporary table and do a select * on it, it returns multiple rows, but the percentage query below returns only one row. It seems to always return a single row unless I specify a GROUP BY, but then it returns a percentage of 1.0, as it should with a GROUP BY.
SELECT category,
error_count/SUM(error_count) AS percentage
FROM med_error_third_party_tmp;
Here are the server specs:
Server version: 5.0.77
Protocol version: 10
Server: Localhost via UNIX socket
Does anybody see what is causing this?
When a query uses aggregate functions (e.g. MIN, MAX, COUNT, SUM, AVG), standard SQL requires every selected column that is not wrapped in an aggregate to appear in a GROUP BY clause, but MySQL supports "hidden columns in the GROUP BY" -- which is why:
SELECT category,
error_count/SUM(error_count) AS percentage
FROM med_error_third_party_tmp;
...runs without error. The problem with the functionality is that because there's no GROUP BY, the SUM is the SUM of the error_count column for the entire table. But the other column values are completely arbitrary - they can't be relied upon.
This:
SELECT category,
error_count/(SELECT SUM(error_count)
FROM med_error_third_party_tmp) AS percentage
FROM med_error_third_party_tmp;
...will give you a percentage on a per row basis -- category values will be duplicated because there's no grouping.
This:
SELECT category,
SUM(error_count)/x.total AS percentage
FROM med_error_third_party_tmp
JOIN (SELECT SUM(error_count) AS total
FROM med_error_third_party_tmp) x
GROUP BY category
...will give you a percentage per category: the sum of that category's error_count values divided by the sum of the error_count values for the entire table.
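The per-category version can be sketched on toy data as follows. This uses SQLite through Python's sqlite3 module rather than MySQL 5.0, and the rows are invented. One difference to be aware of: SQLite divides integers as integers, so the sketch multiplies by 1.0, which the MySQL query above does not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE med_error_third_party_tmp (category TEXT, error_count INTEGER)")
conn.executemany(
    "INSERT INTO med_error_third_party_tmp VALUES (?, ?)",
    [("dose", 2), ("dose", 2), ("label", 4), ("route", 8)],  # grand total: 16
)

# The one-row derived table x carries the grand total; joining it without
# a condition attaches that total to every row before grouping.
percentages = conn.execute(
    """
    SELECT category,
           1.0 * SUM(error_count) / x.total AS percentage  -- 1.0 * is SQLite-specific
    FROM med_error_third_party_tmp
    JOIN (SELECT SUM(error_count) AS total
          FROM med_error_third_party_tmp) x
    GROUP BY category
    ORDER BY category
    """
).fetchall()
print(percentages)
```

The fractions sum to 1.0 across categories, which is a quick sanity check that the grand total and the per-category sums come from the same rows.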
Another way to do it - without the temp table as a separate item...
select category, error_count/sum(error_count) "Percentage"
from (SELECT mec.description category
, metpc.error_count
FROM med_error_category mec
, med_error_third_party_category metpc
WHERE mec.id = metpc.category
AND year = 2003
GROUP BY mec.id
) t;  -- MySQL requires an alias on every derived table
I think you will notice that the percentage is unchanging over the categories. This is probably not what you want - you probably want to group the errors by category as well.