Oracle SQL - Comparing AVG functions in WHERE

Oracle SQL - Comparing AVG functions in WHERE - sql

I'm trying to write a few Oracle SQL scripts for an assignment. I've managed to get all of it to work, except for one part. To summarize, I have to display data from 2 tables if the average of 1 column in table A is greater than the average of another column in table B. I realize you cannot include AVG functions in a WHERE clause or HAVING clause since it seems unable to properly access the data (from what I've read). When I exclude this clause, the script executes properly, so I'm confident there are no other errors.
I've tried writing it as follows but the error I get is ORA-00936: missing expression and it is just before the > sign. I thought this may be due to improper bracket placing but none of my attempts resolved this. Here is my attempt:
SELECT l.l_category, SUM(r.r_sold), AVG(l.l_cost)
FROM promos l
INNER JOIN sales r
ON r.promo_id = l.promo_id
GROUP BY l.l_category
HAVING (SELECT AVG(l.l_cost) OVER (PARTITION BY l.l_cost)) >
(SELECT AVG(r.r_sold) OVER (PARTITION BY r.r_sold));
I tried doing this without the OVER (PARTITION BY ...) as well as putting it into a WHERE clause but it didn't resolve the error. I'm pretty sure I need to put it into a SELECT statement somehow but I'm at a loss.

You do not need to use the OVER clause when applying the aggregate functions in the HAVING clause. Just use the aggregate functions on their own.
SELECT l.l_category, SUM(r.r_sold), AVG(l.l_cost)
FROM promos l
INNER JOIN sales r
ON r.promo_id = l.promo_id
GROUP BY l.l_category
HAVING HAVING AVG(l.l_cost) > AVG(r.r_sold)

Related

SQLite alias (AS) not working in the same query

I'm stuck in an (apparently) extremely trivial task that I can't make work , and I really feel no chance than to ask for advice.
I used to deal with PHP/MySQL more than 10 years ago and I might be quite rusty now that I'm dealing with an SQLite DB using Qt5.
Basically I'm selecting some records while wanting to make some math operations on the fetched columns. I recall (and re-read some documentation and examples) that the keyword "AS" is going to conveniently rename (alias) a value.
So for example I have this query, where "X" is an integer number that I render into this big Qt string before executing it with a QSqlQuery. This query lets me select all the electronic components used in a Project and calculate how many of them to order (rounding to the nearest multiple of 5) and the total price per component.
SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name, Inventory.category,
Inventory.type, Inventory.package, Inventory.value, Inventory.manufacturer,
Inventory.price, UsedItems.qty_used as used_qty,
UsedItems.qty_used*X AS To_Order,
ROUND((UsedItems.qty_used*X/5)+0.5)*5*CAST((X > 0) AS INT) AS Nearest5,
Inventory.price*Nearest5 AS TotPrice
FROM Inventory
LEFT JOIN UsedItems ON Inventory.id=UsedItems.cid
WHERE UsedItems.pid='1'
ORDER BY RefDes, value ASC
So, for example, I aliased UsedItems.qty_used as used_qty. At first I tried to use it in the next field, multiplying it by X, writing "used_qty*X AS To_Order" ... Query failed. Well, no worries, I had just put the original tab.field name and it worked.
Going further, I have a complex calculation and I want to use its result on the next field, but the same issue popped out: if I alias "ROUND(...)" AS Nearest5, and then try to use this value by multiplying it in the next field, the query will fail.
Please note: the query WORKS, but ONLY if I don't use aliases in the following fields, namely if I don't use the alias Nearest5 in the TotPrice field. I just want to avoid re-writing the whole ROUND(...) thing for the TotPrice field.
What am I missing/doing wrong? Either SQLite does not support aliases on the same query or I am using a wrong syntax and I am just too stuck/confused to see the mistake (which I'm sure it has to be really stupid).

Column aliases defined in a SELECT cannot be used:
For other expressions in the same SELECT.
For filtering in the WHERE.
For conditions in the FROM clause.
Many databases also restrict their use in GROUP BY and HAVING.
All databases support them in ORDER BY.
This is how SQL works. The issue is two things:
The logic order of processing clauses in the query (i.e. how they are compiled). This affects the scoping of parameters.
The order of processing expressions in the SELECT. This is indeterminate. There is no requirement for the ordering of parameters.
For a simple example, what should x refer to in this example?
select x as a, y as x
from t
where x = 2;
By not allowing duplicates, SQL engines do not have to make a choice. The value is always t.x.

You can try with nested queries.
A SELECT query can be nested in another SELECT query within the FROM clause;
multiple queries can be nested, for example by following the following pattern:
SELECT *,[your last Expression] AS LastExp From (SELECT *,[your Middle Expression] AS MidExp FROM (SELECT *,[your first Expression] AS FirstExp FROM yourTables));
Obviously, respecting the order that the expressions of the innermost select query can be used by subsequent select queries:
the first expressions can be used by all other queries, but the other intermediate expressions can only be used by queries that are further upstream.
For your case, your query may be:
SELECT *, PRC*Nearest5 AS TotPrice FROM (SELECT *, ROUND((UsedItems.qty_used*X/5)+0.5)*5*CAST((X > 0) AS INT) AS Nearest5 FROM (SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name, Inventory.category, Inventory.type, Inventory.package, Inventory.value, Inventory.manufacturer, Inventory.price AS PRC, UsedItems.qty_used*X AS To_Order FROM Inventory LEFT JOIN UsedItems ON Inventory.id=UsedItems.cid WHERE UsedItems.pid='1' ORDER BY RefDes, value ASC))

SQL ORDER BY clause causing GROUP BY/aggregate error

I get this error:
PG::GroupingError: ERROR: column "relationships.created_at" must appear in the GROUP BY clause or be used in an aggregate function
from this query:
last_check = #user.last_check.to_i
#new_relationships = User.select('*')
.from("(#{#rels_unordered.to_sql}) AS rels_unordered")
.joins("
INNER JOIN relationships
ON rels_unordered.id = relationships.character_id
WHERE EXTRACT(EPOCH FROM relationships.created_at) > #{last_check}
ORDER BY relationships.created_at DESC
")
Without the ORDER BY line, it works fine. I don't understand what the GROUP BY clause is. How do I get it working and still order by relationships.created_at?
EDIT
I understand you can GROUP BY relationships.created_at. But isn't grouping unnecessary? Is the problem that relationship.created_at is not included in the SELECT? How do you include it? If you've already done an INNER JOIN with relationships, why the hell isn't relationships.created_at included in the result??
I've just realised this is all happening because the logs show the query begins with SELECT COUNT(*) FROM..... So the COUNT is the aggregate function. But I never requested a COUNT! Why does the query start with that?
EDIT 2
Ok, this seems to be happening because of lazy querying. The first thing that happens to #new_relationships is #new_relationships.any? This affects the query and turns it into a count. So I suppose the question is, how do I force the query to run as originally intended? And also to check if #new_relationships is empty without affecting the sql query?

You just need to add group by along with your order by clause
last_check = #user.last_check.to_i
#new_relationships =
User.select('"rels_unordered".*')
.from("(#{#rels_unordered.to_sql}) AS rels_unordered")
.joins("INNER JOIN relationships
ON rels_unordered.id = relationships.character_id
WHERE EXTRACT(EPOCH FROM relationships.created_at) > #{last_check}
GROUP BY relationships.created_at
ORDER BY relationships.created_at DESC ")

It's asking for a GROUP BY after your FROM clause in your EXTRACT (essentially a subselect). There's ways around it, but I've found it's often easier to make a GROUP BY work. Try: ...FROM relationships.created_at GROUP BY id or whatever indexing column you are using from that table. It seems like your ORDER BY is conflicting with itself. By grouping the subselect data it should lose its conflict.

SqlSyntaxErrorException in DB2 query

I am trying to execute the following db2 query, but I'm getting this error:
SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-119, SQLSTATE=42803, SQLERRMC=ENTITLEMENT
The query is:
SELECT *
FROM reclaimbalance rb
,user_benefit_accrued_period ubap
WHERE rb.user_id = ubap.user_id
AND rb.component_id = ubap.PAY_HEAD_ID
AND ubap.CUSTOMER_ID = 281
AND rb.year = '2016-2017'
AND ubap.STATUS = 1
GROUP BY ubap.user_id
,ubap.PAY_HEAD_ID
HAVING sum(ubap.STD_BALANCE_ADDED_IN_PERIOD) != rb.ENTITLEMENT

One problem is SELECT *. I would expect the error to be a bit different from just a generic syntax error, though.
Also, you should learn to use explicit JOIN syntax. And, your HAVING clause has an unaggregated column.
I assume you want something like this:
SELECT ubap.user_id, ubap.PAY_HEAD_ID, rb.ENTITLEMENT,
sum(ubap.STD_BALANCE_ADDED_IN_PERIOD)
FROM reclaimbalance rb JOIN
user_benefit_accrued_period ubap
ON rb.user_id = ubap.user_id AND rb.component_id = ubap.PAY_HEAD_ID
WHERE ubap.CUSTOMER_ID = 281 AND rb.year = '2016-2017' AND ubap.STATUS = 1
GROUP BY ubap.user_id, ubap.PAY_HEAD_ID, rb.ENTITLEMENT
HAVING sum(ubap.STD_BALANCE_ADDED_IN_PERIOD) <> rb.ENTITLEMENT;

The problem diagnosed by the sqlcode=-119 is that the column "ENTITLEMENT" specified in the HAVING clause is neither coded on that HAVING clause within an aggregate function nor is the column "ENTITLEMENT" specified in the GROUP BY clause.
Recovery is by either removing the non-aggregate reference to the column "ENTITLEMENT" from the HAVING clause, changing the reference to the column "ENTITLEMENT" to be inside of an aggregate function, or adding the column "ENTITLEMENT" to the GROUP BY clause.
Noting however, that despite being valid recovery, the effect may not be what is required. And even after that sqlcode=-119 problem is resolved according to any one of the possible recovery actions noted above, almost surely the next problem will be for a sqlcode-=-122 suggesting that some columns or non-aggregate\non-[effective-]constant expressions included in the SELECT-list are not also named on the GROUP BY clause. FWiW: Despite allusion(s) otherwise, the SELECT * can be compatible with GROUP BY, but just a special-case; as the rules state, the selected column [implicitly selected per the asterisk] must have been specified also, on the GROUP BY.

Use of the HAVING clause when using muliple sums

I was having a problem getting mulitple sums from multiple tables. Short story, my answer was solved in the "sql sum data from multiple tables" thread on this site. But where it came up short, is that now I'd like to only show sums that are greater than a certain amount. So while I have sub-selects in my select, I think I need to use a HAVING clause to filter the summed amounts that are too low.
Example, using the code specified in the link above (more specifically the answer that the owner has chosen as correct), I would only like to see a query result if SUM(AP2.Value) > 1500. Any thoughts?

If you need to filter on the results of ANY aggregate function, you MUST use a HAVING clause. WHERE is applied at the row level as the DB scans the tables for matching things. HAVING is applied basically immediately before the result set is sent out to the client. At the time WHERE operates, the aggregate function results are not (and cannot) be available, so you have to use a HAVING clause, which is applied after the main query is complete and all aggregate results are available.
So... long story short, yes, you'll need to do
SELECT ...
FROM ...
WHERE ...
HAVING (SUM_AP > 1500)
Note that you can use column aliases in the having clause. In technical terms, having on a query as above works basically exactly the same as wrapping the initial query in another query and applying another WHERE clause on the wrapper:
SELECT *
FROM (
SELECT ...
) AS child
WHERE (SUM_AP > 1500)

You could wrap that query as a subselect and then specify your criteria in the WHERE clause:
SELECT
PROJECT,
SUM_AP,
SUM_INV
FROM (
SELECT
AP1.[PROJECT],
(SELECT SUM(AP2.Value) FROM AP AS AP2 WHERE AP2.PROJECT = AP1.PROJECT) AS SUM_AP,
(SELECT SUM(INV2.Value) FROM INV AS INV2 WHERE INV2.PROJECT = AP1.PROJECT) AS SUM_INV
FROM AP AS AP1
INNER JOIN INV AS INV1 ON
AP1.[PROJECT] = INV1.[PROJECT]
WHERE
AP1.[PROJECT] = 'XXXXX'
GROUP BY
AP1.[PROJECT]
) SQ
WHERE
SQ.SUM_AP > 1500

Group by SQL statement

So I got this statement, which works fine:
SELECT MAX(patient_history_date_bio) AS med_date, medication_name
FROM biological
WHERE patient_id = 12)
GROUP BY medication_name
But, I would like to have the corresponding medication_dose also. So I type this up
SELECT MAX(patient_history_date_bio) AS med_date, medication_name, medication_dose
FROM biological
WHERE (patient_id = 12)
GROUP BY medication_name
But, it gives me an error saying:
"coumn 'biological.medication_dose' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.".
So I try adding medication_dose to the GROUP BY clause, but then it gives me extra rows that I don't want.
I would like to get the latest row for each medication in my table. (The latest row is determined by the max function, getting the latest date).
How do I fix this problem?

Use:
SELECT b.medication_name,
b.patient_history_date_bio AS med_date,
b.medication_dose
FROM BIOLOGICAL b
JOIN (SELECT y.medication_name,
MAX(y.patient_history_date_bio) AS max_date
FROM BIOLOGICAL y
GROUP BY y.medication_name) x ON x.medication_name = b.medication_name
AND x.max_date = b.patient_history_date_bio
WHERE b.patient_id = ?

If you really have to, as one quick workaround, you can apply an aggregate function to your medication_dose such as MAX(medication_dose).
However note that this is normally an indication that you are either building the query incorrectly, or that you need to refactor/normalize your database schema. In your case, it looks like you are tackling the query incorrectly. The correct approach should the one suggested by OMG Poinies in another answer.
You may be interested in checking out the following interesting article which describes the reasons behind this error:
But WHY Must That Column Be Contained in an Aggregate Function or the GROUP BY clause?

You need to put max(medication_dose) in your select. Group by returns a result set that contains distinct values for fields in your group by clause, so apparently you have multiple records that have the same medication_name, but different doses, so you are getting two results.
By putting in max(medication_dose) it will return the maximum dose value for each medication_name. You can use any aggregate function on dose (max, min, avg, sum, etc.)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas