MS Access HW Query Missing Obvious Error Somewhere? - sql

I'm getting an error stating that I have not specified OrdID's table:
Here's what I have:
SELECT Last, OrderLine.OrdID, OrdDate, SUM(Price*Qty)
FROM ((Cus INNER JOIN Orders ON Cus.CID=Orders.CID)
INNER JOIN OrderLine
ON Orders.OrdID=OrderLine.OrdID)
INNER JOIN ProdFabric
ON OrderLine.PrID=ProdFabric.PrID
AND OrderLine.Fabric=ProdFabric.Fabric
GROUP BY Last
ORDER BY Last DESC, OrdID DESC;
When I hit run, it keeps saying that OrdID could refer to more than one table listed in the FROM clause.
Why does it keep saying that when I've specified which table to select OrdID from?
Tables:
Cus (**CID**, Last, First, Phone)
Orders (**OrdID**, OrdDate, ShipDate, CID)
Manu (**ManuID**, Name, Phone, City)
Prods (**PrID**, ManuID, Category)
ProdFabric (**PrID**, **Fabric**, Price)
Orderline (**OrdId**, **PrID**, Fabric, Qty)

Your ORDER BY clause is valid SQL-92 syntax.
Sadly, the Access database engine is not SQL-92 compliant. It doesn't let you use a column correlation name ('alias') from the SELECT clause in the ORDER BY clause. If you had used this:
SUM(Price * Qty) AS total_price
...
ORDER BY total_price
you would have received an error. (Aside: you should consider giving this expression a column correlation name anyhow.)
Instead of correlation names, the Access data engine is expecting either a column name or an expression (the latter being illegal in SQL-92); specified columns need not appear in the SELECT clause (again illegal in SQL-92). Because any column from any table in the FROM clause can be used, you need to disambiguate them with the table name; if you used a table correlation name in the FROM clause then you must use it in the ORDER BY clause (I don't make the rules!)
To satisfy the Access database engine's requirements, I think you need to change your ORDER BY clause to this:
ORDER BY Last DESC, OrderLine.OrdID DESC;
As an aside, I think your code would be more readable if you qualified your columns with table names in your SELECT clause even when they are unambiguous in context (I find using full table names a little wordy and prefer short table correlation names, specified in the data dictionary and used consistently in all queries). As it stands I can only guess that OrdDate is from Orders, and Price and Qty are from OrderLine. I've no idea what Last represents.

The ORDER BY clause is agnostic to what you've specified in the SELECT list. It's possible, for example, to order by a field that you don't actually include in the output select list.
Hence you need to be sure that the fields in the ORDER BY list are not ambiguous.
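To illustrate the point about ORDER BY, here is a minimal sketch that runs the SQL through Python's sqlite3 module. SQLite stands in for Access here, so it will not reproduce the Access error message; the two cut-down tables and their sample rows are assumptions made up for the demo. It only shows that a qualified column can drive the sort without appearing in the SELECT list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down, hypothetical versions of the question's tables.
    CREATE TABLE Orders (OrdID INTEGER PRIMARY KEY, OrdDate TEXT);
    CREATE TABLE OrderLine (OrdID INTEGER, PrID INTEGER, Qty INTEGER);
    INSERT INTO Orders VALUES (1, '2017-04-01'), (2, '2017-04-02');
    INSERT INTO OrderLine VALUES (1, 10, 3), (2, 11, 5), (2, 12, 1);
""")

# OrdID exists in both tables, so it is qualified with the table name.
# OrderLine.OrdID is not in the SELECT list but can still be sorted on.
rows = conn.execute("""
    SELECT Orders.OrdDate, OrderLine.Qty
    FROM Orders
    INNER JOIN OrderLine ON Orders.OrdID = OrderLine.OrdID
    ORDER BY OrderLine.OrdID DESC, OrderLine.Qty DESC
""").fetchall()

print(rows)
```

Qualifying every column this way costs a few keystrokes but removes any doubt about which table a name refers to.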

Related

SQL - Difference between .* and * in aggregate function query

SELECT reviews.*, COUNT(comments.review_id)
AS comment_count
FROM reviews
LEFT JOIN comments ON comments.review_id = reviews.review_id
GROUP BY reviews.review_id
ORDER BY reviews.review_id ASC;
When I run this code I get exactly what I want from my SQL query, however if I run the following
SELECT *, COUNT(comments.review_id)
AS comment_count
FROM reviews
LEFT JOIN comments ON comments.review_id = reviews.review_id
GROUP BY reviews.review_id
ORDER BY reviews.review_id ASC;
then I get an error: "column must appear in GROUP BY clause or be used in an aggregate function".
Just wondered what the difference was and why the behaviour is different.
Thanks
In the first example, the columns are taken only from the reviews table. Although not all databases allow the use of SELECT * with GROUP BY, it is allowed by Standard SQL, assuming that review_id is the primary key.
The issue is that you are including columns in the SELECT that are not included in the GROUP BY. This is only allowed -- in certain databases -- under very special circumstances, where the columns in the GROUP BY are declared to uniquely identify each row (which a primary key does).
The second example has columns from comments that do not meet this condition. Hence it is not allowed.
In the SELECT list of a query with GROUP BY, you can only choose columns that appear in the GROUP BY clause, plus aggregate expressions.
Since you grouped by reviews.review_id, you get output in the first case. In the second query you are trying to select every column from both tables, and that is not possible with GROUP BY.
You can use a window function if you need to select columns that are not present in your GROUP BY clause. Hope that makes sense.
https://www.windowfunctions.com/
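For readers who want a form that does not depend on the primary-key rule at all, here is a small sketch that lists every non-aggregated column in both SELECT and GROUP BY. It runs through Python's sqlite3 module as a stand-in engine (SQLite is more lenient than PostgreSQL, so it will not raise the error from the question), and the title and body columns are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns: only review_id comes from the question.
    CREATE TABLE reviews (review_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (comment_id INTEGER PRIMARY KEY,
                           review_id INTEGER, body TEXT);
    INSERT INTO reviews VALUES (1, 'first'), (2, 'second');
    INSERT INTO comments VALUES (1, 1, 'a'), (2, 1, 'b');
""")

# Every selected column is either a grouping column or inside COUNT().
# COUNT(comments.review_id) skips the NULLs produced by the LEFT JOIN,
# so a review without comments gets 0 rather than 1.
comment_counts = conn.execute("""
    SELECT reviews.review_id, reviews.title,
           COUNT(comments.review_id) AS comment_count
    FROM reviews
    LEFT JOIN comments ON comments.review_id = reviews.review_id
    GROUP BY reviews.review_id, reviews.title
    ORDER BY reviews.review_id ASC
""").fetchall()

print(comment_counts)  # [(1, 'first', 2), (2, 'second', 0)]
```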

Refer to aggregate result in Amazon Redshift query?

In other postgresql DBMSes (e.g., Netezza) I can do something like this without errors:
select store_id
,sum(sales) as total_sales
,count(distinct(txn_id)) as d_txns
,total_sales/d_txns as avg_basket
from my_tlog
group by 1
I.e., I can use aggregate values within the same SQL query that defined them.
However, when I go to do the same sort of thing on Amazon Redshift, I get the error "Column total_sales does not exist..." Which it doesn't, that's correct; it's not really a column. But is there a way to preserve this idiom, rather than restructuring the query? I ask because there would be a lot of code to change.
Thanks.
You simply need to repeat the expressions (or use a subquery or CTE):
select store_id,
sum(sales) as total_sales,
count(distinct txn_id) as d_txns,
sum(sales)/count(distinct txn_id) as avg_basket
from my_tlog
group by store_id;
Most databases do not support the re-use of column aliases in the select. The reason is twofold (at least):
The designers of the database engine do not want to specify the order of processing of expressions in the select.
There is ambiguity when a column alias is also a valid column in a table in the from clause.
Personally I love the construct in Netezza. It is compact and the syntax is not ambiguous: any 'duplicate' column name defaults to the (new) alias in the current query, and if you need to reference the column of the underlying table, simply put the table name in front of the column. The above example would become:
select store_id
,sum(sales) as sales ---- duplicate name
,count(distinct(txn_id)) as d_txns
,my_tlog.sales/d_txns as avg_basket --- this illustrates but may not make sense
from my_tlog
group by 1
I recently moved away from SQL Server, and on that database I used a construct like this to avoid repeating the expressions:
Select *, total_sales/d_txns as avg_basket
From (
select store_id
,sum(sales) as total_sales
,count(distinct(txn_id)) as d_txns
from my_tlog
group by store_id
)x
Most (if not all) databases will support this construct, and have done so for 10 years or more.
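The subquery/CTE route mentioned in the answers can be sketched like this. The SQL runs through Python's sqlite3 module rather than Redshift, and the sample rows are made up, so treat it as an illustration of the pattern and not a Redshift-tested query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_tlog (store_id INTEGER, txn_id INTEGER, sales REAL);
    -- Hypothetical data: store 1 has two transactions, store 2 has one.
    INSERT INTO my_tlog VALUES
        (1, 100, 10.0), (1, 100, 20.0), (1, 101, 30.0), (2, 200, 8.0);
""")

# The aggregates are named once inside the CTE; the outer query can then
# refer to total_sales and d_txns as ordinary columns.
rows = conn.execute("""
    WITH per_store AS (
        SELECT store_id,
               SUM(sales)             AS total_sales,
               COUNT(DISTINCT txn_id) AS d_txns
        FROM my_tlog
        GROUP BY store_id
    )
    SELECT store_id, total_sales, d_txns,
           total_sales / d_txns AS avg_basket
    FROM per_store
    ORDER BY store_id
""").fetchall()

print(rows)  # [(1, 60.0, 2, 30.0), (2, 8.0, 1, 8.0)]
```

If sales were an integer column, the division would need a cast in engines that do integer division.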

GROUP BY SQL statement Error

I got this statement
SELECT ITEM.ITEMID, ITEMNAME, QUANTITY AS "QUANTITY SOLD"
FROM ORDERITEM,
NBUSER."ORDER",
PAYMENT,
ITEM
WHERE NBUSER."ORDER".PAYMENTID = PAYMENT.PAYMENTID
AND ITEM.ITEMID = ORDERITEM.ITEMID
AND PAYMENT.PAYMENTDATE BETWEEN '4/1/2017' AND '4/30/2017'
GROUP BY ITEM.ITEMID
ORDER BY ITEM.ITEMID DESC;
But it keeps giving me this error:
[Exception, Error code 30,000, SQLState 42Y36] Column reference
'ITEM.ITEMNAME' is invalid, or is part of an invalid expression. For
a SELECT list with a GROUP BY, the columns and expressions being
selected may only contain valid grouping expressions and valid
aggregate expressions. Line 1, column 1
I want to join the records with similar itemid together and adding the quantity up for all the same itemid.
First of all, you have 4 tables in the statement, but only 2 join conditions.
Is that intentional? If not, you need to specify 1 more join condition.
When you group by only some of the columns in the SELECT list, the remaining columns must be wrapped in an aggregate function. In your case, for example:
select ITEM.ITEMID, max(ITEMNAME), sum(QUANTITY) AS "QUANTITY SOLD" ...
It is better to qualify every column with its table name or a table alias; otherwise the SQL statement is difficult to understand.
You have qualified one column (ITEM.ITEMID, where ITEM is the table name) but not another (ITEMNAME).
For the future reader, who may use another DBMS:
Depending on the standard version and the vendor, it may be allowed to additionally use columns in the SELECT statement, which are only functionally dependent on the grouping columns.
Source: An SQL 2008 standard draft, part II, 7.12 -> General rule number 15:
"If T is a grouped table, then let G be the set of grouping columns of T. In each <value expression> contained in <select list>, each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a <set function specification> whose aggregation query is QS."
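Here is a rough sketch of both points from the first answer: the missing join condition and the grouping rule. It runs on SQLite via Python's sqlite3 module, not on the asker's database. The table is called orders instead of NBUSER."ORDER", the orderid link between orderitem and orders is an assumption (the question never shows it), and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (itemid INTEGER PRIMARY KEY, itemname TEXT);
    CREATE TABLE payment (paymentid INTEGER PRIMARY KEY, paymentdate TEXT);
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, paymentid INTEGER);
    -- Assumed link column: orderitem.orderid.
    CREATE TABLE orderitem (orderid INTEGER, itemid INTEGER, quantity INTEGER);
    INSERT INTO item VALUES (1, 'pen'), (2, 'ink');
    INSERT INTO payment VALUES (1, '2017-04-10'), (2, '2017-05-02');
    INSERT INTO orders VALUES (1, 1), (2, 2);
    INSERT INTO orderitem VALUES (1, 1, 3), (1, 2, 1), (2, 1, 4);
""")

QUERY = """
    SELECT item.itemid, item.itemname,
           SUM(orderitem.quantity) AS quantity_sold
    FROM orderitem
    JOIN item    ON item.itemid = orderitem.itemid
    JOIN orders  ON {order_link}
    JOIN payment ON payment.paymentid = orders.paymentid
    WHERE payment.paymentdate BETWEEN '2017-04-01' AND '2017-04-30'
    GROUP BY item.itemid, item.itemname
    ORDER BY item.itemid DESC
"""

# All three join conditions present: only the April order is counted.
correct = conn.execute(
    QUERY.format(order_link="orders.orderid = orderitem.orderid")).fetchall()

# Third condition missing (always true): every order line is paired with
# every April order, so the May sale of 4 pens leaks into the total.
unlinked = conn.execute(QUERY.format(order_link="1 = 1")).fetchall()

print(correct)   # [(2, 'ink', 1), (1, 'pen', 3)]
print(unlinked)  # [(2, 'ink', 1), (1, 'pen', 7)]
```

Grouping by itemname as well as itemid is an alternative to wrapping it in MAX(); both satisfy the rule quoted in the error message.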

I am getting: "You tried to execute a query that does not include the specified expression 'OrdID' as part of an aggregate function." How do I bypass?

My code is as follows:
SELECT Last, OrderLine.OrdID, OrdDate, SUM(Price*Qty) AS total_price
FROM ((Cus INNER JOIN Orders ON Cus.CID=Orders.CID)
INNER JOIN OrderLine
ON Orders.OrdID=OrderLine.OrdID)
INNER JOIN ProdFabric
ON OrderLine.PrID=ProdFabric.PrID
AND OrderLine.Fabric=ProdFabric.Fabric
GROUP BY Last
ORDER BY Last DESC, OrderLine.OrdID DESC;
This code has been answered before, but vaguely. I was wondering where I am going wrong.
You tried to execute a query that does not include the specified expression 'OrdID' as part of an aggregate function.
That is the error message I keep getting, no matter what I change. Yes, I know it is written as SQL-92, but how do I make this a legal query?
For almost every DBMS (MySQL is the only exception I'm aware of, but there could be others), every column in a SELECT that is not aggregated needs to be in the GROUP BY clause. In the case of your query, that would be everything but the columns in the SUM():
SELECT Last, OrderLine.OrdID, OrdDate, SUM(Price*Qty) AS total_price
...
GROUP BY Last, OrderLine.OrdID, OrdDate
ORDER BY Last DESC, OrderLine.OrdID DESC;
If you have to keep your GROUP BY intact (and not add non-aggregated fields to the list), then you need to decide which values you want for OrderLine.OrdID and OrdDate. For example, you may choose to have the MAX or MIN of these values.
So it's either as bernie suggested GROUP BY Last, OrderLine.OrdID, OrdDate or something like this (if it makes sense for your business logic):
SELECT Last, MAX(OrderLine.OrdID), MAX(OrdDate), SUM(Price*Qty) AS total_price
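The two options give different result shapes, which a quick sketch may make clearer. This uses Python's sqlite3 module as a stand-in for Access, with the question's tables reduced to the columns the query touches and a few made-up rows. "Last" is double-quoted only because LAST is a keyword in some SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cus (CID INTEGER PRIMARY KEY, "Last" TEXT);
    CREATE TABLE Orders (OrdID INTEGER PRIMARY KEY, OrdDate TEXT, CID INTEGER);
    CREATE TABLE ProdFabric (PrID INTEGER, Fabric TEXT, Price REAL);
    CREATE TABLE OrderLine (OrdID INTEGER, PrID INTEGER, Fabric TEXT, Qty INTEGER);
    -- Hypothetical sample data.
    INSERT INTO Cus VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO Orders VALUES (1, '2017-04-01', 1), (2, '2017-04-05', 1),
                              (3, '2017-04-03', 2);
    INSERT INTO ProdFabric VALUES (10, 'silk', 5.0), (11, 'wool', 2.0);
    INSERT INTO OrderLine VALUES (1, 10, 'silk', 2), (1, 11, 'wool', 1),
                                 (2, 10, 'silk', 1), (3, 11, 'wool', 4);
""")

FROM_CLAUSE = """
    FROM ((Cus INNER JOIN Orders ON Cus.CID = Orders.CID)
    INNER JOIN OrderLine ON Orders.OrdID = OrderLine.OrdID)
    INNER JOIN ProdFabric
        ON OrderLine.PrID = ProdFabric.PrID
       AND OrderLine.Fabric = ProdFabric.Fabric
"""

# Option 1: group by every non-aggregated column -> one row per order.
per_order = conn.execute("""
    SELECT "Last", OrderLine.OrdID, OrdDate, SUM(Price * Qty) AS total_price
""" + FROM_CLAUSE + """
    GROUP BY "Last", OrderLine.OrdID, OrdDate
    ORDER BY "Last" DESC, OrderLine.OrdID DESC
""").fetchall()

# Option 2: keep GROUP BY "Last" and aggregate the rest -> one row per name.
per_customer = conn.execute("""
    SELECT "Last", MAX(OrderLine.OrdID), MAX(OrdDate),
           SUM(Price * Qty) AS total_price
""" + FROM_CLAUSE + """
    GROUP BY "Last"
    ORDER BY "Last" DESC
""").fetchall()

print(per_order)
print(per_customer)
```

With this data the first query returns three rows (one per order) and the second returns two (one per last name), so the choice depends on which total the homework is asking for.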

GROUP BY / aggregate function confusion in SQL

I need a bit of help straightening something out. I know it's a very easy question, but it's something that is slightly confusing me in SQL.
This SQL query throws a 'not a GROUP BY expression' error in Oracle. I understand why, as I know that once I group by an attribute of a tuple, I can no longer access any other attribute.
SELECT *
FROM order_details
GROUP BY order_no
However this one does work
SELECT SUM(order_price)
FROM order_details
GROUP BY order_no
Just to cement my understanding on this: assuming that there are multiple tuples in order_details for each order that is made, once I group the tuples according to order_no, I can still access the order_price attribute for each individual tuple in the group, but only using an aggregate function?
In other words, aggregate functions when used in the SELECT clause are able to drill down into the group to see the 'hidden' attributes, where simply using 'SELECT order_no' will throw an error?
In standard SQL (but not MySQL), when you use GROUP BY, you must list all the result columns that are not aggregates in the GROUP BY clause. So, if order_details has 6 columns, then you must list all 6 columns (by name - you can't use * in the GROUP BY or ORDER BY clauses) in the GROUP BY clause.
You can also do:
SELECT order_no, SUM(order_price)
FROM order_details
GROUP BY order_no;
That will work because all the non-aggregate columns are listed in the GROUP BY clause.
You could do something like:
SELECT order_no, order_price, MAX(order_item)
FROM order_details
GROUP BY order_no, order_price;
This query isn't really meaningful (or most probably isn't meaningful), but it will 'work'. It will list each separate order number and order price combination, and will give the maximum order item (number) associated with that price. If all the items in an order have distinct prices, you'll end up with groups of one row each. OTOH, if there are several items in the order at the same price (say £0.99 each), then it will group those together and return the maximum order item number at that price. (I'm assuming the table has a primary key on (order_no, order_item) where the first item in the order has order_item = 1, the second item is 2, etc.)
The order in which SQL is written is not the same order it is executed.
Normally, you would write SQL like this:
SELECT
FROM
JOIN
WHERE
GROUP BY
HAVING
ORDER BY
Logically, SQL is evaluated in this order (the engine may optimize, but the result behaves as if it ran like this):
FROM
JOIN
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
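The reason you need to put all the non-aggregate columns of the SELECT list into the GROUP BY clause is this top-down order: SELECT runs after GROUP BY, so it only sees the grouping columns and the aggregates computed per group. You cannot call something you have not declared yet.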
Read more: https://sqlbolt.com/lesson/select_queries_order_of_execution
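One way to see that ordering in action is the difference between WHERE and HAVING. The sketch below runs on SQLite via Python's sqlite3 module; the order_item column and the sample prices are assumptions added for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_details (order_no INTEGER, order_item INTEGER,
                                order_price REAL);
    -- Hypothetical rows: three orders with one or two items each.
    INSERT INTO order_details VALUES
        (1, 1, 5.0), (1, 2, 20.0), (2, 1, 8.0), (2, 2, 3.0), (3, 1, 50.0);
""")

rows = conn.execute("""
    SELECT order_no, SUM(order_price) AS total   -- step 4: build output
    FROM order_details                           -- step 1: pick the rows
    WHERE order_price >= 5                       -- step 2: filter single rows
    GROUP BY order_no                            -- step 3a: form the groups
    HAVING SUM(order_price) > 10                 -- step 3b: filter whole groups
    ORDER BY total DESC                          -- step 5: sort the output
""").fetchall()

print(rows)  # [(3, 50.0), (1, 25.0)]
```

WHERE drops the 3.0 item before grouping, which leaves order 2 with a total of 8.0, and HAVING then removes that group. ORDER BY can use the alias total because it is evaluated after SELECT.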
SELECT *
FROM order_details
GROUP BY order_no
In the above query you are selecting all the columns, which is why it throws the 'not a GROUP BY expression' error.
To avoid that, every column in the SELECT list must also appear in the GROUP BY clause:
SELECT *
FROM order_details
GROUP BY order_no, ... -- followed by every other column of order_details
Here "..." stands for all the remaining columns of the order_details table.
To use a GROUP BY clause, you have to list every column from the SELECT list in the GROUP BY clause, except the columns inside aggregate functions.
Instead of GROUP BY, you can use a window function with a PARTITION BY clause, which lets you partition on just the one column you want to group by.
You can also write PARTITION BY 1 to treat all rows as a single partition.
You can also use a common table expression (CTE) to avoid this issue.
Multiple CTEs also come in handy. Here is a case where I have used them, which may be helpful:
with ranked_cte1 as (
  select r.mov_id, DENSE_RANK() over (order by r.rev_stars desc) as rankked from ratings r
),
ranked_cte2 as (
  select * from movie where mov_id = (select mov_id from ranked_cte1 where rankked = 7)
)
select * from ranked_cte2;
select * from movie where mov_id = 902;