GROUP BY / aggregate function confusion in SQL - sql

I need a bit of help straightening out something, I know it's a very easy easy question but it's something that is slightly confusing me in SQL.
This SQL query throws a 'not a GROUP BY expression' error in Oracle. I understand why, as I know that once I group by an attribute of a tuple, I can no longer access any other attribute.
SELECT *
FROM order_details
GROUP BY order_no
However this one does work
SELECT SUM(order_price)
FROM order_details
GROUP BY order_no
Just to concrete my understanding on this.... Assuming that there are multiple tuples in order_details for each order that is made, once I group the tuples according to order_no, I can still access the order_price attribute for each individual tuple in the group, but only using an aggregate function?
In other words, aggregate functions when used in the SELECT clause are able to drill down into the group to see the 'hidden' attributes, where simply using 'SELECT order_no' will throw an error?

In standard SQL (but not MySQL), when you use GROUP BY, you must list all the result columns that are not aggregates in the GROUP BY clause. So, if order_details has 6 columns, then you must list all 6 columns (by name - you can't use * in the GROUP BY or ORDER BY clauses) in the GROUP BY clause.
You can also do:
SELECT order_no, SUM(order_price)
FROM order_details
GROUP BY order_no;
That will work because all the non-aggregate columns are listed in the GROUP BY clause.
You could do something like:
SELECT order_no, order_price, MAX(order_item)
FROM order_details
GROUP BY order_no, order_price;
This query isn't really meaningful (or most probably isn't meaningful), but it will 'work'. It will list each separate order number and order price combination, and will give the maximum order item (number) associated with that price. If all the items in an order have distinct prices, you'll end up with groups of one row each. OTOH, if there are several items in the order at the same price (say £0.99 each), then it will group those together and return the maximum order item number at that price. (I'm assuming the table has a primary key on (order_no, order_item) where the first item in the order has order_item = 1, the second item is 2, etc.)

The order in which SQL is written is not the same order it is executed.
Normally, you would write SQL like this:
SELECT
FROM
JOIN
WHERE
GROUP BY
HAVING
ORDER BY
Under the hood, SQL is executed like this:
FROM
JOIN
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
Reason why you need to put all the non-aggregate columns in SELECT to the GROUP BY is the top-down behaviour in programming. You cannot call something you have not declared yet.
Read more: https://sqlbolt.com/lesson/select_queries_order_of_execution

SELECT *
FROM order_details
GROUP BY order_no
In the above query you are selecting all the columns because of that its throwing an error not group by something like..
to avoid that you have to mention all the columns whichever in select statement all columns must be in group by clause..
SELECT *
FROM order_details
GROUP BY order_no,order_details,etc
etc it means all the columns from order_details table.

To use group by clause you have to mention all the columns from select statement in to group by clause but not the column from aggregate function.
TO do this instead of group by you can use partition by clause you can use only one port to group as a partition by.
you can also make it as partition by 1

use Common table expression(CTE) to avoid this issue.
multiple CTes also come handy, pasting a case where I have used...maybe helpful
with ranked_cte1 as
( select r.mov_id,DENSE_RANK() over ( order by r.rev_stars desc )as rankked from ratings r ),
ranked_cte2 as ( select * from movie where mov_id=(select mov_id from ranked_cte1 where rankked=7 ) ) select * from ranked_cte2
select * from movie where mov_id=902

Related

SQL - Difference between .* and * in aggregate function query

SELECT reviews.*, COUNT(comments.review_id)
AS comment_count
FROM reviews
LEFT JOIN comments ON comments.review_id = reviews.review_id
GROUP BY reviews.review_id
ORDER BY reviews.review_id ASC;
When I run this code I get exactly what I want from my SQL query, however if I run the following
SELECT *, COUNT(comments.review_id)
AS comment_count
FROM reviews
LEFT JOIN comments ON comments.review_id = reviews.review_id
GROUP BY reviews.review_id
ORDER BY reviews.review_id ASC;
then I get an error "column must appear in GROUP BY clause or be used in an aggregate function
Just wondered what the difference was and why the behaviour is different.
Thanks
In the first example, the column are taken only from the reviews table. Although not databases allow the use of SELECT * with GROUP BY, it is allowed by Standard SQL, assuming that review_id is the primary key.
The issue is that that you are including columns in the SELECT that are not included in the GROUP BY. This is only allowed -- in certain databases -- under very special circumstances, where the columns in the GROUP BY are declared to uniquely identify each row (which a primary key does).
The second example has columns from comments that do not meet this condition. Hence it is not allowed.
In the select part of the query with group by, you can chose only those columns which you used in group by.
Since you did group by reviews.review_id, you can get the output for the first case. In the second query you are try to get all the records and that is not possible with group by.
You can use window function if you need to select columns which are not present in your group by clause. Hope it makes sense.
https://www.windowfunctions.com/

Group by or Distinct - But several fields

How can I use a Distinct or Group by statement on 1 field with a SELECT of All or at least several ones?
Example: Using SQL SERVER!
SELECT id_product,
description_fr,
DiffMAtrice,
id_mark,
id_type,
NbDiffMatrice,
nom_fr,
nouveaute
From C_Product_Tempo
And I want Distinct or Group By nom_fr
JUST GOT THE ANSWER:
select id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
from (
SELECT rn = row_number() over (partition by [nom_fr] order by id_mark)
, id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
From C_Product_Tempo
) d
where rn = 1
And this works prfectly!
If I'm understanding you correctly, you just want the first row per nom_fr. If so, you can simply use a subquery to get the lowest id_product per nom_fr, and just get the corresponding rows;
SELECT * FROM C_Product_Tempo WHERE id_product IN (
SELECT MIN(id_product) FROM C_Product_Tempo GROUP BY nom_fr
);
An SQLfiddle to test with.
You need to decide what to do with the other fields. For example, for numeric fields, do you want a sum? Average? Max? Min? For non-numeric fields to you want the values from a particular record if there are more than one with the same nom_fr?
Some SQL Systems allow you to get a "random" record when you do a GROUP BY, but SQL Server will not - you must define the proper aggregation for columns that are not in the GROUP BY.
GROUP BY is used to group in conjunction with an aggregate function (see http://www.w3schools.com/sql/sql_groupby.asp), so it's no use grouping without counting, summing up etc. DISTINCT eleminates duplicates but how that matches with the other columns you want to extract, I can't imagine, because some rows will be removed from the result.

Give priority to ORDER BY over a GROUP BY in MySQL without subquery

I have the following query which does what I want, but I suspect it is possible to do this without a subquery:
SELECT *
FROM (SELECT *
FROM 'versions'
ORDER BY 'ID' DESC) AS X
GROUP BY 'program'
What I need is to group by program, but returning the results for the objects in versions with the highest value of "ID".
In my past experience, a query like this should work in MySQL, but for some reason, it's not:
SELECT *
FROM 'versions'
GROUP BY 'program'
ORDER BY MAX('ID') DESC
What I want to do is have MySQL do the ORDER BY first and then the GROUP BY, but it insists on doing the GROUP BY first followed by the ORDER BY. i.e. it is sorting the results of the grouping instead of grouping the results of the ordering.
Of course it is not possible to write
SELECT * FROM 'versions' ORDER BY 'ID' DESC GROUP BY 'program'
Thanks.
By definition, ORDER BY is processed after grouping with GROUP BY. By definition, the conceptual way any SELECT statement is processed is:
Compute the cartesian product of all tables referenced in the FROM clause
Apply the join criteria from the FROM clause to filter the results
Apply the filter criteria in the WHERE clause to further filter the results
Group the results into subsets based on the GROUP BY clause, collapsing the results to a single row for each such subset and computing the values of any aggregate functions -- SUM(), MAX(), AVG(), etc. -- for each such subset. Note that if no GROUP BY clause is specified, the results are treated as if there is a single subset and any aggregate functions apply to the entire results set, collapsing it to a single row.
Filter the now-grouped results based on the HAVING clause.
Sort the results based on the ORDER BY clause.
The only columns allowed in the results set of a SELECT with a GROUP BY clause are, of course,
The columns referenced in the GROUP BY clause
Aggregate functions (such as MAX())
literal/constants
expresssions derived from any of the above.
Only broken SQL implementations allow things like select xxx,yyy,a,b,c FROM foo GROUP BY xxx,yyy — the references to colulmsn a, b and c are meaningless/undefined, given that the individual groups have been collapsed to a single row,
This should do it and work pretty well as long as there is a composite index on (program,id). The subquery should only inspect the very first id for each program branch, and quickly retrieve the required record from the outer query.
select v.*
from
(
select program, MAX(id) id
from versions
group by program
) m
inner join versions v on m.program=v.program and m.id=v.id
SELECT v.*
FROM (
SELECT DISTINCT program
FROM versions
) vd
JOIN versions v
ON v.id =
(
SELECT vi.id
FROM versions vi
WHERE vi.program = vd.program
ORDER BY
vi.program DESC, vi.id DESC
LIMIT 1
)
Create an index on (program, id) for this to work fast.
Regarding your original query:
SELECT * FROM 'versions' GROUP BY 'program' ORDER BY MAX('ID') DESC
This query would not parse in any SQL dialect except MySQL.
It abuses MySQL's ability to return ungrouped and unaggregated expressions from a GROUP BY statement.

Can I group by something that isn't in the SELECT line?

Given a command in SQL;
SELECT ...
FROM ...
GROUP BY ...
Can I group by something that isn't in the SELECT line?
Yes.
This is often used in the superaggregate queries like this:
SELECT AVG(cnt)
FROM (
SELECT COUNT(*) AS cnt
FROM sales
GROUP BY
product
HAVING COUNT(*) > 10
) q
, which aggregate the aggregates.
Yes of course e.g.
select
count(*)
from
some_table_with_updated_column
group by
trunc(updated, 'MM.YYYY')
Yes you can do it, but if you do that you won't be able to tell which result is for which group.
As a result, you almost always want to return the columns you've grouped by in the select clause. But you don't have to.
Yes, you can. Example:
select count(1)
from sales
group by salesman_id
What you can't do, of course, if having something on your select clause (other than aggregate functions) that are not part of the group by clause.
Hmm, I think the question should have been in the other way round like,
Can I SELECT something that is not there in the GROUP BY?
It's alright to write a code like:
SELECT customerId, count(orderId) FROM orders
GROUP BY customerId, orderedOn
If you want to find out the number of orders done by a customer datewise.
But you cannot do it the other way round:
SELECT customerId, orderedOn count(orderId) FROM orders
GROUP BY customerId
You can issue an aggregate function on the column that is not there in the group by. But you cannot give it in the select line without the aggregate function. As it will not make much sense. Like for the above query. You group by just customerId for order counts and you want the date also to be printed in the output??!! You don't involve the date factor in the group for counting then will it mean something to have a date in it?
I don't know about other DBMS' but DB2/z, for one, does this just fine. It's not required to have the column in the select portion but, of course, it does have to extract the data from the table in order to aggregate so you're probably not saving any time by leaving it off. You should only select the columns that you need, aggregation of the data is a separate task from that.
I'm pretty certain the SQL standard allows this (although that's only based on the knowledge that the mainframe DB2 product follows it pretty closely).

MS Access HW Query Missing Obvious Error Somewhere?

I'm getting an error stating that I have not specified OrdID's table:
Here's what I have:
SELECT Last, OrderLine.OrdID, OrdDate, SUM(Price*Qty)
FROM ((Cus INNER JOIN Orders ON Cus.CID=Orders.CID)
INNER JOIN OrderLine
ON Orders.OrdID=OrderLine.OrdID)
INNER JOIN ProdFabric
ON OrderLine.PrID=ProdFabric.PrID
AND OrderLine.Fabric=ProdFabric.Fabric
GROUP BY Last
ORDER BY Last DESC, OrdID DESC;
When I hit run, it keeps saying that OrdID could refer to more than one table listed in the FROM clause.
Why does it keep saying that as I've specified which table to select for OrdID.
Tables:
Cus (**CID**, Last, First, Phone)
Orders (**OrdID**, OrdDate, ShipDate, CID)
Manu (**ManuID**, Name, Phone, City)
Prods (**PrID**, ManuID, Category)
ProdFabric (**PrID**, **Fabric**, Price)
Orderline (**OrdId**, **PrID**, Fabric, Qty)
Your ORDER BY clause is valid SQL-92 syntax.
Sadly, the Access database engine is not SQL-92 compliant. It doesn't let you use a column correlation name ('alias') from the SELECT clause. If you had used this:
SUM(Price * Qty) AS total_price
...
ORDER BY total_price
you would have received an error. (Aside: you should consider giving this expression a column correlation name anyhow.)
Instead of correlation names, the Access data engine is expecting either a column name or an expression (the latter being illegal in SQL-92); specified columns need not appear in the SELECT clause (again illegal in SQL-92). Because any column from any table in the FROM clause can be used, you need to disambiguate them with the table name; if you used a table correlation name in the FROM clause then you must use it in the ORDER BY clause (I don't make the rules!)
To satisfy the Access database engine's requirements, I think you need to change your ORDER BY clause to this:
ORDER BY Last DESC, OrderLine.OrdID DESC;
As an aside, I think your code would be more readable if you qualify you columns with table names in your SELECT clause even when they are unambiguous in context (I find using full table names a little wordy and prefer short table correlation names, specified in the data dictionary and used consistently in all queries). As it stands I can only guess that OrdDate is from Orders, and Price and Qty are from OrderLine. I've not idea what Last represents.
The ORDER BY clause is agnostic to what you've specified in the SELECT list. Its possible for example to order by a field that you don't actually include in the output select list.
Hence you need to be sure that the fields in the Order by list are not ambigious.