Give priority to ORDER BY over a GROUP BY in MySQL without subquery - sql

I have the following query which does what I want, but I suspect it is possible to do this without a subquery:
SELECT *
FROM (SELECT *
FROM 'versions'
ORDER BY 'ID' DESC) AS X
GROUP BY 'program'
What I need is to group by program, but returning the results for the objects in versions with the highest value of "ID".
In my past experience, a query like this should work in MySQL, but for some reason, it's not:
SELECT *
FROM 'versions'
GROUP BY 'program'
ORDER BY MAX('ID') DESC
What I want to do is have MySQL do the ORDER BY first and then the GROUP BY, but it insists on doing the GROUP BY first followed by the ORDER BY. i.e. it is sorting the results of the grouping instead of grouping the results of the ordering.
Of course it is not possible to write
SELECT * FROM 'versions' ORDER BY 'ID' DESC GROUP BY 'program'
Thanks.

By definition, ORDER BY is processed after grouping with GROUP BY. By definition, the conceptual way any SELECT statement is processed is:
Compute the cartesian product of all tables referenced in the FROM clause
Apply the join criteria from the FROM clause to filter the results
Apply the filter criteria in the WHERE clause to further filter the results
Group the results into subsets based on the GROUP BY clause, collapsing the results to a single row for each such subset and computing the values of any aggregate functions -- SUM(), MAX(), AVG(), etc. -- for each such subset. Note that if no GROUP BY clause is specified, the results are treated as if there is a single subset and any aggregate functions apply to the entire results set, collapsing it to a single row.
Filter the now-grouped results based on the HAVING clause.
Sort the results based on the ORDER BY clause.
The only columns allowed in the results set of a SELECT with a GROUP BY clause are, of course,
The columns referenced in the GROUP BY clause
Aggregate functions (such as MAX())
literal/constants
expresssions derived from any of the above.
Only broken SQL implementations allow things like select xxx,yyy,a,b,c FROM foo GROUP BY xxx,yyy — the references to colulmsn a, b and c are meaningless/undefined, given that the individual groups have been collapsed to a single row,

This should do it and work pretty well as long as there is a composite index on (program,id). The subquery should only inspect the very first id for each program branch, and quickly retrieve the required record from the outer query.
select v.*
from
(
select program, MAX(id) id
from versions
group by program
) m
inner join versions v on m.program=v.program and m.id=v.id

SELECT v.*
FROM (
SELECT DISTINCT program
FROM versions
) vd
JOIN versions v
ON v.id =
(
SELECT vi.id
FROM versions vi
WHERE vi.program = vd.program
ORDER BY
vi.program DESC, vi.id DESC
LIMIT 1
)
Create an index on (program, id) for this to work fast.
Regarding your original query:
SELECT * FROM 'versions' GROUP BY 'program' ORDER BY MAX('ID') DESC
This query would not parse in any SQL dialect except MySQL.
It abuses MySQL's ability to return ungrouped and unaggregated expressions from a GROUP BY statement.

Related

Does it has to be that in SQL, aggregate function goes hand in hand with grouping?

Question in the title. Thanks for the time.
EXAMPLE v
SELECT
customer_id,
SUM(unit_price * quantity) AS total_price
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_id
GROUP BY customer_id
Yes, grouping does go hand in hand with aggregate functions, for any resulting column not contained within an aggregate function
Grouping and the operations of aggregation are typically linked to three keywords:
any aggregation function
the GROUP BY clause
the DISTINCT (ON) modifier after the SELECT keyword
You can use:
aggregation functions alone when they are used on every field found inside the SELECT statement
the GROUP BY clause alone - an allowed bad practice as you're intending to do an aggregation on your field but you're not specifying any aggregation function inside the SELECT statement (really you should go with DISTINCT ON)
the DISTINCT modifier alone to select distinct rows (not fields)
the DISTINCT ON modifier only when accompanied by an ORDER BY clause that defines the order for extracting the rows
aggregation functions + the GROUP BY clause, that is forced to contain all fields found in the SELECT statement that are not object of aggregation
the DISTINCT modifier + the GROUP BY clause - you can do it but the GROUP BY clause really is superfluous as it is already implied by the DISTINCT keyword itself.
You can't use:
aggregation functions + the GROUP BY clause when non-aggregated fields included in the SELECT statement are not found within the GROUP BY clause
aggregation functions alone when in the SELECT statement they are found together with non-aggregated fields.
When you can't aggregate because you need to select multiple fields, typically window functions + filtering operations (with a WHERE clause) can come in handy for computing values and row by row and removing unneeded records.

SQL - Difference between .* and * in aggregate function query

SELECT reviews.*, COUNT(comments.review_id)
AS comment_count
FROM reviews
LEFT JOIN comments ON comments.review_id = reviews.review_id
GROUP BY reviews.review_id
ORDER BY reviews.review_id ASC;
When I run this code I get exactly what I want from my SQL query, however if I run the following
SELECT *, COUNT(comments.review_id)
AS comment_count
FROM reviews
LEFT JOIN comments ON comments.review_id = reviews.review_id
GROUP BY reviews.review_id
ORDER BY reviews.review_id ASC;
then I get an error "column must appear in GROUP BY clause or be used in an aggregate function
Just wondered what the difference was and why the behaviour is different.
Thanks
In the first example, the column are taken only from the reviews table. Although not databases allow the use of SELECT * with GROUP BY, it is allowed by Standard SQL, assuming that review_id is the primary key.
The issue is that that you are including columns in the SELECT that are not included in the GROUP BY. This is only allowed -- in certain databases -- under very special circumstances, where the columns in the GROUP BY are declared to uniquely identify each row (which a primary key does).
The second example has columns from comments that do not meet this condition. Hence it is not allowed.
In the select part of the query with group by, you can chose only those columns which you used in group by.
Since you did group by reviews.review_id, you can get the output for the first case. In the second query you are try to get all the records and that is not possible with group by.
You can use window function if you need to select columns which are not present in your group by clause. Hope it makes sense.
https://www.windowfunctions.com/

Select all columns grouping by version - Postgres

I need to query all columns in a table of all customers, the main factor being the latest version for each customer.
My table:
My Query:
SELECT DISTINCT ON(code)
code,
namefile,
versioncol,
status
FROM table_A
ORDER BY versioncol desc
Error:
ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON(code)
Postgres' error message is trying to tell you what to do:
DISTINCT ON expressions must match initial ORDER BY expressions
Actually that's quite clear: to make your code a valid DISTINCT ON query, you just need to add code (that's the DISTINCT ON expression) as a first sorting criteria to the query (ie as initial ORDER BY).
SELECT DISTINCT ON(code) a.*
FROM table_A a
ORDER BY code, versioncol DESC

Select Distinct On while Order By a different column

I'm coming from a MySQL background, where GROUP BY worked very differently than in Postgres. In Postgres - and apparently any standards-based SQL database - you have to group by all selected columns, while in MySQL you can handpick which ones to group by.
I read that you can get an equivalent effect with DISTINCT ON, and for the most part that's the case. The hitch is that you have to ORDER BY all the distinct columns, and this ordering has to be the left-most ordering. That's a problem when I want to order primarily by another column.
Right now my query looks like this:
SELECT
DISTINCT ON (eventable_id, eventable_type)
events.eventable_id, events.eventable_type, events.*
FROM events
WHERE <query>
ORDER BY eventable_id, eventable_type, events.created_at DESC
I would like to swap around the order by to look like this:
ORDER BY events.created_at, eventable_id, eventable_type DESC
Any advice for getting this to work?
Since you are selecting events.*, you shouldn't add eventable_id, and eventable_type to the output columns redundantly. Would result in duplicate column names. You know that you don't have to include the columns in the DISTINCT ON clause in target list, right?
Also, it's probably faster to use eventable_type DESC right away, since you have that in your final sort order. That's allowed, too.
SELECT DISTINCT ON (eventable_id, eventable_type)
*
FROM events
WHERE <condition>
ORDER BY eventable_id, eventable_type DESC, created_at DESC
#Denis already covers the rest: make that a subquery and order as you like in the outer query.
The alternative would be a subselect with GROUP BY and max(), but that yields multiple columns per group, when the latest created_at per group is not unique. (May or may not be desirable.) And it's probably still slower than DISTINCT ON with an additional ORDER BY step. Test with EXPLAIN ANALYZE.
SELECT e.*
FROM events e
JOIN (
SELECT eventable_id, eventable_type, max(created_at) AS created_at
FROM events
WHERE <condition>
GROUP BY 1, 2 DESC
) sub USING (eventable_id, eventable_type, created_at) -- maybe not unique
WHERE <repeat condition if dupes may be eliminated>
ORDER BY e.created_at, e.eventable_id, e.eventable_type DESC
If Postgres complains, use a subselect:
select * from ( ... ) q order by ...
(If it does, though, I'd take it as a hint that the query plan will suck.)

GROUP BY / aggregate function confusion in SQL

I need a bit of help straightening out something, I know it's a very easy easy question but it's something that is slightly confusing me in SQL.
This SQL query throws a 'not a GROUP BY expression' error in Oracle. I understand why, as I know that once I group by an attribute of a tuple, I can no longer access any other attribute.
SELECT *
FROM order_details
GROUP BY order_no
However this one does work
SELECT SUM(order_price)
FROM order_details
GROUP BY order_no
Just to concrete my understanding on this.... Assuming that there are multiple tuples in order_details for each order that is made, once I group the tuples according to order_no, I can still access the order_price attribute for each individual tuple in the group, but only using an aggregate function?
In other words, aggregate functions when used in the SELECT clause are able to drill down into the group to see the 'hidden' attributes, where simply using 'SELECT order_no' will throw an error?
In standard SQL (but not MySQL), when you use GROUP BY, you must list all the result columns that are not aggregates in the GROUP BY clause. So, if order_details has 6 columns, then you must list all 6 columns (by name - you can't use * in the GROUP BY or ORDER BY clauses) in the GROUP BY clause.
You can also do:
SELECT order_no, SUM(order_price)
FROM order_details
GROUP BY order_no;
That will work because all the non-aggregate columns are listed in the GROUP BY clause.
You could do something like:
SELECT order_no, order_price, MAX(order_item)
FROM order_details
GROUP BY order_no, order_price;
This query isn't really meaningful (or most probably isn't meaningful), but it will 'work'. It will list each separate order number and order price combination, and will give the maximum order item (number) associated with that price. If all the items in an order have distinct prices, you'll end up with groups of one row each. OTOH, if there are several items in the order at the same price (say £0.99 each), then it will group those together and return the maximum order item number at that price. (I'm assuming the table has a primary key on (order_no, order_item) where the first item in the order has order_item = 1, the second item is 2, etc.)
The order in which SQL is written is not the same order it is executed.
Normally, you would write SQL like this:
SELECT
FROM
JOIN
WHERE
GROUP BY
HAVING
ORDER BY
Under the hood, SQL is executed like this:
FROM
JOIN
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
Reason why you need to put all the non-aggregate columns in SELECT to the GROUP BY is the top-down behaviour in programming. You cannot call something you have not declared yet.
Read more: https://sqlbolt.com/lesson/select_queries_order_of_execution
SELECT *
FROM order_details
GROUP BY order_no
In the above query you are selecting all the columns because of that its throwing an error not group by something like..
to avoid that you have to mention all the columns whichever in select statement all columns must be in group by clause..
SELECT *
FROM order_details
GROUP BY order_no,order_details,etc
etc it means all the columns from order_details table.
To use group by clause you have to mention all the columns from select statement in to group by clause but not the column from aggregate function.
TO do this instead of group by you can use partition by clause you can use only one port to group as a partition by.
you can also make it as partition by 1
use Common table expression(CTE) to avoid this issue.
multiple CTes also come handy, pasting a case where I have used...maybe helpful
with ranked_cte1 as
( select r.mov_id,DENSE_RANK() over ( order by r.rev_stars desc )as rankked from ratings r ),
ranked_cte2 as ( select * from movie where mov_id=(select mov_id from ranked_cte1 where rankked=7 ) ) select * from ranked_cte2
select * from movie where mov_id=902