Is MySQL LIMIT applied before or after ORDER BY?

Which one comes first when MySQL processes the query?
An example:
SELECT pageRegions
FROM pageRegions WHERE (pageID=?) AND (published=true) AND (publishedOn<=?)
ORDER BY publishedOn DESC
LIMIT 1;
If LIMIT is applied after ORDER BY, will that return the last published pageRegion even if the record does not match the revision datetime?

Yes, it's after the ORDER BY. For your query, you'd get the record with the highest publishedOn, since you're ordering DESC, making the largest value first in the result set, of which you pick out the first one.

The limit is always applied at the end of result gathering, therefore after order by.
Given all your clauses, the order of processing will be
FROM
WHERE
SELECT
ORDER BY
LIMIT
So you will get the record with the latest publishedOn that is <= the supplied value and that matches all the other conditions in the WHERE clause.
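The processing order above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; both apply LIMIT after ORDER BY), and the table layout, the rows, and the helper name latest_published_on are made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; schema and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pageRegions (id INTEGER PRIMARY KEY, pageID INTEGER, "
    "published INTEGER, publishedOn TEXT)"
)
conn.executemany(
    "INSERT INTO pageRegions (pageID, published, publishedOn) VALUES (?, ?, ?)",
    [
        (1, 1, "2024-01-01"),
        (1, 1, "2024-03-01"),
        (1, 1, "2024-06-01"),  # later than the cutoff: removed by WHERE
        (1, 0, "2024-04-01"),  # unpublished: removed by WHERE
        (2, 1, "2024-03-15"),  # different page: removed by WHERE
    ],
)

def latest_published_on(page_id, cutoff):
    """WHERE filters first, ORDER BY sorts what is left, LIMIT keeps the top row."""
    row = conn.execute(
        "SELECT publishedOn FROM pageRegions "
        "WHERE pageID = ? AND published = 1 AND publishedOn <= ? "
        "ORDER BY publishedOn DESC LIMIT 1",
        (page_id, cutoff),
    ).fetchone()
    return row[0] if row else None

result = latest_published_on(1, "2024-05-01")
print(result)  # 2024-03-01
```

Only rows that survive the WHERE clause are sorted, so a later but unpublished or out-of-range row can never be the one LIMIT returns.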

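Just wanted to point out that in the case of MySQL, ordering is applied before limiting the results. But this is not true of every row-limiting mechanism in other databases.
For example, in Oracle a ROWNUM condition in the WHERE clause limits the rows first, and ORDER BY is then applied to those rows (Oracle's newer FETCH FIRST clause is applied after ORDER BY). Limiting first makes sense when you think about it from a performance point of view. In MySQL you may actually be ordering every row that matches the WHERE clause (> 1000 records) to get 2, unless an index can supply the rows already in order.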

Related

TOP 1 and ORDER BY not returning correct results

I have read the other topics on this but they don't seem to match my scenario. I have a query that is ordering the results by Entry Date ASC and then by Sort ASC.
The results shown are correctly ordered, however when I change my query to only pull TOP 1 it returns the second result instead. I have no idea why or how this happens.
If your query has the order by in the outermost select, then the results should be returned in that order. Period.
If the order by is anywhere else -- in a subquery or in a window function's OVER clause -- then the results might look like they are ordered, but the ordering is not guaranteed.
My guess is that you don't have the explicit order by that the query needs to do what you intend.
Also, although not the case with your sample data, if the keys have the same value then they can appear in any order -- and in different positions when you run the query multiple times.

Are big-query results always ordered, that is: using OFFSET makes sense to skip rows?

In other words, does a SELECT query order its results every time, so that these two queries will always return distinct rows:
select *
from bigquery-public-data.crypto_ethereum.balances
limit 10 OFFSET 100
select *
from bigquery-public-data.crypto_ethereum.balances
limit 10 OFFSET 2000
Assuming of course the table has unique values... I am just curious whether, without an "order" clause, the results are always deterministic/consecutive, or whether rows can be duplicated across the two queries if they are indeed returned at random. Thanks!
I am just curious whether, without an "order" clause, the results are always deterministic/consecutive, or whether rows can be duplicated across the two queries if they are indeed returned at random.
No. SQL tables represent unordered sets of rows. There is no inherent ordering of the rows. Unless an order by clause is specified, there is no guarantee that two consecutive executions of the same query will yield an identical result. The database is free to return the rows in whatever order it likes.
As a consequence, the results of a query with a row-limiting clause but no order by clause are not deterministic. Do add an order by clause to these queries, or you will sooner or later run into surprising and hard-to-debug behavior.
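To illustrate the advice above, here is a minimal sketch of LIMIT/OFFSET paging made deterministic by ordering on a unique column. It runs against an in-memory SQLite table rather than BigQuery, and the balances layout, the generated addresses, and the helper fetch_page are assumptions made only for the example.

```python
import sqlite3

# Assumption: a small in-memory table stands in for the public BigQuery dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (address TEXT PRIMARY KEY, eth_balance INTEGER)")
conn.executemany(
    "INSERT INTO balances VALUES (?, ?)",
    [(f"0x{i:04x}", i * 10) for i in range(50)],
)

def fetch_page(limit, offset):
    """Page through the table in a fixed order defined by a unique column."""
    rows = conn.execute(
        "SELECT address FROM balances ORDER BY address LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return [r[0] for r in rows]

first_page = fetch_page(10, 0)
second_page = fetch_page(10, 10)
print(first_page[0], second_page[0])  # 0x0000 0x000a
```

Because address is unique, every row has exactly one position in the ordering, so two different offsets cannot return the same row as long as the data does not change between the queries.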

how to resolve this - group by changes the Order of items in SQL Server

I'm using SQL Server 2014 and fetching data from a view. The order of the items changes once I use GROUP BY. How can I get the order back after using GROUP BY? There is one date column, but it does not store any time, so I can't sort on the date either.
How can I display the data in the same order as it was displayed before using GROUP BY? If anyone has any idea, please help.
Thanks
Tables and views are essentially unordered sets. To get rows in a specific order, you should always add an ORDER BY clause on the columns you wish to order on.
I'm assuming you previously selected from the VIEW without an ORDER BY clause. The order in which rows are returned from a SELECT statement without an ORDER BY clause is undefined. The order you are getting them in can change for any number of reasons (eg some are listed here).
Your problem stems from the mistake of relying on the order of a SELECT from a VIEW without an ORDER BY. You should have had an ORDER BY clause in your SELECT statement to begin with.
How can I display the data in the same order as it displayed before using Group by?
The answer: You can't if your initial statement did not have an ORDER BY clause.
The resolution: Determine the order you want the resultset in and add an ORDER BY clause on those columns, both in your initial query and the version with the GROUP BY clause.
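One way to apply this resolution, sketched under assumptions: if the underlying rows carry some column that defines the order you want (here a hypothetical seq column), the grouped query can order by an aggregate of it. The items table, its columns, and its data are invented, and SQLite is used via Python only to keep the example runnable; ORDER BY on an aggregate is standard SQL for grouped queries.

```python
import sqlite3

# Assumption: "seq" is a made-up column that captures the intended row order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (seq INTEGER PRIMARY KEY, category TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO items (seq, category, qty) VALUES (?, ?, ?)",
    [(1, "pears", 2), (2, "apples", 5), (3, "pears", 1), (4, "figs", 7), (5, "apples", 3)],
)

# Each group is placed at the position of its earliest row.
grouped = conn.execute(
    "SELECT category, SUM(qty) AS total FROM items "
    "GROUP BY category ORDER BY MIN(seq)"
).fetchall()
print(grouped)  # [('pears', 3), ('apples', 8), ('figs', 7)]
```

If no column expresses the desired order, there is nothing to sort on, which is the point the answer above makes.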
Maybe you can use the row_number() function to number the rows? (In SQL Server it needs an OVER clause with an ORDER BY.) This should be done in a sub-select, and when you group the data in the outer SELECT, use the AVG() function on the numbered column and ORDER the result by this. The problem is that when you group rows, the original rows disappear. That's kind of the purpose of GROUP BY. ;) Depending on what you GROUP BY, what you're asking might be logically impossible.
EDIT:
Found this solution Googling: http://blog.sqlauthority.com/2015/05/05/sql-server-generating-row-number-without-ordering-any-columns/
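So you can number rows like this before you GROUP BY, although the numbering follows whatever order the engine happens to produce the rows in, which is not guaranteed to be the order you saw before: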
row_number() OVER (ORDER BY (SELECT 1))
The only way you can enforce a specific order is to explicitly use an ORDER BY clause. Otherwise the order of rows is not guaranteed (take a look at this article for more details) and the database engine will return the rows based on an "as fast as it can" or "as fast as it can retrieve them from disk" rule. So, the order can also vary between executions of the same query in the span of a few seconds.
When doing a DISTINCT, GROUP BY or ORDER BY, SQL Server may sort the data, or read it in the order of an index it uses for that query, depending on the execution plan it chooses.
Looking at the execution plan of your query will show you what index (and implicitly columns in that index) is being used to sort the data.

Strange ordering bug (is it a bug?) when ordering two columns with identical values

I have the following query in postgres:
SELECT *
FROM "bookings"
WHERE ("bookings".client_id = 50)
ORDER BY session_time DESC
LIMIT 20 OFFSET 0
The record in the 20th place has an identical session_time to the 21st record.
This query returns 20 results, however if you compare the results to the whole database the query returns the 1st-19th results and the 21st, skipping over the 20th.
This query can be fixed by adding, "id" to the order:
SELECT *
FROM "bookings"
WHERE ("bookings".client_id = 50)
ORDER BY session_time DESC, id
LIMIT 20 OFFSET 0
However, I was wondering how this bug occurred. How does postgres order identical fields when using offsets and limits? Is it random? Is it a bug in postgres?
This is not a bug. The limit and offset happen after ordering and it is not deterministic which rows are selected in one case vs another. In general you want to have a tiebreaker so that your ordering is stable and deterministic (I prefer to use unique tiebreakers even when I don't have limit or offset issues in order to ensure the query is the same each time it is run).
If you are doing pagination, add the primary key or surrogate key to the sort as a tiebreaker. That is really the best way.
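The tiebreaker advice can be sketched as follows. The example uses SQLite through Python instead of Postgres (an assumption made for runnability; the ORDER BY ... LIMIT ... OFFSET form is the same), and the booking rows and the helper booking_page are invented so that many rows share a session_time.

```python
import sqlite3

# Assumption: invented data; 25 bookings for client 50 spread over 5 session times.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (id INTEGER PRIMARY KEY, client_id INTEGER, session_time TEXT)"
)
conn.executemany(
    "INSERT INTO bookings (id, client_id, session_time) VALUES (?, ?, ?)",
    [(i, 50, f"2024-01-{(i % 5) + 1:02d} 09:00") for i in range(1, 26)],
)

def booking_page(limit, offset):
    """Sort by session_time, then by the unique id so ties have a fixed order."""
    rows = conn.execute(
        "SELECT id FROM bookings WHERE client_id = ? "
        "ORDER BY session_time DESC, id LIMIT ? OFFSET ?",
        (50, limit, offset),
    ).fetchall()
    return [r[0] for r in rows]

page_one = booking_page(20, 0)
page_two = booking_page(20, 20)
print(page_two)  # [5, 10, 15, 20, 25]
```

With id in the sort key, the two pages together contain each booking exactly once, whichever rows tie on session_time at the page boundary.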

Using limit in sqlite SQL statement in combination with order by clause

Will the following two SQL statements always produce the same result set?
1. SELECT * FROM MyTable where Status='0' order by StartTime asc limit 10
2. SELECT * FROM (SELECT * FROM MyTable where Status='0' order by StartTime asc) limit 10
Yes, but ordering subqueries is probably a bad habit to get into. You could feasibly add a further ORDER BY outside the subquery in your second example, e.g.
SELECT *
FROM (SELECT *
FROM Test
ORDER BY ID ASC
) AS A
ORDER BY ID DESC
LIMIT 10;
SQLite still performs the ORDER BY on the inner query, before sorting them again in the outer query. A needless waste of resources.
I've done an SQL Fiddle to demonstrate so you can view the execution plans for each.
No. First, because the StartTime column may not have a UNIQUE constraint. So even the first query may not always produce the same result - with itself!
Second, even if there are never two rows with same StartTime, the answer is still negative.
The first statement will always order on StartTime and produce the first 10 rows. The second query may produce the same result set but only with a primitive optimizer that doesn't understand that the ORDER BY in the subquery is redundant. And only if the execution plan includes this ordering phase.
The SQLite query optimizer may (at the moment) not be very bright and do just that (no idea really, we'll have to check the source code of SQLite*). So, it may appear that the two queries are producing identical results all the time. Still, it's not a good idea to count on it. You never know what changes will be made in a future version of SQLite.
I think it's not good practice to use LIMIT without ORDER BY, in any DBMS. It may work now, but you never know how long these queries will be used by the application. And you may not be around when SQLite is upgraded or the DBMS is changed.
(*) @Gareth's link provides the execution plan which suggests that current SQLite code is dumb enough to execute the redundant ordering.
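A minimal sketch of the form that does not depend on how the optimizer treats an ordered subquery: keep ORDER BY (with a unique tiebreaker) at the same query level as LIMIT. The MyTable columns beyond Status and StartTime, the id tiebreaker, and the data are assumptions for illustration.

```python
import sqlite3

# Assumption: invented rows; "id" is a made-up unique column used as a tiebreaker.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, Status TEXT, StartTime INTEGER)")
conn.executemany(
    "INSERT INTO MyTable (id, Status, StartTime) VALUES (?, ?, ?)",
    [(i, "0" if i % 3 else "1", 1000 - i) for i in range(1, 31)],
)

# ORDER BY sits next to LIMIT in both statements, so neither relies on
# an ordering that was established inside a subquery.
direct_sql = (
    "SELECT id FROM MyTable WHERE Status = '0' "
    "ORDER BY StartTime ASC, id ASC LIMIT 10"
)
wrapped_sql = (
    "SELECT id FROM (SELECT * FROM MyTable WHERE Status = '0') AS t "
    "ORDER BY StartTime ASC, id ASC LIMIT 10"
)

direct_ids = [r[0] for r in conn.execute(direct_sql)]
wrapped_ids = [r[0] for r in conn.execute(wrapped_sql)]
print(direct_ids == wrapped_ids)  # True
```

Written this way, the subquery only filters and the outer query decides which ten rows are kept, so the result does not hinge on whether the engine preserves an inner sort.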