Does the ORDER BY clause return a Virtual Table? - sql

My understanding is that relational tables aren't ordered.
I also understand that each step, or phase, of the query execution returns a "virtual table" which is passed as input to the next phase.
But if tables are never actually ordered, what's happening during/after the ORDER BY phase?
I'm just trying to understand what might happen with a query like this:
SELECT col1, col2
FROM mytable
ORDER BY col1
LIMIT 1;
Edit:
To clarify. I know what the query above outputs. I'm trying to better understand each phase/step of the underlying execution.
The (logical) order of execution (EDIT: different from the physical execution) for the above query would be:
FROM
SELECT
ORDER BY
LIMIT
I'm trying to understand what's going on during the ORDER BY phase. My understanding is that a virtual table is passed from the SELECT phase to the ORDER BY phase (in this case, a table with col1 and col2), but I don't know what's being returned by ORDER BY and subsequently passed to LIMIT.

Does the ORDER BY clause return a Virtual Table?
Sometimes.
The database engine tries as much as it can not to produce a materialized result set (what you call a virtual table). Most of the time it's more efficient to process the rows one by one, so that each row passes through every execution step in turn until it is returned to the client app.
However, this is not always possible. In such cases, the engine is forced to materialize an intermediate result that actually takes the form you are thinking about. But again, this is expensive, and is usually avoided.
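As a rough illustration of the difference between streaming and materializing, here is a toy pipeline built from Python generators. It is only a sketch, not how any real engine is implemented: the function names (scan, project, order_by_col1, limit) and the sample rows are made up for the example. It counts how many rows the "FROM" step has to read in each case.

```python
from itertools import islice

stats = {"rows_read": 0}

def scan(table):
    """FROM: yield rows one at a time and count how many were read."""
    for row in table:
        stats["rows_read"] += 1
        yield row

def project(rows):
    """SELECT col1, col2: keep only the first two fields of each row."""
    for row in rows:
        yield row[:2]

def order_by_col1(rows):
    """ORDER BY col1: sorted() has to consume every input row first."""
    yield from sorted(rows, key=lambda r: r[0])

def limit(rows, n):
    """LIMIT n: stop pulling rows after n of them."""
    return list(islice(rows, n))

# Hypothetical table contents: (col1, col2, col3)
mytable = [(3, "c", "x"), (1, "a", "y"), (2, "b", "z")]

# No ORDER BY: the pipeline streams and stops after a single row.
stats["rows_read"] = 0
unordered = limit(project(scan(mytable)), 1)
read_without_sort = stats["rows_read"]

# With ORDER BY: the sort step buffers the whole input before emitting.
stats["rows_read"] = 0
ordered = limit(order_by_col1(project(scan(mytable))), 1)
read_with_sort = stats["rows_read"]

print(unordered, read_without_sort)
print(ordered, read_with_sort)
```

Without the sort, one row is read and returned. With the sort, all rows must be read and held before the first one can be passed on, which is the intermediate result the answer describes.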
The (logical) order of execution for the above query would be:
FROM
SELECT
ORDER BY
LIMIT
No. This is just the logical reading of a SQL query and is unrelated to the actual execution steps. Take that sequence as a good pedagogical tool, useful (for learning purposes only) to understand how the result is produced. Behind the scenes, the engine cheats in all kinds of ways to do as little work as possible to produce the result you asked for. You wouldn't believe it if you saw it.
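One example of such a shortcut, sketched in Python rather than taken from any particular engine: for ORDER BY col1 LIMIT 1 there is no need to sort everything, because it is enough to remember the smallest row seen so far. The sample rows and the helper by_col1 are assumptions for the illustration.

```python
import heapq

# Hypothetical (col1, col2) rows in arbitrary arrival order.
rows = [(5, "e"), (2, "b"), (9, "i"), (1, "a"), (7, "g")]

def by_col1(row):
    return row[0]

# Textbook reading of ORDER BY col1 LIMIT 1: sort everything, keep one row.
full_sort_result = sorted(rows, key=by_col1)[:1]

# Shortcut: a single pass that only tracks the smallest row seen so far.
top_n_result = heapq.nsmallest(1, rows, key=by_col1)

print(full_sort_result, top_n_result)
```

Both approaches give the same answer, but the second never builds the fully sorted intermediate result.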

The underlying table is not sorted when you use ORDER BY; only the results returned by the SELECT statement are. That query returns the first row of the sorted result. Since the default sort direction is ascending, it will be the row with the lowest value in col1.
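A small check of this, using Python's built-in sqlite3 module as a stand-in database (the sample values are made up; the table and column names are the ones from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 TEXT)")
# Rows are inserted in a deliberately non-sorted order.
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(3, "c"), (1, "a"), (2, "b")])

first_row = conn.execute(
    "SELECT col1, col2 FROM mytable ORDER BY col1 LIMIT 1"
).fetchall()

# The query only reads; the table still holds all of its rows afterwards.
row_count = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]

print(first_row, row_count)
```

The query returns the single row with the lowest col1, and the table itself is left as it was.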

The order of rows in a table is unknown; tables are by definition unsorted.
A result set that is the end product of a SELECT without an ORDER BY is also unsorted.
But since ORDER BY is applied right before LIMIT and OFFSET, the result set is in that specific order when they are applied.

Related

Logical Query Processing: How the select is before the order by

I am using T-SQL, and in the book T-SQL Fundamentals by Itzik Ben-Gan, he says that the SELECT clause is logically processed before the ORDER BY clause.
I agree with this, but I want to know how SELECT can be processed before ORDER BY when TOP is part of the SELECT clause and needs the result of the ORDER BY first.
Without rewriting a lot of what's already been written before:
https://learn.microsoft.com/en-us/sql/t-sql/queries/select-transact-sql
Short version: Imagine that SQL Server creates virtual tables during query execution. Those tables and their values are passed, step-by-step, through a logical process that determines your end result. The goal is to "fetch" the minimum number of rows from the beginning, and thereby filter out as few rows as possible. After all, why "fetch" 100,000 rows if you only want to see 2?
In the case of a TOP clause, you're only going to see those TOP x rows, but that doesn't mean that they are the only rows that SQL Server checked during the query execution.
On the contrary - if you're looking for the TOP x rows by some column value, then clearly SQL Server needs to make sure that it first analyzes the values for that column, orders them accordingly, and can only then present you with the TOP x rows. This is why having proper indexes can make such a difference when executing these sorts of queries.
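The effect of an index on this kind of query can be observed in a query plan. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, since it runs anywhere; SQL Server has its own plan output and its own optimizer, so treat this only as an illustration of the idea. The table, the index name idx_col1 and the data are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(i % 97, str(i)) for i in range(1000)],
)

query = "SELECT col1, col2 FROM mytable ORDER BY col1 LIMIT 1"

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Without an index, SQLite has to sort in a temporary structure.
plan_without_index = plan(query)

# With an index on col1, rows can be read in col1 order directly.
conn.execute("CREATE INDEX idx_col1 ON mytable (col1)")
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

Before the index exists, the plan includes a temporary B-tree for the ORDER BY; afterwards, the plan walks the index instead, so the first row in index order is already the answer.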
This is very different from the WHERE clause, which can happen earlier on, because a value either equals X or it doesn't; so when scanning a table, SQL Server can know for every single row whether or not that row should be included in the final result set. With a TOP clause, this is not necessarily the case. Halfway through the query process, it doesn't necessarily know if there are more unread rows that should be included in that TOP or not - it can only know once the rows have been selected in accordance with all previous conditions, then ordered by your ORDER BY clause. Finally, it knows which rows should be in those TOP x rows you asked for.
Notice that in Microsoft's documentation, they explicitly state that the logical processing order can change from query to query, especially in uncommon cases (e.g., a VIEW that uses CONVERT(), or depending upon the indexes for a given table).

Select without order by

It is my understanding that select is not guaranteed to always return the same result.
Following query is not guaranteed to return the same result every time:
select * from myTable offset 10000 limit 100
My question is: if myTable is not changed between executions of the select (no deletions or inserts), can I rely on it returning the same result set every time?
Or to put it in another way if my database is locked for changes can I rely on select returning the same result?
I am using postgresql.
Tables and result sets (without order by) are simply not ordered. It really is that simple.
In some databases, under some circumstances, the order will be consistent. However, you should never depend on this. Subsequent releases, for instance, might invalidate the query.
For me, I think the simplest way to understand this is by thinking of parallel processing. When you execute a query, different threads might go out and start to fetch data; which values are returned first depends on non-reproducible factors.
Another way to think of it is to consider a page cache that already has pages in memory -- probably from the end of the table. The SQL engine could read the pages in any order (although in practice this doesn't really happen).
Or, some other query might have a row or page lock, so that page gets skipped when reading the records.
So, just accept that unordered means unordered. Add an order by if you want data in a particular order. If you use a clustered index key, then there is basically no performance hit.
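For the paging query in the question, that means ordering by a unique key before applying OFFSET and LIMIT. A minimal sketch follows, using SQLite via Python's sqlite3 module instead of PostgreSQL so it can run standalone. The id and payload columns and the fetch_page helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 251)],
)

def fetch_page(offset, limit=100):
    # Ordering by a unique key makes every page well defined.
    sql = "SELECT id, payload FROM myTable ORDER BY id LIMIT ? OFFSET ?"
    return conn.execute(sql, (limit, offset)).fetchall()

pages = [fetch_page(offset) for offset in (0, 100, 200)]
seen_ids = [row[0] for page in pages for row in page]

print(len(seen_ids), seen_ids[:3], seen_ids[-3:])
```

With the ORDER BY on a unique column, the pages never overlap and never skip rows, and repeating a page request returns the same rows as long as the data is unchanged.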

Select Distinct without sorting

I used a SELECT DISTINCT query, which returned sorted data. Is there any way to get the data back unsorted?
I'll try to elaborate a bit as to what's going on and why... though I agree with @vic's comment to the question...
Without explicitly stating an order (via an order by clause) there is absolutely no guarantee of any order in the result set.
Practically speaking, many queries will return a consistent order based on the query plan and how the data is actually stored and accessed... DO NOT RELY ON THIS!
Specifically, for a distinct query, the SQL engine will often sort the data so that it can be sure to remove any duplicates.
In short, if the order of the result set matters (even if the desired order is "random") you must ALWAYS explicitly state it. That said, from a purely set-based-math/sql standpoint, the order of the result shouldn't matter.
Put this at the end of your query. This will effectively randomize the results which then will appear to you non-sorted ;)
ORDER BY Rnd([ID]);
Replace the ID with primary key of the table. In Access SQL it is possible to call certain VB Functions directly. In this case the Rnd function can be called in a query and fed a seed value from the data being sorted.
I think sorting may have something to do with the way DISTINCT is determined.
The easiest way to return distinct values is to sort the selection set returned by processing the SQL predicate and then return only the rows where the DISTINCT columns change value from the prior row.
In short, DISTINCT is commonly implemented with a sort in which duplicate rows are dropped (hashing is another common strategy).
That said, there is no guarantee that rows are returned to you in any particular order unless you explicitly include an ORDER BY clause.
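The two strategies can be sketched in a few lines of Python. This is an illustration of the general idea only, not of any specific engine; the function names and the sample values are made up.

```python
from itertools import groupby

values = ["pear", "apple", "pear", "fig", "apple"]

def distinct_by_sort(items):
    """Sort, then keep one item per run of equal neighbours."""
    return [key for key, _ in groupby(sorted(items))]

def distinct_by_hash(items):
    """Remember items already seen in a hash table; no sort involved."""
    return list(dict.fromkeys(items))

sorted_distinct = distinct_by_sort(values)
hashed_distinct = distinct_by_hash(values)

print(sorted_distinct)
print(hashed_distinct)
```

Both return the same set of values, but only the sort-based version happens to come out sorted. Which strategy a database picks is up to its optimizer, which is why the output order of DISTINCT should not be relied on.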

Oracle SQL returns rows in arbitrary fashion when no "order by" clause is used

Maybe someone can explain this to me, but when querying a data table from Oracle, where multiple records exist for a key (say a customer ID), the record that appears first for that customer can vary if there is no explicit "order by" clause enforcing the order by, say, an alternate field such as a transaction type. So running the same query on the same table could yield a different record ordering than it did 10 minutes ago.
E.g., one run could yield:
Cust_ID, Transaction_Type
123 A
123 B
Unless an "order by Transaction_Type" clause is used, Oracle could arbitrarily return the following result the next time the query is run:
Cust_ID, Transaction_Type
123 B
123 A
I guess I was under the impression that there was a database default ordering of rows in Oracle which (perhaps) reflected the physical ordering on the disk medium. In other words, an arbitrary order that is immutable and would guarantee the same result when a query is rerun.
Does this have to do with the optimizer and how it decides where to most efficiently retrieve the data?
Of course the best practice from a programming perspective is to force whatever ordering is required, I was just a little unsettled by this behavior.
The order of rows returned to the application from a SELECT statement is COMPLETELY ARBITRARY unless otherwise specified. If you want, need, or expect rows to return in a certain order, it is the user's responsibility to specify such an order.
(Caveat: Some versions of Oracle would implicitly sort data in ascending order if certain operations were used, such as DISTINCT, UNION, MINUS, INTERSECT, or GROUP BY. However, as Oracle has implemented hash sorting, the nature of the sort of the data can vary, and lots of SQL relying on that feature broke.)
There is no default ordering, ever. If you don't specify ORDER BY, you can get the same result the first 10000 times, then it can change.
Note that this is also true even with ORDER BY for equal values. For example:
Col1 Col2
1 1
2 1
3 2
4 2
If you use ORDER BY Col2, you still don't know if row 1 or 2 will come first.
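The usual remedy is to add a tie-breaker column so that the sort key is unique. Here is a small sketch with the four rows above, run against SQLite through Python's sqlite3 module (the table name t is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Col1 INTEGER, Col2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(4, 2), (2, 1), (3, 2), (1, 1)])

# Col2 alone leaves the order within each group of equal values open.
ambiguous = conn.execute("SELECT Col1, Col2 FROM t ORDER BY Col2").fetchall()

# Adding Col1 (unique here) as a tie-breaker pins down a single order.
deterministic = conn.execute(
    "SELECT Col1, Col2 FROM t ORDER BY Col2, Col1"
).fetchall()

print(ambiguous)
print(deterministic)
```

The first query only promises that the Col2 values come out as 1, 1, 2, 2; the second one fully determines the row order.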
Just imagine the rows in a table like balls in a basket. Do the balls have an order?
I don't think there is any DBMS that guarantees an order if ORDER BY is not specified.
Some might always return the rows in the order they were inserted, but that is an implementation side effect.
Some execution plans might cause the result set to be ordered even without an ORDER BY, but again this is an implementation side-effect that you should not rely on.
If an ORDER BY clause is not present the database (not just Oracle - any relational database) is free to return rows in whatever order it happens to find them. This will vary depending on the query plan chosen by the optimizer.
If the order in which the rows are returned matters you must use an ORDER BY clause. You may sometimes get lucky and the rows will come back in the order you want them to be even without an ORDER BY, but there is no guarantee that A) you will get lucky on other queries, and B) the order in which the rows are returned tomorrow will be the same as the order in which they're returned today.
In addition, updates to the database product may change the behavior of queries. We had to scramble a bit when doing a major version upgrade last year when we found that Oracle 10 returned GROUP BY results in a different order than did Oracle 9. Reason - no ORDER BY clause.
ORDER BY - when the order of the returned data really matters.
The simple answer is that the SQL standard says that there is no default order for queries that do not have an ORDER BY statement, so you should never assume one.
The real reason would probably relate to the hashes assigned to each row as it is pulled into the record set. There is no reason to assume consistent hashing.
If you don't use ORDER BY, the order is arbitrary; however, it depends on physical storage and memory aspects.
So if you repeat the same query hundreds of times in 10 minutes, you will get almost the same order every time, because probably nothing changes.
Things that could change the "no-order order" are:
the execution plan, if it changes (as you have pointed out)
inserts and deletes on the tables involved in the query
other things, such as which rows are present in memory (other queries on other tables could influence that)
When you get into parallel data retrieval I/O isn't it possible to get different sequences on different runs, even with no change to the stored data?
That is, in a multiprocessing environment the order of completion of parallel threads is undefined and can vary with what else is happening on the same shared processor.
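The same effect is easy to reproduce outside a database. The following Python sketch uses a thread pool as a loose analogy for parallel reads; read_chunk is a hypothetical stand-in for reading one block of a table, not an Oracle API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def read_chunk(chunk_id):
    """Hypothetical stand-in for reading one block of a table."""
    return [(chunk_id, n) for n in range(3)]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(read_chunk, chunk_id) for chunk_id in range(4)]
    # as_completed yields futures in completion order, not submission order.
    arrival_order = [row for f in as_completed(futures) for row in f.result()]

# The equivalent of ORDER BY: impose the order explicitly afterwards.
ordered = sorted(arrival_order)

print(arrival_order)
print(ordered)
```

The arrival order may differ from run to run even though the data never changes; only the explicitly sorted list is stable.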
As I'm new to the Oracle database engine, I noticed this behavior in my SELECT statements that have no ORDER BY.
I've been using Microsoft SQL Server for years now. In my experience, SQL Server usually returns data in the order of the table's clustered index, which by default is the primary key index, because rows are stored in clustered index order.
So a select without ORDER BY in SQL Server often comes back ordered by primary key value, although SQL Server does not guarantee this either.
ORDER BY can cause serious performance overhead, so you may not want to use it unless an inconsistent result order is a problem for you.
I came to the conclusion that in ALL my Oracle queries I must use ORDER BY, or I will end up with an unpredictable order that will greatly affect my end-user reports.

Ensuring result order when joining, without using an order_by

I've got a mysql plugin that will return a result set in a specified order. Part of the result set includes a foreign key, and I'd like join on that key, while ensuring the order remains the same.
If I do something like:
select f.id,
f.title
from sphinx s
inner join foo f on s.id = f.id
where query='test;filter=type,2;sort=attr_asc:stitle';
It looks like I'm getting my results back in the order that sphinx returns them. Is this a quirk of mysql, or am I assured that a join won't change the order?
If you need a guaranteed order in the results of a query, use ORDER BY. Anything else is wishful thinking.
To give some insight on this, many databases divide execution steps in a way that can vary depending on the execution plan of the query, the amount of available CPU, and the kinds of optimizations the database can infer are safe. If a query is run in parallel on multiple threads the results can vary. If the explain plan changes, the results can vary. If multiple queries are running simultaneously, the results can vary. If some data is cached in memory, the results can vary.
If you need to guarantee an order, use ORDER BY.
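One way to do that for a join is to carry the desired position as a column and sort on it explicitly. The sketch below uses SQLite via Python's sqlite3 module; the hits table and its sort_pos column are hypothetical stand-ins for whatever ranking the search plugin exposes, and are not part of the Sphinx plugin's actual interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical search results: matching ids plus their position in the ranking.
conn.execute("CREATE TABLE hits (id INTEGER, sort_pos INTEGER)")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO hits VALUES (?, ?)", [(30, 1), (10, 2), (20, 3)])
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [(10, "ten"), (20, "twenty"), (30, "thirty")],
)

rows = conn.execute(
    """
    SELECT f.id, f.title
    FROM hits AS h
    INNER JOIN foo AS f ON h.id = f.id
    ORDER BY h.sort_pos
    """
).fetchall()

print(rows)
```

Because the order is stated in the query, it no longer matters which table the optimizer chooses to drive the join.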
I don't believe that SQL guarantees which table drives the ultimate sort order. Having said that, I would be very surprised if MySQL rewrote your query in such a way that the order changes.
SQL makes no guarantees about the result set order of a SELECT, which includes joins.
You cannot do that, SQL does not guarantee the order after such operation.
There is no specific order guaranteed unless you specify an ORDER BY statement.
Since you mentioned that you were using a plugin that returns result sets in a specified order, I'm assuming that that plugin generates SQL that will add the ORDER BY statement.
If you do joins, one thing to look out for is the column names of the tables you're joining. If they're named the same, your query might break or order by a different column than intended.