What is the row order when selecting data from a database? - sql

Suppose I have a table:
CREATE TABLE [tab] (
[name] varchar,
[order_by] int
)
There are 10 rows in the table, and all rows have the same value for order_by (let's say it's 0).
If I then issue the following SQL:
select * from [tab] order by [order_by]
What's the order of the rows? What factor decides the row order in this case?

It's not defined. The database can spit them out in any order it chooses, and it can even change the order between queries if it feels like it (it probably won't do this, but you shouldn't rely on the order being consistent).

If the columns that you order by have no variation, then there is no guaranteed order.
Any time you want a defined order, you need a good ORDER BY clause. I can't even imagine why anyone would use an ORDER BY clause if there is no variation in the column being ordered, or why you would even have a column that only ever has one value.

There is no order in this case, since you did not specify an order.

My experience in real life is that when you don't specify any order (or specify one that doesn't actually result in sorting, as in this case) rows generally come out in the order they were added to the table. However, that is in no way guaranteed and I would never rely on it.

Generally speaking, you can't depend on the order of records coming out of a table unless you specify an ORDER BY clause, and any records with the same value(s) for the fields in an ORDER BY clause will not be sorted relative to each other.
That being said, there are ways to make an educated guess as to the order of the records that will come out. Usually they will be emitted in the order of the table's clustered index. This is usually the primary key, but not always. If there is no clustered index, then it will usually be insert order. Note that you can't depend on either of these things. SQL Server might be doing some optimizations that will change the order.

Typically your table has an identity column with a PKey. If that's the case then that would be the order in SQL Server 2008. Unfortunately, I've experienced older versions of SQL Server tending to give inconsistent results depending on whether you're connecting via OLEDB or ODBC.

If 'name' was a primary key, then the index would have a specified order (either ASC or DESC). And that's the order that I think you would see in this case. At least that's the behavior I've observed in SQL 2008.
If 'name' had no index then I don't believe the order would be predictable at all.
EDIT:
So even in the situation I described it looks like the order will not necessarily be reliable. There's a better explanation here: SQL best practice to deal with default sort order
I suppose the moral of the story is to specify an order if the order is important to you.

The 'natural' order of rows is the order defined by the CLUSTERED index, and that is the order that rows are generally returned in if you don't specify an order. However, Enterprise Edition merry-go-round scans mean that you won't always get them in that order, and as a few people have said, you should never rely on that.
If you specify order, and the key you are ordering on is equal for a bunch of rows, then order is not guaranteed at all.
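To make the order defined for the table in the question, the usual fix is to add a second sort key that breaks the ties. Here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server (an assumption made only so the example is runnable anywhere); the sample names are made up, and it assumes name is unique.

```python
import sqlite3

# In-memory database standing in for the table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (name TEXT, order_by INTEGER)")

# Hypothetical sample data: every row has the same order_by value.
names = ["delta", "alpha", "charlie", "bravo"]
conn.executemany(
    "INSERT INTO tab (name, order_by) VALUES (?, 0)",
    [(n,) for n in names],
)

# order_by alone cannot distinguish the rows, so name is added as a
# tie-breaker. With a unique second key the result order is fully defined.
rows = conn.execute(
    "SELECT name, order_by FROM tab ORDER BY order_by, name"
).fetchall()
ordered_names = [r[0] for r in rows]
print(ordered_names)
conn.close()
```

Without the second key, any permutation of the four rows would be a valid result.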

Related

Left Join is sorting output data strangely [duplicate]

What is the default order of a query when no ORDER BY is used?
There is no such order present. Taken from http://forums.mysql.com/read.php?21,239471,239688#msg-239688
Do not depend on order when ORDER BY is missing.
Always specify ORDER BY if you want a particular order -- in some situations the engine can eliminate the ORDER BY because of how it does some other step.
GROUP BY forces ORDER BY. (This is a violation of the standard. It can be avoided by using ORDER BY NULL.)
SELECT * FROM tbl -- this will do a "table scan". If the table has never had any DELETEs/REPLACEs/UPDATEs, the records will happen to be in the insertion order, hence what you observed.
If you had done the same statement with an InnoDB table, they would have been delivered in PRIMARY KEY order, not INSERT order. Again, this is an artifact of the underlying implementation, not something to depend on.
There's none. Depending on what you query and how your query was optimised, you can get any order. There's even no guarantee that two queries which look the same will return results in the same order: if you don't specify it, you cannot rely on it.
I've found SQL Server to be almost random in its default order (depending on age and complexity of the data), which is good as it forces you to specify all ordering.
(I vaguely remember Oracle being similar to SQL Server in this respect.)
MySQL by default seems to order by the record structure on disk (which can include out-of-sequence entries due to deletions and optimisations), but it often initially fools developers into not bothering to use ORDER BY clauses because the data appears to default to primary-key ordering, which is not the case!
I was surprised to discover today that MySQL 5.6 and 4.1 implicitly sub-order, in opposite directions, records that have been sorted on a column with limited resolution. Some of my results have identical sort values, so their overall order is unpredictable. In my case the query was sorted DESC by a datetime column and some of the entries fell in the same second, so they couldn't be explicitly ordered. On MySQL 5.6 they come back in one order (the order of insertion), but on 4.1 they come back in reverse! This led to a very annoying deployment bug.
I haven't found documentation on this change, but I did find notes on implicit group order in MySQL:
By default, MySQL sorts all GROUP BY col1, col2, ... queries as if you specified ORDER BY col1, col2, ... in the query as well.
However:
Relying on implicit GROUP BY sorting in MySQL 5.5 is deprecated. To achieve a specific sort order of grouped results, it is preferable to use an explicit ORDER BY clause.
So in agreement with the other answers - never rely on default or implicit ordering in any database.
The default ordering will depend on indexes used in the query and in what order they are used. It can change as the data/statistics change and the optimizer chooses different plans.
If you want the data in a specific order, use ORDER BY
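As a small illustration of not leaning on implicit GROUP BY sorting, here is a sketch that states the order explicitly. It uses Python's bundled sqlite3 module instead of MySQL (an assumption for runnability), and the orders table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("zoe", 5), ("adam", 7), ("zoe", 1), ("mia", 2), ("adam", 3)],
)

# GROUP BY only defines the groups. The explicit ORDER BY is what
# defines the order in which the grouped rows come back.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
conn.close()
```

The query reads the same on any engine, and it keeps working if the engine stops sorting groups as a side effect.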

Will row_number() always break ties in the same way?

Will the function row_number() always sort the same data in the same way?
No. Ordering in SQL is unstable, meaning that the original sort order is not preserved. There is no guarantee that an analytic function or order by will return the results in the same order for the same key values.
You can always add a unique id as the last key in the sort to make it reproducible.
EDIT:
Note: the non-reproducibility of ORDER BY is part of the SQL standard. Oracle documentation does not specify otherwise. And, in general, ordering is usually not stable in databases (for equivalent key values). I would expect row_number() to behave the same way.
If you need things in a particular order, you can add rowid to the order by clause (see here). In fact, rowid may solve your problem without row_number().
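Here is a rough sketch of the "unique id as the last sort key" idea applied to row_number(). It runs on Python's built-in sqlite3 module rather than Oracle (an assumption for runnability; window functions need SQLite 3.25 or later), and the events table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO events (id, created) VALUES (?, ?)",
    [
        (3, "2020-01-01 10:00:00"),
        (1, "2020-01-01 10:00:00"),  # same second as id 3: a tie on created
        (2, "2020-01-01 09:00:00"),
    ],
)

# created alone leaves ids 1 and 3 tied, so the unique id is appended as
# the last key inside OVER (...). The outer ORDER BY rn fixes the order
# in which the numbered rows are returned.
numbered = conn.execute(
    "SELECT id, ROW_NUMBER() OVER (ORDER BY created DESC, id) AS rn "
    "FROM events ORDER BY rn"
).fetchall()
print(numbered)
conn.close()
```

With only created in the OVER clause, either of the two tied rows could legitimately receive rn = 1.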

How do database servers decide which order to return rows without any "order by" statements?

Kind of a whimsical question, always something I've wondered about and I figure knowing why it does what it does might deepen my understanding a bit.
Let's say I do "SELECT TOP 10 * FROM TableName". In short timeframes, the same 10 rows come back, so it doesn't seem random. They weren't the first or last created. In my massive sample size of...one table, it isn't returning the min or max auto-incrementing primary key value.
I also figure the problem gets more complex when taking joins into account.
My database of choice is MSSQL, but I figure this might be an interesting question regardless of the platform.
If you do not supply an ORDER BY clause on a SELECT statement you will get rows back in arbitrary order.
The actual order is undefined, and depends on which blocks/records are already cached in memory, what order I/O is performed in, when threads in the database server are scheduled to run, and so on.
There's no rhyme or reason to the order and you should never base any expectations on what order rows will be in unless you supply an ORDER BY.
If they're not ordered by the calling query, I believe they're just returned in the order they were read off disk. This may vary because of the types of joins used or the indexes that looked up the values.
You can see this if the table has a clustered index on it (and you're just selecting - a JOIN can re-order things) - a SELECT will return the rows in clustered-index-order, even without an ORDER BY clause.
There is a very detailed explanation with examples here: http://sqlserverpedia.com/blog/sql-server-bloggers/its-the-natural-order-of-things-not/
"How do database servers decide which order to return rows without any “order by” statements?"
They simply do not take any "decision" with respect to ordering. They see the user doesn't care about ordering, and so they don't care either. And thus they simply go out to find the requested rows. The order in which they find them is normally the order in which you get them. That order depends on user-unpredictable things like the chosen physical access paths, ordering of physical records inside the database's physical files, etc. etc.
Don't let yourself be misled by the ordering as you get it, in the case where you didn't explicitly specify an ordering in your query. If you don't specify an ordering in your query, no ordering in the result set is guaranteed, even if in practice results seem to suggest that some ordering appears to be adhered to by the server.

SQL best practice to deal with default sort order

In a lot of SQL code I've read, the developer seems to assume that the default sort order always holds. For example, when building an HTML select list they would just SELECT id, name FROM table without issuing an ORDER BY clause.
From my own experience it seems like a DBMS always orders data using FIFO if no ORDER BY clause is given and there is no index. However, the order is not guaranteed. But I have never seen a DBMS reordering data if there is no change to the table.
Have you ever experienced a dbms selecting data in a non deterministic order if there is no change to the table?
Is it best practice to always put an ORDER BY clause?
There is no default sort order. Even if the table has a clustered index, you are not guaranteed to get the results in that order. You must use an order by clause if you want a specific order.
As the other posters mention, if you don't specify a sort order, the SQL standard says the results can be in whatever order the query processor finds most expedient and efficient.
Let's say you do a simple unordered SELECT for all the rows of a CUSTOMER table, which has no indexes and no primary key. It's quite possible, and even likely, that the query processor will do a straight table scan and produce the rows in the order they were originally inserted (giving you the FIFO behavior you saw).
If you then add an index on the STATE and CITY fields (in that order), and then query for WHERE STATE = 'NY' the query processor may decide it's more efficient to scan the index entries for STATE = 'NY' rather than to do a full table scan. In this case it would probably materialize the rows in STATE, CITY order.
Even this is not certain. For example if the query processor has gathered statistics that show that nearly all the STATE values in your table are 'NY' (maybe because the database is for an Albany-based equipment rental business), it may decide that the table scan is actually cheaper than the index scan, and you'll see FIFO again.
It's a good idea to learn some basics about how your database plans its queries. You can use the EXPLAIN statement to see how your DBMS would execute any given query, and then use this to optimize your query, in some cases by orders of magnitude. This is a fascinating and useful area to learn.
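To see the plan change described above, here is a minimal sketch using Python's built-in sqlite3 module. Note the assumptions: SQLite spells the command EXPLAIN QUERY PLAN (other engines use EXPLAIN or their own tooling), the index name idx_state_city is made up, and the wording of the plan text varies between SQLite versions.

```python
import sqlite3


def plan(conn, sql):
    """Return SQLite's query plan for sql as a single string."""
    # Each plan row has four columns; the last one is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, state TEXT, city TEXT)")
query = "SELECT * FROM customer WHERE state = 'NY'"

# No index yet, so the only access path is a full table scan.
plan_before = plan(conn, query)

# Add the index on STATE and CITY (in that order) and ask again.
conn.execute("CREATE INDEX idx_state_city ON customer (state, city)")
plan_after = plan(conn, query)

print("before:", plan_before)
print("after: ", plan_after)
conn.close()
```

Same query, different access path, and therefore potentially a different row order when there is no ORDER BY.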
If you want the data to come out consistently ordered, yes - you have to use ORDER BY.
Yes. There is no "default order" without an ORDER BY, and there's no guarantee that you'll get the data back in FIFO/LIFO or any other order.
As far as the developers using "SELECT id, name FROM table", they're either inept or they don't care what order anything appears in.
No serious RDBMS guarantees any order unless you specify an explicit ORDER BY.
Anything else is just pure luck or anecdotal - if you want order, you have to specify ORDER BY - no way around that.
If you want the data ordered, the only way to guarantee anything (with every major RDBMS system that I'm aware of, definitely Sql Server and Oracle) is to include an ORDER BY clause. FIFO has absolutely nothing to do with the order data is returned without an ORDER BY clause, and there isn't a concept of any kind of DEFAULT sort order. The so called DEFAULT sort order is basically however the engine gets the data, which could be in literally any order based on indexes, cached data, simultaneous executing queries, load on the server, etc., etc.
This other Stack Overflow thread basically covers the same concept in relation to SQL Server; AlexK blogged a repro to demonstrate the behavior.
Even a simple query like SELECT ... FROM table can return data in various order. I know this to be true in theory, I know this to be true in practice, and I have seen plenty of cases when the order changes between subsequent executions, even when no data change occurs in the table.
A typical example of order changes between executions is when the query is executed using a parallel plan. Since parallel operators return data as the underlying threads produce it, the order of the rows in the result varies between runs. This situation makes even the simple SELECT in your example return wildly different results each time it is run.
In my experience with SQL, most of the time I do not specify an ORDER BY, because the record sets are displayed in a "client-side" grid-type control where dynamic sorting is supported. In this case ordering in SQL is needless, as the data will be sorted client side anyway.
This is also done client side because the same query might be used to display the data in different places in different orders.
Therefore it is only best practice to put in an ORDER BY, when
The order of the data IS important; and
The sorting is more efficient at the DB level.
i.e. if the front-end developer is going to be "re-sorting" it anyway, then there is no point, as it is unlikely to save overall processing time.
Perhaps the writers of those SQL queries you're reading don't care about the order of the data returned. The best practice is to use it where you need to ensure the order of the results returned!
I'm writing this in case someone would like to use this as I did.
Well, I'm getting a satisfactory default sort order, let's say for log tables, by sorting on the index. For example, I'm usually interested in the last rows of a log table (LIFO), so I make DateTime DESC the index order. I also tried, for fun, adding an index on another field (integer) besides the primary key, and it worked.
CREATE TABLE [dbo].[tableA]([DateTime] [datetime] NOT NULL,
CONSTRAINT [PK_tableA]
PRIMARY KEY CLUSTERED ([DateTime] DESC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY]
Or in SSMS ...