Will row_number() always break ties in the same way? - sql

Will the function row_number() always sort the same data in the same way?

No. Ordering in SQL is unstable, meaning that the original sort order is not preserved. There is no guarantee that an analytic function or order by will return the results in the same order for the same key values.
You can always add a unique id as the last key in the sort to make it reproducible.
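A minimal sketch of the tie-breaker idea, using Python's built-in sqlite3 module so it can be run anywhere. The `scores` table, its columns, and the data are hypothetical, and window functions are assumed to be available (SQLite 3.25 or later). The same ORDER BY shape should carry over to other databases.

```python
import sqlite3

# Hypothetical table and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (id INTEGER PRIMARY KEY, player TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO scores (id, player, score) VALUES (?, ?, ?)",
    [(1, "ann", 10), (2, "bob", 10), (3, "cat", 7), (4, "dan", 10)],
)

# score alone has ties (three rows share 10). id is unique, so adding it
# as the last sort key leaves exactly one valid ordering.
rows = conn.execute(
    """
    SELECT id, player, score,
           ROW_NUMBER() OVER (ORDER BY score DESC, id) AS rn
    FROM scores
    ORDER BY rn
    """
).fetchall()

for row in rows:
    print(row)
```

With `ORDER BY score DESC` alone, any permutation of the three tied rows would be a legal result; the trailing `id` removes that freedom.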
EDIT:
Note: the non-reproducibility of order by is part of the SQL standard. Oracle documentation does not specify otherwise. And, in general, ordering is usually not stable in databases (for equivalent key values). I would expect row_number() to behave the same way.
If you need things in a particular order, you can add rowid to the order by clause (see here). In fact, rowid may solve your problem without row_number().

Related

Left Join is sorting output data strangely [duplicate]

What is the default order of a query when no ORDER BY is used?
There is no such order present. Taken from http://forums.mysql.com/read.php?21,239471,239688#msg-239688
Do not depend on order when ORDER BY is missing.
Always specify ORDER BY if you want a particular order -- in some situations the engine can eliminate the ORDER BY because of how it does some other step.
GROUP BY forces ORDER BY. (This is a violation of the standard. It can be avoided by using ORDER BY NULL.)
SELECT * FROM tbl -- this will do a "table scan". If the table has never had any DELETEs/REPLACEs/UPDATEs, the records will happen to be in the insertion order, hence what you observed.
If you had done the same statement with an InnoDB table, they would have been delivered in PRIMARY KEY order, not INSERT order. Again, this is an artifact of the underlying implementation, not something to depend on.
There's none. Depending on what you query and how your query was optimised, you can get any order. There's even no guarantee that two queries which look the same will return results in the same order: if you don't specify it, you cannot rely on it.
I've found SQL Server to be almost random in its default order (depending on age and complexity of the data), which is good as it forces you to specify all ordering.
(I vaguely remember Oracle being similar to SQL Server in this respect.)
MySQL by default seems to order by the record structure on disk (which can include out-of-sequence entries due to deletions and optimisations), but it often initially fools developers into not bothering to use ORDER BY clauses because the data appears to default to primary-key ordering, which is not the case!
I was surprised to discover today that MySQL 5.6 and 4.1 implicitly sub-order records in opposite directions when they are sorted on a column with limited resolution. Some of my results have identical sort values, so their overall order is unpredictable. In my case the query was sorted DESC by a datetime column, and some of the entries fell in the same second, so the sort could not order them relative to each other. On MySQL 5.6 they come back in one order (the order of insertion), but on 4.1 they come back reversed! This led to a very annoying deployment bug.
I haven't found documentation on this change, but I did find notes on implicit GROUP BY ordering in MySQL:
By default, MySQL sorts all GROUP BY col1, col2, ... queries as if you specified ORDER BY col1, col2, ... in the query as well.
However:
Relying on implicit GROUP BY sorting in MySQL 5.5 is deprecated. To achieve a specific sort order of grouped results, it is preferable to use an explicit ORDER BY clause.
So in agreement with the other answers - never rely on default or implicit ordering in any database.
The default ordering will depend on indexes used in the query and in what order they are used. It can change as the data/statistics change and the optimizer chooses different plans.
If you want the data in a specific order, use ORDER BY
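The datetime case described above can be sketched as follows, again with Python's sqlite3 module. The `events` table and its rows are hypothetical. Three rows share the same second, so the sort on `created_at` alone cannot order them; the unique `id` as a second key does.

```python
import sqlite3

# Hypothetical table and data: three events share one timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, label TEXT)"
)
conn.executemany(
    "INSERT INTO events (id, created_at, label) VALUES (?, ?, ?)",
    [
        (1, "2014-01-01 12:00:00", "a"),
        (2, "2014-01-01 12:00:01", "b"),
        (3, "2014-01-01 12:00:01", "c"),
        (4, "2014-01-01 12:00:01", "d"),
    ],
)

# created_at DESC leaves rows 2, 3 and 4 tied; id DESC settles the tie
# explicitly instead of leaving it to the engine or its version.
ordered = conn.execute(
    "SELECT id, created_at FROM events ORDER BY created_at DESC, id DESC"
).fetchall()
ordered_ids = [row[0] for row in ordered]
print(ordered_ids)
```

Whether the tie-breaker should be ascending or descending is an application choice; the point is only that it is written down in the query.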

Select Distinct without sorting

I used a SELECT DISTINCT query, which returned sorted data. Is there any way to get the data back without it being sorted?
I'll try to elaborate a bit as to what's going on and why... though I agree with @vic's comment to the question...
Without explicitly stating an order (via an order by clause) there is absolutely no guarantee of any order in the result set.
Practically speaking, many queries will return a consistent order based on the query plan and how the data is actually stored and accessed... DO NOT RELY ON THIS!
Specifically, for a distinct query, the SQL engine will often sort the data so that it can be sure to remove any duplicates.
In short, if the order of the result set matters (even if the desired order is "random") you must ALWAYS explicitly state it. That said, from a purely set-based-math/sql standpoint, the order of the result shouldn't matter.
Put this at the end of your query. This will effectively randomize the results which then will appear to you non-sorted ;)
ORDER BY Rnd([ID]);
Replace the ID with primary key of the table. In Access SQL it is possible to call certain VB Functions directly. In this case the Rnd function can be called in a query and fed a seed value from the data being sorted.
I think sorting may have something to do with the way DISTINCT is determined.
The easiest way to return distinct values is to sort the selection set returned by processing the SQL predicate and then return only the rows where the DISTINCT columns change value from the prior row.
In short, DISTINCT is commonly implemented with a sort in which duplicate rows are dropped.
That said, there is no guarantee that rows are returned to you in any particular order unless you explicitly include an ORDER BY clause.
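If "not sorted" really means "in the order the values first appeared", that order has to be stated explicitly too. A minimal sketch with Python's sqlite3 module, assuming a hypothetical `items` table whose `id` column increases with insertion (the original question does not say such a column exists):

```python
import sqlite3

# Hypothetical table: id is assumed to reflect insertion order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(1, "pear"), (2, "apple"), (3, "pear"), (4, "fig"), (5, "apple")],
)

# GROUP BY removes duplicates; ORDER BY MIN(id) ranks each distinct name
# by the first row in which it appeared.
first_seen = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM items GROUP BY name ORDER BY MIN(id)"
    )
]
print(first_seen)
```

This trades DISTINCT for GROUP BY so that an aggregate over `id` can be used as the sort key.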

What's the order of rows when selecting data from a database?

Suppose I have a table:
CREATE TABLE [tab] (
[name] varchar,
[order_by] int
)
There are 10 rows in the table, and all rows have same value for order_by (Let's say it's 0)
If I then issue following SQL:
select * from [tab] order by [order_by]
What's the order of the rows? What factor decides the row order in this case?
It's not defined. The database can spit them out in any order it chooses, and it can even change the order between queries if it feels like it (it probably won't do this, but you shouldn't rely on the order being consistent).
If the columns that you order by have no variation, then there is no guaranteed order.
Any time you want a defined order, you need a good ORDER BY clause. I can't even imagine why anyone would use an ORDER BY clause if there is no variation in the column being ordered, or why you would even have a column that only ever has one value.
There is no order in this case, since you did not specify an order.
My experience in real life is that when you don't specify any order (or specify one that doesn't actually result in sorting, as in this case) rows generally come out in the order they were added to the table. However, that is in no way guaranteed and I would never rely on it.
Generally speaking you can't depend on the order of records coming out of a table unless you specify an ORDER BY clause, and any records with the same value(s) for the fields in an ORDER BY clause will not be in any guaranteed order relative to each other.
That being said, there are ways to make an educated guess as to the order of the records that will come out. Usually they will be emitted in the order of the table's clustered index. This is usually the primary key but not always. If there is no clustered index, then it will usually be insert order. Note that you can't depend on either of these things. SQL Server might be doing some optimizations that will change the order.
Typically your table has an identity column with a PKey. If that's the case then that would be the order in SQL Server 2008. Unfortunately, I've experienced older versions of SQL Server tending to give inconsistent results depending on whether you're connecting via OLEDB or ODBC.
If 'name' was a primary key, then the index would have a specified order (either ASC or DESC). And that's the order that I think you would see in this case. At least that's the behavior I've observed in SQL 2008.
If 'name' had no index then I don't believe the order would be predictable at all.
EDIT:
So even in the situation I described it looks like the order will not necessarily be reliable. There's a better explanation here: SQL best practice to deal with default sort order
I suppose the moral of the story is to specify an order if the order is important to you.
The 'natural' order of rows is the order defined by the CLUSTERED index, and that is the order that rows are generally returned in if you don't specify an order. However, enterprise edition merry-go-round scans mean that you won't always get them in that order, and as a few people have said, you should never rely on that.
If you specify order, and the key you are ordering on is equal for a bunch of rows, then order is not guaranteed at all.
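One way to check whether a sort key leaves the order undefined is to look for duplicate key values. The sketch below uses Python's sqlite3 module and recreates the question's `tab` table with ten rows that all have `order_by = 0`; the row names are invented for illustration.

```python
import sqlite3

# The question's table, filled with ten hypothetical rows that all tie.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (name TEXT, order_by INTEGER)")
conn.executemany(
    "INSERT INTO tab (name, order_by) VALUES (?, ?)",
    [("row%d" % i, 0) for i in range(10)],
)

# Any key value that occurs more than once marks a group of rows whose
# relative order ORDER BY order_by does not define.
tied = conn.execute(
    "SELECT order_by, COUNT(*) FROM tab GROUP BY order_by HAVING COUNT(*) > 1"
).fetchall()
print(tied)

# Adding a second key that is unique here (name) makes the order defined.
deterministic = conn.execute(
    "SELECT name FROM tab ORDER BY order_by, name"
).fetchall()
print([row[0] for row in deterministic])
```

If the tie check returns no rows, the existing ORDER BY already determines a single order; otherwise a further key is needed.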

Does SELECT DISTINCT imply a sort of the results

Does including DISTINCT in a SELECT query imply that the resulting set should be sorted?
I don't think it does, but I'm looking for an authoritative answer (web link).
I've got a query like this:
Select Distinct foo
From Bar
In Oracle, the results are distinct but are not in sorted order. In Jet/MS-Access there seems to be some extra work being done to ensure that the results are sorted. I'm assuming that Oracle is following the spec in this case and MS Access is going beyond it.
Also, is there a way I can give the table a hint that it should be sorting on foo (unless otherwise specified)?
From the SQL92 specification:
If DISTINCT is specified, then let TXA be the result of eliminating redundant duplicate values from TX. Otherwise, let TXA be TX.
...
4) If an <order by clause> is not specified, then the ordering of the rows of Q is implementation-dependent.
Ultimately the real answer is that DISTINCT and ORDER BY are two separate parts of the SQL statement; if you don't have an ORDER BY clause, the results by definition will not be specifically ordered.
No. There are a number of circumstances in which a DISTINCT in Oracle does not imply a sort, the most important of which is the hashing algorithm used in 10g+ for both group by and distinct operations.
Always specify ORDER BY if you want an ordered result set, even in 9i and below.
There is no "authoritative" answer link, since this is something that no SQL server guarantees.
You will often see results in order when using distinct as a side effect of the best methods of finding those results. However, any number of other things can mix up the results, and some server may hand back results in such a way as to not give them sorted even if it had to sort to get the results.
Bottom line: if your server doesn't guarantee something you shouldn't count on it.
Not to my knowledge, no. The only reason I can think of is that SQL Server would internally sort the data in order to detect and filter out duplicates, and thus return it in a "pre-sorted" manner. But I wouldn't rely on that "side effect" :-)
No, it is not implying a sort. In my experience, it sorts by the known index, which may happen to be foo.
Why be subtle? Why not specify Select Distinct foo from Bar Order by foo?
On at least one server I've used (probably either Oracle or SQL Server, about six years ago), SELECT DISTINCT was rejected if you didn't have an ORDER BY clause. It was accepted on the "other" server (Oracle or SQL Server). Your mileage may vary.
No, the results are not sorted. If you want to give it a 'hint', you can certainly supply an ORDER BY:
select distinct foo
from bar
order by foo
But keep in mind that you might want to sort on more than just alphabetically. Instead you might want to sort on criteria on other fields. See:
http://weblogs.sqlteam.com/jeffs/archive/2007/12/13/select-distinct-order-by-error.aspx
As the answers mostly say, DISTINCT does not mandate a sort - only ORDER BY mandates that. However, one standard way of achieving DISTINCT results is to sort; the other is to hash the values (which tends to lead to semi-random sequencing). Relying on the sort effect of DISTINCT would be foolish.
In my case (SQL Server), as an example, I had a list of countries with a numerical value X assigned to each. When I did a select distinct * from Table order by X, the result was ordered by X, but the countries in the result set also came back ordered, which I had not asked for.
From my experience, I'll say that distinct does imply an implicit sort.
Yes. Oracle does use a sort to calculate a distinct. You can see that if you look at the explain plan. The fact that it did a sort for that calculation does not in any way imply that the result set will be sorted. If you want the result set sorted, you are required to use the ORDER BY clause.
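To make the split between the two clauses concrete, here is a small sketch using Python's sqlite3 module and the question's `bar` table with invented values. The result of the plain DISTINCT query is compared as a set, because only its contents are guaranteed; the ORDER BY query is compared as a list, because its order is guaranteed.

```python
import sqlite3

# The question's table, with hypothetical values containing duplicates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo TEXT)")
conn.executemany(
    "INSERT INTO bar (foo) VALUES (?)",
    [("c",), ("a",), ("c",), ("b",), ("a",)],
)

# DISTINCT alone: duplicates are gone, but the row order is whatever the
# engine happened to produce, so treat the result as an unordered set.
distinct_only = {row[0] for row in conn.execute("SELECT DISTINCT foo FROM bar")}

# DISTINCT plus ORDER BY: the order is now part of the query's contract.
distinct_sorted = [
    row[0] for row in conn.execute("SELECT DISTINCT foo FROM bar ORDER BY foo")
]
print(distinct_only == {"a", "b", "c"}, distinct_sorted)
```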