Does 'Select' always order by primary key? - sql

A basic simple question for all of you DBA.
When I do a select, is it always guaranteed that my result will be ordered by the primary key, or should I specify it with an 'order by'?
I'm using Oracle as my DB.

No, if you do not use "order by" you are not guaranteed any ordering whatsoever. In fact, you are not guaranteed that the ordering from one query to the next will be the same. Remember that SQL is dealing with data in a set based fashion. Now, one database implementation or another may happen to provide orderings in a certain way but you should never rely on that.

When I do a select, is it always guaranteed that my result will be ordered by the primary key, or should I specify it with an 'order by'?
No, it's by far not guaranteed.
SELECT *
FROM table
most probably will use TABLE SCAN which does not use primary key at all.
You can use a hint:
SELECT /*+ INDEX(pk_index_name) */
*
FROM table
, but even in this case the ordering is not guaranteed: if you use Enterprise Edition, the query may be parallelized.
This is a problem, since ORDER BY cannot be used in a SELECT clause subquery and you cannot write something like this:
SELECT (
SELECT column
FROM table
WHERE rownum = 1
ORDER BY
other_column
)
FROM other_table

No, ordering is never guaranteed unless you use an ORDER BY.
The order that rows are fetched is dependent on the access method (e.g. full table scan, index scan), the physical attributes of the table, the logical location of each row within the table, and other factors. These can all change even if you don't change your query, so in order to guarantee a consistent ordering in your result set, ORDER BY is necessary.

It depends on your DB and also it depends on indexed fields.
For example, in my table Users every user has unique varchar(20) field - login, and primary key - id.
And "Select * from users" returns rowset ordered by login.

If you desire specific ordering then declare it specifically using ORDER BY.
What if the table doesn't have primary key?

If you want your results in a specific order, always specify an order by

Related

Auto Increment selection SQLite

I have a column named id in my SQLite database which is auto-increment, Primary Key, Unique.
Is the result of the following query guaranteed to be the smallest value of id in the database and does this correspond to the "oldest" (as in a FIFO) row to be inserted?
SELECT id FROM table LIMIT 1
The SQLite documentation is quite explicit:
If a SELECT statement that returns more than one row does not have an
ORDER BY clause, the order in which the rows are returned is
undefined. Or, if a SELECT statement does have an ORDER BY clause,
then the list of expressions attached to the ORDER BY determine the
order in which rows are returned to the user.
The LIMIT is applied after an ORDER BY would be, so I don't think it affects the application of this statement.
Hence, if you want the first row, use ORDER BY:
SELECT id
FROM table
ORDER BY id
LIMIT 1;
Note that if id is a primary key, this will add basically no overhead.
I should emphasize that in practice you are probably going to get the smallest id without the ORDER BY. However, it is a really, really bad idea to depend on behavior that directly contradicts the documentation.
Is the result of the following query guaranteed to be the smallest value of id in the database
Yes. However if the table is empty or the id column is NULL, it could also return NULL
and does this correspond to the "oldest" (as in a FIFO) row to be inserted?
No, there's no guarantee of that.

Do clustered index on a column GUARANTEES returning sorted rows according to that column [duplicate]

This question already has an answer here:
Does a SELECT query always return rows in the same order? Table with clustered index
(1 answer)
Closed 8 years ago.
I am unable to get clear cut answers on this contentious question .
MSDN documentation mentions
Clustered
Clustered indexes sort and store the data rows in the table or view
based on their key values. These are the columns included in the
index definition. There can be only one clustered index per table,
because the data rows themselves can be sorted in only one order.
The only time the data rows in a table are stored in sorted order is
when the table contains a clustered index. When a table has a
clustered index, the table is called a clustered table. If a table
has no clustered index, its data rows are stored in an unordered
structure called a heap.
While I see most of the answers
Does a SELECT query always return rows in the same order? Table with clustered index
http://sqlwithmanoj.com/2013/06/02/clustered-index-do-not-guarantee-physically-ordering-or-sorting-of-rows/
answering negative.
What is it ?
Just to be clear. Presumably, you are talking about a simple query such as:
select *
from table t;
First, if all the data on the table fits on a single page and there are no other indexes on the table, it is hard for me to imagine a scenario where the result set is not ordered by the primary key. However, this is because I think the most reasonable query plan would require a full-table scan, not because of any requirement -- documented or otherwise -- in SQL or SQL Server. Without an explicit order by, the ordering in the result set is a consequence of the query plan.
That gets to the heart of the issue. When you are talking about the ordering of the result sets, you are really talking about the query plan. And, the assumption of ordering by the primary key really means that you are assuming that the query uses full-table scan. What is ironic is that people make the assumption, without actually understanding the "why". Furthermore, people have a tendency to generalize from small examples (okay, this is part of the basis of human intelligence). Unfortunately, they see consistently that results sets from simple queries on small tables are always in primary key order and generalize to larger tables. The induction step is incorrect in this example.
What can change this? Off-hand, I think that a full table scan would return the data in primary key order if the following conditions are met:
Single threaded server.
Single file filegroup
No competing indexes
No table partitions
I'm not saying this is always true. It just seems reasonable that under these circumstances such a query would use a full table scan starting at the beginning of the table.
Even on a small table, you can get surprises. Consider:
select NonPrimaryKeyColumn
from table
The query plan would probably decide to use an index on table(NonPrimaryKeyColumn) rather than doing a full table scan. The results would not be ordered by the primary key (unless by accident). I show this example because indexes can be used for a variety of purposes, not just order by or where filtering.
If you use a multi-threaded instance of the database and you have reasonably sized tables, you will quickly learn that results without an order by have no explicit ordering.
And finally, SQL Server has a pretty smart optimizer. I think there is some reluctance to use order by in a query because users think it will automatically do a sort. SQL Server works hard to find the best execution plan for the query. IF it recognizes that the order by is redundant because of the rest of the plan, then the order by will not result in a sort.
And, of course you want to guarantee the ordering of results, you need order by in the outermost query. Even a query like this:
select *
from (select top 100 t.* from t order by col1) t
Does not guarantee that the results are ordered in the final result set. You really need to do:
select *
from (select top 100 t.* from t order by col1) t
order by col1;
to guarantee the results in a particular order. This behavior is documented here.
Without ORDER BY, there is no default sort order even if you have clustered index
in this link there is a good example :
CREATE SCHEMA Data AUTHORIZATION dbo
GO
CREATE TABLE Data.Numbers(Number INT NOT NULL PRIMARY KEY)
GO
DECLARE #ID INT;
SET NOCOUNT ON;
SET #ID = 1;
WHILE #ID < 100000 BEGIN
INSERT INTO Data.Numbers(Number)
SELECT #ID;
SET #ID = #ID+1;
END
CREATE TABLE Data.WideTable(ID INT NOT NULL
CONSTRAINT PK_WideTable PRIMARY KEY,
RandomInt INT NOT NULL,
CHARFiller CHAR(1000))
GO
CREATE VIEW dbo.WrappedRand
AS
SELECT RAND() AS random_value
GO
CREATE ALTER FUNCTION dbo.RandomInt()
RETURNS INT
AS
BEGIN
DECLARE #ret INT;
SET #ret = (SELECT random_value*1000000 FROM dbo.WrappedRand);
RETURN #ret;
END
GO
INSERT INTO Data.WideTable(ID,RandomInt,CHARFiller)
SELECT Number, dbo.RandomInt(), 'asdf'
FROM Data.Numbers
GO
CREATE INDEX WideTable_RandomInt ON Data.WideTable(RandomInt)
GO
SELECT TOP 100 ID FROM Data.WideTable
OUTPUT:
1407
253
9175
6568
4506
1623
581
As you have seen, the optimizer has chosen to use a non-clustered
index to satisfy this SELECT TOP query.
Clearly you cannot assume that your results are ordered unless you
explicitly use ORDER BY clause.
One must specify ORDER BY in the outermost query in order to guarantee rows are returned in a particular order. The SQL Server optimizer will optimize the query and data access to improve performance which may result in rows being returned in a different order. Examples of this are allocation order scans and parallelism. A relational table should always be viewed as an unordered set of rows.
I wish the MSDN documentation were clearer about this "sorting". It is more correct to say that SQL Server b-tree indexes provide ordering by 1) storing adjacent keys in the same page and 2) linking index pages in key order.

Unique sort order for postgres pagination

While trying to implement pagination from server side in postgres, i came across a point that while using limit and offset keywords you have to provide an ORDER BY clause on a unique column probably the primary key.
In my case i am using the UUID generation for Pkeys so I can't rely on a sequential order of increasing keys. ORDER BY pkey DESC - might not result in newer rows on top always.
So i resorted to using Created Date column - timestamp column which should be unique.
But my question comes what if the UI client wants to sort by some other column? in the event that it might not always be a unique column i resort to ORDER BY user_column, created_dt DESC so as to maintain predictable results for postgres pagination.
is this the right approach? i am not sure if i am going the right way. please advise.
I talked about this exact problem on an old blog post (in the context of using an ORM):
One last note about using sorting and paging in conjunction. A query
that implements paging can have odd results if the ORDER BY clause
does not include a field that represents an empirical sequence in the
data; sort order is not guaranteed beyond what is explicitly specified
in the ORDER BY clause in most (maybe all) database engines. An
example: if you have 100 orders that all occurred on the exact same
date, and you ask for the first page of this data sorted by this date,
then ask for the second page of data sorted the same way, it is
entirely possible that you will get some of the data duplicated across
both pages. So depending on the query and the distribution of data
that is “sortable,” it can be a good practice to always include a
unique field (like a primary key) as the final field in a sort clause
if you are implementing paging.
http://psandler.wordpress.com/2009/11/20/dynamic-search-objects-part-5sorting/
The strategy of using a column that uniquely identifies a record as pkey or insertion_date may not be possible in some cases.
I have an application where the user sets up his own grid query then it can simply put any column from multiple tables and perhaps none is a unique identifier.
In a case that can be useful you use rownum. You simply select the rownum and use his sort in over function. It would be something like:
select col1, col2, col3, row_number() over(order by col3) from tableX order by col3
It's important that over(order by *) match with order by *. Thus your paging will have consistent results every time.

SQL-Server-2005: Why are results being returned in a different order with(nolock)

i have a primary key clustered index in col1
why when i run the following statements are the results returned in a different order
select * from table
vs
select * from table with(nolock)
the results are also different with tablock
schema:
col1 int not null
col2 varchar (8000)
Without any ORDER BY no order of results is guaranteed.
Your question is now heavily truncated but the original version mentioned that you saw different order of result when using nolock as well as tablock.
Both of these locking options allow SQL Server to use an allocation order scan rather than reading along the clustered index data pages in logical order (following pointers along the linked list).
That should not be taken as meaning that the order is guaranteed to be in clustered index order without that as the advanced scanning mechanism, or parallelism for example could both change this.
The order of rows is never guaranteed unless you use an ORDER BY.
If you have to have the rows in a specific order there is no other solution that will return the rows in a predictable order.
If you leave out the order by the DBMS is free to return the rows in any order it thinks is most efficient
Sql Server makes no guarantee about the ordering, it will change based on how Sql Server optimises the query.
To guarantee the order you must use an order by clause.
if you are not specifying an order, it's completely nondeterministic. Today they may be different, tomorrow maybe not.
Supplying a hint may inadvertently guide the query optimizer down a more efficient path.

Insert data into sqlserver 2000 database, the order change, why?

Insert 200 data througn for-loop into sqlserver 2000 database, the order change, why ?
When I use mysql, it doesn't have the matter.
i mean:
when you insert 2, then insert 3, then insert 1, in mysql you will see 2,3,1 like the order you insert. But in sqlsever2000 that may not.
The order in which rows are stored doesn't really matter. Actually SQL tables don't have any ordering. If order matters to you, you need to query using an ORDER BY clause.
There are no guarantees on the order of the results of a select statement, unless you add an ORDER BY clause. You might get the results back in the order of insertion, but it's not guaranteed.
If you have an index on the table, it might appear that your data is being ordered in a way that you're not expecting. Normally, you'll just have an identity column as your surrogate primary key, which means your inserted data will show up in "order" if you just do a select *, but if you have indexes on other columns the data might be ordered differently when you do select *.
You can control physical ordering by creating a clustered index on that columns. Only one clustered index is allowed per table. Cluster index make sure when you perform SELECT you will always get data in order which was defined when creating cluster index.