hsqldb: do I have to ORDER BY to ensure consistent selection order? - indexing

I have a table containing samples. The inserted samples are already naturally ordered by the timestamp.
My question is this - when I SELECT from the table do I have to use the ORDER BY clause to ensure the fetched samples are ordered by the timestamp?

Rows in a relational database are NOT sorted. (Picture them as balls in a basket: which one is the "first"?)
The only way (really, the only way) to get a consistently sorted result is to use ORDER BY.
You cannot rely on side effects of joins, GROUP BY, UNION, index retrieval or similar operators. They never guarantee an order. Unless you specify an ORDER BY, the DBMS is free to return the rows in whatever order it thinks is fastest.
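As a rough illustration of the point above, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in (not HSQLDB). The table name samples and its columns ts and value are assumptions made up for this example. The rows are inserted out of order on purpose, so the sorted output can only come from the ORDER BY clause.

```python
import sqlite3

# Hypothetical schema for illustration: samples(ts, value).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (ts INTEGER NOT NULL, value REAL NOT NULL)")

# Insert deliberately out of timestamp order to show that the result
# order comes from ORDER BY, not from insertion order.
conn.executemany(
    "INSERT INTO samples (ts, value) VALUES (?, ?)",
    [(3, 0.1), (1, 0.5), (4, 0.9), (2, 0.7)],
)


def fetch_samples_ordered(connection):
    """Return all samples sorted by timestamp, oldest first."""
    cursor = connection.execute("SELECT ts, value FROM samples ORDER BY ts ASC")
    return cursor.fetchall()


rows = fetch_samples_ordered(conn)
print(rows)
```

The same query shape (SELECT ... ORDER BY ts) applies to any SQL engine; only the connection code is specific to this sketch.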

If an HSQLDB table T has a column C as primary key, or has any index on that column,
SELECT * FROM T ORDER BY C
will return ordered rows without extra ORDER BY processing.
If there is a condition on the select that uses an index on a different column, you can still force the use of the index for ORDER BY:
SELECT * FROM T WHERE <some condition> ORDER BY C USING INDEX
But in this case, you should only use USING INDEX if most of the rows of the table will be returned. Otherwise it is better to let the engine use the other index to reduce the table scan time.
USING INDEX is ignored if there is no index to use for ORDER BY.

Related

Related rows ordering when using JOIN without ORDER BY

Let's say we have two tables:
user:
id,name
1,bob
2,alice
user_group:
id,user_id,group
1,1,g1
2,1,g2
3,2,g2
4,2,g3
We have no guarantee that SELECT * FROM user without ORDER BY returns the result set in the same order on each execution. But what about related rows in joins?
For example:
SELECT user.name, user_group.group FROM user INNER JOIN user_group ON (user.id = user_group.user_id);
Will the related (joined) rows be adjacent in the result set (take PostgreSQL, for example)? By that I mean:
bob,g1
bob,g2
alice,g2
alice,g3
OR
alice,g3
alice,g2
bob,g2
bob,g1
and NOT this:
bob,g1
alice,g2
bob,g2
alice,g3
The order of users doesn't matter, and neither does the order of groups within each user.
It is a fundamental rule in SQL that you can never rely on the ordering of a result set unless you add an ORDER BY. If you have no ORDER BY, the ordering of the result set can depend on, among other things:
the order in which PostgreSQL reads the individual tables – it could be in index order or in sequential order, and even with a sequential scan you don't always get the same order (unless you disable synchronize_seqscans)
the join strategy chosen (nested loop, hash join or merge join)
the number of rows returned by the query (if you use a cursor, PostgreSQL optimizes the query so that the first rows can be returned quickly)
That said, with your specific example and PostgreSQL as the database, I think that none of the join strategies will return the result set in the order you describe as undesirable. But I wouldn't rely on that: the optimizer often finds a surprising way to process a query.
The desire to save yourself an ORDER BY often comes from a wish to optimize processing speed. But correctness is more important than speed, and PostgreSQL can often find a way to return the result in the desired order without having to sort explicitly.
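To make the recommendation concrete, here is a minimal sketch of the join from the question with an explicit ORDER BY that keeps each user's rows adjacent. It runs against an in-memory SQLite database through Python's sqlite3 module, which is an assumption of this example (the question is about PostgreSQL). The identifiers user and group are quoted because group is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Recreate the two tables from the question (SQLite stand-in, not PostgreSQL).
conn.executescript(
    """
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_group (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        "group" TEXT NOT NULL
    );
    INSERT INTO "user" VALUES (1, 'bob'), (2, 'alice');
    INSERT INTO user_group VALUES
        (1, 1, 'g1'), (2, 1, 'g2'), (3, 2, 'g2'), (4, 2, 'g3');
    """
)

# Sorting by the user key first guarantees that all rows of one user are
# adjacent; the second key fixes the order of groups within each user.
joined = conn.execute(
    """
    SELECT u.name, g."group"
    FROM "user" AS u
    INNER JOIN user_group AS g ON u.id = g.user_id
    ORDER BY u.id, g."group"
    """
).fetchall()
print(joined)
```

If only adjacency matters, ORDER BY u.id alone is enough; the second sort key is included here so the output is fully deterministic.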

Permanently sort SQL table not result

I want to sort my table permanently by using ID column in ascending order.
select * from CLAIMS order by ID asc; gives me the result in ascending order.
But I want to permanently change my table. I am using SQL Server Management Studio 2014.
While there's no way to guarantee an ordered result without an ORDER BY clause on your query, you can store the rows in sorted order so that SQL Server can run
select * from CLAIMS order by ID asc;
without having to perform a sort every time. To do that, simply create a clustered index with ID as the only or the leading column.
For example:
alter table CLAIMS add constraint PK_CLAIMS primary key clustered (ID)
or
create unique clustered index AK_CLAIMS on CLAIMS(ID)
I want to sort my table permanently
You just can't. SQL tables represent unordered sets of rows. There is no inherent ordering of rows in a table, contrary to what you seem to assume.
If you want the rows returned in a given order for a given query, then do add an order by clause to the query:
select * from claims order by id
If you don't provide an order by clause, the database is free to return the rows in whichever order it likes. The ordering you see today for a given query might change unexpectedly in the future.

PostgreSQL Query without WHERE only ORDER BY and LIMIT doesn't use index

I have a table that contains an 'id' column of type BIGSERIAL. I also have an index for this one column (sort order descending, BTREE, unique).
I often need to retrieve the last 10, 20, 30 entries from a table of millions of entries, like this:
SELECT * FROM table ORDER BY id DESC LIMIT 10
I would have thought it's a pretty clear case: there's an index for this particular field, the sort order matches, and I need only 10 entries out of millions in the whole table, so this query should definitely use an index scan.
But it doesn't: it does a sequential scan over the whole table.
I tried to dig deeper but didn't find anything unusual. The Postgres doc at https://www.postgresql.org/docs/9.6/static/indexes-ordering.html says:
An important special case is ORDER BY in combination with LIMIT n: an
explicit sort will have to process all the data to identify the first
n rows, but if there is an index matching the ORDER BY, the first n
rows can be retrieved directly, without scanning the remainder at all.
But it still doesn't work. Does anybody have any pointers for me? Maybe I'm just not seeing the forest for the trees anymore... :-(
Ok, saying it out loud and trying to gather more information to put into my question apparently made me see the forest again: I found the actual problem. Further down in the doc I mentioned above is this sentence:
An index stored in ascending order with nulls first can satisfy either
ORDER BY x ASC NULLS FIRST or ORDER BY x DESC NULLS LAST depending on
which direction it is scanned in.
This was the problem. I specified the sort order in the index but I ignored NULLS FIRST vs. NULLS LAST.
For a descending sort, the Postgres default is NULLS FIRST if you don't specify it explicitly in your query (for an ascending sort it is NULLS LAST). So what Postgres saw was the combination ORDER BY id DESC NULLS FIRST, which wasn't covered by my index. The combination of sort direction and NULLS placement is what matters.
The 2 possible solutions:
Either mention NULLS FIRST/LAST accordingly in the query so that it matches the index
...or change the index to NULLS FIRST (which is what I did)
Now Postgres is doing a proper index scan and only touches 10 elements during the query, not all of them.
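If you need to get the last 10 entries in the table, you can use this:
SELECT *
FROM table
WHERE id > (SELECT MAX(id) FROM table) - 10
ORDER BY id DESC
And similarly for 20 and 30 entries.
This is less readable, but it is fast as long as you have an index on the 'id' column. Note that it returns exactly 10 rows only if the ids have no gaps; with gaps (which a BIGSERIAL column can have) it returns fewer.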
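The difference between the two ways of fetching the newest rows shows up once the id sequence has gaps. The following is a minimal sketch, assuming an in-memory SQLite database via Python's sqlite3 module instead of PostgreSQL, and a made-up table named entries. It compares ORDER BY id DESC LIMIT 10 with a range filter computed from MAX(id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; ids 1..100, then every even id is deleted to leave gaps.
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO entries (id, payload) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 101)],
)
conn.execute("DELETE FROM entries WHERE id % 2 = 0")

# Approach 1: let the database pick the 10 highest ids.
limit_rows = conn.execute(
    "SELECT id FROM entries ORDER BY id DESC LIMIT 10"
).fetchall()

# Approach 2: range filter relative to MAX(id); the row count depends on
# how many ids actually exist inside that range.
range_rows = conn.execute(
    """
    SELECT id FROM entries
    WHERE id > (SELECT MAX(id) FROM entries) - 10
    ORDER BY id DESC
    """
).fetchall()

print(len(limit_rows), len(range_rows))
```

With the gaps created above, the LIMIT query still returns 10 rows, while the range filter returns only the ids that happen to fall within the last 10 values.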

Unique sort order for postgres pagination

While trying to implement server-side pagination in Postgres, I learned that when using the LIMIT and OFFSET keywords you have to provide an ORDER BY clause on a unique column, typically the primary key.
In my case I am using UUID generation for primary keys, so I can't rely on a sequential order of increasing keys. ORDER BY pkey DESC might not always put newer rows on top.
So I resorted to using a Created Date column, a timestamp column which should be unique.
But what if the UI client wants to sort by some other column? Since that column might not be unique, I use ORDER BY user_column, created_dt DESC to keep the pagination results predictable.
Is this the right approach? I am not sure I am going the right way. Please advise.
I talked about this exact problem in an old blog post (in the context of using an ORM):
One last note about using sorting and paging in conjunction. A query
that implements paging can have odd results if the ORDER BY clause
does not include a field that represents an empirical sequence in the
data; sort order is not guaranteed beyond what is explicitly specified
in the ORDER BY clause in most (maybe all) database engines. An
example: if you have 100 orders that all occurred on the exact same
date, and you ask for the first page of this data sorted by this date,
then ask for the second page of data sorted the same way, it is
entirely possible that you will get some of the data duplicated across
both pages. So depending on the query and the distribution of data
that is “sortable,” it can be a good practice to always include a
unique field (like a primary key) as the final field in a sort clause
if you are implementing paging.
http://psandler.wordpress.com/2009/11/20/dynamic-search-objects-part-5sorting/
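The scenario from the quote (100 orders on the same date) can be sketched as follows. This is an illustrative example only, using an in-memory SQLite database through Python's sqlite3 module rather than Postgres; the table orders, its columns, and the helper fetch_page are hypothetical names. The primary key is appended as the final sort key so that every row has a single, well-defined position across pages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: 100 orders that all share the same date.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO orders (id, order_date) VALUES (?, ?)",
    [(i, "2009-11-20") for i in range(1, 101)],
)


def fetch_page(page, page_size=10):
    """Return the ids on one zero-based page, sorted by date then by id."""
    cursor = conn.execute(
        # order_date alone has ties; id (unique) breaks them deterministically.
        "SELECT id FROM orders ORDER BY order_date DESC, id DESC LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    )
    return [row[0] for row in cursor]


pages = [fetch_page(p) for p in range(10)]
seen = [order_id for page in pages for order_id in page]
print(pages[0])
```

Because the full sort key is unique, no id appears on two pages and none is skipped, as long as the underlying data does not change between page requests.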
The strategy of using a column that uniquely identifies a record, such as pkey or insertion_date, may not be possible in some cases.
I have an application where users set up their own grid queries, so they can pick any columns from multiple tables, and perhaps none of them is a unique identifier.
In such a case a row number can be useful. You select the row number and use the same sort in its OVER clause. It would be something like:
select col1, col2, col3, row_number() over(order by col3) from tableX order by col3
It's important that over(order by ...) matches the query's order by. Note that if col3 contains ties, the numbering among the tied rows is still not guaranteed to be the same on every execution.

SQL-Server-2005: Why are results being returned in a different order with(nolock)

I have a primary key clustered index on col1.
Why are the results returned in a different order when I run the following statements?
select * from table
vs
select * from table with(nolock)
The results are also different with tablock.
schema:
col1 int not null
col2 varchar (8000)
Without any ORDER BY no order of results is guaranteed.
Your question is now heavily truncated, but the original version mentioned that you saw a different order of results when using nolock as well as tablock.
Both of these locking options allow SQL Server to use an allocation-order scan rather than reading along the clustered index data pages in logical order (following pointers along the linked list).
That should not be taken to mean that the order is guaranteed to be clustered index order without those hints: the advanced scanning mechanism or parallelism, for example, could both change it.
The order of rows is never guaranteed unless you use an ORDER BY.
If you have to have the rows in a specific order there is no other solution that will return the rows in a predictable order.
If you leave out the ORDER BY, the DBMS is free to return the rows in any order it thinks is most efficient.
SQL Server makes no guarantee about the ordering; it will change based on how SQL Server optimises the query.
To guarantee the order you must use an ORDER BY clause.
If you are not specifying an order, it's completely nondeterministic. Today the results may be different, tomorrow maybe not.
Supplying a hint may inadvertently guide the query optimizer down a more efficient path.