Unique sort order for postgres pagination - sql

While implementing server-side pagination in Postgres, I found that when using the LIMIT and OFFSET keywords you have to provide an ORDER BY clause on a unique column, probably the primary key.
In my case I am using UUIDs for primary keys, so I can't rely on sequentially increasing keys. ORDER BY pkey DESC might not always put newer rows on top.
So I resorted to using the Created Date column, a timestamp column which should be unique.
But my question is: what if the UI client wants to sort by some other column? Since that column might not always be unique, I resort to ORDER BY user_column, created_dt DESC so as to keep the results predictable for Postgres pagination.
Is this the right approach? I am not sure if I am going the right way. Please advise.

I talked about this exact problem on an old blog post (in the context of using an ORM):
One last note about using sorting and paging in conjunction. A query
that implements paging can have odd results if the ORDER BY clause
does not include a field that represents an empirical sequence in the
data; sort order is not guaranteed beyond what is explicitly specified
in the ORDER BY clause in most (maybe all) database engines. An
example: if you have 100 orders that all occurred on the exact same
date, and you ask for the first page of this data sorted by this date,
then ask for the second page of data sorted the same way, it is
entirely possible that you will get some of the data duplicated across
both pages. So depending on the query and the distribution of data
that is “sortable,” it can be a good practice to always include a
unique field (like a primary key) as the final field in a sort clause
if you are implementing paging.
http://psandler.wordpress.com/2009/11/20/dynamic-search-objects-part-5sorting/
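The point about appending a unique field as the last sort key can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in for Postgres; the orders table, its columns, and the fetch_page helper are hypothetical names chosen for the example, not part of the original post.

```python
import sqlite3

# Hypothetical "orders" table: all 100 rows share the same order_date,
# so order_date alone does not define a total order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders (id, order_date) VALUES (?, ?)",
    [(i, "2009-11-20") for i in range(1, 101)],
)


def fetch_page(page, page_size=10):
    """Return one page of ids, sorted by order_date with id as the tie-breaker."""
    rows = conn.execute(
        "SELECT id FROM orders "
        "ORDER BY order_date DESC, id DESC "  # unique id is the final sort key
        "LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()
    return [row[0] for row in rows]


pages = [fetch_page(p) for p in range(10)]
all_ids = [i for page in pages for i in page]
```

Because id is unique, every row has exactly one position in the sort, so the ten pages together cover each row once with no duplicates across pages.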

The strategy of using a column that uniquely identifies a record, such as pkey or insertion_date, may not be possible in some cases.
I have an application where the user sets up their own grid query, so they can pick any columns from multiple tables, and perhaps none of them is a unique identifier.
In such a case a row number can be useful. You simply select the row number and repeat the query's sort order in its OVER clause. It would be something like:
select col1, col2, col3, row_number() over(order by col3) from tableX order by col3
It's important that the over(order by ...) matches the outer order by .... Thus your paging will have consistent results every time.

Related

SQL pagination based on last record retrieved

I need to implement pagination which is semi-resilient to data changing between paginations. The standard pagination relies on SQL's LIMIT and OFFSET, however offset has potential to become inaccurate as new data points are created or their ranking shifts in the sort.
One idea is to hold onto the last data point requested from the API and get the following elements. I don't really know SQL (we're using postgres), but this is my (certainly flawed) attempt at doing something like that. I am trying to store the position of the last element as 'rownum' and then use it in the following query.
WITH rownum AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY rank ASC, id) AS rownum
WHERE id = #{after_id}
FROM items )
SELECT * FROM items
OFFSET rownum
ORDER BY rank ASC, id
LIMIT #{pagination_limit}
I can see some issues with this, like if the last item changes significantly in rank. If anyone can think of another way to do this, that would be great. But I would like to confine it to a single DB query if possible, since this is the application's most frequently hit API.
Your whole syntax doesn't quite work. OFFSET comes after ORDER BY. FROM comes before WHERE etc.
This simpler query would do what I think your code is supposed to do:
SELECT *
FROM items
WHERE (rank, id) > (
SELECT (rank, id)
FROM items
WHERE id = #{after_id}
)
ORDER BY rank, id
LIMIT #{pagination_limit};
Comparing the composite type (rank, id) guarantees identical sort order.
Make sure you have two indexes:
A multicolumn index on (rank, id).
Another one on just (id) - you probably have a pk constraint on the column doing that already. (A multicolumn index with leading id would do the job as well.)
More about indexes:
Is a composite index also good for queries on the first field?
If rank is not volatile it would be more efficient to parameterize it additionally instead of retrieving it dynamically - but the volatility of rank seems to be the point of your deliberations ...
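A minimal sketch of the parameterized variant mentioned above, where the client sends back both rank and id of the last row it saw instead of having the query look rank up again. It uses Python's sqlite3 module as a stand-in for Postgres (SQLite 3.15 or later is assumed for row-value comparison); the sample data and the next_page helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, rank INTEGER NOT NULL)")
conn.execute("CREATE INDEX items_rank_id ON items (rank, id)")  # multicolumn index
conn.executemany(
    "INSERT INTO items (id, rank) VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20), (4, 20), (5, 30)],  # ranks contain ties
)


def next_page(after, limit):
    """Fetch rows after the (rank, id) pair of the last row seen; None = first page."""
    if after is None:
        sql = "SELECT rank, id FROM items ORDER BY rank, id LIMIT ?"
        params = (limit,)
    else:
        # Row-value comparison: strictly after the last row in (rank, id) order.
        sql = (
            "SELECT rank, id FROM items "
            "WHERE (rank, id) > (?, ?) "
            "ORDER BY rank, id LIMIT ?"
        )
        params = (after[0], after[1], limit)
    return conn.execute(sql, params).fetchall()


page1 = next_page(None, 2)
page2 = next_page(page1[-1], 2)
page3 = next_page(page2[-1], 2)
```

Each call continues from the last row actually delivered, so rows inserted earlier in the sort order between calls do not shift the following pages the way they would with OFFSET.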
I now think the best way to solve this problem is by storing the datetime of the original query and filtering out results after that moment on subsequent queries, thus ensuring the offset is mostly correct. Maybe a persistent database could be used to ensure that the data is at the same state it was when the original query was made.
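The idea of remembering the time of the original query could look roughly like this. It is a sketch only, again with Python's sqlite3 standing in for Postgres; it assumes a hypothetical created_at column (plain integers here for brevity) and that existing rows are not updated or deleted between requests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, rank INTEGER, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 20, 100), (3, 30, 100), (4, 40, 100)],
)


def page(cutoff, offset, limit):
    """Page through only the rows that existed at the time of the first query."""
    rows = conn.execute(
        "SELECT id FROM items WHERE created_at <= ? "
        "ORDER BY rank, id LIMIT ? OFFSET ?",
        (cutoff, limit, offset),
    ).fetchall()
    return [row[0] for row in rows]


cutoff = 100                     # timestamp stored with the original query
first = page(cutoff, 0, 2)

# A new row arrives that would sort before everything else.
conn.execute("INSERT INTO items VALUES (5, 5, 200)")

second = page(cutoff, 2, 2)      # cutoff hides the new row, offset stays aligned
unfiltered = page(200, 2, 2)     # without the original cutoff, the page shifts
```

With the cutoff the second page continues where the first ended; without it, the newly inserted row pushes an already delivered row onto the second page.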

SQL-Server-2005: Why are results being returned in a different order with(nolock)

I have a primary key clustered index on col1.
Why, when I run the following statements, are the results returned in a different order?
select * from table
vs
select * from table with(nolock)
The results are also different with tablock.
schema:
col1 int not null
col2 varchar (8000)
Without any ORDER BY no order of results is guaranteed.
Your question is now heavily truncated but the original version mentioned that you saw different order of result when using nolock as well as tablock.
Both of these locking options allow SQL Server to use an allocation order scan rather than reading along the clustered index data pages in logical order (following pointers along the linked list).
That should not be taken to mean that, without these hints, the results are guaranteed to come back in clustered index order: the advanced scanning mechanism or parallelism, for example, could both change this.
The order of rows is never guaranteed unless you use an ORDER BY.
If you have to have the rows in a specific order there is no other solution that will return the rows in a predictable order.
If you leave out the ORDER BY, the DBMS is free to return the rows in any order it thinks is most efficient.
SQL Server makes no guarantee about the ordering; it will change based on how SQL Server optimises the query.
To guarantee the order you must use an ORDER BY clause.
If you are not specifying an order, it's completely nondeterministic. Today they may be different, tomorrow maybe not.
Supplying a hint may inadvertently guide the query optimizer down a more efficient path.

SQL Server Implicit Order

I've got an issue caused by the database design.
My data is grouped in a table which looks like:
IdGroup | IdValue
So for each group I've got a list of values.
Admittedly, we should have had an order column or an id, but I can't add one.
Do you know any way to guarantee that the selected values come back in insert order?
I mean, if I inserted 1003, 1001, 1002, could I guarantee that they are retrieved in this order?
IdGroup | IdValue
1 | 1003
1 | 1001
1 | 1002
Of course, using an ORDER BY doesn't seem to fit because I don't have any usable column.
Any ideas? Using a system proc or something like this?
Thanks a lot :)
Stop telling me to use an ORDER BY and to alter the table; it doesn't fit, and yes, I know it's good practice... thanks :)
A couple of ideas:
DBCC PAGE (undocumented) can be used to look at the raw data pages of the table. It may be possible to determine insert order by looking at the low level information.
If you cannot alter the table, can you add a table to the database? If so, consider creating a table with an identity column and use a trigger on the original table to insert the records in the new table.
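The side-table-plus-trigger idea can be sketched like this. The example uses Python's sqlite3 module, with an AUTOINCREMENT column standing in for a SQL Server identity column; the table and trigger names (group_values, group_values_order, trg_group_values_insert) are hypothetical, and SQL Server's trigger syntax differs from what is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Original table that cannot be altered.
    CREATE TABLE group_values (IdGroup INTEGER, IdValue INTEGER);

    -- Side table: seq records the order in which rows arrived.
    CREATE TABLE group_values_order (
        seq     INTEGER PRIMARY KEY AUTOINCREMENT,
        IdGroup INTEGER,
        IdValue INTEGER
    );

    -- Copy every inserted row into the side table.
    CREATE TRIGGER trg_group_values_insert AFTER INSERT ON group_values
    BEGIN
        INSERT INTO group_values_order (IdGroup, IdValue)
        VALUES (NEW.IdGroup, NEW.IdValue);
    END;
    """
)

for value in (1003, 1001, 1002):
    conn.execute("INSERT INTO group_values VALUES (1, ?)", (value,))

in_insert_order = [
    row[0]
    for row in conn.execute(
        "SELECT IdValue FROM group_values_order WHERE IdGroup = 1 ORDER BY seq"
    )
]
```

The original table is left untouched; the insert order is recovered by sorting the side table on seq. Note that this only captures rows inserted after the trigger exists.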
Also, you should include which version(s) of SQL Server are involved. Doing anything this unusual will very often be version specific.
You shouldn't rely on the data being returned in a particular order; use an ORDER BY clause to guarantee the order.
(Despite the fact that data appears to be returned in clustered index order, this might not always be the case).
Whilst some small-scale tests will show the rows returned in what appears to be the right order, that just will not hold.
The golden rule remains: unless an ORDER BY clause is specified, there are no guarantees on the order of the returned data.
edit: If you place a non-unique clustered index on the idgroup column, SQL Server is forced to add a hidden field, the uniqueifier, since the values are the same. The problem is, you can't access it in an ORDER BY clause, but from a forensic perspective you can determine the order the rows were inserted in.
As others have said, the only way to guarantee an ordering is with an ORDER BY clause. What isn't highlighted in their answers is that, the only place that this ORDER BY matters is in the SELECT statement. It doesn't* matter if you apply an ORDER BY clause during the INSERT statement; the system is free to return results from a select in whatever order it finds most efficient, unless an ORDER BY is specified at that time.
*There's a particular way to ensure what order IDENTITY values are assigned to a result set during an INSERT, using an ORDER BY, but I can't remember the exact details, and it still doesn't affect the order of SELECT.
Can you add a Created Date column? That way you can get the records with an ORDER BY on Created Date. Also set its default value to GETDATE().

how to reverse mysql table

I need to display what the table contains from the freshest data to the oldest. Something like this doesn't work:
SELECT * FROM table ORDER BY DESC; I know it's because ORDER BY should be followed by the name of a column. But I just want to reverse the normal order (by normal I mean from the oldest to the freshest data). How do I do this?
In your query the DESC stands for descending, the reverse is ascending, or:
SELECT * FROM table ORDER BY column ASC;
btw, if you do not specify a column, what you call "normal order" is really unspecified: without an ORDER BY the database may return the rows in any order.
The "normal order" is not always from oldest to freshest, since some records may be deleted and then these are replaced with the new ones. It means that the "natural order" may appear to be somewhat "random" with the freshest items being in the middle of the dataset.
You need to add a column for an insertion date or an incrementing key.
You can't rely on the physical storage pattern to give you a correct ordering. According to the MySQL documentation, there is no guarantee that rows returned will be in any particular order.
"...the result rows are displayed in no
particular order. It is often easier
to examine query output when the rows
are sorted in some meaningful way. To
sort a result, use an ORDER BY clause."
http://dev.mysql.com/doc/refman/5.0/en/sorting-rows.html
You could create a new field of type timestamp and set the default value to CURRENT_TIMESTAMP, then ORDER BY that field.
The physical ordering of the records in a table is not guaranteed to match the sequence in which they were created. To get that sequence, you will need to find or create a field you can sort on: a 'create date' field, or perhaps an id value which increases as new records are added (like an order id or something).
If you have auto-incremented IDs, then maybe:
order by id desc ? :)
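As a small illustration of sorting on an incrementing key, here is a sketch using Python's sqlite3 module in place of MySQL; the posts table and its columns are made up for the example, and AUTOINCREMENT plays the role of MySQL's AUTO_INCREMENT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)"
)

# Rows are inserted oldest first; id grows with each insert.
for body in ("oldest", "middle", "newest"):
    conn.execute("INSERT INTO posts (body) VALUES (?)", (body,))

# Explicit ORDER BY on the incrementing key, descending = freshest first.
newest_first = [
    row[0] for row in conn.execute("SELECT body FROM posts ORDER BY id DESC")
]
```

The ordering comes from the explicit ORDER BY on id, not from how the rows happen to be stored.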

Does 'Select' always order by primary key?

A basic, simple question for all of you DBAs.
When I do a select, is it always guaranteed that my result will be ordered by the primary key, or should I specify it with an 'order by'?
I'm using Oracle as my DB.
No, if you do not use "order by" you are not guaranteed any ordering whatsoever. In fact, you are not guaranteed that the ordering from one query to the next will be the same. Remember that SQL is dealing with data in a set based fashion. Now, one database implementation or another may happen to provide orderings in a certain way but you should never rely on that.
When I do a select, is it always guaranteed that my result will be ordered by the primary key, or should I specify it with an 'order by'?
No, it's by far not guaranteed.
SELECT *
FROM table
most probably will use TABLE SCAN which does not use primary key at all.
You can use a hint:
SELECT /*+ INDEX(table pk_index_name) */
*
FROM table
, but even in this case the ordering is not guaranteed: if you use Enterprise Edition, the query may be parallelized.
This is a problem, since ORDER BY cannot be used in a SELECT clause subquery and you cannot write something like this:
SELECT (
SELECT column
FROM table
WHERE rownum = 1
ORDER BY
other_column
)
FROM other_table
No, ordering is never guaranteed unless you use an ORDER BY.
The order that rows are fetched is dependent on the access method (e.g. full table scan, index scan), the physical attributes of the table, the logical location of each row within the table, and other factors. These can all change even if you don't change your query, so in order to guarantee a consistent ordering in your result set, ORDER BY is necessary.
It depends on your DB and also it depends on indexed fields.
For example, in my table Users every user has unique varchar(20) field - login, and primary key - id.
And "Select * from users" returns rowset ordered by login.
If you desire specific ordering then declare it specifically using ORDER BY.
What if the table doesn't have primary key?
If you want your results in a specific order, always specify an order by