Using the Seek Method in SQL and jumping to non-adjacent pages - sql

I'm implementing pagination in PostgreSQL using the SQL strategy known as the seek method.
All the examples I see on the Internet explain how to get the next page (e.g. see this article). But I'm wondering how to use the method to move from one page to another that is not adjacent (e.g. from page 1 to page 5) without using any offset.
Any example?

A standard seek method on a table with a serial identifier can be written as:
SELECT id FROM table_with_serial_id WHERE id > prev_page_last_id ORDER BY id ASC LIMIT page_size;
Starting with prev_page_last_id set to 0, we can progressively advance through the table by always using the last id from the previous page.
Therefore, to skip to a later page you can add a multiple of page_size to prev_page_last_id: adding page_size skips one page, adding 4 * page_size skips four.
Note that this only works if the id column has no gaps; with gaps, the id arithmetic no longer lines up with the row counts, so the jump lands on the wrong rows.
Unfortunately, with gaps there is no way to predict the id boundary of a later page without going through each page before it, unless you accept the compromise of defining pages by id range and having some pages with fewer than page_size rows.
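The effect of gaps can be seen in a small sketch. It uses Python's sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example runs anywhere); the helper name seek_page and the sample ids are hypothetical.

```python
import sqlite3

def seek_page(conn, prev_page_last_id, page_size):
    """Return the ids of the page that follows prev_page_last_id."""
    rows = conn.execute(
        "SELECT id FROM table_with_serial_id WHERE id > ? ORDER BY id ASC LIMIT ?",
        (prev_page_last_id, page_size),
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_with_serial_id (id INTEGER PRIMARY KEY)")
# ids 1..20, with 4, 5 and 11 missing to simulate deleted rows (gaps)
conn.executemany(
    "INSERT INTO table_with_serial_id (id) VALUES (?)",
    [(i,) for i in range(1, 21) if i not in (4, 5, 11)],
)

page_size = 5
page1 = seek_page(conn, 0, page_size)               # walking: start from 0
page2 = seek_page(conn, page1[-1], page_size)       # walking: continue after the last id seen
jumped = seek_page(conn, 0 + page_size, page_size)  # jumping: assumes there are no gaps
```

Here walking gives page 2 as ids 8 to 13 (skipping the missing 11), while the jump computed by id arithmetic starts at id 6 and repeats two rows already shown on page 1.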
Hope this helps!

T-SQL:
DECLARE @row_per_page INT = 100
DECLARE @page_number INT = 2
-- Number the rows by ID, then keep rows 101..200 (page 2 at 100 rows per page)
SELECT * FROM
(SELECT ROW_NUMBER() OVER (ORDER BY [ID]) AS [RowNumber], *
FROM table_name) AS T
WHERE T.[RowNumber] > (@page_number - 1) * @row_per_page AND T.[RowNumber] <= @page_number * @row_per_page

Related

Acquiring offset of some row in SQL

TL;DR: Is it possible to get the OFFSET position of a particular, known row in SQL, given that some ORDER BY is applied?
So consider a schema like this (simplified):
CREATE TABLE "public"."painting" (
"uuid" uuid NOT NULL DEFAULT uuid_generate_v4(),
"name" varchar NOT NULL,
"score" int4 NOT NULL,
"approvedAt" timestamp,
PRIMARY KEY ("uuid")
);
With data like:
abc1,test1,10,10:00
abc2,test2,9,11:00
abc3,test3,8,8:00
abc4,test4,8,12:00
abc5,test5,6,7:00
I want to make a request sorted by score and limited to 3 items, and I should emphasize that multiple entities might have the same score.
Because of the dynamic nature of that table, a new item might appear somewhere in the list while I am traversing the items sorted by score.
If I use the SQL OFFSET clause, this new entity shifts all entities below it down by one row, so the next selection contains the item that was last in the previous selection of 3 items.
abc1,test1,10,10:00
abc2,test2,9,11:00
abc6,test6,8,15:00 (new item)
CURRENT OFFSET = 3
abc3,test3,8,8:00 (was in previous select)
abc4,test4,8,12:00
abc5,test5,6,7:00
To avoid that, instead of using OFFSET, I can remember the UUID of the item I fetched last, which is abc3. On the next request, I can use its score to add an extra WHERE score < 8 condition, but this skips abc4, because it also has a score of 8.
If I use WHERE score <= 8, this again returns abc3, which has already been traversed. I can't use another field in the WHERE clause, because this will affect the results. An additional ORDER BY won't help either.
It seems to me that this is a very common problem in database selection, yet I can't find a comprehensive answer.
So my question is whether it's possible to make a request like the following:
SELECT * FROM "painting" WHERE "score" <= :score ORDER BY "score" DESC OFFSET %position of `abc3`% LIMIT 3
Or alternatively
SELECT OFFSET OF (`abc3`) FROM "painting" WHERE SCORE <= :score ORDER BY "score" DESC LIMIT 3
That will return 2 (because it's the second row with such score), then do
SELECT * FROM "painting" WHERE "score" <= :score ORDER BY "score" DESC OFFSET :offset LIMIT 3
where :score is the score of the last received item and :offset is the result of SELECT OFFSET - 1
My own assumption is that we have to SELECT WHERE "score" = :score and compute the offset position outside SQL (or write a very complex SQL query). However, if many items share the same ORDER BY value, this helper request might end up heavier than the data fetch itself.
Yet I feel that there is a much more clever SQL way of doing what I'm trying to do.
Good question. Accurate backend pagination requires the ordering criteria to be a set of columns that together form a UNIQUE key.
In your case, the ordering criteria can be made unique by adding the uuid column to them. With that in mind, you can increase the page size by 1 behind the scenes, to 4. That 4th row won't be displayed; it is only used to retrieve the next page.
For example, the query for the first page becomes:
select *
from painting
order by -score, "approvedAt", uuid -- "approvedAt" must be quoted: it was created as a case-sensitive name
limit 4
Now you would display the first three rows:
abc1,test1,10,10:00
abc2,test2,9,11:00
abc3,test3,8,8:00
The client app (most likely the UI) will remember -- not display -- the 4th row (the "key") to retrieve the next page:
abc4,test4,8,12:00
Then, to get the next page the query will add a WHERE clause with the "key" and take the form:
select *
from painting
where (-score, "approvedAt", uuid) >= (-8, '12:00', 'abc4')
order by -score, "approvedAt", uuid
limit 4
Even if a new row is inserted in the meantime, this page still starts at the original 4th row instead of repeating a row that was already shown.
To get blazing fast data retrieval you could create the index:
create index ix1 on painting ((-score), "approvedAt", uuid);
See example at DB Fiddle.
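The idea can be tried out with a minimal sketch. It is a variation on the answer above, not the same query: it remembers the last displayed row instead of fetching an extra one, uses a strict comparison, and orders by score and uuid only to stay short. It runs on Python's sqlite3 module as a stand-in for PostgreSQL, and the helper name next_page is hypothetical.

```python
import sqlite3

def next_page(conn, last_score, last_uuid, page_size):
    """Rows strictly after (last_score, last_uuid) in score DESC, uuid ASC order."""
    return conn.execute(
        """
        SELECT uuid, score FROM painting
        WHERE score < :score OR (score = :score AND uuid > :uuid)
        ORDER BY score DESC, uuid ASC
        LIMIT :n
        """,
        {"score": last_score, "uuid": last_uuid, "n": page_size},
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE painting (uuid TEXT PRIMARY KEY, score INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO painting VALUES (?, ?)",
    [("abc1", 10), ("abc2", 9), ("abc3", 8), ("abc4", 8), ("abc5", 6)],
)

first = conn.execute(
    "SELECT uuid, score FROM painting ORDER BY score DESC, uuid ASC LIMIT 3"
).fetchall()

# A new row with a tied score arrives between the two requests.
conn.execute("INSERT INTO painting VALUES ('abc6', 8)")

last_uuid, last_score = first[-1]
second = next_page(conn, last_score, last_uuid, 3)
```

The second page starts at abc4 and never repeats abc3, because the uuid tie-breaker makes the position of every row in the ordering unambiguous.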

Dealing with value based pagination given a number of a page

When using value-based pagination
select * from articles
where id > @start_value
limit @page_size
how can I calculate @start_value given only a page number?
Namely: say I have a website with an HTML page that lists articles, and I need to paginate it. Even to render the very first page, I need to calculate @start_value somehow. The input from the user is the number of the page they clicked; for the very first page it is 1 by default.
Given that 1, how would I calculate @start_value?
Or, given any arbitrary page, how would I calculate @start_value?
Note that the values of the id column aren't necessarily sequential, even if id is auto-incremented.
First off, pagination without any sorting is not ideal. You can't guarantee how SQL will sort your results without including an ORDER BY clause.
You will also need to know the page size to calculate your start value. Given @page_num and @page_size, the starting position is @start_value = @page_num * @page_size, with pages counted from 0. Because ids can have gaps, use that value as a row offset, not as an id.
Here it is without the WHERE clause, using LIMIT/OFFSET instead:
select *
from articles
order by id
limit @page_size
offset (@page_size * @page_num)
You don't need the "where id > ..." part. The right way to achieve this is the limit @page_size offset @offset construct. This way you don't have to worry about gaps. To calculate the offset from a 1-based page number, multiply page_size by (page_number - 1). Another important point is that you should order your rows if you want to get the same result every time. If you don't trust the IDs, you can order by date or another field. So:
select * from articles
order by date
limit @page_size
offset (@page_size * (@page_num - 1))
Note: I used (@page_num - 1) so that page 1 starts at offset 0.
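As a rough illustration of the offset arithmetic, the sketch below maps a 1-based page number to an offset and fetches that page from a table whose ids have gaps. It uses Python's sqlite3 module; the helper names page_offset and fetch_page and the sample ids are hypothetical.

```python
import sqlite3

def page_offset(page_num, page_size):
    """Number of rows to skip for a 1-based page number."""
    if page_num < 1:
        raise ValueError("page_num starts at 1")
    return (page_num - 1) * page_size

def fetch_page(conn, page_num, page_size):
    """Fetch one page of article ids using LIMIT/OFFSET with a stable ordering."""
    rows = conn.execute(
        "SELECT id FROM articles ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page_offset(page_num, page_size)),
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY)")
# Non-sequential ids, as described in the question
conn.executemany(
    "INSERT INTO articles (id) VALUES (?)",
    [(i,) for i in (2, 3, 7, 8, 15, 16, 20)],
)

page1 = fetch_page(conn, 1, 3)
page3 = fetch_page(conn, 3, 3)
```

The gaps in id do not matter here because the offset counts rows, not id values. The trade-off is that the database still has to walk past the skipped rows, which is what the seek method avoids.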

CosmosDB: How does SELECT TOP work?

I have a 450GB database... with millions of records.
Here is an example query:
SELECT TOP 1 * FROM c WHERE c.type='water';
To speed up our queries, I thought about just taking the first one but we have noticed that the query still takes quite a while, despite the very first record in the database matching our constraints.
So, my question is, how does the SELECT TOP 1 really work? Does it:
A) Select ALL records and then return just the first (top) one where type='water'
B) Return the first record which is encountered where type='water'
Try this line, noting the offset limit:
SELECT * FROM c WHERE c.type='water' OFFSET 0 LIMIT 1
For more information about the offset limit:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-offset-limit
Assuming you aren't sorting your results (which your query isn't), TOP 1 will return the first result as soon as it finds one. This should then end the query.
The Cosmos DB Explorer doesn't work with the TOP command; it's an existing issue. It works fine in an SDK call.
See some TOP command usage below:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-subquery

How to update a counter for a resultset

I'm creating something similar to an advertising system.
I would like to show, for example, 5 ads (5 records) from a given database table.
So I execute something like
SELECT * FROM mytable
ORDER BY view_counter ASC
LIMIT 5
OK, it works.
But how can I update the "view_counter" (a counter of the number of times an ad was shown) at the same time, maybe with a single SQL statement?
And, if I'm not asking too much, is it possible to save the "position" at which my records are returned?
For example, my SQL returns
- record F (pos. 1)
- record X (pos. 2)
- record Z (pos. 3)
And save in a field "Average_Position" the average of the positions?
Thanks in advance.
Regards
how can I update the "view_counter" (a counter of the number of times an ad was shown) at the same time, maybe with a single SQL statement?
That's usually something handled by analytic/rank/windowing functions, which MySQL did not support before version 8.0. On older versions you can use the following query to get the output you want:
SELECT *,
@rownum := @rownum + 1 AS rank
FROM mytable
JOIN (SELECT @rownum := 0) r -- start at 0 so the first row gets rank 1
ORDER BY view_counter ASC
LIMIT 5
You'd get output like:
description | rank
--------------------------
record F | 1
record X | 2
record Z | 3
if I'm not asking too much, is it possible to save the "position" at which my records are returned?
I don't recommend doing this, because it means the data needs to be updated every time there's a change. On other databases I'd recommend using a view so the calculation is made only when the view is used, but MySQL doesn't support variable use in views.
There is an alternative means of getting the rank value using a subselect - this link is for SQL Server, but there's nothing in the solution that is SQL Server specific.
You could do something like this, but it is pretty ugly and I would not recommend it (see below for my actual suggestion about how to handle this issue).
Create a dummy_field tinyint field, sum_position int field and average_position decimal field and run the following few statements within the same connection (I am usually very much against MySQL stored procedures, but in this case it could be useful to store this in a SP).
SET @updated_ads := '';
SET @current_position := 0;
UPDATE mytable SET view_counter = view_counter + 1,
dummy_field = (SELECT @updated_ads := CONCAT(@updated_ads,id,"\t",ad_text,"\r\n"))*0, /* I added *0 for saving it as tinyint in dummy_field */
sum_position = sum_position + (@current_position := @current_position + 1),
average_position = sum_position / view_counter
ORDER BY view_counter ASC /* same ordering as the SELECT in the question */
LIMIT 5;
SELECT @updated_ads;
Then parse the result string in your code using the delimiters you used (I used \r\n as a row delimiter and \t as the field delimiter).
What I actually suggest is:
Query for selected ads.
Write a log file with the selected ads and positions.
Write a job to process the log file and update view_counter, average_position and sum_position fields in batch.
Thanks for your answer. I solved it by simply executing the same query (with exactly the same WHERE, ORDER BY and LIMIT clauses), but using UPDATE instead of SELECT.
Yes, there's some overhead, but it's a simple solution.
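A minimal sketch of that select-then-update approach, written against Python's sqlite3 module as a stand-in for MySQL. The helper name show_ads, the id column, and the sample rows are assumptions; an id tie-breaker is added to the ordering so that both statements agree on which rows are picked.

```python
import sqlite3

def show_ads(conn, n):
    """Pick the n least-viewed ads and bump their counters together."""
    with conn:  # commits the updates on success, rolls them back on error
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM mytable ORDER BY view_counter ASC, id ASC LIMIT ?",
            (n,),
        )]
        conn.executemany(
            "UPDATE mytable SET view_counter = view_counter + 1 WHERE id = ?",
            [(ad_id,) for ad_id in ids],
        )
    return ids

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, view_counter INTEGER NOT NULL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 0), (2, 0), (3, 0), (4, 1)])
conn.commit()

first = show_ads(conn, 2)
second = show_ads(conn, 2)
```

The index of each id in the returned list, plus one, is the display position, so the same loop could also accumulate sum_position if that figure is needed.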

How to limit result set size for arbitrary query in Ingres?

In Oracle, the number of rows returned in an arbitrary query can be limited by filtering on the "virtual" rownum column. Consider the following example, which will return, at most, 10 rows.
SELECT * FROM all_tables WHERE rownum <= 10
Is there a simple, generic way to do something similar in Ingres?
Blatantly changing my answer. "Limit 10" works for MySql and others, Ingres uses
Select First 10 * from myTable
Ref
select * from myTable limit 10 does not work.
Have discovered one possible solution:
TIDs are "tuple identifiers" or row addresses. The TID contains the
page number and the index of the offset to the row relative to the
page boundary. TIDs are presently implemented as 4-byte integers.
The TID uniquely identifies each row in a table. Every row has a
TID. The high-order 23 bits of the TID are the page number of the page
in which the row occurs. The TID can be addressed in SQL by the name
`tid.'
So you can limit the number of rows coming back using something like:
select * from SomeTable where tid < 2048
The method is somewhat inexact in the number of rows it returns. It's fine for my requirement though because I just want to limit rows coming back from a very large result set to speed up testing.
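To see why the row count is inexact, the quoted layout can be unpacked with a few lines of Python. This is only a sketch of the arithmetic: it assumes, from the description above, that the 9 bits left over after the 23-bit page number hold the row index; the function name split_tid is hypothetical.

```python
PAGE_BITS = 23              # high-order bits holding the page number (per the quoted text)
LINE_BITS = 32 - PAGE_BITS  # assumption: the remaining low-order bits index the row in the page

def split_tid(tid):
    """Split a 4-byte TID into (page_number, row_index_within_page)."""
    return tid >> LINE_BITS, tid & ((1 << LINE_BITS) - 1)

last_included = split_tid(2047)
first_excluded = split_tid(2048)
```

Under that assumption, tid < 2048 selects whatever rows happen to live on pages 0 to 3, so the number of rows returned depends on how full those pages are.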
Hey Craig. I'm sorry, I made a Ninja Edit.
No, Limit 10 does not work, I was mistaken in thinking it was standard SQL supported by everyone. Ingres uses (according to doc) "First" to solve the issue.
Hey Ninja editor from Stockholm! No worries, I have confirmed that "first X" works well and is a much nicer solution than the one I came up with. Thank you!