Efficient SQL to find specific ID used with pagination


At work I need to implement a feature in the API that returns a specific page (20 entries) that contains the entry with a specified ID. That entry could be any of the 20 on the page.
Normally, a page is determined by taking the ID of the last element of the previous page, applying a filter to the elements after that ID, and taking the first 20 entries of the result.
But with the new feature, you're supposed to receive the page that CONTAINS the specified ID, rather than using it to determine the first element of a new page.
I'm not working with databases that much, so I'm not sure how to do this efficiently, but since I'm the only developer in this company, there's no one I can ask. If it helps, the database is MS SQL Server. If more info is needed, I can give it, as long as it's not against company policy.

Let's say we look for ID 123456. This can be anywhere from page 1 to page 6173 (or even missing completely). There is no way to tell other than to count the rows/pages until we get there, and we must even keep counting to get all the following rows that are still on the same page. This is not difficult, but rather slow.
We cannot know which ID comes after another; after ID 5 the next may be ID 6 or 1234 or 1000000 or whatever. So the first step is to number all rows ordered by ID, or rather to assign them pages, since we know that numbers 1 to 20 = page 1, numbers 21 to 40 = page 2, etc. Thus we learn that our ID is on page X, and we select all rows marked with page X.
with rows_with_page as
(
    select
        t.*,
        (row_number() over (order by id) - 1) / 20 + 1 as page
    from mytable t
)
select *
from rows_with_page
where page = (select page from rows_with_page where id = @id)
order by id;
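Because ROW_NUMBER() returns an integer (bigint), the division with / is an integer division in SQL Server that discards the remainder (like in elementary school: 5 / 2 = 2). Subtracting 1 before dividing keeps row 20 on page 1 and puts row 21 on page 2.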
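To try the row-numbering logic without a SQL Server instance, here is a minimal sketch that runs the same CTE against an in-memory SQLite database through Python's built-in sqlite3 module. SQLite stands in for SQL Server and needs version 3.25 or later for ROW_NUMBER(). The table name mytable comes from the answer. The val column and the sample IDs, which are multiples of 7 so that there are gaps, are made up for illustration.

```python
import sqlite3


def make_demo_db(ids):
    """Create an in-memory mytable holding the given (unique) ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany(
        "INSERT INTO mytable (id, val) VALUES (?, ?)",
        [(i, f"row {i}") for i in ids],
    )
    return conn


def page_containing(conn, target_id, page_size=20):
    """Return (id, page) for every row on the page that contains target_id.

    Rows are numbered in id order, and integer division turns the row number
    into a page number. Returns [] when target_id does not exist.
    """
    sql = """
        WITH rows_with_page AS (
            SELECT t.*,
                   (ROW_NUMBER() OVER (ORDER BY id) - 1) / :size + 1 AS page
            FROM mytable t
        )
        SELECT id, page
        FROM rows_with_page
        WHERE page = (SELECT page FROM rows_with_page WHERE id = :target)
        ORDER BY id
    """
    return conn.execute(sql, {"size": page_size, "target": target_id}).fetchall()


# IDs with gaps: 7, 14, ..., 350 (50 rows, so pages of 20, 20 and 10 rows)
conn = make_demo_db(range(7, 351, 7))
print(page_containing(conn, 147))  # ID 147 is row 21, so this is all of page 2
```

The page is computed from the row's position, not from the ID value, so gaps in the IDs do not matter.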

If I understand your question, you have a fixed page size of 20 records per page. You will receive an ID value, and that ID value can appear anywhere on a given page of 20 records. So, assuming the ID values start at 1 and have no gaps, if you receive an ID value between 1 and 20 it should return all the records on page 1, if the ID value received is between 21 and 40 it should return all the records on page 2, etc.
We can figure out what page we are on with integer division: subtract 1 from the ID, divide by 20, and then add 1. So for ID 4, (4 - 1) / 20 is zero (remember integer division!), and adding 1 gives page 1. Subtracting 1 first also keeps ID 20 on page 1 instead of pushing it to page 2. You can then use the OFFSET ... FETCH feature of SQL Server (available from SQL Server 2012).
I will create a simple script to show how this works. I don't know if your plan is to create a stored procedure or something else, but you can adapt this technique. So let's make a simple table variable to mimic your table, add 100 records of sample data, and then write a query to retrieve the records on the appropriate page.
-- create a table variable for the query
DECLARE @Records AS TABLE(
    ID INT IDENTITY(1,1) NOT NULL,
    [Value] INT NOT NULL
);
-- populate with 100 sample records
DECLARE @i INT;
SELECT @i = 1;
WHILE (@i <= 100)
BEGIN
    INSERT INTO @Records([Value]) VALUES (@i);
    SELECT @i = @i + 1;
END
-- now find the records on the correct page
DECLARE @id INT = 41;        -- the record to find
DECLARE @pageSize INT = 20;  -- the number of records per page
DECLARE @page INT;           -- the page, counting from 1
-- subtract 1 first so that IDs 20, 40, ... stay on the right page
SELECT @page = ((@id - 1) / @pageSize) + 1;
SELECT ID, [Value]
FROM @Records
ORDER BY ID
OFFSET (@page - 1) * @pageSize ROWS
FETCH NEXT @pageSize ROWS ONLY;
That should work for you (if I understand what you were asking).
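The page arithmetic can be checked with a small Python sketch against an in-memory SQLite table. SQLite spells OFFSET ... FETCH NEXT as LIMIT ... OFFSET. The sketch subtracts 1 from the ID before dividing, so that IDs 20, 40, ... stay on their own page. It also shows the approach's main assumption: once the IDs have gaps, the computed page may no longer contain the ID. The records table and its contents are illustrative.

```python
import sqlite3


def page_number(record_id, page_size=20):
    # Only valid when the IDs are 1, 2, 3, ... with no gaps.
    # Subtracting 1 first keeps IDs 20, 40, ... on the correct page.
    return (record_id - 1) // page_size + 1


def make_records(ids):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)")
    conn.executemany("INSERT INTO records (id, value) VALUES (?, ?)", [(i, i) for i in ids])
    return conn


def fetch_page(conn, page, page_size=20):
    # Same as: ORDER BY id OFFSET (page - 1) * size ROWS FETCH NEXT size ROWS ONLY
    rows = conn.execute(
        "SELECT id FROM records ORDER BY id LIMIT ? OFFSET ?",
        (page_size, (page - 1) * page_size),
    ).fetchall()
    return [r[0] for r in rows]


contiguous = make_records(range(1, 101))
print(fetch_page(contiguous, page_number(40)))   # IDs 21..40

gappy = make_records(range(11, 101))             # IDs 1..10 were deleted
print(41 in fetch_page(gappy, page_number(41)))  # False: ID 41 is now on page 2
```

Without the subtraction, a formula of (ID / 20) + 1 would send ID 40 to page 3 (rows 41 to 60).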

Related

Select records from a specific key onwards

I have a table that has more than three trillion records.
The main key of this table is guid, as below:
GUID                                  Value  mid  id
0B821574-8E85-4FB7-8047-553393E385CB  4      51   15
716F74B0-80D8-4869-86B4-99FF9EB10561  0      510  153
7EBA2C31-FFC8-4071-B11A-9E2B7ED16B2B  2      5    3
85491F90-E4C6-4030-B1E5-B9CA36238AE2  1      58   7
F04FA30C-0C35-4B9F-A01C-708C0189815D  20     50   13
guid is the primary key.
I want to select 10 records starting from the one whose key equals, for example, 85491F90-E4C6-4030-B1E5-B9CA36238AE2.

You can use order by and top. Assuming that guid defines the ordering of the rows:
select top (10) t.*
from mytable t
where guid >= '85491F90-E4C6-4030-B1E5-B9CA36238AE2'
order by guid
If the ordering is defined by another column, say id (which should be unique as well), then you would use a subquery for filtering:
select top (10) t.*
from mytable t
where id >= (select id from mytable t1 where guid = '85491F90-E4C6-4030-B1E5-B9CA36238AE2')
order by id
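As a quick check of the second query, here is a sketch using Python's sqlite3 module with the five sample rows from the question. SQLite has no TOP, so LIMIT plays that role. The GUIDs are stored as plain text, whereas SQL Server would use a uniqueidentifier column. Only the guid and id columns matter here.

```python
import sqlite3

SAMPLE = [  # (guid, Value, mid, id) from the question
    ("0B821574-8E85-4FB7-8047-553393E385CB", 4, 51, 15),
    ("716F74B0-80D8-4869-86B4-99FF9EB10561", 0, 510, 153),
    ("7EBA2C31-FFC8-4071-B11A-9E2B7ED16B2B", 2, 5, 3),
    ("85491F90-E4C6-4030-B1E5-B9CA36238AE2", 1, 58, 7),
    ("F04FA30C-0C35-4B9F-A01C-708C0189815D", 20, 50, 13),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (guid TEXT PRIMARY KEY, val INTEGER, mid INTEGER, id INTEGER UNIQUE)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", SAMPLE)


def rows_from_key(conn, guid, n=10):
    """Return up to n (guid, id) rows in id order, starting at the row with this guid."""
    return conn.execute(
        """
        SELECT guid, id
        FROM mytable
        WHERE id >= (SELECT id FROM mytable WHERE guid = ?)
        ORDER BY id
        LIMIT ?
        """,
        (guid, n),
    ).fetchall()


start = "85491F90-E4C6-4030-B1E5-B9CA36238AE2"
print([row_id for _, row_id in rows_from_key(conn, start)])  # [7, 13, 15, 153]
```

An unknown GUID makes the subquery return NULL, so the comparison matches nothing and the result is empty.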

To read data onward, you can use OFFSET ... FETCH in the ORDER BY clause, available since MS SQL Server 2012. Following the example on learn.microsoft.com, something like this:
-- Declare and set the variables for the OFFSET and FETCH values.
DECLARE @StartingRowNumber INT = 1
      , @RowCountPerPage INT = 10
      , @StartId INT;
-- Look up the id of the row to start from (guid is unique, so this is one row):
SELECT @StartId = id FROM mytable WHERE guid = '85491F90-E4C6-4030-B1E5-B9CA36238AE2';
-- Create the condition to stop the loop after all rows have been returned:
WHILE (SELECT COUNT(*) FROM mytable WHERE id >= @StartId) >= @StartingRowNumber
BEGIN
    -- Return the next page of rows from the starting key onwards:
    SELECT *
    FROM mytable
    WHERE id >= @StartId
    ORDER BY id
    OFFSET @StartingRowNumber - 1 ROWS
    FETCH NEXT @RowCountPerPage ROWS ONLY;
    -- Increment @StartingRowNumber value:
    SET @StartingRowNumber = @StartingRowNumber + @RowCountPerPage;
END;
In the real world this will not be enough, because other processes could read or write data in your table at the same time.
Please read the documentation; for example, search for "Running multiple queries in a single transaction" in https://learn.microsoft.com/en-us/sql/t-sql/queries/select-order-by-clause-transact-sql
Proper indexes on the id and guid columns must be created to get good performance.
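Here is a sketch of reading onward from a given key page by page, again with SQLite standing in for SQL Server (LIMIT ... OFFSET instead of OFFSET ... FETCH). It looks up the starting id for the GUID once, then fetches pages of rows with id >= that value until a page comes back empty. The 25-row table and its GUID-like strings are made up. As noted above, concurrent writers could still shift rows between pages.

```python
import sqlite3


def make_table(n_rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (guid TEXT PRIMARY KEY, id INTEGER UNIQUE)")
    # Hypothetical GUID-like keys; only their uniqueness matters here.
    conn.executemany(
        "INSERT INTO mytable (guid, id) VALUES (?, ?)",
        [(f"{i:08X}-0000-0000-0000-000000000000", i) for i in range(1, n_rows + 1)],
    )
    return conn


def pages_from_key(conn, guid, page_size=10):
    """Return a list of pages (lists of ids), starting at the row with this guid."""
    start = conn.execute("SELECT id FROM mytable WHERE guid = ?", (guid,)).fetchone()
    if start is None:
        return []
    pages, offset = [], 0
    while True:
        rows = conn.execute(
            "SELECT id FROM mytable WHERE id >= ? ORDER BY id LIMIT ? OFFSET ?",
            (start[0], page_size, offset),
        ).fetchall()
        if not rows:  # stop condition: no rows left after the starting key
            break
        pages.append([r[0] for r in rows])
        offset += page_size
    return pages


conn = make_table(25)
for page in pages_from_key(conn, "00000004-0000-0000-0000-000000000000"):
    print(page)
```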

SQL how to show the page of results that includes a specific record

I have an SQL database, and I use pagination and sorting to show it.
e.g.
SELECT *
FROM People_Table
WHERE Country=@Country
ORDER BY Postcode OFFSET @Offset ROWS FETCH NEXT 12 ROWS ONLY
If a new person is added, I want to show the page of results which contains the new person. They might be halfway down that page, so I don't want to just find them and the next 11 records. Any ideas on how to do this elegantly?

If I understand the question correctly, I think you'd need to include a calculated column containing the page number, based on an NTILE grouping that follows the ORDER BY. Something like this, using an OVER clause with NTILE. See link for details on NTILE.
DECLARE @RowsPerPage INT = 12;
DECLARE @PageCount INT;
SELECT @PageCount = COUNT(*) / @RowsPerPage
FROM People_Table
WHERE Country = @Country;
SELECT * ,
NTILE(@PageCount) OVER ( ORDER BY PostCode ) AS PageNum
FROM People_Table
WHERE Country = @Country
ORDER BY Postcode
OFFSET @Offset ROWS FETCH NEXT @RowsPerPage ROWS ONLY;
Code sample for reference only; I haven't tested it beyond making sure it parses.
Comments: You'd probably need to be more specific in your ORDER BY, however, as ordering by postal code doesn't guarantee the order of rows that share the same postal code. This would be a problem once one postal code spans more than one page. So you want to order by postal code and by person name or something more specific.
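NTILE splits the rows into groups whose sizes differ by at most one. Its groups therefore only line up with fixed 12-row OFFSET pages when the row count is an exact multiple of 12. In addition, NTILE(0), which COUNT(*) / 12 yields for fewer than 12 rows, is rejected. The sketch below compares the two groupings in SQLite through Python's sqlite3 module, alongside a ROW_NUMBER()-based page number. The people table and its postcodes are invented. The ORDER BY includes id as a tie-breaker, following the comment about rows that share a postcode.

```python
import sqlite3
from collections import Counter


def group_sizes(n_rows, rows_per_page=12):
    """Compare NTILE groups with fixed-size pages for n_rows rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, postcode TEXT)")
    conn.executemany(
        "INSERT INTO people (id, postcode) VALUES (?, ?)",
        [(i, f"PC{i % 5}") for i in range(1, n_rows + 1)],  # repeated postcodes
    )
    page_count = n_rows // rows_per_page  # same integer division as the answer
    rows = conn.execute(
        f"""
        SELECT NTILE({int(page_count)}) OVER (ORDER BY postcode, id) AS ntile_page,
               (ROW_NUMBER() OVER (ORDER BY postcode, id) - 1) / ? + 1 AS fixed_page
        FROM people
        """,
        (rows_per_page,),
    ).fetchall()
    ntile = Counter(r[0] for r in rows)
    fixed = Counter(r[1] for r in rows)
    return {
        "ntile": [ntile[k] for k in sorted(ntile)],
        "fixed": [fixed[k] for k in sorted(fixed)],
    }


print(group_sizes(25))  # NTILE(2) gives groups of 13 and 12; pages hold 12, 12 and 1
print(group_sizes(24))  # a multiple of 12: both groupings agree
```

With 25 rows, the 13th row falls in NTILE group 1 but on page 2, so the PageNum column and the OFFSET page disagree.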

Thanks, I'll give NTILE a go. (The postcode was me trying to give a more understandable table than the one I actually have.)
Where I'd got to was:
SELECT RowNo
FROM (SELECT ROW_NUMBER() OVER (ORDER BY DriverID) AS RowNo, Id
      FROM DriverID_Table
      WHERE Fleet IN (SELECT Fleet_ID FROM Fleet_Table WHERE User_Id = @user)) AS t
WHERE Id = @ID
using the value returned from that to calculate a page number, and then a second query to fetch that page:
page = (RowNo - 1) / 12; Offset = page * 12;  // integer division: RowNo 1-12 gives Offset 0
SELECT *
FROM DriverID_Table
WHERE Fleet IN (SELECT Fleet_ID FROM Fleet_Table WHERE User_Id = @user)
ORDER BY DriverId
OFFSET @Offset ROWS FETCH NEXT 12 ROWS ONLY
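The two-step approach can be sketched in Python with sqlite3, using LIMIT ... OFFSET in place of OFFSET ... FETCH. The table and column names follow the comment above. The sample data, in which DriverID runs opposite to Id, is invented. One detail matters: dividing RowNo itself by 12 would send the 12th row to the second page, so the sketch subtracts 1 first. Id is added to the ORDER BY as a tie-breaker.

```python
import sqlite3

PAGE_SIZE = 12
FILTER = "Fleet IN (SELECT Fleet_ID FROM Fleet_Table WHERE User_Id = ?)"


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Fleet_Table (Fleet_ID INTEGER, User_Id INTEGER);
        CREATE TABLE DriverID_Table (Id INTEGER PRIMARY KEY, DriverID INTEGER, Fleet INTEGER);
        INSERT INTO Fleet_Table VALUES (10, 1), (30, 2);
        """
    )
    # Ids 1-30 belong to user 1's fleet, Ids 31-40 to user 2's; DriverID = 100 - Id
    conn.executemany(
        "INSERT INTO DriverID_Table VALUES (?, ?, ?)",
        [(i, 100 - i, 10 if i <= 30 else 30) for i in range(1, 41)],
    )
    return conn


def page_for(conn, user, target_id):
    # Step 1: 1-based position of the target within the user's ordered rows
    found = conn.execute(
        f"""
        SELECT RowNo FROM (
            SELECT ROW_NUMBER() OVER (ORDER BY DriverID, Id) AS RowNo, Id
            FROM DriverID_Table WHERE {FILTER}
        ) AS t
        WHERE Id = ?
        """,
        (user, target_id),
    ).fetchone()
    if found is None:
        return []
    # Step 2: integer division on RowNo - 1, so RowNo 1..12 gives offset 0
    offset = (found[0] - 1) // PAGE_SIZE * PAGE_SIZE
    rows = conn.execute(
        f"SELECT Id FROM DriverID_Table WHERE {FILTER} "
        "ORDER BY DriverID, Id LIMIT ? OFFSET ?",
        (user, PAGE_SIZE, offset),
    ).fetchall()
    return [r[0] for r in rows]


conn = make_db()
print(page_for(conn, 1, 19))  # Id 19 is RowNo 12, the last row of the first page
```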

SQL Query Create isDuplicate Column with IDs

I have a SQL Server 2005 database I'm working with. For the query I am using, I want to add a custom column that can start at any number and increment based on the row entry number.
For example, I start at number 10. Each row in my results will have an incrementing number: 10, 11, 12, etc.
This is an example of the SELECT statement I would be using.
int customVal = 10;
SELECT
ID, customVal++
FROM myTable
The format of the above is clearly wrong, but it is conceptually what I am looking for.
RESULTS:
ID CustomColumn
-------------------
1 10
2 11
3 12
4 13
How can I go about implementing this kind of functionality?
I cannot find any reference to incrementing variables within results. Is this the case?
EDIT: The customVal number will be pulled from another table, i.e. probably via a SELECT statement into the customVal variable. You cannot assume the ID column will contain any usable values.
The CustomColumn will be auto-incrementing starting at the customVal.

Use the ROW_NUMBER ranking function - http://technet.microsoft.com/en-us/library/ms186734.aspx
DECLARE @Offset INT = 9  -- customVal - 1, because ROW_NUMBER() starts at 1
SELECT
    ID
    , ROW_NUMBER() OVER (ORDER BY ID) + @Offset AS CustomColumn
FROM
    myTable
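Because ROW_NUMBER() starts at 1, the offset is customVal - 1. The edit in the question says customVal comes from another table. The sketch below reads it with a scalar subquery from a hypothetical settings table, using SQLite through Python's sqlite3 module in place of SQL Server. It also adds an outer ORDER BY, because without one the order of the returned rows is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE settings (custom_val INTEGER);  -- hypothetical source of customVal
    CREATE TABLE myTable (ID INTEGER);
    INSERT INTO settings VALUES (10);
    INSERT INTO myTable VALUES (42), (7), (1000), (3);  -- IDs need not be contiguous
    """
)

rows = conn.execute(
    """
    SELECT ID,
           ROW_NUMBER() OVER (ORDER BY ID)
             + (SELECT custom_val FROM settings) - 1 AS CustomColumn
    FROM myTable
    ORDER BY ID
    """
).fetchall()
print(rows)  # [(3, 10), (7, 11), (42, 12), (1000, 13)]
```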

Process SQL Table with no Unique Column

We have a table that keeps the log of internet usage inside our company. This table is filled by software we bought, and we cannot make any changes to its table. It does not have a unique key or index (to make writing data faster, as its developers say).
I need to read the data in this table to create real-time reports of internet usage by our users.
Currently I'm reading data from this table in chunks of 1000 records. My problem is keeping track of the last record I have read from the table, so I can read the next 1000 records.
What is the best possible solution to this problem?
By the way, the software may delete earlier records if the database file gets too big.

Depending on your version of SQL Server (ROW_NUMBER() needs SQL Server 2005 or later), you can use row_number(). Once the row_number() is assigned, you can page through the records:
select *
from
(
select *,
row_number() over(order by id) rn
from yourtable
) src
where rn between 1 and 1000
Then when you want to get the next set of records, you could change the values in the WHERE clause to:
where rn between 1001 and 2000
Based on your comment that the data gets deleted, I would do the following.
First, insert the data into a temp table:
select *, row_number() over(order by id) rn
into #temp
from yourtable
Then you can select the data by row number in any block as needed.
select *
from #temp
where rn between 1 and 1000
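The snapshot idea can be tried out with Python's sqlite3 module, where CREATE TEMP TABLE ... AS SELECT plays the role of SELECT ... INTO #temp. Like the answer, the sketch assumes there is an id column to order by. The yourtable layout and its url column are hypothetical. Rows deleted from the source after the snapshot is taken do not shift the chunk boundaries.

```python
import sqlite3


def take_snapshot(conn):
    # Freeze the current rows and their row numbers in a temporary table.
    conn.execute("DROP TABLE IF EXISTS temp.snapshot")
    conn.execute(
        "CREATE TEMP TABLE snapshot AS "
        "SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM yourtable"
    )


def read_chunk(conn, chunk_no, chunk_size=1000):
    # chunk_no 0 -> rn 1..1000, chunk_no 1 -> rn 1001..2000, ...
    first = chunk_no * chunk_size + 1
    last = first + chunk_size - 1
    rows = conn.execute(
        "SELECT id FROM snapshot WHERE rn BETWEEN ? AND ? ORDER BY rn", (first, last)
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, url TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [(i, f"http://example.com/{i}") for i in range(1, 2501)],
)

take_snapshot(conn)
before = read_chunk(conn, 1)
conn.execute("DELETE FROM yourtable WHERE id <= 500")  # the logging software purges old rows
after = read_chunk(conn, 1)
print(before == after)  # the snapshot is unaffected by the delete
```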

This would also help:
declare @numRecords int = 1000 --Number of records needed per request
declare @requestCount int = 0 --Request number, starting from 0 and increasing by 1
select top (@numRecords) *
from
(
select *, row_number() over(order by id) rn
from yourtable
) T
where rn > @requestCount*@numRecords
order by rn
EDIT: As per comments
CREATE PROCEDURE [dbo].[select_myrecords]
    @NumRecords int,               -- number of records needed per request (e.g. 1000)
    @LastDateTime datetime = NULL  -- LOGTime of the LAST RECORD of the previous result set, or NULL for the first request
AS
BEGIN
    select top (@NumRecords) *
    from yourtable
    where LOGTime < isnull(@LastDateTime, getdate())
    order by LOGTime desc
END
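Here is a sketch of the same keyset idea in Python with sqlite3. Each call passes the LOGTime of the last row it received, and the next batch contains only older rows, newest first. SQLite has no GETDATE(), so datetime('now') stands in for it. Timestamps are stored as ISO-8601 text, which compares correctly as strings. The table contents are made up. With a strict < comparison, rows that share a timestamp with the last row of a batch would be skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (LOGTime TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [(f"2024-01-01 00:0{m}:00", f"site{m}") for m in range(7)],
)


def next_batch(conn, num_records, last_time=None):
    """Return up to num_records LOGTime values older than last_time, newest first."""
    rows = conn.execute(
        """
        SELECT LOGTime FROM yourtable
        WHERE LOGTime < COALESCE(?, datetime('now'))
        ORDER BY LOGTime DESC
        LIMIT ?
        """,
        (last_time, num_records),
    ).fetchall()
    return [r[0] for r in rows]


batches, last = [], None
while True:
    batch = next_batch(conn, 3, last)
    if not batch:
        break
    batches.append(batch)
    last = batch[-1]  # LOGTime of the last record of the previous result set
print([len(b) for b in batches])  # [3, 3, 1]
```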

Without any index you cannot efficiently select the "last" records. The solution will not scale. You cannot use "real-time" and "repeated table scans of a big logging table" in the same sentence.
Actually, without any unique identifying attribute for each row, you cannot even determine what's new (proof: say you had a table full of thousands of booleans. How would you determine which ones are new? They cannot be told apart!). There must be something you can use, such as a combination of DateTime and IP address. Or you can add an IDENTITY column, which is likely to be transparent to the software you use.
Probably the software you use will tolerate you creating an index on some ID or DateTime column, as this is transparent to the software. It might create more load, so be sure to test it (my guess: you'll be fine).
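The effect of such an index can be seen in a small SQLite sketch. EXPLAIN QUERY PLAN reports a full scan of the log table before the index exists and an index search afterwards. SQL Server shows the same thing through its execution plans instead. The weblog table, its columns, and the index name ix_weblog_logtime are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weblog (LogTime TEXT, UserName TEXT, Url TEXT)")  # no key, no index

QUERY = "SELECT * FROM weblog WHERE LogTime > ? ORDER BY LogTime"


def plan(conn):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("2024-01-01",)).fetchall()
    return " | ".join(row[-1] for row in rows)


before = plan(conn)
conn.execute("CREATE INDEX ix_weblog_logtime ON weblog (LogTime)")
after = plan(conn)
print(before)  # a full table scan
print(after)   # a search using ix_weblog_logtime
```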

SQL Server custom record sort in table, allowing to delete records

SQL Server table with custom sort has columns: ID (PK, auto-increment), OrderNumber, Col1, Col2..
By default an insert trigger copies value from ID to OrderNumber as suggested here.
Using some visual interface, user can sort records by incrementing or decrementing OrderNumber values.
However, how should records that are deleted in the meantime be handled?
Example:
Say you add records with PK ID 1, 2, 3, 4, 5; OrderNumber receives the same values. Then you delete the records with ID=4 and ID=5. The next record will have ID=6, and OrderNumber will receive the same value. The gap of 2 missing OrderNumbers would force the user to decrement the record with ID=6 three times to change its order (i.e. press the button 3 times).
Alternatively, one could insert select count(*) from table into OrderNumber, but that could produce duplicate values once some old rows have been deleted.
If one doesn't delete records but only "deactivates" them, they're still included in the sort order, just invisible to the user. At the moment a solution in Java is needed, but I think the issue is language-independent.
Is there a better approach at this?

I would simply modify the script that switches the OrderNumber values so it does it correctly without relying on their being without gaps.
I don't know what arguments your script accepts and how it uses them, but the one I eventually came up with accepts the ID of the item to move and the number of positions to move it by (a negative value means "toward lower OrderNumber values", and a positive one implies the opposite direction).
The idea is as follows:
1. Look up the specified item's OrderNumber.
2. Rank all the items starting from that OrderNumber in the direction determined by the second argument. The specified item thus receives a ranking of 1.
3. Pick the items with rankings from 1 to the absolute value of the second argument plus one. (I.e. the last item is the one at the position the specified item is being moved to.)
4. Join the resulting set with itself so that every row is joined with the next one and the last row with the first one, and use one set of rows to update the other.
This is the query that implements the above, with comments explaining some tricky parts:
Edited: fixed an issue with incorrect reordering
/* these are the arguments of the query */
DECLARE @ID int, @JumpBy int;
SET @ID = ...
SET @JumpBy = ...
DECLARE @OrderNumber int;
/* Step #1: Get OrderNumber of the specified item */
SELECT @OrderNumber = OrderNumber FROM atable WHERE ID = @ID;
WITH ranked AS (
    /* Step #2: rank rows including the specified item and those that are sorted
       either before or after it (depending on the value of @JumpBy) */
    SELECT
        *,
        rnk = ROW_NUMBER() OVER (
            ORDER BY OrderNumber * SIGN(@JumpBy)
            /* this little "* SIGN(@JumpBy)" trick ensures that the
               top-ranked item will always be the one specified by @ID:
               * if we are selecting rows where OrderNumber >= @OrderNumber,
                 the order will be by OrderNumber and @OrderNumber will be
                 the smallest item (thus #1);
               * if we are selecting rows where OrderNumber <= @OrderNumber,
                 the order becomes by -OrderNumber and @OrderNumber again
                 becomes the top ranked item, because its negative counterpart,
                 -@OrderNumber, will again be the smallest one
            */
        )
    FROM atable
    WHERE OrderNumber >= @OrderNumber AND @JumpBy > 0
       OR OrderNumber <= @OrderNumber AND @JumpBy < 0
),
affected AS (
    /* Step #3: select only rows that need to be affected */
    SELECT *
    FROM ranked
    WHERE rnk BETWEEN 1 AND ABS(@JumpBy) + 1
)
/* Step #4: self-join and update */
UPDATE old
SET OrderNumber = new.OrderNumber
FROM affected old
INNER JOIN affected new ON old.rnk = new.rnk % (ABS(@JumpBy) + 1) + 1
/* if old.rnk = 1, the corresponding new.rnk is N,
   because 1 = N MOD N + 1 (N is ABS(@JumpBy)+1),
   for old.rnk = 2 the matching new.rnk is 1: 2 = 1 MOD N + 1,
   for 3, it's 2 etc.
   this condition could alternatively be written like this:
   new.rnk = (old.rnk + ABS(@JumpBy) - 1) % (ABS(@JumpBy) + 1) + 1
*/
Note: this assumes SQL Server 2005 or later version.
One known issue with this solution is that it will not "move" rows correctly if the specified ID cannot be moved exactly by the specified number of positions (for instance, if you want to move the topmost row up by any number of positions, or the second row by two or more positions etc.).
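To see what the ranking and the modulo self-join do, here is a Python sketch that applies the same four steps to an in-memory mapping of ID to OrderNumber, with gaps allowed. It models the query's logic rather than running against the database, and the function name move_item is hypothetical. Rows whose partner rank does not exist are left untouched, just as the INNER JOIN would leave them. This is the source of the known issue mentioned above.

```python
def move_item(order_numbers, item_id, jump_by):
    """Return a new {id: OrderNumber} dict with item_id moved by jump_by positions."""
    current = order_numbers[item_id]  # Step 1
    # Step 2: rows on the item's side of it, ranked so the item itself is rank 1
    # (sorting by OrderNumber * SIGN(jump_by), as in the query).
    if jump_by > 0:
        side = [i for i, o in order_numbers.items() if o >= current]
    elif jump_by < 0:
        side = [i for i, o in order_numbers.items() if o <= current]
    else:
        side = []  # jump_by = 0 matches no rows
    sign = 1 if jump_by > 0 else -1
    side.sort(key=lambda i: order_numbers[i] * sign)
    n = abs(jump_by) + 1
    # Step 3: keep ranks 1..n only
    by_rank = {rank: i for rank, i in enumerate(side[:n], start=1)}
    # Step 4: old.rnk = new.rnk % n + 1, i.e. new.rnk = (old.rnk + n - 2) % n + 1
    result = dict(order_numbers)
    for old_rnk, i in by_rank.items():
        new_rnk = (old_rnk + n - 2) % n + 1
        if new_rnk in by_rank:  # INNER JOIN: no partner row, no update
            result[i] = order_numbers[by_rank[new_rnk]]
    return result


# IDs 4 and 5 were deleted, so there is a gap between OrderNumbers 3 and 6.
print(move_item({1: 1, 2: 2, 3: 3, 6: 6}, 6, -1))  # {1: 1, 2: 2, 3: 6, 6: 3}
```

A single step up moves ID 6 ahead of ID 3 despite the gap, because the ranking only looks at which rows exist, not at the OrderNumber values between them.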

Ok - if I'm not mistaken, you want to defragment your OrderNumber.
What if you use ROW_NUMBER() for this?
Example (inside an AFTER INSERT trigger, since it reads the inserted pseudo-table):
;WITH calc_cte AS (
    SELECT
        ID
        , OrderNumber
        , RowNo = ROW_NUMBER() OVER (ORDER BY ID)
    FROM
        dbo.[Order]
)
UPDATE
    c
SET
    OrderNumber = c.RowNo
FROM
    calc_cte c
WHERE EXISTS (SELECT * FROM inserted i WHERE c.ID = i.ID)

I didn't want to answer my own question, but I believe I have found a solution.
Insert query:
INSERT INTO table (OrderNumber, col1, col2)
VALUES ((select count(*)+1 from table),val1,val2)
Delete trigger:
CREATE TRIGGER Cleanup_After_Delete ON table
AFTER DELETE AS
BEGIN
    WITH rowtable AS (SELECT [ID], OrderNumber, rownum = ROW_NUMBER()
        OVER (ORDER BY OrderNumber ASC) FROM table)
    UPDATE rt SET OrderNumber = rt.rownum FROM rowtable rt
    -- MIN() keeps this working when one DELETE removes several rows
    WHERE OrderNumber >= (SELECT MIN(OrderNumber) FROM deleted)
END
The trigger fires after every delete and renumbers all OrderNumbers above the deleted one, so there are no gaps. This means I can simply change the order of two records by swapping their OrderNumbers.
This is a working solution for my problem; however, this one is also a very good one, perhaps more useful for others.
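The same no-gaps invariant can be sketched in SQLite through Python's sqlite3 module. SQLite triggers fire once per row and cannot contain a WITH clause, so instead of re-running ROW_NUMBER() this sketch uses a simpler row-level technique: every OrderNumber above the deleted one is decremented by 1. The items table and its name column are hypothetical. Inserts use the same count(*) + 1 expression as the insert query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (ID INTEGER PRIMARY KEY AUTOINCREMENT, OrderNumber INTEGER, name TEXT);

    -- Row-level trigger: close the gap left by each deleted row.
    CREATE TRIGGER cleanup_after_delete AFTER DELETE ON items
    BEGIN
        UPDATE items SET OrderNumber = OrderNumber - 1
        WHERE OrderNumber > OLD.OrderNumber;
    END;
    """
)


def add_item(conn, name):
    # New rows go to the end: OrderNumber = current row count + 1.
    conn.execute(
        "INSERT INTO items (OrderNumber, name) VALUES ((SELECT COUNT(*) + 1 FROM items), ?)",
        (name,),
    )


def order_of(conn):
    return conn.execute("SELECT ID, OrderNumber FROM items ORDER BY OrderNumber").fetchall()


for name in "abcde":
    add_item(conn, name)  # IDs and OrderNumbers 1..5
conn.execute("DELETE FROM items WHERE ID = 4")
conn.execute("DELETE FROM items WHERE ID = 5")
add_item(conn, "f")  # ID 6 gets OrderNumber 4, not 6
print(order_of(conn))  # [(1, 1), (2, 2), (3, 3), (6, 4)]
```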

We Keep Coding
