Pagination in two tables in SQL Server - sql

I have two tables,
Orders - this is small, typically up to 50 thousand records
OrdersArchive - this one is normal, about 80 million records
This situation might happen:
An order can have one of these values for status:
'created'
'processing'
'finished'
The finished orders from Orders are periodically moved to OrdersArchive.
In other words, Orders might contain orders with status created, processing or finished. OrdersArchive contains only orders with a status of finished.
The result has to be sorted in this order: 'created', 'processing', 'finished'.
I need a query over these two tables that supports pagination.
What is the best way to do it (as fast as possible)?
The pagination might be of any type, for example:
classical pagination with PageNumber and CountOfRowsPerPage.
'lazy' pagination with a count of orders after a specific Order.

I would use the union SQL operator for this. See the w3schools page for details.
With union you can either do union or union all. The first removes duplicate rows while the second just combines the results. It sounds like you shouldn't have duplicates in these two tables, so use union all and skip the cost of the duplicate check.
You also need to make sure that both queries have the same number of columns with similar types.
e.g.
select orderno, status from Orders
union all
select orderno, status from OrdersArchive
order by status, orderno
Pagination
That query gives you the combined resultset for both tables. Now to add pagination I would use a CTE with row numbers like this:
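with x as (
select orderno as num, status as stat from Orders
union all
select archiveorderno as num, archivestatus as stat from OrdersArchive
), numbered as (
-- the row number has to be computed one level below the filter that uses it
select row_number() over(order by stat, num) as rownum, num, stat from x
)
select rownum, num, stat from numbered
where rownum between 1 and 20
order by rownum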
Alternative
If you find using union is too slow then you could look at changing the way your search works. If you always sort the same way and it's always records from Orders followed by records from OrdersArchive then you could query the tables separately. Start by paging through Orders and then when you run out of records continue paging through OrdersArchive. This would be much faster than the union but you would have to keep the query simple and always sort on status. The union allows much more complex searches.
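The two-step idea above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server (so LIMIT/OFFSET replaces OFFSET/FETCH), the sample rows and the fetch_page helper are made up, and it assumes every row in Orders sorts before every row in OrdersArchive. A CASE expression ranks the statuses, because sorting the status text alphabetically would put 'finished' before 'processing'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (orderno INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE OrdersArchive (orderno INTEGER PRIMARY KEY, status TEXT);
""")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, "finished"), (2, "created"), (3, "processing"), (4, "created"), (5, "finished")],
)
conn.executemany(
    "INSERT INTO OrdersArchive VALUES (?, ?)",
    [(101, "finished"), (102, "finished"), (103, "finished")],
)

# Explicit rank: created < processing < finished (not alphabetical).
STATUS_RANK = "CASE status WHEN 'created' THEN 1 WHEN 'processing' THEN 2 ELSE 3 END"


def fetch_page(conn, page_number, page_size):
    """Page through Orders first, then continue into OrdersArchive."""
    offset = (page_number - 1) * page_size
    live_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]

    rows = []
    if offset < live_count:
        rows = conn.execute(
            f"SELECT orderno, status FROM Orders "
            f"ORDER BY {STATUS_RANK}, orderno LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()

    # Fill the rest of the page from the archive (all rows there are 'finished').
    remaining = page_size - len(rows)
    if remaining > 0:
        archive_offset = max(0, offset - live_count)
        rows += conn.execute(
            "SELECT orderno, status FROM OrdersArchive ORDER BY orderno LIMIT ? OFFSET ?",
            (remaining, archive_offset),
        ).fetchall()
    return rows
```

Only the small Orders table is counted on each call; the large archive is touched only once the page runs past the end of Orders.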

Using OFFSET and FETCH NEXT in SQL Server can provide a paging solution. Rough sample code is:
DECLARE @PageNumber INT = 2;
DECLARE @PageSize INT = 100000;
SELECT [ID]
FROM [Table]
ORDER BY [ID]
OFFSET @PageSize * (@PageNumber - 1) ROWS
FETCH NEXT @PageSize ROWS ONLY
Obviously, substitute your own tables, filters and ordering, and probably put this in a stored procedure with PageNumber and PageSize as input parameters.

Related

Fetch No oF Rows that can be returned by select query

I'm fetching data and showing it in a table with pagination, so I use limit and offset. I also need to show the total number of rows the query can return. Is there any way to get that?
I tried
resultset.last() and getRow()
select count(*) from(query) myNewTable;
In both cases I get the correct answer, but is this the correct way to do it? Performance is a concern.
We can get a limited set of records using the code below.
First, set how many records we want:
var limit = 10;
Then pass this limit to the statement below:
WITH
Temp AS(
SELECT
ROW_NUMBER() OVER( ORDER BY primaryKey DESC ) AS RowNumber,
*
FROM
myNewTable
),
Temp2 AS(
SELECT COUNT(*) AS TotalCount FROM Temp
)
SELECT TOP (:limit) * FROM Temp, Temp2 WHERE RowNumber > :offset ORDER BY RowNumber
This runs on SQL Server. MySQL has no TOP clause, so there the final SELECT needs LIMIT instead.
There is no easy way of doing this.
1. As you found out, it usually boils down to executing 2 queries:
Executing SELECT with limit and offset in order to fetch the data that you need.
Executing a COUNT(*) in order to count the total number of pages.
This approach might work for tables that don't have a lot of rows, or when you filter the data (in the COUNT and SELECT queries) on a column that is indexed.
2. If your table is large, but the data you need to show is a small percentage of the table and shares a common trait (for example, the data in all of your pages was created on a single day), you can use partitioning. Executing COUNT and SELECT on a single partition will be much faster than executing them on the whole table.
3. You can create another table which will store the value of the COUNT query.
For example, let's say that your big_table table looks like this:
id | user_id | timestamp_column | text_column | another_text_column
Now, your SELECT query looks like this:
SELECT * FROM big_table WHERE user_id = 4 ORDER BY timestamp_column LIMIT 20 OFFSET 20;
And your count query:
SELECT COUNT(*) FROM big_table WHERE user_id = 4;
You could create a count_table that will have the following format:
user_id | count
Once you fill this table with the current data in the system, create a trigger that updates it on every insert into or delete from big_table (and on any update that changes user_id).
This way, the count query will be really fast, because it will be executed on the count_table, for example:
SELECT count FROM count_table WHERE user_id = 4
The drawback of this approach is that the insert in the big_table will be slower, since the trigger will fire and update the count_table on every insert.
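A minimal sketch of the count-table idea, assuming SQLite (via Python's built-in sqlite3 module) rather than any particular server; trigger syntax differs between database systems. The trigger names, the row_count column (used here instead of count to avoid clashing with the function name) and the cached_count helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big_table (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        text_column TEXT
    );
    CREATE TABLE count_table (
        user_id INTEGER PRIMARY KEY,
        row_count INTEGER NOT NULL
    );

    -- Keep the per-user count in step with inserts ...
    CREATE TRIGGER big_table_after_insert AFTER INSERT ON big_table
    BEGIN
        INSERT OR IGNORE INTO count_table (user_id, row_count) VALUES (NEW.user_id, 0);
        UPDATE count_table SET row_count = row_count + 1 WHERE user_id = NEW.user_id;
    END;

    -- ... and with deletes.
    CREATE TRIGGER big_table_after_delete AFTER DELETE ON big_table
    BEGIN
        UPDATE count_table SET row_count = row_count - 1 WHERE user_id = OLD.user_id;
    END;
""")

conn.executemany(
    "INSERT INTO big_table (user_id, text_column) VALUES (?, ?)",
    [(4, "a"), (4, "b"), (7, "c")],
)
conn.execute("DELETE FROM big_table WHERE text_column = 'b'")


def cached_count(conn, user_id):
    """Read the precomputed count instead of scanning big_table."""
    row = conn.execute(
        "SELECT row_count FROM count_table WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else 0
```

The lookup is a single primary-key read, while every insert and delete on big_table pays for one extra write, which is the trade-off described above.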
These are the approaches that you can try, but in the end it all depends on the size and type of your data.

Optional parameters in SQL query

I am new to SQL and I am kind of lost. I have a table that contains products, various fields like productname, category etc.
I want to have a query where I can say something like: select all products in some category that have a specific word in their productname. The complicating factor is that I only want to return a specific range of that subset. So I also want to say return me the 100 to 120 products that fall in that specification.
I googled and found this query:
WITH OrderedRecords AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY PRODUCTNUMMER) AS "RowNumber"
FROM (
SELECT *
FROM SHOP.dbo.PRODUCT
WHERE CATEGORY = 'ARDUINO'
and PRODUCTNAME LIKE '%yellow%'
) AS Filtered
)
SELECT * FROM OrderedRecords WHERE RowNumber BETWEEN 100 and 120
Go
The query works to an extent, however it assigns the row number before filtering so I won't get enough records and I don't know how I can handle it if there are no parameters. Ideally I want to be able to not give a category and search word and it will just list all products.
I have no idea how to achieve this though and any help is appreciated!
This builds on esiprogrammer's answer, which shows how to return only the rows in a certain range using paging.
Your second question was:
Ideally I want to be able to not give a category and search word and it will just list all products.
You can either have two queries/stored procedures, one for the case where you do lookup with specific parameters, another for lookup without parameters.
Or, if you insist on keeping one query/stored procedure for all cases, there are two options:
Build a Dynamic SQL statement that only has the filters that are present; execute it using EXECUTE (@sql) or EXECUTE sp_executesql @sql
Build a Catch-All Query
Example for option 2:
-- if no category is given, it will be NULL
DECLARE @search_category VARCHAR(128);
-- if no name is given, it will be NULL
DECLARE @search_name VARCHAR(128);
SELECT *
FROM SHOP.dbo.PRODUCT
WHERE (@search_category IS NULL OR CATEGORY=@search_category) AND
(@search_name IS NULL OR PRODUCTNAAM LIKE '%'+@search_name+'%')
ORDER BY PRODUCTNUMMER
OFFSET 100 ROWS
FETCH NEXT 20 ROWS ONLY
OPTION(RECOMPILE); -- generate a new plan on each execution that is optimized for that execution’s set of parameters
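The catch-all pattern above can be tried out with a small sketch. It assumes SQLite through Python's built-in sqlite3 module instead of SQL Server, so LIMIT/OFFSET stands in for OFFSET/FETCH and there is no OPTION(RECOMPILE); the sample rows and the search_products helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (productnummer INTEGER PRIMARY KEY, category TEXT, productnaam TEXT)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [
        (1, "ARDUINO", "yellow led"),
        (2, "ARDUINO", "red led"),
        (3, "SENSOR", "yellow probe"),
        (4, "ARDUINO", "yellow wire"),
    ],
)


def search_products(conn, category=None, name=None, offset=0, page_size=20):
    """Each filter is skipped when its parameter is None (SQL NULL)."""
    sql = """
        SELECT productnummer
        FROM product
        WHERE (:category IS NULL OR category = :category)
          AND (:name IS NULL OR productnaam LIKE '%' || :name || '%')
        ORDER BY productnummer
        LIMIT :page_size OFFSET :offset
    """
    params = {"category": category, "name": name, "page_size": page_size, "offset": offset}
    return [row[0] for row in conn.execute(sql, params)]
```

Calling search_products(conn) with no filters lists every product, and passing only one of the two arguments applies just that filter.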
If you just need to paginate your query and return a specific range of results, you can simply use the OFFSET FETCH clause.
That way there is no need to filter result items by RowNumber. I think this solution is easier:
SELECT *
FROM SHOP.dbo.PRODUCT
WHERE CATEGORY = 'ARDUINO' AND PRODUCTNAAM LIKE '%yellow%'
ORDER BY PRODUCTNUMMER
OFFSET 100 ROWS -- start row
FETCH NEXT 20 ROWS ONLY -- page size
Find out more: Pagination with OFFSET / FETCH
What do you mean it assigns the row number before filtering? Category and ProductName are filtered inside the subquery, so if the product table has 10k records and only 1k meet your criteria, the CTE returns 1k rows and RowNumber BETWEEN 100 and 120 works. Test it out: remove the WHERE clause and you'll get a row number for every row of the product table. Then add the category and product name filter back and RowNumber is assigned over the filtered rows ordered by ProductNumber, so BETWEEN 100 and 120 is the right solution based on what you described.
WITH OrderedRecords AS
(
SELECT ROW_NUMBER() OVER (ORDER BY PRODUCTNUMMER) AS "RowNumber"
, *
FROM SHOP.dbo.PRODUCT
WHERE CATEGORY = 'ARDUINO'
and PRODUCTNAAM LIKE '%yellow%'
)
SELECT *
FROM OrderedRecords
WHERE RowNumber
BETWEEN 100 and 120
Go

Pagination in SQL - Performance issue

I'm trying to use pagination and I found the perfect link on SO:
https://stackoverflow.com/a/109290/1481690
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY OrderDate ) AS RowNum, *
FROM Orders
WHERE OrderDate >= '1980-01-01'
) AS RowConstrainedResult
WHERE RowNum >= 1
AND RowNum < 20
ORDER BY RowNum
I'm trying to use the exact same query, with additional joins to a few tables in my inner query.
I'm getting performance issues in the following scenarios:
WHERE RowNum >= 1
AND RowNum < 20 ==>executes faster approx 2 sec
WHERE RowNum >= 1000
AND RowNum < 1010 ==> more time approx 10 sec
WHERE RowNum >= 30000
AND RowNum < 30010 ==> more time approx 17 sec
Every time I select 10 rows, but there is a huge time difference. Any ideas or suggestions?
I chose this approach because I am binding columns dynamically and forming the query. Is there a better way to organize the pagination query in SQL Server 2008?
Is there a way I can improve the performance of the query?
Thanks
I always check how much data I am accessing in a query and try to eliminate unnecessary columns as well as rows.
These are obvious points you might have already checked, but I wanted to point them out in case you haven't.
In your query the slow performance might be because you are doing "SELECT *". Selecting all columns from the table often prevents a good execution plan.
Check whether you need only selected columns, and make sure you have the correct covering index on table Orders.
Because an explicit SKIP or OFFSET clause is not available in SQL Server 2008, we need to build one ourselves, and we can do that with an INNER JOIN.
In the first query we generate the row number together with OrderDate, and nothing else is in that query.
We do the same in the second query, but here we also select the other columns of interest from table Orders, or all of them if you need all columns.
Then we JOIN the two query results on row number and OrderDate, and apply the skip-rows filter to the first query, where the data set is at its minimal size.
Try this code.
SELECT q2.*
FROM
(
SELECT ROW_NUMBER() OVER ( ORDER BY OrderDate ) AS RowNum, OrderDate
FROM Orders
WHERE OrderDate >= '1980-01-01'
)q1
INNER JOIN
(
SELECT ROW_NUMBER() OVER ( ORDER BY OrderDate ) AS RowNum, *
FROM Orders
WHERE OrderDate >= '1980-01-01'
)q2
ON q1.RowNum=q2.RowNum AND q1.OrderDate=q2.OrderDate AND q1.rownum BETWEEN 30000 AND 30020
To give you an estimate, I tried this with the following test data, and no matter what window you query, the results are back in less than 2 seconds. Note that the table is a HEAP (no index) and has 2M rows in total. The test select queries 10 rows, from 50,000 to 50,010.
The insert below took around 8 minutes.
IF object_id('TestSelect','u') IS NOT NULL
DROP TABLE TestSelect
GO
CREATE TABLE TestSelect
(
OrderDate DATETIME2(2)
)
GO
DECLARE @i bigint=1, @dt DATETIME2(2)='01/01/1700'
WHILE @i<=2000000
BEGIN
IF @i%15 = 0
SELECT @dt = DATEADD(DAY,1,@dt)
INSERT INTO dbo.TestSelect( OrderDate )
SELECT @dt
SELECT @i=@i+1
END
Selecting the window 50,000 to 50,010 took less than 3 seconds.
Selecting the last single row 2,000,000 to 2,000,000 also took 3 seconds.
SELECT q2.*
FROM
(
SELECT ROW_NUMBER() OVER ( ORDER BY OrderDate ) AS RowNum
,OrderDate
FROM TestSelect
WHERE OrderDate >= '1700-01-01'
)q1
INNER JOIN
(
SELECT ROW_NUMBER() OVER ( ORDER BY OrderDate ) AS RowNum
,*
FROM TestSelect
WHERE OrderDate >= '1700-01-01'
)q2
ON q1.RowNum=q2.RowNum
AND q1.OrderDate=q2.OrderDate
AND q1.RowNum BETWEEN 50000 AND 50010
ROW_NUMBER is a crappy way of doing pagination, as the cost of the operation grows with the number of rows that have to be numbered.
Instead you should use a double ORDER BY clause.
Say you want to get the records with ROW_NUMBER between 1200 and 1210. Instead of using ROW_NUMBER() OVER (...) and then filtering on the result in WHERE, you should rather write:
SELECT TOP(11) *
FROM (
SELECT TOP(1210) *
FROM [...]
ORDER BY something ASC
) subQuery
ORDER BY something DESC
Note that this query gives the result in reverse order. That shouldn't, generally speaking, be an issue, as it's easy to reverse the set in the UI layer (e.g. in C#), especially as the resulting set should be relatively small.
The latter is generally a lot faster. Note that it is greatly improved by clustering (CREATE CLUSTERED INDEX ...) on the column you sort the query by.
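A minimal sketch of the double ORDER BY idea, using Python's built-in sqlite3 module as a stand-in for SQL Server (so LIMIT replaces TOP). The items table, its contents and the rows_between helper are made up for illustration; the reversal that the answer leaves to the UI is done here in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (something INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 2001)])


def rows_between(conn, first_row, last_row):
    """Return rows first_row..last_row (1-based, inclusive) in ascending order."""
    wanted = last_row - first_row + 1
    reversed_rows = conn.execute(
        """
        SELECT something
        FROM (
            SELECT something FROM items ORDER BY something ASC LIMIT ?
        ) AS sub
        ORDER BY something DESC
        LIMIT ?
        """,
        (last_row, wanted),
    ).fetchall()
    # The query returns the page backwards, so flip it on the client side.
    return [row[0] for row in reversed(reversed_rows)]
```

For rows 1200 to 1210 the inner query keeps the first 1210 rows and the outer one keeps the last 11 of those.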
Hope that helps.
Even though you are always selecting the same number of rows, performance degrades when you select rows towards the end of your data window. To get the first 10 rows, the engine fetches just 10 rows; to get the next 10 it has to fetch 20, discard the first 10, and return 10. To get 30000 to 30010, it has to read all 30010, skip the first 30k, and return 10.
Some tricks to improve performance (not a full list; building OLAP is skipped completely).
You mentioned joins; if possible, join not inside the inner query but against its result. You can also add some logic to ORDER BY OrderDate: ASC or DESC depending on which bucket you are retrieving. Say you want to grab the "last" 10: ORDER BY ... DESC will work much faster. Needless to say, there has to be an index on OrderDate.
Incredibly, no other answer has mentioned the fastest way to do paging in all SQL Server versions, specifically with respect to the OP's question where offsets can be terribly slow for large page numbers as is benchmarked here.
There is an entirely different, much faster way to perform paging in SQL. This is often called the "seek method" as described in this blog post here.
SELECT TOP 10 *
FROM Orders
WHERE OrderDate >= '1980-01-01'
AND ((OrderDate > @previousOrderDate)
OR (OrderDate = @previousOrderDate AND OrderId > @previousOrderId))
ORDER BY OrderDate ASC, OrderId ASC
The @previousOrderDate and @previousOrderId values are the respective values of the last record from the previous page. This allows you to fetch the "next" page. If the ORDER BY direction is DESC, simply use < instead.
With the above method, you cannot immediately jump to page 4 without having first fetched the previous 40 records. But often, you do not want to jump that far anyway. Instead, you get a much faster query that might be able to fetch data in constant time, depending on your indexing. Plus, your pages remain "stable", no matter if the underlying data changes (e.g. on page 1, while you're on page 4).
This is the best way to implement paging when lazy loading more data in web applications, for instance.
Note, the "seek method" is also called keyset paging.
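The seek method can be sketched end to end as follows. This is an illustration only: it uses Python's built-in sqlite3 module instead of SQL Server (LIMIT in place of TOP), and the sample data, index name and the next_page and all_pages helpers are made up. OrderId is included as a tie-breaker because OrderDate alone is not unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, OrderDate TEXT NOT NULL)")
conn.execute("CREATE INDEX ix_orders_date_id ON Orders (OrderDate, OrderId)")
# 25 orders, five per day, so several rows share the same OrderDate.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(i, f"1980-01-{(i - 1) // 5 + 1:02d}") for i in range(1, 26)],
)


def next_page(conn, page_size, previous=None):
    """Return the page after `previous`, an (OrderDate, OrderId) pair, or the first page."""
    if previous is None:
        return conn.execute(
            "SELECT OrderDate, OrderId FROM Orders ORDER BY OrderDate, OrderId LIMIT ?",
            (page_size,),
        ).fetchall()
    prev_date, prev_id = previous
    return conn.execute(
        """
        SELECT OrderDate, OrderId
        FROM Orders
        WHERE OrderDate > ? OR (OrderDate = ? AND OrderId > ?)
        ORDER BY OrderDate, OrderId
        LIMIT ?
        """,
        (prev_date, prev_date, prev_id, page_size),
    ).fetchall()


def all_pages(conn, page_size):
    """Walk the table page by page, feeding each page's last row into the next call."""
    pages, last = [], None
    while True:
        page = next_page(conn, page_size, last)
        if not page:
            break
        pages.append(page)
        last = page[-1]
    return pages
```

The caller only has to remember the last row of the page it already has, which is what makes this a natural fit for "load more" style interfaces.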
declare @pageOffset int
declare @pageSize int
-- set variables at some point
declare @startRow int
set @startRow = @pageOffset * @pageSize + 1 -- ROW_NUMBER() starts at 1; @pageOffset is zero-based
declare @endRow int
set @endRow = @startRow + @pageSize - 1
SELECT
o.*
FROM
(
SELECT
ROW_NUMBER() OVER ( ORDER BY OrderDate ) AS RowNum
, OrderId
FROM
Orders
WHERE
OrderDate >= '1980-01-01'
) q1
INNER JOIN Orders o
on q1.OrderId = o.OrderId
where
q1.RowNum between @startRow and @endRow
order by
o.OrderDate
@peru, regarding whether there is a better way, and to build on the explanation provided by @a1ex07, try the following:
If the table has a unique identifier such as a numeric (order-id) or (order-date, order-index) upon which a compare (greater-than, less-than) operation can be performed then use that as an offset instead of the row-number.
For example if the table orders has 'order_id' as primary-key then -
To get the first ten results -
1.
select RowNum, order_id from
( select
ROW_NUMBER() OVER ( ORDER BY OrderDate ) AS RowNum,
o.order_id
from orders o where o.order_id > 0
)
tmp_qry where RowNum between 1 and 10 order by RowNum; -- first 10
Assuming that the last order-id returned was 17 then,
To select the next 10,
2.
select RowNum, order_id from
( select
ROW_NUMBER() OVER ( ORDER BY OrderDate ) AS RowNum,
o.order_id
from orders o where o.order_id > 17
)
tmp_qry where RowNum between 1 and 10 order by RowNum; -- next 10
Note that the row-num values have not been changed. It's the order-id value being compared that has changed.
If such a key is not present then consider adding one!
The main drawback of your query is that it sorts the whole table and calculates Row_Number for every query. You can make life easier for SQL Server by using fewer columns at the sorting stage (for example, as suggested by Anup Shah). However, you still make it read, sort and calculate row numbers for every query.
An alternative to calculating on the fly is reading values that were calculated beforehand.
Depending on the volatility of your dataset and the number of columns used for sorting and filtering, you can consider:
Adding a rownumber column (or 2-3 columns) and including it as the first column in the clustered index, or creating a non-clustered index on it.
Creating views for the most frequent combinations and then indexing those views. These are called indexed (materialised) views.
This allows the row number to be read directly, and performance will barely depend on volume. Maintaining these structures has a cost too, but less than sorting the whole table for each query.
Note that if this is a one-off query, or is run infrequently compared to all other queries, it is better to stick with query optimisation only: the effort of creating extra columns/views might not pay off.

How do I make this SQL code faster?

Everyone wants me to be more specific. I am attempting to do pagination with ASP Classic and an MS Access database. This is the query I am using to get the items for page 2. There are 25 items per page, and when the query returns larger data sets (around 500+ rows) it takes 20+ seconds to execute. Yes, I have indexed sku for faster queries. Any suggestions?
SELECT TOP 25 *
FROM catalog
WHERE sku LIKE '1W%'
AND sku NOT IN (SELECT TOP 25 sku
FROM catalog
WHERE sku LIKE '1W%' ORDER BY price DESC ) ORDER BY price DESC
TOP without ORDER BY looks useless or at least strange. I guess you meant to use this subquery:
( SELECT TOP 25 sku
FROM catalog
WHERE sku LIKE '1W%'
ORDER BY sku
)
Add an index on sku, if you haven't one.
A possible rewriting of the query, for Access:
SELECT *
FROM catalog
WHERE sku LIKE '1W%'
AND sku >= ( SELECT MAX(sku)
FROM ( SELECT TOP 26 sku
FROM catalog
WHERE sku LIKE '1W%'
ORDER BY sku
)
)
If you are using SQL-Server, you can use window functions for this type of query.
Some pointers:
You can simulate a SELECT BOTTOM (n) by using TOP (n) and reversing the ORDER BY
You can use nested SELECTs (creating a temporary table)
So, the final result of the "paging" query is (replace 50 with 75, 100, 125, ... for subsequent pages):
SELECT TOP 25 *
FROM
(
SELECT TOP 50 *
FROM catalog
WHERE sku LIKE '1W%'
ORDER BY price desc
)
TEMP
ORDER BY price asc;
You mentioned you've indexed your data, but just to be completely clear: for optimal performance, make sure the table is adequately indexed for this query. In this case I would recommend indexing AT LEAST the two columns involved in the query:
CREATE INDEX IX_CATALOG ON CATALOG (SKU, PRICE);
What you're trying to do is select all the rows from a table that meet certain criteria, other than the first twenty-five. Unfortunately, different database management systems have their own syntax for doing this kind of thing.
There is a good survey of the different syntaxes on the Wikipedia page for the SQL select statement.
To give an example, in MySQL you can use the LIMIT clause of the SELECT statement to specify how many rows to return and the offset:
SELECT *
FROM catalog
WHERE sku LIKE '1W%'
ORDER by id
LIMIT 25, 9999999999
which returns rows 26 to 9999999999 of the query results.
Create an index on the column sku (if valid make it unique). How many rows are there in the table?
SELECT sku
FROM catalog
WHERE sku LIKE '1W%'
ORDER BY __SOME COLUMN __
LIMIT 10000 OFFSET 25
This returns all* the rows in the table that come after row 25 (OFFSET 25).
*LIMIT 10000 constrains the result to 10000 tuples (rows).
To make sure the offset is not applied to rows in an arbitrary order, you need to order by some column.

total number of rows of a query

I have a very large query that is supposed to return only the top 10 results:
select top 10 ProductId from .....
The problem is that I also want the total number of results that match the criteria without that 'top 10', but at the same time it's considered unacceptable to return all rows (we are talking about roughly 100 thousand results).
Is there a way to get the total number of rows affected by the previous query, either in it or afterwards, without running it again?
PS: please no temp tables of 100 000 rows :))
dump the count in a variable and return that
declare @count int
select @count = count(*) from ..... --same where clause as your query
--now you add that to your query..of course it will be the same for every row..
select top 10 ProductId, @count as TotalCount from .....
Assuming that you're using an ORDER BY clause already (to properly define which the "TOP 10" results are), then you could add a call of ROW_NUMBER also, with the opposite sort order, and pick the highest value returned.
E.g., the following:
select top 10 *,ROW_NUMBER() OVER (order by id desc) from sysobjects order by ID
Has a final column with values 2001, 2000, 1999, etc, descending. And the following:
select COUNT(*) from sysobjects
Confirms that there are 2001 rows in sysobjects.
I suppose you could hack it with a union select
select top 10 ... from ... where ...
union
select count(*) from ... where ...
For you to get away with this type of hack you will need to add fake columns to the count query so it returns the same amount of columns as the main query. For example:
select top 10 id, first_name from people
union
select count(*), '' as first_name from people
I don't recommend using this solution. Using two separate queries is how it should be done
Generally speaking no - reasoning is as follows:
If(!) the query planner can make use of TOP 10 to return only 10 rows, then the RDBMS will not even know the exact number of rows that satisfy the full criteria; it just gets the TOP 10.
Therefore, when you want to find the count of all rows satisfying the criteria, you are not running the query a second time but for the first time.
Having said that, proper indexes might make both queries execute pretty fast.
Edit
MySQL has SQL_CALC_FOUND_ROWS which returns the number of rows that query would return if there was no LIMIT applied - googling for an equivalent in MS SQL points to analytical SQL and CTE variant, see this forum (even though not sure that either would qualify as running it only once, but feel free to check - and let us know).
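One common form of the analytic variant is a window aggregate, COUNT(*) OVER (), which attaches the total number of matching rows to every returned row. The sketch below is an assumption-laden illustration, not necessarily the approach the linked forum describes: it runs on SQLite through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer), uses LIMIT in place of TOP, and the Product table and its data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductId INTEGER PRIMARY KEY, Price INTEGER)")
conn.executemany("INSERT INTO Product VALUES (?, ?)", [(i, i % 7) for i in range(1, 101)])

# The window count is evaluated over all rows that pass the WHERE clause,
# before LIMIT trims the result to the first ten.
rows = conn.execute(
    """
    SELECT ProductId, COUNT(*) OVER () AS TotalCount
    FROM Product
    WHERE Price >= 3
    ORDER BY ProductId
    LIMIT 10
    """
).fetchall()

product_ids = [row[0] for row in rows]
total_counts = {row[1] for row in rows}
```

This is a single statement, but the engine still has to visit every matching row to produce the count, so it is consistent with the reasoning above rather than a way around it.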