Performance when using distinct and row_number pagination - sql

I have a SQL something like this:
SELECT A,B,C,FUN(A) AS A FROM someTable
The problem is FUN() is a function which quite slow, so if there are a lot of records in someTable, there will be a big performance issue.
If we using a pagination, we can resolve this issue, we do the pagination like this:
SELECT * FROM(
SELECT A,B,C,FUN(A), Row_number()OVER( ORDER BY B ASC) AS rownum FROM someTable
)T WHERE T.rownum >=1 AND T.rownum<20
In this script, the FUN() will only execute 20 times so the performance is OK.
But we need use alias to order by, so we can't write rownum inline, have to move to sub query or CTE, we chose CTE and it looks like this:
;WITH CTE AS (
SELECT A,B AS alias,C,FUN(A) FROM someTable
)
SELECT * FROM(
SELECT *,Row_number()OVER( ORDER BY alias ASC) AS rownum FROM CTE
)T WHERE T.rownum >=1 AND T.rownum<20
So far we are going fine, we get pagination to solve performance issue, we solve the alias order problem, but somehow we need to add DISTINCT to the query:
;WITH CTE AS (
SELECT DISTINCT A,B AS alias,C,FUN(A) FROM someTable
)
SELECT * FROM(
SELECT *,Row_number()OVER( ORDER BY alias ASC) AS rownum FROM CTE
)T WHERE T.rownum >=1 AND T.rownum<20
After this, the optimize of this SQL seems gone, the FUN() will execute many times as much as the records count in someTable, we get the performance issue again.
Basically we are blocked at here, is there any suggestions?

The problem is that in order to get distinct values, the database engine must run the fun(a) function on all the records being selected.
If you do the fun(a) only in the final select, the distinct should not effect it, so it should run only on the final 20 records.
I've changed the derived table you've used to another cte (but that's a personal preference - seems to me more tidy not to mix derived tables and ctes):
;WITH CTE1 AS (
SELECT DISTINCT A,B AS alias,C
FROM someTable
),
CTE2 AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY alias) As RowNum
FROM CTE1
)
SELECT *, FUN(A)
FROM CTE2
WHERE RowNum >= 1
AND RowNum < 20
Please note that since the fun function is not deterministic you might get results that are different from your original query - so before adapting this solution compare the results first.

Related

Printing same rows twice in SQL Server

I'm using this query
SELECT
[SCORECARD_NAME], [SCORE_NAME],
[TOTAL_ROWS], [VALID_PERCENTAGE], [INVALID_ROWS]
FROM
{table_name}
and I'm getting the result twice. I'm not getting why its happening like that
what is the solution for this?
You can use SELECT DISTINCT, or here is an option using a CTE with a window function:
;WITH CTE AS (
SELECT [SCORECARD_NAME], [SCORE_NAME], [TOTAL_ROWS], [VALID_PERCENTAGE],[INVALID_ROWS],
ROW_NUMBER() OVER (PARTITION BY [SCORECARD_NAME]
ORDER BY [SCORECARD_NAME]) AS DuplicateCount
FROM Table_Name
)
SELECT * FROM CTE WHERE DuplicateCount = 1;
In case you want rows with same entries to be displayed once then use distinct:
SELECT DISTINCT
[SCORECARD_NAME], [SCORE_NAME], [TOTAL_ROWS],
[VALID_PERCENTAGE], [INVALID_ROWS]
FROM
{table_name};
Let me know in case of any queries.

SQL Server view inside CTE causing poor performance

When I use a view inside of a CTE, each subquery that references the CTE seems to re-query the view. There are large chunks of the execution plan that are repeated for each subquery. This isn't the case when selecting from a table. Is this expected? Is there any way to get around it?
WITH cte AS (
SELECT v.id
FROM test_view AS v
)
SELECT TOP 25 *,
(SELECT COUNT(*) FROM cte) AS subquery
FROM cte
I'm working with SQL Server 2005
EDIT:
I'm trying to get data from a view in pages with the query below. I need the total number of rows in the view, the number of rows that match a search, and a subset of the matching rows. This works well when selecting from tables, but using a view causes repeated execution of the CTE. I attempted to force intermediate materialization a variety of different ways from the link in Martin's answer, but didn't have any luck.
WITH tableRecords AS (
SELECT *
FROM example_view
),
filteredTableRecords AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY id ASC) AS tableRecordNumber
FROM tableRecords
WHERE 1 = 1
)
SELECT *,
(SELECT COUNT(*) FROM tableRecords) AS totalRecords,
(SELECT COUNT(*) FROM filteredTableRecords) AS totalDisplayRecords
FROM filteredTableRecords
WHERE tableRecordNumber BETWEEN 1 AND 25
ORDER BY tableRecordNumber ASC
Yes it is largely expected.
See Provide a hint to force intermediate materialization of CTEs or derived tables
For the query in your question you can do this though
WITH CTE AS
(
SELECT v.id,
count(*) OVER () AS Cnt
FROM test_view AS v
)
SELECT TOP 25 *
FROM CTE
ORDER BY v.id
I suggest you re-write your query as below
There are a few improvements I did over your query
removal of where 1=1, it is unnecessary as it is always true.
the sub-query in select clause will be called everytime you execute the sql script, so instead of that you can actually use the cross apply to increase the performance.
;WITH tableRecords AS(
SELECT *
FROM example_view
),
filteredTableRecords AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY id ASC) AS tableRecordNumber
FROM tableRecords
),TotalNumber
(
SELECT (SELECT COUNT(1) FROM tableRecords) AS totalRecords,
(SELECT COUNT(1) FROM filteredTableRecords) AS totalDisplayRecords
)
SELECT *
FROM filteredTableRecords F
CROSS APPLY TotalNumber AS T
WHERE tableRecordNumber BETWEEN 1 AND 25
ORDER BY tableRecordNumber ASC

Return rows between a specific range, with one select statement

I'm looking to some expresion like this (using SQL Server 2008)
SELECT TOP 10 columName FROM tableName
But instead of that I need the values between 10 and 20. And I wonder if there is a way of doing it using only one SELECT statement.
For example this is useless:
SELECT columName FROM
(SELECT ROW_NUMBER() OVER(ORDER BY someId) AS RowNum, * FROM tableName) AS alias
WHERE RowNum BETWEEN 10 AND 20
Because the select inside brackets is already returning all the results, and I'm looking to avoid that, due to performance.
Use SQL Server 2012 to fetch/skip!
SELECT SalesOrderID, SalesOrderDetailID, ProductID, OrderQty, UnitPrice, LineTotal
FROM AdventureWorks2012.Sales.SalesOrderDetail
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY;
There's nothing better than you're describing for older versions of sql server. Maybe use CTE, but unlikely to make a difference.
WITH NumberedMyTable AS
(
SELECT
Id,
Value,
ROW_NUMBER() OVER (ORDER BY Id) AS RowNumber
FROM
MyTable
)
SELECT
Id,
Value
FROM
NumberedMyTable
WHERE
RowNumber BETWEEN #From AND #To
or, you can remove top 10 rows and then get next 10 rows, but I double anyone would want to do that.
There is a trick with row_number that does not involve sorting all the rows.
Try this:
SELECT columName
FROM (SELECT ROW_NUMBER() OVER(ORDER BY (select NULL as noorder)) AS RowNum, *
FROM tableName
) as alias
WHERE RowNum BETWEEN 10 AND 20
You cannot use a constant in the order by. However, you can use an expression that evaluates to a constant. SQL Server recognizes this and just returns the rows as encountered, properly enumerated.
Why do you think SQL Server would evaluate the entire inner query? Assuming your sort column is indexed, it'll just read the first 20 values. If you're really nervous you could do this:
Select
Id
From (
Select Top 20 -- note top 20
Row_Number() Over(Order By Id) As RowNum,
Id
From
dbo.Test
Order By
Id
) As alias
Where
RowNum Between 10 And 20
Order By
Id
but I'm pretty sure the query plan is the same either way.
(Really) Fixed as per Aaron's comment.
http://sqlfiddle.com/#!3/db162/6
One more option
SELECT TOP(11) columName
FROM dbo.tableName
ORDER BY
CASE WHEN ROW_NUMBER() OVER (ORDER BY someId) BETWEEN 10 AND 20
THEN ROW_NUMBER() OVER (ORDER BY someId) ELSE NULL END DESC
You could create a temp table that is ordered the way you want like:
SELECT ROW_NUMBER() OVER(ORDER BY someId) AS RowNum, * FROM tableName
into ##tempTable
...
That way you have an ordered list of rows.
and can just query by row number the subsequent times instead of doing the inner query multiple times.

How do write the following sql fragment for SQL Server 2000

Can Someone please help to how to write the below code in SQL Server2000? Thanks
WITH cte
AS (SELECT *,
( Row_number() OVER(ORDER BY productid, transactionid,
statusdate
DESC)
)AS
rownum
FROM #table),
cte2
AS (SELECT cte.*,
( CASE
WHEN cte.status = Isnull((SELECT t.status
FROM cte t
WHERE t.rownum = ( cte.rownum + 1
)),
'')
THEN 1
ELSE 0
END )AS rownum2
FROM cte)
SELECT cte2.productid,
cte2.transactionid,
cte2.details,
cte2.status,
cte2.statusdate,
cte2.requestdate
FROM cte2
WHERE rownum2 = 0
There is no OVER() function in 2000, so you will need to re-address/move the initial cte select. You already have a table variable #table, so I would modify this to include the row numbers, by adding a identity column.
You could then use a subquery to extract what you need from cte2.
DigbySwift's answer mentions that there is a problem with ROW_NUMBER and you need to solve that first. You can use the idea in his answer, and you could also look at answers to similar questions such as these:
ROW_NUMBER Alternative for SQL Server 2000
Row_Number simulation in Sql server 2000
However there's also another problem: SQL Server 2000 doesn't support CTEs (WITH ... AS ...).
To solve this use subselects instead of CTEs by removing the WITH clause and moving the definition of the CTE to the place(s) where it is used.
For example this query using CTEs:
WITH T1 AS (SELECT a,b,c FROM ...)
SELECT * FROM T1
Becomes:
SELECT *
FROM (
SELECT a,b,c FROM ...
) AS T1
Hopefully that is enough to get you started.

MSSQL 2008 SP pagination and count number of total records

In my SP I have the following:
with Paging(RowNo, ID, Name, TotalOccurrences) as
(
ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging WHERE RowNo BETWEEN 1 and 50
SELECT COUNT(*) FROM Paging
The result is that I get the error: invalid object name 'Paging'.
Can I query again the Paging table? I don't want to include the count for all results as a new column ... I would prefer to return as another data set. Is that possible?
Thanks, Radu
After more research I fond another way of doing this:
with Paging(RowNo, ID, Name, TotalOccurrences) AS
(
ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
select RowNo, ID, Name, TotalOccurrences, (select COUNT(*) from Paging) as TotalResults from Paging where RowNo between (#PageNumber - 1 )* #PageSize + 1 and #PageNumber * #PageSize;
I think that this has better performance than calling two times the query.
You can't do that because the CTE you are defining will only be available to the FIRST query that appears after it's been defined. So when you run the COUNT(*) query, the CTE is no longer available to reference. That's just a limitation of CTEs.
So to do the COUNT as a separate step, you'd need to not use the CTE and instead use the full query to COUNT on.
Or, you could wrap the CTE up in an inline table valued function and use that instead, to save repeating the main query, something like this:
CREATE FUNCTION dbo.ufnExample()
RETURNS TABLE
AS
RETURN
(
with Paging(RowNo, ID, Name, TotalOccurrences) as
(
ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging
)
SELECT * FROM dbo.ufnExample() x WHERE RowNo BETWEEN 1 AND 50
SELECT COUNT(*) FROM dbo.ufnExample() x
Please be aware that Radu D's solution's query plan shows double hits to those tables. It is doing two executions under the covers. However, this still may be the best way as I haven't found a truly scalable 1-query design.
A less scalable 1-query design is to dump a completed ordered list into a #tablevariable , SELECT ##ROWCOUNT to get the full count, and select from #tablevariable where row number between X and Y. This works well for <10000 rows, but with results in the millions of rows, populating that #tablevariable gets expensive.
A hybrid approach is to populate this temp/variable up to 10000 rows. If not all 10000 rows are filled up, you're set. If 10000 rows are filled up, you'll need to rerun the search to get the full count. This works well if most of your queries return well under 10000 rows. The 10000 limit is a rough approximation, you can play around with this threshold for your case.
Write "AS" after the CTE table name Paging as below:
with Paging AS (RowNo, ID, Name, TotalOccurrences) as
(
ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging WHERE RowNo BETWEEN 1 and 50
SELECT COUNT(*) FROM Paging