MSSQL 2008 SP pagination and counting the total number of records - sql

In my SP I have the following:
with Paging(RowNo, ID, Name, TotalOccurrences) as
(
SELECT ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging WHERE RowNo BETWEEN 1 and 50
SELECT COUNT(*) FROM Paging
The result is that I get the error: invalid object name 'Paging'.
Can I query the Paging table again? I don't want to include the count of all results as a new column ... I would prefer to return it as another data set. Is that possible?
Thanks, Radu

After more research I found another way of doing this:
with Paging(RowNo, ID, Name, TotalOccurrences) AS
(
SELECT ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
select RowNo, ID, Name, TotalOccurrences, (select COUNT(*) from Paging) as TotalResults from Paging where RowNo between (@PageNumber - 1) * @PageSize + 1 and @PageNumber * @PageSize;
I think this has better performance than running the query twice.
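For comparison, a windowed COUNT(*) OVER () gives the same total without the correlated subquery; a minimal sketch along the same lines (my own illustration, not tested against the real schema):
with Paging(RowNo, ID, Name, TotalOccurrences, TotalResults) as
(
SELECT ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences,
       COUNT(*) over () as TotalResults   -- total row count, computed in the same pass
FROM dbo.Videos V INNER JOIN ....
)
select RowNo, ID, Name, TotalOccurrences, TotalResults from Paging
where RowNo between (@PageNumber - 1) * @PageSize + 1 and @PageNumber * @PageSize;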

You can't do that because the CTE you are defining will only be available to the FIRST query that appears after it's been defined. So when you run the COUNT(*) query, the CTE is no longer available to reference. That's just a limitation of CTEs.
So to do the COUNT as a separate step, you'd need to skip the CTE and run the COUNT against the full query instead.
Or, you could wrap the CTE up in an inline table valued function and use that instead, to save repeating the main query, something like this:
CREATE FUNCTION dbo.ufnExample()
RETURNS TABLE
AS
RETURN
(
with Paging(RowNo, ID, Name, TotalOccurrences) as
(
SELECT ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging
)
GO
SELECT * FROM dbo.ufnExample() x WHERE RowNo BETWEEN 1 AND 50
SELECT COUNT(*) FROM dbo.ufnExample() x

Please be aware that Radu D's solution's query plan shows double hits to those tables. It is doing two executions under the covers. However, this still may be the best way as I haven't found a truly scalable 1-query design.
A less scalable 1-query design is to dump a completed ordered list into a @tablevariable, SELECT @@ROWCOUNT to get the full count, and then select from the @tablevariable where the row number is between X and Y. This works well for fewer than 10000 rows, but with results in the millions of rows, populating that @tablevariable gets expensive.
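A rough sketch of that table-variable pattern (my own illustration; the column types are guesses):
DECLARE @Paged TABLE (RowNo INT, ID INT, Name NVARCHAR(255), TotalOccurrences INT);

INSERT INTO @Paged (RowNo, ID, Name, TotalOccurrences)
SELECT ROW_NUMBER() OVER (ORDER BY TotalOccurrences DESC), V.ID, V.Name, R.TotalOccurrences
FROM dbo.Videos V INNER JOIN ....

SELECT @@ROWCOUNT AS TotalResults   -- full count, captured right after the insert

SELECT RowNo, ID, Name, TotalOccurrences
FROM @Paged
WHERE RowNo BETWEEN 1 AND 50        -- the requested page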
A hybrid approach is to populate this temp/variable up to 10000 rows. If not all 10000 rows are filled up, you're set. If 10000 rows are filled up, you'll need to rerun the search to get the full count. This works well if most of your queries return well under 10000 rows. The 10000 limit is a rough approximation; you can play around with this threshold for your case.

Write "AS" after the CTE table name Paging as below:
with Paging AS (RowNo, ID, Name, TotalOccurrences) as
(
SELECT ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging WHERE RowNo BETWEEN 1 and 50
SELECT COUNT(*) FROM Paging

Related

Performance when using distinct and row_number pagination

I have SQL something like this:
SELECT A,B,C,FUN(A) AS A FROM someTable
The problem is that FUN() is a function which is quite slow, so if there are a lot of records in someTable, there will be a big performance issue.
If we use pagination, we can resolve this issue. We do the pagination like this:
SELECT * FROM(
SELECT A,B,C,FUN(A), Row_number()OVER( ORDER BY B ASC) AS rownum FROM someTable
)T WHERE T.rownum >=1 AND T.rownum<20
In this script, FUN() only executes about 20 times, so the performance is OK.
But we need to order by an alias, so we can't compute rownum inline; we have to move it to a subquery or CTE. We chose a CTE, and it looks like this:
;WITH CTE AS (
SELECT A,B AS alias,C,FUN(A) FROM someTable
)
SELECT * FROM(
SELECT *,Row_number()OVER( ORDER BY alias ASC) AS rownum FROM CTE
)T WHERE T.rownum >=1 AND T.rownum<20
So far so good: pagination solves the performance issue and ordering by the alias works. But now we need to add DISTINCT to the query:
;WITH CTE AS (
SELECT DISTINCT A,B AS alias,C,FUN(A) FROM someTable
)
SELECT * FROM(
SELECT *,Row_number()OVER( ORDER BY alias ASC) AS rownum FROM CTE
)T WHERE T.rownum >=1 AND T.rownum<20
After this, the optimization seems to be gone: FUN() executes as many times as there are records in someTable, and we have the performance issue again.
Basically we are blocked here. Are there any suggestions?
The problem is that in order to get distinct values, the database engine must run the fun(a) function on all the records being selected.
If you do the fun(a) only in the final select, the distinct should not affect it, so it should run only on the final 20 records.
I've changed the derived table you used to another CTE (but that's a personal preference - it seems tidier to me not to mix derived tables and CTEs):
;WITH CTE1 AS (
SELECT DISTINCT A,B AS alias,C
FROM someTable
),
CTE2 AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY alias) As RowNum
FROM CTE1
)
SELECT *, FUN(A)
FROM CTE2
WHERE RowNum >= 1
AND RowNum < 20
Please note that since the FUN function is not deterministic, you might get results that differ from your original query - so before adopting this solution, compare the results first.

SQL Server view inside CTE causing poor performance

When I use a view inside of a CTE, each subquery that references the CTE seems to re-query the view. There are large chunks of the execution plan that are repeated for each subquery. This isn't the case when selecting from a table. Is this expected? Is there any way to get around it?
WITH cte AS (
SELECT v.id
FROM test_view AS v
)
SELECT TOP 25 *,
(SELECT COUNT(*) FROM cte) AS subquery
FROM cte
I'm working with SQL Server 2005
EDIT:
I'm trying to get data from a view in pages with the query below. I need the total number of rows in the view, the number of rows that match a search, and a subset of the matching rows. This works well when selecting from tables, but using a view causes repeated execution of the CTE. I attempted to force intermediate materialization a variety of different ways from the link in Martin's answer, but didn't have any luck.
WITH tableRecords AS (
SELECT *
FROM example_view
),
filteredTableRecords AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY id ASC) AS tableRecordNumber
FROM tableRecords
WHERE 1 = 1
)
SELECT *,
(SELECT COUNT(*) FROM tableRecords) AS totalRecords,
(SELECT COUNT(*) FROM filteredTableRecords) AS totalDisplayRecords
FROM filteredTableRecords
WHERE tableRecordNumber BETWEEN 1 AND 25
ORDER BY tableRecordNumber ASC
Yes it is largely expected.
See Provide a hint to force intermediate materialization of CTEs or derived tables
For the query in your question you can do this though
WITH CTE AS
(
SELECT v.id,
count(*) OVER () AS Cnt
FROM test_view AS v
)
SELECT TOP 25 *
FROM CTE
ORDER BY id
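If that isn't enough (for example for the paging query in your edit), a common workaround is to force the materialization yourself by dumping the view into a temp table and paging against that. A rough sketch (my own illustration, not from the linked answer; #tableRecords is a name I made up):
SELECT *
INTO #tableRecords
FROM example_view;

;WITH filteredTableRecords AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY id ASC) AS tableRecordNumber
FROM #tableRecords
)
SELECT *,
(SELECT COUNT(*) FROM #tableRecords) AS totalRecords,
(SELECT COUNT(*) FROM filteredTableRecords) AS totalDisplayRecords
FROM filteredTableRecords
WHERE tableRecordNumber BETWEEN 1 AND 25
ORDER BY tableRecordNumber ASC;

DROP TABLE #tableRecords;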
I suggest you re-write your query as below.
There are a few improvements I made over your query:
Removal of WHERE 1 = 1; it is unnecessary as it is always true.
The subqueries in the SELECT clause will be evaluated every time you execute the SQL script, so instead you can use CROSS APPLY to improve performance.
;WITH tableRecords AS(
SELECT *
FROM example_view
),
filteredTableRecords AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY id ASC) AS tableRecordNumber
FROM tableRecords
), TotalNumber AS
(
SELECT (SELECT COUNT(1) FROM tableRecords) AS totalRecords,
(SELECT COUNT(1) FROM filteredTableRecords) AS totalDisplayRecords
)
SELECT *
FROM filteredTableRecords F
CROSS APPLY TotalNumber AS T
WHERE tableRecordNumber BETWEEN 1 AND 25
ORDER BY tableRecordNumber ASC

Return rows between a specific range, with one select statement

I'm looking for some expression like this (using SQL Server 2008):
SELECT TOP 10 columName FROM tableName
But instead of that I need the values between 10 and 20. And I wonder if there is a way of doing it using only one SELECT statement.
For example this is useless:
SELECT columName FROM
(SELECT ROW_NUMBER() OVER(ORDER BY someId) AS RowNum, * FROM tableName) AS alias
WHERE RowNum BETWEEN 10 AND 20
Because the select inside the brackets is already returning all the results, and I'm looking to avoid that due to performance.
Use SQL Server 2012 to fetch/skip!
SELECT SalesOrderID, SalesOrderDetailID, ProductID, OrderQty, UnitPrice, LineTotal
FROM AdventureWorks2012.Sales.SalesOrderDetail
ORDER BY SalesOrderID, SalesOrderDetailID
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY;
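For general paging, the offsets are usually computed from parameters; a small sketch of that (the @PageNumber/@PageSize variables are my own illustration):
DECLARE @PageNumber INT = 2, @PageSize INT = 10;

SELECT SalesOrderID, SalesOrderDetailID, ProductID, OrderQty, UnitPrice, LineTotal
FROM AdventureWorks2012.Sales.SalesOrderDetail
ORDER BY SalesOrderID, SalesOrderDetailID
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;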
There's nothing better than what you're describing for older versions of SQL Server. Maybe use a CTE, but it's unlikely to make a difference.
WITH NumberedMyTable AS
(
SELECT
Id,
Value,
ROW_NUMBER() OVER (ORDER BY Id) AS RowNumber
FROM
MyTable
)
SELECT
Id,
Value
FROM
NumberedMyTable
WHERE
RowNumber BETWEEN @From AND @To
Or, you can remove the top 10 rows and then get the next 10 rows, but I doubt anyone would want to do that.
There is a trick with row_number that does not involve sorting all the rows.
Try this:
SELECT columName
FROM (SELECT ROW_NUMBER() OVER(ORDER BY (select NULL as noorder)) AS RowNum, *
FROM tableName
) as alias
WHERE RowNum BETWEEN 10 AND 20
You cannot use a constant in the order by. However, you can use an expression that evaluates to a constant. SQL Server recognizes this and just returns the rows as encountered, properly enumerated.
Why do you think SQL Server would evaluate the entire inner query? Assuming your sort column is indexed, it'll just read the first 20 values. If you're really nervous you could do this:
Select
Id
From (
Select Top 20 -- note top 20
Row_Number() Over(Order By Id) As RowNum,
Id
From
dbo.Test
Order By
Id
) As alias
Where
RowNum Between 10 And 20
Order By
Id
but I'm pretty sure the query plan is the same either way.
(Really) Fixed as per Aaron's comment.
http://sqlfiddle.com/#!3/db162/6
One more option
SELECT TOP(11) columName
FROM dbo.tableName
ORDER BY
CASE WHEN ROW_NUMBER() OVER (ORDER BY someId) BETWEEN 10 AND 20
THEN ROW_NUMBER() OVER (ORDER BY someId) ELSE NULL END DESC
You could create a temp table that is ordered the way you want like:
SELECT ROW_NUMBER() OVER(ORDER BY someId) AS RowNum, *
INTO ##tempTable
FROM tableName
...
That way you have an ordered list of rows, and you can just query by row number on subsequent calls instead of running the inner query multiple times.
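A rough sketch of those follow-up queries against the temp table (my own illustration):
-- First page
SELECT columName FROM ##tempTable WHERE RowNum BETWEEN 10 AND 20;

-- Later pages reuse the already-materialized ordering
SELECT columName FROM ##tempTable WHERE RowNum BETWEEN 21 AND 30;

-- Clean up when done
DROP TABLE ##tempTable;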

Compare SQL groups against each other

How can one filter a grouped resultset for only those groups that meet some criterion compared against the other groups? For example, only those groups that have the maximum number of constituent records?
I had thought that a subquery as follows should do the trick:
SELECT * FROM (
SELECT *, COUNT(*) AS Records
FROM T
GROUP BY X
) t HAVING Records = MAX(Records);
However the addition of the final HAVING clause results in an empty recordset... what's going on?
In MySQL (which I assume you are using, since you have posted SELECT *, COUNT(*) FROM T GROUP BY X, which would fail in every other RDBMS that I know of) you can use:
SELECT T.*
FROM T
INNER JOIN
( SELECT X, COUNT(*) AS Records
FROM T
GROUP BY X
ORDER BY Records DESC
LIMIT 1
) T2
ON T2.X = T.X
This has been tested in MySQL and removes the implicit grouping/aggregation.
If you can use windowed functions and either TOP/LIMIT WITH TIES or common table expressions, it becomes even shorter:
Windowed function + CTE: (MS SQL-Server & PostgreSQL Tested)
WITH CTE AS
( SELECT *, COUNT(*) OVER(PARTITION BY X) AS Records
FROM T
)
SELECT *
FROM CTE
WHERE Records = (SELECT MAX(Records) FROM CTE)
Windowed Function with TOP (MS SQL-Server Tested)
SELECT TOP 1 WITH TIES *
FROM ( SELECT *, COUNT(*) OVER(PARTITION BY X) [Records]
FROM T
) t
ORDER BY Records DESC
Lastly, I have never used Oracle, so apologies for not adding a solution that works on Oracle...
EDIT
My MySQL solution did not take ties into account, and my suggested fix for that steps on the toes of what you have said you want to avoid (duplicate subqueries), so I am not sure I can help after all. However, just in case it is preferable, here is a version that will work as required on your fiddle:
SELECT T.*
FROM T
INNER JOIN
( SELECT X
FROM T
GROUP BY X
HAVING COUNT(*) =
( SELECT COUNT(*) AS Records
FROM T
GROUP BY X
ORDER BY Records DESC
LIMIT 1
)
) T2
ON T2.X = T.X
For the exact question you give, one way to look at it is that you want the group of records where there is no other group that has more records. So if you say
SELECT taxid, COUNT(*) as howMany
FROM flats
GROUP by taxid
You get all counties and their counts
Then you can treat that expression as a table by making it a subquery and giving it an alias. Below I assign two "copies" of the query the names X and Y and ask for taxids for which the other copy has no higher count. If there are two groups with the same count, I'd get two or more rows. Different databases have proprietary syntax, notably TOP and LIMIT, that makes this kind of query simpler and easier to understand.
SELECT taxid FROM
(select taxid, count(*) as HowMany from flats
GROUP by taxid) as X
WHERE NOT EXISTS
(
SELECT * from
(
SELECT taxid, count(*) as HowMany FROM
flats
GROUP by taxid
) AS Y
WHERE Y.howmany > X.howmany
)
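As mentioned above, the proprietary TOP syntax shortens this considerably; a sketch of the SQL Server version using the same flats/taxid names (my illustration, not part of the original answer; WITH TIES keeps tied groups):
SELECT TOP (1) WITH TIES taxid, COUNT(*) AS HowMany
FROM flats
GROUP BY taxid
ORDER BY COUNT(*) DESC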
Try this:
SELECT * FROM (
SELECT *, MAX(Records) OVER () as max_records FROM (
SELECT *, COUNT(*) AS Records
FROM T
GROUP BY X
) t
) t2 WHERE Records = max_records
I'm sorry that I can't test the validity of this query right now.

Problem using ROW_NUMBER() to get records randomly (SQL Server 2005)

I want to get 1000 records from a table randomly, so I use:
SELECT top 1000
mycol1
, mycol2
, ROW_NUMBER() OVER (ORDER BY NEWID()) rn
FROM mytable
However, I don't want to see rn in my resultset, so I do:
SELECT mycol1
, mycol2
FROM (
SELECT top 1000
mycol1
, mycol2
, ROW_NUMBER() OVER (ORDER BY NEWID()) rn
FROM mytable
) a
When I do this, the results do not come randomly anymore. They come as if I just said top 10000 without randomization using row_number().
When I change the query to
SELECT mycol1
, mycol2
, rn
FROM (
SELECT top 1000
mycol1
, mycol2
, ROW_NUMBER() OVER (ORDER BY NEWID()) rn
FROM mytable
) a
they are random again.
I guess SQL Server does some kind of optimization, saying "hey, this guy doesn't need the column rn anyway, so just ignore it". But this results in unexpected behavior in this case. Is there any way to avoid this?
PS: I use the ROW_NUMBER() trick because mytable has 10 million rows and
SELECT top 10000 *
FROM mytable
ORDER BY NEWID()
runs forever, whereas with ROW_NUMBER() it takes only up to 30 secs.
You could also try using the rn field in some trivial WHERE clause, like
WHERE rn > 0 in your outer query, which might force the compiler to bring the rn field through.
Also, I think your overall query is going to be an issue if you want to randomly sample across your millions of records. It will only grab the "first off disk" block of records, which, while not guaranteed to be the same, will more often than not be the same 10000.
I would suggest creating a set of 10,000 random numbers between MIN(PrimaryKey) and the MAX(PrimaryKey) and then doing a WHERE PrimaryKey IN (...) or similar
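A rough sketch of that idea (my own illustration; the Id column name and the sys.all_objects row source are assumptions, and gaps in the key range mean you may get fewer matches than keys generated):
DECLARE @MinId INT, @MaxId INT;
SELECT @MinId = MIN(Id), @MaxId = MAX(Id) FROM mytable;

;WITH RandomKeys AS
(
    SELECT TOP (10000)
           @MinId + ABS(CHECKSUM(NEWID())) % (@MaxId - @MinId + 1) AS Id
    FROM sys.all_objects a CROSS JOIN sys.all_objects b   -- just a big row source
)
SELECT m.mycol1, m.mycol2
FROM mytable m
WHERE m.Id IN (SELECT Id FROM RandomKeys);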
Add something like WHERE rn IS NOT NULL to the outer query so that rn is included in the query plan and not optimised out.
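Applied to the query from the question, that looks roughly like this (the predicate is always true; it is only there to keep rn referenced):
SELECT mycol1
     , mycol2
FROM (
    SELECT top 1000
        mycol1
      , mycol2
      , ROW_NUMBER() OVER (ORDER BY NEWID()) rn
    FROM mytable
) a
WHERE rn IS NOT NULL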
I was struggling with this same problem. I solved it with CROSS APPLY and TOP. Keeping in mind that CROSS APPLY pulls my outer table into scope for the derived table, I knew there had to be a way to do this.
The following code results in 3(*) random related products being added based on the manufacturer.
INSERT INTO ProductGroup (
ParentId,
ChildId
)
SELECT DISTINCT
P.ProductId,
CandidateInner.ChildId
FROM ProductRelated PR
JOIN Product P
ON PR.ChildId = P.ProductId
CROSS APPLY
(
SELECT DISTINCT TOP 3
NewId() AS RandId,
Product.ManufacturerId,
ProductRelated.ChildId
FROM ProductRelated
JOIN Product
ON Product.ProductId = ProductRelated.ChildId
WHERE ManufacturerId IS NOT NULL
AND Product.ManufacturerId = P.ManufacturerId
ORDER BY NewId()
) CandidateInner
LEFT JOIN (
SELECT DISTINCT TOP 100 PERCENT
ParentId,
COUNT(DISTINCT ChildId) AS Ct
FROM ProductGroup
GROUP BY ParentId
HAVING COUNT(DISTINCT ChildId) >= 3
) AlreadyGrouped
ON P.ProductId = AlreadyGrouped.ParentId
WHERE P.ProductId <> CandidateInner.ChildId
AND AlreadyGrouped.ParentId IS NULL
ORDER BY P.ProductId
*Note that this will insert fewer than 3 in the following 2 cases:
1) Where there are < 3 products related by manufacturer
2) (Problematic) Where the random top 3 returns the same product to itself.
(1) above is unavoidable.
The way I handled (2) above was to run this twice and then delete duplicates. This is still not 100%, but statistically it's more than sufficient for my requirement. This is in a nightly-run script, but I still like the speed of having the <> check outside the CROSS APPLY - anything pulling that check into scope results in scans of the derived tables from the manufacturer join. Pulling it inside would mean that (2) is no longer an issue, but it's painfully slow versus instant with proper indexes.
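For what it's worth, the duplicate cleanup could look something like this (my own sketch, not the author's code):
-- Remove duplicate ParentId/ChildId pairs, keeping one row of each
;WITH Dupes AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY ParentId, ChildId ORDER BY (SELECT NULL)) AS rn
    FROM ProductGroup
)
DELETE FROM Dupes
WHERE rn > 1;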