How to select a portion of rows from a table? - sql

I need to select X rows from a table starting at position Y; a specific column is used for ordering the table.
This query almost works:
DECLARE @Index int
DECLARE @Count int
SELECT * FROM
(
    SELECT TOP (@Count) * FROM
    (
        SELECT TOP (@Index + @Count) * FROM MyTable
        ORDER BY MyTable.OrderColumn ASC
    ) AS T1
    ORDER BY T1.OrderColumn DESC
) AS T2
ORDER BY T2.OrderColumn ASC
However, if there aren't enough rows in the table (say, the table has 120 rows and I want 50 rows starting from position 100), this query just ignores the starting position and returns the last X rows.
Also, using three levels of SELECTs and ordering strikes me as quite bad performance-wise.
What is the right way to do this?

Here's a variation that may work:
DECLARE @Index int = 5403
DECLARE @Count int = 1000
SELECT * FROM
(
    SELECT TOP (@Index + @Count) *,
        ROW_NUMBER() OVER (ORDER BY OrderColumn) AS Sequence
    FROM MyTable
    ORDER BY MyTable.OrderColumn ASC
) AS T
WHERE Sequence BETWEEN @Index AND @Index + @Count - 1
ORDER BY OrderColumn
The derived table (nested query) shouldn't hurt performance; SQL Server will optimize it, although this depends on what the real query looks like.

If you are on SQL Server 2012 or later, the OFFSET ... FETCH clause is handy. Here is a sample based on AdventureWorks:
DECLARE @Index int = 100
DECLARE @Count int = 50
SELECT SalesOrderID, OrderDate, CustomerID, SalesPersonID
FROM Sales.SalesOrderHeader
ORDER BY OrderDate, SalesOrderID
OFFSET @Index ROWS FETCH NEXT @Count ROWS ONLY;
On SQL Server 2008, a window function (ROW_NUMBER()) in a CTE does the same job. A sample based on AdventureWorks:
DECLARE @Index int = 100
DECLARE @Count int = 50
;WITH C AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY OrderDate, SalesOrderID) AS rownum,
        SalesOrderID, OrderDate, CustomerID, SalesPersonID
    FROM Sales.SalesOrderHeader
)
SELECT SalesOrderID, OrderDate, CustomerID, SalesPersonID
FROM C
WHERE rownum BETWEEN @Index + 1 AND @Index + @Count
ORDER BY rownum;
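To check the edge case from the question (a 120-row table, 50 rows requested starting at position 100), here is a minimal sketch using OFFSET ... FETCH, so it assumes SQL Server 2012 or later. The temp table #Numbers and its OrderColumn values 1..120 are hypothetical sample data. Unlike the TOP-based query in the question, this returns only the 20 rows that exist past the offset instead of the last 50.

```sql
-- Hypothetical sample table: 120 rows, OrderColumn = 1..120
IF OBJECT_ID('tempdb..#Numbers') IS NOT NULL DROP TABLE #Numbers;
CREATE TABLE #Numbers (OrderColumn int PRIMARY KEY);

;WITH n AS (
    SELECT 1 AS v
    UNION ALL
    SELECT v + 1 FROM n WHERE v < 120
)
INSERT INTO #Numbers (OrderColumn)
SELECT v FROM n
OPTION (MAXRECURSION 200);   -- default limit of 100 recursions is too low for 120 rows

DECLARE @Index int = 100;    -- number of rows to skip
DECLARE @Count int = 50;     -- page size

-- OFFSET skips the first @Index rows; FETCH returns at most @Count rows,
-- so only rows 101..120 come back here (20 rows, not 50).
SELECT OrderColumn
FROM #Numbers
ORDER BY OrderColumn
OFFSET @Index ROWS FETCH NEXT @Count ROWS ONLY;
```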

Related

Total Count in paging query taking more time in SQL Server

In my paging query below, fetching 25 records out of 100k takes just 2 seconds. But when I add a TOTROWS column to return the total count of records (100k), it takes more than 1 minute. Is there a way to get the total number of records in an optimized manner?
This one runs fast, without the TOTROWS column in the outer SELECT:
DECLARE @PRODUCTNAME NVARCHAR(200),
    @PAGE VARCHAR(100)
SET NOCOUNT ON;
DECLARE @ROWNUM INT = 25
DECLARE @ROWCOUNT INT
DECLARE @TOTROWS INT
DECLARE @XY INT
SET @PAGE = 1
SELECT TOP 25
    ID, NAME
FROM
    (SELECT
        *,
        TOTROWS = COUNT(ID) OVER()
    FROM
        (SELECT DISTINCT
            TP.ID AS ID, TP.NAME AS [NAME],
            ROW_NUMBER() OVER (ORDER BY TP.ID ASC) AS Row
        FROM
            PDF TP
        WHERE <conditions>
        UNION ALL
        SELECT DISTINCT
            TP.ID AS ID, TP.NAME AS [NAME],
            ROW_NUMBER() OVER (ORDER BY TP.ID ASC) AS Row
        FROM
            HTML TP
        WHERE <conditions>) a
    WHERE
        ROW > (@PAGE - 1) * 25) XY
This one runs slowly after adding TOTROWS to the outer SELECT:
DECLARE @PRODUCTNAME NVARCHAR(200),
    @PAGE VARCHAR(100)
SET NOCOUNT ON;
DECLARE @ROWNUM INT = 25
DECLARE @ROWCOUNT INT
DECLARE @TOTROWS INT
DECLARE @XY INT
SET @PAGE = 1
SELECT TOP 25
    ID, NAME, TOTROWS
FROM
    (SELECT
        *,
        TOTROWS = COUNT(ID) OVER()
    FROM
        (SELECT DISTINCT
            TP.ID AS ID, TP.NAME AS [NAME],
            ROW_NUMBER() OVER (ORDER BY TP.ID ASC) AS Row
        FROM
            PDF TP
        WHERE <conditions>
        UNION ALL
        SELECT DISTINCT
            TP.ID AS ID, TP.NAME AS [NAME],
            ROW_NUMBER() OVER (ORDER BY TP.ID ASC) AS Row
        FROM
            HTML TP
        WHERE <conditions>) a
    WHERE
        ROW > (@PAGE - 1) * 25) XY
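One commonly used alternative, sketched here under stated assumptions, is to compute the total once with a plain COUNT(*) into a variable and page the combined rows in a separate query, instead of COUNT(ID) OVER() over the windowed derived table (an empty OVER() typically adds a spool to the plan). The temp tables #PDF and #HTML are hypothetical stand-ins for PDF and HTML, and the elided WHERE conditions are left as comments. Note also that in the queries above each UNION ALL branch has its own ROW_NUMBER(), so Row restarts at 1 in the HTML branch; this sketch orders the combined set once instead. OFFSET ... FETCH assumes SQL Server 2012 or later.

```sql
-- Hypothetical stand-ins for the PDF and HTML tables (ID, NAME only)
IF OBJECT_ID('tempdb..#PDF') IS NOT NULL DROP TABLE #PDF;
IF OBJECT_ID('tempdb..#HTML') IS NOT NULL DROP TABLE #HTML;
CREATE TABLE #PDF  (ID int, NAME nvarchar(200));
CREATE TABLE #HTML (ID int, NAME nvarchar(200));
INSERT INTO #PDF  (ID, NAME) VALUES (1, N'a.pdf'), (3, N'c.pdf'), (5, N'e.pdf');
INSERT INTO #HTML (ID, NAME) VALUES (2, N'b.html'), (4, N'd.html');

DECLARE @PAGE int = 1;       -- 1-based page number
DECLARE @PAGESIZE int = 25;
DECLARE @TOTROWS int;

-- 1) Total count: a plain aggregate over the same (filtered) union
SELECT @TOTROWS = COUNT(*)
FROM (
    SELECT DISTINCT ID, NAME FROM #PDF    -- WHERE <conditions>
    UNION ALL
    SELECT DISTINCT ID, NAME FROM #HTML   -- WHERE <conditions>
) AS u;

-- 2) One page of rows, ordered once over the combined set
SELECT ID, NAME, @TOTROWS AS TOTROWS
FROM (
    SELECT DISTINCT ID, NAME FROM #PDF    -- WHERE <conditions>
    UNION ALL
    SELECT DISTINCT ID, NAME FROM #HTML   -- WHERE <conditions>
) AS u
ORDER BY ID
OFFSET (@PAGE - 1) * @PAGESIZE ROWS FETCH NEXT @PAGESIZE ROWS ONLY;
```

The trade-off is that the union is evaluated twice; whether that beats the windowed count depends on the real WHERE conditions and indexes, so it is worth comparing the actual plans.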

Selecting data from table where sum of values in a column equals the value in another column

Sample data:
create table #temp (id int, qty int, checkvalue int)
insert into #temp values (1,1,3)
insert into #temp values (2,2,3)
insert into #temp values (3,1,3)
insert into #temp values (4,1,3)
According to the data above, I would like to show exactly as many rows, from top to bottom, as it takes for sum(qty) to equal checkvalue. Note that checkvalue is the same for all records, all the time. For the sample data above, the desired output is:
Id Qty checkValue
1 1 3
2 2 3
Because 1+2=3, no more rows need to be shown. If checkvalue were 4, we would show the third record (Id: 3, Qty: 1, checkValue: 4) as well.
This is the code I am using to handle this problem. It works well:
declare @checkValue int = (select top 1 checkvalue from #temp);
declare @counter int = 0, @sumValue int = 0;
while @sumValue < @checkValue
begin
    set @counter = @counter + 1;
    set @sumValue = @sumValue + (
        select t.qty from
        (
            SELECT * FROM (
                SELECT
                    ROW_NUMBER() OVER (ORDER BY id ASC) AS rownumber,
                    id, qty, checkvalue
                FROM #temp
            ) AS foo
            WHERE rownumber = @counter
        ) t
    )
end
declare @sql nvarchar(255) = 'select top ' + cast(@counter as varchar(5)) + ' * from #temp'
EXECUTE sp_executesql @sql, N'@counter int', @counter = @counter;
However, I am not sure if this is the best way to deal with it and wonder if there is a better approach. There are many professionals here and I'd like to hear from them about what they think about my approach and how we can improve it. Any advice would be appreciated!
Try this:
select id, qty, checkvalue from (
select t1.*,
sum(t1.qty) over (partition by t2.id) [sum]
from #temp [t1] join #temp [t2] on t1.id <= t2.id
) a where checkvalue = [sum]
Smart self-join is all you need :)
For SQL Server 2012 onwards, you can achieve this with a ROWS BETWEEN frame in the OVER clause and a CTE:
WITH Running AS(
SELECT *,
SUM(qty) OVER (ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningQty
FROM #temp t)
SELECT id, qty, checkvalue
FROM Running
WHERE RunningQty <= checkvalue;
One basic improvement is to reduce the number of iterations. You're incrementing by 1, but if you repurpose the logic behind binary search, you'd get something close to this:
DECLARE @RoughAverage int = 1 -- Some arbitrary value. The closer it is to the real average, the faster things should be.
DECLARE @CheckValue int = (SELECT TOP 1 checkvalue FROM #temp)
DECLARE @Sum int = 0
WHILE 1 = 1 -- Refer to BREAK below.
BEGIN
    SELECT TOP (@RoughAverage) @Sum = SUM(qty) OVER(ORDER BY id)
    FROM #temp
    ORDER BY id
    IF @Sum = @CheckValue
        BREAK -- Indicating you reached your objective.
    ELSE
        SET @RoughAverage = @CheckValue - @Sum -- Most likely incomplete like this.
END
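The loop above is left incomplete. A sketch that carries the binary-search idea through, assuming every qty is positive (so the running total only grows as more rows are taken), searches for the smallest row count N whose first N rows by id sum to at least checkvalue. Each probe sums the first @Mid rows with TOP, so it needs roughly log2(row count) probes instead of one per row. @Lo, @Hi and @Mid are illustrative names, not from the original answer.

```sql
-- Sample data from the question
IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp;
create table #temp (id int, qty int, checkvalue int);
insert into #temp values (1,1,3), (2,2,3), (3,1,3), (4,1,3);

DECLARE @CheckValue int = (SELECT TOP 1 checkvalue FROM #temp);
DECLARE @Lo int = 1;                                -- smallest candidate row count
DECLARE @Hi int = (SELECT COUNT(*) FROM #temp);     -- largest candidate row count
DECLARE @Mid int, @Sum int;

-- Assumes qty > 0, so the sum of the first N rows increases with N
WHILE @Lo < @Hi
BEGIN
    SET @Mid = (@Lo + @Hi) / 2;

    -- Sum of qty over the first @Mid rows in id order
    SELECT @Sum = SUM(qty)
    FROM (SELECT TOP (@Mid) qty FROM #temp ORDER BY id) AS firstRows;

    IF @Sum >= @CheckValue
        SET @Hi = @Mid;        -- @Mid rows are enough; try fewer
    ELSE
        SET @Lo = @Mid + 1;    -- not enough yet; need more rows
END

-- @Lo is the smallest N whose running total reaches @CheckValue.
-- If no prefix sums exactly to @CheckValue, this is the first one that exceeds it;
-- if the whole table falls short, all rows are returned.
SELECT TOP (@Lo) id, qty, checkvalue
FROM #temp
ORDER BY id;
```

For the sample data this probes N = 2 (sum 3) and then N = 1 (sum 1), and returns ids 1 and 2.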
For SQL Server 2008 you can use a recursive CTE. TOP 1 WITH TIES limits the result to the first combination; remove it to see all combinations.
with cte as (
select
*, rn = row_number() over (order by id)
from
#temp
)
, rcte as (
select
i = id, id, qty, sumV = qty, checkvalue, rn
from
cte
union all
select
a.id, b.id, b.qty, a.sumV + b.qty, a.checkvalue, b.rn
from
rcte a
join cte b on a.rn + 1 = b.rn
where
a.sumV < b.checkvalue
)
select
top 1 with ties id, qty, checkvalue
from (
select
*, needed = max(case when sumV = checkvalue then 1 else 0 end) over (partition by i)
from
rcte
) t
where
needed = 1
order by dense_rank() over (order by i)

SQL Server - loop through table and update based on count

I have a SQL Server database. I need to loop through a table to get the count of each value in the column 'RevID'. Each value should only be in the table a certain number of times, for example 125 times. If the count of a value is greater or less than 125, I need to update the column so that every RevID value (there are over 25 different values) occurs about 125 times (it's OK to be a few off).
For example, if the count of RevID = 'A2' is 45 and the count of RevID = 'B2' is 165, I need to update RevID so that the 45 count increases and the 165 count decreases until both are within range of 125.
This is what I have so far:
DECLARE @i INT = 1,
    @RevCnt INT = SELECT RevId, COUNT(RevId) FROM MyTable group by RevId
WHILE(@RevCnt >= 50)
BEGIN
    UPDATE MyTable
    SET RevID = (SELECT COUNT(RevID) FROM MyTable)
    WHERE RevID < 50)
    @i = @i + 1
END
I have also played around with a cursor and an INSTEAD OF trigger. Any idea how to achieve this? Thanks for any input.
Okay, I came back to this because I found it interesting, even though there are clearly some business rules/discussion that you, I and others are not seeing. Anyway, if you want to distribute the values evenly and arbitrarily, there are a few ways you could do it, e.g. by building recursive Common Table Expressions (CTEs) or temp tables. Here is the way I decided to try. I did use one temp table, because SQL Server produced a little inconsistency about every 10th run when the main logic table was a CTE; the temp table seems to have cleared that up. This spreads the RevIds evenly and arbitrarily, and randomly hands out any remainder (# of records modulo # of RevIds), one extra record each, to some of the RevIds. The script also doesn't rely on having a unique ID; it works dynamically over the row numbers it creates. Just remove the test data etc. and you have what you most likely want. That said, rebuilding the table/values would probably be easier.
--Build Some Test Data
DECLARE @Table AS TABLE (RevId VARCHAR(10))
DECLARE @C AS INT = 1
WHILE @C <= 400
BEGIN
    IF @C <= 200
    BEGIN
        INSERT INTO @Table (RevId) VALUES ('A1')
    END
    IF @C <= 170
    BEGIN
        INSERT INTO @Table (RevId) VALUES ('B2')
    END
    IF @C <= 100
    BEGIN
        INSERT INTO @Table (RevId) VALUES ('C3')
    END
    IF @C <= 400
    BEGIN
        INSERT INTO @Table (RevId) VALUES ('D4')
    END
    IF @C <= 1
    BEGIN
        INSERT INTO @Table (RevId) VALUES ('E5')
    END
    SET @C = @C + 1
END
--save starting counts of test data to temp table to compare with later
IF OBJECT_ID('tempdb..#StartingCounts') IS NOT NULL
BEGIN
DROP TABLE #StartingCounts
END
SELECT
RevId
,COUNT(*) as Occurences
INTO #StartingCounts
FROM
@Table
GROUP BY
RevId
ORDER BY
RevId
/************************ This is the main method **********************************/
--clear temp table that is the main processing logic
IF OBJECT_ID('tempdb..#RowNumsToChange') IS NOT NULL
BEGIN
DROP TABLE #RowNumsToChange
END
--figure out how many records there are and how many there should be for each RevId
;WITH cteTargetNumbers AS (
SELECT
RevId
--,COUNT(*) as RevIdCount
--,SUM(COUNT(*)) OVER (PARTITION BY 1) / COUNT(*) OVER (PARTITION BY 1) +
--CASE
--WHEN ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY NEWID()) <=
--SUM(COUNT(*)) OVER (PARTITION BY 1) % COUNT(*) OVER (PARTITION BY 1)
--THEN 1
--ELSE 0
--END as TargetNumOfRecords
,SUM(COUNT(*)) OVER (PARTITION BY 1) / COUNT(*) OVER (PARTITION BY 1) +
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY NEWID()) <=
SUM(COUNT(*)) OVER (PARTITION BY 1) % COUNT(*) OVER (PARTITION BY 1)
THEN 1
ELSE 0
END - COUNT(*) AS NumRecordsToUpdate
FROM
@Table
GROUP BY
RevId
)
, cteEndRowNumsToChange AS (
SELECT *
,SUM(CASE WHEN NumRecordsToUpdate > 0 THEN NumRecordsToUpdate ELSE 0 END)
OVER (PARTITION BY 1 ORDER BY RevId) AS ChangeEndRowNum
FROM
cteTargetNumbers
)
SELECT
*
,LAG(ChangeEndRowNum,1,0) OVER (PARTITION BY 1 ORDER BY RevId) as ChangeStartRowNum
INTO #RowNumsToChange
FROM
cteEndRowNumsToChange
;WITH cteOriginalTableRowNum AS (
SELECT
RevId
,ROW_NUMBER() OVER (PARTITION BY RevId ORDER BY (SELECT 0)) as RowNumByRevId
FROM
@Table t
)
, cteRecordsAllowedToChange AS (
SELECT
o.RevId
,o.RowNumByRevId
,ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY (SELECT 0)) as ChangeRowNum
FROM
cteOriginalTableRowNum o
INNER JOIN #RowNumsToChange t
ON o.RevId = t.RevId
AND t.NumRecordsToUpdate < 0
AND o.RowNumByRevId <= ABS(t.NumRecordsToUpdate)
)
UPDATE o
SET RevId = u.RevId
FROM
cteOriginalTableRowNum o
INNER JOIN cteRecordsAllowedToChange c
ON o.RevId = c.RevId
AND o.RowNumByRevId = c.RowNumByRevId
INNER JOIN #RowNumsToChange u
ON c.ChangeRowNum > u.ChangeStartRowNum
AND c.ChangeRowNum <= u.ChangeEndRowNum
AND u.NumRecordsToUpdate > 0
IF OBJECT_ID('tempdb..#RowNumsToChange') IS NOT NULL
BEGIN
DROP TABLE #RowNumsToChange
END
/***************************** End of Main Method *******************************/
-- Compare the results and clean up
;WITH ctePostUpdateResults AS (
SELECT
RevId
,COUNT(*) as AfterChangeOccurences
FROM
@Table
GROUP BY
RevId
)
SELECT *
FROM
#StartingCounts s
INNER JOIN ctePostUpdateResults r
ON s.RevId = r.RevId
ORDER BY
s.RevId
IF OBJECT_ID('tempdb..#StartingCounts') IS NOT NULL
BEGIN
DROP TABLE #StartingCounts
END
Since you've given no rules for how you'd like the balance to operate we're left to speculate. Here's an approach that would find the most overrepresented value and then find an underrepresented value that can take on the entire overage.
I have no idea how optimal this is and it will probably run in an infinite loop without more logic.
declare @balance int = 125;
declare @cnt_over int;
declare @cnt_under int;
declare @revID_overrepresented varchar(32);
declare @revID_underrepresented varchar(32);
declare @rowcount int = 1;
while @rowcount > 0
begin
    select top 1 @revID_overrepresented = RevID, @cnt_over = count(*)
    from T
    group by RevID
    having count(*) > @balance
    order by count(*) desc
    -- an underrepresented value that can absorb the whole overage (@cnt_over - @balance)
    select top 1 @revID_underrepresented = RevID, @cnt_under = count(*)
    from T
    group by RevID
    having count(*) <= @balance - (@cnt_over - @balance)
    order by count(*) desc
    update top (@cnt_over - @balance) T
    set RevId = @revID_underrepresented
    where RevId = @revID_overrepresented;
    set @rowcount = @@rowcount;
end
The problem is that I don't even know what you mean by balance... You say it needs to be evenly represented, but it seems like you want it to be 125. 125 is not "even"; it is just 125.
I can't tell what you are trying to do, but I'm guessing this is not really an SQL problem. But you can use SQL to help. Here is some helpful SQL for you. You can use this in your language of choice to solve the problem.
Find the RevID values and their counts:
SELECT RevID, COUNT(*)
FROM MyTable
GROUP BY RevID
Update @X rows (with a RevID value of @RevID) to a new value @NewValue:
UPDATE TOP (@X) MyTable
SET RevID = @NewValue
WHERE RevID = @RevID
Using these two queries you should be able to apply your business rules (which you never specified) in a loop or whatever to change the data.
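As one illustration of putting those two queries in a loop, here is a sketch that assumes the rule is simply "make the counts as even as possible" (the asker never stated the exact rule). Each pass finds the most and least frequent RevID, moves half of the gap from one to the other with UPDATE TOP, and stops once the largest and smallest counts differ by at most one. Each pass strictly lowers the sum of the squared counts, so the loop cannot run forever. #MyTable and its data are hypothetical, and RevID values with zero rows are never seen by GROUP BY, so they receive nothing.

```sql
-- Hypothetical sample data: A2 x 45, B2 x 165, C3 x 125
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL DROP TABLE #MyTable;
CREATE TABLE #MyTable (RevID varchar(10));

DECLARE @n int = 0;
WHILE @n < 165
BEGIN
    IF @n < 45  INSERT INTO #MyTable (RevID) VALUES ('A2');
    INSERT INTO #MyTable (RevID) VALUES ('B2');
    IF @n < 125 INSERT INTO #MyTable (RevID) VALUES ('C3');
    SET @n = @n + 1;
END

DECLARE @MaxRev varchar(10), @MaxCnt int, @MinRev varchar(10), @MinCnt int, @Move int;

WHILE 1 = 1
BEGIN
    -- Most and least frequent RevID values (ties broken by RevID)
    SELECT TOP (1) @MaxRev = RevID, @MaxCnt = COUNT(*)
    FROM #MyTable GROUP BY RevID ORDER BY COUNT(*) DESC, RevID;

    SELECT TOP (1) @MinRev = RevID, @MinCnt = COUNT(*)
    FROM #MyTable GROUP BY RevID ORDER BY COUNT(*) ASC, RevID;

    IF @MaxCnt - @MinCnt <= 1 BREAK;   -- as even as it can get

    -- Move half of the gap from the most to the least frequent value
    SET @Move = (@MaxCnt - @MinCnt) / 2;

    UPDATE TOP (@Move) #MyTable
    SET RevID = @MinRev
    WHERE RevID = @MaxRev;
END

SELECT RevID, COUNT(*) AS Cnt
FROM #MyTable
GROUP BY RevID
ORDER BY RevID;
```

With this sample data the 335 rows end up split 112 / 111 / 112. A fixed target such as 125 would need a different stopping rule, since the total row count decides what "even" can mean.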

Order by and apply a running total to the same column without using a temporary table

A representation of my table:
CREATE TABLE Sales
(
id int identity primary key,
SaleAmount numeric(10,2)
);
DECLARE @i INT;
SELECT @i = 1;
SET NOCOUNT ON
WHILE @i <= 100
BEGIN
    INSERT INTO Sales VALUES (ABS(CHECKSUM(NEWID()))/10000000.0);
    SELECT @i = @i + 1;
END;
SET NOCOUNT OFF
I need to order my table Sales by SaleAmount and then select all records where a running total of SaleAmount is no greater than X.
To do this I'm currently using a temporary table to first sort the records and then select the records where the running total is less than or equal to X (in this example 10).
CREATE TABLE #TEMP_TABLE
(
ID integer IDENTITY PRIMARY KEY,
SaleAmount numeric(10,2)
);
INSERT INTO #TEMP_TABLE
(SaleAmount)
SELECT SaleAmount FROM Sales
ORDER BY SaleAmount
SELECT * FROM
(SELECT
Id,
SaleAmount,
(SaleAmount+COALESCE((SELECT SUM(SaleAmount)
FROM #TEMP_TABLE b
WHERE b.Id < a.Id),0))
AS RunningTotal
FROM #TEMP_TABLE a) InnerTable
WHERE RunningTotal <= 10
Is there a way in which I can first order my Sales table without the use of a temporary table?
If you are using SQL Server 2012, then you can just use the window function for cumulative sum:
select s.*,
sum(SaleAmount) over (order by id) as RunningTotal
from Sales s
This is equivalent to the following correlated subquery:
select s.*,
(select sum(SaleAmount) from Sales s2 where s2.id <= s.id) as RunningTotal
from Sales s
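The query above accumulates in id order, while the question needs the running total in SaleAmount order, filtered at X. A sketch for SQL Server 2012+, assuming ties in SaleAmount are broken by id and using an explicit ROWS frame (with the default RANGE frame, tied amounts would share one total). A window function can't be referenced in the WHERE clause of the same SELECT, hence the CTE. The #Sales rows are made-up sample data.

```sql
-- Hypothetical sample data (fixed amounts so the result is predictable)
IF OBJECT_ID('tempdb..#Sales') IS NOT NULL DROP TABLE #Sales;
CREATE TABLE #Sales (id int IDENTITY PRIMARY KEY, SaleAmount numeric(10,2));
INSERT INTO #Sales (SaleAmount) VALUES (4.00), (1.50), (2.50), (3.00), (1.50);

DECLARE @X numeric(10,2) = 10;   -- running-total limit from the question

WITH Ordered AS (
    SELECT id,
           SaleAmount,
           -- cumulative sum in (SaleAmount, id) order, one row at a time
           SUM(SaleAmount) OVER (ORDER BY SaleAmount, id
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
    FROM #Sales
)
SELECT id, SaleAmount, RunningTotal
FROM Ordered
WHERE RunningTotal <= @X
ORDER BY SaleAmount, id;
```

Here the running totals are 1.50, 3.00, 5.50, 8.50 and 12.50, so the first four rows are returned.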
Following Aaron Bertrand's suggestion of using a cursor:
DECLARE @st TABLE
(
    Id Int PRIMARY KEY,
    SaleAmount Numeric(10,2),
    RunningTotal Numeric(10,2)
);
DECLARE
    @Id INT,
    @SaleAmount Numeric(10,2),
    @RunningTotal Numeric(10,2) = 0;
DECLARE c CURSOR
    LOCAL STATIC FORWARD_ONLY READ_ONLY
    FOR
    SELECT id, SaleAmount
    FROM Sales
    ORDER BY SaleAmount;
OPEN c;
FETCH NEXT FROM c INTO @Id, @SaleAmount;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @RunningTotal = @RunningTotal + @SaleAmount;
    INSERT @st (Id, SaleAmount, RunningTotal)
    SELECT @Id, @SaleAmount, @RunningTotal;
    FETCH NEXT FROM c INTO @Id, @SaleAmount;
END
CLOSE c;
DEALLOCATE c;
SELECT Id, SaleAmount, RunningTotal
FROM @st
WHERE RunningTotal <= 10
ORDER BY SaleAmount;
This means more code and still requires a table variable; however, the performance improvement is significant.
Credit has to go to Aaron Bertrand for the excellent article on running totals he wrote.
One more option, using a CTE, the ROW_NUMBER() ranking function and the APPLY operator:
;WITH cte AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY SaleAmount, id) AS rn, SaleAmount
    FROM Sales s
)
SELECT *
FROM cte c CROSS APPLY (
    -- sum over all rows up to and including this one in (SaleAmount, id) order;
    -- comparing rn rather than SaleAmount keeps tied amounts from sharing a total
    SELECT SUM(c2.SaleAmount) AS RunningTotal
    FROM cte c2
    WHERE c2.rn <= c.rn
) o
WHERE o.RunningTotal <= 10
FYI, to avoid a sort operation you can use this index:
CREATE INDEX ix_SaleAmount_Sales ON Sales(SaleAmount)
After some research, I believe what you're aiming for is not possible unless you are using SQL Server 2012 or Oracle.
Since your solution seems to work, I would advise using a table variable instead of a temporary table:
DECLARE @TEMP_TABLE TABLE (
    ID integer IDENTITY PRIMARY KEY,
    SaleAmount numeric(10,2)
);
INSERT INTO @TEMP_TABLE
    (SaleAmount)
SELECT SaleAmount FROM Sales
ORDER BY SaleAmount
SELECT * FROM
    (SELECT
        Id,
        SaleAmount,
        (SaleAmount + COALESCE((SELECT SUM(SaleAmount)
                                FROM @TEMP_TABLE b
                                WHERE b.Id < a.Id), 0))
        AS RunningTotal
    FROM @TEMP_TABLE a) InnerTable
WHERE RunningTotal <= 10
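When testing side by side, I found some performance improvements.
First of all, you are doing a sub-select and then a SELECT * from the sub-select. The wrapper is only there so that WHERE can refer to the RunningTotal alias; you can drop it by repeating the expression in the WHERE clause:
SELECT
    a.Id,
    a.SaleAmount,
    (a.SaleAmount + COALESCE((SELECT SUM(b.SaleAmount)
                              FROM #TEMP_TABLE b
                              WHERE b.Id < a.Id), 0))
    AS RunningTotal
FROM #TEMP_TABLE a
WHERE a.SaleAmount + COALESCE((SELECT SUM(b.SaleAmount)
                               FROM #TEMP_TABLE b
                               WHERE b.Id < a.Id), 0) <= 10
Now, the temp table is just a query on the Sales table. There is no need to sort the rows into a temporary table first; the ordering the running total depends on can be expressed directly in the correlated subquery (comparing SaleAmount and using Id to break ties), so
SELECT
    a.Id,
    a.SaleAmount,
    (a.SaleAmount + COALESCE((SELECT SUM(b.SaleAmount)
                              FROM Sales b
                              WHERE b.SaleAmount < a.SaleAmount
                                 OR (b.SaleAmount = a.SaleAmount AND b.Id < a.Id)), 0))
    AS RunningTotal
FROM Sales a
WHERE a.SaleAmount + COALESCE((SELECT SUM(b.SaleAmount)
                               FROM Sales b
                               WHERE b.SaleAmount < a.SaleAmount
                                  OR (b.SaleAmount = a.SaleAmount AND b.Id < a.Id)), 0) <= 10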

returning total records in cte

With help from an article here and recent answers from SO experts, I have arrived at the following, which will help me page efficiently through a set of records.
My last couple of questions are:
See how I include the total number of records in the payload at the end of the SQL CTE, in a column called 'Total'. Is that how you would do this?
Any other suggestions? Are there areas where this could be more concise or otherwise improved? How would I return the total number of pages?
DECLARE @page_size INT = 5;
DECLARE @page_nbr INT = 4;
DECLARE @search NVARCHAR(MAX) = '';
DECLARE @sort_order INT = 2;
WITH AllProducts
AS
(
    SELECT *,
        CASE @sort_order
            WHEN 1 THEN ROW_NUMBER() OVER ( ORDER BY ProductID )
            WHEN 2 THEN ROW_NUMBER() OVER ( ORDER BY ProductName )
        END AS 'Seq'
    FROM Products
),
Filtered
AS
(
    SELECT * FROM AllProducts
    WHERE ProductName like '%' + @search + '%'
        OR
        @search is null
)
SELECT (select COUNT(*) from Filtered) as 'Total', * FROM Filtered
WHERE seq > (@page_nbr - 1) * @page_size
    AND seq <= @page_nbr * @page_size
I think there's something wrong in your query: it numbers records (for paging) and after that applies the filter.
It is, for example, possible that you request page 2 of records, but all records with the corresponding seq values could be filtered out in the meantime. So, in this case the query would yield no results, although there may be plenty of records in the table.
In order to fix that, you could do the filtering and record numbering in the same CTE, like this:
DECLARE @page_size INT = 5;
DECLARE @page_nbr INT = 4;
DECLARE @search NVARCHAR(MAX) = '';
DECLARE @sort_order INT = 2;
WITH Filtered AS (
    SELECT *,
        CASE @sort_order
            WHEN 1 THEN ROW_NUMBER() OVER ( ORDER BY ProductID )
            WHEN 2 THEN ROW_NUMBER() OVER ( ORDER BY ProductName )
        END AS 'Seq'
    FROM Products
    WHERE ProductName like '%' + @search + '%' OR @search is null
)
SELECT (select COUNT(*) from Filtered) as 'Total', * FROM Filtered
WHERE seq > (@page_nbr - 1) * @page_size
    AND seq <= @page_nbr * @page_size
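For the remaining question (returning the total number of pages), one option is to compute the filtered total with COUNT(*) OVER () in the same CTE and derive the page count with integer ceiling division. Window functions are evaluated after WHERE, so the total counts only the rows that pass the search filter. This is a sketch that assumes a hypothetical #Products temp table with ProductID and ProductName and made-up rows.

```sql
-- Hypothetical stand-in for the Products table
IF OBJECT_ID('tempdb..#Products') IS NOT NULL DROP TABLE #Products;
CREATE TABLE #Products (ProductID int PRIMARY KEY, ProductName nvarchar(100));
INSERT INTO #Products (ProductID, ProductName)
VALUES (1, N'Apple'), (2, N'Banana'), (3, N'Cherry'), (4, N'Date'),
       (5, N'Elderberry'), (6, N'Fig'), (7, N'Grape');

DECLARE @page_size int = 3;
DECLARE @page_nbr int = 2;
DECLARE @search nvarchar(100) = N'';
DECLARE @sort_order int = 2;

WITH Filtered AS (
    SELECT *,
           CASE @sort_order
               WHEN 1 THEN ROW_NUMBER() OVER (ORDER BY ProductID)
               WHEN 2 THEN ROW_NUMBER() OVER (ORDER BY ProductName)
           END AS Seq,
           COUNT(*) OVER () AS Total                 -- rows left after the filter
    FROM #Products
    WHERE ProductName LIKE N'%' + @search + N'%' OR @search IS NULL
)
SELECT Total,
       (Total + @page_size - 1) / @page_size AS TotalPages,   -- integer ceiling division
       ProductID, ProductName, Seq
FROM Filtered
WHERE Seq > (@page_nbr - 1) * @page_size
  AND Seq <= @page_nbr * @page_size
ORDER BY Seq;
```

With 7 matching rows and 3 rows per page, TotalPages is 3, and page 2 holds Date, Elderberry and Fig.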