I've now spent all day on a fairly simple UDF; it's below. When I paste the SELECT statement into a query window, it runs as expected... when I execute the entire function, I get "0" every time. As you know there aren't a ton of debugging options for functions, so it's hard to see what values are/aren't being set as it executes. The basic purpose of it is to make sure stock data exists in a daily pricing table. I can check by how many days' data I'm checking for, the ticker, and the latest trading date to check. A subquery gets me the correct trading dates, and I use "IN" to pull data out of the pricing and vol table... if the count of what comes back is less than the number of days I'm checking, no good. If it matches, we're in business. Any help would be great, I'm a newb who is punting at this point:
ALTER FUNCTION dbo.PricingVolDataAvailableToDateProvided
(@Ticker char,
@StartDate DATE,
@NumberOfDaysBack int)
RETURNS bit
AS
BEGIN
DECLARE @Result bit
DECLARE @RecordCount int
SET @RecordCount = (
SELECT COUNT(TradeDate) AS Expr1
FROM (SELECT TOP (100) PERCENT TradeDate
FROM tblDailyPricingAndVol
WHERE ( Symbol = @Ticker )
AND ( TradeDate IN (SELECT TOP (@NumberOfDaysBack)
CAST(TradingDate AS DATE) AS Expr1
FROM tblTradingDays
WHERE ( TradingDate <= @StartDate )
ORDER BY TradingDate DESC) )
ORDER BY TradeDate DESC) AS TempTable )
IF @RecordCount = @NumberOfDaysBack
SET @Result = 1
ELSE
SET @Result = 0
RETURN @Result
END
@Ticker char seems suspect.
If you don't declare a length in the parameter definition it defaults to char(1), so quite likely your passed-in tickers are being silently truncated - hence no matches.
SELECT TOP (100) PERCENT TradeDate ... ORDER BY TradeDate DESC
in the derived table is pointless but won't affect the result.
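To see the truncation in isolation (a standalone snippet, not the poster's function), compare an unsized char variable with one that has an explicit length; the length of 10 is just an assumed maximum ticker length:

-- char with no length in a DECLARE or parameter means char(1)
DECLARE @Ticker char
SET @Ticker = 'MSFT'
SELECT @Ticker AS TruncatedValue   -- returns 'M'

DECLARE @Ticker2 varchar(10)       -- explicit length keeps the whole ticker
SET @Ticker2 = 'MSFT'
SELECT @Ticker2 AS FullValue       -- returns 'MSFT'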
I've created a stored procedure that filters and paginates for a DataTable.
Problem: I need to set an OUTPUT variable, @TotalRecord, to the total number of records found before the OFFSET occurs; otherwise it just sets @TotalRecord to @RecordPerPage.
I've messed around with CTEs and also simply tried this:
SELECT *, @TotalRecord = COUNT(1)
FROM dbo
But that doesn't work either.
Here is my stored procedure, with most of the stuff pulled out:
ALTER PROCEDURE [dbo].[SearchErrorReports]
@FundNumber varchar(50) = null,
@ProfitSelected bit = 0,
@SortColumnName varchar(30) = null,
@SortDirection varchar(10) = null,
@StartIndex int = 0,
@RecordPerPage int = null,
@TotalRecord INT = 0 OUTPUT --NEED TO SET THIS BEFORE OFFSET!
AS
BEGIN
SET NOCOUNT ON;
SELECT *
FROM
(SELECT *
FROM dbo.View
WHERE (@ProfitSelected = 1 AND Profit = 1)) AS ERP
WHERE
((@FundNumber IS NULL OR @FundNumber = '')
OR (ERP.FundNumber LIKE '%' + @FundNumber + '%'))
ORDER BY
CASE
WHEN @SortColumnName = 'FundNumber' AND @SortDirection = 'asc'
THEN ERP.FundNumber
END ASC,
CASE
WHEN @SortColumnName = 'FundNumber' AND @SortDirection = 'desc'
THEN ERP.FundNumber
END DESC
OFFSET @StartIndex ROWS
FETCH NEXT @RecordPerPage ROWS ONLY
END
Thank you in advance!
You could try something like this:
create a CTE that gets the data you want to return
include a COUNT(*) OVER() in there to get the total count of rows
return just a subset (based on your OFFSET .. FETCH NEXT) from the CTE
So your code would look something along those lines:
-- CTE definition - call it whatever you like
WITH BaseData AS
(
SELECT
-- select all the relevant columns you need
p.ProductID,
p.ProductName,
-- using COUNT(*) OVER() returns the total count over all rows
TotalCount = COUNT(*) OVER()
FROM
dbo.Products p
)
-- now select from the CTE - using OFFSET/FETCH NEXT, get only those rows you
-- want - but the "TotalCount" column still contains the total count - before
-- the OFFSET/FETCH
SELECT *
FROM BaseData
ORDER BY ProductID
OFFSET 20 ROWS FETCH NEXT 15 ROWS ONLY
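If the total also needs to come back through the @TotalRecord OUTPUT parameter from the question, one option (a sketch only, reusing the question's names and a simplified filter) is to assign it with a separate statement alongside the paged SELECT:

-- Sketch: populate the OUTPUT parameter with the pre-OFFSET count.
-- dbo.[View], @FundNumber, @ProfitSelected and @TotalRecord are the question's names.
SELECT @TotalRecord = COUNT(*)
FROM dbo.[View] AS ERP
WHERE (@ProfitSelected = 1 AND Profit = 1)
  AND ( (@FundNumber IS NULL OR @FundNumber = '')
        OR ERP.FundNumber LIKE '%' + @FundNumber + '%' );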
As a habit, I prefer putting non-nullable parameters before possibly-null ones. I did not reference those in my response below, and limited the working example to just the two inputs you are most concerned with.
I believe there are cleaner ways to apply your local variables to filter the query results without having to perform an offset. You could return results to a temp table, or to a permanent usage table that cleans itself up, and use the IDs that aren't returned as a way to set pages. Smoother, with less fuss.
However, I understand that isn't always feasible, and I get frustrated myself with those attempting to solve your use case for you without attempting to answer the question. Quite often there are multiple ways to tackle any issue. Your job is to decide which one is best in your scenario. Our job is to help you figure out the script.
With that said, here's a potential solution using dynamic SQL.
I'm a huge believer in dynamic SQL, and use it extensively for user based table control and ease of ETL mapping control.
use TestCatalog;
set nocount on;
--Builds a temp table, just for test purposes
drop table if exists ##TestOffset;
create table ##TestOffset
(
Id int identity(1,1)
, RandomNumber decimal (10,7)
);
--Inserts 1000 random numbers between 0 and 100
while (select count(*) from ##TestOffset) < 1000
begin
insert into ##TestOffset
(RandomNumber)
values
(RAND()*100)
end;
set nocount off;
go
create procedure dbo.TestOffsetProc
@StartIndex int = null --I'll reference this like a page number below
, @RecordsPerPage int = null
as
begin
declare @MaxRows int = 30; --your front end will probably manage this, but don't trust it. I personally would store this on a table against each display so it can also be returned dynamically with less manual intrusion to this procedure.
declare @FirstRow int;
--Quick entry to ensure your record count returned doesn't exceed the max allowed.
if @RecordsPerPage is null or @RecordsPerPage > @MaxRows
begin
set @RecordsPerPage = @MaxRows
end;
--Same here, making sure not to return NULL to your dynamic statement. If null is returned from any variable, the entire statement will become null.
if @StartIndex is null
begin
set @StartIndex = 0
end;
set @FirstRow = @StartIndex * @RecordsPerPage
declare @Sql nvarchar(2000) = 'select
tos.*
from ##TestOffset as tos
order by tos.RandomNumber desc
offset ' + convert(nvarchar, @FirstRow) + ' rows
fetch next ' + convert(nvarchar, @RecordsPerPage) + ' rows only'
exec (@Sql);
end
go
exec dbo.TestOffsetProc;
drop table ##TestOffset;
drop procedure dbo.TestOffsetProc;
I have written this cursor for a commission report. Commission comes in one table and the records are in another table; I match the two based on certain criteria (there is no exact match available). The problem is that there are duplicates in the records table, so when I match commission with the records table it can pick up those duplicates, and the rep gets paid more. On the other hand, there are duplicates in the commission table as well, but those are valid because they simply mean an account got paid for 2 months.
I wrote this query but it takes 5+ minutes to run. I have 50,000 rows in the records table and 100,000 in the commission table. Is there any way I can improve this cursor?
/* just preparation of cursor, this is not time consuming */
CREATE TABLE #result
(
repid INT,
AccountNo VARCHAR(100),
supplier VARCHAR(15),
CompanyName VARCHAR(200),
StartDate DATETIME,
EndDate DATETIME,
Product VARCHAR(25),
commodity VARCHAR(25),
ContractEnd DATETIME,
EstUsage INT,
EnrollStatus VARCHAR(10),
EnrollDate DATETIME,
ActualEndDate DATETIME,
MeterStart DATETIME,
MeterEnd DATETIME,
ActualUsage INT
)
DECLARE @AccountNo VARCHAR(100)
DECLARE @supplier VARCHAR(10)
DECLARE @commodity VARCHAR(15)
DECLARE @meterstart DATETIME
DECLARE @meterEnd DATETIME
DECLARE @volume FLOAT
DECLARE @RepID INT
DECLARE @Month INT
DECLARE @Year INT
SET @repID = 80
SET @Month = 1
SET @year = 2012
/* the actual cursor */
DECLARE commission_cursor CURSOR FOR
SELECT AccountNo,
supplier,
commodity,
meterStart,
MeterEnd,
Volume
FROM commission
WHERE Datepart(m, PaymentDate) = @Month
AND Datepart(YYYY, PaymentDate) = @Year
OPEN commission_cursor
FETCH next FROM commission_cursor INTO @AccountNo, @supplier, @commodity, @MeterStart, @MeterEnd, @Volume;
WHILE @@fetch_status = 0
BEGIN
IF EXISTS (SELECT id
FROM Records
WHERE AccountNo = @AccountNo
AND supplier = @supplier
AND Commodity = @commodity
AND RepID = @repID)
INSERT INTO #result
SELECT TOP 1 RepID,
AccountNo,
Supplier,
CompanyName,
[Supplier Start Date],
[Supplier End Date],
Product,
Commodity,
[customer end date],
[Expected Usage],
EnrollStatus,
ActualStartDate,
ActualEndDate,
@meterstart,
@MeterEnd,
@volume
FROM Records
WHERE AccountNo = @AccountNo
AND supplier = @supplier
AND Commodity = @commodity
AND RepID = @repID
AND @MeterStart >= Dateadd(dd, -7, ActualStartDate)
AND @meterEnd <= Isnull(Dateadd(dd, 30, ActualEndDate), '2015-12-31')
FETCH next FROM commission_cursor INTO @AccountNo, @supplier, @commodity, @MeterStart, @MeterEnd, @Volume;
END
SELECT *
FROM #result
/* clean up */
CLOSE commission_cursor
DEALLOCATE commission_cursor
DROP TABLE #result
I have read the answer to How to make a T-SQL Cursor faster?, and the takeaway there is to rewrite the query in set-based form. I do have another query that uses a join and is lightning fast, but it cannot differentiate between the dups in my records table.
Is there anything I can do to make this faster? That is the primary question. If not, do you have an alternative way to do it?
I specifically need help with:
Will using views or stored procedures help?
Is there a way I can use caching in the cursor to make it faster?
Any other option in syntax
The very first option is to set the least resource intensive options for your cursor:
declare commission_cursor cursor
local static read_only forward_only
for
Next is to investigate whether you need a cursor at all. In this case I think you can do the same with a single pass and no loops:
;WITH x AS
(
SELECT
rn = ROW_NUMBER() OVER (PARTITION BY r.AccountNo, r.Supplier, r.Commodity, r.RepID
ORDER BY r.ActualEndDate DESC),
r.RepID,
r.AccountNo,
r.Supplier,
r.CompanyName,
StartDate = r.[Supplier Start Date],
EndDate = r.[Supplier End Date],
r.Product,
r.Commodity,
ContractEnd = r.[customer end date],
EstUsage = r.[Expected Usage],
r.EnrollStatus,
EnrollDate = r.ActualStartDate,
r.ActualEndDate,
c.MeterStart,
c.MeterEnd,
ActualUsage = c.Volume
FROM dbo.commission AS c
INNER JOIN dbo.Records AS r
ON c.AccountNo = r.AccountNo
AND c.Supplier = r.Supplier
AND c.Commodity = r.Commodity
AND c.RepID = r.RepID
WHERE
c.PaymentDate >= DATEADD(MONTH, @Month-1, CONVERT(CHAR(4), @Year) + '0101')
AND c.PaymentDate < DATEADD(MONTH, @Month, CONVERT(CHAR(4), @Year) + '0101')
AND r.RepID = @RepID
)
SELECT RepID, AccountNo, Supplier, CompanyName, StartDate, EndDate,
Product, Commodity, ContractEnd, EstUsage, EnrollStatus, EnrollDate,
ActualEndDate, MeterStart, MeterEnd, ActualUsage
FROM x
WHERE rn = 1 --ORDER BY something;
If this is still slow, then the cursor probably wasn't the problem - the next step will be investigating what indexes might be implemented to make this query more efficient.
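For example, something along these lines might be worth testing (the index names and column choices are assumptions drawn from the query above, not a prescription):

-- Hedged sketch: candidate supporting indexes for the join/filter above.
CREATE NONCLUSTERED INDEX IX_commission_PaymentDate
    ON dbo.commission (PaymentDate)
    INCLUDE (AccountNo, Supplier, Commodity, RepID, MeterStart, MeterEnd, Volume);

CREATE NONCLUSTERED INDEX IX_Records_RepID
    ON dbo.Records (RepID, AccountNo, Supplier, Commodity)
    INCLUDE (CompanyName, ActualStartDate, ActualEndDate);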
Temp tables are your friend
The way I solved my problem - merging data from two tables and removing duplicates in a fairly complex fashion, all extremely fast - was to use a temporary table. This is what I did:
Create a #temp table and fetch the merged data from both tables into it. Make sure you include the ID fields from both tables even if you don't otherwise need them; they are what lets you remove duplicates.
Now you can do all sorts of calculations on this table. To remove duplicates coming from table B, just remove rows with duplicate table B IDs; to remove duplicates from table A, just remove rows with duplicate table A IDs. There is more complexity to the problem, but this is probably the best way to solve it and make it considerably faster when a cursor is too expensive and takes considerable time. In my case the cursor was taking 5+ minutes; the #temp table query took about 5 seconds, with a lot more calculations in it.
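A minimal sketch of that pattern for the tables above (the commission Id column and the "keep the latest Records row" rule are assumptions; adapt them to the real keys):

-- Sketch: merge both tables into a temp table, keeping both source IDs
-- (c.Id is assumed; the question only shows Records.id explicitly).
SELECT c.Id AS CommissionId,
       r.id AS RecordId,
       c.AccountNo, c.supplier, c.commodity,
       c.MeterStart, c.MeterEnd, c.Volume,
       r.CompanyName, r.ActualStartDate, r.ActualEndDate
INTO   #merged
FROM   dbo.commission AS c
INNER JOIN dbo.Records AS r
       ON  r.AccountNo = c.AccountNo
       AND r.supplier  = c.supplier
       AND r.Commodity = c.commodity;

-- Duplicates that came from Records: keep only one Records row per commission row
;WITH d AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY CommissionId
                              ORDER BY ActualEndDate DESC) AS rn
    FROM #merged
)
DELETE FROM d WHERE rn > 1;

SELECT * FROM #merged;   -- duplicate commission rows (two months paid) remain, as intended
DROP TABLE #merged;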
While applying Aaron's solution, the cursor did not get any faster. The second query was faster but it did not give me the correct answer, so finally I used temp tables. This is my own answer.
I am using MS SQL Server 2005 at work to build a database. I have been told that most tables will hold 1,000,000 to 500,000,000 rows of data in the near future after it is built... I have not worked with datasets this large. Most of the time I don't even know what I should be considering when figuring out the best way to set up schema, queries, and so on.
So... I need to know the start and end dates for something, and a value that is associated with an ID during that time frame. So we can set the table up two different ways:
create table xxx_test2 (id int identity(1,1), groupid int, dt datetime, i int)
create table xxx_test2 (id int identity(1,1), groupid int, start_dt datetime, end_dt datetime, i int)
Which is better? How do I define "better"? I filled the first table with about 100,000 rows of data, and it takes about 10-12 seconds to transform it into the format of the second table, depending on the query...
select y.groupid,
y.dt as [start],
z.dt as [end],
(case when z.dt is null then 1 else 0 end) as latest,
y.i
from #x as y
outer apply (select top 1 *
from #x as x
where x.groupid = y.groupid and
x.dt > y.dt
order by x.dt asc) as z
or
http://consultingblogs.emc.com/jamiethomson/archive/2005/01/10/t-sql-deriving-start-and-end-date-from-a-single-effective-date.aspx
Buuuuut... with the second table.... to insert a new row, I have to go look and see if there is a previous row and then if so update its end date. So... is it a question of performance when retrieving data vs insert/update things? It seems silly to store that end date twice but maybe...... not? What things should I be looking at?
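For illustration, here is a minimal sketch of the maintenance the second design implies on each new reading (it assumes end_dt IS NULL marks the currently-open row; the variables are hypothetical inputs):

-- Sketch only: close the open row for this group, then insert the new current row.
DECLARE @GroupId int, @NewDt datetime, @NewValue int
SELECT @GroupId = 1, @NewDt = GETDATE(), @NewValue = 42

UPDATE xxx_test2
SET end_dt = @NewDt
WHERE groupid = @GroupId
  AND end_dt IS NULL

INSERT INTO xxx_test2 (groupid, start_dt, end_dt, i)
VALUES (@GroupId, @NewDt, NULL, @NewValue)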
This is what I used to generate my fake data... if you want to play with it for some reason (if you change the maximum of the random number to something higher it will generate the fake stuff a lot faster):
declare @dt datetime
declare @i int
declare @id int
set @id = 1
declare @rowcount int
set @rowcount = 0
declare @numrows int
while (@rowcount<100000)
begin
set @i = 1
set @dt = getdate()
set @numrows = Cast(((5 + 1) - 1) *
Rand() + 1 As tinyint)
while @i<=@numrows
begin
insert into #x values (@id, dateadd(d,@i,@dt), @i)
set @i = @i + 1
end
set @rowcount = @rowcount + @numrows
set @id = @id + 1
print @rowcount
end
For your purposes, I think option 2 is the way to go for table design. This gives you flexibility, and will save you tons of work.
Having the effective date and end date will allow you to have a query that will only return currently effective data by having this in your where clause:
where GETDATE() between EffectiveDate and EndDate
You can also then use it to join with other tables in a time-sensitive way.
Provided you set up the key properly and provide the right indexes, performance (on this table at least) should not be a problem.
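For example, a time-sensitive join might look like this (the table and column names here are purely illustrative):

-- Sketch: join each transaction to the row that was effective on its date
SELECT t.TransactionId, t.TransactionDate, r.Rate
FROM dbo.Transactions AS t
INNER JOIN dbo.RateHistory AS r
        ON r.GroupId = t.GroupId
       AND t.TransactionDate BETWEEN r.EffectiveDate AND r.EndDate;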
For anyone who can use the LEAD analytic function of SQL Server 2012 (or Oracle, DB2, ...), retrieving data from the first table (the one that uses only one date column) is much, much quicker with this feature than without it:
select
groupid,
dt "start",
lead(dt) over (partition by groupid order by dt) "end",
case when lead(dt) over (partition by groupid order by dt) is null
then 1 else 0 end "latest",
i
from x
I have to work with a potentially large list of records, and I've been Googling for ways to avoid selecting the whole list; instead, I want to let users select a page (say, from 1 to 10) and display the records accordingly.
Say, for 1000 records I would have 100 pages of 10 records each. The most recent 10 records would be displayed first, and if the user clicks on page 5, it would show records 41 to 50.
Is it a good idea to add a row number to each record and then query based on row number? Is there a better way of achieving the paging result without too much overhead?
So far those methods as described here look the most promising:
http://developer.berlios.de/docman/display_doc.php?docid=739&group_id=2899
http://www.codeproject.com/KB/aspnet/PagingLarge.aspx
The following T-SQL stored procedure is a very efficient implementation of paging. The SQL optimiser can find the first ID very fast. Combine this with the use of ROWCOUNT, and you have an approach that is both CPU-efficient and read-efficient. For a table with a large number of rows, it certainly beats any approach that I've seen using a temporary table or table variable.
NB: I'm using a sequential identity column in this example, but the code works on any column suitable for page sorting. Also, sequence breaks in the column being used don't affect the result as the code selects a number of rows rather than a column value.
EDIT: If you're sorting on a column with potentially non-unique values (eg LastName), then add a second column to the Order By clause to make the sort values unique again.
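For example, if the page sort were on a non-unique column (LastName here is hypothetical), the key can serve as the tie-breaker:

-- Sketch: [Id] breaks ties so pages never repeat or drop rows
SELECT [Id], LastName
FROM dbo.TestTable
ORDER BY LastName, [Id];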
CREATE PROCEDURE dbo.PagingTest
(
@PageNumber int,
@PageSize int
)
AS
DECLARE @FirstId int, @FirstRow int
SET @FirstRow = ( (@PageNumber - 1) * @PageSize ) + 1
SET ROWCOUNT @FirstRow
-- Add check here to ensure that @FirstRow is not
-- greater than the number of rows in the table.
SELECT @FirstId = [Id]
FROM dbo.TestTable
ORDER BY [Id]
SET ROWCOUNT @PageSize
SELECT *
FROM dbo.TestTable
WHERE [Id] >= @FirstId
ORDER BY [Id]
SET ROWCOUNT 0
GO
If you use a CTE with two ROW_NUMBER() columns - one sorted ascending, one descending - you get row numbers for paging as well as the total record count, since for any row the two row numbers add up to the total plus one.
create procedure get_pages(@page_number int, @page_length int)
as
set nocount on;
with cte as
(
select
Row_Number() over (order by sort_column desc) as row_num
,Row_Number() over (order by sort_column) as inverse_row_num
,id as cte_id
From my_table
)
Select
row_num + inverse_row_num - 1 as total_rows  -- the two row numbers sum to total + 1
,*
from cte inner join my_table
on cte_id = my_table.id
where row_num between
(@page_number * @page_length) + 1
and (@page_number + 1) * @page_length
order by row_num
Using OFFSET
Others have explained how the ROW_NUMBER() OVER() ranking function can be used to perform paging. It's worth mentioning that SQL Server 2012 finally included support for the SQL standard OFFSET .. FETCH clause:
SELECT first_name, last_name, score
FROM players
ORDER BY score DESC
OFFSET 40 ROWS FETCH NEXT 10 ROWS ONLY
If you're using SQL Server 2012 and backwards-compatibility is not an issue, you should probably prefer this clause as it will be executed more optimally by SQL Server in corner cases.
Using the SEEK Method
There is an entirely different, much faster way to perform paging in SQL. This is often called the "seek method" as described in this blog post here.
SELECT TOP 10 first_name, last_name, score
FROM players
WHERE (score < @previousScore)
OR (score = @previousScore AND player_id < @previousPlayerId)
ORDER BY score DESC, player_id DESC
The @previousScore and @previousPlayerId values are the respective values of the last record from the previous page. This allows you to fetch the "next" page. If the ORDER BY direction is ASC, simply use > instead.
With the above method, you cannot immediately jump to page 4 without having first fetched the previous 40 records. But often, you do not want to jump that far anyway. Instead, you get a much faster query that might be able to fetch data in constant time, depending on your indexing. Plus, your pages remain "stable", no matter if the underlying data changes (e.g. on page 1, while you're on page 4).
This is the best way to implement paging when lazy loading more data in web applications, for instance.
Note, the "seek method" is also called keyset paging.
Try something like this:
declare @page int = 2
declare @size int = 10
declare @lower int = (@page - 1) * @size   -- number of rows to skip
declare @upper int = @page * @size
select * from (
select
ROW_NUMBER() over (order by some_column) lfd,
* from your_table
) as t
where lfd between @lower + 1 and @upper    -- +1 so each page returns exactly @size rows
order by some_column
Here's an updated version of @RoadWarrior's code, using TOP. Performance is identical, and extremely fast. Make sure you have an index on TestTable.ID.
CREATE PROC dbo.PagingTest
@SkipRows int,
@GetRows int
AS
DECLARE @FirstId int
SELECT TOP (@SkipRows)
@FirstId = [Id]
FROM dbo.TestTable
ORDER BY [Id]
SELECT TOP (@GetRows) *
FROM dbo.TestTable
WHERE [Id] >= @FirstId
ORDER BY [Id]
GO
Try this
Declare @RowStart int, @RowEnd int;
SET @RowStart = 4;
SET @RowEnd = 7;
With MessageEntities As
(
Select ROW_NUMBER() Over (Order By [MESSAGE_ID]) As Row, [MESSAGE_ID]
From [TBL_NAFETHAH_MESSAGES]
)
Select m0.MESSAGE_ID, m0.MESSAGE_SENDER_NAME,
m0.MESSAGE_SUBJECT, m0.MESSAGE_TEXT
From MessageEntities M
Inner Join [TBL_NAFETHAH_MESSAGES] m0 on M.MESSAGE_ID = m0.MESSAGE_ID
Where M.Row Between @RowStart AND @RowEnd
Order By M.Row Asc
GO
Why not use the recommended solution:
SELECT VALUE product FROM
AdventureWorksEntities.Products AS product
order by product.ListPrice SKIP @skip LIMIT @limit
I use ROW_NUMBER() to do paging on my website content, and when you hit the last page it times out because SQL Server takes too long to complete the search.
There's already an article concerning this problem, but it seems there is no perfect solution yet.
http://weblogs.asp.net/eporter/archive/2006/10/17/ROW5F00NUMBER28002900-OVER-Not-Fast-Enough-With-Large-Result-Set.aspx
When I click the last page of StackOverflow it takes less than a second to return the page, which is really fast. I'm wondering if they have really fast database servers, or whether they just have a solution to the ROW_NUMBER() problem?
Any idea?
Years back, while working with SQL Server 2000, which did not have this function, we had the same issue.
We found this method, which at first glance looks like the performance could be bad, but it blew us out of the water.
Try this out
DECLARE @Table TABLE(
ID INT PRIMARY KEY
)
--insert some values, as many as required.
DECLARE @I INT
SET @I = 0
WHILE @I < 100000
BEGIN
INSERT INTO @Table SELECT @I
SET @I = @I + 1
END
DECLARE @Start INT,
@Count INT
SELECT @Start = 10001,
@Count = 50
SELECT *
FROM (
SELECT TOP (@Count)
*
FROM (
SELECT TOP (@Start + @Count)
*
FROM @Table
ORDER BY ID ASC
) TopAsc
ORDER BY ID DESC
) TopDesc
ORDER BY ID
The base logic of this method relies on the SET ROWCOUNT expression to both skip the unwanted rows and fetch the desired ones:
DECLARE @Sort /* the type of the sorting column */
SET ROWCOUNT @StartRow
SELECT @Sort = SortColumn FROM Table ORDER BY SortColumn
SET ROWCOUNT @PageSize
SELECT ... FROM Table WHERE SortColumn >= @Sort ORDER BY SortColumn
The issue is well covered in this CodeProject article, including scalability graphs.
TOP is supported on SQL Server 2000, but only with static values, e.g. no "TOP (@Var)", only "TOP 200".