Stored procedure timing out on particular connection pool - sql

I have a stored procedure which occasionally times out when called from our website (through the website connection pool). Once it has timed out, it has always been locked into the time-out, until the procedure is recompiled using drop/create or sp_recompile from a Management Studio session.
While it is timing out, there is no time-out using the same parameters for the same procedure using Management Studio.
Doing an "ALTER PROCEDURE" through Management Studio and (fairly drastically) changing the internal execution of the procedure did NOT clear the time out - it wouldn't clear until a full sp_recompile was run.
The stored procedure ends with OPTION (RECOMPILE)
The procedure calls two functions, which are used ubiquitously throughout the rest of the product. The other procedures which use these functions (in similar ways) all work, even during a period where the procedure in question is timing out.
If anyone can offer any further advice as to what could be causing this time out it would be greatly appreciated.
The stored procedure is as below:
ALTER PROCEDURE [dbo].[sp_g_VentureDealsCountSizeByYear] (
@DateFrom AS DATETIME = NULL
,@DateTo AS DATETIME = NULL
,@ProductRegion AS INT = NULL
,@PortFirmID AS INT = NULL
,@InvFirmID AS INT = NULL
,@SpecFndID AS INT = NULL
) AS BEGIN
-- Returns the stats used for Market Overview
DECLARE @IDs AS IDLIST
INSERT INTO @IDs
SELECT IDs
FROM dbo.fn_VentureDealIDs(@DateFrom,@DateTo,@ProductRegion,@PortFirmID,@InvFirmID,@SpecFndID)
CREATE TABLE #DealSizes (VentureID INT, DealYear INT, DealQuarter INT, DealSize_USD DECIMAL(18,2))
INSERT INTO #DealSizes
SELECT vDSQ.VentureID, vDSQ.DealYear, vDSQ.DealQuarter, vDSQ.DealSize_USD
FROM dbo.fn_VentureDealsSizeAndQuarter(@IDs) vDSQ
SELECT
yrs.Years Heading
,COUNT(vDSQ.VentureID) AS Num_Deals
,SUM(vDSQ.DealSize_USD) AS DealSize_USD
FROM tblYears yrs
LEFT OUTER JOIN #DealSizes vDSQ ON vDSQ.DealYear = yrs.Years
WHERE (
((@DateFrom IS NULL) AND (yrs.Years >= (SELECT MIN(DealYear) FROM #DealSizes))) -- If no minimum year has been passed through, take all years from the first year found to the present.
OR
((@DateFrom IS NOT NULL) AND (yrs.Years >= DATEPART(YEAR,@DateFrom))) -- If a minimum year has been passed through, take all years from that specified to the present.
) AND (
((@DateTo IS NULL) AND (yrs.Years <= (SELECT MAX(DealYear) FROM #DealSizes))) -- If no maximum year has been passed through, take all years up to the last year found.
OR
((@DateTo IS NOT NULL) AND (yrs.Years <= DATEPART(YEAR,@DateTo))) -- If a maximum year has been passed through, take all years up to that year.
)
GROUP BY yrs.Years
ORDER BY Heading DESC
OPTION (RECOMPILE)
END

If you want the procedure to be recompiled each time it is executed, you should declare it WITH RECOMPILE; your current syntax recompiles the last SELECT only:
ALTER PROCEDURE [dbo].[sp_g_VentureDealsCountSizeByYear] (
@DateFrom AS DATETIME = NULL
,@DateTo AS DATETIME = NULL
,@ProductRegion AS INT = NULL
,@PortFirmID AS INT = NULL
,@InvFirmID AS INT = NULL
,@SpecFndID AS INT = NULL
) WITH RECOMPILE
I could not tell which part of your procedure causes the problems. You might try commenting out the final SELECT to see whether populating the temp tables from the table functions causes the performance issue; if it does not, then the query itself is the problem. You might rewrite the filter as follows:
WHERE (@DateFrom IS NULL OR yrs.Years >= DATEPART(YEAR,@DateFrom))
AND (@DateTo IS NULL OR yrs.Years <= DATEPART(YEAR,@DateTo))
Or, perhaps better, declare @startYear and @endYear variables, set them accordingly, and change the WHERE like this:
declare @startYear int
set @startYear = isnull(year(@DateFrom), (SELECT MIN(DealYear) FROM #DealSizes))
declare @endYear int
set @endYear = isnull(year(@DateTo), (SELECT MAX(DealYear) FROM #DealSizes))
...
where yrs.Years between @startYear and @endYear
If WITH RECOMPILE does not solve the problem, and removing the last query does not help either, then you need to check the table functions you use to gather the data.
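If the bad plan needs to be evicted without redefining the procedure, another option (a sketch; requires appropriate server-state permissions, and the procedure name is taken from the question) is to remove just that plan from the cache:

```sql
-- Find the cached plan handle for the procedure...
SELECT cp.plan_handle, cp.usecounts
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.objectid = OBJECT_ID('dbo.sp_g_VentureDealsCountSizeByYear');

-- ...then evict only that plan (paste the handle found above):
-- DBCC FREEPROCCACHE (0x05001A2B...);
```

This has the same net effect as sp_recompile (the next call compiles a fresh plan) but avoids the schema-modification lock that sp_recompile takes on the procedure.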

Related

Access data across function calls in same connection

I need help solving a performance problem related to a recursive function in SQL Server. I have a table of tasks for items, each of which has a lead time. My function recursively calls itself to calculate the due date for each task, based on the sum of the preceding tasks (simplistically put). The function performs slowly at large scale, I believe mainly because it must recalculate the due date of every ancestor for each subsequent task.
So I am wondering, is there a way to store a calculated value that could persist from function call to function call, that would last only the lifetime of the connection? Then my function could 'short-circuit' if it found a pre-calculated value, and avoid re-evaluating for each due date request.
The basic schema is below, with a crude representation of the function in question (the function could also be written as a CTE, but that would still repeat the same calculations):
Create Table Projects(id int, DueDate DateTime)
Create Table Items(id int, Parent int, Project int, Offset int)
Create Table Tasks (id int, Parent int, Leadtime Int, Sequence int)
insert into Projects Values
(100,'1/1/2021')
Insert into Items Values
(0,null, 100, 0)
,(1,12, null, 0)
,(2,15, null, 1)
Insert into Tasks Values
(10,0,1,1)
,(11,0,1,2)
,(12,0,2,3)
,(13,0,1,4)
,(14,1,1,1)
,(15,1,1,2)
,(16,2,2,1)
,(17,2,1,2);
CREATE FUNCTION GetDueDate(@TaskID int)
Returns DATETIME
AS BEGIN
Declare @retval DateTime = null
Declare @parent int = (Select Parent from Tasks where ID = @TaskID)
Declare @parentConsumingOp int = (select Parent from Items where ID = @parent)
Declare @parentOffset int = (select Offset from Items where ID = @parent)
Declare @seq int = (Select Sequence from Tasks where ID = @TaskID)
Declare @NextTaskID int = (select ID from Tasks where Parent = @parent and Sequence = @seq-1)
Declare @Due DateTime = (select DueDate from Projects where ID = (Select Project from Items where ID = (Select Parent from Tasks where ID = @TaskID)))
Declare @leadTime int = (Select LeadTime from Tasks where ID = @TaskID)
if @NextTaskID is not null
BEGIN
SET @retval = DateAdd(Day,@leadTime * -1,dbo.GetDueDate(@NextTaskID))
END ELSE IF @parentConsumingOp Is Not Null
BEGIN
SET @retval = DateAdd(Day,(@leadTime + @parentOffset)*-1,dbo.GetDueDate(@parentConsumingOp))
END ELSE SET @retval = DateAdd(Day,@parentOffset*-1,@Due)
Return @retval
END
EDIT: Sql Fiddle Here
Caveat: the following is based on the sample data you've provided rather than trying to work through the logic in your function (i.e. what you are trying to achieve rather than how you have implemented it)...
The result of the function appears to be:
for "this task"
project.due_date - (sum(tasks.leadtime) - 1) where tasks.sequence <= sequence of this task and tasks.parent = parent of this task
If this is the case then this function gives the same result as yours but is much simpler:
CREATE FUNCTION GetDueDate1(@TaskID int)
Returns DATETIME
AS BEGIN
Declare @retval DateTime = null
Declare @parent int = (Select Parent from Tasks where ID = @TaskID)
Declare @seq int = (Select Sequence from Tasks where ID = @TaskID)
Declare @totlead int = (select Sum(Leadtime) - 1 from Tasks where Parent = @parent and Sequence <= @seq)
Declare @duedate DateTime = (select p.DueDate from Tasks t inner join Items i on t.Parent = i.id inner join Projects p on i.Project = p.id where t.id = @TaskID)
SET @retval = DateAdd(Day,@totlead * -1,@duedate)
Return @retval
END;
If I run both functions against your data:
select id
,leadtime
, sequence
, [dbo].[GetDueDate](id) "YourFunction"
, [dbo].[GetDueDate1](id) "MyFunction"
from tasks
where parent = 0;
I get the same result:
id leadtime sequence YourFunction MyFunction
10 1 1 2021-01-01 00:00:00.000 2021-01-01 00:00:00.000
11 1 2 2020-12-31 00:00:00.000 2020-12-31 00:00:00.000
12 2 3 2020-12-29 00:00:00.000 2020-12-29 00:00:00.000
13 1 4 2020-12-28 00:00:00.000 2020-12-28 00:00:00.000
Hope this helps? If it doesn't then please provide some sample data where my function doesn't produce the same result as yours
Update following Comment
Good point, the code above doesn't work for all your data.
I've been thinking this problem through and have come up with the following - please feel free to point it out if I have misunderstood anything:
Your function will, obviously, only return the due date for the task you pass in as a parameter. It will also only calculate the due date of each preceding task once during this process.
Therefore there is no point "saving" the due dates calculated for other tasks: they are used only once while calculating the initial task ID (so there is no performance gain from holding these values, as they won't get re-used), and they won't be available if you call the function again. That's not how functions work: a function can't "know" that you may have called it previously and already calculated the due date for that task ID as an intermediate step.
Re-reading your initial explanation, it appears that you actually want to calculate the due dates for a number of tasks (or all of them?), not just a single one. If that is the case then I wouldn't (just) use a function (which is inherently limited to one task); instead I would write a stored procedure that loops through all your tasks, calculates their due dates and saves them to a table (either a new table or your Tasks table).
You would need to ensure that the tasks are processed in an appropriate order, so that those used in calculations for subsequent tasks are calculated first.
You can re-use the logic in your function (or even call the function from within the SP), but add a step that checks whether the due date has already been calculated (i.e. try to select it from the table); use it if it has, or calculate it (and save it to the table) if it hasn't.
You would need to run this SP whenever relevant data in the tables used in the calculation was amended
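As a set-based sketch of that idea (assuming the Projects/Items/Tasks schema from the question; this walks only the task chain of a top-level item and would need an extra recursive branch to cover child items):

```sql
;WITH DueDates AS (
    -- Anchor: the first task (Sequence = 1) of a top-level item is due
    -- at the project due date minus the item's offset.
    SELECT t.id, t.Parent, t.Sequence,
           DATEADD(DAY, -i.Offset, p.DueDate) AS DueDate
    FROM Tasks t
    JOIN Items i ON t.Parent = i.id
    JOIN Projects p ON i.Project = p.id
    WHERE t.Sequence = 1 AND i.Parent IS NULL
    UNION ALL
    -- Each subsequent task in the chain is due its own LeadTime days
    -- earlier than the previous one, matching the original function.
    SELECT t.id, t.Parent, t.Sequence,
           DATEADD(DAY, -t.Leadtime, d.DueDate)
    FROM Tasks t
    JOIN DueDates d ON t.Parent = d.Parent
                   AND t.Sequence = d.Sequence + 1
)
SELECT id, DueDate
FROM DueDates;
```

Because the CTE carries the running due date forward, each task is computed exactly once, which is exactly the caching behaviour the question is after; the result set could then be written out with INSERT ... SELECT.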

Procedure returning 0 instead of higher number

I have the following procedure to retrieve some data, based by the year, which is input by the user. However, I always get a 0 back. I'm still fairly new to SQL, but this seemed like it should work
Create PROCEDURE [dbo].[Yearly]
@year int
AS
BEGIN
DECLARE @yearly Datetime
DECLARE @summ int
SELECT @summ = SUM([dbo].[Out].[OutPcs]), @yearly = [dbo].[Out].[DateTime]
FROM [dbo].[Out]
WHERE YEAR(@yearly) = @year
GROUP BY [Out].[DateTime]
END;
Should I have used nested select statements? I suspect something is wrong in that part of the procedure.
You have DECLARE @yearly Datetime.
You attempt to set it in SELECT ... @yearly = Out.DateTime FROM Out, but then you have this WHERE clause: YEAR(@yearly) = @year.
This returns nothing, since @yearly is still NULL when YEAR() is evaluated.
That makes the statement equivalent to WHERE NULL = 2018,
which will never be true.
To fix this, you need to set @yearly before using it in your WHERE clause, or use something else there.
It looks like you want YEAR([dbo].[Out].[DateTime]) there instead.
Since it looks like you're new to SQL I will add some extra explanation. This is an oversimplification.
Most programming languages run top to bottom. Executing the line1 first, line2 second, line3 third, and so on. SQL does not do this.
The command SELECT Name FROM Employee WHERE EmpID = 1 Runs in the following order.
First - FROM Employee --> Load the Employee table
Second - WHERE EmpID = 1 --> Scan Employee for the records where EmpID = 1
Third - SELECT Name --> Display the `Name` field of the records I found.
Your command looks like this to the SQL compiler:
First - FROM dbo.Out --> Load Out table
Second - WHERE YEAR(@yearly) = @year --> Scan for records that meet this requirement.
Third - SELECT ... @yearly = dbo.Out.DateTime --> Set @yearly to the [DateTime] field of the record(s) found.
Note that if your statement returns multiple records, SELECT @yearly = ... does not fail: @yearly simply ends up holding the value from the last row processed. It is the SET @var = (SELECT ...) form that fails when more than one row comes back, with an error like
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
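A quick illustration of that multi-row assignment behaviour (a sketch reusing the [dbo].[Out] table from the question):

```sql
DECLARE @v DATETIME;

-- SELECT assignment over many rows does not error;
-- @v ends up with the value from the last row processed.
SELECT @v = [DateTime] FROM [dbo].[Out];

-- SET with a scalar subquery is the form that fails
-- when more than one row comes back (Msg 512):
-- SET @v = (SELECT [DateTime] FROM [dbo].[Out]);
```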
Why your code is not working is well explained by @Edward.
Here is working code:
Create PROCEDURE [dbo].[Yearly]
@year int
AS
BEGIN
SELECT SUM([dbo].[Out].[OutPcs])
FROM [dbo].[Out]
WHERE YEAR([dbo].[Out].[DateTime]) = @year
END;
You forgot to return @summ. The @yearly variable is not necessary, and neither is the GROUP BY:
Create PROCEDURE [dbo].[Yearly]
@year int
AS
BEGIN
DECLARE @summ int
SELECT @summ = SUM([dbo].[Out].[OutPcs])
FROM [dbo].[Out]
WHERE YEAR([dbo].[Out].[DateTime]) = @year
Return @summ
END;

Big difference in Estimated and Actual rows when using a local variable

This is my first post on Stackoverflow so I hope I'm correctly following all protocols!
I'm struggling with a stored procedure in which I create a table variable and fill it with an insert statement using an inner join. The insert itself is simple, but it gets complicated because the inner join is filtered on a local variable. Since the optimizer doesn't have statistics for this variable, my estimated row count is getting screwed up.
The specific piece of code that causes trouble:
declare @minorderid int
select @minorderid = MIN(lo.order_id)
from [order] lo with(nolock)
where lo.order_datetime >= @datefrom
insert into @OrderTableLog_initial
(order_id, order_log_id, order_id, order_datetime, account_id, domain_id)
select ot.order_id, lol.order_log_id, ot.order_id, ot.order_datetime, ot.account_id, ot.domain_id
from [order] ot with(nolock)
inner join order_log lol with(nolock)
on ot.order_id = lol.order_id
and ot.order_datetime >= @datefrom
where (ot.domain_id in (1,2,4) and lol.order_log_id not in ( select order_log_id
from dbo.order_log_detail lld with(nolock)
where order_id >= @minorderid
)
or
(ot.domain_id = 3 and ot.order_id not IN (select order_id
from dbo.order_log_detail_spa llds with(nolock)
where order_id >= @minorderid
)
))
order by lol.order_id, lol.order_log_id
The @datefrom local variable is also declared earlier in the stored procedure:
declare @datefrom datetime
if datepart(hour,GETDATE()) between 4 and 9
begin
set @datefrom = '2011-01-01'
end
else
begin
set @datefrom = DATEADD(DAY,-2,GETDATE())
end
I've also tested this with a temporary table instead of a table variable, but nothing changes. However, when I replace the local variable >= @datefrom with a fixed datestamp, my estimates and actuals are almost the same.
ot.order_datetime >= @datefrom = SQL Sentry Plan Explorer
ot.order_datetime >= '2017-05-03 18:00:00.000' = SQL Sentry Plan Explorer
I've come to understand that there's a way to fix this by turning this code into a dynamic sp, but I'm not sure how to do this. I would be grateful if someone could give me suggestions on how to do this. Maybe I have to use a complete other approach? Forgive me if I forgot something to mention, this is my first post.
EDIT:
MSSQL version = 11.0.5636
I've also tested with trace flag 2453, but with no success
Best regards,
Peter
Indeed, the behavior you are experiencing is because of the variables. SQL Server won't store an execution plan for each and every possible input, so for some queries the execution plan may or may not be optimal.
To answer your explicit question: you'll have to build the query as a string in an nvarchar variable, then execute it.
Some notes before the actual code:
This can be prone to SQL injection (in general)
SQL Server will store the plans separately, meaning they will use more memory and possibly knock out other plans from the cache
Using an imaginary setup, this is what you want to do:
DECLARE @inputDate DATETIME2 = '2017-01-01 12:21:54';
DECLARE @dynamicSQL NVARCHAR(MAX) = CONCAT('SELECT col1, col2 FROM MyTable WHERE myDateColumn = ''', FORMAT(@inputDate, 'yyyy-MM-dd HH:mm:ss'), ''';');
INSERT INTO @myTableVar (col1, col2)
EXEC sp_executesql @stmt = @dynamicSQL;
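A variant worth considering (a sketch against the same imaginary table): sp_executesql accepts a parameter list, which avoids both the date-to-string formatting and the injection risk:

```sql
DECLARE @inputDate DATETIME2 = '2017-01-01 12:21:54';
DECLARE @dynamicSQL NVARCHAR(MAX) =
    N'SELECT col1, col2 FROM MyTable WHERE myDateColumn = @p_date;';

INSERT INTO @myTableVar (col1, col2)
EXEC sp_executesql
    @stmt   = @dynamicSQL,
    @params = N'@p_date DATETIME2',
    @p_date = @inputDate;
```

Note, though, that a fully parameterized statement brings back the original single-plan behaviour; if the goal is a plan compiled for the actual variable values, simply appending OPTION (RECOMPILE) to the problematic statement achieves that without any dynamic SQL.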
As an additional note:
you can try to use EXISTS and NOT EXISTS instead of IN and NOT IN.
You can try to use a temp table (#myTempTable) instead of a table variable. Physical temp tables can perform better with large amounts of data, and you can put indexes on them. (For more info you can go here: What's the difference between a temp table and table variable in SQL Server? or to the official documentation)

Best Practice for a scheduled stored procedure

I have a stored procedure that takes user input from a webform and updates a database.
CREATE TABLE AccountTable
(
RowID int IDENTITY(1, 1),
AccountID varchar(2),
AccountName varchar(50),
SeqNum int,
SeqDate datetime
)
CREATE PROCEDURE [ACCOUNTTABLE_UPDATE]
(
@SeqNum int,
@SeqDate datetime,
@AccountID varchar(2)
)
AS
SET NOCOUNT ON
BEGIN
UPDATE AccountTable
SET SeqNum = @SeqNum, SeqDate = @SeqDate
WHERE AccountID = @AccountID
END
Each time the user runs the webapp, the table updates the SeqNum and SeqDate columns. I would like the SeqNum column to reset to NULL every 24 hours. Would it be better to have the stored procedure check whether the current date is past the stored date, or to implement a scheduled task?
Simply try running your SP as a SQL Server Agent job.
If you really want to do this, I think you should use SQL Server Agent for scheduling.
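For reference, an Agent job can be created entirely in T-SQL via the msdb procedures (a sketch; the job name, target database name and midnight schedule are made up):

```sql
USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'ResetSeqNumDaily';

EXEC dbo.sp_add_jobstep
    @job_name      = N'ResetSeqNumDaily',
    @step_name     = N'Reset SeqNum',
    @subsystem     = N'TSQL',
    @database_name = N'YourDatabase',
    @command       = N'UPDATE AccountTable SET SeqNum = NULL;';

EXEC dbo.sp_add_jobschedule
    @job_name          = N'ResetSeqNumDaily',
    @name              = N'Daily at midnight',
    @freq_type         = 4,        -- daily
    @freq_interval     = 1,        -- every 1 day
    @active_start_time = 000000;   -- 00:00:00

EXEC dbo.sp_add_jobserver @job_name = N'ResetSeqNumDaily';
```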

Stored Procedure; Insert Slowness

I have an SP that takes 10 seconds to run about 10 times (about a second each time it is run). The platform is ASP.NET, and the server is SQL Server 2005. I have indexed the table (not only on the PK), and that is not the issue. Some caveats:
usp_SaveKeyword is not the issue. I commented out that entire SP and it made no difference.
I set @SearchID to 1 and the time was significantly reduced, only taking about 15ms on average for the transaction.
I commented out the entire stored procedure except the insert into tblSearches and strangely it took more time to execute.
Any ideas of what could be going on?
set ANSI_NULLS ON
go
ALTER PROCEDURE [dbo].[usp_NewSearch]
@Keyword VARCHAR(50),
@SessionID UNIQUEIDENTIFIER,
@time SMALLDATETIME = NULL,
@CityID INT = NULL
AS
BEGIN
SET NOCOUNT ON;
IF @time IS NULL SET @time = GETDATE();
DECLARE @KeywordID INT;
EXEC @KeywordID = usp_SaveKeyword @Keyword;
PRINT 'KeywordID : '
PRINT @KeywordID
DECLARE @SearchID BIGINT;
SELECT TOP 1 @SearchID = SearchID
FROM tblSearches
WHERE SessionID = @SessionID
AND KeywordID = @KeywordID;
IF @SearchID IS NULL BEGIN
INSERT INTO tblSearches
(KeywordID, [time], SessionID, CityID)
VALUES
(@KeywordID, @time, @SessionID, @CityID)
SELECT Scope_Identity();
END
ELSE BEGIN
SELECT @SearchID
END
END
Why are you using TOP 1 @SearchID instead of MAX(SearchID) or WHERE EXISTS in this query? TOP requires you to run the query and retrieve the first row from the result set. If the result set is large, this could consume quite a lot of resources before you get the final result.
SELECT TOP 1 @SearchID = SearchID
FROM tblSearches
WHERE SessionID = @SessionID
AND KeywordID = @KeywordID;
I don't see any obvious reason for this - either of aforementioned constructs should get you something semantically equivalent to this with a very cheap index lookup. Unless I'm missing something you should be able to do something like
select @SearchID = isnull(max(SearchID), -1)
from tblSearches
where SessionID = @SessionID
and KeywordID = @KeywordID
This ought to be fairly efficient and (unless I'm missing something) semantically equivalent.
Enable "Display Estimated Execution Plan" in SQL Management Studio - where does the execution plan show you spending the time? It'll guide you on the heuristics being used to optimize the query (or not in this case). Generally the "fatter" lines are the ones to focus on - they're ones generating large amounts of I/O.
Unfortunately even if you tell us the table schema, only you will be able to see actually how SQL chose to optimize the query. One last thing - have you got a clustered index on tblSearches?
Triggers!
They are insidious indeed.
What is the clustered index on tblSearches? If the clustered index is not on primary key, the database may be spending a lot of time reordering.
How many other indexes do you have?
Do you have any triggers?
Where does the execution plan indicate the time is being spent?
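The checks listed above can be run directly against the catalog views (a sketch assuming the table lives in the dbo schema):

```sql
-- Which indexes exist on tblSearches, and which one is clustered?
SELECT name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.tblSearches');

-- Are there any triggers on the table?
SELECT name, is_disabled
FROM sys.triggers
WHERE parent_id = OBJECT_ID('dbo.tblSearches');
```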