SELECT hangs when using a variable - sql

SQL Server 2016 (v13.0.4001.0) - this sample script hangs:
DECLARE @from int = 0
DECLARE @to int = 1000

select *
from TaskNote dtn
join Participants tp on dtn.Task_ID = tp.TaskId
where dtn.TaskNote_ID between @from and @to
But if I change the variables to constants, it is all OK. Like this:
where dtn.TaskNote_ID between 0 and 1000
Also, if I remove the join, all is OK.
I can't figure out where the problem is.

A possible cause for the problem you mention, in case your query lies within a stored procedure, is parameter sniffing. SQL Server compiles the query for the first time using the initial values of the parameters. In subsequent calls to the procedure the engine uses the cached execution plan which is probably not optimal for the current variable values.
One workaround for this problem is to use OPTION (RECOMPILE):
select *
from TaskNote dtn
join Participants tp on dtn.Task_ID = tp.TaskId
where dtn.TaskNote_ID between @from and @to
option (recompile)
This way the query is compiled on every execution, using the current parameter values. As a bonus, OPTION (RECOMPILE) also lets the optimizer see the actual values of plain local variables, which it otherwise has to estimate from average density.
Further reading:
Parameter Sniffing Problem and Possible Workarounds

Related

Big difference in Estimated and Actual rows when using a local variable

This is my first post on Stackoverflow so I hope I'm correctly following all protocols!
I'm struggling with a stored procedure in which I create a table variable and fill it with an insert statement using an inner join. The insert itself is simple, but it gets complicated because the join is filtered on a local variable. Since the optimizer doesn't have statistics for this variable, my estimated row count is getting screwed up.
The specific piece of code that causes trouble:
declare @minorderid int

select @minorderid = MIN(lo.order_id)
from [order] lo with(nolock)
where lo.order_datetime >= @datefrom

insert into @OrderTableLog_initial
    (order_id, order_log_id, order_datetime, account_id, domain_id)
select ot.order_id, lol.order_log_id, ot.order_datetime, ot.account_id, ot.domain_id
from [order] ot with(nolock)
inner join order_log lol with(nolock)
    on ot.order_id = lol.order_id
    and ot.order_datetime >= @datefrom
where (ot.domain_id in (1, 2, 4)
       and lol.order_log_id not in (select order_log_id
                                    from dbo.order_log_detail lld with(nolock)
                                    where order_id >= @minorderid))
   or (ot.domain_id = 3
       and ot.order_id not in (select order_id
                               from dbo.order_log_detail_spa llds with(nolock)
                               where order_id >= @minorderid))
order by lol.order_id, lol.order_log_id
The @datefrom local variable is also declared earlier in the stored procedure:
declare @datefrom datetime

if datepart(hour, GETDATE()) between 4 and 9
begin
    set @datefrom = '2011-01-01'
end
else
begin
    set @datefrom = DATEADD(DAY, -2, GETDATE())
end
I've also tested this with a temporary table instead of a table variable, but nothing changes. However, when I replace the local variable >= @datefrom with a fixed datestamp, my estimates and actuals are almost the same.
ot.order_datetime >= @datefrom: [SQL Sentry Plan Explorer screenshot]
ot.order_datetime >= '2017-05-03 18:00:00.000': [SQL Sentry Plan Explorer screenshot]
I've come to understand that there's a way to fix this by turning the code into a dynamic SQL stored procedure, but I'm not sure how to do that. I would be grateful for any suggestions. Maybe I have to take a completely different approach? Forgive me if I forgot to mention something; this is my first post.
EDIT:
MSSQL version = 11.0.5636
I've also tested with trace flag 2453, but with no success
Best regards,
Peter
Indeed, the behavior you are experiencing is caused by the variables. SQL Server won't store an execution plan for each and every possible input, so for some queries the cached execution plan may not be optimal for the current values.
To answer your explicit question: you'll have to build the query as a string in an nvarchar variable, then execute it.
Some notes before the actual code:
This can be prone to SQL injection (in general)
SQL Server will store the plans separately, meaning they will use more memory and possibly knock out other plans from the cache
Using an imaginary setup, this is what you want to do:
DECLARE @inputDate DATETIME2 = '2017-01-01 12:21:54';
DECLARE @dynamicSQL NVARCHAR(MAX) = CONCAT('SELECT col1, col2 FROM MyTable WHERE myDateColumn = ''', FORMAT(@inputDate, 'yyyy-MM-dd HH:mm:ss'), ''';');

INSERT INTO @myTableVar (col1, col2)
EXEC sp_executesql @stmt = @dynamicSQL;
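If you want to see exactly what the optimizer will compile, print the assembled statement before executing it; the date has been baked in as a literal, which is what gives the optimizer a concrete value to estimate against:
PRINT @dynamicSQL;
-- prints: SELECT col1, col2 FROM MyTable WHERE myDateColumn = '2017-01-01 12:21:54';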
As an additional note:
You can try to use EXISTS and NOT EXISTS instead of IN and NOT IN (see the sketch after this list).
You can try to use a temp table (#myTempTable) instead of a table variable. Physical temp tables can perform better with large amounts of data, and you can put indexes on them. (For more info you can go here: What's the difference between a temp table and table variable in SQL Server? or to the official documentation.)
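For instance, a sketch of the first NOT IN from your query rewritten with NOT EXISTS (same tables and aliases as above); unlike NOT IN, NOT EXISTS also behaves predictably when the subquery can return NULLs:
-- instead of:
--   lol.order_log_id not in (select order_log_id
--                            from dbo.order_log_detail lld with(nolock)
--                            where order_id >= @minorderid)
-- you would write:
and not exists (select 1
                from dbo.order_log_detail lld with(nolock)
                where lld.order_id >= @minorderid
                  and lld.order_log_id = lol.order_log_id)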

SQL Server 2008 Stored proc - Optimizer thinks my parameter is nullable

The optimizer seems to be getting confused about the nullability of a varchar parameter and I'm not sure I understand why. I'm using SQL Server 2008, btw. All columns being queried are indexed. The TDate column is a clustered, partitioned index. The FooValue column is an indexed, non-nullable column.
Example:
CREATE PROCEDURE dbo.MyExample_sp
    @SDate DATETIME, @EDate DATETIME, @FooValue VARCHAR(50)
AS
SET NOCOUNT ON

-- To avoid parameter spoofing / sniffing
DECLARE @sDate1 DATETIME, @eDate1 DATETIME
SET @sDate1 = @SDate
SET @eDate1 = @EDate

SELECT
    fd.Col1,
    fd.Col2,
    fd.TDate,
    fl.FooValue,
    fd.AccountNum
FROM dbo.FooData fd
INNER JOIN dbo.FooLookup fl
    ON fl.FL_ID = fd.FL_ID
WHERE fd.TDate >= @sDate1
    AND fd.TDate < @eDate1
    AND fl.FooValue = @FooValue
Running this as a plain query works as expected: all index seeks, no spoofing, etc. Running it by executing the sproc takes 20 times longer - same query, same parameters. However, if I make the following change (on the very last line), everything works again.
CREATE PROCEDURE dbo.MyExample_sp
    @SDate DATETIME, @EDate DATETIME, @FooValue VARCHAR(50)
AS
SET NOCOUNT ON

-- To avoid parameter spoofing / sniffing
DECLARE @sDate1 DATETIME, @eDate1 DATETIME
SET @sDate1 = @SDate
SET @eDate1 = @EDate

SELECT
    fd.Col1,
    fd.Col2,
    fd.TDate,
    fl.FooValue,
    fd.AccountNum
FROM dbo.FooData fd
INNER JOIN dbo.FooLookup fl
    ON fl.FL_ID = fd.FL_ID
WHERE fd.TDate >= @sDate1
    AND fd.TDate < @eDate1
    AND fl.FooValue = ISNULL(@FooValue, 'testthis')
It's like the optimizer is getting confused about whether the parameter is nullable or not. Also, adding a default value to the parameter doesn't make any difference: it still takes forever for the sproc to run unless I use = ISNULL(@parameter, 'some constant').
I'm happy I figured this out. But, I'd like to understand why this is happening and if there was a more elegant way to resolve the issue.
Re: Nullable variables
There is no concept of a nullable variable in T-SQL the way you can define a nullable variable in C# using ?; every T-SQL variable and parameter can hold NULL.
If you have a parameter in a stored procedure, the end user can pass whatever he or she wants into the stored procedure, be it a real value or a null.
Re: the query plan
The query plan that gets cached is the one generated the first time you call the stored procedure. So if you passed in a null for @FooValue the very first time you ran it, the plan will be optimized for @FooValue = null.
There is an OPTIMIZE FOR hint that you can use to optimize the query for some other value:
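For example, a sketch for this procedure, borrowing the constant from your ISNULL workaround (use whatever value is representative of your real workload):
SELECT fd.Col1, fd.Col2, fd.TDate, fl.FooValue, fd.AccountNum
FROM dbo.FooData fd
INNER JOIN dbo.FooLookup fl
    ON fl.FL_ID = fd.FL_ID
WHERE fd.TDate >= @sDate1
    AND fd.TDate < @eDate1
    AND fl.FooValue = @FooValue
OPTION (OPTIMIZE FOR (@FooValue = 'testthis'))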
Or you can use WITH RECOMPILE, which will force the query plan to get regenerated on every run of the stored procedure.
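The hint goes between the parameter list and AS, for example:
CREATE PROCEDURE dbo.MyExample_sp
    @SDate DATETIME, @EDate DATETIME, @FooValue VARCHAR(50)
WITH RECOMPILE
AS
SET NOCOUNT ON
-- ... rest of the procedure body unchanged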
Obviously there are trade-offs when using these types of hints, so make sure you understand them before using them.

No query plan for procedure in SQL Server 2005

We have a SQL Server DB with 150-200 stored procs, all of which produce a viewable query plan in sys.dm_exec_query_plan except for one. According to http://msdn.microsoft.com/en-us/library/ms189747.aspx:
Under the following conditions, no Showplan output is returned in the query_plan column of the returned table for sys.dm_exec_query_plan:
If the query plan that is specified by using plan_handle has been evicted from the plan cache, the query_plan column of the returned table is null. For example, this condition may occur if there is a time delay between when the plan handle was captured and when it was used with sys.dm_exec_query_plan.
Some Transact-SQL statements are not cached, such as bulk operation statements or statements containing string literals larger than 8 KB in size. XML Showplans for such statements cannot be retrieved by using sys.dm_exec_query_plan unless the batch is currently executing because they do not exist in the cache.
If a Transact-SQL batch or stored procedure contains a call to a user-defined function or a call to dynamic SQL, for example using EXEC (string), the compiled XML Showplan for the user-defined function is not included in the table returned by sys.dm_exec_query_plan for the batch or stored procedure. Instead, you must make a separate call to sys.dm_exec_query_plan for the plan handle that corresponds to the user-defined function.
And later..
Due to a limitation in the number of nested levels allowed in the xml data type, sys.dm_exec_query_plan cannot return query plans that meet or exceed 128 levels of nested elements.
I'm confident that none of these apply to this procedure. The result never has a query plan, no matter what the timing, so 1 doesn't apply. There are no long string literals or bulk operations, so 2 doesn't apply. There are no user defined functions or dynamic SQL, so 3 doesn't apply. And there's little nesting, so the last doesn't apply. In fact, it's a very simple proc, which I'm including in full (with some table names changed to protect the innocent). Note that the parameter-sniffing shenanigans postdate the problem. It still happens even if I use the parameters directly in the query. Any ideas on why I don't have a viewable query plan for this proc?
ALTER PROCEDURE [dbo].[spGetThreadComments]
    @threadId int,
    @stateCutoff int = 80,
    @origin varchar(255) = null,
    @includeComments bit = 1,
    @count int = 100000
AS

if (@count is null)
begin
    select @count = 100000
end

-- copy parameters to local variables to avoid parameter sniffing
declare @threadIdL int, @stateCutoffL int, @originL varchar(255), @includeCommentsL bit, @countL int
select @threadIdL = @threadId, @stateCutoffL = @stateCutoff, @originL = @origin,
       @includeCommentsL = @includeComments, @countL = @count

set rowcount @countL

if (@originL = 'Foo')
begin
    select * from FooComments (nolock)
    where threadId = @threadId and statusCode <= @stateCutoff
    order by isnull(parentCommentId, commentId), dateCreated
end
else
begin
    if (@includeCommentsL = 1)
    begin
        select * from Comments (nolock)
        where threadId = @threadIdL and statusCode <= @stateCutoffL
        order by isnull(parentCommentId, commentId), dateCreated
    end
    else
    begin
        select userId, commentId from Comments (nolock)
        where threadId = @threadIdL and statusCode <= @stateCutoffL
        order by isnull(parentCommentId, commentId), dateCreated
    end
end
Hmm, perhaps the tables aren't really tables; they could be views or something else.
Try putting dbo. (or whatever the schema is) in front of all of the table names, and then check again.
see this article:
http://www.sommarskog.se/dyn-search-2005.html
quote from the article:
As you can see, I refer to all tables in two-part notation. That is, I also specify the schema (which in SQL 7/2000 parlance normally is referred to as owner). If I would leave out the schema, each user would get his own private version of the query plan.
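Applied to the procedure above (assuming the tables live in dbo), that would mean, for example:
select * from dbo.Comments with (nolock)
where threadId = @threadIdL and statusCode <= @stateCutoffL
order by isnull(parentCommentId, commentId), dateCreated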

Forcing a SQL Remote Query to filter remotely instead of locally

I have an MS SQL query that is pulling data from a remote server. The data I'm pulling down needs to be filtered by a date that is determined at run time. When I run the query like this:
SELECT * FROM SERVER.Database.dbo.RemoteView
WHERE EntryDate > '1/1/2009'
then the filter is applied remotely... However, I don't actually want to use '1/1/2009' as the date - I want the date to be supplied by a user-defined function, like this:
SELECT * FROM SERVER.Database.dbo.RemoteView
WHERE EntryDate > dbo.MyCustomCLRDateFunction()
where the function is a custom CLR scalar-valued function that returns a datetime. (You may ask why I need to do this... the details are a bit complicated, so just trust me - I have to do it this way.)
When I run this query, the remote query is NOT filtered remotely - the filtering is done after all of the data is pulled down (400,000 rows vs 100,000 rows) and it makes a significant difference.
Is there a way that I can force the query to do the filtering remotely?
Thanks!
You could also construct a string and use OPENQUERY:
declare @sqlString nvarchar(max)

set @sqlString =
    'select * into myTable from openquery
     (remoteServer,
      ''SELECT * FROM Database.dbo.RemoteView WHERE EntryDate > ''''%DTSTART'''''')'

-- style 112 formats the date as YYYYMMDD, which fits char(8) and parses unambiguously
set @sqlString =
    replace(@sqlString, '%DTSTART',
        (select convert(char(8), dbo.MyCustomCLRDateFunction(), 112)))

EXECUTE sp_executesql @stmt = @sqlString
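This works because, by the time the statement runs, the date has been spliced into the pass-through text as a literal, and OPENQUERY sends that text to the linked server verbatim, so the WHERE clause is evaluated remotely before any rows are shipped back.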
Can't you just send a query like this, or does the CLR function actually have to be called inside the select statement?
Declare @datetime datetime
Set @datetime = dbo.MyCustomCLRDateFunction()

SELECT * FROM SERVER.Database.dbo.RemoteView
WHERE EntryDate > @datetime
You need to properly decorate your CLR function to mark it as deterministic and precise, with both DataAccess and SystemDataAccess set to None.

Different execution plan when executing statement directly and from stored procedure

While developing a new query at work, I wrote it and profiled it in SQL Query Analyzer. The query performed really well, without any table scans, but when I encapsulated it within a stored procedure the performance was horrible. When I looked at the execution plan I could see that SQL Server picked a different plan that used a table scan instead of an index seek on TableB (I've been forced to obfuscate the table and column names a bit, but none of the query logic has changed).
Here's the query
SELECT
DATEADD(dd, 0, DATEDIFF(dd, 0, TableA.Created)) AS Day,
DATEPART(hh, TableA.Created) AS [Hour],
SUM(TableB.Quantity) AS Quantity,
SUM(TableB.Amount) AS Amount
FROM
TableA
INNER JOIN TableB ON TableA.BID = TableB.ID
WHERE
(TableA.ShopId = @ShopId)
GROUP BY
DATEADD(dd, 0, DATEDIFF(dd, 0, TableA.Created)),
DATEPART(hh, TableA.Created)
ORDER BY
DATEPART(hh, TableA.Created)
When I run the query "raw" I get the following trace stats
Event Class Duration CPU Reads Writes
SQL:StmtCompleted 75 41 7 0
And when I run the query as a stored proc using the following command
DECLARE @ShopId int
SELECT @ShopId = 1

EXEC spStats_GetSalesStatsByHour @ShopId
I get the following trace stats
Event Class Duration CPU Reads Writes
SQL:StmtCompleted 222 10 48 0
I also get the same result if I store the query in an nvarchar and execute it using sp_executesql like this (it performs like the sproc):
DECLARE @SQL nvarchar(2000)
SET @SQL = 'SELECT DATEADD(dd, ...'

exec sp_executesql @SQL
The stored procedure does not contain anything except for the select statement above. What would cause sql server to pick an inferior execution plan just because the statement is executed as a stored procedure?
We're currently running on SQL Server 2000
This generally has something to do with parameter sniffing. It can be very frustrating to deal with. Sometimes it can be solved by recompiling the stored procedure, and sometimes you can even use a duplicate variable inside the stored procedure like this:
alter procedure p_myproc (@p1 int) as
declare @p1_copy int;
set @p1_copy = @p1;
And then use @p1_copy in the query. Seems ridiculous but it works.
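Applied to the procedure from the question, a sketch (assuming spStats_GetSalesStatsByHour takes @ShopId as its only parameter):
alter procedure spStats_GetSalesStatsByHour (@ShopId int) as
declare @ShopId_copy int;
set @ShopId_copy = @ShopId;

SELECT
    DATEADD(dd, 0, DATEDIFF(dd, 0, TableA.Created)) AS Day,
    DATEPART(hh, TableA.Created) AS [Hour],
    SUM(TableB.Quantity) AS Quantity,
    SUM(TableB.Amount) AS Amount
FROM TableA
INNER JOIN TableB ON TableA.BID = TableB.ID
WHERE (TableA.ShopId = @ShopId_copy)  -- filter on the local copy, not the parameter
GROUP BY
    DATEADD(dd, 0, DATEDIFF(dd, 0, TableA.Created)),
    DATEPART(hh, TableA.Created)
ORDER BY
    DATEPART(hh, TableA.Created)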
Check my recent question on the same topic:
Why does the SqlServer optimizer get so confused with parameters?
Yes - I have seen this on Oracle DB 11g as well: the same query ran fast on two nodes of the DB server at the SQL prompt, but when called from a package it literally hung!
I had to clear the shared pool to get identical behaviour; the reason was that some job/script had an older copy of the plan, with an inferior execution plan, locked in the library cache/memory on one node.