SARGable query where parameter is NULL

I have the following stored procedure:
CREATE PROCEDURE my_proc
(
    @Person VARCHAR(50)
)
AS
SELECT updatedBy
FROM MyTable t
WHERE updatedBy = @Person AND updatedDate > DATEADD(day, -1, GETDATE())
   OR @Person IS NULL
The last line (OR @Person IS NULL) ensures that if no parameter is passed, all records are returned. However, looking at the execution plan, this causes a clustered index scan when calling the procedure, even when a parameter is supplied.
If I take out the OR, the query uses an index seek.
Is there a way I can keep the catch-all behaviour without causing an index scan?

This is an advantage of a stored procedure: you can build two queries and use an IF block to decide which one to run.
Alternatively, you could build this as dynamic SQL that only includes the WHERE clause elements you actually need, then use sp_executesql to run the result.
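A minimal sketch of the IF approach, based on the procedure from the question. Note that with the original OR, a NULL parameter also bypasses the date filter, so the catch-all branch below returns every row:
CREATE PROCEDURE my_proc
(
    @Person VARCHAR(50)
)
AS
BEGIN
    IF @Person IS NULL
    BEGIN
        -- Catch-all: with the original OR, a NULL parameter returned every
        -- row (the date filter was bypassed), so this branch does the same.
        SELECT updatedBy
        FROM MyTable t;
    END
    ELSE
    BEGIN
        -- Filtered branch: compiled as its own statement, so it can use an
        -- index seek on updatedBy.
        SELECT updatedBy
        FROM MyTable t
        WHERE updatedBy = @Person
          AND updatedDate > DATEADD(day, -1, GETDATE());
    END
END
Each branch gets its own execution plan, so the filtered branch is no longer forced into the scan that the catch-all OR caused.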

Related

Update from stored procedure return

I have a stored procedure that I want to run on every row in a table that matches a WHERE clause. The procedure already exists on the server and is used in other places, so it cannot be modified for this.
The stored procedure returns a scalar value, and I need to store this value in a column in the table. I've tried using this UPDATE:
UPDATE tbl SET tbl.Quantity =
    EXEC checkQuantity @ProductID = tbl.ProductID, @Quantity = tbl.Quantity
FROM orders tbl WHERE orderNumber = @orderNumber
But this of course doesn't work. Is there a way to do it without multiple queries, i.e. without reading the line info, running the proc in a loop, and then updating the original rows?
No, there is no way to do this without multiple queries; this is one of the few scenarios where a cursor or loop is necessary.
The exception is if you can replace the stored procedure with a user-defined function, which can run in the context of a single query.
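For example, a sketch of the loop approach using the names from the question, assuming this runs inside a procedure that receives @orderNumber. I'm also assuming checkQuantity exposes its scalar result through an OUTPUT parameter called @Result; if it returns the value as a result set instead, you would capture it with INSERT ... EXEC into a table variable:
DECLARE @ProductID INT, @Quantity INT, @NewQuantity INT;

DECLARE line_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT ProductID, Quantity
    FROM orders
    WHERE orderNumber = @orderNumber;

OPEN line_cursor;
FETCH NEXT FROM line_cursor INTO @ProductID, @Quantity;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- @Result is an assumed OUTPUT parameter on the existing procedure.
    EXEC checkQuantity @ProductID = @ProductID,
                       @Quantity  = @Quantity,
                       @Result    = @NewQuantity OUTPUT;

    -- Write the scalar back to the row we just read.
    UPDATE orders
    SET Quantity = @NewQuantity
    WHERE orderNumber = @orderNumber
      AND ProductID = @ProductID;

    FETCH NEXT FROM line_cursor INTO @ProductID, @Quantity;
END

CLOSE line_cursor;
DEALLOCATE line_cursor;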

Stored procedure execution taking long because of function used inside

In SQL Server 2012 I have the following user-defined function:
CREATE FUNCTION [dbo].[udfMaxDateTime]()
RETURNS datetime
AS
BEGIN
RETURN '99991231';
END;
This is then being used in a stored procedure like so:
DECLARE @MaxDate datetime = dbo.udfMaxDateTime();

DELETE FROM TABLE_NAME
WHERE ValidTo = @MaxDate
  AND Id NOT IN
      (
          SELECT MAX(Id)
          FROM TABLE_NAME
          WHERE ValidTo = @MaxDate
          GROUP BY COL1
      );
Now, if I run the stored procedure with the above code, it takes around 12 seconds to execute (1.2 million rows).
If I change the WHERE clauses to ValidTo = '99991231', the stored procedure runs in under 1 second, and it runs in parallel.
Could anyone explain why this is happening?
It is not because of the user-defined function; it is because of the variable.
When you use a variable @MaxDate in the DELETE, the query optimizer doesn't know the value of this variable when generating the execution plan. So, it generates a plan based on the available statistics on the ValidTo column and some built-in heuristic rules for cardinality estimates for equality comparisons.
When you use a literal constant in the query, the optimizer knows its value and can generate a more efficient plan.
If you add OPTION (RECOMPILE), the execution plan will not be cached and will always be regenerated, with all variable values known to the optimizer. It is quite likely that the query will run fast with this option. The option does add compilation overhead, but that is noticeable only when you run the query very often.
DECLARE @MaxDate datetime = dbo.udfMaxDateTime();

DELETE FROM TABLE_NAME
WHERE ValidTo = @MaxDate
  AND Id NOT IN
      (
          SELECT MAX(Id)
          FROM TABLE_NAME
          WHERE ValidTo = @MaxDate
          GROUP BY COL1
      )
OPTION (RECOMPILE);
I highly recommend reading Slow in the Application, Fast in SSMS by Erland Sommarskog.

Does assigning stored procedure input parameters to local variables help optimize the query?

I have a stored procedure that takes 5 input parameters. The procedure is a bit complicated and takes around 2 minutes to execute, and I am in the process of optimizing it.
So, my question is: does it always help to assign input parameters to local variables and then use the local variables in the procedure?
If so, how does it help?
I will not try to explain the full details of parameter sniffing, but in short: no, it does not always help (and it can hinder).
Imagine a table (T) with a primary key and an indexed date column (A). The table has 1,000 rows: 400 share the same value of A (let's say today, 20130122), and the remaining 600 rows cover the next 600 days, one record per date.
This query:
SELECT *
FROM T
WHERE A = '20130122';
Will yield a different execution plan to:
SELECT *
FROM T
WHERE A = '20130123';
The statistics will indicate that the first query returns 400 of the 1,000 rows, so the optimiser should recognise that a table scan will be more efficient than a bookmark lookup, whereas the second query yields only 1 row, so a bookmark lookup will be much more efficient.
Now, back to your question, if we made this a procedure:
CREATE PROCEDURE dbo.GetFromT @Param DATE
AS
SELECT *
FROM T
WHERE A = @Param
Then run:
EXECUTE dbo.GetFromT '20130122'; -- 400 rows
The query plan with the table scan will be cached. If instead the first time you run it you use '20130123' as the parameter, it will cache the bookmark lookup plan. Until the procedure is recompiled, the plan will remain the same. Doing something like this:
CREATE PROCEDURE dbo.GetFromT @Param DATE
AS
DECLARE @Param2 DATE = @Param;
SELECT *
FROM T
WHERE A = @Param2
Then this is run:
EXECUTE dbo.GetFromT '20130122';
While the procedure is compiled in one go, the optimiser does not track variable assignments, so the query plan created at the first compilation has no idea that @Param2 will take the same value as @Param. With no knowledge of how many rows to expect, the optimiser assumes 300 will be returned (30%), and as such deems a table scan more efficient than a bookmark lookup. If you ran the same procedure with '20130123' as the parameter, it would yield the same plan (regardless of which parameter it was first invoked with), because the statistics cannot be used for an unknown value. So running this procedure for '20130122' would be more efficient, but for all other values it would be less efficient than without local variables (assuming the procedure without local variables was first invoked with anything but '20130122').
Some queries to demonstrate this, so you can view the execution plans for yourself.
Create the schema and sample data:
CREATE TABLE T (ID INT IDENTITY(1, 1) PRIMARY KEY, A DATE NOT NULL, B INT, C INT, D INT, E INT);
CREATE NONCLUSTERED INDEX IX_T ON T (A);
INSERT T (A, B, C, D, E)
SELECT TOP 400 CAST('20130122' AS DATE), number, 2, 3, 4
FROM Master..spt_values
WHERE type = 'P'
UNION ALL
SELECT TOP 600 DATEADD(DAY, number, CAST('20130122' AS DATE)), number, 2, 3, 4
FROM Master..spt_values
WHERE Type = 'P';
GO
CREATE PROCEDURE dbo.GetFromT @Param DATE
AS
SELECT *
FROM T
WHERE A = @Param
GO
CREATE PROCEDURE dbo.GetFromT2 @Param DATE
AS
DECLARE @Param2 DATE = @Param;
SELECT *
FROM T
WHERE A = @Param2
GO
Run procedures (showing actual execution plan):
EXECUTE GetFromT '20130122';
EXECUTE GetFromT '20130123';
EXECUTE GetFromT2 '20130122';
EXECUTE GetFromT2 '20130123';
GO
EXECUTE SP_RECOMPILE GetFromT;
EXECUTE SP_RECOMPILE GetFromT2;
GO
EXECUTE GetFromT '20130123';
EXECUTE GetFromT '20130122';
EXECUTE GetFromT2 '20130123';
EXECUTE GetFromT2 '20130122';
You will see that the first time GetFromT is compiled it uses a table scan, and it retains this plan when run with the parameter '20130123'; GetFromT2 also uses a table scan and retains that plan for '20130123'.
After the procedures have been marked for recompilation and run again (note: in a different order), GetFromT uses a bookmark lookup and retains that plan for '20130122', despite having previously deemed a table scan the more appropriate plan. GetFromT2 is unaffected by the order and has the same plan as before the recompilation.
So, in summary, whether a procedure will benefit from using local variables depends on the distribution of your data, your indexes, your frequency of recompilation, and a bit of luck. It certainly does not always help.
Hopefully I have shed some light on the effect of local parameters on execution plans and stored procedure compilation. If I have failed completely, or missed a key point, a much more in-depth explanation can be found here:
http://www.sommarskog.se/query-plan-mysteries.html
I don't believe so. Modern computer architectures have plenty of cache close to the processor for holding stored procedure values. Essentially, you can consider these as being on a "stack" which gets loaded into local cache memory.
If you have output parameters, then copying input values to a local variable could eliminate one step of indirection. However, the first time that indirection is executed, the destination memory will be put in the local cache, and it will probably remain there.
So, no, I don't think this is an important optimization.
But you could always time different variants of a stored procedure to see if this would help.
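For example, a rough timing harness (a sketch; the two procedure names are placeholders for your variants):
-- Compare elapsed and CPU time for two variants of the same procedure.
SET STATISTICS TIME ON;

EXEC dbo.MyProc_Parameters     @Param = '20130122';  -- placeholder name
EXEC dbo.MyProc_LocalVariables @Param = '20130122';  -- placeholder name

SET STATISTICS TIME OFF;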
It does help.
The links below contain more details about parameter sniffing:
http://blogs.msdn.com/b/turgays/archive/2013/09/10/parameter-sniffing-problem-and-workarounds.aspx
http://sqlperformance.com/2013/08/t-sql-queries/parameter-sniffing-embedding-and-the-recompile-options
When you execute an SP with parameters for the first time, the query optimizer creates the query plan based on the values of those parameters.
The optimizer uses the statistics for those particular values to decide the best query plan, but skewed cardinality can make this a problem: if you execute the same SP with different parameter values, the previously generated query plan may not be the best plan for them.
By assigning the parameters to local variables, we hide the parameter values from the query optimizer, so it creates a query plan for the general case.
This is the same as using the OPTIMIZE FOR UNKNOWN hint in the SP.
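For reference, a minimal sketch of that hint applied to the GetFromT example from the answer above (dbo.GetFromT3 is a made-up name):
-- Same effect as copying @Param into a local variable: the optimizer
-- ignores the sniffed value and uses average density statistics instead.
CREATE PROCEDURE dbo.GetFromT3 @Param DATE
AS
SELECT *
FROM T
WHERE A = @Param
OPTION (OPTIMIZE FOR UNKNOWN);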

Optional parameters, "index seek" plan

In my SELECT statement I use optional parameters in a way like this:
DECLARE @p1 INT = 1
DECLARE @p2 INT = 1
SELECT name FROM some_table WHERE (id = @p1 OR @p1 IS NULL) AND (name = @p2 OR @p2 IS NULL)
In this case the optimizer generates "index scan" (not seek) operations, which is not the most effective approach when the parameters are supplied with non-null values.
If I add the RECOMPILE hint to the query, the optimizer builds a more effective plan which uses a "seek". It works on my MS SQL 2008 R2 SP1 server, and it also means the optimizer CAN build a plan which considers only one logical branch of my query.
How can I make it use that plan everywhere I want without recompiling? The USE PLAN hint seems not to work in this case.
Below is test code:
-- see plans
CREATE TABLE test_table(
    id INT IDENTITY(1,1) NOT NULL,
    name varchar(10),
    CONSTRAINT [pk_test_table] PRIMARY KEY CLUSTERED (id ASC))
GO
INSERT INTO test_table(name) VALUES ('a'),('b'),('c')
GO
DECLARE @p INT = 1
SELECT name FROM test_table WHERE id = @p OR @p IS NULL
SELECT name FROM test_table WHERE id = @p OR @p IS NULL OPTION(RECOMPILE)
GO
DROP TABLE test_table
GO
Note that not all versions of SQL Server will change the plan the way I have shown.
The reason you get a scan is that the predicate will not short-circuit: both sides of the OR will always be evaluated. As you have already stated, this does not work well with the optimizer and forces a scan. Even though RECOMPILE appears to help sometimes, it is not consistent.
If you have a large table where seeks are a must, then you have two options:
Dynamic SQL (see the sketch below).
IF statements separating your queries, thus creating separate execution plans (when @p is null you will of course still get a scan).
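A sketch of the dynamic SQL option with sp_executesql, with @p1 and @p2 declared as in the question (types kept as in the question); the WHERE 1 = 1 stub is just a convenience so every appended predicate can start with AND:
DECLARE @sql NVARCHAR(MAX) = N'SELECT name FROM some_table WHERE 1 = 1';

-- Append only the predicates whose parameters were actually supplied,
-- so each combination compiles to its own seek-friendly cached plan.
IF @p1 IS NOT NULL SET @sql += N' AND id = @p1';
IF @p2 IS NOT NULL SET @sql += N' AND name = @p2';

EXEC sp_executesql @sql,
     N'@p1 INT, @p2 INT',
     @p1 = @p1, @p2 = @p2;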
Response to Comment on Andreas' Answer
The problem is that you need two different plans.
If @p1 = 1, then you can use a SEEK on the index.
If @p1 IS NULL, however, it is not a seek; by definition it is a SCAN.
This means that when the optimiser generates a plan prior to knowing the parameter values, it needs a plan that can fulfil all possibilities. Only a scan can cover the needs of both @p1 = 1 and @p1 IS NULL.
It also means that if the plan is recompiled at the time when the parameters are known, and @p1 = 1, a SEEK plan can be created.
This is the reason that, as you mention in your comment, IF statements resolve your problem: each IF block represents a different portion of the problem space, and each can be given a different execution plan.
See Dynamic Search Conditions in T-SQL.
It explains comprehensively the versions where the RECOMPILE option works and the alternatives where it doesn't.
Look at this article: http://www.bigresource.com/Tracker/Track-ms_sql-fTP7dh01/
It seems that you can try the proposed solution:
`SELECT * FROM <table> WHERE ISNULL(column, -1) = ISNULL(@value, -1)`
or
`SELECT * FROM <table> WHERE COALESCE(column, -1) = COALESCE(@value, -1)`

No query plan for procedure in SQL Server 2005

We have a SQL Server DB with 150-200 stored procs, all of which produce a viewable query plan in sys.dm_exec_query_plan except for one. According to http://msdn.microsoft.com/en-us/library/ms189747.aspx:
Under the following conditions, no Showplan output is returned in the query_plan column of the returned table for sys.dm_exec_query_plan:
If the query plan that is specified by using plan_handle has been evicted from the plan cache, the query_plan column of the returned table is null. For example, this condition may occur if there is a time delay between when the plan handle was captured and when it was used with sys.dm_exec_query_plan.
Some Transact-SQL statements are not cached, such as bulk operation statements or statements containing string literals larger than 8 KB in size. XML Showplans for such statements cannot be retrieved by using sys.dm_exec_query_plan unless the batch is currently executing because they do not exist in the cache.
If a Transact-SQL batch or stored procedure contains a call to a user-defined function or a call to dynamic SQL, for example using EXEC (string), the compiled XML Showplan for the user-defined function is not included in the table returned by sys.dm_exec_query_plan for the batch or stored procedure. Instead, you must make a separate call to sys.dm_exec_query_plan for the plan handle that corresponds to the user-defined function.
And later..
Due to a limitation in the number of nested levels allowed in the xml data type, sys.dm_exec_query_plan cannot return query plans that meet or exceed 128 levels of nested elements.
I'm confident that none of these apply to this procedure. The result never has a query plan, no matter what the timing, so the first condition doesn't apply. There are no long string literals or bulk operations, so the second doesn't apply. There are no user-defined functions or dynamic SQL, so the third doesn't apply. And there's little nesting, so the last doesn't apply. In fact, it's a very simple proc, which I'm including in full (with some table names changed to protect the innocent). Note that the parameter-sniffing shenanigans postdate the problem; it still happens even if I use the parameters directly in the query. Any ideas on why I don't have a viewable query plan for this proc?
ALTER PROCEDURE [dbo].[spGetThreadComments]
    @threadId int,
    @stateCutoff int = 80,
    @origin varchar(255) = null,
    @includeComments bit = 1,
    @count int = 100000
AS
if (@count is null)
begin
    select @count = 100000
end

-- copy parameters to local variables to avoid parameter sniffing
declare @threadIdL int, @stateCutoffL int, @originL varchar(255), @includeCommentsL bit, @countL int
select @threadIdL = @threadId, @stateCutoffL = @stateCutoff, @originL = @origin, @includeCommentsL = @includeComments, @countL = @count

set rowcount @countL

if (@originL = 'Foo')
begin
    select * from FooComments (nolock) where threadId = @threadId and statusCode <= @stateCutoff
    order by isnull(parentCommentId, commentId), dateCreated
end
else
begin
    if (@includeCommentsL = 1)
    begin
        select * from Comments (nolock)
        where threadId = @threadIdL and statusCode <= @stateCutoffL
        order by isnull(parentCommentId, commentId), dateCreated
    end
    else
    begin
        select userId, commentId from Comments (nolock)
        where threadId = @threadIdL and statusCode <= @stateCutoffL
        order by isnull(parentCommentId, commentId), dateCreated
    end
end
Hmm, perhaps the tables aren't really tables; they could be views or something else.
Try putting dbo. (or whatever the schema is) in front of all of the table names, and then check again.
See this article: http://www.sommarskog.se/dyn-search-2005.html
A quote from the article:
As you can see, I refer to all tables in two-part notation. That is, I also specify the schema (which in SQL 7/2000 parlance normally is referred to as owner). If I would leave out the schema, each user would get his own private version of the query plan.
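Applied to the procedure above, that just means schema-qualifying every table reference; a sketch based on one of its queries:
-- Two-part names let all users share a single cached plan.
select userId, commentId from dbo.Comments (nolock)
where threadId = @threadIdL and statusCode <= @stateCutoffL
order by isnull(parentCommentId, commentId), dateCreated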