We're seeing strange behavior when running two versions of a query on SQL Server 2005:
version A:
SELECT otherattributes.* FROM listcontacts JOIN otherattributes
ON listcontacts.contactId = otherattributes.contactId WHERE listcontacts.listid = 1234
ORDER BY name ASC
version B:
DECLARE @Id AS INT;
SET @Id = 1234;
SELECT otherattributes.* FROM listcontacts JOIN otherattributes
ON listcontacts.contactId = otherattributes.contactId
WHERE listcontacts.listid = @Id
ORDER BY name ASC
Both queries return 1000 rows; version A takes 15s on average, while version B takes 4s on average.
Could anyone help us understand the difference in execution times of these two versions of SQL?
If we invoke this query via named parameters using NHibernate, we see the following query via SQL Server profiler:
EXEC sp_executesql N'SELECT otherattributes.* FROM listcontacts JOIN otherattributes ON listcontacts.contactId = otherattributes.contactId WHERE listcontacts.listid = @id ORDER BY name ASC',
N'@id INT',
@id=1234;
...and this tends to perform as badly as version A.
Try taking a look at the execution plan for your query. This should give you a better idea of how your query is being executed.
I've not seen the execution plans, but I strongly suspect that they are different in these two cases. The issue you are having is that in case A (the query with the literal value) the optimiser knows the value you are using for the list id (1234) and, using a combination of the distribution statistics and the indexes, chooses an optimal plan.
In the second case, the optimiser is not able to sniff the value of the ID and so produces a plan that would be acceptable for any passed-in list id. And where I say acceptable, I do not mean optimal.
So what can you do to improve the scenario? There are a couple of alternatives here:
1) Create a stored procedure to perform the query as below:
CREATE PROCEDURE Foo
@Id INT
AS
SELECT otherattributes.* FROM listcontacts JOIN otherattributes
ON listcontacts.contactId = otherattributes.contactId WHERE listcontacts.listid = @Id
ORDER BY name ASC
GO
This will allow the optimiser to sniff the value of the input parameter when it is passed in and produce an appropriate execution plan for the first execution. Unfortunately it will cache that plan for reuse later, so unless you generally call the sproc with similarly selective values this may not help you too much.
2) Create a stored procedure as above, but create it WITH RECOMPILE. This will ensure that the stored procedure is recompiled each time it is executed and hence produces a new plan optimised for the current input value.
3) Add OPTION (RECOMPILE) to the end of the SQL statement. This forces recompilation of the statement each time and lets the optimiser use the actual input value.
4) Add OPTION (OPTIMIZE FOR (@Id = 1234)) to the end of the SQL statement. This will cause the plan that gets cached to be optimised for this specific input value. That is great if this is a highly common value, or if the most common values are similarly selective, but not so great if the distribution of selectivity is more widely spread. (Options 3 and 4 are sketched below.)
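For illustration, here is roughly what options 3 and 4 look like when applied to the query from the question (a sketch, using the same tables and the @Id variable from version B):
-- option 3: recompile this statement on every execution
SELECT otherattributes.* FROM listcontacts JOIN otherattributes
ON listcontacts.contactId = otherattributes.contactId
WHERE listcontacts.listid = @Id
ORDER BY name ASC
OPTION (RECOMPILE);
-- option 4: cache a plan optimised for one specific, representative value
SELECT otherattributes.* FROM listcontacts JOIN otherattributes
ON listcontacts.contactId = otherattributes.contactId
WHERE listcontacts.listid = @Id
ORDER BY name ASC
OPTION (OPTIMIZE FOR (@Id = 1234));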
It's possible that instead of casting 1234 to be the same type as listcontacts.listid and then doing the comparison with each row, it might be casting the value in each row to be the same as 1234. The first requires just one cast, the second needs a cast per row (and that's probably on far more than 1000 rows, it may be for every row in the table). I'm not sure what type that constant will be interpreted as but it may be 'numeric' rather than 'int'.
If this is the cause, the second version is faster because it's forcing 1234 to be interpreted as an int and thus removing the need to cast the value in every row.
However, as the previous poster suggests, the query plan shown in SQL Server Management Studio may indicate an alternative explanation.
The best way to see what is happening is to compare the execution plans, everything else is speculation based on the limited details presented in the question.
To see the execution plan, go into SQL Server Management Studio and run SET SHOWPLAN_XML ON, then run query version A. The query will not actually execute, but the execution plan will be returned as XML. Then do the same for query version B and compare its plan. If you still can't tell the difference or solve the problem, post both execution plans and someone here will explain it.
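A minimal sketch of that, bearing in mind that SET SHOWPLAN_XML must be the only statement in its batch:
SET SHOWPLAN_XML ON;
GO
-- version A (literal value); repeat with version B (the @Id variant) and compare the XML plans
SELECT otherattributes.* FROM listcontacts JOIN otherattributes
ON listcontacts.contactId = otherattributes.contactId
WHERE listcontacts.listid = 1234
ORDER BY name ASC;
GO
SET SHOWPLAN_XML OFF;
GO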
I am working with MS SQL Server 2008 R2. I have a stored procedure named rpt_getWeeklyScheduleData. This is the query I used to look up its execution plan in a specific database:
select
*
from
sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
where
OBJECT_NAME(st.objectid, st.dbid) = 'rpt_getWeeklyScheduleData' and
st.dbid = DB_ID()
The above query returns 9 rows; I was expecting 1 row.
This stored procedure has been modified multiple times, so I believe SQL Server has been building a new execution plan for it whenever it was modified and run. Is that the correct explanation? If not, how can this be explained?
Also, is it possible to see when each plan was created? If so, how?
UPDATE:
This is the stored proc's signature:
CREATE procedure [dbo].[rpt_getWeeklyScheduleData]
(
@a_paaipk int,
@a_location_code int,
@a_department_code int,
@a_week_start_date varchar(12),
@a_week_end_date varchar(12),
@a_language_code int,
@a_flag int
)
as
begin
...
end
The stored proc is long; it has only two IF conditions, both on the @a_flag parameter.
if @a_flag = 0
begin
...
end
if @a_flag = 1
begin
...
end
Depending on the nature of the stored procedure (which wasn't provided), this is entirely possible for any number of reasons, most likely including but not limited to the following:
Does the proc contain a lot of branching logic (if this, then this SELECT; else that SELECT/UPDATE)?
Does the proc contain dynamic SQL?
Are you executing the SP from both the web and SSMS? Then you're likely executing the SP with different connection SET options (see the sketch after this list).
Does the stored proc have parameters? Sometimes a difference in parameters can cause one execution plan to be terrible for a specific set, so a different plan is used.
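If the web-vs-SSMS point is the suspect, you can check the set_options attribute of each cached plan; plans compiled under different SET options (ARITHABORT is the usual culprit) are cached separately. A sketch, building on the query from the question:
SELECT cp.plan_handle, pa.attribute, pa.value
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) pa
WHERE OBJECT_NAME(st.objectid, st.dbid) = 'rpt_getWeeklyScheduleData'
AND st.dbid = DB_ID()
AND pa.attribute = 'set_options';  -- different values here mean separate cached plans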
Going to try an analogy which might help... maybe...
Say you have a stored procedure for your weekend shopping.
You typically need to get groceries, sometimes an air filter, and even less often a big pack of something that needs replacing 4 times a year.
The grocery store can handle groceries, and is the closest to your house (5 minutes).
Target can handle the air filter and groceries, but adds 25 minutes of travel time.
"Big place of everything" has everything you'd possibly need, but is an hour's drive away.
So here, depending on your parameters, @needsAirFilter and @needsBigPackOfSomething could vastly change the "execution plan" of your "shopping" stored procedure.
If @needsAirFilter and @needsBigPackOfSomething are false, there's no reason to make the 30-minute or hour-long drive, as everything you need is at the grocery store.
Once a month, @needsAirFilter is true; in that case we need to go to Target, as the grocery store's execution plan is insufficient.
Four times a year @needsBigPackOfSomething is true, and we need to make the hour-long drive to get the big pack of something, grabbing groceries and an air filter while we're there.
Sure... we could make the hour-long drive every time to get groceries, and the other things when needed (imagine a single execution plan). But that is in no way the most efficient way to do it. In instances like this, we have different execution plans for what information/goods are actually needed.
No idea if that helps... but I had fun :D
Typically SQL Server will generate a new query plan depending on the values of the parameters being passed in (which can determine what indexes, if any, it will use). If indexes are added, changed or updated on the tables/views used in the proc, SQL Server may decide that it is more effective to use one or more indexes it previously ignored. The more involved the SQL in the proc, the more work SQL Server does as it attempts to optimize the query. If the data changes (suddenly you have many more customers in NJ and there is a query and index on state), it may decide to use that index, and the query plan changes. Any schema change to the tables or views involved in the query will also invalidate an existing plan and result in a new plan being generated.
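As for the second part of the question (seeing when each plan was created): on SQL Server 2008 and later, sys.dm_exec_procedure_stats returns one row per cached plan of a procedure, including when it was cached. A sketch:
SELECT ps.plan_handle, ps.cached_time, ps.last_execution_time, ps.execution_count
FROM sys.dm_exec_procedure_stats ps
WHERE ps.database_id = DB_ID()
AND ps.object_id = OBJECT_ID('dbo.rpt_getWeeklyScheduleData');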
Suppose we have a poorly performing stored procedure with 6 parameters. If one of the six parameters is transferred to a local variable within the stored procedure, is that enough to disable parameter sniffing, or is it necessary to transfer all 6 parameters passed to the stored procedure into local variables within it?
Per Paul White's comment, assigning a parameter to a local variable is a workaround from older versions of SQL Server. It won't help with sp_executesql, and Microsoft could make the optimizer smart enough to see through it, which would invalidate the workaround. The workaround works by hiding a parameter's value from the optimizer, so for it to work for every parameter you would have to copy each parameter into its own local variable.
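For reference, the old workaround looks roughly like this inside a procedure (a sketch with a hypothetical procedure name and only two parameters shown; for the six-parameter case you would repeat the copy for each parameter you want to shield):
CREATE PROCEDURE dbo.usp_Search   -- hypothetical name
    @par1 INT,
    @par2 INT
AS
BEGIN
    -- copy each parameter into a local variable; the optimizer cannot sniff
    -- a local variable's value, so it falls back to average-density estimates
    DECLARE @lpar1 INT, @lpar2 INT;
    SET @lpar1 = @par1;
    SET @lpar2 = @par2;

    SELECT *
    FROM YourTable
    WHERE col1 = @lpar1 AND col2 = @lpar2;
END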
More recent versions of SQL Server have better solutions. For an expensive query that is not run often, I'd use option (recompile). For example:
SELECT *
FROM YourTable
WHERE col1 = @par1 AND col2 = @par2 AND ...
OPTION (RECOMPILE)
This will cause the query planner to recreate ("recompile") a plan every time the stored procedure is called. Given the low cost of planning (typically below 25ms) that is sensible behavior for expensive queries. It's worth 25ms to check if you can create a smarter plan for specific parameters to a 250ms query.
If your query is run so often that the cost of planning is nontrivial, you can use option (optimize for unknown). That will cause SQL Server to create a plan that it expects to work well for all values of all parameters. When you specify this option, SQL Server ignores the first values of the parameters, so this literally prevents sniffing.
SELECT *
FROM YourTable
WHERE col1 = @par1 AND col2 = @par2 AND ...
OPTION (OPTIMIZE FOR UNKNOWN)
This variant works for all parameters. You can use OPTION (OPTIMIZE FOR (@par1 UNKNOWN)) to prevent sniffing for just one parameter, as sketched below.
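A sketch of that per-parameter form, which leaves sniffing in place for @par2 but disables it for @par1:
SELECT *
FROM YourTable
WHERE col1 = @par1 AND col2 = @par2 AND ...
OPTION (OPTIMIZE FOR (@par1 UNKNOWN))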
I was worn out by this for the last two days, and have now found a solution. I'm posting my experience here in case it brings anyone else some relief.
I have a fairly complex query, with 5 CTEs and a UNION, that was already known to be prone to parameter sniffing because of its design. We opted for OPTION (RECOMPILE) to solve it, and it worked fairly well. After 2 years we created a highly available cluster and separated out the report server. All went well for a year, and then because of Covid-19 we had to shut down for 30 days. The server stayed on but all activity went quiet. Meanwhile we needed to truncate the database because of extreme log size and data growth, taking it out of the availability group and re-adding it. For the last two days this query has been showing parameter sniffing behaviour again, and no remedy has worked except one.
The silver bullet that saved me is EXEC sp_updatestats.
That works for me, and now I have time to find a proper solution for a permanent fix.
I have a SQL query whose exact text is generated in C# and passed through ADO.NET as a text-based SqlCommand.
The query looks something like this:
SELECT TOP (@n)
    a.ID,
    a.Event_Type_ID as EventType,
    a.Date_Created,
    a.Meta_Data
FROM net.Activity a
LEFT JOIN net.vu_Network_Activity na WITH (NOEXPAND)
    ON na.Member_ID = @memberId AND na.Activity_ID = a.ID
LEFT JOIN net.Member_Activity_Xref ma
    ON ma.Member_ID = @memberId AND ma.Activity_ID = a.ID
WHERE
    a.ID < @LatestId
    AND (
        (Event_Type_ID IN (1,2,3))
        OR
        (
            (na.Activity_ID IS NOT NULL OR ma.Activity_ID IS NOT NULL)
            AND Event_Type_ID IN (4,5,6)
        )
    )
ORDER BY a.ID DESC
This query has been working well for quite some time. It takes advantage of some indexes we have on these tables.
In any event, all of a sudden this query started running really slow, but ran almost instantaneously in SSMS.
Eventually, after reading several resources, I was able to verify that the slowdown we were getting was from poor parameter sniffing.
By copying all of the parameters to local variables, I was able to successfully reduce the problem. The thing is, this just feels all kinds of wrong to me.
I'm assuming that what happened was that the statistics of one of these tables were updated, and then, by some crappy luck, the very first time this query was recompiled it was called with parameter values that caused the execution plan to differ?
I was able to track down the query in the Activity Monitor; the execution plan that resulted in the query running in ~13 seconds is shown in the first screenshot (not reproduced here).
Running in SSMS results in a different execution plan (second screenshot, also not reproduced here) and only takes ~100ms.
So what is the question?
I guess my question is this: How can I fix this problem, without copying the parameters to local variables, which could lead to a large number of cached execution plans?
Quote from the linked comment / Jes Borland:
You can use local variables in stored procedures to “avoid” parameter sniffing. Understand, though, that this can lead to many plans stored in the cache. That can have its own performance implications. There isn’t a one-size-fits-all solution to the problem!
My thinking is that if there is some way for me to manually remove the current execution plan from the plan cache, that might just be good enough... but everything I have found online only shows me how to do this for an actual named stored procedure.
This is a text-based SqlCommand coming from C#, so how do I find the cached execution plan, with the sniffed parameter values, and remove it?
Note: the somewhat obvious solution of "just create a proper stored procedure" is difficult to do because this query can get generated in a number of different ways... and would require a somewhat unpleasant refactor.
If you want to remove a specific plan from the cache then it is really a two step process: first obtain the plan handle for that specific plan; and then use DBCC FREEPROCCACHE to remove that plan from the cache.
To get the plan handle, you need to look in the execution plan cache. The T-SQL below is an example of how you could search for the plan and get the handle (you may need to play with the filter clause a bit to hone in on your particular plan):
SELECT top (10)
qs.last_execution_time,
qs.creation_time,
cp.objtype,
SUBSTRING(qt.[text], qs.statement_start_offset/2, (
CASE
WHEN qs.statement_end_offset = -1
THEN LEN(CONVERT(NVARCHAR(MAX), qt.[text])) * 2
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2 + 1
) AS query_text,
qt.text as full_query_text,
tp.query_plan,
qs.sql_handle,
qs.plan_handle
FROM
sys.dm_exec_query_stats qs
LEFT JOIN sys.dm_exec_cached_plans cp ON cp.plan_handle=qs.plan_handle
CROSS APPLY sys.dm_exec_sql_text (qs.[sql_handle]) AS qt
OUTER APPLY sys.dm_exec_query_plan(qs.plan_handle) tp
WHERE qt.text like '%vu_Network_Activity%'
Once you have the plan handle, call DBCC FREEPROCCACHE as below:
DBCC FREEPROCCACHE(<plan_handle>)
There are many ways to delete/invalidate a query plan:
DBCC FREEPROCCACHE(plan_handle)
or
EXEC sp_recompile 'net.Activity'
or
adding OPTION (RECOMPILE) query hint at the end of your query
or
using the 'optimize for ad hoc workloads' server setting
or
updating statistics
If you have a crappy product from a crappy vendor, the best way to handle parameter sniffing is to create your own plan guide using sp_create_plan_guide.
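For what it's worth, a plan guide for the ad-hoc statement in this question might look roughly like the sketch below. The statement text and parameter declaration must match what the application sends character-for-character, so the abbreviated @stmt and the guessed parameter types here are placeholders only, and OPTIMIZE FOR UNKNOWN is just one example of a hint you could attach:
EXEC sp_create_plan_guide
    @name = N'Guide_NetActivityFeed',   -- hypothetical guide name
    @stmt = N'SELECT TOP (@n) a.ID FROM net.Activity a WHERE a.ID < @LatestId ORDER BY a.ID DESC',
    @type = N'SQL',
    @module_or_batch = NULL,
    @params = N'@n int, @LatestId int',
    @hints = N'OPTION (OPTIMIZE FOR UNKNOWN)';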
This system was built years ago on SQL Server 7 and currently runs on SQL Server 2005.
We have a table tProducts with a productid GUID as the PK/clustered index. I realize that for optimum performance we should modify the table, but that's not a possibility at this point. I'm joining it to the tProductSpecial table, which also has a PK/clustered index on productid, which is also the FK in this relationship. We have about 50k records in the tProducts table and about 35k records in the tProductSpecial table (some products have special information and some do not). There is one more piece: I am using a temporary table in the sproc to grab the logged-in user's security roles and load them; this also joins to the tProducts table, and roleid has a non-clustered index in tProducts. I've included some of the WHERE conditions that access these tables.
SELECT *
FROM tProducts
JOIN tProductSpecial ON tProducts.productid=tProductSpecial.productid
JOIN #tRoles ON tProducts.roleid=#tRoles.roleid
WHERE
(tProducts.productSKU = @sku AND tProducts.productStatus = 1) -- DIRECT MATCH
OR
( -- KEYWORD SEARCH
    CONTAINS(tProducts.*, 'FORMSOF(INFLECTIONAL,''' + @lookuptext + ''')')
    AND
    (
        @productStatus IS NULL
        OR
        (
            @productStatus IS NOT NULL
            AND tProducts.productStatus = @productStatus
        )
    )
    AND
    ( -- item on sale
        @bOnSale IS NULL
        OR @bOnSale = 0
        OR
        (
            @bOnSale = 1
            AND tProducts.productOnSale = 1
        )
    )
    AND
    ( -- from price
        @from = 0
        OR @from IS NULL
        OR
        (
            @from <> 0
            AND tProducts.customerCost >= @from
        )
    )
    AND
    ( -- to price
        @to = 0
        OR @to IS NULL
        OR
        (
            @to <> 0
            AND tProducts.customerCost <= @to
        )
    )
    AND
    ( -- how old is product
        @age IS NULL
        OR @age = 0
        OR
        (
            @age IS NOT NULL
            AND @age > 0
            AND DATEDIFF(day, tProducts.productCreated, GETDATE()) <= CONVERT(varchar(10), @age)
        )
    )
)
ORDER BY tProducts.productSKU
The problem is probably due to parameter sniffing - you're using the same plan over and over again even if the parameters are different and only sometimes a scan would be the best approach. In SQL Server 2008, you could simply add OPTION (RECOMPILE) or OPTIMIZE FOR UNKNOWN, but in SQL Server 2005, try altering the stored procedure and adding the WITH RECOMPILE option. This will force SQL Server to consider a new plan every time, based on the incoming parameters.
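A sketch of that change on SQL Server 2005, with a hypothetical procedure name and guessed parameter types standing in for your real ones (the body is abbreviated; keep your existing query as-is):
ALTER PROCEDURE dbo.spProductSearch   -- hypothetical name
    @sku            varchar(50),
    @lookuptext     varchar(200),
    @productStatus  int,
    @bOnSale        bit,
    @from           money,
    @to             money,
    @age            int
WITH RECOMPILE   -- forces a fresh plan on every execution
AS
BEGIN
    SET NOCOUNT ON;
    -- ... existing temp-table load and SELECT from the question, unchanged ...
END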
Another option would be to build up the query dynamically based on whether #bOnSale, #from etc. are populated. In 2005 this will lead to plan cache bloat but you may be better off overall. This could completely avoid the Full-Text access, for example, when #sku is populated. Again, in SQL Server 2008 this is better, as you can stave off some of the plan cache bloat by using Optimize for ad hoc workloads.
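A sketch of the dynamic version, building the WHERE clause only from the filters that were actually supplied and still passing the values through sp_executesql parameters rather than concatenating them (abbreviated to a few of the filters; table and parameter names come from the question, while the declared types are guesses):
DECLARE @sql nvarchar(max);

SET @sql = N'SELECT * FROM tProducts
    JOIN tProductSpecial ON tProducts.productid = tProductSpecial.productid
    JOIN #tRoles ON tProducts.roleid = #tRoles.roleid
    WHERE 1 = 1';

IF @sku IS NOT NULL
    SET @sql = @sql + N' AND tProducts.productSKU = @sku AND tProducts.productStatus = 1';
IF @bOnSale = 1
    SET @sql = @sql + N' AND tProducts.productOnSale = 1';
IF @from IS NOT NULL AND @from <> 0
    SET @sql = @sql + N' AND tProducts.customerCost >= @from';
IF @to IS NOT NULL AND @to <> 0
    SET @sql = @sql + N' AND tProducts.customerCost <= @to';

SET @sql = @sql + N' ORDER BY tProducts.productSKU';

-- #tRoles is still visible here because the dynamic batch runs in the same session
EXEC sp_executesql @sql,
    N'@sku varchar(50), @bOnSale bit, @from money, @to money',
    @sku = @sku, @bOnSale = @bOnSale, @from = @from, @to = @to;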
A stored procedure of long standing can suddenly develop a case of the poor plan for all sorts of reasons. Just off the top of my head here are a few that stand out:
Index Changes
Dropping/Modifying indices on tables can cause queries that have had a perfectly fine execution plan for years to go south. Solution: don't do that. If you are going to do that, examine the dependencies on that table and make sure that everything is still getting a good execution plan.
Table Statistics
Index statistics can get fouled up for a number of reasons. For instance, large bulk loads of data into table(s) referenced by the stored procedure can bollix up the index statistics on those tables and thus mislead the query optimizer into constructing a poor execution plan. Solutions include one or more of the following (a combined sketch follows this list):
update statistics on the table(s) involved
run DBCC FREEPROCCACHE to flush the stored procedure cache.
Execute sp_recompile on the offending stored procedure.
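A combined sketch of those three remedies, using the tables from the question and a hypothetical procedure name (note that DBCC FREEPROCCACHE without arguments clears the whole plan cache, so use it with care on a production server):
-- 1) refresh statistics on the table(s) the procedure touches
UPDATE STATISTICS dbo.tProducts;
UPDATE STATISTICS dbo.tProductSpecial;

-- 2) flush the plan cache (server-wide; heavy-handed on a busy server)
DBCC FREEPROCCACHE;

-- 3) or, more surgically, mark just the offending procedure for recompilation
EXEC sp_recompile N'dbo.spProductSearch';   -- hypothetical procedure name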
Direct References To Stored Procedure Parameters
Stored procedures whose queries directly use the parameters passed into the stored procedure get an execution plan based on the value of those parameters on the stored procedure's first execution, when the plan is created and cached. If those parameter values are a little outré, the cached execution plan may perform well for the oddball case but extremely poorly for the general case. The short-term solution is to run sp_recompile or modify the stored procedure so that it has the WITH RECOMPILE option. (Note, however, that this has a significant performance impact, since the stored procedure gets recompiled for every execution. That means compile locks are taken during the compilation process, and that can (and does) cause blocking.)
The correct fix for this problem is to declare a local variable within the stored procedure and set it to the value of the parameter. That turns the parameter value into an expression and breaks the plan's dependency on the parameter value. It floored me the first time I got hit with this problem.
Good luck!
I have worked on SQL stored procedures and I have noticed that many people use two different approaches:
First, to use plain select queries, i.e. something like
Select * from TableA where colA = 10 order by colA
Second, to do the same by constructing the query as a string, i.e. like
Declare @sqlstring varchar(100)
Declare @sqlwhereclause varchar(100)
Declare @sqlorderby varchar(100)
Set @sqlstring = 'Select * from TableA '
Set @sqlwhereclause = 'where colA = 10 '
Set @sqlorderby = 'order by colA'
Set @sqlstring = @sqlstring + @sqlwhereclause + @sqlorderby
exec (@sqlstring)
Now, I know both work fine. But, the second method I mentioned is a little annoying to maintain.
I want to know which one is better? Is there any specific reason one would resort to one method over the other? Any benefits of one method over other?
Use the first one. This will allow a query plan to be cached properly, apart from being the way you are supposed to work with SQL.
The second one is open to SQL Injection attacks, apart from the other issues.
With the dynamic SQL you will not get compile time checking, so it may fail only when invoked (the sooner you know about incorrect syntax, the better).
And, you noted yourself, the maintenance burden is also higher.
The second method has the obvious drawback of not being syntax checked at compile time. It does however allow a dynamic ORDER BY clause, which the first does not. I recommend that you always use the first example unless you have a very good reason to make the query dynamic. And, as @Oded has already pointed out, be sure to guard yourself against SQL injection if you do go for the second approach.
I don't have a full comprehensive answer for you, but I can tell you right now that the latter method is much more difficult to work with when importing the stored procedure as a function in an ORM. Since the SQL is constructed dynamically, you have to manually create any type-classes that are returned from the stored procedure that aren't directly correlated to entities in your model.
With that in mind, there are times where you simply can't avoid constructing a SQL statement, especially when where clauses and joins depend on the parameters passed in. In my experience, I have found that stored procs that are creating large, variably joined/whered statements for EXECs are trying to do too many things. In these situations, I would recommend you keep the Single Responsibility Principle in mind.
Executing dynamic SQL inside a stored procedure reduces the value of using stored procedures to that of just a saved query container. Stored procedures are mostly beneficial in that the query execution plan (a very costly operation) is compiled and stored in memory the first time the procedure is executed. This means that every subsequent execution of the procedure bypasses the query plan calculations and jumps right to the data retrieval portion of the operation.
Also, allowing a stored procedure to take an executable query string as a parameter is dangerous. Anyone with execute permission granted on the procedure could potentially cause havoc on the rest of the database.
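If you do end up needing dynamic SQL, a middle ground worth knowing about is parameterized dynamic SQL via sp_executesql: the batch text stays constant, so its plan can be reused, and the value travels as a parameter instead of being concatenated into the string, which avoids the injection risk mentioned above. A minimal sketch using the table from the question:
DECLARE @sql nvarchar(200);
DECLARE @colA int;

SET @colA = 10;
SET @sql = N'Select * from TableA where colA = @colA order by colA';

EXEC sp_executesql @sql, N'@colA int', @colA = @colA;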