Query Plan Recompiled Suddenly and Degraded Performance

Scenario: we have a simple SELECT query:
DECLARE @P int;  -- data type assumed; the question does not state it
SELECT TOP (1) USERID
FROM [table]  -- placeholder table name
WHERE non_clusteredindex_column = @P
ORDER BY PK_column DESC;
For a year it has consistently executed in about 0.12 seconds. Yesterday, right after midnight, it suddenly started consuming all the CPU and taking 150 seconds to execute. I checked sp_who2 and found no deadlocks, and nothing but this one query consuming CPU. I decided to reboot the server to get rid of any parameter sniffing issue and to kill any stale connections. Before restarting, I captured a one-minute SQL Profiler trace for later root cause analysis. After the reboot, everything was back to normal. Surprised and curious, I reviewed the execution plan in the trace and compared it with the current execution plan of the SAME query, and found that they are different.
The execution plan from before the problematic night is the same as the execution plan after the reboot (clean index seeks).
But the execution plan captured by SQL Profiler on the problematic night does a full index scan, which consumes all the CPU and takes 150 seconds to execute.
Question:
I believe the execution plan was suddenly recompiled, or the query started using a new execution plan (full scan), right after midnight yesterday, and after I rebooted it went back to using the old, good execution plan (index seek).
Q1. What made SQL Server start using a new execution plan all of a sudden?
Q2. What made SQL Server use the old, good execution plan after the reboot?
Q3. Is this related to parameter sniffing, since I am passing a parameter? Technically it shouldn't be, because the parameter column is well organized and its data is evenly distributed.

It sounds like you are having a parameter sniffing issue. I can't see your data, but we have often found these crop up even in simple query scenarios. Sometimes many rows matched the sniffed parameter value, so the plan flipped to a scan even though a scan was wrong for other values. Other times there was some other problem with the data. For example, a column where most values are unique had, under some scenario, a large portion of the table set to 0, which threw everything for a loop. If the query is slow when run from application code but fast when you execute it from SSMS, that is a big red flag that something along these lines is your issue.
You are correct that restarting SQL Server flushes the entire plan cache, and you can also flush all plans manually. However, you absolutely do not want to use either method to fix the plan of a single query or procedure. A quick fix is to run EXEC sp_recompile 'dbo.procname'; which marks just that one procedure so that a new plan is compiled on its next execution. Recompiling all your plans, especially on a busy database, can cause significant performance problems for other procedures, and a restart of course means downtime. Either way, this only fixes the problem temporarily when it crops up. If you have identified a parameter that causes issues, consider adding an OPTIMIZE FOR UNKNOWN hint, which is designed for exactly this kind of parameter sniffing problem. Also make sure regular index maintenance is running in your environment, in case that, rather than the SQL engine, is what causes the bad plans.
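The query in the question may be sent as a parameterized batch (for example via sp_executesql from the application) rather than from a stored procedure. In that case, a narrower fix than a restart is to evict just its plan. This is a sketch assuming SQL Server 2008 or later, where DBCC FREEPROCCACHE accepts a plan handle, and the placeholder names from the question. The int type and the value 42 for @P are assumptions.

```sql
-- 1. Find the cached plan(s) whose text mentions the filtered column.
SELECT cp.plan_handle, cp.objtype, cp.usecounts, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%non_clusteredindex_column%'
  AND st.text NOT LIKE N'%dm_exec_cached_plans%';   -- exclude this diagnostic query

-- 2. Evict only that plan: substitute a plan_handle from step 1 and uncomment.
-- DBCC FREEPROCCACHE (<plan_handle from step 1>);

-- 3. If sniffing is confirmed, compile for the column's average density
--    instead of whichever @P value happens to arrive first.
EXEC sp_executesql
    N'SELECT TOP (1) USERID
      FROM [table]
      WHERE non_clusteredindex_column = @P
      ORDER BY PK_column DESC
      OPTION (OPTIMIZE FOR UNKNOWN);',
    N'@P int',
    @P = 42;
```

The trade-off of OPTIMIZE FOR UNKNOWN is that the plan is tuned for an "average" value. It stops flipping between seek and scan, but it may no longer be the best plan for unusually selective or unselective values.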

In your case, you can do the following:
-- Enable Query Store in your database settings (set Operation Mode to Read Write).
-- Query Store then starts capturing query plans and runtime statistics for your queries.
-- Use it to track the queries that consume the most resources.
-- Finally, you can force a specific execution plan to be used for this query.
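Here is a minimal T-SQL sketch of those steps, assuming SQL Server 2016 or later. [YourDatabase] is a placeholder. The query_id/plan_id values in the last line are hypothetical, so use the ids the SELECT returns.

```sql
-- Enable Query Store in read/write mode.
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE);
GO
USE [YourDatabase];
GO
-- Plans captured for the query; one row per plan per collection interval.
SELECT q.query_id,
       p.plan_id,
       rs.count_executions,
       rs.avg_cpu_time / 1000.0 AS avg_cpu_ms,   -- Query Store stores microseconds
       qt.query_sql_text
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
WHERE qt.query_sql_text LIKE N'%non_clusteredindex_column%'
ORDER BY rs.avg_cpu_time;

-- Force the good (index seek) plan; hypothetical ids, uncomment after substituting.
-- EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;
```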

Related

Debugging a strange scenario in SQL Server 2016 with regard to stored procedure execution

In our organization, we have a SQL Server VM on Azure in an Always On availability group with two nodes.
Scenario:
We have one procedure called "SP_xyz" that contains one SELECT query with a few inner joins to get a list of credential holders. As the load grew, this stored procedure (SP) started running slowly, so we optimized it and put it back in production, and it ran fine for some time.
After a couple of months, as the load increased, the SP became slow again, and again we analysed and optimized it. Now comes the mystery. Just to cross-verify the new optimized SP, we created the same SP in production with a _Test suffix. The new SP is "SP_xyz_Test".
When we ran this new _Test SP in production with the same set of parameters for which the old SP (SP_xyz) was running slowly, the new optimized SP returned results in milliseconds, compared with a few seconds for the old SP.
To our surprise, the next moment, when we ran the old SP, it also started returning results in milliseconds. This really scared us, because we have more than 300 stored procedures in production and don't know where else this kind of issue might exist.
We analysed a few things we could think of to find the root cause:
Index rebuild
Stats update
We also know that an SP's execution plan is specific to that SP, so we are wondering how the old SP became faster.
However, all of these (index rebuilds and statistics updates) are scheduled and were already running in production, and the old SP still started running slowly. Yet the moment the new _Test SP ran, the old one became very fast.
Have we missed anything here, and has anybody faced this issue before?
I don't think the cause is clear from the details you provided. But since you are using SQL Server 2016, you can use Query Store to track a statement's or stored procedure's execution over time.
A query can have different plans over time, and one plan may perform better than another. When you enable Query Store, you can see all the plan changes over time in the Regressed Queries report, which can help you analyze why one plan takes more time than another. At least it's a starting point.
[Figure: Regressed Queries view for one query; each dot is a different plan over time, and its position on the graph indicates the time taken.]
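Outside the SSMS report, the same history can be read from the Query Store catalog views. This sketch assumes Query Store is enabled on the database and that the procedure is dbo.SP_xyz (the dbo schema is an assumption). It returns one row per plan per collection interval. A new plan_id appearing around the time the SP turned fast or slow would point to a plan change.

```sql
-- Plan history for the statements inside one procedure.
SELECT q.query_id,
       p.plan_id,
       i.start_time,
       rs.count_executions,
       rs.avg_duration / 1000.0 AS avg_duration_ms,   -- Query Store stores microseconds
       p.last_compile_start_time
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p
  ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
  ON rs.plan_id = p.plan_id
JOIN sys.query_store_runtime_stats_interval AS i
  ON i.runtime_stats_interval_id = rs.runtime_stats_interval_id
WHERE q.object_id = OBJECT_ID(N'dbo.SP_xyz')
ORDER BY q.query_id, i.start_time, p.plan_id;
```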
Not sure whether you got the answer for this.
I guess it is a typical case of execution plans becoming outdated due to the dynamic nature of your procedure.
Try the RECOMPILE option.
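ALTER PROCEDURE dbo.SP_xyz  -- ALTER because the procedure already exists; keep its current parameter list
WITH RECOMPILE              -- compile a fresh plan on every execution; no plan is cached
AS
BEGIN
    -- ....... (existing procedure body)
END
GO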
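Recompiling the whole procedure on every call is a blunt tool. Two narrower variations are sketched below with hypothetical names (@SomeParam, dbo.CredentialHolders and its columns, and the example procedure name are placeholders, not from the question):

- A one-off EXEC ... WITH RECOMPILE, to test whether a fresh plan fixes the slowness.
- A statement-level OPTION (RECOMPILE), which recompiles only the problem query.

```sql
-- One-off test: compile a fresh plan for this call only.
-- The existing cached plan stays in the cache and is not used for this call.
EXEC dbo.SP_xyz @SomeParam = 1 WITH RECOMPILE;   -- @SomeParam is a placeholder
GO

-- Statement-level alternative inside a procedure (all names hypothetical).
CREATE PROCEDURE dbo.GetCredentialHolders_Example
    @SomeParam int
AS
BEGIN
    SET NOCOUNT ON;
    SELECT ch.CredentialHolderId, ch.Name
    FROM dbo.CredentialHolders AS ch
    WHERE ch.StatusId = @SomeParam
    OPTION (RECOMPILE);   -- only this statement is recompiled on each execution
END
GO
```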

How to investigate why sql script that runs every day taking 2 min is taking 2 hours?

My colleague asked me a question today
"I have a SQL script containing 4 select queries. I have been using it
daily for more than a month, but yesterday the same query took 2 hours and
I had to abort the execution."
His questions were:
Q1. What happened to this script on that day?
Q2. Of those 4 queries, how can I check which ones executed and which one was the culprit that forced the abort?
My answer to Q2 was to use SQL Profiler and check the trace for SQL statement events (SQL:StmtStarting / SQL:StmtCompleted).
For Q1:
I asked him a few questions:
What was the volume of data on that day?
His answer: No change
Was there any change in indexing, i.e., might someone have dropped an index? His answer: No change.
Was it trapped in a deadlock (checked via the dynamic management views)? His answer: Not in a deadlock.
What else should I have asked? Could there be any other reason for this?
I didn't see the query, so I can't paste it here.
Things to look at (SQL Server):
Statistics out of date? Has somebody run a large bulk insert operation? Update statistics.
Change in indexing? If so, and it's a stored procedure, check the execution plan and/or recompile it, then check the execution plan again and correct any problems.
SQL Server caches execution plans. If your query is parameterized or uses IF...ELSE logic, and the parameters on its first run are an edge case, the cached execution plan can perform poorly for ordinary executions. You can read more about this...ah...feature at:
http://www.solidq.com/sqj/Pages/2011-April-Issue/Parameter-Sniffing-Problem-with-SQL-Server-Stored-Procedures.aspx
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/88ff51a4-bfea-404c-a828-d50d25fa0f59
SQL poor stored procedure execution plan performance - parameter sniffing
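To check the first point, you can look at when each statistics object was last updated and how much has changed since. This sketch assumes SQL Server 2008 R2 SP2 / 2012 SP1 or later, where sys.dm_db_stats_properties is available. dbo.SomeTable is a placeholder.

```sql
-- Statistics on user tables, most-modified first.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter  -- changes to the leading column since the last update
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sp.modification_counter DESC;

-- Refresh statistics for a table that looks stale (placeholder name).
UPDATE STATISTICS dbo.SomeTable WITH FULLSCAN;
```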
In this case my approach would be:
Here is the case: he had to abort the execution because the query was taking much longer than expected and never completed. As I understand it, there may have been a blocking session or an uncommitted transaction on the table you were querying, run by a different user that day. Under the default READ COMMITTED isolation level, a SELECT has to wait for an earlier transaction that holds conflicting locks (for example, one that ran an UPDATE, INSERT, or DELETE) to complete. Your query may have been waiting for such a transaction, so check for blocking sessions.
SQL Server schedules each request on worker threads, and a request moves between states. Check whether the request running your query is in the 'running', 'runnable', or 'suspended' state. In your case it was probably suspended; investigate which state it is in and why.
Next, look at fragmentation. Best practice is to have an index rebuild/reorganize job configured in your environment to remove unnecessary fragmentation, so that your queries need to read fewer pages to return data. Otherwise, your queries will take more and more time to return data. Configure the job and run it at least once a week; it keeps your indexes and pages in good shape.
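The following sketch covers both the blocking check and Q2 (which of the four statements is currently running). It reads the currently executing statement, its state, and any blocker for every user request from the DMVs. Run it from a second session while the script appears to hang.

```sql
SELECT r.session_id,
       r.status,               -- running, runnable, suspended, ...
       r.command,
       r.wait_type,            -- e.g. LCK_M_S while waiting on a lock
       r.wait_time,            -- ms spent in the current wait
       r.blocking_session_id,  -- non-zero: this request is blocked by that session
       r.cpu_time,
       r.total_elapsed_time,
       SUBSTRING(st.text,
                 r.statement_start_offset / 2 + 1,
                 (CASE r.statement_end_offset
                      WHEN -1 THEN DATALENGTH(st.text)
                      ELSE r.statement_end_offset
                  END - r.statement_start_offset) / 2 + 1) AS current_statement
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s
  ON s.session_id = r.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE s.is_user_process = 1
  AND r.session_id <> @@SPID;   -- skip this diagnostic session
```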
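Before scheduling anything, you can check whether fragmentation is actually significant. This sketch works on the current database. The 1,000-page filter and the 5%/30% thresholds are common rules of thumb, not hard limits, and the index and table names in the ALTER INDEX lines are placeholders. REBUILD also updates that index's statistics with a full scan, while REORGANIZE does not.

```sql
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name                     AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id = ips.index_id
WHERE ips.page_count > 1000          -- small indexes rarely matter
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- Rule of thumb: REORGANIZE at roughly 5-30%, REBUILD above 30% (placeholder names).
-- ALTER INDEX [IX_SomeIndex] ON dbo.SomeTable REORGANIZE;
-- ALTER INDEX [IX_SomeIndex] ON dbo.SomeTable REBUILD;
```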
Use EXPLAIN to analyze the four queries. That will tell you how the optimizer will be using indexes (or not using indexes).
Add queries to the script that SELECT NOW() between the statements, so you can measure how long each query took. You can also have MySQL do the arithmetic for you by storing NOW() in a session variable and then using TIMEDIFF() to calculate the difference between the start and finish of the statement.
SELECT NOW() INTO @start;
SELECT SLEEP(5); -- or whatever query you need to measure
SELECT TIMEDIFF(NOW(), @start); -- elapsed time: end minus start
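The snippet above is MySQL syntax. Since the question concerns SQL Server, a rough T-SQL equivalent is sketched below (SQL Server 2008 or later for datetime2/SYSDATETIME). The two placeholder SELECTs stand in for the script's real queries. RAISERROR ... WITH NOWAIT sends each message to the client immediately, so you can see which query finished before the script hung, even if you later abort it. SET STATISTICS TIME ON is a simpler alternative that reports CPU and elapsed time per statement.

```sql
DECLARE @start datetime2 = SYSDATETIME();
DECLARE @ms int;

-- Query 1 of the script goes here (placeholder).
SELECT 1 AS query_1_placeholder;

SET @ms = DATEDIFF(millisecond, @start, SYSDATETIME());
RAISERROR(N'Query 1 finished after %d ms', 0, 1, @ms) WITH NOWAIT;

SET @start = SYSDATETIME();

-- Query 2 of the script goes here (placeholder).
SELECT 2 AS query_2_placeholder;

SET @ms = DATEDIFF(millisecond, @start, SYSDATETIME());
RAISERROR(N'Query 2 finished after %d ms', 0, 1, @ms) WITH NOWAIT;
```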
As @Scott suggests in his comment, use the slow query log to measure the time of long-running queries.
Once you have identified the long-running query, use the query profiler while executing it to see exactly where it's spending its time.

Can you force Linq2SQL to NOT use sp_executesql?

So I wrote a Linq query, and it takes 16 seconds to run. I decided to look at the query plan, so I copied the SQL out of Linq to SQL Profiler, and there the query takes only 2 seconds to run. Sigh.
After spending most of the day poking at things and finally getting around to using SQL Server Profiler, I see that Linq2SQL is using sp_executesql to run the query. I understand that it's supposed to improve performance because it's more likely to reuse the execution plan... but it seems to have chosen a horrible execution plan.
The weirder part is that it only gets slow if I join a specific table, and I have no idea why that specific table is causing a problem.
EDIT: Just to clarify the actual issue:
It's actually getting two different queries. One is, essentially:
SELECT col1, col2, ... FROM table1, table2 WHERE table1.val IN (1234, 2343, 2435)
The other is:
EXEC sp_executesql N'SELECT col1, col2, ... FROM table1, table2 WHERE table1.val IN (@p0, @p1, @p2)',
N'@p0 int,@p1 int,@p2 int',
@p0=1234, @p1=2343, @p2=2435
Your problem doesn't stem from the use of sp_executesql, and so circumventing it (which you can't) will not solve your problems. I suggest you read Erland Sommarskog's excellent article:
Slow in the Application, Fast in SSMS?
Understanding Performance Mysteries
This will give you a deep understanding of why you're getting a performance difference, how to diagnose and consistently reproduce it, and finally, how to solve it.
If the exact same query is fast from one application or server, but slow from another, it's usually all about execution plans. An execution plan is the blueprint the server uses to run the query. The plan is supposed to be created once, and then reused for all queries which differ only in parameter values.
Different execution plans can lead to wildly different performance; a factor of 100 is not at all unusual. As a first step, check whether the execution plans are different. In Profiler, the event Performance > Showplan XML logs the plan.
If the plans are different, one possible cause is the session SET options, such as ANSI_NULLS:
SET ANSI_NULLS
Another possibility is a different login (the blueprint contains security information, so each security context has its own set of cached execution plans.)
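The sketch below checks whether the slow and fast executions really use separate cache entries. It lists the set_options and user_id plan attributes for every cached plan whose text mentions table1, the placeholder table name from the question. Two entries for the same text with different set_options usually mean the two clients use different SET options. A frequent example is ARITHABORT, which SSMS turns ON by default while many client libraries leave it OFF.

```sql
SELECT cp.plan_handle,
       cp.usecounts,
       pa.attribute,
       pa.value,   -- set_options is a bitmask; user_id matters for unqualified object names
       st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) AS pa
WHERE pa.attribute IN (N'set_options', N'user_id')
  AND st.text LIKE N'%table1%'
  AND st.text NOT LIKE N'%dm_exec_plan_attributes%'   -- exclude this diagnostic query
ORDER BY cp.plan_handle, pa.attribute;
```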
The easiest way to clear the plan cache is to restart the SQL Server service. There's also an advanced command to clear the entire query plan cache:
DBCC FREEPROCCACHE
P.S. If you have a stored procedure that performs differently depending on parameter values, it's worth checking out parameter sniffing. But since you're copying the exact same query from the profiler, I assume the parameters are identical for both the slow and the fast invocations.
To answer your question....
NO, you can't...

Option Recompile makes query fast - good or bad?

I have two SQL queries with about 2-3 INNER JOINS each. I need to do an INTERSECT between them.
The problem is that individually the queries run fast, but after intersecting they take about 4 seconds in total.
Now, if I put OPTION (RECOMPILE) at the end of the whole query, it works great again, returning almost instantly!
I understand that OPTION (RECOMPILE) forces a rebuild of the execution plan, so I am now confused about which is better: my earlier query taking 4 seconds, or the one with recompile taking 0 seconds.
Rather than answer the question you asked, here's what you should do:
Update your statistics:
EXEC sp_updatestats
If that doesn't work, rebuild indexes.
If that doesn't work, look at the OPTIMIZE FOR query hint.
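The sketch below applies the suggestions above to an INTERSECT query shaped like the one in the question. Every table, column, and value is hypothetical. One detail matters here: a query hint such as OPTION (RECOMPILE) or OPTIMIZE FOR belongs to the whole statement, so it is written once after the final branch. Unlike OPTION (RECOMPILE), OPTIMIZE FOR still allows the plan to be reused, compiled for a value you choose as representative.

```sql
-- 1. Refresh statistics on the tables the two branches read (hypothetical names).
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
UPDATE STATISTICS dbo.Customers WITH FULLSCAN;
UPDATE STATISTICS dbo.Payments WITH FULLSCAN;
GO

-- 2. The hint applies to the whole INTERSECT statement, so it appears once, at the end.
DECLARE @RegionId int = 3;   -- hypothetical value for this run
SELECT o.CustomerId
FROM dbo.Orders AS o
INNER JOIN dbo.Customers AS c ON c.CustomerId = o.CustomerId
WHERE c.RegionId = @RegionId
INTERSECT
SELECT o2.CustomerId
FROM dbo.Payments AS p
INNER JOIN dbo.Orders AS o2 ON o2.OrderId = p.OrderId
WHERE p.Amount > 100
OPTION (OPTIMIZE FOR (@RegionId = 1));   -- compile for a value assumed to be typical
```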
When WITH RECOMPILE is specified, SQL Server does not cache a plan for the stored procedure;
the procedure is recompiled each time it is executed.
Whenever a stored procedure runs for the first time, SQL Server optimizes it, compiles a query plan, and caches that plan in memory. Each later run of the same stored procedure reuses the cached plan, so the procedure does not have to be optimized and compiled every time. If you run the same stored procedure 1,000 times a day, this saves a lot of time and hardware resources, and SQL Server doesn't have to work as hard.
You should not use this option, because by using it you lose most of the advantages you get by replacing ad hoc SQL queries with stored procedures.

Different Execution Plan for the same Stored Procedure

We have a query that is taking around 5 sec on our production system, but on our mirror system (as identical as possible to production) and dev systems it takes under 1 second.
We have checked the query plans and we can see that they differ. From these plans we can also see why one takes longer than the other. The data, schema, and servers are similar, and the stored procedures are identical.
We know how to fix it by rearranging the joins and adding hints. However, at the moment it would be easier if we didn't have to make any changes to the SProc (paperwork). We have also tried sp_recompile.
What could cause the difference between the two query plans?
System: SQL 2005 SP2 Enterprise on Win2k3 Enterprise
Update: Thanks for your responses, it turns out that it was statistics. See summary below.
Your statistics are most likely out of date. If your data is the same, recompute the statistics on both servers and recompile. You should then see identical query plans.
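Since this system is SQL Server 2005, the sketch below uses STATS_DATE (available there) to compare when statistics were last updated on each server. It then performs the refresh-and-recompile step described above. dbo.SomeTable and dbo.SomeProc are placeholders.

```sql
-- Run on both servers and compare the dates.
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.SomeTable');

-- Recompute the statistics, then make the procedure compile a new plan on its next run.
UPDATE STATISTICS dbo.SomeTable WITH FULLSCAN;
EXEC sp_recompile N'dbo.SomeProc';
```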
Also, double-check that your indexes are identical.
Most likely statistics.
Some thoughts:
Do you do maintenance on your non-prod systems (e.g., rebuilding indexes, which also rebuilds statistics)?
If so, do you use the same fillfactor and statistics sample ratio?
Do you restore the database regularly onto test so it's 100% like production?
Is the data and data size on your mirror as close to production as possible?
If you know why one query takes longer than the other, can you post some more details?
Execution plans can differ in such cases because of the data in the tables and/or the statistics. Even when auto update statistics is turned on, statistics can get out of date, especially on very large tables. With the classic threshold of 500 rows plus 20% of the table, a 10-million-row table needs about 2,000,500 modified rows before an automatic update is triggered.
You may find that the optimizer has estimated a table is not that large and opted for a table scan or something like that.
Provided there is no WITH RECOMPILE option on your proc, the execution plan will get cached after the first execution.
Here is a trivial example on how you can get the wrong query plan cached:
create proc spTest
@id int
as
select * from sysobjects where @id is null or id = @id
go
exec spTest null
-- As expected, it's a clustered index scan
go
exec spTest 1
-- Oh no, it's a clustered index scan
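One common way around the catch-all pattern above is to split the two cases into separate statements, so each gets a plan suited to its own shape. The sketch below applies this to the same spTest example; ALTER assumes the procedure was created as shown. On later versions, adding OPTION (RECOMPILE) to the original statement is another option, at the cost of a compile on every call.

```sql
-- Sketch: split the catch-all predicate into two statements.
ALTER PROC spTest
    @id int
AS
IF @id IS NULL
    SELECT * FROM sysobjects;                -- all rows: a scan is the right plan
ELSE
    SELECT * FROM sysobjects WHERE id = @id; -- one row: can use an index seek on id
GO
```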
Try running your SQL in Query Analyzer (QA) on the production server outside of the stored proc, to determine whether your statistics are out of date or indexes are mysteriously missing from production.
Tying in to the first answer, the problem may lie with SQL Server's Parameter Sniffing feature. It uses the first value that caused compilation to help create the execution plan. Usually this is good but if the value is not normal (or somehow strange), it can contribute to a bad plan. This would also explain the difference between production and testing.
Turning off parameter sniffing would require modifying the SProc which I understand is undesirable. However, after using sp_recompile, pass in parameters that you'd consider "normal" and it should recompile based off of these new parameters.
I think the parameter sniffing behavior is different between 2005 and 2008 so this may not work.
The solution was to recalculate the statistics. I overlooked that because we usually have scheduled tasks to do all of that, but for some reason the admins didn't set one up on this server. D'oh.
To summarize all the posts:
Check the setup is the same
Indexes
Table sizes
Restore Database
Execution Plan Caching
If the query runs the same outside the SProc, it's not the Execution Plan
sp_recompile if it is different
Parameter sniffing
Recompute Statistics