strange SQL server report performance problem related with update statistics

I have a complex report built with Reporting Services. It connects to a SQL Server 2005 database and calls a number of stored procedures and functions. It worked fine initially, but after a few months, as the data grew, it started hitting timeout errors.
I created a few indexes to improve performance. Strangely, the report worked right after the indexes were created but threw the same error the next day. I then updated the statistics on the database and it worked again (the query ran about 10 times faster). But again, it stopped working the next day.
For now, my temporary solution is to update statistics every hour, but I can't find a reasonable explanation for this behaviour. The database is not very busy, and not much data changes in a day. How can updating statistics make so much difference?

I suspect you have parameter sniffing. Updating statistics merely forces the cached query plans to be discarded and recompiled, so it appears to work for a time. Masking the parameters with local variables avoids it:
CREATE PROC dbo.MyReport
@SignatureParam varchar(10),
...
AS
...
DECLARE @MaskedParam varchar(10), ...
SELECT @MaskedParam = @SignatureParam, ...
SELECT...WHERE column = @MaskedParam AND ...
...
GO
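
If masking every parameter is impractical, a statement-level recompile hint is another option that SQL Server 2005 supports. The following is only a minimal sketch: dbo.Orders, its columns, and the procedure name are hypothetical and not taken from the original report.

```sql
-- Hypothetical procedure: dbo.Orders, OrderId and CustomerCode are illustrative names only.
CREATE PROC dbo.MyReportRecompile
    @SignatureParam varchar(10)
AS
    SELECT o.OrderId, o.CustomerCode
    FROM dbo.Orders AS o
    WHERE o.CustomerCode = @SignatureParam
    OPTION (RECOMPILE);  -- build a fresh plan for this statement on every execution
GO
```

The trade-off is a compile on every call, which is usually acceptable for a report that runs a handful of times a day.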

I've seen this problem when the indexes on the underlying tables need to be adjusted or the SQL needs work.
The rebuild index and the update statistics read the table into the cache, which improves performance. The next day the table has been flushed out of the cache and the performance problems return.
SQL Profiler is very useful in these situations to identify what changes from run to run.

Related

How to investigate why sql script that runs every day taking 2 min is taking 2 hours?

My colleague asked me a question today:
"I have a SQL script containing 4 select queries. I have been using it
daily for more than a month, but yesterday the same script took 2 hours and
I had to abort the execution."
His questions were:
Q1. What happened to this script on that day?
Q2. How can I check which of those 4 queries were executed and which one was the culprit for the abort?
My answer to Q2 was to use SQL Profiler and check the trace for SQL statement events.
For Q1, I asked him a few questions:
What was the volume of data on that day? His answer: No change
Was there any change in indexing, i.e. might someone have dropped an index? His answer: No change
Was it caught in a deadlock (checked via the dynamic management views)? His answer: Not in a deadlock
What else should I have asked? Can there be any other reason for this?
Since I haven't seen the query, I can't paste it here.

Things to look at (SQL Server):
Statistics out of date? Has somebody run a large bulk insert operation? Run update statistics.
Change in indexing? If so, if it's a stored procedure, check the execution plan and/or recompile it...then check the execution plan again and correct any problems.
SQL Server caches execution plans. If your query is parameterized or uses if-then-else logic and the parameters are an edge case the first time it runs, the cached execution plan can work poorly for ordinary executions. You can read more about this...ah...feature at:
http://www.solidq.com/sqj/Pages/2011-April-Issue/Parameter-Sniffing-Problem-with-SQL-Server-Stored-Procedures.aspx
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/88ff51a4-bfea-404c-a828-d50d25fa0f59
SQL poor stored procedure execution plan performance - parameter sniffing

In this case my approach would be:
He had to abort the execution because the query was taking far longer than expected and never completed. My first suspicion would be a blocking session or an uncommitted transaction (started by another user that day) on a table you are querying. Under the default isolation level, a 'select' statement has to wait for locks held by an earlier transaction that has updated, inserted or deleted rows and not yet committed. Your query may have been waiting for such a transaction to complete. Check for blocking sessions, if any.
Next, check the state of the request running your query: 'running', 'runnable' or 'suspended'. In your case the query was probably suspended, i.e. waiting on some resource. Investigate which state the query is in and why.
The next thing is fragmentation. Best practice is to have an index rebuild/reorganize job configured in your environment to remove unnecessary fragmentation, so that your query has fewer pages to scan when returning data. Otherwise your query will take more and more time. Configure the job and run it at least once a week to keep your indexes and pages in good shape.
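
The blocking and request-state checks described above can be sketched with the dynamic management views available since SQL Server 2005. This is only an illustration of where to look, not a complete diagnostic script:

```sql
-- Requests that are currently waiting on another session (SQL Server 2005 and later).
SELECT r.session_id,
       r.status,               -- running, runnable or suspended
       r.blocking_session_id,  -- session holding the resource being waited on
       r.wait_type,
       r.wait_time,            -- milliseconds spent in the current wait
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```

Run it from a second connection while the slow script is executing; an empty result suggests the delay is not caused by blocking.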

Use EXPLAIN to analyze the four queries. That will tell you how the optimizer will be using indexes (or not using indexes).
Add queries to the script to SELECT NOW() in between the statements, so you can measure how long each query took. You can also have MySQL do the arithmetic for you, by storing NOW() into a session variable and then using TIMEDIFF() to calculate the difference between the start and finish of the statement.
SELECT NOW() INTO @start;
SELECT SLEEP(5); -- or whatever query you need to measure
SELECT TIMEDIFF(NOW(), @start);
As @Scott suggests in his comment, use the slow query log to measure the time for long-running queries.
Once you have identified the long-running query, use the query PROFILER while executing the query to see exactly where it's spending its time.

SP taking 15 minutes, but the same query when executed returns results in 1-2 minutes

So basically I have this relatively long stored procedure. The basic execution flow is that it uses SELECT INTO to put some data into temp tables (declared with the # sign) and then runs a cursor through these tables to generate a 'running total' in a third temp table, which is created using CREATE. This resulting temp table is then joined with other tables in the DB to generate the result after some grouping etc. The problem is that this SP had been running fine until now, returning results in 1-2 minutes, and now it suddenly takes 12-15 minutes. If I extract the query from the SP and execute it in Management Studio, manually setting the same parameters, it returns results in 1-2 minutes, but the SP takes very long. I tried to generate the actual execution plans of both the query and the SP, but couldn't because of the cursor. Any idea why the SP takes so long while the query doesn't?

This is the footprint of parameter-sniffing. See here for another discussion about it; SQL poor stored procedure execution plan performance - parameter sniffing
There are several possible fixes, including adding WITH RECOMPILE to your stored procedure which works about half the time.
The recommended fix for most situations (though it depends on the structure of your query and sproc) is to NOT use your parameters directly in your queries, but rather store them into local variables and then use those variables in your queries.

It's due to parameter sniffing. Declare a local variable, assign the incoming parameter value to it, and use the local variable throughout the procedure. Here is an example:
ALTER PROCEDURE [dbo].[Sp_GetAllCustomerRecords]
@customerId INT
AS
DECLARE @customerIdTemp INT
SET @customerIdTemp = @customerId
BEGIN
SELECT *
FROM Customers e
WHERE CustomerId = @customerIdTemp
END
Try this approach.

Try recompiling the sproc to ditch any stored query plan
exec sp_recompile 'YourSproc'
Then run your sproc, taking care to use sensible parameters.
Also compare the actual execution plans between the two methods of executing the query.
It might also be worth recomputing any statistics.

I'd also look into parameter sniffing. It could be that the proc needs to handle the parameters slightly differently.

I usually start troubleshooting issues like that by using
"print convert(varchar(23), getdate(), 121) + ' - step 1'" (getdate() has to be converted to a string before it can be concatenated). This helps me narrow down what's taking the most time. You can compare the output with a run from Query Analyzer and narrow down where the problem is.
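
A slightly fuller sketch of the same idea follows. It uses RAISERROR ... WITH NOWAIT instead of PRINT so that each message appears as soon as the step is reached rather than when the output buffer flushes. The step descriptions are placeholders, not taken from any procedure in this thread.

```sql
DECLARE @msg varchar(100);

SET @msg = CONVERT(varchar(23), GETDATE(), 121) + ' - step 1: load temp tables';
RAISERROR(@msg, 0, 1) WITH NOWAIT;  -- severity 0 is informational; NOWAIT sends it immediately

-- ... first block of the procedure goes here ...

SET @msg = CONVERT(varchar(23), GETDATE(), 121) + ' - step 2: run cursor';
RAISERROR(@msg, 0, 1) WITH NOWAIT;

-- ... second block of the procedure goes here ...

SET @msg = CONVERT(varchar(23), GETDATE(), 121) + ' - done';
RAISERROR(@msg, 0, 1) WITH NOWAIT;
```

The gap between consecutive timestamps shows which block is consuming the time.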

I would guess it could possibly be down to caching. If you run the stored procedure twice, is it faster the second time?
To investigate further, you could run both the stored procedure and the query version from Management Studio with the show query plan option turned on, then compare which area takes longer in the stored procedure than when run as a query.
Alternatively, you could post the stored procedure here for people to suggest optimizations.

For a start it doesn't sound like the SQL is going to perform too well anyway based on the use of a number of temp tables (could be held in memory, or persisted to tempdb - whatever SQL Server decides is best), and the use of cursors.
My suggestion would be to see if you can rewrite the sproc as a set-based query instead of a cursor-approach which will give better performance and be a lot easier to tune and optimise. Obviously I don't know exactly what your sproc does, to give an indication as to how easy/viable this is for you.
As to why the SP is taking longer than the query - difficult to say. Is there the same load on the system when you try each approach? If you run the query itself when there's a light load, it will be better than when you run the SP during a heavy load.
Also, to ensure the query truly is quicker than the SP, you need to rule out data/execution plan caching which makes a query faster for subsequent runs. You can clear the cache out using:
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
But only do this on a dev/test db server, not on production.
Then run the query, record the stats (e.g. from profiler). Clear the cache again. Run the SP and compare stats.
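
If Profiler is not convenient, the per-statement timing and I/O figures can also be collected in the session itself. A minimal sketch, assuming a hypothetical procedure dbo.YourSproc with a single hypothetical parameter:

```sql
SET STATISTICS TIME ON;  -- reports parse/compile and execution times in the Messages tab
SET STATISTICS IO ON;    -- reports logical and physical reads per table

EXEC dbo.YourSproc @SomeParam = 1;  -- hypothetical procedure and parameter

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Repeat the same wrapper around the extracted query and compare the two sets of figures.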

1) When you run the query for the first time, it may take more time. Another point: if you are using a correlated subquery with hardcoded values, it will be evaluated only once. When the values are not hardcoded and you run it through the procedure, deriving the value from the input parameter, it might take more time.
2) In rare cases it can be due to network traffic, in which case the query execution time will also be inconsistent for the same input data.

I too faced a problem where we had to create some temp tables and then manipulating them had to calculate some values based on rules and finally insert the calculated values in a third table. This all if put in single SP was taking around 20-25 min. So to optimize it further we broke the sp into 3 different sp's and the total time now taken was around 6-8 mins. Just identify the steps that are involved in the whole process and how to break them up in different sp's. Surely by using this approach the overall time taken by the entire process will reduce.

This is because of parameter sniffing. But how can you confirm it?
When optimizing an SP we normally look at its execution plan. In your case, however, you will see a well-performing plan in SSMS, because the SP is slow only when it is called from application code.
SQL Server caches separate plans for the same SP or function when the calling connections use different SET options, and ARITHABORT is one of them. So there is typically one plan for SSMS and a second one for external clients (ADO.NET).
ARITHABORT is ON by default in SSMS, while ADO.NET connections default to OFF. So if you want to see the plan your SP uses when it is called from code,
just turn the option off in SSMS and execute your SP. You should then see the SP take 12-13 minutes from SSMS as well.
SET ARITHABORT OFF
EXEC YourSpName
SET ARITHABORT ON
To solve this problem you need to get the cached query plan replaced.
There are a couple of ways to do that:
1. Update table statistics.
2. Recompile the SP.
3. SET ARITHABORT ON in the SP so it runs under the same setting as SSMS (this option is not recommended)
For more options please refer to this awesome article -
http://www.sommarskog.se/query-plan-mysteries.html
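
To see whether the procedure really has more than one cached plan, the plan cache can be queried directly. The sketch below assumes the procedure is dbo.YourSpName in the current database; the 4096 bit of the set_options attribute corresponds to ARITHABORT.

```sql
-- One row per cached plan for the procedure, with the SET options it was compiled under.
SELECT cp.plan_handle,
       cp.usecounts,
       CAST(pa.value AS int) AS set_options,
       CASE WHEN CAST(pa.value AS int) & 4096 = 4096
            THEN 'ON' ELSE 'OFF' END AS arithabort
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) AS pa
WHERE pa.attribute = 'set_options'
  AND st.dbid = DB_ID()
  AND st.objectid = OBJECT_ID('dbo.YourSpName');  -- hypothetical procedure name
```

Two rows with different arithabort values would indicate that SSMS and the application are using different plans.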

I would suggest the issue is related to the type of temp table (the # prefix). This temp table holds the data for that database session. When you run it through your app the temp table is deleted and recreated.
You might find when running in SSMS it keeps the session data and updates the table instead of creating it.
Hope that helps :)

MS SQL Server 2005 - Stored Procedure "Spontaneously Breaks"

A client has reported repeated instances of Very strange behaviour when executing a stored procedure.
They have code which runs off a cached transposition of a volatile dataset. A stored proc was written to reprocess the dataset on demand if:
1. The dataset had changed since the last reprocessing
2. The dataset has been unchanged for 5 minutes
(The second condition stops massive repeated recalculation during times of change.)
This worked fine for a couple of weeks, the SP was taking 1-2 seconds to complete the re-processing, and it only did it when required. Then...
The SP suddenly "stopped working" (it just kept running and never returned)
We changed the SP in a subtle way and it worked again
A few days later it stopped working again
Someone then said "we've seen this before, just recompile the SP"
With no change to the code we recompiled the SP, and it worked
A few days later it stopped working again
This has now repeated many, many times. The SP suddenly "stops working", never returning and the client times out. (We tried running it through management studio and cancelled the query after 15 minutes.)
Yet every time we recompile the SP, it suddenly works again.
I haven't yet tried WITH RECOMPILE on the appropriate EXEC statements, but I don't particularly want to do that anyway. It gets called hundreds of times an hour and normally does nothing (it only reprocesses the data a few times a day). If possible I want to avoid the overhead of recompiling what is a relatively complicated SP just to avoid something which "shouldn't" happen...
Has anyone experienced this before?
Does anyone have any suggestions on how to overcome it?
Cheers,
Dems.
EDIT:
The pseudo-code would be as follows:
read "a" from table_x
read "b" from table_x
If (a < b) return
BEGIN TRANSACTION
DELETE table_y
INSERT INTO table_y <3 selects unioned together>
UPDATE table_x
COMMIT TRANSACTION
The selects are "not pretty", but when executed in-line they execute in no time. Including when the SP refuses to complete. And the profiler shows it is the INSERT at which the SP "stalls"
There are no parameters to the SP, and sp_lock shows nothing blocking the process.

This is the footprint of parameter-sniffing. Yes, first step is to try RECOMPILE, though it doesn't always work the way that you want it to on 2005.
Update:
I would try a statement-level recompile on the INSERT anyway, as this might be a statistics problem (oh yeah, check that automatic statistics updating is on).
If this does not seem to fit parameter sniffing, then compare the actual query plan from when it works correctly with the one from when it runs forever (use the estimated plan if you cannot get the actual, though actual is better). You are looking to see whether the plan changes or not.

I totally agree with the parameter sniffing diagnosis. If you have input parameters to the SP which are varying (or even if they aren't varying) - be sure to mask them with a local variable and use the local variable in the SP.
You can also use WITH RECOMPILE if the data set is changing and the cached query plan is no longer any good.
In SQL Server 2008, you can use the OPTIMIZE FOR UNKNOWN feature.
Also, if your process involves populating a table and then using that table in another operation, I recommend breaking the process up into separate SPs and calling them individually WITH RECOMPILE. I think the plans generated at the outset of the process can sometimes be very poor (so poor as not to complete) when you populate a table and then use the results of that table to carry out an operation. Because at the time of the initial plan, the table was a lot different than after the initial insert.
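
For reference, the OPTIMIZE FOR UNKNOWN hint mentioned above looks like the following on SQL Server 2008 or later. It is only a sketch: the procedure, table, and column names are hypothetical.

```sql
-- Requires SQL Server 2008 or later. dbo.Orders and its columns are illustrative names.
CREATE PROC dbo.GetOrdersByCustomer
    @CustomerId int
AS
    SELECT o.OrderId, o.OrderDate
    FROM dbo.Orders AS o
    WHERE o.CustomerId = @CustomerId
    OPTION (OPTIMIZE FOR UNKNOWN);  -- plan is based on average density, not the sniffed value
GO
```

The effect is similar to masking the parameter with a local variable, but the intent is visible in the query itself.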

As others have said, something about the way the data or the source table statistics are changing is causing the cached query plan to go stale.
WITH RECOMPILE will probably be the quickest fix - use SET STATISTICS TIME ON to find out what the recompilation cost actually is before dismissing it out of hand.
If that's still not an acceptable solution, the best option is probably to try to refactor the insert statement.
You don't say whether you're using UNION or UNION ALL in your insert statement. I've seen INSERT INTO with UNION produce some bizarre query plans, particularly on pre-SP2 versions of SQL 2005.
Raj's suggestion of dropping and recreating the target table with SELECT INTO is one way to go.
You could also try selecting each of the three source queries into their own temporary table, then UNION those temp tables together in the insert.
Alternatively, you could try a combination of these suggestions - put the results of the union into a temporary table with SELECT INTO, then insert from that into the target table.
I've seen all of these approaches resolve performance problems in similar scenarios; testing will reveal which gives the best results with the data you have.
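
The combined approach can be sketched as follows. Every object and column name here (source_a, source_b, source_c, KeyCol, Amount) is a placeholder, and UNION ALL is an assumption, since the question does not say which form of UNION is used.

```sql
-- Stage the expensive union outside the transaction.
SELECT src.KeyCol, src.Amount
INTO #staging
FROM (
    SELECT KeyCol, Amount FROM dbo.source_a
    UNION ALL
    SELECT KeyCol, Amount FROM dbo.source_b
    UNION ALL
    SELECT KeyCol, Amount FROM dbo.source_c
) AS src;

-- Keep the transaction short: only the swap of the target table's contents.
BEGIN TRANSACTION;
DELETE FROM dbo.table_y;
INSERT INTO dbo.table_y (KeyCol, Amount)
SELECT KeyCol, Amount FROM #staging;
COMMIT TRANSACTION;

DROP TABLE #staging;
```

Because the insert now reads from a simple temp table, its plan no longer depends on how the optimizer handles the three unioned selects.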

Obviously changing the stored procedure (by recompiling) changes the circumstances that led to the lock.
Try to log the progress of your SP as described here or here.

I would agree with the answer given above in a comment, this sounds like an unclosed transaction, particularly if you are still able to run the select statement from query analyser.
Sounds very much like there is an open transaction with a pending delete for table_y and the insert can't happen at this point.
When your SP locks up, can you perform an insert into table_y?

Do you have an index maintenance job?
Are your statistics up to date? One way to tell is examine the estimated and actual query plans for large variations.
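
A quick way to check both points is to look at when each statistics object was last updated and whether automatic updates are enabled. A minimal sketch, using table_y from the question as the example table (the dbo schema is an assumption):

```sql
-- Last update time of every statistics object on the table.
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.table_y')
ORDER BY last_updated;

-- 1 = auto update statistics is on for the current database, 0 = off.
SELECT DATABASEPROPERTYEX(DB_NAME(), 'IsAutoUpdateStatistics') AS auto_update_stats;
```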

As others have said, this sounds very likely to be an uncommitted transaction.
My best guess:
You'll want to make sure that table_y can be deleted completely and quickly.
If there are other stored procedures or external pieces of code that ever hold transactions on this table, you may be waiting forever. (They may error out and never close the transaction)
Another note: try using truncate if possible. it uses fewer resources than a delete with no where clause:
truncate table table_y
Also, once an error happens within your OWN transaction, it will cause all following calls (every 5 minutes apparently) to "hang", unless you handle your error:
begin tran
begin try
-- do normal stuff
commit
end try
begin catch
-- undo the work if the transaction is still open
if @@trancount > 0 rollback
end catch
The very first error is what will give you information about the actual error. Seeing it hang in your own subsequent tests is just a secondary effect.

If you are doing these steps:
DELETE table_y
INSERT INTO table_y <3 selects unioned together>
You might want to try this instead
DROP TABLE table_y
SELECT * INTO table_y FROM (<3 selects unioned together>) AS src

Different Execution Plan for the same Stored Procedure

We have a query that is taking around 5 sec on our production system, but on our mirror system (as identical as possible to production) and dev systems it takes under 1 second.
We have checked the query plans and we can see that they differ. From these plans we can also see why one takes longer than the other. The data, schema and servers are similar and the stored procedures are identical.
We know how to fix it by re-arranging the joins and adding hints, However at the moment it would be easier if we didn't have to make any changes to the SProc (Paperwork). We have also tried a sp_recompile.
What could cause the difference between the two query plans?
System: SQL 2005 SP2 Enterprise on Win2k3 Enterprise
Update: Thanks for your responses, it turns out that it was statistics. See summary below.

Your statistics are most likely out of date. If your data is the same, recompute the statistics on both servers and recompile. You should then see identical query plans.
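
As a sketch of that sequence, assuming a hypothetical table dbo.YourTable and procedure dbo.YourSproc:

```sql
-- Rebuild the statistics from every row rather than a sample.
UPDATE STATISTICS dbo.YourTable WITH FULLSCAN;

-- Mark the procedure so that a new plan is compiled the next time it runs.
EXEC sp_recompile N'dbo.YourSproc';
```

Run the same two statements on each server before comparing the plans again.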
Also, double-check that your indexes are identical.

Most likely statistics.
Some thoughts:
Do you do maintenance on your non-prod systems? (e.g. rebuild indexes, which will rebuild statistics)
If so, do you use the same fillfactor and statistics sample ratio?
Do you restore the database regularly onto test so it's 100% like production?

Are the data and data size on your mirror and production as close to the same as possible?
You say you know why one query takes longer than the other. Can you post some more details?
Execution plans can be different in such cases because of the data in the tables and/or the statistics. Even in cases where auto update statistics is turned on, the statistics can get out of date (especially in very large tables)
You may find that the optimizer has estimated a table is not that large and opted for a table scan or something like that.

Provided there is no WITH RECOMPILE option on your proc, the execution plan will get cached after the first execution.
Here is a trivial example on how you can get the wrong query plan cached:
create proc spTest
@id int
as
select * from sysobjects where @id is null or id = @id
go
exec spTest null
-- As expected it's a clustered index scan
go
exec spTest 1
-- Oh no, it's still a clustered index scan
Try running your Sql in QA on the production server outside of the stored proc to determine if you have an issue with your statistics being out of date or mysterious indexes missing from production.

Tying in to the first answer, the problem may lie with SQL Server's Parameter Sniffing feature. It uses the first value that caused compilation to help create the execution plan. Usually this is good but if the value is not normal (or somehow strange), it can contribute to a bad plan. This would also explain the difference between production and testing.
Turning off parameter sniffing would require modifying the SProc which I understand is undesirable. However, after using sp_recompile, pass in parameters that you'd consider "normal" and it should recompile based off of these new parameters.
I think the parameter sniffing behavior is different between 2005 and 2008 so this may not work.

The solution was to recalculate the statistics. I overlooked that because we usually have scheduled tasks to do all of that, but for some reason the admins didn't put one on this server. Doh.
To summarize all the posts:
Check the setup is the same
Indexes
Table sizes
Restore Database
Execution Plan Caching
If the query runs the same outside the SProc, it's not the Execution Plan
sp_recompile if it is different
Parameter sniffing
Recompute Statistics

Stored Procedure Timing out.. Drop, then Create and it's up again?

I have a web-service that calls a stored procedure from a MS-SQL2005 DB. My Web-Service was timing out on a call to one of the stored procedures I have (this has been in production for a couple of months with no timeouts), so I tried running the query in Query Analyzer which also timed out. I decided to drop and recreate the stored procedure with no changes to the code and it started performing again..
Questions:
Would this typically be an error in the TSQL of my Stored Procedure?
-Or-
Has anyone seen this and found that it is caused by some problem with the compilation of the Stored Procedure?
Also, of course, any other insights on this are welcome as well.
Similar:
SQL poor stored procedure execution plan performance - parameter sniffing
Parameter Sniffing (or Spoofing) in SQL Server

Have you been updating your statistics on the database? This sounds like the original SP was using an out-of-date query plan. sp_recompile might have helped rather than dropping/recreating it.

There are a few things you can do to fix/diagnose this.
1) Update your statistics on a regular/daily basis. SQL Server generates (think: optimizes) query plans based on your statistics. If they get "stale", your stored procedure might not perform as well as it used to (especially as your database changes/grows).
2) Look at your stored procedure. Are you using temp tables? Do those temp tables have indexes on them? Most of the time you can find the culprit by looking at the stored procedure (or the tables it uses).
3) Analyze your procedure while it is "hanging" and take a look at your query plan. Are there any missing indexes that would help keep your procedure's query plan from going nuts? (Look for things like table scans and your other most expensive operations.)
It is like finding a name in a phone book, sure reading every name is quick if your phone book only consists of 20 or 30 names. Try doing that with a million names, it is not so fast.

This happened to me after moving a few stored procs from development into production. It didn't happen right away; it happened after the production data grew over a couple of months. We had been using functions to create columns, and in some cases there were several function calls for each row. As the data grew, so did the function call time. The original approach was good in a testing environment but failed under a heavy load. Check if there are any function calls in the proc.

I think the table the SP is trying to use is locked by some process. Use "exec sp_who" and "exec sp_lock" to find out what is going on to your tables.

If it was working quickly, is (with the passage of several months) no longer working quickly, and the code has not been changed, then it seems likely that the underlying data has changed.
My first guess would be data growth--so much data has been added over the past few months (or past few hours, ya never know) that the query is now bogging down.
Alternatively, as CodeByMoonlight implies, the data may have changed so much over time that the original query plan built for the procedure is no longer good (though this assumes that the query plan has not cleared and recompiled at all over a long period of time).
Similarly, index/database statistics may also be out of date. Do you have AUTO_UPDATE_STATISTICS on or off for the database?
Again, this might only help if nothing but the data has changed over time.

Parameter sniffing.
Answered a day or 3 ago: "strange SQL server report performance problem related with update statistics"
