Intermittent SqlException "timeout expired" errors - SQL Server 2005

We have an app with around 200-400 users, and once a day or every other day we get the dreaded SqlException:
"Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding".
Once we get this, it happens several times for different users, and then all users are stuck; they can't perform any operations.
I don't have the full specs of the boxes right in front of me, but we have:
IIS and SQL Server running on separate boxes
each box has 64 GB of memory with multiple cores
We get nothing in the SQL Server logs (as would be expected), and our application catches the SqlException, so we just see the timeout error there - on an UPDATE. The database has only a few key tables, and the timeout happens on one of the tables with about 30k rows. We have run Profiler on these queries, driven from the UI against a copy of production at the same size, and made sure we have all of the right indexes (clustered/non-clustered). In a local environment (smaller box, same size database) everything runs fast, and for the users the system runs fast most of the day. The exact same query that hit the timeout error in production ran in less than a second.
We did change our command timeout from 30 seconds to 300 seconds (I know that 0 is unlimited and I guess we should use that, but it seems like that's just masking the real problem).
We had the profiler running in production, but unfortunately it wasn't fully enabled the last time it happened. We are setting it up correctly now.
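In the meantime, this is the kind of quick check we can run when the timeouts start, to see what is blocking what (a sketch; sys.dm_exec_requests and sys.dm_exec_sql_text are standard DMVs available from SQL Server 2005 on):

-- Sketch: capture whatever is blocked or long-running the moment timeouts start.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       r.status,
       t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id > 50;  -- skip system sessions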
Any ideas on what this might be?

Related

Simple UPDATE taking too long at first, then speeding up

In what situation would a simple update statement
UPDATE [BasicUserTable]
SET [DateTimeCol] = '9/6/2022'
WHERE [UniqueIntPKCol] = 123
take 1m 30s to complete, AND THEN all subsequent updates using the same statement and lines of code (except for the id and datetime) execute in < 100 ms?
The table has fewer than 10,000 records and a standard auto-incrementing int primary key.
Background: our app was timing out (standard 30 sec timeout) while it waited for SQL Server to execute the statement above. We manually tried the statement using SSMS on the same server, and it took ~1m 30s to execute.
Immediately afterward, all other attempts to run the same code were blazing fast as expected. We can't walk past this issue without knowing the real reason that it happened, so we can prevent it in the future.
After looking at the logs, there were no apparent blocking locks on the records, nor any code that could intervene and cause the issue.
The SQL logs did not have any errors.
Microsoft.EntityFrameworkCore.DbUpdateException
Inner exception: Microsoft.Data.SqlClient.SqlException: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Has anyone run into this before, or do you have a plausible working theory? (index rebuild, caching, etc.)
A lock wait is the only thing I can imagine that would cause this.
"After looking at logs, there were no apparent blocking locks"
Lock waits don't cause any logging. You might see logs if you configure the blocked process report, but it's not on by default.
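In case it helps, enabling it is roughly this (a sketch; the threshold is in seconds, and the report then shows up as the "Blocked process report" event in a Profiler/server-side trace or an Extended Events session):

-- Sketch: turn on the blocked process report, here for blocks held longer than 5 seconds.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold', 5;
RECONFIGURE;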
Turning on the Query Store can help by tracking query resource utilization and waits.
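A sketch of what that could look like, assuming SQL Server 2016 or later (the wait-stats view needs 2017+) and a placeholder database name:

-- Sketch: enable Query Store, then after the next slow UPDATE look at what it waited on.
ALTER DATABASE [YourDb] SET QUERY_STORE = ON;

SELECT ws.wait_category_desc,
       ws.total_query_wait_time_ms,
       qt.query_sql_text
FROM sys.query_store_wait_stats AS ws
JOIN sys.query_store_plan AS p ON p.plan_id = ws.plan_id
JOIN sys.query_store_query AS q ON q.query_id = p.query_id
JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
ORDER BY ws.total_query_wait_time_ms DESC;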
Although extremely unlikely here, file growth can also cause sporadic delays, as the statement that needs the additional log file or data file space has to wait for the file to be resized.
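A quick way to check how the files are set to grow (a sketch, run inside the database in question; a large percentage growth on a big file can mean a long pause while the file is resized):

SELECT name,
       type_desc,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + ' %'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth_setting
FROM sys.database_files;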

Random delays on opening connection to SQL Server

In one of our systems we experience random delays when opening a connection to SQL Server.
The system is running Windows Server 2012 R2 Standard and SQL Server 2012, located on the same physical machine as our application.
Even when our application is idle, it is executing DB operations once every few seconds on average.
DB operations our application executes usually consist of 3 steps:
open a connection to SQL server
run a stored procedure
close the connection
Normally the first step takes a tiny fraction of a second, while running a stored procedure may take much longer, depending on many factors.
The problem: opening a connection may randomly take 5-13 seconds. This only happens rarely - once every few hours, or even once a day.
In other words, this could happen once in a few thousand DB operations. We have not detected any discernible pattern in the timing of these delays.
There is nothing suspicious in the SQL Server log files.
Running SQL Server profiler does not seem practical, as the fault may not be exhibited for 10-20 hours.
We have not seen this phenomenon on any other machine.
It looks like we've fixed the problem. Somewhere I read a recommendation to try SQL Server authentication instead of Windows authentication. The problem discussed there was not exactly the same as ours, but somewhat similar. Since the connection string is used in every open-connection operation, I decided to give it a try. As a result, our application has now been working for 3 days in a row without a single incident of a slow connection open. To put this in context: before the fix we averaged several incidents per 24 hours, and had not had a single incident-free 24-hour period in the last two months.
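For reference, the change amounts to swapping Integrated Security for explicit credentials in the connection string; the server, database and user names below are placeholders:

Before (Windows authentication):
Server=DBSERVER;Database=AppDb;Integrated Security=SSPI;

After (SQL Server authentication):
Server=DBSERVER;Database=AppDb;User Id=app_user;Password=<secret>;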

Timeout expired SQL Server 2008

I have a SQL Server database in production, and it has been live for 2 months. A while ago, the web application associated with it started taking too long to load, and sometimes it says a timeout occurred.
I found a quick fix: running 'exec sp_updatestats' fixes the problem, but I need to run it constantly (every 5 minutes).
So I created a Windows service with a timer and started it on the server. My question is: what are the root causes and possible permanent solutions? Anyone?
Here is the most expensive query from Activity Monitor:
WAITFOR(RECEIVE TOP (1) message_type_name, conversation_handle, cast(message_body AS XML) as message_body from [SqlQueryNotificationService-2eea594b-f994-43be-a5ed-d9a47837a391]), TIMEOUT #p2;
To diagnose a poorly performing query you need to:
Identify the poorly performing query, e.g. via application logging, a SQL Profiler trace filtered to show only queries with a duration longer than a certain threshold, etc.
Get an execution plan for the query
At that point you can start to try to figure out what the performance issue is.
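For example, here is a sketch of pulling candidates out of the plan cache instead of (or in addition to) a Profiler trace; sys.dm_exec_query_stats is available on SQL Server 2008, and the SUBSTRING arithmetic is the usual offset boilerplate:

-- Sketch: top statements by total elapsed time since their plans were cached.
SELECT TOP (10)
       qs.total_elapsed_time / 1000 AS total_elapsed_ms,
       qs.execution_count,
       qs.total_logical_reads,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                        WHEN -1 THEN DATALENGTH(st.text)
                        ELSE qs.statement_end_offset END
                   - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;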
Given that exec sp_updatestats fixes your issue, it could be that statistics on certain tables are out of date (this is a pretty common cause of performance issues). If that's the case then you might be able to tweak your statistics, or at least rebuild only those statistics that are causing issues.
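A sketch of what that could look like for one suspect table (dbo.YourTable is a placeholder):

-- Sketch: see how stale the statistics on a suspect table are, then update just that table.
SELECT s.name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.YourTable');

UPDATE STATISTICS dbo.YourTable WITH FULLSCAN;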
It's worth noting that updating statistics will also cause cached execution plans to become invalid, so it's plausible that your issue is unrelated to statistics - you need to collect more information about the poorly performing queries before looking at solutions.
Your most expensive query looks like it's waiting for a message, i.e. it's in your list of long-running queries because it's designed to run for a long time, not because it's expensive.
Thanks everyone, I found a solution for my issue. It's quite different: I had enabled the SQL dependency module on my SQL Server by turning ENABLE_BROKER on, and that is what was causing the timeout query, so by setting it back off everything is working fine now.
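For reference, turning the broker off looks roughly like this (the database name is a placeholder; note this also disables SqlDependency query notifications):

ALTER DATABASE [YourDb] SET DISABLE_BROKER WITH ROLLBACK IMMEDIATE;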

How to forcefully stop long postgres query under heavy load?

I am working on a Rails app with Postgres on Ubuntu. Unfortunately for me, this legacy app uses some heavyweight stored procedures in the db. What's more, the db is quite large (5GB) and my computer is not particularly fast. Every now and then, if I pass some bad parameters from my code to the db, my computer becomes super slow to the degree that I cannot get to the console and kill the postgres process. I assume, this is due to some very expensive db query. My only solution is to hard reset my laptop. So my question is - is there a way to forcefully kill a long-taking query? Or perhaps, is there a way to limit the CPU and RAM the db is allowed to use, so that I still have some resources left to go and manually restart postgres?
You can set a maximum time for statements to take with the statement_timeout configuration option:
Abort any statement that takes more than the specified number of milliseconds, starting from the time the command arrives at the server from the client. If log_min_error_statement is set to ERROR or lower, the statement that timed out will also be logged. A value of zero (the default) turns this off.
You can set this option in a variety of ways: in postgresql.conf for everyone, per session with the SET command, or even per database or per role. More information on setting options is in the documentation.
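A few of the ways it can be set (a sketch; the database and role names are placeholders, and the values are just examples):

-- For the current session only:
SET statement_timeout = '30s';

-- For every connection to one database:
ALTER DATABASE mydb SET statement_timeout = '60s';

-- For one role (e.g. the Rails app's role):
ALTER ROLE rails_app SET statement_timeout = '60s';

-- Or in postgresql.conf (a bare number is interpreted as milliseconds):
-- statement_timeout = 60000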

What would cause SQL Server to stop writing to the error log?

The error logs for our SQL Server instance gather a large amount of data (250k records in a month) all day, then all of a sudden stop at roughly the same time of day (9:15 pm), though on different days of the week and at seemingly random intervals.
This corresponds with other issues on the server: 1) jobs that move files to shares on the database server fail, and 2) I am not able to access the server via any method (I tried RDP and SSMS). Once the servers are rebooted, SQL Server comes up and SQL Server error logging resumes.
Windows Event Viewer doesn't show any notable error messages for System (the other event logs have wrapped already).
The error logs are being written to the D:\ drive, which has over 100GB free currently. The error log files are in the range of tens of megabytes.
I'd appreciate any ideas on what might have caused this or how to troubleshoot it. Thanks!
The cause appears to have been a corrupted maintenance plan. I discovered this by correlating the timing of the lock-up to the times the maintenance plan was running. The lack of logging made this difficult to confirm. I'm guessing that at least some parts of it ran normally but got rolled back on restart.
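For anyone hitting something similar, one way to do that correlation is against the SQL Agent job history, since maintenance plans run as Agent jobs (a sketch; the job-name filter is a placeholder):

-- Sketch: recent run times for maintenance-plan jobs, to line up against when the error log went quiet.
SELECT j.name,
       h.run_date,
       h.run_time,
       h.run_duration,
       h.run_status
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
WHERE h.step_id = 0              -- job outcome rows only
  AND j.name LIKE '%Maintenance%'
ORDER BY h.run_date DESC, h.run_time DESC;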
The current fix was to disable the maintenance plan and replace it with a collection of jobs that do the same tasks. I will likely recreate the original maintenance plan if the server remains stable for another week or two. If we stay stable past that point, it should solidly confirm the maintenance plan as the source of the problem.