Execution Timeout Expired - Randomly for simple update command - sql

I am randomly getting an "Execution Timeout Expired" error using Entity Framework (EF6). When executing the update command below, it occasionally times out.
UPDATE [dbo].[EmployeeTable] SET [Name]=@0,[JoiningDate]=@1 WHERE
([EmpId]=@2)
The update command is simple, and it normally takes 2-5 seconds to update EmployeeTable. But sometimes the same update takes 40-50 seconds and fails with:
Execution Timeout Expired. The timeout period elapsed prior to
completion of the operation or the server is not responding. The
statement has been terminated.
To work around it, I changed the constructor of my MyApplicationContext class to set the following property:
this.Database.CommandTimeout = 180;
That setting should make the timeout go away, but I still can't find the root cause of the issue.
As far as I understand, this type of timeout can have three causes:
There's a deadlock somewhere
The database's statistics and/or query plan cache are incorrect
The query is too complex and needs to be tuned
Can you please tell me what the main root cause of that error is?

This query:
UPDATE [dbo].[EmployeeTable]
SET [Name] = @0,
[JoiningDate] = @1
WHERE ([EmpId] = @2);
should not normally take 2 seconds. It is (presumably) updating a single row, and that is not a 2-second operation even on a pretty large table. Do you have an index on EmployeeTable(EmpId)? If not, that would explain the 2 seconds as well as the potential for deadlock.
If you do have an index, then perhaps something else is going on. One place to look is for any triggers on the table.
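If the index turns out to be missing, something along these lines would add it (the index name here is just an example, not from the original post):
-- Hypothetical index name; the column is the one in the WHERE clause above.
-- A seek on EmpId lets the UPDATE locate the row quickly and hold its locks only briefly.
CREATE NONCLUSTERED INDEX IX_EmployeeTable_EmpId
    ON [dbo].[EmployeeTable] ([EmpId]);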

If it's random, maybe something else is accessing the database while your application is updating rows.
We had the exact same issue. In our case, the root cause was BO reports being run against the production database. These reports were locking rows and causing timeouts in our applications (randomly, since the reports were run on demand).
Other things that you might want to check:
Do you have complex triggers on your table?
Are all foreign key columns used in your table indexed in the referenced tables?
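If you suspect other sessions (reports, ad-hoc queries) are holding locks, a quick sketch against the standard SQL Server DMVs, run while the UPDATE is hanging, will show who is blocking whom:
-- Shows currently blocked requests, who is blocking them, and what they are running.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;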

Related

Understanding locks and query status in Snowflake (multiple updates to a single table)

While using the Python connector for Snowflake with queries of the form
UPDATE X.TABLEY SET STATUS = %(status)s, STATUS_DETAILS = %(status_details)s WHERE ID = %(entry_id)s
I sometimes get the following message:
(snowflake.connector.errors.ProgrammingError) 000625 (57014): Statement 'X' has locked table 'XX' in transaction 1588294931722 and this lock has not yet been released.
and soon after that
Your statement 'X' was aborted because the number of waiters for this lock exceeds the 20 statements limit
This usually happens when multiple queries try to update a single table. What I don't understand is that the query history in Snowflake says the query finished successfully (Succeeded status), but in reality the update never happened, because the table was not altered.
So, following https://community.snowflake.com/s/article/how-to-resolve-blocked-queries, I used
SELECT SYSTEM$ABORT_TRANSACTION(<transaction_id>);
to release the lock, but nothing changed: even with the Succeeded status, the query seems not to have executed at all. So my question is: how does this really work, and how can a lock be released without losing the execution of the query? Also, what happens to the 20+ queries queued behind the lock? Sometimes it seems that when the lock is released, the next one takes the lock and has to be aborted as well.
I would appreciate it if you could help me. Thanks!
Not sure if Sergio got an answer to this. The problem in this case is not with the table itself. Based on my experience with Snowflake, below is my understanding.
In Snowflake, every table operation also involves a change to the metadata table that keeps track of micro-partitions and their min/max values. This metadata table supports only 20 concurrent DML statements by default. So if a table is continuously being updated and the same partition keeps getting hit, there is a chance this limit will be exceeded. In that case, we should look at redesigning the update/insert logic. In one of our use cases, we had the limit increased to 50 after speaking to the Snowflake support team.
UPDATE, DELETE, and MERGE cannot run concurrently on a single table; they are serialized because only one statement can hold the table lock at a time. The others queue up in the "blocked" state until it is their turn to take the lock, and there is a limit on the number of queries that can wait on a single lock.
If you see an update finish successfully but don't see the updated data in the table, then you are most likely not COMMITting your transactions. Make sure you run COMMIT after an update so that the new data is committed to the table and the lock is released.
Alternatively, you can make sure AUTOCOMMIT is enabled so that DML will commit automatically after completion. You can enable it with ALTER SESSION SET AUTOCOMMIT=TRUE; in any sessions that are going to run an UPDATE.
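A minimal sketch of both patterns, reusing the table from the question (the literal values here are placeholders for illustration):
-- Option 1: let every DML statement commit (and release its lock) on completion.
ALTER SESSION SET AUTOCOMMIT = TRUE;
UPDATE X.TABLEY SET STATUS = 'DONE' WHERE ID = 42;

-- Option 2: manage the transaction explicitly and COMMIT promptly.
BEGIN;
UPDATE X.TABLEY SET STATUS = 'DONE' WHERE ID = 42;
COMMIT;  -- the change becomes visible and the table lock is released here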

Query Plan Recompiled suddenly and degrades performance

Scenario: We have a simple select query
DECLARE @P ...;
SELECT TOP (1) USERID
FROM table
WHERE non_clusteredindex_column = @P
ORDER BY PK_column DESC;
It has typically executed within 0.12 seconds for the past year. But yesterday, right after midnight, it suddenly started consuming all my CPU and taking 150 seconds to execute. I checked sp_who2 and found no deadlocks and nothing except this one query consuming all the CPU. I decided to reboot the server to get rid of any parameter sniffing issue or kill any stale connections. I took a SQL Profiler trace for 1 minute before restarting the server, for future root cause analysis. After the reboot, everything was back to normal. I was surprised, and out of curiosity I started comparing the execution plan captured in the trace with the current execution plan of the SAME query. I found that they are different.
The execution plan from before the problematic night is the same as the execution plan after the reboot (doing perfect index seeks).
But the execution plan captured in the SQL Profiler trace from the problematic night does a full index scan, which takes all the CPU and 150 seconds to execute.
Question:
It looks like the execution plan was suddenly recompiled, or the query started using a new plan (full scan), after midnight yesterday; after I rebooted, it went back to using the old, good plan (index seek).
Q1. What made SQL Server use a new execution plan all of a sudden?
Q2. What made SQL Server use the old, good execution plan after the reboot?
Q3. Is anything related to parameter sniffing going on, since I am passing a parameter? Technically it shouldn't be, as the parameter column is well organized with evenly distributed data.
It sounds like you are having a parameter sniffing issue. I can't see your data, but we often found these crop up even in simple query scenarios: either many rows match the parameter value and the plan flips to a scan when it shouldn't, or there is some other quirk in the data, such as a column whose values are mostly unique but that ends up with a 0 in a large portion of the table, throwing everything for a loop. If the query is slow when run from code but a test execution of the procedure from SSMS is fast, that is a pretty big red flag that something along these lines is your issue.
You are correct that a SQL restart flushes the entire plan cache, and you can also flush all plans manually, but you absolutely do not want to use that method to fix the plan of a single procedure. A quick fix is to execute EXEC sp_recompile 'dbo.procname'; to flush just that one procedure's execution plan and have a new one built. Redoing all your plans, especially in a busy database, can cause significant performance problems for other procedures, and a restart of course means downtime. This only temporarily fixes the problem when it crops up, though; if you have identified a parameter that causes issues, I would consider adding an OPTIMIZE FOR UNKNOWN hint, which is specifically designed for parameter sniffing issues that have been identified. Also make sure regular index maintenance is happening in your environment, in case that is what is producing bad plans rather than the SQL engine.
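As a sketch, the hint applied to the query shape from the question (keeping the asker's placeholder names) would look like this:
-- OPTIMIZE FOR UNKNOWN builds the plan from average density statistics
-- instead of the sniffed value of @P, which keeps the plan stable.
SELECT TOP (1) USERID
FROM [table]
WHERE non_clusteredindex_column = @P
ORDER BY PK_column DESC
OPTION (OPTIMIZE FOR UNKNOWN);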
In your case, you can do the following:
-- Activate the Query Store option in your database settings: set Operation Mode to On.
-- This will start capturing the query plan for each request.
-- You can then track the queries that consume a lot of resources.
-- Finally, you can force an execution plan to be used for this query.
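A rough T-SQL version of those steps, assuming SQL Server 2016 or later (the database name, query_id, and plan_id are placeholders you would read from the Query Store views or reports):
-- Turn Query Store on so plans and runtime statistics are captured.
ALTER DATABASE [YourDb] SET QUERY_STORE = ON;
ALTER DATABASE [YourDb] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);

-- Once the good (seek) plan has been captured, pin it to the query.
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;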

TimeOut on Create Unique Index Concurrently

I am trying to create a unique index on a table on my production box. For that I:
Have created a table TABLE1 with COLUMN1
Ran the following command on my production DB (which is Postgres):
Create UNIQUE INDEX CONCURRENTLY idx_id_unique on TABLE1(COLUMN1)
This throws an error (after some time):
2014-10-07 20:46:49.056 EDT ERROR: cancelling statement due to statement timeout
2014-10-07 20:46:49.056 EDT STATEMENT: Create UNIQUE INDEX CONCURRENTLY idx_id_unique on TABLE1(COLUMN1)
My Question is:
What could be the probable reasons for this timeout failure?
Note: As this is a production DB server, we have thousands of queries/transactions running concurrently, so this CREATE INDEX ... will require a significant amount of time. But still, will this query throw a timeout exception?
Does Postgres throw a statement timeout error for CREATE UNIQUE INDEX CONCURRENTLY, given that this query can take multiple hours to finish on large tables?
What could be the probable workaround for this?
What could be the probable reasons for this timeout failure? Note: As this is a production DB server, we have thousands of queries/transactions running concurrently, so this CREATE INDEX ... will require a significant amount of time. But still, will this query throw a timeout exception?
Well, it tells you: cancelling statement due to statement timeout. So, yes.
Does Postgres throw a statement timeout error for CREATE UNIQUE INDEX CONCURRENTLY, given that this query can take multiple hours to finish on large tables?
Yes, it just told you so.
What could be the probable workaround for this?
Increase the statement timeout limit for the session that's creating the index?
EDIT: added links to the manuals
Details on statement_timeout are in the manual: http://www.postgresql.org/docs/current/static/runtime-config-client.html#GUC-STATEMENT-TIMEOUT
Likewise, how to set a configuration setting: http://www.postgresql.org/docs/current/static/sql-set.html
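A minimal sketch of the workaround, assuming you can issue plain SQL in the session that builds the index:
SET statement_timeout = 0;  -- 0 disables the timeout for this session only
CREATE UNIQUE INDEX CONCURRENTLY idx_id_unique ON TABLE1 (COLUMN1);
RESET statement_timeout;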
I know nothing of this TransactionTemplate thing you are using, but google suggests it has documentation. Presumably there is some way to issue raw SQL through it if nothing else.
As an aside, you will find your development easier if you:
Carefully read provided error messages.
Have at least some familiarity with the manuals for the software you are using.
Can search for #1 against #2 via Google or the built-in search.

How can I force a Snapshot Isolation failure of 3960

Story
I have a SPROC using Snapshot Isolation to perform several inserts via MERGE. This SPROC is called under very high load, often in parallel, so it occasionally throws Error 3960, which indicates the snapshot was rolled back because of change conflicts. This is expected given the high concurrency.
Problem
I've implemented a "retry" queue to perform the work again later, but I am having difficulty reproducing the error to verify that my checks are accurate.
Question
How can I reproduce a snapshot failure (3960, specifically) to verify my retry logic is working?
Already Tried
RAISERROR doesn't work because it doesn't allow me to raise existing system errors, only user-defined ones
I've tried re-inserting the same record, but this doesn't throw the same failure since it isn't two different transactions "racing" one another
Open two connections and start a snapshot transaction on both. On connection 1, update a record; on connection 2, update the same record (in the background, because it will block); then commit on connection 1.
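A sketch of that sequence in T-SQL (the table and column names are invented for illustration, and it assumes snapshot isolation is already allowed on the database):
-- Session 1:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
UPDATE dbo.Widgets SET Qty = Qty + 1 WHERE Id = 1;
-- ...leave this transaction open for now.

-- Session 2 (a separate connection):
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
UPDATE dbo.Widgets SET Qty = Qty + 1 WHERE Id = 1;  -- blocks behind session 1

-- Back in Session 1:
COMMIT;  -- session 2 now fails with error 3960 (update conflict) and is rolled back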
Or treat a user error as a 3960 ...
Why not just do this:
RAISERROR(3960, {sev}, {state})
Replacing {sev} and {state} with the actual values that you see when the error occurs in production?
(Nope, as Martin pointed out, that doesn't work.)
If not that, then I would suggest running your test query multiple times simultaneously. I have done this myself to simulate other concurrency errors. It should be doable as long as the test query is not too fast (a couple of seconds at least).

Master database DB STARTUP problem

I have a SQL Server 2008 database and I have a problem with this database that I don't understand.
The steps that caused the problems are:
I ran a SQL query to update a table called authors from another table called authorAff.
The authors table has 123,385,300 records and the authorsAff table has 139,036,077.
The query ran for about 7 days but didn't finish.
I decided to cancel the query and do it another way.
The connection I was running the query on suddenly disconnected, so the database went into recovery until the query was cancelled.
The server was shut down many times afterwards because of electricity problems.
The database took about two days and then recovered.
Now when I run this query
SELECT TOP 1000 *
FROM AUTHORS WITH(READUNCOMMITTED)
It executes and returns results, but when I remove the WITH(READUNCOMMITTED) hint it gets blocked by a process running on the master database that appears only in Activity Monitor, with the command [DB STARTUP], and no results show up.
So what is the DB STARTUP command, and if it's a problem, how can I solve it?
Thank you in advance.
I suspect that your user database is still trying to roll back the transaction you canceled. A general rule of thumb is that it will take about the same amount of time, or more, for an aborted transaction to roll back as it took to run.
The rollback can't be avoided, even with the SQL Server stops and starts you had.
The reason you can run a query WITH(READUNCOMMITTED) is that it ignores the locks held by the transaction that is rolling back. Your query results are considered unreliable, but ironically they are probably what you want to see, since the blocking process is a rollback.
The best solution is to wait it out, if you can afford to do so. You may be able to find ways to kill the blocking process, but then you should be concerned with database integrity.
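If you want a rough idea of how far along the rollback is, a sketch against sys.dm_exec_requests (the command values listed are the ones a rolling-back/recovering session typically reports):
-- percent_complete and estimated_completion_time (ms) are reported for rollback/recovery work.
SELECT session_id, command, status, percent_complete, estimated_completion_time
FROM sys.dm_exec_requests
WHERE command IN ('DB STARTUP', 'KILLED/ROLLBACK');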