I'm starting to use the Transient Fault Handling Block provided by Microsoft for SQL Azure, and I'm noticing that some of my functions wrapped in a transient-handling block call other functions that also use such a block.
I am guessing that the retry wait time will then compound?
What do you mean by "the retry wait time will compound"?
Each query against the SQL Database is executed with its own retry logic. If you have a method or function call that executes 10 queries, each of those 10 queries runs with its own retry policy.
If a transient error occurs while executing just one of the queries, the total wait time is simply that query's retry wait. If transient errors occur during the execution of more than one query, the total wait time is the sum of all the individual waits.
However, transient errors are what their name suggests: transient. It is very unlikely that you will hit a transient error on more than one query executed consecutively. But if it happens, then yes, the wait times add up. If you execute the queries in parallel or asynchronously, however, the wait times largely overlap rather than sum.
Finally, the retry policy for one query execution does not affect the retry policy for any other query execution.
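The behaviour described above can be sketched with a small back-off simulation. This is illustrative only: the function names, the `TransientError` class, and the delays are assumptions for the sketch, not the actual Transient Fault Handling Block API.

```python
import time

class TransientError(Exception):
    """Stand-in for a transient SQL Azure failure."""
    pass

def run_with_retry(execute_query, max_retries=3, base_delay=0.01):
    """Run one query under its own exponential back-off retry policy.
    Returns the total time spent sleeping between retries for this query."""
    waited = 0.0
    for attempt in range(max_retries + 1):
        try:
            execute_query()
            return waited
        except TransientError:
            if attempt == max_retries:
                raise
            delay = base_delay * (2 ** attempt)  # exponential back-off
            time.sleep(delay)
            waited += delay
    return waited

def make_flaky_query():
    """A query that fails once with a transient error, then succeeds."""
    calls = {"n": 0}
    def query():
        calls["n"] += 1
        if calls["n"] == 1:
            raise TransientError("simulated transient fault")
    return query

# Two queries executed sequentially: each retries independently,
# and the waits between retries add up across the sequence.
total_wait = run_with_retry(make_flaky_query()) + run_with_retry(make_flaky_query())
```

Executed in parallel instead of sequentially, the two waits would overlap in wall-clock time rather than adding up.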
I am developing some jobs with Talend and use tLogCatcher to record errors in a database. It seems to work for all jobs except one.
Here is how it works:
The first SQL component reads SQL statements from a database table, and for each of them, tMSSqlRow executes the statement.
But when the SQL fails (e.g. a delete that is impossible because of a referential-integrity constraint), the error is not caught by the tLogCatcher component.
How can I catch it?
tLogCatcher is not supposed to be used the way you did in your job (with OnComponentError/OnSubjobError triggers). It has to be the first component of an independent subjob (not linked by a trigger), which gets called whenever there is an error/warning/Java exception, depending on which types you check in the tLogCatcher settings.
If you want to keep the OnComponentError trigger, you can omit the tLogCatcher altogether and just do your error processing in the subjob triggered by OnComponentError.
Also, make sure you check the "Die on error" option in your tMSSqlRow component; otherwise no error is thrown, the job just prints an error message to the console and continues execution, and tLogCatcher is never invoked.
Edit:
Based on your requirement (continue job execution on error), a solution would be to encapsulate the processing that happens from tMSSqlRow onward (as well as the error-handling technique I suggested above) in a child job.
This child job defines a context parameter (for instance QUERY) and does the processing for a single query. The parent job calls this child job with an Iterate trigger and passes it each query to be processed via the QUERY context parameter (the global variable from your tFlowToIterate is mapped to the QUERY context parameter in the tRunJob parameters tab).
This way, if your query processing in the child job results in an error, it is handled inside the child job by the tLogCatcher, and the parent job isn't aware of this error, so it continues to the next query.
In my VB application I have a query that updates one column in a table.
But because the database's lock mode is set to
SET LOCK MODE TO NOT WAIT
running the update query sometimes produces errors like this:
SQL ERR: EIX000: (-144) ISAM error: key value locked
EIX000: (-245) Could not position within a file via an index. (informix.table1)
My question is: is it safe to execute
1st SET LOCK MODE TO WAIT;
2nd the update query;
3rd SET LOCK MODE TO NOT WAIT;
Or can you point me to another solution if this is not safe?
It is "safe" to do the three operations as suggested, but …
Your application may block for an indefinite time while the operation runs.
If you terminate the query somehow but don't reset the lock mode, other parts of your code may get hung on locks unexpectedly.
Consider whether a wait with timeout is appropriate.
Each thread, if there are threads, should have exclusive access to one connection for the duration of the three operations.
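To guard against the second risk above (terminating the query without resetting the lock mode), the three statements can be wrapped so the reset always runs. Here is a minimal Python sketch using a recording stub instead of a real Informix connection; the try/finally wrapper shape, not the driver, is the point, and the stub classes are invented for the demo.

```python
class _RecordingCursor:
    """Minimal stand-in for a DB-API cursor, just for demonstration."""
    def __init__(self):
        self.statements = []
    def execute(self, sql, params=()):
        self.statements.append(sql)
        if sql.startswith("UPDATE") and "boom" in sql:
            raise RuntimeError("simulated lock error")

class _RecordingConnection:
    def __init__(self):
        self._cursor = _RecordingCursor()
    def cursor(self):
        return self._cursor
    def commit(self):
        pass

def update_with_wait_lock_mode(conn, update_sql, params=()):
    """Switch to WAIT mode, run the update, and ALWAYS restore NOT WAIT."""
    cur = conn.cursor()
    cur.execute("SET LOCK MODE TO WAIT")  # or "SET LOCK MODE TO WAIT 10" for a bounded wait
    try:
        cur.execute(update_sql, params)
        conn.commit()
    finally:
        cur.execute("SET LOCK MODE TO NOT WAIT")

# Even when the update raises, NOT WAIT is restored:
conn = _RecordingConnection()
try:
    update_with_wait_lock_mode(conn, "UPDATE t SET x = 'boom'")
except RuntimeError:
    pass
restored = conn.cursor().statements[-1]
```

The `SET LOCK MODE TO WAIT n` variant addresses the third point (a wait with timeout) without risking an indefinite block.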
I am randomly getting an "Execution Timeout Expired" error using Entity Framework (EF6) when executing the update command below.
UPDATE [dbo].[EmployeeTable] SET [Name]=@0,[JoiningDate]=@1 WHERE
([EmpId]=@2)
The above update command is simple, and it normally takes 2-5 seconds to update EmployeeTable. But sometimes the same update query takes 40-50 seconds and fails with this error:
Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. The statement has been terminated.
To work around it, I changed the constructor of my MyApplicationContext class to set the following property:
this.Database.CommandTimeout = 180;
That setting should resolve my timeout issue, but I can't find the root cause.
As I understand it, this type of timeout can have three causes:
There's a deadlock somewhere
The database's statistics and/or query plan cache are incorrect
The query is too complex and needs to be tuned
Can you please tell me what the main root cause of that error is?
This query:
UPDATE [dbo].[EmployeeTable]
    SET [Name] = @0,
        [JoiningDate] = @1
    WHERE ([EmpId] = @2);
should not normally take 2 seconds. It is (presumably) updating a single row, and that is not a 2-second operation, even on a pretty large table. Do you have an index on EmployeeTable(EmpId)? If not, that would explain the 2 seconds as well as the potential for deadlock.
If you do have an index, then perhaps something else is going on. One place to look is triggers on the table.
If it's random, maybe something else is accessing the database while your application is updating rows.
We had the exact same issue. In our case, the root cause was BO (BusinessObjects) reports being run against the production database. These reports were locking rows and causing timeouts in our applications (randomly, since the reports were executed on demand).
Other things that you might want to check:
Do you have complex triggers on your table?
Are all the foreign keys used in your table indexed in the referenced tables?
I have a stored procedure in SQL Azure that is called every 5 minutes and processes crores (tens of millions) of rows, and it sometimes gives a timeout error, as below, per my log.
Timeout expired. The timeout period elapsed prior to completion of
the operation or the server is not responding.
How can I increase the timeout for this query, or for the whole DB? And what is the default timeout?
Update
I think the timeout here is not due to the connection to SQL Azure, as suggested in the answers from @Ming and @Ruchit, because when I checked the log, below the error it displays a message like:
Warning: Null value is eliminated by an aggregate or other SET
operation.
This means the query is actually being executed; the message above appears because I used an aggregate function over NULL values. Am I thinking correctly? What other possible causes should I consider?
Thanks in Advance.
According to http://social.msdn.microsoft.com/Forums/en-US/ssdsgetstarted/thread/7070c2f9-07d1-4935-82c2-b74a196908ee/, SQL Azure closes connections that have been idle for longer than 5 minutes. As you mentioned you're calling the stored procedure every 5 minutes, you may be right on the edge of that timeout. I suggest changing the interval to every 4 minutes to see whether it helps.
In addition, retrying is very important when using SQL Azure. When a query fails, close the connection, wait a few seconds, then create a new connection and try the query again. Usually the second attempt works fine.
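That close-wait-retry pattern can be sketched as follows. This is an illustrative Python sketch; the connection factory and query runner are hypothetical placeholders for your actual data-access code, and the demo objects exist only to show the control flow.

```python
import time

def run_query_with_one_retry(open_connection, run_query, wait_seconds=5):
    """Run a query; on failure, close the connection, wait a few
    seconds, open a fresh connection, and try exactly once more."""
    conn = open_connection()
    try:
        return run_query(conn)
    except Exception:
        conn.close()
        time.sleep(wait_seconds)   # give the transient condition time to clear
        conn = open_connection()   # fresh connection, not the possibly-dead one
        return run_query(conn)
    finally:
        conn.close()

# Demo with a flaky query that fails once, then succeeds:
attempts = []

class _DummyConn:
    def close(self):
        pass

def _flaky_query(conn):
    attempts.append(1)
    if len(attempts) == 1:
        raise RuntimeError("simulated transient failure")
    return "ok"

result = run_query_with_one_retry(_DummyConn, _flaky_query, wait_seconds=0)
```

A production version would retry only on errors known to be transient, rather than on every exception.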
Best Regards,
Ming Xu.
Ming Xu is right: the cause of the error is most probably the 5-minute idle timeout.
If you cannot change how often the stored procedure is called, one option is to make a dummy call to SQL Azure every 3 or 4 minutes. This keeps the connection from being closed as idle.
Story
I have a SPROC using snapshot isolation to perform several inserts via MERGE. This SPROC is called under very high load, often in parallel, so it occasionally throws error 3960, which indicates the snapshot transaction was rolled back because of an update conflict. This is expected given the high concurrency.
Problem
I've implemented a "retry" queue to perform this work again later, but I am having difficulty reproducing the error to verify that my checks are accurate.
Question
How can I reproduce a snapshot failure (3960, specifically) to verify my retry logic is working?
Already Tried
RAISERROR doesn't work, because it doesn't allow me to raise existing system errors, only user-defined ones
I've tried re-inserting the same record, but this doesn't throw the same failure, since it isn't two different transactions "racing" one another
Open two connections and start a snapshot transaction on both. On connection 1, update a record; on connection 2, update the same record (in the background, because it will block); then commit on connection 1.
Or treat a user error as a 3960 ...
Why not just do this:
RAISERROR(3960, {sev}, {state})
Replacing {sev} and {state} with the actual values that you see when the error occurs in production?
(Nope, as Martin pointed out, that doesn't work.)
If not that, then I would suggest running your test query multiple times simultaneously. I have done this myself to simulate other concurrency errors. It should be doable as long as the test query is not too fast (a couple of seconds at least).
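A harness for that "run it simultaneously" approach might look like the sketch below. Here `run_test_query` is a placeholder for whatever actually calls your stored procedure, and the stand-in query is invented for the demo; it deterministically fails on every other call so the collected errors can be inspected.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def hammer(run_test_query, workers=8, rounds=4):
    """Fire the same query from many threads at once and collect any
    exceptions, so you can check whether the expected error surfaced."""
    errors = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_test_query) for _ in range(workers * rounds)]
        for f in futures:
            try:
                f.result()
            except Exception as exc:
                errors.append(exc)
    return errors

# Demo with a stand-in query that "conflicts" on every second call:
_lock = threading.Lock()
calls = [0]

def fake_query():
    with _lock:
        calls[0] += 1
        n = calls[0]
    if n % 2 == 0:
        raise RuntimeError("simulated update conflict (3960)")

collected = hammer(fake_query, workers=4, rounds=2)
```

Against a real database, you would point `run_test_query` at the SPROC and then assert that the collected exceptions include SQL Server error number 3960.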