Error 40 and SqlAzureExecutionStrategy - azure-sql-database

I have a service fabric service (guest executable), using entityframework core, talking to sql azure.
From time to time I see the following error:
A network-related or instance-specific error occurred while establishing a connection
to SQL Server. The server was not found or was not accessible. Verify that the
instance name is correct and that SQL Server is configured to allow remote connections.
(provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)
It seems transient as there are numerous database transactions that occur without errors. This seems to occur more when a node is busy.
I've added code in start up to set EnableRetryOnFailure to set the SqlServerRetryingExecutionStrategy:
services.AddEntityFrameworkSqlServer()
.AddDbContext<MyDbContext>(options =>
options.UseSqlServer(_configuration.GetConnectionString("MyDbConnection"),
o => o.EnableRetryOnFailure()))
One major caveat, is at the moment I'm losing context so I don't know what data was attempting to be updated/inserted, so I don't know if it was eventually successful or not.
Couple of questions:
From the Transient Detection Code it doesn't look like error: 40 is caught, but my understanding is that error 40 may actually be another error (unclear). Is that correct?
Is this really a transient issue or does it mean I have another problem?
Without additional logging (working on it), do we know if the retry strategy logs the error, but still retry's and in fact may have been successful?
If this is a transient error, but it's not caught in the default execution strategy, why not? and what would be unintentded consequences of sub classing the SqlAzureExecutionStrategy to include this error.
I've seen this question: Sql Connection Error 40 Under Load, and it feels familiar, but he seems to have resolved it by tuning his database - which I will look at doing, I'm trying to make my code more resilient in the case of database issues.

There is a certain version of EF Core that caches the query or requests if the time span between two database transactions is very small, so update your packages to make sure you are using the most recent.
Query: Threading issues cause NullReferenceException in SimpleNullableDependentKeyValueFactory #5456
check these other links
https://github.com/aspnet/EntityFramework/issues/5456
https://github.com/aspnet/Security/issues/739
https://github.com/aspnet/EntityFramework/issues/6347
https://github.com/aspnet/EntityFramework/issues/4120

Related

Unhandled Exception Error - Login Failed for User

We have a strange error here. In our ASP.NET 4.6 app, using Entity Framework 6.2, we are getting "Login failed for user" when accessing the SQL Azure database. I'm pretty sure the cause of the error is switching tiers in Azure. What I don't get is why the error isn't caught. Every SQL operation we have is inside a try...catch block. The errors fall out of the block and get caught by Globals.asax just before the app crashes.
We have
SetExecutionStrategy("System.Data.SqlClient", Function() New SqlServer.SqlAzureExecutionStrategy(10, TimeSpan.FromSeconds(7)))
which,as I understand it, will retry any SQL execution 10 times for at least 70 seconds from the first error. According to the Microsoft tech support, this isn't engaged because it hasn't made the connection to SQL Azure yet. The ConnectRetryCount and interval in the connection string do not apply since it is talking to the server. The server is just saying, "I know you are there, but I'm not going to let you in!"
According to MS Tech support, the only way around this is to have a try...catch block around all of our SQL commands... which we do! It just falls through and crashes the app!
I can't do a retry in globals.asax because at that point, it is already crashed.
According to MS, there is no way to trap the error in the context and retry from there. So, what's the solution? There must be some answer other than, "just let the app crash and have them refresh the page!"
When the page is refreshed seconds later, all is fine. No errors, no problems.
Example of one of the lines of code throwing the error:
MapTo = ctx.BrowserMaps.FirstOrDefault(Function(x) code.Contains(x.NameOrUserAgent))
It's really very straight forward. this one just happens to come up a lot because this code block is called frequently. The actual SQL request is irrelevant because no matter what line is used, the connection, within EF, fails.
Server logins will be disconnected while scaling up/down to a new tier, and transactions are rolled back. However, contained database logins stay connected during the scaling process, and for that reason they are recommended over server logins.
Having a try and catch may not solve the issue because you may be capturing error # zero and a lot of errors in Azure SQL database fall on that error 0 category.
Just a comment, performance after scaling may be poor right after scaling and improves after a few minutes. Query plans may also change.

SQL Azure: Frequent error

We are having trouble with Sql Azure. We deploy a ancient .Net 4.0/C# based website on Azure. It is all good, except that every here and there we start getting Timeout or anything related to database start giving error. Couple of time it stop insert, DataSet.Fill function fails with Timeout or "Column not found".
As I understand it happens when SQL Azure is probably is busy to respond to our request and break connection in between, probably restarting or switching the node. However it looks bad on end client, when such error appears. Our Database is less than 1 GB in size and we got 10-15 at a time. So I don't think we have heavy load [as Azure is ready for 150 GB Database ours is less than 1% of it].
Anyone suggest what we should do to avoid such error, can we detect such upcoming error?
You will need to use Transient Fauly handling, The Transient Fault Handling Application Block is good place to start. When refactoring with legacy code for Sql Azure, I found it easier to use the SqlConnection and SqlCommand Extension methods:
Example of SqlConnection extions:
var retryPolicy = RetryPolicyFactory.GetDefaultSqlConnectionRetryPolicy()
using(var connection = new SqlConnection(connectingString))
{
connection.OpenWithRetry(retryPolicy);
///Do usual stuff with connection
}

Async=true and Entity Framework

Background WCF Stack, Data Access Implemented in Entity Framework, Simple ASP.NET Front End
This is a two part question.
Recently we ran into an issue with periodic crashes with an exception that read:
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available
We had been running our application without issues for over a week, and then all the sudden we were hit with this random crash/ If I had to guess I would say it was network related, but we were unable to determine the exact source. Has anyone periodically gotten this message? If so what was the root cause?
Second question is someone suggested to set "async=true" in our Entity Framework connection string. I was under the impression this just enables the async api. Does this do anything when you are using EF? Does switching this flag do anything with the queries that get generated by EF?
To be that guy I will answer this one on my own.
First I posted the question about the "async=true"s effect on entity framrwork to MS and no one answered ... as usual(if they answer i will update this post).
Our issue:
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available
Was environment related. Something was causing the DB to run a little bit slower, but it was hinting to a larger issue. Apparently EF has horrible issues when you share context between threads (not an easy problem to solve), so we were seeing a race condition with opening connections.
We basically had a "read only context" that only did gets. Our issue was two threads attempt to open the connection at the same time, one wins, the other loses resulting in some variation of the exception below:
The connection was not closed. The connection's current state is connecting.
Our solution was to convert our singleton to be thread specific. Not exactly what we wanted, but it worked, and when we pushed this fix our other issue magically went away.
The second half to this question was what does async=true do. When it comes to EF, it made our system crash. We had a block of code that did a join, and if async=true and MARS=false we got a:
There is already an open DataReader associated with this Command which must be closed first
Once we cut back on MARS, and disabled async things were good again.

SQL Server - Timed Out Exception

We are facing the SQL Timed out issue and I found that the Error event ID is either Event 5586 or 3355 (Unable to connect / Network Issue), also could see few other DB related error event ids (3351 & 3760 - Permission issues) reported at different times.
what could be the reason? any help would be appreciated..
Can you elaborate a little? When is this happening? Can you reproduce the behavior or is it sporadic?
It appears SharePoint is involved. Is it possible there is high demand for a large file?
You should check for blocking/locking that might be preventing your query from completing. Also, if you have lots of computed/calculated columns (or just LOTS of data), your query make take a long time to compute.
Finally, if you can't find something blocking your result or optimize your query, it's possible to increase the timeout duration (set it to "0" for no timeout). Do this in Enterprise Manager under the server or database settings.
Troubleshooting Kerberos Errors. It never fails.
Are some of your webapps running under either the Local Service or Network Service account? If so, if your databases are not on the same machine (i.e. SharePoint is on machine A and SQL on machine B), authentication will fail for some tasks (i.e. timerjob related actions etc.) but not all. For instance it seems content databases are still accessible (weird, i know, but i've seen it happen....).

ODBC SQL Server driver error

I have a VB6 app that access's a database thru a ODBC Connection. It will run fine for a few hours then I get the following Error. Any Ideas?
[Microsoft][ODBC SQL Server Driver][DBNETLIB]ConnectionWrite(WrapperWrite())
From Googling the error, it sounds like that's just ADO's way of saying it can't connect - that the server is unreachable. Are there are any other services on that server or that use the database that become unavailable at the same time as this error? It sounds like the client is just losing its connection, so I'd look for anything around that - dropped network connectivity, or a downed/overwhelmed server, to name a few examples.
Does your program have to reach across a network to get to the Access file?
If so I'd look into any intermittent network connectivity issues, especially if your program is always connected to the data source.
Check any logs you can to see what's happening on your network at the time of the error.
If possible change your app to connect to the data source only when you need to access it and then disconnect when done.
Is there more than one instance of the program running on the same and/or different machines? If so, do they all get the error at the same time?
If possible try to have more than one instance of you program running on the same machine and see if they all get the error at the same time.
Also:
Does the error happen about the same amount of time after initial connection?
Does the error happen about the same amount of inactivity in your application?