I created an internal website for our company. It run smoothly for several months and then I add more items to website. When I run in live, it run normally. Then suddenly one of my user from another server sending me an "The Wait operation timed out." error. When I check access that certain link, It run normally for me and some other who I ask to check if they access that page. I already increase the connection timeout but still no luck. Is it the error come from another server? Can someone explain the possible causes?
This is how the another plant faced, every time they firstly open the website, error screen show up, but when they refresh it, they can use the website. I dont know why this happened. I need your help.
Down below is a error detail:
1.Exception Details: System.ComponentModel.Win32Exception: The wait operation timed out
source error :An unhandled exception was generated during the execution of the current web request.
2.Information regarding the origin and location of the exception can be identified using the exception stack trace below.
Thanks in advance
The fact that this happens for a user but not for the testers implies this may occur when the system is under load; database timeouts are pretty common in database queries functioning under stress if the database has been set up "out of the box" without tuning.
I would suggest referring to
The wait operation timed out. ASP
I don't have enough information to troubleshoot more question properly, since I don't know what DBMS you are working with. But as a rule this seems to happen because a call to the database is timing out. In SQL Server, increasing the CommandTimeout (NOT connection timeout) is one of the quick-and-dirty ways to solve the problem.
In SQL Server, CommandTimeout is the time allowed for an operation before exiting with a time out error. Connectiontimeout, by contrast, is the time the system waits when trying to open an initial connection to the database. Changing connectiontimeout won't help with the timeout of an operation, but commandtimeout will.
Other DBMS systems will have other mechanisms for resolving timeout issues.
That's one quick and dirty solution. The longer solution is to add more logging to your system to identify which calls are timing out, then doing some DBA work to optimize the query and database performance. My understanding is that entity frameworks also have tuning options for automatically generated queries, but exactly what those are depends on which one you're using!
Related
I'm facing this issue intermittently now, where the query (called from stored Procedure) goes for CXSYNC_PORT wait type and continues to remain in that for longer time (sometimes 8hours in stretch). I had to kill the process and then rerun the procedure. This procedure is called every 2-hours from ADF pipeline.
What's the reason for this behavior and how do I fix the issue?
I searched a lot and there is not Microsoft documents talk about the wait type: CXSYNC_PORT. Others have asked the same question but still with no more details.
Most suggestions are that ask the same problem in more forums. Or ask professional engineer for help, and they will deal with your problem separately and confidentially.
Ask Azure support for details help: https://learn.microsoft.com/en-us/azure/azure-portal/supportability/how-to-create-azure-support-request
And here's the same question which Microsoft engineer gave more details about the issue:
As part of a fix CXPACKET waits were further broken down into
CXSYNC_CONSUMER and CXSYNC_PORT (and data transfer waits still
reported as CXPACKET) as to distinguish between different wait times
for correct diagnose of the problem.
Basically, CXPACKET is divided into 3: CXPACKET, CXSYNC_PORT,
CXSYNC_CONSUMER. CXPACKET is used for data transfer sync, while
CXSYNC_* are used for other synchronizations. CXSYNC_PORT is used for
synchronizing opening/closing of exchange port between consuming
thread and producing thread. Long waits here may indicate server load
and lack of available threads. Plans containing sort may contribute
this wait type because complete sorting may occur before port is
synchronized.
Please ref this link What is causing wait type CXSYNC_PORT and what to do about it? to get more useful messages. But for now, there isn't an exact solution.
use query hint OPTION(MAXDOP 1)
This will run your long running query in a single thread and you won't get the CX type waits. In my experience this can make a massive 10-20X decrease in execution time and will free up CPU for other tasks as there will be no context switching and thread coordination activity.
We have a strange error here. In our ASP.NET 4.6 app, using Entity Framework 6.2, we are getting "Login failed for user" when accessing the SQL Azure database. I'm pretty sure the cause of the error is switching tiers in Azure. What I don't get is why the error isn't caught. Every SQL operation we have is inside a try...catch block. The errors fall out of the block and get caught by Globals.asax just before the app crashes.
We have
SetExecutionStrategy("System.Data.SqlClient", Function() New SqlServer.SqlAzureExecutionStrategy(10, TimeSpan.FromSeconds(7)))
which,as I understand it, will retry any SQL execution 10 times for at least 70 seconds from the first error. According to the Microsoft tech support, this isn't engaged because it hasn't made the connection to SQL Azure yet. The ConnectRetryCount and interval in the connection string do not apply since it is talking to the server. The server is just saying, "I know you are there, but I'm not going to let you in!"
According to MS Tech support, the only way around this is to have a try...catch block around all of our SQL commands... which we do! It just falls through and crashes the app!
I can't do a retry in globals.asax because at that point, it is already crashed.
According to MS, there is no way to trap the error in the context and retry from there. So, what's the solution? There must be some answer other than, "just let the app crash and have them refresh the page!"
When the page is refreshed seconds later, all is fine. No errors, no problems.
Example of one of the lines of code throwing the error:
MapTo = ctx.BrowserMaps.FirstOrDefault(Function(x) code.Contains(x.NameOrUserAgent))
It's really very straight forward. this one just happens to come up a lot because this code block is called frequently. The actual SQL request is irrelevant because no matter what line is used, the connection, within EF, fails.
Server logins will be disconnected while scaling up/down to a new tier, and transactions are rolled back. However, contained database logins stay connected during the scaling process, and for that reason they are recommended over server logins.
Having a try and catch may not solve the issue because you may be capturing error # zero and a lot of errors in Azure SQL database fall on that error 0 category.
Just a comment, performance after scaling may be poor right after scaling and improves after a few minutes. Query plans may also change.
Just had a bizarre issue with SQL Azure, and it's happened in a small phase just before full go live with some users doing some data entry.
"Database 'dbname' on server 'xxx' is not currently available. Please rety the connection later. If the problem persists, contact customer support."
When I tried to connect via SQL Azure database website I got:
"Firewall check failed.
Resource ID : 1. The request minimum guarantee is 0,
maximum limit is 180 and the current usage for the database is 0.
However, the server is currently too busy to support request greater than 0 for this database."
Looking at the databases section of the Azure Management website the site reported it couldn't access the DB, but I didn't capture the exact error message unfortunately.
Bizarrely, a couple of my users were still able to login to our system website that access the DB, and view and save data. Eventually they lost connection too however.
After an hour or so, the databases came back to life and we could fully access them again.
I have looked at the servers master db event table using queries from here and there was a couple of connection failures but nothing interesting. No throttling or deadlocks, a couple of failed connections that said "Client may have timed out when establishing connection. Try increasing the connection timeout." in the description
Any ideas where else to look?
Business users have had a massive drop in confidence because of this.
What your describing normally occurs because of :
1) SQL Connection limit being hit. Assuming you don't see this often you unlikely to be the cause. But worth checking putting a limit on your connection pool can help.
2)You neighbours being extremely noisy and thus the node re-adjusts.
3) Hardware failure and Microsoft bringing your database back online in a different node. This can take some time.
Normally I have seen this when Microsoft have throttled or had problems with a box and had to recover everyone over. Because you are on a shared system you have to keep in mind that they are recovering everyone else also in that node also and thus sometimes this takes time.
The best bet if you are worried and need to get a resolution for the business is to open a support ticket with MS and give them the time and error message you saw this. They will investigate and generally they have really good back end telemetry that will point to a reason. This will allow you to give the business a resolution and then you can make a call on future plans and contingencies. You have to keep in mind though that SQL Azure is shared system and transient errors can happen, you might need to design more failover into your designs.
Background WCF Stack, Data Access Implemented in Entity Framework, Simple ASP.NET Front End
This is a two part question.
Recently we ran into an issue with periodic crashes with an exception that read:
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available
We had been running our application without issues for over a week, and then all the sudden we were hit with this random crash/ If I had to guess I would say it was network related, but we were unable to determine the exact source. Has anyone periodically gotten this message? If so what was the root cause?
Second question is someone suggested to set "async=true" in our Entity Framework connection string. I was under the impression this just enables the async api. Does this do anything when you are using EF? Does switching this flag do anything with the queries that get generated by EF?
To be that guy I will answer this one on my own.
First I posted the question about the "async=true"s effect on entity framrwork to MS and no one answered ... as usual(if they answer i will update this post).
Our issue:
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available
Was environment related. Something was causing the DB to run a little bit slower, but it was hinting to a larger issue. Apparently EF has horrible issues when you share context between threads (not an easy problem to solve), so we were seeing a race condition with opening connections.
We basically had a "read only context" that only did gets. Our issue was two threads attempt to open the connection at the same time, one wins, the other loses resulting in some variation of the exception below:
The connection was not closed. The connection's current state is connecting.
Our solution was to convert our singleton to be thread specific. Not exactly what we wanted, but it worked, and when we pushed this fix our other issue magically went away.
The second half to this question was what does async=true do. When it comes to EF, it made our system crash. We had a block of code that did a join, and if async=true and MARS=false we got a:
There is already an open DataReader associated with this Command which must be closed first
Once we cut back on MARS, and disabled async things were good again.
We are facing the SQL Timed out issue and I found that the Error event ID is either Event 5586 or 3355 (Unable to connect / Network Issue), also could see few other DB related error event ids (3351 & 3760 - Permission issues) reported at different times.
what could be the reason? any help would be appreciated..
Can you elaborate a little? When is this happening? Can you reproduce the behavior or is it sporadic?
It appears SharePoint is involved. Is it possible there is high demand for a large file?
You should check for blocking/locking that might be preventing your query from completing. Also, if you have lots of computed/calculated columns (or just LOTS of data), your query make take a long time to compute.
Finally, if you can't find something blocking your result or optimize your query, it's possible to increase the timeout duration (set it to "0" for no timeout). Do this in Enterprise Manager under the server or database settings.
Troubleshooting Kerberos Errors. It never fails.
Are some of your webapps running under either the Local Service or Network Service account? If so, if your databases are not on the same machine (i.e. SharePoint is on machine A and SQL on machine B), authentication will fail for some tasks (i.e. timerjob related actions etc.) but not all. For instance it seems content databases are still accessible (weird, i know, but i've seen it happen....).