My application has 50 service endpoints (such as /mysite/myService.svc). It's hosted in IIS. Intermittently (once every two or three days) a service stops responding. It's never the same service that hangs. While a service is hung, some of the other services work fine and some other are also hung.
All clients (from different computers) get this error:
ServiceModel.CommunicationException
Message: An error occurred while receiving the HTTP response to
https://server/mysite/myservice1.svc.
This could be due to the service endpoint binding not using the HTTP
protocol. This could also be due to an HTTP request context being
aborted by the server (possibly due to the service shutting down).
See server logs for more details.
No exceptions are raised by the server when the client attempts to call the service that is hung. All I have is that error on the client side.
I have to manually recycle the application pool to fix the problem.
Do you know what could be the cause? How can I investigate this issue? I'm willing to take a memory dump of the worker process when a service is hung but I would not know what to search for in the dump.
Update (Aug 13 2009): I have almost ruled out the idea that the server runs out of connections (see comment in Shiraz Bhaiji's answer). I might have a new lead: I log all server-side exceptions in a log file. So in theory, when this occurs on the client, no exceptions are raised on the server; otherwise I'd have proof of that in my logs. But what if an error does occur on the server but is happening at a low level where exceptions are not routed to my exception handling code? I have posted this question about scenarios where low level exceptions cannot be handled. I'll keep you informed of the progress of my investigation.
Sounds like you are running out of connections.
By default WCF has a timeout and therefore holds a connection open for 10 mins.
When you recycle the app pool all connections are closed, and therefore things work again.
To fix it check your code to make sure that you close connections / dispose of proxies.
To resolve this, we set establishSecurityContext to False on the binding.
I have not come across this particular issue but would suggest to turn on tracing/message logging for the WCF service in the config for the service and/or the client app (if you have control over that). I've done this in the last few days for a service that I needed to troubleshoot.
The MSDN link here is a good starting point.
Also see the table in this post for the varying levels of trace detail you can configure. There are several levels which can go from exception only logging to full message details. It is quite quick to set this up in the app.config file.
To parse the log file output use the SvcTraceViewer.exe that comes with the Windows SDK, which if you have it installed should be located in this folder: C:\Program Files\Microsoft SDKs\Windows\v6.0\Bin
Related
I have two windows services. One ('server') acts as a WCF host to which the other ('client') connects. So I have configured a dependency from client to server. Both are also set up to start automatically.
When I start these services by hand, everything works fine. When I stop both services and tell client to start, then server will be started before client and all is fine.
However, when I reboot the machine only server is started.
When I add a diagnostic listener I see it got a TimeoutException with the helpful message:
The HTTP request to 'http://[server address]' has exceeded the allotted timeout of 00:00:00. The time allotted to this operation may have been a portion of a longer timeout.
At some other SO question there was an answer that claims WCF is probably confused about what went wrong and therefore starts lying about the timeout.
Did I perhaps miss a dependency for either service? Does WCF require something that hasn't or is being started when client is trying to contact server?
I think you should check your client service. On startup windows services are starting while network devices are still being initialized. Services should be ready to start without network and without any network device. Usual approach is to keep periodic retries to establish connection. You can do little experiment on your machine by uninstalling all network adapters and trying to start up your services.
Additional quick workaround you can do is to setup recovery options on your service -- for example you can configure it to restart service on crash after some timeout -- you can do this through UI in services.msc or in command line using 'sc config' command.
Configuring the dependency between the two Windows Services is not necessarily sufficient to avoid there being a race condition: i.e. to avoid the client service calling the WCF service before the server's WCF channel stack is fully initialised.
The service dependency just ensures that the Windows Service Control Manager won't start the client service process before the server Windows Service has notified the SCM that it has started. Whether this is sufficient depends on how you write the server.
If the server service starts a new thread on which to initialize the WCF stack, your OnStart method is probably returning before the WCF stack is ready for clients. There is then a race condition as to whether the client's first call will succeed.
On the other hand, if the server service does not return from OnStart (and thus doesn't notify the SCM that it has started) until the channel stack is fully open, the dependency removes the race condition, but there is a different pitfall: you need to beware that the SCM's own timeout for starting the Windows service is not triggered while waiting for the WCF stack to initialise, as might well happen on a reboot if the WCF service depends on the network stack, for example. If the server's OnStart does not return within the SCM's timeout, the SCM will not try to start the dependent client service at all, because it does not receive the server's start notification. (There will be a message in the Windows event log from the SCM saying that the server service didn't start within the expected time.) You can extend the SCM timeout by calling ServiceBase.RequestAdditionalTime while the WCF service is being initialised.
Either way, the client service really ought to be written so that it doesn't fail completely if the first WCF call doesn't succeed.
You don't actually say what binding you are using. If client and server services are always running on the same machine, as you seem to indicate, then consider using the NetNamedPipeBinding: then your service won't be dependent on initialization of networking resources and startup should be quicker.
This is a problem which has had me baffled for weeks now on a client's Live environment.
The WCF service is hosted on Windows Server 2003, and has both HTTP and MSMQ endpoints.
When placing the service in the test environment, the service cleanly starts and stops, and messages are passed without problems. However on the Live environment, the service starts fine, but does not exit cleanly.
When attempting to stop the service, the machine takes a long time to respond and eventually displays an error saying that the service could not be stopped. Inspecting the error on the event log, it says that it was unable to write to the MSMQ queue (access denied), however, the service is able read and remove messages from the queue. If one then refreshes the service manager, the service is in fact stopped.
The MSMQ queue is hosted on a different physical machine, and we have been unable to reproduce the error on the test environment.
We are not sure if it is related or not, but the service will also occasionally stop pulling messages from the queue. This has been solved by restarting the service. Again, we have not been able to reproduce the error.
Recently we experienced another error with the HTTP based client where upon midnight one night, the service suddenly started rejecting connections with the following exception:
The HTTP request is unauthorized with client authentication scheme 'Anonymous'. The authentication header received from the server was 'Negotiate,NTLM'. ---> System.Net.WebException: The remote server returned an error: (401) Unauthorized.
Even more curious, is that simply restarting the service seems to correct the problem.
If anyone has seen anything like this before or has any comments, it would be much appreciated!
Speaking to a colleague, apparently setting the ServiceModelEx throttling options all to "1" help with the lock ups on MSMQ based WCF services.
consider a WCF service, which is heavily used and behaves normally. But then is stopps responding. In the sevice level message trace you can see the outgoing message on the client, but no incoming message on the server. On transport level theres a incoming message and then nothing. After 60 seconds the client throws a TimeoutException.
What can cause a behavior like this?
What would you do to debug this behavior?
Is it possible that this behavior is caused by too many concurrent connections/sessions?
EDIT:
Client and Server are on the same machine. Both are .NET apps. When the client is restarted the problem sometimes does not happen. Also the problem does only appear on a single machine. I was not able to reproduce the behavior on any other machine.
Regards
Michael
I understand you have no problem on network level as you have mentioned that you can see incoming request on transport level.
So the first thing to check does the service is up and does it works if the client is on the same machine.
Also you can analyze incoming messages may be the problem can be there.
Here WireShark will be your friend.
Also check can you view the wsdl from the client machine. By the way are your clients also .NET apps?
You can configure tracing using the application’s configuration file—either Web.config for Web-hosted applications, or Appname.config for self-hosted applications by Service Trace Viewer
or use from debug tools such Debug Diag
I have a fairly straightforward WCF service that performs one-way file synchronization for a bunch of smart clients. I've noticed that when there's a network or service interruption during a call, the client stops being able to communicate with the server until the entire application is restarted.
The service runs with BasicHttpBinding and is hosted with IIS6 (a .svc page), using transferMode="Streamed" and messageEncoding="Mtom". The service is configured to use the default InstanceContextMode (I think it's Per Call?) and ConcurrencyMode=Single. It's using the default throttling behavior, but I'm in an isolated test environment that nobody else is hitting.
Clients are Windows Services. I'm using this ServiceProxyHelper to ensure connections are Close()'d or Abort()'d correctly when Dispose()'d, though there are no sessions so I don't think that even matters. When an error occurs, the Client object is disposed and then goes out of scope. After the exception is detected, the service waits a bit, then creates a new client object and tries again. So it should recover from the failure, but for some reason all subsequent calls to the service fail.
I can reproduce this reliably by starting a client, allowing it to transfer a few files, then iisresetting the server. First the client generally displays a "Service is Too Busy" error (which maps to the IIS 503 error that you get during an app restart). After that, all subsequent calls to the service time out. As far as I can tell the calls are not even being attempted by the client. I have tracing enabled and what I see is: Timeout error, followed by a "Failed to send request message over HTTP" warning, followed by another Timeout error.
The crazy thing is that when I configure the client to use Fiddler (port 8888) as a proxy in app.config, everything works as desired. So somehow Fiddler as the proxy is closing or finalizing some kind of connection that WCF on its own is not.
Thoughts?
Edit 2009-10-30 8:54PM: Changed service attributes to: InstanceContextMode=Single and ConcurrencyMode=Multiple. No difference.
Well that was painful. It took me forever but finally I zeroed in on the difference between running with a proxy vs without and started poking around the <system.net> settings. It turns out that adding this configuration bit to the client fixes the problem:
<system.net>
<settings>
<servicePointManager expect100Continue="false" />
</settings>
</system.net>
Can somebody explain what's going on? Why should this setting cause WCF clients to hang irreparably when there's a service interruption?
Are you sure this isn't a client side issue. If your Windows service is making the WCF calls on a seperate thread from the main, and you have an un-handled exception happening on the child thread...the calling thread may or may not sit there and wait forever becaues it's waiting for that thread to return.
That would explain why there's an Exception inside the service and then it looks like the service makes no more calls to the service...it's hung.
Used to be a huge issue when using Timers to spawn processes in .NET 2.0 Windows Services.
I have a WCF service that does some document conversions and returns the document to the caller. When developing locally on my local dev server, the service is hosted on ASP.NET Development server, a console application invokes the operation and executes within seconds.
When I host the service in IIS via a .svc file, two of the documents work correctly, the third one bombs out, it begins to construct the word document using the OpenXml Sdk, but then just dies. I think this has something to do with IIS, but I cannot put my finger on it.
There are a total of three types of documents I generate. In a nutshell this is how it works
SQL 2005 DB/IBM DB2 -> WCF Service written by other developer to expose data. This service only has one endpoint using basicHttpBinding
My Service invokes his service, gets the relevant data, uses the Open Xml Sdk to generate a Microsoft Word Document, saves it on a server and returns the path to the user.
The word documents are no bigger than 100KB.
I am also using basicHttpBinding although I have tried wsHttpBinding with the same results.
What is amazing is how fast it is locally, and even more that two of the documents generate just fine, its the third document type that refuses to work.
To the error message:
An error occured while receiving the HTTP Response to http://myservername.mydomain.inc/MyService/Service.Svc. This could be due to the service endpoint binding not using the HTTP Protocol. This could also be due to an HTTP request context being aborted by the server (possibly due to the server shutting down). See server logs for more details.
I have spent the last 2 days trying to figure out what is going on, I have tried everything, including changing the maxReceivedMessageSize, maxBufferSize, maxBufferPoolSize, etc etc to large values, I even included:
<httpRuntime maxRequestLength="2097151" executionTimeout="120"/>
To see maybe if IIS was choking because of that.
Programatically the service does nothing special, it just constructs the word documents from the data using the Open Xml Sdk and like I said, locally all 3 documents work when invoked via a console app running locally on the asp.net dev server, i.e. http://localhost:3332/myService.svc
When I host it on IIS and I try to get a Windows Forms application to invoke it, I get the error.
I know you will ask for logs, so yes I have logging enabled on my Host.
And there is no error in the logs, I am logging everything.
Basically I invoke two service operations written by another developer.
MyOperation calls -> HisOperation1 and then HisOperation2, both of those calls give me complex types. I am going to look at his code tomorrow, because he is using LINQ2SQL and there may be some funny business going on there. He is using a variety of collections etc, but the fact that I can run the exact same document, lets call it "Document 3" within seconds when the service is being hosted locally on ASP WebDev Server is what is most odd, why would it run on scaled down Cassini and blow up on IIS?
From the log it seems, after calling HisOperation1 and HisOperation2 the service just goes into la-la land dies, there is a application pool (w3wp.exe) error in the Windows Event Log.
Faulting application w3wp.exe, version 6.0.3790.1830, stamp 42435be1, faulting module kernel32.dll, version 5.2.3790.3311, stamp 49c5225e, debug? 0, fault address 0x00015dfa.
It's classified as .NET 2.0 Runtime error.
Any help is appreciated, the lack of sleep is getting to me.
Help me Obi-Wan Kenobi, you're my only hope.
I had this message appearing:
An error occured while receiving the HTTP Response to http://myservername.mydomain.inc/MyService/Service.Svc. This could be due to the service endpoint binding not using the HTTP Protocol. This could also be due to an HTTP request context being aborted by the server (possibly due to the server shutting down). See server logs for more details.
And the problem was that the object that I was trying to transfer was not [Serializable]. The object I was trying to transfer was DataTable.
I believe word documents you were trying to transfer are also non serializable so that might be the problem.
Yes, we'd want logs, or at least some idea of what you're logging. I assume you have both message and transport logging on at the WCF level.
One thing to look at is permissions. When you run under Cassini the web server is running as the currently logged in user. This hides any SQL or CAS permission problems (as, lets be honest, your account is usually a local administrator). As soon as you publish to IIS you are now running under the application pool user, which is, by default, a lot more limited.
Try turning on IIS debug dumps and following the steps in KB919789
Fyi, I changed IIS 6 to work in IIS 5.0 Isolation mode and everything works. Odd.
I had the same error when using an IEnumerable<T> DataMember in my WCF service. Turned out that in some cases I was returning an IQueryable<T> as an IEnumerable<T>, so all I had to do was add .ToList<T>() to my LINQ statements.
I changed the IEnumerable<T> to IList<T> to prevent making the same mistake again.