Intermittent hang in WCF service during GENERAL_SET_RESPONSE_HEADER

I've got an intermittent hang in a WCF service. Calls that generally take milliseconds start taking 30 seconds or more to complete before the service recovers. All of the calls complete successfully, however. New Relic reports that all of the time in the requests is spent in ExecuteRequestHandler.
I enabled failed request tracing on the server for all requests and watched and waited. When the site started hanging I pulled the traces down and I see the following which is typical:
136 - GENERAL_SET_RESPONSE_HEADER
HeaderName: Content-Length
HeaderValue:2237
Replace:false
Informational
273281 ms
All of the other steps in the log are timed at 0 ms. The hanging function varies, and when the service is running normally exactly the same functions with exactly the same parameters and response payloads behave perfectly. It seems that when the site starts hanging, all requests are blocked until it recovers.
Can anyone suggest where I go from here?
Thanks

This was way out there and pretty annoying from my point of view. New Relic was saying that the service was hanging in ExecuteRequestHandler and I dove into a rabbit hole trying to diagnose the hang.
The resolution turned out to be to adjust the throttling configuration for the WCF service:
<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <behavior name="...">
        <serviceThrottling maxConcurrentCalls="512" maxConcurrentSessions="3200" maxConcurrentInstances="3712" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
Turns out that WCF services are throttled by default even without this entry being present and recently MS increased the default values. My service was falling over at about 200 rpm (which I don't think is excessive). The values that I went with are 8x the new defaults and everything works wonderfully now.
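For reference, here is a sketch of what the implicit defaults look like when written out. It assumes .NET 4 or later, where the documented defaults scale with processor count (16 concurrent calls and 100 concurrent sessions per processor, with concurrent instances being the sum of the two). It also assumes a 4-processor server, which is only an inference from the numbers above (512 / 8 = 64 = 16 x 4), not something stated here. The behavior name is hypothetical.

```xml
<!-- Assumed: .NET 4+ defaults on a 4-processor server, written out explicitly.
     Multiplying each value by 8 gives the 512 / 3200 / 3712 used above. -->
<behavior name="ExplicitDefaults">
  <serviceThrottling maxConcurrentCalls="64"
                     maxConcurrentSessions="400"
                     maxConcurrentInstances="464" />
</behavior>
```

Writing the throttle out like this, even at the default values, at least makes it visible in the config that a limit exists.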
Why services are throttled with no indication that this is happening in the default config entries I have no idea. WCF has been configuration hell and I have resolved never to use it again. Web API from here on in.
Hope this helps someone one day :)

Related

Set DNS timeout for WCF webservice

I am using Visual Studio 2012 to generate a web service to be used by a winforms client. I created the client side by using "add service reference". This winforms client is a .net c# replacement of an old VB 6 app. Previously, in the VB app there were external settings for timeout values including the following:
DNS timeout
Connect timeout
Request timeout
The DNS timeout would work when the endpoint host address is a FQDN forcing a DNS lookup. The timeout value here would place a limit on the amount of time to wait for DNS resolution.
The connect timeout would place a limit on the amount of time the winforms client would wait to establish an http connection to the server. DNS lookup would have been successful.
The request timeout would place a limit on the amount of time to wait for the request to return after an http connection was successful. This would come into play if a long running query took too long after the web service call was initiated.
Is there something similar to the above in .NET 4.0? I would like to be able to configure this in the app.config. I do know about the below.
<bindings>
  <basicHttpBinding>
    <binding name="IncreasedTimeout"
             openTimeout="12:00:00"
             receiveTimeout="12:00:00" closeTimeout="12:00:00"
             sendTimeout="12:00:00">
    </binding>
  </basicHttpBinding>
</bindings>
Could these map to the ones I need or does it really not matter?
thanks
The OpenTimeout setting for the WCF binding is for the length of time to wait when opening the channel, so I believe this will be analogous to your old Connect timeout. This should be fast so you normally would only want to specify a few seconds to wait (30 or less), not 12 hours.
The WCF CloseTimeout is for when a Close Channel message is sent, and this is how long to wait for an acknowledgement. This may not have an equivalent in your old architecture. Again, this should be fast and should only need a few seconds.
The WCF SendTimeout (for the client config) essentially covers the time for the Client to send the message to the service, and to receive back the response (if any). This would correspond to your old Request timeout. This may need to be for several minutes if your server takes a while to process things.
The WCF SendTimeout (for the server config) is for when you want callbacks, so that the Server knows how long to wait for acknowledgement that its callback was received.
The WCF ReceiveTimeout does not apply to client-side configuration. For server-side config, the ReceiveTimeout is used by the ServiceFramework layer to initialize the session-idle timeout (to be honest, I don't really know what that is).
This MSDN discussion may be helpful: http://social.msdn.microsoft.com/Forums/vstudio/en-US/84551e45-19a2-4d0d-bcc0-516a4041943d/explaination-of-different-timeout-types?forum=wcf
As a final note, having really big timeout values isn't a good idea unless you definitely have long running requests. This is because you can run out of available resources on your server if the client isn't closing the connections properly.
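As a rough sketch of the mapping described above, a client-side binding with more modest values might look like the following. The binding name and the specific durations are illustrative assumptions, not recommendations; tune sendTimeout to your longest legitimate request.

```xml
<bindings>
  <basicHttpBinding>
    <!-- Hypothetical binding name; durations are illustrative only -->
    <binding name="ModestTimeouts"
             openTimeout="00:00:30"
             closeTimeout="00:00:30"
             sendTimeout="00:05:00"
             receiveTimeout="00:10:00" />
    <!-- openTimeout  : closest match to the old Connect timeout -->
    <!-- sendTimeout  : send plus wait for the reply, closest to the old Request timeout -->
  </basicHttpBinding>
</bindings>
```

None of these attributes is specifically a DNS timeout; I'm not aware of a binding-level setting that limits name resolution on its own.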

The server rejected the session-establishment request: WCF hosted on IIS

We have some WCF services implemented in an IIS application, communicating over net.tcp on the default port (808), using the Microsoft Net.Tcp Port Sharing Service, throwing an error on production servers. When I instantiate a connection to the first of the services, I get back an exception:
The server at <URL> rejected the session-establishment request. All the other services respond fine.
But it runs fine on our test servers.
I initially thought there was something wrong with the particular service that was failing, but I tried rearranging the list of services into a different order, and it SEEMS to always be the first service that I hit that fails. (I say SEEMS because I think once, in the early iterations of testing, I saw it happen on the second service that it hit. But I haven't been able to reproduce that.)
I've looked at application startup delays, and that doesn't seem to be the problem, because I can come back and run the test again as soon as it finishes - a delay of only a minute or two - and get the same error. Also, in the lower level environments, there is a start up delay of probably 30 seconds to a minute, but the result still comes back as expected.
I've tried accessing the services over http from INetManager, and I get intermittent failures on all the services - a particular service will return a yellow screen of death on one invocation, then come up with the expected link to the WSDL on the next one seconds later.
I'm completely at a loss to explain this behavior, or how to resolve it. I've googled the error message, and not found anything helpful. It may be a configuration issue - the production servers are newly provisioned VMs, and we may not have the config exactly right (whereas all the lower level environments have been running this and other similar apps for some time), but I have no idea what to look for. I've looked at the properties of the app pool that the app is running on and compared it to the lower level environments without finding any differences.
If somebody can point me in the right direction, you would have my undying gratitude.
Things I can find:
http://go4answers.webhost4life.com/Example/connect-busy-wcf-service-host-while-725.aspx:
"MaxConcurrentSessions (default = 10) [Per-channel] The maximum number of sessions that a service can accept at one time. Only comes into play with session-based bindings (wsHttp or netTcp)"
http://blogs.infosupport.com/unable-to-generate-a-wcf-proxy-using-svcutil-but-retreiving-the-wsdl-works/
So in the end the trick is to add the additional right on the c:\windows\temp folder for your App Pool Identity [for the service to be able to generate metadata] to solve the problem.
Also, are timeouts or other limits configured and being hit? Give tracing a look and access the service using WcfTestClient and see if you can find underlying errors.

How is a WCF Service and IIS integrated, what is the architecture and flow for incoming requests

I have been technically testing a WCF service recently and have got to the point where my lack of understanding is not allowing me to progress and find a solution to a timeout problem we see.
We are load testing a WCF service which is hosted on IIS7 on Windows Server 2008. The system set up to fire the messages actually fires them at BizTalk. BizTalk then processes the messages and sends them on to the endpoint of the WCF service. The WCF service is also using .NET 2.0 in its app pool (I guess this means it could actually be 3.0 or 3.5, as these were not full releases?).
We fire 40 messages within a second and 90% of them time out due to the send timeout on the client (BizTalk). We thought at first this was strange because we expected the server's basic HTTP binding receive timeout to trigger first, but it turned out that was set at 10 minutes and the client send timeout was set at 1 minute 30 seconds.
What I understand:
WCF services have config files which contain behaviors and HTTP bindings. The server endpoint we are sending an XML message to is using BasicHttpBinding. Timeouts: Open/Close is 1 minute, Send and Receive are 10 minutes. The server's timeout which we know is involved so far is: sendTimeout: 1 minute.
I understand WCF's architecture works by creating an instance of either a channel factory or service host and creates a channel stack which contains the behaviors and binding settings from the config as channels. There is a TransportAdaptor which is used to move the xml message once it has been processed through the channel stack.
I understand from IIS that http.sys handles the incoming requests. It passes requests to the worker process and, when that is busy, it places requests onto the kernel mode queue? I understand there are some machine.config settings that can be set to increase or limit this queue?
I also know how to make an app pool into a web garden, and I have read you can increase the number of threads per core from the default of 12; this is done via a registry setting or, later on in .NET, a web config change.
I just read about InstanceContextMode and how it can affect the server's service too... But I'm unsure what that is set to in this case.
We recorded some performance counters (.NET ones) and I noticed the number of current requests minus (Queued + Disconnected) = 12. Which indicates we are using 1 core, and the number of threads on that core is set to 12?
Can anyone help me get a clearer picture and piece my knowledge together, with some extra detail, into something that is more complete?
The WCF Behavior has a throttle setting. Here is an example (grabbed from msdn):
<service
  name="Microsoft.WCF.Documentation.SampleService"
  behaviorConfiguration="Throttled" />
..... .....
<behaviors>
  <serviceBehaviors>
    <behavior name="Throttled">
      <serviceThrottling
        maxConcurrentCalls="1"
        maxConcurrentSessions="1"
        maxConcurrentInstances="1"/>
    </behavior>
  </serviceBehaviors>
</behaviors>
By default (if not specified), the service is still throttled: in WCF 3.x to 16 concurrent calls and 10 concurrent sessions, and from .NET 4 onwards to 16 calls and 100 sessions per processor.
I find that a sensible production setting for high volume clients running short calls is more like 100. Of course it depends on your implementation, but the default definitely hurts performance on my test and production systems.

WCF Client hang on service interruption

I have a fairly straightforward WCF service that performs one-way file synchronization for a bunch of smart clients. I've noticed that when there's a network or service interruption during a call, the client stops being able to communicate with the server until the entire application is restarted.
The service runs with BasicHttpBinding and is hosted with IIS6 (a .svc page), using transferMode="Streamed" and messageEncoding="Mtom". The service is configured to use the default InstanceContextMode (I think it's Per Call?) and ConcurrencyMode=Single. It's using the default throttling behavior, but I'm in an isolated test environment that nobody else is hitting.
Clients are Windows Services. I'm using this ServiceProxyHelper to ensure connections are Close()'d or Abort()'d correctly when Dispose()'d, though there are no sessions so I don't think that even matters. When an error occurs, the Client object is disposed and then goes out of scope. After the exception is detected, the service waits a bit, then creates a new client object and tries again. So it should recover from the failure, but for some reason all subsequent calls to the service fail.
I can reproduce this reliably by starting a client, allowing it to transfer a few files, then iisresetting the server. First the client generally displays a "Service is Too Busy" error (which maps to the IIS 503 error that you get during an app restart). After that, all subsequent calls to the service time out. As far as I can tell the calls are not even being attempted by the client. I have tracing enabled and what I see is: Timeout error, followed by a "Failed to send request message over HTTP" warning, followed by another Timeout error.
The crazy thing is that when I configure the client to use Fiddler (port 8888) as a proxy in app.config, everything works as desired. So somehow Fiddler as the proxy is closing or finalizing some kind of connection that WCF on its own is not.
Thoughts?
Edit 2009-10-30 8:54PM: Changed service attributes to: InstanceContextMode=Single and ConcurrencyMode=Multiple. No difference.
Well that was painful. It took me forever but finally I zeroed in on the difference between running with a proxy vs without and started poking around the <system.net> settings. It turns out that adding this configuration bit to the client fixes the problem:
<system.net>
  <settings>
    <servicePointManager expect100Continue="false" />
  </settings>
</system.net>
Can somebody explain what's going on? Why should this setting cause WCF clients to hang irreparably when there's a service interruption?
Are you sure this isn't a client-side issue? If your Windows service is making the WCF calls on a separate thread from the main one, and you have an unhandled exception happening on the child thread, the calling thread may or may not sit there and wait forever because it's waiting for that thread to return.
That would explain why there's an Exception inside the service and then it looks like the service makes no more calls to the service...it's hung.
Used to be a huge issue when using Timers to spawn processes in .NET 2.0 Windows Services.

WCF Service hangs and clients receive a ServiceModel.CommunicationException

My application has 50 service endpoints (such as /mysite/myService.svc). It's hosted in IIS. Intermittently (once every two or three days) a service stops responding. It's never the same service that hangs. While a service is hung, some of the other services work fine and some others are also hung.
All clients (from different computers) get this error:
ServiceModel.CommunicationException
Message: An error occurred while receiving the HTTP response to
https://server/mysite/myservice1.svc.
This could be due to the service endpoint binding not using the HTTP
protocol. This could also be due to an HTTP request context being
aborted by the server (possibly due to the service shutting down).
See server logs for more details.
No exceptions are raised by the server when the client attempts to call the service that is hung. All I have is that error on the client side.
I have to manually recycle the application pool to fix the problem.
Do you know what could be the cause? How can I investigate this issue? I'm willing to take a memory dump of the worker process when a service is hung but I would not know what to search for in the dump.
Update (Aug 13 2009): I have almost ruled out the idea that the server runs out of connections (see comment in Shiraz Bhaiji's answer). I might have a new lead: I log all server-side exceptions in a log file. So in theory, when this occurs on the client, no exceptions are raised on the server; otherwise I'd have proof of that in my logs. But what if an error does occur on the server but is happening at a low level where exceptions are not routed to my exception handling code? I have posted this question about scenarios where low level exceptions cannot be handled. I'll keep you informed of the progress of my investigation.
Sounds like you are running out of connections.
By default WCF has a timeout and therefore holds a connection open for 10 mins.
When you recycle the app pool all connections are closed, and therefore things work again.
To fix it check your code to make sure that you close connections / dispose of proxies.
To resolve this, we set establishSecurityContext to False on the binding.
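For anyone looking for where that setting lives, here is a minimal sketch. It assumes wsHttpBinding with message security, which may not match the binding in the question; the binding name is hypothetical.

```xml
<bindings>
  <wsHttpBinding>
    <!-- Hypothetical binding name; assumes message security on wsHttpBinding -->
    <binding name="NoSecurityContext">
      <security mode="Message">
        <!-- false = no secure conversation session is negotiated up front -->
        <message establishSecurityContext="false" />
      </security>
    </binding>
  </wsHttpBinding>
</bindings>
```

The client and service bindings need to agree on this value, so change it on both sides.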
I have not come across this particular issue, but I would suggest turning on tracing/message logging for the WCF service in the config for the service and/or the client app (if you have control over that). I've done this in the last few days for a service that I needed to troubleshoot.
The MSDN link here is a good starting point.
Also see the table in this post for the varying levels of trace detail you can configure. There are several levels which can go from exception only logging to full message details. It is quite quick to set this up in the app.config file.
To parse the log file output use the SvcTraceViewer.exe that comes with the Windows SDK, which if you have it installed should be located in this folder: C:\Program Files\Microsoft SDKs\Windows\v6.0\Bin
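A minimal tracing setup along these lines might look like the sketch below. The listener name and log path are assumptions (any location the app pool identity can write to should do), and the switchValue can be raised or lowered depending on how much detail you need.

```xml
<system.diagnostics>
  <sources>
    <!-- Trace source for WCF; "Warning" keeps the log small, "Verbose" logs far more -->
    <source name="System.ServiceModel"
            switchValue="Warning, ActivityTracing"
            propagateActivity="true">
      <listeners>
        <!-- Hypothetical path; the file can be opened in SvcTraceViewer.exe -->
        <add name="xmlTrace"
             type="System.Diagnostics.XmlWriterTraceListener"
             initializeData="C:\logs\service.svclog" />
      </listeners>
    </source>
  </sources>
</system.diagnostics>
```

This goes inside the configuration element of the web.config (service) or app.config (client).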