WCF net.tcp connection dies after 9 hours, 1 minute - wcf

I've got a WCF client talking to a WCF server (on the same machine). We use netTcpBinding with message-level security (using a custom principalPermissionMode and a custom implementation of serviceCredentials). The service is marked with InstanceContextMode.PerSession.
The WCF service is self-hosted in a Windows service (not in IIS).
In order to fake keep-alive, we have a Ping method that the client calls every 15 seconds. We keep the client proxy open for the lifetime of the client program (because initializing the session is expensive in our case).
Despite this, the connection is dropped after 9 hours, 1 minute and a bit (in 10 test runs, 7 of them died after 9h1m6s).
The only thing of consequence in the WCF logs is a "SocketConnection aborted" message, followed by a varying set of exceptions, but usually including a "connection was in the faulted state" exception.
Is there some timeout in WCF, or in TCP/IP, that's causing this? Because I'm stumped.

After much tedious investigation: After about 9 hours, the WCF client re-authenticates with the service. Something I'm doing during the authentication step is killing the existing session.

From your comments above you were running the tests at the same time.
Were they also on the same server, using the same application pool?
If so a recycling of the application pool could have caused all the tests to stop at the same time.

Related

Creating a connection to a WCF application

Good time of day. I would like to know how to properly connect in a WCF application. In other words, it should be created when the app is launched and be active throughout the entire operation? Or do you need to create a connection every time a service function is called? Now I have the first option, but somewhere everything is fine, and sometimes for unknown reasons I get an error: it is Impossible to use the object for communication, since it is in the failed state. There are no visible reasons for this - the code runs without errors. NetTcpBinding is used as the binding
The wcf service needs to be hosted in the process so that the client can connect to the server. As long as you are using the wcf service, you need to enable it. Faulted state means there has been an unexpected exception on the server side, so you need to use a try…catch block. Another possibility is that the channel has expired. The default timeout period of the WCF service is 10 minutes. If the client does not communicate with the server within 10 minutes, the channel will be closed. You need to recreate the channel to call the service.

How to recover from a comms/service failure with WCF netTcpBinding?

I'm developing a client/server app in which the client calls the WCF service every few seconds. I'm not using IIS - the service runs as a console app (with the intention of installing it as a Windows service on production systems).
I started off using basicHttpBinding, and if I stop the service (to simulate a comms/server failure) the client simply ignores the fact that it can't connect to the service, by handling the EndpointNotFoundException that gets thrown. After restarting the service, the client is able to start calling it again and everything is good.
I've now switched to using netTcpBinding, and this time when I stop the service it takes a little while for its console window to close (presumably due to the way TCP manages the connection, which eventually times out). At this point the client gets a CommunicationException ("the socket connection was aborted"). When I restart the service, the client isn't able to "resume" like it did with basicHttpBinding. Each time it tries to call the service it throws a CommunicationObjectFaultedException ("The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it is in the Faulted state.").
How would I go about building in some kind of resume/recovery behaviour, similar to what I saw with basicHttpBinding?
You cannot reuse the channel as it has faulted. You should cast your client to an ICommunicationObject and call Abort() to clean up.
After that you simply start afresh by creating a new client channel. You may want to do this on a timer if your server is down for a period of time.

WCF Service: Status 200 with sc-win32-status of 64

We observed the following behavior on one of the servers hosting a WCF service on IIS 6.0:
The IIS log shows a high value for time-taken (> 100000)
The HTTP status code is 200
sc-win32-status code shows a value of 64
I found out that sc-win32-status code of 64 indicates "The specified network is no longer available"
Initially I suspected that it could be because of limits set on MinFileBytesPerSecond, which sets the minimum throughput rate that HTTP.sys enforces when sending data from the client to the server, and back from the server to the client.
But the value for sc-bytes and cs-bytes indicate that the amount of data is sent is within the range generally observed for the service.
Also note that the WCF service is hosted on four boxes and is load-balanced, but the problem occurs only one of the servers. (but not essentially on the same server). The problem is also intermittent.
Has anybody else encountered this error? Any clues about what could be wrong?
Update
Note: Observation on IIS 7.5 (IIS version does not really matter)
I was able to replicate the issue. The issue occurs if:
1. The WCF service takes a long time to respond
2. The client proxy times out before it receives a response from the server. In this case it leads to TimeoutException on the client.
3. The server keeps waiting for TCP ACK for the client, which it would never receive.
Hence a long timeout (TCP socket timeout (default value: 4 minutes) and sc-win32-status of 64
So essentially it appears that WCF code is taking a long time to respond and the client is timing out, what I observe in IIS log is just a symptom and not a problem.
The behavior you are describing will also occur if you exceed a WCF service's max sessions, calls or instances (depending on how you have your service instancecontext mode configured). If you observe the System.ServiceModel performance counters for %max concurrent sessions and/or %max concurrent calls (again depending on your service's instance context), you may see a correlation with the IIS log entries.
Note that these maxes can be configured in the service throttling behavior.
https://msdn.microsoft.com/en-us/library/vstudio/system.servicemodel.description.servicethrottlingbehavior(v=vs.100).aspx
I saw your question again and wanted to point out that I found a solution for this. It turned out to be this piece of code in the web.config:
<pages smartNavigation="true">
After turning this off I stopped receiving the same time-out errors. See also the answer here
IIS put the services into sleep to save recources.
Copied from here (WCF REST Service goes to sleep after inactivity)
The application pool hosting your service defines Idle Time-out property (advanced settings of app pool in IIS management console) which defaults to 20 minutes. If no request is received by the app pool within idle timeout the worker processes serving the pool is terminated. After receiving a new request the IIS must start the process again, the process must load application domain and all related assemblies, compile .svc file, run the service host and process the request.The solution can be increasing idle time-out but the meaning of this time-out is correct handling of server resources. If the process is not needed it should be stopped. Another ugly workaround is using some ping process (for example cron job or scheduled task on the server) which will regularly ping call some method on the service or page in the same application.

WCF client timeout under heavy load

I have already asked a similar question here: WCF Service calling an external web service results in timeouts in heavy load environment but I've got a better idea now as to what's happening so posting a new question.
This is what is happening:
.NET client sends multiple requests at the same time to a WCF service (if it helps - I'm replicating this scneario by using Visual Studio Load Tests)
The client has got a "sendTimeout" set to 5 seconds
The WCF service receives it and start processing it. The processing involves sending a request to an external service which could take about 1 second to come back with a response
This is where I think the problem is: the client has sent many requests to the service and since the service is still busy processing the concurrent requests, some of the reqeusts from the client are timing out after 5 seconds
I have tried the following:
Changed the InstanceContextMode to PerCall
Increased the values of maxConcurrentCalls & maxConcurrentInstances
Increased the value of connectionManagement.maxconnection in machine.config
But none of that seems to be making any difference. Does anyone has any idea how can I ensure that I don't run into this timeout issue?
OK, you say WCF and that is not enough. What binding are you using and where are you hosting it? If you are using IIS, the could be different underlying problem than self-hosting.
The likely reason is the small number of ThreadPool size. You can use ThreadPool.SetMaxThreads() to change this but beware this is a sensitive value. Have a look here.
Check out the following link:
http://weblogs.asp.net/paolopia/archive/2008/03/23/wcf-configuration-default-limits-concurrency-and-scalability.aspx
I'm not sure what you're trying to achieve. Since the WCF service is doing a time consuming operation, you can't overload it and expect it to function. You can do the following (check the link about to set the following):
Increase the receiving capacity of the wcf service
Increase the send timeout of the service
Increase the send timeout of the client
Increase the receive timeout of the client
Limit the outgoing connections to the wcf service
The best and most robust option would be to configure and use MSMQ with the WCF service.

WCF Service polling hangs

I have 2 wcf services, 1 which polls the other service at regular interval.The service2 is hosted in no. of machines with the same configuration.
My problem is that whenever the poller service gets restarted, even though the service2 on other machines runs fine, i am not getting the response from those services (basically it gets timed out - getting SYSTEM.TimeOutException ). If I try to access the same service (service2) from some temp application (without restarting the service2) it gives response.
If I restart the service2, than it works fine, the service1 (poller service) gets the responses from all hosted services (Service2).
Dont know what is causing problem.
Regards,
Chirag
Attach VS to your wcf service which hangs. And find out if your connection is successful.
Do it with both services, so that you can debug the services at runtime.
If you're using a sessionful binding (netTcpBinding, wsHttpBinding), it's more than likely that you're not explicitly closing your client channel when you're done with it. This would cause the behavior you see, because the session takes a minute or so to time out if you don't explicitly close it, and the default max number of sessions is low (10)- the server will let new sessions stack up until old ones close. You can also adjust the service throttle on the server side binding to increase the max number of open sessions allowed, but you really should make sure your clients are getting cleaned up properly first.