Is there something in WCF configuration that defines a timeout for executing a request at service side? E.g. WCF service will stop executing request after some time period. I have a service which make some work depending on client input. In some cases a such call may take too much time. I want to limit the execution time of such requests on service side, not client one using SendTimeout. I know about OperationTimeout property, but it doesn't abort the service request, it just tells a client that the request is timed out.
In general terms, there's nothing that will totally enforce this. Unfortunately, it's one of those things that the runtime can't really enforce nicely without possibly leaving state messed up (pretty much the only alternative for it would be to abort the running thread, and that has a bunch of undesirable consequences).
So, basically, if this is something you want to actively enforce, it's a lot better to design your service to deal with this so that your operation execution has safe interruption points where the operation can be terminated if the maximum execution time has been exceeded.
Though it's a lot more work, you'll likely be more satisfied with it in the long run.
Related
Please excuse the Obvious Self-Q/A, but this information is widely misunderstood, and almost always incorrectly answered. So I Wanted to place this information here for people searching for a definitive answer to this problem.
Even so, there's still some information I haven't been able to nail down. I will put this towards the end of the question (skip to that if you are not interested in the preamble).
How do I correctly configure a WCF NetTcp Duplex Reliable Session?
There are many questions and answers regarding this topic, and nearly all of them suggest setting inactivityTimeout="Infinite" in your configuration. This doesn't really seem to work correctly, particularly for the case of NetTcp (It may work correctly for WSDualHttp Bindings, but I have never used those).
There are a number of other issues that are often associated with this: Including, Channel not faulting after client or server unexpectedly disconnected, Channel disconnecting after 10 minutes, Channel randomly disconnecting... Channel throwing exception when trying to open... Unable to configure Metadata on same endpoint...
Please note: There are two concepts that are important below. Infrastructure messages are internal to the way WCF communicates, and are used by the framework to keep things running smoothly. Operation messages are messages that occur because your app has done something, like send a message across the wire. Infrastructure messages are largely invisible to your app (but they still occur in the background) while operation messages are the result of an action your app has taken.
Information I have figured out, through hard won trial and error.
Infinite does not appear to be a valid configuration setting in all situations (and certainly, the visual studio validation schema doesn't know about it).
There are two special configuration converters, called InfiniteIntConverter and InfiniteTimeSpanConverter which will sometimes work to convert the value Infinite to either Int.MaxValue or TimeSpan.MaxValue, but I haven't yet figured out the situations in which this appears to be valid as sometimes it works, and sometimes it doesn't. What's more, it appears that some libraries will allow Infinite in the config, while others will not, so you can succeed in one part of a configuration, but fail in another.
You must configure BOTH inactivityTimeout and receiveTimeout, on both the client and the server. While these values do not HAVE to be the same, they probably should be as they will probably cause confusion if they are not. (technically, you can leave inactivityTimeout to its default value if you want, but you should be aware of its value, and what it does)
inactivityTimeout should NEVER be set to a large value, much less Infinite or TimeSpan.MaxValue.
inactivityTimeout has two functions (and this is not widely understood). The first function defines the maximum amount of time that can elapse on a channel without receiving any "infrastructure" or "operation" messages. The second function defines the time period in which infrastructure messages are sent (half the time specified). If no infrastructure or operation messages have been received during the timeout period, the connection is aborted.
receiveTimeout specifies the maximum amount of time that can elapse between operation messages only. This value can be set to a large value, such as TimeSpan.MaxValue (particularly if your channel runs internally over a trusted network or over a vpn). This value is what defines how long the reliable session will "stay alive" if there is no activity between client and server (other than infrastructure messages). ie, your client does not call any methods of the interface, and your server does not call back into the client.
setting a short inactivityTimeout and a large receiveTimeout keeps your reliable session "tacked up" even when there is no operational activity between your client and server. The short inactivity timeout (i like to keep the default 10 minutes or less) sends infrastructure "ping" messages to keep the TCP connection alive while the long receive timeout keeps the reliable session active. while at the same time providing a reasonable timeout in case of disconnection.
If you set inactivityTimeout to a large value, then the reliable session will not be reliable as it has no way to keep the Tcp connection alive, nor does it have any way to verify the integrity of the connection. It won't know if a user has disconnected unexpectedly until you try and send a message to that client and find out the connection is no longer there. This is why many people who use Infinite for this setting resort to creating a "Ping" method in their service, which is completely unnecessary if you've configured these settings correctly.
If you set inactivityTimeout to a value larger than receiveTimeout then it will likewise also be unreliable, as you will still be governed by the receiveTimeout for operation messages. ie. if you forget to set receiveTimeout and leave it at the default 10 minutes, then if the user is idle for 10 minutes, the connection will be aborted.
When the client or server unexpectedly disconnects (app crashes, network failure, someone trips over the power cord, etc..), the other side may not notice right away. I have attached various ChannelFaulted event handlers in various test situations, and sometimes the connection is faulted right away... other times it doesn't seem to fault at all. What i have discovered through trial and error is that the when it doesn't seem to fault, it will actually fault after the inactivityTimeout expires on that end. (so if it's set to 10 minutes, then after 10 minutes it will call the ChannelFaulted event).
I have not yet figured out why in some situations it notices the disconnection right away, and others it waits for the timer to expire. In both cases, I notice internal first chance communication exceptions thrown and handled by the framework, and there are calls to Abort the connection... but somehow the call to the event handler gets lost and it must wait for the timeout. My suspicion is this is somehow thread related.
When trying to configure Metadata to work across the NetTcp channel, I have had sporadic results. Sometimes it works, sometimes it doesn't. I've read many reports that Metadata doesn't work over NetTcp and that you have to use an Http channel for the Metadata, but I have in fact had it work on several occasions using the net.tcp:// url to generate the proxy. Then I would change something, recompile and it would no longer work. Changing things back, it wouldn't work again. So it was very confusing what magic incantation was necessary to make Metadata function over net.tcp, shared with the endpoint on the same port (obviously with a different address).
When configuring both a NetTcp and Metatdata endpoint on the same service, and specifying non-default settings for connection parameters like listenBacklog, and maxConnections, you also need to make sure the Metadata endpoint uses the same settings, which typically means you have to define a custom binding, since these settings are not available from the standard tcp mex binding. This includes setting listenBacklog and maxPendingConnections on tcpTransport, and groupName and maxOutboundConnectionsPerEndpoint on connectionPoolSettings.
The default setting for the Ordered setting of ReliableSession is True. This uses a lot more overhead than turning it off. If you don't need ordered messages, i would suggest turning it off (need to set this on both sides)
-
Configuration I still need to understand:
How do I correctly configure the shared net.tcp Metadata endpoint? (I will add an example when I get a chance) Currently, i'm specifying an http get url to bypass the problem. It's so inconsistent as to why it sometimes works and sometimes does not. I kept getting the error `The URI Prefix is not recognized' when generating the proxy in Visual Studio.
Why does WCF sometimes Fault the channel immediately upon disconnect, and sometimes waits for inactivityTimeout to expire? What controls/causes one vs the other behavior?
Is it possible to determine the client timeout values on the server? I am in the unfortunate position that I have a long running WCF service (about 90 seconds) and I would like to know beforehand if the client is going to time out.
Any ideas?
Unless you force the client to tell you what his timeout is, you have no way of knowing that.
You could kindly ask for the information, adding a method parameter, or header.
You could also try to break your long running call into smaller parts, forcing the client to make subsequent calls if your business allows.
You could use asynchronous calls with a callback, one way method / duplex channels.
There are other possibilities, but we need to know more about your environment.
Error code:"
The request channel timed out while waiting for a reply after 00:09:59.6320000. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding."
This error occurs infrequently when calling a Wcf service methods. It doesn't matter what method is. I have created test methods that returns simple strings. Sometimes it times out, sometimes it works perfectly. The strange thing is that when the WCF service is published on one server(for testing purposes)- there is no timeout. When I publish it on another server(live/public) there occurs these timeouts infrequently. I have set the timeout to 10 min as you could see above.
The webconfig setting should be correct, because it works for the one server. The only change made is the ip address. I know this is very difficult to answer and a bit ambiguous.
I'm sure this problem is too high level for me to solve, or maybe I'm making a simple mistake and it is too obvious for me to notice. If you could give me a pointer or just friendly advice on this problem I would really really appreciate it. I am shooting in the dark here. I thank you for your interest, proved by you reading up to here.
does it happen first time you call the service? if not, but does subsequently, it could be that the service instance has been locked by the calling thread - look into multiple instances or allowing concurrent use, obviously taking into account the thread safety requirements of your code
I've got a really busy self-hosted WCF server that requires 2000+ clients to update their status on a frequent basis. What I'm finding is that the CPU utilization of the server is sitting at around 70% constantly, and the clients have a 50% chance of actually getting a connection to the server. They will timeout after 60 seconds. This is problematic because if the server doesn't hear back from a client, it'll assume the client is offline.
I've implemented throttling so I can adjust concurrent connections/sessions/etc., but if I'm not mistaken, increasing this will only lead to higher CPU utilization and worse connectivity problems. Right?
Will increasing the timeout to something more than 60 seconds help? I'm not exactly sure how it works, but will a client sit in a type of queue until the server can field the request? Or is it best to set the timeout to something smaller and make the client check in more often if it can't get connected (this seems like it could only make the problem worse in a sense)?
If it's really important for the server to know if the client is still connected, I don't think relying solely on WCF is your best bet for that.
Maybe your server should have some sort of ping mechanism that either allows it to ping client machines based on some sort of timer or vice versa.
If you're super concerned about the messages always getting through, no matter what, then I suggest exploring Reliable services. Check out the enableReliableSession behavior attribute. I suggest reading through at least the first chapter in Juval Lowy's Programming WCF Services which is available for free as the Kindle sample of the book.
Increasing the timeout may help, but probably not much, and the Amazing Ever-Increasing Timeout is kind of a motif on http://www.thedailywtf.com . Making the client hammer the server if it can't get through the first time is guaranteed to cause pain.
If all that you care about is knowing whether the client is there, might it be practical to go down a layer or two, and have the client send you an HTTP POST once in a while? WCF requires some active back-and-forth, but a POST can just lay there until your server has time to deal with it, and the client can just send it and forget about it.
I have a simple WCF need - basically clients running in isolation and a server so really client/server intially.
WCF helps us decouple the service layer and practise a SOA approach for scale.
All we are doing on the server (per call/multiple concurrency) is writing to a db and then performing some IO for another system which will have immediate use for - but this might change as (unknown) requirements build.
Speed: We need the service to be literally quick as possible: 1 second is OK - 2 is slow - and some errors need to be sent back immediately.
I was considering using server async patterns, queues (MSMQ), Azure, to allow the service method to queue and return quickly. NB However, some processing might be 'online' in the WCF service (db write) with an immediate return with response/error, others could be offline (IO). Disadvantage: This requires a means to callback the client if there is a show-stopper error and design and development scales accordingly.
i) Although WCF allows for the service I see the technology as providing an interprocess comm channel and perhaps the actual service operations should run in win services. Eg. WCF writes to a db which a long-running service polls and picks up. As the system gets bigger and bigger some operations may be genuine fire and forget long running - which complete or are needed hours after. We can take these out of the immediate loop. This is true decoupling even if it slows us down. A WCF method can't pass to a service unless it is calling another WCF service and can't call a windows service!
From an architectural viewpoint, is it OK to have some operations complete and return, and others pass to a true bus or service (by some mechanism)? Am I over-engineering this?
ii) As all the operations of db and IO will take say 1 or 2 seconds max I feel I might just call the service aysnc from the client and wait for it to return and then marshal back to the client UI. This is also simple. This might prove a wrong decision in the long run but having said that, all service layer ops would be in a seperate dll so that these could be called by another service for later scale. A method call could be marked as immediate or queue for processing, say.
Thoughts?
From an architectural viewpoint, is it
OK to have some operations complete
and return, and others pass to a true
bus or service (by some mechanism)? Am
I over-engineering this?
It is OK, all depends on detailed requirements.
Thought #1
Have operations return unique request ID and one operation that provides status by request ID.
Thought #2
Have operations return result if they are done within X number of seconds or request ID.