Closing of WCF service (ServiceHost) not as graceful as it seems - wcf

When I call serviceHost.Close(), even with a timeout value such as serviceHost.Close(TimeSpan.FromMinutes(10)), the host stops receiving messages, but one or two of my messages get lost from time to time.
I want to be able to stop my service in the middle of an operation but still have it process the messages it has already received. Is there a way to gracefully close the service while a message is still processing?
I have already set closeTimeout and receiveTimeout in app.config to 10 minutes, and I also call RequestAdditionalTime(600000), but I still get this issue. The service does not wait 10 minutes to close; it closes after 1 minute at most.
Binding I am using is netMsmqBinding.
EDIT:
I temporarily fixed the problem by adding a while loop to the ServiceBase OnStop method that keeps looping until processing has finished. It seems like a dirty hack, but it works for now. Let me know if there's a cleaner way to end it gracefully.
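For reference, the timeout settings described in the question would look roughly like this in app.config. This is only a sketch: the binding name GracefulMsmq is hypothetical, and, as the question reports, these settings alone did not prevent the early shutdown.

```xml
<system.serviceModel>
  <bindings>
    <netMsmqBinding>
      <!-- "GracefulMsmq" is a hypothetical name; an endpoint would reference it
           through its bindingConfiguration attribute -->
      <binding name="GracefulMsmq"
               closeTimeout="00:10:00"
               receiveTimeout="00:10:00" />
    </netMsmqBinding>
  </bindings>
</system.serviceModel>
```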

Related

RabbitMQ + kombu - A long callback blocks the heartbeat leading to aborting the connection

We have been trying to use RabbitMQ to transfer data from Project A to Project B.
We created a producer that takes the data from Project A and puts it in a queue, and that was relatively easy. Then we created a k8s pod for Project B, which listens to the appropriate queue with kombu's ConsumerMixin.
Overall, the integration was reasonable and straightforward. But when we started to process long messages, we noticed that they were coming back into the queue repeatedly.
After research, we found out that whenever the processing of the message takes more than 20 seconds, the message showed up in the queue again, even though the processing was successful.
The source of this issue lies with the RabbitMQ heartbeat. We set the heartbeat to 10 seconds, and RabbitMQ checks the connection twice before killing it. However, because the callback takes more than 20 seconds to run and the message's .ack() (acknowledge) happens at the end of the callback (to ensure processing was successful), the heartbeat is blocked while the message is being processed (as described here: https://github.com/celery/kombu/issues/621#issuecomment-251836611).
We tried to find a workaround with threading, processing the message on a different thread so that the heartbeat is not blocked, but it didn't work. It also felt like we were hacking around the problem rather than solving it.
So my question is whether there is a proper way to handle this situation and, if not, what alternatives we have. RabbitMQ seemed like the right choice since we use it in standalone projects with Celery, and it is also recommended on the internet.

Messages being lost on consumer falling over

This seems like a pretty basic question, but I seem to be losing messages when the consumer falls over before acknowledging them. I have set up the broker with an exchange audit:exchange and a queue bound to it, audit:queue. Both are durable, and as expected, if I send messages when no consumer is active they sit on the queue and get processed by the consumer when it starts up. However, if I put a breakpoint in the consumer and kill the process halfway through, the message is not requeued; it just seems to get lost. The consumer is set up using the annotation:
@RabbitListener(queues = "audit:queue")
public void process(Message message) {
    routeMessage(message); // stop here and kill process - message removed from q
}
I can't reproduce your issue.
With the breakpoint triggered, I see the message still in the queue (unacked=1) on the rabbit console.
When the process is killed, the message goes back to ready.
Have you configured the listener container factory to use AcknowledgeMode.NONE?
That will exhibit the behavior you describe.
The default is AUTO, which means the message will only be acknowledged when the listener returns successfully.
If you still think there's an issue, please supply the complete test case.
Sorry, this was my bad (I just wasted a few hours... sigh). I was killing the app from within my IDE, which probably detaches and then kills the process, allowing it enough time to proceed just far enough that it actually does send the ack. When I just killed the process from a terminal, it worked exactly as expected. Particular apologies to you, Gary, for wasting your time as well.

How do I correctly configure a WCF NetTcp Duplex Reliable Session?

Please excuse the obvious self-Q/A, but this information is widely misunderstood and almost always incorrectly answered, so I wanted to place it here for people searching for a definitive answer to this problem.
Even so, there's still some information I haven't been able to nail down. I will put this towards the end of the question (skip to that if you are not interested in the preamble).
How do I correctly configure a WCF NetTcp Duplex Reliable Session?
There are many questions and answers regarding this topic, and nearly all of them suggest setting inactivityTimeout="Infinite" in your configuration. This doesn't really seem to work correctly, particularly for the case of NetTcp (It may work correctly for WSDualHttp Bindings, but I have never used those).
A number of other issues are often associated with this, including: the channel not faulting after the client or server unexpectedly disconnects, the channel disconnecting after 10 minutes, the channel randomly disconnecting, the channel throwing an exception when trying to open, and being unable to configure Metadata on the same endpoint.
Please note: There are two concepts that are important below. Infrastructure messages are internal to the way WCF communicates, and are used by the framework to keep things running smoothly. Operation messages are messages that occur because your app has done something, like send a message across the wire. Infrastructure messages are largely invisible to your app (but they still occur in the background) while operation messages are the result of an action your app has taken.
Information I have figured out through hard-won trial and error:
Infinite does not appear to be a valid configuration setting in all situations (and certainly, the Visual Studio validation schema doesn't know about it).
There are two special configuration converters, called InfiniteIntConverter and InfiniteTimeSpanConverter, which will sometimes work to convert the value Infinite to either Int32.MaxValue or TimeSpan.MaxValue, but I haven't yet figured out the situations in which this is valid: sometimes it works, and sometimes it doesn't. What's more, it appears that some libraries will allow Infinite in the config while others will not, so you can succeed in one part of a configuration but fail in another.
You must configure BOTH inactivityTimeout and receiveTimeout, on both the client and the server. While these values do not HAVE to be the same, they probably should be as they will probably cause confusion if they are not. (technically, you can leave inactivityTimeout to its default value if you want, but you should be aware of its value, and what it does)
inactivityTimeout should NEVER be set to a large value, much less Infinite or TimeSpan.MaxValue.
inactivityTimeout has two functions (and this is not widely understood). The first function defines the maximum amount of time that can elapse on a channel without receiving any "infrastructure" or "operation" messages. The second function defines the time period in which infrastructure messages are sent (half the time specified). If no infrastructure or operation messages have been received during the timeout period, the connection is aborted.
receiveTimeout specifies the maximum amount of time that can elapse between operation messages only. This value can be set to a large value, such as TimeSpan.MaxValue (particularly if your channel runs internally over a trusted network or over a vpn). This value is what defines how long the reliable session will "stay alive" if there is no activity between client and server (other than infrastructure messages). ie, your client does not call any methods of the interface, and your server does not call back into the client.
Setting a short inactivityTimeout and a large receiveTimeout keeps your reliable session "tacked up" even when there is no operational activity between your client and server. The short inactivity timeout (I like to keep the default 10 minutes or less) sends infrastructure "ping" messages to keep the TCP connection alive, while the long receive timeout keeps the reliable session active; together they still provide a reasonable timeout in case of disconnection.
If you set inactivityTimeout to a large value, then the reliable session will not be reliable, as it has no way to keep the TCP connection alive, nor does it have any way to verify the integrity of the connection. It won't know if a user has disconnected unexpectedly until you try to send a message to that client and find out the connection is no longer there. This is why many people who use Infinite for this setting resort to creating a "Ping" method in their service, which is completely unnecessary if you've configured these settings correctly.
If you set inactivityTimeout to a value larger than receiveTimeout then it will likewise also be unreliable, as you will still be governed by the receiveTimeout for operation messages. ie. if you forget to set receiveTimeout and leave it at the default 10 minutes, then if the user is idle for 10 minutes, the connection will be aborted.
When the client or server unexpectedly disconnects (app crashes, network failure, someone trips over the power cord, etc.), the other side may not notice right away. I have attached various ChannelFaulted event handlers in various test situations, and sometimes the connection is faulted right away, while other times it doesn't seem to fault at all. What I have discovered through trial and error is that when it doesn't seem to fault, it will actually fault after the inactivityTimeout expires on that end (so if it's set to 10 minutes, the ChannelFaulted event fires after 10 minutes).
I have not yet figured out why in some situations it notices the disconnection right away, and others it waits for the timer to expire. In both cases, I notice internal first chance communication exceptions thrown and handled by the framework, and there are calls to Abort the connection... but somehow the call to the event handler gets lost and it must wait for the timeout. My suspicion is this is somehow thread related.
When trying to configure Metadata to work across the NetTcp channel, I have had sporadic results. Sometimes it works, sometimes it doesn't. I've read many reports that Metadata doesn't work over NetTcp and that you have to use an Http channel for the Metadata, but I have in fact had it work on several occasions using the net.tcp:// url to generate the proxy. Then I would change something, recompile, and it would no longer work. Changing things back didn't help; it still wouldn't work. So it was very confusing what magic incantation was necessary to make Metadata function over net.tcp, shared with the endpoint on the same port (obviously with a different address).
When configuring both a NetTcp and a Metadata endpoint on the same service, and specifying non-default settings for connection parameters like listenBacklog and maxConnections, you also need to make sure the Metadata endpoint uses the same settings, which typically means you have to define a custom binding, since these settings are not available from the standard tcp mex binding. This includes setting listenBacklog and maxPendingConnections on tcpTransport, and groupName and maxOutboundConnectionsPerEndpoint on connectionPoolSettings.
The default value of the Ordered setting of ReliableSession is True. This carries a lot more overhead than turning it off. If you don't need ordered messages, I would suggest turning it off (you need to set this on both sides).
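The timeout guidance above could be expressed along the following lines. This is a sketch rather than a verified configuration: the binding name ReliableDuplexTcp and the one-day receiveTimeout are illustrative assumptions, ordered="false" only applies if you don't need ordered delivery, and the same values would need to be applied on both the client and the server.

```xml
<system.serviceModel>
  <bindings>
    <netTcpBinding>
      <!-- receiveTimeout: long (one day here, an arbitrary illustrative value)
           so the session survives periods with no operation messages -->
      <binding name="ReliableDuplexTcp" receiveTimeout="1.00:00:00">
        <!-- inactivityTimeout: short, so infrastructure messages keep flowing
             and a dead peer is noticed within a reasonable time -->
        <reliableSession enabled="true"
                         inactivityTimeout="00:10:00"
                         ordered="false" />
      </binding>
    </netTcpBinding>
  </bindings>
</system.serviceModel>
```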
-
Configuration I still need to understand:
How do I correctly configure the shared net.tcp Metadata endpoint? (I will add an example when I get a chance.) Currently, I'm specifying an HTTP GET URL to bypass the problem. It's so inconsistent as to why it sometimes works and sometimes does not. I kept getting the error 'The URI Prefix is not recognized' when generating the proxy in Visual Studio.
Why does WCF sometimes fault the channel immediately upon disconnect, and sometimes wait for inactivityTimeout to expire? What controls or causes one behavior versus the other?
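For the earlier point about giving the Metadata endpoint the same connection settings as the NetTcp endpoint, a custom binding might look like the sketch below. It does not answer the open questions above. The binding name MexTcpMatched, the group name and the numeric values are placeholders chosen for illustration, not values from a tested setup; they would need to match whatever the service's NetTcp binding uses.

```xml
<system.serviceModel>
  <bindings>
    <customBinding>
      <!-- Hypothetical binding for the mex endpoint; all values are placeholders -->
      <binding name="MexTcpMatched">
        <tcpTransport listenBacklog="200" maxPendingConnections="200">
          <connectionPoolSettings groupName="default"
                                  maxOutboundConnectionsPerEndpoint="200" />
        </tcpTransport>
      </binding>
    </customBinding>
  </bindings>
</system.serviceModel>
```

The mex endpoint would then use binding="customBinding" with bindingConfiguration="MexTcpMatched" and contract="IMetadataExchange".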

MSMQ + WCF - Retry with Growing Delay

I am using MSMQ 4 with WCF. We have a Microsoft Dynamics plugin putting a message on a queue. A service picks up the message and makes an HTTP request to another web server. The web server responds by putting another message on a different queue. A second service picks up the messages and sends the response back to Dynamics...
We have our retry queue set up to retry 3 times and then wait for 5 minutes before retrying again. The Dynamics system sometimes takes so long (due to other plugins) that we can round-trip before the database transaction commits. The users then don't see the update come through for another 5 minutes.
I am curious if there is a way to configure the retry mechanism to retry incrementally. So, the first time it fails, it only waits a few seconds. If it fails a second time, it waits twice that. And the time between retries just keeps growing.
The problem with just reducing the time between retries is that a bad message could easily fill up a log file.
It turns out there is no built-in way of doing this. One slightly involved option is to create multiple queues, each with its own retry/poison sub-queues, each with a growing retry delay. You can reuse the same handler for each queue - the only thing that changes is the configuration. You also need a handler that can read the poison sub-queues (service) and move the message to the next queue in the chain (client).
So, you set receiveErrorHandling to Move. The maxRetryCycles and receiveRetryCount are just 1. Each queue will use a growing retryCycleDelay. Each queue you create will have a poison sub-queue created for it automatically. You simply read from each poison sub-queue and use a client to move it to the next queue.
I am sure someone could write some code that would automatically create N queues with a growing retryCycleDelay and hook it all up programmatically. Since it is the same handler/client for every queue, it wouldn't be a big deal.
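The binding side of that chain might look something like the sketch below: one netMsmqBinding configuration per queue, identical except for retryCycleDelay. The binding names and the doubling delays (5, 10 and 20 seconds) are assumptions made for illustration; the endpoints, queue addresses and the handler that moves messages out of each poison sub-queue are not shown.

```xml
<system.serviceModel>
  <bindings>
    <netMsmqBinding>
      <!-- One binding per queue in the chain; only retryCycleDelay grows -->
      <binding name="RetryStage1" receiveErrorHandling="Move"
               receiveRetryCount="1" maxRetryCycles="1"
               retryCycleDelay="00:00:05" />
      <binding name="RetryStage2" receiveErrorHandling="Move"
               receiveRetryCount="1" maxRetryCycles="1"
               retryCycleDelay="00:00:10" />
      <binding name="RetryStage3" receiveErrorHandling="Move"
               receiveRetryCount="1" maxRetryCycles="1"
               retryCycleDelay="00:00:20" />
    </netMsmqBinding>
  </bindings>
</system.serviceModel>
```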

Performance of WCF with net.tcp

I have a WCF net.tcp service hosted with the built-in ServiceHost, and when doing some stress tests I get strange behavior. The first time I send a bunch of requests, 5 to 10 requests are answered quickly, and the rest return at about 2-second intervals. The second time I send the requests, 10 to 20 are returned quickly, and the rest at 2-second intervals.
The above repeats until I can get over 100 requests returned quickly, but if I wait a minute or so, the memory usage of the service goes down and the requests go back to 5-10 returning quickly.
The service I am testing has a small delay so that I can get many open connections at the same time; if this delay is removed, the requests return so quickly that I have perhaps 2-5 connections open at the same time. The delay simulates DB connections and other outgoing work.
From the behavior it looks like the ServiceHost is allocating something, threads, class instances, but I can not figure out what it is.
I could have a timer in the client that calls the service to keep it working, but that seems like a bad solution.
If I have a high sustained load to the service it will crunch all requests quickly, but if I have a period of low activity and then a surge of connections comes in the service will be slow.
I guess my question is WHAT is it that gets allocated during high load of the WCF service, and HOW can I configure the service to preallocate more of it?
EDIT:
I did some more testing, and looking at taskmgr for the process I can see that when the ServiceHost is 'resting' there are 10 threads open, but when I start sending requests, the thread count goes up. As long as the thread count is high, the ServiceHost can process incoming requests quickly, but if I pause sending requests, the open thread count decreases and subsequent requests start taking longer to process.
Now, how can I tell the servicehost to keep a bunch of threads open? Or more than the 10-12 that it keeps by default?
Well, after lots of googling, it seems that the problem is the threadpool. The CLR threadpool allocates a few threads, and when they are used, it throttles the creation of new threads, and after a time it also deallocates unused threads.
There is some confusion about a bug that meant that the ThreadPool did not honor the SetMinThreads call.
http://www.michaelckennedy.net/blog/PermaLink,guid,708ee9c0-a1fd-46e5-8fa0-b1894ad6ce0f.aspx
I am not sure if this bug is solved, or what, because when I modify the ThreadPool settings, the problem persists.
The thing that determines how many requests are handled simultaneously is the ServiceThrottlingBehavior. There are a number of different thresholds that limit the number of requests being processed. This also depends on the binding you are using: for example, wsHttpBinding defaults to sessions on, while basicHttpBinding uses no sessions, so the default session limit of 10 is no problem there.
See http://msdn.microsoft.com/en-us/library/ms735114.aspx for more details.
The bug you referenced is fixed in .NET 3.5 SP1. That may have had something to do with the problem, but I think it's more likely (much more likely) that throttling is your problem rather than threading, as Maurice pointed out.
<system.serviceModel>
  <services>
    <service name="???">
      <endpoint ... />
    </service>
  </services>
</system.serviceModel>
What's the throttle limit for this "empty" config? 10 sessions, 16 concurrent calls! Beware.
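To raise those limits explicitly, a service behavior with a serviceThrottling element can be added. The sketch below is illustrative only: the behavior name RaisedThrottle and the value 200 are assumptions, not recommendations, and the service element would need to reference the behavior through its behaviorConfiguration attribute.

```xml
<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <!-- "RaisedThrottle" and the limits below are placeholder values -->
      <behavior name="RaisedThrottle">
        <serviceThrottling maxConcurrentCalls="200"
                           maxConcurrentSessions="200"
                           maxConcurrentInstances="200" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
```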
Here's more on the threading:
http://www.michaelckennedy.net/blog/2008/08/20/ThreadPoolBugInNET20SP1IsFixed.aspx
This feels like a hack but it seems to solve your issue. The problem is that the threadpool will take time to start up a new thread, so what you really need is threads waiting on standby. Add a constructor to your service and set the minimum number of threads you would like.
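// Requires: using System.Threading;
public YourService()
{
    int workerThreads;
    int portThreads;
    // Read the current minimums so the I/O completion port value is preserved
    ThreadPool.GetMinThreads(out workerThreads, out portThreads);
    // Keep at least 200 worker threads on standby
    ThreadPool.SetMinThreads(200, portThreads);
}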