WCF / MSMQ "time-to-be-received has elapsed" dead letter queue issue

I'm testing some software I've written. The test enqueues messages into MSMQ via WCF at a rate faster than my software can dequeue and process them. This shouldn't be a problem, since that is MSMQ's intended purpose, but if I enqueue enough messages that my software takes more than 24 hours to work through them, those messages get moved to the "Transactional dead-letter messages" queue with their Class set to "The time-to-be-received has elapsed".
The only configurable setting I can find is on the binding itself:
<bindings>
<netMsmqBinding>
<binding timeToLive="7.00:00:00" /> <!-- 7 days -->
...
I use this binding both when enqueuing and dequeuing, and it doesn't seem to do the trick. Setting the value to 2 seconds does have an effect, but setting it to longer than 1 day, including its maximum value (24 days), does not.
Is there another way to lengthen this time-to-be-received window? I can't find anything else to configure (when sending the message or creating the queue).

The timeToLive attribute on the binding itself is, in fact, the only setting necessary. I went back through all my configurations and had apparently missed a spot. From "Programming WCF Services":
The TimeToLive property is only relevant to the posting client, and has no effect on the service side, nor can the service change it. TimeToLive defaults to one day.
I've had the test running all weekend now, processing 1,000,000 messages. Nothing has ended up in the dead-letter queue yet.
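For reference, the kind of client-side configuration that makes this take effect looks roughly like the sketch below. The binding name, queue address, and contract are illustrative; the important part is that the posting client's endpoint actually references the binding configuration carrying timeToLive, since otherwise the one-day default applies.
<system.serviceModel>
  <bindings>
    <netMsmqBinding>
      <binding name="LongTtl" timeToLive="7.00:00:00" /> <!-- 7 days -->
    </netMsmqBinding>
  </bindings>
  <client>
    <!-- The posting client's endpoint must reference the binding configuration,
         otherwise the default one-day TimeToLive applies.
         The queue address and contract name here are illustrative. -->
    <endpoint address="net.msmq://localhost/private/MyServiceQueue"
              binding="netMsmqBinding"
              bindingConfiguration="LongTtl"
              contract="MyNamespace.IMyServiceContract" />
  </client>
</system.serviceModel>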

I'm not 100% sure, but I believe the TimeToLive property only sets the MSMQ Time-To-Reach-Queue property; I don't know of a built-in way to set the Time-To-Be-Received property right now...

Related

ActiveMQ: How do I limit the number of messages being dispatched?

Let's say I have one ActiveMQ Broker and an undefined number of consumers.
Problem:
To process a message, consumers need an external service which is either "DATA1" or "DATA2" (specified in the message)
Each server, "DATA1" and "DATA2", can only handle 20 connections
So at most 20 "DATA1" and 20 "DATA2" messages may be dispatched at any time
Because of prioritization, the messages must be enqueued in the same queue
Even if message A has a higher priority than message B, if A can't be processed because its external service has no free slots, message B needs to be processed instead
How can this be solved? As long as I was using message pulling (prefetch of 0), I was able to do this with a BrokerPlugin that, on messagePull, enforced the limits using semaphores and selectors. If the limits were reached, the pull returned null.
However, due to performance issues I had to set prefetch to 1 and use push instead. Therefore, my messagePull hack no longer works (it's never called).
So far I'm considering implementing a custom Cursor but I was wondering if someone knows a better solution.
Update: the custom cursor worked but broke features like message removal. I tried a custom Queue and QueueDispatchSelector (which is a pain to configure, since there isn't a proper API for it), and it mostly works, but I still have synchronisation issues.
Also, DispatchPolicy seems like a very suitable API; however, while it is referenced by Queue, it's never actually used.
Queues give you buffering for system processing time for free, and messages are delivered on demand. With prefetch=0 or prefetch=1, this should effectively get you there: a message will only be delivered to a consumer when the consumer is ready for it (i.e., during the consumer.receive() call).
consumer.receive() is a blocking call, so you should not need any custom plugin or other mechanism to delay delivery until the consumer process (and its required downstream services) is ready to handle it.
This behavior should work out of the box; if it doesn't, there may be details of your use case that haven't been provided that would shed more light on the scenario.
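To illustrate the blocking-receive pattern described above, here is a rough sketch using the .NET NMS client for ActiveMQ. The broker URL, queue name, and the slot-tracking helpers are placeholders, and the consumer.prefetchSize destination option is one way to keep prefetch low; adapt accordingly if you are on the Java client.
using System;
using Apache.NMS;
using Apache.NMS.ActiveMQ;

class OnDemandConsumer
{
    static void Main()
    {
        // Broker URL and queue name are hypothetical.
        IConnectionFactory factory = new ConnectionFactory("tcp://localhost:61616");
        using (IConnection connection = factory.CreateConnection())
        using (ISession session = connection.CreateSession(AcknowledgementMode.AutoAcknowledge))
        {
            connection.Start();
            // The destination option keeps prefetch at 1 so the broker does not
            // push a backlog to this consumer ahead of time.
            IQueue queue = session.GetQueue("work.queue?consumer.prefetchSize=1");
            using (IMessageConsumer consumer = session.CreateConsumer(queue))
            {
                while (true)
                {
                    // Block here until the external service (DATA1/DATA2) has a free
                    // connection slot; placeholder for however capacity is tracked.
                    WaitForExternalSlot();

                    // Receive() blocks until the broker dispatches a message, so a
                    // message is only delivered once this consumer is ready for it.
                    ITextMessage message = consumer.Receive() as ITextMessage;
                    if (message != null)
                    {
                        Process(message.Text);
                    }

                    ReleaseExternalSlot();
                }
            }
        }
    }

    static void WaitForExternalSlot() { /* placeholder */ }
    static void ReleaseExternalSlot() { /* placeholder */ }
    static void Process(string body) { /* placeholder */ }
}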

How do I correctly configure a WCF NetTcp Duplex Reliable Session?

Please excuse the obvious self-Q&A, but this information is widely misunderstood and almost always incorrectly answered, so I wanted to place it here for people searching for a definitive answer to this problem.
Even so, there's still some information I haven't been able to nail down. I will put this towards the end of the question (skip to that if you are not interested in the preamble).
How do I correctly configure a WCF NetTcp Duplex Reliable Session?
There are many questions and answers regarding this topic, and nearly all of them suggest setting inactivityTimeout="Infinite" in your configuration. This doesn't really seem to work correctly, particularly for the case of NetTcp (it may work correctly for WSDualHttp bindings, but I have never used those).
There are a number of other issues often associated with this, including: the channel not faulting after the client or server unexpectedly disconnects, the channel disconnecting after 10 minutes, the channel randomly disconnecting, the channel throwing an exception when trying to open, and being unable to configure Metadata on the same endpoint.
Please note: There are two concepts that are important below. Infrastructure messages are internal to the way WCF communicates, and are used by the framework to keep things running smoothly. Operation messages are messages that occur because your app has done something, like send a message across the wire. Infrastructure messages are largely invisible to your app (but they still occur in the background) while operation messages are the result of an action your app has taken.
Information I have figured out, through hard won trial and error.
Infinite does not appear to be a valid configuration setting in all situations (and certainly, the Visual Studio validation schema doesn't know about it).
There are two special configuration converters, called InfiniteIntConverter and InfiniteTimeSpanConverter which will sometimes work to convert the value Infinite to either Int.MaxValue or TimeSpan.MaxValue, but I haven't yet figured out the situations in which this appears to be valid as sometimes it works, and sometimes it doesn't. What's more, it appears that some libraries will allow Infinite in the config, while others will not, so you can succeed in one part of a configuration, but fail in another.
You must configure BOTH inactivityTimeout and receiveTimeout, on both the client and the server. While these values do not HAVE to be the same, they probably should be, as mismatched values will likely cause confusion. (Technically, you can leave inactivityTimeout at its default value if you want, but you should be aware of what that value is and what it does.) A configuration sketch follows these notes.
inactivityTimeout should NEVER be set to a large value, much less Infinite or TimeSpan.MaxValue.
inactivityTimeout has two functions (and this is not widely understood). The first defines the maximum amount of time that can elapse on a channel without receiving any "infrastructure" or "operation" messages. The second defines the interval at which infrastructure messages are sent (half the specified time). If no infrastructure or operation messages have been received during the timeout period, the connection is aborted.
receiveTimeout specifies the maximum amount of time that can elapse between operation messages only. It can be set to a large value, such as TimeSpan.MaxValue (particularly if your channel runs internally over a trusted network or over a VPN). This value is what defines how long the reliable session will "stay alive" if there is no activity between client and server other than infrastructure messages; i.e., your client does not call any methods of the interface, and your server does not call back into the client.
Setting a short inactivityTimeout and a large receiveTimeout keeps your reliable session "tacked up" even when there is no operational activity between your client and server. The short inactivity timeout (I like to keep the default of 10 minutes or less) sends infrastructure "ping" messages to keep the TCP connection alive, while the long receive timeout keeps the reliable session active, all while still providing a reasonable timeout in case of disconnection.
If you set inactivityTimeout to a large value, then the reliable session will not be reliable, as it has no way to keep the TCP connection alive, nor any way to verify the integrity of the connection. It won't know that a user has disconnected unexpectedly until you try to send a message to that client and find the connection is no longer there. This is why many people who use Infinite for this setting resort to creating a "Ping" method in their service, which is completely unnecessary if you've configured these settings correctly.
If you set inactivityTimeout to a value larger than receiveTimeout, the session will likewise be unreliable, as you are still governed by receiveTimeout for operation messages. I.e., if you forget to set receiveTimeout and leave it at the default 10 minutes, then if the user is idle for 10 minutes, the connection will be aborted.
When the client or server unexpectedly disconnects (app crashes, network failure, someone trips over the power cord, etc.), the other side may not notice right away. I have attached various ChannelFaulted event handlers in various test situations, and sometimes the connection is faulted right away... other times it doesn't seem to fault at all. What I have discovered through trial and error is that when it doesn't seem to fault, it will actually fault after the inactivityTimeout expires on that end (so if it's set to 10 minutes, then after 10 minutes it will raise the ChannelFaulted event).
I have not yet figured out why in some situations it notices the disconnection right away, and in others it waits for the timer to expire. In both cases, I notice internal first-chance communication exceptions thrown and handled by the framework, and there are calls to abort the connection... but somehow the call to the event handler gets lost and it must wait for the timeout. My suspicion is that this is somehow thread related.
When trying to configure Metadata to work across the NetTcp channel, I have had sporadic results. Sometimes it works, sometimes it doesn't. I've read many reports that Metadata doesn't work over NetTcp and that you have to use an Http channel for the Metadata, but I have in fact had it work on several occasions using the net.tcp:// url to generate the proxy. Then I would change something, recompile, and it would no longer work; changing things back, it still wouldn't work. So it was very confusing what magic incantation was necessary to make Metadata function over net.tcp, shared with the endpoint on the same port (obviously with a different address).
When configuring both a NetTcp and a Metadata endpoint on the same service, and specifying non-default settings for connection parameters like listenBacklog and maxConnections, you also need to make sure the Metadata endpoint uses the same settings, which typically means you have to define a custom binding, since these settings are not available on the standard TCP mex binding. This includes setting listenBacklog and maxPendingConnections on tcpTransport, and groupName and maxOutboundConnectionsPerEndpoint on connectionPoolSettings.
The default for the Ordered setting of ReliableSession is True, which carries a lot more overhead than turning it off. If you don't need ordered messages, I would suggest turning it off (you need to set this on both sides).
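To make these notes concrete, here is a hedged configuration sketch of the pattern described above: a short inactivityTimeout, a long receiveTimeout, Ordered turned off, and a custom TCP mex binding whose transport settings mirror the application endpoint. Service names, addresses, and the specific timeout and connection values are illustrative only, and the client-side binding must use matching receiveTimeout, inactivityTimeout, and ordered values.
<system.serviceModel>
  <bindings>
    <netTcpBinding>
      <!-- Long receiveTimeout keeps the reliable session alive between operation
           messages; short inactivityTimeout keeps infrastructure keep-alives
           flowing and detects dead connections. Values are examples only. -->
      <binding name="DuplexReliable"
               receiveTimeout="30.00:00:00"
               listenBacklog="200"
               maxConnections="200">
        <reliableSession enabled="true"
                         inactivityTimeout="00:10:00"
                         ordered="false" />
      </binding>
    </netTcpBinding>
    <customBinding>
      <!-- mex-over-TCP binding so the metadata endpoint can share the same
           non-default transport settings as the application endpoint. -->
      <binding name="MexOverTcp">
        <binaryMessageEncoding />
        <tcpTransport listenBacklog="200" maxPendingConnections="200">
          <connectionPoolSettings groupName="default"
                                  maxOutboundConnectionsPerEndpoint="20" />
        </tcpTransport>
      </binding>
    </customBinding>
  </bindings>
  <services>
    <service name="MyNamespace.MyDuplexService">
      <endpoint address="net.tcp://localhost:9000/MyDuplexService"
                binding="netTcpBinding"
                bindingConfiguration="DuplexReliable"
                contract="MyNamespace.IMyDuplexContract" />
      <!-- Metadata shared on the same port, different address. -->
      <endpoint address="net.tcp://localhost:9000/MyDuplexService/mex"
                binding="customBinding"
                bindingConfiguration="MexOverTcp"
                contract="IMetadataExchange" />
    </service>
  </services>
  <behaviors>
    <serviceBehaviors>
      <behavior>
        <!-- serviceMetadata is still required for the mex endpoint to respond. -->
        <serviceMetadata />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>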
-
Configuration I still need to understand:
How do I correctly configure the shared net.tcp Metadata endpoint? (I will add an example when I get a chance.) Currently, I'm specifying an HTTP GET URL to bypass the problem. It's so inconsistent as to why it sometimes works and sometimes does not. I kept getting the error "The URI Prefix is not recognized" when generating the proxy in Visual Studio.
Why does WCF sometimes Fault the channel immediately upon disconnect, and sometimes waits for inactivityTimeout to expire? What controls/causes one vs the other behavior?

How to implement WCF server-side retries

I want to add server-side retry behavior if something specific happens during operation.
Custom IOperationInvoker looks like a good candidate for this functionality, but...
Unfortunately, the instance on which the operation should be performed has already been created/resolved, so to correctly implement retry logic everyone would have to write stateless service implementations, which is quite limiting; sometimes it's nice to have InstanceContextMode.PerCall and state that lives only for the duration of the request.
Is there any place or mechanism to force WCF to re-create/resolve the service instance and invoke the operation again?
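For reference, the kind of IOperationInvoker decorator being considered might look something like the sketch below. It is purely illustrative (the exception type and retry count are placeholders), and it shows the limitation described above: every retry runs against the same, already-resolved service instance.
using System;
using System.ServiceModel.Dispatcher;

// A decorating invoker that retries the inner invoker a fixed number of times.
// Because the same service instance is passed to every attempt, the operation
// must be effectively stateless for the retry to be safe.
public class RetryingOperationInvoker : IOperationInvoker
{
    private readonly IOperationInvoker inner;
    private readonly int maxAttempts;

    public RetryingOperationInvoker(IOperationInvoker inner, int maxAttempts = 3)
    {
        this.inner = inner;
        this.maxAttempts = maxAttempts;
    }

    public object[] AllocateInputs()
    {
        return inner.AllocateInputs();
    }

    public object Invoke(object instance, object[] inputs, out object[] outputs)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return inner.Invoke(instance, inputs, out outputs);
            }
            catch (TimeoutException) when (attempt < maxAttempts)
            {
                // Placeholder for the "something specific" failure: swallow it
                // and try again against the same instance.
            }
        }
    }

    public IAsyncResult InvokeBegin(object instance, object[] inputs, AsyncCallback callback, object state)
    {
        // Retry is only sketched for the synchronous path.
        return inner.InvokeBegin(instance, inputs, callback, state);
    }

    public object InvokeEnd(object instance, out object[] outputs, IAsyncResult result)
    {
        return inner.InvokeEnd(instance, out outputs, result);
    }

    public bool IsSynchronous
    {
        get { return inner.IsSynchronous; }
    }
}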
It's not so easy to implement retry logic on your own. You will have to deal with a number of issues like these:
What if all attempts fail?
Should there be an interval between attempts?
What if your server is down between the first and second attempt?
etc.
Fortunately, there is an out-of-the-box solution which already does all of this for you: take a look at the WCF MSMQ binding.
It will put your message in the dead-letter queue if you wish, so you can easily keep track of such messages and even force them to be processed again.
You can set an interval between attempts, so if your server is down for 10 minutes, not all of the attempts are used up.
You can configure it to be durable: it will store the messages on disk, so when your server is back up it will try to process the message again.
Etc. Take a look at the MSMQ binding; you will find a lot of useful features.
Here's a good article about queuing in WCF.
So in the long run your design will be something like this:
Client Call -> MSMQ -> Service
Please note you don't have to make any changes to the code you have now at all; you just change the binding configuration and set up MSMQ, which is fairly easy.
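As a rough illustration of the binding configuration involved (the names, values, and queue address here are illustrative, not prescriptive), the retry behaviour lives on netMsmqBinding itself:
<system.serviceModel>
  <bindings>
    <netMsmqBinding>
      <!-- Each message is retried receiveRetryCount times per cycle, across
           maxRetryCycles cycles separated by retryCycleDelay; when all retries
           are exhausted, receiveErrorHandling="Move" moves it to the poison
           subqueue instead of faulting the service. -->
      <binding name="RetryingQueue"
               exactlyOnce="true"
               durable="true"
               receiveRetryCount="5"
               maxRetryCycles="3"
               retryCycleDelay="00:10:00"
               receiveErrorHandling="Move" />
    </netMsmqBinding>
  </bindings>
  <services>
    <service name="MyNamespace.ProcessService">
      <endpoint address="net.msmq://localhost/private/ProcessService"
                binding="netMsmqBinding"
                bindingConfiguration="RetryingQueue"
                contract="MyNamespace.IProcessService" />
    </service>
  </services>
</system.serviceModel>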
In case you can't use the MSMQ binding directly from your client, you can always go with a special server-side service which just puts the messages into the queue. So the design will be:
Client service -> HTTP -> QueueService (one method which puts messages to a queue) -> MSMQ -> ProcessService
Hope it helps!

How can I tell a WAS service polling an MSMQ to wait when busy?

I'm working on a system which, amongst other things, runs payroll, a heavy-load process. It is likely that soon there may be so many requests to run payroll at peak times that the batch servers will be overwhelmed.
I'm looking to put together a proof of concept to cope with this by using MSMQ (probably replacing it with a commercial solution like NServiceBus later). I'm using this example as a basis. I can see how to set up the bindings and stick it together, but I still need a way to tell the subscribers hosted by WAS to only process the 'run heavy payroll process' message if they are not busy. Otherwise the messages on the queue will get picked up straight away and we have the same problem as before.
Can I set up the subscribing service to say, "I'm busy, I can't take the message, leave it on the queue"? Does the queue need to be transactional?
If you're using WCF then there's no way to conditionally activate the channel and thereby leave the messages on the queue for later.
A better solution is to host the message receiver in a completely different process, for example as a Windows service. These can then be enabled/disabled according to your service-window requirements.
You also get the additional benefit of being able to very easily scale out the message receivers to handle greater loads (by hosting more instances of your receiver).
One way to do this is to have two queues: your polling always checks the high-priority queue first, and only if there are no items in that queue does it take an item from the other.
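A minimal sketch of that two-queue polling loop, using System.Messaging directly (the queue paths and the processing method are hypothetical, and this assumes the receiver is self-hosted rather than activation-driven, as suggested above):
using System;
using System.Messaging;
using System.Threading;

class PriorityPoller
{
    // Queue paths are hypothetical.
    static readonly MessageQueue HighPriority = new MessageQueue(@".\private$\payroll-high");
    static readonly MessageQueue Normal = new MessageQueue(@".\private$\payroll-normal");

    static void Main()
    {
        while (true)
        {
            // Take from the high-priority queue first; only fall back to the
            // normal queue when the high-priority queue is empty.
            Message message = TryReceive(HighPriority) ?? TryReceive(Normal);
            if (message == null)
            {
                Thread.Sleep(TimeSpan.FromSeconds(5)); // idle back-off
                continue;
            }
            ProcessPayrollRequest(message); // placeholder for the heavy work
        }
    }

    static Message TryReceive(MessageQueue queue)
    {
        try
        {
            // Short timeout so an empty queue does not block the loop.
            // If the queues are transactional, pass MessageQueueTransactionType.Single as well.
            return queue.Receive(TimeSpan.FromSeconds(1));
        }
        catch (MessageQueueException ex) when (ex.MessageQueueErrorCode == MessageQueueErrorCode.IOTimeout)
        {
            return null; // nothing waiting in this queue
        }
    }

    static void ProcessPayrollRequest(Message message)
    {
        // placeholder
    }
}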

Implementing a 24-hour queue using MSMQ and WCF

I am shortly starting a project which requires messages to be held in a queue for a period of 24 hours; this is because the database can't have any updates at certain times of the month. The service also has to be hosted on Windows Server 2003, which means it will have to be a Windows service.
It is also required that the service use WCF so that in 12 months' time, when we move over to Windows Server 2008, the service can be hosted in IIS 7. At present I am wondering if MSMQ is the best way to handle this.
I've been looking into topics like poison-message handling and dead-letter queues, but haven't found anything that really covers what I intend to do. Could anyone recommend a sample or a tutorial for this?
Thanks in advance
Yes, it sounds like this is a perfect scenario for WCF and MSMQ. It should be much easier to use MSMQ than to create your own queuing mechanism with the same robustness. You will want to look into the Message.TimeToBeReceived property for a message-expiration timeout.
If the interval specified by the TimeToBeReceived property expires before the message is removed from the queue, Message Queuing discards the message in one of two ways. If the message's UseDeadLetterQueue property is true, the message is sent to the dead-letter queue. If UseDeadLetterQueue is false, the message is ignored.
Here are some good starter tutorials on WCF with MSMQ: link1 and link2
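As a rough sketch of the TimeToBeReceived approach described above, using System.Messaging directly (the queue path and payload are hypothetical, and this assumes a transactional queue):
using System;
using System.Messaging;

class DelayTolerantSender
{
    static void Main()
    {
        // Queue path and payload are hypothetical.
        using (var queue = new MessageQueue(@".\private$\pending-updates"))
        using (var message = new Message("update payload"))
        {
            message.Recoverable = true;                         // survive a service/machine restart
            message.TimeToBeReceived = TimeSpan.FromHours(24);  // expire after the 24-hour hold window
            message.UseDeadLetterQueue = true;                  // expired messages go to the dead-letter queue

            // Assumes a transactional queue; for a non-transactional queue,
            // drop the transaction type argument.
            queue.Send(message, MessageQueueTransactionType.Single);
        }
    }
}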