MSMQ Poison Queue Retry - wcf

I have set up a self-hosted WCF service like this: https://msdn.microsoft.com/en-us/library/aa395218%28v=vs.110%29.aspx
For my poison queue processor, I must do the following:
Try to process the message a couple of times.
If the retries are exhausted, update a status in the database and remove the message.
Using maxRetryCycles and receiveRetryCount is the obvious solution for the first part. But when the retry cycles are exhausted, I need to delete the message with a custom action.
What is the best way to retry an operation, but log when the retries expire?

Related

MassTransit compensation failure - deadletter?

I'm new to MassTransit (using rabbitmq), so please forgive me if this is a stupid question.
I just wanted to know how one is meant to handle an unsuccessful compensation? So all retries failed, i.e. no compensation succeeded - I would imagine the message should go to a deadletter queue of sorts for me to manually retry at a later date once ok to retry again?
Any help would be appreciated.
If you are using a routing slip, and during compensation of an activity an exception is thrown, the RoutingSlipCompensationFailed event is published. At that point, there is no retry, no error/dead-letter, etc. The routing slip is considered "ended" at that point, and the distributed transaction which faulted (thus causing the compensation methods to be invoked) is over.
When using routing slips, it's important to observe the events produced by the routing slip runtime (activity completed/compensated/faulted, as well as overall routing slip completed/faulted/compensation failed) - typically using a saga.
I'd suggest looking at the Demo-Registration sample on my GitHub to get an idea of how to use sagas in combination with routing slips to perform reliable distributed transactions.
https://github.com/phatboyg/Demo-Registration
The message will go to the poison queue if retry policies were unable to help processing the message and there is no redelivery (second-level retry) configured. Poison queues are called "error queues" in MassTransit.
The poison queue has the same queue name as your receive endpoint queue, with _error suffix.
Deadletter is something else, it is for messages that were received by the endpoint but the endpoint doesn't know how to handle it. Deadletter queues are called "skipped message queues" in MassTransit and have the suffix _skipped.
Update: this is the generic MassTransit behaviour. Courier works differently, as Chris described in another answer. It wasn't clear to me that the question was about using routing slips.

Cancelling an un-acked message in RabbitMQ

I have a service which tasks worker processes via RabbitMQ. The messages are sent with a TTL, and the worker will not ack the message until it successfully completes the task sent in the message.
The tasking process will monitor workers for timeouts, and if a worker exceeds the timeout it will be terminated. Since the message isn't ack'd, the message is re-queued immediately and the next worker will pick up the message (this is useful in my scenario, as workers are unreliable and may fail but subsequent attempts typically succeed).
However, I would also like the ability to cancel a message. Terminating and re-creating the worker process is the normal procedure (it's single threaded, so I can't send a separate 'cancel' message to the worker). However, doing so leads to the message immediately re-queueing if the TTL has not been exceeded.
The only suggested solution I've found is here, which suggested a separate data source which checks if a message is still valid. However, that answer is both a) old and b) inconvenient.
Does RabbitMQ offer a means to cancel a message once it's been placed into the queue?
Unfortunately, RabbitMQ does not have a way to cancel a message.
Without the ability to send a "cancel" message to your consumer, you may have to do something like what that other post suggests.
Another option to consider: message processing should be idempotent. That is, processing the same message more than once should only cause the desired result to occur once (the first time it is processed).
Idempotence is often achieved through the use of a correlationid in messaging. You can attach a correlationid to your message, then check a database or other service to see if that message should still be processed. If you want to "cancel" the message, you would update the other database/service with that specific correlationid to say "this one has been processed already" or "has been canceled" or something like that.
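As a rough illustration of the correlation-ID check described above, here is a minimal Python sketch. The `MessageStatusStore` class and `handle` function are hypothetical names invented for this example, and the in-memory sets merely stand in for whatever database or service would hold the state in a real system.

```python
from dataclasses import dataclass, field


@dataclass
class MessageStatusStore:
    """Hypothetical stand-in for the database/service that tracks message state."""
    cancelled: set = field(default_factory=set)
    processed: set = field(default_factory=set)

    def should_process(self, correlation_id: str) -> bool:
        # A message is skipped if it was cancelled or already handled once.
        return (correlation_id not in self.cancelled
                and correlation_id not in self.processed)


def handle(store: MessageStatusStore, correlation_id: str, task) -> bool:
    """Run the task at most once per correlation ID.

    Returns True if the task ran and False if it was skipped. In both cases
    the caller would ack the message so the broker does not re-queue it.
    """
    if not store.should_process(correlation_id):
        return False
    task()
    store.processed.add(correlation_id)
    return True
```

To "cancel" a message under this scheme, the tasking process would add its correlation ID to `store.cancelled` before terminating the worker; the next worker to receive the re-queued message then acks it without doing the work.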

MSMQ + WCF - Immediately Move Messages to the Dead-Letter Queue

We have a WCF service that listens for messages on a queue (MSMQ). It sends a request to our web server (REST API), which returns an HTTP status code.
If the status code falls within the 400 range, we are throwing away the message. The idea is that a 400 range error can never succeed (unauthorized, bad request, not found, etc.) and so we don't want to keep retrying.
For all other errors (e.g., 500 - Internal Server Error), we have WCF configured to put the message on a "retry" queue. Messages on the retry queue get retried after a certain amount of time. The idea is that the server is temporarily down, so wait and try again.
The way WCF is set up, if we throw a FaultException in the service contract, it will automatically put the message on the retry queue.
When a message causes a 400 range error, we are just swallowing the error (we just log it). This prevents the retry mechanism from firing; however, it would be better to move the message to a dead-letter queue. This way we can react to the error by sending an email to the user and/or a system administrator.
Is there a way to immediately move these bad messages to a dead-letter queue?
First, I kept referring to the dead-letter queue. At the time when I posted this question, I was unaware that WCF/MSMQ automatically creates what's known as a poison sub-queue. Any message that can't be delivered in the configured number of times is put in the poison sub-queue.
In my situation, I knew that some messages would never succeed, so I wanted to move the message out of the queue immediately.
The solution was to create a second queue that I called "poison" (not to be confused with the poison sub-queue). My catch block would create an instance of a WCF client and forward the message to this poison queue. I could reuse the same client to post to both the original queue and the poison queue; I just had to create a separate client end-point in the configuration file for each.
I had two separate ServiceHost instances running that read the queues. The ServiceHost for the original queue did the HTTP request and forwarded messages to the poison queue when unrecoverable errors occurred. The second ServiceHost would simply send out an email to record that a message was lost.
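The routing decision made by the original queue's handler can be sketched as follows. This is an illustration in Python, not the actual WCF code: the `Action` enum and `classify` function are hypothetical names, and treating every non-2xx, non-4xx status as retryable is an assumption based on the question's "all other errors" wording.

```python
from enum import Enum


class Action(Enum):
    DONE = "done"                                  # request succeeded
    FORWARD_TO_POISON_QUEUE = "forward_to_poison"  # unrecoverable, do not retry
    RETRY = "retry"                                # let the retry queue handle it


def classify(status_code: int) -> Action:
    """Decide what the handler should do with a message, given the HTTP status."""
    if 200 <= status_code < 300:
        return Action.DONE
    if 400 <= status_code < 500:
        # 4xx can never succeed on retry, so hand the message to the
        # separate "poison" queue immediately.
        return Action.FORWARD_TO_POISON_QUEUE
    # Assumption: everything else (e.g. 500) is treated as temporary.
    return Action.RETRY
```

In the WCF handler, `RETRY` would correspond to throwing so that the message lands on the retry queue, while `FORWARD_TO_POISON_QUEUE` would correspond to the catch block that posts the message through the second client endpoint.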
There was also the issue of temporary errors that exceeded the maximum number of tries. WCF/MSMQ automatically creates a sub-queue called <myqueuename>;poison. You cannot directly write to a sub-queue via WCF, but you can read from it using a ServiceHost. Whenever messages end up in the poison sub-queue, I simply forward the message to the poison queue, with the exact same client I use in the original handler's catch block.
I wanted the ability to include a stack trace in the error emails. Since I was reusing the same client and service contract for all of the handlers, I couldn't just pass along the stack trace as a string (unless I added it to all of my data contracts). Instead, I had the poison handler try to execute the code one more time, which would fail again and spit out the stack trace.
This is what my message queues ended up looking like:
MyQueue
- Queue messages
- Retry
- Poison
MyQueuePoison
- Queue messages
This approach is pretty convoluted. It was strange calling a WCF client from within a WCF service handler. It also meant setting up one more queue on the server and a ton of additional configuration sections for specifying which queue a client should forward messages to.
Hopefully I have understood your question. If it is what I think you are asking, then yes, there is a way, but you need to program it yourself. You DO need a retry count set so that MSMQ can retry until it gives up. Alternatively, you can create your own custom queue for dead letters/messages.
http://msdn.microsoft.com/en-us/library/ms789035(v=vs.110).aspx
http://msdn.microsoft.com/en-us/library/ms752268(v=vs.110).aspx
Take a look here also:
http://www.michaelfcollins3.me/blog/2012/09/20/wcf-msmq-bad-message-handling.html
How do I handle message failure in MSMQ bindings for WCF
I hope these links help.

MSMQ + WCF - Retry with Growing Delay

I am using MSMQ 4 with WCF. We have a Microsoft Dynamics plugin putting a message on a queue. A service picks up the message and makes an HTTP request to another web server. The web server responds by putting another message on a different queue. A second service picks up the messages and sends the response back to Dynamics...
We have our retry queue set up to retry 3 times and then wait for 5 minutes before retrying again. The Dynamics system sometimes takes so long (due to other plugins) that we can round-trip before the database transaction commits. The users aren't seeing the update come through for another 5 minutes.
I am curious if there is a way to configure the retry mechanism to retry incrementally. So, the first time it fails, it only waits a few seconds. If it fails a second time, it waits twice that. And the time between retries just keeps growing.
The problem with just reducing the time between retries is that a bad message could easily fill up a log file.
It turns out there is no built-in way of doing this. One slightly involved option is to create multiple queues, each with its own retry/poison sub-queues, each with a growing retry delay. You can reuse the same handler for each queue - the only thing that changes is the configuration. You also need a handler that can read the poison sub-queues (service) and move the message to the next queue in the chain (client).
So, you set receiveErrorHandling to Move. The maxRetryCycles and receiveRetryCount are just 1. Each queue will use a growing retryCycleDelay. Each queue you create will have a poison sub-queue created for it automatically. You simply read from each poison sub-queue and use a client to move it to the next queue.
I am sure someone could write some code that would automatically create N queues with a growing retryCycleDelay and hook it all up programmatically. Since it is the same handler/client for every queue, it wouldn't be a big deal.
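As a rough idea of what generating such a chain could look like, the Python sketch below only computes the per-queue settings; it does not create MSMQ queues or WCF bindings. The `build_retry_chain` function, the queue naming scheme, and the doubling factor are all assumptions made for illustration. The setting names mirror the binding attributes mentioned above.

```python
from datetime import timedelta


def build_retry_chain(base_name: str, first_delay: timedelta,
                      count: int, factor: int = 2) -> list:
    """Describe `count` queues whose retryCycleDelay grows by `factor` each step."""
    chain = []
    delay = first_delay
    for i in range(count):
        chain.append({
            "queue": f"{base_name}{i + 1}",   # hypothetical naming scheme
            "receiveErrorHandling": "Move",   # exhausted messages go to ;poison
            "maxRetryCycles": 1,
            "receiveRetryCount": 1,
            "retryCycleDelay": delay,
        })
        delay *= factor
    # Messages read from each queue's poison sub-queue are forwarded to the
    # next queue in the chain; the last queue has nowhere further to forward.
    for i, entry in enumerate(chain):
        entry["next_queue"] = chain[i + 1]["queue"] if i + 1 < len(chain) else None
    return chain
```

For example, `build_retry_chain("MyQueueRetry", timedelta(seconds=5), 3)` describes three queues with delays of 5, 10 and 20 seconds. What happens to a message that falls out of the last queue's poison sub-queue is left to the application.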

NServiceBus Retry Delay

What is the optimal way to configure/code NServiceBus to delay retrying messages?
In its default configuration retry happens almost immediately up to the number of attempts defined in the configuration file. I'd ideally like to retry again after an hour, etc.
Also, how does HandleCurrentMessageLater() work? What does the Later aspect refer to?
The NSB retries are there to remedy temporary problems like deadlocks etc. Longer retries are better handled by creating another process that monitors the error queue and puts the messages back into the source queue at the interval you like. Take a look at the ReturnToSourceQueue.exe that comes with NSB for reference.
Edit: NServiceBus now supports this; we call it Second Level Retries. See http://docs.particular.net/ for more details.
Here is a blog post on why NServiceBus doesn't include a retry delay that I wrote after asking Udi this very same question in his distributed systems architecture course:
NServiceBus Retries: Why no back-off delay?
And here is a discussion thread covering some of the points involved in building an error queue monitor/retry endpoint:
http://tech.groups.yahoo.com/group/nservicebus/message/10964
As for HandleCurrentMessageLater(), all it does is put the current message back at the end of the queue. If there are no other messages waiting, it's going to be processed again immediately.
As of NServiceBus 3.2.1, there is an out-of-the-box solution for handling back-off delays in the event of consecutive message failures. The previously existing retry mechanism still retries failures without a delay to handle cases like database deadlocks, quickly self-healing network issues, etc.
Once a message has been retried the configured number of times, the message is moved to a "Second Level Retry" queue. This queue, as configured below, will retry after a 10, 20, and 30 second delay, then the message will be moved to the configured error queue. You're free to change these values to something that better suits your environment.
You can also check out this link:
http://docs.particular.net/nservicebus/second-level-retries