MassTransit compensation failure - deadletter? - rabbitmq

I'm new to MassTransit (using rabbitmq), so please forgive me if this is a stupid question.
I just wanted to know how one is meant to handle an unsuccessful compensation. Say all retries failed, i.e. no compensation succeeded: I would imagine the message should go to a dead-letter queue of sorts, for me to retry manually at a later date once it is OK to do so?
Any help would be appreciated.

If you are using a routing slip, and during compensation of an activity an exception is thrown, the RoutingSlipCompensationFailed event is published. At that point, there is no retry, no error/dead-letter, etc. The routing slip is considered "ended" at that point, and the distributed transaction which faulted (thus causing the compensation methods to be invoked) is over.
When using routing slips, it's important to observe the events produced by the routing slip runtime (activity completed/compensated/faulted, as well as overall routing slip completed/faulted/compensation failed) - typically using a saga.
I'd suggest looking at the Demo-Registration sample on my GitHub to get an idea of how to use sagas in combination with routing slips to perform reliable distributed transactions.
https://github.com/phatboyg/Demo-Registration
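To make the "observe the events" advice a bit more concrete, here is a minimal Python sketch of the bookkeeping such a saga would do. It is not MassTransit code: RoutingSlipTracker and the state names are made up for illustration, and the event names RoutingSlipCompleted and RoutingSlipFaulted are assumed from the event list above (only RoutingSlipCompensationFailed is named explicitly in this answer).

```python
from dataclasses import dataclass, field

# Overall routing slip events mapped to a terminal state. The state names
# are hypothetical; a real saga would define its own.
TERMINAL_STATES = {
    "RoutingSlipCompleted": "completed",
    "RoutingSlipFaulted": "faulted",
    "RoutingSlipCompensationFailed": "needs_manual_intervention",
}


@dataclass
class RoutingSlipTracker:
    """In-memory stand-in for a saga keyed by routing slip tracking number."""

    states: dict = field(default_factory=dict)

    def handle(self, event_type: str, tracking_number: str) -> str:
        state = TERMINAL_STATES.get(event_type)
        if state is None:
            # Activity-level events do not end the slip: keep the current
            # state, or mark the slip as executing if it is new.
            state = self.states.get(tracking_number, "executing")
        self.states[tracking_number] = state
        return state

    def needing_attention(self) -> list:
        """Tracking numbers whose compensation failed and need a human."""
        return sorted(
            number
            for number, state in self.states.items()
            if state == "needs_manual_intervention"
        )
```

Since nothing lands in an error queue when compensation fails, a record like this (persisted, in a real saga) is what you would query to find the transactions that need manual repair.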

The message will go to the poison queue if the retry policies could not get the message processed and there is no redelivery (second-level retry) configured. Poison queues are called "error queues" in MassTransit.
The poison queue has the same queue name as your receive endpoint queue, with _error suffix.
Dead-letter is something else: it is for messages that were received by the endpoint but that the endpoint doesn't know how to handle. Dead-letter queues are called "skipped message queues" in MassTransit and have the suffix _skipped.
Update: this is the generic MassTransit behaviour. Courier works differently, as Chris described in another answer. It wasn't clear to me that the question was about using routing slips.
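The naming convention is simple enough to write down as a tiny Python sketch. The helper name and the "order-service" queue are hypothetical; only the _error and _skipped suffixes come from the answer above.

```python
def companion_queues(endpoint_queue: str) -> dict:
    """Derive the error and skipped queue names for a receive endpoint."""
    return {
        "error": f"{endpoint_queue}_error",      # poison messages
        "skipped": f"{endpoint_queue}_skipped",  # messages with no handler
    }


# Hypothetical receive endpoint queue name.
queues = companion_queues("order-service")
```

With that endpoint name you would look for failed messages in order-service_error and for unhandled ones in order-service_skipped.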

Related

What is the best approach for dealing with RabbitMQ DLQ messages in Spring AMQP

I am using Spring AMQP to listen to a RabbitMQ queue. While processing a message, depending on business logic, my service can throw a RuntimeException, and in this case the message will be retried several times. After the maximum number of retries, the message will stay in the DLQ. I am wondering, what is the best approach to deal with these messages in the DLQ? I read on blogs that I can use a parking lot queue. But in that case too, how do I monitor the queue and notify people about dead-lettered messages?
P.S. Sorry for my English. Hope that I was able to explain my problem :)
You can use the RabbitMQ REST API (Hop client) to get the status of the DLQ.
You can also use Spring's RabbitAdmin.getQueueProperties("my.dlq") to get the message count.
https://docs.spring.io/spring-amqp/docs/current/reference/html/#broker-configuration
Other options include adding another listener on the DLQ and running it periodically to either send each message back to the original queue or send it to a parking lot queue if it fails too many times.
There's an example of that in the spring cloud stream documentation.
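The decision that DLQ listener has to make can be sketched in a few lines of Python. This is only the routing logic, not Spring code: the "x-retries" header, the limit of 3 and the destination names are assumptions for illustration, and the listener itself is responsible for keeping the counter up to date when it republishes.

```python
MAX_RETRIES = 3               # hypothetical limit before parking the message
RETRIES_HEADER = "x-retries"  # hypothetical custom header kept by the listener


def reroute(headers: dict) -> tuple:
    """Decide where a message taken from the DLQ should go next.

    Returns (destination, headers to publish with).
    """
    retries = int(headers.get(RETRIES_HEADER, 0))
    if retries < MAX_RETRIES:
        # Give it another go on the original queue and count the attempt.
        return "original-queue", {**headers, RETRIES_HEADER: retries + 1}
    # Too many failures: park it for a human to look at.
    return "parking-lot", dict(headers)
```

Monitoring then boils down to alerting on the message count of the parking lot queue, which should normally stay at zero.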

Is there a way to handle messages directly from the Rebus error queue

Currently I have an IErrorHandler implementation dealing with messages going to the Rebus error queue. That handler then publishes messages to a saga that throttles output to a Slack notification channel. I think there may be an easier way to do this though. I would like to have the saga implement an IHandleMessages against messages from the Rebus error queue itself. Is that possible? Currently, we have the FleetManager process enabled and for my custom IErrorHandler to work it has to dual publish errors both to the error queue and to FleetManager using the FleetManager API options. This allows my IErrorHandler to be called so I can publish a custom message to start the slack saga and also feeds FleetManager with the data it needs. The problem with my approach is that the Rebus error queue just grows with data I no longer care about. So I guess my question is: is there a way to handle those Rebus error queue messages? Or perhaps even better, is there a simple way to make those error queue messages go away once I know I have them in my saga?
Note: the reason for the saga and to not simply use a FleetManager Slack web hook is to notify based on custom count thresholds of errors, rather than for every error encountered.
I think I just realized one approach I could take, which is to still use my custom IErrorHandler, yet not actually handle the poison message so that it never makes it to the error queue regardless. Instead I would just publish my custom message that is handled by the saga.
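For what it's worth, the threshold logic the saga is described as doing could look roughly like this in Python. It is a sketch only, not Rebus code: the class name and the thresholds (1, 10, 100) are made up, and a real saga would persist the count rather than keep it in memory.

```python
class ErrorThresholdNotifier:
    """Signal a notification only when the error count hits a threshold."""

    def __init__(self, thresholds=(1, 10, 100)):
        self.thresholds = set(thresholds)
        self.count = 0

    def record_error(self):
        """Count one failed message.

        Returns the count if it just reached a threshold, otherwise None.
        """
        self.count += 1
        return self.count if self.count in self.thresholds else None
```

The custom IErrorHandler would call something like record_error() for every poison message and post to Slack only when it returns a number.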

Instruct RabbitMQ to resend undelivered messages periodically

Background
We're using langohr to interact with RabbitMQ. We've tried two different approaches to let RabbitMQ resend messages that have not yet been properly handled by our service. One way that works is to send a basic.nack with requeue set to true, but this will resend the message immediately until the service responds with a basic.ack. This is a bit problematic if the service, for example, tries to persist the message to a datastore that is currently down (and is down for a while). It would be better for us to just fetch the undelivered messages, say, every 20 seconds or so (i.e. we do neither a basic.ack nor a basic.nack if the datastore is down, we just let the messages be retained in the queue). We've tried to implement this using an ExecutorService, the gist of which is implemented like this:
(let [chan (lch/open conn)] ; We create a new channel since channels in Langohr are not thread-safe
  (log/info "Triggering \"recover\" for channel" chan)
  (try
    (lb/recover chan)
    (catch Exception e (log/error "Failed to call recover" e))
    (finally (lch/close chan))))
Unfortunately this doesn't seem to work (the messages are not redelivered and just remain in the queue). If we restart the service, the queued messages are consumed correctly. However, we have other services that are implemented using spring-rabbitmq (in Java) and they seem to take care of this out of the box. I've tried looking in the source code to figure out how they do it, but I haven't managed to do so yet.
Question
How do you instruct RabbitMQ to (re-)deliver messages in the queue periodically (preferably using Langohr)?
I am not sure what you are doing with your Spring AMQP apps, but there's nothing built into RabbitMQ for this.
However, it's pretty easy to set up dead-lettering using a TTL to requeue back to the original queue after some period of time. See this answer for examples, links etc.
EDIT
However, Spring AMQP does have a retry interceptor which can be configured to suspend the consumer thread for some period(s) during retry.
Stateful retry rejects and requeues; stateless retry handles the retries internally and has no interaction with the broker during retries.
See this answer which has instructions: we Nack the message, the nack puts the message into a holding queue for N seconds, then it TTLs out of that queue and into another queue that puts it back in the original queue.
It took a little bit of work to set up, but it works great!
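The TTL plus dead-lettering setup described above comes down to a couple of queue arguments. Here is a simplified Python sketch that just builds those arguments as plain dicts; it uses two queues rather than the three-step chain mentioned in the last answer, the ".retry" queue name is hypothetical, and you would pass each dict as the arguments map when declaring the queue with whatever client you use (Langohr included).

```python
def delayed_retry_topology(work_queue: str, delay_ms: int) -> dict:
    """Queue arguments for 'reject now, redeliver after delay_ms'.

    The empty string names the default exchange, which routes a message
    to the queue whose name equals the routing key.
    """
    holding_queue = f"{work_queue}.retry"  # hypothetical naming scheme
    return {
        # Rejecting (nack with requeue=false) on the work queue dead-letters
        # the message into the holding queue.
        work_queue: {
            "x-dead-letter-exchange": "",
            "x-dead-letter-routing-key": holding_queue,
        },
        # Nothing consumes from the holding queue; messages expire after
        # delay_ms and are dead-lettered back to the work queue.
        holding_queue: {
            "x-message-ttl": delay_ms,
            "x-dead-letter-exchange": "",
            "x-dead-letter-routing-key": work_queue,
        },
    }
```

For the 20 second interval in the question, delayed_retry_topology("my-queue", 20000) gives the arguments for both queues. Note that the consumer has to nack with requeue set to false for this to kick in; requeue set to true bypasses dead-lettering entirely.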

MassTransit Redirect Messages from the Error Queue

I'm going through a few examples using NServiceBus and I've stumbled across a feature I'm hoping ships with MassTransit (As it is a free service).
The feature is based around 'poisoned' messages.
Due to a bug in your system, these messages can't ever be handled and end up permanently in the error queue.
NServiceBus has a cool feature whereby, once you have corrected the bugs in your code, those messages in the error queue can be 'redirected' to the original working queue to be redelivered.
This is done by using a NServiceBus specific tool :- ReturnToSourceQueue.exe.
Does MassTransit have a similar tool for this kind of issue?
Or is there another workaround available, preferably one that works with RabbitMQ?
With RabbitMQ, it's easy to move messages between queues. You can use the management console to do it manually, by installing the shovel plug-in.
You can also create shovels in RabbitMQ that are scheduled, and perform the message movement in response to that schedule. The visibility of having the shovels configured in RabbitMQ has been invaluable to our operations staff, since they rarely think that a Windows Scheduled Task (or other random scheduler) is going to be doing something as risky as moving previously failed messages back into the production queues.
I would suggest reading this blog post on how MassTransit deals with poison messages: Error Handling in MassTransit with RabbitMQ
The tooling around RabbitMQ is so much better than anything MSMQ provides, which is one of the reasons we have completely abandoned MSMQ for production queuing.
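As a rough illustration of the shovel approach, this Python sketch builds the JSON body for a one-off dynamic shovel that drains an error queue back into its source queue. Treat it as a sketch under assumptions: the queue names are hypothetical (following the _error suffix convention), "amqp://" is assumed to point at the local broker, and the body would be sent with a PUT to the management API's shovel parameter endpoint for your vhost.

```python
import json


def shovel_definition(error_queue: str, target_queue: str, uri: str = "amqp://") -> dict:
    """Body for a dynamic shovel moving messages between two queues."""
    return {
        "value": {
            "src-protocol": "amqp091",
            "src-uri": uri,
            "src-queue": error_queue,
            "dest-protocol": "amqp091",
            "dest-uri": uri,
            "dest-queue": target_queue,
            # Move only the messages present when the shovel starts, then
            # let the shovel remove itself.
            "src-delete-after": "queue-length",
        }
    }


# Hypothetical MassTransit endpoint queue and its error queue.
body = json.dumps(shovel_definition("order-service_error", "order-service"), indent=2)
```

Using "queue-length" keeps the shovel from looping forever if the replayed messages fail again and land back in the error queue.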
This functionality is easily recreated with nothing more than RabbitMQ and a bit of code. While it's nice that NServicebus includes it, building it with MassTransit should be easy enough.
(note: i haven't used .NET in a few years, so my knowledge of NSB and MT is a bit rusty... this will be a high level answer only, no code)
The thing to start with, is a proper configuration of a dead letter exchange and a poison message queue. https://www.rabbitmq.com/dlx.html
Once you have knowledge that a message is causing errors and is a bad message, you can reject or nack (with no requeue) the message in order to send it through the dead letter exchange (DLX).
Once a message has gone through the DLX, you will have some additional properties on the message, including:
queue - the name of the queue the message was in before it was dead-lettered,
exchange - the exchange the message was published to (note that this will be a dead letter exchange if the message is dead lettered multiple times),
routing-keys - the routing keys (including CC keys but excluding BCC ones) the message was published with,
there will be more, but these are the things you want to pay attention to. by examining these properties on the message, you can re-send the original message back through the original exchange, with the original routing-keys. alternatively, you can re-send straight to the original destination queue... i think sending through the exchange would be better, personally, as the original queue might not exist anymore (depending on system configuration, consumers creating exclusive queues, etc).
with this information, recreating the feature set should not be too difficult. rabbitmq provides all of the features that you need, you just have to write a bit of code to take advantage of it.
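A small Python sketch of that lookup, working on the message headers as a plain dict. It assumes the dead-lettering history is under the "x-death" header with the most recent event first, so the original publish details are taken from the last entry; check that ordering against your broker version before relying on it. The function name is made up.

```python
def republish_target(headers: dict):
    """Work out where a dead-lettered message was originally published.

    Returns (exchange, routing_keys), or None if the message carries no
    dead-lettering history.
    """
    deaths = headers.get("x-death") or []
    if not deaths:
        return None
    # Assumption: entries are ordered most recent first, so the last one
    # describes the first time the message was dead-lettered.
    original = deaths[-1]
    return original["exchange"], list(original["routing-keys"])
```

The replay tool would then publish the untouched body to that exchange once per routing key, which is the "send it back through the original exchange" option described above.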

NServiceBus Retry Delay

What is the optimal way to configure/code NServiceBus to delay retrying messages?
In its default configuration retry happens almost immediately up to the number of attempts defined in the configuration file. I'd ideally like to retry again after an hour, etc.
Also, how does HandleCurrentMessageLater() work? What does the Later aspect refer to?
The NSB retries are there to remedy temporary problems like deadlocks etc. Longer retries are better handled by creating another process that monitors the error queue and puts messages back into the source queue at the interval you like. Take a look at the ReturnToSourceQueue.exe that comes with NSB for reference.
Edit: NServiceBus now supports this; we call it Second Level Retries. See http://docs.particular.net/ for more details.
Here is a blog post on why NServiceBus doesn't include a retry delay that I wrote after asking Udi this very same question in his distributed systems architecture course:
NServiceBus Retries: Why no back-off delay?
And here is a discussion thread covering some of the points involved in building an error queue monitor/retry endpoint:
http://tech.groups.yahoo.com/group/nservicebus/message/10964
As for HandleCurrentMessageLater(), all it does is put the current message back at the end of the queue. If there are no other messages waiting, it's going to be processed again immediately.
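A toy model of that behaviour in Python, using a deque as the queue. This only illustrates the ordering effect described above; it is not how NServiceBus is implemented, and handle_later is a made-up name.

```python
from collections import deque


def handle_later(queue: deque) -> None:
    """Move the message currently at the head to the back of the queue."""
    queue.append(queue.popleft())


busy = deque(["current", "other-1", "other-2"])
handle_later(busy)   # "current" now waits behind the other two messages

alone = deque(["current"])
handle_later(alone)  # nothing else is queued, so "current" is next again
```

So "Later" only means "after whatever else is already waiting", which on an empty queue is no delay at all.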
As of NServiceBus 3.2.1, they provide an out of the box solution to handle back off delays in the event of consecutive message failures. The previously existing retry mechanism still retries failures without a delay to handle cases like Database deadlocks, quickly self healing network issues, etc.
Once a message has been retried the configured number of times, the message is moved to a "Second Level Retry" queue. This queue, as configured below, will retry after a 10, 20, and 30 second delay, then the message will be moved to the configured error queue. You're free to change these values to something that better suits your environment.
You can also check out this link:
http://docs.particular.net/nservicebus/second-level-retries
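The 10, 20, 30 second pattern above is just a fixed increment multiplied by the retry number. Here is a short Python sketch of that schedule; the function and parameter names are made up for illustration and are not NServiceBus settings.

```python
def second_level_retry_delays(retries: int = 3, increase_seconds: int = 10) -> list:
    """Delay in seconds before each second-level retry."""
    return [increase_seconds * n for n in range(1, retries + 1)]


def next_step(failed_slr_attempts: int, retries: int = 3, increase_seconds: int = 10):
    """Delay before the next second-level retry, or "error-queue" once
    all second-level retries have been used up."""
    if failed_slr_attempts >= retries:
        return "error-queue"
    return increase_seconds * (failed_slr_attempts + 1)
```

To get closer to the hour-long wait asked about in the question, you would raise the increment rather than the number of retries, e.g. second_level_retry_delays(retries=3, increase_seconds=1200) yields 20, 40 and 60 minutes.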