The "retry" system in NServiceBus is great. It works fantastically at making sure small things like deadlocks don't mess us up.
However, sometimes I KNOW that a message is bad. Bad in the sense that no amount of retries is going to help.
Is there a way to tell NServiceBus: "This message is a bad apple, move it to the error queue"? (And have it skip the retries?)
If you are using NSB 3, you can take a look at the IManageMessageFailures interface. This will allow you to plug in your functionality, but this is after the message has failed. If you would like to get at the message earlier, then take a look at the Message Mutators feature. This gets you in both at the transport layer and at the application layer.
Would calling Bus.DoNotContinueDispatchingCurrentMessageToHandlers(); inside the handler not be a simpler way of doing this?
In some exceptional situations I need to somehow tell the consumer at the receiving end that some messages shouldn't be processed. Otherwise the two systems will get out of sync (we deal with some outdated external systems, and if, for example, a connection is dropped, we have to discard all operations queued in the scope of that connection).
Take a risk and resolve problem messages manually? Compensation actions (that could be tough to support in my case)? Anything else?
There are a few ways:
You can set a time-to-live when sending a message: await endpoint.Send(myMessage, c => c.TimeToLive = TimeSpan.FromHours(1)); However, this will apply to all messages that are sent (or published) like this. I would consider this after looking at your requirements. It is a technical solution, but it is a proper messaging pattern.
Make TTL and generation timestamp properties of your message itself and let the consumer decide if the message is still worth processing. This is more business and, probably, the most correct way.
Combine tech and business - keep the timestamp and TTL in message headers so they don't pollute your message contracts, and filter expired messages out using a custom middleware. In this case, you need to be careful to log such drops so you won't be left wondering why messages disappear now and then.
Almost any unreliable integration can be monitored using sagas, with timeouts. For example, we use a saga to integrate with Twilio. Since we have no ability to open a webhook for them, we poll after some interval to check the message status. You can start a saga when you get a message and schedule a message to check if the processing is still waiting. As discussed in comments, you can either use the "human intervention required" way to fix the issue or let the saga decide to drop the message.
A similar way could be to use a lookup table, where you put the list of messages that aren't relevant for processing. Such a table would be similar to the list of sagas. It seems that this way would also require scheduling. Both here and for the saga, I'd recommend using a separate receive endpoint (a queue) for the DropIt message, with only one consumer. That would prevent DropIt messages from getting stuck behind the integration messages that are waiting to be processed (some of which should already have been dropped).
Use the RMQ management API to remove messages from the queue. This is the worst method; I wouldn't recommend it.
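To make the "timestamp and TTL in headers" option a bit more concrete, here is a rough, library-neutral sketch in Python. It is not tied to any particular bus API: the header names x-sent-at and x-ttl-seconds and the ttl_filter wrapper are made up for illustration, and a real middleware would hook into whatever pipeline your bus provides.

```python
import logging
from datetime import datetime, timedelta, timezone

log = logging.getLogger("ttl_filter")


def is_expired(headers, now):
    """Return True when the message has outlived the TTL carried in its headers."""
    # Header names are hypothetical; pick whatever convention suits your system.
    sent_at = datetime.fromisoformat(headers["x-sent-at"])
    ttl = timedelta(seconds=float(headers["x-ttl-seconds"]))
    return now - sent_at > ttl


def ttl_filter(handler):
    """Wrap a handler so expired messages are logged and dropped, not processed."""
    def wrapped(headers, body, now=None):
        now = now or datetime.now(timezone.utc)
        if is_expired(headers, now):
            # Log every drop so nobody has to wonder where messages went.
            log.warning("Dropping expired message %s", headers.get("message-id"))
            return False
        handler(body)
        return True
    return wrapped
```

The message contract itself stays untouched; only the wrapper knows about the headers, and every drop leaves a log line behind.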
From what I understand, you're building a system that sends messages to 3rd party systems. In other words, systems you don't control. It has an API but compensating actions aren't always possible, because the API doesn't provide it or because actions are performed inside the 3rd party system that can't be compensated or rolled back?
If possible, try to solve this via sagas. Make sure the saga executes the different steps (the sending of messages) in the right order, so that messages that cannot be compensated are sent last. This way, messages that can be compensated will be compensated by the saga if they fail. The ones that cannot be compensated should be sent last, when you're as sure as possible that they won't have to be compensated, because that last message is the last step in synchronizing all systems.
All in all, this is one of the problems with distributed systems: keeping everything in sync. Compensating actions are the way to deal with this. If compensating actions aren't possible, you're in a very difficult situation. Try to see if the business can help by becoming more flexible and accepting that you need to compensate things, even where they currently tell you it's not possible.
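As a rough illustration of that ordering rule (plain Python, not a real saga framework; run_saga and the step tuples are invented for this sketch), the steps that can be undone run first and are compensated in reverse if something later fails:

```python
def run_saga(steps):
    """Run steps in order; on failure, compensate the completed ones in reverse.

    Each step is a (name, action, compensate) tuple. compensate is None for a
    step that cannot be undone, and such steps must come last.
    """
    seen_irreversible = False
    for name, _, compensate in steps:
        if compensate is None:
            seen_irreversible = True
        elif seen_irreversible:
            raise ValueError(
                f"compensable step {name!r} is ordered after an irreversible one"
            )

    done = []  # (name, compensate) for completed steps that can be undone
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo in reverse order. Irreversible steps that already ran
            # cannot be undone here, which is why they belong at the very end.
            for _, undo in reversed(done):
                undo()
            return False
        if compensate is not None:
            done.append((name, compensate))
    return True
```

If the irreversible call to the third party is the last step and it fails, everything before it can still be rolled back.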
In some exceptional situations I need to somehow tell the consumer at the receiving end that some messages shouldn't be processed.
Can't you invert this into:
Tell the consumer that an earlier message can be processed.
This way you can easily turn this into a state machine (like a saga) that acts on two messages. If the 2nd message never arrives, you can discard the 1st after a while or do something else.
The strategy here is to halt/wait until certain that no actions need to be reverted.
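A minimal sketch of that "wait for the second message" idea, in plain Python with an in-memory dictionary standing in for saga storage. The class and method names are hypothetical, and a real saga would persist its state and use the bus's timeout mechanism rather than an explicit clock value.

```python
class PendingOperations:
    """Hold each operation until a confirmation arrives or its deadline passes."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.pending = {}  # correlation id -> (operation, received_at)

    def on_operation(self, correlation_id, operation, now):
        """First message: remember the operation but do not act on it yet."""
        self.pending[correlation_id] = (operation, now)

    def on_confirmation(self, correlation_id):
        """Second message: return the operation to process, or None if it is
        unknown or was already discarded."""
        entry = self.pending.pop(correlation_id, None)
        return entry[0] if entry else None

    def discard_expired(self, now):
        """Timeout: drop operations that were never confirmed; return their ids."""
        expired = [
            cid for cid, (_, received_at) in self.pending.items()
            if now - received_at > self.timeout
        ]
        for cid in expired:
            del self.pending[cid]
        return expired
```

Nothing is executed until the confirmation shows up, so there is nothing to revert when a connection drops; the unconfirmed operations simply time out.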
TL;DR: I need to "replay" dead letter messages back into their original queues once I've fixed the consumer code that was originally causing the messages to be rejected.
I have configured the Dead Letter Exchange (DLX) for RabbitMQ and am successfully routing rejected messages to a dead letter queue. But now I want to look at the messages in the dead letter queue and try to decide what to do with each of them. Some (many?) of these messages should be replayed (requeued) to their original queues (available in the "x-death" headers) once the offending consumer code has been fixed. But how do I actually go about doing this? Should I write a one-off program that reads messages from the dead letter queue and allows me to specify a target queue to send them to? And what about searching the dead letter queue? What if I know that a message (let's say which is encoded in JSON) has a certain attribute that I want to search for and replay? For example, I fix a defect which I know will allow message with PacketId: 1234 to successfully process now. I could also write a one-off program for this I suppose.
I certainly can't be the first one to encounter these problems and I'm wondering if anyone else has already solved them. It seems like there should be some sort of Swiss Army Knife for this sort of thing. I did a pretty extensive search on Google and Stack Overflow but didn't really come up with much. The closest thing I could find were shovels but that doesn't really seem like the right tool for the job.
Should I write a one-off program that reads messages from the dead letter queue and allows me to specify a target queue to send them to?
generally speaking, yes.
you could set up a delayed re-try to resend the message back to the original queue, using the delayed message exchange plugin.
but this would only automate the retries on an interval, and you may not have fixed the problem before the retries happen.
in some circumstances this is ok - like when the error is caused by an external resource being temporarily unavailable.
in your case, though, i believe your thoughts on creating an app to handle the dead letters is the best way to go, for several reasons:
you need to search through the messages, which isn't possible in RMQ
this means you'll need a database to store the messages from the DLX/queue
because you're pulling the messages out of the DLX/queue, you'll need to ensure you get all the header info from the message so that you can re-publish to the correct queue when the time comes.
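here's a rough sketch of the search-and-replay part in python, working on messages that have already been copied out of the dead letter queue into some store (a plain list here). the record shape with "headers" and "body" keys and the function names are assumptions for illustration. to my understanding, rabbitmq puts the most recent dead-lettering event first in the "x-death" header, and each entry carries the "queue" it happened in.

```python
import json


def original_queue(headers):
    """Return the queue the message was most recently dead-lettered from."""
    deaths = headers.get("x-death") or []
    return deaths[0]["queue"] if deaths else None


def select_for_replay(dead_letters, predicate):
    """Yield (target_queue, message) for stored dead letters matching predicate.

    dead_letters is an iterable of {"headers": ..., "body": <JSON string>}
    records; predicate receives the decoded JSON payload.
    """
    for msg in dead_letters:
        payload = json.loads(msg["body"])
        target = original_queue(msg["headers"])
        # Skip messages with no x-death info: there is nowhere to replay them.
        if target is not None and predicate(payload):
            yield target, msg
```

for the PacketId example, the predicate would be something like lambda p: p.get("PacketId") == 1234, and each hit would then be re-published to its target queue with whatever client you use.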
I certainly can't be the first one to encounter these problems and I'm wondering if anyone else has already solved them.
and you're not!
there are many solutions to this problem that all come down to the solution you've suggested.
some larger "service bus" implementations have this type of feature built in to them. i believe NServiceBus (or the SaaS version of it) has this built in, for example - though I'm not 100% sure of it.
if you want to look into this further, do some search for the term "poison message" - this is generally the term used for this situation. I've found a few things on google with a quick search, that may help you down the path:
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2013-January/025019.html
https://web.archive.org/web/20170809194056/http://tafakari.co.ke/2014/07/rabbitmq-poison-messages/
https://web.archive.org/web/20170809170555/http://kjnilsson.github.io/blog/2014/01/30/spread-the-poison/
hope that helps!
What I'm really trying to do is leave the message on the queue in the case where it is rejected by the current consumer. In RabbitMQ I could send a NACK to accomplish this. Is NACK supported in EasyNetQ? Is there another way to achieve the behavior I'm looking for?
Update: not a lot of responses, so I'm wondering how people are generally handling the lack of NACK in EasyNetQ. Not having the equivalent of basic.reject limits consumers to "I can always process every message" scenarios. I suppose consumers could throw a specific "rejected" exception to cause EasyNetQ to dequeue the message to the error queue, and I could requeue messages with those errors. Anyone else have other workarounds in place?
I used EasyNetQ for almost a year, but no matter how we tweaked it (amongst other things added our own implementation of IConsumerErrorStrategy) I never really got it to work the way I wanted. The fact that it is single threaded gave us some unexpected behaviour (sometimes deadlocks) when performing RequestAsync while in a SubscribeAsync handler.
The solution for us was to move from EasyNetQ. After working with the official RabbitMQ client for a while, I spent a few days writing a super thin client on top of that. It is influenced by EasyNetQ and supports most of the concepts that EasyNetQ has. However, I added some neat features like pluggable message contexts. I think that the Nack feature of IAdvancedMessageContext that I just added can be something for you:
    var client = service.GetService<IBusClient<AdvancedMessageContext>>();
    client.RespondAsync<BasicRequest, BasicResponse>((req, ctx) =>
    {
        ctx?.Nack(); // the context implements IAdvancedMessageContext.
        return Task.FromResult<BasicResponse>(null);
    }, cfg => cfg.WithNoAck(false));
If you're interested you can read more about it at the GitHub page (especially the NackTests.cs).
I think you can change the behavior by implementing your own IConsumerErrorStrategy:
https://github.com/EasyNetQ/EasyNetQ/blob/master/Source/EasyNetQ/Consumer/DefaultConsumerErrorStrategy.cs
But if you need that kind of control you might consider just using the RabbitMQ client directly?
It sounds like you are trying to handle failures. You can NACK a message, but that means it sits at the head of the queue. Great, but then it means that you could end up with a bunch of messages that are truthfully unable to be processed, and you will be unable to actually process real messages.
The solution that I have always used with RabbitMQ is to utilize the default error handling of EasyNetQ, and have a separate application to resend messages. That is, when an exception is thrown in your consumer, EasyNetQ routes the message to a queue called "EasyNetQ_Default_Error_Queue". You are able to override this name and have different queues go to different error queues, but for now let's stick with the default. You can then have a Windows Service/Azure Worker role reading these messages, and working out what to do. That may include having a "RetryCount" on your message envelope/wrapper to make sure that it only loops around so many times. All in all, it's going to be a bit of work.
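The "RetryCount" idea can be sketched roughly like this. It is plain Python rather than EasyNetQ code: the envelope is just a dictionary, and MAX_RETRIES and the republish/park callbacks are hypothetical stand-ins for "send back to the original queue" and "set aside for a human to look at".

```python
MAX_RETRIES = 3  # hypothetical limit; tune for your system


def handle_error_message(envelope, republish, park):
    """Decide what to do with one message read from the error queue.

    republish(envelope) sends it back for another attempt;
    park(envelope) sets it aside for manual inspection.
    """
    retries = envelope.get("RetryCount", 0)
    if retries >= MAX_RETRIES:
        # Looped around often enough: stop retrying.
        park(envelope)
        return "parked"
    # Re-send a copy with the counter bumped so the loop is bounded.
    republish({**envelope, "RetryCount": retries + 1})
    return "retried"
```

Because the counter travels with the envelope, the resender itself can stay stateless.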
What you are finding, is what many people run into when using RabbitMQ/EasyNetQ. She's pretty raw.
In WCF, when I call a method that is one-way, I don't need an answer right away.
Later, though, I do need to get an answer for sure.
But how can I be sure that the service got the message (so it can deal with it later)?
What about the 202 response?
http://thejoyofcode.com/One_Way_operations_in_services.aspx
I think the article that you linked to does a nice job explaining it:
a one-way service call doesn't wait for the call to be processed, only to be delivered - where delivery includes deserialization of the request.
If you don’t get an exception then the message was successfully acknowledged as received.
IsOneWay introduces asynchronous aspects to your API. If you choose to go that route and you want to know what happened after the message was received, you’ll have to build that mechanism yourself. At a high level there’s nothing WCF specific about the solution. Either:
Call the service back and ask what the result was –OR–
Have the service call you back when it's done
It's my understanding we have essentially 2 kinds of exceptions when using NServiceBus.
Environmental : Meaning any required component is not currently available. Usually resulting in a full rollback of the transaction. This is the description I see behind the rollback within NServiceBus Documentation (Including putting the message back on the bus - which sounds fantastic). How do I do this?
Validation : A message is being processed that cannot succeed because of business logic, rules, etc. Here I want to roll back all database interaction, but there's no value in keeping the command in the queue, so I just want to roll back the NHibernate section of the transaction - not the MSMQ portion. How do I do this? Typically I would perform validation before any single message is processed, but when you have multiple messages bound together into a single transaction and you want to roll them all back, this isn't possible via pre-validation.
My assumption is either the answer is insanely obvious and I've overlooked it or what I'm trying to do isn't possible (in regards to the Validation exception).
NSB takes care of getting the message out of the way by moving it to an error queue (v2.5). In v3 this functionality is enhanced and will give you more options to handle faults (DB, custom, etc.). The error queue is configured in your app.config.
In my experience, it's easiest (and probably also more appropriate) to ensure that messages have a very high probability that they can succeed when they participate in a distributed transaction.
Therefore, most validation logic should already have been carried out when you dispatch the command message, and rollback is reserved for the truly exceptional case.
If your client cannot perform the validation, maybe you should insert a validation service in front of your current service. This validation service could route invalid command messages somewhere else before they reach the real service.
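A validation service in front of the real one could look roughly like this. It is only a sketch in plain Python: lists stand in for the two queues, and the route function and the validator convention (each validator returns a list of error strings) are made up for illustration.

```python
def route(command, validators, valid_queue, invalid_queue):
    """Send the command on to the real service only if every validator passes.

    Each validator takes the command and returns a list of error strings
    (an empty list means the check passed).
    """
    errors = [msg for check in validators for msg in check(command)]
    if errors:
        # Invalid commands go elsewhere and never reach the transactional
        # handler, so there is nothing to roll back for them.
        invalid_queue.append({"command": command, "errors": errors})
    else:
        valid_queue.append(command)
    return not errors
```

The errors collected here are also a natural place to build the response that tells the sender what was wrong with their message.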
Thank you for your answers. I believe the answer lies somewhere between the two.
We are unfortunately unable to implement a validation service but we've simply added better upfront validation to the message processing logic.
Unfortunately until we get to v3 we are currently unable to use the Error Queue as we are utilizing the message response functionality to alert integrators of issues with their messages. And throwing an unhandled error prevents any responses from being generated.