TL;DR: I need to "replay" dead letter messages back into their original queues once I've fixed the consumer code that was originally causing the messages to be rejected.
I have configured the Dead Letter Exchange (DLX) for RabbitMQ and am successfully routing rejected messages to a dead letter queue. But now I want to look at the messages in the dead letter queue and try to decide what to do with each of them. Some (many?) of these messages should be replayed (requeued) to their original queues (available in the "x-death" headers) once the offending consumer code has been fixed. But how do I actually go about doing this? Should I write a one-off program that reads messages from the dead letter queue and allows me to specify a target queue to send them to?
And what about searching the dead letter queue? What if I know that a message (let's say one encoded in JSON) has a certain attribute that I want to search for and replay? For example, I fix a defect which I know will allow the message with PacketId: 1234 to process successfully now. I could also write a one-off program for this, I suppose.
I certainly can't be the first one to encounter these problems and I'm wondering if anyone else has already solved them. It seems like there should be some sort of Swiss Army Knife for this sort of thing. I did a pretty extensive search on Google and Stack Overflow but didn't really come up with much. The closest thing I could find were shovels but that doesn't really seem like the right tool for the job.
Should I write a one-off program that reads messages from the dead letter queue and allows me to specify a target queue to send them to?
generally speaking, yes.
you could set up a delayed re-try to resend the message back to the original queue, using the delayed message exchange plugin.
but this would only automate the retries on an interval, and you may not have fixed the problem before the retries happen.
in some circumstances this is ok - like when the error is caused by an external resource being temporarily unavailable.
in your case, though, i believe your thoughts on creating an app to handle the dead letters is the best way to go, for several reasons:
you need to search through the messages, which isn't possible in RMQ
this means you'll need a database to store the messages from the DLX/queue
because you're pulling the messages out of the DLX/queue, you'll need to ensure you get all the header info from the message so that you can re-publish to the correct queue when the time comes.
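a minimal sketch of that replay step, in Python. it assumes a pika-style channel (an object with `basic_publish(exchange, routing_key, body, properties)`) and JSON-encoded bodies. the helper names `original_queue`, `matches` and `replay` are made up for illustration. the only RabbitMQ behaviour relied on is that each "x-death" entry carries a `queue` field, with the most recent dead-lettering event listed first.

```python
import json


def original_queue(headers):
    """Return the queue the message was most recently dead-lettered from, or None."""
    deaths = (headers or {}).get("x-death") or []
    return deaths[0].get("queue") if deaths else None


def matches(body, **wanted):
    """True if the JSON body is an object with every wanted attribute/value."""
    try:
        doc = json.loads(body)
    except (ValueError, TypeError):
        return False  # not JSON: cannot match
    return isinstance(doc, dict) and all(doc.get(k) == v for k, v in wanted.items())


def replay(channel, headers, body, properties=None):
    """Re-publish to the original queue and return its name."""
    queue = original_queue(headers)
    if queue is None:
        raise ValueError("no x-death header; cannot determine original queue")
    # The default exchange ("") routes on the queue name.
    channel.basic_publish(exchange="", routing_key=queue, body=body, properties=properties)
    return queue
```

in a one-off tool you would read each dead letter, call `matches(body, PacketId=1234)` to decide, and `replay(...)` the ones you want. ack the dead letter only after the re-publish has succeeded.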
I certainly can't be the first one to encounter these problems and I'm wondering if anyone else has already solved them.
and you're not!
there are many solutions to this problem that all come down to the solution you've suggested.
some larger "service bus" implementations have this type of feature built in to them. i believe NServiceBus (or the SaaS version of it) has this built in, for example - though I'm not 100% sure of it.
if you want to look into this further, do a search for the term "poison message" - this is generally the term used for this situation. I've found a few things on google with a quick search that may help you down the path:
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2013-January/025019.html
https://web.archive.org/web/20170809194056/http://tafakari.co.ke/2014/07/rabbitmq-poison-messages/
https://web.archive.org/web/20170809170555/http://kjnilsson.github.io/blog/2014/01/30/spread-the-poison/
hope that helps!
Related
I am looking to replace an in-house key-value store and dispatch system and I keep hearing that RabbitMQ may be a solution.
I understand that it sends and receives messages using queues, and that these events are triggered by producers creating messages and consumers receiving them.
But what happens if a consumer is created after a message was sent? Can the consumer ask the queue what its last message was? If not, do I need to include some sort of database to store these messages? Or am I looking for some other technology?
A use case is that I want a GUI to get/set parameters that are used by other apps on a local network. On initialization, the GUI needs to know what the last values were.
In an attempt to answer my own question, it may be that RabbitMQ is not what I am looking for. I may want to instead use Kafka which stores its latest key:value pair in a table. Or I may want to use Redis. What do you think?
Thank you for your assistance.
I think I found a satisfactory answer to my question. I'm looking to create a request-reply model, which RabbitMQ is quite capable of handling. Upon opening the GUI, it sends a request to some other process for some variable, stored either in memory or in a database. That process responds with the requested data. Easy enough.
In my app (multiple instances), we occasionally see the connection between my app and RabbitMQ drop due to network issues (my app and RabbitMQ are both still alive). After the connection is re-established, we receive the messages that were unacked.
This creates an issue for us: my app wasn't dead and is still processing the message it received before, but now that message is redelivered, which causes the app to process it again (which can be fatal to us).
Since the app has multiple instances, it is not easy for an instance to check if another instance is processing the same message at the same time. We can't simply filter out redelivered message, because we need this feature to handle instance/app crashes/re-deployments.
It doesn't seem that there is an api to tell rabbitmq when to not redeliver unacked messages.
So what is the recommended practice to handle this situation ?
Thanks,
The general solution for such a scenario is to make the consumers handle messages in an idempotent manner. What I generally do is, on the producer side (in case there is no unique identifier in the message body), add an attribute idempotencyId to the message body, which is a GUID. On the consumer side, the id of each message is validated against the values stored in a database, and any duplicates are rejected.
This approach also works for messages that are shoveled from another cluster, and when multiple consumer instances are listening in the same cluster. In both cases it ensures each message is processed only once.
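A minimal sketch of the consumer-side check, in Python. It uses an in-memory SQLite table as a stand-in for the shared database (with multiple instances you would need a database that all of them can reach), and the function names are hypothetical. The unique key on the id is what makes the check safe when two instances receive the same message at once.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the store of already-seen idempotency ids."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS processed (idempotency_id TEXT PRIMARY KEY)")
    return db


def claim(db, idempotency_id):
    """Record the id; return True only for the first caller that records it."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO processed (idempotency_id) VALUES (?)",
                (idempotency_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # primary key clash: this id was seen before


def handle(db, message, process):
    """Process the message unless its id was already claimed."""
    if not claim(db, message["idempotencyId"]):
        return False  # duplicate: reject without processing
    process(message)
    return True
```

Note that this sketch claims the id before processing, so a crash in `process` leaves the message marked as seen. Whether to claim before or after the work depends on which failure is worse for you.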
I would suggest going over the RabbitMQ Reliability Guide.
Yeah, exactly-once delivery is not something RabbitMQ is good at. In fact, I'd say you should probably not be using it for these kinds of problems. Honestly, the only way to truly fix this is to use distributed transactions or locking.
Anyway, you could turn the problem on its head by ack'ing the message as soon as the consumer gets it, before it starts working on it. That would avoid the RabbitMQ-related duplication issue at least. This is at-most-once delivery.
Of course, it means that if the consumer crashes, the message is lost forever. So you need to persist the message right before you ack it so you can recover it later and also the consumer should remove it once it's complete.
Considering that crashes are rare, you can then have a single dedicated process that just works on those persisted messages. Or for that matter, handle them manually.
Just be aware that you are pushing the duplication problem in front of you, because the consumer might fail to remove the persisted message after it's done working with it anyway, but at least you have the option to implement it however you want.
Storage in this case could be anything from files, a RDBMS or something like ZooKeeper or Redis to lock/unlock in-flight messages.
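A rough sketch of that persist, ack, work, remove sequence, in Python. SQLite stands in for whatever storage you pick, the channel is assumed to be pika-style (an object with `basic_ack(delivery_tag=...)`), and the function names are made up for illustration.

```python
import sqlite3


def open_inflight(path=":memory:"):
    """Open the store that holds acked-but-unfinished messages."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS inflight (id TEXT PRIMARY KEY, body BLOB)")
    return db


def on_message(db, channel, delivery_tag, message_id, body, work):
    # 1. Persist first, so a crash after the ack cannot lose the message.
    with db:
        db.execute(
            "INSERT OR REPLACE INTO inflight (id, body) VALUES (?, ?)",
            (message_id, body),
        )
    # 2. Ack before working: the broker will not redeliver this message.
    channel.basic_ack(delivery_tag=delivery_tag)
    # 3. Do the work, then remove the persisted copy.
    work(body)
    with db:
        db.execute("DELETE FROM inflight WHERE id = ?", (message_id,))


def leftovers(db):
    """Messages whose consumer died between the ack and completion."""
    return db.execute("SELECT id, body FROM inflight ORDER BY id").fetchall()
```

The dedicated recovery process (or a human) would work through `leftovers(db)`. As said above, a message can still show up there after its work completed, if the consumer died right before the delete.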
In some exceptional situations I need somehow to tell consumer on receiving point that some messages shouldn’t be processed. Otherwise two systems will become out-of-sync (we deal with some outdated external systems, and if, for example, a connection is dropped we have to discard all queued operations in the scope of that connection).
Take a risk and resolve problem messages manually? Compensation actions (that could be tough to support in my case)? Anything else?
There are a few ways:
You can set a time-to-live when sending a message: await endpoint.Send(myMessage, c => c.TimeToLive = TimeSpan.FromHours(1));, but this will apply to all messages that are sent (or published) like this. I would consider this, after looking at your requirements. This is technical, but it is a proper messaging pattern.
Make TTL and generation timestamp properties of your message itself and let the consumer decide if the message is still worth processing. This is more business and, probably, the most correct way.
Combine tech and business - keep the timestamp and TTL in message headers so they don't pollute your message contracts, and filter them out using a custom middleware. In this case, you need to be careful to log such drops so you won't be left wondering why messages disappear now and then.
Almost any unreliable integration can be monitored using sagas, with timeouts. For example, we use a saga to integrate with Twilio. Since we have no ability to open a webhook for them, we poll after some interval to check the message status. You can start a saga when you get a message and schedule a message to check if the processing is still waiting. As discussed in comments, you can either use the "human intervention required" way to fix the issue or let the saga decide to drop the message.
A similar way could be to use a lookup table, where you put the list of messages that aren't relevant for processing. Such a table would be similar to the list of sagas. It seems that this way would also require scheduling. Both here, and for the saga, I'd recommend using a separate receive endpoint (a queue) for the DropIt message, with only one consumer. It would prevent DropIt messages from getting stuck behind the integration messages that are waiting to be processed (and some should be already dropped)
Use the RMQ management API to remove messages from the queue. This is the worst method; I wouldn't recommend it.
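For the option above that keeps the timestamp and TTL in message headers and filters in a middleware, here is a minimal, library-agnostic sketch in Python (it is not MassTransit code). The header names `x-generated-at` and `x-ttl-seconds` and the function names are assumptions made up for illustration.

```python
import logging
import time

log = logging.getLogger("stale-filter")


def is_stale(headers, now=None):
    """True if the message's own timestamp plus TTL lies in the past."""
    now = time.time() if now is None else now
    generated = headers.get("x-generated-at")  # hypothetical header: epoch seconds
    ttl = headers.get("x-ttl-seconds")         # hypothetical header: seconds
    if generated is None or ttl is None:
        return False  # no expiry information: process as usual
    return now > generated + ttl


def filtered(handler):
    """Wrap a handler so stale messages are logged and dropped, not processed."""
    def wrapper(headers, body, now=None):
        if is_stale(headers, now):
            # Log every drop so disappearing messages can be explained later.
            log.warning("dropping stale message: %r", headers)
            return False
        handler(headers, body)
        return True
    return wrapper
```

The message contract stays untouched; only the wrapper knows about the two headers.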
From what I understand, you're building a system that sends messages to 3rd party systems. In other words, systems you don't control. It has an API but compensating actions aren't always possible, because the API doesn't provide it or because actions are performed inside the 3rd party system that can't be compensated or rolled back?
If possible, try to solve this via sagas. Make sure the saga executes the different steps (the sending of messages) in the right order, so that messages that cannot be compensated are sent last. This way, messages that can be compensated will be compensated by the saga if they fail. The ones that cannot be compensated are sent only when you're as sure as possible that they won't have to be compensated, because that last message is the last step in synchronizing all systems.
All in all, this is one of the problems with distributed systems: keeping everything in sync. Compensating actions are the way to deal with this. If compensating actions aren't possible, you're in a very difficult situation. Try to see if the business can help by becoming more flexible and accepting that you need to compensate things, even where they tell you it's not possible.
In some exceptional situations I need somehow to tell consumer on receiving point that some messages shouldn’t be processed.
Can't you invert this into:
Tell the consumer that an earlier message can be processed.
This way you can easily turn this into a state machine (like a saga) that acts on two messages. If the 2nd message never arrives then you can discard the 1st after a while or do something else.
The strategy here is to halt/wait until certain that no actions need to be reverted.
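A minimal in-memory sketch of such a two-message state machine, in Python. The class and method names are hypothetical, and a real saga would persist its state rather than keep it in a dict.

```python
class PendingGate:
    """Hold each first message until its confirmation arrives or it times out."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.pending = {}  # correlation id -> (arrival time, message)

    def on_first(self, correlation_id, message, now):
        """1st message: store it, do not process it yet."""
        self.pending[correlation_id] = (now, message)

    def on_confirm(self, correlation_id):
        """2nd message: return the held message for processing, or None."""
        entry = self.pending.pop(correlation_id, None)
        return entry[1] if entry else None

    def expire(self, now):
        """Discard and return messages whose confirmation never arrived in time."""
        expired = [
            cid for cid, (arrived, _) in self.pending.items()
            if now - arrived >= self.timeout
        ]
        return [self.pending.pop(cid)[1] for cid in expired]
```

Calling `expire` on a schedule plays the role of the saga timeout: whatever it returns was never confirmed and can be dropped or handed to a human.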
I am considering using AMQP for an application where delivery order is paramount.
I cannot therefore use the normal re-delivery features, as undelivered messages are re-queued out of order.
It looks like what I must do is to leave the message on the queue until it has been processed, and then specifically delete it. It is then possible that the same message is processed twice in order, but that is easy to trap and deal with.
However, I don't see how to do this. What I am looking for is some sort of peek and delete message methods, giving me direct control, but they don't seem to exist.
Am I missing something, or trying to solve the problem in the wrong way?
You cannot have peek-and-delete in AMQP. Actually, you cannot browse the messages on a queue without consuming them and Rabbit does not provide any extension to enable this.
The general response to your problem is "Think very carefully if you actually need that in-order constraint", because, for instance, with that constraint in place, you cannot have multiple consumers on a queue.
I have been solving the same problem. In my solution I wrap the messages into one single message: the outer message is processed first, and then the inner messages are processed in the order in which they were wrapped. This has some disadvantages, for example big messages (once your wrapping hierarchy contains many messages), more difficult serialization, ..., but for me the solution was suitable enough.
RabbitMQ ticks all the boxes for the project I am planning, save one. I would have different workers listening on a queue and it is important that they process the newest messages (i.e., latest sequence number) first (LIFO).
My application is such that newer messages pretty much obsolete older messages. If you have workers to spare you could still process the older messages but it is important the newer ones are done first.
After trawling the various forums and such I can see only one solution: before processing a message, a client should first:
consume all messages
re-order them according to the sequence number
re-submit to the queue
consume the first message
Ugly and problematic if the client dies halfway. But maybe somebody here has a better solution.
My research is based (in part) on:
http://groups.google.com/group/rabbitmq-discuss/browse_thread/thread/e79e77d86bc7a3b8?fwc=1
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2010-July/007934.html
http://groups.google.com/group/rabbitmq-discuss/browse_thread/thread/e40d1069dcebe2cc
http://old.nabble.com/Priority-Queue-implementation-and-performance-td29946348.html
Note: the expected traffic of messages will roughly be in the range of 1 msg/hour for some queues and 100/minute for others. So nothing stellar.
Since there is no reply I guess I did my homework rather well ;)
Anyway, after discussing the requirements with the other stakeholders it was decided I can drop the LIFO requirement for now. We can worry about that when it comes to it.
A solution that we will probably end up adopting is for the worker to open a second queue that the master can use to let the worker know what jobs to ignore + provide additional control/monitoring information (which it looks like we will need anyway).
RabbitMQ implementing the AMQP 1.0 spec may also help here.
So I will mark this question as answered for now. Somebody else is still free to add or improve.
One possibility might be to use basic.get in a loop and wait for the response basic-ok.message-count to become zero (throwing away all other messages):
while (<get ok> = <call basic.get>) {
    if (<get ok>.message-count == 0) {
        // Now <get ok> is the most recent message on this queue
        break;
    } else if (<is get-empty>) {
        // Someone else got it
    }
}
Of course, you'd have to set up the message routing patterns on the broker such that one consumer throwing away messages doesn't mess with another. Try to avoid requeueing messages, as they will requeue at the top of the stack, making them look like the most recent.
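The same loop as a rough Python sketch, assuming a pika-style channel whose `basic_get(queue, auto_ack)` returns a `(method, properties, body)` tuple, with `method` set to `None` on get-empty and carrying `message_count` otherwise. The function name `latest_message` is made up for illustration.

```python
def latest_message(channel, queue):
    """Drain the queue with basic.get; return the last body seen, or None."""
    latest = None
    while True:
        # auto_ack=True: every fetched message is removed, i.e. older ones are thrown away.
        method, properties, body = channel.basic_get(queue=queue, auto_ack=True)
        if method is None:
            # get-empty: the queue is empty, or someone else got the rest.
            return latest
        latest = body
        if method.message_count == 0:
            # Nothing is queued behind this one, so it is the most recent message.
            return latest
```

Because of `auto_ack=True`, a crash mid-loop loses whatever was already fetched, which fits the premise that older messages are obsolete anyway.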