Regarding message order guarantees in RabbitMQ/AMQP - rabbitmq

One of the main characteristics of a message queue service, RabbitMQ included, is preserving message publication order. This is confirmed in the RabbitMQ documentation:
[QUOTE 1] Section 4.7 of the AMQP 0-9-1 core specification explains the
conditions under which ordering is guaranteed: messages published in
one channel, passing through one exchange and one queue and one
outgoing channel will be received in the same order that they were
sent. RabbitMQ offers stronger guarantees since release 2.7.0.
Let's assume in the following that there are no consumers active, to simplify things. We are publishing over one single channel.
So far, so good.
RabbitMQ also provides a way to inform the publisher that a certain publication has been completely and correctly processed [*]. This is explained here. Basically, the broker will send either a basic.ack or a basic.nack message. The documentation also says this:
[QUOTE 2] basic.ack for a persistent message routed to a durable queue will be
sent after persisting the message to disk.
In most cases, RabbitMQ will acknowledge messages to publishers in the
same order they were published (this applies for messages published on
a single channel). However, publisher acknowledgements are emitted
asynchronously and can confirm a single message or a group of
messages. The exact moment when a confirm is emitted depends on the
delivery mode of a message (persistent vs. transient) and the
properties of the queue(s) the message was routed to (see above).
Which is to say that different messages can be considered ready for
acknowledgement at different times. This means that acknowledgements
can arrive in a different order compared to their respective messages.
Applications should not depend on the order of acknowledgements when
possible.
At first glance, this makes sense: persisting a message takes much more time than just storing it in memory, so it's perfectly possible that the acknowledgement of a later transient message will arrive before the acknowledgement of an earlier persistent message.
But, if we re-read the first quote regarding message order [QUOTE 1] here above, it gets confusing. I'll explain. Assume we are sending two messages to the same exchange: first a persistent and then a transient message. Since RabbitMQ claims to preserve message order, how can it send an acknowledgment of the second/transient message before it knows that the first/persistent message is indeed completely written to disk?
In other words, does the remark regarding illogical acknowledgement order [QUOTE 2] here above only apply in case the two messages are each routed to completely different target queue(s) (which might happen if they have different routing keys, for example)? In that case, the ordering guarantee of [QUOTE 1] would not apply anyway, so there would be nothing to contradict.
[*] In most cases, this means 'queued'. If no routing rule applies, the message cannot be enqueued in any target queue, but this still counts as a positive outcome for publication confirmation.
update
I read this answer on a similar question. This basically says that there are no guarantees whatsoever. Even the most naive implementation, where we delay the publication of message 2 to the point after we got an acknowledgment of message 1, might not result in the desired message order. Basically, [QUOTE 1] is not met.
Is this correct?

From this response on rabbitmq-users:
RabbitMQ knows message position in a queue regardless of whether it is transient or not.
My guess (I did not write that part of the docs) is that the ack ordering section primarily tries to communicate that if two messages are routed to two different queues, those queues will handle/replicate/persist them concurrently. Reasoning about ordering in more than one queue is pretty hard. A message can go into more than one queue as well.
Nonetheless, RabbitMQ queues know what position a message has in what queues. Once all routing/delivery acknowledgements are received by a channel that handled the publish, it is added to the list of acknowledgements to send out. Note that that list may or may not be ordered the same way as the original publishes, and worrying about that is not practical for many reasons, most importantly: the user typically primarily cares about the ordering in the queues.
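On the publisher side, one way to avoid depending on acknowledgement order is to track outstanding publishes by sequence number and remove them as confirms arrive, in whatever order that happens. The following is only a minimal sketch in plain Java with no broker involved. The class and method names are hypothetical, and the `multiple` flag is assumed to mean "this confirm covers every sequence number up to and including this one", as it does for basic.ack.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical helper: remembers what has been published but not yet confirmed.
class ConfirmTracker {
    // Outstanding publishes, keyed by publish sequence number (sorted).
    private final ConcurrentNavigableMap<Long, String> outstanding =
            new ConcurrentSkipListMap<>();

    // Call right before publishing a message with the given sequence number.
    void published(long seqNo, String body) {
        outstanding.put(seqNo, body);
    }

    // Call when a confirm arrives; confirms may arrive in any order.
    void confirmed(long seqNo, boolean multiple) {
        if (multiple) {
            // Everything up to and including seqNo is confirmed at once.
            outstanding.headMap(seqNo, true).clear();
        } else {
            outstanding.remove(seqNo);
        }
    }

    boolean isOutstanding(long seqNo) {
        return outstanding.containsKey(seqNo);
    }

    int outstandingCount() {
        return outstanding.size();
    }
}
```

With this bookkeeping, a confirm for message 2 arriving before the confirm for message 1 simply leaves message 1 marked as outstanding; nothing in the publisher needs to assume an order.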
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.

Related

RabbitMQ redeliver message to the same consumer that rejected it

I have a queue with messages in it, and two consumers in separate processes. One consumer takes a message, decides that the message is not meant for it, and rejects it with the requeue flag. In the documentation I found the following phrase: "The server MUST NOT deliver the message to the same client within the context of the current channel". Does that mean that the rejected message should be delivered to another consumer or not?
So, there are a couple of things going on here that I'd like to touch on.
First, your question as to the behavior of RabbitMQ. The rule referenced above comes from the AMQP 0-9-1 specification. As with most implementations of open specs, RabbitMQ is not fully conforming. This page describes in precise detail exactly which portions of the specification are implemented, and where any deviations occur.
On that page, it stipulates that "No attempt is made to prevent redelivery to the same client." RabbitMQ lists this as a planned addition in a future release, but it has been planned for quite a few years now.
Should Consumers Be Picky?
The more important question is the one you haven't directly asked: "should my consumer be picky about which messages from the queue it processes?"
The answer to this is a definitive "no." One of the key design assumptions about message queues is that any consumer subscribed to the queue should be able to process any message in the queue. Thus, it should be considered proper design that all consumers attached to the queue are running identical code (same code base, same version). If not, you're going to have some serious problems with your application sooner or later.
Reject should only be used to tell the broker that there is a problem with a particular message. If there is a problem with a particular consumer (e.g. loses connection to a database), it should not reject the message, but instead should close the connection, triggering redelivery to another, working consumer. By design, messages that need to be processed by a specialized or different consumer should be deposited in a different queue.

How to achieve round-robin topic exchange in RabbitMQ

I know that achieving round-robin behaviour in a topic exchange can be tricky or impossible, so my question is really whether there is anything I can do with RabbitMQ, or whether I should look to other message queues that support this.
Here's a detailed explanation of my application requirements:
There will be one producer, let's call it P
There (potentially) will be thousands of consumers, let's call them Cn
Each consumer can "subscribe" to 1 or more topic exchange and multiple consumers can be subscribed to the same topic
Every message published into the topic should be consumed by only ONE consumer
Use case #1
Assume:
Topics
foo.bar
foo.baz
Consumers
Consumer C1 is subscribed to topic #
Consumer C2 is subscribed to topic foo.*
Consumer C3 is subscribed to topic *.bar
Producer P publishes the following messages:
publish foo.qux: C1 and C2 can potentially consume this message but only one receives it
publish foo.bar: C1, C2 and C3 can potentially consume this message but only one receives it
Note
Unfortunately I can't have a separate queue for each "topic" therefore using the Direct Exchange doesn't work since the number of topic combinations can be huge (tens of thousands)
From what I've read, there is no out-of-the-box solution with RabbitMQ. Does anybody know a workaround, or whether another message queue solution would support this, e.g. Kafka, Kinesis, etc.?
Thank you
There appears to be a conflation of the role of the exchange, which is to route messages, and the queue, which is to provide a holding place for messages waiting to be processed. Funneling messages into one or more queues is the job of the exchange, while funneling messages from the queue into multiple consumers is the job of the queue. Round robin only comes into play for the latter.
Fundamentally, a topic exchange operates by duplicating messages, one for each queue matching the topic published with the message. Therefore, any expectation of round-robin behavior would be a mistake, as it goes against the very definition of the topic exchange.
All this does is to establish that, by definition, the scenario presented in the question does not make sense. That does not mean the desired behavior is impossible, but the terms and topology may need some clarifying adjustments.
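To make the duplication concrete, here is a small sketch of topic matching applied to the bindings from the question. It is a simplified matcher written for illustration, not RabbitMQ's implementation: `*` is taken to match exactly one word and `#` zero or more words, and the queue names q1, q2 and q3 (one queue per consumer) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of how a topic exchange picks target queues.
class TopicRouting {
    // True if the dot-separated routing key matches the binding pattern.
    static boolean matches(String pattern, String routingKey) {
        return matches(pattern.split("\\."), 0, routingKey.split("\\."), 0);
    }

    private static boolean matches(String[] p, int i, String[] k, int j) {
        if (i == p.length) {
            return j == k.length; // pattern used up: key must be used up too
        }
        if (p[i].equals("#")) {
            // '#' may absorb zero or more words; try every possibility.
            for (int next = j; next <= k.length; next++) {
                if (matches(p, i + 1, k, next)) {
                    return true;
                }
            }
            return false;
        }
        if (j == k.length) {
            return false; // pattern still expects a word, key has none left
        }
        if (p[i].equals("*") || p[i].equals(k[j])) {
            return matches(p, i + 1, k, j + 1);
        }
        return false;
    }

    // bindings maps queue name -> binding pattern; every matching queue gets a copy.
    static List<String> route(Map<String, String> bindings, String routingKey) {
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, String> binding : bindings.entrySet()) {
            if (matches(binding.getValue(), routingKey)) {
                targets.add(binding.getKey());
            }
        }
        return targets;
    }
}
```

With bindings q1 -> #, q2 -> foo.* and q3 -> *.bar, routing foo.qux yields q1 and q2, and routing foo.bar yields all three queues. Each of those queues receives its own copy, which is why "only one consumer receives it" does not follow from the exchange alone.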
Let's take a step back and look at the described lifetime for one message: It is produced by exactly one producer and consumed by one of many consumers. Ordinarily, that is the scenario addressed by a direct exchange. The complicating factor in this is that your consumers are selective about what types of messages they will consume (or, to put it another way, your producer is not consistent about what types of messages it produces).
Ordinarily in message-oriented processing, a single message type corresponds to a single consumer type. Therefore, each different type of message would get its own corresponding queue. However, based on the description given in this question, a single message type might correspond to multiple different consumer types. One issue I have is the following statement:
Unfortunately I can't have a separate queue for each "topic"
On its face, that statement makes no sense, because what it really says is that you have arbitrarily many (in fact, an unknown number of) message types; if that were the case, then how would you be able to write code to process them?
So, ignoring that statement for a bit, we are led to two possibilities with RabbitMQ out of the box:
Use a direct exchange and publish your messages using the type of message as a routing key. Then, have your various consumers subscribe to only the message types that they can process. This is the most common message processing pattern.
Use a topic exchange, as you have, and come up with some sort of external de-duplication logic (perhaps memcached), where messages are checked against it and discarded if another consumer has already started to process them.
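The de-duplication idea in the second option boils down to an atomic "claim" per message ID: the first consumer to claim a message processes it, and everyone else discards their copy. Below is a minimal sketch that uses an in-process set as a stand-in for the shared store. The class name and the message IDs are hypothetical, and a real setup with several consumer processes would need the same add-if-absent operation in a store they all share, plus some expiry of old entries.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an external store such as memcached.
class ClaimRegistry {
    private final Set<String> claimed = ConcurrentHashMap.newKeySet();

    // Returns true only for the first caller that claims this message ID.
    boolean tryClaim(String messageId) {
        return claimed.add(messageId);
    }
}
```

A consumer would call tryClaim with the message ID before doing any work, and acknowledge and drop the message if the call returns false.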
Now, neither of these deals explicitly with the round-robin requirement. Since it was not explained why or how this was important, it is assumed that it can be ignored. If not, further definition of the problem space is required.

rabbitmq: can consumer persist message change before nack?

Before a consumer nacks a message, is there any way the consumer can modify the message's state so that when the consumer consumes it upon redelivery, it sees that changed state? I'd rather not reject + reenqueue new message, but please let me know if that's the only way to accomplish this.
My goal is to determine how many times specific messages are being redelivered. I see two ways of doing this:
(1) On the message itself as described above. The message would be a container of basic stats and the application payload message.
(2) In some external storage. We would uniquely identify the message by the message id that we set.
I know 2 is possible, but my question is if 1 is possible.
There is no way to do (1) like you want. You would need to change the message, and the message would thus become another message. If you want to do something like that (and it's possible that this is what you meant by "I'd rather not reject + reenqueue new message"), you should ACK the message, increment a field in it and publish it again (again, maybe this is what you meant when you said reenqueue it). So your message payload would have some ID, a counter, and the (obviously different) payload that is the actual content.
A definitely much better way is (2), for multiple reasons:
it does not interfere with business logic; that is, this diagnostic part is isolated
you are leaving re-queueing to RabbitMQ (as you are supposed to do), meaning that you are not worrying about losing messages or handling message meta info that has no use for your business logic
ACKing and NACKing are actually supposed to be used this way - that's why they are in the AMQP specification
since you do need the number of times specific messages have been redelivered, you keep it somewhere externally, meaning that it's independent of (RabbitMQ's) message persistence, lifetime, potentially queue durability, mirroring etc.
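As a rough sketch of option (2), the counter can be as simple as a map from message ID to the number of deliveries seen. The version below keeps the counts in memory purely for illustration; the class name is hypothetical, and it assumes every message carries a unique message ID set by the publisher. For counts that survive restarts or are shared between consumers, the same logic would sit on top of an external store.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical external bookkeeping of deliveries per message ID.
class RedeliveryCounter {
    private final ConcurrentMap<String, Integer> deliveries = new ConcurrentHashMap<>();

    // Records one delivery and returns how often this ID has been delivered so far.
    int recordDelivery(String messageId) {
        return deliveries.merge(messageId, 1, Integer::sum);
    }

    // Deliveries beyond the first one count as redeliveries.
    int redeliveries(String messageId) {
        return Math.max(0, deliveries.getOrDefault(messageId, 0) - 1);
    }
}
```

The consumer would call recordDelivery at the top of its delivery handler and then ACK or NACK as usual, so the message itself never has to change.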
Even though this question was marked as solved some time ago, I want to mention that there is a way, at least for counting redeliveries. It may have been added after the original answer was written. There is a different type of queue in RabbitMQ called Quorum queues.
Quorum queues offer the option to set redelivery limit:
Quorum queues support poison message handling via a redelivery limit. This feature is currently unique to Quorum queues.
To achieve this, RabbitMQ counts the number of deliveries in a message header. The header attribute is called: x-delivery-count
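On the consumer side, reading that header could look roughly like the sketch below. It only assumes that the message headers are available as a Map<String, Object> (which is how the Java client exposes them via the message properties) and that the value is numeric. The helper name is hypothetical, and a missing header is treated as zero.

```java
import java.util.Map;

// Hypothetical helper for reading the delivery count from message headers.
class DeliveryCount {
    static long fromHeaders(Map<String, Object> headers) {
        if (headers == null) {
            return 0; // message was published without any headers
        }
        Object value = headers.get("x-delivery-count");
        // Be tolerant about the concrete numeric type (Integer, Long, ...).
        return (value instanceof Number) ? ((Number) value).longValue() : 0;
    }
}
```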

Dead lettering messages on an expired queue bound with a consistent hash exchange

I have a situation where I am processing events that are related to specific sources. Each source has a key or ID, which I can use as the hash. Events from each source have to be processed in order, but events from different sources can be parallelized, to achieve horizontal scalability. There will be hundreds of source keys.
I am planning to set the key as part of the routing key when submitting messages to RabbitMQ, and then use the consistent-hash-exchange so that events from the same source are routed to the same queue. I was then thinking of dynamically binding private queues from consumers, with a TTL (so that they are gracefully removed if a consumer is down). At the beginning I will just have 2 or 3 consumers for redundancy, but if I want to scale up due to an increased number of messages, I can just start another consumer.
My question is what happens if a consumer is down and there are messages in its queue? Ideally I would want the messages in the queue to be rerouted back to the exchange, with the consistent-hash-exchange routing them to a different queue (since the original queue would be no longer there).
The RabbitMQ documentation about dead lettering doesn't explicitly mention the scenario of TTL on consumer queues, or what happens when the queue gets deleted.
Does my approach make sense? How can I achieve the consumer fault-tolerance I am looking for while retaining the ordering by a specific routing key?
Note: I know there is even a more subtle race condition if during the process of routing dead lettered messages to the exchange new messages come that were originally routed to the expired queue, which will now be routed to a different consumer, thus ordering will be broken at that specific instance.
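The routing idea and the race described in the note can be illustrated with a generic hash ring. This is only a sketch of consistent hashing in general, not the actual implementation of the consistent-hash exchange: the class name, the queue names, the number of ring points per queue and the use of CRC32 as the hash are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical hash ring: maps a source key to one of the bound queues.
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int pointsPerQueue;

    HashRing(int pointsPerQueue) {
        this.pointsPerQueue = pointsPerQueue;
    }

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Places several points for the queue on the ring.
    void addQueue(String queue) {
        for (int i = 0; i < pointsPerQueue; i++) {
            ring.put(hash(queue + "#" + i), queue);
        }
    }

    // Removes every point that belongs to the queue (e.g. the queue expired).
    void removeQueue(String queue) {
        ring.values().removeIf(queue::equals);
    }

    // The first ring point at or after the key's hash owns the key (wrapping around).
    String route(String sourceKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no queues bound");
        }
        Map.Entry<Long, String> owner = ring.ceilingEntry(hash(sourceKey));
        if (owner == null) {
            owner = ring.firstEntry();
        }
        return owner.getValue();
    }
}
```

As long as the set of queues is stable, a given source key always lands in the same queue, which is what keeps per-source ordering. When a queue disappears, only the keys it owned move to another queue, and from that moment new events for those sources go to a different consumer than the events still waiting in (or dead-lettered from) the old queue.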
There is more than one question to answer here; I'll try to go in the same order.
My question is what happens if a consumer is down and there are messages in its queue?
Outside of the context (rest of the question) - messages stay in the queue until they are ACKed or their TTL expires.
The RabbitMQ documentation about dead lettering doesn't explicitly mention the scenario of TTL on consumer queues, or what happens when the queue gets deleted.
It does say ...The TTL for the message expires..., so basically if the message is not ACKed within the given TTL, it goes to the DLX. For the queue TTL, check this link - it's basically an "expiry time" for the queue. Additionally, if the queue gets deleted, the messages are gone (when not taking into account any mirroring, of course).
Now for the "does it make sense" part. For the messages from different sources, I think it's clear - process as much as you can in parallel and that's it. There are (usually) no collisions there.
How can I achieve the consumer fault-tolerance I am looking for while retaining the ordering by a specific routing key?
For sequential processing, you basically need exactly one consumer handling each source. To monitor this consumer, maybe add a watchdog that starts it again if it crashes, or restarts it if it hangs, etc. Maybe it would also make sense to use the AMQP get method instead of consume. I can't really recommend or advise against this approach, because (for me at least) it's quite use-case specific (performance, how often there is a new message, etc.), but I would say that it makes it easier to achieve a "more synchronous" behavior.
And for sure (now referring to what you wrote in the note) you should try and avoid DLX-ing messages (higher TTL etc) if you really want to keep the original order of the sequence (said it redundantly on purpose :) )

How to solve message disorder in RabbitMq? [duplicate]

I need to choose a new Queue broker for my new project.
This time I need a scalable queue that supports pub/sub, and keeping message ordering is a must.
I read Alexis' comment, in which he writes:
"Indeed, we think RabbitMQ provides stronger ordering than Kafka"
I read the message ordering section in rabbitmq docs:
"Messages can be returned to the queue using AMQP methods that feature
a requeue parameter (basic.recover, basic.reject and basic.nack), or due
to a channel closing while holding unacknowledged messages... With release
2.7.0 and later it is still possible for individual consumers to observe
messages out of order if the queue has multiple subscribers. This is due to
the actions of other subscribers who may requeue messages. From the
perspective of the queue the messages are always held in the publication
order."
If I need to handle messages in order, can I only use RabbitMQ with an exclusive queue for each consumer?
Is RabbitMQ still considered a good solution for ordered message queuing?
Well, let's take a closer look at the scenario you are describing above. I think it's important to paste the documentation immediately prior to the snippet in your question to provide context:
Section 4.7 of the AMQP 0-9-1 core specification explains the
conditions under which ordering is guaranteed: messages published in
one channel, passing through one exchange and one queue and one
outgoing channel will be received in the same order that they were
sent. RabbitMQ offers stronger guarantees since release 2.7.0.
Messages can be returned to the queue using AMQP methods that feature
a requeue parameter (basic.recover, basic.reject and basic.nack), or
due to a channel closing while holding unacknowledged messages. Any of
these scenarios caused messages to be requeued at the back of the
queue for RabbitMQ releases earlier than 2.7.0. From RabbitMQ release
2.7.0, messages are always held in the queue in publication order, even in the presence of requeueing or channel closure. (emphasis added)
So, it is clear that RabbitMQ, from 2.7.0 onward, is making a rather drastic improvement over the original AMQP specification with regard to message ordering.
With multiple (parallel) consumers, order of processing cannot be guaranteed.
The third paragraph (pasted in the question) goes on to give a disclaimer, which I will paraphrase: "if you have multiple processors in the queue, there is no longer a guarantee that messages will be processed in order." All they are saying here is that RabbitMQ cannot defy the laws of mathematics.
Consider a line of customers at a bank. This particular bank prides itself on helping customers in the order they came into the bank. Customers line up in a queue, and are served by the next of 3 available tellers.
This morning, it so happened that all three tellers became available at the same time, and the next 3 customers approached. Suddenly, the first of the three tellers became violently ill, and could not finish serving the first customer in the line. By the time this happened, teller 2 had finished with customer 2 and teller 3 had already begun to serve customer 3.
Now, one of two things can happen. (1) The first customer in line can go back to the head of the line or (2) the first customer can pre-empt the third customer, causing that teller to stop working on the third customer and start working on the first. This type of pre-emption logic is not supported by RabbitMQ, nor any other message broker that I'm aware of. In either case, the first customer actually does not end up getting helped first - the second customer does, being lucky enough to get a good, fast teller off the bat. The only way to guarantee customers are helped in order is to have one teller helping customers one at a time, which will cause major customer service issues for the bank.
It is not possible to ensure that messages get handled in order in every possible case, given that you have multiple consumers. It doesn't matter if you have multiple queues, multiple exclusive consumers, different brokers, etc. - there is no way to guarantee a priori that messages are answered in order with multiple consumers. But RabbitMQ will make a best effort.
Message ordering is preserved in Kafka, but only within partitions rather than globally. If your data need both global ordering and partitions, this does make things difficult. However, if you just need to make sure that all of the same events for the same user, etc... end up in the same partition so that they are properly ordered, you may do so. The producer is in charge of the partition that they write to, so if you are able to logically partition your data this may be preferable.
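The "same user, same partition" idea amounts to deriving the partition from a key. The sketch below is for illustration only: it uses String.hashCode as the hash, whereas Kafka's built-in partitioner uses its own hash function, and the class name and the key format are hypothetical.

```java
// Hypothetical key-based partition choice (not Kafka's actual partitioner).
class KeyPartitioner {
    // Maps a key to a partition in the range [0, partitions).
    static int partitionFor(String key, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("need at least one partition");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitions);
    }
}
```

Because the partition depends only on the key and the partition count, all events for one user stay in one partition and keep their relative order, while events for different users can spread across partitions.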
I think there are two distinct things in this question: consumption order and processing order.
Message queues can, to a degree, give you a guarantee that messages will get consumed in order; they can't, however, give you any guarantees on the order of their processing.
The main difference here is that there are some aspects of message processing which cannot be determined at consumption time, for example:
As mentioned a consumer can fail while processing, here the message's consumption order was correct, however, the consumer failed to process it correctly, which will make it go back to the queue. At this point the consumption order is intact, but the processing order is not.
If by "processing" we mean that the message is now discarded and finished processing completely, then consider the case when your processing time is not linear, in other words processing one message takes longer than the other. For example, if message 3 takes longer to process than usual, then messages 4 and 5 might get consumed and finish processing before message 3 does.
So even if you managed to get the message back to the front of the queue (which by the way violates the consumption order) you still cannot guarantee they will also be processed in order.
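The second point can be shown with a small deterministic simulation. It makes some simplifying assumptions: messages are handed out strictly in queue order, each consumer takes the next message as soon as it is free, and processing times are given up front in arbitrary time units. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical simulation: consumption is in order, completion may not be.
class ProcessingOrder {
    // durations[i] is the processing time of message i + 1.
    // Returns the message numbers in the order they finish processing.
    static List<Integer> completionOrder(int consumers, int[] durations) {
        if (consumers < 1) {
            throw new IllegalArgumentException("need at least one consumer");
        }
        // Times at which each consumer becomes free; smallest first.
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int c = 0; c < consumers; c++) {
            freeAt.add(0L);
        }
        List<long[]> finished = new ArrayList<>(); // {finishTime, messageNumber}
        for (int m = 0; m < durations.length; m++) {
            long start = freeAt.poll();       // next free consumer takes the message
            long end = start + durations[m];
            freeAt.add(end);
            finished.add(new long[] {end, m + 1});
        }
        // Order by finish time; ties are broken by message number.
        finished.sort((a, b) -> a[0] != b[0]
                ? Long.compare(a[0], b[0])
                : Long.compare(a[1], b[1]));
        List<Integer> order = new ArrayList<>();
        for (long[] f : finished) {
            order.add((int) f[1]);
        }
        return order;
    }
}
```

With two consumers and durations {1, 1, 5, 1, 1}, message 3 is picked up third but finishes last, after messages 4 and 5. With a single consumer the completion order is the same as the queue order.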
If you want to process the messages in order:
Have only 1 consumer instance at all times, or a main consumer and several stand-by consumers.
Or don't use a messaging queue and do the processing in a synchronous blocking method, which might sound bad but in many cases and business requirements it is completely valid and sometimes even mission critical.
There are proper ways to guarantee the order of messages within RabbitMQ subscriptions.
If you use multiple consumers, they will process the messages using a shared ExecutorService. See also ConnectionFactory.setSharedExecutor(...). You could set an Executors.newSingleThreadExecutor().
If you use one Consumer with a single queue, you can bind this queue using multiple bindingKeys (they may have wildcards). The messages will be placed into the queue in the same order that they were received by the message broker.
For example you have a single publisher that publishes messages where the order is important:
// UTF_8 is assumed to be a static import of java.nio.charset.StandardCharsets.UTF_8
try (Connection connection2 = factory.newConnection();
     Channel channel2 = connection2.createChannel()) {
    // publish messages, alternating between two different routing keys
    for (int i = 0; i < messageCount; i++) {
        final String routingKey = i % 2 == 0 ? routingEven : routingOdd;
        channel2.basicPublish(exchange, routingKey, null, ("Hello" + i).getBytes(UTF_8));
    }
}
You now might want to receive messages from both topics in a queue in the same order that they were published:
// declare a queue for the consumer
final String queueName = channel.queueDeclare().getQueue();
// we bind to queue with the two different routingKeys
final String routingEven = "even";
final String routingOdd = "odd";
channel.queueBind(queueName, exchange, routingEven);
channel.queueBind(queueName, exchange, routingOdd);
channel.basicConsume(queueName, true, new DefaultConsumer(channel) { ... });
The Consumer will now receive the messages in the order that they were published, regardless of the fact that you used different topics.
There are some good 5-Minute Tutorials in the RabbitMQ documentation that might be helpful:
https://www.rabbitmq.com/tutorials/tutorial-five-java.html