From RabbitMQ - Message order of delivery
Section 4.7 of the AMQP 0-9-1 core specification explains the
conditions under which ordering is guaranteed: messages published in
one channel, passing through one exchange and one queue and one
outgoing channel will be received in the same order that they were
sent. RabbitMQ offers stronger guarantees since release 2.7.0.
Does this hold with EasyNetQ? I would have expected it to hold, but I'm sometimes (but not always) seeing a different behaviour.
If the consumer is synchronized (not my case yet; I have locking but it begins after an "if" which is always false in this case), should I trust messages to be consumed in the order they are published? (Same type of message, same origin, going to same destination) Or are there other elements at play here internal to EasyNetQ which I should be aware of which don't preserve the message order of delivery?
EasyNetQ implements a single consumer thread per IBus instance, so if you use the standard non-async subscribe method your message handler will fire synchronously in the same order that messages are delivered by RabbitMQ. There shouldn't be any need to implement a lock. If you use the async subscribe, the handlers will still be called in order, but of course they may ACK out of order depending on how you implement your async handler.
Related
Setting up a CMS consumer with a listener involves two separate calls: first, acquiring a consumer:
cms::MessageConsumer* cms::Session::createConsumer( const cms::Destination* );
and then, setting a listener on the consumer:
void cms::MessageConsumer::setMessageListener( cms::MessageListener* );
Could messages be lost if the implementation subscribes to the destination (and receives messages from the broker/router) before the listener is activated? Or are such messages queued internally and delivered to the listener upon activation?
Why isn't there an API call to create the consumer with a listener as a construction argument? (Is it because the JMS spec doesn't have it?)
(Addendum: this is probably a flaw in the API itself. A more logical order would be to instantiate a consumer from a session, and have a cms::Consumer::subscribe( cms::Destination*, cms::MessageListener* ) method in the API.)
I don't think the API is flawed necessarily. Obviously it could have been designed a different way, but I believe the solution to your alleged problem comes from the start method on the Connection object (inherited via Startable). The documentation for Connection states:
A CMS client typically creates a connection, one or more sessions, and a number of message producers and consumers. When a connection is created, it is in stopped mode. That means that no messages are being delivered.
It is typical to leave the connection in stopped mode until setup is complete (that is, until all message consumers have been created). At that point, the client calls the connection's start method, and messages begin arriving at the connection's consumers. This setup convention minimizes any client confusion that may result from asynchronous message delivery while the client is still in the process of setting itself up.
A connection can be started immediately, and the setup can be done afterwards. Clients that do this must be prepared to handle asynchronous message delivery while they are still in the process of setting up.
This is the same pattern that JMS follows.
In any case I don't think there's any risk of message loss regardless of when you invoke start(). If the consumer is using an auto-acknowledge mode then messages should only be automatically acknowledged once they are delivered synchronously via one of the receive methods or asynchronously through the listener's onMessage. To do otherwise would be a bug in my estimation. I've worked with JMS for the last 10 years on various implementations and I've never seen any kind of condition where messages were lost related to this.
If you want to add consumers after you've already invoked start() you could certainly call stop() first, but I don't see any problem with simply adding them on the fly.
One of the main characteristics of a message queue service, RabbitMQ included, is preserving message publication order. This is confirmed in the RabbitMQ documentation:
[QUOTE 1] Section 4.7 of the AMQP 0-9-1 core specification explains the
conditions under which ordering is guaranteed: messages published in
one channel, passing through one exchange and one queue and one
outgoing channel will be received in the same order that they were
sent. RabbitMQ offers stronger guarantees since release 2.7.0.
Let's assume in the following that there are no consumers active, to simplify things. We are publishing over one single channel.
So far, so good.
RabbitMQ also provides possibility to inform the publisher that a certain publication has been completely and correctly processed [*]. This is explained here. Basically, the broker will either send a basic.ack or basic.nack message. The documentation also says this:
[QUOTE 2] basic.ack for a persistent message routed to a durable queue will be
sent after persisting the message to disk.
In most cases, RabbitMQ will acknowledge messages to publishers in the
same order they were published (this applies for messages published on
a single channel). However, publisher acknowledgements are emitted
asynchronously and can confirm a single message or a group of
messages. The exact moment when a confirm is emitted depends on the
delivery mode of a message (persistent vs. transient) and the
properties of the queue(s) the message was routed to (see above).
Which is to say that different messages can be considered ready for
acknowledgement at different times. This means that acknowledgements
can arrive in a different order compared to their respective messages.
Applications should not depend on the order of acknowledgements when
possible.
At first glance, this makes sense: persisting a message takes much more time than just storing it in memory, so it's perfectly possibly that the acknowledgment of a later transient message will arrive before the acknowledgement of an earlier persistent message.
But, if we re-read the first quote regarding message order [QUOTE 1] here above, it gets confusing. I'll explain. Assume we are sending two messages to the same exchange: first a persistent and then a transient message. Since RabbitMQ claims to preserve message order, how can it send an acknowledgment of the second/transient message before it knows that the first/persistent message is indeed completely written to disk?
In other words, does the remark regarding illogical acknowledgement order [QUOTE 2] here above only apply in case the two messages are each routed to completely different target queue(s) (which might happen if they have different routing keys, for example)? In that case, we don't have to guarantee anything as done in [QUOTE 1].
[*] In most cases, this means 'queued'. However, if there are no routing rules applicable, it cannot be enqueued in a target queue. However, this is still a positive outcome regarding publication confirmation.
update
I read this answer on a similar question. This basically says that there are no guarantees whatsoever. Even the most naive implementation, where we delay the publication of message 2 to the point after we got an acknowledgment of message 1, might not result in the desired message order. Basically, [QUOTE 1] is not met.
Is this correct?
From this response on rabbitmq-users:
RabbitMQ knows message position in a queue regardless of whether it is transient or not.
My guess (I did not write that part of the docs) the ack ordering section primarily tries to communicate that if two messages are routed to two different queues, those queues will handle/replicate/persist them concurrently. Reasoning about ordering in more than one queue is pretty hard. A message can go into more than one queue as well.
Nonetheless, RabbitMQ queues know what position a message has in what queues. Once all routing/delivery acknowledgements are received by a channel that handled the publish, it is added to the list of acknowledgements to send out. Note that that
list may or may not be ordered the same way as the original publishes and worrying about that is not practical for many reasons, most importantly: the user typically primarily cares about the ordering in the queues.
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
I have a Java application which publishes events to RabbitMQ. It has one very important characteristic: message order must be preserved at all times. The consumer can handle duplicates, but it cannot handle when message 2 is enqueued before message 1, so to say.
I have been reading a lot about RabbitMQ lately, and I feel there is only solution to do this: set the channel in confirm mode (https://www.rabbitmq.com/confirms.html - basically, it forces the broker to acknowledge the publication) and publish one by one. With one by one I mean that the message 2 is only published after RabbitMQ confirmed (via an asynchronous ACK response) that message 1 is actually well received and persisted.
I tried this in a conceptual implementation, and while this works fine, it's uber slow, without exaggerating. Which makes sense: after all, we are now limiting our message rate to 1 message at a time.
So this leads me to my question: are there other, more performant, ways to ensure that message ordering is always preserved (either in RabbitMQ or via different approaches)?
Although my concern is RabbitMQ, I believe this question might be applied to any kind of asynchronous message queue service.
RabbitMQ's clients enqueue in the same order that you sent. It's when subscribers go down, you get network splits or the subscriber NACKs messages that they can get re-ordered; and even then RMQ tries to keep them in the same approximate order by re-queueing at the same position, or as close to the same position.
You can do it like you suggest; take one message at a time, because if you take a message, but crash before you've ACKed it from the broker, it will pop up when your service comes back up, at the same position.
This assumes you only have a single service instance at any given time, consuming from the queue. Which in turn is a distributed systems problem on its own, if you have a scheduler like Kubernetes or Mesos, spawning your service instances.
Another solution would be to ensure ordering of processing in the receiving service, by "resequencing" the messages based on their logical timestamps/sequence numbers.
I've written a much more thorough guide as annotated code here https://github.com/haf/rmq-publisher-confirms-hopac/blob/master/src/Server/Shared/RabbitMQ.fs — with batching you can resequence. Furthermore, if your idempotence builds the consecutive sequence numbers into its logic, you can start taking batches and each event will be idempotent, despite being re-consumed.
I need to choose a new Queue broker for my new project.
This time I need a scalable queue that supports pub/sub, and keeping message ordering is a must.
I read Alexis comment: He writes:
"Indeed, we think RabbitMQ provides stronger ordering than Kafka"
I read the message ordering section in rabbitmq docs:
"Messages can be returned to the queue using AMQP methods that feature
a requeue
parameter (basic.recover, basic.reject and basic.nack), or due to a channel
closing while holding unacknowledged messages...With release 2.7.0 and later
it is still possible for individual consumers to observe messages out of
order if the queue has multiple subscribers. This is due to the actions of
other subscribers who may requeue messages. From the perspective of the queue
the messages are always held in the publication order."
If I need to handle messages by their order, I can only use rabbitMQ with an exclusive queue to each consumer?
Is RabbitMQ still considered a good solution for ordered message queuing?
Well, let's take a closer look at the scenario you are describing above. I think it's important to paste the documentation immediately prior to the snippet in your question to provide context:
Section 4.7 of the AMQP 0-9-1 core specification explains the
conditions under which ordering is guaranteed: messages published in
one channel, passing through one exchange and one queue and one
outgoing channel will be received in the same order that they were
sent. RabbitMQ offers stronger guarantees since release 2.7.0.
Messages can be returned to the queue using AMQP methods that feature
a requeue parameter (basic.recover, basic.reject and basic.nack), or
due to a channel closing while holding unacknowledged messages. Any of
these scenarios caused messages to be requeued at the back of the
queue for RabbitMQ releases earlier than 2.7.0. From RabbitMQ release
2.7.0, messages are always held in the queue in publication order, even in the presence of requeueing or channel closure. (emphasis added)
So, it is clear that RabbitMQ, from 2.7.0 onward, is making a rather drastic improvement over the original AMQP specification with regard to message ordering.
With multiple (parallel) consumers, order of processing cannot be guaranteed.
The third paragraph (pasted in the question) goes on to give a disclaimer, which I will paraphrase: "if you have multiple processors in the queue, there is no longer a guarantee that messages will be processed in order." All they are saying here is that RabbitMQ cannot defy the laws of mathematics.
Consider a line of customers at a bank. This particular bank prides itself on helping customers in the order they came into the bank. Customers line up in a queue, and are served by the next of 3 available tellers.
This morning, it so happened that all three tellers became available at the same time, and the next 3 customers approached. Suddenly, the first of the three tellers became violently ill, and could not finish serving the first customer in the line. By the time this happened, teller 2 had finished with customer 2 and teller 3 had already begun to serve customer 3.
Now, one of two things can happen. (1) The first customer in line can go back to the head of the line or (2) the first customer can pre-empt the third customer, causing that teller to stop working on the third customer and start working on the first. This type of pre-emption logic is not supported by RabbitMQ, nor any other message broker that I'm aware of. In either case, the first customer actually does not end up getting helped first - the second customer does, being lucky enough to get a good, fast teller off the bat. The only way to guarantee customers are helped in order is to have one teller helping customers one at a time, which will cause major customer service issues for the bank.
It is not possible to ensure that messages get handled in order in every possible case, given that you have multiple consumers. It doesn't matter if you have multiple queues, multiple exclusive consumers, different brokers, etc. - there is no way to guarantee a priori that messages are answered in order with multiple consumers. But RabbitMQ will make a best-effort.
Message ordering is preserved in Kafka, but only within partitions rather than globally. If your data need both global ordering and partitions, this does make things difficult. However, if you just need to make sure that all of the same events for the same user, etc... end up in the same partition so that they are properly ordered, you may do so. The producer is in charge of the partition that they write to, so if you are able to logically partition your data this may be preferable.
I think there are two things in this question which are not similar, consumption order and processing order.
Message Queues can -to a degree- give you a guarantee that messages will get consumed in order, they can't, however, give you any guarantees on the order of their processing.
The main difference here is that there are some aspects of message processing which cannot be determined at consumption time, for example:
As mentioned a consumer can fail while processing, here the message's consumption order was correct, however, the consumer failed to process it correctly, which will make it go back to the queue. At this point the consumption order is intact, but the processing order is not.
If by "processing" we mean that the message is now discarded and finished processing completely, then consider the case when your processing time is not linear, in other words processing one message takes longer than the other. For example, if message 3 takes longer to process than usual, then messages 4 and 5 might get consumed and finish processing before message 3 does.
So even if you managed to get the message back to the front of the queue (which by the way violates the consumption order) you still cannot guarantee they will also be processed in order.
If you want to process the messages in order:
Have only 1 consumer instance at all times, or a main consumer and several stand-by consumers.
Or don't use a messaging queue and do the processing in a synchronous blocking method, which might sound bad but in many cases and business requirements it is completely valid and sometimes even mission critical.
There are proper ways to guarantuee the order of messages within RabbitMQ subscriptions.
If you use multiple consumers, they will process the message using a shared ExecutorService. See also ConnectionFactory.setSharedExecutor(...). You could set a Executors.newSingleThreadExecutor().
If you use one Consumer with a single queue, you can bind this queue using multiple bindingKeys (they may have wildcards). The messages will be placed into the queue in the same order that they were received by the message broker.
For example you have a single publisher that publishes messages where the order is important:
try (Connection connection2 = factory.newConnection();
Channel channel2 = connection.createChannel()) {
// publish messages alternating to two different topics
for (int i = 0; i < messageCount; i++) {
final String routingKey = i % 2 == 0 ? routingEven : routingOdd;
channel2.basicPublish(exchange, routingKey, null, ("Hello" + i).getBytes(UTF_8));
}
}
You now might want to receive messages from both topics in a queue in the same order that they were published:
// declare a queue for the consumer
final String queueName = channel.queueDeclare().getQueue();
// we bind to queue with the two different routingKeys
final String routingEven = "even";
final String routingOdd = "odd";
channel.queueBind(queueName, exchange, routingEven);
channel.queueBind(queueName, exchange, routingOdd);
channel.basicConsume(queueName, true, new DefaultConsumer(channel) { ... });
The Consumer will now receive the messages in the order that they were published, regardless of the fact that you used different topics.
There are some good 5-Minute Tutorials in the RabbitMQ documentation that might be helpful:
https://www.rabbitmq.com/tutorials/tutorial-five-java.html
When using RabbitMQ as Message Broker, I have a scenario where multiple concurrent consumers pull messages from a Queue using the basic.get AMQP method and use explicit acknowledgement for deleting the message from the Queue. Assuming the following setup
Q has messages M1, M2, M3 and has consumers C1, C2 and C3 (each having its own connection and channel) connected to it.
How is concurrency handled in the basic.get method? Is the call to basic.get method synchronized to handle concurrent consumers each using its own connection and channel? C1, C2 and C3 issue a basic.get call to receive a message at the same time (assume the server receives all 3 requests simultaneously).
C1 requests a message using basic.get and gets M1. When C2 requests for a message, since its using a different connection, does it get M1 again?
How can consumers pull messages in batches of a predefined size?
Your questions really hit at the heart of queuing and process theory, so I will answer from that standpoint (RabbitMQ is really a generic message broker as far as my answers are concerned, as this applies to any message broker).
How is concurrency handled in the basic.get method? Is the call to
basic.get method synchronized to handle concurrent consumers each
using its own connection and channel? C1, C2 and C3 issue a basic.get
call to receive a message at the same time (assume the server receives
all 3 requests simultaneously).
Answer 1: RabbitMQ is designed to be a reliable message broker. It contains internal processes and controls to ensure that the same message does not get passed out multiple times to different consumers. Now, due to the impracticality of testing the scenario that you describe, does it work perfectly? Who knows. That is why properly-designed applications using message-based architecture will use idempotent transactions, such that if the same transaction is processed multiple times, the result will be the same as if the transaction was processed once.
Takeaway: Design your application so that the answer to this question is unimportant.
C1 requests a message using basic.get and gets M1. When C2 requests
for a message, since its using a different connection, does it get M1
again?
Answer 2: No. Subject to the assumptions of my previous answer, the RabbitMQ broker will not serve the same message back once it has been delivered. Depending on the settings of the channel and queue, the message may be automatically acknowledged upon delivery and will never be redelivered. Other settings will have the message requeue automatically upon the "death" of the processing thread/channel or a negative acknowledgment from your processing thread. This is important functionality, since a "poison" message could repeatedly wreak havoc in your application if it could be served up to multiple consumers. Takeaway: you may safely rely on this assumption in designing your application.
How can consumers pull messages in batches of a predefined size?
Answer: They can't, nor would it make sense for them to. In any queuing system, the fundamental assumption is that items are removed from the queue in single file. Attempts to violate this assumption result in unpredictable behavior; furthermore, single-piece flow is commonly the most efficient method of processing. However, in the real world, there are cases where batch sizes > 1 are necessary. In such cases, it makes sense to load the batch into its own single message, so this may require a separate processing thread that pulls messages from the queue and batches them together, or put them in batches initially. Keep in mind that once you have multiple consumers, there is no possible way to guarantee single messages will be processed in order. Takeaway: Batching should be avoided wherever possible, but where it is not practical to avoid, you may not assume that batches will contain individual messages in any particular order.
You might wanna read the RabbitMQ Api guide and the introduction to Amqp.
First of all, avoid consuming messages using basicGet in your consumers. Rather use the Consumer interface basicConsume. This allows RabbitMq to push you messages as they arrive on the queue. Everything else is a waist of resources here as it boils down to busy polling.
When using basicConsume RabbitMq will even push you more messages in the background up to a certain prefetch count. This allows you to process multiple messages concurrently as well as minimizing the time you need to wait for your next message to process (if some message is available).
Concurrency is not an issue at all, that's what you're using a queue for!
When having multiple consumers on one queue, a message will always only be delivered to one consumer (as long as the message is ACKed). Otherwise you need private queues for each consumer and route your messages accordingly.
Btw, if you're able to share the connection among your consumers, you should do so.
Just make sure to use one channel per thread.
There is no special configuration required for that scenario. Each client will atomically fetch and receive one message from the queue, just as you would like to happen.