I know that achieving round-robin behaviour with a topic exchange can be tricky or impossible, so my question is really whether there is anything I can do with RabbitMQ, or whether I should look at other message queues that support this.
Here's a detailed explanation of my application requirements:
There will be one producer, let's call it P
There (potentially) will be thousands of consumers, let's call them Cn
Each consumer can "subscribe" to one or more topics, and multiple consumers can be subscribed to the same topic
Every message published into the topic should be consumed by only ONE consumer
Use case #1
Assume:
Topics
foo.bar
foo.baz
Consumers
Consumer C1 is subscribed to topic #
Consumer C2 is subscribed to topic foo.*
Consumer C3 is subscribed to topic *.bar
Producer P publishes the following messages:
publish foo.qux: C1 and C2 can potentially consume this message but only one receives it
publish foo.bar: C1, C2 and C3 can potentially consume this message but only one receives it
Note
Unfortunately I can't have a separate queue for each "topic", so using a direct exchange doesn't work: the number of topic combinations can be huge (tens of thousands)
From what I've read, there is no out-of-the-box solution with RabbitMQ. Does anybody know a workaround, or is there another message queue solution that would support this, e.g. Kafka, Kinesis, etc.?
Thank you
There appears to be a conflation of the role of the exchange, which is to route messages, and the queue, which is to provide a holding place for messages waiting to be processed. Funneling messages into one or more queues is the job of the exchange, while funneling messages from the queue into multiple consumers is the job of the queue. Round robin only comes into play for the latter.
Fundamentally, a topic exchange operates by duplicating messages, one for each queue matching the topic published with the message. Therefore, any expectation of round-robin behavior would be a mistake, as it goes against the very definition of the topic exchange.
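To make that concrete, here is a minimal sketch using the pika Python client (the broker address and the exchange/queue names are just placeholders): two queues whose binding patterns both match the routing key each receive their own copy of the message.

```python
import pika

# Connect to a local broker (placeholder address) and open a channel.
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# One topic exchange, two queues with overlapping binding patterns.
channel.exchange_declare(exchange='demo.topic', exchange_type='topic')
channel.queue_declare(queue='q_foo_star')
channel.queue_declare(queue='q_star_bar')
channel.queue_bind(exchange='demo.topic', queue='q_foo_star', routing_key='foo.*')
channel.queue_bind(exchange='demo.topic', queue='q_star_bar', routing_key='*.bar')

# 'foo.bar' matches both patterns, so the exchange places an
# independent copy of the message in each queue.
channel.basic_publish(exchange='demo.topic', routing_key='foo.bar', body=b'hello')

connection.close()
```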
All this does is to establish that, by definition, the scenario presented in the question does not make sense. That does not mean the desired behavior is impossible, but the terms and topology may need some clarifying adjustments.
Let's take a step back and look at the described lifetime for one message: It is produced by exactly one producer and consumed by one of many consumers. Ordinarily, that is the scenario addressed by a direct exchange. The complicating factor in this is that your consumers are selective about what types of messages they will consume (or, to put it another way, your producer is not consistent about what types of messages it produces).
Ordinarily in message-oriented processing, a single message type corresponds to a single consumer type. Therefore, each different type of message would get its own corresponding queue. However, based on the description given in this question, a single message type might correspond to multiple different consumer types. One issue I have is the following statement:
Unfortunately I can't have a separate queue for each "topic"
On its face, that statement makes no sense, because what it really says is that you have arbitrarily many (in fact, an unknown number of) message types; if that were the case, then how would you be able to write code to process them?
So, ignoring that statement for a bit, we are led to two possibilities with RabbitMQ out of the box:
Use a direct exchange and publish your messages using the type of message as a routing key. Then, have your various consumers subscribe to only the message types that they can process. This is the most common message processing pattern (see the sketch after this list).
Use a topic exchange, as you have, and come up with some sort of external de-duplication logic (perhaps memcached), where messages are checked against it and discarded if another consumer has started to process it.
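For illustration, a rough sketch of the first option with pika (all names are made up): the producer uses the message type as the routing key, and a consumer binds one queue per type it can process; competing consumers on the same queue then share messages round-robin.

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.exchange_declare(exchange='work', exchange_type='direct')

# Consumer side: one shared queue per message type this consumer handles.
channel.queue_declare(queue='foo.bar')
channel.queue_bind(exchange='work', queue='foo.bar', routing_key='foo.bar')

# Producer side: the message type doubles as the routing key.
channel.basic_publish(exchange='work', routing_key='foo.bar', body=b'payload')

def handle(ch, method, properties, body):
    # ... process one message of type 'foo.bar' ...
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue='foo.bar', on_message_callback=handle)
channel.start_consuming()
```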
Now, neither of these deals explicitly with the round-robin requirement. Since it was not explained why or how this was important, it is assumed that it can be ignored. If not, further definition of the problem space is required.
Related
I have a hard time understanding the basic concepts of RabbitMQ. I find the online documentation not perfectly clear.
So far I understand, what a channel, a queue, a binding etc. is.
But how would the following use case be implemented:
Use Case: Sender posts to one exchange with different topics. On the receiver side, depending on the topic, different receivers should be notified.
So the following should somehow be feasible with a topic exchange:
create a channel
within this channel, create a topic exchange
for each topic to be subscribed to, create a queue and a queue binding with this topic as property
My difficulty is that the callback would be related to the channel, not to the queue or the queue binding. I am not 100 % sure if I am right here.
So that's my question: in order to have multiple callbacks, IOW: different message handlers, depending on the subscribed topic - do you have to create multiple channels, one for each "different message handling"? All these channels should grab the same exchange and define their own queue + queue binding for that specific topic?
Please confirm whether this is correct or whether I am straying from the canonical path of AMQP ... "queue" sounds so light-weight, so I intuitively thought of a queue or a queue binding as the right point to attach a consuming event handler to, but it seems that, instead, the channel is my friend in this. Right?
Another aspect of my question:
If I really have to use multiple channels for this, do I have to declare the same exchange (exchange name and exchange type of "topic") for each channel? I hoped there was something like:
define the exchange with this name and the type of "topic" once
for each channel, "grab" this predefined exchange and use it by adding queues and queue bindings to this exchange
I find it helpful to think about the roles of the broker (RabbitMQ) and the clients (your applications) separately.
The broker, RabbitMQ, will receive messages from your publishers, route them to queues, and eventually send them to consumers. The message routing can be simple or complex. In your case, the routing is topic based with a few different queues.
You haven't said much about the publishers, likely because their job is simple. They send messages with a routing key to RabbitMQ.
The consumer side is where things can get interesting. At the simplest level, a consumer subscribes to a queue, receives messages from RabbitMQ, and processes them. The consumer opens a connection to RabbitMQ and will use a channel for a particular use (e.g., subscribing to a queue). The power of message brokers is that they allow designers to break up processes into separate apps if desired.
You don't give much insight into your application, other than the presence of different message topics. An important design choice for you to make is how to define the application(s). Are the different topics suitable for separate applications, or will a single application handle all types of messages?
For the former case, you would have one application for each queue. A single channel that subscribes to the queue is probably the most sensible decision unless your application needs to be threaded. For threaded applications, each thread would have its own channel and all threads can be subscribed to the same queue. Each application would have its own callback function for processing that type of message.
For the latter case (single application with multiple queues), the best approach would be to have at least one channel per queue. It sounds like each queue would require its own callback function, and you would assign the functions to the channels according to their subscriptions. You might have multiple channels per queue if your application can process multiple messages (of each topic) simultaneously.
Regarding your question about declaring exchanges, queues, and bindings, these items only need to be created once. But it is reasonable practice to have your clients declare them at connection time. Advantages of declaring them are that they will be created again if they were deleted and that any discrepancies between your declaration and what is on the broker will trigger errors.
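As a minimal sketch of the single-application case with pika (exchange, queue, and binding names are illustrative): the client declares everything at connection time, opens one channel per queue, and attaches a different callback to each.

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))

# Declarations at connection time are idempotent as long as the
# arguments match what already exists on the broker.
setup = connection.channel()
setup.exchange_declare(exchange='events', exchange_type='topic', durable=True)
for queue, pattern in [('registrations', 'user.registered'), ('uploads', 'upload.*')]:
    setup.queue_declare(queue=queue, durable=True)
    setup.queue_bind(exchange='events', queue=queue, routing_key=pattern)

def on_registration(ch, method, properties, body):
    # handler dedicated to the 'registrations' queue
    ch.basic_ack(delivery_tag=method.delivery_tag)

def on_upload(ch, method, properties, body):
    # handler dedicated to the 'uploads' queue
    ch.basic_ack(delivery_tag=method.delivery_tag)

# One channel per queue, each with its own callback.
reg_channel = connection.channel()
reg_channel.basic_consume(queue='registrations', on_message_callback=on_registration)

upl_channel = connection.channel()
upl_channel.basic_consume(queue='uploads', on_message_callback=on_upload)

# Drive both channels from the connection's I/O loop.
while True:
    connection.process_data_events(time_limit=1)
```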
One of the main characteristics of a message queue service, RabbitMQ included, is preserving message publication order. This is confirmed in the RabbitMQ documentation:
[QUOTE 1] Section 4.7 of the AMQP 0-9-1 core specification explains the conditions under which ordering is guaranteed: messages published in one channel, passing through one exchange and one queue and one outgoing channel will be received in the same order that they were sent. RabbitMQ offers stronger guarantees since release 2.7.0.
Let's assume in the following that there are no consumers active, to simplify things. We are publishing over one single channel.
So far, so good.
RabbitMQ also provides the possibility to inform the publisher that a certain publication has been completely and correctly processed [*]. This is explained here. Basically, the broker will either send a basic.ack or a basic.nack message. The documentation also says this:
[QUOTE 2] basic.ack for a persistent message routed to a durable queue will be sent after persisting the message to disk.
In most cases, RabbitMQ will acknowledge messages to publishers in the same order they were published (this applies for messages published on a single channel). However, publisher acknowledgements are emitted asynchronously and can confirm a single message or a group of messages. The exact moment when a confirm is emitted depends on the delivery mode of a message (persistent vs. transient) and the properties of the queue(s) the message was routed to (see above). Which is to say that different messages can be considered ready for acknowledgement at different times. This means that acknowledgements can arrive in a different order compared to their respective messages. Applications should not depend on the order of acknowledgements when possible.
At first glance, this makes sense: persisting a message takes much more time than just storing it in memory, so it's perfectly possible that the acknowledgement of a later transient message will arrive before the acknowledgement of an earlier persistent message.
But, if we re-read the first quote regarding message order [QUOTE 1] here above, it gets confusing. I'll explain. Assume we are sending two messages to the same exchange: first a persistent and then a transient message. Since RabbitMQ claims to preserve message order, how can it send an acknowledgment of the second/transient message before it knows that the first/persistent message is indeed completely written to disk?
In other words, does the remark regarding illogical acknowledgement order [QUOTE 2] here above only apply in case the two messages are each routed to completely different target queue(s) (which might happen if they have different routing keys, for example)? In that case, we don't have to guarantee anything as done in [QUOTE 1].
[*] In most cases, this means 'queued'. If no routing rules apply, the message cannot be enqueued in a target queue, but this is still considered a positive outcome as far as publication confirmation is concerned.
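For context, a minimal sketch of publisher confirms with the pika blocking client (queue name is a placeholder): once confirm_delivery() is enabled, basic_publish waits for the broker's confirm and raises if the message is returned or nacked.

```python
import pika
from pika.exceptions import NackError, UnroutableError

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.confirm_delivery()  # switch the channel into publisher-confirm mode

channel.queue_declare(queue='confirmed', durable=True)

try:
    # Persistent message to a durable queue: the basic.ack is only sent
    # once the broker has taken responsibility for it (persisted to disk).
    channel.basic_publish(
        exchange='',
        routing_key='confirmed',
        body=b'payload',
        properties=pika.BasicProperties(delivery_mode=2),
        mandatory=True,
    )
except UnroutableError:
    pass  # mandatory=True and no queue matched the routing key
except NackError:
    pass  # the broker refused (nacked) the message
```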
update
I read this answer on a similar question. It basically says that there are no guarantees whatsoever. Even the most naive implementation, where we delay the publication of message 2 until after we have received an acknowledgement of message 1, might not result in the desired message order. Basically, [QUOTE 1] is not met.
Is this correct?
From this response on rabbitmq-users:
RabbitMQ knows message position in a queue regardless of whether it is transient or not.
My guess (I did not write that part of the docs) is that the ack ordering section primarily tries to communicate that if two messages are routed to two different queues, those queues will handle/replicate/persist them concurrently. Reasoning about ordering in more than one queue is pretty hard. A message can go into more than one queue as well.
Nonetheless, RabbitMQ queues know what position a message has in what queues. Once all routing/delivery acknowledgements are received by a channel that handled the publish, it is added to the list of acknowledgements to send out. Note that that list may or may not be ordered the same way as the original publishes, and worrying about that is not practical for many reasons, most importantly: the user typically primarily cares about the ordering in the queues.
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
I'm looking for a way to buffer messages received by the exchange as long as there is at least one queue bind to that exchange.
Is it supported by RabbitMQ?
Maybe there are some workarounds (I didn't find any).
EDIT
My use case:
1. I've got one data producer (which reads real-time data from an external system)
2. I've got one fanout exchange which receives data from the producer
3. On system startup, there might be no consumer, but after a few moments there should be at least one, which creates its own queue and binds it to the exchange from step 2
The problem is the short time between steps 2 and 3, where there are no queues bound to the exchange created in step 2.
Of course, it's an edge case and after system initialization queues and exchanges are bound and everything works as expected.
Why do queues and bindings have to be created by consumers (not by the producer)? Because I need a flexible setup where I can add consumers without any changes to other components' code (e.g. the producer).
EDIT 2
I'm processing the output from another system which stores both real-time and historical data. There are the cases where I want to read historical data first (on initialization) and then continue to handle real-time data.
I may mislead you by saying that there are multiple consumers. In the case where I need a buffer on exchange there is only one consumer (which writes everything to time series DB as it appears in queue).
The RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
Why do queues and bindings have to be created by consumers (not by the producer)?
Queues and bindings can be created by producers or consumers or both. The requirement is that the exact same arguments are used when creating them if a client application tries to "re-create" a queue or binding. If different arguments are used, a channel-level error will happen.
As you have found, if a producer publishes to an exchange that can't route messages, they will be lost. Olivier's suggestion to use an alternate exchange is a good one, but I recommend you have your producers create queues and bindings as well.
If you mean to avoid throwing away messages because there is no destination configured for them, yes.
You should look at the alternate exchange feature.
This assumes that, before (or when) you start, the alternate exchange is created (you would typically go for fanout) and a queue is bound to it (let's call it notroutedq).
So the messages are not lost, they will be stored in notroutedq.
From there you can possibly setup a mechanism that would reprocess messages in that queue - reinjecting them into the main exchange most likely - once a given time has passed or when a binding has been added to your main exchange.
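A rough sketch of that topology with pika (all names here are placeholders): declare the alternate exchange and notroutedq first, then declare the main exchange with the alternate-exchange argument pointing at it, so early messages land in notroutedq instead of being dropped.

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Fanout alternate exchange with a catch-all queue for unroutable messages.
channel.exchange_declare(exchange='main.ae', exchange_type='fanout', durable=True)
channel.queue_declare(queue='notroutedq', durable=True)
channel.queue_bind(exchange='main.ae', queue='notroutedq')

# Main exchange: anything it cannot route is handed to the alternate exchange.
channel.exchange_declare(
    exchange='main',
    exchange_type='fanout',
    durable=True,
    arguments={'alternate-exchange': 'main.ae'},
)

# A message published before any consumer has bound a queue to 'main'
# ends up in notroutedq rather than being discarded.
channel.basic_publish(exchange='main', routing_key='', body=b'early data')
```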
-- EDIT --
Thanks for the updated info.
Could you indicate how long typically you'd expect the past messages to be useful to the consumers?
In your description, you mention real-time data and possibly multiple consumers coming and going. Based on that, I'm not sure how much of the data kept in the notroutedq would be of value, and with which frequency you'd expect to resend them to the consumers.
The cases I had with alternate exchanges were mostly focused on identifying missing bindings, so that one could easily correct the bindings and reprocess the messages without loss.
If the number of consumers varies through time and the data content is real-time, I'd wonder a bit about the benefit of keeping the data.
Before a consumer nacks a message, is there any way the consumer can modify the message's state so that when the consumer consumes it upon redelivery, it sees that changed state. I'd rather not reject + reenqueue new message, but please let me know if that's the only way to accomplish this.
My goal is to determine how many times specific messages are being redelivered. I see two ways of doing this:
(1) On the message itself as described above. The message would be a container of basic stats and the application payload message.
(2) In some external storage. We would uniquely identify the message by the message id that we set.
I know (2) is possible, but my question is whether (1) is possible.
There is no way to do (1) the way you want: you would need to change the message, and a changed message is effectively a different message. If you want to do something like that (and it's possible that this is what you meant by I'd rather not reject + reenqueue new message), you should ACK the message, increment one field in it, and publish it again (again, maybe this is what you meant when you said reenqueue it). So your message payload would have some ID, a counter, and the (obviously different) payload that is the actual content.
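If you do go that route anyway, here is a hedged sketch of the ack-and-republish pattern with pika (the x-retries header name and the process() function are made up for illustration):

```python
import pika

def process(body):
    """Placeholder for your actual business logic; may raise on failure."""

def handle(ch, method, properties, body):
    headers = dict(properties.headers or {})
    attempts = headers.get('x-retries', 0)  # illustrative header name
    try:
        process(body)
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        # Ack the original and publish a fresh copy with the counter bumped.
        # This is a *new* message as far as RabbitMQ is concerned; it assumes
        # republishing to the same exchange/routing key routes back to this queue.
        headers['x-retries'] = attempts + 1
        ch.basic_publish(
            exchange=method.exchange,
            routing_key=method.routing_key,
            body=body,
            properties=pika.BasicProperties(headers=headers),
        )
        ch.basic_ack(delivery_tag=method.delivery_tag)
```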
A definitely better way is (2), for multiple reasons:
it does not interfere with business logic; the diagnostic part is kept isolated
you are leaving re-queueing to RabbitMQ (as you are supposed to do), meaning that you are not worrying about losing messages or handling message meta-info that has no use for your business logic
it's how ACKing and NACKing are actually supposed to be used - that's why they are in the AMQP specification
since you need the number of times specific messages have been redelivered, you have it somewhere external, meaning that it's independent of (RabbitMQ's) message persistence, lifetime, queue durability, mirroring, etc.
Even though this question was marked as solved some time ago, I want to mention that there is now a way, at least for tracking redelivery; it may have been introduced after the original answer was written. RabbitMQ has a different type of queue called quorum queues.
Quorum queues offer the option to set a redelivery limit:
Quorum queues support poison message handling via a redelivery limit. This feature is currently unique to Quorum queues.
In order to achieve this, RabbitMQ counts the number of deliveries in a header. The header attribute is called x-delivery-count.
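A minimal sketch with pika (queue name and limit value are arbitrary): declare a quorum queue with a delivery limit and read the x-delivery-count header on redelivered messages.

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Quorum queue that gives up on a message after 3 redeliveries.
channel.queue_declare(
    queue='work.quorum',
    durable=True,  # quorum queues are always durable
    arguments={'x-queue-type': 'quorum', 'x-delivery-limit': 3},
)

def handle(ch, method, properties, body):
    headers = properties.headers or {}
    # The broker sets this header on redelivered messages; it is absent
    # (treated as 0 here) on the first delivery.
    delivery_count = headers.get('x-delivery-count', 0)
    print('previous deliveries:', delivery_count)
    ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

channel.basic_consume(queue='work.quorum', on_message_callback=handle)
channel.start_consuming()
```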
Pretty new to RabbitMQ and we're still in the investigation stage to see if it's a good fit for our use cases--
We've readily come to the conclusion that our desired topology would have us deploying a few topic-based exchanges, and then filtering from there to specific queues. For example, let's say we have a user exchange and an upload exchange, where the user exchange might receive messages where the topic is "new-registration" or "friend-request" and the upload exchange might receive messages like "video-upload" or "picture-upload".
Creating the queues, getting messages routed to the appropriate queues, and then building listeners to handle the messages for the various queues has been quite straightforward.
What's unclear to me however is if it's possible to do a fanout on a topic exchange?
I.e. I have named queues that are bound to my topic exchange, but I'd like to be able to just throw tons of instances of my listeners at those queues to prevent single points of failure. But to the best of my knowledge, RabbitMQ treats these listeners in a straightforward round-robin fashion - e.g. every Nth message always goes to the same Nth listener rather than being dispatched to the first available consumer. This is generally acceptable to us, but given the load we anticipate, we'd like to avoid the possibility of hot spots developing amongst our consumer farm.
So, is there some way, either in the queue or exchange configuration or in the consumer code, where we can point our listeners to a topic queue but have the listeners treated in a fanout fashion?
Yes, by having the listeners bind using different queue names, they will be treated in a fanout fashion.
Fanout is 1:N though, i.e. each task can be delivered to multiple listeners like pub-sub. Note that this isn't restricted to a fanout exchange, but also applies if you bind multiple queues to a direct or topic exchange with the same binding key. (Installing the management plugin and looking at the exchanges there may be useful to visualize the bindings in effect.)
Your current setup is a task queue. Each task/message is delivered to exactly one worker/listener. Throw more listeners at the same queue name, and they will process the tasks round-robin as you say. With "fanout" (separate queues for a topic) you will process a task multiple times.
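To make the distinction concrete, a small pika sketch (queue and exchange names are invented): two queues bound with the same key each get their own copy of every matching message, whereas many consumers sharing one queue split the messages between them.

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.exchange_declare(exchange='uploads', exchange_type='topic')

# Pub-sub style: two queues, same binding key -> each receives a copy.
for q in ('audit', 'thumbnailer'):
    channel.queue_declare(queue=q)
    channel.queue_bind(exchange='uploads', queue=q, routing_key='picture-upload')

# Task-queue style: many consumers on ONE queue -> each message goes to
# exactly one of them (round-robin by default).
channel.queue_declare(queue='video-workers')
channel.queue_bind(exchange='uploads', queue='video-workers', routing_key='video-upload')
```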
Depending on your platform there may be existing work queue solutions that meet your requirements, such as Resque or DelayedJob for Ruby, Celery for Python or perhaps Octobot or Akka for the JVM.
I don't know for a fact, but I strongly suspect that RabbitMQ will skip consumers with unacknowledged messages, so it should never bottleneck on a single stuck consumer. The comments on their FAQ seem to suggest that RabbitMQ will make an effort to keep things chugging along even in the presence of troublesome consumers.
This is a late answer, but in case others come across this question...
It sounds like what you want is fair dispatch rather than a fan out model (which would publish a given message to every queue).
Fair dispatch will give a message to the next available worker rather than using a simple round-robin approach. This should avoid the "hotspots" you are concerned about, without delivering the same message to multiple consumers.
If this is what you are looking for, then see the "Fair Dispatch" section on this page in the Rabbit docs. A prefetch count of 1 is the key here.
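A minimal pika sketch of that (queue name is a placeholder): with a prefetch count of 1, the broker will not send a worker a new message until it has acknowledged the previous one, so slow workers no longer become hot spots.

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='uploads', durable=True)

# Fair dispatch: at most one unacknowledged message per consumer.
channel.basic_qos(prefetch_count=1)

def handle(ch, method, properties, body):
    # ... do the (possibly slow) work ...
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue='uploads', on_message_callback=handle)
channel.start_consuming()
```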