I want to create a consumer that processes messages from a variable number of sources that connect and disconnect dynamically.
What I need is for each consumer to prioritize the first N messages of each source. Then I want to run multiple consumers to improve the speed.
I have read the docs for Work queues, Routing and Topics, and many others, without working out how to implement this. I also ran some tests without luck.
Can someone point me to how to do it or where to read about it?
--EDIT--
QueueA-----A3--A2--A1-┐
QueueB-----B3--B2--B1-┼------ Consumer
QueueC-----C3--C2--C1-┘
The desired effect is that each consumer gets the first messages of each queue in turn. For example: A1, B1, C1, A2, B2, C2, A3, B3, C3, and so on. If a new queue is created (QueueD), the consumer would start receiving messages from it in the same fashion.
Thanks in advance
What I need is for each consumer to prioritize the first N messages of each source. Then I want to run multiple consumers to improve the speed.
All message queues that I know of only provide ordering guarantees within the queue itself (Kafka provides ordering guarantees not at the topic level but within each partition of a topic). Here, however, you are asking to serialize across multiple queues, which is not possible in a distributed system context.
Why? Because if you have more than one consumer on these queues, each queue delivers its messages to its connected consumers in round-robin fashion.
Assume prefetch_count=1 and two connected consumers, and say the first set of messages is delivered as follows:
A1, B1 & C1 delivered to consumer 1 (X)
A2, B2 & C2 delivered to consumer 2 (Y)
Now, in a distributed system, everything is async, and things could go wrong. For example:
If X acks A1, A3 will be delivered to X. But if Y acks A2 before X, A3 will be delivered to Y.
Who acks first is not within your control in a distributed system. Consider the following scenarios:
X might have had to wait on an I/O- or CPU-bound task, while Y might have been lucky enough not to wait. Then Y will advance through the messages in the queue.
Or Y is killed (or cut off by a network partition), or the network gets slow; then X continues consuming the queue alone.
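The race described above can be illustrated with a toy model. This is only a sketch: `dispatch` is a hypothetical helper, not a RabbitMQ API, and it assumes a single queue with two consumers at prefetch_count=1, where each ack frees the acking consumer to receive the next message.

```python
from collections import deque


def dispatch(messages, ack_order, consumers=("X", "Y")):
    """Toy model of one queue with prefetch_count=1.

    Each consumer holds at most one unacked message; the next message
    goes to whichever consumer acks first. Returns {message: consumer}.
    """
    pending = deque(messages)
    delivered = {}
    # Initial round-robin delivery: one message per consumer.
    for consumer in consumers:
        if pending:
            delivered[pending.popleft()] = consumer
    # Every ack frees that consumer, so it is handed the next message.
    for consumer in ack_order:
        if pending:
            delivered[pending.popleft()] = consumer
    return delivered


# Same queue contents, different ack timing, different owner of A3.
x_first = dispatch(["A1", "A2", "A3"], ack_order=["X"])
y_first = dispatch(["A1", "A2", "A3"], ack_order=["Y"])
print(x_first["A3"], y_first["A3"])
```

The only input that changes between the two runs is the ack order, which is exactly the part the application does not control.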
I strongly advise you to re-think your requirements, and to consider the guarantees you can expect in an async context (you wouldn't be considering a MoM otherwise, would you?).
PS: It is possible to implement what you are asking for with some consumer-side logic (with a penalty on performance/throughput):
A single consumer connects to all the queues.
It waits for a message from every queue before ack'ing any of them.
Once a message from every queue has been received, it groups them into a single message and publishes that to another queue (P).
Many consumers can then subscribe to P to process the ordered groups of messages.
I do not advise it, but hey, it is your system, who is going to stop you ;)
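The consumer-side steps above could look roughly like the sketch below. It makes several assumptions: in-memory deques stand in for the broker queues, `collate` is a hypothetical name, and "publishing to P" is reduced to yielding each group. A real consumer would block and wait where this one stops.

```python
from collections import deque


def collate(queues):
    """Yield one group per round: the head message of every queue.

    Stops as soon as any queue is empty, which is the point where a
    real consumer would have to wait before it could ack anything.
    """
    while queues and all(queues.values()):
        yield [q.popleft() for q in queues.values()]


queues = {
    "QueueA": deque(["A1", "A2", "A3"]),
    "QueueB": deque(["B1", "B2", "B3"]),
    "QueueC": deque(["C1", "C2"]),
}
for group in collate(queues):
    print(group)  # each group would be published to P as one message
```

Note how A3 and B3 stay stuck once QueueC runs dry: the slowest source sets the pace for all of them, which is the throughput penalty mentioned above.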
Related
Is there a way to restrict a RabbitMQ queue so that it dispatches only a fixed number of messages to its consumers at a time?
I have 2 queues, Q1 and Q2, and 10 consumers. Every consumer can process messages from Q1 and Q2. At any given time, only 2 consumers should process messages from Q2. All 10 consumers can process messages from Q1 simultaneously.
Is there any configuration in RabbitMQ which we can specify, so that RabbitMQ pushes only 2 messages from Q2 to any free consumers and pushes the next 2 only after they are acknowledged, even though other consumers are free and ready to consume?
More background on the issue:
Why only process 2 messages at a time?
Q2 messages make a web service call, and the (third-party) web service endpoint can only service 2 messages concurrently.
Can't we use concurrency?
If we use a ListenerContainer (Spring AMQP), the container is per consumer. We can restrict how many messages one consumer can take at a time, but when we have 10 consumers and there are messages in the queue, each consumer will get its share.
Can we configure only 2 consumers listening to Q2?
I understand we can achieve this by configuring only 2 consumers for Q2, but I am trying to avoid that. If for some reason these 2 consumers go down, the processing of Q2 will be halted. If 10 consumers are configured, we can guarantee that processing continues until the last consumer is down.
Looking to see if there is some config in RabbitMQ which we can make use of or any suggested solution.
Thanks in advance !
I'm pretty sure that consumer prefetch will accomplish what you want. But, Q2 can only have one consumer for this to work. There is no way to coordinate among multiple consumers - you would have to do that yourself, and could use RabbitMQ to do the coordination.
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
I think you're getting wrapped up in the problem definition. What you really need is trivial, so let's break this down a bit.
Given two queues, Q1 and Q2
10 consumers
Every consumer can process the messages from Q1 and Q2.
At any given time, only 2 consumers should process messages from Q2.
All the 10 consumers can process message from Q1 simultaneously.
Comments on problem statement
First, queues are assumed to be independent. An independent process P will have queue Q, thus Q1 serves process P1. This is a strict mathematical requirement - you cannot define two queues for a single process P.
Thus, the second constraint is mathematically incorrect, for the same reason that you could not write a valid function that accepts a parameter of type string and bool interchangeably. It must accept one or the other, as they are not compatible types, or it must accept a single common ancestor of the types without regard to the subtypes. This is a variant of the Liskov Substitution Principle.
Redefining the problem
There are a total of 12 consumers in the system:
Q1 has 10 consumers
Q2 has 2 consumers
[Important] Consumers are not shared between queues
Is there any configuration in RabbitMQ which we can specify, so that RabbitMQ pushes only 2 messages from Q2 to any free consumer and push the next 2 only after they are acknowledged, even though other consumers are free and ready to consume.
Based on the new definition of the problem, you have two options:
Use a Basic.Get - pull the next message from the queue as soon as the consumer finishes processing the last message.
Use consumer prefetch with a limit of 1. The broker then delivers one message to each consumer and sends that consumer its next message only after the previous one has been acknowledged. With two consumers on Q2, at most two Q2 messages are in flight at any time. This is a bit more complicated, but might make sense if your latency margins are less than 10 milliseconds.
Note that by properly defining the problem space, we have eliminated the fundamental problem of trying to figure out how to ensure only two consumers are processing Q2 messages at any time.
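To make the prefetch option concrete, here is a toy model of a queue whose consumers each have a prefetch limit. It is a sketch only: `ToyQueue` and its methods are hypothetical names, not part of any RabbitMQ client, and the model ignores redelivery and connection loss.

```python
from collections import deque


class ToyQueue:
    """Toy model: the broker pushes to a consumer only while that
    consumer holds fewer than `prefetch` unacknowledged messages."""

    def __init__(self, messages, consumers, prefetch=1):
        self.ready = deque(messages)
        self.prefetch = prefetch
        self.unacked = {c: [] for c in consumers}

    def deliver(self):
        """Push ready messages to every consumer below its limit."""
        for inflight in self.unacked.values():
            while self.ready and len(inflight) < self.prefetch:
                inflight.append(self.ready.popleft())

    def ack(self, consumer, message):
        """Acknowledge a message, which frees a slot for the next one."""
        self.unacked[consumer].remove(message)
        self.deliver()

    def in_flight(self):
        return sum(len(m) for m in self.unacked.values())


q2 = ToyQueue(["m1", "m2", "m3", "m4"], consumers=["c1", "c2"], prefetch=1)
q2.deliver()
print(q2.in_flight())   # two consumers at prefetch 1: two messages in flight
q2.ack("c1", "m1")
print(q2.unacked)       # c1 now holds m3, c2 still holds m2
```

In this model the cap on concurrent Q2 work is simply (number of Q2 consumers) x (prefetch), which is why the redefined problem needs no extra coordination.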
Try the Single Active Consumer feature, available from RabbitMQ 3.8 onwards.
Single active consumer allows to have only one consumer at a time consuming from a queue and to fail over to another registered consumer in case the active one is cancelled or dies.
Consuming with only one consumer is useful when messages must be consumed and processed in the same order they arrive in the queue.
Single active consumer can be enabled when declaring a queue, with the x-single-active-consumer argument set to true
https://www.rabbitmq.com/consumers.html#single-active-consumer
e.g. with the Java client:
I have one direct exchange. There is also one queue, bound to this exchange.
I have two consumers for that queue. The consumers are manually ack'ing the messages once they've done the corresponding processing.
The messages are logically ordered/sorted, and should be processed in that order. Is it possible to enforce that all messages are received and processed sequentially across consumer A and consumer B? In other words, can I prevent A and B from processing messages at the same time?
Note: the consumers are not sharing the same connection and/or channel. This means I cannot use <channel>.basicQos(1);.
Rationale of this question: both consumers are identical. If one goes down, the other consumer starts processing messages and everything keeps working without any required intervention.
One approach to handling failover in a case where you want redundant consumers but need to process messages in a specific order is to use the exclusive consumer option when setting up the bind to the queue, and to have two consumers who keep trying to bind even when they can't get the exclusive lock.
The process is something like this:
Consumer A starts first and binds to the queue as an exclusive consumer. Consumer A begins processing messages from the queue.
Consumer B starts next and attempts to bind to the queue as an exclusive consumer, but is rejected because the queue already has an exclusive consumer.
On a recurring basis, consumer B attempts to get an exclusive bind on the queue but is rejected.
Process hosting consumer A crashes.
Consumer B attempts to bind to the queue as an exclusive consumer, and succeeds this time. Consumer B starts processing messages from the queue.
Consumer A is brought back online, and attempts an exclusive bind, but is rejected now.
Consumer B continues to process messages in FIFO order.
While this approach doesn't provide load sharing, it does provide redundancy.
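The failover sequence above can be walked through with a small in-memory model. This is a sketch under assumptions: `ExclusiveQueue`, `try_consume` and `release` are hypothetical names that mimic the broker accepting or rejecting an exclusive consumer; they are not client library calls.

```python
class ExclusiveQueue:
    """Toy model of a queue that admits one exclusive consumer."""

    def __init__(self):
        self.owner = None

    def try_consume(self, consumer):
        """Return True if `consumer` holds (or obtains) exclusive access."""
        if self.owner is None:
            self.owner = consumer
        return self.owner == consumer

    def release(self, consumer):
        """Called when the owner's connection closes or its process dies."""
        if self.owner == consumer:
            self.owner = None


queue = ExclusiveQueue()
print(queue.try_consume("A"))  # A starts first and is accepted
print(queue.try_consume("B"))  # B is rejected and must retry later
queue.release("A")             # the process hosting A crashes
print(queue.try_consume("B"))  # B's next retry succeeds
print(queue.try_consume("A"))  # A comes back and is now the one rejected
```

The retry loop lives in the standby consumer, so the failover delay is bounded by how often it retries.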
Even though this is already answered, maybe this can help others.
RabbitMQ has a feature known as Single Active Consumer, which matches your case.
We can have N consumers attached to a queue, but only 1 (one) of them will be actively consuming messages from it. Fail-over happens only when the active consumer fails.
Kindly take a look at the link https://www.rabbitmq.com/consumers.html#single-active-consumer
Thank you
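For contrast with the exclusive-consumer approach, the behaviour described here can be modeled as below. It is a sketch with hypothetical names (`SingleActiveQueue`, `register`, `cancel`), and it assumes the longest-registered consumer takes over on failover, which is a simplification for illustration.

```python
class SingleActiveQueue:
    """Toy model: every consumer may register, but only one is active."""

    def __init__(self):
        self.consumers = []  # kept in registration order

    def register(self, consumer):
        self.consumers.append(consumer)

    def cancel(self, consumer):
        """The consumer was cancelled or its connection died."""
        self.consumers.remove(consumer)

    @property
    def active(self):
        # Assumption: the longest-registered consumer is the active one.
        return self.consumers[0] if self.consumers else None


queue = SingleActiveQueue()
queue.register("A")
queue.register("B")    # accepted, but idle: no rejection and no retry loop
print(queue.active)    # A
queue.cancel("A")
print(queue.active)    # B takes over
```

Unlike the exclusive option, the standby consumer is registered up front, so it does not need to poll for its turn.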
Usually the point of an MQ system is to distribute workload. Of course, there are some situations where processing of message N depends on the result of processing message N-1, or even on message N-1 itself.
If A and B can't process messages at the same time, then why not just have A or just B? As I see it, you are not saving anything with having 2 consumers in a way that one can work only when the other one is not...
In your case, it would be best to have one consumer but to actually do the parallelisation (not a word really) on the processing part.
Just to add that RabbitMQ distributes messages to all consumers of a queue in round-robin fashion, regardless of any other criteria. Note that "fair dispatch", where a busy consumer is not sent more work, requires setting the prefetch count to 1; by default the broker applies no prefetch limit. More info on that here, look for "fair dispatch".
I have a scenario in my RabbitMQ setup that I'm curious about how to solve. The diagram below illustrates it (exchanges and most queues removed for succinctness):
Scenario
Producer creates message A(1), it is received by the top consumer, which begins processing the message.
Producer creates message A(2), it is received by the bottom consumer (assuming both consumers are on a round-robin exchange).
The bottom consumer publishes message B(2), which is put into Message B consumer's queue
The poor slow top consumer finally finishes and emits its message B(1).
Problem
If we assume that the B consumer cannot be made idempotent, how do we ensure that the results of both B messages are applied in the correct order?
I had thought of using a timestamp that is applied to the initial publish of message A, and having the consumer maintain a timestamp of last change, rejecting any timestamps before that time, but that only works if each message causes the exact same kind of change and requires a lot of tracking.
Other ideas for how to approach this would be appreciated. Thanks!
I am not sure what is specific to RabbitMQ here, but the idea with timestamps sounds like a good start if you have a single producer.
The producer attaches a timestamp to each message A, and each message B takes the timestamp of its respective message A.
With your approach some messages would not be processed, e.g. message B(1). If all messages should be processed by consumer B, but in a deterministic order, then you can do a deterministic merge:
Consumer B is equipped with two queues, one queue for each consumer A. Consumer B always checks the top of both queues:
if both queues are non-empty, consumer B pops the message with the lowest timestamp.
if at least one queue is empty, the consumer B waits.
With this approach the order in which consumer B processes messages is given by the timestamps of the producer and no message is discarded. Assumptions are:
queues are FIFO
no process crashes
always the case that eventually each consumer A processes a message
consumer B can check the top of the queues in a non-blocking fashion
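A minimal sketch of the deterministic merge described above, under the listed assumptions. The names are hypothetical: `merge_step` is one non-blocking check by consumer B, messages are `(timestamp, payload)` tuples, and two deques stand in for the per-consumer-A queues.

```python
from collections import deque


def merge_step(q1, q2):
    """Return the next message for consumer B, or None if B must wait.

    B may only pop when both heads are visible; otherwise an earlier
    timestamp could still arrive on the empty queue.
    """
    if not q1 or not q2:
        return None
    source = q1 if q1[0][0] <= q2[0][0] else q2
    return source.popleft()


top, bottom = deque(), deque()
bottom.append((2, "B(2)"))      # the fast consumer publishes first
print(merge_step(top, bottom))  # None: B waits for the slow consumer
top.append((1, "B(1)"))         # the slow consumer finally publishes
print(merge_step(top, bottom))  # the message with timestamp 1 goes first
print(merge_step(top, bottom))  # None again: B(2) waits for top's next message
```

The last call shows the cost of this scheme: B(2) is held back until the top queue produces something, which is why the approach depends on every consumer A eventually processing a message.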
When using RabbitMQ as Message Broker, I have a scenario where multiple concurrent consumers pull messages from a Queue using the basic.get AMQP method and use explicit acknowledgement for deleting the message from the Queue. Assuming the following setup
Q has messages M1, M2, M3 and has consumers C1, C2 and C3 (each having its own connection and channel) connected to it.
How is concurrency handled in the basic.get method? Is the call to basic.get method synchronized to handle concurrent consumers each using its own connection and channel? C1, C2 and C3 issue a basic.get call to receive a message at the same time (assume the server receives all 3 requests simultaneously).
C1 requests a message using basic.get and gets M1. When C2 requests a message, since it's using a different connection, does it get M1 again?
How can consumers pull messages in batches of a predefined size?
Your questions really hit at the heart of queuing and process theory, so I will answer from that standpoint (RabbitMQ is really a generic message broker as far as my answers are concerned, as this applies to any message broker).
How is concurrency handled in the basic.get method? Is the call to
basic.get method synchronized to handle concurrent consumers each
using its own connection and channel? C1, C2 and C3 issue a basic.get
call to receive a message at the same time (assume the server receives
all 3 requests simultaneously).
Answer 1: RabbitMQ is designed to be a reliable message broker. It contains internal processes and controls to ensure that the same message does not get passed out multiple times to different consumers. Now, due to the impracticality of testing the scenario that you describe, does it work perfectly? Who knows. That is why properly-designed applications using message-based architecture will use idempotent transactions, such that if the same transaction is processed multiple times, the result will be the same as if the transaction was processed once.
Takeaway: Design your application so that the answer to this question is unimportant.
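One common way to follow this takeaway is to deduplicate on a message identifier, so that a redelivered message has no additional effect. The sketch below assumes every message carries a unique ID chosen by the producer; `IdempotentHandler` and its fields are hypothetical names, and a real system would keep the seen-ID set in durable storage.

```python
class IdempotentHandler:
    """Applies each message at most once, keyed by its message ID."""

    def __init__(self):
        self.seen = set()   # assumption: in memory here, durable in practice
        self.balance = 0

    def handle(self, message_id, amount):
        """Return True if the message was applied, False if a duplicate."""
        if message_id in self.seen:
            return False
        self.seen.add(message_id)
        self.balance += amount
        return True


handler = IdempotentHandler()
handler.handle("M1", 10)
handler.handle("M1", 10)   # duplicate delivery of M1 is ignored
handler.handle("M2", 5)
print(handler.balance)
```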
C1 requests a message using basic.get and gets M1. When C2 requests
a message, since it's using a different connection, does it get M1
again?
Answer 2: No. Subject to the assumptions of my previous answer, the RabbitMQ broker will not serve the same message back once it has been delivered. Depending on the settings of the channel and queue, the message may be automatically acknowledged upon delivery and will never be redelivered. Other settings will have the message requeue automatically upon the "death" of the processing thread/channel or a negative acknowledgment from your processing thread. This is important functionality, since a "poison" message could repeatedly wreak havoc in your application if it could be served up to multiple consumers. Takeaway: you may safely rely on this assumption in designing your application.
How can consumers pull messages in batches of a predefined size?
Answer: They can't, nor would it make sense for them to. In any queuing system, the fundamental assumption is that items are removed from the queue in single file. Attempts to violate this assumption result in unpredictable behavior; furthermore, single-piece flow is commonly the most efficient method of processing. However, in the real world, there are cases where batch sizes > 1 are necessary. In such cases, it makes sense to load the batch into its own single message, so this may require a separate processing thread that pulls messages from the queue and batches them together, or put them in batches initially. Keep in mind that once you have multiple consumers, there is no possible way to guarantee single messages will be processed in order. Takeaway: Batching should be avoided wherever possible, but where it is not practical to avoid, you may not assume that batches will contain individual messages in any particular order.
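The "separate processing thread that pulls messages and batches them together" could be as simple as the sketch below. `batches` is a hypothetical helper: it takes messages one at a time from any iterable and groups them into lists that would then be published as single batch messages.

```python
from itertools import islice


def batches(messages, size):
    """Group a stream of single messages into lists of at most `size`."""
    it = iter(messages)
    # islice pulls messages one by one; an empty list means the stream ended.
    while batch := list(islice(it, size)):
        yield batch


for batch in batches(["M1", "M2", "M3", "M4", "M5"], size=2):
    print(batch)  # each list would be published as one batch message
```

The final batch may be smaller than `size`; whether to publish it immediately or hold it for more messages is a design choice this sketch leaves open.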
You might want to read the RabbitMQ API guide and the introduction to AMQP.
First of all, avoid consuming messages using basicGet in your consumers. Use basicConsume with the Consumer interface instead. This allows RabbitMQ to push messages to you as they arrive on the queue. Anything else is a waste of resources here, as it boils down to busy polling.
When using basicConsume, RabbitMQ will even push you more messages in the background, up to a certain prefetch count. This allows you to process multiple messages concurrently, and it minimizes the time you need to wait for your next message (if one is available).
Concurrency is not an issue at all, that's what you're using a queue for!
When having multiple consumers on one queue, a message will always only be delivered to one consumer (as long as the message is ACKed). Otherwise you need private queues for each consumer and route your messages accordingly.
Btw, if you're able to share the connection among your consumers, you should do so.
Just make sure to use one channel per thread.
There is no special configuration required for that scenario. Each client will atomically fetch and receive one message from the queue, just as you would like to happen.
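The "each message goes to exactly one consumer" behaviour can be demonstrated in miniature with Python's thread-safe `queue.Queue` standing in for the broker. This is an analogy rather than RabbitMQ code: `run_consumers` is a hypothetical helper, and there are no acks or redeliveries in this model.

```python
import queue
import threading


def run_consumers(messages, n_consumers=3):
    """Let several consumer threads compete for messages on one queue."""
    q = queue.Queue()
    for m in messages:
        q.put(m)
    received = {f"C{i + 1}": [] for i in range(n_consumers)}

    def consume(name):
        while True:
            try:
                msg = q.get_nowait()  # atomic: one caller gets each message
            except queue.Empty:
                return
            received[name].append(msg)

    threads = [threading.Thread(target=consume, args=(n,)) for n in received]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


result = run_consumers(["M1", "M2", "M3"])
print(result)  # which consumer got which message varies from run to run
```

The split between consumers is not deterministic, but no message is handed out twice and none is lost.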
I have trouble understanding routing in RabbitMQ. Suppose I have several producers (let's call them clients) that publish messages to a queue. E.g., clients A, B, and C send messages to queue X1.
Suppose the consumer responds to every message by sending its response back to the same queue. E.g., the consumer gets a message from queue X1, does something, and sends the response to queue X1.
How can client A determine which messages in queue X1 are addressed to it and which are addressed to clients B or C?
I can't declare one queue per connection because of large number of connections expected (~10^6). So I'm in trouble here. Any suggestions? Thanks.
I think you need to look at the RPC tutorial. From your description it sounds like that is what you want to do. However that would probably require you to declare more queues than you want.
Approaching this a different way. I cannot understand why you would send a reply back to the producer not only by the same exchange but the same queue that the consumers are consuming from.
Would it not make sense to have producers P1, P2 and P3 send to exchange X1 with routing keys "abc.aaa.xyz" / "abc.bbb.xyz" / "abc.ccc.xyz"? Then have queues Q1, Q2 and Q3 bound to X1 with binding keys "*.aaa.*" / "*.bbb.*" / "*.ccc.*", or just Q1 with binding key "abc.*.xyz" (I am unclear on exactly what you want, so I am just making some suggestions). These queues are consumed by consumers C1, C2 and C3.
When the Consumer has finished processing the message then it will send a message to X2, with routing key that identifies itself. The producers will consume from queues bound to X2.
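To see how a binding key such as "abc.*.xyz" selects routing keys on a topic exchange, here is a small matcher. It is a sketch of the AMQP topic rules ('*' matches exactly one word, '#' matches zero or more words); `topic_matches` is a hypothetical helper, not a function of any client library.

```python
def topic_matches(binding_key, routing_key):
    """Return True if `routing_key` matches `binding_key` under topic rules."""

    def match(pattern, words):
        if not pattern:
            return not words
        head, rest = pattern[0], pattern[1:]
        if head == "#":
            # '#' may absorb any number of words, including none.
            return any(match(rest, words[i:]) for i in range(len(words) + 1))
        if not words:
            return False
        return (head == "*" or head == words[0]) and match(rest, words[1:])

    return match(binding_key.split("."), routing_key.split("."))


for key in ("abc.aaa.xyz", "abc.bbb.xyz", "abc.ccc.xyz"):
    print(key, topic_matches("abc.*.xyz", key))
```

A single queue bound with "abc.*.xyz" therefore receives the messages of all three producers, while a narrower binding key per queue splits them up.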
The point I am trying to make is that you do not want more than one consumer reading from a queue. There is only one case in which you want that and that is a task queue. I am not clear on your use case so you may want a task queue. If you do then you should still not have your producers reading from the same task queue as your consumers. Aside from task queues you should have one consumer read from one queue. You may have many queues to one exchange and even many bindings from one queue to one exchange.
I hope this helps