At the moment we have a number of publishers (micro-services) which publish their messages to an exchange. Each message has a serviceId attribute. The queue is connected to a single subscriber (micro-service) which processes the queue messages. Processing a single message is a costly operation (it takes about 20-30 secs).
Currently we have the following situation: service A publishes ~200 messages, and a few seconds later service B publishes 2 messages. The subscriber will process these 2 messages only after the first 200 have been processed.
We want to process the messages in the order they came to the queue, but with respect to the source serviceId.
The obvious solution is to split the queue into separate queues (one per publisher) and subscribe to each queue separately, but the number of publishers can change, so we would need to discover them dynamically and subscribe to (or unsubscribe from) their queues.
Another approach is to replicate our subscriber app to have a one-to-one relationship between publisher and subscriber, but this will require more system resources.
What would be the best approach to handle this situation?
Thanks!
/!\ Be careful, publishers publish to an exchange, not to a queue.
We want to process the messages in the order they came to the queue,
but with respect to the source serviceId.
If I understand correctly, you want to load balance your messages according to a serviceId, and serviceIds are not known in advance.
The solution I would suggest here is to have a direct exchange, with routing keys such as xxxxx.<serviceId>. Then, you can bind one queue per serviceId (that is: one queue for service A, one for service B, ...), with each consumer consuming from all queues.
Then you have to handle the publisher subscription: I would make a publisher publish a "hello" message. This message is consumed by each consumer, which in turn binds a new queue for that service (using xxxxx.<newServiceId>) and finally publishes a response back (so that the publisher can start sending messages).
Note: each service queue is shared by all consumers, which results in the worker configuration (see this tutorial).
Hope this helps.
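To illustrate the effect of one queue per serviceId, here is a minimal in-memory sketch in Python. It is not RabbitMQ client code: the class PerServiceQueues and its methods are hypothetical stand-ins, and it assumes the consumer takes one message from each service queue in turn.

```python
from collections import OrderedDict, deque


class PerServiceQueues:
    """In-memory stand-in for 'one queue per serviceId' (not a RabbitMQ client)."""

    def __init__(self):
        # serviceId -> FIFO of messages for that service
        self._queues = OrderedDict()

    def publish(self, service_id, body):
        # A queue appears the first time a serviceId is seen, mimicking the
        # "hello" step where a queue is bound for a new publisher.
        self._queues.setdefault(service_id, deque()).append(body)

    def drain_round_robin(self):
        """Yield (service_id, body), taking one message per service per pass."""
        while any(self._queues.values()):
            for service_id, queue in list(self._queues.items()):
                if queue:
                    yield service_id, queue.popleft()


broker = PerServiceQueues()
for i in range(200):
    broker.publish("A", f"a-{i}")
for i in range(2):
    broker.publish("B", f"b-{i}")

order = list(broker.drain_round_robin())
positions_of_b = [i for i, (sid, _) in enumerate(order) if sid == "B"]
print(positions_of_b)
```

In this sketch service B's two messages come out at positions 1 and 3 instead of after all 200 messages from service A, while the order within each service is preserved.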
Requirement
A system undergoes some state change, and multiple other parts of the system (let's call them observers) have to know about it so that they can perform some actions based on the current state. The actions of the observers are important: if some of the observers are not online (not listening currently due to some trouble, but will be back soon), the message should not be discarded until all the observers get the message.
I am trying to accomplish this with a pub/sub model. Here are my findings (please correct me if this understanding is wrong):
The publisher creates an event on a specific topic, and multiple subscribers can consume the same message. This model either provides no delivery guarantee (in Redis), or delivery is guaranteed once (with messaging queues), i.e. when one of the consumers acknowledges a message, the message is discarded (RabbitMQ).
Example
A new Person Profile entity gets created in DB
Now,
A background verification service has to know this to trigger the verification process.
Subscriptions service has to know this to add default subscriptions to the user.
Now both the tasks are important, unrelated and can run in parallel.
Now, in the queue model, if the subscription service is down for some reason and the BG verification process acknowledges the message, the message will be removed from the queue. If it is fire-and-forget, like most pub/sub, delivery is not guaranteed for either service anyway.
One more point: the two tasks are unrelated and need not be triggered one after the other.
In short, I need to make sure all the consumers get the same message and can acknowledge it individually; the message should be evicted only after all the consumers have acknowledged it. Neither of the above approaches does this.
Am I missing anything here? How should I approach this problem?
This scenario is explicitly supported by RabbitMQ's model, which separates "exchanges" from "queues":
A publisher always sends a message to an "exchange", which is just a stateless routing address; it doesn't need to know what queue(s) the message should end up in
A consumer always reads messages from a "queue", which contains its own copy of messages, regardless of where they originated
Multiple consumers can subscribe to the same queue, and each message will be delivered to exactly one consumer
Crucially, an exchange can route the same message to multiple queues, and each will receive a copy of the message
The key thing to understand here is that while we talk about consumers "subscribing" to a queue, the "subscription" part of a "pub-sub" setup is actually the routing from the exchange to the queue.
So a RabbitMQ pub-sub system might look like this:
A new Person Profile entity gets created in DB
This event is published as a message to an "events" topic exchange with a routing key of "entity.profile.created"
The exchange routes copies of the message to multiple queues:
A "verification_service" queue has been bound to this exchange to receive a copy of all messages matching "entity.profile.#"
A "subscription_setup_service" queue has been bound to this exchange to receive a copy of all messages matching "entity.profile.created"
The consuming scripts don't know anything about this routing, they just know that messages will appear in the queue for events that are relevant to them:
The verification service picks up the copy of the message on the "verification_service" queue, processes, and acknowledges it
The subscription setup service picks up the copy of the message on the "subscription_setup_service" queue, processes, and acknowledges it
If there are multiple consuming scripts looking at the same queue, they'll share the messages on that queue between them, but still completely independent of any other queue.
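The routing described above can be sketched with a small in-memory model in Python. This is not RabbitMQ client code: TopicExchange, bind, publish and ack are hypothetical names, and the matcher assumes the usual topic rules ('*' matches exactly one word, '#' matches zero or more words).

```python
from collections import defaultdict


def topic_matches(pattern, routing_key):
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""
    p, k = pattern.split("."), routing_key.split(".")

    def match(i, j):
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # Let '#' absorb 0..n of the remaining words.
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j < len(k) and p[i] in ("*", k[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


class TopicExchange:
    """Toy exchange: copies each message into every queue whose binding matches."""

    def __init__(self):
        self.bindings = []               # (pattern, queue_name)
        self.queues = defaultdict(list)  # queue_name -> that queue's own copies

    def bind(self, queue_name, pattern):
        self.bindings.append((pattern, queue_name))

    def publish(self, routing_key, body):
        targets = {q for pat, q in self.bindings if topic_matches(pat, routing_key)}
        for queue_name in targets:
            self.queues[queue_name].append(body)

    def ack(self, queue_name):
        # Acknowledging removes the copy from this queue only.
        return self.queues[queue_name].pop(0)


events = TopicExchange()
events.bind("verification_service", "entity.profile.#")
events.bind("subscription_setup_service", "entity.profile.created")
events.publish("entity.profile.created", "profile 1 created")
events.ack("verification_service")
print(events.queues["subscription_setup_service"])
```

After the verification service acknowledges its copy, the copy on the subscription_setup_service queue is still there, waiting for that service to come back and acknowledge it on its own.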
Here's a screenshot from this interactive visualisation tool that shows this scenario:
As you mentioned, this is not something you can control with the Redis Pub/Sub data structure.
But you can do it easily with Redis Streams.
Streams will allow you to post messages using the XADD command and then control which consumers are dealing with the message and acknowledge that message has been processed.
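As a rough illustration of the consumer-group idea, here is a toy in-memory model in Python. It is not a Redis client: the Stream class is hypothetical, its method names only echo the XADD, XREADGROUP, XACK and XPENDING commands, entry IDs are simplified to integers, and each group is assumed to start reading from the beginning of the stream.

```python
class Stream:
    """Toy model of a stream with consumer groups (not a Redis client)."""

    def __init__(self):
        self.entries = []  # append-only list of (entry_id, fields)
        self.groups = {}   # group -> {"next": index, "pending": {entry_id: consumer}}

    def xadd(self, fields):
        entry_id = len(self.entries) + 1  # simplified ID; real IDs look different
        self.entries.append((entry_id, fields))
        return entry_id

    def xgroup_create(self, group):
        self.groups[group] = {"next": 0, "pending": {}}

    def xreadgroup(self, group, consumer):
        """Hand the group's next undelivered entry to `consumer`, or None."""
        state = self.groups[group]
        if state["next"] >= len(self.entries):
            return None
        entry_id, fields = self.entries[state["next"]]
        state["next"] += 1
        state["pending"][entry_id] = consumer  # delivered but not yet acked
        return entry_id, fields

    def xack(self, group, entry_id):
        """Acknowledge for one group only; other groups are unaffected."""
        return self.groups[group]["pending"].pop(entry_id, None) is not None

    def xpending(self, group):
        return sorted(self.groups[group]["pending"])


stream = Stream()
stream.xgroup_create("verification")
stream.xgroup_create("subscriptions")
stream.xadd({"event": "profile.created", "id": "1"})

# The verification group reads and acknowledges while subscriptions is "down".
entry_id, _ = stream.xreadgroup("verification", "worker-1")
stream.xack("verification", entry_id)

# When the subscriptions group comes back, the entry is still there for it.
print(stream.xreadgroup("subscriptions", "worker-1"))
```

Each group keeps its own read position and its own pending list, so one group's acknowledgement does not hide the entry from another group. Note that in Redis an acknowledgement does not delete the entry from the stream; trimming is a separate concern.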
You can look at these sample applications, which provide examples (in Java) of:
posting and consuming messages
creating multiple consumer groups
managing exceptions
Links:
Getting Started with Redis Streams and Java
Redis Streams in Action ( Project that shows how to use ADD/ACK/PENDING/CLAIM and build an error proof streaming application with Redis Streams and SpringData )
I have the following scenario:
One Producer service
A dynamic number of consumer services
Messages contain tasks for a specific product, so once consumer x handles a message for product y, x should handle all future messages for product y. Ideally the producer service should send all messages for product y to a queue which only consumer x reads from.
In order to divide the workload evenly, there should be a way for the next available consumer to take over a new product once it needs to be managed (I suppose via a queue which all consumers read from).
My approach:
An exchange send new product jobs in a "newProduct" queue to which all the consumers are consuming from.
The consumer y that reads such a message notifies to the producer service (on a separate queue) that he is now in charge of product x.
The producer then sends all messages for product x to a queue proper to consumer y.
When a new consumer service z goes online, it notifies the producer service on a therefore specific queue that he is online such that the producer can create a binding in the exchange for z's proper queue.
Questions:
Is my approach a good way to solve the problem, or am I missing RabbitMQ features that would solve it in a less complicated way?
How do I add a new queue to the exchange at runtime?
An exchange send new product jobs in a "newProduct" queue to which all
the consumers are consuming from.
This looks good to me.
The consumer y that reads such a message notifies to the producer
service (on a separate queue) that he is now in charge of product x.
This is also fine. I guess if the producer does not receive a notification that product X is taken care of, it will need to do something.
The producer then sends all messages for product x to a queue proper to
consumer y.
I'd send all messages for product X with the same routing key, like product-X, which is probably what you mean here. I'd avoid telling the producer who exactly handles product-X now. For better separation of concerns and simplicity, producers should know as little as possible about consumers and their queues, and vice versa.
When a new consumer service z goes online, it notifies the producer
service on a therefore specific queue that he is online such that the
producer can create a binding in the exchange for z's proper queue.
You could do it this way, but I'd do it differently:
When a consumer comes online, it creates the queues it needs (or subscribes to existing queues) by itself.
I see it like this:
Consumer comes online and subscribes to newProduct queue.
When it receives a message to handle product Z, it:
Creates a new queue for itself with binding key product-Z
Notifies producer that product Z is now being handled
Producer starts to send messages with routing key product-Z and they end up in Consumer's queue.
Make sure your consumer has some high availability; otherwise you may end up in a situation where your consumer has started handling some of the messages and then died, while the producer continues to send messages for the now unhandled product.
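The "creates a new queue for itself with binding key product-Z" step could look roughly like the following Python sketch. It assumes the channel object behaves like a pika 1.x BlockingChannel (exchange_declare, queue_declare and queue_bind with these keyword arguments); the exchange name "products", the queue naming scheme and the RecordingChannel stand-in are hypothetical and only there so the sketch runs without a broker.

```python
def claim_product(channel, product_id, exchange="products"):
    """Declare a queue for one product and bind it to the exchange at runtime.

    `channel` is assumed to behave like a pika 1.x BlockingChannel; the
    exchange name and the queue naming scheme are made up for illustration.
    """
    routing_key = f"product-{product_id}"
    queue_name = f"consumer.{routing_key}"
    channel.exchange_declare(exchange=exchange, exchange_type="direct")
    channel.queue_declare(queue=queue_name, durable=True)
    # The binding is what makes messages with this routing key reach the queue.
    channel.queue_bind(queue=queue_name, exchange=exchange, routing_key=routing_key)
    return queue_name


class RecordingChannel:
    """Stand-in that records calls instead of talking to a broker."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(**kwargs):
            self.calls.append((name, kwargs))
        return record


channel = RecordingChannel()
queue = claim_product(channel, "Z")
print(queue, [name for name, _ in channel.calls])
```

With a real connection the consumer would then start consuming from the returned queue and notify the producer that product Z is being handled; the producer only needs to know the routing key, not the queue.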
We're seeing an issue where consumers of our message queues are picking up messages from queues at the top of the alphabetical range. We have two applications: a producer, and a subscriber. We're using RabbitMQ 3.6.1.
Let's say that the message queues are set up like so:
Our first application, the producer, puts say 100 messages/second onto each queue:
Our second application, the subscriber, has five unique consumer methods that can deal with messages on each respective queue. Each method binds to its respective queue. A subscriber has a prefetch of 1, meaning it can only hold one message at a time, regardless of queue. We may run numerous instances of the subscriber like so:
So the situation is thus: each queue is receiving 100 msg/sec, and we have four instances of subscriber consuming these messages, so each queue has four consumers. Let's say that the consumer methods can deal with 25 msg/sec each.
What happens is that instead of all the queues being consumed equally, the alphabetically higher queues get priority. It seems as though when the subscriber becomes ready, RabbitMQ looks down the list of queues that this particular ready channel is bound to, and picks the first queue with pending messages.
In our situation, A_QUEUE will have every message consumed. B_QUEUE may have some consumed in certain race conditions, but C_QUEUE/D_QUEUE and especially E_QUEUE will rarely get touched.
If we turn off the publisher, the queues will eventually drain, top to bottom.
Is it possible to configure either RabbitMQ itself or possibly even the channel to use some sort of round robin distribution policy or maybe even random policy so that when a channel has numerous bound queues, all with messages pending, the distribution is even?
To clarify: you have a single subscriber application with multiple consumers in it, right?
I'm guessing you're using a single RabbitMQ Connection within the subscriber app.
Are you also re-using a single RabbitMQ Channel for all of your consumers? If so, that would be a problem. Be sure to use a new Channel for each consumer you start.
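A minimal Python sketch of "a new Channel for each consumer" follows. It assumes the connection and channel objects behave like pika 1.x ones (connection.channel(), channel.basic_qos(prefetch_count=...), channel.basic_consume(queue=..., on_message_callback=...)); the FakeConnection and FakeChannel classes and the handler are hypothetical stand-ins so the sketch runs without a broker.

```python
def start_consumers(connection, handlers):
    """Open one channel per queue, each with its own prefetch of 1.

    `connection` is assumed to behave like a pika 1.x BlockingConnection;
    `handlers` maps queue name -> callback(channel, method, properties, body).
    """
    channels = {}
    for queue_name, callback in handlers.items():
        channel = connection.channel()       # dedicated channel for this consumer
        channel.basic_qos(prefetch_count=1)  # limit applies to this channel only
        channel.basic_consume(queue=queue_name, on_message_callback=callback)
        channels[queue_name] = channel
    return channels


class FakeChannel:
    """Records what was configured instead of talking to a broker."""

    def __init__(self):
        self.prefetch_count = None
        self.consumed_queues = []

    def basic_qos(self, prefetch_count=0):
        self.prefetch_count = prefetch_count

    def basic_consume(self, queue, on_message_callback):
        self.consumed_queues.append(queue)


class FakeConnection:
    def channel(self):
        return FakeChannel()


def handle(channel, method, properties, body):
    pass  # placeholder for the real consumer method


queue_names = ["A_QUEUE", "B_QUEUE", "C_QUEUE", "D_QUEUE", "E_QUEUE"]
channels = start_consumers(FakeConnection(), {name: handle for name in queue_names})
print({name: ch.prefetch_count for name, ch in channels.items()})
```

One consequence of this layout is that the prefetch limit is applied per channel, so a subscriber instance can hold one unacknowledged message per queue rather than one in total.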
Maybe the picture is wrong, but if it's not then your setup is wrong. You don't need 4 queues if you are going to have subscribers that listen to each and every queue. You'd just need one queue, that has multiple instances of the same subscriber consuming from it.
Now to answer: yes (but there is no need to configure anything, as long as prefetch is 1), RabbitMQ actually does distribute messages evenly. You can read about that here, along with how your setup should look. Here is a quote from the link.
RabbitMQ just dispatches a message when the message enters the queue.
It doesn't look at the number of unacknowledged messages for a
consumer. It just blindly dispatches every n-th message to the n-th
consumer.
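The behaviour in the quote, dispatching every n-th message to the n-th consumer of a single queue, can be pictured with a few lines of Python. This is only a model of the described policy, not broker code; dispatch_round_robin and the consumer names are hypothetical.

```python
from itertools import cycle


def dispatch_round_robin(messages, consumers):
    """Assign messages to consumers in strict rotation, ignoring their load."""
    assignments = {name: [] for name in consumers}
    for message, name in zip(messages, cycle(consumers)):
        assignments[name].append(message)
    return assignments


result = dispatch_round_robin(list(range(8)), ["c1", "c2", "c3", "c4"])
print(result)
```

With eight messages and four consumers, each consumer ends up with two messages regardless of how long each one takes to process.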
I've defined one topic exchange (alarms) and multiple queues, each with its own routing key:
allAlarms, with routing key alarms.#: I want this to be used for receiving all alarms in a monitoring application
alarms_[deviceID], with routing key alarms.[deviceID], where the number of devices can vary at any given time
When sending an alarm from the device, I publish it using the routing key alarms.[deviceID]. The monitoring app, however, only consumes from the allAlarms queue. This leads to the following problem:
The messages in the allAlarms queue have been consumed, while the messages in the remaining queues are ready. Is there a better way of handling messages from multiple consumers? Ideally, I'd like to be able to also send commands back to the devices using the same queues where the devices publish their alarms.
It looks like you have consumers bound to the allAlarms queue but not to any of the alarms_[deviceID] queues.
In AMQP, a single consumer is bound to a single queue by name (and each queue can have multiple consumers bound to it). Messages are delivered to the consumers of a queue in round robin, such that for a given message in a queue there is exactly one consumer that will receive it. That is, a single consumer does not listen to multiple queues; to read from several queues you have to start a consumer on each of them.
Since you're using a topic exchange, you're correctly routing a single message to multiple queues via the routing key and queue bindings. This means that you can have a consumer for each queue and when a message is delivered to the exchange, each queue will get a copy of the message and each queue will deliver the message to exactly one consumer on each queue.
Thus, if allAlarms is consuming messages, it's because it has a consumer attached to the queue. If any of the alarms_[deviceID] are not consuming messages then they must not have consumers bound to those individual queues. You have to start up consumers for each alarms_[deviceID] by name. That will allow you to also have different consumer logic for different queues.
One last thing:
Ideally, I'd like to be able to also send commands back to the devices using the same queues where the devices publish their alarms.
You don't want to do this using the same queue because there's nothing that will stop the non-device consumers on the queue from picking up those messages.
I believe you're describing RPC over RabbitMQ. For that you will want to publish the messages to the alarms exchange with a reply-to header whose value is the name of a temporary queue. This temp queue is a single-use queue that the consumer will publish to when it's done, in order to communicate back to the device. The device will publish to the alarms exchange and then immediately start listening on the temp queue for a response from the consumer.
For more info on RPC over RabbitMQ check out this tutorial.
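The reply-to flow described above can be sketched in memory with Python. This is not RabbitMQ client code: the queues dictionary stands in for the broker, the function names and the reply queue naming scheme are hypothetical, and a correlation_id is added on the assumption that the device wants to match replies to its requests.

```python
import uuid
from collections import defaultdict, deque

# queue name -> messages; an in-memory stand-in for the broker
queues = defaultdict(deque)


def device_send_alarm(device_id, text):
    """Device side: publish an alarm that names a temporary reply queue."""
    reply_queue = f"reply.{device_id}.{uuid.uuid4().hex}"
    correlation_id = uuid.uuid4().hex
    queues["allAlarms"].append(
        {"body": text, "reply_to": reply_queue, "correlation_id": correlation_id}
    )
    return reply_queue, correlation_id


def monitor_handle_one(command):
    """Monitoring side: consume one alarm and answer on its reply queue."""
    alarm = queues["allAlarms"].popleft()
    queues[alarm["reply_to"]].append(
        {"body": command, "correlation_id": alarm["correlation_id"]}
    )


def device_read_reply(reply_queue, correlation_id):
    """Device side: read the response from its own temporary queue."""
    reply = queues[reply_queue].popleft()
    if reply["correlation_id"] != correlation_id:
        raise ValueError("reply does not belong to this request")
    return reply["body"]


reply_queue, correlation_id = device_send_alarm("42", "temperature too high")
monitor_handle_one("reduce load")
print(device_read_reply(reply_queue, correlation_id))
```

Because the command travels on a queue that only the device reads, no other consumer of the alarm queues can pick it up.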
I don't think you need any of the per-device queues (the alarms_[deviceID] queues).
You don't have any consumer code set up on these queues, and the messages are backed up and waiting for you to consume them.
You also haven't mentioned a need to consume messages from these queues. Instead, you are only consuming messages from the allAlarms queue.
Therefore, I would drop all of the alarms_[deviceID] queues and only have the allAlarms queue.
Just publish the alarms through your exchange and route them all to the allAlarms queue and be done with it. No need for any other routing or queues.
I have a RabbitMQ cluster used as a work queue. There are 5 kinds of consumers that want to consume exactly the same data.
What I know for now is to use a fanout exchange to "copy" the data to 5 DIFFERENT queues, so that each of the 5 consumers can consume from a different queue. This is kind of wasting resources because the data is the same in all five queues.
My question is: does RabbitMQ support pushing the same data to multiple consumers? For example, by requiring a message to be acked a specified number of times before it is deleted.
I got the following answer from the RabbitMQ mailing list. In short, the answer is no, and what I did above is the correct way.
http://rabbitmq.1065348.n5.nabble.com/Does-rabbitmq-support-to-push-the-same-data-to-multi-consumers-td36169.html#a36170
... fanout exchange to "copy" the data to 5 DIFFERENT queues. And the 5 consumers can consume different queue. This is kind of wasting resources because the data is the same in five queues.
You can consume with 5 consumers from one queue if you do not want to duplicate messages.
does rabbitmq support to push the same data to multiple consumers
In AMQP protocol terms, you publish a message to an exchange and then the broker (RabbitMQ) decides what to do with it: assume it has figured out which queue or queues the message is intended for, it then appends the message to each of those queues (queues in RabbitMQ are classic FIFO queues, which somewhat breaks the AMQP implementation in RabbitMQ). Only after that may the message be delivered to a consumer (or die due to a queue length limit or a per-queue or per-message TTL, if any).
message need to be acked for a specified times to be deleted
There is no way to change the message body or attributes after the message has been published (actually, the Dead Letter Exchanges extension and some others may change the routing key, for example, and add, remove or change some headers, but this is a very specific case). So if you want to track the number of acks, you have to re-publish the consumed message with a changed body or header (depending on where you plan to store the ack counter, but headers fit pretty nicely for this).
Also note that there is a redelivered message attribute which denotes whether the message was already consumed but then redelivered. This flag doesn't count the number of redeliveries, so its usage is quite limited.
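The idea of tracking acks by re-publishing the message with a counter in a header could be sketched like this in Python. The message is modelled as a plain dictionary rather than a real AMQP message, and the header name x-ack-count and the function name are hypothetical. The sketch only counts acknowledgements; it does not check that they come from distinct consumers.

```python
ACK_HEADER = "x-ack-count"  # hypothetical header name used to carry the counter


def ack_and_maybe_republish(message, required_acks):
    """Count one ack; return the message to re-publish, or None when done."""
    headers = message.get("headers", {})
    count = headers.get(ACK_HEADER, 0) + 1
    if count >= required_acks:
        return None  # enough acks were counted, so nothing is re-published
    # Published messages cannot be modified, so build a new one to publish.
    return {"body": message["body"], "headers": {**headers, ACK_HEADER: count}}


message = {"body": "same data for everyone", "headers": {}}
republished = 0
while message is not None:
    message = ack_and_maybe_republish(message, required_acks=5)
    if message is not None:
        republished += 1
print(republished)
```

With five required acks the message is re-published four times and dropped after the fifth acknowledgement.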