I am trying to build a system where I need to select next available and suitable consumer to send a message from a queue (or may be any other solution not using the queue)
Requirements
We have multiple publishers/clients who would send objects (images) to process on one side and multiple Analysts who would process them, once processed the publisher should get the corresponding response.
The publishers do not care which Analyst is going to process the data.
Users have a web app where they can map each client/publisher to one or more or all agents, say for instance if Publisher P1 is mapped to Agents A & B, all objects coming from P1 can be processed by Agent A or Agent B. Note: an object can only be processed by one agent only.
Depending on the mapping I should have a middleware which consumes the messages from all publishers and distributes to the agents
Solution 1
My initial thoughts were to have a queue where all publishers post their messages. Another queue where Agents publish message saying they are waiting to process an object.
A middleware picks the message, gets the possible list of agents it can send the message to (from cached database) and go through the agents queue to find the next suitable and available agent and publish the message to that agent.
The issue with this solution is if I have agents queue like a,b,c,d and the message I receive can only be processed by agent b I will be rejecting agents d & c and they would end up at the tail of the queue and I have around 180 agents so they might never be picked or if the next message can only be processed by agent d (for example) we have to reject all the agents to get there
Solution 2
First bit from publishers to middleware is still the same
Have a scaled fast nosql database where agents add a record to notify there availability. Basically a key value pair
The middleware gets config from cache and gets the next available + suitable agent from the nosql database sends message to the agent's queue (through direct exchange) and updates the nosql to set isavailable false ad gets the next message.
Issue with this solution is the db and middleware can become a bottleneck, also if I scale the middleware I will end up in database concurrency issues for example f I have two copies of middleware running and each recieves a message which can be proceesed by Agents A & B and both agents are available.
The two middleware copies would query the db and might get A as availble and end up sneding both messages to A while B is still waiting for a message to process.
I will have around 100 publishers and 180 agents to start with.
Any ideas how to improve these solutions or any other feasible solution would be highly appreciated?
Depending on this I also need to figure out how the Agent would send response back to the publisher.
Thank you
I'll answer this from the perspective the perspective of my open-source service bus: Shuttle.Esb
Typically one would ignore any content-based routing and simply have a distributor pattern. All message go to the primary endpoint and it will distribute the messages. However, if you decide to stick to these logical groupings you could have primary endpoints for each logical grouping (per agent group). You would still have the primary endpoint but instead of having worker endpoints mapped to agents you would have agent groupings map to the logical primary endpoint with workers backing that.
Then in the primary endpoint you would, based on your content (being the agent identifier), forward the message to the relevant logical primary endpoint. All the while you keep track of the original sender. In the worker you would then send a message back to the queue of the original sender.
I'm sure you could do pretty much the same using any service bus.
I see several requirements in here, that can be boiled down to a few things, I think:
publisher does not care which agent processes the image
publisher needs to know when the image processing is done
agent can only process 1 image at a time
agent can only process certain images
are these assumptions correct? did I miss anything important?
if not, then your solution is pretty much built into RabbitMQ with routing and queues. there should be no need to build custom middle-tier service to manage this.
With RabbitMQ, you can have a consumer set to only process 1 message at a time. The consumer sets it's "prefetch" limit to 1, and retrieves a message from the queue with "no ack" set to false - meaning, it must acknowledge the message when it is done processing it.
To consume only messages that a particular agent can handle, use RabbitMQ's routing capabilities with multiple queues. The queues would be created based on the type of image or some other criteria by which the consumers can select images.
For example, if there are two types of images: TypeA and TypeB, you would have 2 queues - one for TypeA and one for TypeB.
Then, if Agent1 can only handle TypeA images, it would only consume from the TypeA queue. If Agent2 can handle both types of images, it would consume from both queues.
To put the right images in the right queue, the publisher would need to use the right routing key. If you know if the image type (or whatever the selection criteria is), you would change the routing key on the publisher side to match that selection criteria. The routing in RabbitMQ would be set up to move messages for TypeA into the TypeA queue, etc.
The last part is getting a response on when the image is done processing. That can be accomplished through RabbitMQ's "reply to" field and related code. The gist of it is that the publisher has it's own exclusive queue. When it publishes a message, it includes the name of it's exclusive queue in the "reply to" header of the message. When the agent finishes processing the image, it sends a status update message back through the queue found in the "reply to" header. That status update message tells the producer the status of the request.
From a RabbitMQ perspective, these pieces can be put together using the examples and documentation found here:
http://www.rabbitmq.com/getstarted.html
Look at these specifically:
Work Queues: http://www.rabbitmq.com/tutorials/tutorial-two-python.html
Topics: http://www.rabbitmq.com/tutorials/tutorial-five-python.html
RPC (aka Request/Response): http://www.rabbitmq.com/tutorials/tutorial-six-python.html
You'll find examples in many languages, in these docs.
I also cover most of these scenarios (and others) in my RabbitMQ Patterns eBook
Since the total number of senders and receivers are only hundreds, how about to create one queue for each of your senders. Based on your sender receiver mapping, receivers subscribes to the sender queues (update the subscribing on mapping changes). You could configure your receiver to only receive the next message from all the queues it subscribes (in a random way) when it finishes processing one message.
Related
Requirement
A system undergoes some state change, and multiple other parts of the system has to know this(lets call them observers) so that they can perform some actions based on the current state, the actions of the observers are important, if some of the observers are not online(not listening currently due to some trouble, but will be back soon), the message should not be discarded till all the observers gets the message.
Trying to accomplish this with pub/sub model, here are my findings, (please correct if this understanding is wrong) -
The publisher creates an event on specific topic, and multiple subscribers can consume the same message. This model either provides no delivery guarantee(in redis), or delivery is guaranteed once(with messaging queues), ie. when one of the consumer acknowledges a message, the message is discarded(rabbitmq).
Example
A new Person Profile entity gets created in DB
Now,
A background verification service has to know this to trigger the verification process.
Subscriptions service has to know this to add default subscriptions to the user.
Now both the tasks are important, unrelated and can run in parallel.
Now In Queue model, if subscription service is down for some reason, a BG verification process acknowledges the message, the message will be removed from the queue, or if it is fire and forget like most of pub/sub, the delivery is anyhow not guaranteed for both the services.
One more point is both the tasks are unrelated and need not be triggered one after other.
In short, my need is to make sure all the consumers gets the same message and they should be able to acknowledge them individually, the message should be evicted only after all the consumers acknowledged it either of the above approaches doesn't do this.
Anything I am missing here ? How should I approach this problem ?
This scenario is explicitly supported by RabbitMQ's model, which separates "exchanges" from "queues":
A publisher always sends a message to an "exchange", which is just a stateless routing address; it doesn't need to know what queue(s) the message should end up in
A consumer always reads messages from a "queue", which contains its own copy of messages, regardless of where they originated
Multiple consumers can subscribe to the same queue, and each message will be delivered to exactly one consumer
Crucially, an exchange can route the same message to multiple queues, and each will receive a copy of the message
The key thing to understand here is that while we talk about consumers "subscribing" to a queue, the "subscription" part of a "pub-sub" setup is actually the routing from the exchange to the queue.
So a RabbitMQ pub-sub system might look like this:
A new Person Profile entity gets created in DB
This event is published as a message to an "events" topic exchange with a routing key of "entity.profile.created"
The exchange routes copies of the message to multiple queues:
A "verification_service" queue has been bound to this exchange to receive a copy of all messages matching "entity.profile.#"
A "subscription_setup_service" queue has been bound to this exchange to receive a copy of all messages matching "entity.profile.created"
The consuming scripts don't know anything about this routing, they just know that messages will appear in the queue for events that are relevant to them:
The verification service picks up the copy of the message on the "verification_service" queue, processes, and acknowledges it
The subscription setup service picks up the copy of the message on the "subscription_setup_service" queue, processes, and acknowledges it
If there are multiple consuming scripts looking at the same queue, they'll share the messages on that queue between them, but still completely independent of any other queue.
Here's a screenshot from this interactive visualisation tool that shows this scenario:
As you mentioned it is not something that you can control with Redis Pub/Sub data structure.
But you can do it easily with Redis Streams.
Streams will allow you to post messages using the XADD command and then control which consumers are dealing with the message and acknowledge that message has been processed.
You can look at these sample application that provides (in Java) example about:
posting and consuming messages
create multiple consumer groups
manage exceptions
Links:
Getting Started with Redis Streams and Java
Redis Streams in Action ( Project that shows how to use ADD/ACK/PENDING/CLAIM and build an error proof streaming application with Redis Streams and SpringData )
I have a question about multi consumer concurrency.
I want to send works to rabbitmq that comes from web request to distributed queues.
I just want to be sure about order of works in multiple queues (FIFO).
Because this request comes from different users eech user requests/works must be ordered.
I have found this feature with different names on Azure ServiceBus and ActiveMQ message grouping.
Is there any way to do this in pretty RabbitMQ ?
I want to quaranty that customer's requests must be ordered each other.
Each customer may have multiple requests but those requests for that customer must be processed in order.
I desire to process quickly incoming requests with using multiple consumer on different nodes.
For example different customers 1 to 1000 send requests over 1 millions.
If I put this huge request in only one queue it takes a lot of time to consume. So I want to share this process load between n (5) node. For customer X 's requests must be in same sequence for processing
When working with event-based systems, and especially when using multiple producers and/or consumers, it is important to come to terms with the fact that there usually is no such thing as a guaranteed order of events. And to get a robust system, it is also wise to design the system so the message handlers are idempotent; they should tolerate to get the same message twice (or more).
There are way to many things that may (and actually should be allowed to) interfere with the order;
The producers may deliver the messages in a slightly different pace
One producer might miss an ack (due to a missed package) and will resend the message
One consumer may get and process a message, but the ack is lost on the way back, so the message is delivered twice (to another consumer).
Some other service that your handlers depend on might be down, so that you have to reject the message.
That being said, there is one pattern that servicebus-systems like NServicebus use to enforce the order messages are consumed. There are some requirements:
You will need a centralized storage (like a sql-server or document store) that allows for conditional updates; for instance you want to be able to store the sequence number of the last processed message (or how far you have come in the process), but only if the already stored sequence/progress is the right/expected one. Storing the user-id and the progress even for millions of customers should be a very easy operation for most databases.
You make sure the queue is configured with a dead-letter-queue/exchange for retries, and then set your original queue as a dead-letter-queue for that one again.
You set a TTL (for instance 30 seconds) on the retry/dead-letter-queue. This way the messages that appear on the dead-letter-queue will automatically be pushed back to your original queue after some timeout.
When processing your messages you check your storage/database if you are in the right state to handle the message (i.e. the needed previous steps are already done).
If you are ok to handle it you do and update the storage (conditionally!).
If not - you nack the message, so that it is thrown on the dead-letter queue. Basically you are saying "nah - I can't handle this message, there are probably some other message in the queue that should be handled first".
This way the happy-path is to process a great number of messages in the right order.
But if something happens and a you get a message out of band, you will throw it on the retry-queue (the dead-letter-queue) and Rabbit will make sure it will get back in the queue to be retried at a later stage. But only after a delay.
The beauty of this is that you are able to handle most of the situations that may interfere with processing the message (out of order messages, dependent services being down, your handler being shut down in the middle of handling the message) in exact the same way; by rejecting the message and letting your infrastructure (Rabbit) take care of it being retried after a while.
(Assuming the OP is asking about things like ActiveMQs "message grouping:)
This isn't currently built in to RabbitMQ AFAIK (it wasn't as of 2013 as per this answer) and I'm not aware of it now (though I haven't kept up lately).
However, RabbitMQ's model of exchanges and queues is very flexible - exchanges and queues can be easily created dynamically (this can be done in other messaging systems but, for example, if you read ActiveMQ documentation or Red Hat AMQ documentation you'll find all of the examples in the user guides are using pre-declared queues in configuration files loaded at system startup - except for RPC-like request/response communication).
Also it is very easy in RabbitMQ for a consumer (i.e., message consuming thread) to consume from multiple queues.
So you could build, on top of RabbitMQ, a system where you got your desired grouping semantics.
One way would be to create dynamic queues: The first time a customer order was seen or a new group of customer orders a queue would be created with a unique name for all messages for that group - that queue name would be communicated (via another queue) to a consumer who's sole purpose was to load-balance among other consumers that were responsible for handling customer order groups. I.e., the load-balancer would pull off of its queue a message saying "new group with queue name XYZ" and it would find in a pool of order group consumer a consumer which could take this load and pass it a message saying "start listening to XYZ".
Another way to do it is with pub/sub and topic routing - each customer order group would get a unique topic - and proceed as above.
RabbitMQ Consistent Hash Exchange Type
We are using RabbitMQ and we have found a plugin. It use Consistent Hashing algorithm to distribute messages in order to consistent keys.
For more information about Consistent Hashing ;
https://en.wikipedia.org/wiki/Consistent_hashing
https://www.youtube.com/watch?v=viaNG1zyx1g
You can find this plugin from rabbitmq web page
plugin : rabbitmq_consistent_hash_exchange
https://www.rabbitmq.com/plugins.html
I'm working with RabbitMQ and I want on the server side to conduct a calculation each time an Exchange receives a message.
I have a queue for ratings and when too many bad reviews (let's say more than ten) received, then a consumer should be notified.
What options are there for serverside logic ?
I've been reading about Spring RabbitMQ, but am not sure ?
There isn't really a "server side" with a message-based system; rather, the RabbitMQ service sits somewhere and relays messages to and from any number of producers and consumers. Depending on the hardware you have available, and the amount of processing being performed, these could all be on the same server, or you could have resources dedicated to each task.
Calculations based on the content of messages is the job of consumers, which can be written in any language you feel comfortable writing them in, as long as you use a serialization of the message that all can understand (e.g. JSON, XML). For a simple counter, you may not need much framework to extract the data you need.
Any number of Queues can receive copies of messages from the same Exchange, so you can either pick up all messages from the exchange and count only the bad reviews, or you can put the rating into the "routing key" and use a "topic exchange" to pre-filter them.
After that, you could use a simple memory store like Redis to store a counter, and when it reaches the limit, either act on it within that consumer, or publish a message to a new exchange for processing by a different consumer.
The undelying use case
It is typical pubsub use case: Consider we have M news sources, and there are N subscribers who subscribe to the desired news sources, and who want to get news updates. However, we want these updates to land up in mongodb - essentially maintain most recent 'k' updates (and can be indexed and searched etc.). We want to design for M to scale upto million publishers, N to scale to few millions.
Subscribers' updates are finally received and stored in more than one hosts and their native mongodbs.
Modeling in rabbitmq
Rabbitmq will be used to persist the mappings (who subscribes to which news source).
I have setup a pubsub system in this way: We create publisher exchanges (each mapping to one news source) and of type 'fanout'.
For modelling subscribers, there are two options.
In the first option, have one queue for each subscriber bound to relevant publisher exchanges. And let the client process open connections to all these subscriber queues and receive the updates (and persist them to mongodb). Note that in this option, when the client is restarted, it has to manage list of all susbcribers, and open connections to all subscriber queues it is responsible for.
In the second option, we want to be able to remove overhead of having to explicitly open on each user queue upon startup. Instead, we want to listen to only one queue - representative of all subscribers who will send updates to this client host.
For achieving this, we first create one exchange for each subscriber and let it bind to the publisher exchange(s) that it follows. We let a single queue for each client, and let the subscriber exchange bind to this queue (type=direct) if the subscriber belongs to that client.
Once the client receives the update message, it should come to know which subscriber exchange it came from. Only then we can add it to mongodb for relevant subscriber. Presumably the subscriber exchange should add this information as a new header on the message.
As per rabbitmq docs, I believe there is no way to get achieve this. (Or more specifically, to get the 'delivery path' property from the delivered message, from which we can get this information).
My questions:
Is it possible to add a new header to message as it passes through exchange?
If this is not possible, then can we achieve it through custom exchange and relevant plugin? Any plugin that I can readily use for this purpose?
I am curious as to why rabbitmq is not providing delivery path property as an optional configuration?
Is there any other way I can achieve the same? (See pubsubhubbub note below)
PubSubHubBub
The use case is very similar to what pubsubhubbub protocol provides for. And there is rabbitmq plugin too called rabbithub. However, our system will be a closed system, and I believe that the webhook approach of the protocol is going to be too much of overhead compared to listening on single queue (and from performance perspective.)
The producer (RMQ Client) of the message should add all the required headers (including the originator's identity) before producing (publishing) it on RMQ. These headers are used for routing.
If, while in transit, the message (including headers) needs to be transformed (e.g. adding new headers), it needs to be sent to the transformer (another RMQ Client). This transformer will essentially become the new publisher.
The actual consumer should receive its intended messages (for which it has subscribed to) through single queue. The routing of all its subscribed messages should be arranged on the RMQ Exchange.
Managing the last 'K' updates should neither be the responsibility of the producer nor the consumer. So, it should be done in the transformer. Producers' messages should be routed to this transformer (for storage) before further re-routing to exchange(s) from where consumers consume.
In our scenario I'm thinking of using the pub sub technique. However I don't know which is the better option.
1 ########
A web service of ours will publish a message that something has happened when it is called externally, ExternalPersonCreatedMessage!
This message will contain a field that represents the destinations to process the message into (multiple allowed).
Various subscribers will subscribe. These subscribers will filter the message to see if any action is required by checking the destination field.
2 ########
A web service of ours will parse the incoming call and publish specific types of messages depending on the destinations supplied in the field. i.e. many Destination[n]PersonCreatedMessage messages would be created.
Subscribers will subscribe to only the specific message they care for. i.e. not having to filter any messages
QUESTIONS
Which of the above is the better option and why? And how do I stop myself from making RequestMessages. From what I've read/seen I should be trying to structure this in a way of PersonCreated, PersonDeleted i.e. SOMETHING HAS HAPPENED and NOT in the REQUEST SOMETHING TO HAPPEN form such as CreatePerson or DeletePerson
Are my thoughts correct? I've been looking for guidance on how to structure messages and making sure I don't go down a wrong path but have found no guidance out there on do's and dont's. Can any one help and guide? I want to try and get this correct from the off :)
Based on the integration scenario in the referenced article, it appears to me that you may need a Saga to complete the workflow of accept message -> operate on message -> send confirmation. In the case that the confirmation is sent immediately after the operation, you could use NSBs message handler pipeline feature which allows you to chain handlers in a specified sequence such as...
First<FilterHandler>.Then<DoWorkHandler>().AndThen<SendConfirmationHandler>();
In terms of the content filtering, you can do this although you incur some transport overhead, meaning the queue will have to accept the message and the process will always call the first handler on every message(you can short-circuit the above pipeline at any point). It may be the case that what you really want is a Distributor/Worker setup where all Workers are the same and you can handle some load.
If you truly have different endpoints with completely different logic, then I would have the Publisher process(only accepts and Publishes message) do the work of translating the inbound message to something else a Subscriber can then be interested in. If then you find that a given Published message only ever has 1 Subscriber, then you don't need to Publish at all, you need to just Bus.Send() to the correct endpoint.
The way NServiceBus handles pub-sub is more like your option two.
A publisher service has an input queue and a subscription store.
A subscriber service has an input queue
The subscriber, on start-up will send a subscription message to the input queue of the publisher
The subscription message contains the type of message subscriber is interested in and the subscribers queue address
The publisher records the subscription in the subscription store.
The publisher receives a message.
The publisher evaluates the message type against the list of subscriptions
For each match found the publisher sends the message to the queue address.
In my opinion, you should stop thinking about destinations. Messages are messages. They should not have any inherent destination information in them. The subscription mechanism defines the addressing/routing requirements for the solution.