We have rabbitmq consumer where we get image urls from where we:
download the image through this url.
resize the image to our standard aspect ratio
upload on our server at the path specified by the client.
The input to this consumer is:
{
"imageUrl": "xyz.jpg", //url of the existing image
"path": "/stock/123", //server path where we need to store this image,
"team": "teamA" //team that sends this image.
}
The consumer first downloads the images, upload this to the server at given path. Post this, we need to store this image in our database to indicate its ready to be used i.e. in this case, we will save it in mysql through "teamA-stored-procedure".
We have teams from "teamA" to "teamZ" and every team has different database table where they dump the image information i.e. its host-url, its path etc.
We have put lots of if-elseif checks in the consumer i.e. if "module" is "teamA", we will hit "teamA-StoredProcedure"...similarly for teamB-teamZ.
To remove these checks, we can:
Instead of calling respective team's stored procedure in the consumer, we can expose APIs for each team and the API URL can be passed as input to the consumer. Once, image download/resize/upload is done, it hits the respective team's API.
Create a Topic based exchange and host separate consumer for each team. Each consumer would listen to respective team's messages only. This looks like a perfect use case for "topic based exchange" but there is the overhead of hosting multiple consumers instead of one.
Is it perfect use case for using "Topic based Exchange" OR using "Direct exchange" with single consumer and delegating rest to respective API is fine?
Related
I am trying to develop a game with a pub/sub mechanism, so for each client-map I have a Notify Update channel. The client must subscribe to that to receive updates. But this subscribe must be conditioned by player's Area Of Interest. This interest is given by a rectangle which is based on avatar coordinates.
I am also pretty new to the RabbitMq but I understand that the content-based filtering can be provided by the headers exchange. How can I build a custom exchange that will redirect the message to the specific queue or what other strategy/technology can I choose?
For the Orange Live Objects, I want to "filter" the messages coming from a certain "profile" and sends them to a MQTT queue.
For messages with another profile, I would like to send them to a different MQTT queue.
It seems that I can use the Routing Key logic for this, although all examples are based on DevEUI as example (a single sensor) and not on a sensor type (which makes much more sense as you would like to decode your messages per sensor_type iso sensor.
Has anyone already tried if the Routing Key could work with selecting on "profile" level?
for now it's not possible, but with the new version of Live Objects announced for next july, it will be possible to build groups of Lora devices and route corresponding messages in different queues
I am trying to build a system where I need to select next available and suitable consumer to send a message from a queue (or may be any other solution not using the queue)
Requirements
We have multiple publishers/clients who would send objects (images) to process on one side and multiple Analysts who would process them, once processed the publisher should get the corresponding response.
The publishers do not care which Analyst is going to process the data.
Users have a web app where they can map each client/publisher to one or more or all agents, say for instance if Publisher P1 is mapped to Agents A & B, all objects coming from P1 can be processed by Agent A or Agent B. Note: an object can only be processed by one agent only.
Depending on the mapping I should have a middleware which consumes the messages from all publishers and distributes to the agents
Solution 1
My initial thoughts were to have a queue where all publishers post their messages. Another queue where Agents publish message saying they are waiting to process an object.
A middleware picks the message, gets the possible list of agents it can send the message to (from cached database) and go through the agents queue to find the next suitable and available agent and publish the message to that agent.
The issue with this solution is if I have agents queue like a,b,c,d and the message I receive can only be processed by agent b I will be rejecting agents d & c and they would end up at the tail of the queue and I have around 180 agents so they might never be picked or if the next message can only be processed by agent d (for example) we have to reject all the agents to get there
Solution 2
First bit from publishers to middleware is still the same
Have a scaled fast nosql database where agents add a record to notify there availability. Basically a key value pair
The middleware gets config from cache and gets the next available + suitable agent from the nosql database sends message to the agent's queue (through direct exchange) and updates the nosql to set isavailable false ad gets the next message.
Issue with this solution is the db and middleware can become a bottleneck, also if I scale the middleware I will end up in database concurrency issues for example f I have two copies of middleware running and each recieves a message which can be proceesed by Agents A & B and both agents are available.
The two middleware copies would query the db and might get A as availble and end up sneding both messages to A while B is still waiting for a message to process.
I will have around 100 publishers and 180 agents to start with.
Any ideas how to improve these solutions or any other feasible solution would be highly appreciated?
Depending on this I also need to figure out how the Agent would send response back to the publisher.
Thank you
I'll answer this from the perspective the perspective of my open-source service bus: Shuttle.Esb
Typically one would ignore any content-based routing and simply have a distributor pattern. All message go to the primary endpoint and it will distribute the messages. However, if you decide to stick to these logical groupings you could have primary endpoints for each logical grouping (per agent group). You would still have the primary endpoint but instead of having worker endpoints mapped to agents you would have agent groupings map to the logical primary endpoint with workers backing that.
Then in the primary endpoint you would, based on your content (being the agent identifier), forward the message to the relevant logical primary endpoint. All the while you keep track of the original sender. In the worker you would then send a message back to the queue of the original sender.
I'm sure you could do pretty much the same using any service bus.
I see several requirements in here, that can be boiled down to a few things, I think:
publisher does not care which agent processes the image
publisher needs to know when the image processing is done
agent can only process 1 image at a time
agent can only process certain images
are these assumptions correct? did I miss anything important?
if not, then your solution is pretty much built into RabbitMQ with routing and queues. there should be no need to build custom middle-tier service to manage this.
With RabbitMQ, you can have a consumer set to only process 1 message at a time. The consumer sets it's "prefetch" limit to 1, and retrieves a message from the queue with "no ack" set to false - meaning, it must acknowledge the message when it is done processing it.
To consume only messages that a particular agent can handle, use RabbitMQ's routing capabilities with multiple queues. The queues would be created based on the type of image or some other criteria by which the consumers can select images.
For example, if there are two types of images: TypeA and TypeB, you would have 2 queues - one for TypeA and one for TypeB.
Then, if Agent1 can only handle TypeA images, it would only consume from the TypeA queue. If Agent2 can handle both types of images, it would consume from both queues.
To put the right images in the right queue, the publisher would need to use the right routing key. If you know if the image type (or whatever the selection criteria is), you would change the routing key on the publisher side to match that selection criteria. The routing in RabbitMQ would be set up to move messages for TypeA into the TypeA queue, etc.
The last part is getting a response on when the image is done processing. That can be accomplished through RabbitMQ's "reply to" field and related code. The gist of it is that the publisher has it's own exclusive queue. When it publishes a message, it includes the name of it's exclusive queue in the "reply to" header of the message. When the agent finishes processing the image, it sends a status update message back through the queue found in the "reply to" header. That status update message tells the producer the status of the request.
From a RabbitMQ perspective, these pieces can be put together using the examples and documentation found here:
http://www.rabbitmq.com/getstarted.html
Look at these specifically:
Work Queues: http://www.rabbitmq.com/tutorials/tutorial-two-python.html
Topics: http://www.rabbitmq.com/tutorials/tutorial-five-python.html
RPC (aka Request/Response): http://www.rabbitmq.com/tutorials/tutorial-six-python.html
You'll find examples in many languages, in these docs.
I also cover most of these scenarios (and others) in my RabbitMQ Patterns eBook
Since the total number of senders and receivers are only hundreds, how about to create one queue for each of your senders. Based on your sender receiver mapping, receivers subscribes to the sender queues (update the subscribing on mapping changes). You could configure your receiver to only receive the next message from all the queues it subscribes (in a random way) when it finishes processing one message.
The undelying use case
It is typical pubsub use case: Consider we have M news sources, and there are N subscribers who subscribe to the desired news sources, and who want to get news updates. However, we want these updates to land up in mongodb - essentially maintain most recent 'k' updates (and can be indexed and searched etc.). We want to design for M to scale upto million publishers, N to scale to few millions.
Subscribers' updates are finally received and stored in more than one hosts and their native mongodbs.
Modeling in rabbitmq
Rabbitmq will be used to persist the mappings (who subscribes to which news source).
I have setup a pubsub system in this way: We create publisher exchanges (each mapping to one news source) and of type 'fanout'.
For modelling subscribers, there are two options.
In the first option, have one queue for each subscriber bound to relevant publisher exchanges. And let the client process open connections to all these subscriber queues and receive the updates (and persist them to mongodb). Note that in this option, when the client is restarted, it has to manage list of all susbcribers, and open connections to all subscriber queues it is responsible for.
In the second option, we want to be able to remove overhead of having to explicitly open on each user queue upon startup. Instead, we want to listen to only one queue - representative of all subscribers who will send updates to this client host.
For achieving this, we first create one exchange for each subscriber and let it bind to the publisher exchange(s) that it follows. We let a single queue for each client, and let the subscriber exchange bind to this queue (type=direct) if the subscriber belongs to that client.
Once the client receives the update message, it should come to know which subscriber exchange it came from. Only then we can add it to mongodb for relevant subscriber. Presumably the subscriber exchange should add this information as a new header on the message.
As per rabbitmq docs, I believe there is no way to get achieve this. (Or more specifically, to get the 'delivery path' property from the delivered message, from which we can get this information).
My questions:
Is it possible to add a new header to message as it passes through exchange?
If this is not possible, then can we achieve it through custom exchange and relevant plugin? Any plugin that I can readily use for this purpose?
I am curious as to why rabbitmq is not providing delivery path property as an optional configuration?
Is there any other way I can achieve the same? (See pubsubhubbub note below)
PubSubHubBub
The use case is very similar to what pubsubhubbub protocol provides for. And there is rabbitmq plugin too called rabbithub. However, our system will be a closed system, and I believe that the webhook approach of the protocol is going to be too much of overhead compared to listening on single queue (and from performance perspective.)
The producer (RMQ Client) of the message should add all the required headers (including the originator's identity) before producing (publishing) it on RMQ. These headers are used for routing.
If, while in transit, the message (including headers) needs to be transformed (e.g. adding new headers), it needs to be sent to the transformer (another RMQ Client). This transformer will essentially become the new publisher.
The actual consumer should receive its intended messages (for which it has subscribed to) through single queue. The routing of all its subscribed messages should be arranged on the RMQ Exchange.
Managing the last 'K' updates should neither be the responsibility of the producer nor the consumer. So, it should be done in the transformer. Producers' messages should be routed to this transformer (for storage) before further re-routing to exchange(s) from where consumers consume.
I have some camel routes with mina sockets and jetty websockets. I am able to broadcast a message to all the clients connected to the websocket but how do i send a message to a specific endpoint. How do i maintain a list of all connected clients with a client id as reference so i can route to a specific client. Is that possible? Will i be able to mention a dynamic client in the to URI?
Or maybe i am thinking about this wrong and i need to create topics on active mq and have the clients subscribe to it. That would mean that i create a topic for every websocket client? and route the message to the right topic.
Am i atleast on the right track here, any examples you can point out? Google was not helpful.
The approach you take depends on how sensitive the client information is. The downside of a single topic with selectors is that anyone can subscribe to the topic without a selector and see all the information for everyone - not usually something that you want to do.
A better scheme is to use a message distribution mechanism (set of Camel routes) that act as an intermediary between the websocket clients and the system producing the messages. This mechanism is responsible for distributing messages from a single destination to client-specitic destinations. I have worked on a couple of banking web front-ends that used a similar scheme.
In order for this to work you first generate for each user a distinct token/UUID; this is presented to the user when the session is established (usually through some sort of profile query/message).
It's essential that the UUID can be worked out as a hash of the clientId rather than being stored in a DB, as it will be used all the time and you want to make sure this is worked out quickly.
The user then uses that information to connect to specific topics that use that UUID as a suffix. For example two users subscribing to an orderConfirmation topic would each subscribe to their own version of that topic:
clientA -> orderConfirmation.71jqsd87162iuhw78162wd7168
clientB -> orderConfirmation.76232hdwe7r23j92irjh291e0d
To keep track of "presence", your clients would need to periodically send a heartbeat message containing their clientId to a well-known topic that your distribution mechanism listens on. Clients should not be able to subscribe to this topic for reads (see ActiveMQ Security). The message distribution mechanism needs to keep in memory a data structure that contains the clientId and the time a heartbeat was last seen.
When a message is received by the distribution mechanism, it checks whether the clientID for which it received the message has a "live/present" session, determines the UUID for the client, and broadcasts the message on the appropriate topic.
Over time this will create a large number of topics on your broker that you don't want hanging around when the user has gone away. You can configure ActiveMQ to delete these if they have been inactive for some time.
You definitely do not want to create separate endpoint for each client.
Topic and a subscription with selector is an elegant way to resolve it.
I would say the best one.
You need single topic, which every client would subscribe to with the selector looking like where clientId in ('${myClientId}', 'EVERYONE'). Now when you want to publish a message to specific client, you set a property clientId to the id of this client. If you want to broadcast, you set it to 'EVERYONE'
I hope I understand the problem right...