I have a RabbitMQ queue.
The consumer of this queue is an application that pulls messages off the queue, and inserts them into a database (after some processing).
I want to also be able to use these messages for something else (to send them to another application for storage and other, unrelated processing).
The consumer application is closed source, so I can't open it up and change its behaviour.
I think the best way of achieving my goal would be to mirror the RabbitMQ queue and consume the copy independently, without interfering with the original message flow.
I've looked at RabbitMQ mirroring, but this seems to be designed to operate on two or more nodes in a master/slave configuration.
What I think I want is:
Pre-processor application > rabbit_queue_1 > Normal DB consumer
                          \
                           > rabbit_queue_2 > New independent consumer.
I need both consumers to get all the same messages, so I don't want two applications reading from the same queue, or a new consumer that reads off the queue and then puts the messages back onto it.
Mirroring is a high-availability solution and is inappropriate for what you are asking.
Instead, consider that RabbitMQ separates publishing from consuming: publishers send to an exchange, and consumers read from queues bound to that exchange. If the existing program is publishing to RabbitMQ, simply find out which exchange and routing key feed the current application's queue, and bind your own queue to that exchange with the same key.
Every published message whose routing key matches will flow to all queues bound with that key. Special cases are fanout exchanges, which ignore the routing key and deliver to every bound queue, and topic exchanges, which additionally permit wildcards in binding keys.
Using a direct exchange, your topology is actually:
Pre-processor application > Direct exchange > rabbit_queue_1 > Normal DB consumer
                           (via routing key) \
                                              > rabbit_queue_2 > New independent consumer.
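To make the copying behaviour concrete, here is a minimal in-memory sketch of a direct exchange. It is a toy model, not a RabbitMQ client: the DirectExchange class and the "records" routing key are made up for illustration, and the queues are plain deques.

```python
from collections import defaultdict, deque


class DirectExchange:
    """Toy in-memory model of a direct exchange (not a RabbitMQ client)."""

    def __init__(self):
        # binding key -> list of queues bound with that key
        self._bindings = defaultdict(list)

    def bind(self, queue, binding_key):
        self._bindings[binding_key].append(queue)

    def publish(self, routing_key, body):
        # Every queue whose binding key equals the routing key gets its own copy.
        for queue in self._bindings.get(routing_key, []):
            queue.append(body)


exchange = DirectExchange()
rabbit_queue_1 = deque()  # read by the existing DB consumer
rabbit_queue_2 = deque()  # read by the new independent consumer

# "records" is a hypothetical routing key; use whatever the publisher really uses.
exchange.bind(rabbit_queue_1, "records")
exchange.bind(rabbit_queue_2, "records")

exchange.publish("records", "message A")
exchange.publish("records", "message B")

# Consuming from one queue leaves the other queue's copies untouched.
first_for_db = rabbit_queue_1.popleft()
```

After this runs, rabbit_queue_2 still holds both messages even though the DB consumer has already taken one from rabbit_queue_1, which is the independence the question asks for.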
Requirement
A system undergoes some state change, and multiple other parts of the system (let's call them observers) have to know about it so that they can perform some actions based on the current state. The actions of the observers are important: if some of the observers are not online (not listening currently due to some trouble, but will be back soon), the message should not be discarded until all the observers have received it.
I tried to accomplish this with a pub/sub model. Here are my findings (please correct me if my understanding is wrong):
The publisher creates an event on a specific topic, and multiple subscribers can consume the same message. This model either provides no delivery guarantee (as in Redis), or delivery is guaranteed only once (as with message queues), i.e. when one of the consumers acknowledges a message, the message is discarded (RabbitMQ).
Example
A new Person Profile entity gets created in DB
Now,
A background verification service has to know this to trigger the verification process.
Subscriptions service has to know this to add default subscriptions to the user.
Now both the tasks are important, unrelated and can run in parallel.
Now, in the queue model, if the subscription service is down for some reason and the BG verification process acknowledges the message, the message will be removed from the queue. If it is fire-and-forget like most pub/sub, delivery is not guaranteed for either service anyway.
One more point: the two tasks are unrelated and need not be triggered one after the other.
In short, I need to make sure all the consumers get the same message and can acknowledge it individually; the message should be evicted only after all the consumers have acknowledged it. Neither of the above approaches does this.
Am I missing anything here? How should I approach this problem?
This scenario is explicitly supported by RabbitMQ's model, which separates "exchanges" from "queues":
A publisher always sends a message to an "exchange", which is just a stateless routing address; it doesn't need to know what queue(s) the message should end up in
A consumer always reads messages from a "queue", which contains its own copy of messages, regardless of where they originated
Multiple consumers can subscribe to the same queue, and each message will be delivered to exactly one consumer
Crucially, an exchange can route the same message to multiple queues, and each will receive a copy of the message
The key thing to understand here is that while we talk about consumers "subscribing" to a queue, the "subscription" part of a "pub-sub" setup is actually the routing from the exchange to the queue.
So a RabbitMQ pub-sub system might look like this:
A new Person Profile entity gets created in DB
This event is published as a message to an "events" topic exchange with a routing key of "entity.profile.created"
The exchange routes copies of the message to multiple queues:
A "verification_service" queue has been bound to this exchange to receive a copy of all messages matching "entity.profile.#"
A "subscription_setup_service" queue has been bound to this exchange to receive a copy of all messages matching "entity.profile.created"
The consuming scripts don't know anything about this routing; they just know that messages will appear in the queue for events that are relevant to them:
The verification service picks up the copy of the message on the "verification_service" queue, processes, and acknowledges it
The subscription setup service picks up the copy of the message on the "subscription_setup_service" queue, processes, and acknowledges it
If there are multiple consuming scripts looking at the same queue, they'll share the messages on that queue between them, but still completely independent of any other queue.
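The binding patterns above can be checked with a small sketch of topic matching. This is a hand-written approximation of the topic exchange rules ("*" stands for exactly one dot-separated word, "#" for zero or more words), not code taken from RabbitMQ; the topic_matches and route helpers are names invented for this example.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a topic binding pattern matches a routing key.

    '*' stands for exactly one dot-separated word, '#' for zero or more words.
    """

    def match(p, k):
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words from the key.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


# Queue name -> binding pattern, as described in the scenario above.
bindings = {
    "verification_service": "entity.profile.#",
    "subscription_setup_service": "entity.profile.created",
}


def route(routing_key):
    """Names of the queues that would receive a copy of the message."""
    return sorted(q for q, pattern in bindings.items()
                  if topic_matches(pattern, routing_key))


created = route("entity.profile.created")
updated = route("entity.profile.updated")
```

Here a "created" event reaches both queues, while a hypothetical "entity.profile.updated" event reaches only the verification queue, because only its binding uses the "#" wildcard.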
Here's a screenshot from this interactive visualisation tool that shows this scenario:
As you mentioned, this is not something you can control with Redis Pub/Sub.
But you can do it easily with Redis Streams.
Streams allow you to post messages using the XADD command, control which consumers handle each message, and acknowledge that a message has been processed.
You can look at these sample applications, which provide Java examples of:
posting and consuming messages
creating multiple consumer groups
managing exceptions
Links:
Getting Started with Redis Streams and Java
Redis Streams in Action ( Project that shows how to use ADD/ACK/PENDING/CLAIM and build an error proof streaming application with Redis Streams and SpringData )
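The per-group acknowledgement idea can be sketched without a Redis server. The ToyStream class below is an in-memory model written for this answer, not the Redis client API; its methods only loosely mirror XADD, XGROUP CREATE, XREADGROUP and XACK, and the group names are assumptions borrowed from the question.

```python
class ToyStream:
    """In-memory model of a stream with consumer groups (not a Redis client)."""

    def __init__(self):
        self.entries = []  # list of (entry_id, payload), in arrival order
        self.groups = {}   # group name -> {"next": index, "pending": set of ids}

    def add(self, payload):
        # Loosely mirrors XADD: append an entry and return its id.
        entry_id = len(self.entries) + 1
        self.entries.append((entry_id, payload))
        return entry_id

    def create_group(self, group):
        # Loosely mirrors XGROUP CREATE, starting from the beginning of the stream.
        self.groups[group] = {"next": 0, "pending": set()}

    def read(self, group):
        # Loosely mirrors XREADGROUP: deliver entries this group has not seen
        # yet and remember them as pending until they are acknowledged.
        state = self.groups[group]
        new_entries = self.entries[state["next"]:]
        state["next"] = len(self.entries)
        state["pending"].update(entry_id for entry_id, _ in new_entries)
        return new_entries

    def ack(self, group, entry_id):
        # Loosely mirrors XACK: only this group's pending set is affected.
        self.groups[group]["pending"].discard(entry_id)


stream = ToyStream()
stream.create_group("verification")
stream.create_group("subscriptions")
stream.add({"event": "profile.created", "profile_id": 42})

# The verification service processes and acknowledges its copy.
for entry_id, payload in stream.read("verification"):
    stream.ack("verification", entry_id)

# The subscription service receives the entry but fails before acknowledging.
stream.read("subscriptions")

verification_pending = stream.groups["verification"]["pending"]
subscriptions_pending = stream.groups["subscriptions"]["pending"]
```

The entry stays pending for the "subscriptions" group even though "verification" has acknowledged it, so each group tracks its own progress independently.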
Here's an example:
TYPE : TOPIC
exchange.v1 -> queue.order
exchange.v2 -> queue.log
So when the apps start, they must configure the exchange first, right? And can a single service have only one exchange?
I have one service for logging and one service for ordering. Every process sends its events to the logging service, which then forwards another event, in this case to queue.order.
So is it possible to publish an event from a different exchange? Or am I missing something? Please let me know :(
Exchanges are not tied to “services”, much less in a 1:1 manner.
Exchanges in RabbitMQ are message sinks. Any existing exchanges can be published to by any number of applications (“services”) with adequate permissions.
Exchanges can either be pre-deployed or created automatically by an application. Pre-deployment is usually more common. This may or may not be outside the lifecycle of a single “service”.
Exchanges (depending on type) may also route to any number of queues on the same vhost.
Now, with all of that out of the way..
It is very possible to forward a message from a queue to another exchange: read from queues (stores), publish to exchanges (sinks). This can be done in code or even with a tool like the Shovel plugin - the “correct” approach depends heavily on the semantics you need, just as the choice of routing does.
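As a rough sketch of the "in code" option, the callback below republishes each consumed message to another exchange and acknowledges it afterwards. It follows the callback shape used by the pika client (channel, method, properties, body); RecordingChannel is a hypothetical stand-in so the example runs without a broker, and the "order.created" routing key is an assumption.

```python
from types import SimpleNamespace


def make_forwarder(target_exchange):
    """Build a pika-style on_message_callback that republishes each message."""

    def on_message(channel, method, properties, body):
        channel.basic_publish(
            exchange=target_exchange,
            routing_key=method.routing_key,
            body=body,
            properties=properties,
        )
        # Acknowledge only after the copy has been handed to the channel.
        channel.basic_ack(delivery_tag=method.delivery_tag)

    return on_message


class RecordingChannel:
    """Hypothetical stand-in for a pika channel so the sketch runs offline."""

    def __init__(self):
        self.published = []
        self.acked = []

    def basic_publish(self, exchange, routing_key, body, properties=None):
        self.published.append((exchange, routing_key, body))

    def basic_ack(self, delivery_tag):
        self.acked.append(delivery_tag)


channel = RecordingChannel()
forward = make_forwarder("exchange.v1")

# Simulate one delivery from queue.log; the routing key is made up.
delivery = SimpleNamespace(routing_key="order.created", delivery_tag=1)
forward(channel, delivery, None, b"{}")
```

With a real pika channel, the same callback would be registered with channel.basic_consume(queue="queue.log", on_message_callback=forward), so the logging service reads from its own queue and publishes onward to the exchange that feeds queue.order.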
Personally, I recommend keeping RabbitMQ processing chains to as limited a scope as allowed by the application domain.
I have an application with RabbitMQ at the backend. I want to develop custom third-party analysis code that connects to the application's queues on RabbitMQ and collects data. My concern is that neither the application nor my code should lose any data from RabbitMQ.
If this is possible, how should I configure the RabbitMQ queues? I have administrative access to RabbitMQ.
I hope this does not require changes to the producer's code, because I don't have access to the application code.
Thanks for your help
Change the current exchange/queue mapping to allow for message replication
At the moment, we can simplify the setup as follows: the existing producer sends a message to the existing exchange, which routes the message to a queue, from which the messages are consumed:
[producer-app] ---> existing-exchange ---> existing-queue ---> [existing-consumer]
What you want is the following design, with a new consumer consuming the same messages:
[producer-app] ---> existing-exchange ---> existing-queue ---> [existing-consumer]
                                      \--> new-queue --------> [your-consumer]
You might need to change the configuration of existing-exchange to allow replication of your messages: for example, a fanout exchange copies each message to every bound queue, and a direct exchange copies it to every queue bound with a matching routing key.
Depending on your application it might be quite easy to perform without changes in producer, but you need to be aware of possible pitfalls:
producer might re-declare exchanges/queues/bindings from time to time, and throw exceptions if the current state cannot be changed to match its request (this might happen if you change the exchange's type)
you need to manage the new-queue on your own (preferably from your consumer artifact), as it is going to receive all the messages; in case your consumer shuts down, the queue is not going to disappear unless it is made exclusive or has TTL set
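A minimal sketch of attaching new-queue from the consumer side, written against the channel methods of the pika client (exchange_declare, queue_declare, queue_bind). The attach_new_queue helper, the RecordingChannel stand-in and the "some.key" routing key are assumptions made for this example; with a real broker you would pass a channel obtained from pika.BlockingConnection(...).channel() instead.

```python
def attach_new_queue(channel, exchange, queue, routing_key=""):
    """Bind an extra durable queue to an exchange that already exists."""
    # passive=True only checks that the exchange exists. It does not try to
    # redefine it, so the producer's own declaration is left alone.
    channel.exchange_declare(exchange=exchange, passive=True)
    # durable=True lets the queue definition survive a broker restart.
    channel.queue_declare(queue=queue, durable=True)
    channel.queue_bind(queue=queue, exchange=exchange, routing_key=routing_key)


class RecordingChannel:
    """Hypothetical stand-in that records calls instead of talking to a broker."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any channel method called with keyword arguments is simply recorded.
        def record(**kwargs):
            self.calls.append((name, kwargs))

        return record


channel = RecordingChannel()
attach_new_queue(channel, "existing-exchange", "new-queue", routing_key="some.key")
```

Because the queue is neither exclusive nor given a TTL here, it keeps accumulating messages while your consumer is down, which is exactly the second pitfall above: something has to drain or clean it up.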
The underlying use case
It is a typical pub/sub use case: consider that we have M news sources and N subscribers, who subscribe to the news sources they want and expect to get news updates. However, we want these updates to end up in MongoDB, essentially maintaining the most recent 'k' updates (which can then be indexed, searched, etc.). We want to design for M to scale up to a million publishers and N to scale to a few million.
Subscribers' updates are finally received and stored in more than one hosts and their native mongodbs.
Modeling in rabbitmq
Rabbitmq will be used to persist the mappings (who subscribes to which news source).
I have setup a pubsub system in this way: We create publisher exchanges (each mapping to one news source) and of type 'fanout'.
For modelling subscribers, there are two options.
In the first option, have one queue for each subscriber, bound to the relevant publisher exchanges, and let the client process open connections to all these subscriber queues and receive the updates (and persist them to MongoDB). Note that in this option, when the client is restarted, it has to manage the list of all subscribers and open connections to all the subscriber queues it is responsible for.
In the second option, we want to be able to remove overhead of having to explicitly open on each user queue upon startup. Instead, we want to listen to only one queue - representative of all subscribers who will send updates to this client host.
For achieving this, we first create one exchange for each subscriber and let it bind to the publisher exchange(s) that it follows. We let a single queue for each client, and let the subscriber exchange bind to this queue (type=direct) if the subscriber belongs to that client.
Once the client receives the update message, it should come to know which subscriber exchange it came from. Only then we can add it to mongodb for relevant subscriber. Presumably the subscriber exchange should add this information as a new header on the message.
As per the RabbitMQ docs, I believe there is no way to achieve this (or, more specifically, to get a 'delivery path' property from the delivered message, from which we could get this information).
My questions:
Is it possible to add a new header to message as it passes through exchange?
If this is not possible, then can we achieve it through custom exchange and relevant plugin? Any plugin that I can readily use for this purpose?
I am curious as to why rabbitmq is not providing delivery path property as an optional configuration?
Is there any other way I can achieve the same? (See pubsubhubbub note below)
PubSubHubBub
The use case is very similar to what pubsubhubbub protocol provides for. And there is rabbitmq plugin too called rabbithub. However, our system will be a closed system, and I believe that the webhook approach of the protocol is going to be too much of overhead compared to listening on single queue (and from performance perspective.)
The producer (RMQ Client) of the message should add all the required headers (including the originator's identity) before producing (publishing) it on RMQ. These headers are used for routing.
If, while in transit, the message (including headers) needs to be transformed (e.g. adding new headers), it needs to be sent to the transformer (another RMQ Client). This transformer will essentially become the new publisher.
The actual consumer should receive its intended messages (for which it has subscribed to) through single queue. The routing of all its subscribed messages should be arranged on the RMQ Exchange.
Managing the last 'K' updates should neither be the responsibility of the producer nor the consumer. So, it should be done in the transformer. Producers' messages should be routed to this transformer (for storage) before further re-routing to exchange(s) from where consumers consume.
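A rough sketch of what such a transformer could do with each message: add the subscriber's identity as a header and keep the last K updates per subscriber. It is pure Python with no broker involved. The Transformer class and the "x-subscriber-id" header name are made up for illustration, and it assumes the transformer already knows which subscriber a message belongs to (for example because it consumes from a per-subscriber queue).

```python
from collections import defaultdict, deque


class Transformer:
    """Enriches message headers and keeps the last k updates per subscriber."""

    def __init__(self, k):
        # deque(maxlen=k) silently drops the oldest update once k are stored.
        self.recent = defaultdict(lambda: deque(maxlen=k))

    def handle(self, subscriber_id, headers, body):
        # Copy the headers so the incoming message is not modified in place.
        enriched = dict(headers or {})
        enriched["x-subscriber-id"] = subscriber_id  # hypothetical header name
        self.recent[subscriber_id].append(body)
        # The caller would republish (enriched, body) to the consumer-facing exchange.
        return enriched, body


transformer = Transformer(k=2)
original_headers = {"source": "news-1"}

for update in ["u1", "u2", "u3"]:
    headers, body = transformer.handle("alice", original_headers, update)

last_updates = list(transformer.recent["alice"])
```

The returned pair is what the transformer would publish onward as the new publisher, so the final consumer can read the subscriber identity from the header while listening on a single queue.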
I'm interacting with ActiveMQ via STOMP. I have one application which publishes messages and a second application which subscribes and processes the messages.
If I am writing messages to a queue I can be certain that, if I have two consumers, each message will only be processed once (because when a message is completed it is removed from the queue) - but is this functionality available from a topic?
For example; I have a third application which is a logger. I want the logger to receive each message the publisher emits, but I also want exactly one of two (or three or four etc…) of the processors to receive the message too.
Is this possible?
EDIT
It occurs to me that a good way of doing this would be to have a topic which the publisher writes to, and a queue which the processors listen to, with something pushing every message from the topic onto the queue. Can ActiveMQ do this internally?
You can do this internally in ActiveMQ using Mirrored Queues and also use Virtual Topics for some other advanced routing semantics. If you want to have the option of other EIP type messaging patterns then I'd recommend you look into Apache Camel which provides a whole host of EIP pattern functionality.
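For Virtual Topics, most of the work is in the destination names. The sketch below builds STOMP destination strings using ActiveMQ's default naming convention (topics named VirtualTopic.<name>, consumer queues named Consumer.<group>.VirtualTopic.<name>). The helper functions and the "Events", "logger" and "processors" names are assumptions for this example, and a broker configured with a different convention would need different strings.

```python
def virtual_topic(name):
    """STOMP destination the publisher sends to."""
    return f"/topic/VirtualTopic.{name}"


def consumer_queue(group, name):
    """STOMP destination a group of consumers subscribes to.

    Every group gets its own queue fed with a copy of each message, and
    consumers sharing one queue compete for the messages on it.
    """
    return f"/queue/Consumer.{group}.VirtualTopic.{name}"


publish_to = virtual_topic("Events")
logger_subscribes_to = consumer_queue("logger", "Events")
processors_subscribe_to = consumer_queue("processors", "Events")
```

The logger subscribes alone to its queue and so sees every message, while all processors subscribe to the same "processors" queue and each message goes to exactly one of them.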