Can AMQP messages be sent both to a topic and with a TTL/expiration? - rabbitmq

I'm using RabbitMQ and the amiquip Rust crate to build out several services that will be processing some data in multiple steps. Roughly, this might look like:
Service A ingests data from external source, publishes its results to Topic A
Service B subscribes to Topic A, does some processing, publishes results to Topic B
Service C subscribes to Topic B, does some processing, publishes results to Topic C
Each step along the way, the data are further refined. I will need to be able to shut down different services for maintenance without missing messages that they're reading (e.g., Service B may be taken down briefly, but the messages published by Service A to Topic A must remain in the queue until Service B comes back online). I am okay with setting some TTL/expiration (not sure what the right terminology is for AMQP); for example, if Service B doesn't come back online after 5 minutes, it's okay if messages published to the topic are lost.
Additionally, there may be another service that should also be able to subscribe to a topic without interfering with another service reading it. For example, Service C2 gets a copy of all messages in Topic B and does something with them; every message read by Service C2 is also read by Service C (no stepping on each other's feet).
I don't know the right terminology used here, so I'm at a bit of a loss for what I should be looking for. Is this possible with AMQP & RabbitMQ?
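The requirements above can be sketched with a small in-memory model (this is not amiquip or RabbitMQ client code). It assumes one queue per subscribing service, each bound to the same exchange, with a per-message TTL; in RabbitMQ itself a per-queue message TTL is set with the `x-message-ttl` queue argument, in milliseconds. The `Queue` and `Exchange` classes and the service names are hypothetical, for illustration only.

```python
from collections import deque


class Queue:
    """Toy stand-in for a queue with a per-message TTL (in seconds)."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._items = deque()  # (enqueue_time, body) pairs

    def enqueue(self, body, now):
        self._items.append((now, body))

    def drain(self, now):
        """Return every message that has not yet expired, oldest first."""
        alive = [body for t, body in self._items if now - t < self.ttl_seconds]
        self._items.clear()
        return alive


class Exchange:
    """Toy exchange: every bound queue gets its own copy of each message."""

    def __init__(self):
        self.queues = {}

    def bind(self, name, ttl_seconds):
        self.queues[name] = Queue(ttl_seconds)
        return self.queues[name]

    def publish(self, body, now):
        for queue in self.queues.values():
            queue.enqueue(body, now)


topic_b = Exchange()
queue_c = topic_b.bind("service-c", ttl_seconds=300)
queue_c2 = topic_b.bind("service-c2", ttl_seconds=300)

topic_b.publish("m1", now=0)    # published while both consumers are offline
topic_b.publish("m2", now=200)

print(queue_c2.drain(now=250))  # C2 returns after 250 s: both messages survive
print(queue_c.drain(now=400))   # C returns after 400 s: "m1" has expired
```

Because Service C and Service C2 each own a queue, neither removes messages the other still needs, and a message that waits longer than the TTL is dropped only from the queue of the service that stayed offline.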

Related

Can we define my architecture as an ESB?

I have read many different definitions of ESB (enterprise service bus) and it is not clear for me.
Here is my own definition: an ESB is an architecture, not a tool, that allows heterogeneous applications to communicate with each other through a bus. The particularity of an ESB is that it can have producers and consumers. For example, a producer can send a message to a topic/queue inside the bus, and three consumers who are subscribers will receive the same message, so it avoids point-to-point flows.
The second particularity of the ESB is that it allows managing the security and logs in one place as everything goes inside the ESB.
I've also heard about "routes" that set rules in moving a message (with Talend ESB), but I don't really see the point (if you have any examples I'm interested). And of course, Web services can be created to expose data. These services must be scalable and resistant to "Single Point of Failure".
I created an architecture and would have liked to know if it's an ESB architecture.
(I made a mistake in my drawing: it's not a Queue but a Topic!)
The steps of the process above:
Producer: it listens for changes (update, insert, ...) in different databases and, as soon as there is a change, retrieves the data and sends it to the queue.
Queue: The queue contains all the messages sent by the producer and will send them to the consumers.
Consumers: Consumers will perform data-quality checks and insert the new data into a database.
For me, this architecture respects ESB because ActiveMQ acts like a bus. It acts here as a mediator. What do you think?
I think you are on the right track. However, it is important to make sure each message flow uses its own queue. It is generally a best practice to have one queue per message type.
The message flows can all co-exist on the same broker infrastructure, allowing you to have higher density, better utilization, and the ability to wiretap message flows in one place as needed.
In your case:
Database A -> queue://A -> Consumer A
Database B -> queue://B -> Consumer B
Database C -> queue://C -> Consumer C

In a publish/subscribe model in microservices, how to receive/consume a message only once per service type

We are designing for a microservices architecture model where service A publishes a message and services B and C would like to receive/consume the message. However, for high availability, multiple instances of services B and C are running at the same time. Now the question is: how do we design this such that only one instance of service B and one instance of service C receive the message, and not all the other service instances?
As far as I know about RabbitMQ, it is not easy to achieve this behavior. I wonder if Kafka or any other messaging framework has a built-in support for this scenario, which I believe should be very common in a microservices architecture.
Kafka has a feature called Consumer Groups that does exactly what you describe.
Every identical instance of B can declare its group.id to be the same string (say "serviceB") and Kafka will ensure that each instance gets assigned a mutually exclusive set of topic partitions for all the topics it subscribes to.
Since all instances of C will have a different group.id (say "serviceC"), they will also get the same messages as the instances of B, but they will be in an independent Consumer Group, so each message goes to only 1 of N instances of C. The maximum useful number of instances per group is the total number of topic partitions.
You can dynamically and independently scale up or down the number of instances of B and C. If any instance dies, the remaining instances will automatically rebalance their assigned topic partitions and take over processing of the messages for the instance that died.
Data never has to be stored more than once so there is still one single commit log or "source of truth" for all these service instances.
Kafka has built-in support for this scenario.
You can create two Consumer Groups, one for B and the other for C. Both Consumer Groups subscribe to messages from A.
Any message published by A will be sent to both groups. However, only one member of each group can receive the message.
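The consumer-group behaviour described above can be modelled in a few lines. This is a simplified sketch, not Kafka client code: the round-robin `assign_partitions` function is an assumption made for illustration (Kafka's real assignors are configurable and more involved), and the instance names are hypothetical.

```python
def assign_partitions(partitions, members):
    """Spread partitions over group members round-robin (a simplification)."""
    assignment = {member: [] for member in members}
    for index, partition in enumerate(partitions):
        assignment[members[index % len(members)]].append(partition)
    return assignment


def deliver(partition, groups, partitions):
    """Return, per group, the single member that reads from `partition`."""
    receivers = {}
    for group_id, members in groups.items():
        assignment = assign_partitions(partitions, members)
        for member, owned in assignment.items():
            if partition in owned:
                receivers[group_id] = member
    return receivers


partitions = [0, 1, 2, 3]
groups = {"serviceB": ["b1", "b2"], "serviceC": ["c1", "c2", "c3"]}

# A message written to partition 2 reaches one member of each group.
print(deliver(2, groups, partitions))
```

Every group sees every partition, but within a group each partition has exactly one owner, which is why adding instances beyond the partition count leaves the extra instances idle.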
These are the changes you need to make to achieve the same with RabbitMQ:
Create 2 separate queues, one each for services B and C.
Change your logic to read messages from the queue such that only one instance reads each message, using RabbitMQ's blocking connection.
This way, when multiple instances of B and C are running, both services will get the message and still be scalable.
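A rough in-memory model of that layout, assuming one queue per service and instances of a service taking turns on their shared queue. It does not use a RabbitMQ client; `publish`, `dispatch`, and the instance names are hypothetical helpers for illustration.

```python
from collections import deque
from itertools import cycle


def publish(message, service_queues):
    """Copy the message into every service's queue (one queue per service)."""
    for queue in service_queues.values():
        queue.append(message)


def dispatch(queue, instances):
    """Hand each queued message to exactly one instance, taking turns."""
    handled = {name: [] for name in instances}
    turn = cycle(instances)
    while queue:
        handled[next(turn)].append(queue.popleft())
    return handled


service_queues = {"B": deque(), "C": deque()}
for message in ["m1", "m2", "m3"]:
    publish(message, service_queues)

print(dispatch(service_queues["B"], ["b1", "b2"]))
print(dispatch(service_queues["C"], ["c1", "c2", "c3"]))
```

Both services see all three messages, while inside each service a given message is handled by a single instance.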
You can also test this use case in kafka with the command line tools.
You create a producer with
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Then, you can create two different Consumer Groups (cgB, cgC) with
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --consumer-property group.id=cgB
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --consumer-property group.id=cgC
As soon as you send a message to the topic, both groups (B,C) will receive the message but will save what message they processed independently.
Better explained here: Kafka quickstart

Read all messages from the very beginning

Consider a group chat scenario where 4 clients connect to a topic on an exchange. These clients each send and receive messages on the topic and, as a result, they all see every message sent to it.
Now imagine that a 5th client comes in and wants to read everything that was sent from the beginning of time (as in, since the topic was first created and connected to).
Is there a built-in functionality in RabbitMQ to support this?
Many thanks,
Edit:
For clarification, what I'm really asking is whether or not RabbitMQ supports SOW, since I was unable to find it anywhere in the documentation (http://devnull.crankuptheamps.com/documentation/html/develop/configuration/html/chapters/sow.html).
Specifically, the question is: is there a way for RabbitMQ to output all messages having been sent to a topic upon a new subscriber joining?
The short answer is no.
The long answer is maybe. If all potential "participants" are known up-front, the participant queues can be set up and configured in advance, subscribed to the topic, and will collect all messages published to the topic (matching the routing key) while the server is running. Additional server configurations can yield queues that persist across server reboots.
Note that the original question/feature request as-described is inconsistent with RabbitMQ's architecture. RabbitMQ is supposed to be a transient storage node, where clients connect and disconnect at random. Messages dumped into queues are intended to be processed by only one message consumer, and once processed, the message broker's job is to forget about the message.
One other way of implementing such a functionality is to have an audit queue, where all published messages are distributed to the queue, and a writer service writes them all to an audit log somewhere (usually in a persistent data store or text file). This would be something you would have to build, as there is currently no plug-in to automatically send messages out to a persistent storage (e.g. Couchbase, Elasticsearch).
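A minimal sketch of such a writer service, assuming the audit log is a JSON-lines file. The file layout, the function names, and the `chat.room1` routing key are all hypothetical, and the part that would consume from the audit queue is left out.

```python
import json
import os
import tempfile


def append_to_audit_log(path, routing_key, body):
    """Persist one published message as a single JSON line."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"routing_key": routing_key, "body": body}) + "\n")


def replay(path):
    """Return every logged message in publication order, for a late joiner."""
    with open(path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]


log_path = os.path.join(tempfile.mkdtemp(), "audit.jsonl")
append_to_audit_log(log_path, "chat.room1", "hello")
append_to_audit_log(log_path, "chat.room1", "anyone there?")

# A fifth client that joins late reads history from the log, not the broker.
for record in replay(log_path):
    print(record["routing_key"], record["body"])
```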
Alternatively, if used as a debug tool, there is the Firehose plug-in. This is satisfactory when you are able to manually enable/disable it, but is not a good long-term solution as it will turn itself off upon any interruption of the broker.
What you would like to do is not a correct usage for RabbitMQ. Message queues are not databases. They are not long-term persistence solutions, like an RDBMS is. You can mainly use RabbitMQ as a buffer for processing incoming messages, which, after the consumer handles them, get inserted into the database. When a new client connects to your service, the database will be read, not the message queue.
Relevant
Also, unless you are building a really big, highly scalable system, I doubt you actually need RabbitMQ.
Apache Kafka is the right solution for this use case. "Log compaction enabled topics", a.k.a. compacted topics, are specifically designed for it. But the catch is, obviously, that your messages have to be idempotent, with strictly no delta business, because Kafka will compact from time to time and may retain only the last message for a given key.

How to achieve round-robin topic exchange in RabbitMQ

I know that achieving round-robin behaviour in a topic exchange can be tricky or impossible, so my question is really whether there is anything I can do with RabbitMQ, or whether I should look to other message queues that support it.
Here's a detailed explanation of my application requirements:
There will be one producer, let's call it P
There (potentially) will be thousands of consumers, let's call them Cn
Each consumer can "subscribe" to 1 or more topic exchange and multiple consumers can be subscribed to the same topic
Every message published into the topic should be consumed by only ONE consumer
Use case #1
Assume:
Topics
foo.bar
foo.baz
Consumers
Consumer C1 is subscribed to topic #
Consumer C2 is subscribed to topic foo.*
Consumer C3 is subscribed to topic *.bar
Producer P publishes the following messages:
publish foo.qux: C1 and C2 can potentially consume this message but only one receives it
publish foo.bar: C1, C2 and C3 can potentially consume this message but only one receives it
Note
Unfortunately I can't have a separate queue for each "topic" therefore using the Direct Exchange doesn't work since the number of topic combinations can be huge (tens of thousands)
From what I've read, there is no out-of-the-box solution with RabbitMQ. Does anybody know a workaround, or is there another message queue solution that would support this, e.g. Kafka, Kinesis, etc.?
Thank you
There appears to be a conflation of the role of the exchange, which is to route messages, and the queue, which is to provide a holding place for messages waiting to be processed. Funneling messages into one or more queues is the job of the exchange, while funneling messages from the queue into multiple consumers is the job of the queue. Round robin only comes into play for the latter.
Fundamentally, a topic exchange operates by duplicating messages, one for each queue matching the topic published with the message. Therefore, any expectation of round-robin behavior would be a mistake, as it goes against the very definition of the topic exchange.
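To make the duplication concrete, here is a small sketch of AMQP-style topic matching applied to the bindings from the question (`*` stands for exactly one word, `#` for zero or more words). It is an illustrative reimplementation, not RabbitMQ's own code, and the function names are hypothetical.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""
    return _match(pattern.split("."), routing_key.split("."))


def _match(pattern_words, key_words):
    if not pattern_words:
        return not key_words
    head, rest = pattern_words[0], pattern_words[1:]
    if head == "#":
        # '#' may absorb zero words, or absorb one word and try again.
        return _match(rest, key_words) or (
            bool(key_words) and _match(pattern_words, key_words[1:])
        )
    if not key_words:
        return False
    if head == "*" or head == key_words[0]:
        return _match(rest, key_words[1:])
    return False


bindings = {"C1": "#", "C2": "foo.*", "C3": "*.bar"}


def matching_queues(routing_key):
    """Every queue whose binding matches receives its own copy."""
    return [name for name, pattern in bindings.items()
            if topic_matches(pattern, routing_key)]


print(matching_queues("foo.qux"))
print(matching_queues("foo.bar"))
```

With one queue per consumer, `foo.qux` is copied to the queues of C1 and C2, and `foo.bar` to all three, so each of those consumers receives the message rather than only one of them.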
All this does is to establish that, by definition, the scenario presented in the question does not make sense. That does not mean the desired behavior is impossible, but the terms and topology may need some clarifying adjustments.
Let's take a step back and look at the described lifetime for one message: It is produced by exactly one producer and consumed by one of many consumers. Ordinarily, that is the scenario addressed by a direct exchange. The complicating factor in this is that your consumers are selective about what types of messages they will consume (or, to put it another way, your producer is not consistent about what types of messages it produces).
Ordinarily in message-oriented processing, a single message type corresponds to a single consumer type. Therefore, each different type of message would get its own corresponding queue. However, based on the description given in this question, a single message type might correspond to multiple different consumer types. One issue I have is the following statement:
Unfortunately I can't have a separate queue for each "topic"
On its face, that statement makes no sense, because what it really says is that you have arbitrarily many (in fact, an unknown number of) message types; if that were the case, then how would you be able to write code to process them?
So, ignoring that statement for a bit, we are led to two possibilities with RabbitMQ out of the box:
Use a direct exchange and publish your messages using the type of message as a routing key. Then, have your various consumers subscribe to only the message types that they can process. This is the most common message processing pattern.
Use a topic exchange, as you have, and come up with some sort of external de-duplication logic (perhaps memcached), where messages are checked against it and discarded if another consumer has started to process it.
Now, neither of these deals explicitly with the round-robin requirement. Since it was not explained why or how this was important, it is assumed that it can be ignored. If not, further definition of the problem space is required.
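The external de-duplication check from the second option might look roughly like the sketch below. It uses an in-process set guarded by a lock as a stand-in for a shared store such as memcached; `ClaimStore`, `try_claim`, and the message id are hypothetical names, and a real deployment would need an atomic "add if absent" operation on a store that all consumers can reach.

```python
import threading


class ClaimStore:
    """In-process stand-in for a shared store with an atomic 'add if absent'."""

    def __init__(self):
        self._claimed = set()
        self._lock = threading.Lock()

    def try_claim(self, message_id):
        """Return True only for the first caller that claims this message."""
        with self._lock:
            if message_id in self._claimed:
                return False
            self._claimed.add(message_id)
            return True


def handle(consumer, message_id, store, processed):
    """Process the message only if no other consumer has claimed it."""
    if store.try_claim(message_id):
        processed.append((consumer, message_id))


store = ClaimStore()
processed = []
# The topic exchange delivers a copy to all three consumers.
for consumer in ["C1", "C2", "C3"]:
    handle(consumer, "msg-42", store, processed)

print(processed)
```

Only the first consumer to claim `msg-42` processes it; the other two discard their copies.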

Select consumers before publishing a message rabbitmq

I am trying to build a system where I need to select the next available and suitable consumer to send a message to from a queue (or maybe some other solution that does not use a queue).
Requirements
We have multiple publishers/clients who would send objects (images) to process on one side and multiple Analysts who would process them, once processed the publisher should get the corresponding response.
The publishers do not care which Analyst is going to process the data.
Users have a web app where they can map each client/publisher to one or more or all agents, say for instance if Publisher P1 is mapped to Agents A & B, all objects coming from P1 can be processed by Agent A or Agent B. Note: an object can only be processed by one agent only.
Depending on the mapping I should have a middleware which consumes the messages from all publishers and distributes to the agents
Solution 1
My initial thoughts were to have a queue where all publishers post their messages. Another queue where Agents publish message saying they are waiting to process an object.
A middleware picks the message, gets the possible list of agents it can send the message to (from cached database) and go through the agents queue to find the next suitable and available agent and publish the message to that agent.
The issue with this solution is that if the agents queue holds a, b, c, d and the message I receive can only be processed by agent b, I will be rejecting agents d & c, and they would end up at the tail of the queue. I have around 180 agents, so some might never be picked; or, if the next message can only be processed by agent d (for example), we have to reject all the other agents to get there.
Solution 2
First bit from publishers to middleware is still the same
Have a scaled, fast NoSQL database where agents add a record to announce their availability. Basically a key-value pair.
The middleware gets the config from the cache, gets the next available and suitable agent from the NoSQL database, sends the message to that agent's queue (through a direct exchange), updates the NoSQL record to set isavailable to false, and gets the next message.
The issue with this solution is that the db and middleware can become a bottleneck. Also, if I scale the middleware, I will end up with database concurrency issues; for example, if I have two copies of the middleware running and each receives a message that can be processed by Agents A & B, and both agents are available.
The two middleware copies would query the db and might both get A as available, and end up sending both messages to A while B is still waiting for a message to process.
I will have around 100 publishers and 180 agents to start with.
Any ideas how to improve these solutions or any other feasible solution would be highly appreciated?
Depending on this I also need to figure out how the Agent would send response back to the publisher.
Thank you
I'll answer this from the perspective of my open-source service bus: Shuttle.Esb
Typically one would ignore any content-based routing and simply have a distributor pattern. All messages go to the primary endpoint and it will distribute the messages. However, if you decide to stick to these logical groupings you could have primary endpoints for each logical grouping (per agent group). You would still have the primary endpoint but instead of having worker endpoints mapped to agents you would have agent groupings map to the logical primary endpoint with workers backing that.
Then in the primary endpoint you would, based on your content (being the agent identifier), forward the message to the relevant logical primary endpoint. All the while you keep track of the original sender. In the worker you would then send a message back to the queue of the original sender.
I'm sure you could do pretty much the same using any service bus.
I see several requirements in here, that can be boiled down to a few things, I think:
publisher does not care which agent processes the image
publisher needs to know when the image processing is done
agent can only process 1 image at a time
agent can only process certain images
Are these assumptions correct? Did I miss anything important?
If not, then your solution is pretty much built into RabbitMQ with routing and queues. There should be no need to build a custom middle-tier service to manage this.
With RabbitMQ, you can have a consumer set to only process 1 message at a time. The consumer sets its "prefetch" limit to 1, and retrieves a message from the queue with "no ack" set to false - meaning, it must acknowledge the message when it is done processing it.
To consume only messages that a particular agent can handle, use RabbitMQ's routing capabilities with multiple queues. The queues would be created based on the type of image or some other criteria by which the consumers can select images.
For example, if there are two types of images: TypeA and TypeB, you would have 2 queues - one for TypeA and one for TypeB.
Then, if Agent1 can only handle TypeA images, it would only consume from the TypeA queue. If Agent2 can handle both types of images, it would consume from both queues.
To put the right images in the right queue, the publisher would need to use the right routing key. If you know the image type (or whatever the selection criteria is), you would change the routing key on the publisher side to match that selection criteria. The routing in RabbitMQ would be set up to move messages for TypeA into the TypeA queue, etc.
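The queue-per-type layout can be modelled in memory as follows. This is a sketch rather than RabbitMQ client code: the routing key is assumed to equal the image type, a prefetch limit of 1 is modelled by taking a single message per call, and `publish`, `take_one`, and the image names are hypothetical.

```python
from collections import deque


def publish(queues, routing_key, image):
    """Route the image to the queue named after its type."""
    queues[routing_key].append(image)


def take_one(queues, subscriptions):
    """Take at most one image from the first non-empty subscribed queue."""
    for name in subscriptions:
        if queues[name]:
            return queues[name].popleft()
    return None


queues = {"TypeA": deque(), "TypeB": deque()}
agents = {"Agent1": ["TypeA"], "Agent2": ["TypeA", "TypeB"]}

publish(queues, "TypeA", "a1.png")
publish(queues, "TypeB", "b1.png")
publish(queues, "TypeA", "a2.png")

print(take_one(queues, agents["Agent1"]))  # Agent1 only ever sees TypeA
print(take_one(queues, agents["Agent2"]))  # Agent2 may take from either queue
print(take_one(queues, agents["Agent2"]))
print(take_one(queues, agents["Agent1"]))  # nothing left that Agent1 can handle
```

Each image is removed from its queue when it is taken, so it is processed by one agent only, and an agent never receives a type it has not subscribed to.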
The last part is getting a response on when the image is done processing. That can be accomplished through RabbitMQ's "reply to" field and related code. The gist of it is that the publisher has its own exclusive queue. When it publishes a message, it includes the name of its exclusive queue in the "reply to" header of the message. When the agent finishes processing the image, it sends a status update message back through the queue found in the "reply to" header. That status update message tells the producer the status of the request.
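The "reply to" flow can be sketched with plain in-memory queues. This is an illustrative model, not RabbitMQ client code; the queue names and helper functions are hypothetical, and the correlation id is an addition not mentioned above (it lets a publisher match a reply to the request that caused it).

```python
import uuid
from collections import deque


def send_request(queues, reply_to, payload):
    """Publish a work item that names the queue the reply should go to."""
    correlation_id = str(uuid.uuid4())
    queues["work"].append(
        {"reply_to": reply_to, "correlation_id": correlation_id, "body": payload}
    )
    return correlation_id


def process_one(queues):
    """Agent side: handle one request and reply to the queue it names."""
    request = queues["work"].popleft()
    queues[request["reply_to"]].append(
        {"correlation_id": request["correlation_id"],
         "body": "done: " + request["body"]}
    )


queues = {"work": deque(), "reply.publisher-1": deque()}

request_id = send_request(queues, "reply.publisher-1", "image-17")
process_one(queues)

reply = queues["reply.publisher-1"].popleft()
print(reply["correlation_id"] == request_id, reply["body"])
```

The agent never needs to know which publisher it is serving; it simply answers on whatever queue the request names.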
From a RabbitMQ perspective, these pieces can be put together using the examples and documentation found here:
http://www.rabbitmq.com/getstarted.html
Look at these specifically:
Work Queues: http://www.rabbitmq.com/tutorials/tutorial-two-python.html
Topics: http://www.rabbitmq.com/tutorials/tutorial-five-python.html
RPC (aka Request/Response): http://www.rabbitmq.com/tutorials/tutorial-six-python.html
You'll find examples in many languages, in these docs.
I also cover most of these scenarios (and others) in my RabbitMQ Patterns eBook
Since the total number of senders and receivers is only in the hundreds, how about creating one queue for each of your senders? Based on your sender-receiver mapping, receivers subscribe to the sender queues (updating the subscriptions when the mapping changes). You could configure each receiver to fetch the next message from the queues it subscribes to (in a random way) only when it finishes processing the current one.