Message bus: sender must wait for acknowledgements from multiple recipients - rabbitmq

In our application the publisher creates a message and sends it to a topic.
It then needs to wait, when all of the topic's subscribers ack the message.
It does not appear, the message bus implementations can do this automatically. So we are leaning towards making each subscriber send their own new message for the client, when they are done.
Now, the client can receive all such messages and, when it got one from each destination, do whatever clean-ups it has to do. But what if the client (sender) crashes part way through the stream of acknowledgments? To handle such a misfortune, I need to (re)implement, what the buses already implement, on the client -- save the incoming acknowledgments until I get enough of them.
I don't believe, our needs are that esoteric -- how would you handle the situation, where the sender (publisher) must wait for confirmations from multiple recipients (subscribers)? Sort of like requesting (and awaiting) Return-Receipts from each subscriber to a mailing list...
We are using RabbitMQ, if it matters. Thanks!

The functionality that you are looking for sounds like a messaging solution that can perform transactions across publishers and subscribers of a message. In The Java world, JMS specifies such transactions. One example of a JMS implementation is HornetQ.
RabbitMQ does not provide such functionality and it does for good reasons. RabbitMQ is built for being extremely robust and to perform like hell at the same time. The transactional behavior that you describe is only achievable with the cost of reasonable performance loss (especially if you want to keep outstanding robustness).
With RabbitMQ, one way to assure that a message was consumed successfully, is indeed to publish an answer message on the consumer side that is then consumed by the original publisher. This can be achieved through RabbitMQ's RPC procedure calls which might help you to get a clean solution for your problem setting.
If the (original) publisher crashes before all answers could be received, you can assume that all outstanding answers are still queued on the broker. So you would have to build your publisher in a way that it is capable to resume with processing those left messages. This might turn out to be none-trivial.
Finally, I recommend the following solution: Design your producing component in a way that you can consume the answers with one or more dedicated answer consumers that are separated from the origin publisher.
Benefits of this solution are:
the origin publisher can finish its task independent of consumer success
the origin publisher is independent of consumer availability and speed
the origin publisher implementation is far less complex
in a crash scenario, the answer consumer can resume with processing answers
Now to a more general point: One of the major benefits of messaging is the decoupling of application components by the broker. In AMQP, this is achieved with exchanges and bindings that allow you to move message distribution logic from your application to a central point of configuration.
If you add RPC-style calls to your clients, then your components are most likely closely coupled again, meaning that the publishing component fails if one of the consuming components fails / is not available / too slow. This is exactly what you will want to avoid. Otherwise, why would you have split the components then?
My recommendation is that you design your application in a way that publishers can complete their tasks independent of the success of consumers wherever possible. Back-channels should be an exceptional case and be implemented in the described not-so coupled way.

Related

RabbitMQ - Reprioritize message already in queue

We are building spark based jobs. Processing each message delivered by the queue takes time. There is a need to be able to reprioritize one already sent to the queue.
I am aware there is priority queue implementation available, but not sure how to re-prioritize the existing message in the queue?
One bad workaround is to push that message again as higher priority, so that it handled on priority. Later drop the message with same content which had low or no priority when it's turns comes next.
Is there a natural way we can handle this situation or any other queues that supports scenario better?
Unfortunately there isn't. Queues are to be considered as lists of messages in flight. It is not possible to delete/update them.
Your approach of submitting a higher priority message is the only feasible solution.
RabbitMQ is a messaging system (such as the postal one), it is not a DataBase or a storage service. The storage in form of queues is a necessary feature as much as the postal service needs storage for postcards in transit. It is optimized for the purpose and does not allow to access the messages easily.

RabbitMQ direct exchange, with routing key and no queues or subscribers, is this ok for performance?

I have an exchange that's going to receive roughly 50 messages per second. These messages have a unique identifier which relates to each unit in the field. This unique identifier will be the routing key. Every now and again we need to debug or analyse a unit. At that point in time we will spin up a queue, with the correct routing key, and bind it to the exchange. This way, that queue will start receiving the messages for that unit and any consumers monitoring that queue, will then receive the messages.
What this does mean is that 99% of the time, the exchange will have no queues and no routing key. Then, every now and again a queue and routing key will be created and subscribe.
It feels kind of wasteful to be sending 50 messages per second at an exchange, when its just going to immediately discard them. That said, it feels like this how RabbitMQ exchanges are supposed to be used. I guess from a developer perspective i feel like this is wasteful but I also think my understanding of rabbit says that this is the correct way to do.
Is there any overhead to doing this? Any performance concerns I should have? or maybe I am approaching this entirely wrong?
I did try to search before asking but nothing really describes a scenario where an exchange has no queue or routing key, but is still receiving messages.
This is basically how RabbitMQ works, as you have described. The broker is not responsible for how often and how many events you decide to publish. It will nonetheless protect from too much pressure. It has a credit based flow control mechanism. RabbitMQ flow control.
RabbitMQ has different ways in which unroutable messages can be handled.Unroutable Message Handling How to deal with unroutable messages
To sum up a bit the information you will find on those links:
If the publisher does not set the message as mandatory, it will either be discarded or republished to a different alternate exchange that you can configure. This only makes sense if you want to persist all unroutable messages regardless of the source in a single queue, that you can handle later.
If the publisher sets the message as mandatory, the message will be returned to the publisher and the publisher can have a returned message handler setup in order to handle those events.
These strategies in addition to the flow control mechanism, also assure RabbitMQ reliability and protection.
In your situation if you want to limit the messages from producer even more, you need to create a mechanism, as an example, so the producer will not start publishing only when a consumer becomes active. So basically the consumer process will communicate the producer process that it is active and it can start publishing. But from my experience I don't think it's worth the overhead, at least at first, because 50 messages per seconds isn't much. You can monitor the RabbitMQ server and check how is the resource consumption to check if you need to optimize, at first. Optimization is best done with metrics and understanding.

Read all messages from the very begining

Consider a group chat scenario where 4 clients connect to a topic on an exchange. These clients each send an receive messages to the topic and as a result, they all send/receive messages from this topic.
Now imagine that a 5th client comes in and wants to read everything that was send from the beginning of time (as in, since the topic was first created and connected to).
Is there a built-in functionality in RabbitMQ to support this?
Many thanks,
Edit:
For clarification, what I'm really asking is whether or not RabbitMQ supports SOW since I was unable to find it on the documentations anywhere (http://devnull.crankuptheamps.com/documentation/html/develop/configuration/html/chapters/sow.html).
Specifically, the question is: is there a way for RabbitMQ to output all messages having been sent to a topic upon a new subscriber joining?
The short answer is no.
The long answer is maybe. If all potential "participants" are known up-front, the participant queues can be set up and configured in advance, subscribed to the topic, and will collect all messages published to the topic (matching the routing key) while the server is running. Additional server configurations can yield queues that persist across server reboots.
Note that the original question/feature request as-described is inconsistent with RabbitMQ's architecture. RabbitMQ is supposed to be a transient storage node, where clients connect and disconnect at random. Messages dumped into queues are intended to be processed by only one message consumer, and once processed, the message broker's job is to forget about the message.
One other way of implementing such a functionality is to have an audit queue, where all published messages are distributed to the queue, and a writer service writes them all to an audit log somewhere (usually in a persistent data store or text file). This would be something you would have to build, as there is currently no plug-in to automatically send messages out to a persistent storage (e.g. Couchbase, Elasticsearch).
Alternatively, if used as a debug tool, there is the Firehose plug-in. This is satisfactory when you are able to manually enable/disable it, but is not a good long-term solution as it will turn itself off upon any interruption of the broker.
What you would like to do is not a correct usage for RabbitMQ. Message Queues are not databases. They are not long term persistence solutions, like a RDBMS is. You can mainly use RabbitMQ as a buffer for processing incoming messages, which after the consumer handles it, get inserted into the database. When a new client connects to you service, the database will be read, not the message queue.
Relevant
Also, unless you are building a really big, highly scalable system, I doubt you actually need RabbitMQ.
Apache Kafka is the right solution for this use-case. "Log Compaction enabled topics" a.k.a. compacted topics are specifically designed for this usecase. But the catch is, obviously your messages have to be idempotent, strictly no delta-business. Because kafka will compact from time to time and may retain only the last message of a "key".

Persisting Data in a Twisted App

I'm trying to understand how to persist data in a Twisted application. Let's say I've decided to write a Twisted server that:
Accepts inbound SMTP requests
Sends the message to a 3rd party system for modification
Relays the modified message to its destination
A typical Twisted tutorial would have you build this app using Deferreds and callbacks, roughly:
A Factory handles inbound requests
Each time a full email is received a call is sent to the remote message processor, returning a deferred
Add an errback that substitutes the original message if anything goes wrong in the modify call.
Add a callback to send the message on to the recipient, which again returns a deferred.
A real server would add/include additional call/errbacks to retry or notify the sender or whatnot. Again for simplicity, assume we consider this an acceptable amount of effort and just log errors.
Of course, this persists NO data in the event of a crash/restart/something else. I get that a solution involves a 3rd party persistent datastore (RabbitMQ is often mentioned) and could probably come up with a dozen random ways to achieve the outcome.
However, I imagine there are a few approaches that work best in a Twisted app. What do they look like? How do they store (and restore in the event of a crash) the in-process messages?
If you found this question, you probably already know that Twisted is event-based. It sounds simple, but the "hardest" part of the answer is to get the persistence platform generating the events we need when we need them. Naturally, you can persist the data in a DB or a message queue, but some platforms don't naturally generate events. For example:
ZeroMQ has (or at least had) no callback for new data. It's also relatively poor at persistence.
In other cases, events are easy but reliability is a problem:
pgSQL can be configured to generate events using triggers, but they're one-time things so you can't resume incomplete events
The light at the end of the tunnel seems to be something like RabbitMQ.
RabbitMQ can persist the message in a database to survive a crash
We can use acknowledgements on both legs (incoming SMTP to RabbitMQ and RabbitMQ to outgoing SMTP) to ensure the application is reliable. Importantly, RabbitMQ supports acknowledgements.
Finally, several of the RabbitMQ clients provide full asynchronous support (see for example pika, txampq, and puka)
It's enough for our purposes that the RabbitMQ client provides us an event-based interface.
At a more theoretical level, however, this need not be the case. In fact, despite the "notification" issue above, ZeroMQ has an event-based client. Even if our software is elegantly event-based, we will run into systems that aren't. In these cases, we have no choice but to fall back on polling. In principle, if not in practice, we just query the message provider for messages. When we exhaust the current queue (and immediately if there are no messages), we use a callLater command to check again in the future. It may feel anti-pattern, but it's (to the best of my knowledge anyway) the right way to handle this particular case.

How do I know when all the subscribers are complete?

We have a bunch of requests that we plan to publish to the queue.
There will be several different subscriber types, each in their own round robin pool.
For example Request1 is pushed onto the queue
LoggingSubscriber1 and LoggingSubscriber2 both subscribe with the "LoggingSubscriber" subscriptionId so that only one of them gets the request.
There will be other groups like DoProcessSubscriber1, DoProcessSubscriber2, and DoProcessSubscriber3
And another DoOtherProcessSubscriber1, DoOtherProcessSubscriber2
We need some way to know that all three subscribers (Logging, DoProcess, and DoOtherProcess) have completed, so that we can perform some action...like sending a message to the client that all the entire request has completed.
How would we aggregate responses like this? We were thinking of having each subscriber put a response object on the queue, but we still aren't sure how to know that they are all done.
Ideally you'd use the Request/Response pattern built into EasyNetQ, but that's designed for a single (potentially farmed) consumer. It doesn't allow you to bind to multiple queues. In your case you should probably have your client set up a subscription for replies and have all three services publish a message when they are complete. The client can then wait until it has a response from all three before updating.
However, I'd encourage you to possibly re-think your design. By making the client responsible for acknowledging the completion of the subscribers, you're building a very tightly coupled system. Messaging System design works far better if you adopt the notion of eventual consistency. Allow your client to fire-and-forget and have some audit process ensure that all the expected processing did eventually occur.