Redis Pub/Sub with Reliability

I've been looking at using Redis Pub/Sub as a replacement for RabbitMQ.
From my understanding, Redis's pub/sub holds a persistent connection to each of the subscribers, and if the connection is terminated, all future messages will be lost and dropped on the floor.
One possible solution is to use a list (and blocking wait) to store all the messages and pub/sub as just a notification mechanism. I think this gets me most of the way there, but I still have some concerns about the failure cases.
What happens when a subscriber dies and comes back online? How should it process all its pending messages?
When a malformed message comes through the system, how do you handle those exceptions? Dead-letter queue?
Is there a standard practice for implementing a retry policy?

When a subscriber (consumer) dies, your list will continue to grow until the client returns. Your producer could trim the list (from either side) once it reaches a specific limit, but that is something you would need to handle at the application level. If you include a timestamp within each message, your consumer can then act on the age of a message, assuming you have application logic you want to enforce on message age.
I'm not sure how a malformed message would enter the system, as the connection to Redis is usually TCP with its integrity assurances. But if this happens, perhaps due to a bug in message encoding at the producer layer, you could provide a general mechanism for handling errors by keeping a queue-per-producer that receives the consumers' exception messages.
Retry policies will depend greatly on your application needs. If you need 100% assurance that a message has been received and processed, then you should consider using Redis transactions (MULTI/EXEC) to wrap the work done by a consumer, so you can ensure that a client doesn't remove a message unless it has completed its work. If you need explicit acknowledgement, then you could use an explicit ACK message on a queue dedicated to the producer process(es).
Without knowing more about your application needs, it's hard to know how to choose wisely. Generally, if your messages require full ACID protection, then you probably also need to use Redis transactions. If your messages are only meaningful when they are timely, then transactions may not be needed. It sounds as though you can't tolerate dropped messages, so your approach of using a list is good. If you need to implement a priority queue for your messages, you can use the sorted set (the Z-commands) to store your messages, using their priority as the score value, along with a polling consumer.
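To make the list approach a bit more concrete, here is a minimal sketch of one common way to cover the "subscriber dies and comes back" case: each message is moved to a second "processing" list while it is being handled and removed only when the work is done. This is not the exact method from the answer above, just one possible pattern. The key names (jobs:pending, jobs:processing, jobs:dead) and the FakeRedis class are assumptions for illustration. FakeRedis is an in-memory stand-in that mimics only the three list calls used here (lpush, rpoplpush, lrem, named after the Redis commands), so the sketch runs without a server.

```python
import json
import time
from collections import defaultdict

# Hypothetical key names, chosen for this sketch only.
PENDING, PROCESSING, DEAD = "jobs:pending", "jobs:processing", "jobs:dead"


class FakeRedis:
    """In-memory stand-in for the three list commands used below."""

    def __init__(self):
        self.lists = defaultdict(list)  # index 0 is the head (left side)

    def lpush(self, name, *values):
        for value in values:
            self.lists[name].insert(0, value)
        return len(self.lists[name])

    def rpoplpush(self, src, dst):
        if not self.lists[src]:
            return None
        value = self.lists[src].pop()     # take from the tail of src
        self.lists[dst].insert(0, value)  # push onto the head of dst
        return value

    def lrem(self, name, count, value):
        # Simplified: only the count=1 case (first match from the head).
        if value in self.lists[name]:
            self.lists[name].remove(value)
            return 1
        return 0


def publish(client, payload):
    # The timestamp lets the consumer act on message age, as suggested above.
    client.lpush(PENDING, json.dumps({"ts": time.time(), "payload": payload}))


def recover(client):
    """On startup, put messages a crashed consumer left in flight back in pending."""
    moved = 0
    while client.rpoplpush(PROCESSING, PENDING) is not None:
        moved += 1
    return moved


def consume_one(client, handler, max_age_seconds=None):
    """Handle one message; return False when there is nothing pending."""
    raw = client.rpoplpush(PENDING, PROCESSING)  # a single atomic command in Redis
    if raw is None:
        return False
    try:
        message = json.loads(raw)
        too_old = (max_age_seconds is not None
                   and time.time() - message["ts"] > max_age_seconds)
        if not too_old:
            handler(message["payload"])
    except Exception:
        client.lpush(DEAD, raw)  # malformed or failing message: park it for inspection
    client.lrem(PROCESSING, 1, raw)  # the "ack": forget the in-flight copy
    return True
```

A consumer would call recover() once when it starts and then loop on consume_one(). If the process dies inside the handler, the message is still sitting in the processing list and gets picked up again on the next start, so handlers should tolerate seeing a message twice. Recovered messages are not guaranteed to keep their original order in this sketch.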

If you want a pub/sub system where subscribers won't lose messages when they die, consider using Redis Streams instead of Redis Pub/Sub.
Redis Streams have their own architecture and their own pros and cons compared with Redis Pub/Sub. With Redis Streams, a subscriber can issue a command that says, in effect:
the last message I received was X, now give me the next message;
if there is no new message, then wait for one to arrive.
Antirez's article on Redis Streams is a good intro with more info.
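As a rough sketch of the "last message I received was X" idea: the consumer remembers the ID of the last entry it handled and passes it back on the next read, so nothing published while it was down is skipped. The read_after helper and the FakeStreamClient class are hypothetical names for this example. The call shape xread({stream: last_id}, count=..., block=...) is assumed to match what a client such as redis-py offers for the XREAD command. The fake client keeps a single stream in memory and never actually blocks.

```python
class FakeStreamClient:
    """In-memory stand-in for a single stream; IDs look like '1-0', '2-0', ..."""

    def __init__(self):
        self.entries = []  # list of (entry_id, fields)

    def xadd(self, name, fields):
        entry_id = f"{len(self.entries) + 1}-0"
        self.entries.append((entry_id, fields))
        return entry_id

    def xread(self, streams, count=None, block=None):
        # Real XREAD waits up to `block` milliseconds; this fake returns at once.
        (name, last_id), = streams.items()
        last_seq = int(last_id.split("-")[0])
        newer = [e for e in self.entries if int(e[0].split("-")[0]) > last_seq]
        if count is not None:
            newer = newer[:count]
        return [[name, newer]] if newer else []


def read_after(client, stream, last_id, block_ms=5000):
    """Return (new_last_id, entries) for entries added after last_id."""
    reply = client.xread({stream: last_id}, count=10, block=block_ms)
    if not reply:
        return last_id, []  # nothing new arrived within the wait
    _name, entries = reply[0]
    return entries[-1][0], entries
```

The part that matters for reliability is where last_id lives. If the consumer stores it somewhere durable after each batch, it can restart from that ID and pick up exactly where it stopped.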

What I did is use a sorted set using the timestamp as the score and the key to the data as the member value. I use the score from the last item to retrieve the next few ones and then get the keys. Once the work is done I wrap both the zrem and the del in a MULTI/EXEC transaction.
Essentially what Edward said, but with the twist of storing the keys in the sorted set, as my messages can be pretty big.
Hope this helps!

Related

RabbitMQ: how to handle unwanted duplicate un-ack message after connection lost?

In my app (multiple instances), we occasionally see the case where the connection is lost between my app and RabbitMQ due to network issues (my app and RabbitMQ are both alive). After the connection is recovered (re-established), we receive the messages that were unacked.
This creates an issue for us, because my app wasn't dead and is still processing the same message it received before, but now the message is redelivered, and it causes the app to process the message again (which can be fatal to us).
Since the app has multiple instances, it is not easy for an instance to check if another instance is processing the same message at the same time. We can't simply filter out redelivered messages, because we need this feature to handle instance/app crashes and re-deployments.
It doesn't seem that there is an API to tell RabbitMQ when not to redeliver unacked messages.
So what is the recommended practice to handle this situation?
Thanks,

The general solution for such a scenario is to make the consumers handle the messages in an idempotent manner. Generally what I do is, on the producer side (in case there is no unique identifier in the message body), add an attribute idempotencyId to the message body, which is a GUID. On the consumer side, this ID is validated for each message against the stored values in a database, and any duplicates are rejected.
This approach also works for messages that might be shoveled from another cluster, and if multiple consumer instances are listening in the same cluster, it still guarantees one-time processing.
I would suggest going over the RabbitMQ Reliability Guide.
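A minimal sketch of that idempotencyId check, using an in-memory SQLite table as a stand-in for the shared database. The table name processed and the helper names are made up for the example. The point it illustrates is that the "have I seen this ID?" check should be a single atomic insert against a unique key, so that two instances receiving the same message cannot both win.

```python
import sqlite3
import uuid


def open_store(path=":memory:"):
    """Open the (hypothetical) dedup table; the primary key enforces uniqueness."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed (idempotency_id TEXT PRIMARY KEY)"
    )
    return conn


def claim(conn, idempotency_id):
    """Return True only for the first caller that records this ID."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO processed (idempotency_id) VALUES (?)",
                (idempotency_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the ID is already stored: this is a duplicate


def make_message(body):
    """Producer side: attach a GUID when the body has no natural unique ID."""
    return {"idempotencyId": str(uuid.uuid4()), "body": body}


def handle(message, conn, process):
    """Consumer side: process a message at most once per idempotencyId."""
    if not claim(conn, message["idempotencyId"]):
        return "duplicate"
    process(message["body"])
    return "processed"
```

One design choice to be aware of: this sketch records the ID before doing the work, so a crash in the middle leaves the message marked as seen even though it never finished. Recording the ID in the same database transaction as the work itself avoids that, when the work is a database write.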

Yeah, exactly-once delivery is not something RabbitMQ is good at. In fact, I'd say you should probably not be using it for these kinds of problems. Honestly, the only way to truly fix this is to use distributed transactions or locking.
Anyway, you could turn the problem on its head by ack'ing the message as soon as the consumer gets it, before it starts working on it. That would avoid the RabbitMQ-related duplication issue at least. This is at-most-once delivery.
Of course, it means that if the consumer crashes, the message is lost forever. So you need to persist the message right before you ack it so you can recover it later and also the consumer should remove it once it's complete.
Considering that crashes are rare, you can then have a single dedicated process that just works on those persisted messages. Or for that matter, handle them manually.
Just be aware that you are pushing the duplication problem in front of you, because the consumer might fail to remove the persisted message after it's done working with it anyway, but at least you have the option to implement it however you want.
Storage in this case could be anything from files to an RDBMS, or something like ZooKeeper or Redis to lock/unlock in-flight messages.

RabbitMQ direct exchange, with routing key and no queues or subscribers, is this ok for performance?

I have an exchange that's going to receive roughly 50 messages per second. These messages have a unique identifier which relates to each unit in the field. This unique identifier will be the routing key. Every now and again we need to debug or analyse a unit. At that point in time we will spin up a queue, with the correct routing key, and bind it to the exchange. This way, that queue will start receiving the messages for that unit and any consumers monitoring that queue, will then receive the messages.
What this does mean is that 99% of the time, the exchange will have no queues and no routing key. Then, every now and again, a queue and routing key will be created and bound.
It feels kind of wasteful to be sending 50 messages per second at an exchange when it's just going to immediately discard them. That said, it feels like this is how RabbitMQ exchanges are supposed to be used. I guess from a developer perspective I feel like this is wasteful, but my understanding of RabbitMQ also says that this is the correct way to do it.
Is there any overhead to doing this? Any performance concerns I should have? or maybe I am approaching this entirely wrong?
I did try to search before asking but nothing really describes a scenario where an exchange has no queue or routing key, but is still receiving messages.

This is basically how RabbitMQ works, as you have described. The broker is not responsible for how often and how many events you decide to publish. It will nonetheless protect itself from too much pressure: it has a credit-based flow control mechanism (see the RabbitMQ flow control documentation).
RabbitMQ has different ways in which unroutable messages can be handled; see "Unroutable Message Handling" and "How to deal with unroutable messages".
To sum up a bit the information you will find on those links:
If the publisher does not set the message as mandatory, it will either be discarded or republished to a different alternate exchange that you can configure. This only makes sense if you want to persist all unroutable messages regardless of the source in a single queue, that you can handle later.
If the publisher sets the message as mandatory, the message will be returned to the publisher and the publisher can have a returned message handler setup in order to handle those events.
These strategies in addition to the flow control mechanism, also assure RabbitMQ reliability and protection.
In your situation, if you want to limit the messages from the producer even more, you need to create a mechanism so that, for example, the producer starts publishing only when a consumer becomes active. Basically, the consumer process tells the producer process that it is active and that it can start publishing. But from my experience I don't think it's worth the overhead, at least at first, because 50 messages per second isn't much. You can monitor the RabbitMQ server and check its resource consumption to see whether you need to optimize. Optimization is best done with metrics and understanding.

To be sure about concurrency, same group of works in multiple queues (FIFO)

I have a question about multi-consumer concurrency.
I want to send work items that come from web requests to distributed queues in RabbitMQ.
I just want to be sure about the order of work items across multiple queues (FIFO).
Because these requests come from different users, each user's requests/work items must be ordered.
I have found this feature under different names: on Azure Service Bus, and as message grouping in ActiveMQ.
Is there any way to do this in plain RabbitMQ?
I want to guarantee that a customer's requests are ordered relative to each other.
Each customer may have multiple requests, but those requests for that customer must be processed in order.
I want to process incoming requests quickly by using multiple consumers on different nodes.
For example, customers 1 to 1000 send over 1 million requests.
If I put this huge number of requests in only one queue, it takes a lot of time to consume. So I want to share this processing load between n (5) nodes. Customer X's requests must be processed in the same sequence.

When working with event-based systems, and especially when using multiple producers and/or consumers, it is important to come to terms with the fact that there usually is no such thing as a guaranteed order of events. And to get a robust system, it is also wise to design the system so the message handlers are idempotent; they should tolerate getting the same message twice (or more).
There are way too many things that may (and actually should be allowed to) interfere with the order:
The producers may deliver the messages at a slightly different pace.
One producer might miss an ack (due to a missed packet) and will resend the message.
One consumer may get and process a message, but the ack is lost on the way back, so the message is delivered twice (to another consumer).
Some other service that your handlers depend on might be down, so that you have to reject the message.
That being said, there is one pattern that servicebus-systems like NServicebus use to enforce the order messages are consumed. There are some requirements:
You will need a centralized storage (like a sql-server or document store) that allows for conditional updates; for instance you want to be able to store the sequence number of the last processed message (or how far you have come in the process), but only if the already stored sequence/progress is the right/expected one. Storing the user-id and the progress even for millions of customers should be a very easy operation for most databases.
You make sure the queue is configured with a dead-letter-queue/exchange for retries, and then set your original queue as a dead-letter-queue for that one again.
You set a TTL (for instance 30 seconds) on the retry/dead-letter-queue. This way the messages that appear on the dead-letter-queue will automatically be pushed back to your original queue after some timeout.
When processing your messages you check your storage/database if you are in the right state to handle the message (i.e. the needed previous steps are already done).
If you are ok to handle it you do and update the storage (conditionally!).
If not - you nack the message, so that it is thrown on the dead-letter queue. Basically you are saying "nah - I can't handle this message, there are probably some other message in the queue that should be handled first".
This way the happy-path is to process a great number of messages in the right order.
But if something happens and you get a message out of order, you will throw it on the retry queue (the dead-letter queue) and Rabbit will make sure it gets back in the queue to be retried at a later stage. But only after a delay.
The beauty of this is that you are able to handle most of the situations that may interfere with processing the message (out-of-order messages, dependent services being down, your handler being shut down in the middle of handling the message) in exactly the same way: by rejecting the message and letting your infrastructure (Rabbit) take care of it being retried after a while.
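Here is a small simulation of the pattern described above, under a few assumptions: every message carries a customer and a per-customer seq number starting at 1, the central storage is an in-memory SQLite table, and the dead-letter/TTL round trip is stood in for by simply putting a rejected message at the back of a local queue. All names (progress, try_advance, consume, drain) are made up for the sketch. Dropping messages whose seq is already behind the stored progress is an addition for duplicate handling that the answer does not spell out.

```python
import sqlite3
from collections import deque


def open_progress_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE progress (customer TEXT PRIMARY KEY, last_seq INTEGER NOT NULL)"
    )
    return conn


def last_seq(conn, customer):
    row = conn.execute(
        "SELECT last_seq FROM progress WHERE customer = ?", (customer,)
    ).fetchone()
    return row[0] if row else 0


def try_advance(conn, customer, seq):
    """Conditional update: move from seq-1 to seq only if nobody else did."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO progress (customer, last_seq) VALUES (?, 0)",
            (customer,),
        )
        cur = conn.execute(
            "UPDATE progress SET last_seq = ? WHERE customer = ? AND last_seq = ?",
            (seq, customer, seq - 1),
        )
        return cur.rowcount == 1


def consume(conn, message, handler):
    """Return 'ack' or 'nack' the way a consumer would answer the broker."""
    customer, seq = message["customer"], message["seq"]
    expected = last_seq(conn, customer) + 1
    if seq < expected:
        return "ack"   # already handled earlier: drop the duplicate
    if seq > expected:
        return "nack"  # an earlier message is still missing: retry later
    handler(message)
    return "ack" if try_advance(conn, customer, seq) else "nack"


def drain(conn, messages, handler, max_deliveries=1000):
    """Feed messages through consume(); nacked ones go to the back of the queue."""
    queue = deque(messages)
    deliveries = 0
    while queue and deliveries < max_deliveries:
        message = queue.popleft()
        deliveries += 1
        if consume(conn, message, handler) == "nack":
            queue.append(message)  # stands in for dead-letter queue + TTL
    return list(queue)  # whatever is still waiting for a missing predecessor
```

With real RabbitMQ the re-queueing would be done by the broker via the dead-letter exchange and TTL, not by the consumer. The max_deliveries cap only exists so the simulation cannot spin forever when a sequence number never arrives.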

(Assuming the OP is asking about things like ActiveMQ's "message grouping":)
This isn't currently built in to RabbitMQ AFAIK (it wasn't as of 2013 as per this answer) and I'm not aware of it now (though I haven't kept up lately).
However, RabbitMQ's model of exchanges and queues is very flexible - exchanges and queues can be easily created dynamically (this can be done in other messaging systems but, for example, if you read ActiveMQ documentation or Red Hat AMQ documentation you'll find all of the examples in the user guides are using pre-declared queues in configuration files loaded at system startup - except for RPC-like request/response communication).
Also it is very easy in RabbitMQ for a consumer (i.e., message consuming thread) to consume from multiple queues.
So you could build, on top of RabbitMQ, a system where you got your desired grouping semantics.
One way would be to create dynamic queues: the first time a customer order (or a new group of customer orders) was seen, a queue would be created with a unique name for all messages for that group. That queue name would be communicated (via another queue) to a consumer whose sole purpose was to load-balance among other consumers that were responsible for handling customer order groups. I.e., the load-balancer would pull off of its queue a message saying "new group with queue name XYZ", find in a pool of order-group consumers a consumer that could take this load, and pass it a message saying "start listening to XYZ".
Another way to do it is with pub/sub and topic routing - each customer order group would get a unique topic - and proceed as above.

RabbitMQ Consistent Hash Exchange Type
We are using RabbitMQ and we have found a plugin. It uses a consistent hashing algorithm to distribute messages across queues, so that messages with the same routing key consistently go to the same queue.
For more information about consistent hashing:
https://en.wikipedia.org/wiki/Consistent_hashing
https://www.youtube.com/watch?v=viaNG1zyx1g
You can find this plugin from rabbitmq web page
plugin : rabbitmq_consistent_hash_exchange
https://www.rabbitmq.com/plugins.html
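To show why consistent hashing helps with the per-customer ordering question, here is a generic hash ring in plain Python. It is not the plugin's implementation, only the general idea: each queue gets a number of points on a ring, a routing key goes to the next point clockwise, and so the same customer ID always lands on the same queue. The class name HashRing, the queue names and points_per_queue are assumptions for the example.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring mapping routing keys to queue names."""

    def __init__(self, queues, points_per_queue=100):
        # Several points per queue smooth out the distribution.
        self._ring = sorted(
            (self._hash(f"{queue}#{i}"), queue)
            for queue in queues
            for i in range(points_per_queue)
        )
        self._keys = [point for point, _queue in self._ring]

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def queue_for(self, routing_key):
        """Pick the first ring point after the key's hash, wrapping around."""
        index = bisect.bisect(self._keys, self._hash(routing_key))
        return self._ring[index % len(self._ring)][1]
```

For example, HashRing(["q1", "q2", "q3", "q4", "q5"]).queue_for("customer-42") returns the same queue on every call. If a sixth queue is added, only the keys that now fall on the new queue's points move; everything else stays where it was, which is the property that distinguishes this from a plain hash-modulo-n scheme.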

Read all messages from the very beginning

Consider a group chat scenario where 4 clients connect to a topic on an exchange. These clients each send and receive messages on the topic, so they all see each other's messages.
Now imagine that a 5th client comes in and wants to read everything that was sent from the beginning of time (as in, since the topic was first created and connected to).
Is there a built-in functionality in RabbitMQ to support this?
Many thanks,
Edit:
For clarification, what I'm really asking is whether or not RabbitMQ supports SOW (State of the World), since I was unable to find it anywhere in the documentation (http://devnull.crankuptheamps.com/documentation/html/develop/configuration/html/chapters/sow.html).

Specifically, the question is: is there a way for RabbitMQ to output all messages having been sent to a topic upon a new subscriber joining?
The short answer is no.
The long answer is maybe. If all potential "participants" are known up-front, the participant queues can be set up and configured in advance, subscribed to the topic, and will collect all messages published to the topic (matching the routing key) while the server is running. Additional server configurations can yield queues that persist across server reboots.
Note that the original question/feature request as-described is inconsistent with RabbitMQ's architecture. RabbitMQ is supposed to be a transient storage node, where clients connect and disconnect at random. Messages dumped into queues are intended to be processed by only one message consumer, and once processed, the message broker's job is to forget about the message.
One other way of implementing such a functionality is to have an audit queue, where all published messages are distributed to the queue, and a writer service writes them all to an audit log somewhere (usually in a persistent data store or text file). This would be something you would have to build, as there is currently no plug-in to automatically send messages out to a persistent storage (e.g. Couchbase, Elasticsearch).
Alternatively, if used as a debug tool, there is the Firehose plug-in. This is satisfactory when you are able to manually enable/disable it, but is not a good long-term solution as it will turn itself off upon any interruption of the broker.

What you would like to do is not a correct usage for RabbitMQ. Message queues are not databases. They are not long-term persistence solutions, like an RDBMS is. You can mainly use RabbitMQ as a buffer for processing incoming messages, which, after the consumer handles them, get inserted into the database. When a new client connects to your service, the database will be read, not the message queue.
Also, unless you are building a really big, highly scalable system, I doubt you actually need RabbitMQ.

Apache Kafka is the right solution for this use case. Topics with log compaction enabled, a.k.a. compacted topics, are specifically designed for it. But the catch is that your messages have to be idempotent, strictly no delta business, because Kafka will compact from time to time and may retain only the last message for each key.
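A tiny illustration of the "no delta business" caveat, in plain Python rather than Kafka itself. The compact function is a toy model of compaction that keeps only the newest value per key; the function names and the example data are invented for the sketch. Full-state messages survive compaction unchanged in meaning, while deltas silently give the wrong answer.

```python
def compact(log):
    """Toy model of compaction: keep only the most recent value for each key."""
    latest = {}
    for key, value in log:
        latest.pop(key, None)  # re-insert so the order follows the last write
        latest[key] = value
    return list(latest.items())


def replay_full_state(log):
    """Each message carries the whole current value for its key."""
    state = {}
    for key, value in log:
        state[key] = value
    return state


def replay_deltas(log):
    """Each message carries a change to add to the running total."""
    totals = {}
    for key, delta in log:
        totals[key] = totals.get(key, 0) + delta
    return totals


full_state_log = [("alice", 10), ("bob", 1), ("alice", 15)]
delta_log = [("alice", 10), ("alice", 5)]

# Full-state messages: a late joiner reading the compacted log sees the same result.
same = replay_full_state(compact(full_state_log)) == replay_full_state(full_state_log)

# Delta messages: compaction dropped the +10, so the late joiner computes 5, not 15.
before, after = replay_deltas(delta_log), replay_deltas(compact(delta_log))
```

Here same is True, while before is {"alice": 15} and after is {"alice": 5}. For the group chat case this means a compacted topic only works if the key and value are chosen so that the latest value per key is all a new reader needs.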

rabbitmq: can consumer persist message change before nack?

Before a consumer nacks a message, is there any way the consumer can modify the message's state so that when the consumer consumes it upon redelivery, it sees that changed state? I'd rather not reject + reenqueue a new message, but please let me know if that's the only way to accomplish this.
My goal is to determine how many times specific messages are being redelivered. I see two ways of doing this:
(1) On the message itself as described above. The message would be a container of basic stats and the application payload message.
(2) In some external storage. We would uniquely identify the message by the message id that we set.
I know 2 is possible, but my question is if 1 is possible.

There is no way to do (1) like you want. You would need to change the message, thus the message would become another message. If you want to do something like that (and it's possible that this is what you meant by "I'd rather not reject + reenqueue new message"), you should ACK the message, increment one field in it and publish it again (again, maybe this is what you meant when you said reenqueue it). So your message payload would have some ID, a counter, and then the (obviously different) payload that is the content.
Definitely the much better way is (2), for multiple reasons:
it does not interfere with business logic, that is, this diagnostic part is isolated
you are leaving re-queueing to RabbitMQ (as you are supposed to do), meaning that you are not worrying about losing messages and handling some message meta info which has no use for your business logic
it's how ACKing and NACKing are actually supposed to be used, that's why they are in the AMQP specification
since you do need the number of times specific messages have been redelivered, you have it somewhere externally, meaning that it's independent of (RabbitMQ's) message persistence, lifetime, queue durability, mirroring, etc.
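For option (2), the bookkeeping can be very small. Below is a sketch with an in-memory Counter standing in for the external storage, keyed by the message ID the application sets. The class name RedeliveryTracker and the max_deliveries threshold are assumptions for illustration, and a real deployment would need a store shared by all consumer instances.

```python
from collections import Counter


class RedeliveryTracker:
    """Counts deliveries per message ID outside the broker."""

    def __init__(self, max_deliveries=5):
        self._deliveries = Counter()
        self.max_deliveries = max_deliveries

    def record(self, message_id):
        """Count one delivery and return the running total for this message."""
        self._deliveries[message_id] += 1
        return self._deliveries[message_id]

    def redeliveries(self, message_id):
        """Deliveries beyond the first one."""
        return max(self._deliveries[message_id] - 1, 0)

    def should_give_up(self, message_id):
        return self._deliveries[message_id] >= self.max_deliveries

    def forget(self, message_id):
        """Call after a successful ack so the store does not grow forever."""
        self._deliveries.pop(message_id, None)
```

The consumer would call record() at the top of its handler, nack as usual on failure, and use should_give_up() to decide when to stop retrying and route the message somewhere for inspection instead.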

Even though this question was marked as solved some time ago, I want to mention that there is a way, at least for counting redeliveries. It may have been added after the original answer. There is a different type of queue in RabbitMQ called quorum queues.
Quorum queues offer the option to set a redelivery limit:
"Quorum queues support poison message handling via a redelivery limit. This feature is currently unique to Quorum queues."
In order to achieve this, RabbitMQ counts the number of deliveries in a message header. The header attribute is called: x-delivery-count
