RabbitMQ: can a consumer persist a message change before nack?

Before a consumer nacks a message, is there any way for the consumer to modify the message's state so that, when it consumes the message again upon redelivery, it sees the changed state? I'd rather not reject + reenqueue new message, but please let me know if that's the only way to accomplish this.
My goal is to determine how many times specific messages are being redelivered. I see two ways of doing this:
(1) On the message itself as described above. The message would be a container of basic stats and the application payload message.
(2) In some external storage. We would uniquely identify the message by the message id that we set.
I know 2 is possible, but my question is if 1 is possible.

There is no way to do (1) the way you want. Changing the message would turn it into a different message. If you want something like that (and it's possible this is what you meant by "I'd rather not reject + reenqueue new message"), you should ACK the message, increment a field in it, and publish it again. Your message would then carry an ID, a counter, and the actual application payload.
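The ack-and-republish approach described above can be sketched as follows. This is only an illustration under assumptions: the channel is any object with pika-style `basic_publish` and `basic_ack` methods, the body is a JSON envelope, and the function name and the `redelivery_count` field are hypothetical. The sketch publishes the new copy before acking the old one, so a crash in between produces a duplicate rather than a lost message.

```python
import json


def republish_with_incremented_counter(channel, exchange, routing_key,
                                       delivery_tag, body):
    """Publish a copy of the message with its counter incremented, then ack the original."""
    envelope = json.loads(body)
    # "redelivery_count" is a hypothetical field name chosen for this sketch.
    envelope["redelivery_count"] = envelope.get("redelivery_count", 0) + 1
    channel.basic_publish(
        exchange=exchange,
        routing_key=routing_key,
        body=json.dumps(envelope).encode("utf-8"),
    )
    # Ack only after the new copy has been handed to the broker.
    channel.basic_ack(delivery_tag=delivery_tag)
    return envelope
```

Note that the republished message is a new message from the broker's point of view, which is exactly the drawback the answer points out.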
Option (2) is definitely the better way, for multiple reasons:
it does not interfere with business logic; the diagnostic part stays isolated
you leave re-queueing to RabbitMQ (as you are supposed to), so you don't have to worry about losing messages or about handling message meta-information that is of no use to your business logic
ACKing and NACKing are then used the way they are supposed to be used; that's why they are in the AMQP specification
since you do need the number of times specific messages have been redelivered, you keep it somewhere external, so it is independent of RabbitMQ's message persistence, message lifetime, queue durability, mirroring, etc.
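Option (2) could look roughly like the sketch below. It assumes an in-memory dict standing in for the external storage and a message id set by the application; the class and method names are hypothetical.

```python
from collections import defaultdict


class RedeliveryTracker:
    """Counts deliveries per message id (a dict stands in for external storage)."""

    def __init__(self):
        self._deliveries = defaultdict(int)

    def record_delivery(self, message_id):
        """Record one delivery and return the number of redeliveries so far."""
        self._deliveries[message_id] += 1
        # The first delivery is not a redelivery, hence the "- 1".
        return self._deliveries[message_id] - 1


tracker = RedeliveryTracker()
tracker.record_delivery("order-42")                 # first delivery
redeliveries = tracker.record_delivery("order-42")  # delivered again after a nack
```

In a real deployment the dict would be replaced by a store shared between consumers, since a redelivered message may arrive at a different consumer.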

Even though this question was marked as solved some time ago, I want to mention that there is now a way, at least for counting redeliveries. It may have been added after the original answer was written. RabbitMQ has a different type of queue called quorum queues.
Quorum queues offer the option to set a redelivery limit:
Quorum queues support poison message handling via a redelivery limit. This feature is currently unique to Quorum queues.
To achieve this, RabbitMQ counts the number of deliveries in a message header. The header is called x-delivery-count.
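A consumer could read that header roughly as sketched below. The helper name is hypothetical, the limit of 5 is an arbitrary example value, and the sketch assumes the header may be missing on a first delivery, in which case it falls back to 0.

```python
def delivery_count(headers):
    """Return the broker-maintained delivery count, or 0 when the header is absent."""
    if not headers:
        return 0
    return int(headers.get("x-delivery-count", 0))


# Arguments one might pass when declaring the queue as a quorum queue
# with a redelivery limit (the value 5 is only an example).
QUORUM_QUEUE_ARGUMENTS = {
    "x-queue-type": "quorum",
    "x-delivery-limit": 5,
}
```

With a client such as pika, `headers` would typically come from the `headers` attribute of the message properties passed to the consumer callback.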

Related

RabbitMQ direct exchange, with routing key and no queues or subscribers, is this ok for performance?

I have an exchange that's going to receive roughly 50 messages per second. These messages have a unique identifier which relates to each unit in the field. This unique identifier will be the routing key. Every now and again we need to debug or analyse a unit. At that point in time we will spin up a queue, with the correct routing key, and bind it to the exchange. This way, that queue will start receiving the messages for that unit and any consumers monitoring that queue, will then receive the messages.
What this does mean is that 99% of the time the exchange will have no queues and no routing key bound to it. Then, every now and again, a queue will be created and bound with a routing key.
It feels kind of wasteful to be sending 50 messages per second to an exchange when it's just going to discard them immediately. That said, it feels like this is how RabbitMQ exchanges are supposed to be used. From a developer's perspective it seems wasteful, but my understanding of RabbitMQ says that this is the correct way to do it.
Is there any overhead to doing this? Any performance concerns I should have? or maybe I am approaching this entirely wrong?
I did try to search before asking but nothing really describes a scenario where an exchange has no queue or routing key, but is still receiving messages.
This is basically how RabbitMQ works, as you have described. The broker is not responsible for how often and how many events you decide to publish. It will nonetheless protect itself from too much pressure: it has a credit-based flow control mechanism (see the RabbitMQ flow control documentation).
RabbitMQ has different ways of handling unroutable messages; see "Unroutable Message Handling" and "How to deal with unroutable messages".
To sum up a bit the information you will find on those links:
If the publisher does not set the message as mandatory, it will either be discarded or republished to a different alternate exchange that you can configure. This only makes sense if you want to persist all unroutable messages regardless of the source in a single queue, that you can handle later.
If the publisher sets the message as mandatory, the message will be returned to the publisher and the publisher can have a returned message handler setup in order to handle those events.
These strategies, in addition to the flow control mechanism, also ensure RabbitMQ's reliability and protection.
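The two strategies for unroutable messages can be sketched as follows. This is a rough illustration, assuming a channel object with pika-style `exchange_declare`, `queue_declare`, `queue_bind` and `basic_publish` methods; the function names and the exchange and queue names passed in are hypothetical.

```python
def declare_with_alternate_exchange(channel, main_exchange,
                                    alternate_exchange, unrouted_queue):
    """Declare a direct exchange whose unroutable messages go to a fanout exchange."""
    channel.exchange_declare(exchange=alternate_exchange, exchange_type="fanout")
    channel.queue_declare(queue=unrouted_queue, durable=True)
    channel.queue_bind(queue=unrouted_queue, exchange=alternate_exchange)
    # "alternate-exchange" is the exchange argument RabbitMQ uses for this feature.
    channel.exchange_declare(
        exchange=main_exchange,
        exchange_type="direct",
        arguments={"alternate-exchange": alternate_exchange},
    )


def publish_mandatory(channel, exchange, routing_key, body):
    """Publish with the mandatory flag so an unroutable message is returned to the publisher."""
    channel.basic_publish(
        exchange=exchange,
        routing_key=routing_key,
        body=body,
        mandatory=True,
    )
```

If neither mechanism is configured, an unroutable message is simply discarded, which is the situation described in the question.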
In your situation, if you want to limit the messages from the producer even more, you need to create a mechanism so that, for example, the producer starts publishing only when a consumer becomes active. Basically, the consumer process would tell the producer process that it is active and that it can start publishing. But from my experience I don't think it's worth the overhead, at least at first, because 50 messages per second isn't much. You can monitor the RabbitMQ server and check its resource consumption to see whether you need to optimize. Optimization is best done with metrics and understanding.

Guaranteeing FIFO order for the same group of work items across multiple queues

I have a question about multi-consumer concurrency.
I want to send work items that come from web requests to distributed RabbitMQ queues.
I just want to be sure about the order of work items across multiple queues (FIFO).
Because these requests come from different users, each user's requests/work items must stay ordered.
I have found this feature under different names on Azure Service Bus and in ActiveMQ (message grouping).
Is there any way to do this in plain RabbitMQ?
I want to guarantee that a customer's requests are processed in order relative to each other.
Each customer may have multiple requests, but the requests for a given customer must be processed in order.
I want to process incoming requests quickly by using multiple consumers on different nodes.
For example, customers 1 to 1000 send over 1 million requests in total.
If I put all of these requests in a single queue, it takes a long time to consume them, so I want to share the processing load between n (5) nodes. The requests of customer X must still be processed in their original sequence.
When working with event-based systems, and especially when using multiple producers and/or consumers, it is important to come to terms with the fact that there usually is no such thing as a guaranteed order of events. And to get a robust system, it is also wise to design the system so the message handlers are idempotent; they should tolerate receiving the same message twice (or more).
There are way too many things that may (and actually should be allowed to) interfere with the order:
The producers may deliver the messages at slightly different paces.
One producer might miss an ack (due to a lost packet) and resend the message.
One consumer may get and process a message, but the ack is lost on the way back, so the message is delivered twice (possibly to another consumer).
Some other service that your handlers depend on might be down, so that you have to reject the message.
That being said, there is one pattern that servicebus-systems like NServicebus use to enforce the order messages are consumed. There are some requirements:
You will need a centralized storage (like a sql-server or document store) that allows for conditional updates; for instance you want to be able to store the sequence number of the last processed message (or how far you have come in the process), but only if the already stored sequence/progress is the right/expected one. Storing the user-id and the progress even for millions of customers should be a very easy operation for most databases.
You make sure the queue is configured with a dead-letter-queue/exchange for retries, and then set your original queue as a dead-letter-queue for that one again.
You set a TTL (for instance 30 seconds) on the retry/dead-letter-queue. This way the messages that appear on the dead-letter-queue will automatically be pushed back to your original queue after some timeout.
When processing your messages you check your storage/database if you are in the right state to handle the message (i.e. the needed previous steps are already done).
If you are ok to handle it you do and update the storage (conditionally!).
If not, you nack the message so that it is thrown onto the dead-letter queue. Basically you are saying "nah, I can't handle this message; there are probably some other messages in the queue that should be handled first".
This way the happy path is to process a great number of messages in the right order.
But if something happens and you get a message out of order, you throw it onto the retry queue (the dead-letter queue), and Rabbit will make sure it gets back into the original queue to be retried at a later stage, but only after a delay.
The beauty of this is that you are able to handle most of the situations that may interfere with processing the message (out-of-order messages, dependent services being down, your handler being shut down in the middle of handling the message) in exactly the same way: by rejecting the message and letting your infrastructure (Rabbit) take care of it being retried after a while.
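The pattern above can be sketched as follows. This is a minimal illustration under several assumptions: an in-memory class stands in for the database with conditional updates, every message carries a per-customer sequence number starting at 1, and the names `ProgressStore`, `handle`, `work` and `retry` are hypothetical.

```python
# Queue arguments for the two queues; "work" and "retry" are hypothetical
# exchange names. Messages rejected from the work queue go to the retry queue,
# wait 30 seconds there, and are then dead-lettered back to the work queue.
WORK_QUEUE_ARGUMENTS = {"x-dead-letter-exchange": "retry"}
RETRY_QUEUE_ARGUMENTS = {"x-message-ttl": 30000, "x-dead-letter-exchange": "work"}


class ProgressStore:
    """In-memory stand-in for a database that supports conditional updates."""

    def __init__(self):
        self._last_seq = {}

    def last_processed(self, customer_id):
        return self._last_seq.get(customer_id, 0)

    def compare_and_set(self, customer_id, expected, new):
        """Store `new` only if the stored value still equals `expected`."""
        if self.last_processed(customer_id) != expected:
            return False
        self._last_seq[customer_id] = new
        return True


def handle(store, customer_id, seq, process):
    """Return "ack" when the message is done, "nack" when it must be retried later."""
    last = store.last_processed(customer_id)
    if seq <= last:
        return "ack"   # duplicate of an already processed message
    if seq != last + 1:
        return "nack"  # an earlier message has not been processed yet
    process()
    # Conditional update: fails if another consumer advanced the progress meanwhile.
    if store.compare_and_set(customer_id, expected=last, new=seq):
        return "ack"
    return "nack"
```

A "nack" result would be translated into a reject or nack with requeue disabled, so that the broker dead-letters the message into the retry queue instead of putting it straight back.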
(Assuming the OP is asking about things like ActiveMQ's "message grouping":)
This isn't currently built in to RabbitMQ AFAIK (it wasn't as of 2013 as per this answer) and I'm not aware of it now (though I haven't kept up lately).
However, RabbitMQ's model of exchanges and queues is very flexible - exchanges and queues can be easily created dynamically (this can be done in other messaging systems but, for example, if you read ActiveMQ documentation or Red Hat AMQ documentation you'll find all of the examples in the user guides are using pre-declared queues in configuration files loaded at system startup - except for RPC-like request/response communication).
Also it is very easy in RabbitMQ for a consumer (i.e., message consuming thread) to consume from multiple queues.
So you could build, on top of RabbitMQ, a system where you got your desired grouping semantics.
One way would be to create dynamic queues: the first time a customer order or a new group of customer orders is seen, a queue is created with a unique name for all messages of that group. That queue name is communicated (via another queue) to a consumer whose sole purpose is to load-balance among the other consumers that are responsible for handling customer order groups. I.e., the load balancer would pull a message saying "new group with queue name XYZ" off its queue, find in the pool of order group consumers one that can take this load, and pass it a message saying "start listening to XYZ".
Another way to do it is with pub/sub and topic routing - each customer order group would get a unique topic - and proceed as above.
RabbitMQ Consistent Hash Exchange Type
We are using RabbitMQ and we have found a plugin for this. It uses a consistent hashing algorithm to distribute messages across queues, so that messages with the same routing key consistently end up in the same queue.
For more information about consistent hashing:
https://en.wikipedia.org/wiki/Consistent_hashing
https://www.youtube.com/watch?v=viaNG1zyx1g
You can find this plugin on the RabbitMQ web page:
Plugin: rabbitmq_consistent_hash_exchange
https://www.rabbitmq.com/plugins.html
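To illustrate the idea behind consistent hashing (not the plugin's actual implementation), here is a small client-side sketch. The `HashRing` class, the use of MD5, and the number of points per queue are all assumptions made for this example; the plugin performs the equivalent routing inside the broker.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Maps routing keys to queues so that the same key always hits the same queue."""

    def __init__(self, queues, points_per_queue=100):
        # Each queue owns several points on the ring to even out the distribution.
        self._ring = sorted(
            (_hash(f"{queue}#{i}"), queue)
            for queue in queues
            for i in range(points_per_queue)
        )
        self._positions = [position for position, _ in self._ring]

    def queue_for(self, routing_key):
        # Walk clockwise to the first point at or after the key's position.
        index = bisect.bisect(self._positions, _hash(routing_key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["q1", "q2", "q3", "q4", "q5"])
print(ring.queue_for("customer-17"))
```

Because a customer id used as routing key always maps to the same queue, one consumer per queue processes that customer's requests in order. When a queue is removed, only the keys that mapped to it move elsewhere.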

Is it possible to buffer messages in exchange until at least one queue is available?

I'm looking for a way to buffer messages received by the exchange until there is at least one queue bound to that exchange.
Is it supported by RabbitMQ?
Maybe there are some workarounds (I didn't find any).
EDIT
My use case:
1. I've got one data producer (which reads real-time data from an external system).
2. I've got one fanout exchange which receives data from the producer.
3. On system startup there might be no consumer, but after a few moments there should be at least one, which creates its own queue and binds it to the exchange from 2.
The problem is the short time between steps 2 and 3, when there are no queues bound to the exchange.
Of course, it's an edge case and after system initialization queues and exchanges are bound and everything works as expected.
Why do queues and bindings have to be created by consumers (not by the producer)? Because I need a flexible setup where I can add consumers without any changes to the code of other components (e.g. the producer).
EDIT 2
I'm processing the output from another system which stores both real-time and historical data. There are cases where I want to read historical data first (on initialization) and then continue to handle real-time data.
I may have misled you by saying that there are multiple consumers. In the case where I need a buffer on the exchange there is only one consumer (which writes everything to a time-series DB as it appears in the queue).
The RabbitMQ team monitors this mailing list and only sometimes answers questions on StackOverflow.
Why do queues and bindings have to be created by consumers (not by the producer)?
Queues and bindings can be created by producers or consumers or both. The requirement is that the exact same arguments are used when creating them if a client application tries to "re-create" a queue or binding. If different arguments are used, a channel-level error will happen.
As you have found, if a producer publishes to an exchange that can't route messages, they will be lost. Olivier's suggestion to use an alternate exchange is a good one, but I recommend you have your producers create queues and bindings as well.
If you mean avoiding throwing away messages because there is no destination configured for them, then yes.
You should look at alternate exchanges.
This assumes that before (or when) you start, the alternate exchange has been created (I would typically go for fanout) and a queue has been bound to it (let's call it notroutedq).
The messages are then not lost; they are stored in notroutedq.
From there you can set up a mechanism that reprocesses the messages in that queue, most likely by reinjecting them into the main exchange, once a given time has passed or when a binding has been added to your main exchange.
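The reinjection step could be sketched as below. This assumes a channel object with pika-style `basic_get`, `basic_publish` and `basic_ack` methods; the function name and the `max_messages` bound are hypothetical. The bound matters: if the main exchange still has no binding, each republished message comes straight back through the alternate exchange, and an unbounded loop would never end.

```python
def reinject_unrouted(channel, unrouted_queue, main_exchange, max_messages=1000):
    """Move up to max_messages from the unrouted queue back into the main exchange."""
    moved = 0
    while moved < max_messages:
        method, properties, body = channel.basic_get(queue=unrouted_queue, auto_ack=False)
        if method is None:
            break  # the queue is empty
        channel.basic_publish(
            exchange=main_exchange,
            routing_key=method.routing_key,  # keep the original routing key
            body=body,
            properties=properties,
        )
        # Ack only after the message has been republished.
        channel.basic_ack(delivery_tag=method.delivery_tag)
        moved += 1
    return moved
```

Such a function would be triggered by a timer or whenever the application learns that a binding has been added.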
-- EDIT --
Thanks for the updated info.
Could you indicate how long typically you'd expect the past messages to be useful to the consumers?
In your description, you mention real-time data and possibly multiple consumers coming and going. Based on that, I'm not sure how much of the data kept in notroutedq would be of value, or how often you'd expect to resend it to the consumers.
The cases I had with alternate exchanges were mostly focused on identifying missing bindings, so that one could easily correct the bindings and reprocess the messages without loss.
If the number of consumers varies through time and the data content is real-time, I'd wonder a bit about the benefit of keeping the data.

Dead lettering messages on an expired queue bound with a consistent hash exchange

I have a situation where I am processing events that are related to specific sources. Each source has a key or ID, which I can use as the hash. Events from each source have to be processed in order, but events from different sources can be parallelized, to achieve horizontal scalability. There will be hundreds of source keys.
I am planning to set the key as part of the routing key when submitting messages to RabbitMQ, and then use the consistent-hash-exchange so that events from the same source are routed to the same queue. I was then thinking of dynamically binding private queues from consumers, with a TTL (so that they are gracefully removed if a consumer is down). At the beginning I will just have 2 or 3 consumers for redundancy, but if I want to scale up due to an increased number of messages, I can just start another consumer.
My question is what happens if a consumer is down and there are messages in its queue? Ideally I would want the messages in the queue to be rerouted back to the exchange, with the consistent-hash-exchange routing them to a different queue (since the original queue would be no longer there).
The RabbitMQ documentation about dead lettering doesn't explicitly mention the scenario of TTL on consumer queues, or what happens when the queue gets deleted.
Does my approach make sense? How can I achieve the consumer fault-tolerance I am looking for while retaining the ordering by a specific routing key?
Note: I know there is even a more subtle race condition if during the process of routing dead lettered messages to the exchange new messages come that were originally routed to the expired queue, which will now be routed to a different consumer, thus ordering will be broken at that specific instance.
There is more than one question to be answered here; I'll try to go through them in the same order.
My question is what happens if a consumer is down and there are messages in its queue?
Outside of the context (rest of the question) - messages stay in the queue until they are ACKed or their TTL expires.
The RabbitMQ documentation about dead lettering doesn't explicitly mention the scenario of TTL on consumer queues, or what happens when the queue gets deleted.
It does say "...The TTL for the message expires...", so basically if a message stays in the queue longer than its TTL, it goes to the DLX. For the queue TTL, see the documentation on queue TTL: it is basically an "expiry time" for the queue. Additionally, if the queue gets deleted, the messages are gone (not taking any mirroring into account, of course).
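The difference between the two kinds of TTL shows up in the queue arguments. The helper below is a hypothetical sketch that only builds the arguments dictionary; the argument names are RabbitMQ's own, while the function name and the example values are assumptions.

```python
def consumer_queue_arguments(queue_expires_ms, message_ttl_ms=None,
                             dead_letter_exchange=None):
    """Build queue arguments for a private consumer queue."""
    # x-expires: the queue itself is deleted after being unused for this long.
    arguments = {"x-expires": queue_expires_ms}
    if message_ttl_ms is not None:
        # x-message-ttl: messages waiting longer than this are expired.
        arguments["x-message-ttl"] = message_ttl_ms
    if dead_letter_exchange is not None:
        # Expired or rejected messages are republished to this exchange.
        arguments["x-dead-letter-exchange"] = dead_letter_exchange
    return arguments


arguments = consumer_queue_arguments(60000, message_ttl_ms=30000,
                                     dead_letter_exchange="events")
```

For messages to be rerouted before the queue disappears, the message TTL would have to be shorter than the queue expiry, since messages left in a deleted queue are gone.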
Now for the "does it make sense" part. For the messages from different sources, I think it's clear: process as much as you can in parallel and that's it. There are (usually) no collisions there.
How can I achieve the consumer fault-tolerance I am looking for while retaining the ordering by a specific routing key?
For sequential processing, you basically need exactly one consumer handling any given source. To monitor this consumer, maybe add a watchdog that starts it again if it crashes, or restarts it if it hangs, etc. Maybe it would also make sense to use the get method instead of consume (AMQP). I can't really recommend for or against this approach because (for me at least) it's quite use-case specific (performance, how often there is a new message, etc.), but I would say that this way it's easier to achieve a "more synchronous" behavior.
And for sure (now referring to what you wrote in the note) you should try and avoid DLX-ing messages (higher TTL etc) if you really want to keep the original order of the sequence (said it redundantly on purpose :) )

Immediate flag in RabbitMQ

I have clients that use an API. The API sends messages to RabbitMQ, and RabbitMQ passes them to workers.
I have to reply to clients if something went wrong: the message wasn't routed to a certain queue, or it wasn't picked up for processing right away (full confirmation).
A task that starts 5-10 seconds later does not make sense.
Accordingly, I must use the mandatory and immediate flags.
I can't increase the number of workers, and I can't run workers on other servers. That's a requirement.
As far as I could find, the immediate flag has not been supported since RabbitMQ 3.0.
The RabbitMQ developers suggest using TTL=0 on a queue instead, but then I will not be able to check the status of a message.
Is there any way to change that behavior? Please share your experience of how you solved problems like this.
Thank you.
I'm not sure, but after reading your original question in Russian, it seems that using both publisher and consumer confirms may be what you want. See the last three paragraphs of this answer.
As you want to get the result of a published message back from your worker, it looks like the RPC pattern is what you want. See the RabbitMQ RPC tutorial. Pick the section for the programming language you are most comfortable with; the overall concept is the same. You may also find Direct reply-to useful.
It's not the same as the immediate flag functionality, but if all your publishers operate in the immediate scenario, AMQP might not be the best choice for this kind of task. Immediate means "deliver this message right now or burn in hell", and you might end up in a situation where you publish more than you can process. In such cases RPC plus a response timeout on the application side (e.g. a socket timeout) may be a good choice. But it doesn't work well for non-idempotent RPC calls, because the message may still be processed after the timeout, so you may want to use a per-queue or per-message TTL (or set a queue length limit). If the message gets dead-lettered, you can pick it up from there (in case you need that for some reason).
TL;DR
As for what can go wrong: "something" can go wrong on different levels, which for simplicity we define as:
before RabbitMQ, like a failure of the sending application or network problems;
inside RabbitMQ, say, a missing destination queue, message timeout, queue length limit, some hard and unexpected internal error;
after RabbitMQ, in most cases - messages processing application error or some third-party services like data persistence or caching layer outage.
Some errors like network outage or hardware error are a bit epic and are not a subject of this q/a.
The typical approach to guaranteed message delivery is to use publisher confirms or transactions (which are slower). Once you get a confirm, it means that RabbitMQ received your message and, if it has a route, placed it in a queue. If not, it is dropped OR, if the mandatory flag is set, returned with the basic.return method.
For consumers it's similar: after basic.consume/basic.get, once the client has acked the message, it is considered received and is removed from the queue.
So when you use confirms on both ends, you are protected from message loss (leaving aside the possibility of a bug in RabbitMQ itself).
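On the publishing side, mapping confirms and returns onto a positive or negative API response could look like the sketch below. It assumes a blocking, pika-style channel on which publisher confirms have already been enabled, so that `basic_publish` raises an exception for a returned or nacked message. The exception classes are passed in as parameters to keep the sketch self-contained; with pika they would presumably be `pika.exceptions.UnroutableError` and `pika.exceptions.NackError`. The function name and the response format are hypothetical.

```python
def publish_and_report(channel, exchange, routing_key, body,
                       unroutable_error, nack_error):
    """Publish with the mandatory flag and translate the outcome into an API response."""
    try:
        channel.basic_publish(
            exchange=exchange,
            routing_key=routing_key,
            body=body,
            mandatory=True,
        )
    except unroutable_error:
        # The broker returned the message: no queue matched the routing key.
        return {"ok": False, "reason": "message could not be routed to a queue"}
    except nack_error:
        # The broker negatively acknowledged the publish.
        return {"ok": False, "reason": "broker did not accept the message"}
    return {"ok": True, "reason": None}
```

This covers only "the message reached a queue"; whether a worker picked it up in time still has to be handled separately, for example with a TTL and dead-lettering as mentioned above.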
Bogdan, thank you for your reply.
It seems I did not express my thoughts clearly enough.
The scheme may look like this. Each component of the system must do what it is supposed to do :)
The idea is to make every component simpler.
How a task is performed:
Clients send requests to the HTTP API and must get a response like this:
Positive: the task has been put into the queue
Negative: a response with an error and a reason
When I was talking about confirmation, I meant that I must know that a message was delivered (if there are no free workers, RabbitMQ can remove the message), and the client must be notified.
If a sent message couldn't be delivered to a certain queue, the client must be notified.
How a message is handled:
Messages are sent for processing.
The processing status is written to HeartBeat.
Status:
Clients obtain the status from HeartBeat themselves and then decide what to do.
I'm not sure that RPC would be useful for us, since RPC means that clients must wait for a response from the server, and tasks may run for a long time. It adds excess coupling between clients and servers and additional logic on the client side.
A limited queue size may not be useful either.
It is possible for the queue size to be greater than the number of workers (a problem in configuration or in the defined settings).
Then the idea with 5-10 seconds doesn't make sense.
TTL isn't useful because of:
Setting the TTL to 0 causes messages to be expired upon reaching a
queue unless they can be delivered to a consumer immediately. Thus
this provides an alternative to basic.publish's immediate flag, which
the RabbitMQ server does not support. Unlike that flag, no
basic.returns are issued, and if a dead letter exchange is set then
messages will be dead-lettered.
Direct reply-to:
The RPC server will then see a reply-to property with a generated
name. It should publish to the default exchange ("") with the routing
key set to this value (i.e. just as if it were sending to a reply
queue as usual). The message will then be sent straight to the client
consumer.
Then I would not be able to route messages.
So, I'm sorry. I may be floundering in the terminology, as I'm new to AMQP and RabbitMQ.