What is the difference between correlation id and delivery tag - rabbitmq

I've searched for a good explanation for the difference between these two,
but didn't really find one.
What I know till now is that:
correlation id is a string (Guid which was converted to string), and delivery tag is an int.
correlation id is unique for each message, and delivery tag is unique only in
the channel (The channel is the scope).
That's fine....but what is the difference in the purposes? why do we need two identifiers for a message?

The two identifiers exist at two different conceptual layers of communication, and have different properties that are useful in each case. While a protocol could probably be designed that had one identifier serving both purposes, keeping them separate makes both implementations simpler.
Delivery tags
Part of the AMQP communication layer, built into RabbitMQ itself.
Example use: a consumer process can acknowledge that a message has been processed and can be permanently discarded on the broker (RabbitMQ server).
Automatically assigned within the open channel for every message delivered.
Must be unique within that channel in order for the protocol to function correctly. Does not need to be unique across different channels, so a simple incrementing integer is simple to implement.
The same message may be delivered at different times with different delivery tags, or even exist on multiple queues and be delivered simultaneously to different consumers.
Correlation IDs
Part of the logic of the application that is using RabbitMQ, not the broker itself.
Example use: using a matching correlation ID and "reply to" on two separate messages, which the application wants to treat as a request and a response in an RPC pattern.
Needs to be manually added when the message is first created, and is optional.
Not guaranteed to be unique by the protocol, which just sees it as an arbitrary string. It is up to an application to generate in a way that is sufficiently unlikely to collide for its use case, such as an appropriate form of UUID.
Will stay the same every time a message is delivered, no matter how many times it is forwarded or duplicated into multiple queues.

Correlation ID is generally used in the context of RabbitMQ when I want to see a synchronous behavior in which a message is sent and in response to it another sender will send a response but will have the correlationID in the reply-to tag . The common pattern which is replicated in RabbitMQ is the RPC call which is more like a Synchronous messaging.
Delivery Tag is however an indicator of the delivery of the message per channel and generally comes in scope when Acknowledged Delivery model is being followed.
Both have completely different purpose and are not message identifier as such.

Related

Mass Transit: ensure message processing order when there are different message types

I'm new to Mass Transit and I would like to understand if it can helps with my scenario.
I'm building a sample application implemented with a CQRS event sourcing architecture and I need a service bus in order to dispatch the events created by the command stack to the query stack denormalizers.
Let's suppose of having a single aggregate in our domain, let's call it Photo, and two different domain events: PhotoUploaded and PhotoArchived.
Given this scenario, we have two different message types and the default Mass Transit behaviour is creating two different RabbitMq exchanges: one for the PhotoUploaded message type and the other for the PhotoArchived message type.
Let's suppose of having a single denormalizer called PhotoDenormalizer: this service will be a consumer of both message types, because the photo read model must be updated whenever a photo is uploaded or archived.
Given the default Mass Transit topology, there will be two different exchanges so the message processing order cannot be guaranteed between events of different types: the only guarantee that we have is that all the events of the same type will be processed in order, but we cannot guarantee the processing order between events of different type (notice that, given the events semantic of my example, the processing order matters).
How can I handle such a scenario ? Is Mass Transit suitable with my needs ? Am I completely missing the point with domain events dispatching ?
Disclaimer: this is not an answer to your question, but rather a preventive message why you should not do what you are planning to do.
Whilst message brokers like RMQ and messaging middleware libraries like MassTransit are perfect for integration, I strongly advise against using message brokers for event-sourcing. I can refer to my old answer Event-sourcing: when (and not) should I use Message Queue? that explains the reasons behind it.
One of the reasons you have found yourself - event order will never be guaranteed.
Another obvious reason is that building read models from events that are published via a message broker effectively removes the possibility for replay and to build new read models that would need to start processing events from the beginning of time, but all they get are events that are being published now.
Aggregates form transactional boundaries, so every command needs to guarantee that it completes within one transaction. Whilst MT supports the transaction middleware, it only guarantees that you get a transaction for dependencies that support them, but not for context.Publish(#event) in the consumer body, since RMQ doesn't support transactions. You get a good chance of committing changes and not getting events on the read side. So, the rule of thumb for event stores that you should be able to subscribe to the stream of changes from the store, and not publish events from your code, unless those are integration events and not domain events.
For event-sourcing, it is crucial that each read-model keeps its own checkpoint in the stream of events it is projecting. Message brokers don't give you that kind of power since the "checkpoint" is actually your queue and as soon as the message is gone from the queue - it is gone forever, there's no coming back.
Concerning the actual question:
You can use the message topology configuration to set the same entity name for different messages and then they'll be published to the same exchange, but that falls to the "abuse" category like Chris wrote on that page. I haven't tried that but you definitely can experiment. Message CLR type is part of the metadata, so there shouldn't be deserialization issues.
But again, putting messages in the same exchange won't give you any ordering guarantees, except the fact that all messages will land in one queue for the consuming service.
You will have to at least set the partitioning filter based on your aggregate id, to prevent multiple messages for the same aggregate from being processed in parallel. That, by the way, is also useful for integration. That's how we do it:
void AddHandler<T>(Func<ConsumeContext<T>, string> partition) where T : class
=> ep.Handler<T>(
c => appService.Handle(c, aggregateStore),
hc => hc.UsePartitioner(8, partition));
AddHandler<InternalCommands.V1.Whatever>(c => c.Message.StreamGuid);

To be sure about concurrency, same group of works in multiple queues (FIFO)

I have a question about multi consumer concurrency.
I want to send works to rabbitmq that comes from web request to distributed queues.
I just want to be sure about order of works in multiple queues (FIFO).
Because this request comes from different users eech user requests/works must be ordered.
I have found this feature with different names on Azure ServiceBus and ActiveMQ message grouping.
Is there any way to do this in pretty RabbitMQ ?
I want to quaranty that customer's requests must be ordered each other.
Each customer may have multiple requests but those requests for that customer must be processed in order.
I desire to process quickly incoming requests with using multiple consumer on different nodes.
For example different customers 1 to 1000 send requests over 1 millions.
If I put this huge request in only one queue it takes a lot of time to consume. So I want to share this process load between n (5) node. For customer X 's requests must be in same sequence for processing
When working with event-based systems, and especially when using multiple producers and/or consumers, it is important to come to terms with the fact that there usually is no such thing as a guaranteed order of events. And to get a robust system, it is also wise to design the system so the message handlers are idempotent; they should tolerate to get the same message twice (or more).
There are way to many things that may (and actually should be allowed to) interfere with the order;
The producers may deliver the messages in a slightly different pace
One producer might miss an ack (due to a missed package) and will resend the message
One consumer may get and process a message, but the ack is lost on the way back, so the message is delivered twice (to another consumer).
Some other service that your handlers depend on might be down, so that you have to reject the message.
That being said, there is one pattern that servicebus-systems like NServicebus use to enforce the order messages are consumed. There are some requirements:
You will need a centralized storage (like a sql-server or document store) that allows for conditional updates; for instance you want to be able to store the sequence number of the last processed message (or how far you have come in the process), but only if the already stored sequence/progress is the right/expected one. Storing the user-id and the progress even for millions of customers should be a very easy operation for most databases.
You make sure the queue is configured with a dead-letter-queue/exchange for retries, and then set your original queue as a dead-letter-queue for that one again.
You set a TTL (for instance 30 seconds) on the retry/dead-letter-queue. This way the messages that appear on the dead-letter-queue will automatically be pushed back to your original queue after some timeout.
When processing your messages you check your storage/database if you are in the right state to handle the message (i.e. the needed previous steps are already done).
If you are ok to handle it you do and update the storage (conditionally!).
If not - you nack the message, so that it is thrown on the dead-letter queue. Basically you are saying "nah - I can't handle this message, there are probably some other message in the queue that should be handled first".
This way the happy-path is to process a great number of messages in the right order.
But if something happens and a you get a message out of band, you will throw it on the retry-queue (the dead-letter-queue) and Rabbit will make sure it will get back in the queue to be retried at a later stage. But only after a delay.
The beauty of this is that you are able to handle most of the situations that may interfere with processing the message (out of order messages, dependent services being down, your handler being shut down in the middle of handling the message) in exact the same way; by rejecting the message and letting your infrastructure (Rabbit) take care of it being retried after a while.
(Assuming the OP is asking about things like ActiveMQs "message grouping:)
This isn't currently built in to RabbitMQ AFAIK (it wasn't as of 2013 as per this answer) and I'm not aware of it now (though I haven't kept up lately).
However, RabbitMQ's model of exchanges and queues is very flexible - exchanges and queues can be easily created dynamically (this can be done in other messaging systems but, for example, if you read ActiveMQ documentation or Red Hat AMQ documentation you'll find all of the examples in the user guides are using pre-declared queues in configuration files loaded at system startup - except for RPC-like request/response communication).
Also it is very easy in RabbitMQ for a consumer (i.e., message consuming thread) to consume from multiple queues.
So you could build, on top of RabbitMQ, a system where you got your desired grouping semantics.
One way would be to create dynamic queues: The first time a customer order was seen or a new group of customer orders a queue would be created with a unique name for all messages for that group - that queue name would be communicated (via another queue) to a consumer who's sole purpose was to load-balance among other consumers that were responsible for handling customer order groups. I.e., the load-balancer would pull off of its queue a message saying "new group with queue name XYZ" and it would find in a pool of order group consumer a consumer which could take this load and pass it a message saying "start listening to XYZ".
Another way to do it is with pub/sub and topic routing - each customer order group would get a unique topic - and proceed as above.
RabbitMQ Consistent Hash Exchange Type
We are using RabbitMQ and we have found a plugin. It use Consistent Hashing algorithm to distribute messages in order to consistent keys.
For more information about Consistent Hashing ;
https://en.wikipedia.org/wiki/Consistent_hashing
https://www.youtube.com/watch?v=viaNG1zyx1g
You can find this plugin from rabbitmq web page
plugin : rabbitmq_consistent_hash_exchange
https://www.rabbitmq.com/plugins.html

Is it possible to buffer messages in exchange until at least one queue is available?

I'm looking for a way to buffer messages received by the exchange as long as there is at least one queue bind to that exchange.
Is it supported by RabbitMQ?
Maybe there are some workarounds (I didn't find any).
EDIT
My use case:
I've got one data producer (which reads real-time data from an external system)
I've got one fanout exchange which receives data from the producer
On system startup, there might be no consumer, but after a few moments, there should be at least one which creates his own queue and binds it to the exchange from 2.
The problem is this short time between step 2. and 3. where there are no queues bound to the exchange created in step 1.
Of course, it's an edge case and after system initialization queues and exchanges are bound and everything works as expected.
Why queues and bindings has to be created by consumers (not by the producer)? Because I need a flexible setup where I can add consumers without any changes in other components code (e.g. producer).
EDIT 2
I'm processing the output from another system which stores both real-time and historical data. There are the cases where I want to read historical data first (on initialization) and then continue to handle real-time data.
I may mislead you by saying that there are multiple consumers. In the case where I need a buffer on exchange there is only one consumer (which writes everything to time series DB as it appears in queue).
The RabbitMQ team monitors this mailing list and only sometimes answers questions on StackOverflow.
Why queues and bindings has to be created by consumers (not by the producer)?
Queues and bindings can be created by producers or consumers or both. The requirement is that the exact same arguments are used when creating them if a client application tries to "re-create" a queue or binding. If different arguments are used, a channel-level error will happen.
As you have found, if a producer publishes to an exchange that can't route messages, they will be lost. Olivier's suggestion to use an alternate exchange is a good one, but I recommend you have your producers create queues and bindings as well.
If you mean to avoid throwing away messages because there is no destination configured for it, yes.
You should look at alternate exchange.
This assume that before (or when) you start (or when), the alternate exchange is created (would typically go for fanout) and a queue is binded to it (let's call it notroutedq).
So the messages are not lost, they will be stored in notroutedq.
From there you can possibly setup a mechanism that would reprocess messages in that queue - reinjecting them into the main exchange most likely - once a given time has passed or when a binding has been added to your main exchange.
-- EDIT --
Thanks for the updated info.
Could you indicate how long typically you'd expect the past messages to be useful to the consumers?
In your description, you mention real-time data and possibly multiple consumers coming and going. Based on that, I'm not sure how much of the data kept in the notroutedq would be of value, and with which frequency you'd expect to resend them to the consumers.
The cases I had with alternate exchange where mostly focused on identifying missing bindings, so that one could easily correct the bindings and reprocess the messages without loss.
If the number of consumers varies through time and the data content is real-time, I'd wonder a bit about the benefit of keeping the data.

rabbitmq: can consumer persist message change before nack?

Before a consumer nacks a message, is there any way the consumer can modify the message's state so that when the consumer consumes it upon redelivery, it sees that changed state. I'd rather not reject + reenqueue new message, but please let me know if that's the only way to accomplish this.
My goal is to determine how many times specific messages are being redelivered. I see two ways of doing this:
(1) On the message itself as described above. The message would be a container of basic stats and the application payload message.
(2) In some external storage. We would uniquely identify the message by the message id that we set.
I know 2 is possible, but my question is if 1 is possible.
There is no way to do (1) like you want. You would need to change the message, thus the message would become another message. If you want to do something like that (and it's possible that you meant this with I'd rather not reject + reenqueue new message) - you should ACK the message, increment one field in it and publish it again (again, maybe this is what you meant when you said reenqueue it). So your message payload would have some ID, counter, and again (obviously different) payload that is the content.
Definitvly much better way is (2) for multiple reasons:
it does not interfere with business logic, that is this diagnostic part is isolated
you are leaving re-queueing to rabbitmq (as you are supposed to do), meaning that you are not worrying about losing messages and handling some message meta info which has no use for you business logic
it's actually supposed to be used - the ACKing and NACKing, that's why it's in the AMQP specification
since you do need the number of how many times specific messages have been redelivered, you have it somewhere externally, meaning that it's independent of (rabbitmq's) message persistence, lifetime, potentially queue durability mirroring etc
Even if this question was marked as solved some time ago, I want to mention that there is a way at least for the redelivery. It might be integrated after the original answer. There is a different type of queues in RabbitMQ called Quorum queues.
Quorum queues offer the option to set redelivery limit:
Quorum queues support poison message handling via a redelivery limit. This feature is currently unique to Quorum queues.
In order to archive this, RabbitMQ is counting the numbers of deliveries in the header. The header attribute is called: x-delivery-count

Message types : how much information should messages contain?

We are currently starting to broadcast events from one central applications to other possibly interested consumer applications, and we have different options among members of our team about how much we should put in our published messages.
The general idea/architecture is the following :
In the producer application :
the user interacts with some entities (Aggregate Roots in the DDD sense) that can be created/modified/deleted
Based on what is happening, Domain Events are raised (ex : EntityXCreated, EntityYDeleted, EntityZTransferred etc ... i.e. not only CRUD, but mostly )
Raised events are translated/converted into messages that we send to a RabbitMQ Exchange
in RabbitMQ (we are using RabbitMQ but I believe the question is actually technology-independent):
we define a queue for each consuming application
bindings connect the exchange to the consumer queues (possibly with message filtering)
In the consuming application(s)
application consumes and process messages from its queue
Based on Enterprise Integration Patterns we are trying to define the Canonical format for our published messages, and are hesitating between 2 approaches :
Minimalist messages / event-store-ish : for each event published by the Domain Model, generate a message that contains only the parts of the Aggregate Root that are relevant (for instance, when an update is done, only publish information about the updated section of the aggregate root, more or less matching the process the end-user goes through when using our application)
Pros
small message size
very specialized message types
close to the "Domain Events"
Cons
problematic if delivery order is not guaranteed (i.e. what if Update message is received before Create message ? )
consumers need to know which message types to subscribe to (possibly a big list / domain knowledge is needed)
what if consumer state and producer state get out of sync ?
how to handle new consumer that registers in the future, but does not have knowledge of all the past events
Fully-contained idempotent-ish messages : for each event published by the Domain Model, generate a message that contains a full snapshot of the Aggregate Root at that point in time, hence handling in reality only 2 kind of messages "Create or Update" and "Delete" (+metadata with more specific info if necessary)
Pros
idempotent (declarative messages stating "this is what the truth is like, synchronize yourself however you can")
lower number of message formats to maintain/handle
allow to progressively correct synchronization errors of consumers
consumer automagically handle new Domain Events as long as the resulting message follows canonical data model
Cons
bigger message payload
less pure
Would you recommend an approach over the other ?
Is there another approach we should consider ?
Is there another approach we should consider ?
You might also consider not leaking information out of the service acting as the technical authority for that part of the business
Which roughly means that your events carry identifiers, so that interested parties can know that an entity of interest has changed, and can query the authority for updates to the state.
for each event published by the Domain Model, generate a message that contains a full snapshot of the Aggregate Root at that point in time
This also has the additional Con that any change to the representation of the aggregate also implies a change to the message schema, which is part of the API. So internal changes to aggregates start rippling out across your service boundaries. If the aggregates you are implementing represent a competitive advantage to your business, you are likely to want to be able to adapt quickly; the ripples add friction that will slow your ability to change.
what if consumer state and producer state get out of sync ?
As best I can tell, this problem indicates a design error. If a consumer needs state, which is to say a view built from the history of an aggregate, then it should be fetching that view from the producer, rather than trying to assemble it from a collection of observed messages.
That is to say, if you need state, you need history (complete, ordered). All a single event really tells you is that the history has changed, and you can evict your previously cached history.
Again, responsiveness to change: if you change the implementation of the producer, and consumers are also trying to cobble together their own copy of the history, then your changes are rippling across the service boundaries.