Message types : how much information should messages contain? - rabbitmq

We are currently starting to broadcast events from one central applications to other possibly interested consumer applications, and we have different options among members of our team about how much we should put in our published messages.
The general idea/architecture is the following :
In the producer application :
the user interacts with some entities (Aggregate Roots in the DDD sense) that can be created/modified/deleted
Based on what is happening, Domain Events are raised (ex : EntityXCreated, EntityYDeleted, EntityZTransferred etc ... i.e. not only CRUD, but mostly )
Raised events are translated/converted into messages that we send to a RabbitMQ Exchange
in RabbitMQ (we are using RabbitMQ but I believe the question is actually technology-independent):
we define a queue for each consuming application
bindings connect the exchange to the consumer queues (possibly with message filtering)
In the consuming application(s)
application consumes and process messages from its queue
Based on Enterprise Integration Patterns we are trying to define the Canonical format for our published messages, and are hesitating between 2 approaches :
Minimalist messages / event-store-ish : for each event published by the Domain Model, generate a message that contains only the parts of the Aggregate Root that are relevant (for instance, when an update is done, only publish information about the updated section of the aggregate root, more or less matching the process the end-user goes through when using our application)
Pros
small message size
very specialized message types
close to the "Domain Events"
Cons
problematic if delivery order is not guaranteed (i.e. what if Update message is received before Create message ? )
consumers need to know which message types to subscribe to (possibly a big list / domain knowledge is needed)
what if consumer state and producer state get out of sync ?
how to handle new consumer that registers in the future, but does not have knowledge of all the past events
Fully-contained idempotent-ish messages : for each event published by the Domain Model, generate a message that contains a full snapshot of the Aggregate Root at that point in time, hence handling in reality only 2 kind of messages "Create or Update" and "Delete" (+metadata with more specific info if necessary)
Pros
idempotent (declarative messages stating "this is what the truth is like, synchronize yourself however you can")
lower number of message formats to maintain/handle
allow to progressively correct synchronization errors of consumers
consumer automagically handle new Domain Events as long as the resulting message follows canonical data model
Cons
bigger message payload
less pure
Would you recommend an approach over the other ?
Is there another approach we should consider ?

Is there another approach we should consider ?
You might also consider not leaking information out of the service acting as the technical authority for that part of the business
Which roughly means that your events carry identifiers, so that interested parties can know that an entity of interest has changed, and can query the authority for updates to the state.
for each event published by the Domain Model, generate a message that contains a full snapshot of the Aggregate Root at that point in time
This also has the additional Con that any change to the representation of the aggregate also implies a change to the message schema, which is part of the API. So internal changes to aggregates start rippling out across your service boundaries. If the aggregates you are implementing represent a competitive advantage to your business, you are likely to want to be able to adapt quickly; the ripples add friction that will slow your ability to change.
what if consumer state and producer state get out of sync ?
As best I can tell, this problem indicates a design error. If a consumer needs state, which is to say a view built from the history of an aggregate, then it should be fetching that view from the producer, rather than trying to assemble it from a collection of observed messages.
That is to say, if you need state, you need history (complete, ordered). All a single event really tells you is that the history has changed, and you can evict your previously cached history.
Again, responsiveness to change: if you change the implementation of the producer, and consumers are also trying to cobble together their own copy of the history, then your changes are rippling across the service boundaries.

Related

RabbitMQ Architecture Gut Check

So I'm thinking of using RabbitMQ to send messages between all the varied apps in our organization. In the attached image is essentially the picture in my mind of how things would work.
So the message goes into the exchange, and splits out into three queues.
Payloads are always JSON text.
The consumers are long-running windows services whose only job is to sit and listen for messages destined for their particular application.When a message comes in, they look at the header to determine how this payload JSON should be interpreted, and which REST endpoint it should be sent to. e.g., "When I see a 'WORK_ORDER_COMPLETE' header I am going to parse this as a WorkOrderCompleteDto and send it as a POST to the CompletedWorkOrder WebAPI method at timelabor-api.mycompany.com. If the API returns other than 200, I reject the message and let rabbit handle it. If I get a 200 back from the API, then I ack the message to rabbit."
Then end applications are simply our internal line-of-business apps that we use for inventory, billing, etc. Those applications are then responsible for performing their respective function (decrementing inventory, creating a billing record, yadda yadda.
Does this in any way make a sensible understanding of a proper way to use Rabbit?
Conceptually, I believe you may be relying on RabbitMQ to do things that your application needs to do.
The assumption of the architecture seems to be that each message is processed by each of your consuming applications totally in a vacuum. What this means is that you don't care that a message processed successfully by Billing_App ultimately failed with Inventory_App. Maybe this is true, but in my experience, it isn't.
If the end goal is to achieve some consistent state in the overall data, you're going to need a some supervisory component orchestrating and monitoring the various operations to ensure that the state is consistent. This means, in effect, that your statement about rejecting a message back to RabbitMQ means you have a bit more thought to put into what happens when something fails.
I would focus on identifying some UML activity diagrams that describe your behavior and how it achieves the end-state, and use that as a guide to determine how the orchestration of your application needs to be designed.

Mass Transit: ensure message processing order when there are different message types

I'm new to Mass Transit and I would like to understand if it can helps with my scenario.
I'm building a sample application implemented with a CQRS event sourcing architecture and I need a service bus in order to dispatch the events created by the command stack to the query stack denormalizers.
Let's suppose of having a single aggregate in our domain, let's call it Photo, and two different domain events: PhotoUploaded and PhotoArchived.
Given this scenario, we have two different message types and the default Mass Transit behaviour is creating two different RabbitMq exchanges: one for the PhotoUploaded message type and the other for the PhotoArchived message type.
Let's suppose of having a single denormalizer called PhotoDenormalizer: this service will be a consumer of both message types, because the photo read model must be updated whenever a photo is uploaded or archived.
Given the default Mass Transit topology, there will be two different exchanges so the message processing order cannot be guaranteed between events of different types: the only guarantee that we have is that all the events of the same type will be processed in order, but we cannot guarantee the processing order between events of different type (notice that, given the events semantic of my example, the processing order matters).
How can I handle such a scenario ? Is Mass Transit suitable with my needs ? Am I completely missing the point with domain events dispatching ?
Disclaimer: this is not an answer to your question, but rather a preventive message why you should not do what you are planning to do.
Whilst message brokers like RMQ and messaging middleware libraries like MassTransit are perfect for integration, I strongly advise against using message brokers for event-sourcing. I can refer to my old answer Event-sourcing: when (and not) should I use Message Queue? that explains the reasons behind it.
One of the reasons you have found yourself - event order will never be guaranteed.
Another obvious reason is that building read models from events that are published via a message broker effectively removes the possibility for replay and to build new read models that would need to start processing events from the beginning of time, but all they get are events that are being published now.
Aggregates form transactional boundaries, so every command needs to guarantee that it completes within one transaction. Whilst MT supports the transaction middleware, it only guarantees that you get a transaction for dependencies that support them, but not for context.Publish(#event) in the consumer body, since RMQ doesn't support transactions. You get a good chance of committing changes and not getting events on the read side. So, the rule of thumb for event stores that you should be able to subscribe to the stream of changes from the store, and not publish events from your code, unless those are integration events and not domain events.
For event-sourcing, it is crucial that each read-model keeps its own checkpoint in the stream of events it is projecting. Message brokers don't give you that kind of power since the "checkpoint" is actually your queue and as soon as the message is gone from the queue - it is gone forever, there's no coming back.
Concerning the actual question:
You can use the message topology configuration to set the same entity name for different messages and then they'll be published to the same exchange, but that falls to the "abuse" category like Chris wrote on that page. I haven't tried that but you definitely can experiment. Message CLR type is part of the metadata, so there shouldn't be deserialization issues.
But again, putting messages in the same exchange won't give you any ordering guarantees, except the fact that all messages will land in one queue for the consuming service.
You will have to at least set the partitioning filter based on your aggregate id, to prevent multiple messages for the same aggregate from being processed in parallel. That, by the way, is also useful for integration. That's how we do it:
void AddHandler<T>(Func<ConsumeContext<T>, string> partition) where T : class
=> ep.Handler<T>(
c => appService.Handle(c, aggregateStore),
hc => hc.UsePartitioner(8, partition));
AddHandler<InternalCommands.V1.Whatever>(c => c.Message.StreamGuid);

To be sure about concurrency, same group of works in multiple queues (FIFO)

I have a question about multi consumer concurrency.
I want to send works to rabbitmq that comes from web request to distributed queues.
I just want to be sure about order of works in multiple queues (FIFO).
Because this request comes from different users eech user requests/works must be ordered.
I have found this feature with different names on Azure ServiceBus and ActiveMQ message grouping.
Is there any way to do this in pretty RabbitMQ ?
I want to quaranty that customer's requests must be ordered each other.
Each customer may have multiple requests but those requests for that customer must be processed in order.
I desire to process quickly incoming requests with using multiple consumer on different nodes.
For example different customers 1 to 1000 send requests over 1 millions.
If I put this huge request in only one queue it takes a lot of time to consume. So I want to share this process load between n (5) node. For customer X 's requests must be in same sequence for processing
When working with event-based systems, and especially when using multiple producers and/or consumers, it is important to come to terms with the fact that there usually is no such thing as a guaranteed order of events. And to get a robust system, it is also wise to design the system so the message handlers are idempotent; they should tolerate to get the same message twice (or more).
There are way to many things that may (and actually should be allowed to) interfere with the order;
The producers may deliver the messages in a slightly different pace
One producer might miss an ack (due to a missed package) and will resend the message
One consumer may get and process a message, but the ack is lost on the way back, so the message is delivered twice (to another consumer).
Some other service that your handlers depend on might be down, so that you have to reject the message.
That being said, there is one pattern that servicebus-systems like NServicebus use to enforce the order messages are consumed. There are some requirements:
You will need a centralized storage (like a sql-server or document store) that allows for conditional updates; for instance you want to be able to store the sequence number of the last processed message (or how far you have come in the process), but only if the already stored sequence/progress is the right/expected one. Storing the user-id and the progress even for millions of customers should be a very easy operation for most databases.
You make sure the queue is configured with a dead-letter-queue/exchange for retries, and then set your original queue as a dead-letter-queue for that one again.
You set a TTL (for instance 30 seconds) on the retry/dead-letter-queue. This way the messages that appear on the dead-letter-queue will automatically be pushed back to your original queue after some timeout.
When processing your messages you check your storage/database if you are in the right state to handle the message (i.e. the needed previous steps are already done).
If you are ok to handle it you do and update the storage (conditionally!).
If not - you nack the message, so that it is thrown on the dead-letter queue. Basically you are saying "nah - I can't handle this message, there are probably some other message in the queue that should be handled first".
This way the happy-path is to process a great number of messages in the right order.
But if something happens and a you get a message out of band, you will throw it on the retry-queue (the dead-letter-queue) and Rabbit will make sure it will get back in the queue to be retried at a later stage. But only after a delay.
The beauty of this is that you are able to handle most of the situations that may interfere with processing the message (out of order messages, dependent services being down, your handler being shut down in the middle of handling the message) in exact the same way; by rejecting the message and letting your infrastructure (Rabbit) take care of it being retried after a while.
(Assuming the OP is asking about things like ActiveMQs "message grouping:)
This isn't currently built in to RabbitMQ AFAIK (it wasn't as of 2013 as per this answer) and I'm not aware of it now (though I haven't kept up lately).
However, RabbitMQ's model of exchanges and queues is very flexible - exchanges and queues can be easily created dynamically (this can be done in other messaging systems but, for example, if you read ActiveMQ documentation or Red Hat AMQ documentation you'll find all of the examples in the user guides are using pre-declared queues in configuration files loaded at system startup - except for RPC-like request/response communication).
Also it is very easy in RabbitMQ for a consumer (i.e., message consuming thread) to consume from multiple queues.
So you could build, on top of RabbitMQ, a system where you got your desired grouping semantics.
One way would be to create dynamic queues: The first time a customer order was seen or a new group of customer orders a queue would be created with a unique name for all messages for that group - that queue name would be communicated (via another queue) to a consumer who's sole purpose was to load-balance among other consumers that were responsible for handling customer order groups. I.e., the load-balancer would pull off of its queue a message saying "new group with queue name XYZ" and it would find in a pool of order group consumer a consumer which could take this load and pass it a message saying "start listening to XYZ".
Another way to do it is with pub/sub and topic routing - each customer order group would get a unique topic - and proceed as above.
RabbitMQ Consistent Hash Exchange Type
We are using RabbitMQ and we have found a plugin. It use Consistent Hashing algorithm to distribute messages in order to consistent keys.
For more information about Consistent Hashing ;
https://en.wikipedia.org/wiki/Consistent_hashing
https://www.youtube.com/watch?v=viaNG1zyx1g
You can find this plugin from rabbitmq web page
plugin : rabbitmq_consistent_hash_exchange
https://www.rabbitmq.com/plugins.html

RabbitMQ Message States

I'm working with RabbitMQ and I'd like to have multiple consumers doing different things for the same message, with this message being exactly in one queue.Each consumer would work on his own, and in the moment the consumer ends with his part, it marks the message as having completed phase "x" , when all the phases are completed for one message, then use the method a basicAck() to remove our message from the queue.
I suspect this to be impossible, if so, I would face this in other way. Having multiple queues with the same message ( using an exchange), each queue with a different consumer , which would communicate with with a Server. This server would then work with a database and checking/updating the completed phases. When all the phases are completed, log it in some way.
But this workaround seems exceedingly unefficient, I'd like to skip it if posssible.
Could it be posssible to set "states" or "phases" to a message in rabbitMQ?
So, first of all, in the context you're talking about, a "message" is an order to do some unit of work.
The first part of your question, by referring to "marking the message" treats the message as a stateful object. This is incorrect. Once a message is produced, it is immutable, meaning no changes are permitted to it. If you violate, or attempt to violate this principle, you have made an excursion beyond the realm of sound design.
So, let's reframe. In a properly-archtiected message-oriented system, a message can represent either a command ("do something") or an event ("something happened"). Note that sometimes we can call a reply message (something sent in response to a command) a third category, but it's really a sub-category of event.
Thus, we are led to the possibility of having (a) one message going to one queue, to be picked up by one consumer, or (b) one message going to many queues, to be picked up by many consumers. You take (a) and (b) to compose complex system behaviors that evolve over time with the execution of each of these small behaviors, and suddenly you have a complex system.
Messages do, in fact, have state. Their state is "processed" or "unprocessed", as appropriate. That is the limit to their statefulness.
Bottom Line
Your situation describes a series of activities (what each consumer does) being acted upon some sort of shared state among the activities. The role of messages and the message broker is to assist in the orchestration of these activities, by providing instruction on what to do (via commands) and what took place (via events). Messages themselves cannot be the shared state. So, you still need some sort of a database or other means to persist the state of your system. There is no way to avoid this.

Windows Service Bus Point-to-Point Communications to Reduce Broadcasts

I am using Windows Service Bus 1.0 to communicate between different processes, each context event stream exists on the bus as a topic.
Using the service bus to link events between bounded contexts I need a method to sync events (or in other words request a replay of past events) when a bounded context comes back online but want to limit the potential flood of messages coming back to only go to the endpoint that requested it, at least if this is something that can be easily done by using existing Service Bus features.
So given an imaginary ContextC sends a message to request all previous events from ContextA and ContextB, is there any way for these replay messages to be sent only to ContextC?
What would be the best way to map a context to be the owner of the topic (or in other words, an individual bus subscriber to a bus topic), to facilitate the unicast replaying above?
In my world, I keep this stuff loosely coupled - each context puts stuff onto a topic and anyone that needs stuff subscribes.
Each SB subscription can use the filtering facilities of Service Bus based on properties (e.g. you could tag events by adding Properties on the Messages and then have a filtering condition on the subscription meaning that only whitelisted classes of events ever apply to each consumer).
That plus the fact that you're already seggregating by topic.
The subscription and the topic then allow you to process the events without losing any or having the publisher go around worrying about or chasing subscribers.
You also mentioned you are tying this to an Event Store in other questions - in that case there is a chance your messages need to be consumed in order. If that is the case, you need to put a session id on your messages.
I could speculate as to why you want this subscriber driven redelivery but won't for now. You need to first explain / verify that concept and requirement (by asking questions which explain your higher level goals) a lot further before anyone answers how that would best be achieved using Service Bus.