We've got a CQRS project and are thinking about a way to implement a "catch-up", e.g. a new event handler is started and asks the event store to replay all events for it.
We're not sure whether we should do the replay over NServiceBus, as this is really a 1:1 connection rather than a publish/subscribe situation. We also suspect that the new consumer would not be able to keep up with the publish speed and that its input queue would back up.
What's the best practice here?
I've heard of people doing the following:
Have a system for replaying/rebroadcasting the events. Event handlers whose projections have already seen these events simply ignore them.
Allow events to be queried directly by the event handler when resetting it or when starting a new projection from scratch. In some systems this can be done by reading directly from the event store; in other, actor-based systems, an actor abstraction around the source of events may be queried.
From my understanding, option 2 allows for better performance, as events can be queried in batches rather than replayed to all listeners individually. These are just my observations, without any practical experience to draw on yet.
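For illustration, here is a minimal C# sketch of option 2, assuming a hypothetical IEventStore interface that exposes paged reads (the interface, the RecordedEvent type and the batch size are all invented for this example):

using System;
using System.Collections.Generic;

// Hypothetical abstraction - not the API of any particular event store.
public interface IEventStore
{
    // Returns up to maxCount events starting at the given global position.
    IReadOnlyList<RecordedEvent> ReadForward(long fromPosition, int maxCount);
}

public record RecordedEvent(long Position, object Payload);

public class ProjectionCatchUp
{
    private readonly IEventStore store;
    private readonly Action<object> apply; // the projection's event handler

    public ProjectionCatchUp(IEventStore store, Action<object> apply)
    {
        this.store = store;
        this.apply = apply;
    }

    // Pull events in batches until the projection reaches the head of the
    // stream, returning the new checkpoint position.
    public long CatchUp(long checkpoint, int batchSize = 500)
    {
        while (true)
        {
            var batch = store.ReadForward(checkpoint + 1, batchSize);
            if (batch.Count == 0)
                return checkpoint; // caught up

            foreach (var e in batch)
            {
                apply(e.Payload);
                checkpoint = e.Position;
            }
        }
    }
}

Because the new handler pulls at its own pace, there is no input queue to back up.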
I'm new to Mass Transit and I would like to understand if it can help with my scenario.
I'm building a sample application implemented with a CQRS event sourcing architecture and I need a service bus in order to dispatch the events created by the command stack to the query stack denormalizers.
Suppose we have a single aggregate in our domain, call it Photo, and two different domain events: PhotoUploaded and PhotoArchived.
Given this scenario, we have two different message types, and the default Mass Transit behaviour is to create two different RabbitMQ exchanges: one for the PhotoUploaded message type and one for the PhotoArchived message type.
Suppose we also have a single denormalizer called PhotoDenormalizer: this service will be a consumer of both message types, because the photo read model must be updated whenever a photo is uploaded or archived.
Given the default Mass Transit topology, there will be two different exchanges, so message processing order cannot be guaranteed between events of different types. The only guarantee we have is that all events of the same type will be processed in order; between events of different types there is no ordering guarantee (and, given the semantics of the events in my example, processing order matters).
How can I handle such a scenario? Is Mass Transit suitable for my needs? Am I completely missing the point of domain event dispatching?
Disclaimer: this is not an answer to your question, but rather a preventive note on why you should not do what you are planning to do.
Whilst message brokers like RMQ and messaging middleware libraries like MassTransit are perfect for integration, I strongly advise against using message brokers for event-sourcing. I can refer to my old answer Event-sourcing: when (and not) should I use Message Queue? that explains the reasons behind it.
One of the reasons you have found yourself - event order will never be guaranteed.
Another obvious reason is that building read models from events published via a message broker effectively removes the possibility of replay. A new read model that needs to start processing events from the beginning of time cannot get them; all it receives are the events being published now.
Aggregates form transactional boundaries, so every command needs to guarantee that it completes within one transaction. Whilst MT supports transaction middleware, it only guarantees a transaction for dependencies that support them, not for context.Publish(@event) in the consumer body, since RMQ doesn't support transactions. So you have a good chance of committing changes and never getting the events on the read side. Hence the rule of thumb for event stores: you should subscribe to the stream of changes from the store, not publish events from your code, unless those are integration events rather than domain events.
For event-sourcing, it is crucial that each read model keeps its own checkpoint in the stream of events it is projecting. Message brokers don't give you that kind of power, since the "checkpoint" is effectively your queue, and as soon as a message is gone from the queue it is gone forever; there's no coming back.
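As a rough illustration, a projection subscribing directly to the store might keep its checkpoint like this (a minimal C# sketch; the ICheckpointStore interface is hypothetical, though real stores such as EventStoreDB expose equivalent concepts):

// Hypothetical checkpoint storage - any durable key/value store will do.
public interface ICheckpointStore
{
    long Load(string projectionName);
    void Save(string projectionName, long position);
}

public class CheckpointedProjection
{
    private readonly ICheckpointStore checkpoints;

    public CheckpointedProjection(ICheckpointStore checkpoints)
        => this.checkpoints = checkpoints;

    public void OnEvent(string projectionName, long position, object @event)
    {
        // Update the read model, then record how far this projection has
        // read. Ideally both happen in one transaction, so the checkpoint
        // can never run ahead of the read model.
        Project(@event);
        checkpoints.Save(projectionName, position);
    }

    private void Project(object @event) { /* update the read model here */ }
}

A queue cannot do this, because the broker, not the projection, decides which message comes next.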
Concerning the actual question:
You can use the message topology configuration to set the same entity name for different messages, and then they'll be published to the same exchange, though that falls into the "abuse" category, as Chris wrote on that page. I haven't tried it, but you can certainly experiment. The message CLR type is part of the metadata, so there shouldn't be deserialization issues.
But again, putting messages in the same exchange won't give you any ordering guarantees, except the fact that all messages will land in one queue for the consuming service.
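A minimal sketch of that topology tweak, assuming the RabbitMQ transport (the exchange name "photo-events" is made up):

// Publish both event types to the same RabbitMQ exchange.
// Note: this is the "abuse" mentioned above, and it still does not
// guarantee ordering between the two event types.
var bus = Bus.Factory.CreateUsingRabbitMq(cfg =>
{
    cfg.Message<PhotoUploaded>(x => x.SetEntityName("photo-events"));
    cfg.Message<PhotoArchived>(x => x.SetEntityName("photo-events"));
});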
You will have to at least set the partitioning filter based on your aggregate id, to prevent multiple messages for the same aggregate from being processed in parallel. That, by the way, is also useful for integration. That's how we do it:
// Registers a handler on the receive endpoint and partitions messages by a
// key derived from the message, so that messages for the same aggregate are
// never processed in parallel.
void AddHandler<T>(Func<ConsumeContext<T>, string> partition) where T : class
    => ep.Handler<T>(
        c => appService.Handle(c, aggregateStore),
        hc => hc.UsePartitioner(8, partition)); // 8 concurrent partitions

// Partition by the aggregate's stream id.
AddHandler<InternalCommands.V1.Whatever>(c => c.Message.StreamGuid);
I'm struggling to understand how to implement eventual consistency with Vaughn Vernon's example of BacklogItems and Tasks. The statement as I've understood it so far (considering the case where he splits BacklogItem and Task into separate aggregate roots) is:
A BacklogItem can contain one or more Tasks. When the remaining hours of all the Tasks of a BacklogItem reach 0, the status of the BacklogItem should change to "DONE".
I'm aware of the rule that says you should not update two aggregate roots in the same transaction, and that you should accomplish this with eventual consistency.
Once a Domain Service updates the number of hours of a Task, a TaskRemainingHoursUpdated event should be published to a DomainEventPublisher which lives in the same thread as the executing code. And here is where I'm at a loss, with the following questions:
I suppose there should be a subscriber (also living in the same thread, I guess) that reacts to TaskRemainingHoursUpdated events. At which point in your Desktop/Web application do you perform this subscription to the bus? At the very initialization of your app? In the application code? Is there any reasoning for placing domain subscribers in a specific place?
Should that subscriber (in the same thread) call a BacklogItem repository and perform the update? (But wouldn't that violate the rule of not updating two aggregates in the same transaction, since this would happen synchronously?)
If you want to achieve eventual consistency to fulfil the previously mentioned rule, do I really need a Message Broker like RabbitMQ even though both BacklogItem and Task live inside the same Bounded Context?
If I use this message broker, should I have a background thread or something that just consumes events from a RabbitMQ queue and then dispatches the event to update the product?
I'd appreciate it if someone could shed some clear light on this, since it is quite complex to picture in its completeness.
So to start with, you need to recognize that, if the BacklogItem is the authority for whether or not it is "Done", then it needs to have all of the information to compute that for itself.
So somewhere within the BacklogItem is data that is tracking which Tasks it knows about, and the known state of those tasks. In other words, the BacklogItem has a stale copy of information about the task.
That's the "eventually consistent" bit; we're trying to arrange the system so that the cached copy of the data in the BacklogItem boundary includes the new changes to the task state.
That in turn means we need to send a command to the BacklogItem advising it of the changes to the task.
From the point of view of the backlog item, we don't really care where the command comes from. We could, for example, make it a manual process "After you complete the task, click this button here to inform the backlog item".
But for the sanity of our users, we're more likely to arrange an event handler to be running: when you see the output from the task, forward it to the corresponding backlog item.
At which point in your Desktop/Web application do you perform this subscription to the bus? At the very initialization of your app?
That seems pretty reasonable.
Should that subscriber (in the same thread) call a BacklogItem repository and perform the update? (But wouldn't that violate the rule of not updating two aggregates in the same transaction, since this would happen synchronously?)
Same thread and same transaction are not necessarily coincident. It can all be coordinated in the same thread; but it probably makes more sense to let the consequences happen in the background. At their core, events and commands are just messages - write the message, put it into an inbox, and let the next thread worry about processing.
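A minimal sketch of that idea (the inbox here is just an in-process queue; a durable table or queue plays the same role in production):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// The "inbox": the command side only writes a message here and moves on.
var inbox = new BlockingCollection<object>();

// Called by the command-side code after its own transaction commits.
void Publish(object message) => inbox.Add(message);

// Placeholder for the real work, e.g. load the BacklogItem, apply the
// task-hours change, save - in its own transaction.
void Handle(object message) { /* ... */ }

// A background worker drains the inbox and applies the consequences,
// decoupled from the transaction that produced them.
Task.Run(() =>
{
    foreach (var message in inbox.GetConsumingEnumerable())
        Handle(message);
});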
If you want to achieve eventual consistency to fulfil the previously mentioned rule, do I really need a Message Broker like RabbitMQ even though both BacklogItem and Task live inside the same Bounded Context?
No; the mechanics of the plumbing matter not at all.
I read this excellent tutorial (http://blogs.planbsoftware.co.nz/?p=247) about NServiceBus sagas, but I still don't understand the advantage of this model (sagas) over using database or business-layer transactions.
The main benefit of the saga model is that it allows you to take logic and data that would otherwise be spread out across a system (and various batch jobs), and pull that all into a single class, better following the single responsibility principle. Once you have that, you get all the other benefits that come from good software practices - better testability, maintainability, etc.
To show you the real benefit of the saga model, I'll give you two examples.
Imagine you have a service-oriented architecture with hundreds of distributed hosts. A customer places an Order, which starts one or more sagas. Each saga has some related business logic. The handler for a given saga can be shared between different hosts, and you don't need to check the order state when handling each message: NServiceBus implicitly checks the saga state, matching it by order id or other attributes, and if the saga is still open you'll get it in your data context.
You can also use this model as a pattern without NServiceBus. Imagine you are developing a video game and want to track user combos. Each time the player hits jump, you open a saga and add bonus points while handling further rapid input. Once the player pauses between inputs for long enough, the saga closes itself, saving the total score for the combo.
What are the benefits of a saga?
1) Your business logic is encapsulated in one place - the saga.
2) You can extend the system easily by adding sagas or removing them. You can also move them to other handlers or hosts.
3) You don't need to know which data in the database is required in case of a migration; you just migrate the sagas, which contain all the necessary info.
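For a concrete picture, here is a minimal sketch of an NServiceBus saga matched by order id (the message types and their properties are invented for illustration):

using System.Threading.Tasks;
using NServiceBus;

public class OrderPlaced { public string OrderId { get; set; } }
public class OrderPaid { public string OrderId { get; set; } }

public class OrderSagaData : ContainSagaData
{
    public string OrderId { get; set; }
}

public class OrderSaga : Saga<OrderSagaData>,
    IAmStartedByMessages<OrderPlaced>,
    IHandleMessages<OrderPaid>
{
    // NServiceBus uses this mapping to find the right saga instance for
    // each incoming message - no manual state lookup in your handlers.
    protected override void ConfigureHowToFindSaga(SagaPropertyMapper<OrderSagaData> mapper)
    {
        mapper.ConfigureMapping<OrderPlaced>(m => m.OrderId).ToSaga(s => s.OrderId);
        mapper.ConfigureMapping<OrderPaid>(m => m.OrderId).ToSaga(s => s.OrderId);
    }

    public Task Handle(OrderPlaced message, IMessageHandlerContext context)
    {
        Data.OrderId = message.OrderId;
        return Task.CompletedTask;
    }

    public Task Handle(OrderPaid message, IMessageHandlerContext context)
    {
        MarkAsComplete(); // the business process is finished
        return Task.CompletedTask;
    }
}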
I am using Windows Service Bus 1.0 to communicate between different processes; each context's event stream exists on the bus as a topic.
Using the service bus to link events between bounded contexts, I need a way to sync events (in other words, to request a replay of past events) when a bounded context comes back online, while limiting the potential flood of messages so that the replay goes only to the endpoint that requested it - at least if this is something that can easily be done with existing Service Bus features.
So, given that an imaginary ContextC sends a message requesting all previous events from ContextA and ContextB, is there any way for these replay messages to be sent only to ContextC?
What would be the best way to map a context to be the owner of the topic (or in other words, an individual bus subscriber to a bus topic), to facilitate the unicast replaying above?
In my world, I keep this stuff loosely coupled - each context puts stuff onto a topic and anyone that needs stuff subscribes.
Each SB subscription can use the filtering facilities of Service Bus based on properties (e.g. you could tag events by adding properties to the messages and then put a filtering condition on the subscription, so that only whitelisted classes of events ever reach each consumer).
That, plus the fact that you're already segregating by topic.
The subscription and the topic then let you process the events without losing any, and without the publisher having to worry about or chase subscribers.
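For example, with the Service Bus management API a filtered subscription looks roughly like this (the topic name and the EventType property are invented for illustration; SqlFilter is the standard Service Bus filter type):

using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

var ns = NamespaceManager.Create();

// ContextC subscribes to ContextA's event topic, but only receives
// whitelisted event types.
ns.CreateSubscription(
    "contexta-events",
    "contextc",
    new SqlFilter("EventType = 'OrderPlaced' OR EventType = 'OrderShipped'"));

// The publisher tags each message accordingly:
var message = new BrokeredMessage("...payload...");
message.Properties["EventType"] = "OrderPlaced";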
You also mentioned you are tying this to an Event Store in other questions - in that case there is a chance your messages need to be consumed in order. If that is the case, you need to put a session id on your messages.
I could speculate as to why you want this subscriber-driven redelivery, but I won't for now. You first need to explain/verify that concept and requirement (by asking questions that explain your higher-level goals) in much more detail before anyone can answer how it would best be achieved using Service Bus.
I have a situation where a service subscribes to event messages and performs some work when they arrive. There is a certain class of events which can arrive in short bursts of many events that reference the same underlying data. I would like to defer processing of related events for a short period, so that I do the calculation once per batch of related events rather than once per individual event. Is there some kind of pattern I can follow that will let me collect related events for a period of time and then process them all at once? I was thinking a saga + timeout might achieve this, but I'm not sure whether that is an appropriate use for it.
Thanks!
Yes, a saga could be the way to go - however, consider the performance of the saga persistence (NHibernate over a DB in the current version, RavenDB in the next version) against your fault-tolerance needs (if a machine crashes, would it be acceptable to lose some messages?).
No easy answers, I'm afraid.
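Still, a rough sketch of the saga + timeout idea (an NServiceBus-style sketch using the current async API; the message and timeout types are invented, and the 10-second quiet window is arbitrary):

using System;
using System.Threading.Tasks;
using NServiceBus;

public class RelatedEventOccurred { public string DataKey { get; set; } }
public class BatchWindowElapsed { }

public class BatchSagaData : ContainSagaData
{
    public string DataKey { get; set; }
    public int PendingEvents { get; set; }
    public DateTime LastEventUtc { get; set; }
}

public class RelatedEventBatchSaga : Saga<BatchSagaData>,
    IAmStartedByMessages<RelatedEventOccurred>,
    IHandleTimeouts<BatchWindowElapsed>
{
    protected override void ConfigureHowToFindSaga(SagaPropertyMapper<BatchSagaData> mapper)
    {
        mapper.ConfigureMapping<RelatedEventOccurred>(m => m.DataKey).ToSaga(s => s.DataKey);
    }

    public async Task Handle(RelatedEventOccurred message, IMessageHandlerContext context)
    {
        Data.DataKey = message.DataKey;
        Data.PendingEvents++;
        Data.LastEventUtc = DateTime.UtcNow;

        // Each event arms a timeout; earlier timeouts are not cancelled,
        // so the timeout handler must check whether the burst has really ended.
        await RequestTimeout<BatchWindowElapsed>(context, TimeSpan.FromSeconds(10));
    }

    public Task Timeout(BatchWindowElapsed state, IMessageHandlerContext context)
    {
        if (DateTime.UtcNow - Data.LastEventUtc < TimeSpan.FromSeconds(10))
            return Task.CompletedTask; // more events arrived; a later timeout will fire

        // Quiet period reached: run the expensive calculation once for the batch.
        MarkAsComplete();
        return Task.CompletedTask;
    }
}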