Using NServiceBus and EventStore from High Level - nservicebus

So I've been reading about EventStore and NServiceBus and I like the idea of having a transactional log of my data that can help me build views based on that data.
What I don't understand right now is how to distinguish between an event that will write to your read storage and the same event which might trigger an email to get sent.
ex. Creating a customer
CreateUserCommand -> CreateUserCommandHandler -> CreatedUserEvent
Should I be using the CreatedUserEvent to trigger both my write to my data storage and sending an email to a user?

In the last few years, Eric Evans has recognized an update to his DDD pattern: Domain Events (aka External Events concept).
Internal events in Event Sourcing patterns is what we've been focusing on, such as UserCreatedEvent in your example. Keep these explicit with an IEvent marker interface.
While IEvents are stull published on the bus, IDomainEvents are more notability for larger external-to-the-domain notifications that don't effect a state of an aggregate per say.
So...
CreateUser (ICommand)
^- CreateUserCommandHandler
UserCreated (IEvent)
^- UserCreatedEventHandler
SendNewUserEmail (ICommand)
^- SendNewUserEmailCommandHandler
NewUserEmailSent (IDomainEvent)
^- UserRegistrationService or some other AC
I am still pretty new to event sourcing myself; but, I would guess that you can have the UserRegistrationService register on the bus to listen for the SendNewUserEmail ICommand.
Either way you go, I would concentrate on creating additional commands/events for sending an email and the email was sent. Then, later on you can view the transaction log as to when it was queued to send, how long it took to send, was there any retries in sending, how many was sent at the same time and did it effect time delays (datetime diffs) to show any bottlenecks?, install a queue for sending emails and break it out into a smaller independent service, etc etc.

Related

Mass Transit: ensure message processing order when there are different message types

I'm new to Mass Transit and I would like to understand if it can helps with my scenario.
I'm building a sample application implemented with a CQRS event sourcing architecture and I need a service bus in order to dispatch the events created by the command stack to the query stack denormalizers.
Let's suppose of having a single aggregate in our domain, let's call it Photo, and two different domain events: PhotoUploaded and PhotoArchived.
Given this scenario, we have two different message types and the default Mass Transit behaviour is creating two different RabbitMq exchanges: one for the PhotoUploaded message type and the other for the PhotoArchived message type.
Let's suppose of having a single denormalizer called PhotoDenormalizer: this service will be a consumer of both message types, because the photo read model must be updated whenever a photo is uploaded or archived.
Given the default Mass Transit topology, there will be two different exchanges so the message processing order cannot be guaranteed between events of different types: the only guarantee that we have is that all the events of the same type will be processed in order, but we cannot guarantee the processing order between events of different type (notice that, given the events semantic of my example, the processing order matters).
How can I handle such a scenario ? Is Mass Transit suitable with my needs ? Am I completely missing the point with domain events dispatching ?
Disclaimer: this is not an answer to your question, but rather a preventive message why you should not do what you are planning to do.
Whilst message brokers like RMQ and messaging middleware libraries like MassTransit are perfect for integration, I strongly advise against using message brokers for event-sourcing. I can refer to my old answer Event-sourcing: when (and not) should I use Message Queue? that explains the reasons behind it.
One of the reasons you have found yourself - event order will never be guaranteed.
Another obvious reason is that building read models from events that are published via a message broker effectively removes the possibility for replay and to build new read models that would need to start processing events from the beginning of time, but all they get are events that are being published now.
Aggregates form transactional boundaries, so every command needs to guarantee that it completes within one transaction. Whilst MT supports the transaction middleware, it only guarantees that you get a transaction for dependencies that support them, but not for context.Publish(#event) in the consumer body, since RMQ doesn't support transactions. You get a good chance of committing changes and not getting events on the read side. So, the rule of thumb for event stores that you should be able to subscribe to the stream of changes from the store, and not publish events from your code, unless those are integration events and not domain events.
For event-sourcing, it is crucial that each read-model keeps its own checkpoint in the stream of events it is projecting. Message brokers don't give you that kind of power since the "checkpoint" is actually your queue and as soon as the message is gone from the queue - it is gone forever, there's no coming back.
Concerning the actual question:
You can use the message topology configuration to set the same entity name for different messages and then they'll be published to the same exchange, but that falls to the "abuse" category like Chris wrote on that page. I haven't tried that but you definitely can experiment. Message CLR type is part of the metadata, so there shouldn't be deserialization issues.
But again, putting messages in the same exchange won't give you any ordering guarantees, except the fact that all messages will land in one queue for the consuming service.
You will have to at least set the partitioning filter based on your aggregate id, to prevent multiple messages for the same aggregate from being processed in parallel. That, by the way, is also useful for integration. That's how we do it:
void AddHandler<T>(Func<ConsumeContext<T>, string> partition) where T : class
=> ep.Handler<T>(
c => appService.Handle(c, aggregateStore),
hc => hc.UsePartitioner(8, partition));
AddHandler<InternalCommands.V1.Whatever>(c => c.Message.StreamGuid);

Event sourcing combinded with deletable data

I want to develop an Emailer microservice which provides a SendEmail command. Inside the microservice I have an aggregate in mind which represents the whole email process with the following events:
Aggregate Email:
(EmailCreated)
EmailDeliveryStarted
EmailDeliveryFailed
EmailRecipientDelivered when one of the recipients received the email
EmailRecipientDeliveryFailed when one of the recipients could not receive the email
etc.
In the background the email delivery service SendGrid is used; my microservice works like a facade for that with my own events. The incoming webhooks from SendGrid are translated to proper domain events.
The process would look like this:
Command SendEmail ==> EmailCreated
EmailCreatedHandler ==> Email.Send (to SendGrid)
Incoming webhook ==> EmailDeliveryStarted
Further webhooks ==> EmailRecipientDelivered, EmailRecipientDeliveryFailed, etc.
Of course if I'd want to replace the external webservice and it would apply other messaging strategies I would adapt to that but keep my domain model with it's events. I want to let the client not worry about the concrete email delivery strategy.
Now the crucial problem I face: I want to accept the SendEmail commands even if SendGrid is not available at that very moment, which entails storing the whole email data (with attachments) and then, with an event handler, start the sending process. On the other hand I don't want to bloat my initial EmailCreated event with this BLOB data. And I want to be able to clean up this data after SendGrid has accepted my send email request.
I could also try both sending the email to SendGrid and storing an initial EmailDeliveryStarted event in the SendEmail command. But this feels like a two-phase commit: if SendGrid accepted my call but somehow my repository was unable to store the EmailDeliveryStarted event the client would be informed that something went wrong and it tries again which would be a disaster.
So I don't know how to design my aggregate and, more important, my EmailCreated event since it should not contain the BLOB data like attachments.
I found this question interesting and it took a little bit to reflect on that.
First things first - I do not see an obligation to store the email attachments in the event. You can simply store the fully qualified name of the files attached. That would keep the event log smaller and perhaps rule out the need for "deleting" the event (and you know that, in an event source model, you should not do that).
Secondly, assuming that the project is not building an e-mail client, I don't see a need to model an e-mail as an aggregate root. I see AggregateRoots represent business-relevant domains, not for a utility task like sending an e-mail. You could model this far more easily using a database table / document that keeps track of what has been sent and what not yet. I see sending e-mails through SendGrid as a reaction to a business event, certainly to be tracked, but not an AggregateRoot in its own right.
Lastly, if you want to accept SendEmail commands also when SendGrid is offline, the aggregate emits an EmailQueued event. The EmailQueuedHandler will produce a line on the read model of the process in charge taking all the Emails in queued state and batch them for sending. If the communication with SendGrid fails, you can either:
Do nothing, the sender process will pick the email at the next attempt
Emit a EmailSendFailed, intercepted by a Handler that will increase the retry count (if you want to stop after a number of retries).
Hope that is sufficiently clear and best of luck with your project.

Message types : how much information should messages contain?

We are currently starting to broadcast events from one central applications to other possibly interested consumer applications, and we have different options among members of our team about how much we should put in our published messages.
The general idea/architecture is the following :
In the producer application :
the user interacts with some entities (Aggregate Roots in the DDD sense) that can be created/modified/deleted
Based on what is happening, Domain Events are raised (ex : EntityXCreated, EntityYDeleted, EntityZTransferred etc ... i.e. not only CRUD, but mostly )
Raised events are translated/converted into messages that we send to a RabbitMQ Exchange
in RabbitMQ (we are using RabbitMQ but I believe the question is actually technology-independent):
we define a queue for each consuming application
bindings connect the exchange to the consumer queues (possibly with message filtering)
In the consuming application(s)
application consumes and process messages from its queue
Based on Enterprise Integration Patterns we are trying to define the Canonical format for our published messages, and are hesitating between 2 approaches :
Minimalist messages / event-store-ish : for each event published by the Domain Model, generate a message that contains only the parts of the Aggregate Root that are relevant (for instance, when an update is done, only publish information about the updated section of the aggregate root, more or less matching the process the end-user goes through when using our application)
Pros
small message size
very specialized message types
close to the "Domain Events"
Cons
problematic if delivery order is not guaranteed (i.e. what if Update message is received before Create message ? )
consumers need to know which message types to subscribe to (possibly a big list / domain knowledge is needed)
what if consumer state and producer state get out of sync ?
how to handle new consumer that registers in the future, but does not have knowledge of all the past events
Fully-contained idempotent-ish messages : for each event published by the Domain Model, generate a message that contains a full snapshot of the Aggregate Root at that point in time, hence handling in reality only 2 kind of messages "Create or Update" and "Delete" (+metadata with more specific info if necessary)
Pros
idempotent (declarative messages stating "this is what the truth is like, synchronize yourself however you can")
lower number of message formats to maintain/handle
allow to progressively correct synchronization errors of consumers
consumer automagically handle new Domain Events as long as the resulting message follows canonical data model
Cons
bigger message payload
less pure
Would you recommend an approach over the other ?
Is there another approach we should consider ?
Is there another approach we should consider ?
You might also consider not leaking information out of the service acting as the technical authority for that part of the business
Which roughly means that your events carry identifiers, so that interested parties can know that an entity of interest has changed, and can query the authority for updates to the state.
for each event published by the Domain Model, generate a message that contains a full snapshot of the Aggregate Root at that point in time
This also has the additional Con that any change to the representation of the aggregate also implies a change to the message schema, which is part of the API. So internal changes to aggregates start rippling out across your service boundaries. If the aggregates you are implementing represent a competitive advantage to your business, you are likely to want to be able to adapt quickly; the ripples add friction that will slow your ability to change.
what if consumer state and producer state get out of sync ?
As best I can tell, this problem indicates a design error. If a consumer needs state, which is to say a view built from the history of an aggregate, then it should be fetching that view from the producer, rather than trying to assemble it from a collection of observed messages.
That is to say, if you need state, you need history (complete, ordered). All a single event really tells you is that the history has changed, and you can evict your previously cached history.
Again, responsiveness to change: if you change the implementation of the producer, and consumers are also trying to cobble together their own copy of the history, then your changes are rippling across the service boundaries.

How to handle a single publisher clogging up my RabbitMQ's queue

In my last project, I am using MassTransit (2.10.1) with RabbitMQ.
On some scenarios, a producer is allowed to send a bulk of messages to the queue.
For example - the user set to a bulk notification to his list of contacts - the list could be as large as 100000 contacts on some cases. This will send a message per each contact to the queue (I need to keep track of each message). Now since - as I understand - messages are being processed in the order of entrance, that user is clogging up the queue for a large amount of time while another user, which may have done a simple thing such as send a test message to himself, waits for the processing to end.
I have considered separating queues for regular VS bulk operations but this still doesn't solve the problem for small bulks (user with dozens of contacts waiting for users with hundred thousands) and causes extra maintenance.
The ideal solution for me - I think - would involve manipulating the routing in such a way that the consumer will be handling x messages from the same user, move the X messages from the next user, than again, and than moving back to the beginning of the queue, until all messages are processed.
Is that possible? Is there a better solution?
Thanks in advance.
You will to have to write code to manage this yourself. RabbitMQ doesn't really have any built-in mechanism to handle a scenario like this, without your code getting involved.
If you want to process a few at a time from bulk, then back to normal, then back to bulk, you'll need 2 queues and code to manage which one is being pulled from, when.
Just my opinion, seeing as how there is no built in way to my knowledge...Have you considered using whatever storage you are using to store the notifications, then just publish one message, with a List of Notifications, store it in you DB, and then have a retrieve notifications for user consumer. the response would be one message, it may have a massive payload, but even if that gets bogged down, add a skip and take property to the message and force it to be between 0 and 50 (or whatever). In what scenario would you want to show a user 100,000 notifications at once?

Redis as a message broker

Question
I want to pass data between applications, in a publish-subscribe manner. Data may be produced at a much higher rate than consumed and messages get lost, which is not a problem. Imagine a fast sensor and a slow sensor data processor. For that, I use redis pub/sub and wrote a class which acts as a subscriber, receives every message and puts that into a buffer. The buffer is overwritten when a new message comes in or nullified when the message is requested by the "real" function. So when I ask this class, I immediately get a response (hint that my function is slower than data comes in) or I have to wait (hint that my function is faster than the data).
This works pretty good for the case that data comes in fast. But for data which comes in relatively seldom, let's say every five seconds, this does not work: imagine my consumer gets launched slightly after the producer, the first message is lost and my consumer needs to wait nearly five seconds, until it can start working.
I think I have to solve this with Redis tools. Instead of a pub/sub, I could simply use the get/set methods, thus putting the cache functionality into Redis directly. But then, my consumer would have to poll the database instead of the event magic I have at the moment. Keys could look like "key:timestamp", and my consumer now has to get key:* and compare the timestamps permamently, which I think would cause a lot of load. There is no natural possibility to sleep, since although I don't care about dropped messages (there is nothing I can do about), I do care about delay.
Does someone use Redis for a similar thing and could give me a hint about clever use of Redis tools and data structures?
edit
Ideally, my program flow would look like this:
start the program
retrieve key from Redis
tell Redis, "hey, notify me on changes of key".
launch something asynchronously, with a callback for new messages.
By writing this, an idea came up: The publisher not only publishes message on topic key, but also set key message. This way, an application could initially get and then subscribe.
Good idea or not really?
What I did after I got the answer below (the accepted one)
Keyspace notifications are really what I need here. Redis acts as the primary source for information, my client subscribes to keyspace notifications, which notify the subscribers about events affecting specific keys. Now, in the asynchronous part of my client, I subscribe to notifications about my key of interest. Those notifications set a key_has_updates flag. When I need the value, I get it from Redis and unset the flag. With an unset flag, I know that there is no new value for that key on the server. Without keyspace notifications, this would have been the part where I needed to poll the server. The advantage is that I can use all sorts of data structures, not only the pub/sub mechanism, and a slow joiner which misses the first event is always able to get the initial value, which with pub/sib would have been lost.
When I need the value, I obtain the value from Redis and set the flag to false.
One idea is to push the data to a list (LPUSH) and trim it (LTRIM), so it doesn't grow forever if there are no consumers. On the other end, the consumer would grab items from that list and process them. You can also use keyspace notifications, and be alerted each time an item is added to that queue.
I pass data between application using two native redis command:
rpush and blpop .
"blpop blocks the connection when there are no elements to pop from any of the given lists".
Data are passed in json format, between application using list as queue.
Application that want send data (act as publisher) make a rpush on a list
Application that want receive data (act as subscriber) make a blpop on the same list
The code shuold be (in perl language)
Sender (we assume an hash pass)
#Encode hash in json format
my $json_text = encode_json \%$hash_ref;
#Connect to redis and send to list
my $r = Redis->new(server => "127.0.0.1:6379");
$r->rpush("shared_queue","$json_text");
$r->quit;
Receiver (into a infinite loop)
while (1) {
my $r = Redis->new(server => "127.0.0.1:6379");
my #elem =$r->blpop("shared_queue",0);
#Decode hash element
my $hash_ref=decode_json($elem\[1]);
#make some stuff
}
I find this way very usefull for many reasons:
The element are stored into list, so temporary disabling of receiver has no information loss. When recevier restart, can process all items into the list.
High rate of sender can be handled with multiple instance of receiver.
Multiple sender can send data on unique list. In ths case should be easily implmented a data collector
Receiver process that act as daemon can be monitored with specific tools (e.g. pm2)
From Redis 5, there is new data type called "Streams" which is append-only datastructure. The Redis streams can be used as reliable message queue with both point to point and multicast communication using consumer group concept Redis_Streams_MQ