How to make a Saga handler Reentrant - nservicebus

I have a task that can be started by the user, that could take hours to run, and where there's a reasonable chance that the user will start the task multiple times during a run.
I've broken the processing of the task up into smaller batches, but the way the data looks it's very difficult to tell what's still to be processed. I batch it using messages that each process a bite sized chunk of the data.
I have thought of using a Saga to control access to starting this process, with a Saga property called Processing that I set at the start of the handler and then unset at the end of the handler. The handler does some work and sends the messages to process the data. I check the value at the start of the handler, and if it's set, then just return.
I'm using Azure storage for Saga storage, if it makes a difference for the next bit. I'm also using NSB 6
I have a few questions though:
Is this the correct approach to re-entrancy with NSB?
When is a change to Saga data persisted? (and is it different depending on the transport?)
Following on from the above, if I set a Saga value in a handler, wait a while and then reset it to its original value will it change the persistent storage at all?

Seem to be cross posted in the Particular Software google group:
https://groups.google.com/forum/#!topic/particularsoftware/p-qD5merxZQ
Sagas are very often used for such patterns. The saga instance would track progress and guard that the (sub)tasks aren't invoked multiple times but could also take actions if the expected task(s) didn't complete or is/are over time.
The saga instance data is stored after processing the message and not when updating any of the saga data properties. The logic you described would not work.
The correct way would be having a saga that orchestrates your process and having regular handlers that do the actual work.
In the saga handle method that creates the saga check if the saga was already created or already the 'busy' status and if it does not have this status send a message to do some work. This will guard that the task is only initiated once and after that the saga is stored.
The handler can now do the actual task, when it completes it can do a 'Reply' back to the saga
When the saga receives the reply it can now start any other follow up task or raise an event and it can also 'complete'.
Optimistic concurrency control and batched sends
If two message are received that create/update the same saga instance only the first writer wins. The other will fail because of optimistic concurrency control.
However, if these messages are not processed in parallel but sequential both fail unless the saga checks if the saga instance is already initialized.
The following sample demonstrates this: https://github.com/ramonsmits/docs.particular.net/tree/azure-storage-saga-optimistic-concurrency-control/samples/azure/storage-persistence/ASP_1
The client sends two identical message bodies. The saga is launched and only 1 message succeeds due to optimistic concurrency control.
Due to retries eventually the second copy will be processed to but the saga checks the saga data for a field that it knows would normally be initialized by by a message that 'starts' the saga. If that field is already initialized it assumes the message is already processed and just returns:
It also demonstrates batches sends. Messages are not immediately send until the all handlers/sagas are completed.
Saga design
The following video might help you with designing your sagas and understand the various patterns:
Integration Patterns with NServiceBus: https://www.youtube.com/watch?v=BK8JPp8prXc
Keep in mind that Azure Storage isn't transactional and does not provide locking, it is only atomic. Any work you do within a handler or saga can potentially be invoked more than once and if you use non-transactional resources then make sure that logic is idempotent.

So after a lot of testing
I don't believe that this is the right approach.
As Archer says, you can manipulate the saga data properties as much as you like, they are only saved at the end of the handler.
So if the saga receives two simultaneous messages the check for Processing will pass both times and I'll have two processes running (and in my case processing the same data twice).
The saga within a saga faces a similar problem too.
What I believe will work (and has done during my PoC testing) is using a database unique index to help out. I'm using entity framework and azure sql, so database access is not contained within the handler's transaction (this is the important difference between the database and the saga data). The database will also operate across all instances of the endpoint and generally seems like a good solution.
The table that I'm using has each of the columns that make up the saga 'id', and there is a unique index on them.
At the beginning of the handler I retrieve a row from the database. If there is a row, the handler returns (in my case this is okay, in others you could throw an exception to get the handler to run again). The first thing that the handler does (before any work, although I'm not 100% sure that it matters) is to write a row to the table. If the write fails (probably because of the unique constraint being violated) the exception puts the message back on the queue. It doesn't really matter why the database write fails, as NSB will handle it.
Then the handler does the work.
Then remove the row.
Of course there is a chance that something happens during processing of the work, so I'm also using a timestamp and another process to reset it if it's busy for too long. (still need to define 'too long' though :) )
Maybe this can help someone with a similar problem.

Related

How to improve the performance of my NServiceBus Saga under load

I have a very simple Saga built with NSB7 using SQL Transport and NHibernate persistence.
The Saga listens on a queue and for each message received runs through 4 handlers. These are called in a sequential order, with 2 handlers run in parallel and the last handler only runs once both the parallel handlers are complete. The last handler writes a record to DB
Let's say for a single message, each handler takes 1 second. When a new message is received, which starts the Saga, the expected result is that 3-4 seconds later the record is written to the DB.
If the queue backs up with say 1000 messages, once they begin processing again, it takes almost 2000 seconds before a new record is created in the last handler. Basically, instead of running through the expected 4 second processing time for each message, they effectively bunch up in the initial handlers until the queue is emptied and then does that again for the next handler and on and on.
Any ideas on how I could improve the performance of this system when under load so that a constant stream of processed messages come out the end rather than the bunching of messages and long delay before a single new record comes out the other side?
Thanks
Will
There is documentation for saga concurrency issues: https://docs.particular.net/nservicebus/sagas/concurrency#high-load-scenarios
I still don't fully understand the issue though. Every message that instantiates a saga, should create a record in the database after the message was processed. Not after 1000 messages. How else is NServiceBus going to guarantee consistency?
Next to that, you probably should not have the single message be processed by 4 handlers. If it really needs to work like this, use publish/subscribe and create different endpoints. The saga should be done with processing as soon as possible, especially under high load scenarios.

Understanding Eventual Consistency, BacklogItem and Tasks example from Vaughn Vernon

I'm struggling to understand how to implement Eventual Consistency with the exposed example of BacklogItems and Tasks from Vaughn Vernon. The statement I've understood so far is (considering the case where he splits BacklogItem and Task into separate aggregate roots):
A BacklogItem can contain one or more tasks. When all remaining hours from a the tasks of a BacklogItem are 0, the status of the BacklogItem should change to "DONE"
I'm aware about the rule that says that you should not update two aggregate roots in the same transaction, and that you should accomplish that with eventual consistency.
Once a Domain Service updates the amount of hours of a Task, a TaskRemainingHoursUpdated event should be published to a DomainEventPublisher which lives in the same thread as the executing code. And here it is where I'm at a loss with the following questions:
I suppose that there should be a subscriber (also living in the same thread I guess) that should react to TaskRemainingHoursUpdated events. At which point in your Desktop/Web application you perform this subscription to the Bus? At the very initialization of your app? In the application code? Is there any reasoning to place domain subscriptors in a specific place?
Should that subscriptor (in the same thread) call a BacklogItem repository and perform the update? (But that would be a violation of the rule of not updating two aggregates in the same transaction since this would happen synchronously, right?).
If you want to achieve eventual consistency to fulfil the previously mentioned rule, do I really need a Message Broker like RabbitMQ even though both BacklogItem and Task live inside the same Bounded Context?
If I use this message broker, should I have a background thread or something that just consumes events from a RabbitMQ queue and then dispatches the event to update the product?
I'd appreciate if someone can shed some clear light over this since it is quite complex to picture in its completeness.
So to start with, you need to recognize that, if the BacklogItem is the authority for whether or not it is "Done", then it needs to have all of the information to compute that for itself.
So somewhere within the BacklogItem is data that is tracking which Tasks it knows about, and the known state of those tasks. In other words, the BacklogItem has a stale copy of information about the task.
That's the "eventually consistent" bit; we're trying to arrange the system so that the cached copy of the data in the BacklogItem boundary includes the new changes to the task state.
That in turn means we need to send a command to the BacklogItem advising it of the changes to the task.
From the point of view of the backlog item, we don't really care where the command comes from. We could, for example, make it a manual process "After you complete the task, click this button here to inform the backlog item".
But for the sanity of our users, we're more likely to arrange an event handler to be running: when you see the output from the task, forward it to the corresponding backlog item.
At which point in your Desktop/Web application you perform this subscription to the Bus? At the very initialization of your app?
That seems pretty reasonable.
Should that subscriptor (in the same thread) call a BacklogItem repository and perform the update? (But that would be a violation of the rule of not updating two aggregates in the same transaction since this would happen synchronously, right?).
Same thread and same transaction are not necessarily coincident. It can all be coordinated in the same thread; but it probably makes more sense to let the consequences happen in the background. At their core, events and commands are just messages - write the message, put it into an inbox, and let the next thread worry about processing.
If you want to achieve eventual consistency to fulfil the previously mentioned rule, do I really need a Message Broker like RabbitMQ even though both BacklogItem and Task live inside the same Bounded Context?
No; the mechanics of the plumbing matter not at all.

NServicebus handler with custom sqlconnection

I have an NServiceBus handler that creates a new sql connection and new sql command.
However, the command that is executed is not being committed to the database until after the whole process is finished.
It's like there is a hidden sql transaction in the handler itself.
I moved my code into a custom console application without nservicebus and the sql command executed and saved immediately. Unlike in nservicebus where it doesn't save until the end of the handler.
Indeed every handler is wrapped in a transaction, the default transaction guarantee is relying on DTC. That is intentional :)
If you disable it then you might get duplicate messages or lose some data, so that must be done carefully. You can disable transactions using endpoint configuration API instead of using options in connection string.
Here you can find more information about configuration and available guarantees http://docs.particular.net/nservicebus/transports/transactions.
Unit of work
Messages should be processed as a single unit of work. Either everything succeeds or fails.
If you want to have multiple units of work executed then
create multiple endpoints
or send multiple messages
This also has the benefit that these can potentially be processed in parallel.
Please note, that creating multiple handlers WILL NOT have this effect. All handlers on the same endpoint will be part of the same unit of work thus transaction.
Immediate dispatch
If you really want to send a specific message when the sending of the message must not be part of the unit of work then you can immediately send it like this:
using (new TransactionScope(TransactionScopeOption.Suppress))
{
var myMessage = new MyMessage();
bus.Send(myMessage);
}
This is valid for V5, for other versions its best to look at the documentation:
http://docs.particular.net/nservicebus/messaging/send-a-message#dispatching-a-message-immediately
Enlist=false
This is a workaround that MUST NOT be used to circumvent a specific transactional configuration as is explained very well by Tomasz.
This can result in data corruption because the same messsage can be processed multiple times in case of error recovery while then the same database action will be performed again.
Found the solution.
In my connection string I had to add Enlist=False
As mentioned by #wlabaj Setting Enlist=False will indeed make sure that a transaction opened in the handler will be different from transaction used by the transport to receive/send messages.
It is however important to note that it changes the message processing semantics. By default, when DTC is used, receive/send and any transactional operations inside a handler will be commited/rolled-back atomically. With Enlist=False it's not the case so it's possible that there will be more than one handler transaction being committed for the same message. Consider following scenario as a sample case when it can happen:
message is received (transport transaction gets started)
message is successfully processed inside the handler (handler transaction committed successfully)
transport transaction fails and message is moved back to the input queue
message is received second time
message is successfully processed inside the handler
...
The behavior with Enlist-False setting is something that might a be desirable behavior in your case. That being said I think it's worth clarifying what are the consequences in terms of message processing semantics.

MassTransit Saga saves late with NHibernateSagaRepository (Starbucks example)

I have converted the Starbucks example to use RabbitMQ and NHibernate.. However, there is a bug/challenge/issue with the DrinkPreparationSaga and when it actually gets saved to the database vs. when the PaymentCompleteMessage gets submitted.
How the code works (out of the box, this isn't anything I changed)... The new instance of the saga isn't saved to the database until AFTER the Initial state completes and it transitions to its next state.
The problem is that in the sample Starbucks application the DrinkPreparationSaga starts off with a very slow method that prints out coffee making sounds once every 1 a second 10 times..
So there is 10 seconds between when the Saga is actually created and when its saved to the database.. The bigger problem with that is, that any other messages that are destined to that instances of the saga (by CorrolationId) are thrown in the error queue because the Saga doesn't exist.
Shouldn't the NHibernateSagaRepository immediately save the new Saga Instance, then run the workflow, then update the saga post workflow? I can't seem to think of another way to make the example work, but that would require a decent bit of reorganizing in the NHibernateSagaRepository.
Thanks in advance.
The reason sagas are not saved before processing the message is that some members of the saga may not be nullable (or allow nulls) and they are not set before the initial saga message is processed.
And you're correct. Take a look at the Riktig sample (http://github.com/phatboyg/Riktig) to see how the Automatonymous sagas are used and how the correlation of another service (in this case, the image retrieval service) is used. Sagas should not actually perform work, but coordinate the state of a transaction. The earlier Starbucks example was a naive implementation that we built one morning in Austin. It is long overdue for an update (for more reasons, including that it still uses Magnum state machines, which are soon deprecated in favor of Automatonymous).

Are NServiceBus saga to saga communications an anti-pattern?

We have two long running sagas that are both run for an infinite amount of time and respond to a timeout. The first subscribes to a timeout every 15 minutes, and the second every 24 hours. Each saga keeps track of its own execution time and notifies the other saga when it starts running and when it has completed. The bulk data loads these sagas are responsible for cannot be run at the same time due to database contention.
When the first saga (Saga A - 15 min) kicks off, it first checks (using an internal variable) to see if the second saga (Saga B - 24 hr) is currently running. If not, it begins its processing steps (shelling off to another process, and polling it over time to see when it's completed). These two sagas communicate through sending messages to notify each other when they're starting up or completed.
For some reason this seems smelly to me on two levels:
We've essentially got a singleton saga that never completes. Is this an anti-pattern in its own right?
We're sending messages bidirectionally with the sole intention of modifying state. It seems as though there should be a better way to handle this type of scenario. With the release of NSB 4.0, we started getting errors when sending commands. The errors cleared up when we used a pub-sub approach instead.
Is this considered an NServiceBus implementation anti-pattern, and is there a better pattern for this sort of requirement?
Generally speaking, I don't think sagas communicating with each other is an antipattern. In your specific case, however, it does sound smelly.
From what you've said about the behaviors, it appears as if this could be a single saga. A saga can request multiple timeouts of different types. So you could effectively merge these sagas, but then you'd be able to get rid of all the messages that exist just for the purpose of modifying state in the sibling, because the state would be shared.
In a general sense, however, it's perfectly fine for sagas to communicate. Doing so through commands should be treated carefully, as that creates direct coupling between the two, although this is still possible. An example would be a parent and child saga pair, where the parent workflow commands a child workflow to begin, but the child workflow is independent until it replies to its parent that it's done. We just realize that these are tightly coupled processes within the same service boundary. We might do this just to keep each saga more focused, or because the parent starts multiple child sagas with different data.
An even better example is saga communication through events. One saga will publish an event, and another saga will respond with its own long-running process. This is all decoupled and good. However if the second saga publishes an event that the first one responds to, then even though you're using events you've created a loop, so it's really not that dissimilar from commands at that point although it is still decoupled from any other external subscribers.