NServiceBus saga performance

I am having trouble with NSB saga performance. We have a single saga that orchestrates a long-running session. The saga sends a lot of messages to different processors and then receives their replies.
I see that the saga's queue contains tons of incoming messages. Each message is processed very quickly, but there is a delay before the next message is handled. Here is part of the log file:
16:26:42 [14][DEBUG] Finished handling message.
16:26:46 [15][DEBUG] ChildContainerBehavior
16:26:46 [15][DEBUG] MessageHandlingLoggingBehavior
16:26:46 [15][DEBUG] Received message with ID 28b285ce-3b77-4a69-a13a-a3bf009717fd from sender xxxHost#PROCESSOR01
We see a 4-second delay between messages. That is very slow. Please help: what is wrong with my saga?
Thanks!

Since you have a monolithic saga, you will have some contention on the state record that backs the saga in storage. You will want to consider breaking up your endpoint or redesigning how you gather the information. Check out this Routing Slip implementation.

Related

How to improve the performance of my NServiceBus Saga under load

I have a very simple Saga built with NSB7 using SQL Transport and NHibernate persistence.
The Saga listens on a queue and for each message received runs through 4 handlers. These are called in sequential order, except that 2 of the handlers run in parallel, and the last handler only runs once both parallel handlers are complete. The last handler writes a record to the DB.
Let's say for a single message, each handler takes 1 second. When a new message is received, which starts the Saga, the expected result is that 3-4 seconds later the record is written to the DB.
If the queue backs up with, say, 1000 messages, then once processing resumes it takes almost 2000 seconds before a new record is created in the last handler. Basically, instead of each message running through the expected 4 seconds of processing, the messages bunch up at the initial handler until the queue is emptied, then bunch up again at the next handler, and so on.
Any ideas on how I could improve the performance of this system under load, so that a constant stream of processed messages comes out the end rather than messages bunching up with a long delay before a single new record comes out the other side?
Thanks
Will
There is documentation for saga concurrency issues: https://docs.particular.net/nservicebus/sagas/concurrency#high-load-scenarios
I still don't fully understand the issue though. Every message that instantiates a saga should create a record in the database after that message has been processed, not after 1000 messages. How else is NServiceBus going to guarantee consistency?
Next to that, you probably should not have the single message be processed by 4 handlers. If it really needs to work like this, use publish/subscribe and create different endpoints. The saga should be done with processing as soon as possible, especially under high load scenarios.
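One possible reading of the bunching described in the question can be reproduced with a toy model. The sketch below is not NServiceBus code; it assumes a single processing thread, one shared FIFO queue, and that each handler's follow-up message is appended to the back of that queue. The function name and the step names are made up for illustration, and a real endpoint processes messages concurrently, so the absolute numbers will differ.

```python
from collections import deque


def first_record_time(backlog, seconds_per_handler=1):
    """Return the simulated time at which the first 'final' handler runs."""
    # Every saga starts with one message already waiting in the queue.
    queue = deque(("start", saga_id) for saga_id in range(backlog))
    parallel_done = {}  # saga id -> number of parallel handlers finished
    clock = 0
    while queue:
        step, saga_id = queue.popleft()
        clock += seconds_per_handler
        if step == "start":
            # The first handler fans out to the two parallel handlers.
            queue.append(("parallel_a", saga_id))
            queue.append(("parallel_b", saga_id))
        elif step in ("parallel_a", "parallel_b"):
            parallel_done[saga_id] = parallel_done.get(saga_id, 0) + 1
            if parallel_done[saga_id] == 2:
                # Both parallel handlers are done: queue the last handler.
                queue.append(("final", saga_id))
        else:
            # The last handler writes the record; report when that happened.
            return clock
    return None
```

In this model a single message finishes after 4 handler invocations, but with a backlog every follow-up message has to wait behind everything already queued, so the first record only appears once the earlier stages have drained.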

NServiceBus Sequence

We have a requirement for all our messages to be processed in the order of arrival to MSMQ.
We will be exposing a WCF service to the clients, and this WCF service will post the messages using NServiceBus (Sendonly Bus) to MSMQ.
We are going to develop a Windows service (MessageHandler), which will use NServiceBus to read the messages from MSMQ and save them to the database. Our database will not be available for a few hours every day.
During the db downtime we expect the process to retry the first message in MSMQ and halt processing of other messages until the database is up. Once the database is up we want NServiceBus to process the messages in the order they were sent.
Will setting up MaximumConcurrencyLevel="1" MaximumMessageThroughputPerSecond="1" help in this scenario?
What is the best way using NServiceBus to handle this scenario?
"We have a requirement for all our messages to be processed in the order of arrival to MSMQ."
See the answer to the question "How to handle message order in nservicebus?", and also this post here.
I agree that while in-order delivery is possible, it is much better to design your system such that order does not matter. The linked article outlines the following solution:
Add a sequence number to all messages.
In the receiver, check that the sequence number is the last seen number + 1; if not, throw an out-of-sequence exception.
Enable second level retries (so if messages are out of order they will be tried again later, hopefully after the correct message has been received).
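The receiver-side check in the steps above can be sketched roughly as follows. This is plain Python rather than NServiceBus code; the class and exception names are hypothetical, and sequence numbers are assumed to start at 1. The retry itself is left to whatever redelivers the failed message.

```python
class OutOfSequenceError(Exception):
    """Raised when a message arrives before its predecessor."""


class SequenceCheckingReceiver:
    def __init__(self):
        self.last_seen = 0   # assumption: the first message carries number 1
        self.processed = []

    def handle(self, sequence_number, payload):
        expected = self.last_seen + 1
        if sequence_number != expected:
            # Failing here leaves the message to be retried later,
            # hopefully after the missing message has been handled.
            raise OutOfSequenceError(
                f"expected {expected}, got {sequence_number}"
            )
        self.processed.append(payload)
        self.last_seen = sequence_number
```

Note that this check also rejects duplicates of already handled messages, since their number is not the last seen number + 1.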
However, in the interest of answering your specific question:
"Will setting up MaximumConcurrencyLevel="1" MaximumMessageThroughputPerSecond="1" help in this scenario?"
Not really.
Whenever you have a requirement for ordered delivery, the fundamental laws of logic dictate that somewhere along your message processing pipeline you must have a single-threaded process in order to guarantee in-order delivery.
Where this happens is up to you (check out the resequencer pattern), but you could certainly throttle the NServiceBus handler to a single thread (I don't think you need to set MaximumMessageThroughputPerSecond to make it single-threaded, though).
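For reference, the idea behind a resequencer is to buffer messages that arrive early and release them only once the gap before them is filled. A minimal sketch in plain Python, with a hypothetical class name and assuming each message carries a gap-free sequence number starting at 1:

```python
class Resequencer:
    def __init__(self, first_sequence=1):
        self.next_expected = first_sequence
        self.pending = {}  # sequence number -> payload held back for later

    def accept(self, sequence_number, payload):
        """Buffer a message and return the payloads that can now be released, in order."""
        if sequence_number < self.next_expected:
            # Already released earlier: treat it as a duplicate and ignore it.
            return []
        self.pending[sequence_number] = payload
        released = []
        while self.next_expected in self.pending:
            released.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
        return released
```

The component calling accept() still has to be single-threaded (or otherwise serialized), which is the point made above.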
However, even if you did this, and even if you used transactional queues, you could still not guarantee that each message would be dequeued and processed to the database in order, because if there are any permanent failures on any of the messages they will be removed from the queue and the next message processed.
"During the db downtime we expect the process to retry the first message in MSMQ and halt processing of other messages until the database is up. Once the database is up we want NServiceBus to process the messages in the order they were sent."
This is not recommended. The second level retry functionality in NServiceBus is designed to handle unexpected and short-term outages, not planned and long-term outages.
For starters, when your NServiceBus message handler endpoint tries to process a message in its input queue and finds the database unavailable, it will apply its 2nd level retry policy, which by default will retry the message 5 times at increasing intervals and then fail permanently, moving the failed message to its error queue. It will then move on to the next message in the input queue.
While this doesn't violate your in-order delivery requirement on its own, it will make life very difficult for two reasons:
The permanently failed messages will need to be re-processed with priority once the database becomes available again, and
there will be a ton of unwanted failure logging, which will obfuscate any genuine handling errors.
If you have regular planned outages which you know about in advance, then the simplest way to deal with them is to implement a service window, which is another term for a schedule.
However, the Windows service manager does not support the concept of service windows, so you would have to use a scheduled task to stop and then start your service, or look at other options such as Hangfire, Quartz.NET or some other cron-type library.
It kind of depends on why you need the messages to arrive in order. If, for example, you first receive an Order message and then various OrderLine messages that all belong to that order, there are multiple possibilities.
One is to just accept that there can be OrderLine messages without an Order. The Order will come in later anyway. Eventual Consistency.
Another one is to collect messages (and possibly state) in an NServiceBus Saga. If MessageA would normally arrive first, with MessageB and MessageC following later, give all three messages the ability to start the saga. All three messages need to have something that ties them together, like a unique GUID. The saga can then collect them properly and, when all messages have arrived, store its final state and mark the saga as completed.
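The collecting behaviour can be illustrated without any NServiceBus types. The sketch below is plain Python with hypothetical names; a dictionary stands in for saga persistence, and the correlation ID plays the role of the unique GUID mentioned above.

```python
REQUIRED = {"MessageA", "MessageB", "MessageC"}


class CollectingSagaStore:
    def __init__(self):
        self.active = {}     # correlation id -> {message type: payload}
        self.completed = {}  # correlation id -> final collected state

    def handle(self, correlation_id, message_type, payload):
        """Record a message; return True once all required messages have arrived."""
        # Any of the three message types may create the saga state.
        state = self.active.setdefault(correlation_id, {})
        state[message_type] = payload
        if REQUIRED.issubset(state):
            # Everything has arrived: store the final state and complete.
            self.completed[correlation_id] = self.active.pop(correlation_id)
            return True
        return False
```

Because every message type can start the saga, the arrival order stops mattering; only the correlation ID has to match.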
Another option is to just persist all messages directly into the database and have something else figure out what belongs to what. This is a scenario useful for a data warehouse where the data just needs to be collected, no matter what. Some data might not be 100% accurate (or consistent) but that's okay.
Asynchronous messaging makes it hard to process them 100% in order, especially when the client calling the WCF is making mistakes and/or sending them out of order. It wouldn't be the first time I had such a requirement and out-of-order messages.

How to specify another timeout queue for NSB?

I am using NSB 4.4.2.
I want to have something like heartbeats on my saga to show processing statistics.
When I request a timeout, it is sent to the saga's input queue.
If there are many messages ahead of this timeout message, IHandleTimeouts may not fire at the requested time.
Is it a bug? Or how can I use a separate queue for timeout messages?
Thanks
You are correct - when a timeout is ready to be dispatched, it is sent to the incoming queue of the endpoint, and if there are already many other messages in there, it will have to wait its turn to be processed.
Another thing you might want to consider is that the endpoint may be down at that time.
If you want to guarantee that your saga code will be invoked at (or very close to) the time of the timeout, you'll need to set up a high availability deployment first. Then, you should look at setting the SLA required of that endpoint - how quickly messages should be processed, and then monitor the time to breach SLA performance counter.
See here for more information: http://docs.particular.net/nservicebus/monitoring-nservicebus-endpoints
You should be prepared to scale out your endpoint as needed to guarantee enough processing power to keep up with the load coming in.
NOTE: The reason we use the same incoming queue for processing these timeouts is by design. A timeout message is almost always the same priority or lower than the other business messages being processed by a saga. As such, it doesn't make sense to have them cut ahead of other messages in line.
Timeouts are sent to the [endpointname].timeouts queue.

NServiceBus Sagas - At Least Once Delivery

Using NServiceBus with the NHibernate saga persister, how can I avoid duplicate sagas when it's possible for a message to be received more than once?
Here are some solutions that I've thought of so far:
Never call MarkAsComplete() so the deduplication is handled in the usual fashion by the saga itself.
Implement my own saga persister which stores the correlation ids for completed sagas so duplicate/additional messages are ignored.
The question is what would cause the message to be received multiple times - is it due to retries of the same message (like in the case where there was a deadlock in the DB)? Those kinds of retries (causing the same message to be "processed" multiple times) are already handled by the transactional nature of NServiceBus.
If the situation is due to the message being sent by some other endpoint multiple times, the recommendation would be to see what you could do to prevent that on the sending side. If that isn't possible, then yes, a saga that doesn't ever complete could serve as your filter.
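To make the "saga that never completes" idea concrete: as long as the record for a correlation ID is kept, a repeated message finds it and can be ignored. A rough sketch in plain Python, with hypothetical names and a set standing in for the persisted saga records:

```python
class NeverCompletingSagaFilter:
    def __init__(self):
        self.handled_ids = set()  # stands in for saga records that are never removed

    def handle(self, correlation_id, action):
        """Run action once per correlation id; return False for duplicates."""
        if correlation_id in self.handled_ids:
            return False  # duplicate: the existing record filters it out
        action()
        self.handled_ids.add(correlation_id)
        return True
```

The trade-off is that the stored records grow without bound unless something cleans them up later.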

NServiceBus Retry Delay

What is the optimal way to configure/code NServiceBus to delay retrying messages?
In its default configuration retry happens almost immediately up to the number of attempts defined in the configuration file. I'd ideally like to retry again after an hour, etc.
Also, how does HandleCurrentMessageLater() work? What does the Later aspect refer to?
NSB retries are there to remedy temporary problems like deadlocks. Longer retries are better handled by creating another process that monitors the error queue and puts messages back into the source queue at the interval you like. Take a look at the ReturnToSourceQueue.exe that comes with NSB for reference.
Edit: NServiceBus now supports this; we call it Second Level Retries. See http://docs.particular.net/ for more details.
Here is a blog post on why NServiceBus doesn't include a retry delay that I wrote after asking Udi this very same question in his distributed systems architecture course:
NServiceBus Retries: Why no back-off delay?
And here is a discussion thread covering some of the points involved in building an error queue monitor/retry endpoint:
http://tech.groups.yahoo.com/group/nservicebus/message/10964
As for HandleCurrentMessageLater(), all it does is put the current message back at the end of the queue. If there are no other messages waiting, it's going to be processed again immediately.
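A small queue model shows what "later" means here. This is plain Python, not the NServiceBus API; process() and should_defer are hypothetical names, and deferring is modelled simply as appending the message to the back of the queue.

```python
from collections import deque


def process(queue, should_defer):
    """Drain the queue and return the order in which messages were picked up."""
    order = []
    attempts = {}
    while queue:
        message = queue.popleft()
        attempts[message] = attempts.get(message, 0) + 1
        order.append(message)
        if should_defer(message, attempts[message]):
            # "Handle later" just means: go to the back of the same queue.
            queue.append(message)
    return order


# With other messages waiting, A is seen again only after B and C.
busy = process(deque(["A", "B", "C"]), lambda m, n: m == "A" and n == 1)

# With nothing else waiting, A comes straight back with no delay in between.
idle = process(deque(["A"]), lambda m, n: n < 3)
```

Here busy ends up as A, B, C, A, while idle is simply A three times in a row.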
As of NServiceBus 3.2.1, there is an out-of-the-box solution for back-off delays in the event of consecutive message failures. The previously existing retry mechanism still retries failures without a delay to handle cases like database deadlocks, quickly self-healing network issues, etc.
Once a message has been retried the configured number of times, the message is moved to a "Second Level Retry" queue. This queue, as configured below, will retry after a 10, 20, and 30 second delay, then the message will be moved to the configured error queue. You're free to change these values to something that better suits your environment.
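The resulting timeline can be written down as a small calculation. This is only a model of the behaviour described above, not NServiceBus configuration: the function and parameter names are made up, and the default of 5 immediate retries is an assumption for the example (the 10, 20 and 30 second delays come from the description).

```python
def retry_schedule(immediate_retries=5, delayed_retries=3, time_increase_seconds=10):
    """Return the wait in seconds before each retry attempt, in order."""
    # First level: retried straight away, with no delay.
    first_level = [0] * immediate_retries
    # Second level: each retry waits one more increment than the previous one.
    second_level = [time_increase_seconds * (n + 1) for n in range(delayed_retries)]
    return first_level + second_level


default_waits = retry_schedule()
hourly_waits = retry_schedule(delayed_retries=2, time_increase_seconds=3600)
```

Once the last delayed retry fails, the message goes to the error queue. Raising the time increase, as in hourly_waits, is how the "retry again after an hour" case from the question would look in this model.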
You can also check out this link:
http://docs.particular.net/nservicebus/second-level-retries