Optimizing saga lookup - NServiceBus

I have a process that downloads files from a remote location in parallel threads.
Each thread sends a message when a download is started, and a second one when the download is completed. Both messages have a download id property (guid) to correlate the two.
Next I have a saga that monitors these downloads. It is started by the DownloadStarted event, and uses a timeout to detect if the DownloadEnded event is received in time.
The problem I have is that the performance of the saga is not great when a large number of files is downloaded in a short time (1000 files in 1 minute). At times it takes more than half an hour for the saga to catch up.
I tried to speed up the saga lookup by providing an IFindSagas implementation. That didn't help much: it caused RavenDB to create an auto index on the DownloadId in the saga data, but it also caused the FindBy method to often return null because that index wasn't updated in time.
Is there any other way I could try to speed up the saga?
I was thinking of using the DownloadId as the saga id, since it is already a unique guid. The Id property of the saga data is settable, but the documentation specifically states that you should not set the id yourself...
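For reference, the saga as described would look roughly like this (a minimal sketch of the NServiceBus 5 API; the class names, the DownloadTimeout message, and the timeout value are placeholders, not my actual code):

using System;
using NServiceBus;
using NServiceBus.Saga;

public class DownloadSagaData : ContainSagaData
{
    [Unique] // tells the persister this property identifies a single saga instance
    public virtual Guid DownloadId { get; set; }
}

public class DownloadMonitorSaga : Saga<DownloadSagaData>,
    IAmStartedByMessages<DownloadStarted>,
    IHandleMessages<DownloadEnded>,
    IHandleTimeouts<DownloadTimeout>
{
    protected override void ConfigureHowToFindSaga(SagaPropertyMapper<DownloadSagaData> mapper)
    {
        // Both events carry the download id guid used to correlate them
        mapper.ConfigureMapping<DownloadStarted>(m => m.DownloadId).ToSaga(s => s.DownloadId);
        mapper.ConfigureMapping<DownloadEnded>(m => m.DownloadId).ToSaga(s => s.DownloadId);
    }

    public void Handle(DownloadStarted message)
    {
        Data.DownloadId = message.DownloadId;
        RequestTimeout<DownloadTimeout>(TimeSpan.FromMinutes(5)); // timeout value assumed
    }

    public void Handle(DownloadEnded message)
    {
        MarkAsComplete(); // download finished in time
    }

    public void Timeout(DownloadTimeout state)
    {
        // DownloadEnded was not received in time; alert here
        MarkAsComplete();
    }
}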
Transport used: MSMQ
Persistence used: RavenDB
NServiceBus version: 5

I reworked the setup so that the saga is no longer needed.
In the process that downloads the file, I now start a timer that sends a notification in the background, and when the download completes I stop the timer. This works a lot faster and results in far fewer messages being sent.
using (var timer = new Timer(1000) { AutoReset = true })
{
    var start = DateTime.Now;

    // Every second, send a notification that the download is still in progress
    timer.Elapsed += (sender, e) =>
        _bus.Send(new NotifyHangingDownload(correlationId,
                                            file.Filename,
                                            start,
                                            TimeSpan.FromMilliseconds(1000)));

    timer.Start();
    client.Download(file, destination, false);
    timer.Stop();
}

Related

Clear Flow's replay buffer after each new subscription

I have a process that spawns a varying number of events when triggered (I've assumed a maximum of 50), and another process that needs to be started when the first event arrives. All events need to be passed to the second process, so I use a MutableSharedFlow with a replay size of 50 (the second process cannot be started synchronously; the Android system is responsible for creating it). As soon as the second process starts, I'd like to reset the replay buffer so that the next process that subscribes won't receive the old events as well.
I have the following piece of code:
val mutableFlow = MutableSharedFlow<Event>(replay = 50)
val sharedFlow = mutableFlow.onSubscription {
    log.info { "New subscription, clearing replay cache." }
    val events = mutableFlow.replayCache
    mutableFlow.resetReplayCache() // clears events before they arrive at the subscriber
}
emitSomeValues(mutableFlow)
emitSomeValues(mutableFlow)
So this doesn't work, because onSubscription is called after subscription but before the events are consumed. I'm struggling to find a clean solution to this, so any help would be appreciated!
Still no solution for the above :(

How to improve the performance of my NServiceBus Saga under load

I have a very simple Saga built with NSB7 using SQL Transport and NHibernate persistence.
The Saga listens on a queue and runs each received message through 4 handlers. These are called in sequential order, with 2 handlers running in parallel and the last handler only running once both parallel handlers are complete. The last handler writes a record to the DB.
Let's say for a single message, each handler takes 1 second. When a new message is received, which starts the Saga, the expected result is that 3-4 seconds later the record is written to the DB.
If the queue backs up with, say, 1000 messages, then once they begin processing again it takes almost 2000 seconds before a new record is created by the last handler. Instead of each message running through the expected 3-4 seconds of processing, the messages effectively bunch up in the initial handlers until the queue is emptied, then do the same in the next handler, and so on.
Any ideas on how I could improve the performance of this system when under load so that a constant stream of processed messages come out the end rather than the bunching of messages and long delay before a single new record comes out the other side?
Thanks
Will
There is documentation for saga concurrency issues: https://docs.particular.net/nservicebus/sagas/concurrency#high-load-scenarios
I still don't fully understand the issue, though. Every message that instantiates a saga should create a record in the database after that message is processed, not after 1000 messages. How else would NServiceBus guarantee consistency?
Apart from that, you probably should not have a single message be processed by 4 handlers. If it really needs to work like this, use publish/subscribe and create different endpoints. The saga should be done with its processing as soon as possible, especially in high-load scenarios.
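A rough sketch of that split (NServiceBus 7 API; the message and handler names are made up for illustration):

using System;
using System.Threading.Tasks;
using NServiceBus;

public class WorkItemReceived : IEvent { public Guid WorkItemId { get; set; } }
public class StepOneCompleted : IEvent { public Guid WorkItemId { get; set; } }

// Endpoint A: does only its own step, then publishes an event
public class StepOneHandler : IHandleMessages<WorkItemReceived>
{
    public async Task Handle(WorkItemReceived message, IMessageHandlerContext context)
    {
        // ... step one logic ...
        await context.Publish(new StepOneCompleted { WorkItemId = message.WorkItemId });
    }
}

// Endpoint B: subscribes to StepOneCompleted and processes independently,
// so a backlog in step one no longer holds up the other steps
public class StepTwoHandler : IHandleMessages<StepOneCompleted>
{
    public Task Handle(StepOneCompleted message, IMessageHandlerContext context)
    {
        // ... step two logic ...
        return Task.CompletedTask;
    }
}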

FreeRTOS stuck in osDelay

I'm working on a project using an STM32F446 with a boilerplate created with STM32CubeMX (for peripheral initialization and middleware such as FreeRTOS with the CMSIS-V1 interface).
I have two threads which communicate using mailboxes, but I encountered a problem: one of the thread bodies is
void StartDispatcherTask(void const *argument)
{
    mailCommand *commandData = NULL;
    mailCommandResponse *commandResponse = NULL;
    osEvent event;

    for (;;)
    {
        // Block until a command mail arrives
        event = osMailGet(commandMailHandle, osWaitForever);
        commandData = (mailCommand *)event.value.p;
        // Here is the problem
        osDelay(5000);
    }
}
It gets to the delay but never gets out. Is there a problem with using the mailbox and the delay in the same thread? I also tried placing the delay before the for(;;), and there it works.
EDIT: I'll add more detail to the problem. The first thread sends a mail of a certain type and then waits for a mail of another type. The thread in which I get the problem receives the mail of the first type, executes some code based on what it received, and then sends the result as a mail of the second type. Sometimes it has to wait using osDelay, and there it stops working, without going into any fault handler.
I would rather use the standard FreeRTOS API. The ARM CMSIS wrapper is rubbish.
BTW, I rather suspect osMailGet(commandMailHandle, osWaitForever);
The delay is not needed at all in this case. If you wait for the data in the BLOCKED state, the task does not consume any processing power.
Other guesses are:
You are landing in the hard fault handler.
You are stuck in the context switch (wrong interrupt priorities).
Use your debugger and see what is going on.
osStatus osDelay (uint32_t millisec)
The millisec value specifies the number of timer ticks.
The exact time delay depends on the actual time elapsed since the last timer tick.
For a value of 1, the system waits until the next timer tick occurs.
=> You have to check whether the timer tick is running or not.
check this link
As P__J__ pointed out in an earlier answer, you shouldn't use the osDelay() call in the loop [1], because your task loop will wait at the osMailGet() call for the next request/mail until it arrives anyhow.
But this hint called my attention to another possible reason for your observation, so I'm opening this new answer: [2]
As the loop execution is interrupted by a delay of 5000 ticks, could it be that the producer of the mails is filling the mailbox faster than the task is consuming them? If so, you should inspect whether this situation is detected/handled in the producer context.
If the producer ignores "queue full" return values and discards the mails before they have been transmitted, the system will only process a few mails every 5000 ticks (or it may lose all but a few mails after the first fill of the mailbox, if the producer in your example only fills the mailbox queue once).
This could look like the consumer task being stuck, even if the main problem is about the producer context (task/ISR).
[1] The osDelay() call could only help you if you wanted to avoid processing another mail within 5000 ticks when request mails are produced faster than the task processes them. But then you'd have a different problem, and you should open a different question...
[2] Edit: I just noticed that Clifford already mentioned this option in one of his comments on the question. I think this option deserves to be covered by an answer.

NServiceBus delayed publishing of events

I have an Azure worker role with an NServiceBus host 4.7.5. This host publishes events to a topic over the Azure Service Bus transport. Is there a way to either delay the sending of an event, or to set some properties so that the message only appears on the topic subscription after a delay? The host sends out events after it notices a change in the primary database. There are several secondary databases into which the primary data write is replicated. The receivers are also Azure worker roles that use the NServiceBus host and subscribe to the topics.
By the time the receivers receive the message, the secondaries may hold out-of-sync data due to replication lag.
One option is to read from the primary database, but that is a route I don't want to take.
Would it be possible to fail early in your subscription endpoints and let the retries take care of it? You can fine-tune the retry counts and delays to make sure your secondary databases are updated before the message is retried.
You still need to find the best way to look up your data from the database and a way to differentiate between versions in the event. You could use version numbers or last-update dates in the case of updates, or just look up by an identifier in the case of creation.
The endpoint reading data off the secondary database might have an event handler like this:
public class CustomerCreationHandler : IHandleMessages<CustomerCreated>
{
    public void Handle(CustomerCreated @event)
    {
        var customer = Database.Load(@event.CustomerId);
        if (customer == null)
        {
            throw new CustomerNotFoundException("Customer was not found.");
        }
        // Your business logic goes here
    }
}
You can control how many times the event handler is retried and how much delay there is between attempts. In this case, the message will first go through First-Level Retries and then be handed over to Second-Level Retries, which are configured as below.
class ProvideConfiguration :
    IProvideConfiguration<SecondLevelRetriesConfig>
{
    public SecondLevelRetriesConfig GetConfiguration()
    {
        return new SecondLevelRetriesConfig
        {
            Enabled = true,
            NumberOfRetries = 2,
            TimeIncrease = TimeSpan.FromSeconds(10)
        };
    }
}
Alternatively, instead of publishing the event right away, you can send a deferred message to the same endpoint and then publish the actual event after a certain amount of time has passed.
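That could look roughly like this (a sketch against the NServiceBus 4.x IBus API; PublishCustomerCreated is a made-up internal command, and the 30-second delay is an arbitrary assumption):

using System;
using NServiceBus;

public class PublishCustomerCreated : ICommand { public Guid CustomerId { get; set; } }

public class ChangeDetector
{
    public IBus Bus { get; set; }

    public void OnCustomerCreated(Guid customerId)
    {
        // Defer an internal message to this same endpoint, long enough
        // for replication to the secondary databases to catch up
        Bus.Defer(TimeSpan.FromSeconds(30), new PublishCustomerCreated { CustomerId = customerId });
    }
}

// Runs once the delay elapses and does the actual publish of the
// CustomerCreated event shown in the handler above
public class PublishCustomerCreatedHandler : IHandleMessages<PublishCustomerCreated>
{
    public IBus Bus { get; set; }

    public void Handle(PublishCustomerCreated message)
    {
        Bus.Publish(new CustomerCreated { CustomerId = message.CustomerId });
    }
}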

Scoping transactions and sessions in NHibernate for long running tasks

When using NHibernate in web applications, I will usually let my IoC container take care of opening and closing an ISession per request and committing or rolling back the transaction. The nature of HTTP makes it very easy to define a clear Unit-of-Work in such applications.
Now, I have been tasked with putting together a small program, which will be invoked regularly by a task scheduler, for sending out newsletters. The concepts of both newsletters and subscribers are already well defined entities in our domain model, and sending a newsletter to all subscribers would involve doing something similar to this:
var subscribers = _session
    .QueryOver<Subscription>()
    .Where(s => !s.HasReceivedNewsletter)
    .List();

foreach (var subscriber in subscribers)
{
    SendNewsletterTo(subscriber);
    subscriber.HasReceivedNewsletter = true;
}
Notice how each Subscriber object is updated within the loop, recording that she has now received the newsletter. The idea is, that if the mail sending program should crash, it can be restarted and continue sending newsletters from where it left off.
The problem I am facing is in defining and implementing the Unit-of-Work pattern here. I will probably need to commit changes to the database at the end of each iteration of the loop. Simply wrapping the loop body in a using (var trans = _session.BeginTransaction()) block seems extremely expensive in running time, and I also seem to experience locking issues between this long-running process and other (web) applications using the same database.
After reading some articles and documentation on NHibernate transactions, I have come to think that I might need to detach the list of subscribers from the session to avoid the locking issues, and reattach each one to a fresh session in the loop body. I am not sure how this will work out for performance, though.
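Something like this is what I have in mind (just a sketch; _sessionFactory is assumed to be an injected ISessionFactory):

foreach (var subscriber in subscribers)
{
    using (var session = _sessionFactory.OpenSession())
    using (var txn = session.BeginTransaction())
    {
        // Reattach the detached entity to the fresh session without
        // forcing a version check or an immediate update
        session.Lock(subscriber, LockMode.None);
        SendNewsletterTo(subscriber);
        subscriber.HasReceivedNewsletter = true;
        txn.Commit(); // flushes the dirty entity and releases locks quickly
    }
}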
So, NHibernate experts, how would you design and implement a long running job like this?
Don't you want to use asynchronous durable messaging here? Something like NServiceBus, Rhino Service Bus or MassTransit. It seems you don't need to send a lot of messages as soon as possible, so I think you should do it asynchronously, with one durable message per user.
Don't you think that a stateless session with no transaction would do better here?
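For example (a sketch only; a stateless session bypasses the first-level cache and dirty tracking, so every change has to be pushed explicitly):

using (var session = _sessionFactory.OpenStatelessSession())
{
    var subscribers = session
        .QueryOver<Subscription>()
        .Where(s => !s.HasReceivedNewsletter)
        .List();

    foreach (var subscriber in subscribers)
    {
        SendNewsletterTo(subscriber);
        subscriber.HasReceivedNewsletter = true;
        session.Update(subscriber); // executes immediately; no session cache or flush involved
    }
}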
There's no problem having multiple transactions in a session. It's appropriate here to scope the transaction to updating a single subscriber because it's an independent operation. Depending on the number of subscribers and the likelihood of failure, it might be best to grab a small number of subscribers at a time (see the batching sketch after the example below).
foreach (var subscriber in subscribers)
{
    using (var txn = _session.BeginTransaction())
    {
        try
        {
            SendNewsletterTo(subscriber);
            subscriber.HasReceivedNewsletter = true;
            txn.Commit();
        }
        catch (Exception ex)
        {
            txn.Rollback();
            // log exception, clean up any actions SendNewsletterTo has taken if needed
            // Dispose of session and start over
        }
    }
}