NServiceBus over Azure Storage Queues: Dispatching the message took longer than a visibility timeout

I have NServiceBus running over the Azure Storage Queues transport.
Sometimes, my message handler shows the following two entries in its log:
2018-08-27 12:27:23.0329 INFO 5 Handling Message...
2018-08-27 12:27:23.7359 WARN 5 Dispatching the message took longer than a visibility timeout. The message will reappear in the queue and will be obtained again.
NServiceBus.AzureStorageQueues.LeaseTimeoutException: The pop receipt of the cloud queue message '2ebd6dd4-f4a1-40c6-a52e-499e22bc9f2f' is invalid as it exceeded the next visible time by '00:00:09.7359860'.
I understand that there is a configurable visibility timeout, which defaults to 30 seconds, and that the message being handled is taking longer than those 30 seconds to process.
But what doesn't make sense is the timing of those two log entries. The handler starts at 23.0329 seconds, while the warning appears at 23.7359 seconds. That's a mere 0.7 seconds. Why is that? I would expect the warning from NServiceBus to appear only after the 30-second visibility timeout.

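Assuming you're using the default settings, messages are retrieved in batches, and all messages in a batch get the same visibility timeout of 30 seconds, counted from the moment the batch is retrieved. There's also a processing concurrency limit (calculated as max(2, number of logical processors)), which can cause some messages from the batch to wait for earlier messages to finish processing. Therefore it's possible that your message was retrieved as part of a batch but not processed right away, so its visibility timeout had expired, or nearly expired, before your handler even started. The 0.7 seconds between your log entries only covers the time since the handler started: the exception reports that the pop receipt was about 9.7 seconds past the message's next visible time, which, assuming the 30-second default, means the message was retrieved roughly 40 seconds before it was dispatched.
Tuning the batch size, the concurrency limit, and the visibility timeout for your specific scenario should get rid of those repeated attempts to process messages.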
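To see how a message can expire before its handler even starts, here is a minimal Python simulation. It is a sketch under simplifying assumptions, not how the transport is implemented: one batch is fetched at t=0, every message in it shares one visibility timeout counted from that moment, and a fixed number of concurrent slots each handle one message at a time for a constant duration. The function name and all parameter values are hypothetical.

```python
import heapq


def simulate_batch(batch_size, concurrency, handle_seconds, visibility_timeout):
    """Simulate one batch fetched at t=0 and processed with a concurrency limit.

    Returns a list of (start, finish, expired) tuples, one per message, where
    `expired` means the shared visibility timeout ran out before the message
    finished, so its pop receipt would no longer be valid at dispatch time.
    """
    # Each entry is the time at which a processing slot becomes free.
    free_at = [0.0] * concurrency
    heapq.heapify(free_at)
    results = []
    for _ in range(batch_size):
        start = heapq.heappop(free_at)  # wait for the earliest free slot
        finish = start + handle_seconds
        heapq.heappush(free_at, finish)
        results.append((start, finish, finish > visibility_timeout))
    return results


if __name__ == "__main__":
    # Hypothetical numbers: 32 messages per batch, 2 slots, 2.5 s per message.
    for i, (start, finish, expired) in enumerate(simulate_batch(32, 2, 2.5, 30.0)):
        if expired:
            print(f"message {i}: started {start:.1f}s, finished {finish:.1f}s -> expired")
```

With these made-up numbers, the last eight messages of the batch expire even though each handler runs for only 2.5 seconds, because they spend most of the visibility window waiting for a free slot. Raising the concurrency to 8 in the same simulation makes every message finish within the window.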

Related

Timeouts - Request them all in the beginning or one by one?

In general, when designing a system which has multiple events happening in some well pre-defined logical order, are there any benefits to either requesting all necessary timeouts at the beginning of the process, or requesting always only the "next" timeout (or in other words, the timeout for the next event)?
To clarify, I'm talking about a scenario when you want a number of things to happen sequentially.
Event A should happen 3 hours after initialization, Event B 10 hours after initialization, and Event C 48 hours after initialization of some process.
When the process is started, should it request a timeout only for Event A (which would then in turn request a timeout for Event B, and so on), or should it immediately request a timeout for all the Events?
In our case the process might be stopped at any point in time; thus, if it's stopped 5 hours after initialization, Event A should already have happened, and Events B and C should not happen at all.
A process might also, in special cases, be initiated midway through (i.e. "Start process 5 hours in", in which case Event B should happen 5 hours later), and the timelines of individual processes might be updated manually (i.e. "Let's postpone Event B by 2.5 hours for this single process instance").
Any thoughts appreciated,
If I understood your scenario correctly, you can model this with a saga that is started by the initial message that starts the process. When handling that initial message, you request the timeouts you expect, and in the timeout handlers you check whether the other events/operations were already handled and act based on the current state...
Does that make sense?
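A minimal Python sketch of that approach, using hypothetical names (ProcessSaga, start, postpone, stop, on_timeout) rather than the NServiceBus API: all timeouts are requested up front, each request carries the due time it was scheduled for, and the timeout handler acts only if the process is still running and the deadline hasn't been moved since. Postponing an event then just means requesting a new timeout; the old one is ignored when it arrives. Skipping events that are already in the past when a process starts midway is an assumption about the desired behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessSaga:
    """Hypothetical saga state for the A/B/C scenario; not the NServiceBus API.

    `due` maps an event name to its due time in hours after initialization.
    """
    due: dict
    stopped: bool = False
    fired: list = field(default_factory=list)

    def start(self, started_at_hours=0.0):
        # Request every remaining timeout up front. Each request is
        # (event, delay_hours, scheduled_for) so the handler can later tell
        # whether the timeout is stale. Events already in the past are
        # skipped (an assumption about the "start 5 hours in" case).
        return [(name, at - started_at_hours, at)
                for name, at in self.due.items() if at > started_at_hours]

    def postpone(self, name, hours, now_hours):
        # Move the deadline and request a fresh timeout; the previously
        # requested one will be ignored because its due time no longer matches.
        self.due[name] += hours
        return (name, self.due[name] - now_hours, self.due[name])

    def stop(self):
        self.stopped = True

    def on_timeout(self, name, scheduled_for):
        # Decide based on the current state, not on the timeout's arrival alone.
        if self.stopped or self.due.get(name) != scheduled_for:
            return False
        self.fired.append(name)
        return True
```

For example, postponing Event B by 2.5 hours at hour 4 requests a new timeout due at hour 12.5, and the original hour-10 timeout is ignored when it arrives; once the process is stopped, every remaining timeout is ignored.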

NServiceBus Timeoutsdispatcher queue is being flooded with messages during stress tests

I'm doing some stress tests on a saga that uses 2 timeouts. During the test about 21K sagas get created, so that would mean 42K timeouts, but I notice that the timeoutsdispatcher queue of the saga endpoint is getting flooded with hundreds of thousands of messages until it crashes because the MSMQ storage limit is hit.
I'm seeing this behavior since I switched the persistence mechanism from RavenDB to SQL Server.
Does anyone have an idea what could be wrong?
Transport: MSMQ
Persistence: NHibernate
Packages used:
NHibernate version 4.0.4.4000
NServiceBus version 5.2.14
NServiceBus.Host version 6.0.0
NServiceBus.Log4Net version 1.0.0
NServiceBus.NHibernate version 6.2.7
Test setup:
* endpoint 1 is sending 22000 messages to endpoint 2.
* endpoint 2 hosts a saga that is started by that message.
* each saga publishes an event and then requests 2 timeouts: 1 at 4 minutes, 1 at 10 minutes.
Observed behavior:
* endpoint 1 sends the 22K messages in under a minute.
* endpoint 2 (the saga) processes 5 to 10 messages per second.
* after 4 minutes the first timeouts are fired, while endpoint 2 is still processing messages from its queue and thus is still creating new saga instances.
* from that moment on, the timeoutsdispatcher queue of the saga endpoint is getting filled with messages.
* after 10 minutes or so, the timeoutsdispatcher queue already contains over 170K messages and is still filling up.
* That continues until endpoint 2 crashes because the MSMQ storage limit is hit, or all messages from the input queue are processed. If the latter occurs first, the timeoutsdispatcher queue message count starts to decrease until it eventually reaches 0.
Did you perform the same stress test with RavenDB? And is SQL Server on a machine that's more-or-less equally powerful, with fast drives?
Update
Some checks for your saga
Is the [Unique] attribute used, and is it used properly? In other words, do you use a unique ID for every incoming message, so that every incoming message that spawns 2 timeouts creates a unique saga instance? If every incoming message accesses the same saga instance, throughput will be severely limited. Assume the saga instance has already been created (otherwise the explanation becomes too complex). Message1 comes in, tries to find the row in the database, finds it, and locks it. Message2 comes in at the same time and finds the row, but it's locked, so it goes into retry. Message3 up to Message100 come in (if concurrency is set to 100), all try to do the same thing, and immediately fail. You can see this will limit throughput for a while :)
Are the correct indexes on your Saga table(s) and Timeout tables?
What is your maximum concurrency level set to?
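The lock contention described in the [Unique] check above can be illustrated with a deliberately crude Python model. It is a sketch under the assumptions that messages are picked up in waves of `concurrency` at a time and that, within a wave, only one message per saga ID gets the row lock while the rest fail and go to retry; it ignores retries that later succeed. The function name is hypothetical.

```python
from collections import Counter


def failed_first_attempts(saga_ids, concurrency):
    """Count messages whose first attempt fails on a saga row lock.

    Assumes messages are taken from the queue in waves of `concurrency`, and
    that within one wave only a single message per saga id acquires the lock.
    """
    failures = 0
    for i in range(0, len(saga_ids), concurrency):
        wave = Counter(saga_ids[i:i + concurrency])
        # Every message beyond the first one for a given saga id is blocked.
        failures += sum(count - 1 for count in wave.values())
    return failures
```

With 100 messages that all correlate to the same saga and a concurrency of 100, this model reports 99 failed first attempts; with a unique ID per message, it reports none.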
Based on the numbers: you say you send 22K messages, resulting in 44K timeout messages. Imagine all of these timeouts are in MSMQ, and that the messages are really, really small, like 1 KB, with the header information added by NServiceBus taking up another 2 KB. That's 44,000 times 3 KB, roughly 130 megabytes. So there's no way that can fill up a default MSMQ installation, which has a quota of 1 GB.
This probably means your dead-letter queue has filled up completely. Look up the MSMQ transport connection string options and set the appropriate connection string, for example:
<connectionStrings>
<add name="NServiceBus/Transport"
connectionString="deadLetter=false;journal=false;"/>
</connectionStrings>
Messages with the TimeToBeReceived attribute set end up in the dead-letter queue when they expire. Purging queues also sends all of their messages to the dead-letter queue, unless you set the proper connection string.
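The back-of-the-envelope storage estimate in this answer, written out in Python with the same assumed sizes (1 KB body plus 2 KB of NServiceBus headers per message, 1 MB = 1024 KB). The function names are hypothetical.

```python
def queue_storage_mb(message_count, body_kb, header_kb):
    """Approximate MSMQ storage in MB (1 MB = 1024 KB) for queued messages."""
    return message_count * (body_kb + header_kb) / 1024


def messages_to_fill_quota(quota_mb, message_kb):
    """How many messages of `message_kb` fit before hitting `quota_mb`."""
    return (quota_mb * 1024) // message_kb


if __name__ == "__main__":
    print(f"{queue_storage_mb(44_000, 1, 2):.1f} MB")  # 44K timeouts at ~3 KB each
    print(messages_to_fill_quota(1024, 3))             # messages needed to fill 1 GB
```

By the same arithmetic, about 350K messages of that size fit in a 1 GB quota, so the quota only comes into reach when the queues hold several times the 44K timeouts the test should produce.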

How to specify another timeout queue for NSB?

I am using NSB 4.4.2
I want to have something like heartbeats on my saga to show processing statistics.
When I request a timeout, it is sent to the saga's input queue.
If many messages are queued ahead of this timeout message, IHandleTimeouts may not be invoked at the specified time.
Is this a bug? Or how can I use a separate queue for timeout messages?
Thanks
You are correct - when a timeout is ready to be dispatched, it is sent to the incoming queue of the endpoint, and if there are already many other messages in there, it will have to wait its turn to be processed.
Another thing you might want to consider, is that the endpoint may be down at that time.
If you want to guarantee that your saga code will be invoked at (or very close to) the time of the timeout, you'll need to set up a high availability deployment first. Then, you should look at setting the SLA required of that endpoint - how quickly messages should be processed, and then monitor the time to breach SLA performance counter.
See here for more information: http://docs.particular.net/nservicebus/monitoring-nservicebus-endpoints
You should be prepared to scale out your endpoint as needed to guarantee enough processing power to keep up with the load coming in.
NOTE: The reason we use the same incoming queue for processing these timeouts is by design. A timeout message is almost always the same priority or lower than the other business messages being processed by a saga. As such, it doesn't make sense to have them cut ahead of other messages in line.
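Because the timeout waits its turn in a FIFO queue, its lateness is roughly the backlog ahead of it divided by the endpoint's throughput, which is why scaling out helps. A minimal Python sketch of that estimate, assuming a constant per-instance processing rate and load shared evenly across scaled-out instances; the function names and numbers are hypothetical.

```python
def timeout_lateness_seconds(queued_ahead, msgs_per_second_per_instance, instances=1):
    """Estimate how long a due timeout waits behind messages already queued.

    Assumes strict FIFO processing at a constant rate, with load split evenly
    across scaled-out instances.
    """
    return queued_ahead / (msgs_per_second_per_instance * instances)


def breaches_sla(queued_ahead, msgs_per_second_per_instance, sla_seconds, instances=1):
    """True if the backlog alone would delay the timeout past the SLA."""
    lateness = timeout_lateness_seconds(queued_ahead, msgs_per_second_per_instance, instances)
    return lateness > sla_seconds


if __name__ == "__main__":
    # 3,000 messages ahead at 10 msg/s: ~300 s late with 1 instance, ~100 s with 3.
    print(timeout_lateness_seconds(3000, 10))
    print(timeout_lateness_seconds(3000, 10, instances=3))
```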
Timeouts are sent to the [endpointname].timeouts queue.

MSMQ + WCF - Retry with Growing Delay

I am using MSMQ 4 with WCF. We have a Microsoft Dynamics plugin putting a message on a queue. A service picks up the message and makes an HTTP request to another web server. The web server responds by putting another message on a different queue. A second service picks up those messages and sends the response back to Dynamics...
We have our retry queue set up to retry 3 times and then wait for 5 minutes before retrying again. The Dynamics system sometimes takes so long (due to other plugins) that we can complete the round trip before the database transaction commits. The users then don't see the update come through for another 5 minutes.
I am curious if there is a way to configure the retry mechanism to retry incrementally. So, the first time it fails, it only waits a few seconds. If it fails a second time, it waits twice that. And the time between retries just keeps growing.
The problem with just reducing the time between retries is that a bad message could easily fill up a log file.
It turns out there is no built-in way of doing this. One slightly involved option is to create multiple queues, each with its own retry/poison sub-queues, each with a growing retry delay. You can reuse the same handler for each queue - the only thing that changes is the configuration. You also need a handler that can read the poison sub-queues (service) and move the message to the next queue in the chain (client).
So, you set receiveErrorHandling to Move, and set maxRetryCycles and receiveRetryCount to just 1. Each queue uses a longer retryCycleDelay than the one before it, and each queue you create gets a poison sub-queue automatically. You simply read from each poison sub-queue and use a client to move the message to the next queue.
I am sure someone could write some code that automatically creates N queues with a growing retryCycleDelay and hooks it all up programmatically. Since it is the same handler/client for every queue, it wouldn't be a big deal.
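The programmatic setup suggested above could start from something like this Python sketch, which only computes the per-queue settings; it does not create queues or talk to WCF. The queue naming scheme, the doubling factor, and the class and function names are assumptions; the field names mirror the receiveErrorHandling, receiveRetryCount, maxRetryCycles and retryCycleDelay binding settings from the answer.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass(frozen=True)
class RetryStage:
    """Settings for one queue in the chain (hypothetical naming scheme)."""
    queue: str
    retry_cycle_delay: timedelta
    next_queue: Optional[str]  # where the mover client forwards poison messages
    receive_error_handling: str = "Move"
    receive_retry_count: int = 1
    max_retry_cycles: int = 1

    @property
    def poison_queue(self) -> str:
        # MSMQ 4 addresses a sub-queue as "<queue>;<subqueue>".
        return f"{self.queue};poison"


def build_retry_chain(base_name: str, stages: int, first_delay: timedelta,
                      factor: int = 2) -> List[RetryStage]:
    """Create `stages` queues whose retryCycleDelay grows by `factor` each step."""
    names = [f"{base_name}-retry-{i}" for i in range(1, stages + 1)]
    chain = []
    for i, name in enumerate(names):
        chain.append(RetryStage(
            queue=name,
            retry_cycle_delay=first_delay * (factor ** i),
            next_queue=names[i + 1] if i + 1 < stages else None,
        ))
    return chain
```

With a 5-second first delay and 4 stages, the delays come out as 5, 10, 20 and 40 seconds. The last stage has no next queue, so messages that fail there remain in its poison sub-queue for inspection.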

NServiceBus delay retries

We need to be able to specify a delay before retrying failed messages. NServiceBus retries more or less instantly, up to n times (as configured), before moving the message to the error queue.
What I need to be able to do is specify, for a given message type, that it's not to be retried for an arbitrary period of time.
I've read the post here:
NServiceBus Retry Delay
but this doesn't give what I'm looking for.
Kind regards
Ben
This isn't supported right now. What you can do is let the messages go to the error queue and set up an endpoint to monitor that queue. Your code could then determine the rules for replaying messages. You could use a Saga in combination with the Timeout manager to achieve this.
Typically you'll have some rules around when to replay messages. In NSB 3.0 we have a better way to do this using the FaultManager. This gives you options on where to put failed messages and includes the exception. One of the options is a database; you could then set up a job that inspects the exception and determines what to do with the message.
Lastly, a low-tech way of getting this is to schedule a job that runs the ReturnToSourceQueue tool periodically to "clean up". We are doing this, and we include an alert so we don't endlessly cycle messages around.
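The replay rules mentioned in this answer could be as simple as a per-message-type delay plus a cap on replays, which also guards against endlessly cycling messages. A minimal Python sketch of that decision, independent of any NServiceBus API; the rule shape, the names, and the timestamps (seconds since some epoch) are assumptions, and actually returning the message to its source queue is left to whatever monitors the error queue.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ReplayRule:
    delay_seconds: float  # how long to wait before replaying this message type
    max_replays: int      # stop replaying (and alert) after this many attempts


def next_replay_time(message_type: str, replays_so_far: int, failed_at: float,
                     rules: Dict[str, ReplayRule]) -> Optional[float]:
    """Return when to send the message back to its source queue, or None to
    leave it in the error queue for a human to look at."""
    rule = rules.get(message_type)
    if rule is None or replays_so_far >= rule.max_replays:
        return None
    return failed_at + rule.delay_seconds
```

Message types without a rule are never replayed automatically in this sketch, which keeps the default behavior the same as today: the message stays in the error queue.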