Using NServiceBus with the NHibernate saga persister, how can I avoid duplicate sagas when it's possible for a message to be received more than once?
Here are some solutions that I've thought of so far:
Never call MarkAsComplete() so the deduplication is handled in the usual fashion by the saga itself.
Implement my own saga persister which stores the correlation ids for completed sagas so duplicate/additional messages are ignored.
The question is what would cause the message to be received multiple times - is it due to retries of the same message (like in the case where there was a deadlock in the DB)? Those kinds of retries (causing the same message to be "processed" multiple times) are already handled by the transactional nature of NServiceBus.
If the situation is due to the message being sent by some other endpoint multiple times, the recommendation would be to see what you could do to prevent that on the sending side. If that isn't possible, then yes, a saga that doesn't ever complete could serve as your filter.
Related
Before a consumer nacks a message, is there any way the consumer can modify the message's state so that when the consumer consumes it upon redelivery, it sees that changed state. I'd rather not reject + reenqueue new message, but please let me know if that's the only way to accomplish this.
My goal is to determine how many times specific messages are being redelivered. I see two ways of doing this:
(1) On the message itself as described above. The message would be a container of basic stats and the application payload message.
(2) In some external storage. We would uniquely identify the message by the message id that we set.
I know 2 is possible, but my question is if 1 is possible.
There is no way to do (1) like you want. You would need to change the message, thus the message would become another message. If you want to do something like that (and it's possible that you meant this with I'd rather not reject + reenqueue new message) - you should ACK the message, increment one field in it and publish it again (again, maybe this is what you meant when you said reenqueue it). So your message payload would have some ID, counter, and again (obviously different) payload that is the content.
Definitvly much better way is (2) for multiple reasons:
it does not interfere with business logic, that is this diagnostic part is isolated
you are leaving re-queueing to rabbitmq (as you are supposed to do), meaning that you are not worrying about losing messages and handling some message meta info which has no use for you business logic
it's actually supposed to be used - the ACKing and NACKing, that's why it's in the AMQP specification
since you do need the number of how many times specific messages have been redelivered, you have it somewhere externally, meaning that it's independent of (rabbitmq's) message persistence, lifetime, potentially queue durability mirroring etc
Even if this question was marked as solved some time ago, I want to mention that there is a way at least for the redelivery. It might be integrated after the original answer. There is a different type of queues in RabbitMQ called Quorum queues.
Quorum queues offer the option to set redelivery limit:
Quorum queues support poison message handling via a redelivery limit. This feature is currently unique to Quorum queues.
In order to archive this, RabbitMQ is counting the numbers of deliveries in the header. The header attribute is called: x-delivery-count
I have a task that can be started by the user, that could take hours to run, and where there's a reasonable chance that the user will start the task multiple times during a run.
I've broken the processing of the task up into smaller batches, but the way the data looks it's very difficult to tell what's still to be processed. I batch it using messages that each process a bite sized chunk of the data.
I have thought of using a Saga to control access to starting this process, with a Saga property called Processing that I set at the start of the handler and then unset at the end of the handler. The handler does some work and sends the messages to process the data. I check the value at the start of the handler, and if it's set, then just return.
I'm using Azure storage for Saga storage, if it makes a difference for the next bit. I'm also using NSB 6
I have a few questions though:
Is this the correct approach to re-entrancy with NSB?
When is a change to Saga data persisted? (and is it different depending on the transport?)
Following on from the above, if I set a Saga value in a handler, wait a while and then reset it to its original value will it change the persistent storage at all?
Seem to be cross posted in the Particular Software google group:
https://groups.google.com/forum/#!topic/particularsoftware/p-qD5merxZQ
Sagas are very often used for such patterns. The saga instance would track progress and guard that the (sub)tasks aren't invoked multiple times but could also take actions if the expected task(s) didn't complete or is/are over time.
The saga instance data is stored after processing the message and not when updating any of the saga data properties. The logic you described would not work.
The correct way would be having a saga that orchestrates your process and having regular handlers that do the actual work.
In the saga handle method that creates the saga check if the saga was already created or already the 'busy' status and if it does not have this status send a message to do some work. This will guard that the task is only initiated once and after that the saga is stored.
The handler can now do the actual task, when it completes it can do a 'Reply' back to the saga
When the saga receives the reply it can now start any other follow up task or raise an event and it can also 'complete'.
Optimistic concurrency control and batched sends
If two message are received that create/update the same saga instance only the first writer wins. The other will fail because of optimistic concurrency control.
However, if these messages are not processed in parallel but sequential both fail unless the saga checks if the saga instance is already initialized.
The following sample demonstrates this: https://github.com/ramonsmits/docs.particular.net/tree/azure-storage-saga-optimistic-concurrency-control/samples/azure/storage-persistence/ASP_1
The client sends two identical message bodies. The saga is launched and only 1 message succeeds due to optimistic concurrency control.
Due to retries eventually the second copy will be processed to but the saga checks the saga data for a field that it knows would normally be initialized by by a message that 'starts' the saga. If that field is already initialized it assumes the message is already processed and just returns:
It also demonstrates batches sends. Messages are not immediately send until the all handlers/sagas are completed.
Saga design
The following video might help you with designing your sagas and understand the various patterns:
Integration Patterns with NServiceBus: https://www.youtube.com/watch?v=BK8JPp8prXc
Keep in mind that Azure Storage isn't transactional and does not provide locking, it is only atomic. Any work you do within a handler or saga can potentially be invoked more than once and if you use non-transactional resources then make sure that logic is idempotent.
So after a lot of testing
I don't believe that this is the right approach.
As Archer says, you can manipulate the saga data properties as much as you like, they are only saved at the end of the handler.
So if the saga receives two simultaneous messages the check for Processing will pass both times and I'll have two processes running (and in my case processing the same data twice).
The saga within a saga faces a similar problem too.
What I believe will work (and has done during my PoC testing) is using a database unique index to help out. I'm using entity framework and azure sql, so database access is not contained within the handler's transaction (this is the important difference between the database and the saga data). The database will also operate across all instances of the endpoint and generally seems like a good solution.
The table that I'm using has each of the columns that make up the saga 'id', and there is a unique index on them.
At the beginning of the handler I retrieve a row from the database. If there is a row, the handler returns (in my case this is okay, in others you could throw an exception to get the handler to run again). The first thing that the handler does (before any work, although I'm not 100% sure that it matters) is to write a row to the table. If the write fails (probably because of the unique constraint being violated) the exception puts the message back on the queue. It doesn't really matter why the database write fails, as NSB will handle it.
Then the handler does the work.
Then remove the row.
Of course there is a chance that something happens during processing of the work, so I'm also using a timestamp and another process to reset it if it's busy for too long. (still need to define 'too long' though :) )
Maybe this can help someone with a similar problem.
We have a requirement for all our messages to be processed in the order of arrival to MSMQ.
We will be exposing a WCF service to the clients, and this WCF service will post the messages using NServiceBus (Sendonly Bus) to MSMQ.
We are going to develop a windows service(MessageHandler), which will use Nservicebus to read the message from MSMQ and save it to the database. Our database will not be available for few hours everyday.
During the db downtime we expect that the process to retry the first message in MSMQ and halt processing other messages until the database is up. Once the database is up we want NServicebus to process in the order the message is sent.
Will setting up MaximumConcurrencyLevel="1" MaximumMessageThroughputPerSecond="1" helps in this scenario?
What is the best way using NServiceBus to handle this scenario?
We have a requirement for all our messages to be processed in the
order of arrival to MSMQ.
See the answer to this question How to handle message order in nservicebus?, and also this post here.
I am in agreement that while in-order delivery is possible, it is much better to design your system such that order does not matter. The linked article outlines the following soltuion:
Add a sequence number to all messages
in the receiver check the sequence number is the last seen number + 1 if not throw an out of sequence exception
Enable second level retries (so if they are out of order they will try again later hopefully after the correct message was received)
However, in the interest of anwering your specific question:
Will setting up MaximumConcurrencyLevel="1"
MaximumMessageThroughputPerSecond="1" helps in this scenario?
Not really.
Whenever you have a requirement for ordered delivery, the fundamental laws of logic dictate that somewhere along your message processing pipeline you must have a single-threaded process in order to guarantee in-order delivery.
Where this happens is up to you (check out the resequencer pattern), but you could certainly throttle the NserviceBus handler to a single thread (I don't think you need to set the MaximumMessageThroughputPerSecond to make it single threaded though).
However, even if you did this, and even if you used transactional queues, you could still not guarantee that each message would be dequeued and processed to the database in order, because if there are any permanent failures on any of the messages they will be removed from the queue and the next message processed.
During the db downtime we expect that the process to retry the first
message in MSMQ and halt processing other messages until the database
is up. Once the database is up we want NServicebus to process in the
order the message is sent.
This is not recommended. The second level retry functionality in NServiceBus is designed to handle unexpected and short-term outages, not planned and long-term outages.
For starters, when your NServiceBus message handler endpoint tries to process a message in it's input queue and finds the database unavailable, it will implement it's 2nd level retry policy, which by default will attempt the dequeue 5 times with increasing infrequency, and then fail permanently, sticking the failed message in it's error queue. It will then move onto the next message in the input queue.
While this doesn't violate your in-order delivery requirement on its own, it will make life very difficult for two reasons:
The permanently failed messages will need to be re-processed with priority once the database becomes available again, and
there will be a ton of unwanted failure logging, which will obfuscate any genuine handling errors.
If you have a regular planned outages which you know about in advance, then the simplest way to deal with them is to implement a service window, which another term for a schedule.
However, Windows services manager does not support the concept of service windows, so you would have to use a scheduled task to stop then start your service, or look at other options such as hangfire, quartz.net or some other cron-type library.
It kinds of depends why you need the messages to arrive in order. If it's like you first receive an Order message and then various OrderLine messages that all belong to a certain order, there are multiple possibilities.
One is to just accept that there can be OrderLine messages without an Order. The Order will come in later anyway. Eventual Consistency.
Another one is to collect messages (and possible state) in an NServiceBus Saga. When normally MessageA needs to arrive first, only to receive MessageB and MessageC later, give all three messages the ability to start the saga. All three messages need to have something that ties them together, like a unique GUID. Then the saga will make sure it collects them properly and when all messages have arrived, perhaps store its final state and mark the saga as completed.
Another option is to just persist all messages directly into the database and have something else figure out what belongs to what. This is a scenario useful for a data warehouse where the data just needs to be collected, no matter what. Some data might not be 100% accurate (or consistent) but that's okay.
Asynchronous messaging makes it hard to process them 100% in order, especially when the client calling the WCF is making mistakes and/or sending them out of order. It wouldn't be the first time I had such a requirement and out-of-order messages.
I am reading through the documentation and the following confuses me because it states at the top of the document with version 5 we get reliability without using the DTC.
These feature has been implemented using both the Outbox pattern and the Deduplication pattern. As a message is dequeued we check to see if we have previously processed it. If so, we then deliver any messages in the outbox for that message but do not invoke message-processing logic again. If the message wasn't previously processed, then we invoke the regular handler logic, storing all outgoing message in a durable storage in the same transaction as the users own database changes. Finally we send out all outgoing messages and update the deduplication storage.
I'm sure it's probably due to my lack of understanding, but wouldn't the fact that NServiceBus is opening it's own connection and transaction separate from the message handler (ex; calling repository for saving) database connection the transaction would be escalated to a full 2PC using the DTC?
Here is the documentation:
http://docs.particular.net/nservicebus/outbox/
Thanks!
Yes, it would. Which is why it shares them with you instead.
NServiceBus expose these to you in the message handlers so you can reuse them and avoid the escalation.
Simply take a dependency on NHibernateStorageContext
in your message handler constructor and it gives you access to the correct NHibernate.ISession and NHibernate.ITransaction.
I have a trouble with NSB saga performance. We have one single saga that orchestrate long running session. Saga sends a lot of messages to different processors and than gets its replies.
I see that sagas queue contains tons of incoming messages. Each messages processing is very fast, but there is a delay between handling next message. Here is a part of log file:
16:26:42 [14][DEBUG] Finished handling message.
16:26:46 [15][DEBUG] ChildContainerBehavior
16:26:46 [15][DEBUG] MessageHandlingLoggingBehavior
16:26:46 [15][DEBUG] Received message with ID 28b285ce-3b77-4a69-a13a-a3bf009717fd from sender xxxHost#PROCESSOR01
We see a 4 seconds delay. That is very slow. Please help, what is wrong with my saga?
Thanks!
Since you have a monolithic saga, you will have some contention on the state record that backs the saga in storage. You will want to consider breaking up your endpoint or redesigning how you gather the information. Check out this Routing Slip implementation.