How to prevent NServiceBus from not sending messages on errors - nservicebus

I'm new to NServiceBus, so maybe I'm asking something pretty silly here, but is there a way to make NServiceBus not stop sending any messages that are sent in response to a message whose handler fails?
Let me explain with a simple example.
Suppose I have an OrderPaidEvent that has a handler that does the following:
Look for the customer
Start a DB transaction
Update the customer to a good customer
Send an CustomerUpgradedToGoodCustomerEvent message
Commit the DB transaction
Fairly straightforward, all is well in the world. Now a few months later someone else figures that an email would be nice when an order is paid and thus adds another handler to the OrderPaidEvent to send an email.
Unfortunately, now whenever the mailserver has an issue, this second handler will fail with an error which will however prevent the original CustomerUpgradedToGoodCustomerEvent message from being sent (step 4). But because the DB transaction was already committed (step 5) the customer has already been upgraded to a good customer in the database.
This means that even if the OrderPaidEvent handler is retried the customer no longer changes and thus the CustomerUpgradedToGoodCustomerEvent message is never sent. Worse yet, this is all because of a change to the code that has nothing to do with the original message handler and will thus be difficult to detect.
This seems like a massive flaw and since I'm new to this I'm certain there's something I'm doing wrong, but I can't seem to figure out what it is.
Any help from you fine people would be great.
Thanks in advance.

How about breaking down your procedural code into separate handlers?
Thereafter each logical operation will either be done or will not be done based on successful completion of each granular task.
If you add a Saga to the mix then you can make business decisions based on the completed steps in your Saga.
Also maybe read more about transactions and NServiceBus here

First of all I would send out the CustomerUpgradedToGoodCustomerEvent after the commit. At that point you are sure that the event actually took place.
And in response to your question: You could handle the email in some 'SendEmail' command that is raised after the db commit and before the event is published. If that command fails it will not hurt the handling of the OrderPaid event. When mail is up again, the command can be retried and handled normally.

Related

RabbitMQ+MassTransit: how to cancel queued message from processing?

In some exceptional situations I need somehow to tell consumer on receiving point that some messages shouldn’t be processed. Otherwise two systems will become out-of-sync (we deal with some outdates external systems, and if, for example, connection is dropped we have to discard all queued operations in scope of that connection).
Take a risk and resolve problem messages manually? Compensation actions (that could be tough to support in my case)? Anything else?
There are a few ways:
You can set a time-to-live when sending a message: await endpoint.Send(myMessage, c => c.TimeToLive = TimeSpan.FromHours(1));, but this will apply to all messages that are sent (or published) like this. I would consider this, after looking at your requirements. This is technical, but it is a proper messaging pattern.
Make TTL and generation timestamp properties of your message itself and let the consumer decide if the message is still worth processing. This is more business and, probably, the most correct way.
Combine tech and business - keep the timestamp and TTL in message headers so they don't pollute your message contracts, and filter them out using a custom middleware. In this case, you need to be careful to log such drops so you won't be left wonder why messages disappear now and then.
Almost any unreliable integration can be monitored using sagas, with timeouts. For example, we use a saga to integrate with Twilio. Since we have no ability to open a webhook for them, we poll after some interval to check the message status. You can start a saga when you get a message and schedule a message to check if the processing is still waiting. As discussed in comments, you can either use the "human intervention required" way to fix the issue or let the saga decide to drop the message.
A similar way could be to use a lookup table, where you put the list of messages that aren't relevant for processing. Such a table would be similar to the list of sagas. It seems that this way would also require scheduling. Both here, and for the saga, I'd recommend using a separate receive endpoint (a queue) for the DropIt message, with only one consumer. It would prevent DropIt messages from getting stuck behind the integration messages that are waiting to be processed (and some should be already dropped)
Use RMQ management API to remove messages from the queue. This is the worst method, I won't recommend it.
From what I understand, you're building a system that sends messages to 3rd party systems. In other words, systems you don't control. It has an API but compensating actions aren't always possible, because the API doesn't provide it or because actions are performed inside the 3rd party system that can't be compensated or rolled back?
If possible try to solve this via sagas. Make sure the saga executes the different steps (the sending of messages) in the right order. So that messages that cannot be compensated are sent last. This way message that can be compensated if they fail, will be compensated by the saga. The ones that cannot be compensated should be sent last, when you're as sure as possible that they don't have to be compensated. Because that last message is the last step in synchronizing all systems.
All in all this is one of the problems with distributed systems, keeping everything in sync. Compensating actions is the way to deal with this. If compensating actions aren't possible, you're in a very difficult situation. Try to see if the business can help by becoming more flexible and accepting that you need to compensate things, where they'll tell you it's not possible.
In some exceptional situations I need somehow to tell consumer on receiving point that some messages shouldn’t be processed.
Can't you revert this into:
Tell the consumer that an earlier message can be processed.
This way you can easily turn this in a state machine (like a saga) that acts on two messages. If the 2nd message never arrives then you can discard the 1st after a while or do something else.
The strategy here is to halt/wait until certain that no actions need to be reverted.

How to make a Saga handler Reentrant

I have a task that can be started by the user, that could take hours to run, and where there's a reasonable chance that the user will start the task multiple times during a run.
I've broken the processing of the task up into smaller batches, but the way the data looks it's very difficult to tell what's still to be processed. I batch it using messages that each process a bite sized chunk of the data.
I have thought of using a Saga to control access to starting this process, with a Saga property called Processing that I set at the start of the handler and then unset at the end of the handler. The handler does some work and sends the messages to process the data. I check the value at the start of the handler, and if it's set, then just return.
I'm using Azure storage for Saga storage, if it makes a difference for the next bit. I'm also using NSB 6
I have a few questions though:
Is this the correct approach to re-entrancy with NSB?
When is a change to Saga data persisted? (and is it different depending on the transport?)
Following on from the above, if I set a Saga value in a handler, wait a while and then reset it to its original value will it change the persistent storage at all?
Seem to be cross posted in the Particular Software google group:
https://groups.google.com/forum/#!topic/particularsoftware/p-qD5merxZQ
Sagas are very often used for such patterns. The saga instance would track progress and guard that the (sub)tasks aren't invoked multiple times but could also take actions if the expected task(s) didn't complete or is/are over time.
The saga instance data is stored after processing the message and not when updating any of the saga data properties. The logic you described would not work.
The correct way would be having a saga that orchestrates your process and having regular handlers that do the actual work.
In the saga handle method that creates the saga check if the saga was already created or already the 'busy' status and if it does not have this status send a message to do some work. This will guard that the task is only initiated once and after that the saga is stored.
The handler can now do the actual task, when it completes it can do a 'Reply' back to the saga
When the saga receives the reply it can now start any other follow up task or raise an event and it can also 'complete'.
Optimistic concurrency control and batched sends
If two message are received that create/update the same saga instance only the first writer wins. The other will fail because of optimistic concurrency control.
However, if these messages are not processed in parallel but sequential both fail unless the saga checks if the saga instance is already initialized.
The following sample demonstrates this: https://github.com/ramonsmits/docs.particular.net/tree/azure-storage-saga-optimistic-concurrency-control/samples/azure/storage-persistence/ASP_1
The client sends two identical message bodies. The saga is launched and only 1 message succeeds due to optimistic concurrency control.
Due to retries eventually the second copy will be processed to but the saga checks the saga data for a field that it knows would normally be initialized by by a message that 'starts' the saga. If that field is already initialized it assumes the message is already processed and just returns:
It also demonstrates batches sends. Messages are not immediately send until the all handlers/sagas are completed.
Saga design
The following video might help you with designing your sagas and understand the various patterns:
Integration Patterns with NServiceBus: https://www.youtube.com/watch?v=BK8JPp8prXc
Keep in mind that Azure Storage isn't transactional and does not provide locking, it is only atomic. Any work you do within a handler or saga can potentially be invoked more than once and if you use non-transactional resources then make sure that logic is idempotent.
So after a lot of testing
I don't believe that this is the right approach.
As Archer says, you can manipulate the saga data properties as much as you like, they are only saved at the end of the handler.
So if the saga receives two simultaneous messages the check for Processing will pass both times and I'll have two processes running (and in my case processing the same data twice).
The saga within a saga faces a similar problem too.
What I believe will work (and has done during my PoC testing) is using a database unique index to help out. I'm using entity framework and azure sql, so database access is not contained within the handler's transaction (this is the important difference between the database and the saga data). The database will also operate across all instances of the endpoint and generally seems like a good solution.
The table that I'm using has each of the columns that make up the saga 'id', and there is a unique index on them.
At the beginning of the handler I retrieve a row from the database. If there is a row, the handler returns (in my case this is okay, in others you could throw an exception to get the handler to run again). The first thing that the handler does (before any work, although I'm not 100% sure that it matters) is to write a row to the table. If the write fails (probably because of the unique constraint being violated) the exception puts the message back on the queue. It doesn't really matter why the database write fails, as NSB will handle it.
Then the handler does the work.
Then remove the row.
Of course there is a chance that something happens during processing of the work, so I'm also using a timestamp and another process to reset it if it's busy for too long. (still need to define 'too long' though :) )
Maybe this can help someone with a similar problem.

Grails test JMS messaging

I've got a JMS messaging system implemented with two queues. One is used as a standard queue second is an error queue.
This system was implemented to handle database concurrency in my application. Basically, there are users and users have assets. One user can interact with another user and as a result of this interaction their assets can change. One user can interact with single user at once, so they cannot start another interaction before the first one finishes. However, one user can be in interaction with other users multiple times [as long as they started the interaction].
What I did was: crated an "interaction registry" in redis, where I store the ID of users who begin an interaction. During interaction I gather all changes that should be made to the second user's assets, and after interaction is finished I send those changes to the queue [user who has started the interaction is saved within the original transaction]. After the interaction is finished I clear the ID from registry in redis.
Listener of my queue will receive a message with information about changes to the user that need to be done. Listener will get all objects which require a change from the database and update it. Listener will check before each update if there is an interaction started by the user being updated. If there is - listener will rollback the transaction and put the message back on the queue. However, if there's something else wrong, message will be put on to the error queue and will be retried several times before it is logged and marked as failed. Phew.
Now I'm at the point where I need to create a proper integration test, so that I make sure no future changes will screw this up.
Positive testing is easy, unfortunately I have to test scenarios, where during updates there's an OptimisticLockFailureException, my own UserInteractingException & some other exceptions [catch (Exception e) that is].
I can simulate my UserInteractingException by creating a payload with hundreds of objects to be updated by the listener and changing one of it in the test. Same thing with OptimisticLockFailureException. But I have no idea how to simulate something else [I can't even think of what could it be].
Also, this testing scenario based on a fluke [well, chance that presented scenario will not trigger an error is very low] is not something I like. I would like to have something more concrete.
Is there any other, good, way to test this scenarios?
Thanks
I did as I described in the original question and it seems to work fine.
Any delays I can test with camel.

NServiceBus pub/sub - where have my messages gone?

Well I've been doing this NServiceBus project for a while and once I got it working for PubSub I then spent the rest of the time on the actual workflow logic. However, I can see a serious issue which I want to get around (or rather learn how to handle correctly).
A publisher publishes a message to the storage queues of any subscribers as far as I understand. Great. But what happens when the subscriber isn't running (I've read other posts about this and they don't seem to be asking the same question).
Scenario - I get the publisher to Publish a message when no subscribers are running (attached/requested messages to be relayed to them).. I then find that.. the message is "gone" just simply isn't there! where did it go? Did the publisher say "hey, no one's subscribing to this, so I wont bother publishing it?", shouldn't it NOT do that and require at least one subscriber?
Can anyone shed any light on this? (nservicenewbie)
You should publish an event that has happened - a statement of fact, that other handler may or may not be interested in. It's perfectly valid to have zero subscribers! If this is not the case then maybe you should be Send()ing a command instead of Publish()ing an event.
If you are using a persistent subscription storage, start the subscriber up once and it will always be subscribed. If the subscriber is offline, messages for it will pile up in its Input Queue, ready to be processed when the subscriber comes back online.
If you're just testing with NServiceBus, the NServiceBus.Host.exe is running in the Lite profile, which uses in-memory (non-persistant) subscription storage, which would result in what you are seeing.
Ah ha! Well though it's not always an error to have no subscriber for a message type, there is a way to handle it.
In your publisher simply modify the:
IBus Bus
To use (you will need NServiceBus.Core.dll and the NS NServiceBus.Unicast):
IUnicastBus Bus
Then you can attach an handler to:
Bus.NoSubscribersForMessage += .......
This can then put the message in an error queue.. or perhaps retry forever.. or publish something else etc.. etc.. what ever you want. Thus ensuring there is nothing lost where your particular system (from a business perspective) requires an outcome

receive as pick branch trigger does not fire

I have a WF4 Service with a flowchart as the root activity. It contains multiple correlated receive activites and decision branching to step through an approval process. The receive activities work perfectly until I try and use one as the trigger for a pick branch.
I am running tracking so can see that the receive is opened and in the persistance I can see the associated bookmark. When I send a client message with the receive type it does not trigger. I have a delay pick branch that fires OK but then the subsequent receive also does not work.
I have checked these receive activities individually and they work OK when not used as the pick trigger. I have tried the pick within a Sequence and a While but no difference.
I cannot see any difference between my implementation and may examples on the web. Am I missing something extra required when the receive is encapsulated by a pick branch?
There is nothing special about a PickBranch trigger that would cause a receive to behave differently so I suspect it is something with the Receive itself. What kind of errors are you seeing at the client application?