NServiceBus Distributor: Preventing extra entries in StorageQueue after Client restart

For simplicity, I'll refer to both the distributor's ControlInputQueue and its StorageQueue as the same thing. I understand how the distributor's client signals its availability by writing an entry to the ControlInputQueue, and how the distributor moves that entry to its StorageQueue to track which clients are available to do work. It's just easier to explain if I treat them as the same. So...
I've created a proof of concept to demonstrate the behavior of the NServiceBus distributor. As expected, when a client starts up, it adds an entry to the distributor's StorageQueue. When a message comes into the distributor (via its InputQueue), the distributor removes an entry from its StorageQueue and forwards the message to the indicated client. The client performs its work and then adds an entry back to the distributor's StorageQueue. Thus there is at most one entry (per client) in the distributor's StorageQueue.
My problem occurs when a client is shut down, either manually or unexpectedly (say, the server explodes). The client's entry still exists in the distributor's StorageQueue, so as far as the distributor knows, that client is still available. This is fine, except that when the client starts up again, it adds another entry to the StorageQueue. So now there are two entries in the StorageQueue for a single client.
Is there any way to ensure that the distributor only ever has one StorageQueue entry for any given client?

In the interests of providing an "official" answer to this question... Per Andreas' comment above, it seems that there isn't a way to prevent these duplicate entries in NServiceBus v2.6, but there is in v3.0. So the solution is to upgrade. ;-)

Related

Redirect NServiceBus message based on Endpoint availability

I'm new to NServiceBus, but currently using it with SQL Server Transport to send messages between three machines: one belongs to an endpoint called Server, and two belong to an endpoint called Agent. This is working as expected, with messages sent to the Agent endpoint distributed to one of the two machines via the default round-robin.
I now want to add a new endpoint called PriorityAgent with a different queue and two additional machines. While all endpoints use the same message type, I know where each message should be handled prior to sending it, so normally I can just choose the correct destination endpoint and the message will be processed accordingly.
However, I need to build in a special case: if all machines on the PriorityAgent endpoint are currently down, messages that would ordinarily be sent there should be sent to the Agent endpoint instead, so they can be processed without delay. On the other hand, if all machines on the Agent endpoint are currently down, Agent messages should not be sent to PriorityAgent; they can simply wait for an Agent machine to return.
I've been researching the proper way to implement this and haven't seen many results. I imagine this isn't an unheard-of scenario, so my assumption is that I'm searching for the wrong things or thinking about this problem in the wrong way. Still, I came up with a couple of potential solutions:
Separately track heartbeats of PriorityAgent machines, and add a mutator or behavior to change the destination of outgoing PriorityAgent messages to the Agent endpoint if those heartbeats stop.
Give PriorityAgent messages a short expiration, and somehow handle the expiration to redirect messages to the Agent endpoint. I'm not sure if this is actually possible.
Is one of these solutions on the right track, or am I off-base entirely?
You have not seen many people do this because it's considered an antipattern, or rather one of two antipatterns.
1) Either you are sending a command, in which case the RECEIVER of the command defines the contract. Why are you sending a command defined by PriorityAgent to Agent? There should be no coupling there. A command belongs to ONE logical endpoint/queue.
2) Or you are publishing an event defined by whoever publishes, with both PriorityAgent and Agent as subscribers. The two subscribers should be 100% autonomous and share nothing. Checking heartbeats/sharing info between these two logically separate entities is a bad thing. Why have them separate in the first place then? If they know each other's "dirty secrets," they should be the same thing.
If your primary concern is that the PriorityAgent messages will not be handled if the machines hosting it are down, and want to use the machines hosting Agent as a backup, simply deploy PriorityAgent there as well. One machine can run more than one endpoint just fine.
That way you can leverage the additional machines, but don't have to get dirty with sending the same command to a different logical endpoint or coupling two different logical endpoints together through some back channel.
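To make that suggestion concrete, here is a minimal sketch, assuming NServiceBus 6-style hosting APIs and the SQL Server transport mentioned in the question; the connection string and type names are placeholders, not a definitive implementation:

```csharp
using System;
using System.Threading.Tasks;
using NServiceBus;

class Program
{
    static async Task Main()
    {
        // Placeholder connection string; reuse whatever the existing endpoints use.
        const string connection = "Data Source=.;Initial Catalog=nservicebus;Integrated Security=True";

        var agent = new EndpointConfiguration("Agent");
        agent.UseTransport<SqlServerTransport>().ConnectionString(connection);

        var priorityAgent = new EndpointConfiguration("PriorityAgent");
        priorityAgent.UseTransport<SqlServerTransport>().ConnectionString(connection);

        // Both logical endpoints run side by side in the same process, so a
        // machine that hosts Agent also serves as a PriorityAgent instance.
        var agentInstance = await Endpoint.Start(agent);
        var priorityInstance = await Endpoint.Start(priorityAgent);

        Console.ReadLine();

        await priorityInstance.Stop();
        await agentInstance.Stop();
    }
}
```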
I'm Dennis van der Stelt and I work for Particular Software, makers of NServiceBus.
From what I understand, both PriorityAgent and Agent are already scaled out over multiple machines? Then they both work according to the competing consumers pattern. In other words, both machines try to pick up messages from the same queue, where only one will win and start processing the message.
You're also talking about high availability. So when PriorityAgent goes down, another machine will pick it up. That's what I don't understand. Why fail over to Agent, which seems to me to be a logically different endpoint? If it is logically different, how can it handle PriorityAgent messages? If it can handle the same message, it seems logically the same endpoint. Then why make the difference between PriorityAgent and Agent?
Besides that, SQL Server has all kinds of features (like Always On) to make sure it does not (completely) go down. Why try to solve difficult scenarios with custom-built solutions when SQL Server can already solve this for you?
Another scenario could be that PriorityAgent should handle priority cases, something like preferred customers or high-value customers. That is sometimes used when (for example) a lot of orders (read: messages) come in, but we want to deal with high-value customers sooner than regular customers. Due to the volume of messages coming in, high-value customers would otherwise end up at the back of the queue, together with regular customers. A solution could be to publish these messages and have two different endpoints (with different queues) both subscribed to this message. Both receive each unique message, but check whether it's a message they should handle: the Agent will ignore high-value customers, and the PriorityAgent will ignore regular customers.
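A rough sketch of that publish/subscribe variant, assuming NServiceBus 6-style handlers; the OrderReceived event and its IsHighValueCustomer flag are made-up names for illustration:

```csharp
using System.Threading.Tasks;
using NServiceBus;

// The event that both Agent and PriorityAgent subscribe to (hypothetical shape).
public class OrderReceived : IEvent
{
    public bool IsHighValueCustomer { get; set; }
}

// Handler deployed in the Agent endpoint: it simply skips the priority cases.
public class OrderReceivedHandler : IHandleMessages<OrderReceived>
{
    public Task Handle(OrderReceived message, IMessageHandlerContext context)
    {
        if (message.IsHighValueCustomer)
            return Task.CompletedTask; // PriorityAgent's handler picks these up

        // ... regular Agent processing goes here ...
        return Task.CompletedTask;
    }
}
```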
These are some of the solutions available as standard messaging patterns or infrastructural solutions to your issue. Again, it's not completely clear to me what it is you're looking for. If you'd like to continue the discussion, you can email support@particular.net and we can take it from there.

Why is NServiceBus Bus.Publish() not transactional?

Setup:
I have a couple of subscribers subscribing to an event produced by a publisher on the same machine via MSMQ. The subscribers use two different endpoint names and each runs in its own process. (This is NSB 4.6.3.)
Scenario:
Now, if I do something "bad" to one of the subscribers (say, remove the permission to receive messages in MSMQ, or delete the MSMQ queue outright) and call Bus.Publish(), I will still have the event successfully published to the "good" subscriber (if the good one precedes the bad one on the subscriber list in subscription storage), or to neither (if the bad one precedes the good one).
Conclusion:
The upshot here is that Bus.Publish() does not seem to be transactional, in the sense of making publishing to all subscribers either succeed or fail as a whole. Depending on the order of the subscribers on the list, the end result might be different.
Questions:
Is this behavior by design?
What is the thought behind this?
If I want to make this call transactional, what is the recommended way? (One option seems to be to enclose Bus.Publish() in a TransactionScope in my code...)
Publish is transactional, or at least, it is if there is an ambient transaction. Assuming you have not taken steps to disable transactions, all message handlers have an ambient transaction running when you enter the Handle method. (Inspect Transaction.Current.TransactionInformation to see first-hand.) If you are operating out of an IWantToRunWhenBusStartsAndStops, however, there will be no ambient transaction, so then yes you would need to wrap with your own TransactionScope.
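For illustration, a minimal sketch of an NServiceBus 4.x-style handler showing where that ambient transaction can be observed; the message types here (OrderPlaced, SomethingHappened) are made up:

```csharp
using System;
using System.Transactions;
using NServiceBus;

public class OrderPlaced : IMessage { }
public class SomethingHappened : IEvent { }

public class OrderPlacedHandler : IHandleMessages<OrderPlaced>
{
    public IBus Bus { get; set; } // injected by NServiceBus

    public void Handle(OrderPlaced message)
    {
        // Inside Handle there is already an ambient transaction covering the
        // MSMQ receive and anything published or sent from here.
        var tx = Transaction.Current;
        Console.WriteLine(tx == null
            ? "No ambient transaction"
            : "Ambient transaction: " + tx.TransactionInformation.LocalIdentifier);

        Bus.Publish(new SomethingHappened());
    }
}
```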
How delivery is handled (specific to the MSMQ transport) differs depending upon whether the destination is a local or remote queue.
Remote Queues
For a remote queue, delivery is not directly handled by the publisher at all. It simply drops the two messages in the "Outbox", so to speak. MSMQ uses store-and-forward to ensure that these messages are eventually delivered to their intended destinations, whether that be on the same machine or a remote machine. In these cases, you may look at your outgoing queues and see that there are messages stuck there that are unable to be delivered because of whatever you have done to their destinations.
The safety afforded by store-and-forward means that one errant subscriber cannot take down a publisher, and so overall coupling is reduced. This is a good thing! But it also means that monitoring outgoing queues is a very important part of your DevOps story when deploying an NServiceBus system.
Local Queues
For local queues, MSMQ may still technically use a concept of an outgoing queue in its own plumbing - I'm not sure, and it doesn't really matter. But an additional step MSMQ is capable of (and does take) is to check the existence of a local queue before you try to send to it, and it will throw an exception if the queue doesn't exist or something is wrong with it. This would indeed affect the publisher.
So yes, if you publish a message from a non-transactional context like the inside of an IWantToRunWhenBusStartsAndStops, and the downed queue happens to be #2 on the list in subscription storage, you could observe a message arriving at SubscriberA but not at SubscriberB. If it were within a message handler with transactions disabled, you could see multiple copies arriving at SubscriberA because of the message retry logic!
Upshot
IWantToRunWhenBusStartsAndStops is great for quick demos and proving things out, but try to put as little real logic in them as possible, opting instead for the safety of message handlers where the ambient transaction applies. Also remember that an exception inside one could potentially take down your host process. Certainly don't publish inside of one without wrapping it in your own transaction.
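If you do have to publish from startup code, here is a minimal sketch of the wrapping suggested above, again assuming the NServiceBus 4.x-style API and a made-up event type:

```csharp
using System.Transactions;
using NServiceBus;

public class SomethingHappened : IEvent { }

public class StartupPublisher : IWantToRunWhenBusStartsAndStops
{
    public IBus Bus { get; set; } // injected by NServiceBus

    public void Start()
    {
        // No ambient transaction exists here, so create one explicitly; the
        // outgoing sends to the subscribers enlist in it and commit together.
        using (var scope = new TransactionScope())
        {
            Bus.Publish(new SomethingHappened());
            scope.Complete();
        }
    }

    public void Stop() { }
}
```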

Implementing a "Snapshot and Subscribe" in Redis

I wish to use Redis to create a system which publishes stock quote data to subscribers in an internal network. The problem is that publishing is not enough, as I need to find a way to implement an atomic "get snapshot and then subscribe" mechanism. I'm pretty new to Redis so I'm not sure my solution is the "proper way".
In a given moment each stock has an order book which contains at most 10 bids and 10 asks. The publisher receives data from the exchange and should publish it to subscribers.
While the publishing of changes in the order book can be easily done using publish and subscribe, each subscriber that connects also needs to get the snapshot of the current order book of the stock and only then subscribe to changes in the order book.
As I understand it, a Redis channel never stores messages, so the publisher also needs to maintain the complete order book in a hash key (or a sorted set; I'm not sure which is more appropriate) in addition to publishing changes.
I also understand that a Redis client cannot issue any commands except subscribing and unsubscribing once it subscribes to the first channel.
So, once the subscriber application is up, it first needs to get the key which contains the complete order book and then subscribe to changes in that book. However, this may result in a race condition: a change in the order book can be made after the client got the key containing the current snapshot but before it actually subscribed to changes, resulting in a change it will never see.
As it is not possible to use subscribe and then use get on a single connection, the client application needs two connections to the Redis server. At this point I started thinking that I'm probably not doing things the proper way if I need more than one connection in the same application. Anyway, my idea is that the client will have a subscribing connection and a query connection. First, it will use the subscribing connection to subscribe to changes in the order book, but will not yet enter the loop that processes events. Then it will use the query connection to get the complete snapshot of the book. Finally, it will enter the loop that processes events; because it actually subscribed before taking the snapshot, it is guaranteed not to miss any change that occurred after the snapshot was taken.
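For illustration, the subscribe-buffer-snapshot-drain sequence described above might look roughly like this with the StackExchange.Redis client; the key and channel names ("book:MSFT", "book:MSFT:updates") are made up:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using StackExchange.Redis;

class OrderBookClient
{
    static async Task Main()
    {
        var mux = await ConnectionMultiplexer.ConnectAsync("localhost");
        var updates = new BlockingCollection<RedisValue>();

        // 1. Subscribe first, but only queue incoming updates; don't apply them yet.
        var subscriber = mux.GetSubscriber();
        await subscriber.SubscribeAsync("book:MSFT:updates",
            (channel, update) => updates.Add(update));

        // 2. Fetch the snapshot. Any change published after the SUBSCRIBE above is
        //    already sitting in the queue, so nothing can slip through the gap.
        var db = mux.GetDatabase();
        HashEntry[] snapshot = await db.HashGetAllAsync("book:MSFT");
        Console.WriteLine($"Snapshot contains {snapshot.Length} price levels");

        // 3. Now enter the processing loop; buffered and live updates are applied
        //    in arrival order on this single consumer thread.
        foreach (var update in updates.GetConsumingEnumerable())
            Console.WriteLine($"Applying update: {update}");
    }
}
```

Note that StackExchange.Redis multiplexes pub/sub and regular commands behind one ConnectionMultiplexer (it keeps a dedicated subscriber connection internally), so the two-connection concern is largely handled for you; with a raw protocol client you would indeed need the second connection for the snapshot read.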
Is there any better way to accomplish my goal?
I hope you have found your way already; if not, here goes a personal suggestion:
If you are in JavaScript land, I would recommend having a look at Meteor.js; it achieves more or less the goal you want to achieve. With the default setup you will end up writing to MongoDB in order to "update" the GUI for the end user.
In any case, you might be interested in reading about how Meteor's DDP protocol works: https://meteorhacks.com/introduction-to-ddp/ and https://www.meteor.com/ddp

Is there a way to control how the NServiceBus Distributor distributes messages?

I am working with a medical system that (among other things) will handle updates to patient data.
We are going to try to scale our system out to more than one machine to improve performance.
However, I have a worry that distributing messages could cause updates in the wrong order.
Here is an Example:
The top two messages on my MSMQ are update requests for a patient. The first one changes the middle name to "Bob", the second one changes the middle name to "Bill". (So in the end the middle name should be "Bill".)
I have a distributor set up to send messages to MachineA and MachineB. Both are able to handle messages, but MachineA is a much slower machine (or has other processes that load it down, or is slower for some other reason).
If the distributor sends the first message to MachineA and the second to MachineB then the MachineB message will finish first and update the name to "Bill". Then MachineA finishes and the middle name is updated to "Bob".
So I end up with "Bob" as the middle name instead of "Bill".
I need a way to tell the distributor that these two messages should go to the same machine and same thread. Logic wise it is not hard to do, but I need a way to plug into the distributor to do it.
The logic I would want is like this: the distributor would keep a list of patient IDs that the queues are working on and then check new messages for that ID as well. If any of the queues is working on that ID, then it either waits until it is done or just makes sure the same queue gets the message.
Or, does NServiceBus somehow make sure that these transactions will only commit in order? (I am using Distributed Transactions.)
When you .Send() messages with NSB you can send an array of IMessage. This will send over a batch of messages that will be handled in the context of a single transaction. The messages should be handled in order.
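A rough sketch of what that batched send might look like with the older IBus API; the UpdatePatientMiddleName command type and the surrounding class are made up for illustration:

```csharp
using System;
using NServiceBus;

public class UpdatePatientMiddleName : IMessage
{
    public Guid PatientId { get; set; }
    public string MiddleName { get; set; }
}

public class PatientUpdateSender
{
    public IBus Bus { get; set; } // injected by NServiceBus

    public void SendUpdates(Guid patientId)
    {
        // Both updates travel as one batch, in one transaction, and are
        // dispatched to the same worker, so they should be handled in order.
        Bus.Send(
            new UpdatePatientMiddleName { PatientId = patientId, MiddleName = "Bob" },
            new UpdatePatientMiddleName { PatientId = patientId, MiddleName = "Bill" });
    }
}
```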

Building a reliable service in WCF

I am currently designing a service (wsHttp) which should be used to return sensitive data. As soon as a client asks for this data, I get it from the database, compile a list, then delete the data from the database and return the list.
My concern is what happens if something goes wrong on the way back to the client (network issues, ...): I have already deleted the data from the database, but the client will never get it.
Which out of the box solution do I have here?
This is an inherent problem in distributed computing. There is no easy solution. The question is how important it is to recover from such errors.
For example, if one deletes some records but the client gets disconnected, the next time it connects it will see those records as deleted. Even if it tries to delete them again (because the data stayed in the UI), this will do no harm.
Banks transferring money have an error-resolution mechanism where they match the transactions that happened between them in a second process. Conflicts are dealt with manually.
Some systems, such as NServiceBus, rely on MSMQ for storing messages and on eventual consistency, where a message destined for a client will eventually arrive once it is connected again.
There is no out-of-the-box solution for this. You would need to implement some form of user or automated confirmation that the data had been received, and only delete once this was returned.
Ed
There is an easy solution. But it doesn't come in a box.
Protocols like WS-ReliableMessaging (or equally TCP/IP) give you a layer of reliability under your messaging, but all bets are off once that layer offloads the message to the layer above.
So reliability can only be fully addressed at the absolute highest layer - the application layer, not by any lower layer down the communication stack. This makes it a first class business concern, not a purely technical concern.
The problem can be solved with a slight change to the process of deleting your sensitive data.
Instead of deleting it immediately, flag it for deletion. Then, build into the business processes that drive your service the assertion that the client must acknowledge receipt of the sensitive data. Then, when you get the acknowledgement back you can safely delete the data flagged for deletion, knowing that it has been received.
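A small sketch of what that could look like at the contract level; the service and type names (ISensitiveDataService, SensitiveBatch) are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel;

[DataContract]
public class SensitiveBatch
{
    [DataMember] public Guid BatchId { get; set; }
    [DataMember] public List<string> Items { get; set; }
}

[ServiceContract]
public interface ISensitiveDataService
{
    // Compiles the list and flags the rows as "handed out" instead of deleting them.
    [OperationContract]
    SensitiveBatch GetSensitiveData();

    // Called by the client once it has safely received and stored the batch;
    // only now does the service actually delete the flagged rows.
    [OperationContract]
    void AcknowledgeReceipt(Guid batchId);
}
```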
I recently wrote a blog post reasoning that reliability is a first class business concern that cannot be offloaded to a lower layer.