NServiceBus and DTCs - nservicebus

I'm using the Distributor in NServiceBus and I've hit a wall of ignorance regarding DTCs.
I've only used the DTC once or maybe twice before, when doing work across processes, and not much at that, so I'm a complete newbie with the whole DTC concept.
Question:
To ensure durable messaging with NSB, is it absolutely necessary to use the DTC?
The reason I ask is that I would expect NSB to be able to detect any exception in, say, a handler, and react to the error by not removing the message from the queue. Hence no need for the DTC. That would of course mean that any database or external service access in the handler would require the programmer to perform his or her own rollbacks, and for that reason the DTC does seem like the best way to go. So I'm all for DTCs (if I understand them right), as from my perspective they ensure that messages are never lost from the queues and message handling is never left corrupt, as long as the handlers are implemented correctly and the other external resources participate in the distributed transaction.
But I'm not sure, especially since a well-respected guy on the server maintenance team said "DTCs will cause you a world of pain!" when I ran the idea of enabling the DTC on the database server past him... But he has yet to come up with an argument as to why I'm in for so much pain with DTCs. :/
Could someone with a good understanding of DTCs and NSB please help me clarify whether I'm completely off in my understanding of DTCs, and whether there's some big pitfall I have completely missed?
Kind regards

The NServiceBus distributor and the use of the DTC in NServiceBus don't have anything to do with one another. The DTC will be used by NServiceBus whether you're using the distributor or not.
NSB distributor workers (and even the individual worker threads on a single box when the distributor isn't used) don't enlist one another in distributed transactions. Let me reiterate: you will never see two NSB worker threads in a single DTC transaction. Each worker thread starts a transaction against its local queue and then adds a (likely remote) database to that transaction, which is what makes it distributed.
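To make that concrete, here's a minimal sketch of roughly what happens on a single worker thread, assuming the MSMQ transport and a handler that writes to SQL Server. The queue path, connection string and SQL are made up, and NServiceBus wires all of this up for you; the point is simply how the second resource escalates the transaction to the DTC:

using System.Data.SqlClient;
using System.Messaging;
using System.Transactions;

class WorkerThreadSketch
{
    // Illustrative only: NServiceBus does the equivalent of this for you on each worker thread.
    static void HandleOneMessage()
    {
        using (var scope = new TransactionScope())
        {
            var queue = new MessageQueue(@".\private$\myendpoint");

            // 1. Receive from the *local* queue inside the ambient transaction.
            var message = queue.Receive(MessageQueueTransactionType.Automatic);

            // 2. The handler then touches a (likely remote) database. The moment this
            //    second resource enlists, the transaction escalates to the DTC.
            using (var connection = new SqlConnection("server=DbServer;database=App;Integrated Security=SSPI"))
            {
                connection.Open();
                using (var command = new SqlCommand("insert into AuditLog (Text) values ('handled')", connection))
                {
                    command.ExecuteNonQuery();
                }
            }

            // 3. Commit both resources atomically. If the handler throws before this line,
            //    the message goes back on the queue and the database work rolls back.
            scope.Complete();
        }
    }
}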
There's a nice illustration of the concept here
I don't think you're missing any big pitfalls. I'd just decouple the two concepts: the NSB distributor, and how distributed transactions are used by NSB.

Related

Gathering distributed data into central database

I was assigned to update an existing system that gathers data coming from points of sale and inserts it into a central database. The current system is based on FTP/SFTP transmission, where the information is sent once a day, usually at night. Unfortunately, because of unstable connection links (low-quality 2G/3G modems), some of the files arrive broken. With just a few shops connected that way everything worked smoothly, but as the number of shops increased, errors became more frequent. What is worse, the time needed to insert data into the central database has grown to 12-14 hours (including waiting for the data to be downloaded from all of the shops), and that cannot happen during the working day, as it would block the creation of sales reports and other activities on the database - so we are really tight on processing time here.
The idea my manager suggested is to send the data continuously during the day. Data packages would be significantly smaller, so their transmission and insertion would be much faster, the central server would hold current (almost real-time) data, and the night could be used for long-running database activities like creating backups, rebuilding indexes, etc.
After going through many websites, I found that:
using ASMX web services is now considered obsolete, and WCF should be used instead
WCF with MSMQ or System.Messaging could be used to transmit data safely, so I wouldn't have to care that much about acknowledging delivery of data, consistency, nodes going offline, etc.
according to http://blogs.msdn.com/b/motleyqueue/archive/2007/09/22/system-messaging-versus-wcf-queuing.aspx WCF queuing is better
there are also other technologies for implementing message queues, like RabbitMQ, ZeroMQ, etc.
And that is where I get confused. With so many options, what are the pros and cons of these technologies?
We have been using .NET with Windows Forms and SQL Server, but if it were necessary, we could change to something more suitable. I am also a bit worried about server capacity. After some calculations, the server would be receiving about 15 packages of data per second (peak). Is that a lot? I know there are many websites without serious server infrastructure that handle hundreds of visitors online and still run smoothly, but a website mainly uploads data to the client, whereas here we would be downloading it from the client.
I also found a somewhat similar SO question: Middleware to build data-gathering and monitoring for a distributed system
where DDS was mentioned. What do you think about introducing some middleware servers that would cope with the low-quality links to the points of sale, so the main server would not be clogged with 1 KB/s transmissions?
I'd be grateful for all your help. Thank you in advance!
RabbitMQ can easily cope with thousands of 1 KB messages per second.
As your use case is not about processing real-time data, I'd say you should combine a few messages and send them as a batch. That would be good enough to spread the load over the day.
And since the motivation here is not to process the data in real time, any transport layer would do the job, even FTP/SFTP. RabbitMQ will work fine here, but it's not the typical use case for it.
As you mentioned that one of your concerns is a slow/unreliable network, I'd suggest compressing the files before sending them and, on the receiving end, verifying their integrity immediately. Rsync or something similar will probably do a great job of that.
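If you stay file-based, here's a rough sketch of the sending side, just to illustrate the idea (the file names and the choice of SHA-256 are arbitrary, not something prescribed by the question):

using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

class ExportSender
{
    // Compress the day's export and compute a SHA-256 checksum to send alongside it,
    // so the receiving end can reject truncated or corrupted transfers immediately.
    static string CompressAndHash(string sourceFile, string compressedFile)
    {
        using (var input = File.OpenRead(sourceFile))
        using (var output = File.Create(compressedFile))
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            input.CopyTo(gzip);
        }

        using (var sha = SHA256.Create())
        using (var compressed = File.OpenRead(compressedFile))
        {
            return BitConverter.ToString(sha.ComputeHash(compressed)).Replace("-", "");
        }
    }
}

The receiver recomputes the hash before inserting anything into the database and asks for a resend if it doesn't match.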
From what I understand, you have basically two problems:
Potential for loss/corruption of call data
Database write performance
The potential for loss/corruption of call data is being caused by a lack of reliability in the transmission of data from client to service.
And it's not clear what is causing the database contention/performance issues, beyond a vague reference to high volumes, so this answer will be more geared towards solving the first problem.
You have correctly identified the need for reliable asynchronous communication transport as a way to address the reliability issues in your current setup.
Looking at MSMQ to deliver this is a valid first step. MSMQ provides reliable communication via store-and-forward messaging semantics, which come out of the box and require very little in the way of configuration.
Unfortunately, while suitable for your needs, MSMQ relies on two things:
A reliable network protocol, and
A client service running on both the sending and receiving machines.
From your description above, I don't believe 1 exists (the internet is not a reliable network), and you may well struggle with 2 - MSMQ only ships with Windows Server or the business/enterprise editions of desktop Windows. (* see below...)
As a possible solution to the network reliability problem, you could use a WCF or a RESTful endpoint (using Nancy or WebApi) to expose a service operation (or operations) over HTTP which would accept the incoming calls from the client machines. These technologies are quite different, so you'll need to make sure you make the right choice early on.
WCF supports WS-ReliableMessaging from the SOAP 1.2 specification out of the box, which allows for reliable web service calls over HTTP; however, it's very config-heavy and not generally a nice framework to work with.
REST is much simpler than WCF in .NET, very lightweight and easy to use. However, for reliable delivery you would have to expose some kind of GET operation (in addition to the POST that allows the client to send data) to be called (within a reasonable time frame) to verify that the data was committed. The client would have to implement some kind of retry semantics if the result of the GET "acknowledgement" was negative.
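A rough client-side sketch of that POST-then-verify pattern (the endpoint URLs, batch identifier, delays and retry count are invented for illustration, not part of any real API):

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class BatchSender
{
    // Post a batch, then poll a (hypothetical) acknowledgement resource and retry
    // until the server confirms the data was committed.
    static async Task SendBatchAsync(string json, string batchId)
    {
        using (var client = new HttpClient { BaseAddress = new Uri("https://central.example.com/") })
        {
            for (var attempt = 0; attempt < 5; attempt++)
            {
                try
                {
                    await client.PostAsync("api/batches/" + batchId,
                        new StringContent(json, Encoding.UTF8, "application/json"));

                    await Task.Delay(TimeSpan.FromSeconds(30));   // give the server time to commit

                    var ack = await client.GetAsync("api/batches/" + batchId + "/ack");
                    if (ack.IsSuccessStatusCode)
                        return;                                   // committed, we're done
                }
                catch (HttpRequestException)
                {
                    // Transient failure on a flaky link: fall through and retry.
                }
            }
            throw new InvalidOperationException("Batch " + batchId + " was never acknowledged.");
        }
    }
}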
Despite requiring two operations rather than one for the WCF route, I would favour the REST approach. I've done plenty of both and find REST services way nicer to work with.
(*) That's not to say that MSMQ wouldn't work in your ultimate solution, just that it would not be used to address the transmission reliability issue. However, it could still be used to address another of your problems: database write contention. If you were to queue incoming requests once they arrived at the server, they could then be processed by an "offline" process, which could perform the required database operations in a reliable manner. This could be done using MSMQ transactional queues.
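Sketched very roughly (the queue path, formatter and message shape are placeholders), the endpoint would just do a transactional send, and an offline process would drain the queue into the database at its own pace:

using System;
using System.Messaging;

class IncomingSalesQueue
{
    // Web endpoint side: durably park the incoming request in a local transactional queue.
    static void Enqueue(string payload)
    {
        using (var queue = new MessageQueue(@".\private$\incoming-sales"))
        {
            queue.Send(payload, MessageQueueTransactionType.Single);
        }
    }

    // Offline processor side: pull one message and write it to the database at a pace
    // the database can handle. Wrap this in a TransactionScope if the database write
    // must be atomic with the dequeue.
    static void ProcessOne()
    {
        using (var queue = new MessageQueue(@".\private$\incoming-sales"))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });

            // Blocks for up to 30 seconds; throws MessageQueueException on timeout.
            var message = queue.Receive(TimeSpan.FromSeconds(30), MessageQueueTransactionType.Single);
            var payload = (string)message.Body;

            // ... perform the required database insert/update here ...
        }
    }
}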
In response to comments:
99% of messages are passed from the shop to the main server, but if some change is needed (price corrections, discounts, etc.), that data has to be sent to the shop.
This kind of changes things. Had I understood from the beginning that you had a bidirectional requirement, and seeing as you have managed to establish MSMQ communication, I would have nudged you towards NServiceBus, which is a really, really cool wrapper around MSMQ. The reason is that you appear to have both a one-way and a publish-subscribe requirement, both of which are supported really nicely by NServiceBus.

Is there a way to see what subscriptions exist currently for NServiceBus

I am concerned with my NServiceBus solution.
I have a "MessageHub" that publishes some very important messages. But sometimes it loses track of its subscriptions and just discards the message because it thinks no one is listening.
I have tried turning on "NServiceBus.Integration" to store the subscriptions. But despite that, I still have issues with bad start up order where it thinks nothing is listening.
Is there a way to debug this process? Try to figure out why it is getting confused?
I don't even know a way to look at what subscriptions it "thinks" it has...
I went with NServiceBus because it is not supposed to ever lose data. Now I am losing large chunks. I know it is a config issue, but it is causing much grief.
What is probably happening in your case is that you are using MSMQ for subscription storage. Even though it's possible for subscriptions to endure for a while, using MSMQ to store things long term is always going to be volatile.
For durable subscription storage (which survives "forever") you should be using SQL Server as your subscription storage.
Note: you can always view your current subscriptions, whether you are using SQL or MSMQ to store them. In SQL, just look in the subscriptions table; for MSMQ, look in the publisher's subscription queue.
UPDATE
Since version 3, I have been using RavenDB, which is the default.
In my experience, to get the subscriptions assigned correctly, one should first start the EventHandler projects and then, when they are all idle, start the CommandHandlers (publishers).
You can see which messages are being subscribed to using Service Bus MQ Manager; it has a dialog listing all "messages" and their subscribers/publishers. It's a side project of mine, free and open source.
http://blog.halan.se/page/Service-Bus-MQ-Manager.aspx

SQL Server, using a table as a queue

I'm using SQL Server 2008 R2 as a queuing mechanism. I add items to a table, and an external service reads and processes these items. This works great, but it is missing one thing: I need a mechanism whereby I can attempt to select a single row from the table and, if there isn't one, block until there is (preferably for a specific period of time).
Can anyone advise on how I might achieve this?
The only way to achieve a non-polling blocking dequeue is WAITFOR (RECEIVE), which implies Service Broker queues, with all the added overhead.
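For completeness, this is roughly what that looks like from .NET. The queue name is invented, and it assumes a Service Broker queue has already been set up; the call blocks inside SQL Server until a message arrives or the TIMEOUT elapses, so the client does no polling:

using System.Data.SqlClient;

class ServiceBrokerDequeue
{
    // Blocks server-side (no client-side polling) for up to 60 seconds.
    static byte[] WaitForMessage(SqlConnection connection)
    {
        using (var command = new SqlCommand(
            @"waitfor (
                  receive top(1) conversation_handle, message_type_name, message_body
                  from dbo.TargetQueue
              ), timeout 60000;", connection))
        {
            command.CommandTimeout = 90;   // must exceed the RECEIVE timeout

            using (var reader = command.ExecuteReader())
            {
                if (!reader.Read())
                    return null;           // TIMEOUT elapsed with nothing to receive

                return reader["message_body"] as byte[];
            }
        }
    }
}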
If you're using ordinary tables as queues you will not be able to achieve non-polling blocking. You must poll the queue by asking for a dequeue operation, and if it returns nothing, sleep and try again later.
I'm afraid I'm going to disagree with Andomar here: while his answer works for the generic question 'are there any rows in the table?', when it comes to queueing, due to the busy nature of overlapping enqueue/dequeue operations, checking for rows like this is an (almost) guaranteed deadlock under load. When using tables as queues, one must always stick to the basic enqueue/dequeue operations and not try anything fancy.
"Since SQL Server 2005 introduced the OUTPUT clause, using tables as queues is no longer a hard problem." There's a great post on how to do this:
http://rusanu.com/2010/03/26/using-tables-as-queues/
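The gist of the pattern described there, expressed as a .NET call (the table and column names are placeholders): a single atomic destructive read, rather than a "check for rows, then read" sequence:

using System.Data.SqlClient;

class TableQueue
{
    // Dequeue one row atomically: READPAST skips rows locked by competing readers,
    // ROWLOCK keeps the lock footprint small, and OUTPUT returns the deleted row,
    // so there is no separate "does a row exist?" check to deadlock on.
    static string TryDequeue(SqlConnection connection)
    {
        const string sql = @"
            with next as (
                select top(1) *
                from dbo.QueueTable with (rowlock, readpast)
                order by Id)
            delete from next
            output deleted.Payload;";

        using (var command = new SqlCommand(sql, connection))
        {
            return command.ExecuteScalar() as string;   // null means the queue was empty
        }
    }
}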
I need a mechanism whereby I can attempt to select a single row from the table and, if there isn't one, block until there is (preferably for a specific period of time).
You can loop and check for new rows every second:
while not exists (select * from QueueTable)
begin
    waitfor delay '00:00:01'
end
Disclaimer: this is not code I would use for a production system, but it does what you ask.
The previous commenter that suggested using Service Broker likely had the best answer. Service Broker allows you to essentially block while waiting for more input.
If Service Broker is overkill, you should consider a different approach to your problem. Can you provide more details of what you're trying to do?
Let me share my experiences with you in this area, you may find it helpful.
My team first used MSMQ transactional queues feeding our asynchronous services (be they IIS-hosted or WAS). The biggest problem we encountered was MS DTC issues under heavy load, like 100+ messages/second; all it took was one slow database operation somewhere to start causing timeout exceptions, and MS DTC would bring the house down, so to speak (transactions would actually become lost if things got bad enough). Although we're not 100% certain of the root cause to this day, we do suspect MS DTC in a clustered environment has some serious issues.
Because of this, we started looking into different solutions. Service Bus for Windows Server (the on-premises version of Azure Service Bus) looked promising, but it was non-transactional, so it didn't suit our requirements.
We finally decided on the roll-your-own approach, suggested to us by the guys who built the Azure Service Bus, because of our transactional requirements. Essentially, we followed the Azure worker role model: a worker role that is fed via some queue, using a polling/blocking model.
Honestly, this has been far better for us than anything else we've used. The pseudocode for such a service is:
var hasMsg = true;
while (true)
{
    if (!hasMsg)
        Thread.Sleep(1000);        // back off only when the queue was found empty

    var msg = GetNextMessage();    // e.g. a READPAST dequeue against the queue table
    if (msg == null)
    {
        hasMsg = false;
    }
    else
    {
        hasMsg = true;
        Process(msg);
    }
}
We've found that CPU usage is significantly lower this way (lower than traditional WCF services).
The tricky part, of course, is handling transactions. If you'd like to have multiple instances of your service reading from the queue, you'll need to employ READPAST/UPDLOCK hints in your SQL, and also have your .NET service enlist in the transaction in a way that will roll back should the service fail. In that case, you'll also want retry/poison queues as tables in addition to your regular queues.
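A rough sketch of one worker iteration under those constraints (the table names, the poison-table schema and the ProcessPayload helper are all made up for illustration):

using System;
using System.Data.SqlClient;
using System.Transactions;

class QueueWorker
{
    // The dequeue, the processing and (on failure) the move to a poison table all
    // commit or roll back as one unit, so a crash simply puts the row back on the queue.
    static void HandleOne(string connectionString)
    {
        using (var scope = new TransactionScope())
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();   // enlists in the ambient transaction

            var dequeue = new SqlCommand(
                @"with next as (
                      select top(1) *
                      from dbo.QueueTable with (updlock, rowlock, readpast)
                      order by Id)
                  delete from next
                  output deleted.Payload;", connection);
            var payload = dequeue.ExecuteScalar() as string;

            if (payload != null)
            {
                try
                {
                    ProcessPayload(payload);   // your actual work (hypothetical helper)
                }
                catch (Exception)
                {
                    // Park the message for later inspection/retry rather than poisoning the queue.
                    var park = new SqlCommand("insert into dbo.PoisonQueue (Payload) values (@p);", connection);
                    park.Parameters.AddWithValue("@p", payload);
                    park.ExecuteNonQuery();
                }
            }

            scope.Complete();    // commits dequeue + work, or dequeue + poison insert
        }
    }

    static void ProcessPayload(string payload) { /* database writes, service calls, etc. */ }
}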

nServiceBus deployment approach on a single machine

I'm looking to put nServiceBus onto a single machine and am wondering if my understanding of a simple deployment is correct.
I intend to deploy each logical publisher and subscriber in its own service (as per the advice here), and for each to have its own message queue (I will be using MSMQ). Deploying another subscriber is then as simple as adding the service and the queue; to remove one, you just remove the service and the queue.
Is it really as simple as that for a low message volume single machine deployment?
Are there any serious gotchas I need be aware of with this approach?
That really should be it. The other thing you may want to consider is at least putting your error queue(s) on another machine, in case that single machine crashes. This way you can still get an idea of what the errors were. In a production environment you may also want to consider a cluster to make things a little more reliable.

Should I use MSMQ or SQL Service Broker for transactions?

I've been asked by my team leader to investigate MSMQ as an option for the new version of our product. We use SQL Service Broker in our current version. I've done my fair share of experimentation and Googling to find which product is better for my needs, but I thought I'd ask the best site I know for programming answers.
Some details:
Our client is .NET 1.1 and 2.0 code; this is where the messages will be sent from.
The target is a SQL Server 2005 instance. All messages end up being database updates or inserts.
We will send several updates that must be treated as a transaction.
We have to have perfect message recoverability; no messages can be lost.
We have to be asynchronous and able to accept messages even when the target SQL server is down.
Developing our own queuing solution isn't an option; we're a small team.
Things I've discovered so far:
Both MSMQ and SQL Service Broker can do the job.
It appears that Service Broker is faster for transactional messaging.
Service Broker requires a SQL Server running somewhere, whereas MSMQ just needs any configured Windows machine running somewhere.
MSMQ appears to be better/faster/easier to set up and run in clusters.
Am I missing something? Is there a clear winner here? Any thoughts, experiences, or links would be valued. Thank you!
EDIT: We ended up sticking with service broker because we have a custom DB framework used in some of our client code (we handle transactions better). That code captured SQL for transactions, but not . The client code was also all version 1.1 of .NET, so we'd have to upgrade all the client code. Thanks for your help!
Having just migrated my application from Service Broker to MSMQ, I would have to vote for using MSMQ. There are several factors to take into account, most of which have to do with how you are using your data and where the processing lives.
Is processing done in the database? Service Broker
Is it just data movement? Service Broker
Is processing done in .NET/COM code? MSMQ
Do you need remote distributed transactions (for example, processing on a box different than SQL)? MSMQ
Do you need to be able to send messages while the destination is down? MSMQ
Do you want to use nServiceBus, MassTransit, Rhino-ESB, etc.? MSMQ
Things to consider no matter what you choose
How do you know the health of your queue? Both options handle failover differently. For example, Service Broker will disable your queue in certain scenarios, which can take down your application.
How will you perform reporting? If you already use SQL tables in your reports, Service Broker can easily fit in, as it's just another dynamic table. If you are already using Performance Monitor, MSMQ may fit in more nicely. Service Broker does have a lot of performance counters, though, so don't let this be your only factor.
How do you measure uptime? Is it merely making sure you don't lose transactions, or do you need to respond synchronously? I find that the distributed nature of MSMQ allows for higher uptime, because the main queue can go offline without losing anything, whereas with Service Broker your database must be online or else you lose out.
Do you already have experience with one of these technologies? Both have a lot of implementation details that can come back and bite you.
No matter what choice you make, consider how easy it would be to switch out the underlying queueing technology. I recommend having a generic IQueue interface that you write a concrete implementation against. That way, the choice you make can easily be changed later on if you find that you made the wrong one. After all, a queue is just a queue and should not lock you into a specific implementation.
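For example, something as small as this (the shape is only a suggestion) keeps the rest of the application ignorant of whether MSMQ or Service Broker sits underneath:

using System;

// MsmqQueue and ServiceBrokerQueue would each implement this, and the rest of the
// application only ever depends on IQueue<T>.
public interface IQueue<T>
{
    void Enqueue(T message);

    // Returns false (and a default message) when nothing arrives within the timeout.
    bool TryDequeue(TimeSpan timeout, out T message);
}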
I've used MSMQ before, and the only item I'd add to your list is a prerequisite check for versioning. I ran into an issue where one site had Windows 2000 Server and therefore MSMQ v2, versus Windows 2003 Server and MSMQ v3. All my .NET code targeted v3, and they aren't compatible... or at least not easily so.
Just a consideration if you go the MSMQ route.
The message size limitation in MSMQ has halted my digging in that direction. I am learning Service Broker for the project.
Do you need to be able to send messages while the destination is down? MSMQ
I don't understand why. SSB can send messages to a disconnected destination without any problem. All these messages go to the transmission queue and will be delivered once the destination becomes reachable again.