Many distributors to many workers - NServiceBus

We're a licensed product using NServiceBus as the messaging framework in our federated system.
Looking for opportunities for using it in a new feature-
Is there a way to build a multi-site, scaled-out system in which each site/node produces and distributes messages to workers located on several other nodes/sites?
Each distributor and worker can be hosted at its own site (same LAN), and each site can go down at any point. All distributors and workers should be symmetric.
At first this looked like a classical "many producers to many competing consumers" problem, but I can't find a built-in way to achieve it with NServiceBus since, from what I've seen, each worker can send health signals to only one distributor (I might be wrong about that).
Another issue I ran into is having a centralized RavenDB instance holding the distributor subscriptions. Having the RavenDB in its own "availability group" requires additional resources. Is there a way to host the RavenDB instances on the same sites, with their data replicated while each site uses its local DB instance? This would also tie the HA of the distributors to their subscription DBs.
Reading this discussion - http://tech.groups.yahoo.com/group/nservicebus/message/18412 - it seems that NServiceBus requires a cluster to keep the published data highly available. But why can't the distributor wait for an acknowledgement that the message was successfully consumed and processed by the worker, and keep retrying to publish it to the same or a different worker? That way, even if a VM went down with data in its queues, the data would be sent to another node that is available at the time.
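The retry-until-acknowledged behaviour asked about above is not something the distributor provides out of the box, but the pattern itself is easy to sketch. The following is a minimal Python illustration of the idea (all names here are hypothetical; NServiceBus itself is .NET, so this only demonstrates the dispatch pattern, not any real API): a dispatcher round-robins a message over a list of workers until one of them acknowledges it, skipping workers that are down.

```python
import itertools

class Worker:
    """A toy worker that may be down; handle() returns an acknowledgement."""
    def __init__(self, name, alive=True):
        self.name, self.alive, self.processed = name, alive, []

    def handle(self, msg):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.processed.append(msg)
        return True  # the acknowledgement

def dispatch_until_acked(msg, workers, max_attempts=10):
    """Keep retrying the message against workers (round-robin) until one acks."""
    for _, worker in zip(range(max_attempts), itertools.cycle(workers)):
        try:
            if worker.handle(msg):
                return worker.name
        except ConnectionError:
            continue  # that site is down; try the next one
    raise RuntimeError("no worker acknowledged the message")

workers = [Worker("site-A", alive=False), Worker("site-B")]
print(dispatch_until_acked("order-42", workers))  # site-B
```

In a real deployment the hard part is what this sketch glosses over: the message must be durably stored while the retries happen, which is exactly why the discussion linked above ends up at clustering the distributor's storage.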
Edit: I tried to ask the same question at the official NServiceBus Yahoo group, but I keep getting a pythonError within the Yahoo group.
Thanks in advance,
Rami Prilutsky, dbMotion

Related

How to get user specific data in a queue from ActiveMQ

If, as an admin, I wanted to know for a particular queue A: how many calls were initiated by which person, how many were dequeued, and how many are still in the queue at any time.
I just want to develop one UI in my application to show those user-specific records from ActiveMQ.
There is no built-in functionality in the broker that does this sort of thing. You could develop your own broker plugin that tracks these things, but you'd need to build some sort of DB or other storage, as you would lose any in-memory stats when the broker is restarted. Use caution when trying to push all system-management requirements into the message broker: that is not its purpose, and doing so will likely cause other issues.
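Since the broker has no built-in per-user accounting, the tracking the answer describes has to live on the application side. A minimal sketch of that idea in Python (hypothetical names; in production the counters would go to a durable DB rather than an in-memory dict, for exactly the restart reason given above): the application records every enqueue and dequeue per user, so the "still in queue" count is just the difference.

```python
from collections import defaultdict

class UserQueueStats:
    """Application-side per-user counters for one queue. Hypothetical helper:
    ActiveMQ itself does not track this, so the app must, ideally durably."""
    def __init__(self):
        self.enqueued = defaultdict(int)
        self.dequeued = defaultdict(int)

    def on_enqueue(self, user):
        self.enqueued[user] += 1

    def on_dequeue(self, user):
        self.dequeued[user] += 1

    def in_queue(self, user):
        # messages this user put on the queue that nobody has consumed yet
        return self.enqueued[user] - self.dequeued[user]

stats = UserQueueStats()
stats.on_enqueue("alice"); stats.on_enqueue("alice"); stats.on_enqueue("bob")
stats.on_dequeue("alice")
print(stats.in_queue("alice"), stats.in_queue("bob"))  # 1 1
```

The UI described in the question would then query this store rather than the broker.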

In RabbitMQ, How to make Queues in different clusters to Be Highly Available without Clustering?

In RabbitMQ, if two clusters are hosted in geographically different locations, then we can't use clustering. How do we make them highly available, i.e. if one site's whole cluster goes down, the messages should be mirrored to the other site, and the other site should be able to serve those messages? Note: the sites are connected by WAN.
I can't lose any messages on either site. Publishing messages to the right site can be taken care of, but if messages are sitting in a (work) queue, or are being processed by a consumer, and the site suddenly goes down - broker and consumer included - how can those messages be handled by the other site? In a cluster, if one node dies, the other has all the messages mirrored and knows which were acknowledged, but how do you achieve this over a WAN, since clustering across a WAN is not practical?
I think the question illustrates a conceptual problem with the design. To summarize,
There are two sites, connected via WAN
One site is the primary, while the other is the active standby
There is a desire for complete replication of system state (total consistency) between site A and B, to include the status of messages in the queue and messages being processed.
Essentially, you want 100% consistency, availability, and partition tolerance. Such a design is not possible according to CAP Theorem. What RabbitMQ provides is either consistency and availability, with low partition tolerance via clustering, or availability and partition tolerance via federation or shovel. RabbitMQ does not deal very well with the case of needing consistency and partition tolerance, since message brokers really handle highly transient traffic.
Instead, what is needed is to fully scope the problem to something that can be solved using the available technologies. It sounds to me like the correct approach (since it's over a WAN) is to sacrifice availability for consistency and partition tolerance, and have your application handle the failover case. You may be able to configure RabbitMQ sufficiently in this regard - see https://www.rabbitmq.com/partitions.html.

NService Bus: Nitty-Gritty Deployment Issues

Please consider the following questions in the context of multiple publications from a scaled out publisher (using DB subscription storage) and multiple subscriptions with scaled out subscribers (using distributors) where installs and uninstalls happen regularly for initial deployments, upgrades, etc. using automated MSI's.
Using DB subscription storage, what happens if the DB goes down? If access to the Subscription DB is required in order to Publish a message, how will it be delivered? Will it get lost? Will the call to Bus.Publish throw an exception?
Assuming you need to have no down-time deployments: What if you want to move your subscription DB for a particular publication to a different server? How do you manage a transition like this?
Same question goes for a distributor on the subscriber side: What if you want to move your distributor endpoint? One scenario I can think of is if you have multiple subscriptions utilizing a single distributor machine, it might be hard if you want to move some of them to another distributor server to reduce load.
What would the install/uninstall scenarios look like for a setup like this (both initially, and for continuous upgrades)? It seems like you would want to have some special install/uninstall scripts for deployment of the "logical publication" and subscription DB, as well as for the "logical subscriptions" and the distributors. The publisher instances wouldn't need any special install/uninstall logic (since they just start publishing messages using the configured subscription DB, and then stop when they are uninstalled). The subscriber worker nodes wouldn't need anything special on install other than the correct configuration of the distributor endpoint, but would need uninstall logic to make sure they are removed from the distributors list of worker nodes.
Eventually the publisher will fail and messages will build up in its internal queue. You will have to plan the disk size needed to handle this, based on the message size and how long you want to wait for the DB to come back up. Beyond that, it comes down to how much downtime you can handle. You can use DB mirroring or clustering to reduce the DB's downtime.
Mirroring and clustering technologies can also help with this. It depends on whether you want manual or automatic failover, and where you're doing it (remote sites?).
Clustering MSMQ could help you here. If you want to drop a distributor and move it within a cluster you'd be ok. Another possibility is to expose your distributors via HTTP and load balance them behind either a software or hardware load balancing solution. Behind the load balancer you'd be more free to move things around.
Sounds like you have a good grasp on this one already :)
To your first question, about the high availability of the subscription DB, you can use a cluster for failover. If the DB is down, then yes, Bus.Publish will throw an exception. It is recommended to keep the subscription DB separate from your application DB, to avoid having to bring it down when upgrading your app. This doesn't have to be a separate DB server; a separate DB on the same DB server is fine.
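Since Bus.Publish throws while the subscription DB is down, the caller has to decide what to do with that exception. One common application-side choice is a retry with backoff; here is a hedged Python sketch of that wrapper (the helper and the `publish` callable are hypothetical - NServiceBus is .NET - this only shows the pattern of riding out a short DB outage):

```python
import time

def publish_with_retry(publish, msg, attempts=5, delay=0.01):
    """Retry a publish call that may throw while the subscription DB is down.
    Hypothetical helper, not an NServiceBus API."""
    for i in range(attempts):
        try:
            return publish(msg)
        except ConnectionError:
            if i == attempts - 1:
                raise                      # outage outlasted our patience
            time.sleep(delay * (2 ** i))   # exponential backoff

calls = {"n": 0}
def flaky_publish(msg):
    """Simulates a subscription DB that is down for the first two attempts."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("subscription DB unavailable")
    return "published"

print(publish_with_retry(flaky_publish, "evt"))  # published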
About moving servers, this is usually managed at a DNS level where for a certain period of time you'll have both running, until communication moves over.
On your third question about distributors - don't share a distributor between different publishers or subscribers.
As a rule of thumb, it is recommended not to add/remove subscribers while doing these kinds of maintenance activities. This usually simplifies things quite a bit.

NServiceBus: Pros and Cons of using NServiceBus Distributor

I am considering using a Network Load Balancer to load balance messages between my subscriber instances, instead of using the NServiceBus distributor (which is basically just a software load-balancer from what I can tell). Each subscriber instance will have a queue of the same name for messages to be delivered to, and there will be a virtual IP that round-robins between the subscribers. The publisher will only know about the virtual IP and queue name.
Here is what I understand as the pros and cons of doing this:
PROS
No need to install NServiceBus Distributor
One less thing that would need to be managed/updated when we are scaling out (we already use an F5 to load balance these machines, and our data center guys know it like the back of their hand)
One less point of failure (yes, the NLB could fail, but let's face it, an F5 is going to be a lot more stable than NServiceBus Distributor running on Windows)
No need to have a clustered server to have our clustered MSMQ. 2 servers is a lot more expensive than just adding another VIP to an F5.
CONS
The NServiceBus Distributor allows you to see the backlog of messages more easily since there is a single queue on the Distributor you can monitor. This makes it easy to know when you should add more worker nodes.
The NServiceBus Distributor is smarter about controlling the number of worker threads, etc., and gives you more control than an NLB? (not sure about this one)
Have I captured this accurately? I know it is recommended to use the NServiceBus Distributor, and I would like to know more of why before I go against that recommendation.
You've got some of the main points down, but one of the main differences is that since the distributor holds on to the load itself, if a machine were to go down, the rest of the load would be distributed between the remaining machines, with a much lower SLA impact on the messages.
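That difference can be made concrete with a small simulation (plain Python, invented names, only illustrating the routing behaviour): with an NLB, each message is committed to one node's private queue up front, so a dead node strands whatever was already routed to it; with a distributor, messages wait in one central queue and only go to nodes that are alive.

```python
import itertools

messages = [f"m{i}" for i in range(6)]
nodes = ["n1", "n2", "n3"]

# NLB style: each message lands in a specific node's private queue immediately.
nlb_queues = {n: [] for n in nodes}
for msg, node in zip(messages, itertools.cycle(nodes)):
    nlb_queues[node].append(msg)

# Distributor style: messages wait in a single central queue.
central_queue = list(messages)

# Now n2 dies before processing anything.
dead = "n2"
stranded_nlb = len(nlb_queues[dead])   # stuck on n2's queue until it returns
stranded_dist = 0                      # survivors simply drain the central queue
print(stranded_nlb, stranded_dist)     # 2 0
```

The stranded messages in the NLB case are not lost (MSMQ keeps them durably), but they wait out the node's downtime, which is the SLA impact described above.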

Failover scenarios for the service bus with NServiceBus or MassTransit

I need to build Identity server like Microsoft's http://login.live.com.
To handle failover I will have multiple web server nodes. The plan is that all database write operations are done by sending messages to the database server. The database will be mirrored or replicated. The idea is that the database subscribes to the write operations, but that the other nodes subscribe as well; that way the other nodes do not need to read from the database and can update their caches.
I am just starting to learn the service bus architecture and what is not clear to me is how to handle failover scenario for the service bus.
Question:
If database server is not available, what will happen with the published messages ?
Will they be stored somewhere and where ?
Do I need additional machine or a cluster to handle failover of the service bus?
I read that SQL Server can be used as a message store, but can I use durable MSMQ? I am queuing messages to be able to write them to the database, so why would I store them in the DB first just to take them out and write them again? Or am I getting this wrong, and the DB is only used for the list of subscriptions and not for the messages?
When implementing this kind of architecture, you should look at applying the principles of CQRS - queries (is this user/pwd combo valid?) should not be done via the bus; commands (change pwd, forgot pwd) are sent via the bus, not published as events. While internally you will likely use events to keep the command and query sides in sync, this doesn't involve the client.
Queries can be done using simple ado.net against the replicated-read-slaves of your DB - what's known as the persistent view model in CQRS. If you like, you can put some simple WCF in front of that too.
When using MSMQ, all messages are delivered via store-and-forward. That means that they're first stored on the client before being delivered to the server, so if the server is down, the messages sit on the client waiting. For fault-tolerance, you will want your messages to be recoverable (written to disk) - this is the default in NServiceBus but not the default of standard MSMQ (don't know about MassTransit). You don't need the database for this.
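The store-and-forward behaviour described above is the key to the original failover question, and the pattern is worth sketching. This is a hedged Python illustration (the `DurableOutbox` helper is invented; MSMQ implements this natively, and "recoverable" in MSMQ terms means written to disk): a message is persisted locally before any delivery attempt, so it survives a crash and simply waits out the server's downtime.

```python
import json, os, tempfile

class DurableOutbox:
    """Store-and-forward sketch (hypothetical helper): every message is
    written to local disk first, so it survives a crash and waits out
    server downtime - the behaviour MSMQ provides for recoverable messages."""
    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def store(self, msg_id, body):
        with open(os.path.join(self.path, f"{msg_id}.json"), "w") as f:
            json.dump(body, f)

    def pending(self):
        return sorted(os.listdir(self.path))

    def forward(self, send):
        for name in self.pending():
            full = os.path.join(self.path, name)
            with open(full) as f:
                send(json.load(f))
            os.remove(full)  # delete only after successful delivery

outbox = DurableOutbox(tempfile.mkdtemp())
outbox.store("001", {"cmd": "change-pwd"})
# server down: the message just sits on disk
assert outbox.pending() == ["001.json"]
delivered = []
outbox.forward(delivered.append)      # server back up: drain the outbox
print(delivered, outbox.pending())    # [{'cmd': 'change-pwd'}] []
```

This is also why, as the answer says, no database is needed for message durability: the disk on the sending machine is the safety net.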
In NServiceBus, the bus is not installed on a separate machine, so you don't need to deal with its availability independently of the rest of the system. It's only when you look at scaling out your command processing to more nodes that you might consider using the message-based load balancer in NServiceBus (called the distributor) which, for high availability, should be installed on a cluster or fault-tolerant hardware.
This will depend on how it is setup, but in MassTransit you can leave the subscription active so the message will still be delivered to the queue for the DB. When the DB is active again, you can read the messages in the queue.
Each service connected to a service bus, in MassTransit, has an active queue for itself. The messages will be stored there.
I think this is an "it depends"... MassTransit has support for other MQs than MSMQ, but is really built around MSMQ. We have not experienced great support for things such as failover with MSMQ. However, everything will continue to run without fault if the subscription service (i.e. the bus) fails - the services already know who to talk to. It's only when there is a change in a consumer (a subscribe or unsubscribe) that this becomes a problem. For me, that's an event that happens almost never.
With MassTransit, we use the DB to store the subscription states but all the messages are stored in MSMQ.
If you'd like more details in one of these responses or have additional questions about MT, you can join us on the mailing list: http://groups.google.com/group/masstransit-discuss.