Please consider the following questions in the context of multiple publications from a scaled out publisher (using DB subscription storage) and multiple subscriptions with scaled out subscribers (using distributors) where installs and uninstalls happen regularly for initial deployments, upgrades, etc. using automated MSI's.
Using DB subscription storage, what happens if the DB goes down? If access to the Subscription DB is required in order to Publish a message, how will it be delivered? Will it get lost? Will the call to Bus.Publish throw an exception?
Assuming you need to have no down-time deployments: What if you want to move your subscription DB for a particular publication to a different server? How do you manage a transition like this?
Same question goes for a distributor on the subscriber side: What if you want to move your distributor endpoint? One scenario I can think of is if you have multiple subscriptions utilizing a single distributor machine, it might be hard if you want to move some of them to another distributor server to reduce load.
What would the install/uninstall scenarios look like for a setup like this (both initially, and for continuous upgrades)? It seems like you would want to have some special install/uninstall scripts for deployment of the "logical publication" and subscription DB, as well as for the "logical subscriptions" and the distributors. The publisher instances wouldn't need any special install/uninstall logic (since they just start publishing messages using the configured subscription DB, and then stop when they are uninstalled). The subscriber worker nodes wouldn't need anything special on install other than the correct configuration of the distributor endpoint, but would need uninstall logic to make sure they are removed from the distributors list of worker nodes.
Eventually the publisher will fail and the messages will build up in the internal queue. You will have to plan the size of disk you need to handle this based on the message size and how long you want to wait for a DB to come up. From there it is based how much downtime you can handle. You can use DB mirroring or clustering to make the DB have less downtime.
Mirroring and clustering technologies can also help with this. Depends on if you want to do manual or automatic failover and where your doing it(remote sites?).
Clustering MSMQ could help you here. If you want to drop a distributor and move it within a cluster you'd be ok. Another possibility is to expose your distributors via HTTP and load balance them behind either a software or hardware load balancing solution. Behind the load balancer you'd be more free to move things around.
Sounds like you have a good grasp on this one already :)
To your first question, about the high availability of the subscription DB, you can use a cluster for failover. If the DB is down, then the Bus.Publish will throw an exception, yes. It is recommended to keep the subscription DB separate from your applicative DB to avoid having to bring it down when upgrading your app. This doesn't have to be a separate DB server, a separate DB on the same DB server will be fine.
About moving servers, this is usually managed at a DNS level where for a certain period of time you'll have both running, until communication moves over.
On your third question about distributors - don't share a distributor between different publishers or subscribers.
As a rule of thumb, it is recommended to not add/remove subscribers when doing these kinds of maintainenance activities. This usually simplifies things quite a bit.
Related
If as admin I wanted to know from a particular queue A, how many calls initiated by which person and how many get dequeued, and how many are still in queue # any time.
I just want to develop one UI in my application to show those user-specific records from ActiveMQ.
There is no built in functionality in the broker that does this sort of thing. You could develop your own broker plugin that tracks these things but you'd need to build some sort of DB or other storage as you would lose any in-memory stats when a broker is restarted. You should use caution when trying to push all requirements into the message broker for system level management as that is not its purpose and will likely result in other issues when you do.
I have a cluster of backend servers on GCP, and they need to send messages to each other. All the servers need to receive every message, but I can tolerate a low error rate. I can deal with receiving the message more than once on a given server. Packet ordering doesn't matter.
I don't need much of a persistence layer. A message becomes stale within a couple of seconds after sending it.
I wired up Google Cloud PubSub and pretty quickly realized that for a given subscription, you can have any number of subscribers but only one of them is guaranteed to get the message. I considered making the subscribers all fail to ack it, but that seems like a gross hack that probably won't work well.
My server cluster is sized dynamically by an autoscaler. It spins up VM instances as needed, with dynamic hostnames and IP addresses. There is no convenient way to map the dynamic hosts to static subscriptions, but it feels like that's my only real option: Create more subscriptions than my max server pool size, and then use some sort of paxos system (runtime config, zookeeper, whatever) to allocate servers to subscriptions.
I'm starting to feel that even though my use case feels really simple ("Every server can multicast a message to every other server in my group"), it may not be a good fit for Cloud PubSub.
Should I be using GCM/FCM? Or some other technology?
Cloud Pub/Sub may or may not be a fit for you, depending on the size of your server cluster. Failing to ack the messages certainly won't work because you can't be sure each instance will get the message; it could just be redelivered to the same instance over and over again.
You could use multiple subscriptions and have each instance create a new subscription when it starts up. This only works if you don't plan to scale beyond 10,000 instances in your cluster, as that is the maximum number of subscriptions per topic allowed. The difficulty here is in cleaning up subscriptions for instances that go down. Ones that cleanly shut down could probably delete their own subscriptions, but there will always be some that don't get cleaned up. You'd need some kind of external process that can determine if the instance for each subscription is still up and running and if not, delete the subscription. You could use GCE shutdown scripts to catch this most of the time, though there will still be edge cases where deletes would have to be done manually.
I've been trying to find out ways to improve our nservicebus code performance. I searched and stumbled on these profiles that you can set upon running/installing the nservicebus host.
Currently we're running the nservicebus host as-is, and I read that by default we are using the "Lite" version of the available profiles. I've also learnt from this link:
http://docs.particular.net/nservicebus/hosting/nservicebus-host/profiles
that there are Integrated and Production profiles. The documentation does not say much - has anyone tried the Production profiles and noticed an improvement in nservicebus performance? Specifically affecting the speed in consuming messages from the queues?
One major difference between the NSB profiles is how they handle storage of subscriptions.
The lite, integration and production profiles allow NSB to configure how reliable it is. For example, the lite profile uses in-memory subscription storage for all pub/sub registrations. This is a concern because in order to register a subscriber in the lite profile, the publisher has to already be running (so the publisher can store the subscriber list in memory). What this means is that if the publisher crashes for any reason (or is taken offline), all the subscription information is lost (until each subscriber is restarted).
So, the lite profile is good if you are running on a developer machine and want to quickly test how your services interact. However, it is just not suitable to other environments.
The integration profile stores subscription information on a local queue. This can be good for simple environments (like QA etc.). However, in a highly distributed environment holding the subscription information in a database is best, hence the production profile.
So, to answer your question, I don't think that by changing profiles you will see a performance gain. If anything, changing from the lite profile to one of the other profiles is likely to decrease performance (because you incur the cost of accessing queue or database storage).
Unless you tuned the logging yourself, we've seen large improvements based on reduced logging. The performance from reading off the queues is same all around. Since the queues are local, you won't gain much from the transport. I would take a look at tuning your handlers and the underlying infrastructure. You may want to check out tuning MSMQ and look at the disk you are using etc. Another spot would be to look at how distributed transactions are working assuming you are using a remote database that requires them.
Another option to increase processing time is to increase the number of threads consuming the queue. This will require a license. If a license is not an option you can have multiple instances of a single threaded endpoint running. This requires you shard your work based on message type or something else.
Continuing up the scale you can then get into using the Distributor to load balance work. Again this will require a license, but you'll be able to add more nodes as necessary. All of the opportunities above also apply to this topology.
I am considering using a Network Load Balancer to load balance messages between my subscriber instances, instead of using the NServiceBus distributor (which is basically just a software load-balancer from what I can tell). Each subscriber instance will have a queue of the same name for messages to be delivered to, and there will be a virtual IP that round-robins between the subscribers. The publisher will only know about the virtual IP and queue name.
Here is what I understand as the pros and cons of doing this:
PROS
No need to install NServiceBus Distributor
One less thing that would need to be managed/updated when we are scaling-out (we already use an F5 to load balance these machines, and our data center buys know it like the back of their hand)
One less point of failure (yes, the NLB could fail, but let's face it, an F5 is going to be a lot more stable than NServiceBus Distributor running on Windows)
No need to have a clustered server to have our clustered MSMQ. 2 servers is a lot more expensive than just adding another VIP to an F5.
CONS
The NServiceBus Distributor allows you to see the backlog of messages more easily since there is a single queue on the Distributor you can monitor. This makes it easy to know when you should add more worker nodes.
The NServiceBus Distributor is smarter about controlling of number of worker threads, etc. Gives you more control than an NLB? (not sure about this one)
Have I captured this accurately? I know it is recommended to use the NServiceBus Distributor, and I would like to know more of why before I go against that recommendation.
Youve' got some of the main points down, but one of the main differences is that since the distributor holds on to load itself, if a machine were to go down, the rest of the load would be distributed between the remaining machines with a much lower SLA impact on the messages.
I need to build Identity server like Microsoft's http://login.live.com.
To handle failover I will have multiple web servers nodes. The plan is that all database write operations are done by sending messages to the database server. Database will be mirrored or replicated. The idea is that database subscribes to the write operations but that other nodes subscribe also. That way other nodes do not need to read from database and can update their caches.
I am just starting to learn the service bus architecture and what is not clear to me is how to handle failover scenario for the service bus.
Question:
If database server is not available, what will happen with the published messages ?
Will they be stored somewhere and where ?
Do I need additional machine or a cluster to handle failover of the service bus?
I read that SQL Server can be used as a message store but can I use durable MSMQ? I am queuing messages to be able to write them to the database so why would I store them to the DB first just to take them and write them again? OR, I am getting this wrong and DB is only used for the list of subscriptions and not for the Messages?
Whe implementing this kind of architecture, you should look at applying the principles of CQRS - queries (is this user/pwd combo valid) should not be done via the bus; commands (change pwd, forgot pwd) are sent via the bus, not published as events. While internally you will likely use events to keep the command and query sides in sync, this doesn't involve the client.
Queries can be done using simple ado.net against the replicated-read-slaves of your DB - what's known as the persistent view model in CQRS. If you like, you can put some simple WCF in front of that too.
When using MSMQ, all messages are delivered via store-and-forward. That means that they're first stored on the client before being delivered to the server, so if the server is down, the messages sit on the client waiting. For fault-tolerance, you will want your messages to be recoverable (written to disk) - this is the default in NServiceBus but not the default of standard MSMQ (don't know about MassTransit). You don't need the database for this.
In NServiceBus, the bus is not installed on a separate machine so you don't need to deal with its availability independently of the rest of the system. It's only when you look at scaling our your command processing to more nodes that you might consider using the message-based load balancer in NServiceBus (called the distributor) which, for high availability, should be installed on a cluster or fault-tolerant hardware.
This will depend on how it is setup, but in MassTransit you can leave the subscription active so the message will still be delivered to the queue for the DB. When the DB is active again, you can read the messages in the queue.
Each service connected to a service bus, in MassTransit, has an active queue for itself. The messages will be stored there.
I think this is a "it depends"... MassTransit has support for other MQs than MSMQ but is really built around MSMQ. We have no experienced great support for things such as failover from MSMQ. However, everything will continue to run without fault if the subscription service (i.e. the bus) fails - the services already know who to talk to. It's only when a change in a consumer (subscribe or unsubscribe) where this becomes a problem. For me, that's an event that happens almost never.
With MassTransit, we use the DB to store the subscription states but all the messages are stored in MSMQ.
If you'd like more details in one of these responses or have additional questions about MT, you can join us on the mailing list: http://groups.google.com/group/masstransit-discuss.