I have a logical publication which is basically a bunch of MT servers, who all access a DB subscription storage. These MTs are typically upgraded by taking 1/2 out of rotation, installing the new MT version, bringing them back online, and then repeating for the other half.
I am confused about how a subscriber would subscribe to such a publication. In all of the examples I have seen, a subscriber needs to have a publisher's InputQueue specified in configuration in order for the subscription request to be received. But what InputQueue would I specify in this situation? I don't want subscription to fail if some of my publisher MTs happen to be down. Would I just subscribe manually by adding a record to the DB subscription storage?
Publishers usually publish as a result of processing some command from a client, and as such, you usually use a distributor to scale them out, as well as using the DB subscription storage. Subscribers are another kind of client so you would configure them to point to the distributor as well.
Related
I have a cluster of backend servers on GCP, and they need to send messages to each other. All the servers need to receive every message, but I can tolerate a low error rate. I can deal with receiving the message more than once on a given server. Packet ordering doesn't matter.
I don't need much of a persistence layer. A message becomes stale within a couple of seconds after sending it.
I wired up Google Cloud PubSub and pretty quickly realized that for a given subscription, you can have any number of subscribers but only one of them is guaranteed to get the message. I considered making the subscribers all fail to ack it, but that seems like a gross hack that probably won't work well.
My server cluster is sized dynamically by an autoscaler. It spins up VM instances as needed, with dynamic hostnames and IP addresses. There is no convenient way to map the dynamic hosts to static subscriptions, but it feels like that's my only real option: Create more subscriptions than my max server pool size, and then use some sort of paxos system (runtime config, zookeeper, whatever) to allocate servers to subscriptions.
I'm starting to feel that even though my use case feels really simple ("Every server can multicast a message to every other server in my group"), it may not be a good fit for Cloud PubSub.
Should I be using GCM/FCM? Or some other technology?
Cloud Pub/Sub may or may not be a fit for you, depending on the size of your server cluster. Failing to ack the messages certainly won't work because you can't be sure each instance will get the message; it could just be redelivered to the same instance over and over again.
You could use multiple subscriptions and have each instance create a new subscription when it starts up. This only works if you don't plan to scale beyond 10,000 instances in your cluster, as that is the maximum number of subscriptions per topic allowed. The difficulty here is in cleaning up subscriptions for instances that go down. Ones that cleanly shut down could probably delete their own subscriptions, but there will always be some that don't get cleaned up. You'd need some kind of external process that can determine if the instance for each subscription is still up and running and if not, delete the subscription. You could use GCE shutdown scripts to catch this most of the time, though there will still be edge cases where deletes would have to be done manually.
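The external cleanup process described above could be sketched roughly like this. It is only an illustration under assumptions: the "fanout-" naming prefix and the function name are hypothetical, the lists of subscriptions and live instances are passed in as plain data, and the actual listing and deletion calls against Pub/Sub and GCE are left out.

```python
def stale_subscriptions(subscription_names, live_instances, prefix="fanout-"):
    """Return the subscriptions whose owning instance is no longer running.

    Assumes (hypothetically) that every instance created a subscription
    named prefix + instance name when it started up.
    """
    live = set(live_instances)
    stale = []
    for name in subscription_names:
        if not name.startswith(prefix):
            continue  # not created by this scheme, so leave it alone
        instance = name[len(prefix):]
        if instance not in live:
            stale.append(name)
    return sorted(stale)
```

A periodic job would feed this the current subscription list and the autoscaler's instance list, then delete whatever comes back.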
I am concerned with my NServiceBus solution.
I have a "MessageHub" that publishes some very important messages. But sometimes it loses track of its subscriptions and just discards the message because it thinks no one is listening.
I have tried turning on "NServiceBus.Integration" to store the subscriptions. But despite that, I still have issues with bad start up order where it thinks nothing is listening.
Is there a way to debug this process? Try to figure out why it is getting confused?
I don't even know a way to look at what subscriptions it "thinks" it has...
I went with NServiceBus because it is not supposed to lose data ever. Now I am losing large chunks. I know it is a config issue, but it is causing much grief.
What is probably happening in your case is that you are using MSMQ for subscription storage. Even though it's possible for subscriptions to endure for a while, using MSMQ to store things long term is always going to be volatile.
For durable subscription storage (which survives "forever") you should be using SQL Server as your subscription storage.
Note: You can always view your current subscriptions whether you are using SQL or MSMQ to store them. In SQL just look in the subscriptions table, and for MSMQ look in the publisher's subscription queue.
UPDATE
Since version 3 I have been using RavenDb which is the default.
In my experience, to get the subscriptions assigned correctly, one should first start the EventHandler projects and then, when they are all idle, start the CommandHandlers (Publishers).
You can see what messages are being subscribed to using Service Bus MQ Manager; it has a dialog listing all "messages" and their subscribers/publishers. It's a side project of mine, free and open source.
http://blog.halan.se/page/Service-Bus-MQ-Manager.aspx
Related to this question:
As I understand it, two physical publishers representing a single logical publisher must each have their own subscription queue. Using DBSubscriptionStorage allows them to have a common list of subscribers, but what happens when a new subscriber pops up and subscribes? The subscribe message will go onto one of the subscription queues and then into the database. Is there any way, short of restarting the other publishers, that they can be made aware of the new subscriber?
2 Physical Publishers with Many Subscribers
Configuration
Every endpoint (publisher or subscriber) has its own input queue.
Each Publisher will be configured to point to the same shared subscription database.
Each Subscriber will be configured to point to one Publisher input queue (this is where they will drop their subscription messages).
Processing
Subscriber1 places a subscription message(M1) into Publisher1's input queue
Publisher1 saves that subscription into the database
Subscriber2 places a subscription message(M2) into Publisher2's input queue
Publisher2 saves that subscription to the database
Publisher1 publishes M1 which is placed into Subscriber1's input queue
Same occurs for Publisher2 and M1
You will have to decide if all Subscribers are interested in the same messages. It is OK to have both Publishers publish the same messages, as each Subscriber will only be subscribed once (either at P1 or P2). You have full control over how to "load balance" the work. More information can be found here if you haven't looked at it already
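The configuration and processing steps above can be modelled with a small in-memory sketch. This is not NServiceBus code: the class names, the "OrderPlaced" message type and the dictionary standing in for the subscription database are all made up for illustration, and the sketch assumes the shared storage is consulted on every publish. Whether a real publisher caches its subscriber list is not something it settles.

```python
class SharedSubscriptionStorage:
    """Stands in for the shared subscription database (hypothetical)."""

    def __init__(self):
        self._subscribers = {}  # message type -> set of subscriber queue names

    def add(self, message_type, subscriber_queue):
        self._subscribers.setdefault(message_type, set()).add(subscriber_queue)

    def subscribers_for(self, message_type):
        return sorted(self._subscribers.get(message_type, ()))


class Publisher:
    """One physical publisher; several share the same storage."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage

    def handle_subscription(self, message_type, subscriber_queue):
        # A subscription message arrived on this publisher's input queue.
        self.storage.add(message_type, subscriber_queue)

    def publish(self, message_type, payload, queues):
        # Read the subscriber list at publish time (assumption of this sketch).
        for queue_name in self.storage.subscribers_for(message_type):
            queues.setdefault(queue_name, []).append((self.name, payload))


storage = SharedSubscriptionStorage()
p1 = Publisher("Publisher1", storage)
p2 = Publisher("Publisher2", storage)

p1.handle_subscription("OrderPlaced", "Subscriber1")  # subscribed via Publisher1
p2.handle_subscription("OrderPlaced", "Subscriber2")  # subscribed via Publisher2

queues = {}  # subscriber input queues, keyed by name
p1.publish("OrderPlaced", "order-42", queues)
```

In this model Subscriber2 receives the event from Publisher1 even though it subscribed through Publisher2, because both publishers read the same storage.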
You have a couple of options. One of the easiest is to have each of the separate physical publishers pointing to a single physical subscriber/subscription database.
Another really good way to handle it is with database replication. The only problem with replication is that it's inherently "one way". Even so there's a really interesting project for MySQL called "MySQL MMM" that seems to be a perfect fit for this scenario.
Finally, you could potentially have your own subscription storage using something like Membase which is a persistent, replicated key/value store.
Bottom line: you can have a single subscription database which is the easiest, but you have a failure point. Or you can have a replicated subscription storage. The replicated storage will ensure that all nodes have a list of all subscribers.
Please consider the following questions in the context of multiple publications from a scaled out publisher (using DB subscription storage) and multiple subscriptions with scaled out subscribers (using distributors) where installs and uninstalls happen regularly for initial deployments, upgrades, etc. using automated MSI's.
Using DB subscription storage, what happens if the DB goes down? If access to the Subscription DB is required in order to Publish a message, how will it be delivered? Will it get lost? Will the call to Bus.Publish throw an exception?
Assuming you need to have no down-time deployments: What if you want to move your subscription DB for a particular publication to a different server? How do you manage a transition like this?
Same question goes for a distributor on the subscriber side: What if you want to move your distributor endpoint? One scenario I can think of is if you have multiple subscriptions utilizing a single distributor machine, it might be hard if you want to move some of them to another distributor server to reduce load.
What would the install/uninstall scenarios look like for a setup like this (both initially, and for continuous upgrades)? It seems like you would want to have some special install/uninstall scripts for deployment of the "logical publication" and subscription DB, as well as for the "logical subscriptions" and the distributors. The publisher instances wouldn't need any special install/uninstall logic (since they just start publishing messages using the configured subscription DB, and then stop when they are uninstalled). The subscriber worker nodes wouldn't need anything special on install other than the correct configuration of the distributor endpoint, but would need uninstall logic to make sure they are removed from the distributors list of worker nodes.
Eventually the publisher will fail and the messages will build up in the internal queue. You will have to plan the size of disk you need to handle this based on the message size and how long you want to wait for a DB to come up. From there it depends on how much downtime you can handle. You can use DB mirroring or clustering to give the DB less downtime.
Mirroring and clustering technologies can also help with this. It depends on whether you want manual or automatic failover and where you're doing it (remote sites?).
Clustering MSMQ could help you here. If you want to drop a distributor and move it within a cluster you'd be ok. Another possibility is to expose your distributors via HTTP and load balance them behind either a software or hardware load balancing solution. Behind the load balancer you'd be more free to move things around.
Sounds like you have a good grasp on this one already :)
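For the disk sizing mentioned in the first point, a back-of-the-envelope estimate might look like the sketch below. The function name, the headroom factor and the example figures are assumptions for illustration, not measured values, and real queue storage adds per-message overhead that this ignores.

```python
def queue_disk_bytes(avg_message_bytes, messages_per_second, outage_seconds,
                     headroom=1.5):
    """Rough disk space needed to buffer messages while the DB is down.

    headroom is an assumed safety factor, not a recommended value.
    """
    return int(avg_message_bytes * messages_per_second * outage_seconds * headroom)


# Hypothetical load: 4 KiB messages, 50 per second, one hour of DB downtime.
estimate = queue_disk_bytes(4096, 50, 3600)
```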
To your first question, about the high availability of the subscription DB, you can use a cluster for failover. If the DB is down, then the Bus.Publish will throw an exception, yes. It is recommended to keep the subscription DB separate from your applicative DB to avoid having to bring it down when upgrading your app. This doesn't have to be a separate DB server, a separate DB on the same DB server will be fine.
About moving servers, this is usually managed at a DNS level where for a certain period of time you'll have both running, until communication moves over.
On your third question about distributors - don't share a distributor between different publishers or subscribers.
As a rule of thumb, it is recommended not to add or remove subscribers when doing these kinds of maintenance activities. This usually simplifies things quite a bit.
I need to build Identity server like Microsoft's http://login.live.com.
To handle failover I will have multiple web servers nodes. The plan is that all database write operations are done by sending messages to the database server. Database will be mirrored or replicated. The idea is that database subscribes to the write operations but that other nodes subscribe also. That way other nodes do not need to read from database and can update their caches.
I am just starting to learn the service bus architecture and what is not clear to me is how to handle failover scenario for the service bus.
Question:
If database server is not available, what will happen with the published messages ?
Will they be stored somewhere and where ?
Do I need additional machine or a cluster to handle failover of the service bus?
I read that SQL Server can be used as a message store but can I use durable MSMQ? I am queuing messages to be able to write them to the database so why would I store them to the DB first just to take them and write them again? OR, I am getting this wrong and DB is only used for the list of subscriptions and not for the Messages?
When implementing this kind of architecture, you should look at applying the principles of CQRS - queries (is this user/pwd combo valid) should not be done via the bus; commands (change pwd, forgot pwd) are sent via the bus, not published as events. While internally you will likely use events to keep the command and query sides in sync, this doesn't involve the client.
Queries can be done using simple ado.net against the replicated-read-slaves of your DB - what's known as the persistent view model in CQRS. If you like, you can put some simple WCF in front of that too.
When using MSMQ, all messages are delivered via store-and-forward. That means that they're first stored on the client before being delivered to the server, so if the server is down, the messages sit on the client waiting. For fault-tolerance, you will want your messages to be recoverable (written to disk) - this is the default in NServiceBus but not the default of standard MSMQ (don't know about MassTransit). You don't need the database for this.
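To make the store-and-forward idea concrete, here is a toy model of it. It is not MSMQ or NServiceBus code: the classes are hypothetical, and the in-memory deque only stands in for the local outgoing queue, so unlike recoverable messages it would not survive a process restart.

```python
from collections import deque


class StoreAndForwardSender:
    """Holds messages locally until the destination accepts them."""

    def __init__(self, deliver):
        self._outgoing = deque()  # stands in for the local outgoing queue
        self._deliver = deliver   # callable; raises ConnectionError if server is down

    def send(self, message):
        self._outgoing.append(message)  # store first
        self.flush()                    # then try to forward

    def flush(self):
        while self._outgoing:
            try:
                self._deliver(self._outgoing[0])
            except ConnectionError:
                return False  # server is down; keep everything for later
            self._outgoing.popleft()  # remove only after successful delivery
        return True

    def pending(self):
        return list(self._outgoing)


class FakeServer:
    """Test double for the receiving endpoint."""

    def __init__(self):
        self.up = False
        self.received = []

    def receive(self, message):
        if not self.up:
            raise ConnectionError("server unavailable")
        self.received.append(message)


server = FakeServer()
sender = StoreAndForwardSender(server.receive)
sender.send("ChangePassword:alice")  # server is down, message waits locally
sender.send("ChangePassword:bob")
server.up = True
sender.flush()                       # server is back, backlog is delivered in order
```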
In NServiceBus, the bus is not installed on a separate machine so you don't need to deal with its availability independently of the rest of the system. It's only when you look at scaling out your command processing to more nodes that you might consider using the message-based load balancer in NServiceBus (called the distributor) which, for high availability, should be installed on a cluster or fault-tolerant hardware.
This will depend on how it is setup, but in MassTransit you can leave the subscription active so the message will still be delivered to the queue for the DB. When the DB is active again, you can read the messages in the queue.
Each service connected to a service bus, in MassTransit, has an active queue for itself. The messages will be stored there.
I think this is an "it depends"... MassTransit has support for MQs other than MSMQ but is really built around MSMQ. We have not experienced great support for things such as failover from MSMQ. However, everything will continue to run without fault if the subscription service (i.e. the bus) fails - the services already know who to talk to. It's only when a consumer changes (subscribes or unsubscribes) that this becomes a problem. For me, that's an event that happens almost never.
With MassTransit, we use the DB to store the subscription states but all the messages are stored in MSMQ.
If you'd like more details in one of these responses or have additional questions about MT, you can join us on the mailing list: http://groups.google.com/group/masstransit-discuss.