Service Bus for Windows Server High-Availability and Disaster Recovery - servicebus

We are currently implementing Service Bus for Windows Server in our solution, in order to replace MSMQ which offers limited disaster recovery options.
According to this, you can create a new farm using the existing certificate or with a new certificate, or recover the storage layer by restoring SQL from backups, putting new SBWS nodes in a farm, and connecting them to this storage layer.
I'm not keen on (in the event of a disaster) building a new farm.
We have two datacentres (DC1 and DC2). Could we:
Put two compute nodes in DC1 (Node1 and Node2), and one node in DC2 (Node3)? This will satisfy the SBWS requirement of at least 3 nodes for HA.
Create an A record for our Farm DNS, and set up a load balancer in DC1 for this DNS name that only points to Node1 and Node2 (so clients always connect to only those)?
In the event that DC2 is lost (and as a result Node3), we'd be all good, since Node1 and Node2 still provide HA, and the load balancer in DC1 still balances between these two.
What would happen, though, if we lost DC1? My initial idea is to change the A record for our Farm DNS to point to Node3. However, there will only be one node in the Farm (since Node1 and Node2 were lost with DC1). Will the service bus continue to operate, but just not have HA? Could I create multiple A records (one for the load balancer between Node1 and Node2, and one for Node3), so that clients alternate between the two?
PS: Our storage layer is handled by a SQL Availability Group, which has two (primary and secondary) nodes in DC1, and a third node (secondary as well) in DC2, so storage is covered

Related

Regarding cluster configuration in Ignite

Let us say I have two server nodes in one data center, DC1, and two more server
nodes in another data center, DC2. There is some network delay between the two data centers.
I'm running SQL select statements on replicated caches whose write
synchronization mode is FULL_SYNC.
At any given time we have working client nodes in only one DC, not both.
Let's say we have two clients in DC1.
So the total is 6 nodes (2 client nodes and 2 server nodes in DC1, and 2
server nodes in DC2).
Our use case is as follows:
The 2 clients should query only the 2 server nodes in DC1 and not the other 2
servers in DC2.
All the cache queries should be in FULL_SYNC with the 2 server nodes in DC1,
and DC1-DC2 replication should be done in ASYNC mode.
One doubt I have: if, in the client node's DiscoverySPI, I list (X,Y) as the
server node IPs, would the queries always reach X,Y even though the
entire topology contains X,Y,Z as server nodes?
Could someone please suggest a solution for this?
Note: I saw one GridGain's capability for cluster-cluster replication but that comes under paid version. I am looking for a solution in the community edition.
A doubt I got which is, if in client's node discoveryspi, if I (X,Y) ip list as server nodes ips, would the queries always reach X,Y
even though the entire topology contains X,Y,Z as server nodes?
No, DiscoverySPI is used only for connecting to the cluster; after that, the client node will work with all nodes in the cluster.
All the cache queries should be in FULL_SYNC with 2 server nodes in
DC1 and DC1-DC2 should be done in ASYNC mode.
It's not possible to do this, only one synchronization mode can be used for one cache in the cluster.
2 clients should query only 2 server nodes in DC1 and not the other 2 servers in DC2.
It's not possible to do this for cache operations, but you can do it for compute operations: you can send a job to a certain node that holds a primary or backup copy in DC1, and it will use the local partition. But compute adds some overhead compared to plain cache operations if it is used only for getting entries.
So, as you mentioned, the best way here is the DataCenter Replication, which is available as a part of GridGain, because, based on your requirements, you need 2 separate clusters here.

Redis sentinel vs clustering

I understand Redis Sentinel is a way of configuring HA (high availability) among multiple Redis instances. As I see it, there is one Redis instance actively serving the client requests at any given time. Two additional servers are on standby (waiting for a failure to happen, so that one of them can take over).
Is it waste of resources?
Is there a better way of using full use of the resources available?
Is Redis clustering an alternative to Redis sentinel?
I have already looked up the Redis documentation for Sentinel and clustering; could somebody with experience explain, please?
UPDATE
OK. In my real deployment scenario I have two servers dedicated to Redis. I have another server on which my JBoss server is running. The application running in JBoss is configured to connect to the Redis master server (M).
Failover scenario
Ideally, I think when Master cache server fails (either Redis process goes down or machine failure) the application in Jboss needs to connect to Slave cache server. How would I configure the redis servers to achieve this?
+--------+         +--------+
| Master |---------| Slave  |
|        |         |        |
+--------+         +--------+
Configuration: quorum = 1
First, let's talk Sentinel.
Sentinel manages the failover, it doesn't configure Redis for HA. It is an important distinction. Second, the diagram you posted is actually a bad setup - you don't want to run Sentinel on the same node as the Redis nodes it is managing. When you lose that host you lose both.
As to "Is it waste of resources?" it depends on your use case. You don't need three Redis nodes in that setup, you only need two. Three increases your redundancy, but is not required. If you need the added redundancy then it isn't a waste of resources. If you don't need redundancy then you just run a single Redis instance and call it good - as running more would be "wasted".
Another reason for running two slaves would be to split reads. Again, if you need it then it wouldn't be a waste.
As to "Is there a better way of using full use of the resources available?" we can't answer that as it is far too dependent on your specific scenario and code. That said if the amount of data to store is "small" and the command rate is not exceedingly high, then remember you don't need to dedicate a host to Redis.
Now for "Is Redis clustering an alternative to Redis sentinel?".
It really depends entirely on your use case. Redis Cluster is not an HA solution - it is a multiple writer/larger-than-ram solution. If your goal is just HA then it likely won't be suitable for you. Redis Cluster comes with limitations, particularly around multi-key operations, so it isn't necessarily a straightforward "just use cluster" operation.
If you think having three hosts running Redis (and three running sentinel) is wasteful, you'll likely hold Cluster to be even more so as it does require more resources.
The questions you've asked are probably too broad and opinion-based to survive as written. If you have a specific case/problem you are working out please update with that so we can provide specific assistance and information.
Update for specifics:
For proper failover management in your scenario I would go with 3 sentinels, one running on your JBoss server. If you have 3 JBoss nodes then go with one on each. I'd have a Redis pod (master+slave) on separate nodes, and let sentinel manage the failover.
From there it is a matter of wiring up JBoss/Jedis to use Sentinel for its information and connection management. I don't use those myself, but a quick search turns up that Jedis has support for it; you just need to configure it correctly. Some examples I found are at Looking for an example of Jedis with Sentinel and https://github.com/xetorthio/jedis/issues/725, which talk about JedisSentinelPool being the route for using a pool.
When Sentinel executes a failover the clients will be disconnected and Jedis will (should?) handle the reconnection by asking the Sentinels who the current master is.
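To make the reconnection step concrete, here is a minimal sketch of the discovery logic a Sentinel-aware client follows. It is not Jedis code: the Sentinels are modelled as plain Python callables (a hypothetical stand-in), and the addresses are made up. Against a real Sentinel the equivalent query is the SENTINEL get-master-addr-by-name command.

```python
from typing import Callable, Optional, Sequence, Tuple

Address = Tuple[str, int]
# Hypothetical stand-in for a Sentinel connection: a callable that takes the
# master name and returns its (host, port), or raises ConnectionError.
SentinelQuery = Callable[[str], Address]


def discover_master(sentinels: Sequence[SentinelQuery], name: str) -> Optional[Address]:
    """Ask each Sentinel in turn; the first one that answers wins."""
    for ask in sentinels:
        try:
            return ask(name)
        except ConnectionError:
            continue  # this Sentinel is unreachable, try the next one
    return None  # no Sentinel could be reached


def dead_sentinel(name: str) -> Address:
    raise ConnectionError("sentinel unreachable")


def live_sentinel(name: str) -> Address:
    # After a failover the Sentinels report the promoted slave (made-up address).
    return ("10.0.0.2", 6379)


print(discover_master([dead_sentinel, live_sentinel], "mymaster"))
```

The point of the design is that the application is configured with the list of Sentinels rather than with the address of a Redis server, so a failover does not require any configuration change on the client side.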
This is not a direct answer to your question, but I think it's helpful information for Redis newbies like me. Also, this question appears as the first link in Google when searching for "Redis cluster vs sentinel".
Redis Sentinel is the name of the Redis high availability solution...
It has nothing to do with Redis Cluster and is intended to be used by
people that don't need Redis Cluster, but simply a way to perform
automatic fail over when a master instance is not functioning
correctly.
Taken from the Redis Sentinel design draft 1.3
It's not obvious when you are new to Redis and implementing a failover solution. The official documentation on Sentinel and on clustering doesn't compare the two, so it's hard to choose the right way without reading tons of documentation.
The recommendation, everywhere, is to start with an odd number of instances, not using two or a multiple of two. That was corrected, but let's correct some other points.
First, to say that Sentinel provides failover without HA is false. When you have failover, you have HA with the additional benefit of application state being replicated. The distinction is that you can have HA in a system without replication (it's HA but it's not fault tolerant).
Second, running a sentinel on the same machine as its target redis instance is not a "bad setup": if you lose your sentinel, or your redis instance, or the whole machine, the results are the same. That's probably why every example of such configurations shows both running on the same machine.
Additional info to above answers
Redis Cluster
One main purpose of the Redis cluster is to equally/uniformly distribute
your data load by sharding
Redis Cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of what is called a hash slot.
There are 16384 hash slots in Redis Cluster. Every node in a Redis Cluster is responsible for a subset of the hash slots, so, for example, you may have a cluster with 3 nodes,
where:
Node A contains hash slots from 0 to 5500,
Node B contains hash slots from 5501 to 11000,
Node C contains hash slots from 11001 to 16383
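As a rough illustration of how a key ends up on one of those three nodes: Redis Cluster computes the slot as CRC16 of the key modulo 16384. The sketch below uses Python's binascii.crc_hqx, which implements the same CRC16 variant (XMODEM). The node names and ranges are the ones from the example above; hash tags (the {...} syntax) are deliberately ignored to keep it short.

```python
import binascii

HASH_SLOTS = 16384

# Slot ranges from the three-node example above (upper bounds are inclusive there).
NODE_RANGES = {
    "A": range(0, 5501),
    "B": range(5501, 11001),
    "C": range(11001, 16384),
}


def key_hash_slot(key: bytes) -> int:
    """CRC16 (XMODEM variant) of the key, modulo 16384. Hash tags are ignored."""
    return binascii.crc_hqx(key, 0) % HASH_SLOTS


def node_for_key(key: bytes) -> str:
    """Find which node of the example cluster owns the key's slot."""
    slot = key_hash_slot(key)
    return next(node for node, slots in NODE_RANGES.items() if slot in slots)


print(key_hash_slot(b"123456789"), node_for_key(b"123456789"))  # 12739 C
```

Because the key-to-slot mapping never changes, resharding only has to change which node owns which slots.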
This allows us to add and remove nodes in the cluster easily. For example, if we want to add a new node D, we need to move some hash slots from nodes A, B, C to D.
Redis Cluster supports the master-slave structure: you can create slaves A1, B1, C1 along with masters A, B, C when creating the cluster, so when master B goes down, slave B1 gets promoted to master.
You don't need additional failover handling when using Redis Cluster and you should definitely not point Sentinel instances at any of the Cluster nodes.
So in practical terms, what do you get with Redis Cluster?
1. The ability to automatically split your dataset among multiple nodes.
2. The ability to continue operations when a subset of the nodes are experiencing failures or are unable to communicate with the rest of the cluster.
Redis Sentinel
Redis supports multiple slaves replicating data from a master node.
This provides a backup for data in master node.
Redis Sentinel is a system designed to manage the master and its slaves. It runs as a separate program. The minimum number of Sentinels required in an ideal system is 3. They communicate among themselves and make sure that the master is alive; if it is not, they promote one of the slaves to master, so that when the dead node later comes back up it acts as a slave of the new master.
Quorum is configurable. Basically, it is the number of Sentinels that need to agree that the master is down. N/2 + 1 should agree, where N is the number of nodes in the pod (note that this setup is called a pod, not a cluster).
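A toy sketch of that decision, under simplifying assumptions: each Sentinel reports only whether it thinks the master is down, the quorum is passed in as a plain number, and the first slave in the list is promoted. Real Sentinel ranks the slaves before picking one, and all names here are hypothetical.

```python
from typing import Dict, List, Optional


def failover_target(master_down_votes: Dict[str, bool], quorum: int,
                    slaves: List[str]) -> Optional[str]:
    """Return the slave to promote, or None if too few Sentinels agree."""
    agreeing = sum(master_down_votes.values())
    if agreeing < quorum or not slaves:
        return None
    return slaves[0]  # simplification: real Sentinel ranks slaves before choosing


# Three Sentinels, quorum of 2 (N/2 + 1 with N = 3 and integer division).
votes = {"sentinel-1": True, "sentinel-2": True, "sentinel-3": False}
print(failover_target(votes, quorum=2, slaves=["slave-1"]))  # slave-1

# A single Sentinel that lost its own network link cannot trigger a failover.
votes = {"sentinel-1": True, "sentinel-2": False, "sentinel-3": False}
print(failover_target(votes, quorum=2, slaves=["slave-1"]))  # None
```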
So in practical terms, what do you get with Redis Sentinel?
It will make sure that Master is always available (if master goes down, the slave will be promoted as master)
Reference :
https://fnordig.de/2015/06/01/redis-sentinel-and-redis-cluster/
https://redis.io/topics/cluster-tutorial
This is my understanding after banging my head throughout the documentation.
Sentinel is a kind of hot standby solution where the slaves are kept replicated and ready to be promoted at any time. However, it won't support any multi-node writes. Slaves can be configured for read operations. It's NOT true that Sentinel won't provide HA, it has all the features of a typical active-passive cluster ( though that's not the right term to use here ).
Redis cluster is more or less a distributed solution, working on top of shards. Each chunk of data is being distributed among masters and slaves nodes. A minimum replication factor of 2 ensures that you have two active shards available across master and slaves.
If you know the sharding in Mongo or Elasticsearch, it will be easy to catch up.
Redis can operate in partitioned cluster (with many masters and slaves of those masters) or a single instance mode (single master with replica slaves).
The link here says:
When using Redis in single instance mode, in which a single Redis server manages the entire unpartitioned database, Redis Sentinel is used to manage its availability
It also says:
A Redis cluster, in which data is partitioned among multiple primary instances, manages availability by itself and requires no extra components.
So HA can be ensured in both of the scenarios mentioned. Hope this clears up the doubts. Redis Cluster and Sentinel are not alternatives to each other. They are just used to ensure HA in the different cases of a partitioned or non-partitioned master.
Redis Sentinel performs the failover promoting replicas when they see a master is down. You typically want an odd number of sentinel nodes. For the example of one master and one replica, 3 sentinels should be used so there can be a consensus on the decision. Ideally the 3rd sentinel is on a 3rd server so the decision is not skewed (depending on failure). Sentinel takes care of changing the master/replica config settings on your nodes so that promotion and syncing occurs in the correct order and you don’t overwrite data by bringing on an old failed master that now contains older data.
Once you have your sentinel nodes set up to perform failovers, you need to ensure you are pointing to the correct instance. See an example of HAProxy configuration for this. HAProxy performs health checks and will point to the new master if a failure occurs.
Clustering will allow you to scale horizontally and can help handle high loads. It does take a bit of work to set up and configure up front.
There is an open source fork of Redis, “KeyDB” that has eliminated the need for sentinel nodes with an active-replica option. This allows the replica node to accept reads and writes. When a failover occurs HAProxy stops reads/writes with the failed node and just uses the remaining active node which is already sync’d. Timestamping enables the failed nodes to rejoin automatically and resync without losing data when they come back online. Setup is simple and for higher traffic you don’t need special upfront setup to direct reads to the replica node and read/writes to the master. See example of active replication here. KeyDB is also multi-threaded which for some applications might be an alternative to clustering, but really depends on what your needs are.
There is also an example of setting up clustering manually and with the create-cluster tool. These are the same steps if you are using Redis (replace 'keydb' with 'redis' in instruction)

Couchbase node failure

My understanding could be amiss here. As I understand it, Couchbase uses a smart client to automatically select which node to write to or read from in a cluster. What I DON'T understand is, when this data is written/read, is it also immediately written to all other nodes? If so, in the event of a node failure, how does Couchbase know to use a different node from the one that was 'marked as the master' for the current operation/key? Do you lose data in the event that one of your nodes fails?
This sentence from the Couchbase Server Manual gives me the impression that you do lose data (which would make Couchbase unsuitable for high availability requirements):
With fewer larger nodes, in case of a node failure the impact to the
application will be greater
Thank you in advance for your time :)
By default, when data is written to Couchbase, the client returns success as soon as the data has been written to one node's memory. After that, Couchbase saves it to disk and replicates it.
If you want to ensure that data is persisted to disk, most client libraries provide functions for that. With the help of those functions you can also ensure that the data is replicated to another node. This function is called observe.
When one node goes down, it should be failed over. Couchbase Server can do that automatically when the auto-failover timeout is set in the server settings. For example, if you have a 3-node cluster, the stored data has 2 replicas, and one node goes down, you will not lose data. If a second node fails, you still will not lose all data: it will be available on the last node.
If a node that was the master goes down and is failed over, another live node becomes the master. In your client you point to all servers in the cluster, so if it is unable to retrieve data from one node, it tries to get it from another.
Also, if you have 2 nodes at your disposal, you can install 2 separate Couchbase servers, configure XDCR (cross datacenter replication), and manually check server availability with HA proxies or something else. That way you get only one IP to connect to (the proxy's IP), which will automatically get data from a live server.
Fortunately, Couchbase is a good fit for HA systems.
Let me explain in a few sentences how it works. Suppose you have a 5-node cluster. The application, using the client API/SDK, is always aware of the topology of the cluster (and of any change in the topology).
When you set/get a document in the cluster, the client API uses the same algorithm as the server to choose which node it should be written to. So the client selects the node using a CRC32 hash and writes to that node. Then, asynchronously, the cluster copies 1 or more replicas to the other nodes (depending on your configuration).
Couchbase has only 1 active copy of a document at a time, so it is easy to be consistent. Applications get and set this active document.
In case of failure, the server has some work to do: once the failure is discovered (automatically or by a monitoring system), a "failover" occurs. This means that the replicas are promoted to active, and it is now possible to work as before. Usually you then rebalance the cluster to spread the data properly.
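Here is a deliberately simplified sketch of that flow, not the real Couchbase algorithm: a key is hashed with CRC32 to a bucket, a map says which node holds the active copy and which hold replicas, and a failover promotes the first surviving replica. The bucket count, node names, round-robin placement and the exact way the hash is reduced are all assumptions made for illustration.

```python
import zlib
from typing import Dict, List

NUM_VBUCKETS = 8  # illustrative only; kept small so the map is easy to read


def vbucket_for(key: str) -> int:
    """Hash the key with CRC32 and reduce it to a bucket number."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_map(nodes: List[str], replicas: int) -> Dict[int, List[str]]:
    """Bucket -> [active, replica1, ...], assigned round-robin over the nodes."""
    return {
        vb: [nodes[(vb + i) % len(nodes)] for i in range(replicas + 1)]
        for vb in range(NUM_VBUCKETS)
    }


def fail_over(vmap: Dict[int, List[str]], dead: str) -> Dict[int, List[str]]:
    """Drop the dead node; the first surviving copy of each bucket becomes active."""
    return {vb: [n for n in chain if n != dead] for vb, chain in vmap.items()}


nodes = ["node1", "node2", "node3"]
vmap = build_map(nodes, replicas=1)
vb = vbucket_for("user::42")
active, replica = vmap[vb]
after = fail_over(vmap, dead=active)
print(f"bucket {vb}: active copy moves from {active} to {after[vb][0]}")
```

Both the client and the server can evaluate vbucket_for on their own, which is why the client can go straight to the right node without asking anyone first.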
The sentence you are commenting on simply says that the fewer nodes you have, the bigger the impact will be in case of failure/rebalance, since you will have to route the same number of requests to a smaller number of nodes. Fortunately, you do not lose data ;)
You can find some very detailed information about this way of working on the Couchbase CTO's blog:
http://damienkatz.net/2013/05/dynamo_sure_works_hard.html
Note: I am working as developer evangelist at Couchbase

Rabbit cluster loadbalancing, and HA what the difference?

I have a three-node cluster but did not set up reliable queues. I am using puka for Python as the client.
For load balancing on EC2 I am using Route 53 and assign an equal weight to each private IP address. So, if I have three EC2 instances, I have 3 Route 53 entries.
So my question is: why the cluster? What is the difference between three non-clustered nodes on Route 53 and three clustered nodes on Route 53? Are all rabbits writable and readable?
My understanding is that if I want HA and reliable queues, then rabbit becomes master-slave, and a working cluster is required first before turning the cluster into reliable queues.
I am rather confused about how best to cluster, and about the differences between a cluster and HA.
Thanks
Clustered nodes are equally weighted: there is no master and no slave. The only advantage is that when a publisher pushes a message to a queue located on another node, the message will travel from node to node (through Erlang's clustered VM layer) to reach its consumer/worker.
On the other hand, in HA mode, all queues and exchanges (as per the policy you specify) will be replicated across all the nodes. Moreover, there is only one master and one or more slaves, where the master is the oldest existing node; when it dies, the second-oldest node takes over as master.
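The promotion rule described above (oldest surviving node is the master) can be sketched in a few lines. This is only a model of the rule as stated here, not RabbitMQ code, and the node names are hypothetical.

```python
from typing import List, Optional


class MirroredQueue:
    """Tracks which node holds the master copy of one mirrored queue."""

    def __init__(self, nodes_oldest_first: List[str]) -> None:
        self.nodes = list(nodes_oldest_first)

    @property
    def master(self) -> Optional[str]:
        # The oldest surviving node is the master.
        return self.nodes[0] if self.nodes else None

    def node_down(self, node: str) -> None:
        if node in self.nodes:
            self.nodes.remove(node)


queue = MirroredQueue(["rabbit@n1", "rabbit@n2", "rabbit@n3"])
before = queue.master
queue.node_down("rabbit@n1")
print(before, "->", queue.master)  # rabbit@n1 -> rabbit@n2
```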
Let me know if that was the answer you expected.
Here is an article outlining both HA and load-balancing techniques, and how to combine the two efficiently, across a RabbitMQ cluster.

ZooKeeper - adding peers dynamically?

I'm new to ZooKeeper. This is what I need.
I've a network of peers.
At t=t_1 -> [peer-1 (Leader), peer-2]
peer-1 is the master and all clients connect to this node.
At t=t_2 -> [peer-1 (Leader), peer-2, peer-3]
At some later time peer-3 joins the group. Is it possible to add peer-3 to the list of zookeeper servers "dynamically" ( i.e., without restarting ZooKeeper on peer-1 ) ?
At t=t_3 -> [peer-3 (Leader), peer-4]
After a while both peer-1 and peer-2 leave the group (e.g., die or are switched off). Assuming that there is a way to dynamically add peer-3 and peer-4 to the group, peer-3 becomes the leader and all client requests are sent to peer-3.
Are there any other options that I can use, apart from ZooKeeper, to do something like this?
thanks.
At the moment, you can't dynamically change the configuration of a zookeeper cluster without restarting. There is an open issue to fix this, ZOOKEEPER-107. The paper describing the cluster membership algorithm is quite interesting, and can be found here.
You can change the configuration of the cluster by restarting server nodes one at a time. For example, if your cluster has servers A,B,C, and you want to replace server C with D, then you can do something like:
Bring down C
Bring up D, its peer list is A,B,D
Take down B
Change B's peer list to A,B,D
Bring up B
Take down A
Change A's peer list to A,B,D
Bring up A
Change the client configuration of all clients to point to A,B,D
At t=t_1, you have a cluster with 2 zookeeper nodes. This is quite brittle, as if either node goes down, you will not be able to establish quorum (floor(N / 2) + 1), and the cluster will be unavailable. Generally zookeeper clusters are odd numbers.
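The arithmetic behind that, as a small sketch: the quorum size is floor(N / 2) + 1, and the number of nodes you can afford to lose is whatever is left over.

```python
def quorum(n: int) -> int:
    """Smallest majority of an ensemble of n nodes: floor(n / 2) + 1."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can go down while a quorum is still reachable."""
    return n - quorum(n)


for n in range(1, 6):
    print(f"nodes={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
# nodes=1 quorum=1 tolerated_failures=0
# nodes=2 quorum=2 tolerated_failures=0
# nodes=3 quorum=2 tolerated_failures=1
# nodes=4 quorum=3 tolerated_failures=1
# nodes=5 quorum=3 tolerated_failures=2
```

Two nodes tolerate no failures at all, and four tolerate no more than three do, which is why ensembles are usually sized at odd numbers.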
I'm not sure what you are trying to do when you say,
peer-3 becomes the leader and all client requests are sent to peer-3.
You can't specify which node in a ZooKeeper cluster is the leader; the nodes themselves elect their leader, and leadership changes as nodes go up and down. Also, clients typically don't always connect to the leader: they are given a list of machines in the cluster and connect randomly to one, reconnecting if the server they are connected to goes down. You can set the leaderServes option to specify that the leader does NOT serve client connections.
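A minimal sketch of that client behaviour, assuming a hypothetical is_up check in place of a real connection attempt and made-up host names. A real ZooKeeper client library does this for you when you hand it the server list.

```python
import random
from typing import Callable, Optional, Sequence


def connect(servers: Sequence[str], is_up: Callable[[str], bool],
            rng: Optional[random.Random] = None) -> Optional[str]:
    """Try the servers in random order; return the first one that accepts us."""
    candidates = list(servers)
    (rng or random).shuffle(candidates)
    for server in candidates:
        if is_up(server):
            return server
    return None  # nothing reachable right now


ensemble = ["zk-a:2181", "zk-b:2181", "zk-c:2181"]  # hypothetical hosts
up = set(ensemble)

current = connect(ensemble, up.__contains__)
up.discard(current)  # the server we were connected to goes down
fallback = connect(ensemble, up.__contains__)
print(current, "->", fallback)
```

Note that nothing in the client asks who the leader is; any live server in the list will do.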
I would not suggest using the above for any production situation.
The above solution only works if you are OK with losing ZK quorum for a while until all changes are complete.
Here's why:
"Bring down C. Bring up D, its peer list is A,B,D"
-> at this point A and B don't know about D
-> D knows about A and B
So at this point you have only A and B functioning in quorum.
Next you take down B and you lose quorum.
You will lose access to ZK data until the migration is complete and quorum is restored. Most well-designed apps using ZK fail over to a read-only mode in this case and recover gracefully.
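To see where the gap opens, here is a toy simulation of the rolling replacement of C by D. It rests on one modelling assumption that is mine, not ZooKeeper's documented behaviour: a node only counts a peer towards its majority if that peer is live and lists the node back in its own peer list.

```python
from typing import Dict, List, Set, Tuple


def nodes_with_quorum(peers: Dict[str, Set[str]], live: Set[str]) -> List[str]:
    """Live nodes that can count a majority of their own peer list.

    Modelling assumption: a peer only counts if it is live AND lists us back.
    """
    result = []
    for node in sorted(live):
        mutual = [p for p in peers[node] if p in live and node in peers[p]]
        if len(mutual) > len(peers[node]) // 2:
            result.append(node)
    return result


def rolling_replace() -> List[Tuple[str, List[str]]]:
    """Replay the rolling restart and record who has quorum after each step."""
    old, new = {"A", "B", "C"}, {"A", "B", "D"}
    peers = {n: set(old) for n in old}
    live = set(old)
    log: List[Tuple[str, List[str]]] = []

    def record(step: str) -> None:
        log.append((step, nodes_with_quorum(peers, live)))

    live.discard("C")
    record("bring down C")
    peers["D"] = set(new)
    live.add("D")
    record("bring up D")
    live.discard("B")
    record("take down B")
    peers["B"] = set(new)  # B restarts with the new peer list
    live.add("B")
    record("bring up B")
    live.discard("A")
    record("take down A")
    peers["A"] = set(new)  # A restarts with the new peer list
    live.add("A")
    record("bring up A")
    return log


for step, ok in rolling_replace():
    print(f"{step:13} -> quorum on {ok or 'no node'}")
```

In this model the step "take down B" leaves no node with a quorum: A is alone with its old peer list, and D is not recognised by anyone yet.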
Until ZOOKEEPER-107 is released in ZooKeeper 3.5, you will need to choose your poison wisely.
It's better to:
just set up a new ZK ensemble (ZK cluster)
restore from snapshot
migrate apps from the old ZK ensemble to the new ZK ensemble
After migration is complete, shut down the old ZK ensemble