Couchbase 2.5, two nodes with one replica: when one node fails, the service is no longer available - replication

We are testing Couchbase with a two node cluster with one replica.
When we stop the service on one node, the other one does not respond until we restart the service or manually failover the stopped node.
Is there a way to keep the service available from the good node while one node is temporarily unavailable?

If a node goes down, then in order to activate its replicas on the other node you will need to fail it over manually. If you want this to happen automatically you can enable auto-failover, but I'm pretty sure that feature requires at least a three-node cluster. When you want to bring the failed node back, you can simply re-add it to the cluster and rebalance.
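Assuming the couchbase-cli tool that ships with the server, the manual flow could look roughly like this sketch (host names, credentials and ports are placeholders, and flag names may differ slightly between Couchbase versions):

    # Fail over the stopped node so its replica data becomes active on the surviving node
    couchbase-cli failover -c good-node:8091 -u Administrator -p password \
      --server-failover=failed-node:8091

    # Once the failed node is healthy again, add it back and rebalance
    couchbase-cli server-add -c good-node:8091 -u Administrator -p password \
      --server-add=failed-node:8091 \
      --server-add-username=Administrator --server-add-password=password
    couchbase-cli rebalance -c good-node:8091 -u Administrator -p password

    # Auto-failover can be enabled cluster-wide (subject to the minimum node count)
    couchbase-cli setting-autofailover -c good-node:8091 -u Administrator -p password \
      --enable-auto-failover=1 --auto-failover-timeout=30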

Related

Redis Cluster Single Node issue

Can Redis run with cluster mode enabled but with a single instance? It seems the cluster status is reported as fail when we try to deploy a single node, because no slots are assigned to it.
https://serverfault.com/a/815813/510599
I understand we can manually add slots after deployment, but I wonder whether we can modify the Redis source code to make this change.
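For reference, the manual slot assignment mentioned above can be done with redis-cli; this is a minimal sketch assuming a single node listening on 127.0.0.1:7000 (the port is a placeholder):

    # Assign all 16384 hash slots to the single node
    redis-cli -h 127.0.0.1 -p 7000 CLUSTER ADDSLOTS $(seq 0 16383)

    # Verify: cluster_state should now report "ok"
    redis-cli -h 127.0.0.1 -p 7000 CLUSTER INFO | grep cluster_state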

Set up docker-swarm to use specific nodes as backup?

Is there a way to set up docker-swarm to use specific nodes (workers or managers) only as fail-over nodes? For instance, if one specific worker dies (or a service on it dies), only then would another node be used; before that happens, it's as if that node weren't in the swarm.
No, that is not possible. However, docker-swarm does give you the features to build that up yourself. Let's say you have 3 worker nodes on which you want to run service A; 2 of the 3 nodes will always be available and node 3 will be the backup.
Add a label to the 3 nodes, e.g. runs=serviceA, and constrain the service to that label. This will make sure that your service only runs on those 3 nodes (see the sketch after these steps).
Make the 3rd node unable to schedule tasks by running docker node update --availability drain <NODE-ID>
Whenever you need your node back, run docker node update --availability active <NODE-ID>
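Put together, the commands could look like this sketch (the node names, service name and image are placeholders):

    # Label the nodes that are allowed to run the service
    docker node update --label-add runs=serviceA worker-1
    docker node update --label-add runs=serviceA worker-2
    docker node update --label-add runs=serviceA worker-3

    # Constrain the service to nodes carrying that label
    docker service create --name serviceA --replicas 2 \
      --constraint 'node.labels.runs == serviceA' myorg/service-a:latest

    # Keep the backup node out of scheduling until it is needed
    docker node update --availability drain worker-3

    # Bring it back when one of the other nodes fails
    docker node update --availability active worker-3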

Re-join cluster node after seed node got restarted

Let's imagine the following scenario. I have three nodes inside my Akka cluster (nodes A, B, C). Each node is deployed to a different physical device inside a network.
All of those nodes are wrapped inside Topshelf windows services.
Node A is my seed node; the other ones are simply 'worker' nodes with a port specified.
When I run the cluster and stop node (service) B or C and then restart them, the nodes rejoin with no issues.
I'd like to ask whether it's possible to handle another scenario: when I stop the seed node (node A) while the other node services keep running, and then restart node service A, I'd like nodes B and C to rejoin the cluster and make the whole ecosystem work again.
Is such a scenario possible to implement? If yes, how should I do it?
In an Akka.NET cluster any node can serve as a seed node for others as long as it's part of the cluster. "Seeds" are just a configuration thing, so you can define a list of well-known node addresses that you know are part of the cluster.
Regarding your case, there are several solutions I can think of:
A quite common approach is to define more than one seed node in the configuration, so that a single node doesn't become a single point of failure. As long as at least one of the configured seed nodes is alive, everything should work fine. Keep in mind that the seed nodes should be listed in exactly the same order in every node's configuration.
If your "worker" nodes have statically assigned endpoints, they can be used as seed nodes as well.
Since you can initialize the cluster programmatically from code, you can also use a 3rd-party service for node discovery. You can use e.g. Consul for that - I've started a project that provides such functionality. While it's not yet published, feel free to fork it or contribute if it helps you.

RabbitMQ cluster - best practice for updating a node in a load balanced cluster?

Summary: What's the best practice for updating a node in a load balanced cluster?
We use a RabbitMQ cluster behind an HAProxy load balancer to support easy clustering for our clients, as suggested in the RabbitMQ docs.
Though the docs suggest this setup, they don't describe the best way to remove a node from the cluster for upgrades and put it back in afterwards.
Here's the process I think we should use:
1. remove the node from the cluster by running rabbitmqctl stop_app on the node itself, and wait for it to shut down
2. put the node in maint mode in HAProxy
3. perform the maint work
4. join the node back to the cluster, confirm it rejoins and syncs
5. remove the node from maint mode in HAProxy
but it has been suggested that we should remove it from HAProxy first, basically swapping steps 1 and 2 above.
Here's the process suggested by another team member:
1. put the node in maint mode in HAProxy
2. remove the node from the cluster by running rabbitmqctl stop_app on the node itself, and wait for it to shut down
3. perform the maint work
4. join the node back to the cluster, confirm it rejoins and syncs
5. remove the node from maint mode in HAProxy
Which is the best way to do this?
For me, the obvious way would be to tell your HAProxy that you want to stop sending requests to a server, and then stop the server itself, instead of the other way around.
I'm curious as to why you would want to stop the server first and then put it into maint? If you do it like this, some requests will go to your node before it is known to be gone. I believe you can set HAProxy up to resend those missed calls, so best case you have some requests that are a bit slower, worst case you have some missed requests.
There is no specific downside I can see with setting it in maint mode first, so I would not personally consider the other option.
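Assuming HAProxy's admin socket is enabled, the recommended order could look roughly like this sketch (the backend/server names and the socket path are placeholders):

    # 1. Take the node out of rotation in HAProxy first (runtime API via the admin socket)
    echo "disable server rabbitmq_back/rabbit-node-2" | socat stdio /var/run/haproxy.sock

    # 2. Stop the RabbitMQ application on that node
    rabbitmqctl stop_app

    # 3. ... perform the maintenance / upgrade work ...

    # 4. Start the app again and confirm it has rejoined the cluster
    rabbitmqctl start_app
    rabbitmqctl cluster_status

    # 5. Put the node back into rotation in HAProxy
    echo "enable server rabbitmq_back/rabbit-node-2" | socat stdio /var/run/haproxy.sock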

Apache Ignite node won't join cluster if server nodes start simultaneously

We have a problem with Ignite when we start two Ignite server nodes at exactly the same time. We are currently implementing our own discovery mechanism by extending TcpDiscoveryIpFinderAdapter. The first time the TcpDiscoveryIpFinderAdapter is called, neither Ignite server is able to find the other node (due to the nature of our discovery mechanism). Subsequent invocations do report the other node with a correct IP, yet the Ignite nodes will not start to talk to each other.
If we start the servers with some delay, the second server will (on the first attempt) find the other node and join the cluster successfully.
Is there a way to get the two nodes to talk to each other even after both of them initially think they are a cluster of one node?