I have a cluster of 3 rabbitmq nodes spread across 3 different servers. The second and third nodes join the first node and form the cluster. While testing failover I am finding that once the primary node is killed, I am not able to make it rejoin the cluster. The documentation does not state that I have to use join_cluster or any other command after startup. I tried join_cluster, but it is rejected because the cluster name is the same as the node's own host name. Is there a way to make this work?
cluster_status displays the following (not from the primary node):
Cluster status of node 'rabbit@<secondary>' ...
[{nodes,[{disc,['rabbit@<primary>','rabbit@<secondary>',
                'rabbit@<tertiary>']}]},
 {running_nodes,['rabbit@<secondary>','rabbit@<tertiary>']},
 {cluster_name,<<"rabbit@<primary>">>},
 {partitions,[]}]
On one of the nodes that is still in the cluster, use the command
rabbitmqctl forget_cluster_node rabbit@rabbitmq1
to make the current cluster forget the old primary.
Now you should be able to rejoin the cluster from the old primary (rabbitmq1):
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@rabbitmq2
rabbitmqctl start_app
See the clustering guide for reference.
A quote from the clustering guide:
Nodes that have been joined to a cluster can be stopped at any time.
It is also ok for them to crash. In both cases the rest of the cluster
continues operating unaffected, and the nodes automatically "catch up"
with the other cluster nodes when they start up again.
So you just need to start the node that you killed/stopped. It doesn't make a difference whether it's "primary" or not: if it was primary and then killed, some other node becomes the primary one.
I've just tested this (with docker of course) and it works as expected.
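A rough sketch of that kind of docker test; the network name, container names, image tag, and cookie handling below are assumptions, not the exact setup used:
# create a shared network and start three nodes with the same Erlang cookie
docker network create rabbitnet
docker run -d --name rmq1 --hostname rmq1 --network rabbitnet -e RABBITMQ_ERLANG_COOKIE=secretcookie rabbitmq:3-management
docker run -d --name rmq2 --hostname rmq2 --network rabbitnet -e RABBITMQ_ERLANG_COOKIE=secretcookie rabbitmq:3-management
docker run -d --name rmq3 --hostname rmq3 --network rabbitnet -e RABBITMQ_ERLANG_COOKIE=secretcookie rabbitmq:3-management
# join the second and third node to the first
docker exec rmq2 rabbitmqctl stop_app
docker exec rmq2 rabbitmqctl join_cluster rabbit@rmq1
docker exec rmq2 rabbitmqctl start_app
# (repeat the same three commands on rmq3)
# kill the "primary", bring it back, and check that it rejoined on its own
docker kill rmq1
docker start rmq1
docker exec rmq2 rabbitmqctl cluster_status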
I set up my Redis cluster (version 6.2) with 3 master nodes and 3 slave nodes. It works well in the normal scenario.
However, if I kill one of the master nodes, even if I wait a very long time, the auto-failover does not happen. When I use the "cluster nodes" command, the output tells me that the killed node is marked as "master, failed", and all 3 slave nodes are still marked as "slave". I also cannot see any useful information in the log.
My cluster config uses the defaults, except for the two settings below:
cluster-node-timeout 5000
cluster-require-full-coverage no
So if anyone has an idea how to check what is wrong, that would be very much appreciated!
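For reference, the effective settings and cluster state can be checked on a live node roughly like this (the port is a placeholder):
redis-cli -p 7000 cluster info                               # overall cluster state
redis-cli -p 7000 config get cluster-node-timeout
redis-cli -p 7000 config get cluster-require-full-coverage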
The environment I have consists of two separate servers, one each with RabbitMQ service application running. They are correctly clustered and the queues are using mirroring correctly.
Node A is master
Node B is slave
My question is more specifically when Node A goes down but Service A is still up. Node B and Service B are still up. At this point, Node B is now promoted to master. When an application connects to Node B it connects okay, of course.
rabbitmqctl cluster_status on Node B shows cluster is up with two nodes and Node B is running. rabbitmqctl cluster_status on Node A shows node is down. This is expected behavior.
Is it possible for an application to connect to Node A and still be able to publish/pop queue items as normal?
Suppose I have [Slave IP Address] which is the slave of [Master IP Address].
Now my master server has been shut down, and I need to set this slave to be master MANUALLY (WITHOUT using sentinel automatic failover, WITH redis command).
Is it possible to do this without restarting the redis service? (and losing all the cached data)
Use SLAVEOF NO ONE to promote a slave to master:
http://redis.io/commands/slaveof
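A minimal example, assuming the replica is reachable at <slave-ip> on the default port (both are placeholders):
redis-cli -h <slave-ip> -p 6379 SLAVEOF NO ONE       # stop replicating, keep the current dataset
redis-cli -h <slave-ip> -p 6379 INFO replication     # role should now be reported as master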
It depends. If you are in a cluster you will be better off using the failover; you will need to use the FORCE option in the command:
http://redis.io/commands/cluster-failover
Is it possible to do this without restarting the redis service? (and losing all the cached data)
Yes, that's possible. You can use:
SLAVEOF NO ONE (without sentinel)
But it is recommended to use sentinel to avoid data loss.
sentinel failover <master-name> (with sentinel)
This will force the sentinel to switch master.
The new master will have all the data that was synchronized before the old master shut down.
Redis will automatically choose the slave with the most data, which reduces the amount of data lost when switching masters.
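A sketch of the sentinel route; the sentinel port 26379 and the master name mymaster are placeholders for your own values:
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster   # current master address
redis-cli -p 26379 SENTINEL failover mymaster                  # ask sentinel to promote a replica
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster   # should now point at the promoted replica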
The two options in step 3 below have helped me recover the cluster when a master node is down, its compute was replaced, or it is in some other unrecoverable state.
1.- First, connect to the slave node with redis-cli; here is a link on how to do that: How to connect to remote Redis server?
2.- Once connected to the slave node, run cluster nodes to confirm that the master node is in fail state, and also run cluster info to see the overall state of your cluster (this is always a good idea).
3.- On the slave node to be promoted, run the command cluster failover. In rare cases, when there are serious issues with redis, this command can fail and you will need to use cluster failover force or cluster failover takeover; more info about the implications of those options: https://redis.io/commands/cluster-failover
4.- Run cluster forget $old_master_id on all your cluster nodes.
5.- Add a new node with cluster meet $new_node_IP $new_node_PORT.
6.- Subscribe your new node to your brand new master: log in to the new node and run cluster replicate $master_node_id.
Steps 1-3 are required for the slave-to-master promotion, and steps 4-6 are required to leave the whole cluster in a healthy master-slave balance; the full sequence is sketched below.
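Put together, the whole sequence looks roughly like this; IPs, ports, and node IDs are placeholders:
# on the slave that will be promoted
redis-cli -h <slave-ip> -p <port> cluster nodes        # the old master should show as failed
redis-cli -h <slave-ip> -p <port> cluster info
redis-cli -h <slave-ip> -p <port> cluster failover     # fall back to "cluster failover force" only if this fails
# on every remaining node of the cluster
redis-cli -h <node-ip> -p <port> cluster forget <old_master_id>
# from any existing node, introduce the replacement node
redis-cli -h <node-ip> -p <port> cluster meet <new_node_IP> <new_node_PORT>
# on the new node, attach it as a replica of the freshly promoted master
redis-cli -h <new_node_IP> -p <new_node_PORT> cluster replicate <promoted_master_id>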
As of Redis version 5.0.0 the SLAVEOF command is regarded as deprecated.
If a Redis server is already acting as replica, the command REPLICAOF NO ONE will turn off the replication, turning the Redis server into a MASTER.
I have a clustered HA rabbitmq setup. I am using the "exactly" policy similar to:
rabbitmqctl set_policy ha-two "^two\." '{"ha-mode":"exactly","ha-params":10,"ha-sync-mode":"automatic"}'
I have 30 machines running, of which 10 are HA nodes with queues replicated. When my broker goes down (randomly assigned to be the first HA node), I need my celery workers to point to a new HA node (one of the 9 left). I have a script that automates this. The problem is that I do not know how to distinguish between a regular cluster node and an HA node. When I issue the command:
rabbitmqctl cluster_status
The categories I get are "running nodes", "disc", and "ram", but there is no way here to tell whether a node is HA.
Any ideas?
In a cluster every node shares everything with the others, so you don't have to distinguish between nodes in your application in order to access all entities.
In your case, when one of the HA nodes goes down (their number drops to 9), the HA queues will be replicated to the first available node (it doesn't matter whether it is disc or ram).
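That said, if you want to see where the mirrors of a particular queue actually live, the queue listing can show it. A sketch, assuming classic mirrored queues (column names may differ between versions):
rabbitmqctl list_policies
rabbitmqctl list_queues name policy pid slave_pids synchronised_slave_pids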
We are testing Couchbase with a two node cluster with one replica.
When we stop the service on one node, the other one does not respond until we restart the service or manually fail over the stopped node.
Is there a way to maintain the service from the good node when one node is temporary unavailable?
If a node goes down, then in order to activate the replicas on the other node you will need to manually fail it over. If you want this to happen automatically you can enable auto-failover, but to use that feature I'm pretty sure you must have at least a three node cluster. When you want to add the failed node back, you can just re-add it to the cluster and rebalance.
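If your cluster is large enough to use it, auto-failover can also be enabled from the CLI; a sketch, with host, credentials, and timeout as placeholders:
couchbase-cli setting-autofailover -c <host>:8091 -u Administrator -p <password> --enable-auto-failover 1 --auto-failover-timeout 120
# once the repaired node has been added back to the cluster, rebalance
couchbase-cli rebalance -c <host>:8091 -u Administrator -p <password>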