I understand that a failover in a non-cluster replication setup works only by manual intervention.
I've set up the following non-cluster replication (that is, with a total of 3 nodes):
        node1 (master)
       /              \
  node2                node3
 (slave1)             (slave2)
Let's say that node1 is broken.
So, I issue "replicaof no one" on node2 to make it the new master. (For this, a Redis restart is NOT required, right?)
In this example, is it correct that I need to make the following changes, all manually?
1. Fix and reconfigure node1 to make it a slave of node2 (the new master)? - A Redis restart is indeed required.
2. Reconfigure node3 to make it a slave of node2 (the new master) instead of node1 (the original master)? - A Redis restart is also required.
I just want to verify that I understand this correctly.
Thank you in advance!
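For reference, the reconfiguration described above can be expressed as runtime commands along these lines (a sketch only: the host names and port are placeholders, REPLICAOF is called SLAVEOF on Redis versions before 5.0, and persisting the change in each node's redis.conf is a separate step):
redis-cli -h node2 -p 6379 replicaof no one
redis-cli -h node3 -p 6379 replicaof node2 6379
redis-cli -h node1 -p 6379 replicaof node2 6379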
I have a pair of Redis instances on my machine. They are set up as master/slave: Redis1 is the master, Redis2 the slave.
Whenever I stop Redis1 (the master), Redis2 takes over as master.
Then I start Redis1 again. It starts as a slave (as it is supposed to).
If I type on Redis1:
slaveof no one
It becomes master, but 5 seconds later Redis2 takes over as master again.
Any hint on this behavior?
Redis slaveof doc.
---SOLUTION---
There was an active Sentinel changing the setting. Thank you @Not_a_Golfer.
Summing up the investigation as an answer for future generations:
The setup on Docker also included a Sentinel, which performed a failover and made redis1 a slave of redis2 when it returned to the game.
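If you hit the same symptom, a quick way to check whether a Sentinel is steering your instances is to ask it what it monitors (assuming it listens on the conventional port 26379 and the master is registered under the usual name mymaster; adjust to your setup):
redis-cli -p 26379 sentinel masters
redis-cli -p 26379 sentinel get-master-addr-by-name mymaster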
Question Background:
I deployed Redis in a k8s cluster and use Redis Sentinel to implement HA for it. My Redis setup looks like this:
one master
one slave
three sentinels (serving this specific Redis setup)
When I log in to the container of one of the sentinels, I execute this command:
sentinel sentinels mymaster
Luckily, I get the expected output: the info of the two other sentinels. After a period of time, I execute the "sentinel sentinels mymaster" command again and find that there is an additional sentinel, and I cannot find this instance by its IP address or runid.
I know that Sentinel discovers the other sentinels, the master and the slaves by subscribing to the __sentinel__:hello channel on the Redis master.
Question:
How can I check the messages published by Redis Sentinel to the Redis master? I have enabled logging for the master and set the log level to debug.
You can see the Sentinel's activity (when it discovers a sentinel or a replica, fails over to a new master, etc.) in the Sentinel log file, not the master's. If a Sentinel is running on a host, its log file will be in the same directory as the master or replica log file. For me on CentOS it's /var/log/redis/sentinel.log.
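If you specifically want to watch the hello messages the question asks about, you can also subscribe to the discovery channel directly on the master (host and port are placeholders for your master), since every Sentinel periodically publishes its announcements there:
redis-cli -h MASTER_IP -p 6379 subscribe __sentinel__:hello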
I am trying to implement a Redis cluster with 6 machines.
I have a vagrant cluster of six machines:
192.168.56.101
192.168.56.102
192.168.56.103
192.168.56.104
192.168.56.105
192.168.56.106
all running redis-server
I edited the /etc/redis/redis.conf file on all the above servers, adding this:
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
cluster-slave-validity-factor 0
appendonly yes
I then ran this on one of the six machines:
./redis-trib.rb create --replicas 1 192.168.56.101:6379 192.168.56.102:6379 192.168.56.103:6379 192.168.56.104:6379 192.168.56.105:6379 192.168.56.106:6379
A Redis cluster is up and running. I checked manually by setting a value on one machine; it shows up on the other machines.
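For example, a quick cross-node check can look roughly like this (the -c flag makes redis-cli follow cluster redirections; IPs as above):
redis-cli -c -h 192.168.56.101 -p 6379 set foo bar
redis-cli -c -h 192.168.56.102 -p 6379 get foo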
$ redis-cli -p 6379 cluster nodes
3c6ffdddfec4e726f29d06a6da550f94d976f859 192.168.56.105:6379 master - 0 1450088598212 5 connected
47d04bc98ab42fc793f9f382855e5c54ab8f2e20 192.168.56.102:6379 slave caf2cec45114dc8f4cbc6d96c6dbb20b62a39f90 0 1450088598716 7 connected
040d4bb6a00569fc44eec05440a5fe0796952ccf 192.168.56.101:6379 myself,slave 5318e48e9ef0fc68d2dc723a336b791fc43e23c8 0 0 4 connected
caf2cec45114dc8f4cbc6d96c6dbb20b62a39f90 192.168.56.104:6379 master - 0 1450088599720 7 connected 0-10922
d78293d0821de3ab3d2bca82b24525e976e7ab63 192.168.56.106:6379 slave 5318e48e9ef0fc68d2dc723a336b791fc43e23c8 0 1450088599316 8 connected
5318e48e9ef0fc68d2dc723a336b791fc43e23c8 192.168.56.103:6379 master - 0 1450088599218 8 connected 10923-16383
My problem is that when I shut down or stop redis-server on any one machine that is a master, the whole cluster goes down; but if all three slaves die, the cluster still works properly.
What should I do so that a slave becomes a master if a master fails (fault tolerance)?
I am under the assumption that Redis handles all those things and I need not worry about it after deploying the cluster. Am I right, or would I have to do things myself?
Another question: let's say I have six machines with 16GB of RAM each. How much total data would I be able to handle on this Redis cluster with three masters and three slaves?
Thank you.
The setting cluster-slave-validity-factor 0 may be the culprit here.
From redis.conf:
# A slave of a failing master will avoid to start a failover if its data
# looks too old.
In your setup, the slave of the terminated master considers itself unfit to be elected master, since the time since it last contacted the master is greater than the computed value of:
(node-timeout * slave-validity-factor) + repl-ping-slave-period
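Plugging your values into that formula (cluster-node-timeout 5000, i.e. 5 seconds, cluster-slave-validity-factor 0, and the default repl-ping-slave-period of 10 seconds, since your redis.conf does not set it):
(5 s * 0) + 10 s = 10 s
So, reading the formula literally, a slave that has not been able to talk to its master for more than roughly 10 seconds will refuse to start the failover.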
Therefore, even though a redundant slave is available, the cluster state changes to DOWN and the cluster becomes unavailable.
You can try a different value, for example the suggested default:
cluster-slave-validity-factor 10
This should ensure that the cluster is able to tolerate one random Redis instance failure (it can be a slave or a master instance).
For your second question: six machines with 16GB of RAM each will be able to function as a Redis Cluster of 3 master instances and 3 slave instances, so the theoretical maximum is 16GB x 3 of data. Such a cluster can tolerate at most ONE node failure if cluster-require-full-coverage is turned on; otherwise it may still be able to serve data from the shards that remain available on the functioning instances.
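To verify the new behaviour (a rough sketch, reusing the addresses from your setup): put the changed value into each node's redis.conf and restart the nodes (or, if your Redis version allows it, change it at runtime with CONFIG SET), then stop redis-server on one of the masters and, once cluster-node-timeout has passed, check the cluster from another node:
redis-cli -h 192.168.56.101 -p 6379 cluster nodes
redis-cli -h 192.168.56.101 -p 6379 cluster info
The former slave should now be listed as a master and cluster_state should report ok.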
Suppose I have a redis cluster with nodes 10.0.0.1, 10.0.0.2, 10.0.0.3 and 10.0.0.4, which I'm using as a cache.
Then, for whatever reason, node 10.0.0.4 fails and goes down. This brings down the entire cluster:
2713:M 13 Apr 21:07:52.415 * FAIL message received from [id1] about [id2]
2713:M 13 Apr 21:07:52.415 # Cluster state changed: fail
This causes any query to be rejected with "CLUSTERDOWN The cluster is down".
However, since I'm using the cluster as a cache, I don't really care if a node goes down. A key can get resharded to a different node and lose its contents without affecting my application.
Is there a way to set up such an automated resharding?
I found something close enough to what I need.
By setting cluster-require-full-coverage to "no", the rest of the cluster will continue to respond to queries, although the client needs to handle the possibility of being redirected to a failing node.
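Depending on your Redis version, this setting can also be applied at runtime on each node without a restart (still add it to redis.conf so it survives restarts); a sketch for one node:
redis-cli -h 10.0.0.1 -p 6379 config set cluster-require-full-coverage no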
Then I can replace the broken node by running:
redis-trib.rb call 10.0.0.1:6379 cluster forget [broken_node_id]
redis-trib.rb add-node 10.0.0.5:6379 10.0.0.1:6379
redis-trib.rb fix 10.0.0.1:6379
Where 10.0.0.5:6379 is the node that will replace the broken one.
Assuming you have only master nodes in your current cluster, you will definitely get a cluster-down error, because there is no replica of the down master and Redis considers the cluster unsafe and raises an error.
Solution
Create a new node (create a redis.conf with the desired parameters).
Join that node to the cluster:
redis-trib.rb add-node 127.0.0.1:6379 EXISTING_MASTER_IP:EXISTING_MASTER_PORT
Make the node a slave of 10.0.0.4:
redis-cli -p 6379 cluster replicate NODE_ID_OF_TARGET_MASTER
To Test
First, make sure the cluster is in good shape (all slots are covered and the nodes agree about the configuration).
redis-trib.rb check 127.0.0.1:6379 (On any master)
Kill the process on 10.0.0.4.
Wait for the slave to become the new master. (This happens quickly; the slots assigned to 10.0.0.4 will automatically be taken over by the slave.)
Check the cluster and make sure all slots have moved to the new master:
redis-trib.rb check 127.0.0.1:6379 (On any master)
No manual actions are needed. Additionally, if you have more slaves in the cluster, they may be promoted as new masters for other masters as well. (E.g., with a setup of 3 masters and 3 slaves: Master1 goes down and Slave1 becomes the new master; if Slave1 later goes down, Master1, having rejoined as its slave, can become the master again.)
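For reference, one way to find the NODE_ID_OF_TARGET_MASTER used in the replicate step above (while 10.0.0.4 is still reachable) is to read it out of the cluster itself:
redis-cli -h 10.0.0.1 -p 6379 cluster nodes | grep 10.0.0.4
or, on recent Redis versions, ask the node directly:
redis-cli -h 10.0.0.4 -p 6379 cluster myid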
I am setting up a RabbitMQ cluster on 2 nodes -- node1 and node2 -- and trying to make node2 join node1's cluster.
What I did is the following:
1. Install RabbitMQ (and Erlang) separately on node1 and node2.
2. Run "rabbitmqctl stop_app" on node2, delete its .erlang.cookie and then copy the .erlang.cookie from node1 to node2.
3. Run "rabbitmqctl join_cluster --ram rabbit@node1". Now I get a connection error, "unable to connect to node rabbit@node2", a cookie issue.
If I copy back the old .erlang.cookie generated by node2, I get a connection error to rabbit@node1 instead (which makes sense, since I am supposed to copy node1's cookie to node2).
Is there anything I am doing wrong here?
Thanks
If you install the nodes on different machines, you must make sure the machines can reach each other.
On Linux:
1) Update /etc/hosts on both the slave and the master, so that each node can resolve the other node's hostname (see the sketch below).
2) Open TCP ports 1-20000 (or 1-65535) between the nodes.
This is usually the problem!
good luck!
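For completeness, a minimal sketch of the two points above, assuming node1 and node2 are the hostnames and using placeholder IP addresses that you replace with your own:
/etc/hosts on node2:
10.0.0.1  node1
/etc/hosts on node1:
10.0.0.2  node2
Then, on node2:
rabbitmqctl stop_app
rabbitmqctl join_cluster --ram rabbit@node1
rabbitmqctl start_app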