Redis - Make failover master return to slave state and master take up its old master role - redis

I have a Redis v4.0.7 cluster consisting of 4 servers. These 4 servers are all Ubuntu v17.10 64-bit virtual machines (in VirtualBox) running on my Windows PC. I have shifted all the slaves by one server, and I will use M1 for master 1 and S1 for slave 1 in the following explanation of my "issue".
192.168.56.101 (with a master on port 7000 (M1) and slave on port 7001 (S4))
192.168.56.102 (with a master on port 7000 (M2) and slave on port 7001 (S1))
192.168.56.103 (with a master on port 7000 (M3) and slave on port 7001 (S2))
192.168.56.104 (with a master on port 7000 (M4) and slave on port 7001 (S3))
I am fiddling a little bit with the setup to check if the failover "works".
Therefore I have tried shutting down M2, which means that S2 takes over and becomes the master. This works as intended. However, if I start up the (old) M2 again, it is now a slave and remains one until I shut S2 down, at which point it takes over the master role again.
I was wondering if there is a command that I can issue to the slave that has taken over the master role which makes it return to its (old) slave role and hand the master role back to the (old) master, in this case M2.
I have tried googling the "issue", but to no avail.

You can do this by running:
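redis-cli -h M2_IP_ADDRESS -p M2_PORT CLUSTER FAILOVER
The above command triggers a manual failover. It must be sent to the node that is currently a slave (here, the old M2). M2 will become master again and S2 will become its slave.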
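If you want to script this step, here is a minimal Python sketch that builds the same redis-cli invocation. The helper names build_failover_command and run_failover are illustrative assumptions, as is the example address 192.168.56.102:7000 (M2 in the question). It assumes redis-cli is on the PATH.

```python
import subprocess
from typing import List


def build_failover_command(host: str, port: int) -> List[str]:
    """Return the redis-cli argv that asks the node at host:port to take over."""
    # CLUSTER FAILOVER must be sent to a replica, here the old master
    # that rejoined the cluster as a slave.
    return ["redis-cli", "-h", host, "-p", str(port), "CLUSTER", "FAILOVER"]


def run_failover(host: str, port: int) -> str:
    """Run the command and return redis-cli's reply (needs redis-cli installed)."""
    result = subprocess.run(
        build_failover_command(host, port),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Only print the command here; call run_failover() against a real cluster.
print(" ".join(build_failover_command("192.168.56.102", 7000)))
```

Running `redis-cli -h <host> -p <port> cluster nodes` afterwards should show whether the roles have switched back.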

Related

Redis cluster failover: slave won't become master

I am trying to test my software's behavior during cluster failover, and for that reason I want to configure the simplest possible cluster: one master and two slaves. I have three files, 7000.conf - 7002.conf, with the following content:
port 7000
cluster-config-file nodes.7000.conf
appendfilename appendonly.7000.aof
dbfilename dump.7000.rdb
pidfile /var/run/redis_7000.pid
include cluster.conf
The content of cluster.conf:
cluster-enabled yes
appendonly yes
maxclients 100
daemonize yes
cluster-node-timeout 2000
cluster-slave-validity-factor 0
I then configured 7000 to serve all slots from 0 to 16383, with 7001 and 7002 as replicas of 7000:
XXX 127.0.0.1:7002 slave YYY 0 1511389011347 4 connected
YYY 127.0.0.1:7000 myself,master - 0 0 4 connected 0-16383
ZZZ 127.0.0.1:7001 slave YYY 0 1511389011246 4 connected
Then I try to get rid of 7000, either via the shutdown command or by killing the process. One of the slaves should promote itself to master, but none does:
ZZZ 127.0.0.1:7001 slave YYY 0 0 3 connected
YYY 127.0.0.1:7000 master,fail? - 1511389104442 1511389103933 4 disconnected 0-16383
XXX 127.0.0.1:7002 myself,slave YYY 0 1511389116543 4 connected
I've waited for several minutes, and my slaves do not want to become master. If I force a slave to become master via cluster failover takeover, it's more than happy to do so (and if I restart the master, it becomes a slave), but it never happens automatically.
I've tried playing with cluster-node-timeout, but it does not help.
Am I doing something wrong? The Redis version is 3.2.11.
The issue is that a Redis Cluster needs at least 3 masters for automatic failover to work. The master nodes watch each other and detect failures, so with a single master in the cluster no running process is able to detect that your one master is down. The minimum of three ensures that when any one node goes down, a majority of the masters can still agree on the failure: with 3 masters, more than half of them remain available to reach a majority view.
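The majority argument can be checked with a few lines of Python. This is only a sketch of the arithmetic, not of Redis internals, and the function names are made up for illustration.

```python
def majority(total_masters: int) -> int:
    """Smallest number of masters that forms a strict majority."""
    return total_masters // 2 + 1


def failover_possible(total_masters: int, failed_masters: int) -> bool:
    """True if the surviving masters can still reach a majority vote."""
    reachable = total_masters - failed_masters
    return reachable >= majority(total_masters)


# One master down in clusters of 1, 2 and 3 masters.
for total in (1, 2, 3):
    print(total, "master(s), one down ->", failover_possible(total, 1))
```

With one master down, this prints False for clusters of 1 and 2 masters and True for 3. That matches the single-master setup in the question never promoting a slave on its own.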
The Redis-cluster tutorial mentions this in the following section: https://redis.io/topics/cluster-tutorial#creating-and-using-a-redis-cluster
"Note that the minimal cluster that works as expected requires to contain at least three master nodes."
Please note that even with 3 masters, automatic failover is not guaranteed if the nodes are laid out and fail as shown below (M = master, S = slave):
Node-1: M1 S3
Node-2: M2 S1
Node-3: M3 S2
Now if Node-3 fails, the slave S3 on Node-1 is automatically promoted to master. All is well, and after Node-3 recovers the status is as follows:
Node-1: M1 M3 <----- Note that Node-1 now hosts 2 masters, since S3 became M3 in the previous step.
Node-2: M2 S1
Node-3: S3 S2 <----- Note that the redis-server that was M3 before came back up as a slave.
Now you might think the cluster will continue to handle failures easily, since there are still 3 masters in this setup. However, if Node-1 fails, the cluster is DOWN because the quorum is no longer satisfied, and it does not come back up unless we make some manual adjustments.
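The two layouts above can be compared with a short Python sketch. It simply counts how many masters remain reachable when one host goes away; the dictionaries mirror the Node-1..Node-3 tables, and the function names are hypothetical.

```python
from typing import Dict, List

Layout = Dict[str, List[str]]


def count_masters(layout: Layout, skip_host: str = "") -> int:
    """Count instances named M* on every host except skip_host."""
    return sum(
        1
        for host, roles in layout.items()
        if host != skip_host
        for role in roles
        if role.startswith("M")
    )


def survives_host_failure(layout: Layout, failed_host: str) -> bool:
    """True if a majority of masters is still reachable after failed_host dies."""
    total = count_masters(layout)
    return count_masters(layout, failed_host) >= total // 2 + 1


initial = {"Node-1": ["M1", "S3"], "Node-2": ["M2", "S1"], "Node-3": ["M3", "S2"]}
after_failover = {"Node-1": ["M1", "M3"], "Node-2": ["M2", "S1"], "Node-3": ["S3", "S2"]}

for name, layout in (("initial", initial), ("after failover", after_failover)):
    for host in layout:
        print(name, host, "down ->", survives_host_failure(layout, host))
```

In the initial layout every single-host failure leaves 2 of 3 masters reachable. After the failover, losing Node-1 leaves only 1 of 3, which is why moving a master back to Node-3 (for example with a manual failover) is worth doing once it recovers.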
Hope this helps.

Reconnect Shutdown Redis Instance back to Cluster

Given a redis cluster with six nodes (3M/3S) on ports 7000-7005 with master nodes on ports 7000-7002 and slave nodes on the rest, master node 7000 is shut down, so node 7003 becomes the new master:
$ redis-cli -p 7003 cluster nodes
2a23385e94f8a27e54ac3b89ed3cabe394826111 127.0.0.1:7004 slave 1108ef4cf01ace085b6d0f8fd5ce5021db86bdc7 0 1452648964358 5 connected
5799de96ff71e9e49fd58691ce4b42c07d2a0ede 127.0.0.1:7000 master,fail - 1452648178668 1452648177319 1 disconnected
dad18a1628ded44369c924786f3c920fc83b59c6 127.0.0.1:7002 master - 0 1452648964881 3 connected 10923-16383
dfcb7b6cd920c074cafee643d2c631b3c81402a5 127.0.0.1:7003 myself,master - 0 0 7 connected 0-5460
1108ef4cf01ace085b6d0f8fd5ce5021db86bdc7 127.0.0.1:7001 master - 0 1452648965403 2 connected 5461-10922
bf60041a282929cf94a4c9eaa203a381ff6ffc33 127.0.0.1:7005 slave dad18a1628ded44369c924786f3c920fc83b59c6 0 1452648965926 6 connected
How does one go about [automatically] reconnecting/restarting node 7000 as a slave instance of 7003?
"Redis Cluster: Re-adding a failed over node" has a detailed explanation of what happens.
Basically, when the node is restarted it becomes a slave of the former slave (which is now a master) that replaced it during the failover.
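To confirm which node is marked as failed and which node now owns its slots, the cluster nodes output can be parsed. Below is a minimal Python sketch, assuming the whitespace-separated field order shown in the question (id, address, flags, master id, ping, pong, epoch, link state, slots). The Node type and parse_cluster_nodes are illustrative names, and SAMPLE is an excerpt of the output above.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    node_id: str
    address: str
    flags: List[str]
    master_id: str
    slots: List[str]


def parse_cluster_nodes(text: str) -> List[Node]:
    """Parse the text returned by `redis-cli cluster nodes` into Node records."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip anything that is not a complete node line
        nodes.append(Node(
            node_id=fields[0],
            address=fields[1].split("@")[0],  # newer versions append @cluster-port
            flags=fields[2].split(","),
            master_id=fields[3],
            slots=fields[8:],
        ))
    return nodes


SAMPLE = """
5799de96ff71e9e49fd58691ce4b42c07d2a0ede 127.0.0.1:7000 master,fail - 1452648178668 1452648177319 1 disconnected
dfcb7b6cd920c074cafee643d2c631b3c81402a5 127.0.0.1:7003 myself,master - 0 0 7 connected 0-5460
bf60041a282929cf94a4c9eaa203a381ff6ffc33 127.0.0.1:7005 slave dad18a1628ded44369c924786f3c920fc83b59c6 0 1452648965926 6 connected
"""

nodes = parse_cluster_nodes(SAMPLE)
failed = [n.address for n in nodes if "fail" in n.flags]
owners = {n.address: n.slots for n in nodes if "master" in n.flags and n.slots}
print("failed:", failed)
print("slot owners:", owners)
```

Once 7000 is started again with its old nodes.conf, re-running the same check should show it with the slave flag and the id of 7003 in the master column.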
Have you seen the Redis Sentinel Documentation?
Redis Sentinel provides high availability for Redis. In practical
terms this means that using Sentinel you can create a Redis deployment
that resists without human intervention to certain kind of failures.

Redis Cluster: No automatic failover for master failure

I am trying to implement a Redis cluster with 6 machines.
I have a vagrant cluster of six machines:
192.168.56.101
192.168.56.102
192.168.56.103
192.168.56.104
192.168.56.105
192.168.56.106
All of them are running redis-server.
I edited the /etc/redis/redis.conf file on all of the above servers, adding this:
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
cluster-slave-validity-factor 0
appendonly yes
I then ran this on one of the six machines:
./redis-trib.rb create --replicas 1 192.168.56.101:6379 192.168.56.102:6379 192.168.56.103:6379 192.168.56.104:6379 192.168.56.105:6379 192.168.56.106:6379
A Redis cluster is up and running. I checked manually: a value set on one machine shows up on the other machines.
$ redis-cli -p 6379 cluster nodes
3c6ffdddfec4e726f29d06a6da550f94d976f859 192.168.56.105:6379 master - 0 1450088598212 5 connected
47d04bc98ab42fc793f9f382855e5c54ab8f2e20 192.168.56.102:6379 slave caf2cec45114dc8f4cbc6d96c6dbb20b62a39f90 0 1450088598716 7 connected
040d4bb6a00569fc44eec05440a5fe0796952ccf 192.168.56.101:6379 myself,slave 5318e48e9ef0fc68d2dc723a336b791fc43e23c8 0 0 4 connected
caf2cec45114dc8f4cbc6d96c6dbb20b62a39f90 192.168.56.104:6379 master - 0 1450088599720 7 connected 0-10922
d78293d0821de3ab3d2bca82b24525e976e7ab63 192.168.56.106:6379 slave 5318e48e9ef0fc68d2dc723a336b791fc43e23c8 0 1450088599316 8 connected
5318e48e9ef0fc68d2dc723a336b791fc43e23c8 192.168.56.103:6379 master - 0 1450088599218 8 connected 10923-16383
My problem is that when I shut down or stop redis-server on any machine that is a master, the whole cluster goes down, but if all three slaves die the cluster still works properly.
What should I do so that a slave becomes a master if a master fails (fault tolerance)?
I am under the assumption that Redis handles all of this and that I need not worry about it after deploying the cluster. Am I right, or do I have to do things myself?
Another question: let's say I have six machines with 16GB of RAM each. How much total data would I be able to handle on this Redis cluster with three masters and three slaves?
Thank you.
The setting cluster-slave-validity-factor 0 may be the culprit here.
From redis.conf:
# A slave of a failing master will avoid to start a failover if its data
# looks too old.
In your setup, the slave of the terminated master considers itself unfit to be elected master, since the time elapsed since it last contacted the master is greater than the computed value of:
(node-timeout * slave-validity-factor) + repl-ping-slave-period
Therefore, even with a redundant slave, the cluster state is changed to DOWN and becomes unavailable.
You can try a different value, for example the suggested default:
cluster-slave-validity-factor 10
This will ensure that the cluster is able to tolerate the failure of one random Redis instance (either a slave or a master).
For your second question: six machines with 16GB of RAM each can function as a Redis Cluster of 3 master instances and 3 slave instances, so the theoretical maximum is 16GB x 3 of data. Such a cluster can tolerate a maximum of ONE node failure if cluster-require-full-coverage is turned on; otherwise it may still be able to serve data from the shards on the instances that are still functioning.
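The capacity estimate is simple arithmetic. The Python sketch below computes an upper bound only: it ignores OS overhead, replication buffers and the extra memory used while persisting, and the function name is made up for illustration.

```python
def max_dataset_gb(machines: int, ram_gb: int, replicas_per_master: int) -> int:
    """Upper bound on the dataset size: only masters hold distinct data."""
    masters = machines // (1 + replicas_per_master)
    return masters * ram_gb


# Six 16GB machines with one replica per master -> 3 masters.
print(max_dataset_gb(6, 16, 1))  # 48
```

In practice you would plan for noticeably less than 48GB, leaving headroom on each master.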

Redis - configure sentinel to elect slave if master shutdown

Hi, I have created a Redis cluster with Sentinel on 3 AWS instances. I configured Sentinel to get an HA Redis setup and it works, but if I simulate a crash of the master (by shutting down the master instance), the sentinels installed on the slaves cannot locate the sentinel on the master and the election fails.
My sentinel configuration is:
sentinel monitor master ip-master 6379 2
sentinel down-after-milliseconds master 5000
sentinel failover-timeout master 10000
sentinel parallel-syncs master 1
The same file is used on all instances.
There are issues when running Sentinel on the same node as the master and attempting to trigger a failover. Try it without running Sentinel on the master. Ultimately this means not running Sentinel on the same nodes as the Redis instances.
In your case your dead-node simulation is showing why you should not run Sentinel on the same node as Redis: If the node dies you lose one of your sentinels. In theory it should still work but as you and others have seen it isn't certain to work. I have some theories why but I've not yet confirmed them.
In a sense Sentinel is partly a monitoring system. Running a monitoring solution on the same nodes that are being monitored is generally inadvisable, so you should be using off-node sentinels anyway. As Sentinel is resource efficient, you don't necessarily need dedicated machines or large VMs. Indeed, if you have a static set of application servers (where your client code runs), you should run Sentinel there, keeping in mind that you want a minimum of 3 sentinels and a quorum of 50%+1.
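To see why losing a sentinel along with the master matters, the rule can be written out. In Sentinel, the quorum is the number of sentinels that must agree the master is down, and the failover itself also needs authorization from a majority of all known sentinels. The Python sketch below models only that arithmetic; the function name is an illustrative assumption.

```python
def sentinel_failover_possible(total: int, alive: int, quorum: int) -> bool:
    """True if enough sentinels are alive to detect the failure and authorize a failover."""
    majority = total // 2 + 1
    return alive >= quorum and alive >= majority


# 3 sentinels with quorum 2: one is lost together with the master's host.
print(sentinel_failover_possible(total=3, alive=2, quorum=2))
# 3 sentinels with quorum 2: only one sentinel is still reachable.
print(sentinel_failover_possible(total=3, alive=1, quorum=2))
# 2 sentinels with quorum 1: the majority of 2 cannot be reached by 1.
print(sentinel_failover_possible(total=2, alive=1, quorum=1))
```

The first case prints True and the other two print False. So a 3-sentinel deployment should in principle survive losing the sentinel that runs next to the master, while any further loss (or a network problem between the remaining two) blocks the election.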
Recent Redis versions introduced the "protected-mode" option, which defaults to yes.
With protected-mode set to yes, Redis instances without a password set will not allow remote clients to execute commands.
This also affects the sentinels' master election.
Try setting "protected-mode no" in the sentinels. This will allow them to talk to each other.
If you don't want to set protected-mode to no, you'd better set masterauth myredis in redis.conf and use sentinel auth-pass mymaster myredis in sentinel.conf.

Redis Sentinel for Windows

I'm successfully using Redis for Windows (2.6.8-pre2) in a master slave setup. However, I need to provide some automated failover capability, and it appears the sentinel is the most popular choice. When I run redis in sentinel mode the sentinel connects, but it always thinks the master is down. Also, when I run the sentinel master command it reports that there are 0 slaves (not true) and that there are no other sentinels (again, not true). So it's like it connects to the master, but not correctly.
Has anyone else seen this issue on Windows and, more importantly, is anyone successfully using sentinel in a windows environment? Any help or direction at all is appreciated!
I recommend using this:
1 master node redis server
1 slave node redis server
3 redis sentinels with a quorum of 2
It is important to have at least 3 sentinels, and an odd number of them, so that a majority can be reached.
I set up this configuration on Windows 7 and it's working well.
Example of sentinel conf:
port 20001
logfile "sentinel1.log"
sentinel monitor shard1 127.0.0.1 16379 2
sentinel down-after-milliseconds shard1 5000
sentinel failover-timeout shard1 30000