I have a small cluster of Redis machines: 3 hosts with Sentinels on them, one master and 2 slaves. Every time we patch and reboot the machines, some of the instances do not come back up after the reboot, and Sentinel shows their status as sdown.
In the Sentinel log, these messages keep popping up:
9932:X 18 Jul 13:46:47.357 # Next failover delay: I will not start a failover before Wed Jul 18 13:52:47 2018
9932:X 18 Jul 13:46:47.485 # +new-epoch 9602
9932:X 18 Jul 13:46:47.485 # +try-failover master Redis 10.208.202.112 6204
9932:X 18 Jul 13:46:47.487 # +vote-for-leader eb84f5a615bc02d9de4674dd33136fcde3f318dd 9602
9932:X 18 Jul 13:46:47.491 # daf5bf045ca5733de9fdd7f0206aa225d993c100 voted for eb84f5a615bc02d9de4674dd33136fcde3f318dd 9602
9932:X 18 Jul 13:46:47.492 # 4009be0fbc29c385eb68a6492c8c94ce1b61e31b voted for eb84f5a615bc02d9de4674dd33136fcde3f318dd 9602
9932:X 18 Jul 13:46:47.559 # +elected-leader master Redis 10.208.202.102 6204
9932:X 18 Jul 13:46:47.559 # +failover-state-select-slave master Redis 10.208.202.112 6204
9932:X 18 Jul 13:46:47.630 # -failover-abort-no-good-slave master Redis 10.208.202.112 6204
Sentinel is unable to elect a master after the reboot, and these messages keep coming up again and again, even though all the instances are up and running.
The Sentinels are on the same machines as the Redis master and slaves.
When the issue appears, I restart the Redis instances and Sentinels on all machines, and then everything goes back to normal.
Does anyone know what I can do to make it work properly after a reboot? A normal failover, e.g. shutting down the master, works fine: a new master gets elected and everything runs smoothly. I don't particularly care which host the master ends up on; it can fail over anytime it wants. I just need it to work after a machine reboot.
It looks like the reason was that I did not put the masterauth parameter in the master's config, just requirepass; it seems the master's config needs both, since after a failover the old master rejoins as a slave and must authenticate to the new master.
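For reference, a minimal sketch of the relevant lines in each node's redis.conf (the password value is a placeholder); setting masterauth on the master as well means it can authenticate to the new master once a failover demotes it to a slave:
requirepass yourpassword
masterauth yourpassword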
I have 3 EC2 instances with Redis running like this:
Server 001: 10.0.1.203, Port: 6379
Server 002: 10.0.1.202, Port: 6380
Server 003: 10.0.1.190, Port: 6381
Config file for each one:
# bind 127.0.0.1
protected-mode no
port PORT
pidfile /var/run/redis_PORT.pid
cluster-enabled yes
cluster-config-file nodes-PORT.conf
cluster-node-timeout 15000
I can connect via redis-cli to each one on each server.
But when I run the cluster creation, the command never finishes on Server 001.
root@ip-10-0-1-203:~/redis-stable# redis-cli --cluster create 10.0.1.203:6379 10.0.1.202:6380 10.0.1.190:6381
>>> Performing hash slots allocation on 3 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
M: 4c0b7609e5d906ff58d67ab446bbd9e20833e0db 10.0.1.203:6379
slots:[0-5460] (5461 slots) master
M: a5dbd72815a1875b58a0cc0fd6a52dc0b76735b7 10.0.1.202:6380
slots:[5461-10922] (5462 slots) master
M: 14d39c0876a982cadd50f301a3d35715171279c0 10.0.1.190:6381
slots:[10923-16383] (5461 slots) master
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
........................................
Server 002 logs:
44119:M 02 Nov 2020 13:30:03.477 * Ready to accept connections
44119:M 02 Nov 2020 13:30:45.362 # configEpoch set to 0 via CLUSTER RESET HARD
44119:M 02 Nov 2020 13:30:45.362 * Node hard reset, now I'm a5dbd72815a1875b58a0cc0fd6a52dc0b76735b7
44119:M 02 Nov 2020 13:30:59.352 # configEpoch set to 2 via CLUSTER SET-CONFIG-EPOCH
Server 003 logs:
44033:M 02 Nov 2020 13:30:50.695 # configEpoch set to 0 via CLUSTER RESET HARD
44033:M 02 Nov 2020 13:30:50.695 * Node hard reset, now I'm 14d39c0876a982cadd50f301a3d35715171279c0
44033:M 02 Nov 2020 13:30:59.346 # configEpoch set to 3 via CLUSTER SET-CONFIG-EPOCH
Am I missing something in the configuration?
Probably the Redis Cluster bus port is not accessible on the EC2 instances.
From the Redis Cluster Specification:
Every Redis Cluster node has an additional TCP port for receiving incoming connections from other Redis Cluster nodes. This port is at a fixed offset from the normal TCP port used to receive incoming connections from clients. To obtain the Redis Cluster port, 10000 should be added to the normal commands port. For example, if a Redis node is listening for client connections on port 6379, the Cluster bus port 16379 will also be opened.
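In this setup the nodes listen on ports 6379, 6380, and 6381, so the cluster bus ports 16379, 16380, and 16381 must also be reachable between the instances. As a minimal sketch, assuming the three instances share a security group (the group ID below is a placeholder), the bus ports could be opened with the AWS CLI:
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 16379-16381 \
    --cidr 10.0.1.0/24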
I am trying to create a simple redis high availability setup with 1 master, 1 slave and 2 sentinels.
The setup works perfectly when failing over from redis-master to redis-slave.
When redis-master recovers, it correctly registers itself as a slave of the new master (redis-slave).
However, when redis-slave goes down while acting as master, redis-master cannot take over as master again. The redis-master log goes into a loop showing:
1:S 12 Dec 11:12:35.073 * MASTER <-> SLAVE sync started
1:S 12 Dec 11:12:35.073 * Non blocking connect for SYNC fired the event.
1:S 12 Dec 11:12:35.074 * Master replied to PING, replication can continue...
1:S 12 Dec 11:12:35.075 * Trying a partial resynchronization (request 684581a36d134a6d50f1cea32820004a5ccf3b2d:285273).
1:S 12 Dec 11:12:35.076 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 12 Dec 11:12:36.081 * Connecting to MASTER 10.102.1.92:6379
1:S 12 Dec 11:12:36.081 * MASTER <-> SLAVE sync started
1:S 12 Dec 11:12:36.082 * Non blocking connect for SYNC fired the event.
1:S 12 Dec 11:12:36.082 * Master replied to PING, replication can continue...
1:S 12 Dec 11:12:36.083 * Trying a partial resynchronization (request 684581a36d134a6d50f1cea32820004a5ccf3b2d:285273).
1:S 12 Dec 11:12:36.084 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 12 Dec 11:12:37.087 * Connecting to MASTER 10.102.1.92:6379
1:S 12 Dec 11:12:37.088 * MASTER <-> SLAVE sync started
...
The Replication doc states that:
Since Redis 4.0, when an instance is promoted to master after a
failover, it will be still able to perform a partial resynchronization
with the slaves of the old master.
But the log seems to show otherwise. A more detailed version of the log, showing both the first redis-master to redis-slave failover and the subsequent redis-slave to redis-master attempt, is available here.
Any idea what's going on? What do I have to do to allow redis-master to return to the master role? Configuration details are provided below:
SERVICES
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
redis-master ClusterIP 10.102.1.92 <none> 6379/TCP 11m
redis-slave ClusterIP 10.107.0.73 <none> 6379/TCP 11m
redis-sentinel ClusterIP 10.110.128.95 <none> 26379/TCP 11m
redis-master config
requirepass test1234
masterauth test1234
dir /data
tcp-keepalive 60
maxmemory-policy noeviction
appendonly no
bind 0.0.0.0
save 900 1
save 300 10
save 60 10000
slave-announce-ip redis-master.fp8-cache
slave-announce-port 6379
redis-slave config
requirepass test1234
slaveof redis-master.fp8-cache 6379
masterauth test1234
dir /data
tcp-keepalive 60
maxmemory-policy noeviction
appendonly no
bind 0.0.0.0
save 900 1
save 300 10
save 60 10000
slave-announce-ip redis-slave.fp8-cache
slave-announce-port 6379
It turns out that the problem was related to the use of hostnames instead of IPs:
slaveof redis-master.fp8-cache 6379
...
slave-announce-ip redis-slave.fp8-cache
So, when the master came back as a slave, Sentinel showed that there were now 2 slaves: one with an IP address and another with a hostname. I'm not sure exactly how these 2 slave entries (which point to the same Redis server) cause the problem above, but now that I have changed the config to use IP addresses instead of hostnames, the Redis HA setup is working flawlessly. A sketch of the corrected lines is shown below.
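As a sketch, using the ClusterIP addresses from the service list above (addresses specific to this deployment, not fixed values), the redis-slave config becomes:
slaveof 10.102.1.92 6379
slave-announce-ip 10.107.0.73
slave-announce-port 6379
with the matching change (slave-announce-ip 10.102.1.92) in the redis-master config.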
I'm trying to set up a typical Redis Sentinel configuration, with three machines that will run three Redis servers and three Redis Sentinels. The master/slave part of the Redis servers is working OK, but the Sentinels are not. When I start two Sentinels, the Sentinel on the master's machine detects the slaves, but marks them as down after the specified amount of time. I'm running Redis 3.0.5 64-bit on Debian Jessie machines.
8319:X 22 Dec 14:06:17.855 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
8319:X 22 Dec 14:06:17.855 # Sentinel runid is cdd5bbd5b84c876982dbca9d45ecc4bf8500e7a2
8319:X 22 Dec 14:06:17.855 # +monitor master mymaster xxxxxxxx0 6379 quorum 2
8319:X 22 Dec 14:06:18.857 * +slave slave xxxxxxxx2:6379 xxxxxxx2 6379 # mymaster xxxxxxx0 6379
8319:X 22 Dec 14:06:18.858 * +slave slave xxxxxx1:6380 xxxxxxx1 6380 # mymaster xxxxxxx0 6379
8319:X 22 Dec 14:07:18.862 # +sdown slave xxxxxxxx1:6380 xxxxxxx1 6380 # mymaster xxxxxx0 6379
8319:X 22 Dec 14:07:18.862 # +sdown slave xxxxxx2:6379 xxxxxxx2 6379 # mymaster xxxxxx0 6379
Sentinel config file:
daemonize yes
pidfile "/var/run/redis/redis-sentinel.pid"
logfile "/var/log/redis/redis-sentinel.log"
bind 127.0.0.1 xxxxxxx0
port 26379
sentinel monitor mymaster xxxxxxx0 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
dir "/var/lib/redis"
Of course, there is connectivity between these machines, as the slaves are working OK:
7553:S 22 Dec 13:46:33.285 * Connecting to MASTER xxxxxxxx0:6379
7553:S 22 Dec 13:46:33.286 * MASTER <-> SLAVE sync started
7553:S 22 Dec 13:46:33.286 * Non blocking connect for SYNC fired the event.
7553:S 22 Dec 13:46:33.287 * Master replied to PING, replication can continue...
7553:S 22 Dec 13:46:33.288 * Partial resynchronization not possible (no cached master)
7553:S 22 Dec 13:46:33.291 * Full resync from master: f637ca8fe003acd09c6d021aed3f89a0d9994c9b:98290
7553:S 22 Dec 13:46:33.350 * MASTER <-> SLAVE sync: receiving 18 bytes from master
7553:S 22 Dec 13:46:33.350 * MASTER <-> SLAVE sync: Flushing old data
7553:S 22 Dec 13:46:33.350 * MASTER <-> SLAVE sync: Loading DB in memory
7553:S 22 Dec 13:46:33.350 * MASTER <-> SLAVE sync: Finished with success
7553:S 22 Dec 14:01:33.072 * 1 changes in 900 seconds. Saving...
I can answer this myself. The problem was that the first IP in the bind directive of the sentinel conf was the localhost IP; the first address needs to be the binding IP that other machines connect to. Posting it in case it helps anyone; the corrected line is sketched below.
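A minimal sketch of the fix, with the addresses from the config above swapped so the binding IP comes first:
bind xxxxxxx0 127.0.0.1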
I have installed a Redis (v. 3.0.4) master-slave setup using 3 nodes (1 master and 2 slaves) with requirepass on each node, as described in https://www.digitalocean.com/community/tutorials/how-to-configure-a-redis-cluster-on-ubuntu-14-04, and then started a Sentinel on each of the 3 nodes as described in http://blog.commando.io/redis-is-easy-trivial-hard/
After I took down the master, Sentinel promoted one of the slaves to master as expected. When the old master came up again, it became a slave and recognized the new master; this could be seen in /etc/redis/sentinel.conf, which was updated with the new master IP in the 'sentinel monitor redis-cluster' line.
But I noticed that the old master, despite knowing the new master's IP, considers the new master as down, unlike the other slave, which sees it as up. This can be checked by running this command against the old master:
$redis-cli -a altoros info replication
# Replication
role:slave
master_host:<new master ip>
master_port:6379
master_link_status:down
This also seems to be causing the error "MASTERDOWN Link with MASTER is down and slave-serve-stale-data is set to 'no'" when trying to use a synchronous client to test data replication across the nodes.
The log of the old master (/var/log/redis/redis-server.log) shows:
20731:S 09 Nov 10:16:31.117 * Connecting to MASTER <new master ip>:6379
20731:S 09 Nov 10:16:31.117 * MASTER <-> SLAVE sync started
20731:S 09 Nov 10:16:31.118 * Non blocking connect for SYNC fired the event.
20731:S 09 Nov 10:16:31.118 * Master replied to PING, replication can continue...
20731:S 09 Nov 10:16:31.119 * (Non critical) Master does not understand REPLCONF listening-port: -NOAUTH Authentication required.
20731:S 09 Nov 10:16:31.119 * (Non critical) Master does not understand REPLCONF capa: -NOAUTH Authentication required.
It looks like the old master cannot authenticate to the new master because it doesn't have its password, but how do I set that properly?
I noticed that /etc/redis/redis.conf did not change after the new master was promoted, unlike /etc/redis/sentinel.conf, and this could be the cause, as the old master's redis.conf doesn't contain the password of the new master.
I would appreciate any hint to resolve the issue, thanks in advance.
The master needs to be configured just like a slave because it might become one someday. As such, you need to set its masterauth to the password for the pod.
You can do this without restarting, by running the following against the "old master":
redis-cli -h oldmasterip -a thepassword config set masterauth thepassword
redis-cli -h oldmasterip -a thepassword config rewrite
It should be fine from that point on, and the config file will be updated.
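A quick check (not part of the original answer) to confirm the rewrite took effect:
redis-cli -h oldmasterip -a thepassword config get masterauth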
For now, we're experimenting with Redis 2.8.7 as cache storage (from a .NET web application using the BookSleeve client).
It is a very interesting and exciting task; the Redis documentation is very nice, but due to a lack of practical experience I have a couple of questions about how the configuration should be done properly.
I took the following articles as my main configuration sources:
Installing redis with autostart capability (using an init script, so that after a restart everything will start again properly) : http://redis.io/topics/quickstart
Deployment of the redis into azure: http://haishibai.blogspot.com/2014/01/walkthrough-setting-up-redis-cluster-on.html
The initial idea/assumption is to have 1 Redis master and 2 slave instances running on Ubuntu Linux. In order to provide high availability of the instances, I've decided to use Sentinel. So my expected configuration looks like this at the moment:
MasterInstance: VM1 (Linux, Ubuntu), port: 6379 (autostarted when Linux is restarted)
Slave1: VM2 (Linux, Ubuntu), port: 6380 (autostarted when Linux is restarted): slaveof MasterIP 6379
Slave2: VM3 (Linux, Ubuntu), port: 6379 (autostarted when Linux is restarted): slaveof MasterIP 6379
After the VMs started, I can see that the 2 slaves are successfully connected to and syncing with the master:
Trace sample from the master:
[1120] 25 Mar 14:11:18.629 - 1 clients connected (0 slaves), 793352 bytes in use
[1120] 25 Mar 14:11:18.634 * Slave asks for synchronization
[1120] 25 Mar 14:11:18.634 * Full resync requested by slave.
[1120] 25 Mar 14:11:18.634 * Starting BGSAVE for SYNC
[1120] 25 Mar 14:11:18.634 * Background saving started by pid 1227
[1227] 25 Mar 14:11:18.810 * DB saved on disk
[1227] 25 Mar 14:11:18.810 * RDB: 0 MB of memory used by copy-on-write
[1120] 25 Mar 14:11:18.836 * Background saving terminated with success
[1120] 25 Mar 14:11:18.837 * Synchronization with slave succeeded
[1120] 25 Mar 14:11:23.829 - DB 0: 2 keys (0 volatile) in 4 slots HT.
[1120] 25 Mar 14:11:23.829 - DB 2: 4 keys (0 volatile) in 4 slots HT.
[1120] 25 Mar 14:11:23.829 - 0 clients connected (1 slaves), 1841992 bytes in use
[1120] 25 Mar 14:11:29.011 - DB 0: 2 keys (0 volatile) in 4 slots HT.
[1120] 25 Mar 14:11:29.011 - DB 2: 4 keys (0 volatile) in 4 slots HT.
[1120] 25 Mar 14:11:29.011 - 0 clients connected (1 slaves), 1841992 bytes in use
[1120] 25 Mar 14:11:29.826 - Accepted 168.62.36.189:1024
[1120] 25 Mar 14:11:29.828 * Slave asks for synchronization
[1120] 25 Mar 14:11:29.828 * Full resync requested by slave.
[1120] 25 Mar 14:11:29.828 * Starting BGSAVE for SYNC
[1120] 25 Mar 14:11:29.828 * Background saving started by pid 1321
[1321] 25 Mar 14:11:29.871 * DB saved on disk
[1321] 25 Mar 14:11:29.871 * RDB: 0 MB of memory used by copy-on-write
[1120] 25 Mar 14:11:29.943 * Background saving terminated with success
[1120] 25 Mar 14:11:29.946 * Synchronization with slave succeeded
[1120] 25 Mar 14:11:34.195 - DB 0: 2 keys (0 volatile) in 4 slots HT.
[1120] 25 Mar 14:11:34.195 - DB 2: 4 keys (0 volatile) in 4 slots HT.
[1120] 25 Mar 14:11:34.195 - 0 clients connected (2 slaves), 1862920 bytes in use
Now I need to set up the Sentinel instances.
I copied sentinel.conf from the redis-stable package onto the 3 VMs running Redis (the master and both slaves).
Inside each config I made the following modification:
sentinel monitor mymaster MasterPublicIP 6379 2
On each VM I started Sentinel using the following command line:
redis-server /etc/redis/sentinel.conf --sentinel
After that, I got a response that Sentinel started successfully on all VMs.
After I started all 3 Sentinel instances, I got the following trace (the sentinel.conf files were updated with information about the slaves and the other Sentinel instances):
[1743] 25 Mar 16:35:46.450 # Sentinel runid is 05380d689af9cca1e826ce9c85c2d68c65780878
[1743] 25 Mar 16:35:46.450 # +monitor master mymaster MasterIP 6379 quorum 2
[1743] 25 Mar 16:36:11.578 * -dup-sentinel master mymaster MasterIP 6379 #duplicate of 10.119.112.41:26379 or 83666bdd03fd064bcf2ec41ec2134d4e1e239842
[1743] 25 Mar 16:36:11.578 * +sentinel sentinel 10.119.112.41:26379 10.119.112.41 26379 # mymaster 168.62.41.1 6379
[1743] 25 Mar 16:36:16.468 # +sdown sentinel 10.175.220.134:26379 10.175.220.134 26379 # mymaster 168.62.41.1 6379
[1743] 25 Mar 16:36:40.876 * -dup-sentinel master mymaster MasterIP 6379 #duplicate of 10.175.220.134:26379 or fe9edeb321e04070c6ac6e28f52c05317a593ffd
[1743] 25 Mar 16:36:40.876 * +sentinel sentinel 10.175.220.134:26379 10.175.220.134 26379 # mymaster 168.62.41.1 6379
[1743] 25 Mar 16:37:10.962 # +sdown sentinel 10.175.220.134:26379 10.175.220.134 26379 # mymaster 168.62.41.1 6379
Based on the trace, I have the following questions. It would be great if someone could clarify them:
Why do I see -dup-sentinel master mymaster entries here? Is it because I added 3 Sentinels for the same master instance (maybe I need to register 1 Sentinel per Redis instance, so 1 Sentinel is mapped to the master and the 2 other Sentinels to the 2 slaves)?
How do I start the Sentinels the way the Redis servers are started (automatically, even when the VM is restarted)? Do I need to perform the same steps and register them as ordinary redis-server instances?
Is it OK to host a Sentinel instance on the same VM as a redis-server?
After that, I opened a new PuTTY connection and started redis-cli to work with the Sentinel API, but received the following response to my command:
127.0.0.1:6379> SENTINEL masters
(error) ERR unknown command 'SENTINEL'
I guess I've done something stupid here... :(
What I've done wrong and how to test sentinel APIs from the terminal connection?
Thank you in advance for any help.
I guess "SENTINEL masters" should be run on the Redis sentinel
redis-cli -p 26379 (which the default sentinel port)
then issue
127.0.0.1:26379> SENTINEL masters
and you will get something like:
1) "name"
2) "mymaster"
3) "ip"
4) "127.0.0.1"
5) "port"
6) "6379"
...
To start Sentinels automatically even when the VM is restarted:
first set daemonize yes in sentinel.conf,
then modify the init script here (https://github.com/antirez/redis/blob/unstable/utils/redis_init_script) to reflect the Sentinel port and .conf location:
$EXEC $CONF --sentinel # starting in Sentinel mode
The rest is the same as what you did for the Redis server.
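As a rough sketch, the variables at the top of that init script would be adapted along these lines (the paths below are assumptions for illustration, not values from the original answer):
REDISPORT=26379
EXEC=/usr/local/bin/redis-server
PIDFILE=/var/run/redis-sentinel.pid
CONF="/etc/redis/sentinel.conf"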
First, you don't run Sentinel on the master. Sentinel is designed to detect when the master fails. If you run Sentinel on the same system as the master, you will lose a Sentinel when you lose the system. For the same reasons you shouldn't use the slaves as your additional test points.
You want to run Sentinel from where the clients run - to ensure you are testing for network outages.
Next, you mention you added slave information to your sentinel configs. You don't configure slaves in sentinel - it discovers them through the master. I suspect you've added additional sentinel monitor commands for each slave - this would indeed cause duplicate monitoring attempts.
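As a sketch (MasterIP here is the same placeholder used in the question), each sentinel.conf should contain a single monitor line for the pod, no matter how many slaves exist:
sentinel monitor mymaster MasterIP 6379 2
Sentinel then discovers the slaves and the other Sentinels through this master on its own.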
Third, as @yofpro mentioned, to run Sentinel commands you need to connect to Sentinel, not the Redis master or slaves.