Redis master/slave replication stopped working

I'm trying to figure out how to troubleshoot my redis master / slave replication. It has "just stopped" working.
Setup Information
Let's say my master's IP address is 10.1.2.3
Here's what I've checked so far:
I've restarted Redis on both the master and the slave, but whenever I run INFO REPLICATION on the slave, it shows the link as "down".
Ran netstat -lnp on both the master and slave. Here's the output from the master:
masterdb:~# netstat -lnp | grep 6379
tcp 0 0 127.0.0.1:6379 0.0.0.0:* LISTEN 21611/redis-server
tcp 0 0 10.1.2.3:6379 0.0.0.0:* LISTEN 21611/redis-server
And from the slave machine:
slavedb:~# netstat -lnp | grep 6379
tcp 0 0 0.0.0.0:6379 0.0.0.0:* LISTEN 5577/redis-server
tcp 0 0 :::6379 :::* LISTEN 5577/redis-server
slavedb:~#
I've checked the logs on both the master and the slave and I don't see any error messages. But I see timeout messages on the slave... which I think I've seen before, even when replication was working. The log looks like this on the slave:
5577:S 26 Oct 13:17:19.510 * MASTER <-> SLAVE sync started
5577:S 26 Oct 13:18:20.597 # Timeout connecting to the MASTER...
5577:S 26 Oct 13:18:20.597 * Connecting to MASTER 10.1.2.3:6379
5577:S 26 Oct 13:18:20.597 * MASTER <-> SLAVE sync started
5577:S 26 Oct 13:19:21.685 # Timeout connecting to the MASTER...
When I start redis-cli on the slave and re-issue the slaveof command, I get this message:
127.0.0.1:6379> slaveof 10.1.2.3 6379
OK Already connected to specified master
127.0.0.1:6379>
I also tried the following commands on the master:
127.0.0.1:6379> save
OK
127.0.0.1:6379> bgsave
Background saving started
127.0.0.1:6379>
But that didn't resolve anything on the slave. It still says the link is down when I check INFO REPLICATION:
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.1.2.3
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:1
master_link_down_since_seconds:1477488462
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
127.0.0.1:6379>
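If you want to watch for this state from a script instead of reading the output by eye, the following is a minimal Python sketch that parses the raw INFO replication text and reports a down link. It assumes you already have the text (for example from redis-cli or a client library) without the redis-cli prompt lines; parse_info and replication_problem are hypothetical helper names, not part of any Redis library.

```python
def parse_info(text: str) -> dict:
    """Parse the text returned by INFO into a flat dict of field -> value.

    Section headers such as "# Replication" and blank lines are skipped.
    Values stay as strings because INFO mixes numbers, words and IPs.
    """
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first ":" only, so "master_host:10.1.2.3" stays intact.
        key, sep, value = line.partition(":")
        if sep:  # ignore anything that is not a key:value pair
            fields[key] = value
    return fields


def replication_problem(fields: dict):
    """Describe a replica-side link problem, or return None if the link is up."""
    if fields.get("role") != "slave":
        return None  # only replicas have a master link to check
    if fields.get("master_link_status") != "up":
        return (
            f"link to {fields.get('master_host')}:{fields.get('master_port')} is "
            f"{fields.get('master_link_status')}, "
            f"last I/O {fields.get('master_last_io_seconds_ago')}s ago"
        )
    return None
```

Fed the output above, replication_problem reports the link to 10.1.2.3:6379 as down with master_last_io_seconds_ago at -1, which is consistent with the slave never completing a connection to the master.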
I'm not sure what else to check.

This sounds like a networking issue. The next time it happens, try telnet masterip 6379 from the slave to confirm that it is a network issue.
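If telnet isn't installed on the slave, the same reachability test can be done with a few lines of Python. This is a minimal sketch using only the standard socket module; can_reach is a hypothetical helper name, and 10.1.2.3 is the master address from the question.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This answers roughly the same question as `telnet host port`: is the
    port reachable from this machine at all (routing, firewall, bind address)?
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, name lookup failure
        return False
```

Run on the slave, a False from can_reach("10.1.2.3", 6379) suggests routing, a firewall, or the master's bind address rather than Redis replication itself.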

Related

KeyDB replica master is connected, but not working

I want to create an active-active replication setup for KeyDB. I followed the official docs (https://docs.keydb.dev/docs/active-rep/), but I'm not getting the expected results, even though I'm not getting any errors.
Config A:
port 6379
requirepass mypassword123
masterauth mypassword123
active-replica yes
replicaof 10.0.11.205 6379
Config B:
port 6379
requirepass mypassword123
masterauth mypassword123
active-replica yes
replicaof 10.0.11.208 6379

redis sentinel not promoting +sdown to +odown

I set up a cluster of 3 redis-sentinel (3.2.6-1) instances on three redis-server (3.2.6-1) instances.
I checked the firewall for TCP ports 6379 and 26379, and both are open.
The configuration for my redis-sentinel is something like this:
port 26379
dir "/tmp"
sentinel myid 0559ec26112bebce70bbfa5849f77338453315b
sentinel monitor rback 10.3.0.43 6379 2
sentinel down-after-milliseconds rback 5000
sentinel failover-timeout rback 10000
daemonize yes
pidfile "/var/run/redis/redis-sentinel.pid"
loglevel notice
logfile "/var/log/redis/redis-sentinel.log"
When I start the redis-server and redis-sentinel instances, I can run sentinel master rback against port 26379 and see the options:
9) "flags"
10) "master"
...
31) "num-slaves"
32) "2"
33) "num-other-sentinels"
34) "2"
35) "quorum"
36) "2"
In the logs of the redis-sentinel, I see this:
26851:X 12 Jun 15:22:35.092 * +sentinel sentinel 4b22b6ff1b983432028f8cdb0db75cd553bec4b3 XXXXX 26379 # redis-back XXXXX 6379
26851:X 12 Jun 15:22:40.105 * +sentinel sentinel 8fc263bf82226364917478541c13f2c7f5b746e6 XXXXX 26379 # redis-back XXXXX 6379
26851:X 12 Jun 15:22:40.168 # +sdown sentinel 4b22b6ff1b983432028f8cdb0db75cd553bec4b3 XXXXX 26379 # redis-back XXXXX 6379
26851:X 12 Jun 15:22:45.120 # +sdown sentinel 8fc263bf82226364917478541c13f2c7f5b746e6 XXXXX 26379 # redis-back XXXXX 6379
If I run the sleep command or crash the master Redis, I see each sentinel log +sdown, but none of them ever promotes it to +odown or promotes a new master.
How can I debug this?
Thanks
Additional information:
I ran tcpdump and analyzed the traffic with Wireshark, and found that each sentinel connects to the other sentinels and tries to communicate with them, but receives "DENIED Redis is running in protected mode...", even though the redis-servers are not running in protected mode.
The problem is the communication between the sentinels.
Redis 3.2 added a "protected-mode" configuration flag, and it applies to sentinel.conf too.
A sentinel will receive the error message "DENIED Redis is running in protected mode..." from another sentinel if that sentinel's configuration doesn't set the flag (protected-mode no).
I found this information here:
https://newbiedba.wordpress.com/2016/07/01/redis-3-2-sentinel-with-protected-mode/
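To confirm the protected-mode denial without tcpdump, you can send a plain PING to each sentinel port from another host and look at the first reply line. This is a minimal Python sketch using the inline-command form of the Redis protocol; ping and is_protected_mode_denial are hypothetical names. It assumes the reply arrives before the peer drops the connection; if the peer resets the connection first, whatever arrived (possibly nothing) is returned.

```python
import socket


def ping(host: str, port: int, timeout: float = 3.0) -> str:
    """Send an inline PING and return the first reply line, decoded.

    A healthy Redis or Sentinel answers "+PONG". A node in protected mode
    answers with an error line starting "-DENIED". A timeout while waiting
    for the reply raises socket.timeout (an OSError subclass).
    """
    reply = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"PING\r\n")
        try:
            while not reply.endswith(b"\r\n"):
                chunk = sock.recv(4096)
                if not chunk:  # peer closed the connection
                    break
                reply += chunk
        except ConnectionResetError:
            pass  # peer reset the connection; keep whatever already arrived
    return reply.split(b"\r\n", 1)[0].decode("utf-8", errors="replace")


def is_protected_mode_denial(reply: str) -> bool:
    return reply.startswith("-DENIED")
```

Running ping(other_sentinel_ip, 26379) from each sentinel host should return "+PONG" once protected mode is no longer blocking the connection.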

Redis Sentinel failover configuration always receives +sdown

I'm testing redis failover with this simple setup:
3 Ubuntu Server 16.04 boxes
redis and redis-sentinel are configured on each box.
Master ip : 192.168.0.18
Resque ip : 192.168.0.16
Resque2 ip : 192.168.0.13
Data replication works well but I can't get failover to work.
When I start redis-sentinel I always get a +sdown message after 60 seconds:
14913:X 17 Jul 10:40:03.505 # +monitor master mymaster 192.168.0.18 6379 quorum 2
14913:X 17 Jul 10:41:03.525 # +sdown master mymaster 192.168.0.18 6379
This is the configuration file for redis-sentinel:
bind 192.168.0.18
port 16379
sentinel monitor mymaster 192.168.0.18 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 6000
loglevel verbose
logfile "/var/log/redis/sentinel.log"
repl-ping-slave-period 5
slave-serve-stale-data no
repl-backlog-size 8mb
min-slaves-to-write 1
min-slaves-max-lag 10
The bind directive uses the proper IP for each box.
I followed the redis tutorial here: https://redis.io/topics/sentinel but I can't get the failover to work.
Redis server version: 3.2.9
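The roughly 60-second gap between +monitor and +sdown in the log is consistent with down-after-milliseconds mymaster 60000 in the config: Sentinel waited the full timeout without getting a valid reply from the master before flagging it. Here is a small Python sketch for measuring such gaps from log lines; log_time and seconds_between are hypothetical helpers, and because these Redis log lines carry no year, a placeholder year is assumed.

```python
from datetime import datetime


def log_time(line: str) -> datetime:
    """Extract the timestamp from a Redis/Sentinel log line such as
    '14913:X 17 Jul 10:41:03.525 # +sdown master mymaster 192.168.0.18 6379'.

    The year is missing from these lines, so 2000 is used as a placeholder;
    that is fine for gaps between nearby lines (not across a year boundary).
    """
    _pid_role, day, month, clock = line.split()[:4]
    return datetime.strptime(f"2000 {day} {month} {clock}", "%Y %d %b %H:%M:%S.%f")


def seconds_between(first: str, second: str) -> float:
    """Seconds elapsed between two log lines."""
    return (log_time(second) - log_time(first)).total_seconds()
```

For the two lines above, seconds_between gives about 60.02, just over the configured 60000 ms.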
The issue is all about how redis-sentinel works: Sentinel cannot handle a password-protected redis-server.
In your redis-server configuration file (/etc/redis/redis.conf) do not use "requirepass" directive if you want to use redis-sentinel.

redis-cli redirected to 127.0.0.1

I started a Redis cluster on PC1, then connected to it from PC2. When the client needs to redirect to another cluster node, it shows Redirected to slot [7785] located at 127.0.0.1, but it should show Redirected to slot [7785] located at [IP of PC1, like 192.168.1.20]. Then it shows an error. What is happening, and what can I do?
The output:
[admin@localhost ~]$ redis-cli -c -h 192.168.1.20 -p 30001
192.168.1.20:30001> get foo
-> Redirected to slot [12182] located at 127.0.0.1:30003
Could not connect to Redis at 127.0.0.1:30003: Connection refused
Could not connect to Redis at 127.0.0.1:30003: Connection refused
not connected>
Output of redis-cli -h 192.168.1.20 -p 30001 cluster nodes:
5f6d6f1319318233917aba92b6ab0e244b3260d7 127.0.0.1:30004 slave 4c7b046ecaeb2dc689cbad21ee3466fb43b48fb9 0 1463984410573 4 connected
e04d5b461cb6a2b48cb2a607e2140b7c1d32af25 127.0.0.1:30006 slave 3fc25c3851f7a9afd09b60739434118c25cd9243 0 1463984410473 6 connected
3fc25c3851f7a9afd09b60739434118c25cd9243 127.0.0.1:30003 master - 0 1463984410573 3 connected 10923-16383
4c7b046ecaeb2dc689cbad21ee3466fb43b48fb9 127.0.0.1:30001 myself,master - 0 0 1 connected 0-5460
7383830ac84f199db346da3112b5aaf9e124d3cf 127.0.0.1:30005 slave 1eeeb51522aed364fcf9623d6045fa3df2748579 0 1463984410573 5 connected
1eeeb51522aed364fcf9623d6045fa3df2748579 127.0.0.1:30002 master - 0 1463984410473 2 connected 5461-10922
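Every node in this output advertises a 127.0.0.1 address, and that is the address clients on other hosts are told to use in redirects. Here is a minimal Python sketch that flags such nodes in CLUSTER NODES output; loopback_nodes is a hypothetical name, and it assumes the second field of each line is the node address in ip:port or ip:port@cport form.

```python
import ipaddress


def loopback_nodes(cluster_nodes: str) -> list:
    """Return (node_id, address) for every node advertised on a loopback IP.

    Each CLUSTER NODES line starts with: <id> <ip:port[@cport]> <flags> ...
    A loopback address is only reachable from the node's own host.
    """
    flagged = []
    for line in cluster_nodes.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or truncated line
        node_id, addr = fields[0], fields[1]
        # Strip an optional "@cport" suffix, then the trailing ":port".
        host = addr.split("@", 1)[0].rsplit(":", 1)[0]
        try:
            if ipaddress.ip_address(host).is_loopback:
                flagged.append((node_id, addr))
        except ValueError:
            pass  # not an IP literal (a hostname, or empty before the first MEET)
    return flagged
```

An empty result from every node is a reasonable sign that redirects will point PC2 at reachable addresses.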
Try binding your Redis cluster instance to the server's IP.
Update your redis.conf to add:
bind 172.31.28.76
PS: update the IP as required.
That is because all your Redis nodes have their IP addresses set to 127.0.0.1, and they believe the other nodes are located at 127.0.0.1 too. That isn't a problem when the nodes in a cluster only communicate with each other, but it is definitely wrong when a client on another host wants to learn about the cluster.
In that situation, your client asked a node for a key that node is not in charge of, and the node told the client to redirect to 127.0.0.1:30003. The client interpreted that as port 30003 on its own localhost, and of course found nothing there.
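For reference, the redirect mechanism works roughly like this: the client hashes the key to one of 16384 slots (CRC16 of the key, or of its {hash tag} if present, modulo 16384), and a node that doesn't own that slot answers with a MOVED error carrying the slot and the owner's address as the cluster knows it. Here is a Python sketch of both pieces; the function names are hypothetical, and parse_moved expects the error text in the form MOVED <slot> <host>:<port>, with or without the leading "-" of the wire format.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16, XMODEM variant (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Cluster hash slot for a key, honouring a non-empty {hash tag}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only the tag
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


def parse_moved(error: str):
    """Split 'MOVED <slot> <host>:<port>' into (slot, host, port)."""
    kind, slot, addr = error.lstrip("-").split()
    if kind != "MOVED":
        raise ValueError(f"not a MOVED redirect: {error!r}")
    host, port = addr.rsplit(":", 1)
    return int(slot), host, int(port)
```

key_slot(b"foo") gives 12182, matching the slot in the question's redirect, and parse_moved("MOVED 12182 127.0.0.1:30003") returns (12182, "127.0.0.1", 30003). That 127.0.0.1 is exactly the address a client on PC2 cannot use.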
To fix it, try sending CLUSTER MEET with the right IP to each Redis node in the cluster. Here is an experiment I ran:
# initial, Redis doesn't know its IP before a meet
127.0.0.1:7000> cluster nodes
8af9e47cb96f3bd8fff3800c38da11601157605d :7000 myself,master - 0 0 0 connected
# meet from 127.0.0.1, and their IP addresses updated to 127.0.0.1
127.0.0.1:7000> cluster meet 127.0.0.1 7001
OK
127.0.0.1:7000> cluster nodes
8af9e47cb96f3bd8fff3800c38da11601157605d 127.0.0.1:7000 myself,master - 0 0 0 connected
2c3d9b6c29f21ecd846f42bcfb238099d88b57df 127.0.0.1:7001 master - 0 1463987186714 1 connected
# send another meet, using the eth0 IP instead of lo
127.0.0.1:7000> cluster meet 172.31.28.76 7001
OK
127.0.0.1:7000> cluster nodes
8af9e47cb96f3bd8fff3800c38da11601157605d 127.0.0.1:7000 myself,master - 0 0 0 connected
2c3d9b6c29f21ecd846f42bcfb238099d88b57df 172.31.28.76:7001 master - 0 1463987192672 1 connected
# connect to :7001, its cluster nodes are what we expect
127.0.0.1:7001> cluster nodes
2c3d9b6c29f21ecd846f42bcfb238099d88b57df 172.31.28.76:7001 myself,master - 0 0 1 connected
8af9e47cb96f3bd8fff3800c38da11601157605d 172.31.28.76:7000 master - 0 1463987203631 0 connected
# send another meet to fix
127.0.0.1:7001> cluster meet 172.31.28.76 7000
OK
# back to :7000, its address updated
127.0.0.1:7000> cluster nodes
8af9e47cb96f3bd8fff3800c38da11601157605d 172.31.28.76:7000 myself,master - 0 0 0 connected
2c3d9b6c29f21ecd846f42bcfb238099d88b57df 172.31.28.76:7001 master - 0 1463987210539 1 connected
In your case, you may need to send several CLUSTER MEET commands to each node to make sure its IP is updated at all of its peers.
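With more than two nodes, the meets can be scripted. This Python sketch only builds redis-cli argument lists in which every node meets every other node at its real interface IP; meet_commands is a hypothetical helper, and it assumes each node accepts connections on the address you pass in.

```python
def meet_commands(nodes):
    """Build redis-cli argv lists so that every node MEETs every other node.

    `nodes` is a list of (ip, port) pairs using the externally reachable
    interface IP (for example 172.31.28.76), never 127.0.0.1.
    """
    commands = []
    for ip, port in nodes:
        for peer_ip, peer_port in nodes:
            if (peer_ip, peer_port) == (ip, port):
                continue  # a node does not need to meet itself
            commands.append(
                ["redis-cli", "-h", ip, "-p", str(port),
                 "cluster", "meet", peer_ip, str(peer_port)]
            )
    return commands
```

Each list can be run with subprocess.run(cmd, check=True); afterwards, CLUSTER NODES on every node should show the real IPs.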
You said you are running the Redis servers on PC1.
In that case, put PC1's IP address (192.168.1.20 in your case) in the bind option of each node's config file.
Example of a node config file for a cluster:
bind 192.168.1.20
port 6000
cluster-enabled yes
cluster-config-file "nodes.conf"
cluster-node-timeout 5000
appendonly yes
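If you run several nodes on PC1, you can render one config per port from the example above. A minimal Python sketch; node_config is a hypothetical name, and the nodes-<port>.conf file naming is an assumption so that nodes started from the same directory don't share one cluster-config-file.

```python
def node_config(bind_ip: str, port: int) -> str:
    """Render a cluster node config like the example, for one port."""
    return "\n".join([
        f"bind {bind_ip}",
        f"port {port}",
        "cluster-enabled yes",
        f'cluster-config-file "nodes-{port}.conf"',  # assumed per-node name
        "cluster-node-timeout 5000",
        "appendonly yes",
    ]) + "\n"
```

For the question's setup, node_config("192.168.1.20", port) for each port from 30001 to 30006 would give six configs that all advertise PC1's LAN address.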
You have to use the -c option.
For example, to connect the client to port 6379:
$ service redis-server start
$ redis-cli -c -p 6379

Redis failover cluster convert-to-slave

After a Sentinel failover, my slave was promoted to the new master, but my old master couldn't be converted to a slave. Sentinel keeps logging the line below forever:
26378:X 23 May 17:55:00.429 * +convert-to-slave slave 10.0.22.43:6379 10.0.22.43 6379 # master01 10.0.22.44 6379
I tested this with Redis v3.2.0 and v3.0.7 and got the same error. What am I missing?