Redis Sentinel Slave SDOWN is set to True - redis

I wrote a monitoring program for my Redis Sentinel HA cluster, and it flagged one slave, node 10.10.10.30, as missing. After some debugging it turned out that the program filters out slaves whose s_down flag is set.
My system consists of three nodes: one master and two slaves. Each node also has Sentinel deployed on it.
On the master, if I log on to redis-cli, the following is reported:
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.10.10.8,port=6379,state=online,offset=1409435252945,lag=1
slave1:ip=10.10.10.30,port=6379,state=online,offset=1409436519147,lag=1
master_repl_offset:1409439031250
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1409437982675
repl_backlog_histlen:1048576
All my Redis servers and the Sentinels on each of my machines are up and running.
If I execute redis-cli -p 26379 on any of my machines and run sentinel slaves mymaster, I get the same number of slaves as I have configured and running. However, node 10.10.10.30 is reported like this:
2) 1) "name"
2) "10.10.10.30:6379"
3) "ip"
4) "10.10.10.30"
5) "port"
6) "6379"
7) "runid"
8) ""
9) "flags"
10) "s_down,slave,disconnected"
11) "pending-commands"
12) "0"
13) "last-ping-sent"
14) "936737"
15) "last-ok-ping-reply"
16) "936737"
17) "last-ping-reply"
18) "936737"
19) "s-down-time"
20) "931725"
21) "down-after-milliseconds"
22) "5000"
23) "info-refresh"
24) "1589412820130"
25) "role-reported"
26) "slave"
27) "role-reported-time"
28) "936737"
29) "master-link-down-time"
30) "0"
31) "master-link-status"
32) "err"
33) "master-host"
34) "?"
35) "master-port"
36) "0"
37) "slave-priority"
38) "100"
39) "slave-repl-offset"
40) "0"
I don't understand how to get that node out of the s_down state. All Redis machines and Sentinel deployments use ports 6379 and 26379 respectively, and the ports are reachable.

I compared redis.conf and sentinel.conf on this node with those on the slaves that have no issue. The difference was the bind address. I changed it from bind 127.0.0.1 to bind 0.0.0.0 and restarted Redis. The s_down state went away.
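
The filtering described in the question can be reproduced from the flat field/value list that SENTINEL slaves returns. This is a minimal sketch, assuming the reply has already been read into a Python list of strings; the helper names reply_to_dict and is_sdown are hypothetical and not part of any client library.

```python
def reply_to_dict(flat):
    """Pair up a flat [field, value, field, value, ...] reply into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def is_sdown(slave):
    """True if s_down appears among the comma-separated flags."""
    return "s_down" in slave.get("flags", "").split(",")


# Abbreviated copy of the entry for 10.10.10.30 shown in the question.
reply = [
    "name", "10.10.10.30:6379",
    "ip", "10.10.10.30",
    "port", "6379",
    "runid", "",
    "flags", "s_down,slave,disconnected",
    "master-link-status", "err",
]

slave = reply_to_dict(reply)
if is_sdown(slave):
    print(f"{slave['name']} is subjectively down (flags={slave['flags']})")
```

A monitor that skips such entries will undercount slaves, so it may be more useful to report them separately than to drop them.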

Related

Redis cluster TPS too low, only 8

This is the benchmark result:
C:\Users\LG520\Desktop> redisbench -cluster=true -a 192.168.1.61:6380,192.168.1.61:6381,192.168.1.61:6382 -c 10 -n 100 -d 1000
2020/12/22 14:43:50 Go...
2020/12/22 14:43:50 # BENCHMARK CLUSTER (192.168.1.61:6380,192.168.1.61:6381,192.168.1.61:6382, db:0)
2020/12/22 14:43:50 * Clients Number: 10, Testing Times: 100, Data Size(B): 1000
2020/12/22 14:43:50 * Total Times: 1000, Total Size(B): 1000000
2020/12/22 14:46:13 # BENCHMARK DONE
2020/12/22 14:46:13 * TIMES: 1000, DUR(s): 143.547, TPS(Hz): 6
I built a Redis cluster, but the redisbench result is too low.
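
As a rough check of the figures in the log above, the reported TPS can be recomputed from the totals. This is only a sketch; it assumes that the 10 clients run concurrently and each issue 100 requests, which the log suggests but does not state.

```python
total_requests = 1000   # "TIMES: 1000" in the log
duration_s = 143.547    # "DUR(s): 143.547" in the log
clients = 10            # "-c 10" on the command line

tps = total_requests / duration_s

# Assumption: clients run concurrently, so each one spends the whole
# duration on its own share of the requests.
requests_per_client = total_requests / clients
mean_latency_s = duration_s / requests_per_client

print(f"TPS ~ {tps:.2f}, mean time per request ~ {mean_latency_s:.2f} s")
```

Under that assumption each request takes well over a second on average, which points at per-request delay rather than at raw server throughput.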
This is the cluster info:
[root@SZFT-LINUX chen]# ./redis-6.0.6/src/redis-cli -c -p 6380
127.0.0.1:6380> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:2616
cluster_stats_messages_pong_sent:3260
cluster_stats_messages_sent:5876
cluster_stats_messages_ping_received:3255
cluster_stats_messages_pong_received:2616
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:5876
127.0.0.1:6380>
127.0.0.1:6380> cluster nodes
c12b3dbe5dbfe23a8bf0c180cbcdd6aaec98c4aa 192.168.1.61:6382@16382 master - 0 1608621050071 3 connected 10923-16383
3adf356189ddc44547b662b4f5f05f85f2cf016b 192.168.1.61:6385@16385 slave 8af6ca7a04368dd2cd7f40b76f3ac43fc0741812 0 1608621048057 2 connected
4a92459e43eff69aa6a0f603e13310b1a679b98d 192.168.1.61:6380@16380 myself,master - 0 1608621049000 1 connected 0-5460
72c20f23d93d87f75d78df4fa19e7cfa7a6f392e 192.168.1.61:6383@16383 slave c12b3dbe5dbfe23a8bf0c180cbcdd6aaec98c4aa 0 1608621048000 3 connected
fd16d8cd8226d3e6ee8854f642f82159c97eaa48 192.168.1.61:6384@16384 slave 4a92459e43eff69aa6a0f603e13310b1a679b98d 0 1608621047049 1 connected
8af6ca7a04368dd2cd7f40b76f3ac43fc0741812 192.168.1.61:6381@16381 master - 0 1608621049060 2 connected 5461-10922
127.0.0.1:6380>
redis version: 6.0.6
I first built the cluster in Docker (I thought the low TPS was due to Docker); now I have built it on CentOS 7 and got the same result.
This is one of the redis.conf files (there are 6 in total):
port 6383
#dbfilename dump.rdb
#save 300 10
save ""
appendonly yes
appendfilename appendonly.aof
# appendfsync always
appendfsync everysec
# appendfsync no
dir /home/chen/redis-hd/node6383/data
maxmemory 2G
logfile /home/chen/redis-hd/node6383/data/redis.log
protected-mode no
maxmemory-policy allkeys-lru
# bind 127.0.0.1
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 15000
cluster-slave-validity-factor 10
cluster-migration-barrier 1
cluster-require-full-coverage yes
cluster-announce-ip 192.168.1.61
no-appendfsync-on-rewrite yes
I tested a single-node Redis and its TPS is 2000.
Why is the Redis cluster's TPS lower than a single node's?
Can anybody help me? I would appreciate it very much!

Redis Slave Syncs but does not continue replication

I have a Redis v3.0.2 master and slave.
On the slave I issue slaveof (masterIP) 6379.
The sync happens, and the logs look fine on both master and slave. Key counts look sane.
After the sync completes and the slave loads the database, no more operations happen.
Running monitor on the master gives me hundreds of SETs per second.
The slave only sees a few deletes and an occasional PING.
Slave Log:
2734:S 16 Aug 07:23:29.460 * MASTER <-> SLAVE sync: Loading DB in memory
2734:S 16 Aug 07:25:16.531 * MASTER <-> SLAVE sync: Finished with success
Slave Monitor:
[119](root@[slave])[0]:\> redis-cli
127.0.0.1:6379> monitor
OK
1534405063.907020 [0 [master]:6379] "PING"
1534405065.409863 [0 [master]:6379] "DEL" "pmlock12"
1534405065.709784 [0 [master]:6379] "DEL" "pmlock22"
1534405065.909400 [0 [master]:6379] "DEL" "pmlock27"
Master Log
2951:C 16 Aug 07:20:57.908 * RDB: 279 MB of memory used by copy-on-write
2745:M 16 Aug 07:20:58.297 * Background saving terminated with success
2745:M 16 Aug 07:22:59.369 * Synchronization with slave 10.168.230.15:6379 succeeded
Master Monitor:
1534405287.136316 [0 [src]:54660] "SET" "CMP36" "{\"m_cur...
1534405252.002731 [0 [src]:45742] "SET" "PM14" "H4sIAAAAAAAAAO1cW4...
Master Info
[209](root@master)[0]:\> redis-cli info replication
# Replication
role:master
connected_slaves:1
slave0:ip=[slave],port=6379,state=online,offset=1747897005,lag=0
master_repl_offset:1748304094
repl_backlog_active:1
repl_backlog_size:104857600
repl_backlog_first_byte_offset:1643446495
repl_backlog_histlen:104857600
I've rebooted the master and the slave, but I just can't get the master to send through anything but PING and DEL. I'm not well versed in Redis, so I'm sure I'm just missing something.
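
One way to see whether the slave is still receiving the replication stream is to compare the offsets in the INFO replication output shown above. This is a minimal sketch that parses that text; the function name parse_replication is hypothetical, and the sample is copied from the question.

```python
def parse_replication(info_text):
    """Return (master_repl_offset, {slave_name: offset}) from INFO replication text."""
    master_offset = None
    slaves = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key == "master_repl_offset":
            master_offset = int(value)
        elif key.startswith("slave") and "offset=" in value:
            # Slave lines look like: ip=...,port=...,state=...,offset=...,lag=...
            fields = dict(item.split("=", 1) for item in value.split(","))
            slaves[key] = int(fields["offset"])
    return master_offset, slaves


SAMPLE = """\
# Replication
role:master
connected_slaves:1
slave0:ip=[slave],port=6379,state=online,offset=1747897005,lag=0
master_repl_offset:1748304094
"""

master_offset, slaves = parse_replication(SAMPLE)
for name, offset in slaves.items():
    print(f"{name} is {master_offset - offset} bytes behind the master")
```

Taking two samples a few seconds apart shows whether the slave's offset keeps advancing, which would suggest that data is still being streamed to it.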

Redis on shared hosting connection timed out error

I have a local Laravel project that uses Redis, and it runs fine in my local environment. I then moved the project to shared hosting, installed Redis there, and started the Redis server. I verified that the Redis server is up and running, but whenever I try to access a page I get the following error:
(1/1) ConnectionException
Connection timed out [tcp://127.0.0.1:6379]
in AbstractConnection.php (line 155)
at AbstractConnection->onConnectionError('Connection timed out', 110)
in StreamConnection.php (line 128)
at StreamConnection->createStreamSocket(object(Parameters), 'tcp://127.0.0.1:6379', 4)
in StreamConnection.php (line 178)
at StreamConnection->tcpStreamInitializer(object(Parameters))
in StreamConnection.php (line 100)
at StreamConnection->createResource()
in AbstractConnection.php (line 81)
at AbstractConnection->connect()
in StreamConnection.php (line 258)
at StreamConnection->connect()
in AbstractConnection.php (line 180)
at AbstractConnection->getResource()
in StreamConnection.php (line 288)
at StreamConnection->write('*4 $9 ZREVRANGE $15 popular_threads $1 0 $1 5 ')
in StreamConnection.php (line 394)
at StreamConnection->writeRequest(object(ZSetReverseRange))
in AbstractConnection.php (line 110)
at AbstractConnection->executeCommand(object(ZSetReverseRange))
in Client.php (line 331)
at Client->executeCommand(object(ZSetReverseRange))
in Client.php (line 314)
at Client->__call('zrevrange', array('popular_threads', 0, 5))
in Connection.php (line 72)
It works fine on my local PC. Does this have anything to do with shared hosting? How do I solve or troubleshoot this?
TIA, Yeasir

ERR Slot xxx is already busy (Redis::CommandError)

I want to set up a Redis cluster with 6 nodes (node1, node2, node3, node4, node5, node6): 3 masters and 3 slaves. Each node has this configuration file:
redis.conf
port 6379
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 10000
appendonly yes
I get an error when I create the cluster. The create command:
redis-trib.rb create --replicas 1 node1:6379 node2:6379 node3:6379 node4:6379 node5:6379 node6:6379
Error:
>>> Creating cluster
Connecting to node node1:6379: OK
Connecting to node node2:6379: OK
Connecting to node node3:6379: OK
Connecting to node node4:6379: OK
Connecting to node node5:6379: OK
Connecting to node node6:6379: OK
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
node6:6379
node5:6379
node4:6379
Adding replica node3:6379 to node6:6379
Adding replica node2:6379 to node5:6379
Adding replica node1:6379 to node4:6379
S: 1f13819038ba983bb8355f54cb8cec19d2b29e01 node1:6379
replicates 534745088c8b403b81d7e48a22d2e317fb420a38
S: 711461862393664b46d73db6561631f40de29561 node2:6379
replicates f503fe6fd52c73e446267795111ae6ea95495829
S: 204fa4e23b08e2c6ad80b0aca271fc380bc6885d node3:6379
replicates fe6a8e88afdb2796c09fcc873b37ba90c2ba6d79
M: 534745088c8b403b81d7e48a22d2e317fb420a38 node4:6379
slots:10923-16383 (5461 slots) master
M: f503fe6fd52c73e446267795111ae6ea95495829 node5:6379
slots:5461-10922 (5462 slots) master
M: fe6a8e88afdb2796c09fcc873b37ba90c2ba6d79 node6:6379
slots:0-5460,6918 (5462 slots) master
Can I set the above configuration? (type 'yes' to accept): yes
/var/lib/gems/1.8/gems/redis-3.2.2/lib/redis/client.rb:114:in `call': ERR Slot 16011 is already busy (Redis::CommandError)
from /var/lib/gems/1.8/gems/redis-3.2.2/lib/redis.rb:2646:in `method_missing'
from /var/lib/gems/1.8/gems/redis-3.2.2/lib/redis.rb:57:in `synchronize'
from /usr/lib/ruby/1.8/monitor.rb:242:in `mon_synchronize'
from /var/lib/gems/1.8/gems/redis-3.2.2/lib/redis.rb:57:in `synchronize'
from /var/lib/gems/1.8/gems/redis-3.2.2/lib/redis.rb:2645:in `method_missing'
from /home/hadoop/projects/ramin/redis-3.0.5/src/redis-trib.rb:205:in `flush_node_config'
from /home/hadoop/projects/ramin/redis-3.0.5/src/redis-trib.rb:667:in `flush_nodes_config'
from /home/hadoop/projects/ramin/redis-3.0.5/src/redis-trib.rb:666:in `each'
from /home/hadoop/projects/ramin/redis-3.0.5/src/redis-trib.rb:666:in `flush_nodes_config'
from /home/hadoop/projects/ramin/redis-3.0.5/src/redis-trib.rb:1007:in `create_cluster_cmd'
from /home/hadoop/projects/ramin/redis-3.0.5/src/redis-trib.rb:1388:in `send'
from /home/hadoop/projects/ramin/redis-3.0.5/src/redis-trib.rb:1388
I also tried the following, but got the same error message:
using IP addresses instead of hostnames
removing nodes.conf on each node
As @thepirat000 said (on all nodes I ran FLUSHALL and then CLUSTER RESET SOFT), I also changed the hostnames to IP addresses.

Redis3 cluster infinite waiting for the cluster to join

I have 2 servers and 3 instances of redis3 in each of them. I have a cluster-nodes directory, where I have all the data of each instance. Here it is.
cluster-nodes/
|-- 7777
|   |-- db01
|   |   `-- nodes-7777.conf
|   `-- redis.conf
|-- 7778
|   |-- db02
|   |   `-- nodes-7778.conf
|   `-- redis.conf
`-- 7779
    |-- db03
    |   `-- nodes-7779.conf
    `-- redis.conf
Here is my config file redis.conf under the 7777 directory
pidfile /var/run/redis/redis-7777.pid
port 7777
dir /opt/redis/cluster-nodes/7777/db01/
cluster-enabled yes
cluster-config-file nodes-7777.conf
cluster-node-timeout 15000
When I try to create the cluster I get:
./redis-trib.rb create --replicas 1 127.0.0.1:7777 127.0.0.1:7778 127.0.0.1:7779 192.168.56.41:7777 192.168.56.41:7778 192.168.56.41:7779
>>> Creating cluster
Connecting to node 127.0.0.1:7777: OK
Connecting to node 127.0.0.1:7778: OK
Connecting to node 127.0.0.1:7779: OK
Connecting to node 192.168.56.41:7777: OK
Connecting to node 192.168.56.41:7778: OK
Connecting to node 192.168.56.41:7779: OK
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
127.0.0.1:7777
192.168.56.41:7777
127.0.0.1:7778
Adding replica 192.168.56.41:7778 to 127.0.0.1:7777
Adding replica 127.0.0.1:7779 to 192.168.56.41:7777
Adding replica 192.168.56.41:7779 to 127.0.0.1:7778
M: 209d68fae9c64855d34972f660232eb96370a669 127.0.0.1:7777
slots:0-5460 (5461 slots) master
M: 62e2b167a287b94b5154f7b9b0f226345baa81b7 127.0.0.1:7778
slots:10923-16383 (5461 slots) master
S: 36ed59deceb01788db76abc0c2f22925a27295fc 127.0.0.1:7779
replicates 2760b5fcc99c6563a7cf8deea159efb012309238
M: 2760b5fcc99c6563a7cf8deea159efb012309238 192.168.56.41:7777
slots:5461-10922 (5462 slots) master
S: 16bf95ba9cb743c2a3caecaab5c2fd5121d80557 192.168.56.41:7778
replicates 209d68fae9c64855d34972f660232eb96370a669
S: 30e7a5b4a94b5ff3a09f4809d6fd62edb2279b0e 192.168.56.41:7779
replicates 62e2b167a287b94b5154f7b9b0f226345baa81b7
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join.....................................................................................................................................................................................................................................................................^C./redis-trib.rb:534:in `sleep': Interrupt
from ./redis-trib.rb:534:in `wait_cluster_join'
from ./redis-trib.rb:1007:in `create_cluster_cmd'
from ./redis-trib.rb:1373:in `<main>'
Here is the output from cluster nodes on the first server
62e2b167a287b94b5154f7b9b0f226345baa81b7 127.0.0.1:7778 master - 0 1435144555558 2 connected 10923-16383
36ed59deceb01788db76abc0c2f22925a27295fc 127.0.0.1:7779 master - 0 1435144554554 3 connected
209d68fae9c64855d34972f660232eb96370a669 127.0.0.1:7777 myself,master - 0 0 1 connected 0-5460
And this is from the second
16bf95ba9cb743c2a3caecaab5c2fd5121d80557 127.0.0.1:7778 master - 0 1435144648065 5 connected
30e7a5b4a94b5ff3a09f4809d6fd62edb2279b0e 127.0.0.1:7779 master - 0 1435144647057 6 connected
2760b5fcc99c6563a7cf8deea159efb012309238 127.0.0.1:7777 myself,master - 0 0 4 connected 5461-10922
It seems that all of them are started as masters? Is there something wrong in my configs?
Thank you.
P.S. When I use the same configs and start all instances on one server, everything works fine.
The problem in my case was that I was creating the cluster with the localhost address:
./redis-trib.rb create --replicas 1 127.0.0.1:7777 127.0.0.1:7778 127.0.0.1:7779 192.168.56.41:7777 192.168.56.41:7778 192.168.56.41:7779
To fix that, 127.0.0.1 should be replaced with the IP address of the local node, i.e.
./redis-trib.rb create --replicas 1 192.168.56.40:7777 192.168.56.40:7778 192.168.56.40:7779 192.168.56.41:7777 192.168.56.41:7778 192.168.56.41:7779
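
The mistake above can be caught before the cluster is created by scanning the node list for loopback addresses. This is a small sketch; the function name loopback_nodes is hypothetical, and it only handles literal IPv4 host:port entries like the ones in the command, without resolving hostnames.

```python
import ipaddress


def loopback_nodes(nodes):
    """Return the host:port entries whose host is a loopback IP address."""
    flagged = []
    for node in nodes:
        host, _, _port = node.rpartition(":")
        try:
            if ipaddress.ip_address(host).is_loopback:
                flagged.append(node)
        except ValueError:
            # Not a literal IP (e.g. a hostname); this sketch does not resolve names.
            continue
    return flagged


# Node list from the failing command in the question.
nodes = [
    "127.0.0.1:7777", "127.0.0.1:7778", "127.0.0.1:7779",
    "192.168.56.41:7777", "192.168.56.41:7778", "192.168.56.41:7779",
]

for node in loopback_nodes(nodes):
    print(f"{node} is a loopback address and is not reachable from other servers")
```

A node that is announced as 127.0.0.1 is only meaningful on its own machine, which matches the cluster nodes output in the question where each server lists only its own three instances.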
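Please also check ports 17777, 17778, and so on. The cluster needs those ports (the cluster bus ports, which by default are the data port plus 10000) for communication between nodes.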
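
The port check suggested above can be scripted. This is a minimal sketch, assuming the default cluster bus offset of 10000; the helper names bus_port and is_reachable are hypothetical, and a successful TCP connect only shows that the port is open, not that the cluster handshake works.

```python
import socket

CLUSTER_BUS_OFFSET = 10000  # default gap between the data port and the cluster bus port


def bus_port(data_port):
    """Cluster bus port that corresponds to a given data port."""
    return data_port + CLUSTER_BUS_OFFSET


def is_reachable(host, port, timeout=2.0):
    """True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for port in (7777, 7778, 7779):
    print(f"data port {port} -> cluster bus port {bus_port(port)}")
```

Running is_reachable("192.168.56.41", bus_port(7777)) from the other server, and the reverse check from 192.168.56.41, would show whether a firewall is blocking the bus ports.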