Redis sentinel node cannot sync after failover

We have set up Redis with Sentinel high availability using 3 nodes. Suppose the first node is the master: when we reboot the first node, failover happens and the second node becomes master; up to this point everything is OK. But when the first node comes back, it cannot sync with the new master, and we saw that no "masterauth" is set in its config.
Here are the error log and the config generated by CONFIG REWRITE:
1182:S 29 May 2021 13:49:42.713 * Reconnecting to MASTER 192.168.1.2:6379 after failure
1182:S 29 May 2021 13:49:42.716 * MASTER <-> REPLICA sync started
1182:S 29 May 2021 13:49:42.716 * Non blocking connect for SYNC fired the event.
1182:S 29 May 2021 13:49:42.717 * Master replied to PING, replication can continue...
1182:S 29 May 2021 13:49:42.717 * (Non critical) Master does not understand REPLCONF listening-port: -NOAUTH Authentication required.
1182:S 29 May 2021 13:49:42.717 * (Non critical) Master does not understand REPLCONF capa: -NOAUTH Authentication required.
1182:S 29 May 2021 13:49:42.717 * Partial resynchronization not possible (no cached master)
1182:S 29 May 2021 13:49:42.718 # Unexpected reply to PSYNC from master: -NOAUTH Authentication required.
1182:S 29 May 2021 13:49:42.718 * Retrying with SYNC...
# Generated by CONFIG REWRITE
save 3600 1
save 300 100
save 60 10000
user default on #eb5fbb922a75775721db681c49600c069cf686765eeebaa6e18fad195812140d ~* &* +#all
replicaof 192.168.1.2 6379
What is the problem?
Config Sample:
bind 127.0.0.1 -::1 192.168.1.3
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
supervised systemd
pidfile "/var/run/redis_6379.pid"
loglevel notice
logfile ""
databases 16
always-show-logo no
set-proc-title yes
proc-title-template "{title} {listen-addr} {server-mode}"
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename "dump.rdb"
rdb-del-sync-files no
dir "/"
replicaof 192.168.1.2 6379
masterauth "redis"
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-diskless-load disabled
repl-disable-tcp-nodelay no
replica-priority 100
acllog-max-len 128
requirepass "redis"
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
lazyfree-lazy-user-del no
lazyfree-lazy-user-flush no
oom-score-adj no
oom-score-adj-values 0 200 800
disable-thp yes
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4kb
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
jemalloc-bg-thread yes

For those who may run into the same problem: the cause was a Redis misconfiguration. On the third deployment we set the parameters carefully and the problem did not recur.
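For an authenticated Sentinel setup, the usual guidance from the Redis documentation is to set both requirepass and masterauth on every data node (since any node can become a replica after a failover), plus auth-pass on each sentinel. A sketch using the password from the config sample above; "mymaster" is a placeholder master name:

```conf
# redis.conf on EVERY data node -- any of them may become a replica
# after a failover, so all need masterauth, not just the initial replicas:
requirepass "redis"
masterauth  "redis"

# sentinel.conf on every sentinel, so Sentinel itself can authenticate:
sentinel auth-pass mymaster redis
```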

Related

FATAL: Data file was created with a Redis server configured to handle more than 1 databases. Exiting

I have one redis instance and my redis.conf file is:
# masterauth [password]
# requirepass [password]
bind 0.0.0.0
protected-mode no
# port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
pidfile /var/run/redis_6380.pid
loglevel verbose
#databases 1
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /data
appendonly no
appendfilename "appendonly.aof"
# appendfsync always
appendfsync everysec
# appendfsync no
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble no
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
maxmemory 1000mb
maxmemory-policy volatile-ttl
port 6379
databases 1
requirepass [password]
masterauth [password]
But my Redis cannot start!
logs:
22:C 19 Aug 2021 07:39:51.385 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
22:C 19 Aug 2021 07:39:51.385 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=22, just started
22:C 19 Aug 2021 07:39:51.385 # Configuration loaded
22:M 19 Aug 2021 07:39:51.387 * Running mode=standalone, port=6379.
22:M 19 Aug 2021 07:39:51.387 # Server initialized
22:M 19 Aug 2021 07:39:51.468 # FATAL: Data file was created with a Redis server configured to handle more than 1 databases. Exiting
What am I doing wrong?
I solved the problem!
I had deployed in k8s using a PVC. I don't know why, but after removing the PVC and binding a new one, the error was fixed.
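A likely explanation (an assumption, not confirmed in the answer): the old dump.rdb persisted on the PVC was written while the server still ran with the default databases 16, so restarting with databases 1 refuses to load it. Removing the stale dump file has the same effect as rebinding the PVC, assuming the cached data is disposable:

```shell
# Path follows "dir /data" from the config above. Only do this if the data
# in Redis is disposable (e.g. a cache) -- the server will start empty.
rm /data/dump.rdb
```

Alternatively, leaving databases at 16 would have let the old file load unchanged.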

Redis cluster TPS too low, only 8

This is the benchmark result:
C:\Users\LG520\Desktop> redisbench -cluster=true -a 192.168.1.61:6380,192.168.1.61:6381,192.168.1.61:6382 -c 10 -n 100 -d 1000
2020/12/22 14:43:50 Go...
2020/12/22 14:43:50 # BENCHMARK CLUSTER (192.168.1.61:6380,192.168.1.61:6381,192.168.1.61:6382, db:0)
2020/12/22 14:43:50 * Clients Number: 10, Testing Times: 100, Data Size(B): 1000
2020/12/22 14:43:50 * Total Times: 1000, Total Size(B): 1000000
2020/12/22 14:46:13 # BENCHMARK DONE
2020/12/22 14:46:13 * TIMES: 1000, DUR(s): 143.547, TPS(Hz): 6
I built a Redis cluster, but the redisbench result is too low.
This is the cluster info:
[root@SZFT-LINUX chen]# ./redis-6.0.6/src/redis-cli -c -p 6380
127.0.0.1:6380> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:2616
cluster_stats_messages_pong_sent:3260
cluster_stats_messages_sent:5876
cluster_stats_messages_ping_received:3255
cluster_stats_messages_pong_received:2616
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:5876
127.0.0.1:6380>
127.0.0.1:6380> cluster nodes
c12b3dbe5dbfe23a8bf0c180cbcdd6aaec98c4aa 192.168.1.61:6382@16382 master - 0 1608621050071 3 connected 10923-16383
3adf356189ddc44547b662b4f5f05f85f2cf016b 192.168.1.61:6385@16385 slave 8af6ca7a04368dd2cd7f40b76f3ac43fc0741812 0 1608621048057 2 connected
4a92459e43eff69aa6a0f603e13310b1a679b98d 192.168.1.61:6380@16380 myself,master - 0 1608621049000 1 connected 0-5460
72c20f23d93d87f75d78df4fa19e7cfa7a6f392e 192.168.1.61:6383@16383 slave c12b3dbe5dbfe23a8bf0c180cbcdd6aaec98c4aa 0 1608621048000 3 connected
fd16d8cd8226d3e6ee8854f642f82159c97eaa48 192.168.1.61:6384@16384 slave 4a92459e43eff69aa6a0f603e13310b1a679b98d 0 1608621047049 1 connected
8af6ca7a04368dd2cd7f40b76f3ac43fc0741812 192.168.1.61:6381@16381 master - 0 1608621049060 2 connected 5461-10922
127.0.0.1:6380>
Redis version: 6.0.6.
I first built it in Docker (I thought the low TPS was due to Docker); now I built it on CentOS 7 and got the same result.
This is one of the redis.conf files, 6 in total:
port 6383
#dbfilename dump.rdb
#save 300 10
save ""
appendonly yes
appendfilename appendonly.aof
# appendfsync always
appendfsync everysec
# appendfsync no
dir /home/chen/redis-hd/node6383/data
maxmemory 2G
logfile /home/chen/redis-hd/node6383/data/redis.log
protected-mode no
maxmemory-policy allkeys-lru
# bind 127.0.0.1
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 15000
cluster-slave-validity-factor 10
cluster-migration-barrier 1
cluster-require-full-coverage yes
cluster-announce-ip 192.168.1.61
no-appendfsync-on-rewrite yes
I tested a single-node Redis and the TPS is 2000.
Why is the Redis cluster's TPS lower than a single node's?
Any help would be much appreciated!
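One way to narrow this down (a suggestion, not a posted answer): benchmark with the redis-benchmark tool bundled with Redis 6, which supports cluster mode natively. If it reports normal throughput, the bottleneck is likely the third-party redisbench client (for example, opening a new connection or following a MOVED redirect per request) rather than the cluster itself:

```shell
# Same workload shape as the redisbench run above: 10 clients, 1000 requests,
# 1000-byte values, against one of the cluster nodes.
./redis-6.0.6/src/redis-benchmark --cluster -h 192.168.1.61 -p 6380 \
    -c 10 -n 1000 -d 1000 -t set,get
```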

Redis Master Slave Switch after Aof rewrite

This Redis cluster has 240 nodes (120 masters and 120 slaves) and worked well for a long time. But now it undergoes a master/slave switch every few hours.
I got some logs from the Redis servers:
5c541d3a765e087af7775ba308f51ffb2aa54151
10.12.28.165:6502
13306:M 08 Mar 18:55:02.597 * Background append only file rewriting started by pid 15396
13306:M 08 Mar 18:55:41.636 # Cluster state changed: fail
13306:M 08 Mar 18:55:45.321 # Connection with slave client id #112948 lost.
13306:M 08 Mar 18:55:46.243 # Configuration change detected. Reconfiguring myself as a replica of afb6e012db58bd26a7c96182b04f0a2ba6a45768
13306:S 08 Mar 18:55:47.134 * AOF rewrite child asks to stop sending diffs.
15396:C 08 Mar 18:55:47.134 * Parent agreed to stop sending diffs. Finalizing AOF...
15396:C 08 Mar 18:55:47.134 * Concatenating 0.02 MB of AOF diff received from parent.
15396:C 08 Mar 18:55:47.135 * SYNC append only file rewrite performed
15396:C 08 Mar 18:55:47.186 * AOF rewrite: 4067 MB of memory used by copy-on-write
13306:S 08 Mar 18:55:47.209 # Cluster state changed: ok
5ac747878f881349aa6a62b179176ddf603e034c
10.12.30.107:6500
22825:M 08 Mar 18:55:30.534 * FAIL message received from da493af5bb3d15fc563961de09567a47787881be about 5c541d3a765e087af7775ba308f51ffb2aa54151
22825:M 08 Mar 18:55:31.440 # Failover auth granted to afb6e012db58bd26a7c96182b04f0a2ba6a45768 for epoch 323
22825:M 08 Mar 18:55:41.587 * Background append only file rewriting started by pid 23628
22825:M 08 Mar 18:56:24.200 # Cluster state changed: fail
22825:M 08 Mar 18:56:30.002 # Connection with slave client id #382416 lost.
22825:M 08 Mar 18:56:30.830 * FAIL message received from 0decbe940c6f4d4330fae5a9c129f1ad4932405d about 5ac747878f881349aa6a62b179176ddf603e034c
22825:M 08 Mar 18:56:30.840 # Failover auth denied to d46f95da06cfcd8ea5eaa15efabff5bd5e99df55: its master is up
22825:M 08 Mar 18:56:30.843 # Configuration change detected. Reconfiguring myself as a replica of d46f95da06cfcd8ea5eaa15efabff5bd5e99df55
22825:S 08 Mar 18:56:31.030 * Clear FAIL state for node 5ac747878f881349aa6a62b179176ddf603e034c: slave is reachable again.
22825:S 08 Mar 18:56:31.030 * Clear FAIL state for node 5c541d3a765e087af7775ba308f51ffb2aa54151: slave is reachable again.
22825:S 08 Mar 18:56:31.294 # Cluster state changed: ok
22825:S 08 Mar 18:56:31.595 * Connecting to MASTER 10.12.30.104:6404
22825:S 08 Mar 18:56:31.671 * MASTER <-> SLAVE sync started
22825:S 08 Mar 18:56:31.671 * Non blocking connect for SYNC fired the event.
22825:S 08 Mar 18:56:31.672 * Master replied to PING, replication can continue...
22825:S 08 Mar 18:56:31.673 * Partial resynchronization not possible (no cached master)
22825:S 08 Mar 18:56:31.691 * AOF rewrite child asks to stop sending diffs.
It appears that the Redis master/slave switch happened after AOF rewriting.
Here is the config of this cluster:
daemonize no
tcp-backlog 511
timeout 0
tcp-keepalive 60
loglevel notice
databases 16
dir "/var/cachecloud/data"
stop-writes-on-bgsave-error no
repl-timeout 60
repl-ping-slave-period 10
repl-disable-tcp-nodelay no
repl-backlog-size 10000000
repl-backlog-ttl 7200
slave-serve-stale-data yes
slave-read-only yes
slave-priority 100
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 512mb 128mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
port 6401
maxmemory 13000mb
maxmemory-policy volatile-lru
appendonly yes
appendfsync no
appendfilename "appendonly-6401.aof"
dbfilename "dump-6401.rdb"
aof-rewrite-incremental-fsync yes
no-appendfsync-on-rewrite yes
auto-aof-rewrite-min-size 62500kb
auto-aof-rewrite-percentage 86
rdbcompression yes
rdbchecksum yes
repl-diskless-sync no
repl-diskless-sync-delay 5
maxclients 10000
hll-sparse-max-bytes 3000
min-slaves-to-write 0
min-slaves-max-lag 10
aof-load-truncated yes
notify-keyspace-events ""
bind 10.12.26.226
protected-mode no
cluster-enabled yes
cluster-node-timeout 15000
cluster-slave-validity-factor 10
cluster-migration-barrier 1
cluster-config-file "nodes-6401.conf"
cluster-require-full-coverage no
rename-command FLUSHDB ""
rename-command FLUSHALL ""
rename-command KEYS ""
In my opinion, an AOF rewrite should not affect the Redis main thread, BUT it seems to make this node stop responding to other nodes' pings.
Check THP (Transparent Huge Pages) in the Linux kernel parameters: the AOF diff size is only 0.02 MB, yet the copy-on-write size is 4067 MB.
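To check and disable THP (these are the commands the Redis documentation itself suggests; the write requires root, and should also go into an init script to survive reboots):

```shell
# "[always]" in the output means THP is on. With THP, the fork() for the AOF
# rewrite copies memory in 2 MB huge pages, inflating copy-on-write sizes
# and stalling the main thread long enough to miss cluster pings.
cat /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```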

Apache 2.4.10 hangs AH00485: scoreboard is full, not at MaxRequestWorkers

The Apache server will stay up for a random amount of time, usually days, but eventually enters a hung state. When hung, the CPU load gradually spikes on the machine and new web server requests are unresponsive.
Error logs typically contain lots of these:
[Wed Jan 28 16:06:58.667188 2015] [mpm_event:error] [pid 25336:tid 1] AH00485: scoreboard is full, not at MaxRequestWorkers
Environment:
LDOM (VM) SunOS myhostname 5.10 Generic_118833-36 sun4v sparc SUNW,Sun-Fire-T200
httpd conf:
StartServers 8
MinSpareServers Not set
MaxSpareServers Not set
ServerLimit 256
MaxRequestWorkers 100
MaxConnectionsPerChild 1000
KeepAlive On
TimeOut 3000
MaxKeepAliveRequests 50
KeepAliveTimeout 2
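One consistency check worth making (a sizing sketch, not a confirmed fix): with the event MPM, MaxRequestWorkers is normally ServerLimit × ThreadsPerChild (ThreadsPerChild defaults to 25), and a low MaxConnectionsPerChild recycles child processes constantly; gracefully-exiting children keep holding scoreboard slots, which is exactly what AH00485 complains about:

```conf
# Hypothetical consistent mpm_event sizing; scoreboard slots are allocated
# from ServerLimit x ThreadsPerChild, so keep the numbers aligned.
ServerLimit             4
ThreadsPerChild         25
MaxRequestWorkers       100   # = 4 * 25
MaxConnectionsPerChild  0     # 0 = never recycle a child; avoids slot churn
```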
Current non-hung Score Board:
Server Version: Apache/2.4.10 (Unix)
Server MPM: event
Server Built: Oct 30 2014 16:29:03
Current Time: Wednesday, 28-Jan-2015 10:59:39 PST
Restart Time: Wednesday, 28-Jan-2015 09:49:21 PST
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 1 hour 10 minutes 17 seconds
Server load: 0.60 0.46 0.41
Total accesses: 1134 - Total Traffic: 2.2 GB
CPU Usage: u9.07 s16.94 cu609.51 cs69.31 - 16.7% CPU load
.269 requests/sec - 0.5 MB/second - 2.0 MB/request
1 requests currently being processed, 99 idle workers
PID Connections Threads Async connections
total accepting busy idle writing keep-alive closing
25337 0 yes 1 24 0 0 0
25338 1 yes 0 25 1 0 0
25339 1 yes 0 25 0 0 1
25340 1 yes 0 25 0 0 1
Sum 3 1 99 1 0 2
Any thoughts on httpd conf tuning, OS patches, or Apache bug fixes are appreciated.
Yes, I have seen the open ASF Bugzilla entry for the same error message.
This is a production server, so as you can imagine, having it go down at random times (usually when I am asleep) is not fun!

Redis server started, but killed in a second

I created a 500 MB rdb file on an OS X machine and the Redis server works fine there. But on Ubuntu Server it is killed within seconds of starting:
$ src/redis-server configFile_6381.conf
[1004] 30 Jan 15:50:27.591 * Max number of open files set to 10032
[ASCII-art startup banner: Redis 2.6.17 (00000000/0) 64 bit, standalone mode, Port: 6381, PID: 1004]
[1004] 30 Jan 15:50:27.593 # Server started, Redis version 2.6.17
Killed
Config file (configFile_6381.conf):
daemonize yes
pidfile /var/run/redisVgo.pid
port 6381
timeout 0
tcp-keepalive 0
loglevel verbose
logfile /root/Dropbox/redis/_projects/vgo/vgo.log
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename vgo6381.rdb
dir ./
slave-serve-stale-data yes
slave-read-only yes
repl-disable-tcp-nodelay no
slave-priority 100
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
Sometimes, before it is killed, I see this message in the client output:
(error) LOADING Redis is loading the dataset in memory
Check whether you run the same Redis version on both systems.
I was using different Redis versions, which caused the trouble: on OS X it was 2.8, on Ubuntu Server it was 2.6. After I set up Redis 2.8 on Ubuntu Server, my .rdb file loaded fine.
I ran into a similar issue and it turned out my dump.rdb was corrupted. If you do not care about the data currently loaded in your Redis, you can simply remove /var/lib/redis/dump.rdb and restart redis-server.
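A third thing worth ruling out (an assumption, not taken from the answers above): a bare "Killed" with no Redis-side error is the signature of the Linux OOM killer, which loading a 500 MB RDB can easily trigger on a small VM. The kernel log records it:

```shell
# If the OOM killer ended the process, an entry mentioning redis-server
# (e.g. "Out of memory: Kill process ...") appears here.
dmesg | grep -i -E 'out of memory|killed process'
```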