redis-master slave setup failing

I started a server on port 6001 as the master, with AOF persistence turned off, and a slave on port 6002 configured as a slave of 6001. However, on startup the slave logs the error below in an infinite loop, and I am also not able to find any error logs related to it.
Slave logs (repeating in an infinite loop):
[5556] 20 Aug 21:34:28.499 # Server started, Redis version 3.2.100
[5556] 20 Aug 21:34:28.500 * DB loaded from disk: 0.001 seconds
[5556] 20 Aug 21:34:28.500 * The server is now ready to accept connections on port 6002
[5556] 20 Aug 21:34:28.501 * Connecting to MASTER localhost:6001
[5556] 20 Aug 21:34:28.513 * MASTER <-> SLAVE sync started
[5556] 20 Aug 21:34:29.513 * Non blocking connect for SYNC fired the event.
[5556] 20 Aug 21:34:29.513 # Sending command to master in replication handshake: -Writing to master: Unknown error
[5556] 20 Aug 21:34:29.516 * Connecting to MASTER localhost:6001
[5556] 20 Aug 21:34:29.517 * MASTER <-> SLAVE sync started

Issue resolved: the master's redis.conf had 127.0.0.1 as its bind value, while the slave's redis.conf had slaveof localhost 6001. Replacing localhost with 127.0.0.1 resolved the issue.
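A minimal sketch for spotting this kind of mismatch, assuming Python 3 on the slave's host: it lists every address that a host name resolves to and tries a plain TCP connect to the master port on each one. One plausible explanation, not confirmed in the thread, is that localhost resolved to the IPv6 address ::1 first while the master was bound only to 127.0.0.1. The helper names resolve_all, can_connect and report are hypothetical.

```python
import socket
from typing import List


def resolve_all(host: str, port: int) -> List[str]:
    """Return the distinct addresses host resolves to, in resolver order."""
    addresses: List[str] = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        # sockaddr is (ip, port) for IPv4 and (ip, port, flowinfo, scope_id) for IPv6
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def can_connect(address: str, port: int, timeout: float = 1.0) -> bool:
    """Try a plain TCP connect; True if something accepts on address:port."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host: str, port: int) -> List[str]:
    """One line per resolved address, saying whether the port answers there."""
    return [
        f"{addr}:{port} -> {'reachable' if can_connect(addr, port) else 'NOT reachable'}"
        for addr in resolve_all(host, port)
    ]
```

For example, print("\n".join(report("localhost", 6001))) on the slave's host would show whether one of the addresses localhost maps to has nothing listening on the master port.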

Related

Redis service automatically stops after few minutes of running

On my Ubuntu machine, the Redis server was running fine and then suddenly stopped. After I start it again, it stops automatically after a few minutes. So I start it again, and so on. Why is this happening?
Here are the logs from when I start Redis:
21479:C 29 Apr 21:59:10.986 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
21479:C 29 Apr 21:59:10.987 # Redis version=4.0.9, bits=64, commit=00000000, modified=0, pid=21479, just started
21479:C 29 Apr 21:59:10.987 # Configuration loaded
21480:M 29 Apr 21:59:10.990 * Increased maximum number of open files to 10032 (it was originally set to 1024).
21480:M 29 Apr 21:59:10.991 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
21480:M 29 Apr 21:59:10.992 # Server initialized
21480:M 29 Apr 21:59:14.588 * DB loaded from disk: 3.596 seconds
21480:M 29 Apr 21:59:14.591 * Ready to accept connections

Kubernetes Redis Cluster PubSub Channels not getting synched on replica

I have set up a Redis cluster on Kubernetes. The cluster state is OK and the replica is connected to the master. According to the logs, the full synchronization has also completed. The logs are as follows:
9:M 22 Oct 12:24:18.209 * Slave 192.168.1.41:6379 asks for synchronization
9:M 22 Oct 12:24:18.209 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for '794b9c74abe40ac90c752f32a102078e063ff636', my replication IDs are '0f499740a46665d12fab921838297273279ad136' and '0000000000000000000000000000000000000000')
9:M 22 Oct 12:24:18.209 * Starting BGSAVE for SYNC with target: disk
9:M 22 Oct 12:24:18.211 * Background saving started by pid 231
231:C 22 Oct 12:24:18.215 * DB saved on disk
231:C 22 Oct 12:24:18.216 * RDB: 4 MB of memory used by copy-on-write
9:M 22 Oct 12:24:18.224 * Background saving terminated with success
9:M 22 Oct 12:24:18.224 * Synchronization with slave 192.168.1.41:6379 succeeded
Still, when I check the list of Pub/Sub channels on the replica, it does not show the channels, which breaks the Pub/Sub flow.
Any help or advice is appreciated.
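A hedged note on what the replica is reporting: PUBSUB CHANNELS lists only channels that have at least one subscriber connected to the node being queried, and subscriptions are state of client connections rather than data in the keyspace, so a full synchronization (which ships an RDB snapshot) has nothing to copy for them. The toy model below is hypothetical Python, not Redis code or a Redis client API; it only illustrates that distinction.

```python
from collections import Counter
from typing import Dict, List


class ToyNode:
    """Hypothetical model of one Redis node: a keyspace plus local subscriptions."""

    def __init__(self) -> None:
        self.keyspace: Dict[str, str] = {}
        self.subscriptions: Counter = Counter()  # channel -> subscribers on THIS node

    def set(self, key: str, value: str) -> None:
        self.keyspace[key] = value

    def subscribe(self, channel: str) -> None:
        self.subscriptions[channel] += 1

    def pubsub_channels(self) -> List[str]:
        # Like PUBSUB CHANNELS: only channels with a local subscriber are reported.
        return sorted(ch for ch, n in self.subscriptions.items() if n > 0)

    def full_sync_from(self, master: "ToyNode") -> None:
        # A full sync copies the dataset snapshot; subscriptions are not part of it.
        self.keyspace = dict(master.keyspace)


master, replica = ToyNode(), ToyNode()
master.set("user:1", "alice")
master.subscribe("events")
replica.full_sync_from(master)
```

In this model the replica reports the channel only after a client subscribes through the replica itself.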

Redis Master Slave Switch after Aof rewrite

This Redis Cluster has 240 nodes (120 masters and 120 slaves) and worked well for a long time. But now a master/slave switch happens every few hours.
Here are some logs from the Redis servers.
Node 5c541d3a765e087af7775ba308f51ffb2aa54151 (10.12.28.165:6502):
13306:M 08 Mar 18:55:02.597 * Background append only file rewriting started by pid 15396
13306:M 08 Mar 18:55:41.636 # Cluster state changed: fail
13306:M 08 Mar 18:55:45.321 # Connection with slave client id #112948 lost.
13306:M 08 Mar 18:55:46.243 # Configuration change detected. Reconfiguring myself as a replica of afb6e012db58bd26a7c96182b04f0a2ba6a45768
13306:S 08 Mar 18:55:47.134 * AOF rewrite child asks to stop sending diffs.
15396:C 08 Mar 18:55:47.134 * Parent agreed to stop sending diffs. Finalizing AOF...
15396:C 08 Mar 18:55:47.134 * Concatenating 0.02 MB of AOF diff received from parent.
15396:C 08 Mar 18:55:47.135 * SYNC append only file rewrite performed
15396:C 08 Mar 18:55:47.186 * AOF rewrite: 4067 MB of memory used by copy-on-write
13306:S 08 Mar 18:55:47.209 # Cluster state changed: ok
Node 5ac747878f881349aa6a62b179176ddf603e034c (10.12.30.107:6500):
22825:M 08 Mar 18:55:30.534 * FAIL message received from da493af5bb3d15fc563961de09567a47787881be about 5c541d3a765e087af7775ba308f51ffb2aa54151
22825:M 08 Mar 18:55:31.440 # Failover auth granted to afb6e012db58bd26a7c96182b04f0a2ba6a45768 for epoch 323
22825:M 08 Mar 18:55:41.587 * Background append only file rewriting started by pid 23628
22825:M 08 Mar 18:56:24.200 # Cluster state changed: fail
22825:M 08 Mar 18:56:30.002 # Connection with slave client id #382416 lost.
22825:M 08 Mar 18:56:30.830 * FAIL message received from 0decbe940c6f4d4330fae5a9c129f1ad4932405d about 5ac747878f881349aa6a62b179176ddf603e034c
22825:M 08 Mar 18:56:30.840 # Failover auth denied to d46f95da06cfcd8ea5eaa15efabff5bd5e99df55: its master is up
22825:M 08 Mar 18:56:30.843 # Configuration change detected. Reconfiguring myself as a replica of d46f95da06cfcd8ea5eaa15efabff5bd5e99df55
22825:S 08 Mar 18:56:31.030 * Clear FAIL state for node 5ac747878f881349aa6a62b179176ddf603e034c: slave is reachable again.
22825:S 08 Mar 18:56:31.030 * Clear FAIL state for node 5c541d3a765e087af7775ba308f51ffb2aa54151: slave is reachable again.
22825:S 08 Mar 18:56:31.294 # Cluster state changed: ok
22825:S 08 Mar 18:56:31.595 * Connecting to MASTER 10.12.30.104:6404
22825:S 08 Mar 18:56:31.671 * MASTER <-> SLAVE sync started
22825:S 08 Mar 18:56:31.671 * Non blocking connect for SYNC fired the event.
22825:S 08 Mar 18:56:31.672 * Master replied to PING, replication can continue...
22825:S 08 Mar 18:56:31.673 * Partial resynchronization not possible (no cached master)
22825:S 08 Mar 18:56:31.691 * AOF rewrite child asks to stop sending diffs.
It appears that the Redis master/slave switch happened after the AOF rewrite started.
Here is the configuration of this cluster:
daemonize no
tcp-backlog 511
timeout 0
tcp-keepalive 60
loglevel notice
databases 16
dir "/var/cachecloud/data"
stop-writes-on-bgsave-error no
repl-timeout 60
repl-ping-slave-period 10
repl-disable-tcp-nodelay no
repl-backlog-size 10000000
repl-backlog-ttl 7200
slave-serve-stale-data yes
slave-read-only yes
slave-priority 100
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 512mb 128mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
port 6401
maxmemory 13000mb
maxmemory-policy volatile-lru
appendonly yes
appendfsync no
appendfilename "appendonly-6401.aof"
dbfilename "dump-6401.rdb"
aof-rewrite-incremental-fsync yes
no-appendfsync-on-rewrite yes
auto-aof-rewrite-min-size 62500kb
auto-aof-rewrite-percentage 86
rdbcompression yes
rdbchecksum yes
repl-diskless-sync no
repl-diskless-sync-delay 5
maxclients 10000
hll-sparse-max-bytes 3000
min-slaves-to-write 0
min-slaves-max-lag 10
aof-load-truncated yes
notify-keyspace-events ""
bind 10.12.26.226
protected-mode no
cluster-enabled yes
cluster-node-timeout 15000
cluster-slave-validity-factor 10
cluster-migration-barrier 1
cluster-config-file "nodes-6401.conf"
cluster-require-full-coverage no
rename-command FLUSHDB ""
rename-command FLUSHALL ""
rename-command KEYS ""
In my opinion, an AOF rewrite should not affect the Redis main thread. But it seems to make this node stop responding to the other nodes' pings.
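The timestamps in the logs above can be lined up against cluster-node-timeout 15000 (15 s) from the config. A small sketch, assuming the log line format shown above (pid:role DD Mon HH:MM:SS.mmm) and reasonably synchronized clocks across hosts; parse_ts and seconds_between are hypothetical helpers. For 10.12.28.165:6502 the FAIL message arrives about 27.9 s after its AOF rewrite started, and for 10.12.30.107:6500 about 49.2 s, both well past the 15 s node timeout, which is consistent with the nodes not answering cluster pings during the rewrite.

```python
import re
from datetime import datetime

# Matches the "pid:role DD Mon HH:MM:SS.mmm" prefix of the Redis log lines above.
LOG_PREFIX = re.compile(r"^\d+:[A-Z] (\d{2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3})")


def parse_ts(line: str) -> datetime:
    m = LOG_PREFIX.match(line)
    if m is None:
        raise ValueError(f"not a Redis log line: {line!r}")
    # The logs carry no year; 2000 is an arbitrary placeholder so deltas work.
    return datetime.strptime("2000 " + m.group(1), "%Y %d %b %H:%M:%S.%f")


def seconds_between(start_line: str, end_line: str) -> float:
    return (parse_ts(end_line) - parse_ts(start_line)).total_seconds()


NODE_TIMEOUT_S = 15000 / 1000  # cluster-node-timeout is given in milliseconds

# 10.12.28.165:6502: rewrite start vs. the FAIL report seen on another node
gap_6502 = seconds_between(
    "13306:M 08 Mar 18:55:02.597 * Background append only file rewriting started by pid 15396",
    "22825:M 08 Mar 18:55:30.534 * FAIL message received from da493af5bb3d15fc563961de09567a47787881be about 5c541d3a765e087af7775ba308f51ffb2aa54151",
)
# 10.12.30.107:6500: the same comparison within its own log
gap_6500 = seconds_between(
    "22825:M 08 Mar 18:55:41.587 * Background append only file rewriting started by pid 23628",
    "22825:M 08 Mar 18:56:30.830 * FAIL message received from 0decbe940c6f4d4330fae5a9c129f1ad4932405d about 5ac747878f881349aa6a62b179176ddf603e034c",
)
print(f"{gap_6502:.3f}s and {gap_6500:.3f}s vs node timeout {NODE_TIMEOUT_S:.0f}s")
```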

Check the THP (Transparent Huge Pages) Linux kernel setting, because the AOF diff was only 0.02 MB while copy-on-write used 4067 MB.
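A minimal sketch of the check this answer suggests, assuming a Linux host that exposes /sys/kernel/mm/transparent_hugepage/enabled, where the kernel marks the active mode in brackets (for example "always madvise [never]"). The helper names are hypothetical. Redis itself logs a startup warning when the active mode is always and recommends setting it to never, since huge pages make copy-on-write after fork far more expensive.

```python
import re
from pathlib import Path
from typing import Optional

THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> Optional[str]:
    """Return the bracketed (active) mode from the sysfs file contents."""
    m = re.search(r"\[(\w+)\]", text)
    return m.group(1) if m else None


def thp_mode(path: Path = THP_PATH) -> Optional[str]:
    """Read the active THP mode, or None if the file is unavailable."""
    try:
        return active_thp_mode(path.read_text())
    except OSError:
        return None  # not Linux, or THP not available in this kernel


def thp_warning_applies(mode: Optional[str]) -> bool:
    # Mirrors the condition Redis warns about: THP active for all memory.
    return mode == "always"
```

On an affected host, thp_warning_applies(thp_mode()) returning True would match the situation the answer describes.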

Unable to diagnose MISCONF redis issue while launching celery worker server

I use a celery worker server with redis as the broker url (for receiving tasks) as well as the result backend.
from celery import Celery

BROKER_URL = 'redis://localhost:6379/2'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/2'
app = Celery('myceleryapp', broker=BROKER_URL, backend=CELERY_RESULT_BACKEND)
I launch the Celery worker server with:
celery -A myceleryapp worker -l info -c 8
The worker processes start processing my tasks from the Redis queue until, at some point, I receive the infamous MISCONF Redis error and the Celery worker process terminates.
Unrecoverable error: ResponseError('MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.',)
I checked the Redis log files in /var/log/redis, and the tail end of the file has the following:
24745:C 19 Aug 09:20:26.169 * RDB: 0 MB of memory used by copy-on-write
1590:M 19 Aug 09:20:26.247 * Background saving terminated with success
1590:M 19 Aug 09:25:27.080 * 10 changes in 300 seconds. Saving...
1590:M 19 Aug 09:25:27.081 * Background saving started by pid 25397
25397:C 19 Aug 09:25:27.082 # Write error saving DB on disk: No space left on device
1590:M 19 Aug 09:25:27.181 # Background saving error
1590:M 19 Aug 09:51:03.042 * 1 changes in 900 seconds. Saving...
1590:M 19 Aug 09:51:03.042 * Background saving started by pid 26341
26341:C 19 Aug 09:51:03.405 * DB saved on disk
26341:C 19 Aug 09:51:03.405 * RDB: 22 MB of memory used by copy-on-write
1590:M 19 Aug 09:51:03.487 * Background saving terminated with success
The dump.rdb file is being written to /var/lib/redis/dump.rdb.
Since the logs reported "No space left on device", I checked the disk space where /var is mounted, and there seems to be sufficient space left (1.2 GB).
How do I get to the root cause of this error if there is enough disk space? Of course, to prevent this error, I could run config set stop-writes-on-bgsave-error no in redis-cli. But I want to get to the root cause of this error. Any help or pointers?

Maybe this is caused by the swap file: if the swap file takes up the 1.2 GB of free space on your disk, Redis complains that there is no space to write.
Run the "swapon -s" command to check this.
I think 1.2 GB is not enough if this disk also holds swapped-out RAM pages. You should move the RDB dir to a larger disk.
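A minimal sketch for checking free space where it matters, assuming Python 3 on the Redis host: Redis writes a snapshot to a temporary file in its configured dir and only then renames it over the existing dump, so that filesystem needs room for a complete new RDB while the old one still exists. The helper names and the headroom factor are hypothetical choices for illustration.

```python
import shutil
from pathlib import Path


def current_rdb_size(rdb_path: str) -> int:
    """Size of the existing dump in bytes, or 0 if there is none yet."""
    path = Path(rdb_path)
    return path.stat().st_size if path.exists() else 0


def has_room_for_snapshot(rdb_dir: str, expected_rdb_bytes: int, headroom: float = 1.5) -> bool:
    """True if rdb_dir's filesystem can hold a new snapshot of roughly that size.

    headroom is an arbitrary safety factor (an assumption), because the next
    snapshot can be larger than the current one.
    """
    free = shutil.disk_usage(rdb_dir).free
    return free >= expected_rdb_bytes * headroom
```

For the setup in the question, has_room_for_snapshot("/var/lib/redis", current_rdb_size("/var/lib/redis/dump.rdb")) would check the filesystem Redis actually writes to, rather than whatever df reports for another mount.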

Failed opening .rdb for saving: Permission denied - started after a while of running successfully

I have had a Node web service running successfully on an AWS Ubuntu server for over a month, with requests cached using Redis.
Yesterday I started getting the following error from some of my routes:
MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.
I was able to stop the error occurring by using:
config set stop-writes-on-bgsave-error no
as suggested in the answers to this question, but it doesn't actually solve the underlying problem.
To find the underlying problem I checked the logs and found the following had started happening:
[1105] 09 Aug 13:17:14.800 - 0 clients connected (0 slaves), 797680 bytes in use
[1105] 09 Aug 13:17:15.101 * 1 changes in 900 seconds. Saving...
[1105] 09 Aug 13:17:15.101 * Background saving started by pid 28090
[28090] 09 Aug 13:17:15.101 # Failed opening .rdb for saving: Permission denied
[1105] 09 Aug 13:17:15.201 # Background saving error
Over the weekend no one had been using the server, but before the weekend the logs were fine, and we were getting no errors:
[12521] 06 Aug 04:49:27.308 - 0 clients connected (0 slaves), 803352 bytes in use
[12521] 06 Aug 04:49:29.012 * 1 changes in 900 seconds. Saving...
[12521] 06 Aug 04:49:29.012 * Background saving started by pid 26663
[26663] 06 Aug 04:49:29.014 * DB saved on disk
[26663] 06 Aug 04:49:29.014 * RDB: 2 MB of memory used by copy-on-write
[12521] 06 Aug 04:49:29.112 * Background saving terminated with success
As I said, no one has touched this server in the intervening time.
Looking for others with the same problem, I found this question. I checked the ownership and permissions on the directory and the db file, as suggested in the answers there:
drwxr-xr-x 2 redis redis 26 Aug 6 06:55 redis
-rw-r--r-- 1 redis redis 18 Aug 6 06:55 dump-6379.rdb
The permissions and ownership both look OK to me, but I have noticed that the date on the file and folder falls between the last time I saw the service working and the first time it failed. Unfortunately, that hasn't really helped me figure out what to do next, and I am at a bit of a loss.
I am looking for suggestions for next steps to find the cause of the problem, or at least a way of making redis able to write again.
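A possible next diagnostic step, sketched under assumptions: run it as the same user as redis-server (for example with sudo -u redis). Because Redis creates a new temporary file in its working directory before renaming it into place, it needs write permission on the directory it is actually using, so it is worth confirming that the directory reported by CONFIG GET dir in redis-cli is the one whose permissions were checked above. The helper name can_create_file_in is hypothetical.

```python
import os
import tempfile


def can_create_file_in(directory: str) -> bool:
    """Try what a snapshot needs: create, write, and remove a new file in directory."""
    try:
        fd, path = tempfile.mkstemp(prefix="write-test-", suffix=".rdb", dir=directory)
    except OSError:
        return False  # e.g. EACCES (Permission denied) or a missing directory
    try:
        os.write(fd, b"ok")
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        os.remove(path)
```

If this returns False for the directory Redis reports, the "Permission denied" in the log is explained by that directory rather than by the dump file itself.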
