Kubernetes Redis Cluster PubSub Channels not getting synched on replica - redis

I have set up a Redis cluster on Kubernetes, the cluster state is OK and the replica is connected to the master. Also as per the logs, the full synchronization is also completed. The logs are as follows:-
9:M 22 Oct 12:24:18.209 * Slave 192.168.1.41:6379 asks for synchronization
9:M 22 Oct 12:24:18.209 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for '794b9c74abe40ac90c752f32a102078e063ff636', my replication IDs are '0f499740a46665d12fab921838297273279ad136' and '0000000000000000000000000000000000000000')
9:M 22 Oct 12:24:18.209 * Starting BGSAVE for SYNC with target: disk
9:M 22 Oct 12:24:18.211 * Background saving started by pid 231
231:C 22 Oct 12:24:18.215 * DB saved on disk
231:C 22 Oct 12:24:18.216 * RDB: 4 MB of memory used by copy-on-write
9:M 22 Oct 12:24:18.224 * Background saving terminated with success
9:M 22 Oct 12:24:18.224 * Synchronization with slave 192.168.1.41:6379 succeeded
Still, when I check the List of the PubSub Channels on the replica, it does not show the channels and thus it breaks the PubSub flow.
Any help/advise is appreciated.

Related

Redis timeout with almost no data in the database, using the .NET client

I received this error:
StackExchange.Redis.RedisTimeoutException: Timeout performing GET (5000ms),
next: GET RetryCount, inst: 3, qu: 0, qs: 1, aw: False, rs: ReadAsync, ws: Idle, in: 7, in-pipe: 0, out-pipe: 0,
serverEndpoint: redis:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: 18745af38fec,
IOCP: (Busy=0,Free=1000,Min=1,Max=1000),
WORKER: (Busy=6,Free=32761,Min=1,Max=32767), v: 2.1.58.34321
(Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
We can see that there is only a single message in the queue (qs=1) and that there are only 7 bytes waiting to be read (in=7). Redis is used by 2 processes and holds settings for the system and store logs.
It was a re-install so no logs were written and the database has probably 2-3kb of data :)
This is the only output from Redis:
1:C 12 Sep 2020 15:20:49.293 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 12 Sep 2020 15:20:49.293 # Redis version=6.0.8, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 12 Sep 2020 15:20:49.293 # Configuration loaded
1:M 12 Sep 2020 15:20:49.296 * Running mode=standalone, port=6379.
1:M 12 Sep 2020 15:20:49.296 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 12 Sep 2020 15:20:49.296 # Server initialized
1:M 12 Sep 2020 15:20:49.296 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memor
y=1' for this to take effect.
1:M 12 Sep 2020 15:20:49.296 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepag
e/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
1:M 12 Sep 2020 15:20:49.305 * DB loaded from append only file: 0.000 seconds
1:M 12 Sep 2020 15:20:49.305 * Ready to accept connections
so it looks like nothing went wrong on that side.
The 2 processes accessing it are in docker containers, so does Redis. All on a single AWS instance with a lot of ram and disk available.
this is also a one time event, it has never happened before with the same config.
I'm not very experienced with Redis; is there anything in the error message that would look suspicious?

Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis

I have run below Test-1 and Test-2 for longer run for performance test with redis configuration values specified, still we see the highlighted error-1 & 2 message and cluster is failing for some time, few of our processing failing. How to solve this problem.
please anyone have suggestion to avoid cluster fail which is goes longer than 10seconds, cluster is not coming up within 3 retry attempts (spring retry template we are using for retry mechanism try count is set to 3, and retry after 5sec, its exponential way next attempts) using Jedis client.
Error-1: Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
Error-2: Marking node a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba as failing (quorum reached).
Test-1:
Run the test with Redis Setting:
"appendfsync"="yes"
"appendonly"="no"
[root#rdcapdev1-redis-cache3 redis-3.2.5]# src/redis-cli -p 6379
127.0.0.1:6379> CONFIG GET **aof***
1) "auto-aof-rewrite-percentage"
2) "30"
3) "auto-aof-rewrite-min-size"
4) "67108864"
5) "aof-rewrite-incremental-fsync"
6) "yes"
7) "aof-load-truncated"
8) "yes"
127.0.0.1:6379> exit
[root#rdcapdev1-redis-cache3 redis-3.2.5]# src/redis-cli -p 6380
127.0.0.1:6380> CONFIG GET **aof***
1) "auto-aof-rewrite-percentage"
2) "30"
3) "auto-aof-rewrite-min-size"
4) "67108864"
5) "aof-rewrite-incremental-fsync"
6) "yes"
7) "aof-load-truncated"
8) "yes"
127.0.0.1:6380> clear
Observation:
1. The redis failover occurred for ~40 sec.
2. There are around 20 documents failed on FX and OCR level. Due to inability to write/read the files to Redis.
3. This has been happened when ~50% of RAM got utilized.
4. The Master-slave configuration has be reshuffled as below after this failover.
5. Below are the few highlights of the redis log, plese refer that attached log for more detail.
6. I have logs for this, for more details: 30Per_AofRW_2.zip
Redis1 Master log:
2515:C 05 May 11:06:30.343 * DB saved on disk
2515:C 05 May 11:06:30.379 * RDB: 15 MB of memory used by copy-on-write
837:S 05 May 11:06:30.429 * Background saving terminated with success
837:S 05 May 11:11:31.024 * 10 changes in 300 seconds. Saving...
837:S 05 May 11:11:31.067 * Background saving started by pid 2534
837:S 05 May 11:12:24.802 * FAIL message received from 6b8d49e9db288b13071559c667e95e3691ce8bd0 about ce62a26102ef54f43fa7cca64d24eab45cf42a61
837:S 05 May 11:12:27.049 * Clear FAIL state for node ce62a26102ef54f43fa7cca64d24eab45cf42a61: slave is reachable again.
2534:C 05 May 11:12:31.110 * DB saved on disk
Redis2 Master log:
837:M 05 May 10:30:22.216 * Marking node a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba as failing (quorum reached).
837:M 05 May 10:30:22.216 # Cluster state changed: fail
837:M 05 May 10:30:23.148 # Failover auth granted to 6b8d49e9db288b13071559c667e95e3691ce8bd0 for epoch 12
837:M 05 May 10:30:23.188 # Cluster state changed: ok
837:M 05 May 10:30:27.227 * Clear FAIL state for node a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba: slave is reachable again.
837:M 05 May 10:35:22.017 * 10 changes in 300 seconds. Saving...
.
.
.
837:M 05 May 11:12:23.592 * FAIL message received from 6b8d49e9db288b13071559c667e95e3691ce8bd0 about ce62a26102ef54f43fa7cca64d24eab45cf42a61
837:M 05 May 11:12:27.045 * Clear FAIL state for node ce62a26102ef54f43fa7cca64d24eab45cf42a61: slave is reachable again.
Redis3 Master Log:
833:M 05 May 10:30:22.217 * FAIL message received from 83f6a9589aa1bce8932a367894fa391edd0ce269 about a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba
833:M 05 May 10:30:22.217 # Cluster state changed: fail
833:M 05 May 10:30:23.149 # Failover auth granted to 6b8d49e9db288b13071559c667e95e3691ce8bd0 for epoch 12
833:M 05 May 10:30:23.189 # Cluster state changed: ok
1822:C 05 May 10:30:27.397 * DB saved on disk
1822:C 05 May 10:30:27.428 * RDB: 8 MB of memory used by copy-on-write
833:M 05 May 10:30:27.528 * Background saving terminated with success
833:M 05 May 10:30:27.828 * Clear FAIL state for node a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba: slave is reachable again.
HOST: localhost PORT: 6379
machine master slave
10.2.1.233 0.00 2.00
10.2.1.46 2.00 0.00
10.2.1.202 1.00 1.00
MASTER SLAVE INFO
hashCode master slave hashSlot
81ae2d757f57f36fa1df6e930af3b072084ba3e8 10.2.1.202:6379 10.2.1.233:6380, 10923-16383
6b8d49e9db288b13071559c667e95e3691ce8bd0 10.2.1.46:6380 10.2.1.233:6379, 0-5460
83f6a9589aa1bce8932a367894fa391edd0ce269 10.2.1.46:6379 10.2.1.202:6380, 5461-10922
6b8d49e9db288b13071559c667e95e3691ce8bd0 10.2.1.46:6380 master - 0 1493981044497 12 connected 0-5460
81ae2d757f57f36fa1df6e930af3b072084ba3e8 10.2.1.202:6379 master - 0 1493981045500 3 connected 10923-16383
ce62a26102ef54f43fa7cca64d24eab45cf42a61 10.2.1.202:6380 slave 83f6a9589aa1bce8932a367894fa391edd0ce269 0 1493981043495 10 connected
ac630108d1556786a4df74945cfe35db981d15fa 10.2.1.233:6380 slave 81ae2d757f57f36fa1df6e930af3b072084ba3e8 0 1493981042492 11 connected
83f6a9589aa1bce8932a367894fa391edd0ce269 10.2.1.46:6379 master - 0 1493981044497 2 connected 5461-10922
a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba 10.2.1.233:6379 myself,slave 6b8d49e9db288b13071559c667e95e3691ce8bd0 0 0 1 connected
Test-2:
Run the test with Redis Setting:
"appendfsync"="no"
"appendonly"="yes"
Observation:
1. The redis failover occurred for ~40 sec.
2. There are around 20 documents failed on FX and OCR level. Due to inability to write/read the files to Redis.
3. This has been happened when ~50% of RAM got utilized.
4. The Master-slave configuration has be reshuffled as below after this failover.
5. Below are the few highlights of the redis log, plese refer that attached log for more detail.
30Per_AofRW_2.zip
Redis1 Master log:
2515:C 05 May 11:06:30.343 * DB saved on disk
2515:C 05 May 11:06:30.379 * RDB: 15 MB of memory used by copy-on-write
837:S 05 May 11:06:30.429 * Background saving terminated with success
837:S 05 May 11:11:31.024 * 10 changes in 300 seconds. Saving...
837:S 05 May 11:11:31.067 * Background saving started by pid 2534
837:S 05 May 11:12:24.802 * FAIL message received from 6b8d49e9db288b13071559c667e95e3691ce8bd0 about ce62a26102ef54f43fa7cca64d24eab45cf42a61
837:S 05 May 11:12:27.049 * Clear FAIL state for node ce62a26102ef54f43fa7cca64d24eab45cf42a61: slave is reachable again.
2534:C 05 May 11:12:31.110 * DB saved on disk
Redis2 Master log:
5306:M 03 Apr 09:02:36.947 * Background saving terminated with success
5306:M 03 Apr 09:02:49.574 * Starting automatic rewriting of AOF on 3% growth
5306:M 03 Apr 09:02:49.583 * Background append only file rewriting started by pid 12864
5306:M 03 Apr 09:02:54.050 * AOF rewrite child asks to stop sending diffs.
12864:C 03 Apr 09:02:54.051 * Parent agreed to stop sending diffs. Finalizing AOF...
12864:C 03 Apr 09:02:54.051 * Concatenating 0.00 MB of AOF diff received from parent.
12864:C 03 Apr 09:02:54.051 * SYNC append only file rewrite performed
12864:C 03 Apr 09:02:54.058 * AOF rewrite: 2 MB of memory used by copy-on-write
5306:M 03 Apr 09:02:54.098 * Background AOF rewrite terminated with success
5306:M 03 Apr 09:02:54.098 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
5306:M 03 Apr 09:02:54.098 * Background AOF rewrite finished successfully
5306:M 03 Apr 09:04:01.843 * Starting automatic rewriting of AOF on 3% growth
5306:M 03 Apr 09:04:01.853 * Background append only file rewriting started by pid 12867
5306:M 03 Apr 09:04:11.657 * AOF rewrite child asks to stop sending diffs.
12867:C 03 Apr 09:04:11.657 * Parent agreed to stop sending diffs. Finalizing AOF...
12867:C 03 Apr 09:04:11.657 * Concatenating 0.00 MB of AOF diff received from parent.
12867:C 03 Apr 09:04:11.657 * SYNC append only file rewrite performed
12867:C 03 Apr 09:04:11.664 * AOF rewrite: 2 MB of memory used by copy-on-write
5306:M 03 Apr 09:04:11.675 * Background AOF rewrite terminated with success
5306:M 03 Apr 09:04:11.675 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
5306:M 03 Apr 09:04:11.675 * Background AOF rewrite finished successfully
5306:M 03 Apr 09:04:48.054 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
5306:M 03 Apr 09:05:28.571 * Starting automatic rewriting of AOF on 3% growth
5306:M 03 Apr 09:05:28.581 * Background append only file rewriting started by pid 12873
5306:M 03 Apr 09:05:33.300 * AOF rewrite child asks to stop sending diffs.
12873:C 03 Apr 09:05:33.300 * Parent agreed to stop sending diffs. Finalizing AOF...
12873:C 03 Apr 09:05:33.300 * Concatenating 2.09 MB of AOF diff received from parent.
12873:C 03 Apr 09:05:33.329 * SYNC append only file rewrite performed
12873:C 03 Apr 09:05:33.336 * AOF rewrite: 11 MB of memory used by copy-on-write
5306:M 03 Apr 09:05:33.395 * Background AOF rewrite terminated with success
5306:M 03 Apr 09:05:33.395 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
5306:M 03 Apr 09:05:33.395 * Background AOF rewrite finished successfully
5306:M 03 Apr 09:07:37.082 * 10 changes in 300 seconds. Saving...
5306:M 03 Apr 09:07:37.092 * Background saving started by pid 12875
12875:C 03 Apr 09:07:47.016 * DB saved on disk
12875:C 03 Apr 09:07:47.024 * RDB: 5 MB of memory used by copy-on-write
5306:M 03 Apr 09:07:47.113 * Background saving terminated with success
5306:M 03 Apr 09:07:51.622 * Starting automatic rewriting of AOF on 3% growth
5306:M 03 Apr 09:07:51.632 * Background append only file rewriting started by pid 12876
5306:M 03 Apr 09:07:56.559 * AOF rewrite child asks to stop sending diffs.
12876:C 03 Apr 09:07:56.559 * Parent agreed to stop sending diffs. Finalizing AOF...
12876:C 03 Apr 09:07:56.559 * Concatenating 0.00 MB of AOF diff received from parent.
12876:C 03 Apr 09:07:56.559 * SYNC append only file rewrite performed
12876:C 03 Apr 09:07:56.567 * AOF rewrite: 2 MB of memory used by copy-on-write
5306:M 03 Apr 09:07:56.645 * Background AOF rewrite terminated with success
5306:M 03 Apr 09:07:56.645 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
5306:M 03 Apr 09:07:56.645 * Background AOF rewrite finished successfully
5306:M 03 Apr 09:12:48.071 * 10 changes in 300 seconds. Saving...
5306:M 03 Apr 09:12:48.081 * Background saving started by pid 12882
12882:C 03 Apr 09:12:58.381 * DB saved on disk
12882:C 03 Apr 09:12:58.389 * RDB: 5 MB of memory used by copy-on-write
5306:M 03 Apr 09:12:58.403 * Background saving terminated with success
5306:M 03 Apr 10:17:33.005 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
5306:M 03 Apr 10:22:42.042 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
5306:M 03 Apr 10:27:51.039 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
5306:M 03 Apr 11:10:10.606 * Marking node a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba as failing (quorum reached).
5306:M 03 Apr 11:10:10.607 # Cluster state changed: fail
5306:M 03 Apr 11:10:10.608 * FAIL message received from 83f6a9589aa1bce8932a367894fa391edd0ce269 about ac630108d1556786a4df74945cfe35db981d15fa
5306:M 03 Apr 11:10:11.594 # Failover auth granted to 6b8d49e9db288b13071559c667e95e3691ce8bd0 for epoch 7
HOST: localhost PORT: 6379
machine master slave
10.2.1.233 0.00 2.00
10.2.1.46 2.00 0.00
10.2.1.202 1.00 1.00
MASTER SLAVE INFO
hashCode master slave hashSlot
81ae2d757f57f36fa1df6e930af3b072084ba3e8 10.2.1.202:6379 10.2.1.233:6380, 10923-16383
6b8d49e9db288b13071559c667e95e3691ce8bd0 10.2.1.46:6380 10.2.1.233:6379, 0-5460
83f6a9589aa1bce8932a367894fa391edd0ce269 10.2.1.46:6379 10.2.1.202:6380, 5461-10922
6b8d49e9db288b13071559c667e95e3691ce8bd0 10.2.1.46:6380 master - 0 1493981044497 12 connected 0-5460
81ae2d757f57f36fa1df6e930af3b072084ba3e8 10.2.1.202:6379 master - 0 1493981045500 3 connected 10923-16383
ce62a26102ef54f43fa7cca64d24eab45cf42a61 10.2.1.202:6380 slave 83f6a9589aa1bce8932a367894fa391edd0ce269 0 1493981043495 10 connected
ac630108d1556786a4df74945cfe35db981d15fa 10.2.1.233:6380 slave 81ae2d757f57f36fa1df6e930af3b072084ba3e8 0 1493981042492 11 connected
83f6a9589aa1bce8932a367894fa391edd0ce269 10.2.1.46:6379 master - 0 1493981044497 2 connected 5461-10922
a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba 10.2.1.233:6379 myself,slave 6b8d49e9db288b13071559c667e95e3691ce8bd0 0 0 1 connected

redis-master slave setup failing

I have started server with port 6001 as master with persistence aof turned off,slave with port 6002 as master of 6001.However on startup of slave i am getting below error in infinite loop also note able to find any error logs of the same..
Slave infinite loop logs :
[5556] 20 Aug 21:34:28.499 # Server started, Redis version 3.2.100
[5556] 20 Aug 21:34:28.500 * DB loaded from disk: 0.001 seconds
[5556] 20 Aug 21:34:28.500 * The server is now ready to accept connections on port 6002
[5556] 20 Aug 21:34:28.501 * Connecting to MASTER localhost:6001
[5556] 20 Aug 21:34:28.513 * MASTER <-> SLAVE sync started
[5556] 20 Aug 21:34:29.513 * Non blocking connect for SYNC fired the event.
[5556] 20 Aug 21:34:29.513 # Sending command to master in replication handshake: -Writing to master: Unknown error
[5556] 20 Aug 21:34:29.516 * Connecting to MASTER localhost:6001
[5556] 20 Aug 21:34:29.517 * MASTER <-> SLAVE sync started
Issue resolved,redis.conf contained 127.0.0.1 as bind value,and from slave redis.conf file ,I had SLAVE OF localhost .Replacing localhost with 127.0.0.1 resolved the issue

Unable to diagnose MISCONF redis issue while launching celery worker server

I use a celery worker server with redis as the broker url (for receiving tasks) as well as the result backend.
BROKER_URL = 'redis://localhost:6379/2'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/2'
app = Celery('myceleryapp', broker=BROKER_URL,backend=CELERY_RESULT_BACKEND)
I launch the celery worker server using celery -A myceleryapp worker -l info -c 8
The worker processes start processing my tasks from the redis queue until at some point, I receive the infamous MISCONF redis error and the celery worker process terminates.
Unrecoverable error: ResponseError('MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.',)
I checked the redis log files in /var/log/redis and the tail end of the file has the following
24745:C 19 Aug 09:20:26.169 * RDB: 0 MB of memory used by copy-on-write
1590:M 19 Aug 09:20:26.247 * Background saving terminated with success
1590:M 19 Aug 09:25:27.080 * 10 changes in 300 seconds. Saving...
1590:M 19 Aug 09:25:27.081 * Background saving started by pid 25397
25397:C 19 Aug 09:25:27.082 # Write error saving DB on disk: No space left on device
1590:M 19 Aug 09:25:27.181 # Backgroun1590:M 19 Aug 09:51:03.042 * 1 changes in 900 seconds. Saving...
1590:M 19 Aug 09:51:03.042 * Background saving started by pid 26341
26341:C 19 Aug 09:51:03.405 * DB saved on disk
26341:C 19 Aug 09:51:03.405 * RDB: 22 MB of memory used by copy-on-write
1590:M 19 Aug 09:51:03.487 * Background saving terminated with success
The dump.rdb file is being written to /var/lib/redis/dump.rdb.
Since the logs reported a No space left on device, I checked the disk space where /var is mounted and there seems to be sufficient space left (1.2GB).
How do I get to the root cause of this error if there is enough disk space? Of course, to prevent this error from happening, I could set config set stop-writes-on-bgsave-error no in redis-cli. But I want to get to the root cause of this error. Any help or pointers?
Maybe this is caused by the swap file. Because the swap file took the 1.2Gb space of your disk. So redis complains No space to write.
Try this "swapon -s" command to check this.
I think 1.2Gb is not enough if this disk accept the RAM page swap. you should change the dir of RDB in a more big dir.

Failed opening .rdb for saving: Permission denied - started after a while of running successfully

I have had a node web service running successfully on an aws ubuntu server for over a month, with the requests cached using redis.
Yesterday I started getting the following error from some of my routes:
MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.
I was able to stop the error occurring by using:
config set stop-writes-on-bgsave-error no
as suggested in the answers to this question, but it doesn't actually solve the underlying problem.
To find the underlying problem I checked the logs and found the following had started happening:
[1105] 09 Aug 13:17:14.800 - 0 clients connected (0 slaves), 797680 bytes in use
[1105] 09 Aug 13:17:15.101 * 1 changes in 900 seconds. Saving...
[1105] 09 Aug 13:17:15.101 * Background saving started by pid 28090
[28090] 09 Aug 13:17:15.101 # Failed opening .rdb for saving: Permission denied
[1105] 09 Aug 13:17:15.201 # Background saving error
Over the weekend no one had been using the server, but before the weekend the logs were fine, and we were getting no errors:
[12521] 06 Aug 04:49:27.308 - 0 clients connected (0 slaves), 803352 bytes in use
[12521] 06 Aug 04:49:29.012 * 1 changes in 900 seconds. Saving...
[12521] 06 Aug 04:49:29.012 * Background saving started by pid 26663
[26663] 06 Aug 04:49:29.014 * DB saved on disk
[26663] 06 Aug 04:49:29.014 * RDB: 2 MB of memory used by copy-on-write
[12521] 06 Aug 04:49:29.112 * Background saving terminated with success
As I said, no one has touched this server in the intervening time.
Looking around for people having the same problem I found this question. I checked the ownership and permissions on the directory and db file as suggested in the answers there:
drwxr-xr-x 2 redis redis 26 Aug 6 06:55 redis
-rw-r--r-- 1 redis redis 18 Aug 6 06:55 dump-6379.rdb
The permissions and ownership both look ok to me, but I have noticed that the date on the file and folder is between the last time I saw the service working and the first time it failed. Unfortunately that hasn't really helped me with what to do next and I am at a bit of a loss.
I am looking for suggestions for next steps to find the cause of the problem, or at least a way of making redis able to write again.