Unable to diagnose MISCONF redis issue while launching celery worker server

Unable to diagnose MISCONF redis issue while launching celery worker server - redis

I use a celery worker server with redis as the broker url (for receiving tasks) as well as the result backend.
BROKER_URL = 'redis://localhost:6379/2'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/2'
app = Celery('myceleryapp', broker=BROKER_URL,backend=CELERY_RESULT_BACKEND)
I launch the celery worker server using celery -A myceleryapp worker -l info -c 8
The worker processes start processing my tasks from the redis queue until at some point, I receive the infamous MISCONF redis error and the celery worker process terminates.
Unrecoverable error: ResponseError('MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.',)
I checked the redis log files in /var/log/redis and the tail end of the file has the following
24745:C 19 Aug 09:20:26.169 * RDB: 0 MB of memory used by copy-on-write
1590:M 19 Aug 09:20:26.247 * Background saving terminated with success
1590:M 19 Aug 09:25:27.080 * 10 changes in 300 seconds. Saving...
1590:M 19 Aug 09:25:27.081 * Background saving started by pid 25397
25397:C 19 Aug 09:25:27.082 # Write error saving DB on disk: No space left on device
1590:M 19 Aug 09:25:27.181 # Backgroun1590:M 19 Aug 09:51:03.042 * 1 changes in 900 seconds. Saving...
1590:M 19 Aug 09:51:03.042 * Background saving started by pid 26341
26341:C 19 Aug 09:51:03.405 * DB saved on disk
26341:C 19 Aug 09:51:03.405 * RDB: 22 MB of memory used by copy-on-write
1590:M 19 Aug 09:51:03.487 * Background saving terminated with success
The dump.rdb file is being written to /var/lib/redis/dump.rdb.
Since the logs reported a No space left on device, I checked the disk space where /var is mounted and there seems to be sufficient space left (1.2GB).
How do I get to the root cause of this error if there is enough disk space? Of course, to prevent this error from happening, I could set config set stop-writes-on-bgsave-error no in redis-cli. But I want to get to the root cause of this error. Any help or pointers?

Maybe this is caused by the swap file. Because the swap file took the 1.2Gb space of your disk. So redis complains No space to write.
Try this "swapon -s" command to check this.
I think 1.2Gb is not enough if this disk accept the RAM page swap. you should change the dir of RDB in a more big dir.

Related

MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk

i use reddison client but when the client has error “MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk”
Unable to send PING command over channel: [id: 0x04130153, L:/171.20.0.8:38080 - R:10.3.236.102/10.3.236.102:6379]
org.redisson.client.RedisException: MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.. channel: [id: 0x04130153, L:/171.20.0.8:38080 - R:10.3.236.102/10.3.236.102:6379] command: (PING), params: []
the redis server has no error
{"log":"3443340:C 09 Apr 00:12:41.648 * DB saved on disk\n","stream":"stdout","time":"2022-04-09T00:12:41.649083457Z"}
{"log":"3443340:C 09 Apr 00:12:41.772 * RDB: 38 MB of memory used by copy-on-write\n","stream":"stdout","time":"2022-04-09T00:12:41.77335587Z"}
{"log":"7:M 09 Apr 00:12:42.024 * Background saving terminated with success\n","stream":"stdout","time":"2022-04-09T00:12:42.025019006Z"}
{"log":"7:M 09 Apr 00:12:45.027 *

beacuse the server time is not correct

Redis timeout with almost no data in the database, using the .NET client

I received this error:
StackExchange.Redis.RedisTimeoutException: Timeout performing GET (5000ms),
next: GET RetryCount, inst: 3, qu: 0, qs: 1, aw: False, rs: ReadAsync, ws: Idle, in: 7, in-pipe: 0, out-pipe: 0,
serverEndpoint: redis:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: 18745af38fec,
IOCP: (Busy=0,Free=1000,Min=1,Max=1000),
WORKER: (Busy=6,Free=32761,Min=1,Max=32767), v: 2.1.58.34321
(Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
We can see that there is only a single message in the queue (qs=1) and that there are only 7 bytes waiting to be read (in=7). Redis is used by 2 processes and holds settings for the system and store logs.
It was a re-install so no logs were written and the database has probably 2-3kb of data :)
This is the only output from Redis:
1:C 12 Sep 2020 15:20:49.293 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 12 Sep 2020 15:20:49.293 # Redis version=6.0.8, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 12 Sep 2020 15:20:49.293 # Configuration loaded
1:M 12 Sep 2020 15:20:49.296 * Running mode=standalone, port=6379.
1:M 12 Sep 2020 15:20:49.296 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 12 Sep 2020 15:20:49.296 # Server initialized
1:M 12 Sep 2020 15:20:49.296 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memor
y=1' for this to take effect.
1:M 12 Sep 2020 15:20:49.296 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepag
e/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
1:M 12 Sep 2020 15:20:49.305 * DB loaded from append only file: 0.000 seconds
1:M 12 Sep 2020 15:20:49.305 * Ready to accept connections
so it looks like nothing went wrong on that side.
The 2 processes accessing it are in docker containers, so does Redis. All on a single AWS instance with a lot of ram and disk available.
this is also a one time event, it has never happened before with the same config.
I'm not very experienced with Redis; is there anything in the error message that would look suspicious?

What is Redis change its own configurations

Redis change its own config dir to /etc/cron.d and dbfile to ntp instead of default configuration. Once we restart the redis it will reset to /var/lib/redis and dump.rdb but after awhile, it gives "Failed opening the RDB file" error
Default dire and rdb file has correct permission and redis only allow for internal IPs.
cli output
127.0.0.1:6381> CONFIG GET dir
1) "dir"
2) "/etc/cron.d"
127.0.0.1:6381> CONFIG GET "dbfilename"
1) "dbfilename"
2) "ntp"
/var/log/redis/redis-server.log
3204:M 21 May 16:07:19.124 * Background saving terminated with success
3204:M 21 May 16:12:18.962 * 10000 changes in 60 seconds. Saving...
3204:M 21 May 16:12:18.967 * Background saving started by pid 25469
25469:C 21 May 16:12:20.931 * DB saved on disk
25469:C 21 May 16:12:20.934 * RDB: 3 MB of memory used by copy-on-write
3204:M 21 May 16:12:20.968 * Background saving terminated with success
3204:M 21 May 16:17:21.082 * 10 changes in 300 seconds. Saving...
3204:M 21 May 16:17:21.088 * Background saving started by pid 25865
25865:C 21 May 16:17:22.800 * DB saved on disk
25865:C 21 May 16:17:22.803 * RDB: 3 MB of memory used by copy-on-write
3204:M 21 May 16:17:22.891 * Background saving terminated with success
3204:M 21 May 16:17:43.669 # Failed opening the RDB file root (in server root dir /var/spool/cron) for saving: Read-only file system
3204:M 21 May 16:17:45.320 # Failed opening the RDB file ntp (in server root dir /etc/cron.d) for saving: Read-only file system
3204:M 21 May 16:22:23.086 * 10 changes in 300 seconds. Saving...
3204:M 21 May 16:22:23.092 * Background saving started by pid 26264
26264:C 21 May 16:22:23.093 # Failed opening the RDB file ntp (in server root dir /etc/cron.d) for saving: Read-only file system
3204:M 21 May 16:22:23.194 # Background saving error
3204:M 21 May 16:22:29.104 * 10 changes in 300 seconds. Saving...
3204:M 21 May 16:22:29.109 * Background saving started by pid 26265
26265:C 21 May 16:22:29.109 # Failed opening the RDB file ntp (in server root dir /etc/cron.d) for saving: Read-only file system
3204:M 21 May 16:22:29.209 # Background saving error
3204:M 21 May 16:22:35.016 * 10 changes in 300 seconds. Saving...

Is your server publicly accessibly over the internet?
The most likely explanation is that somebody is connecting to the redis and sending commands to reconfigure it remotely, trying to take control over the server.
There are bots scanning the internet 24/7 looking for exposed software and known vulnerabilities. Quick rule of thumb is that a new service coming up online will be discovered and attacked in less than 5 minutes. (Try running an unpatched Windows XP server and be amazed how short it lasts).
Consider that redis and potentially the whole server was compromised. I hope for you that there was no sensitive information in this redis or that's a data breach.
Block public access, decommission the virtual machine, setup a new one from scratch.
Related redis ticket: https://github.com/antirez/redis/issues/3594

Kubernetes Redis Cluster PubSub Channels not getting synched on replica

I have set up a Redis cluster on Kubernetes, the cluster state is OK and the replica is connected to the master. Also as per the logs, the full synchronization is also completed. The logs are as follows:-
9:M 22 Oct 12:24:18.209 * Slave 192.168.1.41:6379 asks for synchronization
9:M 22 Oct 12:24:18.209 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for '794b9c74abe40ac90c752f32a102078e063ff636', my replication IDs are '0f499740a46665d12fab921838297273279ad136' and '0000000000000000000000000000000000000000')
9:M 22 Oct 12:24:18.209 * Starting BGSAVE for SYNC with target: disk
9:M 22 Oct 12:24:18.211 * Background saving started by pid 231
231:C 22 Oct 12:24:18.215 * DB saved on disk
231:C 22 Oct 12:24:18.216 * RDB: 4 MB of memory used by copy-on-write
9:M 22 Oct 12:24:18.224 * Background saving terminated with success
9:M 22 Oct 12:24:18.224 * Synchronization with slave 192.168.1.41:6379 succeeded
Still, when I check the List of the PubSub Channels on the replica, it does not show the channels and thus it breaks the PubSub flow.
Any help/advise is appreciated.

Failed opening .rdb for saving: Permission denied - started after a while of running successfully

I have had a node web service running successfully on an aws ubuntu server for over a month, with the requests cached using redis.
Yesterday I started getting the following error from some of my routes:
MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.
I was able to stop the error occurring by using:
config set stop-writes-on-bgsave-error no
as suggested in the answers to this question, but it doesn't actually solve the underlying problem.
To find the underlying problem I checked the logs and found the following had started happening:
[1105] 09 Aug 13:17:14.800 - 0 clients connected (0 slaves), 797680 bytes in use
[1105] 09 Aug 13:17:15.101 * 1 changes in 900 seconds. Saving...
[1105] 09 Aug 13:17:15.101 * Background saving started by pid 28090
[28090] 09 Aug 13:17:15.101 # Failed opening .rdb for saving: Permission denied
[1105] 09 Aug 13:17:15.201 # Background saving error
Over the weekend no one had been using the server, but before the weekend the logs were fine, and we were getting no errors:
[12521] 06 Aug 04:49:27.308 - 0 clients connected (0 slaves), 803352 bytes in use
[12521] 06 Aug 04:49:29.012 * 1 changes in 900 seconds. Saving...
[12521] 06 Aug 04:49:29.012 * Background saving started by pid 26663
[26663] 06 Aug 04:49:29.014 * DB saved on disk
[26663] 06 Aug 04:49:29.014 * RDB: 2 MB of memory used by copy-on-write
[12521] 06 Aug 04:49:29.112 * Background saving terminated with success
As I said, no one has touched this server in the intervening time.
Looking around for people having the same problem I found this question. I checked the ownership and permissions on the directory and db file as suggested in the answers there:
drwxr-xr-x 2 redis redis 26 Aug 6 06:55 redis
-rw-r--r-- 1 redis redis 18 Aug 6 06:55 dump-6379.rdb
The permissions and ownership both look ok to me, but I have noticed that the date on the file and folder is between the last time I saw the service working and the first time it failed. Unfortunately that hasn't really helped me with what to do next and I am at a bit of a loss.
I am looking for suggestions for next steps to find the cause of the problem, or at least a way of making redis able to write again.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas