Redis service crashes with "Failed opening the RDB file systemdd (in server root dir /etc/cron.d) for saving: Permission denied" - redis

I am running Redis server version 6.0.6 on Ubuntu 20.04. The process is run by "redis" user.
Sometimes, the Redis process crashes and gets restarted on its own and when this happens, lot of data cached in Redis becomes unavailable. This happens every few days/weeks. I can see the following messages in the logs - saving was working fine till 2:32:43 and suddenly failed at 2:34:15:
133121:C 23 Jun 2021 02:27:54.383 * RDB: 22 MB of memory used by copy-on-write
105798:M 23 Jun 2021 02:27:54.511 * Background saving terminated with success
105798:M 23 Jun 2021 02:29:46.279 * 10000 changes in 60 seconds. Saving...
105798:M 23 Jun 2021 02:29:46.354 * Background saving started by pid 133125
133125:C 23 Jun 2021 02:30:16.363 * DB saved on disk
133125:C 23 Jun 2021 02:30:16.464 * RDB: 18 MB of memory used by copy-on-write
105798:M 23 Jun 2021 02:30:16.583 * Background saving terminated with success
105798:M 23 Jun 2021 02:32:14.138 * 10000 changes in 60 seconds. Saving...
105798:M 23 Jun 2021 02:32:14.222 * Background saving started by pid 133131
133131:C 23 Jun 2021 02:32:42.924 * DB saved on disk
133131:C 23 Jun 2021 02:32:42.988 * RDB: 22 MB of memory used by copy-on-write
105798:M 23 Jun 2021 02:32:43.123 * Background saving terminated with success
105798:M 23 Jun 2021 02:34:14.958 * DB saved on disk
105798:M 23 Jun 2021 02:34:15.705 # Failed opening the RDB file systemdd (in server root dir /etc/cron.d) for saving: Permission denied
=== REDIS BUG REPORT START: Cut & paste starting from here ===
105798:M 23 Jun 2021 02:34:15.705 # Redis 6.0.6 crashed by signal: 11
105798:M 23 Jun 2021 02:34:15.705 # Crashed running the instruction at: 0x55f2e7e35099
105798:M 23 Jun 2021 02:34:15.705 # Accessing address: 0x149968
105798:M 23 Jun 2021 02:34:15.705 # Failed assertion: <no assertion failed> (<no file>:0)
------ STACK TRACE ------
EIP:
/usr/bin/redis-server 172.16.106.88:6379(je_malloc_usable_size+0x89)[0x55f2e7e35099]
Backtrace:
/usr/bin/redis-server 172.16.106.88:6379(logStackTrace+0x4f)[0x55f2e7db2bcf]
/usr/bin/redis-server 172.16.106.88:6379(sigsegvHandler+0xb5)[0x55f2e7db33d5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7fb934c173c0]
/usr/bin/redis-server 172.16.106.88:6379(je_malloc_usable_size+0x89)[0x55f2e7e35099]
/usr/bin/redis-server 172.16.106.88:6379(+0x50b79)[0x55f2e7d72b79]
/usr/bin/redis-server 172.16.106.88:6379(rdbSave+0x2ba)[0x55f2e7d9345a]
/usr/bin/redis-server 172.16.106.88:6379(saveCommand+0x67)[0x55f2e7d94ab7]
/usr/bin/redis-server 172.16.106.88:6379(call+0xb1)[0x55f2e7d6a8b1]
/usr/bin/redis-server 172.16.106.88:6379(processCommand+0x4a6)[0x55f2e7d6b446]
/usr/bin/redis-server 172.16.106.88:6379(processCommandAndResetClient+0x14)[0x55f2e7d799e4]
/usr/bin/redis-server 172.16.106.88:6379(processInputBuffer+0x18f)[0x55f2e7d7e39f]
/usr/bin/redis-server 172.16.106.88:6379(+0xe10ac)[0x55f2e7e030ac]
/usr/bin/redis-server 172.16.106.88:6379(aeProcessEvents+0x303)[0x55f2e7d63b83]
/usr/bin/redis-server 172.16.106.88:6379(aeMain+0x1d)[0x55f2e7d63ebd]
/usr/bin/redis-server 172.16.106.88:6379(main+0x4e5)[0x55f2e7d603d5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fb934a370b3]
/usr/bin/redis-server 172.16.106.88:6379(_start+0x2e)[0x55f2e7d606ae]
The service restarts on its own and the Redis server starts working fine for a few days/weeks and crashes again with the same error!
I have checked several posts in SO, but none of them resolve my issue since:
a) The instance where the Redis server is running is in a private network (public access is disabled).
b) The DB file name and dir have not been corrupted as observed from "config get dbfilename" and "config get dir" commands. They show the default values.
c) The permissions of the directories are correct (/var/lib/redis is owned by redis with 755 permissions and /var/lib/redis/dump.rdb is owned by redis with 660 permissions)
Can anyone help me identify the root cause of this issue please?

Related

Redis crashing without any log errors

I'm debugging some weird behavior in my redis, where it's crashing each 2 days more or less, but not showing any errors whatsoever, only this on the logs:
1:C 10 Sep 2020 15:44:14.517 # Configuration loaded
1:M 10 Sep 2020 15:44:14.522 * Running mode=standalone, port=6379.
1:M 10 Sep 2020 15:44:14.522 # Server initialized
1:M 10 Sep 2020 15:44:14.524 * Ready to accept connections
1:C 12 Sep 2020 13:20:23.751 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 12 Sep 2020 13:20:23.751 # Redis version=6.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 12 Sep 2020 13:20:23.751 # Configuration loaded
1:M 12 Sep 2020 13:20:23.757 * Running mode=standalone, port=6379.
1:M 12 Sep 2020 13:20:23.757 # Server initialized
1:M 12 Sep 2020 13:20:23.758 * Ready to accept connections
That's all redis says to me.
I have lots of RAM available, but I have redis running as a single instance on a docker container, could the lack of processing power cause this? Should I use multiple nodes? I don't want to setup a cluster just to find out the problem was another, how can I trace down the actually cause of the problem?
So, in the end, it was exactly what I thought it was not: a memory leak!
I had 16GB that was slowly being consumed until redis crashed with no warnings, nor the operating system/docker. I fixed the app that caused the leak and the problem was gone.

Redis timeout with almost no data in the database, using the .NET client

I received this error:
StackExchange.Redis.RedisTimeoutException: Timeout performing GET (5000ms),
next: GET RetryCount, inst: 3, qu: 0, qs: 1, aw: False, rs: ReadAsync, ws: Idle, in: 7, in-pipe: 0, out-pipe: 0,
serverEndpoint: redis:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: 18745af38fec,
IOCP: (Busy=0,Free=1000,Min=1,Max=1000),
WORKER: (Busy=6,Free=32761,Min=1,Max=32767), v: 2.1.58.34321
(Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
We can see that there is only a single message in the queue (qs=1) and that there are only 7 bytes waiting to be read (in=7). Redis is used by 2 processes and holds settings for the system and store logs.
It was a re-install so no logs were written and the database has probably 2-3kb of data :)
This is the only output from Redis:
1:C 12 Sep 2020 15:20:49.293 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 12 Sep 2020 15:20:49.293 # Redis version=6.0.8, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 12 Sep 2020 15:20:49.293 # Configuration loaded
1:M 12 Sep 2020 15:20:49.296 * Running mode=standalone, port=6379.
1:M 12 Sep 2020 15:20:49.296 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 12 Sep 2020 15:20:49.296 # Server initialized
1:M 12 Sep 2020 15:20:49.296 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memor
y=1' for this to take effect.
1:M 12 Sep 2020 15:20:49.296 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepag
e/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
1:M 12 Sep 2020 15:20:49.305 * DB loaded from append only file: 0.000 seconds
1:M 12 Sep 2020 15:20:49.305 * Ready to accept connections
so it looks like nothing went wrong on that side.
The 2 processes accessing it are in docker containers, so does Redis. All on a single AWS instance with a lot of ram and disk available.
this is also a one time event, it has never happened before with the same config.
I'm not very experienced with Redis; is there anything in the error message that would look suspicious?

Redis service automatically stops after few minutes of running

On my Ubuntu machine, redis server was running fine and suddenly it stops. After I started it, again it automatically stops after few minutes. So I start again, and so on. Why is this happening?
Here are the logs when I start redis:
21479:C 29 Apr 21:59:10.986 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
21479:C 29 Apr 21:59:10.987 # Redis version=4.0.9, bits=64, commit=00000000, modified=0, pid=21479, just started
21479:C 29 Apr 21:59:10.987 # Configuration loaded
21480:M 29 Apr 21:59:10.990 * Increased maximum number of open files to 10032 (it was originally set to 1024).
21480:M 29 Apr 21:59:10.991 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
21480:M 29 Apr 21:59:10.992 # Server initialized
21480:M 29 Apr 21:59:14.588 * DB loaded from disk: 3.596 seconds
21480:M 29 Apr 21:59:14.591 * Ready to accept connections

Kubernetes Redis Cluster PubSub Channels not getting synched on replica

I have set up a Redis cluster on Kubernetes, the cluster state is OK and the replica is connected to the master. Also as per the logs, the full synchronization is also completed. The logs are as follows:-
9:M 22 Oct 12:24:18.209 * Slave 192.168.1.41:6379 asks for synchronization
9:M 22 Oct 12:24:18.209 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for '794b9c74abe40ac90c752f32a102078e063ff636', my replication IDs are '0f499740a46665d12fab921838297273279ad136' and '0000000000000000000000000000000000000000')
9:M 22 Oct 12:24:18.209 * Starting BGSAVE for SYNC with target: disk
9:M 22 Oct 12:24:18.211 * Background saving started by pid 231
231:C 22 Oct 12:24:18.215 * DB saved on disk
231:C 22 Oct 12:24:18.216 * RDB: 4 MB of memory used by copy-on-write
9:M 22 Oct 12:24:18.224 * Background saving terminated with success
9:M 22 Oct 12:24:18.224 * Synchronization with slave 192.168.1.41:6379 succeeded
Still, when I check the List of the PubSub Channels on the replica, it does not show the channels and thus it breaks the PubSub flow.
Any help/advise is appreciated.

Failed opening .rdb for saving: Permission denied - started after a while of running successfully

I have had a node web service running successfully on an aws ubuntu server for over a month, with the requests cached using redis.
Yesterday I started getting the following error from some of my routes:
MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.
I was able to stop the error occurring by using:
config set stop-writes-on-bgsave-error no
as suggested in the answers to this question, but it doesn't actually solve the underlying problem.
To find the underlying problem I checked the logs and found the following had started happening:
[1105] 09 Aug 13:17:14.800 - 0 clients connected (0 slaves), 797680 bytes in use
[1105] 09 Aug 13:17:15.101 * 1 changes in 900 seconds. Saving...
[1105] 09 Aug 13:17:15.101 * Background saving started by pid 28090
[28090] 09 Aug 13:17:15.101 # Failed opening .rdb for saving: Permission denied
[1105] 09 Aug 13:17:15.201 # Background saving error
Over the weekend no one had been using the server, but before the weekend the logs were fine, and we were getting no errors:
[12521] 06 Aug 04:49:27.308 - 0 clients connected (0 slaves), 803352 bytes in use
[12521] 06 Aug 04:49:29.012 * 1 changes in 900 seconds. Saving...
[12521] 06 Aug 04:49:29.012 * Background saving started by pid 26663
[26663] 06 Aug 04:49:29.014 * DB saved on disk
[26663] 06 Aug 04:49:29.014 * RDB: 2 MB of memory used by copy-on-write
[12521] 06 Aug 04:49:29.112 * Background saving terminated with success
As I said, no one has touched this server in the intervening time.
Looking around for people having the same problem I found this question. I checked the ownership and permissions on the directory and db file as suggested in the answers there:
drwxr-xr-x 2 redis redis 26 Aug 6 06:55 redis
-rw-r--r-- 1 redis redis 18 Aug 6 06:55 dump-6379.rdb
The permissions and ownership both look ok to me, but I have noticed that the date on the file and folder is between the last time I saw the service working and the first time it failed. Unfortunately that hasn't really helped me with what to do next and I am at a bit of a loss.
I am looking for suggestions for next steps to find the cause of the problem, or at least a way of making redis able to write again.