Redis: Error writing to the AOF file: Quota exceeded

When we run the performance test, we get these errors in the Redis log:
11:M 06 Jun 03:26:02.640 # Bad message length or signature received from Cluster bus.
11:M 06 Jun 03:26:25.429 # Bad message length or signature received from Cluster bus.
11:M 06 Jun 03:26:25.434 # Bad message length or signature received from Cluster bus.
11:M 06 Jun 03:26:27.031 # Error writing to the AOF file: Quota exceeded
Could someone help me?
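The "Quota exceeded" text is the operating system's error for the failed AOF write (Redis appends strerror(errno) to that message), so it usually points at a disk or filesystem quota on the directory holding the AOF rather than at Redis itself. A minimal diagnostic sketch, assuming redis-cli can reach the node and that the directory is taken from its output rather than guessed:
redis-cli CONFIG GET dir            # directory the AOF/RDB files are written to
redis-cli CONFIG GET appendfilename # name of the AOF file
df -h <dir reported above>          # free space on that filesystem
quota -s                            # per-user quotas, if quotas are enabled there
The "Bad message length or signature received from Cluster bus" lines are worth treating separately; they are often caused by something other than a cluster node (a client, load balancer health check, or scanner) connecting to the cluster bus port, not by a persistence problem.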

Related

Redis OOM issue

We are using Redis 6.0.0. After an OOM, I see this log. What is RDB memory usage?
3489 MB is very close to the max memory that we have. Does it indicate that we are storing a lot of data in Redis, or is it just caused by RDB overhead?
1666:M 01 Jun 2022 19:23:32.268 # Server initialized
1666:M 01 Jun 2022 19:23:32.270 * Loading RDB produced by version 6.0.6
1666:M 01 Jun 2022 19:23:32.270 * RDB age 339 seconds
1666:M 01 Jun 2022 19:23:32.270 * RDB memory usage when created 3489.20 Mb
Can we rule out fragmentation, given that the RDB memory usage itself indicated 3489 MB?
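A direct way to check for fragmentation (a hedged sketch; the field names below come from INFO memory) is to compare the memory Redis has allocated with what the OS is actually holding for the process:
redis-cli INFO memory | grep -E 'used_memory_human|used_memory_rss_human|mem_fragmentation_ratio|maxmemory_human'
A mem_fragmentation_ratio close to 1.0 means the ~3489 MB reported at RDB time really was live data; a ratio well above 1 (say, 1.5 or more) means a noticeable share of the resident memory is fragmentation and allocator overhead rather than stored keys.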

Redis timeout with almost no data in the database, using the .NET client

I received this error:
StackExchange.Redis.RedisTimeoutException: Timeout performing GET (5000ms),
next: GET RetryCount, inst: 3, qu: 0, qs: 1, aw: False, rs: ReadAsync, ws: Idle, in: 7, in-pipe: 0, out-pipe: 0,
serverEndpoint: redis:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: 18745af38fec,
IOCP: (Busy=0,Free=1000,Min=1,Max=1000),
WORKER: (Busy=6,Free=32761,Min=1,Max=32767), v: 2.1.58.34321
(Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
We can see that there is only a single message in the queue (qs=1) and that there are only 7 bytes waiting to be read (in=7). Redis is used by 2 processes and holds system settings and stores logs.
It was a re-install, so no logs were written and the database holds maybe 2-3 KB of data :)
This is the only output from Redis:
1:C 12 Sep 2020 15:20:49.293 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 12 Sep 2020 15:20:49.293 # Redis version=6.0.8, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 12 Sep 2020 15:20:49.293 # Configuration loaded
1:M 12 Sep 2020 15:20:49.296 * Running mode=standalone, port=6379.
1:M 12 Sep 2020 15:20:49.296 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 12 Sep 2020 15:20:49.296 # Server initialized
1:M 12 Sep 2020 15:20:49.296 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 12 Sep 2020 15:20:49.296 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
1:M 12 Sep 2020 15:20:49.305 * DB loaded from append only file: 0.000 seconds
1:M 12 Sep 2020 15:20:49.305 * Ready to accept connections
So it looks like nothing went wrong on that side.
The two processes accessing it run in Docker containers, as does Redis, all on a single AWS instance with plenty of RAM and disk available.
This is also a one-time event; it has never happened before with the same config.
I'm not very experienced with Redis; is there anything in the error message that looks suspicious?
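For what it's worth, the startup log above does flag three host-level warnings, and each names its own fix. A consolidated sketch of the changes the log itself recommends (they need to be applied as root on the Docker host, not inside the Redis container, which is an assumption about this deployment):
sysctl -w net.core.somaxconn=511    # allow Redis' tcp-backlog of 511 to take effect
sysctl -w vm.overcommit_memory=1    # let the background-save fork succeed under memory pressure
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
To persist them across reboots, put the two sysctl settings in /etc/sysctl.conf and the THP line in /etc/rc.local, as the warnings suggest. None of these is an obvious explanation for a single 5-second GET timeout, but overcommit and THP are the known latency and save-failure sources Redis is flagging, so they are cheap to rule out.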

Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis

I have run Test-1 and Test-2 below as long-running performance tests with the Redis configuration values specified, but we still see the Error-1 and Error-2 messages highlighted below: the cluster fails for some time and a few of our processing jobs fail. How can we solve this problem?
Does anyone have a suggestion to avoid cluster failures that last longer than 10 seconds? The cluster does not come back up within our 3 retry attempts (we use a Spring retry template with the Jedis client; the retry count is set to 3, the first retry happens after 5 seconds, and subsequent attempts back off exponentially).
Error-1: Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
Error-2: Marking node a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba as failing (quorum reached).
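For Error-1 specifically, a commonly used mitigation (a hedged sketch, not a guaranteed fix for this cluster) is to keep the AOF fsync policy at everysec and stop the main thread's fsync from competing with rewrite traffic on a busy disk:
127.0.0.1:6379> CONFIG SET appendfsync everysec
127.0.0.1:6379> CONFIG SET no-appendfsync-on-rewrite yes
127.0.0.1:6379> CONFIG REWRITE
The trade-off documented for no-appendfsync-on-rewrite is durability: while a rewrite or RDB save is in progress, up to roughly 30 seconds of writes can be lost on a crash. Whether that is acceptable depends on the data being processed.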
Test-1:
Run the test with Redis Setting:
"appendfsync"="yes"
"appendonly"="no"
[root@rdcapdev1-redis-cache3 redis-3.2.5]# src/redis-cli -p 6379
127.0.0.1:6379> CONFIG GET *aof*
1) "auto-aof-rewrite-percentage"
2) "30"
3) "auto-aof-rewrite-min-size"
4) "67108864"
5) "aof-rewrite-incremental-fsync"
6) "yes"
7) "aof-load-truncated"
8) "yes"
127.0.0.1:6379> exit
[root@rdcapdev1-redis-cache3 redis-3.2.5]# src/redis-cli -p 6380
127.0.0.1:6380> CONFIG GET *aof*
1) "auto-aof-rewrite-percentage"
2) "30"
3) "auto-aof-rewrite-min-size"
4) "67108864"
5) "aof-rewrite-incremental-fsync"
6) "yes"
7) "aof-load-truncated"
8) "yes"
127.0.0.1:6380> clear
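One thing worth verifying before reading too much into Test-1: CONFIG GET *aof* does not match the two settings named above, and appendfsync takes always/everysec/no (not yes) while appendonly takes yes/no, so it is worth confirming what each node is actually running with. A small verification sketch, using the same ports as above:
src/redis-cli -p 6379 CONFIG GET appendonly
src/redis-cli -p 6379 CONFIG GET appendfsync
src/redis-cli -p 6380 CONFIG GET appendonly
src/redis-cli -p 6380 CONFIG GET appendfsync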
Observation:
1. The Redis failover lasted ~40 sec.
2. Around 20 documents failed at the FX and OCR level, due to the inability to write/read the files to Redis.
3. This happened when ~50% of the RAM was utilized.
4. The master-slave configuration was reshuffled as shown below after this failover.
5. Below are a few highlights of the Redis log; please refer to the attached log for more detail.
6. I have logs for this, for more details: 30Per_AofRW_2.zip
Redis1 Master log:
2515:C 05 May 11:06:30.343 * DB saved on disk
2515:C 05 May 11:06:30.379 * RDB: 15 MB of memory used by copy-on-write
837:S 05 May 11:06:30.429 * Background saving terminated with success
837:S 05 May 11:11:31.024 * 10 changes in 300 seconds. Saving...
837:S 05 May 11:11:31.067 * Background saving started by pid 2534
837:S 05 May 11:12:24.802 * FAIL message received from 6b8d49e9db288b13071559c667e95e3691ce8bd0 about ce62a26102ef54f43fa7cca64d24eab45cf42a61
837:S 05 May 11:12:27.049 * Clear FAIL state for node ce62a26102ef54f43fa7cca64d24eab45cf42a61: slave is reachable again.
2534:C 05 May 11:12:31.110 * DB saved on disk
Redis2 Master log:
837:M 05 May 10:30:22.216 * Marking node a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba as failing (quorum reached).
837:M 05 May 10:30:22.216 # Cluster state changed: fail
837:M 05 May 10:30:23.148 # Failover auth granted to 6b8d49e9db288b13071559c667e95e3691ce8bd0 for epoch 12
837:M 05 May 10:30:23.188 # Cluster state changed: ok
837:M 05 May 10:30:27.227 * Clear FAIL state for node a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba: slave is reachable again.
837:M 05 May 10:35:22.017 * 10 changes in 300 seconds. Saving...
.
.
.
837:M 05 May 11:12:23.592 * FAIL message received from 6b8d49e9db288b13071559c667e95e3691ce8bd0 about ce62a26102ef54f43fa7cca64d24eab45cf42a61
837:M 05 May 11:12:27.045 * Clear FAIL state for node ce62a26102ef54f43fa7cca64d24eab45cf42a61: slave is reachable again.
Redis3 Master Log:
833:M 05 May 10:30:22.217 * FAIL message received from 83f6a9589aa1bce8932a367894fa391edd0ce269 about a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba
833:M 05 May 10:30:22.217 # Cluster state changed: fail
833:M 05 May 10:30:23.149 # Failover auth granted to 6b8d49e9db288b13071559c667e95e3691ce8bd0 for epoch 12
833:M 05 May 10:30:23.189 # Cluster state changed: ok
1822:C 05 May 10:30:27.397 * DB saved on disk
1822:C 05 May 10:30:27.428 * RDB: 8 MB of memory used by copy-on-write
833:M 05 May 10:30:27.528 * Background saving terminated with success
833:M 05 May 10:30:27.828 * Clear FAIL state for node a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba: slave is reachable again.
HOST: localhost PORT: 6379
machine master slave
10.2.1.233 0.00 2.00
10.2.1.46 2.00 0.00
10.2.1.202 1.00 1.00
MASTER SLAVE INFO
hashCode master slave hashSlot
81ae2d757f57f36fa1df6e930af3b072084ba3e8 10.2.1.202:6379 10.2.1.233:6380, 10923-16383
6b8d49e9db288b13071559c667e95e3691ce8bd0 10.2.1.46:6380 10.2.1.233:6379, 0-5460
83f6a9589aa1bce8932a367894fa391edd0ce269 10.2.1.46:6379 10.2.1.202:6380, 5461-10922
6b8d49e9db288b13071559c667e95e3691ce8bd0 10.2.1.46:6380 master - 0 1493981044497 12 connected 0-5460
81ae2d757f57f36fa1df6e930af3b072084ba3e8 10.2.1.202:6379 master - 0 1493981045500 3 connected 10923-16383
ce62a26102ef54f43fa7cca64d24eab45cf42a61 10.2.1.202:6380 slave 83f6a9589aa1bce8932a367894fa391edd0ce269 0 1493981043495 10 connected
ac630108d1556786a4df74945cfe35db981d15fa 10.2.1.233:6380 slave 81ae2d757f57f36fa1df6e930af3b072084ba3e8 0 1493981042492 11 connected
83f6a9589aa1bce8932a367894fa391edd0ce269 10.2.1.46:6379 master - 0 1493981044497 2 connected 5461-10922
a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba 10.2.1.233:6379 myself,slave 6b8d49e9db288b13071559c667e95e3691ce8bd0 0 0 1 connected
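Because the FAIL / clear-FAIL cycles in these logs span several seconds and the Jedis/Spring retries give up after three attempts, the cluster failure-detection window is also worth checking on every node (a hedged sketch; the 15000 ms figure is Redis' default for cluster-node-timeout, not something taken from these logs):
src/redis-cli -p 6379 CONFIG GET cluster-node-timeout
A node is only flagged as failing after it has been unreachable for cluster-node-timeout and a quorum of masters agrees, and promoting a replica adds further time on top of that, so the client-side retry budget has to cover at least that whole window.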
Test-2:
Run the test with Redis Setting:
"appendfsync"="no"
"appendonly"="yes"
Observation:
1. The Redis failover lasted ~40 sec.
2. Around 20 documents failed at the FX and OCR level, due to the inability to write/read the files to Redis.
3. This happened when ~50% of the RAM was utilized.
4. The master-slave configuration was reshuffled as shown below after this failover.
5. Below are a few highlights of the Redis log; please refer to the attached log for more detail.
30Per_AofRW_2.zip
Redis1 Master log:
2515:C 05 May 11:06:30.343 * DB saved on disk
2515:C 05 May 11:06:30.379 * RDB: 15 MB of memory used by copy-on-write
837:S 05 May 11:06:30.429 * Background saving terminated with success
837:S 05 May 11:11:31.024 * 10 changes in 300 seconds. Saving...
837:S 05 May 11:11:31.067 * Background saving started by pid 2534
837:S 05 May 11:12:24.802 * FAIL message received from 6b8d49e9db288b13071559c667e95e3691ce8bd0 about ce62a26102ef54f43fa7cca64d24eab45cf42a61
837:S 05 May 11:12:27.049 * Clear FAIL state for node ce62a26102ef54f43fa7cca64d24eab45cf42a61: slave is reachable again.
2534:C 05 May 11:12:31.110 * DB saved on disk
Redis2 Master log:
5306:M 03 Apr 09:02:36.947 * Background saving terminated with success
5306:M 03 Apr 09:02:49.574 * Starting automatic rewriting of AOF on 3% growth
5306:M 03 Apr 09:02:49.583 * Background append only file rewriting started by pid 12864
5306:M 03 Apr 09:02:54.050 * AOF rewrite child asks to stop sending diffs.
12864:C 03 Apr 09:02:54.051 * Parent agreed to stop sending diffs. Finalizing AOF...
12864:C 03 Apr 09:02:54.051 * Concatenating 0.00 MB of AOF diff received from parent.
12864:C 03 Apr 09:02:54.051 * SYNC append only file rewrite performed
12864:C 03 Apr 09:02:54.058 * AOF rewrite: 2 MB of memory used by copy-on-write
5306:M 03 Apr 09:02:54.098 * Background AOF rewrite terminated with success
5306:M 03 Apr 09:02:54.098 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
5306:M 03 Apr 09:02:54.098 * Background AOF rewrite finished successfully
5306:M 03 Apr 09:04:01.843 * Starting automatic rewriting of AOF on 3% growth
5306:M 03 Apr 09:04:01.853 * Background append only file rewriting started by pid 12867
5306:M 03 Apr 09:04:11.657 * AOF rewrite child asks to stop sending diffs.
12867:C 03 Apr 09:04:11.657 * Parent agreed to stop sending diffs. Finalizing AOF...
12867:C 03 Apr 09:04:11.657 * Concatenating 0.00 MB of AOF diff received from parent.
12867:C 03 Apr 09:04:11.657 * SYNC append only file rewrite performed
12867:C 03 Apr 09:04:11.664 * AOF rewrite: 2 MB of memory used by copy-on-write
5306:M 03 Apr 09:04:11.675 * Background AOF rewrite terminated with success
5306:M 03 Apr 09:04:11.675 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
5306:M 03 Apr 09:04:11.675 * Background AOF rewrite finished successfully
5306:M 03 Apr 09:04:48.054 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
5306:M 03 Apr 09:05:28.571 * Starting automatic rewriting of AOF on 3% growth
5306:M 03 Apr 09:05:28.581 * Background append only file rewriting started by pid 12873
5306:M 03 Apr 09:05:33.300 * AOF rewrite child asks to stop sending diffs.
12873:C 03 Apr 09:05:33.300 * Parent agreed to stop sending diffs. Finalizing AOF...
12873:C 03 Apr 09:05:33.300 * Concatenating 2.09 MB of AOF diff received from parent.
12873:C 03 Apr 09:05:33.329 * SYNC append only file rewrite performed
12873:C 03 Apr 09:05:33.336 * AOF rewrite: 11 MB of memory used by copy-on-write
5306:M 03 Apr 09:05:33.395 * Background AOF rewrite terminated with success
5306:M 03 Apr 09:05:33.395 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
5306:M 03 Apr 09:05:33.395 * Background AOF rewrite finished successfully
5306:M 03 Apr 09:07:37.082 * 10 changes in 300 seconds. Saving...
5306:M 03 Apr 09:07:37.092 * Background saving started by pid 12875
12875:C 03 Apr 09:07:47.016 * DB saved on disk
12875:C 03 Apr 09:07:47.024 * RDB: 5 MB of memory used by copy-on-write
5306:M 03 Apr 09:07:47.113 * Background saving terminated with success
5306:M 03 Apr 09:07:51.622 * Starting automatic rewriting of AOF on 3% growth
5306:M 03 Apr 09:07:51.632 * Background append only file rewriting started by pid 12876
5306:M 03 Apr 09:07:56.559 * AOF rewrite child asks to stop sending diffs.
12876:C 03 Apr 09:07:56.559 * Parent agreed to stop sending diffs. Finalizing AOF...
12876:C 03 Apr 09:07:56.559 * Concatenating 0.00 MB of AOF diff received from parent.
12876:C 03 Apr 09:07:56.559 * SYNC append only file rewrite performed
12876:C 03 Apr 09:07:56.567 * AOF rewrite: 2 MB of memory used by copy-on-write
5306:M 03 Apr 09:07:56.645 * Background AOF rewrite terminated with success
5306:M 03 Apr 09:07:56.645 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
5306:M 03 Apr 09:07:56.645 * Background AOF rewrite finished successfully
5306:M 03 Apr 09:12:48.071 * 10 changes in 300 seconds. Saving...
5306:M 03 Apr 09:12:48.081 * Background saving started by pid 12882
12882:C 03 Apr 09:12:58.381 * DB saved on disk
12882:C 03 Apr 09:12:58.389 * RDB: 5 MB of memory used by copy-on-write
5306:M 03 Apr 09:12:58.403 * Background saving terminated with success
5306:M 03 Apr 10:17:33.005 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
5306:M 03 Apr 10:22:42.042 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
5306:M 03 Apr 10:27:51.039 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
5306:M 03 Apr 11:10:10.606 * Marking node a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba as failing (quorum reached).
5306:M 03 Apr 11:10:10.607 # Cluster state changed: fail
5306:M 03 Apr 11:10:10.608 * FAIL message received from 83f6a9589aa1bce8932a367894fa391edd0ce269 about ac630108d1556786a4df74945cfe35db981d15fa
5306:M 03 Apr 11:10:11.594 # Failover auth granted to 6b8d49e9db288b13071559c667e95e3691ce8bd0 for epoch 7
HOST: localhost PORT: 6379
machine master slave
10.2.1.233 0.00 2.00
10.2.1.46 2.00 0.00
10.2.1.202 1.00 1.00
MASTER SLAVE INFO
hashCode master slave hashSlot
81ae2d757f57f36fa1df6e930af3b072084ba3e8 10.2.1.202:6379 10.2.1.233:6380, 10923-16383
6b8d49e9db288b13071559c667e95e3691ce8bd0 10.2.1.46:6380 10.2.1.233:6379, 0-5460
83f6a9589aa1bce8932a367894fa391edd0ce269 10.2.1.46:6379 10.2.1.202:6380, 5461-10922
6b8d49e9db288b13071559c667e95e3691ce8bd0 10.2.1.46:6380 master - 0 1493981044497 12 connected 0-5460
81ae2d757f57f36fa1df6e930af3b072084ba3e8 10.2.1.202:6379 master - 0 1493981045500 3 connected 10923-16383
ce62a26102ef54f43fa7cca64d24eab45cf42a61 10.2.1.202:6380 slave 83f6a9589aa1bce8932a367894fa391edd0ce269 0 1493981043495 10 connected
ac630108d1556786a4df74945cfe35db981d15fa 10.2.1.233:6380 slave 81ae2d757f57f36fa1df6e930af3b072084ba3e8 0 1493981042492 11 connected
83f6a9589aa1bce8932a367894fa391edd0ce269 10.2.1.46:6379 master - 0 1493981044497 2 connected 5461-10922
a523100ddfbf844c6d1cc7e0b6a4b3a2aa970aba 10.2.1.233:6379 myself,slave 6b8d49e9db288b13071559c667e95e3691ce8bd0 0 0 1 connected

Failed opening .rdb for saving: Permission denied - started after a while of running successfully

I have had a Node web service running successfully on an AWS Ubuntu server for over a month, with the requests cached using Redis.
Yesterday I started getting the following error from some of my routes:
MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.
I was able to stop the error occurring by using:
config set stop-writes-on-bgsave-error no
as suggested in the answers to this question, but it doesn't actually solve the underlying problem.
To find the underlying problem I checked the logs and found the following had started happening:
[1105] 09 Aug 13:17:14.800 - 0 clients connected (0 slaves), 797680 bytes in use
[1105] 09 Aug 13:17:15.101 * 1 changes in 900 seconds. Saving...
[1105] 09 Aug 13:17:15.101 * Background saving started by pid 28090
[28090] 09 Aug 13:17:15.101 # Failed opening .rdb for saving: Permission denied
[1105] 09 Aug 13:17:15.201 # Background saving error
Over the weekend no one had been using the server, but before the weekend the logs were fine, and we were getting no errors:
[12521] 06 Aug 04:49:27.308 - 0 clients connected (0 slaves), 803352 bytes in use
[12521] 06 Aug 04:49:29.012 * 1 changes in 900 seconds. Saving...
[12521] 06 Aug 04:49:29.012 * Background saving started by pid 26663
[26663] 06 Aug 04:49:29.014 * DB saved on disk
[26663] 06 Aug 04:49:29.014 * RDB: 2 MB of memory used by copy-on-write
[12521] 06 Aug 04:49:29.112 * Background saving terminated with success
As I said, no one has touched this server in the intervening time.
Looking around for people having the same problem I found this question. I checked the ownership and permissions on the directory and db file as suggested in the answers there:
drwxr-xr-x 2 redis redis 26 Aug 6 06:55 redis
-rw-r--r-- 1 redis redis 18 Aug 6 06:55 dump-6379.rdb
The permissions and ownership both look ok to me, but I have noticed that the date on the file and folder is between the last time I saw the service working and the first time it failed. Unfortunately that hasn't really helped me with what to do next and I am at a bit of a loss.
I am looking for suggestions for next steps to find the cause of the problem, or at least a way to make Redis able to write again.
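As a concrete next step, one way to see exactly where the background save is writing and whether the redis user can still write there (a diagnostic sketch; the directory should be taken from the CONFIG GET output, not assumed):
redis-cli CONFIG GET dir          # working directory the .rdb is written into
redis-cli CONFIG GET dbfilename   # e.g. dump-6379.rdb
sudo -u redis touch <dir>/perm-test && sudo -u redis rm <dir>/perm-test
If the touch fails, its error message narrows the cause down to the directory itself versus something above it (parent directory permissions, changed mount options, or a remount), which would also fit the directory mtime changing while nobody was using the server.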

What's the hard limit for the Apache ThreadsPerChild parameter in httpd.conf?

I'm using the IBM HTTP Server, which is based on Apache. When I try to increase the ThreadsPerChild parameter beyond 1000, the HTTP server still only starts 1000 worker threads. Below is the related information:
error log:
[Thu Jul 05 10:50:45 2012] [debug] mpm_winnt.c(564): Child 9040: retrieved 2 listeners from parent
[Thu Jul 05 10:50:45 2012] [notice] Child 9040: Acquired the start mutex.
[Thu Jul 05 10:50:45 2012] [notice] Child 9040: Starting 1000 worker threads.
[Thu Jul 05 10:50:45 2012] [notice] Child 9040: Starting thread to listen on port 81.
[Thu Jul 05 10:50:45 2012] [notice] Child 9040: Starting thread to listen on port 80.
httpd.conf
<IfModule mpm_winnt.c>
ThreadLimit 2048
ThreadsPerChild 2000
MaxRequestsPerChild 0
</IfModule>
IHS 7.0.0.0
OS winNT
BTW, another concern with ThreadsPerChild is whether one Apache thread handles one client connection here, or whether one thread can take care of more than one client connection.
Please help me out. Thanks very much.
On the limits of the ThreadsPerChild setting, quoting from the IBM HTTP Server Performance Tuning documentation:
On 64-bit Windows OSes, each instance of IBM HTTP Server is limited to approximately 2500 ThreadsPerChild. On 32-bit Windows, this number is closer to 5000. These numbers are not exact limits, because the real limits are the sum of the fixed startup cost of memory for each thread plus the maximum runtime memory usage per thread, which varies based on configuration and workload. Raising ThreadsPerChild and approaching these limits risks child process crashes when runtime memory usage puts the process address space over the 2GB or 3GB barrier.
The interesting thing to note here is that ThreadsPerChild is not the only parameter for tuning concurrent connections to IHS. You can find information about other parameters (like MaxClients) and tuning methodology at the following link:
Tuning IBM HTTP Server to maximize the number of client connections to WebSphere Application Server