Improving performance of a Redis setup (degraded after setting vm.overcommit_memory=1)

Need some help in diagnosing and tuning the performance of my Redis set up (2 redis-server instances on an Ubuntu 14.04 machine). Note that a write-heavy Django web application shares the VM with Redis. The machine has 8 cores and 25GB RAM.
I recently discovered that background saving was intermittently failing (with a fork() error) even when RAM wasn't exhausted. To remedy this, I applied the setting vm.overcommit_memory=1 (was previously default).
Moreover vm.swappiness=2, vm.overcommit_ratio=50. I have disabled transparent huge pages in my set up as well via echo never > /sys/kernel/mm/transparent_hugepage/enabled (although haven't done echo never > /sys/kernel/mm/transparent_hugepage/defrag).
Right after changing the overcommit_memory setting, I noticed that I/O utilization went from 13% to 36% (on average). I/O operations per second doubled, the redis-server CPU consumption has more than doubled, and the memory it's consuming has gone up 66%. Consequently, the server response time has gone up substantially. Things escalated abruptly right after applying vm.overcommit_memory=1.
Note that redis-server is the only component showing escalation: gunicorn, nginx, celery, etc. are performing as before. Moreover, Redis has become very spiky.
Lastly, New Relic has started showing me 3 Redis instances instead of 2. I think the forked child is counted as the 3rd.
My question is: how can I diagnose and salvage performance here? Being new to server administration, I'm unsure how to proceed. Help me find out what's going on here and how I can fix it.
free -m has the following output (in case needed):
             total       used       free     shared    buffers     cached
Mem:         28136      27912        224        576         68       6778
-/+ buffers/cache:      21064       7071
Swap:            0          0          0

As you don't have swap enabled on your system (which might be worth reconsidering if you have SSDs), and your swappiness was set to a low value anyway, you can't blame it on increased swapping due to memory contention.
You're caching about 6GB of data inside the VFS cache. In case of contention this cache would have been depleted in favor of process working memory, so I believe it's safe to say memory is not an issue altogether.
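The arithmetic behind that reading of free -m can be sketched in a few lines of Python. This is only an illustration using the figures from the question; the function name effective_free_mb is hypothetical, and it assumes the older procps layout with separate buffers and cached columns. Part of the cached figure (for example shared memory) may not be reclaimable, so treat the result as an upper bound.

```python
def effective_free_mb(free_mb, buffers_mb, cached_mb):
    """Memory the kernel could hand to processes: free plus (mostly) reclaimable cache."""
    return free_mb + buffers_mb + cached_mb


# Figures from the "Mem:" row of the free -m output in the question (MB).
total, used, free, buffers, cached = 28136, 27912, 224, 68, 6778

available = effective_free_mb(free, buffers, cached)
used_by_processes = used - buffers - cached

print(available)          # 7070, within rounding of the 7071 that free reports
print(used_by_processes)  # 21066, within rounding of the 21064 that free reports
```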
It's a shot in the dark, but my guess is that your redis-server is configured to sync/save too often (search the Redis config file for "appendfsync" and "save"), and that with the memory allocation limitation removed, it now actually does its job :)
If the data is not super crucial, set appendfsync to no (the valid values are always, everysec and no, and the setting only matters when appendonly is enabled) and perhaps tweak the save settings so that snapshots happen less often.
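To see what a given instance is actually configured to do, one option is to pull the persistence directives out of its config file. The sketch below is a minimal, assumption-laden example: the helper name persistence_settings and the sample config text are hypothetical (not the asker's real config), and it only understands the plain "save <seconds> <changes>", "appendonly" and "appendfsync" directives. The accepted appendfsync values in redis.conf are always, everysec and no.

```python
def persistence_settings(conf_text):
    """Collect the persistence-related directives from redis.conf text."""
    settings = {"appendonly": None, "appendfsync": None, "save": []}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        key, args = parts[0].lower(), parts[1:]
        if key == "save" and len(args) == 2:
            # "save <seconds> <changes>": snapshot if <changes> writes happened within <seconds>
            settings["save"].append((int(args[0]), int(args[1])))
        elif key in ("appendonly", "appendfsync") and args:
            settings[key] = args[0].lower()
    return settings


# Hypothetical excerpt for illustration only.
SAMPLE_CONF = """
# snapshotting
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync always
"""

found = persistence_settings(SAMPLE_CONF)
print(found["save"])         # the (seconds, changes) pairs that trigger a background save
print(found["appendfsync"])  # how often the append-only file is fsynced
```

With a config like the sample, every write is fsynced and a snapshot is forked after as few as 10 changes in 300 seconds, which is the kind of setting worth relaxing on a write-heavy box.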
BTW, regarding the redis & forked child, I believe you are correct.

Related

Redis-server using all RAM at startup

I'm using Redis and noticed that it crashes with the following error:
MISCONF Redis is configured to save RDB snapshots
I tried the solution suggested in this post
but everything seems to be OK in terms of permissions and space.
htop tells me that Redis is consuming 70% of RAM. I tried to stop and restart Redis in order to flush it, but at startup the amount of RAM used by Redis grew dramatically and stopped at around 66%. I'm pretty sure that at that moment no process was using any Redis instance!
What is happening here?
Growing RAM usage is expected behaviour for Redis when it first loads data, after restarts, and while it writes data to disk (the snapshot process). Redis tends to allocate as much memory as it can unless you set the "maxmemory" option in your conf file.
It allocates memory but does not release it immediately. Sometimes that takes hours; I have seen such cases.
A well-known fact about Redis is that it can allocate up to twice the size of the dataset it holds.
I suggest you wait a couple of hours without restarting (Redis can keep serving get/set operations during this time) and keep watching the memory.
Please also read this:
Redis will not always free up (return) memory to the OS when keys are
removed. This is not something special about Redis, but it is how most
malloc() implementations work. For example if you fill an instance
with 5GB worth of data, and then remove the equivalent of 2GB of data,
the Resident Set Size (also known as the RSS, which is the number of
memory pages consumed by the process) will probably still be around
5GB, even if Redis will claim that the user memory is around 3GB. This
happens because the underlying allocator can't easily release the
memory. For example often most of the removed keys were allocated in
the same pages as the other keys that still exist.
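The gap between RSS and the memory Redis claims to use can be put into a single number. The sketch below is a minimal example that parses the text of an INFO memory reply and divides used_memory_rss by used_memory; the helper names and the sample reply are hypothetical, with the values chosen to match the 5GB / 3GB example above.

```python
def parse_info(info_text):
    """Turn the 'key:value' lines of a Redis INFO reply into a dict."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip section headers such as "# Memory"
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def rss_to_used_ratio(fields):
    """Resident set size divided by the memory Redis has allocated for data."""
    return int(fields["used_memory_rss"]) / int(fields["used_memory"])


GB = 1024 ** 3
# Hypothetical INFO excerpt: 5GB resident, 3GB of live data.
SAMPLE_INFO = f"""# Memory
used_memory:{3 * GB}
used_memory_rss:{5 * GB}
"""

ratio = rss_to_used_ratio(parse_info(SAMPLE_INFO))
print(round(ratio, 2))  # 1.67
```

A ratio well above 1 after large deletions is consistent with the allocator holding on to partially used pages rather than with a leak.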

Unable to save in background (redis-server)

I have two redis servers running on the same machine. The second one's log files have several instances with notices such as these:
[50818] 19 Feb 06:41:05.007 * 10 changes in 300 seconds. Saving...
[50818] 19 Feb 06:41:05.007 # Can't save in background: fork: Cannot allocate memory
In contrast, the log files of the first one solely contain successful DB saves. If I were out of memory, I reckon both would have similar logs. It perplexes me that only one has this problem, the other doesn't. Any leads?
Moreover, research led me to this blog post, which contends that the issue can be ameliorated if I do sysctl vm.overcommit_memory=1 on the command line. There's no explanation of how this helps. Can someone explain what's going on here in context of redis?
As per the Redis FAQ:
Background saving is failing with a fork() error under Linux even if I've a lot of free RAM!
Short answer: echo 1 > /proc/sys/vm/overcommit_memory :)
And now the long one:
Redis background saving schema relies on the copy-on-write semantic of
fork in modern operating systems: Redis forks (creates a child
process) that is an exact copy of the parent. The child process dumps
the DB on disk and finally exits. In theory the child should use as
much memory as the parent being a copy, but actually thanks to the
copy-on-write semantic implemented by most modern operating systems
the parent and child process will share the common memory pages. A
page will be duplicated only when it changes in the child or in the
parent. Since in theory all the pages may change while the child
process is saving, Linux can't tell in advance how much memory the
child will take, so if the overcommit_memory setting is set to zero
fork will fail unless there is as much free RAM as required to really
duplicate all the parent memory pages, with the result that if you
have a Redis dataset of 3 GB and just 2 GB of free memory it will
fail. Setting overcommit_memory to 1 says Linux to relax and perform
the fork in a more optimistic allocation fashion, and this is indeed
what you want for Redis.
A good source to understand how Linux Virtual Memory work and other
alternatives for overcommit_memory and overcommit_ratio is this
classic from Red Hat Magazine, "Understanding Virtual Memory". Beware,
this article had 1 and 2 configuration values for overcommit_memory
reversed: refer to the proc(5) man page for the right meaning of
the available values.
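As a rough companion to the FAQ text, the following Python sketch names the three overcommit_memory modes as described in the proc(5) man page and shows the approximate commit limit that applies in mode 2. The function names are hypothetical, the limit formula ignores huge pages, and the machine figures in the example are assumptions chosen for illustration.

```python
OVERCOMMIT_MODES = {
    0: "heuristic: the kernel refuses allocations it judges obviously excessive",
    1: "always: the kernel never refuses an allocation up front",
    2: "strict: committed memory is capped at swap + overcommit_ratio% of RAM",
}


def commit_limit_mb(ram_mb, swap_mb, overcommit_ratio):
    """Approximate commit limit; it is only enforced when overcommit_memory is 2."""
    return swap_mb + ram_mb * overcommit_ratio // 100


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Read the current mode on Linux; return None if the file is not readable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


mode = read_overcommit_mode()
print(mode, OVERCOMMIT_MODES.get(mode, "unknown (not Linux, or /proc unavailable)"))

# Hypothetical machine: 28136 MB RAM, no swap, overcommit_ratio=50.
print(commit_limit_mb(28136, 0, 50))  # 14068
```

Note that overcommit_ratio has no effect in modes 0 and 1, so with overcommit_memory=1 a fork of a large Redis process is allowed regardless of that ratio.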

Apache server cannot allocate memory for new process

I have an Apache server with 32 GB of RAM. When I start the server and run top to see the resources, it shows the CPU at 95 percent. This isn't normal behaviour, and after a few minutes it raises:
apache cannot allocate memory fork unable to fork new process
I don't know how to solve the problem. Any tips?
I had the same problem. There are 2 options to fix it:
1- Move from micro instances to small ones. This was the change that solved the problem (micro instances on Amazon tend to have large CPU steal time).
2- Tune the MySQL database server configuration and the Apache configuration to use a lot less memory.
A tuning guide for a low-memory situation such as this one: http://www.narga.net/optimizing-apachephpmysql-low-memory-server/ (but don't use the suggestion of MyISAM tables; that's horrible).
These 2 options will make the problem happen much less often. I am still looking for a better solution to close the processes that are done and kill the ones that hang.

Shrinking JVM memory and Swap

Virtual Machine:
4CPU
10GB RAM
10GB swap
Java 1.7
-Xms=-Xmx=6144m
Tomcat 7
We observed very strange behaviour with the JVM. The JVM resident memory began to shrink and the swap usage shot up to over 50%.
Please see below stats from monitoring tools.
http://i44.tinypic.com/206n6sp.jpg
http://i44.tinypic.com/m99hl0.jpg
Any pointers to help understand this would be appreciated.
Thanks!
Or maybe your Java program was idle and didn't need that memory, and you have high swappiness? In that situation the OS would free RAM just in case and keep only the used part resident.
In my opinion, that is actually good behaviour: why waste RAM on a process that won't use it?
If, however, this is the only process running on the VM, it would be a good idea to set swappiness to 0 or another small number: the memory was given to this single process, so we may as well keep it from being swapped out.
Thanks for the response. Yes, this is closer to system troubleshooting than Java, but I thought this was the right forum to raise the topic in case anybody has seen such a phenomenon with the JVM.
Anyway, I had already checked top, and no, there was no process other than Java that was hungry for memory. In fact, the second-largest process was using 72MB (RSS).
No, the swappiness is not set aggressively on this system; it is at the default of 60. One additional piece of information I forgot to share: we have 4 app servers in a cluster, and all of them showed this behaviour at exactly the same time. AFAIK, the JVM does not swap out, but the OS would. All of this is what confuses me.
All these app servers are in production and busy serving requests, so they are not idle. The used heap averaged 5 GB of the 6 GB.
The other interesting thing I found was some failure messages in the VMware logs at the same time, which I'm now investigating.

Redis - Default blocking VM

The blocking VM performance is better overall, as there is no time lost in
synchronization, spawning of threads, and resuming blocked
clients waiting for values. So if you are willing to accept an higher
latency from time to time, blocking VM can be a good pick. Especially
if swapping happens rarely and most of your often accessed data
happens to fit in your memory.
This is the default mode of Redis (and, I believe, the only mode going forward now that VM is deprecated in 2.6), leaving the OS to handle paging if and when required. Am I correct in my understanding that it will take some time to get "hot" when booted/started? When working on a node with 1 GB of RAM and a 16 GB dataset, does Redis attempt to load it all into virtual memory at boot, so that 90%+ is immediately paged out, and does the above statement hold true only after a good amount of usage?
Redis VM was already deprecated in Redis 2.4, and has been removed in Redis 2.6. It is a dead end: don't use it.
I think you are confusing the blocking VM with OS paging. They are two different things.
OS paging is the default mode of Redis when Redis VM is not configured at all (whatever the blocking mode). The OS will swap Redis memory if it does not fit in physical memory. The event loop can be frozen at any time. When it happens, performance is abysmal because none of the Redis internal data structures is designed for this (no locality, no paging system).
Redis VM can be configured in non blocking mode (using I/O threads). When I/Os are done, the event loop is not blocked, and Redis is still responsive. However, when too many I/Os pile up, the I/O threads will be completely busy, and you end up with a responsive Redis, but unable to process any queries requiring I/Os.
Redis VM can also be configured in blocking mode. In this mode all I/Os are synchronously performed in the main event loop thread. So the event loop is frozen in case of I/O (for instance in case of a key miss). All clients are impacted. However, general performance (CPU consumption and latency) is better than with the non blocking mode because some threading scheduling/synchronization is saved.
In practice, the difference between OS paging and the Redis blocking VM is the granularity level. With Redis VM, the granularity is the key. With OS paging, it is the page (a 4 KB block which can span several unrelated keys).
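To make the granularity point concrete, here is a small Python sketch under a deliberately simplified assumption: values are laid out back to back in memory, which is not how a real allocator places them. The function names and the 100-byte value size are hypothetical; only the 4 KB page size comes from the text above.

```python
PAGE_SIZE = 4096  # bytes, the 4 KB page mentioned above


def keys_per_page(value_size, page_size=PAGE_SIZE):
    """How many equally sized values fit in one page in the simplified layout."""
    return page_size // value_size


def pages_touched(offset, size, page_size=PAGE_SIZE):
    """Page numbers that must be resident to read `size` bytes starting at `offset`."""
    first = offset // page_size
    last = (offset + size - 1) // page_size
    return list(range(first, last + 1))


# With 100-byte values, paging in one key drags in the page holding about 39 neighbours.
print(keys_per_page(100))        # 40

# A 200-byte value that straddles a page boundary needs two pages resident.
print(pages_touched(4000, 200))  # [0, 1]
```

Unless the neighbouring keys happen to be accessed together, most of each paged-in block is wasted I/O, which is why OS paging behaves so poorly for a key-value workload with little locality.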
In all 3 cases, the initial load of the dump file will be extremely slow and generate a peak of random I/Os on your system. As you pointed out, most objects will be loaded and then swapped out. The warm-up time will be significant.
Except if you have extreme locality in your data, or if you do not care at all about the latencies, using 1 GB RAM for a 16 GB dataset with the Redis VM is science-fiction IMO.
There is a reason why the Redis VM was phased out. By design, it will never perform as well as a disk-based datastore (which can exploit file mapping or direct I/Os to avoid the double buffering, and use adapted data structures like B-trees).
Redis as an in-memory store is excellent. But if you need to store something which is bigger than RAM, don't use it. Other (disk-based) stores will all perform much better.