Redis snapshot overloading memory

I'm using Redis as a client-side caching mechanism, implemented in C# with stackexchange.redis.
I configured snapshotting with "save 5 1", and rdbcompression is on.
The RDB mechanism loads the rdb file to memory every time it needs to append data.
The problem is that when you have a fairly large RDB file and it is loaded into memory all at once, it chokes the memory, disk and CPU on an average endpoint.
Is there a way to update the RDB file without loading the whole file into memory?
Also, any other solution that lowers the load on memory and CPU is welcome.

The RDB mechanism loads the rdb file to memory every time it needs to append data.
This isn't what the open source Redis server does (other variants, such as the MSFT fork, may behave differently): RDB files are created by copying the contents of memory to disk with a forked process. The dump file is never loaded, except when used for recovery. The increased memory usage during the save process depends on the number of writes performed while the dump is in progress, because of the copy-on-write (COW) mechanism.
Also, any other solution that lowers the load on memory and CPU is welcome.
There are several ways to tackle this, depending on your requirements and budget. These include:
Using both RDB and AOF for data persistence, thus reducing the frequency of dumps (a minimal sketch follows this list).
Delegating persistence to a slave instance.
Sharding your databases and performing cascading dumps.
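For the first option, a hedged redis.conf sketch; the interval and fsync policy below are illustrative placeholders, not recommendations for your workload:

appendonly yes             # AOF records every write, so the dataset survives between dumps
appendfsync everysec       # fsync once per second, a common durability/throughput trade-off
save 900 1                 # take RDB snapshots far less often than "save 5 1"
rdbcompression yes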

We tackled the problem by moving away from RDB and now use AOF exclusively.
We have reduced the memory peaks by lowering auto-aof-rewrite-percentage and also limiting auto-aof-rewrite-min-size to the desired size.
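A hedged sketch of that tuning in redis.conf; the concrete numbers are placeholders you would pick from your own AOF sizes:

appendonly yes
auto-aof-rewrite-percentage 50     # rewrite once the AOF grows 50% beyond its last rewritten size
auto-aof-rewrite-min-size 256mb    # but never before the AOF reaches this size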

Redis-server using all RAM at startup

I'm using Redis and noticed that it crashes with the following error:
MISCONF Redis is configured to save RDB snapshots
I tried the solution suggested in this post,
but everything seems to be OK in terms of permissions and space.
The htop command tells me that Redis is consuming 70% of RAM. I tried to stop/restart Redis in order to flush it, but at startup the amount of RAM used by Redis grew dramatically and stopped at around 66%. I'm pretty sure that at that moment no process was using any Redis instance!
What is happening there?
The growing RAM usage is expected behaviour for Redis at its first data load after a restart, and while it writes the data to disk (the snapshot process). Redis tends to allocate as much memory as it can unless you set the "maxmemory" option in your conf file.
It allocates memory but does not release it immediately. Sometimes that takes hours; I have seen such cases.
A well-known fact about Redis is that it can allocate up to twice the size of the dataset it keeps.
I suggest you wait a couple of hours without any restart (Redis can keep serving get/set operations in the meantime) and keep watching the memory.
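A minimal sketch of the "maxmemory" option mentioned above; the limit and eviction policy are placeholders to tune for your dataset:

maxmemory 2gb
maxmemory-policy allkeys-lru    # evict the least recently used keys once the limit is reached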
Please also check this note from the Redis documentation:
Redis will not always free up (return) memory to the OS when keys are
removed. This is not something special about Redis, but it is how most
malloc() implementations work. For example if you fill an instance
with 5GB worth of data, and then remove the equivalent of 2GB of data,
the Resident Set Size (also known as the RSS, which is the number of
memory pages consumed by the process) will probably still be around
5GB, even if Redis will claim that the user memory is around 3GB. This
happens because the underlying allocator can't easily release the
memory. For example often most of the removed keys were allocated in
the same pages as the other keys that still exist.
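You can see this effect on a running instance by comparing the allocator-reported memory with the RSS; a sketch with redis-cli, using output shaped like the 5 GB / 3 GB example above (field names as reported by INFO memory):

$ redis-cli INFO memory | grep -E 'used_memory_human|used_memory_rss_human|mem_fragmentation_ratio'
used_memory_human:3.01G          # what Redis believes it is using
used_memory_rss_human:5.02G      # what the OS actually keeps resident for the process
mem_fragmentation_ratio:1.67     # roughly RSS divided by used_memory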

Is it possible to configure the working directory with multiple data folders?

I've currently installed Redis on a VM which has two mounted disks. I'd like to use those two mounted disks as working directories for Redis.
So is it possible to configure Redis working directory dir with multiple folder locations?
Thanks!
NO, you CANNOT do that.
Redis can only hold data that fits into memory. Normally that size is much smaller than the size of a disk, and there's no need to use multiple disks to extend the storage.
In some cases multiple disks might help, e.g. when Redis is dumping the data set to disk while syncing with slaves, or when it writes both AOF and RDB files. In these cases there are multiple readers or writers working at the same time, and that can cause performance issues (i.e. too many disk seeks).
However, since Redis focuses on being an in-memory store, I'm not sure that is a big problem to worry about.
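For reference, a sketch of the relevant redis.conf directives; 'dir' takes exactly one path, and both persistence files are created under it (the paths below are placeholders):

dir /mnt/disk1/redis             # the single working directory
dbfilename dump.rdb              # RDB file, written inside 'dir'
appendfilename "appendonly.aof"  # AOF file, also written inside 'dir'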

How to configure a namespace to keep part of the data as a cache in RAM and the rest on the hard disk?

I am trying to write some data to a namespace in Aerospike, but I don't have enough RAM for all of the data.
How can I configure Aerospike so that a portion of the data is kept in RAM as a cache and the remainder is kept on the hard drive?
Can I reduce the number of copies of the data that Aerospike keeps in RAM?
I understand it can be done by modifying the aerospike.conf file, but how exactly do I achieve that?
See the namespace storage configuration page in the Aerospike documentation:
http://www.aerospike.com/docs/operations/configure/namespace/storage/
How can I configure Aerospike so that a portion of the data is kept in RAM as a cache and the remainder is kept on the hard drive?
The post-write-queue parameter defines the amount of RAM used to keep recently written records in RAM. As long as these records are still in the post-write-queue, Aerospike will read them directly from RAM rather than from disk. This allows you to configure an LRU-style cache for a namespace that has storage-engine device and data-in-memory false. Note that eviction from this cache is least recently updated (or created) rather than least recently used (read or write).
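A hedged aerospike.conf sketch of such a namespace; the names, sizes and device path are placeholders, so check the storage configuration page linked above for your version's exact parameters:

namespace mycache {
    replication-factor 2              # copies kept across the cluster, not extra copies in RAM
    memory-size 4G                    # RAM budget for the primary index and post-write queue
    storage-engine device {
        device /dev/sdb               # records live on this device
        data-in-memory false          # do not keep a full copy of the data in RAM
        write-block-size 128K
        post-write-queue 1024         # recently written blocks cached in RAM and read from there
    }
}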

What is copy-on-write memory?

As I continuously write data to Redis, the memory used by copy-on-write keeps increasing. Even though I make my program sleep long enough for Redis to finish each background save (the last memory message reports 0 MB of memory used by copy-on-write), the next background save goes back up to a high number.
Example,
1300MB of memory used by cow
1400MB of memory used by cow
0MB of memory used by cow
1500MB of memory used by cow
What exactly does all of this mean? As far as I know, if the copy-on-write memory keeps increasing, at some point there won't be enough RAM. Also, during each background save that uses a lot of memory, Redis seems non-functional; Jedis always hits a socket timeout exception.
Here I will explain a few things: what copy-on-write (CoW) is and how it consumes memory, why setting 'vm.overcommit_memory = 1' won't help with the memory usage and performance issue, and best practices for backing up Redis data.
Copy-on-Write and its memory usage
Redis' snapshot backup leverages CoW semantics, which modern operating systems provide to avoid the problem that, when a process forks, the memory of the parent process would be copied to the child process and thus double the memory footprint. With CoW, the forked child process shares the original memory space of the parent process, and a memory page is only copied when either process modifies it.
While an RDB backup is in progress, data changes keep happening in the parent process, which is still accepting new requests from clients and handling them in memory. If the QPS is high, the parent process will copy a lot of memory pages for the new changes during the child process' backup time, and will therefore consume extra memory. In the extreme case where all memory pages are modified, the memory footprint of the Redis instance doubles. This possibility explains why the 'vm.overcommit_memory = 1' option is recommended, what problem it resolves, and what it cannot do (reduce the memory usage).
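On recent Redis versions you can also read the CoW cost of the last dump directly instead of grepping the log; a small sketch (rdb_last_cow_size is a field in the INFO persistence output, reported in bytes):

$ redis-cli INFO persistence | grep rdb_last_cow_size
rdb_last_cow_size:1363148800     # ~1300 MB copied because of CoW during the last RDB save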
What "vm.overcommit_memory = 1" is, and what issues it resolves
During the RDB backup, you may see an error like this in the log:
10202:M 13 Sep 11:34:16.535 # Can't save in background: fork: Cannot allocate memory
It indicates that there is not enough memory to fork the child process for the backup. If the Redis process currently consumes 2GB of memory, then when forking the child process the operating system assumes you need ANOTHER 2GB, so that in the extreme CoW case there is enough memory to copy all dirty memory pages. Even though the extra memory is not used yet at fork time, the kernel checks the available memory to avoid later out-of-memory errors. The Redis log itself suggests the solution:
10202:M 13 Sep 11:33:09.943 # WARNING overcommit_memory is set to 0! Background
save may fail under low memory condition. To fix this issue add
'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the
command 'sysctl vm.overcommit_memory=1' for this to take effect.
So setting 'vm.overcommit_memory = 1' allows the child process to be forked even when free memory is low. If you know that not too many memory pages will be dirtied during the backup, there won't be any actual problem, because memory will be allocated successfully each time a new CoW copy happens.
However, 'vm.overcommit_memory = 1' only guarantees that you can fork the child process to back up the Redis data; it cannot reduce memory usage if write operations keep happening in the parent process.
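Concretely, these are the two commands from the warning above (run as root; the first takes effect immediately, the second makes the setting survive reboots):

sysctl vm.overcommit_memory=1
echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf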
Redis backup practice
There are three ways of persisting Redis in-memory data: RDB (snapshotting), AOF, and a hybrid of the two. Any approach will impact server response time to some extent, no matter how you configure the settings. To minimize the impact of the persistence process, we normally run the backup on a slave instance instead of on the master. However, this introduces a new risk: during a network partition the slave may not be able to keep up to date, so backing up on a slave risks losing some data. One mitigation is to have multiple slaves, which lowers the chance of all of them being out of sync with the master. Another is to set up a robust monitoring system, so network issues are detected sooner and the time span of the partition is reduced.
From the Redis FAQ:
Redis background saving schema relies on the copy-on-write semantic of fork in modern operating systems: Redis forks (creates a child process) that is an exact copy of the parent. The child process dumps the DB on disk and finally exits. In theory the child should use as much memory as the parent, being a copy, but actually, thanks to the copy-on-write semantic implemented by most modern operating systems, the parent and child process will share the common memory pages. A page will be duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can't tell in advance how much memory the child will take.
The increased memory usage during the save process depends on the number of writes performed while the dump is in progress, because of the copy-on-write (COW) mechanism.
What you could do instead is configure a Redis slave and delegate the task of persistence to it.
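A hedged sketch of that split (the host name and thresholds are placeholders): persistence is switched off on the master and enabled only on the slave, which serves no client traffic.

# master redis.conf
save ""                      # disable RDB snapshots on the master
appendonly no                # no AOF on the master either

# slave redis.conf
slaveof master-host 6379     # 'replicaof' on Redis 5 and later
save 900 1                   # RDB snapshots happen here
appendonly yes               # and/or AOF, depending on durability needs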

Redis - Can data size be greater than memory size?

I'm rather new to Redis, and before using it I'd like to learn some details that are important to me. So...
Redis uses RAM and HDD for storing data. RAM is used as fast read/write storage, and HDD is used to make this data persistent. When Redis is started, does it load all data from HDD to RAM, or does it load only frequently queried data into RAM? What if I have 500MB of Redis storage on HDD but only 100MB of RAM for Redis? Where can I read about this?
Redis loads everything into RAM. All the data is written to disk, but will only be read for things like restarting the server or making a backup.
There are a couple of ways you can use it with less RAM than data, though. You can set it up in combination with MySQL or another disk-based store to work much like memcached: you manage cache misses and persistence manually.
Redis has a VM mode where all keys must fit in RAM but infrequently accessed data can be on disk. However, I'm not sure if this is in the stable builds yet.
Recent versions (>2.0) have improved significantly and memory management is more efficient. See this blog post that explains how to use hashes to optimize RAM memory footprint: http://antirez.com/post/redis-weekly-update-7.html
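A rough illustration of the hash idea from that post (the key names are invented; the CONFIG parameters shown are the knobs that decide when a hash keeps its compact, memory-cheap encoding, and they are named hash-max-zipmap-* on very old versions and hash-max-listpack-* on Redis 7 and later):

# many small top-level keys, each paying per-key overhead:
SET user:1234:name "alice"
SET user:1234:email "alice@example.com"

# the same data packed into a single hash:
HSET user:1234 name "alice"
HSET user:1234 email "alice@example.com"

# thresholds below which the hash uses the compact encoding:
CONFIG GET hash-max-ziplist-entries
CONFIG GET hash-max-ziplist-value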
The VM mode mentioned above is the feature called Virtual Memory, and it is officially deprecated:
Redis VM is now deprecated. Redis 2.4 will be the latest Redis version featuring Virtual Memory (but it also warns you that Virtual Memory usage is discouraged). We found that using VM has several disadvantages and problems. In the future of Redis we want to simply provide the best in-memory database (but persistent on disk as usual) ever, without considering at least for now the support for databases bigger than RAM. Our future efforts are focused into providing scripting, cluster, and better persistence.
More information about VM: https://redis.io/topics/virtual-memory