Redis / Limit of .rdb file

I use Redis, and it saves its data to an .rdb file.
I've noticed that the .rdb file in production grows by about 15 MB every day (it currently stands at 75 MB).
Is there any limit on the size of the .rdb file? Does this affect the performance of the Redis DB?

The rdb on disk has no direct impact on the running redis instance.
The only limit seems to be the filesystem.
We have a 10 GB compressed rdb which is about 28 GB in memory, and we have had much bigger ones.
However, you may encounter pauses if you save such large datasets to disk (even if you use http://redis.io/commands/bgsave ).
When the forked redis process writes the latest diff, redis is unresponsive until it is completely written to disk. This time span depends on several factors: the write load during the bgsave, the overall number of keys, the size of hashes, and so on.
Also, be sure to set up the "save" configuration correctly for your needs.
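For reference, the "save" lines in redis.conf take the form save &lt;seconds&gt; &lt;changes&gt;; the thresholds below are the defaults Redis ships with, and should be tuned to your write load:

```
# Snapshot to dump.rdb if at least <changes> writes happened
# within <seconds> seconds:
save 900 1       # after 15 min if at least 1 key changed
save 300 10      # after 5 min if at least 10 keys changed
save 60 10000    # after 1 min if at least 10000 keys changed

# To disable RDB snapshots entirely, use a single empty value:
# save ""
```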

Related

Redis starts slowly when aof file is too large

Redis starts slowly when the aof file is too large. The aof file is still large after rewriting. How can I deal with this?
We cannot disable aof, and we need Redis to start up quickly.
The aof file is still large after rewriting
That means your data set is really large.
In order to speed up restarts, you can do persistence with an RDB file instead. Loading an RDB file is faster than loading an AOF file.
You can also try to split your big data set into several small Redis instances, or move your data to a Redis Cluster, so that each node holds a smaller data set and reloading finishes faster.
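If you are on Redis 4.0 or later, another option worth evaluating is the mixed persistence format, where a rewritten AOF begins with a compact RDB preamble, so restarts load the RDB part quickly and only replay the incremental AOF tail:

```
# redis.conf (Redis >= 4.0)
appendonly yes
aof-use-rdb-preamble yes
```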

Redis memory usage vs space taken up by back ups

I'm looking at the backed-up rdb files of Redis for a web application. There are 4 such files (for 4 different redis servers working concurrently), their sizes being: 13G + 1.6G + 66M + 14M = ~15G
However, these same 4 instances seem to be taking 43.8GB of memory (according to new relic). Why such a large discrepancy between how much space redis data takes up in mem vs disk? Could it be a misconfiguration and can the issue be helped?
I don't think there is any problem.
First of all, the data is stored in a compressed format in the rdb file, so its size is smaller than what it occupies in memory. How small the rdb file is depends on the type of data, but it can be around 20-80% of the memory used by redis.
Another reason your reported memory usage could be higher than the actual usage (you can compare the figure from new relic with the one obtained from the redis-cli info memory command) is memory fragmentation. Whenever redis needs more memory, it gets it allocated from the OS, but it does not release it easily (when a key expires or is deleted). This is not a big issue, as redis will ask for more memory only after using up the extra memory it already holds. You can also check the memory fragmentation ratio using the redis-cli info memory command.
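The fragmentation ratio mentioned above is just used_memory_rss divided by used_memory. A minimal sketch of computing it from the text that redis-cli info memory returns (the sample values below are made up for illustration):

```python
def fragmentation_ratio(info_text: str) -> float:
    """Compute mem_fragmentation_ratio = used_memory_rss / used_memory
    from the text returned by `redis-cli info memory`."""
    fields = {}
    for line in info_text.splitlines():
        # Skip section headers like "# Memory"; keep "key:value" lines.
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return int(fields["used_memory_rss"]) / int(fields["used_memory"])


# Sample INFO output (values invented for this example):
sample = """# Memory
used_memory:1073741824
used_memory_rss:1288490188
"""
print(round(fragmentation_ratio(sample), 2))  # 1.2
```

A ratio well above 1 indicates fragmentation; a ratio below 1 usually means the OS has swapped part of the Redis memory out.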

AOF and RDB backups in redis

This question is about Redis persistence.
I'm using redis as a 'fast backend' for a social networking website. It's a single server set up. I've been transferring PostgreSQL responsibilities to Redis steadily. Currently in etc/redis/redis.conf, the appendonly setting is set to appendonly no. Snapshotting settings are save 900 1, save 300 10, save 60 10000. All this is true for production and development both. As per production logs, save 60 10000 gets invoked heavily. Does this mean that practically, I'm getting backups every 60 seconds?
Some literature suggests using AOF and RDB backups together. Thus I was weighing turning appendonly on and using appendfsync everysec. For anyone who has had experience with both sides of the coin:
1) Will using appendonly on and appendfsync everysec cause a performance downgrade? Will it hit the CPU? The write load is on the high side.
2) Once I restart the redis server with these new settings, I'll still lose the last 60 secs of my data, correct?
3) Are restart times something to worry about? My dump.rdb file is small; ~90MB.
I'm trying to find out more about redis persistence, and getting my expectations right. Personally, I'm fine with losing 60s of data in the case of a catastrophe, thus whether I should use AOF is also something I'm pondering. Feel free to chime in. Thanks!
Does this mean that practically, I'm getting backups every 60 seconds?
NO. Redis does a background save after 60 seconds only if at least 10000 keys have been changed. Otherwise, it doesn't do a background save.
Will using appendonly on and appendfsync everysec cause a performance downgrade? Will it hit the CPU? The write load is on the high side.
It depends on many things, e.g. disk performance (SSD vs HDD), write/read load (QPS), data model, and so on. You need to do a benchmark with your own data in your specific environment.
Once I restart the redis server with these new settings, I'll still lose the last 60 secs of my data, correct?
NO. If you turn on both AOF and RDB, when Redis restarts, the AOF file will be used to rebuild the database. Since you config it to appendfsync everysec, you will only lose the last 1 second of data.
Are restart times something to worry about? My dump.rdb file is small; ~90MB.
If you turn on AOF, then when Redis restarts, it replays the logs in the AOF file to rebuild the database. Normally the AOF file is larger than the RDB file, and recovery might be slower than from the RDB file. Should you worry about that? Do a benchmark with your own data in your specific environment.
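As a starting point for such a benchmark, the redis-benchmark tool ships with Redis; the command lines below are only illustrative (run them against a throwaway instance, never against production):

```
# Raw write throughput, 100k requests, quiet output:
redis-benchmark -t set,lpush -n 100000 -q

# The same with pipelining (16 commands per round trip):
redis-benchmark -t set,get -n 100000 -P 16 -q
```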
EDIT
IMPORTANT NOTICE
Assume that you already set Redis to use RDB saving, and write lots of data to Redis. After a while, you want to turn on AOF saving. NEVER MODIFY THE CONFIG FILE TO TURN ON AOF AND RESTART REDIS, OTHERWISE YOU'LL LOSE EVERYTHING.
Because, once you set appendonly yes in redis.conf, and restart Redis, it will load data from AOF file, no matter whether the file exists or not. If the file doesn't exist, it creates an empty file, and tries to load data from that empty file. So you'll lose everything.
In fact, you don't have to restart Redis to turn on AOF. Instead, you can use config set command to dynamically turn it on: config set appendonly yes.
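A sketch of the safe sequence (CONFIG SET triggers the initial AOF rewrite in the background; CONFIG REWRITE, available since Redis 2.8, persists the change into redis.conf so it survives the next restart):

```
127.0.0.1:6379> CONFIG SET appendonly yes
OK
127.0.0.1:6379> CONFIG REWRITE
OK
```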

How do I back up Redis sever RDB and AOF files for recovery to ensure minimal data loss?

Purpose:
I am trying to make backup copies of both dump.rdb every X time and appendonly.aof every Y time, so that if the files get corrupted for whatever reason (or even just AOF's appendonly.aof file), I can restore my data from the dump.rdb.backup snapshot and then replay whatever has changed since from the most recent copy of appendonly.aof.backup I have.
Situation:
I backup dump.rdb every 5 minutes, and backup appendonly.aof every 1 second.
Problems:
1) Since dump.rdb is being written in the background into a temporary file by a child process - what happens to the key changes that occurs while the child process is creating a new image? I know the AOF file will keep appending regardless of the background write, but does the new dump.rdb file contain the key changes too?
2) If dump.rdb does NOT contain the key changes, is there some way to figure out the exact point where the child process is being forked? That way I can keep track of the point after which the AOF file would have the most up to date information.
Thanks!
Usually, people use either RDB or AOF as a persistence strategy. Having both of them is quite expensive. Running a dump every 5 min and copying the aof file every second sounds awfully frequent. Unless the Redis instances only store a tiny amount of data, you will likely kill the I/O subsystem of your box.
Now, regarding your questions:
1) Semantic of the RDB mechanism
The dump mechanism exploits the copy-on-write mechanism implemented by modern OS kernels when they clone/fork processes. When the fork is done, the system just creates the background process by copying the page table. The pages themselves are shared between the two processes. If a write operation is done on a page by the Redis process, the OS transparently duplicates the page (so that the Redis instance has its own version, and the background process the previous version). The background process therefore has the guarantee that the memory structures are kept constant (and therefore consistent).
The consequence is that any write operation started after the fork will not be included in the dump. The dump is a consistent snapshot taken at fork time.
2) Keeping track of the fork point
You can estimate the fork timestamp by running the INFO persistence command and calculating rdb_last_save_time - rdb_last_bgsave_time_sec, but it is not very accurate (second precision only).
To be a bit more accurate (millisecond), you can parse the Redis log file to extract the following lines:
[3813] 11 Apr 10:59:23.132 * Background saving started by pid 3816
You need at least the "notice" log level to see these lines.
As far as I know, there is no way to correlate a specific entry in the AOF to the fork operation of the RDB (i.e. it is not possible to be 100% accurate).
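A small sketch of that log parsing (the regex and year handling below are assumptions based on the log line format shown above; Redis log lines do not carry the year, so the caller has to supply it):

```python
import re
from datetime import datetime

# Matches lines like:
# [3813] 11 Apr 10:59:23.132 * Background saving started by pid 3816
FORK_LINE = re.compile(
    r"\[\d+\] (\d{1,2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3}) \* "
    r"Background saving started by pid (\d+)"
)

def parse_fork_event(line, year):
    """Return (timestamp, child_pid) for a 'Background saving started'
    log line, or None if the line does not match."""
    m = FORK_LINE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{m.group(1)} {year}", "%d %b %H:%M:%S.%f %Y")
    return ts, int(m.group(2))


line = "[3813] 11 Apr 10:59:23.132 * Background saving started by pid 3816"
ts, pid = parse_fork_event(line, 2013)
print(ts.isoformat(), pid)  # 2013-04-11T10:59:23.132000 3816
```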

Is Redis a memory only store like memcached or does it write the data to the disk

Is Redis memory only store like memcached or does it write the data to the disk? If it does write to the disk, how often is the disk written to?
Redis persistence is described in detail here:
http://redis.io/topics/persistence
By default, redis performs snapshotting:
By default Redis saves snapshots of the dataset on disk, in a binary file called dump.rdb. You can configure Redis to have it save the dataset every N seconds if there are at least M changes in the dataset, or you can manually call the SAVE or BGSAVE commands.
For example, this configuration will make Redis automatically dump the dataset to disk every 60 seconds if at least 1000 keys changed: save 60 1000
Another good reference is this link to the author's blog, where he tries to explain how redis persistence works:
http://antirez.com/post/redis-persistence-demystified.html
Redis holds all data in memory. If the size of an application's data is too large for that, then Redis is not an appropriate solution.
However, Redis also offers two ways to make the data persistent:
1) Snapshots at predefined intervals, which may also depend on the number of changes. Any changes between these intervals will be lost in a power failure or crash.
2) Writing a kind of change log on every data change. You can fine-tune how often this is physically written to the disk, but if you choose to always write immediately (which will cost you some performance), then there will be no data loss caused by the in-memory nature of Redis.
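Putting the second option into redis.conf terms, the appendfsync directive controls how often the change log (AOF) is physically synced to disk; only one of the three policies should be active:

```
appendonly yes

# appendfsync always   # fsync on every write: no loss, slowest
appendfsync everysec   # fsync once per second: at most ~1 s of loss
# appendfsync no       # let the OS decide: fastest, least safe
```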