Some confusion about backing up the whole dataset in Redis

The documentation says:
Whenever Redis needs to dump the dataset to disk, this is what happens:
Redis forks. We now have a child and a parent process.
The child starts to write the dataset to a temporary RDB file.
When the child is done writing the new RDB file, it replaces the old one.
Because I want to back up the whole dataset, I typed the SHUTDOWN command in redis-cli, expecting Redis to shut down and save all data to dump.rdb. After it shut down completely, I went to the db location and found that dump.rdb is 423.9 MB and temp-21331.rdb is 180.5 MB. The temp file still exists and is smaller than dump.rdb. Apparently, Redis did not use the temp file to replace dump.rdb.
I am wondering whether dump.rdb is the whole db file at this point, and whether it is safe to delete the temp file.

What does the file modification timestamp of temp-21331.rdb say? It sounds like a leftover from a crash.
You can delete it.
The documentation is definitely correct. When rewriting, all data is written to a temporary file (compressed), and when the write completes, the dump.rdb file is replaced by this temp file. There should, however, be no leftovers during normal usage. What is important: you always need enough free disk space for this operation to succeed. A safe guideline is 140% of the Redis memory limit (it would be 200% if no compression were applied).
Hope this helps, TW
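
For example, here is a quick way to check (a sketch only; the data directory /var/lib/redis and the file name are examples, so adjust them to your dir setting):

    # When was the leftover temp file last written? An old timestamp points
    # to a crashed or killed background save.
    stat -c '%y' /var/lib/redis/temp-21331.rdb
    # Make sure no background save is running right now:
    redis-cli INFO persistence | grep rdb_bgsave_in_progress
    # If this prints rdb_bgsave_in_progress:0, the temp file is orphaned
    # and can be removed safely:
    rm /var/lib/redis/temp-21331.rdb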

Related

Does BGSAVE save changes to a key that was already written to the RDB file?

During the execution of BGSAVE, suppose there is a key whose type is "set". After this set has been written to the RDB file, if the values in this set change (for example, someone executes 'SADD key xxx'), will BGSAVE write this change to the RDB file?
Short answer: no, it is a point-in-time snapshot.
Redis takes a snapshot of the dataset in the background, e.g. when calling BGSAVE, by forking - this allows the persistence process to access the data without copying it. Furthermore, copy-on-write semantics cause the data to be copied when it is written to (e.g. a SADD command) by the main process, making the persistence process oblivious of such changes.
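A small sketch to illustrate the point (key names are examples; on a small dataset the background save may finish before you can observe it in progress):

    redis-cli SADD myset a b c
    redis-cli BGSAVE            # fork happens here; the snapshot contains a, b, c
    redis-cli SADD myset d      # executed after the fork, so "d" is NOT in
                                # the snapshot being written
    # The later write is counted toward the next snapshot instead:
    redis-cli INFO persistence | grep rdb_changes_since_last_save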
More information is at https://redis.io/topics/persistence and http://oldblog.antirez.com/post/redis-persistence-demystified.html.

Redis data recovery from Append Only File?

If we enable the append-only file (appendonly yes) in redis.conf, every operation that changes the Redis database is logged in that file.
Now, suppose Redis has used all the memory allocated to it by the "maxmemory" directive in redis.conf.
To store more data, it starts removing data according to one of the eviction policies (volatile-lru, allkeys-lru, etc.) specified in redis.conf.
Suppose some data gets removed from main memory, but its log will still be there in the append-only file (correct me if I am wrong). Can we get that data back using this append-only file?
Simply put: is there any way we can get that removed data back into main memory? For example, can we store that data on disk and load it into main memory when required?
I got this answer from Google Groups. I'm sharing it.
----->
Eviction of keys is recorded in the AOF as explicit DEL commands, so when the file is replayed, full consistency is maintained.
The AOF is used only to recover the dataset after a restart, and is not used by Redis for serving data. If the key still exists in it (with a subsequent eviction DEL), the only way to "recover" it is by manually editing the AOF to remove the respective deletion and restarting the server.
-----> Another answer:
The AOF, as its name suggests, is a file that's appended to. It's not a database that Redis searches through and deletes the creation record when a deletion record is encountered. In my opinion, that would be too much work for too little gain.
As mentioned previously, a configuration that re-writes the AOF (see the BGREWRITEAOF command as one example) will erase any keys from the AOF that had been deleted, and now you can't recover those keys from the AOF file. The AOF is not the best medium for recovering deleted keys. It's intended as a way to recover the database as it existed before a crash - without any deleted keys.
If you want to be able to recover data after it was deleted, you need a different kind of backup - most likely a snapshot (RDB) file that's been archived with the date/time that it was saved. If you learn that you need to recover data, select the snapshot file from a time you know the key existed, load it into a separate Redis instance, and retrieve the key with RESTORE or GET or similar commands. As has been mentioned, it's possible to parse the RDB or AOF file contents to extract data from them without loading the file into a running Redis instance. The downside to this approach is that such tools are separate from the Redis code and may not always understand changes to the data format of the files the way the Redis server does. You decide which approach will work with the level of speed and reliability you want.
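A sketch of that recovery path (the port, directory, file name, and key are all illustrative):

    # Start a throwaway Redis instance that loads the archived snapshot:
    redis-server --port 6400 --dir /backups --dbfilename dump-2014-04-01.rdb &
    # Read the deleted key's old value back from it:
    redis-cli -p 6400 GET mykey
    # Shut the throwaway instance down without overwriting the archive:
    redis-cli -p 6400 SHUTDOWN NOSAVE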
But its log will still be there in the append-only file (correct me if I am wrong). Can we get that data back using this append-only file?
NO, you CANNOT get the data back. When Redis evicts a key, it also appends a delete command to the AOF. After rewriting the AOF, anything about the evicted key will be removed.
Is there any way we can get that removed data back into main memory? Can we store that data on disk and load it into main memory when required?
NO, you CANNOT do that. You have to use another durable data store (e.g. MySQL, MongoDB) to save data to disk, and use Redis as a cache.
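If you just want to verify what eviction wrote to the AOF, a rough sketch (default file location assumed; the AOF stores commands in the RESP wire format, so a raw grep is only approximate):

    # How many keys has this instance evicted so far?
    redis-cli INFO stats | grep evicted_keys
    # Count the DEL entries that evictions (and normal deletes) appended:
    grep -ac 'DEL' /var/lib/redis/appendonly.aof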

Are Redis .rdb file operations "blocking"? Can I copy the .rdb file in the middle of a SAVE operation, for instance?

I run a Redis database in-house and want to make a "snapshot of the snapshot".
What the hell? Yes. I want to move the .rdb file once a day into an S3 bucket. It should also be a scheduled operation (probably a cron job).
So here is my question: will I face trouble if the cron job starts running in the middle of a SAVE operation (from Redis to the .rdb file)? Losing some data is not a problem; I just want it to work without any obstruction.
Thanks!
When Redis writes out the RDB to disk, it writes to a temporary file. When the save process is done writing it, it then renames/moves it to the "dump.rdb" file (or whatever you've changed it to if you have done so). This is an atomic action. As such you should be fine with the method you propose.
If you want more control over it, you could use a tool such as https://github.com/therealbill/redis-buagent, which connects as a slave and generates its own RDB, storing it in memory and then uploading it to S3 (or wherever else you want to store it, such as Cloud Files or a local file), or use redis-cli --rdb to generate a "local" RDB file for you to copy to S3.
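A minimal crontab sketch of the daily copy (paths, schedule, and bucket name are examples; it assumes the AWS CLI is installed and configured):

    # Runs at 03:00 every day; % must be escaped inside crontab entries.
    0 3 * * * cp /var/lib/redis/dump.rdb /tmp/dump-$(date +\%F).rdb && aws s3 cp /tmp/dump-$(date +\%F).rdb s3://my-redis-backups/

Using a dated file name keeps multiple snapshots in the bucket, so you can restore from a known point in time.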

Maintaining logs in a Redis cache using Java

Requirement - Our application processes files containing records, and we have to maintain a log for the records in every file. The log file could easily be 100 MB in size at times.
Solution - Since database operations would be very heavy, we wanted to go for an in-memory cache: write the logs for a particular file into a Redis key (the key might be the unique file name itself). Later, when the user wants to see the log file, the application should be able to read the contents from the cache using the unique file name as the key and write them into a file which the user can see/download.
Question - Is it a good idea to keep appending the logs for a particular file to the same key and, later, when we have to write to the file, read from the key and write the contents to the file? The value of the Redis key would always be a string, and its size might run into 100 MB. Will there be any problems because of this?
You can achieve this with Redis easily, but don't forget that Redis is an in-memory store (make sure you don't run out of RAM). Ask yourself why you want an in-memory store over normal disk operations when dealing with files. If reads are frequent and access time is crucial, go ahead with Redis.
Regarding size - 100 MB is not a problem: a Redis string can hold up to 512 MB, and a List, Set, or Hash can hold over 4 billion entries.
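As a sketch, the append/read pattern looks like this in redis-cli (the key name is an example; any Java client, e.g. Jedis, exposes the same APPEND/STRLEN/GET commands):

    # Append log lines for one file to one key as records are processed:
    redis-cli APPEND log:file-123 $'record 1 processed OK\n'
    redis-cli APPEND log:file-123 $'record 2 failed validation\n'
    # Watch growth toward the 512 MB string cap:
    redis-cli STRLEN log:file-123
    # When the user asks for the log, dump the whole value to a file:
    redis-cli GET log:file-123 > file-123.log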
I prefer MongoDB (a disk-based document store) over Redis for this kind of operation.
Consider looking at this link to see when Redis is awesome.

How do I back up Redis server RDB and AOF files for recovery to ensure minimal data loss?

Purpose:
I am trying to make backup copies of both dump.rdb (every X amount of time) and appendonly.aof (every Y amount of time), so that if the files get corrupted for whatever reason (or even just the appendonly.aof file), I can restore my data from the dump.rdb.backup snapshot and then replay whatever has changed since from the most recent copy of appendonly.aof.backup that I have.
Situation:
I back up dump.rdb every 5 minutes, and appendonly.aof every second.
Problems:
1) Since dump.rdb is written in the background into a temporary file by a child process - what happens to the key changes that occur while the child process is creating the new image? I know the AOF file will keep appending regardless of the background write, but does the new dump.rdb file contain the key changes too?
2) If dump.rdb does NOT contain the key changes, is there some way to figure out the exact point at which the child process is forked? That way I can keep track of the point after which the AOF file would have the most up-to-date information.
Thanks!
Usually, people use either RDB or AOF as a persistence strategy; having both of them is quite expensive. Running a dump every 5 minutes and copying the AOF file every second sounds awfully frequent. Unless the Redis instances only store a tiny amount of data, you will likely kill the I/O subsystem of your box.
Now, regarding your questions:
1) Semantic of the RDB mechanism
The dump mechanism exploits the copy-on-write mechanism implemented by modern OS kernels when they clone/fork processes. When the fork is done, the system just creates the background process by copying the page table. The pages themselves are shared between the two processes. If a write operation is done on a page by the Redis process, the OS transparently duplicates the page (so that the Redis instance has its own version, and the background process the previous version). The background process therefore has the guarantee that the memory structures are kept constant (and therefore consistent).
The consequence is that any write operation started after the fork will not be included in the dump. The dump is a consistent snapshot taken at fork time.
2) Keeping track of the fork point
You can estimate the fork timestamp by running the INFO persistence command and calculating rdb_last_save_time - rdb_last_bgsave_time_sec, but it is not very accurate (one-second resolution only).
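For example (a sketch; INFO lines end in CRLF, hence the \r stripping, and rdb_last_bgsave_time_sec is -1 until a background save has completed):

    # Estimate the fork timestamp: completion time minus save duration.
    info=$(redis-cli INFO persistence | tr -d '\r')
    last_save=$(echo "$info" | awk -F: '/^rdb_last_save_time:/ {print $2}')
    duration=$(echo "$info" | awk -F: '/^rdb_last_bgsave_time_sec:/ {print $2}')
    echo "approximate fork time (unix): $((last_save - duration))"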
To be a bit more accurate (millisecond), you can parse the Redis log file to extract the following lines:
[3813] 11 Apr 10:59:23.132 * Background saving started by pid 3816
You need at least the "notice" log level to see these lines.
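For instance, something like this (the log path is an example; see the logfile directive in redis.conf):

    # Millisecond-precision fork timestamps, from the server log:
    grep 'Background saving started by pid' /var/log/redis/redis-server.log | tail -1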
As far as I know, there is no way to correlate a specific entry in the AOF to the fork operation of the RDB (i.e. it is not possible to be 100% accurate).