How to back up a frequently rewritten Redis snapshot (RDB)?

The Redis docs say:
Redis is very data backup friendly since you can copy RDB files while the database is running: the RDB is never modified once produced, and while it gets produced it uses a temporary name and is renamed into its final destination atomically using rename(2) only when the new snapshot is complete.
But what happens if I'm copying the snapshot to another location for backup purposes while Redis performs the rename that replaces the snapshot? Is my backup broken? Or is there something I'm missing? Do I need a "safe" time frame in which to copy the RDB snapshot, during which Redis does not write a new snapshot, or is this something I don't have to care about?
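For illustration, here is a minimal Python sketch of the same write-then-rename pattern the docs describe, assuming a POSIX filesystem (the function and file names are illustrative, not Redis internals):

    import os

    def write_snapshot_atomically(data: bytes, dest: str) -> None:
        # Write the new snapshot under a temporary name first.
        tmp = dest + ".tmp"  # illustrative; Redis actually uses temp-<pid>.rdb
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit the disk first
        # rename(2) is atomic: a reader opening `dest` sees either the complete
        # old file or the complete new one, never a half-written snapshot.
        os.rename(tmp, dest)

Because the rename only swaps directory entries, a copy that already has the old dump.rdb open keeps reading the old inode even if the rename happens mid-copy; the copy stays internally consistent, it is just one snapshot older.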

Related

Does bgsave save changes after a key has already been written to the RDB file?

During the execution of bgsave, suppose there is a key whose type is "set". After this set has been written to the RDB file, if the values in the set change (e.g. someone executes 'sadd key xxx'), will bgsave write this change to the RDB file?
Short answer: no, it is a point-in-time snapshot.
Redis takes a snapshot of the dataset in the background, e.g. when calling BGSAVE, by forking - this allows the persistence process to access the data without copying it. Furthermore, copy-on-write semantics cause the data to be copied when the main process writes to it (e.g. with a SADD command), keeping the persistence process oblivious to such changes.
More information is at https://redis.io/topics/persistence and http://oldblog.antirez.com/post/redis-persistence-demystified.html.
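As a toy illustration of the point-in-time semantics (plain Python, not Redis internals, which are written in C - but the fork() mechanism is the same):

    import os, time

    data = {"myset": {"a", "b"}}  # stand-in for the Redis keyspace

    pid = os.fork()  # POSIX only; BGSAVE does this internally
    if pid == 0:
        # Child: sees the dataset exactly as it was at fork() time.
        time.sleep(0.1)  # give the parent time to mutate its copy
        print("child sees:", sorted(data["myset"]))   # ['a', 'b']
        os._exit(0)
    else:
        data["myset"].add("c")  # like running SADD after BGSAVE started
        os.wait()
        print("parent sees:", sorted(data["myset"]))  # ['a', 'b', 'c']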

Redis data recovery from Append Only File?

If we enable appendonly in the redis.conf file, every operation which changes the Redis database is logged in that file.
Now, suppose Redis has used all the memory allocated to it by the "maxmemory" directive in the redis.conf file.
To store more data, it starts removing data according to one of the eviction policies (volatile-lru, allkeys-lru, etc.) specified in the redis.conf file.
Suppose some data gets removed from main memory, but its log will still be there in the AppendOnlyFile (correct me if I am wrong). Can we get that data back using this AppendOnlyFile?
Simply put: is there any way we can get that removed data back into main memory? Can we store that data on disk and load it into main memory when required?
I got this answer from Google Groups. I'm sharing it:
----->
Eviction of keys is recorded in the AOF as explicit DEL commands, so when the file is replayed full consistency is maintained.
The AOF is used only to recover the dataset after a restart, and is not used by Redis for serving data. If the key still exists in it (with a subsequent eviction DEL), the only way to "recover" it is by manually editing the AOF to remove the respective deletion and restarting the server.
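As a sketch of what "manually editing the AOF" could look like, here is a minimal RESP-level filter that copies an AOF while dropping DEL (or UNLINK) commands targeting one key. It assumes a plain command-log AOF without the RDB preamble that newer Redis versions can prepend, and it is illustrative, not a supported tool:

    def filter_aof(src: str, dst: str, key: bytes) -> None:
        # Copy the AOF at `src` to `dst`, skipping deletions of `key`.
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            while True:
                header = fin.readline()          # b"*<argc>\r\n"
                if not header:
                    break
                argc = int(header[1:].strip())
                chunks, args = [header], []
                for _ in range(argc):
                    lenline = fin.readline()     # b"$<len>\r\n"
                    arg = fin.read(int(lenline[1:].strip()))
                    crlf = fin.read(2)           # trailing \r\n
                    chunks += [lenline, arg, crlf]
                    args.append(arg)
                deletes_key = (args[0].upper() in (b"DEL", b"UNLINK")
                               and key in args[1:])
                if not deletes_key:
                    fout.writelines(chunks)

Run it against a copy of the AOF while the server is stopped, then restart Redis pointing at the filtered file.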
-----> Another answer:
The AOF, as its name suggests, is a file that's appended to. It's not a database that Redis searches through and deletes the creation record when a deletion record is encountered. In my opinion, that would be too much work for too little gain.
As mentioned previously, a configuration that re-writes the AOF (see the BGREWRITEAOF command as one example) will erase any keys from the AOF that had been deleted, and now you can't recover those keys from the AOF file. The AOF is not the best medium for recovering deleted keys. It's intended as a way to recover the database as it existed before a crash - without any deleted keys.
If you want to be able to recover data after it was deleted, you need a different kind of backup - most likely a snapshot (RDB) file archived with the date/time it was saved. If you learn that you need to recover data, select the snapshot file from a time you know the key existed, load it into a separate Redis instance, and retrieve the key with RESTORE or GET or similar commands. As has been mentioned, it's possible to parse the RDB or AOF file contents to extract data from them without loading the file into a running Redis instance. The downside of this approach is that such tools are separate from the Redis code and may not always understand changes to the files' data format the way the Redis server does. You decide which approach gives you the speed and reliability you want.
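A sketch of that snapshot-based recovery flow with the redis-py client; the ports and key name are placeholders, and it assumes the archived RDB has already been loaded into a throwaway instance on port 6380:

    import redis

    recovery = redis.Redis(port=6380)    # throwaway instance running the old RDB
    production = redis.Redis(port=6379)  # live instance missing the key

    key = "some:deleted:key"             # placeholder key name
    payload = recovery.dump(key)         # DUMP: serialized, RDB-encoded value
    if payload is not None:
        # RESTORE recreates the key from the serialized payload;
        # ttl=0 means no expiry, replace=True overwrites an existing key.
        production.restore(key, 0, payload, replace=True)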
But its log will still be there in the AppendOnlyFile (correct me if I am wrong). Can we get that data back using this AppendOnlyFile?
NO, you CANNOT get the data back. When Redis evicts a key, it also appends a delete command to the AOF. After an AOF rewrite, anything about the evicted key will be removed.
is there any way we can get that removed data back into main memory? Can we store that data on disk and load it into main memory when required?
NO, you CANNOT do that. You have to use another durable data store (e.g. MySQL, MongoDB) to persist data to disk, and use Redis as a cache.
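That division of labor is the classic cache-aside pattern. A minimal sketch with redis-py, where load_user_from_db is a hypothetical stand-in for the MySQL/MongoDB query holding the authoritative copy:

    import json
    import redis

    r = redis.Redis()

    def load_user_from_db(user_id: int) -> dict:
        # Stand-in for a real query against the durable store.
        return {"id": user_id, "name": "example"}

    def get_user(user_id: int) -> dict:
        cached = r.get(f"user:{user_id}")
        if cached is not None:
            return json.loads(cached)        # cache hit
        user = load_user_from_db(user_id)    # miss: go to the durable store
        r.setex(f"user:{user_id}", 3600, json.dumps(user))  # cache for 1 hour
        return user

If Redis evicts the key, the next read simply repopulates it from the database, so nothing is lost.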

Are Redis' .rdb files' operations "blocking"? Can I copy the .rdb in the middle of a SAVE operation, for instance?

I run a Redis database in-house and want to make a "snapshot of the snapshot".
What the hell? Yes. I want to move the .rdb file once a day into an S3 bucket, as a scheduled operation (probably via a crontab entry).
So here comes my question: will I face trouble if the cron job starts running in the middle of a SAVE operation (from Redis to .rdb)? Losing some data is not a problem; I just want it to work without any obstruction.
Thanks!
When Redis writes out the RDB to disk, it writes to a temporary file. When the save process is done writing, it renames/moves the temporary file to "dump.rdb" (or whatever you've renamed it to). This is an atomic action, so you should be fine with the method you propose.
If you want more control over it, you could use a tool such as https://github.com/therealbill/redis-buagent, which connects as a slave and generates its own RDB, storing it in memory and then into S3 (or wherever else you want to store it, such as Cloud Files or a local file), or use redis-cli --rdb to generate a "local" RDB file for you to copy to S3.
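For the cron-driven variant, a minimal Python sketch that triggers a fresh snapshot and archives it once the save completes; the paths and connection details are assumptions for your setup:

    import shutil
    import time
    import redis

    RDB_PATH = "/var/lib/redis/dump.rdb"   # assumed `dir` + `dbfilename`
    ARCHIVE = "/backups/dump-{ts}.rdb"     # staging area to sync to S3

    r = redis.Redis()

    # Trigger a background save and wait for it to finish by polling
    # LASTSAVE, which only advances after the new RDB is renamed into place.
    # (BGSAVE errors out if a save is already running; handle as needed.)
    before = r.lastsave()
    r.bgsave()
    while r.lastsave() == before:
        time.sleep(1)

    # Thanks to the atomic rename, this copy sees either the old complete
    # snapshot or the new one - never a half-written file.
    shutil.copy2(RDB_PATH, ARCHIVE.format(ts=int(time.time())))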

RethinkDB backup data

I read the article about backing up data, but some issues are not clear to me:
What happens with data that is changed after the backup process has started?
Does the backup operation work only on the current machine, or will it collect data from all shards in the cluster? If it only works on the current one, should I start the backup process on all servers?
Is it a slow operation, such that I should forbid all operations on the db while the backup is in progress?
If a row changes while the backup is going on, the new value may or may not be in the backup. This is generally OK because RethinkDB only offers single-row atomicity anyway, but if you have a workload where that isn't OK then your other options are to use a filesystem that lets you snapshot the data on disk, or to add a new server to your cluster and set it as a replica of the table you want to back up.
It collects data from all shards.
It can take a very long time.
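For reference, RethinkDB ships a dump tool that does the all-shards collection for you; a hedged example of invoking it from Python (host, port, and file name are placeholders):

    import subprocess

    # `rethinkdb dump` connects to one node and exports every table's
    # shards into a single archive; -c/--connect and -f/--file are its flags.
    subprocess.run(
        ["rethinkdb", "dump", "-c", "localhost:28015", "-f", "backup.tar.gz"],
        check=True,
    )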

Some confusion about backing up the whole dataset in Redis

The documentation says:
Whenever Redis needs to dump the dataset to disk, this is what happens:
Redis forks. We now have a child and a parent process.
The child starts to write the dataset to a temporary RDB file.
When the child is done writing the new RDB file, it replaces the old one.
Because I want to back up the whole dataset, I typed the SHUTDOWN command in redis-cli, expecting Redis to shut down and save all data to dump.rdb. After it shut down completely, I went to the db location and saw that dump.rdb is 423.9 MB and temp-21331.rdb is 180.5 MB. The temp file still exists and is smaller than dump.rdb. Apparently, Redis did not use the temp file to replace dump.rdb.
I am wondering whether dump.rdb is the whole db file at this point, and whether it is safe to delete the temp file.
What does the file mod timestamp of temp-21331.rdb say? It sounds like a leftover from a crash.
You can delete it.
The documentation is definitely correct. When rewriting, all data is written to a temp file (compressed), and when complete, the dump.rdb file is replaced by this temp file. There should, however, be no leftovers during normal usage. What is important: you always need enough free disk space for this operation to succeed. A safe guideline is 140% of the Redis memory limit (it would be 200% if no compression were applied).
Hope this helps, TW
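To make that disk-space guideline concrete: with maxmemory set to 4gb you would want roughly 5.6 GB free in the RDB directory before a save. A minimal pre-flight check, with the path and limit as placeholder assumptions:

    import shutil

    MAXMEMORY_BYTES = 4 * 1024**3   # example: maxmemory 4gb in redis.conf
    RDB_DIR = "/var/lib/redis"      # assumed `dir` from redis.conf

    free = shutil.disk_usage(RDB_DIR).free
    needed = int(MAXMEMORY_BYTES * 1.4)  # the ~140% guideline from above
    if free < needed:
        raise SystemExit(f"only {free/1e9:.1f} GB free; want {needed/1e9:.1f} GB before BGSAVE")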