I have read the Redis docs looking for the answer to the question in the title, but I couldn't find it.
I want to know how RDB and AOF behave when data is migrated between cluster nodes.
Assume there are 2 nodes in the same cluster (node A and node B).
* Both the RDB and AOF options are on.
If some of A's data is migrated to B, does B apply AOF logging and snapshotting as soon as it receives the data?
If not, does the admin have to explicitly send a command (appendonly, BGSAVE) to save the changed dataset?
Thanks
The persistence of each node functions as expected regardless of the origin of the data (a client or a migration). If RDB snapshots are configured, node B will perform a dump that includes the new data once the save thresholds are reached. The AOF keeps being appended to continuously on node B, and any new data will be included in it.
Calling BGSAVE is always an option when you want to manually trigger a dump.
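As a concrete sketch, node B only needs its normal persistence settings; the host name and the values below are just examples, not anything specific to your cluster:

    # redis.conf on node B
    appendonly yes     # AOF logs every write, including keys that arrive via migration
    appendfsync everysec
    save 60 1000       # RDB: dump if at least 1000 keys changed within 60 seconds

    # or trigger a dump manually at any time:
    redis-cli -h node-b -p 6379 BGSAVE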
Related
Let's say the master and replica are in sync, and after some time the replica goes down and loses connectivity with the master.
When the replica comes up again, how does it know what partial data it needs to request?
Also, if by some logic the replica is able to ask for the partial data it needs, how does the master respond with just that partial data? My understanding is that the master sends an RDB file to the replica, so how can it send a partial RDB file?
https://redis.io/docs/management/replication/#how-redis-replication-works
Sending an RDB image is only used for a full sync.
For partial sync, the replicas keep track of their position in the replication log (which is initialized when they do a full sync, and then incremented every time they replicate a command). If the replica loses its connection and has to resync, it tells the master what its last valid sync offset was, and the master simply has to replay the portion of the replication log after that offset. For that purpose, it buffers the most recent log entries in memory. If the replica is too far behind (the log has accumulated more than repl-backlog-size bytes of transactions since the replica disconnected), then a partial sync isn't possible and the master forces it to do a full sync instead.
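Roughly, the handshake looks like this (the replication ID and offsets below are invented for illustration):

    # replica asks for a partial resync starting from its last offset
    PSYNC 8371f1a9e3c1... 105310
    # master reply if that offset is still in its backlog:
    +CONTINUE
    # master reply if it is not (a full resync with an RDB transfer follows):
    +FULLRESYNC 8371f1a9e3c1... 110422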
The replica keeps track of the offset up to which it has received data from the master (this offset is also persisted in its RDB file).
So when the replica loses the connection and comes up later, it knows from which offset to ask for data.
During the period the master loses the connection with the replica, a buffer on the Redis master keeps track of all recent write commands: this buffer is called the replication backlog.
Redis uses this backlog buffer to decide whether to start a full or a partial data resynchronization.
A replica always begins by asking for a partial resync (because it is more efficient than a full resync), using its last offset. The master checks whether the offset the replica is asking for is still retrievable from its backlog buffer.
If the offset is within the range of the backlog, all the write commands issued during the disconnection can be obtained from it, so a partial resynchronization is possible and the master approves and begins the partial resync.
On the other hand, if the connection was lost for so long that the requested offset has already been overwritten in the (circular) backlog, a partial resync is not possible and the master rejects the request and begins a complete resync.
The buffer size is controlled by repl-backlog-size, and its default is 1MB.
For a system with a high write rate, a 1MB backlog will be overwritten very quickly and will force a full resync even if the replica loses the connection for only a few seconds.
Another parameter, repl-backlog-ttl (default 1 hour), determines how long the master will keep the backlog in memory once all replicas are disconnected. So if your replica stays disconnected for more than 1 hour, the result is a complete resync even if the backlog holds only 100KB of data, because the master discards the buffer once that TTL expires.
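Both parameters can be raised in redis.conf if the defaults don't match your write rate; the values below are only illustrative:

    # keep roughly the last 64 MB of write traffic for partial resyncs
    repl-backlog-size 64mb
    # keep the backlog around for 2 hours after the last replica disconnects
    repl-backlog-ttl 7200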
If we enable the append-only file (appendonly yes) in redis.conf, every operation that changes the Redis database is logged in that file.
Now, suppose Redis has used all of the memory allocated to it by the maxmemory directive in redis.conf.
To store more data, it starts evicting keys according to one of the behaviours (volatile-lru, allkeys-lru, etc.) specified in redis.conf.
Suppose some data gets evicted from main memory, but its history is still there in the append-only file (correct me if I am wrong). Can we get that data back using this append-only file?
Put simply: is there any way to get that evicted data back into main memory? For example, can we store that data on disk and load it back into memory when required?
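For context, by those settings I mean something along these lines in redis.conf (the numbers are just examples):

    appendonly yes               # every write is logged to the AOF
    maxmemory 100mb              # memory budget
    maxmemory-policy allkeys-lru # evict least-recently-used keys when the budget is hit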
I got this answer from the Redis Google group; I'm sharing it here.
----->
Eviction of keys is recorded in the AOF as explicit DEL commands, so when the file is replayed, full consistency is maintained.
The AOF is used only to recover the dataset after a restart, and is not used by Redis for serving data. If the key still exists in it (with a subsequent eviction DEL), the only way to "recover" it is by manually editing the AOF to remove the respective deletion and restarting the server.
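To make that concrete, here is roughly what the relevant entries look like for a key that was written and later evicted (name and value invented; the real file stores the commands in the RESP wire format rather than this plain form):

    SET mykey hello     <- the original write, logged when the command ran
    ...                 <- other traffic
    DEL mykey           <- appended automatically when the key was evicted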
-----> Another answer for this
The AOF, as its name suggests, is a file that's appended to. It's not a database that Redis searches through and deletes the creation record when a deletion record is encountered. In my opinion, that would be too much work for too little gain.
As mentioned previously, a configuration that re-writes the AOF (see the BGREWRITEAOF command as one example) will erase any keys from the AOF that had been deleted, and now you can't recover those keys from the AOF file. The AOF is not the best medium for recovering deleted keys. It's intended as a way to recover the database as it existed before a crash - without any deleted keys.
If you want to be able to recover data after it was deleted, you need a different kind of backup. Most likely a snapshot (RDB) file that's been archived with the date/time at which it was saved. If you learn that you need to recover data, select the snapshot file from a time you know the key existed, load it into a separate Redis instance, and retrieve the key with GET (or DUMP/RESTORE to copy it into another instance) or similar commands. As has been mentioned, it's possible to parse the RDB or AOF file contents to extract data from them without loading the file into a running Redis instance. The downside to this approach is that such tools are separate from the Redis code and may not always understand changes to the files' data format the way the Redis server does. You decide which approach gives you the speed and reliability you want.
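A minimal sketch of that "archived snapshot" approach with redis-cli (the paths and file names are illustrative, not fixed by Redis):

    redis-cli BGSAVE          # start a background snapshot
    redis-cli LASTSAVE        # poll until this timestamp advances past the BGSAVE call
    # then archive the finished dump with a timestamp in its name
    cp /var/lib/redis/dump.rdb /backups/dump-$(date +%Y%m%d-%H%M%S).rdb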
But its history is still there in the append-only file (correct me if I am wrong). Can we get that data back using this append-only file?
NO, you CANNOT get the data back. When Redis evicts a key, it also appends a delete command to the AOF. After the AOF is rewritten, everything about the evicted key is gone.
Is there any way to get that evicted data back into main memory? For example, can we store that data on disk and load it back into memory when required?
NO, you CANNOT do that. You have to use another durable data store (e.g. MySQL, MongoDB) to keep the data on disk, and use Redis as a cache in front of it.
What will happen to my Redis data if no expiry is set and no DEL command is used?
Will it be removed after some default time?
One more thing:
How does Redis store data? Is it in some file format? I can access my data even after restarting the computer, so which files are created by Redis, and where?
Thanks.
Redis is an in-memory data store, meaning all your data is kept in RAM (i.e. it is volatile). So, theoretically, your data will live as long as you don't turn the power off.
However, it also provides persistence in two modes:
RDB mode, which takes snapshots of your dataset and saves them to disk in a file called dump.rdb. This is the default mode.
AOF mode, which records every write operation executed by the server in an append-only file and then replays it at startup, thus reconstructing the original data.
Redis persistence is explained very well here and here by the creator of Redis himself.
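You can also ask a running instance where these files actually live; the answers depend on your install (the values hinted at below are typical, not guaranteed):

    redis-cli CONFIG GET dir             # working directory, e.g. /var/lib/redis
    redis-cli CONFIG GET dbfilename      # usually dump.rdb
    redis-cli CONFIG GET appendonly      # yes / no
    redis-cli CONFIG GET appendfilename  # base name of the AOF, usually appendonly.aof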
Purpose:
I am trying to make backup copies of both dump.rdb (every X amount of time) and appendonly.aof (every Y amount of time), so that if the files get corrupted for whatever reason (or even just the appendonly.aof file), I can restore my data from the dump.rdb.backup snapshot and then replay whatever has changed since then from the most recent copy of appendonly.aof.backup I have.
Situation:
I back up dump.rdb every 5 minutes, and back up appendonly.aof every 1 second.
Problems:
1) Since dump.rdb is written in the background into a temporary file by a child process, what happens to the key changes that occur while the child process is creating the new image? I know the AOF file will keep appending regardless of the background write, but does the new dump.rdb file contain those key changes too?
2) If dump.rdb does NOT contain the key changes, is there some way to figure out the exact point at which the child process is forked? That way I can keep track of the point after which the AOF file has the most up-to-date information.
Thanks!
Usually, people use either RDB or AOF as a persistence strategy; having both of them is quite expensive. Running a dump every 5 minutes and copying the AOF file every second sounds awfully frequent. Unless the Redis instances only store a tiny amount of data, you will likely kill the I/O subsystem of your box.
Now, regarding your questions:
1) Semantics of the RDB mechanism
The dump mechanism exploits the copy-on-write mechanism implemented by modern OS kernels when processes are cloned/forked. When the fork is done, the system creates the background process simply by copying the page table; the pages themselves are shared between the two processes. If a write operation is done on a page by the Redis process, the OS transparently duplicates the page (so that the Redis instance has its own version and the background process keeps the previous version). The background process therefore has the guarantee that the memory structures it sees are kept constant (and therefore consistent).
The consequence is that any write operation started after the fork will not be included in the dump. The dump is a consistent snapshot taken at fork time.
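A quick way to convince yourself of this (the key name is invented, and the SET has to run while the background save is still in progress):

    redis-cli BGSAVE                      # the fork happens here
    redis-cli SET written_after_fork 1    # runs while the child is still writing dump.rdb
    # the finished dump.rdb reflects the dataset as of the fork, so
    # written_after_fork is absent from it (it is only in the AOF / the next dump)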
2) Keeping track of the fork point
You can estimate the fork timestamp by running the INFO persistence command and calculating rdb_last_save_time - rdb_last_bgsave_time_sec, but it is not very accurate (one-second resolution only).
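For example (the values are made up and the output is trimmed):

    redis-cli INFO persistence | grep rdb_last
    rdb_last_save_time:1712828363       <- unix time the last dump completed
    rdb_last_bgsave_status:ok
    rdb_last_bgsave_time_sec:3          <- how long that dump took, in seconds
    (estimated fork time: 1712828363 - 3 = 1712828360)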
To be a bit more accurate (millisecond resolution), you can parse the Redis log file and extract lines like the following:
[3813] 11 Apr 10:59:23.132 * Background saving started by pid 3816
You need at least the "notice" log level to see these lines.
As far as I know, there is no way to correlate a specific entry in the AOF to the fork operation of the RDB (i.e. it is not possible to be 100% accurate).
Is Redis a memory-only store like memcached, or does it write the data to disk? If it does write to disk, how often is the disk written to?
Redis persistence is described in detail here:
http://redis.io/topics/persistence
By default, Redis performs snapshotting:
By default Redis saves snapshots of the dataset on disk, in a binary file called dump.rdb. You can configure Redis to have it save the dataset every N seconds if there are at least M changes in the dataset, or you can manually call the SAVE or BGSAVE commands.
For example, this configuration will make Redis automatically dump the dataset to disk every 60 seconds if at least 1000 keys changed:

    save 60 1000
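For reference, the stock configuration file has (at least historically) shipped with several such thresholds at once; a snapshot is taken when any one of them is met:

    save 900 1      # after 15 min if at least 1 key changed
    save 300 10     # after 5 min if at least 10 keys changed
    save 60 10000   # after 1 min if at least 10000 keys changed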
Another good reference is this link to the author's blog, where he explains how Redis persistence works:
http://antirez.com/post/redis-persistence-demystified.html
Redis holds all data in memory. If the size of an application's data is too large for that, then Redis is not an appropriate solution.
However, Redis also offers two ways to make the data persistent:
1) Snapshots at predefined intervals, which may also depend on the number of changes. Any changes made between these intervals will be lost in a power failure or crash.
2) Writing a kind of change log on every data change. You can fine-tune how often this is physically written to disk, but if you choose to always write immediately (which will cost you some performance), then there will be no data loss caused by the in-memory nature of Redis (see the example below).
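That fine-tuning is done with the appendfsync directive in redis.conf; a sketch of the three choices:

    appendonly yes
    # appendfsync always   -> fsync after every write: no loss, slowest
    appendfsync everysec   # fsync once per second: lose at most ~1 second of writes
    # appendfsync no       -> let the OS decide when to flush: fastest, least safe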