When a replica connects to a Redis master, how does partial synchronization work?

Let's say the master and replica are in sync, and after some time the replica goes down and loses connectivity with the master.
When the replica comes up again, how does it know what partial data it needs to request?
And if the replica is somehow able to ask for the partial data it needs, how does the master respond with just that partial data? My understanding is that the master sends an RDB file to the replica; how can it send a partial RDB file?

https://redis.io/docs/management/replication/#how-redis-replication-works
Sending an RDB image is only used for a full sync.
For partial sync, the replicas keep track of their position in the replication log (which is initialized when they do a full sync, and then incremented every time they replicate a command). If the replica loses its connection and has to resync, it tells the master what its last valid sync offset was, and the master simply has to replay the portion of the replication log after that offset. For that purpose, it buffers the most recent log entries in memory. If the replica is too far behind (the log has accumulated more than repl-backlog-size bytes of transactions since the replica disconnected), then a partial sync isn't possible and the master forces it to do a full sync instead.

The replica maintains a replication offset recording how much of the master's replication stream it has received (the offset is initialized during the full sync that transfers the RDB file, and advances with every replicated command).
So when the replica loses its connection and comes up later, it knows from which offset to ask for data.
While the master has lost its connection with the slave, a buffer on the Redis master keeps track of all recent write commands: this buffer is called the replication backlog.
Redis uses this backlog buffer to decide whether to start a full or partial data resynchronization.
A replica always begins by asking for a partial resync (because it is cheaper than a full resync), sending its last offset. The master then checks whether the data from that offset onward is still retrievable from its backlog buffer.
If the offset is within the range of the backlog, all the write commands issued during the disconnection can be obtained from it, which means a partial resynchronization is possible: the master accepts and begins the partial resync.
On the other hand, if the connection was lost for a long time and the buffer wrapped around on the master side, a partial resync is not possible; the master rejects the request and begins a full resync.
The buffer size is controlled by repl-backlog-size, and its default is 1 MB.
For a system with high write volume, a 1 MB backlog will fill very quickly and will force a full resync even if the replica loses its connection for only a few seconds.
Another parameter, repl-backlog-ttl (default 1 hour), determines how long the master waits before releasing the backlog memory once all replicas have disconnected. So if your replica stays disconnected for more than 1 hour, you will get a full resync even if the buffer holds only 100 KB of data, because the master discards the backlog once the TTL expires.
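Both parameters can be tuned in redis.conf (or at runtime with CONFIG SET). The values below are only illustrative, not recommendations; size the backlog to cover your write rate over the longest disconnection you want to survive:

```
# Keep roughly 64 MB of the recent replication stream for partial resyncs.
repl-backlog-size 64mb

# Keep the backlog for 3600 s after the last replica disconnects;
# 0 means never release it.
repl-backlog-ttl 3600
```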

Related

Could you please explain Replication feature of Redis

I am very new to Redis cache implementation.
Could you please let me know what the replication factor means?
How does it work, and what is its impact?
Thanks.
At the base of Redis replication (excluding the high availability features provided as an additional layer by Redis Cluster or Redis Sentinel) there is a very simple to use and configure leader follower (master-slave) replication: it allows replica Redis instances to be exact copies of master instances. The replica will automatically reconnect to the master every time the link breaks, and will attempt to be an exact copy of it regardless of what happens to the master.
This system works using three main mechanisms:
When master and replica instances are well connected, the master keeps the replica updated by sending it a stream of commands, in order to replicate the effects on the dataset happening on the master side due to: client writes, keys expired or evicted, and any other action changing the master dataset.
When the link between the master and the replica breaks, for network issues or because a timeout is sensed in the master or the replica, the replica reconnects and attempts to proceed with a partial resynchronization: it means that it will try to just obtain the part of the stream of commands it missed during the disconnection.
When a partial resynchronization is not possible, the replica will ask for a full resynchronization. This will involve a more complex process in which the master needs to create a snapshot of all its data, send it to the replica, and then continue sending the stream of commands as the dataset changes.
Redis uses by default asynchronous replication, which being low latency and high performance, is the natural replication mode for the vast majority of Redis use cases.
Synchronous replication of certain data can be requested by the clients using the WAIT command. However WAIT is only able to ensure that there are the specified number of acknowledged copies in the other Redis instances, it does not turn a set of Redis instances into a CP system with strong consistency: acknowledged writes can still be lost during a failover, depending on the exact configuration of the Redis persistence. However with WAIT the probability of losing a write after a failure event is greatly reduced to certain hard to trigger failure modes.
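As a sketch of how WAIT is used from redis-cli (assuming one healthy connected replica; the reply is the number of replicas that acknowledged the write within the timeout, so it can be lower if replicas are slow or disconnected):

```
127.0.0.1:6379> SET key val
OK
127.0.0.1:6379> WAIT 1 100
(integer) 1
```

Here WAIT 1 100 blocks until at least 1 replica has acknowledged the preceding write, or until 100 ms have elapsed.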

Behaviour of redis client-output-buffer-limit during resynchronization

I'm assuming that during replica resynchronisation (full or partial), the master will attempt to send data as fast as possible to the replica. Wouldn't this mean the replica output buffer on the master would rapidly fill up since the speed the master can write is likely to be faster than the throughput of the network? If I have client-output-buffer-limit set for replicas, wouldn't the master end up closing the connection before the resynchronisation can complete?
Yes, the Redis master will close the connection and the synchronization will start again from the beginning. But please find some details below:
Do you need to touch this configuration parameter and what is the purpose/benefit/cost of it?
With the default configuration and reasonably modern hardware, there is almost zero chance this will happen.
"By default normal clients are not limited because they don't receive data without asking (in a push way), but just after a request, so only asynchronous clients may create a scenario where data is requested faster than it can read." (quoted from the documentation)
Even if that happens, replication will restart from the beginning, but it may lead to an infinite loop in which slaves continuously ask for synchronization over and over. Each time, the Redis master needs to fork to take a whole memory snapshot (perform BGSAVE) and may use up to 3 times the RAM of the initial snapshot size during synchronization. That causes higher CPU utilization, memory spikes, network utilization, and IO.
General recommendations to avoid production issues tweaking this configuration parameter:
Don't decrease this buffer, and before increasing its size make sure you have enough memory on your box.
As a total RAM budget, consider the snapshot memory size (doubled for the copy-on-write BGSAVE process), plus the size of any other configured buffers, plus some extra capacity.
Please find more details here
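For reference, the replica class of this limit is configured in redis.conf as a hard limit, a soft limit, and a soft-limit window in seconds; the line below shows the stock default (older versions spell the class `slave`). Any increase should be backed by the RAM headroom described above:

```
# class    <hard limit> <soft limit> <soft seconds>
client-output-buffer-limit replica 256mb 64mb 60
```

With these defaults the master closes the replica connection if its output buffer exceeds 256 MB outright, or stays above 64 MB for 60 seconds.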

What happens to data before new master is elected in Redis?

In a Redis master-slave architecture, when the master fails a slave is promoted to master. As only the master can perform write operations, what happens to data in the window before the slave is promoted to master? Does my system remain unresponsive?
Define "data":)
Client connections to the master will be closed upon its failure, so your system will be notified of that. Any data that was not written to the master and the replicas before the failure will therefore still reside in your application/system.
Once your system tries using a replica it will be able to read the data in it up to the point it was synchronized before failure. Once the replica is promoted to masterhood, your system will be able to continue writing data.
Note that Redis' replication is asynchronous. That means slaves may lag behind the master and can therefore lose some updates in case of failure. Refer to the WAIT command for more information about ensuring consistency.

how does "Disk-backed" replication work in redis cluster

the redis.conf says:
1) Disk-backed: The Redis master creates a new process that writes the RDB
file on disk. Later the file is transferred by the parent
process to the slaves incrementally
I just don't know what "transferred by the parent process to the slaves" means.
Thank you.
It is simple: the parent process reads the RDB file into a buffer and writes it over the socket to the slave, which is listening for it.
The actual implementation is more complex than that, but this is essentially what Redis does. You can refer to replication.c in redis/src for more details.
EDITED:
Yes, with the diskless mechanism the child process directly sends the RDB over the wire to the slaves, without using the disk as intermediate storage.
There is a difference, though: if you use the disk to save the RDB, the Redis master can serve many slaves at the same time without queuing. With diskless replication, once a transfer to one slave has started, another slave that arrives and wants a full sync has to queue and wait for the first transfer to finish. That is why there is another setting, repl-diskless-sync-delay, which makes the master wait so that more slaves can be served in parallel.
Also, both methods only come into play when a full sync is needed. In the normal case, the master and slave replicate over a healthy connection: the master streams commands to the slave so the two stay identical. If the wire breaks or the slave goes down, the slave attempts a partial resync to obtain only the part it missed. If the partial resync (PSYNC) is not possible, it falls back to a full resync, which is what we are discussing here.
This is how a full synchronization works in more details:
The master starts a background saving process in order to produce an RDB file. At the same time it starts to buffer all new write commands received from the clients. When the background saving is complete, the master transfers the database file to the slave, which saves it on disk, and then loads it into memory. The master will then send all buffered commands to the slave. This is done as a stream of commands and is in the same format of the Redis protocol itself.
Diskless replication is just a newer feature that performs the full resync without the slow-disk bottleneck. For more about it, such as how PSYNC works and why it can fail, refer to https://redis.io/topics/replication.
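The two settings mentioned above live in redis.conf; a minimal fragment enabling diskless full syncs might look like this (the delay value is illustrative):

```
# Stream the RDB directly from the child process over the socket,
# instead of writing it to disk first.
repl-diskless-sync yes

# Wait 5 s before starting the transfer, so replicas that arrive
# in that window can share the same RDB stream in parallel.
repl-diskless-sync-delay 5
```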

Cassandra Commit and Recovery on a Single Node

I am a newbie to Cassandra - I have been searching for information related to commits and crash recovery in Cassandra on a single node. And, hoping someone can clarify the details.
I am testing Cassandra - so, set it up on a single node. I am using stresstool on datastax to insert millions of rows. What happens if there is an electrical failure or system shutdown? Will all the data that was in Cassandra's memory get written to disk upon Cassandra restart (I guess commitlog acts as intermediary)? How long is this process?
Thanks!
Cassandra's commit log gives Cassandra durable writes. When you write to Cassandra, the write is appended to the commit log before the write is acknowledged to the client. This means every write that the client receives a successful response for is guaranteed to be written to the commit log. The write is also made to the current memtable, which will eventually be written to disk as an SSTable when large enough. This could be a long time after the write is made.
However, the commit log is not immediately synced to disk for performance reasons. The default is periodic mode (set by the commitlog_sync param in cassandra.yaml) with a period of 10 seconds (set by commitlog_sync_period_in_ms in cassandra.yaml). This means the commit log is synced to disk every 10 seconds. With this behaviour you could lose up to 10 seconds of writes if the server loses power. If you had multiple nodes in your cluster and used a replication factor of greater than one you would need to lose power to multiple nodes within 10 seconds to lose any data.
If this risk window isn't acceptable, you can use batch mode for the commit log. This mode won't acknowledge writes to the client until the commit log has been synced to disk. The time window is set by commitlog_sync_batch_window_in_ms, default is 50 ms. This will significantly increase your write latency and probably decrease the throughput as well so only use this if the cost of losing a few acknowledged writes is high. It is especially important to store your commit log on a separate drive when using this mode.
In the event that your server loses power, on startup Cassandra replays the commit log to rebuild its memtable. This process will take seconds (possibly minutes) on very write heavy servers.
If you want to ensure that the data in the memtables is written to disk you can run 'nodetool flush' (this operates per node). This will create a new SSTable and delete the commit logs referring to data in the memtables flushed.
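The commit-log settings discussed above sit in cassandra.yaml; this fragment shows the periodic default, with the stricter batch alternative commented out:

```
# cassandra.yaml -- default: fsync the commit log every 10 seconds;
# up to 10 s of acknowledged writes can be lost on power failure.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

# Stricter alternative: don't acknowledge a write until the commit log
# is synced to disk (higher write latency, lower throughput).
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 50
```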
You are asking something like:
What happens if there is a network failure while data is being loaded into Oracle using SQL*Loader?
Or what happens if Sqoop stops processing due to some condition while transferring data?
Simply put: whatever data was transferred before the electrical failure or system shutdown remains as it was.
Coming to your second question: whenever the memtable runs out of space, i.e. when the number of keys exceeds a certain limit (128 by default) or a time threshold is reached (cluster clock), it is flushed to an SSTable, which is immutable.