I am new to Redis. I read their documentation on Sentinel and Replication, which explains that the replicas try to stay in sync with the master as much as possible, but a write acknowledged by the master can still be lost if the master fails before the write reaches a replica. If Sentinel then promotes such a replica to master, the new master may serve stale data.
If I cannot afford to lose consistency and prefer it over availability, how can I turn off replication so that when Sentinel promotes a replica to master, all the first requests are cache misses and my cache can slowly warm up, instead of returning potentially stale data?
Also, is that a good idea? Are there other good alternatives?
I cannot afford to lose consistency and prefer it over availability
It's not clear that Redis automated failover is appropriate for your application. It looks like each client would need to carefully keep track of server availability.
Suppose you have a few clients, a master M1, and three replicas R2, R3, R4. Client C5 writes a new bank account balance to M1, which immediately and permanently fails, and R2 is promoted to become master M2. The master did not obtain an acknowledgment from a replica before replying to the client; no Paxos-like consensus protocol happens between servers before the reply is sent to C5.
C5 could remember counters / timestamps embedded in each write request, forget the write payload, and detect stale reads. But client C6 could not, unless you supply such data quickly and reliably outside the protocol. Nathan Fritz observes that your app could issue a write followed by a PUBLISH event, and monitor multiple replicas with a SUBSCRIBE for that event, delaying its report of success to the end user. Consider incorporating Derecho into your app if the solid promises of virtual synchrony are necessary. Production releases of Redis are targeted at a different part of the problem space than your primary interest.
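As a rough illustration of that PUBLISH/SUBSCRIBE pattern, here is a sketch using StackExchange.Redis (the client that appears later on this page). The host names and channel are made up, and the pattern assumes that PUBLISH traffic is forwarded to replicas through the replication stream, in order with the writes that preceded it:

    using System;
    using System.Threading;
    using StackExchange.Redis;

    // Hypothetical endpoints: one master, one replica we want confirmation from.
    var master  = ConnectionMultiplexer.Connect("master-host:6379");
    var replica = ConnectionMultiplexer.Connect("replica-host:6379");

    var seenOnReplica = new ManualResetEventSlim();
    string marker = Guid.NewGuid().ToString();

    // Subscribe via the replica; since PUBLISH travels down the replication
    // link in order, receiving the marker here suggests the replica has also
    // applied the write that preceded it on the master.
    replica.GetSubscriber().Subscribe("write-markers", (_, msg) =>
    {
        if (msg == marker) seenOnReplica.Set();
    });

    // Perform the write on the master, then publish a marker event for it.
    master.GetDatabase().StringSet("account:42:balance", 100);
    master.GetSubscriber().Publish("write-markers", marker);

    // Delay reporting success to the end user until the replica caught up.
    bool replicated = seenOnReplica.Wait(TimeSpan.FromSeconds(1));
    Console.WriteLine(replicated ? "confirmed on replica" : "timed out");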
I need to load static data into Redis once, on the master node, and I can only start reading once synchronization has finished on all slaves. This is because we will have a lot of reads and very few writes, and the data will not change for a long time.
I read the official documentation on consistency and durability (https://docs.redis.com/latest/rs/concepts/data-access/consistency-durability/) and the Redis Cluster consistency guarantees in https://redis.io/topics/cluster-tutorial.
I also read Can the WAIT command provide strong consistency in Redis? but without reaching a conclusion.
If I use synchronous replication and the WAIT command to check that replication was successful, do I have any guarantees about consistency?
By default, a Redis Cluster is not able to guarantee strong consistency. It means that under certain conditions it is possible that Redis Cluster will lose writes that were acknowledged by the system to the client.
The reason Redis Cluster can lose writes is that it uses asynchronous replication. You can, however, improve consistency by forcing the database to flush data to disk before replying to the client, but this usually results in prohibitively low performance; that would be the equivalent of synchronous replication in the case of Redis Cluster. Basically, there is a trade-off to be made between performance and consistency.
Redis Cluster has support for synchronous writes when absolutely needed, implemented via the WAIT command. This makes losing writes a lot less likely. However, note that Redis Cluster does not implement strong consistency even when synchronous replication is used: it is always possible, under more complex failure scenarios, that a replica that was not able to receive the write will be elected as master.
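For example, the pattern with a client looks roughly like this (a sketch; WAIT takes the number of replicas to wait for and a timeout in milliseconds, and a result smaller than the requested count means the acknowledgment did not arrive in time):

    using System;
    using StackExchange.Redis;

    var redis = ConnectionMultiplexer.Connect("server1:6379");
    var db = redis.GetDatabase();

    db.StringSet("key", "value");

    // WAIT numreplicas timeout-ms: block until at least 1 replica has
    // acknowledged the preceding writes, or 1000 ms have elapsed.
    long acked = (long)db.Execute("WAIT", 1, 1000);
    if (acked < 1)
        Console.WriteLine("write not yet on any replica; treat as unsafe");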
There is another notable scenario where Redis Cluster will lose writes, that happens during a network partition where a client is isolated with a minority of instances including at least a master.
For example, imagine a 6-node cluster composed of A, B, C, A1, B1, C1, with 3 masters and 3 replicas. There is also a client, let's call it Z1.
After a partition occurs, it is possible that in one side of the partition we have A, C, A1, B1, C1, and in the other side we have B and Z1.
Z1 is still able to write to B, which will accept its writes. If the partition heals in a very short time, the cluster will continue normally. However, if the partition lasts enough time for B1 to be promoted to master on the majority side of the partition, the writes that Z1 has sent to B in the meantime will be lost.
Note that there is a maximum window to the amount of writes Z1 will be able to send to B: if enough time has elapsed for the majority side of the partition to elect a replica as master, every master node in the minority side will have stopped accepting writes.
This amount of time is a very important configuration directive of Redis Cluster, and is called the node timeout.
After node timeout has elapsed, a master node is considered to be failing, and can be replaced by one of its replicas. Similarly, if a master node has been unable to sense the majority of the other master nodes for longer than node timeout, it enters an error state and stops accepting writes.
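If you want to inspect or tune this directive, it is the cluster-node-timeout setting, in milliseconds. A sketch, assuming StackExchange.Redis and a reachable cluster node:

    using System;
    using System.Linq;
    using StackExchange.Redis;

    // allowAdmin is required for CONFIG SET through this library.
    var redis = ConnectionMultiplexer.Connect("server1:6379,allowAdmin=true");
    var server = redis.GetServer("server1", 6379);

    // Read the current node timeout (in milliseconds)...
    var current = server.ConfigGet("cluster-node-timeout").First();
    Console.WriteLine($"{current.Key} = {current.Value}");

    // ...and raise it to 15 seconds (also update redis.conf for restarts).
    server.ConfigSet("cluster-node-timeout", "15000");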
I am very new to Redis cache implementation.
Could you please let me know what the replication factor means?
How does it work, and what is its impact?
Thanks.
At the base of Redis replication (excluding the high availability features provided as an additional layer by Redis Cluster or Redis Sentinel) there is a very simple to use and configure leader follower (master-slave) replication: it allows replica Redis instances to be exact copies of master instances. The replica will automatically reconnect to the master every time the link breaks, and will attempt to be an exact copy of it regardless of what happens to the master.
This system works using three main mechanisms:
When a master and a replica instance are well connected, the master keeps the replica updated by sending a stream of commands to the replica, in order to replicate the effects on the dataset happening on the master side due to client writes, key expirations or evictions, and any other action that changes the master dataset.
When the link between the master and the replica breaks, for network issues or because a timeout is sensed in the master or the replica, the replica reconnects and attempts to proceed with a partial resynchronization: it means that it will try to just obtain the part of the stream of commands it missed during the disconnection.
When a partial resynchronization is not possible, the replica will ask for a full resynchronization. This will involve a more complex process in which the master needs to create a snapshot of all its data, send it to the replica, and then continue sending the stream of commands as the dataset changes.
Redis uses asynchronous replication by default, which, being low-latency and high-performance, is the natural replication mode for the vast majority of Redis use cases.
Synchronous replication of certain data can be requested by the clients using the WAIT command. However WAIT is only able to ensure that there are the specified number of acknowledged copies in the other Redis instances, it does not turn a set of Redis instances into a CP system with strong consistency: acknowledged writes can still be lost during a failover, depending on the exact configuration of the Redis persistence. However with WAIT the probability of losing a write after a failure event is greatly reduced to certain hard to trigger failure modes.
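To see that asynchrony in practice, you can compare the master's replication offset with each replica's offset as reported by INFO replication. A sketch using StackExchange.Redis (the field names are those reported by recent Redis versions):

    using System;
    using StackExchange.Redis;

    var redis = ConnectionMultiplexer.Connect("server1:6379");
    var server = redis.GetServer("server1", 6379);

    // On the master, INFO replication lists master_repl_offset plus one
    // slaveN entry per replica, whose "offset" field shows how far it is.
    foreach (var group in server.Info("replication"))
        foreach (var entry in group)
            if (entry.Key == "master_repl_offset" || entry.Key.StartsWith("slave"))
                Console.WriteLine($"{entry.Key}: {entry.Value}");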
In a Redis master-slave architecture, when the master fails, a slave is promoted to master. Since only the master can perform write operations, what happens to data during the window in which the slave is being promoted to master? Does my system remain unresponsive?
Define "data" :)
Client connections to the master will be closed upon its failure, so your system will be notified of that. Any data that was not written to the master and the replicas before the failure will therefore still reside in your application/system.
Once your system tries using a replica it will be able to read the data in it up to the point it was synchronized before failure. Once the replica is promoted to masterhood, your system will be able to continue writing data.
Note that Redis' replication is asynchronous. That means that slaves may lag behind the master and can therefore lose some updates in case of failure. Refer to the WAIT command for more information about ensuring consistency.
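As a sketch of how a client might ride out that promotion window, assuming StackExchange.Redis with abortConnect=false (so the multiplexer keeps retrying the connection and re-discovers the promoted master):

    using System;
    using System.Threading;
    using StackExchange.Redis;

    var redis = ConnectionMultiplexer.Connect(
        "server1:6379,server2:6379,abortConnect=false");
    var db = redis.GetDatabase();

    // Writes fail while the promotion is in progress; retry with a short
    // backoff until the multiplexer has found the new master.
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            db.StringSet("key", "value");
            break; // write accepted by the (possibly new) master
        }
        catch (RedisConnectionException) when (attempt < 5)
        {
            Thread.Sleep(TimeSpan.FromSeconds(1 << attempt)); // backoff
        }
    }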
As explained in the StackExchange.Redis Basics documentation, you can connect to multiple Redis servers, and StackExchange.Redis will automatically determine the master/slave setup. Quoting the relevant part:
A more complicated scenario might involve a master/slave setup; for this usage, simply specify all the desired nodes that make up that logical redis tier (it will automatically identify the master):
    ConnectionMultiplexer redis = ConnectionMultiplexer.Connect("server1:6379,server2:6379");
I performed a test in which I triggered a failover, such that the master would go down for a bit, causing the old slave to become the new master, and the old master to become the new slave. I noticed that in spite of this change, StackExchange.Redis keeps sending commands to the old master, causing write operations to fail.
Questions on the above:
How does StackExchange.Redis decide which endpoint to use?
How should multiple endpoints (as in the above example) be used?
I also noticed that for each connect, StackExchange.Redis opens two physical connections, one of which is some sort of subscription. What is this used for exactly? Is it used by Sentinel instances?
What should happen there is that it uses a number of things (in particular the defined replication configuration) to determine which is the master, and direct traffic at the appropriate server (respecting the "server" parameter, which defaults to "prefer master", but which always sends write operations to a master).
If a "cannot write to a readonly slave" (I can't remember the exact text) error is received, it will try to re-establish the configuration, and should switch automatically to respect this. Unfortunately, redis does not broadcast configuration changes, so the library can't detect this ahead of time.
Note that if you use the library methods to change master, it can exploit pub/sub to detect that change immediately and automatically.
Re the second connection: that would be for pub/sub; it spins this up ahead of time, as by default it attempts to listen for the library-specific configuration broadcasts.
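To make the routing concrete, a small sketch (flag names are per StackExchange.Redis 2.x; older versions call it PreferSlave instead of PreferReplica):

    using StackExchange.Redis;

    var redis = ConnectionMultiplexer.Connect("server1:6379,server2:6379");
    var db = redis.GetDatabase();

    // Writes are always routed to whichever node the library identified
    // as the master.
    db.StringSet("key", "value");

    // Reads default to "prefer master", but can be steered to a replica.
    RedisValue fromMaster  = db.StringGet("key");
    RedisValue fromReplica = db.StringGet("key", CommandFlags.PreferReplica);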
I am designing a replication algorithm, to promote a master among many slaves. I want it to be faster and simpler than Paxos. The basic idea is:
Assign each node a 'Promotion Priority'; for example, for 5 nodes the priorities would be 50, 40, 30, 20 and 10, with 50 the highest and 10 the lowest.
When a master needs to be elected, all slaves will send (at the same time) the other 4 nodes a message requesting to become master, but only the node that is confirmed by all slaves with a confirmation message will be elected. A slave will send a confirmation message if its own 'Promotion Priority' is lower than the asking node's, or if the asking node with a higher priority times out before issuing a rejection message for its own request.
If a slave receives a rejection message from a slave with a higher 'Promotion Priority', it will abort the procedure.
There should be no nodes with the same priority.
There will be a minimum number of confirmation messages that a slave should collect in order to become a master.
This algorithm should be faster because all the slaves will be electing a master in parallel and the priority will help to speed up the process.
What do you think about it? Does any other algorithm for master promotion with priority exist?
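To make this concrete, here is a minimal single-process sketch of the confirmation rule (the names are hypothetical; a real implementation would exchange these messages over the network and handle the timeout case):

    using System;
    using System.Linq;

    record Node(string Id, int Priority)
    {
        // A node confirms a candidate only if the candidate outranks it; the
        // full algorithm would also confirm once every higher-priority
        // node's rejection has timed out.
        public bool Confirms(Node candidate) => candidate.Priority > Priority;
    }

    class Election
    {
        static void Main()
        {
            var nodes = new[] { 50, 40, 30, 20, 10 }
                .Select(p => new Node($"node-{p}", p)).ToArray();

            foreach (var candidate in nodes)
            {
                // Count confirmations from all other nodes; require all of
                // them (or a configured minimum) to win the election.
                int confirmations = nodes.Where(n => n != candidate)
                                         .Count(n => n.Confirms(candidate));
                if (confirmations == nodes.Length - 1)
                    Console.WriteLine($"{candidate.Id} is promoted to master");
            }
        }
    }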
What do you think about it?
It is hard to completely assess the validity of your algorithm without knowing the details of your requirements. Overall, it looks like a valid approach, but there are a few issues that I think deserve some attention.
Your question has some similarities to A distributed algorithm to assign a shared resource to one node out of many. Consequently, some of the arguments raised in my answer to that question hold for this question as well.
When a master needs to be elected, all slaves will send (at the same time) the other 4 nodes a message requesting to become master, but only the node that is confirmed by all slaves with a confirmation message will be elected.
This approach assumes that all slaves know how many slaves are present at any time -- otherwise the would-be master can never conclude that it has received a confirmation from all slaves. Implicitly, this means that no slave can leave or join the system without breaking the algorithm.
In practice though, these slaves will come and go, because of crashes, reboots, network outages etc. The chances of this increase with the number of slaves, but whether or not this is a problem depends on your requirements. How fault tolerant does your system have to be?
By the way, since you mention that there are many slaves, I assume that you are using multicast or broadcast to send the request messages. Otherwise, depending on what 'many' means to you, your set-up could be error-prone with regard to administering where all the slaves reside.
A slave will send a confirmation message if its own 'Promotion Priority' is lower than the asking node's, or if the asking node with a higher priority times out before issuing a rejection message for its own request.
Similar to the previous remark: a slave might draw the wrong conclusion if some slave has a problem responding for whatever reason. In fact, if one slave is down or has a network problem, all other slaves will draw the same (most likely erroneous) conclusion that the non-responsive slave is the master.
This algorithm should be faster because all the slaves will be electing a master in parallel
The issues raised in this answer are almost inherent to doing the master selection in a distributed fashion though, and hard to resolve without introducing some kind of centralized decision maker. You gain some, you lose some...
Does any other algorithm for master promotion with priority exist?
Another approach would be to have all slaves in the system constantly maintain administration about who the current master is. This could be done (at the cost of some network bandwidth) by having every slave multicast/broadcast its priority periodically, via some sort of heartbeat message. As a result, every slave will be aware of every other slave, and at the moment a master needs to be selected, every slave can determine it instantly. Network issues or other "system health" problems will be detected because heartbeats are missed. This algorithm is flexible with regard to slaves joining and leaving the system; the higher the heartbeat frequency, the more responsive your system will be to topology changes.
However, you might still run into issues of slaves drawing independent conclusions because of a disconnected network. If that is a problem, then you might not be able to solve this in a completely parallel fashion.
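A sketch of that heartbeat bookkeeping, under the same assumptions (each node periodically broadcasts its id and priority; any peer not heard from within the timeout is considered dead; names and timings are illustrative):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class PeerTable
    {
        readonly Dictionary<string, (int Priority, DateTime LastSeen)> peers = new();
        readonly TimeSpan heartbeatTimeout = TimeSpan.FromSeconds(3);

        // Called for every heartbeat received off the wire.
        public void OnHeartbeat(string nodeId, int priority) =>
            peers[nodeId] = (priority, DateTime.UtcNow);

        // Every node evaluates this locally and instantly: the master is
        // simply the highest-priority peer heard from within the timeout.
        public string CurrentMaster() =>
            peers.Where(p => p.Value.LastSeen >= DateTime.UtcNow - heartbeatTimeout)
                 .OrderByDescending(p => p.Value.Priority)
                 .Select(p => p.Key)
                 .FirstOrDefault();
    }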