What will happen if a shard fails in Redis Cluster?

We have a Redis cluster with 3 shards, each with a replica node. Suppose a lock is acquired in a shard and, while the thread is holding the lock, both the master and the replica node of that shard go down.
Will the cluster wait until the shard comes back up and accept no new locks until then, OR will it keep running with 2 shards and create a new lock in a different shard?

That depends on the retryAttempts and retryInterval settings. They should be large enough for the client to keep retrying through the failover.
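retryAttempts and retryInterval are settings of the Redisson client, so assuming the lock in question is a Redisson RLock, a minimal sketch of a cluster configuration might look like this (the node address and values are placeholders, not recommendations):

import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class LockExample {
    public static void main(String[] args) {
        Config config = new Config();
        config.useClusterServers()
              // placeholder node address; use your cluster's endpoints
              .addNodeAddress("redis://127.0.0.1:7000")
              // keep retrying long enough to ride out a shard failover
              .setRetryAttempts(10)
              .setRetryInterval(3000); // milliseconds between attempts

        RedissonClient redisson = Redisson.create(config);
        RLock lock = redisson.getLock("myLock"); // illustrative lock name
        lock.lock();
        try {
            // critical section
        } finally {
            lock.unlock();
        }
        redisson.shutdown();
    }
}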

Related

Could you please explain Replication feature of Redis

I am very new to Redis cache implementation.
Could you please let me know what the replication factor means?
How does it work, and what is its impact?
Thanks.
At the base of Redis replication (excluding the high availability features provided as an additional layer by Redis Cluster or Redis Sentinel) there is a very simple to use and configure leader follower (master-slave) replication: it allows replica Redis instances to be exact copies of master instances. The replica will automatically reconnect to the master every time the link breaks, and will attempt to be an exact copy of it regardless of what happens to the master.
This system works using three main mechanisms:
When a master and a replica instance are well connected, the master keeps the replica updated by sending it a stream of commands, in order to replicate the effects on the dataset happening on the master side due to client writes, keys expired or evicted, and any other action changing the master dataset.
When the link between the master and the replica breaks, for network issues or because a timeout is sensed in the master or the replica, the replica reconnects and attempts to proceed with a partial resynchronization: it means that it will try to just obtain the part of the stream of commands it missed during the disconnection.
When a partial resynchronization is not possible, the replica will ask for a full resynchronization. This will involve a more complex process in which the master needs to create a snapshot of all its data, send it to the replica, and then continue sending the stream of commands as the dataset changes.
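As an illustration only (not part of the original answer), the master-replica link described above can also be established at runtime; a hedged sketch with the Jedis client, using placeholder hosts and ports:

import redis.clients.jedis.Jedis;

public class ReplicaSetup {
    public static void main(String[] args) {
        // Placeholder addresses; point these at your own instances.
        try (Jedis replica = new Jedis("127.0.0.1", 6380)) {
            // Make this instance an exact copy of the master at 127.0.0.1:6379.
            replica.slaveof("127.0.0.1", 6379);
            // From now on the replica reconnects and resynchronizes on its own
            // whenever the link to the master breaks.
        }
    }
}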
By default, Redis uses asynchronous replication, which, being low latency and high performance, is the natural replication mode for the vast majority of Redis use cases.
Synchronous replication of certain data can be requested by clients using the WAIT command. However, WAIT can only ensure that the specified number of acknowledged copies exists in the other Redis instances; it does not turn a set of Redis instances into a CP system with strong consistency: acknowledged writes can still be lost during a failover, depending on the exact configuration of Redis persistence. With WAIT, however, the probability of losing a write after a failure event is greatly reduced to certain hard-to-trigger failure modes.
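To make the WAIT behaviour concrete, here is a hedged sketch (host, key and timeout are invented for the example) of requesting replica acknowledgement with the Jedis client:

import redis.clients.jedis.Jedis;

public class WaitExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) { // placeholder host/port
            jedis.set("balance", "100");
            // Block until at least 1 replica acknowledges the write,
            // or until the 100 ms timeout expires.
            long acked = jedis.waitReplicas(1, 100);
            if (acked < 1) {
                // The write reached the master but was not confirmed by a replica;
                // it could still be lost in a failover.
                System.out.println("Write not yet replicated");
            }
        }
    }
}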

Is it possible to make redis strongly consistent?

The Redis cluster tutorial (https://redis.io/topics/cluster-tutorial) states that Redis Cluster is not strongly consistent. The reasons it gives, even if WAIT is used, are:
The node to which the update wasn't synced becomes master.
After a partition, and before the node timeout, the master in the minority partition keeps accepting updates.
What if, for a key k, we find the master node M and the replicas r1, r2, ..., rn using CLUSTER SLAVES node-id, execute WAIT N, and only proceed with the transaction if it returns N? Wouldn't that always ensure that the data is perfectly synced before executing the transaction? Wouldn't that ensure strong consistency?
NO, it still CANNOT guarantee that.
Although WAIT returning N means that all replicas have acknowledged the writes in memory, those nodes might fail before the write operations are persisted to disk.
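For illustration, a hedged sketch of the proposed pattern with the Jedis client (the helper name and parameters are made up for the example); even when the check passes, the write has only been acknowledged in memory and can still be lost:

import redis.clients.jedis.Jedis;

public class WaitAllReplicas {
    // Hypothetical helper: writes a key and only reports success if all
    // N replicas of the key's master acknowledge it within the timeout.
    static boolean writeWithFullAck(Jedis master, String key, String value,
                                    int replicaCount, long timeoutMs) {
        master.set(key, value);
        long acked = master.waitReplicas(replicaCount, timeoutMs);
        // Even if acked == replicaCount, the replicas have only acknowledged
        // the write in memory; a crash before it is persisted, or a failover
        // to a node that later loses it, can still discard the write, so this
        // does not provide strong consistency.
        return acked >= replicaCount;
    }
}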

Migration of data from slave to master in Redis Cluster

I'm currently exploring Redis Cluster. I've started 6 instances on 3 physical servers (3 masters and 3 slaves) with persistence enabled.
I've noticed that when I kill one of the master instances, its slave is promoted to master after some time. However, the promoted node remains master even when I restart the killed instance.
Since Redis replication is asynchronous, I was thinking of a scenario where the master is killed immediately after accepting a write, i.e. before it was able to replicate that data.
Will this data get replicated to the new master (initially a slave) once the killed instance comes back up?
NO. If the master hadn't replicated the data to its slave before going down, that data is lost. When the old master recovers, it rejoins the cluster as a slave of the node that now serves its hash slots (the promoted replica) and then replicates data from that new master.

What happens to data before new master is elected in Redis?

In a Redis master-slave architecture, when a master fails, a slave is promoted to master. Since only the master can perform write operations, what happens to data during the window in which the slave is being promoted to master? Does my system remain unresponsive?
Define "data":)
Client connections to the master will be closed upon its failure, so your system will be notified of that. Any data that was not written to the master and the replicas before the failure will therefore still reside in your application/system.
Once your system tries using a replica it will be able to read the data in it up to the point it was synchronized before failure. Once the replica is promoted to masterhood, your system will be able to continue writing data.
Note that Redis' replication is asynchronous. That means that slaves may lag behind the master and can therefore lose some updates in case of failure. Refer to the WAIT command for more information about ensuring consistency.

Apache Ignite Fault Tolerance

I have a few questions about Ignite cache in partitioned mode:
1) When a node goes down in an Ignite cluster, and the failed node is the primary for a key, does the backup of that key become the new primary?
2) What happens to the backup copies that were held on the failed node? Will they be recreated elsewhere in the cluster?
3) If I set CacheRebalanceMode in the cache configuration, does it apply to node failure as well, or only to node addition?
Yes, that is right. The former backup becomes the new primary, and a new backup receives its copy in the background.
Yes, if a backup is lost, a new node is assigned to that role and receives its copy in the background.
In synchronous rebalance mode, a node will not complete its start process, and the user will not be able to use the API, until the data is rebalanced. This does not affect the rebalancing process in case of failures.
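For reference, a minimal sketch of such a partitioned cache, with one backup per partition and synchronous rebalancing (the cache name and backup count are only illustrative):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class PartitionedCacheExample {
    public static void main(String[] args) {
        CacheConfiguration<Integer, String> cacheCfg =
                new CacheConfiguration<>("myCache");         // illustrative cache name
        cacheCfg.setCacheMode(CacheMode.PARTITIONED);        // primary + backup copies
        cacheCfg.setBackups(1);                              // one backup per partition
        cacheCfg.setRebalanceMode(CacheRebalanceMode.SYNC);  // wait for rebalancing on start

        try (Ignite ignite = Ignition.start()) {
            ignite.getOrCreateCache(cacheCfg).put(1, "value");
        }
    }
}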