Redis cluster cannot auto fail over - redis

I set up my Redis cluster (version 6.2) with 3 master nodes and 3 slave nodes. It works well in normal scenarios.
However, if I kill one of the master nodes, the auto-failover does not happen, even if I wait a very long time. When I run the "cluster nodes" command, the output tells me the killed node is marked as "master, failed", while all 3 slave nodes are still listed as "slave". I also cannot find any useful information in the logs.
My cluster config uses the defaults, except for the two settings below:
cluster-node-timeout 5000
cluster-require-full-coverage no
Does anyone have an idea how to check what is wrong? Any help is very much appreciated!
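A few commands that may help narrow this down (a minimal diagnostic sketch; the port 7001 is a placeholder for any surviving node):

# Overall cluster health: cluster_state should be "ok" and all
# 16384 hash slots should still be covered.
redis-cli -p 7001 cluster info

# How each surviving node sees the killed master; the replicas must
# agree it is in "fail" state before one of them starts an election.
redis-cli -p 7001 cluster nodes

# A replica refuses to start a failover if its link to the master was
# down for too long; this factor (multiplied with cluster-node-timeout)
# controls the cutoff, and 0 disables the check.
redis-cli -p 7001 config get cluster-replica-validity-factor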

Related

Handle io.lettuce.core.RedisReadOnlyException when network is partitioned

I have a situation where I use Sentinel to obtain the current Redis master. My setup is one Redis master, three slaves, and three Sentinel nodes. This works fine in most situations, but I have found that if a network split isolates the current master and the Sentinel node configured first in the list of Sentinel nodes from the other nodes, the other two Sentinel nodes elect a new master, as intended.
My problem is that when the isolated previous master reaches the common network again and is reconfigured to a slave, my application is never notified that a new master has been elected. It continues to write to the slave, still believing it is a master, and ends up getting "Error in execution; nested exception is io.lettuce.core.RedisReadOnlyException: READONLY You can't write against a read only slave."
I do not know whether this is a Redis problem or a framework problem. Should Redis terminate the connection when it is reconfigured from master to slave, as is done under normal circumstances when a new master is elected, or should the framework handle the exception and query for the current master?
One more interesting aspect: if the Sentinel node configured first in the Sentinel node list remains isolated, the behavior continues even if the application accessing Redis is restarted.
Is there any mechanism to handle this situation, or is this a bug or an enhancement for the framework?
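For what it's worth, you can always ask a Sentinel directly which node it currently considers the master (a minimal sketch, assuming the default Sentinel port 26379 and a master group named mymaster):

# The address a client should write to right now; re-running this
# after a READONLY error is one way to re-resolve the master.
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster

# Full state of every monitored master, including failover status.
redis-cli -p 26379 SENTINEL masters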

Redis cluster with one master and N replica/slave

Is it possible to create a Redis cluster with only 1 master and N slaves/replicas?
I tried it and it failed:
redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 --cluster-replicas 2
*** ERROR: Invalid configuration for cluster creation.
*** Redis Cluster requires at least 3 master nodes.
*** This is not possible with 3 nodes and 2 replicas per node.
*** At least 9 nodes are required.
Is there a way to avoid this restriction of minimum 3 masters?
Redis Cluster doesn't support what you are asking for, but there is another H/A Redis mode, "Redis Sentinel":
https://redis.io/docs/manual/sentinel/
This article is worth reading as it illustrates some pros and cons of the two H/A modes:
Redis Sentinel Pros:
With three nodes, you can build up a fully functional Sentinel deployment (see the configuration sketch after this list).
Simplicity - it’s usually simple to maintain and configure.
Highly available: you can build a Redis Sentinel deployment that survives certain failures without any need for human intervention.
Works as long as a single master instance is available; it can survive the failure of all slave instances.
Multiple slave nodes can replicate data from a master node.
Redis Sentinel Cons:
Not scalable: writes must go to the master, so it cannot solve the problem of read-write separation.
Slaves may serve reads, but because of asynchronous replication, outdated reads may result.
It doesn’t shard data, so master and slave utilization will be imbalanced.
Slave nodes are somewhat wasted resources, since they serve mainly as backup nodes.
Redis-Sentinel must be supported by the client. The client holds half of the magic.
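For concreteness, a three-node Sentinel deployment like the one described above is configured with only a few directives (a minimal sketch; mymaster, the address, and the timeouts are placeholder values):

# sentinel.conf (one copy per Sentinel node)
port 26379
# Monitor the master; the final "2" is the quorum: two Sentinels must
# agree the master is down before a failover can be started.
sentinel monitor mymaster 127.0.0.1 6379 2
# How long the master must be unreachable before it is considered down.
sentinel down-after-milliseconds mymaster 5000
# Abort and retry if a failover takes longer than this.
sentinel failover-timeout mymaster 60000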

Redis cluster node failure not detected on MISCONF

We currently have a Redis cache cluster with 3 masters and 3 slaves hosted on 3 Windows servers (one master and one slave per server). We are using StackExchange.Redis as our client.
We have RDB disabled but AOF enabled, and we experienced some problems with the cluster in the following situation:
One of our servers ran out of disk space and the Redis node on this server was unable to write to the AOF file (the error returned to the client was MISCONF Errors writing to the AOF file: No space left on device).
The cluster did not detect that the node was failing and so did not exclude it from the cluster.
All cache operations were blocked until we freed up some space on the server.
We know that we don't need the AOF, so we have disabled it after the incident.
But we would like to confirm or refute our understanding of Redis clustering: in our view, if a node experiences a failure, the cluster should redirect all requests to another one. We have tested that with a stopped master node a slave is promoted to master, so we are confident that our cluster is working, but we are not sure why, in our case, the node was not marked as failed.
Is the cluster capable of detecting a node failure when the failure only shows up when a client sends a request to the cluster?
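As an aside, AOF can also be switched off at runtime, without restarting the node (a sketch; the host and port are placeholders):

# Stop appending to the AOF file on the affected node.
redis-cli -h 10.0.0.1 -p 6379 CONFIG SET appendonly no

# Persist the change into redis.conf so it survives a restart.
redis-cli -h 10.0.0.1 -p 6379 CONFIG REWRITE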

rabbitmq cluster how to change active/active into active/passive mode?

I have set up a 2-node RabbitMQ cluster with one load balancer at the front end. After setup it was working in active/active mode. Then a network partition happened on one node; I took the failed node out of the cluster and rejoined it, but after that the failed node was not accepting any connections.
Then I tried moving the other node out of the balancer, and the recovered node began to accept connections, so this cluster is in active/passive mode.
I don't know what caused this. Is there any way to change it back to active/active? And at which step during setup is the mode specified?
Thanks for your advice in advance!
RabbitMQ really (really) doesn't like network partitions. By default, when you have one, everything pauses, and in that situation you must fix it manually. Choosing the loser by stopping and starting it should resume everything once it rejoins the cluster.
If that doesn't work, then shut down the failed node, use rabbitmqctl to "forget_cluster_node", and then rejoin it to the cluster.
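A rough sketch of that recovery sequence (assuming the failed node is rabbit@node2 and a surviving node is rabbit@node1; both names are placeholders):

# On the failed node: stop the RabbitMQ application (the Erlang VM stays up).
rabbitmqctl stop_app

# On a surviving node: remove the failed node from the cluster.
rabbitmqctl forget_cluster_node rabbit@node2

# Back on the failed node: wipe its state, rejoin, and restart.
rabbitmqctl reset
rabbitmqctl join_cluster rabbit@node1
rabbitmqctl start_app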
You should read this very carefully, specifically the section "Recovering from a network partition":
https://www.rabbitmq.com/partitions.html
Then read the next few paragraphs even more carefully. There are some automatic recovery modes, each with advantages and disadvantages.
At my company we chose autoheal because we value availability and accept the possible loss of messages.
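The partition-handling strategy lives in rabbitmq.conf; for instance (a minimal sketch; pause_minority and pause_if_all_down are the other documented modes):

# rabbitmq.conf
# After a partition heals, automatically pick a winning partition and
# restart the nodes on the losing side (their messages may be lost).
cluster_partition_handling = autoheal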

Redis - Promoting a slave to master manually

Suppose I have [Slave IP Address], which is a slave of [Master IP Address].
Now my master server has been shut down, and I need to set this slave to be the master MANUALLY (WITHOUT using sentinel automatic failover, WITH a redis command).
Is it possible to do this without restarting the redis service (and losing all the cached data)?
Use SLAVEOF NO ONE to promote a slave to master:
http://redis.io/commands/slaveof
It depends: if you are in a cluster, you are better off using the failover. You will need to use the FORCE option of the command:
http://redis.io/commands/cluster-failover
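A sketch of both options (the addresses and ports are placeholders):

# Standalone setup: promote the replica directly.
redis-cli -h 10.0.0.2 -p 6379 SLAVEOF NO ONE

# Cluster setup: run on a replica of the dead master; FORCE skips the
# handshake with the now-unreachable master.
redis-cli -h 10.0.0.2 -p 7000 CLUSTER FAILOVER FORCE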
Is it possible doing this without restarting the redis service? (and losing all the cached data)
Yes, that's possible. You can use
SLAVEOF NO ONE (without sentinel)
But it is recommended to use sentinel to avoid data loss:
sentinel failover master-name (with sentinel)
This will force the sentinel to switch the master.
The new master will have all the data that was synchronized before the old master shut down.
Redis will automatically choose the slave with the most data, which reduces the amount of data we lose when switching masters.
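The Sentinel-driven failover is a single command sent to any Sentinel node (a sketch, assuming the default Sentinel port 26379 and a master group named mymaster):

# Force a failover for the group "mymaster" without requiring the
# other Sentinels to agree that the master is down.
redis-cli -p 26379 SENTINEL failover mymaster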
The two options in step 3 below have helped me recover the cluster when a master node was down, its compute was replaced, or it was otherwise in a non-recoverable state.
1. First you need to connect to the slave node; use redis-cli. Here is a link on how to do that: How to connect to remote Redis server?
2. Once connected to the slave node, run the command cluster nodes to validate that the master node is in fail state; also run cluster info to see the overall state of your cluster (this is always a good idea).
3. On the slave node to be promoted, run the command cluster failover. In rare cases, when there are serious issues with Redis, this command can fail, and you will need to use cluster failover force or cluster failover takeover. Here is more info about the implications of those options: https://redis.io/commands/cluster-failover
4. Run cluster forget $old_master_id on all your cluster nodes.
5. Add a new node with cluster meet $new_node_IP $new_node_PORT.
6. Subscribe your new node to your brand new master: log in to the new node and run cluster replicate $master_node_id.
Steps 1-3 are required for the slave-master promotion, and steps 4-6 are required to leave the whole cluster in a healthy master-slave equilibrium. A consolidated sketch of the sequence follows below.
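Put together (a sketch; the addresses, ports, and node IDs are placeholders for your own values):

# Steps 1-2: connect to the surviving replica and inspect the cluster.
redis-cli -h 10.0.0.2 -p 7000 cluster nodes
redis-cli -h 10.0.0.2 -p 7000 cluster info

# Step 3: promote the replica (append force or takeover only if needed).
redis-cli -h 10.0.0.2 -p 7000 cluster failover

# Step 4: make every remaining node forget the dead master (repeat
# against each node in the cluster).
redis-cli -h 10.0.0.2 -p 7000 cluster forget $old_master_id

# Steps 5-6: introduce the replacement node and attach it to the newly
# promoted master as a replica.
redis-cli -h 10.0.0.3 -p 7000 cluster meet 10.0.0.2 7000
redis-cli -h 10.0.0.3 -p 7000 cluster replicate $new_master_id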
As of Redis version 5.0.0, the SLAVEOF command is regarded as deprecated.
If a Redis server is already acting as a replica, the command REPLICAOF NO ONE will turn off the replication, turning the Redis server into a MASTER.
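The modern equivalent of the promotion above is therefore (the host and port are placeholders):

# Detach the replica from its master and make it a master itself.
redis-cli -h 10.0.0.2 -p 6379 REPLICAOF NO ONE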