Redis Sentinel with 2 masters after multi-AZ netsplit - redis

Hello stack community,
I have a question about Redis Sentinel for a specific problem case. I use AWS with Multi-AZ to build a Sensu cluster.
On eu-central-1a I have a sensu+redis (master), a RabbitMQ+Sentinel, and 2 other Sentinels. The same on eu-central-1b, but the Redis there is my slave.
What happens if there is a problem and eu-central-1a can no longer communicate with eu-central-1b? My understanding is that the Sentinels on eu-central-1b should promote my Redis slave to master, because they can no longer contact the Redis master. So I would end up with 2 Redis masters running at the same time in 2 different AZs.
But when the link between the AZs is restored, I will still have 2 masters with 2 divergent datasets. What happens in this case? Will one master become a slave and the data be replicated without loss? Do we need to restart one master so that it becomes a slave?

Sentinel detects changes to the master. For example, if the master goes down and becomes unreachable, a new master is elected from the slaves. This is based on the quorum, where multiple Sentinels agree that the master has gone down; the failover then occurs.
Once Sentinel detects that the old master has come back online, it is demoted to a slave, I believe, and the new master carries on. You will lose some data in the switchover from the old master to the new one; that is inevitable.
If you lose the connection between AZs then yes, Sentinel won't work correctly, because it relies on multiple Sentinels agreeing that the master Redis is down. You shouldn't run Sentinel as a 2-Sentinel system.
A basic solution would be to put an extra Sentinel on another server, perhaps the client/application server that isn't running Redis/Sentinel. That way you can make proper use of the quorum and of Sentinels agreeing that the master is down.
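As a rough sketch, a monitor entry sized for a majority quorum might look like this (the master name, IP, and timeouts below are hypothetical):

```conf
# sentinel.conf sketch -- mymaster, the IP, and the timeouts are hypothetical.
# With 5 Sentinels spread across both AZs plus a tie-breaker host, a quorum
# of 3 means the minority side of a netsplit can never flag the master down.
sentinel monitor mymaster 10.0.1.10 6379 3
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
```

Note that the quorum only controls when a master is flagged as objectively down; authorizing the failover itself always requires a majority of all Sentinels, which is why spreading an odd total across three locations helps.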

Related

Redis sentinel with multilevel replicas

I am using Sentinel as a high availability solution for redis.
I have a problem.
To reduce the replication pressure on the master, our Redis instances are arranged in multiple levels, as follows:
In the Sentinel introduction I found that it can monitor multiple masters, so I brought it in, hoping it would work as follows:
The second-level replicas logically belong to the "masters" too (each has replicas of its own), so they also need to be monitored.
I got the opposite of what I wanted: when the Sentinels started, they ran elections and ended up with many independent actual masters (role: master), not my logical masters.
Q: So can Sentinel implement the monitoring layout in the figure above?
My main configuration is as follows:
sentinel monitor top-master xxx.x.x.x 6379 2
sentinel monitor second-level-first xxx.x.x.x 6379 2
sentinel monitor second-level-second xxx.x.x.x 6379 2
sentinel monitor second-level-third xxx.x.x.x 6379 2
IN BRIEF - NO
To answer the above, you need to drill down into what Sentinel is doing:
It finds out about all the slaves connected to a master.
It establishes a pub/sub connection with those nodes.
When your actual master fails and another node becomes master, this change cannot be propagated to the intermediate levels.
In fact, to dig further, can you please share the configuration of your slave nodes on level 1? This should not have been possible at all; I am just wondering how it worked.
If you can share the config files, I will go through them and update accordingly.
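For background, Sentinel learns about a master's replicas by periodically sending INFO to it; a sketch of the relevant section of that output (addresses hypothetical):

```
# `redis-cli -h <master-ip> INFO replication` (sketch, hypothetical addresses)
role:master
connected_slaves:2
slave0:ip=10.0.1.11,port=6379,state=online,offset=123456,lag=0
slave1:ip=10.0.1.12,port=6379,state=online,offset=123456,lag=1
```

A second-level node reports role:slave here even though it has replicas of its own, which is why Sentinel will not treat it as a monitorable master.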

Supporting Slave of Slave Replication with Redis Sentinel?

We have two datacenters, each with two redis instances. Generally they are replicated as chain.
NY1 (Master) --> NY2 (Slave) --> CO1 (Slave) --> CO2 (Slave)
NY is New York and CO is Colorado, our backup datacenter. In order to save bandwidth over the WAN, we don't want CO1 and CO2 connected to NY1. Rather we want a chain configuration, where there is only one slave directly to the master, and the others are all "slaves of slaves".
Can this sort of replication layout be maintained using Sentinel? Or do all slaves have to be a slave of the master, and not a slave of a slave?
Currently this type of setup isn't possible with Sentinel because Sentinel rewrites the configurations of all monitored Redis systems.
For example, if you set up a system as you described and have sentinel monitoring all of the hosts, if the master goes down and forces a failover, each of the Redis hosts will be re-configured. One of the replicas (any of them) will become the new master, and the others will become replicas of the new master. When the old master comes back online, it will be re-configured to be a replica of the new master.
However, in general you can get Redis to work the way you want. You can have as many replicas of a replica as you need by setting the replicaof config value to a replica.
Personally, I would still use Sentinel to monitor the master and the "prime" replicas (those that replicate from the master itself). This could result in one of the prime replicas becoming the new master, so I would enable the notification option, which tells Sentinel to call a script whenever a failover happens. In that script you can send an email, hit a Slack webhook, or whatever else you want. When I got the notification, I'd manually reconfigure the hosts back into the layout I want, but with the new master. It'd be a pain to do it this way, but I'd still get automatic failover of the master and prime replicas, so my apps would continue working.
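In config terms, the chain plus the notification hook could be sketched like this (all IPs and the script path are hypothetical):

```conf
# NY2 (prime replica) redis.conf -- replicates from the NY1 master
replicaof 10.1.0.1 6379
# CO1 redis.conf -- replicates from NY2, the only link over the WAN
replicaof 10.1.0.2 6379
# CO2 redis.conf -- replicates from CO1 inside the CO datacenter
replicaof 10.2.0.1 6379

# sentinel.conf -- monitor only the master/prime tier; on events, Sentinel
# runs the script so an operator can re-chain CO1/CO2 behind the new master
sentinel monitor mymaster 10.1.0.1 6379 2
sentinel notification-script mymaster /usr/local/bin/redis-notify.sh
```

The trade-off is that the CO tier is outside Sentinel's control: it keeps the WAN traffic down to one replication stream, but re-pointing CO1 after a failover is a manual step.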

Redis - Promoting a slave to master manually

Suppose I have [Slave IP Address] which is the slave of [Master IP Address].
Now my master server has been shut down, and I need to set this slave to be master MANUALLY (WITHOUT using sentinel automatic failover, WITH redis command).
Is it possible doing this without restarting the redis service ? (and losing all the cached data)
Use SLAVEOF NO ONE to promote a slave to master:
http://redis.io/commands/slaveof
It depends: if you are running Redis Cluster, you are better off using the failover mechanism, and you will need the FORCE option of the command:
http://redis.io/commands/cluster-failover
Is it possible doing this without restarting the redis service? (and losing all the cached data)
Yes, that's possible. You can use:
SLAVEOF NO ONE (without Sentinel)
But it is recommended to use Sentinel to avoid data loss:
SENTINEL FAILOVER master-name (with Sentinel)
This forces Sentinel to switch masters.
The new master will have all the data that was synchronized before the old master shut down.
Redis automatically chooses the best slave, the one with the most data, which reduces the amount of data lost when switching masters.
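As a command sketch (the host addresses and master name below are hypothetical):

```
# Without Sentinel: promote the replica directly
redis-cli -h 10.0.0.2 -p 6379 SLAVEOF NO ONE

# With Sentinel: ask a Sentinel (default port 26379) to fail over
# the monitored master
redis-cli -h 10.0.0.2 -p 26379 SENTINEL FAILOVER master-name
```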
The two options in step 3 below have helped me recover the cluster when a master node was down, its compute was replaced, or it was otherwise in an unrecoverable state.
1. First, connect to the slave node using redis-cli; here is a link on how to do that: How to connect to remote Redis server?
2. Once connected to the slave node, run cluster nodes to confirm the master node is in a fail state; also run cluster info to see the overall state of your cluster (always a good idea).
3. On the slave node to be promoted, run the command cluster failover. In rare cases, when there are serious issues with Redis, this command can fail, and you will need to use cluster failover force or cluster failover takeover; more info about the implications of those options here: https://redis.io/commands/cluster-failover
4. Run cluster forget $old_master_id on all your cluster nodes.
5. Add a new node with cluster meet $new_node_IP $new_node_PORT.
6. Make your new node a replica of your brand-new master: log in to the new node and run cluster replicate $master_node_id.
Steps 1-3 are required for the slave-to-master promotion, and 4-6 are required to leave the whole cluster in a healthy master-slave equilibrium.
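The steps above, as a redis-cli session sketch (addresses and node IDs are hypothetical):

```
# Steps 1-2: connect to the replica and inspect the cluster
redis-cli -h 10.0.0.5 -p 6379
> CLUSTER NODES       # the old master should be flagged as fail
> CLUSTER INFO
# Step 3: promote this replica (add FORCE or TAKEOVER only if needed)
> CLUSTER FAILOVER
# Step 4: repeat on every remaining node, forgetting the dead master
> CLUSTER FORGET 3fa9d1...      # hypothetical old-master node id
# Step 5: from any existing cluster node, introduce the replacement node
> CLUSTER MEET 10.0.0.9 6379
# Step 6: on the new node itself, attach it to the new master
> CLUSTER REPLICATE a1b2c3...   # hypothetical new-master node id
```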
As of Redis version 5.0.0, the SLAVEOF command is deprecated in favor of REPLICAOF.
If a Redis server is already acting as a replica, the command REPLICAOF NO ONE turns off replication, turning the Redis server into a MASTER.

redis sentinel out of sync with servers in a cluster

We have a setup with a number of Redis (2.8) servers (let's say 4) and as many Redis Sentinels. On startup of each machine, we set a pre-selected machine as master through the command line, with all the rest as slaves of it, and the Sentinels all monitor these machines. Clients first connect to the local Sentinel, retrieve the master's IP address, and then connect there.
This setup is trouble-free most of the time, but sometimes the Sentinels go out of sync with the servers. If I name the machines A, B, C and D, the Sentinels will think B is master while the Redis servers are all connected to A as the master. Bringing down the Redis server on B doesn't help either; I had to bring it down and manually run SENTINEL FAILOVER on A to fix the issue. The questions are:
1. What causes this to happen, and what's the easiest and quickest way to fix it?
2. What is the best configuration? Is there something better than this?
The only time you should set a master manually is the first time. Once Sentinel has taken over management of replication, you should let it do its job. This includes on restarts. Don't use the command line to set up replication; let Sentinel and Redis manage it. This is why you're getting issues: you've told Sentinel it is authoritative, but you are telling the Redis servers to ignore Sentinel.
Sentinel stores its state in its config file, so when it restarts it can resume the last known configuration. So even on restart, let Sentinel do its job.
Also, if you have 4 servers (be specific, not "let's say"), you should be running a quorum of three in your monitor statement in Sentinel. With a quorum of two you can wind up with two masters.
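The sizing rule behind that advice is just "strict majority"; a tiny Python helper (the function name is my own) makes it concrete:

```python
# Quorum sizing for Sentinel: a failover should only be possible for a
# strict majority, so the minority side of a netsplit can never act.

def majority_quorum(n_sentinels: int) -> int:
    """Smallest group size that is a strict majority of n_sentinels."""
    return n_sentinels // 2 + 1

# With 4 Sentinels the quorum should be 3; a quorum of 2 would let two
# disjoint pairs each believe they are authoritative after a split.
print(majority_quorum(4))  # 3
print(majority_quorum(3))  # 2
```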

Can we mark a slave as unpromotable by redis-sentinel?

We have a redis cluster with a master and a slave managed by three sentinel processes, and an additional remote slave, hosted in a different datacenter, for transparent failover and data preservation in the case that something bad happens to the master and slave machines.
It may happen that a transient error takes down the master redis process only, and in this situation we would like to see the slave process promoted to master, and the remote slave reslaved to it. However, it seems that sentinel could just as easily promote the remote slave to master, and we have not found any way to prevent this.
Is there any way to mark a particular slave machine as unpromotable, so that sentinel will not try to make it the master in the event of a failover?
Yes. In the slave's config file, set the slave-priority setting to zero (the number, not the word).
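The relevant line in the remote replica's config would look like this (note that since Redis 5 the directive is spelled replica-priority; slave-priority is the pre-5 spelling):

```conf
# redis.conf on the remote-datacenter replica:
# priority 0 tells Sentinel never to select this node for promotion
replica-priority 0
```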