RabbitMQ cluster fully shut down: when a slave node is started first, the queue state is down - rabbitmq

I have 3 nodes, all in disc mode, with "ha-mode: all". The RabbitMQ version is 3.6.4.
When I stop all nodes, I first stop the two slave nodes and stop the master node last. Assume the master node is then broken and can't be started. I use rabbitmqctl force_boot to start one slave node, and I find that the queue state is down.
I don't think this is right. I expected the slave node to become master when it starts, and the queue to be available. Leave aside whether messages are lost.
But if I first stop the master node, then stop the new master node, and finally stop the last node, I can use
rabbitmqctl force_boot to start any node, and any node is available.

Sounds like you're ending up with unsynchronized slaves and by default RabbitMQ will refuse to fail over to an unsynchronised slave on controlled master shutdown.
Stopping master nodes with only unsynchronised slaves
It's possible that when you shut down a master node that all available slaves are unsynchronised. A common situation in which this can occur is rolling cluster upgrades. By default, RabbitMQ will refuse to fail over to an unsynchronised slave on controlled master shutdown (i.e. explicit stop of the RabbitMQ service or shutdown of the OS) in order to avoid message loss; instead the entire queue will shut down as if the unsynchronised slaves were not there. An uncontrolled master shutdown (i.e. server or node crash, or network outage) will still trigger a failover even to an unsynchronised slave.
If you would prefer to have master nodes fail over to unsynchronised slaves in all circumstances (i.e. you would choose availability of the queue over avoiding message loss) then you can set the ha-promote-on-shutdown policy key to always rather than its default value of when-synced.
https://www.rabbitmq.com/ha.html
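As a rough sketch of what setting that policy key could look like, the snippet below builds a rabbitmqctl set_policy invocation. The policy name ha-all and the queue pattern ^ are assumptions chosen for illustration; only ha-mode and ha-promote-on-shutdown come from the discussion above.

```python
import json
import shlex


def build_set_policy_command(name="ha-all", pattern="^", promote_on_shutdown="always"):
    """Return the argv for a `rabbitmqctl set_policy` call that mirrors to all nodes.

    `name` and `pattern` are hypothetical defaults, not values from the question.
    """
    if promote_on_shutdown not in ("always", "when-synced"):
        raise ValueError("ha-promote-on-shutdown must be 'always' or 'when-synced'")
    definition = {
        "ha-mode": "all",
        "ha-promote-on-shutdown": promote_on_shutdown,
    }
    return ["rabbitmqctl", "set_policy", name, pattern, json.dumps(definition)]


if __name__ == "__main__":
    # Print a shell-quoted command line that could be reviewed before running it.
    print(shlex.join(build_set_policy_command()))
```

Choosing always trades possible message loss for availability, so it is worth reviewing the generated command before applying it to a production cluster.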

Related

Handle io.lettuce.core.RedisReadOnlyException when network is partitioned

I have a situation where I use sentinel to get current redis master from sentinel. My setup is one redis master and three slaves and three sentinel nodes. This works fine in most situations but I have found that if I get a network split where the current master and the sentinel node that is configured first in the list of sentinel nodes are isolated from the other nodes, the other two sentinel nodes are doing a reelection to a new master, as intended.
My problem is that when the isolated previous master is accessing the common network again and is reconfigured to slave, my application is never notified that a new master is elected and continues to write to a slave since it still thinks it is writing to a master, ending up in getting "Error in execution; nested exception is io.lettuce.core.RedisReadOnlyException: READONLY You can't write against a read only slave."
I do not know if this is a redis problem or a framework problem. When redis is reconfigured from master to slave, should it terminate the connection, as it does in normal circumstances when a new master is elected, or should the framework handle the exception and query for the current master?
One more interesting aspect of this is if the sentinel node configured first in the sentinel node list continues to be isolated, the behavior continues even if the application accessing redis is restarted.
Is there any mechanism to handle this situation or is this a bug or enhancement to the framework?

RabbitMQ HA cluster graceful shutdown of a master node when using 'when-synced' policy

Suppose I use the ‘when-synced’ policy for both ha-promote-on-failure and ha-promote-on-shutdown on an HA cluster.
In that case, mirror-to-master promotion will never occur, and the master queue is blocked if there are no synchronized mirrors on controlled master shutdown.
That's what the documentation says.
https://www.rabbitmq.com/ha.html#cluster-shutdown
By default, RabbitMQ will refuse to promote an unsynchronised mirror
on controlled master shutdown (i.e. explicit stop of the RabbitMQ
service or shutdown of the OS) in order to avoid message loss; instead
the entire queue will shut down as if the unsynchronised mirrors were
not there.
If I use the 'when-synced' policy and no mirrors were synchronized when the master was shut down, then according to the documentation the master doesn’t seem to shut down gracefully.
To me it seems there are only two options:
Wait for the master to be restored (regardless of how long it takes) if I use ‘when-synced’.
Abandon all the messages that are not yet synchronized to mirrors (they exist only on the master) in exchange for availability if I use ‘always’.
Really?
There’s no option like “Blocking queues until one of the mirrors is fully synced, and then promote the synced mirror to the new master”?

Unslave a redis slave

I have a setup of 3 instances in a failover cluster, one master and two slaves, all monitored by sentinels. At some point I decide I don't need one slave and I want to reuse that redis instance for something else. What commands do I issue?
I tried running slaveof no one on that slave, but it's enslaved again in a few seconds.
Sentinels remember forever the slaves they have seen, in order to reconnect them when they return after a crash or a network partition.
For the sentinels to forget the slave to remove, Redis' doc says "you need to send a SENTINEL RESET mastername command to all the Sentinels: they'll refresh the list of slaves within the next 10 seconds, only adding the ones listed as correctly replicating from the current master INFO output."
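A minimal sketch of sending that command to every Sentinel, assuming redis-cli is on the PATH. The function name, the run parameter, and the host/port values in the usage comment are hypothetical; the only command taken from the answer is SENTINEL RESET mastername.

```python
import subprocess


def reset_sentinels(sentinels, master_name, run=subprocess.run):
    """Send SENTINEL RESET <master_name> to each (host, port) Sentinel via redis-cli.

    `run` is injectable so the command lines can be inspected without a live Sentinel.
    """
    results = []
    for host, port in sentinels:
        argv = [
            "redis-cli", "-h", host, "-p", str(port),
            "SENTINEL", "RESET", master_name,
        ]
        # check=True raises CalledProcessError if redis-cli exits non-zero.
        results.append(run(argv, check=True))
    return results


# Example call (hypothetical addresses and master name):
# reset_sentinels([("10.0.0.1", 26379), ("10.0.0.2", 26379), ("10.0.0.3", 26379)], "mymaster")
```

The slave being removed should already be stopped or detached when the reset is sent; otherwise the Sentinels will simply rediscover it from the master's INFO output.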

Redis - Promoting a slave to master manually

Suppose I have [Slave IP Address] which is the slave of [Master IP Address].
Now my master server has been shut down, and I need to set this slave to be master MANUALLY (WITHOUT using sentinel automatic failover, WITH redis command).
Is it possible doing this without restarting the redis service ? (and losing all the cached data)
use SLAVEOF NO ONE to promote a slave to master
http://redis.io/commands/slaveof
It depends: if you are in a cluster, you are better off using the failover command. You will need to use the force option in the command
http://redis.io/commands/cluster-failover
Is it possible doing this without restarting the redis service? (and
losing all the cached data)
yes that's possible, you can use
SLAVEOF NO ONE (without sentinel)
But it is recommended to use sentinel to avoid data loss.
sentinel failover master-name (with sentinel)
This will force the sentinel to switch master.
The new master will have all the data that was synchronized before the old-master shutdown.
Redis will automatically choose the best slave with max. data, that will reduce the amount of data we lose when switching master.
The two options in step 3 below have helped me recover the cluster when a master node was down, its compute was replaced, or it was in some other unrecoverable state.
1 .- First, connect to the slave node with redis-cli. Here is a link on how to do that: How to connect to remote Redis server?
2 .- Once connected to the slave node, run cluster nodes to validate that the master node is in fail state, and run cluster info to see the overall state of your cluster (this is always a good idea).
3 .- On the slave node to be promoted, run cluster failover. In rare cases, when there are serious issues with redis, this command could fail and you will need to use cluster failover force or cluster failover takeover. More info about the implications of those options: https://redis.io/commands/cluster-failover
4 .- Run cluster forget $old_master_id on all your cluster nodes.
5 .- Add a new node with cluster meet $new_node_IP $new_node_PORT
6 .- Attach your new node to your brand new master: log in to the new node and run cluster replicate $master_node_id
Steps 1-3 are required for the slave-to-master promotion, and steps 4-6 are required to leave the whole cluster in a healthy master-slave equilibrium.
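To make the ordering of steps 2 to 6 explicit, here is a small sketch that lays them out as (node, command) pairs. It only builds the plan and does not talk to Redis. The function name, the "host:port" node format, and the choice to issue CLUSTER MEET from the promoted node are assumptions for illustration; note that the Redis command for dropping the old master is CLUSTER FORGET.

```python
def recovery_plan(replica, survivors, old_master_id, new_node, new_master_id, mode=None):
    """Return ordered (node, command) pairs for steps 2-6 of the procedure.

    Nodes are "host:port" strings. `mode` is None, "FORCE" or "TAKEOVER".
    All argument names here are hypothetical.
    """
    if mode not in (None, "FORCE", "TAKEOVER"):
        raise ValueError("mode must be None, 'FORCE' or 'TAKEOVER'")
    failover = "CLUSTER FAILOVER" if mode is None else f"CLUSTER FAILOVER {mode}"
    host, _, port = new_node.rpartition(":")

    plan = [
        (replica, "CLUSTER NODES"),   # step 2: confirm the master is in fail state
        (replica, "CLUSTER INFO"),    # step 2: overall cluster state
        (replica, failover),          # step 3: promote this replica
    ]
    # Step 4: every remaining node must forget the old master.
    plan += [(node, f"CLUSTER FORGET {old_master_id}") for node in survivors]
    # Step 5: introduce the new node (assumed to be issued from the promoted node).
    plan.append((replica, f"CLUSTER MEET {host} {port}"))
    # Step 6: run on the new node so it replicates the new master.
    plan.append((new_node, f"CLUSTER REPLICATE {new_master_id}"))
    return plan


if __name__ == "__main__":
    for node, command in recovery_plan(
        "10.0.0.2:6379", ["10.0.0.2:6379", "10.0.0.3:6379"],
        "<old_master_id>", "10.0.0.9:6379", "<new_master_id>", mode="FORCE",
    ):
        print(f"{node}: {command}")
```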
As of Redis version 5.0.0 the SLAVEOF command is regarded as deprecated.
If a Redis server is already acting as replica, the command REPLICAOF NO ONE will turn off the replication, turning the Redis server into a MASTER.
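A tiny sketch of picking the right spelling for a given server version, assuming the version string has the usual major.minor.patch form (for example as reported by INFO server). The helper name is hypothetical.

```python
def promote_command(redis_version):
    """Return the command that turns a replica into a master for this version.

    REPLICAOF is used from Redis 5.0.0 onward; older servers only know SLAVEOF.
    """
    major = int(redis_version.split(".")[0])
    verb = "REPLICAOF" if major >= 5 else "SLAVEOF"
    return f"{verb} NO ONE"


if __name__ == "__main__":
    for version in ("4.0.14", "5.0.0"):
        print(version, "->", promote_command(version))
```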

Can we mark a slave as unpromotable by redis-sentinel?

We have a redis cluster with a master and a slave managed by three sentinel processes, and an additional remote slave, hosted in a different datacenter, for transparent failover and data preservation in the case that something bad happens to the master and slave machines.
It may happen that a transient error takes down the master redis process only, and in this situation we would like to see the slave process promoted to master, and the remote slave reslaved to it. However, it seems that sentinel could just as easily promote the remote slave to master, and we have not found any way to prevent this.
Is there any way to mark a particular slave machine as unpromotable, so that sentinel will not try to make it the master in the event of a failover?
Yes. In the slave's config file set the slave-priority setting to zero (the number not the word).
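To illustrate the effect of that setting, here is a simplified model of how a candidate is picked, written as a sketch rather than Sentinel's actual code. It assumes the documented ordering (lower non-zero priority first, then higher replication offset, then smaller run ID) and ignores Sentinel's connectivity checks. The Replica class and its field names are hypothetical. In Redis 5 and later the same setting is also available as replica-priority.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    priority: int      # slave-priority from the replica's config; 0 means never promote
    repl_offset: int   # how much of the master's replication stream was processed
    run_id: str


def pick_promotion_candidate(replicas):
    """Return the replica this simplified model would promote, or None."""
    eligible = [r for r in replicas if r.priority != 0]
    if not eligible:
        return None
    # Lower priority wins, then higher offset, then lexicographically smaller run ID.
    return min(eligible, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


if __name__ == "__main__":
    local = Replica("local-slave", priority=100, repl_offset=500, run_id="b")
    remote = Replica("remote-slave", priority=0, repl_offset=900, run_id="a")
    print(pick_promotion_candidate([local, remote]).name)
```

In this model the remote slave is skipped even though it has the larger replication offset, which is the behaviour the question asks for.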