RabbitMQ HA cluster: graceful shutdown of a master node when using the 'when-synced' policy

Suppose I use the 'when-synced' policy for both ha-promote-on-failure and ha-promote-on-shutdown on an HA cluster.
If so, mirror-to-master promotion will never occur, and the queue is blocked if there are no synchronised mirrors during a controlled master shutdown.
That's what the documentation says.
https://www.rabbitmq.com/ha.html#cluster-shutdown
By default, RabbitMQ will refuse to promote an unsynchronised mirror
on controlled master shutdown (i.e. explicit stop of the RabbitMQ
service or shutdown of the OS) in order to avoid message loss; instead
the entire queue will shut down as if the unsynchronised mirrors were
not there.
If I use the 'when-synced' policy and no mirrors were synchronised at the time the master shuts down, then, according to the documentation, the master doesn't seem to shut down gracefully.
For me it seems like there are only two options:
Waiting for the master to be restored (regardless of how long it takes) if I use 'when-synced'.
Abandoning all the messages that are not yet synchronised to mirrors (and exist only on the master) in exchange for availability if I use 'always'.
Really?
There’s no option like “Blocking queues until one of the mirrors is fully synced, and then promote the synced mirror to the new master”?
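The trade-off described above can be summarised as a small decision table. This is a hedged sketch in Python, not RabbitMQ's actual code; the function name and defaults are illustrative, and the defaults mirror the question's setup (RabbitMQ's real default for ha-promote-on-failure is 'always'):

```python
# Sketch of RabbitMQ's mirror-promotion decision on master loss.
# Hypothetical model for illustration; the real logic lives inside RabbitMQ.

def promote_mirror(shutdown_controlled: bool,
                   mirror_synced: bool,
                   ha_promote_on_shutdown: str = "when-synced",
                   ha_promote_on_failure: str = "when-synced") -> bool:
    """Return True if a mirror may be promoted to the new master."""
    policy = (ha_promote_on_shutdown if shutdown_controlled
              else ha_promote_on_failure)
    if policy == "always":
        return True          # availability over safety: may lose messages
    return mirror_synced     # "when-synced": refuse unsynchronised mirrors

# Controlled shutdown, unsynchronised mirror, 'when-synced' policy:
# no promotion, so the whole queue shuts down with the master.
print(promote_mirror(True, False))                                    # False
print(promote_mirror(True, False, ha_promote_on_shutdown="always"))   # True
```

With both keys set to 'when-synced' there is indeed no row in this table that promotes an unsynchronised mirror, which matches the two options listed above.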

Related

Automatic Failover to Promoted Redis Slave using Redis Cluster

Configuration: three Redis Cluster partitions, each consisting of one master and one slave.
When a Master goes down, Lettuce immediately detects the outage and begins retrying. However, Lettuce does not detect that the associated slave has promoted itself to master and continues to retry using the old master that is not reachable and eventually times out. Tried setting various topology refresh options to no avail.
Proposed solution: After the first retry fails (which is the second retry in a row to fail), rerun topology refresh (that was used to derive topology during initialization) using topology from any of the nodes provided (since they all have the same topology information). This will reestablish the connections to the now-current masters. Then retry the failed operation on the partition that previously failed.
Redis Cluster is limited in terms of configuration update propagation compared to Redis Sentinel. Redis Sentinel communicates updates via Pub/Sub while Redis Cluster leaves polling as the sole option.
Lettuce supports periodic and adaptive cluster topology refresh triggers. Periodic updates topology in a regular interval, adaptive refresh listens to disconnects and cluster redirections.
You can configure both through ClusterClientOptions.
Periodic and adaptive refreshes try to cover most cases, but they are mostly guesswork compensating for the lack of proper configuration-change propagation. There are always loopholes (see issue #672) in which Lettuce is faster than the actual topology change, leaving Lettuce with an outdated topology view because the actual change happens somewhat later.
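The adaptive trigger can be sketched in a few lines. Lettuce itself is Java and configures this via ClusterClientOptions; the following is a language-neutral Python sketch of the idea (refresh on disconnect/MOVED events, rate-limited so a burst of triggers causes at most one refresh per window), with all names being illustrative:

```python
# Sketch of an adaptive cluster-topology refresh trigger, loosely modelled
# on what Lettuce does: refresh on disconnect/redirection events, but
# rate-limit so bursts of triggers cause only one refresh per window.
import time

class AdaptiveRefresh:
    def __init__(self, fetch_topology, timeout=30.0):
        self.fetch_topology = fetch_topology   # polls any reachable node
        self.timeout = timeout                 # min seconds between refreshes
        self.last_refresh = float("-inf")
        self.topology = fetch_topology()       # initial topology view

    def on_event(self, event, now=None):
        """Call on 'disconnect' or 'moved' events raised by the client."""
        now = time.monotonic() if now is None else now
        if event in ("disconnect", "moved") and \
                now - self.last_refresh >= self.timeout:
            self.topology = self.fetch_topology()
            self.last_refresh = now
            return True    # refresh performed
        return False       # suppressed by the rate limit
```

This also illustrates the loophole mentioned above: if the refresh fires before the cluster has actually re-elected, the fetched topology is already stale until the next trigger.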

Is there any message lost when using replication as HA policy

I am using JBoss AMQ 7.1 / Apache ActiveMQ Artemis. When using replication as the HA policy for my cluster, it is documented that all data synchronization is done over the network; all persistent data received by the master broker is synchronized to the slave when the master drops from the network. A slave broker first needs to synchronize all existing data from the master broker before becoming capable of replacing it.
Per my understanding, if the master broker crashes instead of being shut down by an administrator, no persistent data can be synced; therefore, messages persisted in the master's journal will be lost if the disk holding the journal breaks. Am I right?
Your understanding is not correct.
All persistent data received by the master broker is replicated to the slave when the master broker receives it so that when the master broker drops from the network (e.g. due to a crash) the slave can replace the master.
Replicating the data from the master to the slave when the master drops from the network would completely defeat the purpose of high availability.
Actually, if HA is configured as master/slave, whether network- or journal-replicated, a message received by the broker is FIRST replicated, and ONLY if that succeeds is it confirmed back to the client as received.
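The replicate-first, confirm-second ordering described in this answer can be sketched as follows; this is an illustrative model, not the broker's actual code, and all class and method names are made up:

```python
# Sketch of the replicate-first, ack-second protocol: the broker only
# confirms receipt to the client after the slave has acknowledged the
# replicated copy. Illustrative model only.

class Slave:
    def __init__(self):
        self.journal = []
    def replicate(self, msg) -> bool:
        self.journal.append(msg)
        return True                        # replication acknowledged

class Master:
    def __init__(self, slave):
        self.slave = slave
        self.journal = []
    def receive(self, msg) -> bool:
        """Confirm to the client only after replication succeeds."""
        if not self.slave.replicate(msg):  # 1. replicate FIRST
            return False                   # no confirmation: client retries
        self.journal.append(msg)           # 2. persist locally
        return True                        # 3. ONLY then confirm to client
```

Because the client never sees a confirmation for a message the slave doesn't hold, a master crash cannot lose a confirmed message, which is the point the answer makes.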

RabbitMQ cluster all down; after starting the first slave node, the queue state is down

I have 3 nodes in disc mode with ha-mode set to "all", RabbitMQ version 3.6.4.
When I try to stop all nodes, I first stop the two slave nodes and finally the master node. Assume the master node is broken and can't be started. I use rabbitmqctl force_boot to start one slave node, and I find the queue state is down.
I don't think this is right. I expected the slave node I started to become master and the queue to be available (never mind whether messages are lost).
But if I first stop the master node, then the new master node, and finally the last node, I can use rabbitmqctl force_boot to start any node, and any node is available.
Sounds like you're ending up with unsynchronized slaves and by default RabbitMQ will refuse to fail over to an unsynchronised slave on controlled master shutdown.
Stopping master nodes with only unsynchronised slaves
It's possible that when you shut down a master node that all available slaves are unsynchronised. A common situation in which this can occur is rolling cluster upgrades. By default, RabbitMQ will refuse to fail over to an unsynchronised slave on controlled master shutdown (i.e. explicit stop of the RabbitMQ service or shutdown of the OS) in order to avoid message loss; instead the entire queue will shut down as if the unsynchronised slaves were not there. An uncontrolled master shutdown (i.e. server or node crash, or network outage) will still trigger a failover even to an unsynchronised slave.
If you would prefer to have master nodes fail over to unsynchronised slaves in all circumstances (i.e. you would choose availability of the queue over avoiding message loss) then you can set the ha-promote-on-shutdown policy key to always rather than its default value of when-synced.
https://www.rabbitmq.com/ha.html
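One way to apply that suggestion is through the management plugin's HTTP API (PUT /api/policies/&lt;vhost&gt;/&lt;name&gt;). The sketch below builds such a request with only the standard library; the host, vhost, and policy name are assumptions, credentials are omitted, and the request is constructed but not sent:

```python
# Sketch: setting ha-promote-on-shutdown=always via RabbitMQ's management
# HTTP API. Host, vhost and policy name are illustrative; a real call
# would also need an Authorization header for the management user.
import json
import urllib.parse
import urllib.request

def build_set_policy(vhost, name, pattern, definition,
                     base="http://localhost:15672"):
    url = "%s/api/policies/%s/%s" % (
        base, urllib.parse.quote(vhost, safe=""), name)
    body = json.dumps({"pattern": pattern,
                       "definition": definition,
                       "apply-to": "queues"}).encode()
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"content-type": "application/json"})

req = build_set_policy("/", "ha-all", ".*",
                       {"ha-mode": "all",
                        "ha-promote-on-shutdown": "always"})
# urllib.request.urlopen(req) would apply it against a live broker.
```

The equivalent rabbitmqctl set_policy command achieves the same thing; either way, the policy trades message safety for queue availability, as the quoted documentation warns.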

Why PUBLISH command in redis slave causes no error?

I have a redis master-slave setup and the configuration of the slave is set to slave_read_only:1, but when I enter a PUBLISH command on the slave node it does not fail. I would expect an error, but it just takes the command and nothing else happens. The message is not propagated to the master either.
The question is, why is that? Did I mis-configure redis? Is that a feature? To what purpose? Or is it just a bug?
The problem arises in a setup where automatic failover occurs. A master may become a slave, and clients of that node may publish messages without realizing that it is no longer a master. Do I have to check before each message is sent whether the Redis node is still a master?
I use redis 3.0.5
You didn't misconfigure - this is the defined behavior, as PUBLISH isn't considered a write command.
Also note that published events are replicated from master to slaves (downstream, as usual), so if you publish to a slave, only clients connected to that slave (or to its slaves) and subscribed to the relevant channel will get the message.
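For the failover scenario the asker worries about, one defensive option is to check the node's role (Redis exposes this via the ROLE command) before publishing. A minimal sketch with a stubbed connection; a real client would issue ROLE over the wire, and safe_publish is a made-up helper name:

```python
# Sketch: guard PUBLISH behind a role check so you never publish to a node
# that has been demoted to slave. StubRedis stands in for a real client;
# actual ROLE replies are arrays whose first element is the role name.

class StubRedis:
    def __init__(self, role):
        self._role = role
        self.published = []
    def role(self):
        return (self._role,)            # first element: "master" or "slave"
    def publish(self, channel, msg):
        self.published.append((channel, msg))
        return 1                        # number of receivers (simplified)

def safe_publish(conn, channel, msg):
    """Publish only if the node currently reports itself as master."""
    if conn.role()[0] != "master":
        raise RuntimeError("refusing to PUBLISH to a non-master node")
    return conn.publish(channel, msg)
```

Note the inherent race: the node can be demoted between the role check and the publish, so this reduces, but does not eliminate, the chance of publishing to the wrong side of a failover.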

What does partial resynchronization mean in Redis?

Starting with Redis 2.8, Redis added a feature named "partial resynchronization". I read the official documentation, but I don't understand it. Can anyone help me?
It is about master-slave replication.
The normal behavior of a Redis slave (via the SLAVEOF command, or the corresponding configuration) is to connect to the master, ask the master to start accumulating master-slave traffic, request a complete dump written to the master's filesystem, download this dump to the slave, load it, and finally replay the accumulated traffic until the slave catches up with the master.
This mechanism is quite robust but not very efficient to cover transient connection drops between the slave and the master. If the master-slave link is down for a couple of seconds, the slave will request a full resynchronization (involving a dump, etc ...), even if only a few commands have been missed.
Starting with 2.8, Redis includes a partial replication mechanism so a slave can reconnect to the master, and if some conditions are met (like a transient connection drop), asks the master to resynchronize without having to dump the whole memory instance.
In order to support this feature, the master has to buffer and keep a backlog of commands, so they can be served to the slaves at any time if needed. If the slave is too far behind the master, the backlog may no longer contain the required data; in that case, a normal full synchronization is done, as in previous versions.
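The backlog mechanism described here can be sketched as a bounded buffer plus an offset check. This is a simplified model of the idea behind PSYNC, not Redis's actual implementation; sizes and names are illustrative:

```python
# Sketch of Redis partial resynchronization: the master keeps a bounded
# replication backlog; a reconnecting slave presents its offset, and the
# master serves only the missed delta if it is still in the backlog,
# otherwise it forces a full synchronization.
from collections import deque

class ReplMaster:
    def __init__(self, backlog_size=4):
        self.offset = 0                          # commands produced so far
        self.backlog = deque(maxlen=backlog_size)
    def write(self, cmd):
        self.backlog.append(cmd)                 # old entries fall off
        self.offset += 1
    def sync(self, slave_offset):
        missed = self.offset - slave_offset
        if 0 <= missed <= len(self.backlog):
            # partial resync: replay only the commands the slave missed
            delta = list(self.backlog)[len(self.backlog) - missed:]
            return ("partial", delta)
        return ("full", None)                    # backlog too short: dump

m = ReplMaster(backlog_size=4)
for c in ["SET a 1", "SET b 2", "SET c 3"]:
    m.write(c)
print(m.sync(2))    # ('partial', ['SET c 3'])
m.write("SET d 4"); m.write("SET e 5")           # backlog keeps last 4 only
print(m.sync(0))    # ('full', None): slave is too far behind
```

A brief transient disconnect leaves the slave's offset inside the backlog window (the partial case); a long outage pushes it outside, triggering the old full-dump path.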