I am using JBoss AMQ 7.1 / Apache ActiveMQ Artemis. When using replication as the HA policy for my cluster, the documentation says that all data synchronization is done over the network: all persistent data received by the master broker is synchronized to the slave when the master drops from the network, and a slave broker first needs to synchronize all existing data from the master broker before it is capable of replacing it.
Per my understanding, if the master broker crashes instead of being shut down by an administrator, no persistent data can be synced; therefore messages persisted in the master's journal will be lost if the disk used by the journal is broken. Am I right?
Your understanding is not correct.
All persistent data received by the master broker is replicated to the slave when the master broker receives it so that when the master broker drops from the network (e.g. due to a crash) the slave can replace the master.
Replicating the data from the master to the slave when the master drops from the network would completely defeat the purpose of high availability.
Actually, if HA is configured as master/slave, whether network- or journal-replicated, a message received by the broker is FIRST replicated, and ONLY if that succeeds is it confirmed to the client as received.
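For reference, this is roughly what the replication ha-policy looks like in each broker's broker.xml for an Artemis/AMQ 7 master/slave pair (a minimal sketch; the cluster-connection and connector configuration it relies on is omitted):

    <!-- master broker.xml -->
    <ha-policy>
      <replication>
        <master>
          <check-for-live-server>true</check-for-live-server>
        </master>
      </replication>
    </ha-policy>

    <!-- slave broker.xml -->
    <ha-policy>
      <replication>
        <slave>
          <allow-failback>true</allow-failback>
        </slave>
      </replication>
    </ha-policy>

With this in place the slave receives every persistent write as the master processes it, which is what makes immediate promotion possible.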
Suppose I use the 'when-synced' policy for both ha-promote-on-failure and ha-promote-on-shutdown on an HA cluster.
If so, 'mirror to master' promotion will never occur, and the master queue is blocked if there are no synchronized mirrors on a controlled master shutdown.
That's what the documentation says.
https://www.rabbitmq.com/ha.html#cluster-shutdown
By default, RabbitMQ will refuse to promote an unsynchronised mirror
on controlled master shutdown (i.e. explicit stop of the RabbitMQ
service or shutdown of the OS) in order to avoid message loss; instead
the entire queue will shut down as if the unsynchronised mirrors were
not there.
If I use the 'when-synced' policy and no mirrors are synchronized at the time the master is shut down, then according to the documentation the master doesn't seem to shut down gracefully.
For me it seems like there are only two options.
Waiting for a master to be restored (regardless of how long it takes) if I use ‘when-synced’.
Abandoning all the messages that are not yet synchronized to mirrors (i.e. that exist only on the master) for availability if I use 'always'.
Really?
There's no option like "block the queue until one of the mirrors is fully synced, and then promote the synced mirror to the new master"?
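For context, both keys under discussion are applied through a policy; a sketch, assuming a hypothetical policy named ha-queues that mirrors everything:

    rabbitmqctl set_policy ha-queues "^" \
      '{"ha-mode":"all","ha-sync-mode":"automatic","ha-promote-on-shutdown":"when-synced","ha-promote-on-failure":"when-synced"}' \
      --apply-to queues

The behaviour you are asking about is then governed entirely by those two when-synced values.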
So the documentation for the "Replicated LevelDB Store" says:
The elected master broker node starts and accepts client connections. The other nodes go into slave mode, connect to the master, and synchronize their persistent state with it. The slave nodes do not accept client connections. All persistent operations are replicated to the connected slaves. If the master dies, the slave with the latest update gets promoted to become the master. The failed node can then be brought back online and it will go into slave mode.
So one chosen master exists, it accepts client connections, and the rest are replicated slave nodes that do not accept client connections. Fine.
So if the master dies it all works fine: a new master gets elected, clients disconnect, and they eventually connect to the new master. Awesome.
Now what happens if the master isn't dead from the perspective of ZooKeeper, but it's just NOT ACCESSIBLE from the clients? So a master is chosen and it's considered live (as I understand it, ZooKeeper only needs to be able to connect to it for it to be considered available), but the actual clients can't connect to it?
Sure, clients CAN connect to the other slave nodes, they just can't connect to the master. But the master won't ever be changed, as it's live. Is that how it works?
Not sure I understood it right.
LevelDB support in ActiveMQ is deprecated and has been for quite some time (years), so I'd suggest not bothering with it, as there is no support and there are plenty of open bugs that will not be fixed.
I'd suggest taking a look instead at ActiveMQ Artemis.
You understand it right, and it's a reasonable design.
Clients only communicate with the master, and slaves are just used for backup. If what you described really happens, perhaps caused by a network problem, then you should fix the network (or whatever else the cause is).
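As a practical aside (a sketch with hypothetical hostnames): because only the elected master accepts connections, clients normally list every node in a failover URI and keep retrying until they reach whichever node currently holds the master role:

    failover:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)?randomize=false

Connections to the slaves are simply refused, so a client that can see the slaves but not the master will keep cycling through the list rather than silently talking to a non-master.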
Redis's replication starts upon connection of a slave to the master. But after the initial replication is over, how does the slave continuously stay in sync with the master? I could not find any part of the documentation describing this mechanism. In particular, how can I measure the lag between the master and the slave?
After the initial replication, the master writes changes to internal buffers and sends them to the slave(s). From the replication page:
The master will then send to the slave all buffered commands. This is
done as a stream of commands and is in the same format of the Redis
protocol itself.
You can look at the full replication source code (this points to Redis version 3.0) on GitHub for the nitty-gritty details.
As far as latency is concerned, there is a page dedicated to latency troubleshooting and one dedicated to latency monitoring. These two pages contain a plethora of background information and techniques to troubleshoot/measure Redis latency. A simple place to start is by running redis-cli --latency -h 'host' -p 'port' from slave to master and/or master to slave.
I believe you can find that out by issuing INFO replication on the slave and examining the value of slave_repl_offset.
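To make the offset comparison concrete, a sketch (hostnames and numbers are illustrative): compare the master's master_repl_offset with the slave's slave_repl_offset; the difference is how many bytes of the replication stream the slave has not yet applied.

    $ redis-cli -h master-host INFO replication | grep master_repl_offset
    master_repl_offset:421906
    $ redis-cli -h slave-host INFO replication | grep slave_repl_offset
    slave_repl_offset:421766
    # lag = 421906 - 421766 = 140 bytes still in flight

Both fields appear in the replication section of INFO on Redis 2.8 and later.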
Starting with Redis 2.8, Redis added a feature named "partial resynchronization". I read the official documentation, but I don't understand it. Can anyone help me?
It is about master-slave replication.
The normal behavior of a Redis slave (set up via the SLAVEOF command or configuration) is to connect to the master, ask the master to accumulate master-slave traffic, request a complete dump (written to the filesystem) from the master, download this dump to the slave, load the dump, and finally replay the accumulated traffic until the slave catches up with the master.
This mechanism is quite robust but not very efficient at covering transient connection drops between the slave and the master. If the master-slave link is down for a couple of seconds, the slave will request a full resynchronization (involving a dump, etc.), even if only a few commands have been missed.
Starting with 2.8, Redis includes a partial replication mechanism, so a slave can reconnect to the master and, if some conditions are met (such as after a transient connection drop), ask the master to resynchronize without having to dump the whole in-memory instance.
In order to support this feature, the master has to buffer and keep a backlog of commands, so they can be served to the slaves at any time if needed. If the slave is too far behind the master, the backlog may no longer contain the required data. In that case, a normal full synchronization is done, as in previous versions.
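The size and lifetime of that backlog are configurable; a redis.conf sketch with illustrative values (the parameter names are from Redis 2.8+, the sizes are just examples):

    # How much of the replication stream to keep for partial resynchronization.
    repl-backlog-size 64mb
    # Release the backlog if no slave reconnects within this many seconds (0 = keep forever).
    repl-backlog-ttl 3600

A larger backlog lets a slave survive a longer disconnection without falling back to a full resync, at the cost of memory on the master.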
I am using ActiveMQ version 5.4 and I have a pure master/slave configuration. My slave is configured so that it starts its network transport connectors in the event of a failure. My clients are configured using the failover protocol, just like the docs say:
failover://(tcp://masterhost:61616,tcp://slavehost:61616)?randomize=false
When my master dies, the clients successfully fail over to the slave perfectly. The problem is that after I recover (i.e. stop the slave, copy over the data, restart the master, then restart the slave), the clients are still trying to connect to the slave (which does not have any open network connectors at that point). Thus, the clients never reconnect to the master after restarting it. Is this how it's supposed to work?
I've seen this as well. If you're using the PooledConnectionFactory, set an expiry timeout on the pooled connections via setExpiryTimeout. The API documentation here suggests that this will force reconnection to the master broker:
allow connections to expire, irrespective of load or idle time. This is useful with failover to force a reconnect from the pool, to reestablish load balancing or use of the master post recovery
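A minimal Java sketch of that suggestion (the 30-second timeout is just an example value):

    import javax.jms.Connection;
    import javax.jms.JMSException;

    import org.apache.activemq.ActiveMQConnectionFactory;
    import org.apache.activemq.pool.PooledConnectionFactory;

    public class PooledFailoverExample {
        public static void main(String[] args) throws JMSException {
            // Same failover URL the plain clients use.
            ActiveMQConnectionFactory amqFactory = new ActiveMQConnectionFactory(
                    "failover://(tcp://masterhost:61616,tcp://slavehost:61616)?randomize=false");

            PooledConnectionFactory pooled = new PooledConnectionFactory(amqFactory);
            // Expire pooled connections after 30 seconds so the pool eventually drops
            // the connection held against the slave and reconnects via failover,
            // picking up the restored master.
            pooled.setExpiryTimeout(30000L);

            Connection connection = pooled.createConnection();
            connection.start();
            // ... create sessions, producers and consumers as usual ...
            connection.close();
            pooled.stop();
        }
    }

Without the expiry, pooled connections live indefinitely, so the pool keeps handing out the connection that failed over to the slave.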