I am using ActiveMQ with the failover protocol and a MySQL database as the data store. When I start both brokers, one broker becomes active (as master) and the other broker (as slave) stays inactive, but after a certain time the slave wakes up and becomes master. I am unable to find out why this happens.
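For reference, a minimal sketch (not the poster's actual setup) of what such a JDBC master/slave pair can look like when a broker is configured programmatically rather than through activemq.xml; the broker name, MySQL URL and credentials below are placeholder assumptions. Both brokers point at the same database, and whichever one acquires the database lock first serves clients as the master while the other blocks as a slave until the lock becomes free.

import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.store.jdbc.JDBCPersistenceAdapter;
import com.mysql.cj.jdbc.MysqlDataSource; // Connector/J 8 class; adjust to your driver version

public class JdbcMasterSlaveBroker {
    public static void main(String[] args) throws Exception {
        // Placeholder MySQL connection details - both brokers must use the same schema.
        MysqlDataSource dataSource = new MysqlDataSource();
        dataSource.setURL("jdbc:mysql://dbhost:3306/activemq?relaxAutoCommit=true");
        dataSource.setUser("activemq");
        dataSource.setPassword("secret");

        // The JDBC store doubles as the master/slave lock: the broker holding the
        // database lock is the master, the other broker waits for the lock.
        JDBCPersistenceAdapter jdbc = new JDBCPersistenceAdapter();
        jdbc.setDataSource(dataSource);

        BrokerService broker = new BrokerService();
        broker.setBrokerName("brokerA"); // the second broker would use e.g. "brokerB"
        broker.setPersistenceAdapter(jdbc);
        broker.addConnector("tcp://0.0.0.0:61616");
        broker.start();
        broker.waitUntilStopped();
    }
}

In this design the database lock alone decides which broker is master, so clients use a failover: URI listing both brokers and simply follow whichever one is currently accepting connections.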
Suppose I use the 'when-synced' policy for both ha-promote-on-failure and ha-promote-on-shutdown on an HA cluster.
If so, mirror-to-master promotion will never occur, and the master queue is blocked if there are no synchronized mirrors at the time of a controlled master shutdown.
That's what the documentation says.
https://www.rabbitmq.com/ha.html#cluster-shutdown
By default, RabbitMQ will refuse to promote an unsynchronised mirror on controlled master shutdown (i.e. explicit stop of the RabbitMQ service or shutdown of the OS) in order to avoid message loss; instead the entire queue will shut down as if the unsynchronised mirrors were not there.
If I use the 'when-synced' policy and no mirrors are synchronized at the time the master shuts down, then, according to the documentation, the master doesn't seem to shut down gracefully.
It seems to me there are only two options.
Waiting for the master to be restored (regardless of how long it takes) if I use 'when-synced'.
Abandoning all the messages that are not yet synchronized to the mirrors (i.e. that exist only on the master) for the sake of availability if I use 'always'.
Really?
There's no option like "blocking the queue until one of the mirrors is fully synced, and then promoting that synced mirror to be the new master"?
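For context, a sketch of how such a policy can be declared through the RabbitMQ management HTTP API from Java; the host, credentials, vhost, policy name and queue pattern are placeholder assumptions, and the management plugin is assumed to be listening on port 15672.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class SetHaPolicy {
    public static void main(String[] args) throws Exception {
        // PUT /api/policies/{vhost}/{name} creates or updates a policy.
        // The pattern and definition values below mirror the options discussed above.
        String body = "{"
                + "\"pattern\": \"^ha\\\\.\","
                + "\"apply-to\": \"queues\","
                + "\"definition\": {"
                + "\"ha-mode\": \"all\","
                + "\"ha-sync-mode\": \"automatic\","
                + "\"ha-promote-on-shutdown\": \"when-synced\","
                + "\"ha-promote-on-failure\": \"when-synced\""
                + "}}";

        String auth = Base64.getEncoder().encodeToString("guest:guest".getBytes());
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:15672/api/policies/%2F/ha-when-synced"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode()); // a 2xx status means the policy was stored
    }
}

With both keys set to 'when-synced', the trade-off described above applies: no unsynchronised mirror will ever be promoted, whether the master fails or is shut down deliberately.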
So the documentation for the "Replicated LevelDB Store" says:
The elected master broker node starts and accepts client connections. The other nodes go into slave mode, connect to the master, and synchronize their persistent state with it. The slave nodes do not accept client connections. All persistent operations are replicated to the connected slaves. If the master dies, the slave with the latest update gets promoted to become the master. The failed node can then be brought back online and it will go into slave mode.
So one chosen master exists; it accepts client connections, and the rest are replicating slave nodes that do not accept client connections. Fine.
So if the master dies it all works fine: a new master gets elected, clients disconnect, and they eventually connect to the new master. Awesome.
Now what happens if the master isn't dead from the perspective of ZooKeeper, but is just NOT ACCESSIBLE from the clients? So a master is chosen and it's considered live (as I understand it, ZooKeeper only needs to be able to connect to it for it to be considered available), but the actual clients can't connect to it?
Sure, clients CAN connect to the other slave nodes, they just can't connect to the master. But the master won't ever be changed, since it's live. Is that how it works?
Not sure I understood it right.
LevelDB support in ActiveMQ is deprecated, and has been for quite some time (years), so I'd suggest not bothering with it, as there is no support and plenty of open bugs that will not be fixed.
I'd suggest taking a look instead at ActiveMQ Artemis.
You understand it right, and it's a reasonable design.
Clients only communicate with the master, and the slaves are just used for backup. If what you described really happens, it is probably caused by a network problem, and you should fix the network (or whatever other cause there may be).
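To make the "clients only communicate with the master" point concrete, here is a minimal JMS client sketch; the node host names and queue name are placeholder assumptions. The failover URI lists every broker in the group, and because the slaves refuse client connections, the transport only ever ends up connected to whichever node is currently the master and fails over when that changes.

import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ReplicatedLevelDbClient {
    public static void main(String[] args) throws Exception {
        // List all nodes; only the current master accepts the connection.
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
                "failover:(tcp://node1:61616,tcp://node2:61616,tcp://node3:61616)?randomize=false");

        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(session.createQueue("TEST.QUEUE"));
        TextMessage message = session.createTextMessage("hello");
        producer.send(message);
        connection.close();
    }
}

In the scenario described in the question (master reachable from ZooKeeper but not from clients), such a client would simply keep retrying against all three addresses, because the slaves keep refusing connections and no re-election is triggered.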
I have 3 ActiveMQ brokers in a networked shared-file-system (GlusterFS) master/slave configuration, all running in VMs.
If the master fails the client should failover to the new master.
The issue I have is that the connection to the new master takes about 50 seconds.
Is that reasonable?
How to improve it?
My client connection looks like this:
failover:(tcp://a1:61616?connectionTimeout=1000,tcp://a2:61616?connectionTimeout=1000,tcp://a3:61616?connectionTimeout=1000)?randomize=false&maxReconnectDelay=10000&backup=true
Also, when I disconnect the master by unplugging its network cable, it stops and throws an exception regarding the KahaDB (which is on GlusterFS) and needs to be restarted.
Is there a workaround for this behavior so that the master broker auto-restarts, or reconnects automatically once the network comes back?
The failover time depends on how long the underlying file system takes to release the file lock.
In your case, the NFS cluster waits 50 s to detect that the first node is lost and only then releases the lock on the KahaDB file, which can then be taken by the second node.
You can customize this delay with the NFSD_V4_GRACE and NFSD_V4_LEASE parameters in the NFS server configuration file (/etc/sysconfig/nfs on Red Hat/CentOS systems).
You can also customize the KahaDB lockKeepAlivePeriod; see http://activemq.apache.org/pluggable-storage-lockers.html
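A minimal sketch of that broker-side knob, assuming the broker is configured programmatically rather than through activemq.xml; the KahaDB directory on the GlusterFS mount and the keep-alive period are placeholder values.

import java.io.File;
import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter;

public class SharedKahaDbBroker {
    public static void main(String[] args) throws Exception {
        // KahaDB lives on the shared GlusterFS/NFS mount; the default shared-file
        // locker guards it, and lockKeepAlivePeriod controls how often the broker
        // re-asserts that it still holds the lock.
        KahaDBPersistenceAdapter kahaDb = new KahaDBPersistenceAdapter();
        kahaDb.setDirectory(new File("/mnt/glusterfs/activemq/kahadb")); // placeholder path
        kahaDb.setLockKeepAlivePeriod(5000); // milliseconds; tune together with the NFS lease/grace times

        BrokerService broker = new BrokerService();
        broker.setPersistenceAdapter(kahaDb);
        broker.addConnector("tcp://0.0.0.0:61616");
        broker.start();
        broker.waitUntilStopped();
    }
}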
I want to understand ZooKeeper's role in replicated LevelDB for the ActiveMQ broker.
About the ZooKeeper election: out of all the clients connected to ZooKeeper, how does ZooKeeper know which ones are ActiveMQ brokers competing to become the master? Is there a particular key or configuration passed by all the brokers connecting to ZooKeeper that says these (let's say 3) ActiveMQ brokers belong to the same environment and are competing to become the master?
At what interval does a slave broker copy data from the master broker? Are there any corner cases where data might be lost?
Does ActiveMQ guarantee message ordering when using replicated LevelDB? I am talking about the case where re-election of the master happens while a producer is sending messages in sequence to the broker.
Thanks,
Anuj
By the zkPath in the ZooKeeper configuration and by the broker name.
Each message is synced to a quorum (nodes/2 + 1) of brokers before the transaction completes, so there is no sync interval; it's synced in real time. For example, with 3 brokers the quorum is 2, so each write has to reach at least 2 brokers before the send completes. The cluster will not function unless you have a quorum of brokers online, so there should be no data loss.
The messages are synced to a majority of the nodes synchronously. At re-election, a node with the latest updates will be elected, so ordered messages should be no problem. However, it's generally problematic to rely critically on message ordering in a message queue. As a rule of thumb, message order only holds on "happy days": dead letters, multiple consumers and so forth may well mess up the order.
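For completeness (keeping in mind that LevelDB is deprecated, as another answer here points out), this is roughly how the grouping by zkPath and broker name looks when the replicated store is set up programmatically. The sketch assumes the ElectingLevelDBStore bean setters that back the usual replicatedLevelDB XML attributes, and every host name, path and directory is a placeholder.

import java.io.File;
import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.leveldb.replicated.ElectingLevelDBStore;

public class ReplicatedLevelDbBroker {
    public static void main(String[] args) throws Exception {
        // Brokers that share the same zkPath AND the same broker name form one
        // replication group and elect a single master among themselves.
        ElectingLevelDBStore store = new ElectingLevelDBStore();
        store.setDirectory(new File("/var/lib/activemq/leveldb")); // placeholder
        store.setReplicas(3);                                      // quorum = 3/2 + 1 = 2
        store.setZkAddress("zk1:2181,zk2:2181,zk3:2181");          // placeholder ZooKeeper ensemble
        store.setZkPath("/activemq/leveldb-stores");               // grouping key #1
        store.setBind("tcp://0.0.0.0:61619");                      // replication channel between brokers
        store.setHostname("broker1.example.com");                  // placeholder

        BrokerService broker = new BrokerService();
        broker.setBrokerName("my-broker");                         // grouping key #2: identical on all three nodes
        broker.setPersistenceAdapter(store);
        broker.addConnector("tcp://0.0.0.0:61616");
        broker.start();
        broker.waitUntilStopped();
    }
}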
I am using ActiveMQ version 5.4 and I have a pure master/slave configuration. My slave is configured such that it starts its network transport connectors in the event of a failure. My clients are configured using the failover protocol, just like the docs say:
failover://(tcp://masterhost:61616,tcp://slavehost:61616)?randomize=false
When my master dies, the clients successfully fail over to the slave perfectly. The problem is that after I recover (i.e. stop the slave, copy over the data, restart the master, then restart the slave), the clients are still trying to connect to the slave (which does not have any open network connectors at that point). Thus, the clients never reconnect to the master after restarting it. Is this how it's supposed to work?
I've seen this as well. If you're using the PooledConnectionFactory, set an expiry timeout on the pooled connections via setExpiryTimeout. The API documentation here suggests that this will force reconnection to the master broker:
allow connections to expire, irrespective of load or idle time. This is useful with failover to force a reconnect from the pool, to reestablish load balancing or use of the master post recovery
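A minimal sketch of that suggestion, assuming the activemq-pool PooledConnectionFactory and the same failover URI as in the question; the 30-second expiry is an arbitrary placeholder.

import javax.jms.Connection;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.pool.PooledConnectionFactory;

public class ExpiringPooledConnections {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory amqFactory = new ActiveMQConnectionFactory(
                "failover://(tcp://masterhost:61616,tcp://slavehost:61616)?randomize=false");

        PooledConnectionFactory pooled = new PooledConnectionFactory();
        pooled.setConnectionFactory(amqFactory);
        // Expire pooled connections after 30s so that, once the master is back,
        // new borrows open a fresh failover connection instead of reusing the
        // connection that is still pinned to the slave.
        pooled.setExpiryTimeout(30_000L);

        Connection connection = pooled.createConnection();
        connection.start();
        // ... use the connection ...
        connection.close(); // returns the connection to the pool
        pooled.stop();
    }
}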