ActiveMQ failover protocol not reconnecting to master after restarting - activemq

I am using ActiveMQ 5.4 with a pure master/slave configuration. My slave is configured so that it starts its network transport connectors in the event of a failure. My clients are configured using the failover protocol, just like the docs say:
failover://(tcp://masterhost:61616,tcp://slavehost:61616)?randomize=false
When my master dies, the clients fail over to the slave without problems. The problem is that after I recover (i.e. stop the slave, copy over the data, restart the master, then restart the slave), the clients keep trying to connect to the slave (which does not have any open network connectors at that point). Thus, the clients never reconnect to the master after it is restarted. Is this how it's supposed to work?

I've seen this as well. If you're using the PooledConnectionFactory, set an expiry timeout on the pooled connections via setExpiryTimeout. The API documentation suggests that this will force reconnection to the master broker:
allow connections to expire, irrespective of load or idle time. This is useful with failover to force a reconnect from the pool, to reestablish load balancing or use of the master post recovery
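To make the idea behind an expiry timeout concrete, here is a minimal conceptual sketch in Python. It is not the ActiveMQ API: `ExpiringPool`, the `is_up` callback, and the injectable `clock` are hypothetical names invented for illustration. It only shows why expiring a pooled connection regardless of load makes the next borrow walk the broker list again in order (as with randomize=false) and land back on the master once it has recovered.

```python
import time
from dataclasses import dataclass


@dataclass
class PooledConnection:
    broker: str        # which broker this connection points at
    created_at: float  # clock value when the connection was opened


class ExpiringPool:
    """Toy single-connection pool (hypothetical, not PooledConnectionFactory)."""

    def __init__(self, brokers, is_up, expiry_seconds, clock=time.monotonic):
        self.brokers = list(brokers)      # priority order: master first
        self.is_up = is_up                # assumed health callback: broker -> bool
        self.expiry_seconds = expiry_seconds
        self.clock = clock
        self._conn = None

    def _connect(self):
        # Try brokers in listed order, like failover with randomize=false.
        for broker in self.brokers:
            if self.is_up(broker):
                return PooledConnection(broker, self.clock())
        raise ConnectionError("no broker reachable")

    def borrow(self):
        # Expire on age alone, irrespective of load or idle time.
        expired = (
            self._conn is not None
            and self.expiry_seconds > 0
            and self.clock() - self._conn.created_at >= self.expiry_seconds
        )
        if self._conn is None or expired:
            self._conn = self._connect()
        return self._conn
```

With an expiry of 0 (no expiry) this toy pool would keep handing out the connection to the slave forever; with a positive expiry, the first borrow after the timeout reconnects and prefers the master if it is back.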

Related

What happens if MASTER node is inaccessible by the clients in "Replicated levelDB Store" in ActiveMQ?

So the documentation to the "Replicated LevelDB Store" says:
The elected master broker node starts and accepts client connections. The other nodes go into slave mode and connect the the master and synchronize their persistent state /w it. The slave nodes do not accept client connections. All persistent operations are replicated to the connected slaves. If the master dies, the slaves with the latest update gets promoted to become the master. The failed node can then be brought back online and it will go into slave mode.
So exactly one elected master exists and accepts client connections; the rest are replicated slave nodes that do not accept client connections. Fine.
So if the master dies, everything works: a new master gets elected, clients disconnect, and they eventually connect to the new master. Awesome.
Now what happens if the master isn't dead from the perspective of ZooKeeper, but is just NOT ACCESSIBLE from the clients? A master is chosen and is considered live (as I understand it, ZooKeeper only needs to be able to connect to it for it to be considered available), but the actual clients can't connect to it.
Sure, clients CAN connect to the other slave nodes; they just can't connect to the master. But the master won't ever be changed because it's live. Is that how it works?
I'm not sure I understood it right.
LevelDB support in ActiveMQ has been deprecated for years, so I'd suggest not bothering with it: there is no support, and there are plenty of open bugs that will not be fixed.
I'd suggest taking a look instead at ActiveMQ Artemis.
You understand it right, and it's a reasonable design.
Clients only communicate with the master; slaves are just used for backup. If what you described really happens, perhaps because of a network problem, then you should fix the network (or whatever else is causing it).

MQTT and AWS ELB : How to make ELB _forget_ to which node a client was previously connected to?

I have a cluster of 2 RabbitMQ nodes (each running version 3.6.10 of RabbitMQ with MQTT plugin enabled) and an AWS classic load balancer in front of them. Server and clients exchange MQTT messages.
Clients (apps running on mobile devices and using Eclipse Paho client lib) connect to the load balancer which distributes connections in round-robin fashion.
When I bring down one node, say Node1, all clients that were connected to Node1 get a callback indicating connection to the broker is lost.
These clients try to reconnect but the connection attempt fails indicating the broker is not reachable.
From AWS console I can see that AWS ELB detects that Node1 is down and marks it as "OutOfService".
Connection requests from new clients are routed to the "InService" node Node2; however, connection requests from existing clients that were previously connected to Node1 always fail!
ELB is configured with idle timeout of 180 seconds. Enabling or disabling connection draining in ELB did not make any difference.
Is there any specific configuration to make ELB forget that the existing clients were connected to Node1 and allow them to connect to Node2?
I tried adding the following HA policy:
rabbitmqctl set_policy ha-mqtt "^mqtt" '{"ha-mode":"exactly","ha-params":2,"ha-sync-mode":"automatic"}'
With this policy in place, all queues created for MQTT clients were mirrored. Now when Node1 is down, connection attempts from existing client IDs also get routed to the other active node!
This makes me wonder: what is the relationship between MQTT client IDs and their connections to broker nodes? I thought mirroring of queues was necessary only to retain and access messages that were not yet acknowledged when the queue master node goes down. But without mirroring, the clients were not even able to establish a connection!

ActiveMQ takes a long time to failover

I have 3 ActiveMQ brokers in a networked Shared File System(GlusterFS)/Master Slave configuration - all in VMs.
If the master fails the client should failover to the new master.
The issue I have is that the connection to the new master takes about 50 seconds.
Is that reasonable?
How to improve it?
My client connection looks like this
failover:(tcp://a1:61616?connectionTimeout=1000,tcp://a2:61616?connectionTimeout=1000,tcp://a3:61616?connectionTimeout=1000)?randomize=false&maxReconnectDelay=10000&backup=true
Also, when I take the master offline by disconnecting its network cable, it stops and throws an exception regarding the KahaDB (which is on GlusterFS) and needs to be restarted.
Is there a workaround for this behavior so the master broker auto-restarts or is able to connect automatically once the network comes back?
The failover time depends on how long the underlying file system takes to release the file lock.
In your case, the NFS cluster waits 50 s to detect that the first node is lost before releasing the lock on the KahaDB file, which can then be taken by the second node.
You can customize this delay with the NFSD_V4_GRACE and NFSD_V4_LEASE parameters in the NFS server configuration file (/etc/sysconfig/nfs on redhat/centos systems).
You can also customize the kahadb lockKeepAlivePeriod, see http://activemq.apache.org/pluggable-storage-lockers.html

rabbitmq with ha req/reply

I have the following scenario that I want to fulfill:
RabbitMQ must be load balanced. Is that something RabbitMQ provides out of the box, or would a load balancer such as HAProxy work well? Which option balances load better?
Can HAProxy push messages directly to RabbitMQ? Say a POST request to http://localhost:3333/redirectToRabbit gets redirected to RabbitMQ, and optionally either the ACK or the RESPONSE goes back to the client. Note that HAProxy would also load balance the requests.
With HA, what is the best configuration (an exchange with a durable queue, a durable queue alone, or something else)? NOTE: how would messages get redirected to another RabbitMQ instance if one of the instances goes down (persisted, with automatic redirection to an available instance)?
Assume you set up a two-node RabbitMQ cluster. Before talking about HAProxy, you need to understand the HA policies and the behavior of HA queues. Different HA options can lead to completely different message replication and node failover behavior. RabbitMQ is very flexible, so don't expect a single golden configuration that fits all scenarios.
Then, since you have two nodes that can accept connections, your client can either go through a load balancer (such as HAProxy) or use a client driver that supports connecting to multiple nodes of a cluster. Either way will work.
When using HAProxy, you have one load balancer IP. The client connects only to this IP, and the load balancer forwards the connection to one of the underlying nodes. Once a connection is created, however, that client connection keeps talking to the same node. When one of the nodes is down and no health checking is configured in your load balancer, clients might get random connection failures. With health checking configured correctly, the load balancer knows which nodes are down, so clients will only be connected to healthy nodes, which solves the issue.
When not using a load balancer and relying only on the client driver to connect to the nodes, the client driver should handle connection failures or health checks internally and do failover, retries, etc., to ensure connections go to healthy nodes.
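The client-side approach can be sketched as follows. This is only an illustration of the retry/failover loop described above, not the behavior of any particular RabbitMQ client library: `connect_with_failover` and its parameters are hypothetical, and the `connect` callable stands in for whatever function your driver uses to open a connection to a single node.

```python
import time


def connect_with_failover(nodes, connect, max_rounds=3,
                          base_delay=0.5, max_delay=10.0, sleep=time.sleep):
    """Try each node in order; back off between full rounds.

    `connect` is an assumed callable taking one node and returning a
    connection, raising OSError (e.g. ConnectionRefusedError) on failure.
    """
    delay = base_delay
    last_error = None
    for round_no in range(max_rounds):
        for node in nodes:
            try:
                return connect(node)      # first healthy node wins
            except OSError as exc:
                last_error = exc          # remember and try the next node
        if round_no < max_rounds - 1:
            sleep(delay)                  # all nodes failed: wait, then retry
            delay = min(delay * 2, max_delay)
    raise ConnectionError(
        f"all nodes failed after {max_rounds} rounds"
    ) from last_error
```

Passing `sleep` as a parameter is a design choice that keeps the loop easy to test; in production you would leave the default. An application would call this again from its connection-lost callback, so that a client previously attached to a failed node moves to a healthy one.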

Redis - Tomcat Session Manager : Read from Slave

I am using Redis (Redis 3.1) as the session store for Tomcat (Tomcat 7). To ensure high availability, there is a Sentinel setup and two Redis server instances (master and slave). The slave is configured as read-only. After running a few tests and checking the statistics, I observed that no read requests are sent to the slave. All read requests are processed by the master alone.
Could you please let me know how I can make the slave serve the read requests?
You could use the Redis-based Tomcat Session Manager provided by Redisson. It lets you choose which type of node is used for read operations (master, slave, or both master and slave). It works well in Sentinel and Cluster modes.