I am setting up a RabbitMQ cluster on two nodes, node1 and node2, and trying to make node2 join node1's cluster.
What I did is the following:
1. Install RabbitMQ (and Erlang) separately on node1 and node2.
2. Run "rabbitmqctl stop_app" on node2, delete its .erlang.cookie, and copy the .erlang.cookie from node1 to node2.
3. Run "rabbitmqctl join_cluster --ram rabbit@node1". Now I get a connection error, "unable to connect to node rabbit@node2", which looks like a cookie issue.
If I copy back the old .erlang.cookie generated by node2, I get a connection error to rabbit@node1 instead (which makes sense, since I am supposed to copy node1's cookie to node2).
Is there anything I am doing wrong here?
Thanks
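In commands, what I ran on node2 (a sketch; the cookie path assumes a typical Linux install where it lives at /var/lib/rabbitmq/.erlang.cookie):
rabbitmqctl stop_app
# delete node2's cookie and put node1's copy in its place
sudo cp /path/to/node1/.erlang.cookie /var/lib/rabbitmq/.erlang.cookie
rabbitmqctl join_cluster --ram rabbit@node1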
If you install the nodes on different machines, you must make sure the machines are reachable from each other.
On Linux:
1) Update /etc/hosts on both machines so each node can resolve the other's hostname: on the slave, add an entry for the master; on the master, add an entry for the slave (see the example below).
2) Open TCP ports 1-20000 (or 1-65535) between the nodes.
This is usually the problem!
Good luck!
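For example, a sketch with placeholder addresses (substitute your own IPs and hostnames):
# /etc/hosts entries, mirrored on both machines:
192.168.0.101   node1
192.168.0.102   node2
# on node1, allow traffic from node2 (and the mirror rule on node2):
iptables -A INPUT -p tcp -s 192.168.0.102 --dport 1:20000 -j ACCEPT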
I understand that a failover in a non-cluster replication setup works only by manual intervention.
I've set up the following non-cluster replication (3 nodes in total):
        node1 (master)
         /          \
    node2          node3
  (slave1)        (slave2)
Let's say that node1 is broken.
So, I issue "replicaof no one" on node2, trying to make it the new master. (For this, a Redis restart is NOT required, right?)
In this example, am I right in assuming that I need to make the following changes, all manually?
1. Fix and reconfigure node1 to make it a slave of node2 (the new master)? A Redis restart is indeed required here.
2. Reconfigure node3 to make it a slave of node2 (the new master) instead of node1 (the original master)? A Redis restart is also required here.
I just want to verify that I understand it correctly.
Thank you in advance!
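For reference, here are the manual steps as commands (a sketch; the hostnames and the default port 6379 are placeholders):
# on node2: promote it to master
redis-cli -h node2 replicaof no one
# on node3: repoint it at the new master
redis-cli -h node3 replicaof node2 6379
# on node1, once fixed: rejoin as a slave of the new master
redis-cli -h node1 replicaof node2 6379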
When connecting to a Redis cluster with ioredis (https://github.com/luin/ioredis) you only need to specify one node. E.g., with a three-node cluster:
127.0.0.1:7000
127.0.0.1:7001
127.0.0.1:7002
You can connect simply with:
const Redis = require('ioredis');
const cluster = new Redis.Cluster([{
  port: 7000,
  host: '127.0.0.1'
}]);
If the :7000 node dies and you replace it with a different node, doing something like:
redis-trib.rb call 127.0.0.1:7001 cluster forget [node_id of :7000]
redis-trib.rb add-node 127.0.0.1:7003 127.0.0.1:7001
redis-trib.rb fix 127.0.0.1:7001
Will ioredis be able to continue working (accepting that the data from :7000 is lost)? Will it ever need to contact 127.0.0.1:7000 again, or is that node only needed for the initial connection?
From my experiments it does seem that this scenario works and the answer to my question is yes, but I want to check that this is expected and a supported situation.
When connecting to a cluster, ioredis asks the :7000 node for the cluster's node list, and after that it is able to discover the new node and handle the failover. So the answer is yes, provided :7000 dies after the node list has been fetched.
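You can see the node list that a client would fetch by asking any reachable node directly, e.g.:
redis-cli -p 7001 cluster nodes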
I'm setting up a RabbitMQ cluster following its docs.
While setting it up, Machine2 is joined to Machine1 via the command rabbitmqctl join_cluster rabbit@rabbit1. Now, what is rabbit@rabbit1?
I know it's user@hostname, but when I run this command, it says Error: {cannot_discover_cluster,"Cannot cluster node with itself"}.
When I type in the IP instead of the hostname, it says Error: {cannot_discover_cluster,"The nodes provided are either offline or not running"}.
I've also added the "IP rabbit1" entry to the /etc/hosts file.
What exactly am I missing here?
rabbit@rabbit1:
In this case rabbit1 is the name of the computer/host where the RabbitMQ server is running.
You can just use the name of the server you want to cluster with, as in rabbit@name_of_the_server.
You can also see the name of the current RabbitMQ host:
rabbitmqctl cluster_status
That will give you the node name, including the host name.
Also, make sure that before you cluster, you stop the RabbitMQ application on that machine, then do the clustering, and then restart the RabbitMQ node.
Check this link:
https://www.rabbitmq.com/clustering.html
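Concretely, the sequence on the joining machine looks like this (per the clustering guide above; rabbit1 is the host you are clustering with):
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl join_cluster rabbit@rabbit1
rabbitmqctl start_app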
"Cannot cluster node with itself" is true. You have to change cluster name for it to join in. Use set_cluster_name to change the cluster name on other nodes first, and then come back to this node and join it to newly named cluster. For example,
On node2,
`rabbitmqctl set_cluster_name rabbit@new`
Back on node1,
`rabbitmqctl stop_app`
`rabbitmqctl reset`
`rabbitmqctl join_cluster rabbit@new`
`rabbitmqctl start_app`
Quite a simple way.
You are trying to join a node to itself.
There are two possible errors:
an error in /etc/hosts (a wrong alias)
you are actually trying to join rabbit@rabbit1 to rabbit@rabbit1
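To tell the two cases apart, check from the joining machine that the alias resolves and that the remote node answers (a sketch; run this on the machine that is joining):
ping -c 1 rabbit1
rabbitmqctl -n rabbit@rabbit1 status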
Suppose I have a redis cluster with nodes 10.0.0.1, 10.0.0.2, 10.0.0.3 and 10.0.0.4, which I'm using as a cache.
Then, for whatever reason, node 10.0.0.4 fails and goes down. This brings down the entire cluster:
2713:M 13 Apr 21:07:52.415 * FAIL message received from [id1] about [id2]
2713:M 13 Apr 21:07:52.415 # Cluster state changed: fail
Which causes any query to be shut down with "CLUSTERDOWN The cluster is down".
However, since I'm using the cluster as a cache, I don't really care if a node goes down. A key can get resharded to a different node and lose its contents without affecting my application.
Is there a way to set up such an automated resharding?
I found something close enough to what I need.
By setting cluster-require-full-coverage to "no", the rest of the cluster will continue to respond to queries, although the client needs to handle the possibility of being redirected to a failing node.
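It is a one-line change in redis.conf on each node, and it can also be applied at runtime (a sketch, reusing the addresses above):
# in redis.conf on every node:
cluster-require-full-coverage no
# or at runtime, without a restart:
redis-cli -h 10.0.0.1 config set cluster-require-full-coverage no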
Then I can replace the broken node by running:
redis-trib.rb call 10.0.0.1:6379 cluster forget [broken_node_id]
redis-trib.rb add-node 10.0.0.5:6379 10.0.0.1:6379
redis-trib.rb fix 10.0.0.1:6379
Where 10.0.0.5:6379 is the node that will replace the broken one.
Assuming you have only master nodes in your current cluster, you will definitely get a cluster-down error, because there is no replica of the failed master, so Redis considers the cluster unsafe and raises an error.
Solution
Create a new node (create a redis.conf with the desired parameters; see the sketch below).
Join that node to the cluster:
redis-trib.rb add-node 127.0.0.1:6379 EXISTING_MASTER_IP:EXISTING_MASTER_PORT
Make the new node a slave of 10.0.0.4:
redis-cli -p 6379 cluster replicate NODE_ID_OF_TARGET_MASTER
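For the first step, a minimal redis.conf for a cluster node might look like this (all values are placeholders to adjust):
port 6379
cluster-enabled yes
cluster-config-file nodes-6379.conf
cluster-node-timeout 5000
appendonly yes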
To Test
First, be sure the cluster is in good shape (all slots are covered and the nodes agree about the configuration):
redis-trib.rb check 127.0.0.1:6379 (On any master)
Kill the process on 10.0.0.4.
Wait for the slave to become the new master. (This happens quickly; the slots assigned to 10.0.0.4 will be reassigned to the slave automatically.)
Check the cluster and be sure all slots have moved to the new master:
redis-trib.rb check 127.0.0.1:6379 (On any master)
No manual action is needed. Additionally, if you have more slaves in the cluster, they may be promoted to replace other failed masters as well. (E.g., with a setup of 3 masters and 3 slaves: Master1 goes down and Slave1 becomes the new master; if Slave1 later goes down, the recovered Master1 can become the master again.)
I have a master-slave configuration of RabbitMQ, running as two Docker containers with dynamic internal IPs (they change on every restart).
Clustering works fine on a clean run, but if one of the servers is restarted it cannot reconnect to the cluster:
rabbitmqctl join_cluster --ram rabbit@master
Clustering node 'rabbit@slave' with 'rabbit@master' ...
Error: {ok,already_member}
And the following:
rabbitmqctl cluster_status
Cluster status of node 'rabbit@slave' ...
[{nodes,[{disc,['rabbit@slave']}]}]
says that the node is not in a cluster.
The only way I found is to remove this node, and only then rejoin the cluster, like:
rabbitmqctl -n rabbit@master forget_cluster_node rabbit@slave
rabbitmqctl join_cluster --ram rabbit@master
That works, but it doesn't look right to me. I believe there should be a better way to rejoin a cluster than forgetting the node and joining again. I also see there is an update_cluster_nodes command, but that seems to be something different; I'm not sure if it could help.
What is the correct way to rejoin the cluster on container restart?
I realize that this has been open for a year, but I thought I would answer just in case it might help someone.
I believe that this issue has been resolved in a recent RabbitMQ release.
I implemented a Dockerized RabbitMQ cluster using the RabbitMQ management 3.6.5 image, and my nodes are able to rejoin the cluster automatically on container or Docker host restart.
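For anyone reproducing this, a sketch of such a setup (the image tag, names, and cookie are mine, not necessarily the poster's): give each container a stable hostname and a shared Erlang cookie, so node identity no longer depends on the container's IP.
docker network create rabbitnet
docker run -d --net rabbitnet --name master --hostname master \
  -e RABBITMQ_ERLANG_COOKIE='secret cookie' rabbitmq:3.6.5-management
docker run -d --net rabbitnet --name slave --hostname slave \
  -e RABBITMQ_ERLANG_COOKIE='secret cookie' rabbitmq:3.6.5-management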