RabbitMQ clusturing `join_cluster`

RabbitMQ clusturing `join_cluster` - rabbitmq

I'm setting up a RabbitMQ cluster reading from its docs.
While setting it up, it joins Machine2 with Machine1 via command rabbitmqctl join_cluster rabbit#rabbit1. Now what is rabbit#rabbit1?
I know its user#hostname, but when I fire this command, it says Error: {cannot_discover_cluster,"Cannot cluster node with itself"}.
When I type-in the IP instead of hostname, it says Error: {cannot_discover_cluster,"The nodes provided are either offline or not running"}.
I've also added IP rabbit1 in the /etc/hosts file as well.
What exactly am I missing here?

Rabbit#rabbit1,
In this case the rabbit1 is the name of the computer/host where the rabbitmq server is present.
You can just use the name of the server like Rabbit#name_of_the_server where you want to do clustering with.
You can also see what is the name of the current rabbitmq host:
rabbitmqctl cluster_status
That will give you the name I mean host name.
And you need to make sure that before you do clustering you need to stop the rabbitmq server on that machine and then do clustering and then restart the rabbitmq node.
Check this link:
https://www.rabbitmq.com/clustering.html

"Cannot cluster node with itself" is true. You have to change cluster name for it to join in. Use set_cluster_name to change the cluster name on other nodes first, and then come back to this node and join it to newly named cluster. For example,
On node2,
`rabbitmqctl set_cluster_name rabbit#new`
Back on node1,
`rabbitmqctl stop_app`
`rabbitmqctl reset`
`rabbitmqctl join_cluster rabbit#new`
`rabbitmqctl start_app`
Quite simple way.

you are trying to join one to itself.
You have two possible errors:
error in /etc/hosts ( wrong alias )
you actual try to join the rabbit#rabbit1 to rabbit#rabbit1

Related

RabbitMQ cluster on a single machine

I want to create a three node RabbitMQ cluster on a single RHEL8 machine for testing purposes. I tried instructions given in RabbitMQ official guide and also tried to follow this guide.
The first node works fine and it's running. However, the second node cannot be started and throws up an error.
I used below commands as mentioned in the guide.
RABBITMQ_NODE_PORT=5672 RABBITMQ_NODENAME=rabbit rabbitmq-server -detached
RABBITMQ_NODE_PORT=5673 RABBITMQ_NODENAME=hare rabbitmq-server -detached
rabbitmqctl -n hare stop_app
This command throws up below error.
DIAGNOSTICS
attempted to contact: [hare#localhost]
hare#localhost:
connected to epmd (port 4369) on localhost
epmd reports: node 'hare' not running at all
other nodes on localhost: [rabbit]
On further inspection of logs, it seems like that this node tries to use the same ports used by the first node (e.g. MQTT port 1883).
I think I might have to use the other option of declaring /etc/rabbitmq/rabbitmq.conf. Mainly because it seems to give more options to change ports etc.
A sample config file resembling the one needed in my case or a link to a proper guide is highly appreciated.

You didn't specify, but you must have the MQTT plugin enabled for there to be a conflict on that port, correct?
The easiest work-around would be to have two configuration files specifying different ports for MQTT, AMQP and anything else. Then, use the RABBITMQ_CONFIG_FILE environment variable to point to the appropriate file:
RABBITMQ_NODE_PORT=5672 RABBITMQ_NODENAME=rabbit0 \
RABBITMQ_CONFIG_FILE=/path/to/rabbitmq-0.conf rabbitmq-server -detached
RABBITMQ_NODE_PORT=5673 RABBITMQ_NODENAME=rabbit1 \
RABBITMQ_CONFIG_FILE=/path/to/rabbitmq-1.conf rabbitmq-server -detached
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.

rabbtimqadmin - Could not connect: [Errno -2] Name or service not known

I have RabbitMQ installed on a CentOS 5.x server which I use for message passing between my programs. I've installed rabbitmqadmin following the directions on https://www.rabbitmq.com/management-cli.html and have used it on my servers in the past.
From what I can tell it looks like this particular server is misconfigured. My web-searches have failed me on trying to get more information on how to troubleshoot this issue.
The error:
[root#server ~]# python26 /usr/local/bin/rabbitmqadmin list nodes
*** Could not connect: [Errno -2] Name or service not known
[root#server ~]#
I have tried several different rabbitmqadmin commands and they give the same result. If I run the command without the extra params it displays the normal help dialog. I have this setup and working on several other servers.
Any idea on what the root issue is? If not, anyway to get more details, like verbose?
Update:
I just tried to check the version of rabbitmq and its yielding an error too:
[root#server ~]# rabbitmqctl status
Status of node rabbit#server ...
Error: unable to connect to node rabbit#server: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit#server]
rabbit#server:
* connected to epmd (port 4369) on server
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* suggestion: hostname mismatch?
* suggestion: is the cookie set correctly?
current node details:
- node name: rabbitmqctl25451#server
- home dir: /var/lib/rabbitmq
- cookie hash: WXaeZT7XXm13naagfRX5cg==
[root#server ~]#
I'm going to see if I can find something from this... I find this weird because the server is passing messages fine and can be monitored through the web console.
Erlang version:
[root#server rabbitmq]# erl -eval 'erlang:display(erlang:system_info(otp_release)), halt().' -noshell
"R14B04"
[root#server rabbitmq]#
Rabbitmq Version:
[root#server rabbitmq]# python26 /usr/local/bin/rabbitmqadmin --version
rabbitmqadmin 3.3.5
[root#server rabbitmq]#

After much digging and frustration, I found my problem... I'm posting the solution in case anyone else has a similar experience
Previously, I found that if you setup RabbitMQ on a linux server then change the hostname that it can break some of the rabbit configuration.
The awesome part about this problem is that someone changed the name of the server from all capital letters to lowercase...
I've solve this one of two ways:
Solution 1:
Revert the host name back to the previous name. So that rabbitmq references with the appended server name work again.
Solution 2:
If you want to keep the server name change, then you can create a rabbitmq-env.conf files in /etc/rabbitmq like:
NODENAME=rabbit#OLDHOSTNAME
If you aren't sure what your previous name was, you can reference it by doing an ls in your /var/lib/rabbitmq/mnesia/ folder. You'll then see a folder that matches the nodename you need to specify.
Reference: https://www.rabbitmq.com/man/rabbitmq-env.conf.5.man.html
UPDATE:
Host name is CaSE SeNSiTIve... had someone change a hostname on me and the only difference was the case... so took a while to notice...

Yesterday I've lost a few hours with this same problem and it was in a fresh install, so the problem was that the erlang cookie from my user and root user was different than the one from rabbitmq user.
Find out the HOME for the user rabbitmq:
# cat /etc/passwd | grep rabbitmq
Check if the cookies differs from each other:
# vimdiff /var/lib/rabbitmq/.erlang.cookie ~/.erlang.cookie
If they are different, copy the cookie from rabbitmq for the user that you want to have access to the server:
# cp /var/lib/rabbitmq/.erlang.cookie ~/.erlang.cookie
References:
rabbitmqctl status says "TCP connection succeeded but Erlang distribution failed"
How Nodes (and CLI tools) Authenticate to Each Other: the Erlang Cookie

Auto reconnect to RabbitMQ cluster after server restart

I have master-slave configuration of RabbitMQ. As two Docker containers, with dynamic internal IP (changed on every restart).
Clustering works fine on clean run, but if one of servers got restarted it cannot reconnect to the cluster:
rabbitmqctl join_cluster --ram rabbit#master
Clustering node 'rabbit#slave' with 'rabbit#master' ...
Error: {ok,already_member}
And following:
rabbitmqctl cluster_status
Cluster status of node 'rabbit#slave' ...
[{nodes,[{disc,['rabbit#slave']}]}]
says that node not in a cluster.
Only way I found it remove this node, and only then try to rejoin cluster, like:
rabbitmqctl -n rabbit#master forget_cluster_node rabbit#slave
rabbitmqctl join_cluster --ram rabbit#master
That works, but doesn't look good for me. I believe there should be better way to rejoin cluster, than forgetting and join again. I see there is a command update_cluster_nodes also, but seems that this something different, not sure if it could help.
What is correct way to rejoin cluster on container restart?

I realize that this has been opened for a year but I though I would answer just in case it might help someone.
I believe that this issue has been resolved in a recent RabbitMQ release.
I implemented a Dockerized RabbitMQ Cluster using the Rabbit management 3.6.5 image and my nodes are able to auto rejoin the cluster on container or Docker host restart.

RabbitMQ - AWS EC2 Clustering hell

Sorry, should be shot for having to even ask this, but wasted day on this - and feel like I've read everything there is.
I can't create a cluster on my EC2 instances (3) that are spread on three different regions. The hosts:
rabbit#ip-172-31-47-217
rabbit#ip-172-31-1-82
rabbit#ip-172-31-36-111
The initial state before trying to make the cluster:
ubuntu#ip-172-31-47-217:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit#ip-172-31-47-217' ...
[{nodes,[{disc,['rabbit#ip-172-31-47-217']}]},
{running_nodes,['rabbit#ip-172-31-47-217']},
{partitions,[]}]
ubuntu#ip-172-31-36-111:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit#ip-172-31-36-111' ...
[{nodes,[{disc,['rabbit#ip-172-31-36-111']}]},
{running_nodes,['rabbit#ip-172-31-36-111']},
{partitions,[]}]
ubuntu#ip-172-31-1-82:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit#ip-172-31-1-82' ...
[{nodes,[{disc,['rabbit#ip-172-31-1-82']}]},
{running_nodes,['rabbit#ip-172-31-1-82']},
{partitions,[]}]
When I try to check status from one server for another:
sudo rabbitmqctl status -n rabbit#ip-172-31-1-82
Status of node 'rabbit#ip-172-31-1-82' ...
Error: unable to connect to node 'rabbit#ip-172-31-1-82': nodedown
nodes in question: ['rabbit#ip-172-31-1-82']
hosts, their running nodes and ports:
- unable to connect to epmd on ip-172-31-1-82: timeout (timed out)
current node details:
- node name: 'rabbitmqctl3835#ip-172-31-36-111'
- home dir: /var/lib/rabbitmq
- cookie hash: 0tsf/OyQZI7zobmv1Ia97w==
All three servers have the same erlang cookie hash.
I can verify the host names are setup properly:
host ip-172-31-36-111
ip-172-31-36-111.us-west-2.compute.internal has address 172.31.36.111
I know the ports are open:
netstat -plten | grep beam
Because I opened all TCP and UDP at this point as a test, no change.
and finally if this would behave differently given those failures:
sudo rabbitmqctl join_cluster --ram rabbit#ip-172-31-1-82
Clustering node 'rabbit#ip-172-31-47-217' with 'rabbit#ip-172-31-1-82' ...
Error: {cannot_discover_cluster,"The nodes provided are either offline or not running"}
Please help, being driven insane by this.

The problem is that they are in different regions (presumably in EC2-classic - you didn't mention whether you were using a VPC). This means they cannot communicate via their private IPs (see e.g. Can EC2 instances in different regions communicate over their private IP addresses?)
ping 172.31.36.111
will fail from one of the other servers, for example. Pinging using the hostname probably will probably even fail on the DNS lookup.
Your options are:
Put them in separate zones in a single region (in EC2 classic, they will be able to communicate). You could also use a VPC in this case, putting the in separate subnets but allowing interconnections via appropriately set up security groups.
Set up /etc/hosts on each server to point the relevant public IPs of the other servers (you could attach elastic IPs to each server to ensure stability across server restarts). You could also set the hostname of each server for clarity. Set you your security groups to allow access on the relevant ports that rabbitmq uses. There may be security implications of doing this, since the data will be travelling over the public internet.
Set up a VPN between each server in the cluster. Amazon VPC has a VPN facility, but there are ways of setting it up yourself I think.
I think only option 1 is simplest. Option 2 has major security implications (I believe there are ways of securing the connection between the cluster servers, but they aren't documented on the rabbitmq website as far as I can tell). Option 3 is complex but probably the best option if you need multiple regions.
Note that rabbitmq clusters aren't meant to be run over wide geographical areas, since they aren't too reliable in the face of network partitions. See here: https://www.rabbitmq.com/clustering.html

Rabbit will not cluster on ec2

I am having server issues with getting rabbit to cluster.
I boot up two nodes on ec2.
On the the first node booted I do this.
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
I boot another node.
sudo service rabbitmq-server stop
#Copy cookie from the first server booted
sudo su - -c 'echo -n "cookie" > /var/lib/rabbitmq/.erlang.cookie'
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl cluster rabbit#server1
1) sever1 is running
2) What ports to need open? I have 22, 4369, 5672
sudo rabbitmqctl cluster rabbit#aws-rabbit-server-east-development-20121102162143
Clustering node 'rabbit#aws-rabbit-server-east-development-20121103033005' with ['rabbit#aws-rabbit-server-east-development-20121102162143'] ...
Error: {no_running_cluster_nodes,['rabbit#aws-rabbit-server-east-development-20121102162143'],
['rabbit#aws-rabbit-server-east-development-20121102162143']}
What could possibility be missing from there docs or what what am I missing?

I had a similar problem on EC2 with two windows machines. I eventually got it working but I'm not sure I did it in the correct way so there may be a better solution.
The issue I found was that the two nodes could not see each other when trying to cluster. Each time you start a Rabbit node it seemed to be assigned a port number dynamically.
This obviously makes it very difficult to know which port to open up in the security group so to solve this, I restricted the range of ports Rabbit chose from when assigning the port. I restricted this to a range of 1 port on each node so I always know which port was being assigned.
The easiest way I found to do this was by editing the sbin\rabbitmq-service.bat file.
find the line -kernel inet_default_connect_options "[{nodelay,true}]" ^
add the following two lines to the file underneath:
-kernel inet_dist_listen_min ##### ^
-kernel inet_dist_listen_max ##### ^
replacing ##### with your chosen port number.
So you should now open up the following ports:
5672 - RabbitMQ’s listening port
4369 - Erlang Port Mapper Daemon
##### - the chosen port number for the Erlang nodes to communicate via
Because Erlang does not recognise FQDNs you may need to modify the hosts file on all the servers to make sure they are all able to resolve all the Erlang node name to an IP address, e.g.
123.123.123.111 NODE1
123.123.123.222 NODE2
once this is done you should then be able to see each node from the other. you can do this by using calling the following from the command line (replacing rabbit#NODE2 with whichever node you want to see)
rabbitmqctl status -n rabbit#NODE2
Hope this give you some help, I'm no expert but found this got things working for me!

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas