I want to create a three node RabbitMQ cluster on a single RHEL8 machine for testing purposes. I tried instructions given in RabbitMQ official guide and also tried to follow this guide.
The first node works fine and it's running. However, the second node cannot be started and throws up an error.
I used below commands as mentioned in the guide.
RABBITMQ_NODE_PORT=5672 RABBITMQ_NODENAME=rabbit rabbitmq-server -detached
RABBITMQ_NODE_PORT=5673 RABBITMQ_NODENAME=hare rabbitmq-server -detached
rabbitmqctl -n hare stop_app
This command throws up below error.
DIAGNOSTICS
attempted to contact: [hare#localhost]
hare#localhost:
connected to epmd (port 4369) on localhost
epmd reports: node 'hare' not running at all
other nodes on localhost: [rabbit]
On further inspection of logs, it seems like that this node tries to use the same ports used by the first node (e.g. MQTT port 1883).
I think I might have to use the other option of declaring /etc/rabbitmq/rabbitmq.conf. Mainly because it seems to give more options to change ports etc.
A sample config file resembling the one needed in my case or a link to a proper guide is highly appreciated.
You didn't specify, but you must have the MQTT plugin enabled for there to be a conflict on that port, correct?
The easiest work-around would be to have two configuration files specifying different ports for MQTT, AMQP and anything else. Then, use the RABBITMQ_CONFIG_FILE environment variable to point to the appropriate file:
RABBITMQ_NODE_PORT=5672 RABBITMQ_NODENAME=rabbit0 \
RABBITMQ_CONFIG_FILE=/path/to/rabbitmq-0.conf rabbitmq-server -detached
RABBITMQ_NODE_PORT=5673 RABBITMQ_NODENAME=rabbit1 \
RABBITMQ_CONFIG_FILE=/path/to/rabbitmq-1.conf rabbitmq-server -detached
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
Related
I have RabbitMQ installed on a CentOS 5.x server which I use for message passing between my programs. I've installed rabbitmqadmin following the directions on https://www.rabbitmq.com/management-cli.html and have used it on my servers in the past.
From what I can tell it looks like this particular server is misconfigured. My web-searches have failed me on trying to get more information on how to troubleshoot this issue.
The error:
[root#server ~]# python26 /usr/local/bin/rabbitmqadmin list nodes
*** Could not connect: [Errno -2] Name or service not known
[root#server ~]#
I have tried several different rabbitmqadmin commands and they give the same result. If I run the command without the extra params it displays the normal help dialog. I have this setup and working on several other servers.
Any idea on what the root issue is? If not, anyway to get more details, like verbose?
Update:
I just tried to check the version of rabbitmq and its yielding an error too:
[root#server ~]# rabbitmqctl status
Status of node rabbit#server ...
Error: unable to connect to node rabbit#server: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit#server]
rabbit#server:
* connected to epmd (port 4369) on server
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* suggestion: hostname mismatch?
* suggestion: is the cookie set correctly?
current node details:
- node name: rabbitmqctl25451#server
- home dir: /var/lib/rabbitmq
- cookie hash: WXaeZT7XXm13naagfRX5cg==
[root#server ~]#
I'm going to see if I can find something from this... I find this weird because the server is passing messages fine and can be monitored through the web console.
Erlang version:
[root#server rabbitmq]# erl -eval 'erlang:display(erlang:system_info(otp_release)), halt().' -noshell
"R14B04"
[root#server rabbitmq]#
Rabbitmq Version:
[root#server rabbitmq]# python26 /usr/local/bin/rabbitmqadmin --version
rabbitmqadmin 3.3.5
[root#server rabbitmq]#
After much digging and frustration, I found my problem... I'm posting the solution in case anyone else has a similar experience
Previously, I found that if you setup RabbitMQ on a linux server then change the hostname that it can break some of the rabbit configuration.
The awesome part about this problem is that someone changed the name of the server from all capital letters to lowercase...
I've solve this one of two ways:
Solution 1:
Revert the host name back to the previous name. So that rabbitmq references with the appended server name work again.
Solution 2:
If you want to keep the server name change, then you can create a rabbitmq-env.conf files in /etc/rabbitmq like:
NODENAME=rabbit#OLDHOSTNAME
If you aren't sure what your previous name was, you can reference it by doing an ls in your /var/lib/rabbitmq/mnesia/ folder. You'll then see a folder that matches the nodename you need to specify.
Reference: https://www.rabbitmq.com/man/rabbitmq-env.conf.5.man.html
UPDATE:
Host name is CaSE SeNSiTIve... had someone change a hostname on me and the only difference was the case... so took a while to notice...
Yesterday I've lost a few hours with this same problem and it was in a fresh install, so the problem was that the erlang cookie from my user and root user was different than the one from rabbitmq user.
Find out the HOME for the user rabbitmq:
# cat /etc/passwd | grep rabbitmq
Check if the cookies differs from each other:
# vimdiff /var/lib/rabbitmq/.erlang.cookie ~/.erlang.cookie
If they are different, copy the cookie from rabbitmq for the user that you want to have access to the server:
# cp /var/lib/rabbitmq/.erlang.cookie ~/.erlang.cookie
References:
rabbitmqctl status says "TCP connection succeeded but Erlang distribution failed"
How Nodes (and CLI tools) Authenticate to Each Other: the Erlang Cookie
I have master-slave configuration of RabbitMQ. As two Docker containers, with dynamic internal IP (changed on every restart).
Clustering works fine on clean run, but if one of servers got restarted it cannot reconnect to the cluster:
rabbitmqctl join_cluster --ram rabbit#master
Clustering node 'rabbit#slave' with 'rabbit#master' ...
Error: {ok,already_member}
And following:
rabbitmqctl cluster_status
Cluster status of node 'rabbit#slave' ...
[{nodes,[{disc,['rabbit#slave']}]}]
says that node not in a cluster.
Only way I found it remove this node, and only then try to rejoin cluster, like:
rabbitmqctl -n rabbit#master forget_cluster_node rabbit#slave
rabbitmqctl join_cluster --ram rabbit#master
That works, but doesn't look good for me. I believe there should be better way to rejoin cluster, than forgetting and join again. I see there is a command update_cluster_nodes also, but seems that this something different, not sure if it could help.
What is correct way to rejoin cluster on container restart?
I realize that this has been opened for a year but I though I would answer just in case it might help someone.
I believe that this issue has been resolved in a recent RabbitMQ release.
I implemented a Dockerized RabbitMQ Cluster using the Rabbit management 3.6.5 image and my nodes are able to auto rejoin the cluster on container or Docker host restart.
Sorry, should be shot for having to even ask this, but wasted day on this - and feel like I've read everything there is.
I can't create a cluster on my EC2 instances (3) that are spread on three different regions. The hosts:
rabbit#ip-172-31-47-217
rabbit#ip-172-31-1-82
rabbit#ip-172-31-36-111
The initial state before trying to make the cluster:
ubuntu#ip-172-31-47-217:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit#ip-172-31-47-217' ...
[{nodes,[{disc,['rabbit#ip-172-31-47-217']}]},
{running_nodes,['rabbit#ip-172-31-47-217']},
{partitions,[]}]
ubuntu#ip-172-31-36-111:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit#ip-172-31-36-111' ...
[{nodes,[{disc,['rabbit#ip-172-31-36-111']}]},
{running_nodes,['rabbit#ip-172-31-36-111']},
{partitions,[]}]
ubuntu#ip-172-31-1-82:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit#ip-172-31-1-82' ...
[{nodes,[{disc,['rabbit#ip-172-31-1-82']}]},
{running_nodes,['rabbit#ip-172-31-1-82']},
{partitions,[]}]
When I try to check status from one server for another:
sudo rabbitmqctl status -n rabbit#ip-172-31-1-82
Status of node 'rabbit#ip-172-31-1-82' ...
Error: unable to connect to node 'rabbit#ip-172-31-1-82': nodedown
nodes in question: ['rabbit#ip-172-31-1-82']
hosts, their running nodes and ports:
- unable to connect to epmd on ip-172-31-1-82: timeout (timed out)
current node details:
- node name: 'rabbitmqctl3835#ip-172-31-36-111'
- home dir: /var/lib/rabbitmq
- cookie hash: 0tsf/OyQZI7zobmv1Ia97w==
All three servers have the same erlang cookie hash.
I can verify the host names are setup properly:
host ip-172-31-36-111
ip-172-31-36-111.us-west-2.compute.internal has address 172.31.36.111
I know the ports are open:
netstat -plten | grep beam
Because I opened all TCP and UDP at this point as a test, no change.
and finally if this would behave differently given those failures:
sudo rabbitmqctl join_cluster --ram rabbit#ip-172-31-1-82
Clustering node 'rabbit#ip-172-31-47-217' with 'rabbit#ip-172-31-1-82' ...
Error: {cannot_discover_cluster,"The nodes provided are either offline or not running"}
Please help, being driven insane by this.
The problem is that they are in different regions (presumably in EC2-classic - you didn't mention whether you were using a VPC). This means they cannot communicate via their private IPs (see e.g. Can EC2 instances in different regions communicate over their private IP addresses?)
ping 172.31.36.111
will fail from one of the other servers, for example. Pinging using the hostname probably will probably even fail on the DNS lookup.
Your options are:
Put them in separate zones in a single region (in EC2 classic, they will be able to communicate). You could also use a VPC in this case, putting the in separate subnets but allowing interconnections via appropriately set up security groups.
Set up /etc/hosts on each server to point the relevant public IPs of the other servers (you could attach elastic IPs to each server to ensure stability across server restarts). You could also set the hostname of each server for clarity. Set you your security groups to allow access on the relevant ports that rabbitmq uses. There may be security implications of doing this, since the data will be travelling over the public internet.
Set up a VPN between each server in the cluster. Amazon VPC has a VPN facility, but there are ways of setting it up yourself I think.
I think only option 1 is simplest. Option 2 has major security implications (I believe there are ways of securing the connection between the cluster servers, but they aren't documented on the rabbitmq website as far as I can tell). Option 3 is complex but probably the best option if you need multiple regions.
Note that rabbitmq clusters aren't meant to be run over wide geographical areas, since they aren't too reliable in the face of network partitions. See here: https://www.rabbitmq.com/clustering.html
I start up an infinispan cache which joins to a cluster. It is the only cache in the cluster.
Now, I connect using JMX to see what ports are being used.
I click on:
CacheManager / MyCache/ CacheManager/ Attributes
Under clusterMembers, I see [mymachine-54202]
Thinking 54202 is the port, I do both a
lsof -i udp
lsof -i tcp
I am on a mac and I can't see anything on 54202. What does 54202 correspond to then?
Thanks
It's a random number to differentiate between multiple caches running on the same box.
For more details, see http://docs.jboss.org/infinispan/5.0/apidocs/config.html#ce_global_transport
I am having server issues with getting rabbit to cluster.
I boot up two nodes on ec2.
On the the first node booted I do this.
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
I boot another node.
sudo service rabbitmq-server stop
#Copy cookie from the first server booted
sudo su - -c 'echo -n "cookie" > /var/lib/rabbitmq/.erlang.cookie'
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl cluster rabbit#server1
1) sever1 is running
2) What ports to need open? I have 22, 4369, 5672
sudo rabbitmqctl cluster rabbit#aws-rabbit-server-east-development-20121102162143
Clustering node 'rabbit#aws-rabbit-server-east-development-20121103033005' with ['rabbit#aws-rabbit-server-east-development-20121102162143'] ...
Error: {no_running_cluster_nodes,['rabbit#aws-rabbit-server-east-development-20121102162143'],
['rabbit#aws-rabbit-server-east-development-20121102162143']}
What could possibility be missing from there docs or what what am I missing?
I had a similar problem on EC2 with two windows machines. I eventually got it working but I'm not sure I did it in the correct way so there may be a better solution.
The issue I found was that the two nodes could not see each other when trying to cluster. Each time you start a Rabbit node it seemed to be assigned a port number dynamically.
This obviously makes it very difficult to know which port to open up in the security group so to solve this, I restricted the range of ports Rabbit chose from when assigning the port. I restricted this to a range of 1 port on each node so I always know which port was being assigned.
The easiest way I found to do this was by editing the sbin\rabbitmq-service.bat file.
find the line -kernel inet_default_connect_options "[{nodelay,true}]" ^
add the following two lines to the file underneath:
-kernel inet_dist_listen_min ##### ^
-kernel inet_dist_listen_max ##### ^
replacing ##### with your chosen port number.
So you should now open up the following ports:
5672 - RabbitMQ’s listening port
4369 - Erlang Port Mapper Daemon
##### - the chosen port number for the Erlang nodes to communicate via
Because Erlang does not recognise FQDNs you may need to modify the hosts file on all the servers to make sure they are all able to resolve all the Erlang node name to an IP address, e.g.
123.123.123.111 NODE1
123.123.123.222 NODE2
once this is done you should then be able to see each node from the other. you can do this by using calling the following from the command line (replacing rabbit#NODE2 with whichever node you want to see)
rabbitmqctl status -n rabbit#NODE2
Hope this give you some help, I'm no expert but found this got things working for me!