RabbitMQ: joining a node to an up-to-date cluster

I have a cluster of two nodes, A (master) and B (slave). Node A was the master and node B joined it successfully. Then the node A instance went down, and node B now has more messages in its queues than node A. I restarted the node A instance and I'm now trying to join it to node B as a slave, because B is more up to date. However, I get the following message when trying to join node B:
sudo rabbitmqctl join_cluster rabbit@bnode
{:badrpc_multi, {:EXIT, {{:function_clause, [{:gen, :do_for_proc, [{:rex, {:error, {:node_name, :short}}}, #Function<0.9801092/1 in :gen.call/4>], [file: 'gen.erl', line: 220]}, {:gen_server, :call, 3, [file: 'gen_server.erl', line: 219]}, {:rpc, :do_call, 3, [file: 'rpc.erl', line: 327]}, {:lists, :foldl, 3, [file: 'lists.erl', line: 1263]}, {:rabbit_mnesia, :discover_cluster, 1, [file: 'src/rabbit_mnesia.erl', line: 779]}, {:rabbit_mnesia, :join_cluster, 2, [file: 'src/rabbit_mnesia.erl', line: 212]}, {:rpc, :"-handle_call_call/6-fun-0-", 5, [file: 'rpc.erl', line: 197]}]}, {:gen_server, :call, [{:rex, {:error, {:node_name, :short}}}, {:call, :rabbit_mnesia, :cluster_status_from_mnesia, [], #PID<0.62.0>}, :infinity]}}}, [error: {:node_name, :short}]}
Is that the correct approach to follow?
As I've read in some other posts, I tried removing the existing Mnesia data from node A with sudo rm -rf /var/lib/rabbitmq/mnesia/* and even tried the reset command (although that's not what I wanted): sudo rabbitmqctl reset
I still cannot join node B.

a) I assume both nodes share the same .erlang.cookie (see https://www.rabbitmq.com/clustering.html). You could try this on node A:
sudo rabbitmqctl stop_app
sudo rabbitmqctl join_cluster rabbit@bnode
sudo rabbitmqctl start_app
b) Another option is to make the cluster forget node A.
On node A: sudo rabbitmqctl stop_app
On node B: sudo rabbitmqctl forget_cluster_node rabbit@Anode
Then start the app on node A: sudo rabbitmqctl start_app
If you get an error (inconsistent cluster), reset node A: sudo rabbitmqctl reset; sudo rabbitmqctl start_app
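The two options above can be combined into one sequence run from the stale node. The following is only a sketch under assumptions: the function name rejoin_stale_node is hypothetical, both nodes are assumed to share the same Erlang cookie so that rabbitmqctl -n can reach the peer, and the caller is assumed to have permission to run rabbitmqctl (for example via sudo). Note that reset wipes the local node's data, so any messages that exist only on the stale node are lost.

```shell
# Sketch: rejoin a stale node to the peer that holds the newer data.
# Run on the stale node. Both node names are supplied by the caller.
rejoin_stale_node() {
    local self="$1"   # this node, e.g. rabbit@anode
    local peer="$2"   # the up-to-date node, e.g. rabbit@bnode

    if [ -z "$self" ] || [ -z "$peer" ]; then
        echo "usage: rejoin_stale_node <self> <peer>" >&2
        return 2
    fi

    # Stop the RabbitMQ application locally (the Erlang VM keeps running).
    rabbitmqctl stop_app || return 1
    # Ask the peer to drop its record of this node.
    rabbitmqctl -n "$peer" forget_cluster_node "$self" || return 1
    # Wipe local cluster state and data so the node can start clean.
    rabbitmqctl reset || return 1
    # Join the peer's cluster, then start the application again.
    rabbitmqctl join_cluster "$peer" || return 1
    rabbitmqctl start_app
}
```

It would be called as rejoin_stale_node rabbit@anode rabbit@bnode. Each step returns early on failure so that a failed forget_cluster_node never leads to a reset.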

I found the answer. Instead of trying to rescue an unsynchronized node that throws that error, it's better to launch a new instance (for example, using an AWS auto scaling group) that joins node B (the up-to-date node).
Based on that other answer:
How to set up autoscaling RabbitMQ Cluster AWS

Related

node with name "rabbit" already running after changing the Erlang cookie

I'm trying to create a two-node RabbitMQ cluster. I've set up the RabbitMQ server successfully on both nodes. After replacing the slave node's cookie with the master's .erlang.cookie, the system is unable to start the rabbitmq app.
I'm executing the following commands:
nohup rabbitmq-server restart &
rabbitmqctl start_app
/sbin/service rabbitmq-server stop
nohup is generating the following logs in /var/lib/rabbitmq/nohup.log
ERROR: node with name "rabbit" already running on "ip-172-31-83-71"
ERROR: node with name "rabbit" already running on "ip-172-31-83-71"
ERROR: node with name "rabbit" already running on "ip-172-31-83-71"
The terminal also shows the following error:
Error: unable to perform an operation on node 'rabbit@ip-172-31-83-71'. Please see diagnostics information and suggestions below.
DIAGNOSTICS
===========
attempted to contact: ['rabbit@ip-172-31-83-71']
rabbit@ip-172-31-83-71:
* connected to epmd (port 4369) on ip-172-31-83-71
* epmd reports: node 'rabbit' not running at all
no other nodes on ip-172-31-83-71
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli-26458-rabbit@ip-172-31-83-71'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: aoPchC2KIy7esHVGVNLP4w==
I've also tried reverting the .erlang.cookie, and in that case it works fine. Can anyone please tell me what I'm missing?

how to reinstall a dead node in rabbitmq cluster

I have a RabbitMQ cluster on rabbit1, rabbit2 and rabbit3. rabbit2 and rabbit3 joined the rabbit1 cluster as RAM nodes, and then rabbit3 crashed. On rabbit1 and rabbit2, checking the cluster status gives the following output:
ubuntu@rabbit2:~$ sudo rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2
[{nodes,[{disc,[rabbit@rabbit1]},{ram,[rabbit@rabbit3,rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit1,rabbit@rabbit2]},
{cluster_name,<<"rabbit@localhost">>},
{partitions,[]},
{alarms,[{rabbit@rabbit1,[]},{rabbit@rabbit2,[]}]}]
I then uninstalled RabbitMQ on rabbit3 and installed it again. The installation on rabbit3 succeeded.
sudo service rabbitmq-server stop
sudo rm -rf /var/lib/rabbitmq/
sudo apt-get remove rabbitmq-server -y
sudo apt-get autoremove -y
sudo apt-get install rabbitmq-server -y
After the installation, I tried to add rabbit3 to the cluster again. As a first step, I checked the cluster status on rabbit3:
ubuntu@rabbit3:~$ sudo rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3
[{nodes,[{disc,[rabbit@rabbit3]}]},
{running_nodes,[rabbit@rabbit3]},
{cluster_name,<<"rabbit@localhost">>},
{partitions,[]},
{alarms,[{rabbit@rabbit3,[]}]}]
Then I overwrote the cookie file:
ubuntu@rabbit3:~$ sudo sh -c "echo abcdefg > /var/lib/rabbitmq/.erlang.cookie"
ubuntu@rabbit3:~$ sudo cat /var/lib/rabbitmq/.erlang.cookie
abcdefg
Then I checked the cluster status again:
ubuntu@rabbit3:~$ sudo rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit3
Error: unable to connect to node rabbit@rabbit3: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit@rabbit3]
rabbit@rabbit3:
* connected to epmd (port 4369) on rabbit3
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* suggestion: hostname mismatch?
* suggestion: is the cookie set correctly?
* suggestion: is the Erlang distribution using TLS?
current node details:
- node name: 'rabbitmq-cli-25@localhost'
- home dir: /var/lib/rabbitmq
- cookie hash: esZsDxSN6VGbi9JkMSxNZA==
The rabbit@rabbit3 node cannot be reached, and I cannot configure it anymore. I checked the RabbitMQ installation documentation, but it only covers how to configure a cluster on the happy path.
If a node is dead, how do I reinstall it and add it back?
I made a mistake during installation. In general, the RabbitMQ node should be stopped before the cookie is set, like this:
sudo rabbitmqctl stop
sudo sh -c "echo abcd123456 > /var/lib/rabbitmq/.erlang.cookie"
sudo cat /var/lib/rabbitmq/.erlang.cookie
sudo chmod 400 /var/lib/rabbitmq/.erlang.cookie
sudo sh -c "echo abcd123456 > ~/.erlang.cookie"
sudo chmod 400 ~/.erlang.cookie
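The cookie steps above can be wrapped in a small helper so that the same value and permissions are applied to each cookie file. This is a sketch only: install_cookie is a hypothetical name, the node is assumed to have been stopped already (rabbitmqctl stop, as above), and the helper is assumed to run as a user allowed to write the target file. It rewrites an existing file in place rather than deleting it, so the file keeps its current owner.

```shell
# install_cookie <cookie-value> <cookie-file>
# Writes the cookie and leaves the file readable by its owner only.
install_cookie() {
    local cookie="$1"
    local file="$2"

    if [ -z "$cookie" ] || [ -z "$file" ]; then
        echo "usage: install_cookie <cookie-value> <cookie-file>" >&2
        return 2
    fi

    # An existing cookie file is normally mode 400; make it writable first.
    if [ -e "$file" ]; then
        chmod 600 "$file" || return 1
    fi

    # Write the value without a trailing newline. A newly created file
    # belongs to the calling user and is created with a restrictive umask.
    ( umask 077 && printf '%s' "$cookie" > "$file" ) || return 1

    # Restrict the file to read-only for its owner.
    chmod 400 "$file"
}
```

For example, install_cookie abcd123456 /var/lib/rabbitmq/.erlang.cookie followed by the same call for the CLI user's ~/.erlang.cookie. If the helper creates /var/lib/rabbitmq/.erlang.cookie from scratch as root, the file would still need to be owned by the user that runs the RabbitMQ server.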

How do I start a RabbitMQ node?

I keep getting this error every time I try to do something with RabbitMQ:
attempted to contact: [fdbvhost@FORTE]
fdbvhost@FORTE:
* connected to epmd (port 4369) on FORTE
* epmd reports: node 'fdbvhost' not running at all
no other nodes on FORTE
* suggestion: start the node
current node details:
- node name: 'rabbitmq-cli-54@FORTE'
- home dir: C:\Users\Jesus
- cookie hash: iuRlQy0F81aBpoY9aQqAzw==
This is the output I get when I run rabbitmqctl -n fdbvhost status or /rabbitmqctl -n fdbvhost list_vhosts.
I've tried rabbitmqctl -n fdbvhost start which gives me the following output:
Error: could not recognise command
Usage:
rabbitmqctl [-n <node>] [-t <timeout>] [-q] <command> [<command options>]
...
So this doesn't start it. I cannot find anything about starting a node in the documentation. How do I actually start my node/vhost?
Try running the following command from the RabbitMQ's installation sbin directory
rabbitmq-server start -detached
This should start the broker node if it was stopped for some reason.
Check if you have RabbitMQ installed as a service in the /etc/init.d/ folder
sudo su # might be needed
cd /etc/init.d/
ls . | grep rabbit
The output should be rabbitmq-server
If that's the case, then, try restarting your service with:
sudo service rabbitmq-server restart
For mac users
To Start
brew services start rabbitmq
To Restart
brew services restart rabbitmq
To Stop
brew services stop rabbitmq
To Know the status of the server
brew services info rabbitmq

Rabbitmq Clustering with three nodes

I am trying to set up clustering in RabbitMQ. I have added two nodes but am unable to add a third one. I have clustered rabbit@node1 and rabbit@node2, and now I am trying to cluster rabbit@node3 with rabbit@node1.
Here is what I am trying to do:
rabbitmqctl join_cluster rabbit@node1
Clustering node rabbit@node3 with rabbit@node1 ...
Error: mnesia_not_running
Is there a way to add a third node to the cluster, or any fix for Error: mnesia_not_running?
When joining a cluster, the application on the target node must be running, while the application on the source (current) node must be stopped. The application is stopped and started with rabbitmqctl stop_app and rabbitmqctl start_app.
Maybe you stopped the application on rabbit@node1 while joining it to the cluster. In that case, run rabbitmqctl start_app on rabbit@node1, or rabbitmqctl -n rabbit@node1 start_app, to be able to join its cluster. Alternatively, you can join rabbit@node2's cluster and start the app later.
To have a working cluster, start the application on all nodes after joining.
It happens when the target node's app is stopped. When joining a node to a RabbitMQ cluster, only the source node (the node you are trying to link) should be stopped.
On the master node:
rabbitmqctl start_app
On the current node:
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@node1
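The ordering described in these answers (target app running, local app stopped, then join) can be sketched as a single function run on the joining node. The name join_running_cluster is hypothetical, and the sketch assumes the nodes share an Erlang cookie so that rabbitmqctl -n can reach the target. If the join fails, the local app is started again so the node is not left stopped.

```shell
# join_running_cluster <target-node>
# Run on the node that should join, e.g. on rabbit@node3.
join_running_cluster() {
    local target="$1"   # e.g. rabbit@node1

    if [ -z "$target" ]; then
        echo "usage: join_running_cluster <target-node>" >&2
        return 2
    fi

    # The target's application must be running, otherwise the join
    # fails with mnesia_not_running.
    rabbitmqctl -n "$target" start_app || return 1

    # The local application must be stopped before joining.
    rabbitmqctl stop_app || return 1

    if ! rabbitmqctl join_cluster "$target"; then
        # Do not leave the local node stopped after a failed join.
        rabbitmqctl start_app
        return 1
    fi

    rabbitmqctl start_app
}
```

On rabbit@node3 this would be invoked as join_running_cluster rabbit@node1.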

Error with rabbit-mq server

I am trying to setup OpenStack on Ubuntu 12.04 using devstack. Now, the error I am getting is:
Setting up rabbitmq-server (2.7.1-0ubuntu4) ...
Starting rabbitmq-server: FAILED - check /var/log/rabbitmq/startup_{log, _err}
rabbitmq-server.
invoke-rc.d: initscript rabbitmq-server, action "start" failed.
dpkg: error processing rabbitmq-server (--configure):
subprocess installed post-installation script returned error exit status 1
No apport report written because MaxReports is reached already
Errors were encountered while processing:
rabbitmq-server
E: Sub-process /usr/bin/dpkg returned an error code (1)
++ err_trap
++ local r=100
++ set +o xtrace
stack.sh failed
Any idea why I am getting this error?
I had this issue twice, when the hostname or IP address in the hosts file didn't match.
So check that the /etc/hosts file contains the correct IP address and hostname.
Run sudo cat /etc/hostname to see your hostname
Output:
yoursite
Run sudo nano /etc/hosts
File contains:
127.0.0.1 yoursite
As you can see, the hostname printed by cat /etc/hostname is the same as the one in /etc/hosts.
Run sudo rabbitmq-server start to start the rabbitmq-server
Try deleting the folder /var/lib/rabbitmq and re-running ./stack.sh
If that doesn't work either, run the following after stack.sh fails:
chown -R rabbitmq:rabbitmq /var/lib/rabbitmq
chown -R rabbitmq:rabbitmq /var/log/rabbitmq
service rabbitmq-server restart
and check the status of rabbitmq using "rabbitmqctl status"
A similar thing happened to me. RabbitMQ depends on being able to resolve its hostname, so run this:
echo "127.0.0.1 $(hostname -s)" | sudo tee -a /etc/hosts
This way works for me.
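Running the tee -a command more than once appends the same line repeatedly. A possible variant that only appends when the name is missing is sketched below. The function name ensure_hosts_entry is hypothetical, and the optional second argument exists only so the function can be tried against a copy of the hosts file.

```shell
# ensure_hosts_entry <hostname> [hosts-file]
# Appends "127.0.0.1 <hostname>" unless the name is already listed.
ensure_hosts_entry() {
    local name="$1"
    local hosts_file="${2:-/etc/hosts}"

    if [ -z "$name" ]; then
        echo "usage: ensure_hosts_entry <hostname> [hosts-file]" >&2
        return 2
    fi

    # Look for the name as a whole field after the address column,
    # skipping comment lines and trailing comments.
    if awk -v n="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) { if ($i ~ /^#/) break; if ($i == n) found = 1 } }
        END { exit !found }
    ' "$hosts_file"; then
        return 0
    fi

    printf '127.0.0.1 %s\n' "$name" >> "$hosts_file"
}
```

Run as root, the call would be ensure_hosts_entry "$(hostname -s)".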
First go to
sudo vim /etc/hosts
and set
127.0.0.1 <hostname>
then open firewall
sudo rabbitmq-plugins enable rabbitmq_management
sudo service rabbitmq-server restart
This does not happen in a clean environment. You have probably run devstack several times, and one of the runs failed without being cleaned up.
Run ps -ef | grep rabbitmq and kill all rabbitmq processes. After that, ./stack.sh should run fine.
It is highly recommended to run ./unstack.sh && ./clean.sh before ./stack.sh.
Just to be sure, take a look at your local network interfaces:
ip add
If there's no lo interface, bring it up:
ifconfig lo up
Then start the server again and see if it works now:
systemctl start rabbitmq-server
I had the same problem even though my /etc/hosts and DNS were OK. I suspect that the SysV init script was started too early, before the network was ready. I rewrote the startup script as a systemd unit on CentOS 7.8, and it seems to work well now.
[Unit]
Description=RabbitMQ
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
RuntimeDirectory=rabbitmq
PrivateTmp=true
Restart=on-failure
RestartSec=10
WorkingDirectory=/opt/data/rabbitmq/
User=rabbitmq
Group=rabbitmq
ExecStart=/opt/app/rabbitmq/default/sbin/rabbitmq-server
ExecStop=/opt/app/rabbitmq/default/sbin/rabbitmqctl stop
ExecStop=/bin/sh -c "while ps -p $MAINPID >/dev/null 2>&1; do sleep 1; done"
StandardOutput=journal
StandardError=inherit
[Install]
WantedBy=multi-user.target