More than one node on the same host? - rabbitmq

I've been following the RabbitMQ clustering guide. I'm using RabbitMQ 2.8.7 and I'm trying to launch two nodes on the same host. I launch both nodes the same way, using the following commands:
RABBITMQ_NODE_PORT=5674 RABBITMQ_PID_FILE='/var/log/rabbitmq/rabbit-disc' RABBITMQ_NODENAME=rabbit-disc rabbitmq-server -detached
RABBITMQ_NODE_PORT=5673 RABBITMQ_PID_FILE='/var/log/rabbitmq/rabbit-ram' RABBITMQ_NODENAME=rabbit-ram rabbitmq-server -detached
I then try to add the second node to a cluster with the first one, starting by stopping it with the following command:
sudo rabbitmqctl -n rabbit-ram stop_app
However, rather than stopping, it simply hangs on:
Stopping node 'rabbit-ram@test-01' ...
It never finishes stopping the node. I've looked at both the log files and the PID file output, and neither shows any error or gives any hint as to why the process locks up when I try to stop it, or when I issue it any other command for that matter.
I've also tried giving each node completely different values for other settings in the start commands, including RABBITMQ_MNESIA_BASE, in case there was a locking issue, but that doesn't solve anything.
I've got the following plugins installed:
[e] amqp_client 2.8.7
[e] erlando 2.8.7
[e] mochiweb 2.3.1-rmq2.8.7-gitd541e9a
[E] rabbitmq_management 2.8.7
[e] rabbitmq_management_agent 2.8.7
[e] rabbitmq_mochiweb 2.8.7
[E] rabbitmq_shovel 2.8.7
[E] rabbitmq_shovel_management 2.8.7
[e] webmachine 1.9.1-rmq2.8.7-git52e62bc
Any help on figuring out why the locking is occurring and how to overcome it would be greatly appreciated.

It appears that running the rabbitmq_management plugin and its dependencies causes the issue. Running multiple nodes with it disabled isn't a problem; however, enabling it on its own also enables all of the following:
* mochiweb-2.3.1-rmq2.8.7-gitd541e9a
* rabbitmq_management_agent-2.8.7
* rabbitmq_mochiweb-2.8.7
* webmachine-1.9.1-rmq2.8.7-git52e62bc
These appear to be causing a clash. I'd assume it's because the web view tries to launch on every node while its port is already taken by the original node. I could dig around in the configs to give each node its own web view, but it's not needed. Disabling these plugins after my first node launches is a sufficient fix for me.
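If you do want the management UI on every node, an alternative to disabling it is to give each node its own web listener port. The sketch below assumes the 2.x syntax from the clustering guide of that era, where the UI is served by rabbitmq_mochiweb and configured with `-rabbitmq_mochiweb listeners [{mgmt,[{port,...}]}]`, and that 55672 is the 2.x default management port; verify both against the documentation for your exact version. It also sketches the 2.x join steps that `stop_app` begins (`stop_app`, `reset`, `cluster`, `start_app`); in 2.x, as I understand the rabbitmqctl docs, listing only the disc node makes the joining node a RAM node, and from 3.0 onward `cluster` was replaced by `join_cluster`. The helper names `start_node` and `join_as_ram_node` are hypothetical.

```shell
#!/bin/sh
# Sketch for RabbitMQ 2.8.x: two nodes on one host, each with its own AMQP
# port and its own management (mochiweb) listener port.
# ASSUMPTION: the -rabbitmq_mochiweb listeners syntax follows the 2.x
# clustering guide; check it for your version.

start_node() {
  local name=$1 amqp_port=$2 mgmt_port=$3
  RABBITMQ_NODENAME="$name" \
  RABBITMQ_NODE_PORT="$amqp_port" \
  RABBITMQ_SERVER_START_ARGS="-rabbitmq_mochiweb listeners [{mgmt,[{port,$mgmt_port}]}]" \
    rabbitmq-server -detached
}

# Join node $1 to the cluster of disc node $2 using the 2.x "cluster" command.
# Note: reset wipes the joining node's own data.
join_as_ram_node() {
  local node=$1 disc_node=$2
  rabbitmqctl -n "$node" stop_app &&
    rabbitmqctl -n "$node" reset &&
    rabbitmqctl -n "$node" cluster "$disc_node" &&
    rabbitmqctl -n "$node" start_app
}

# Example (hostname test-01 taken from the question):
#   start_node rabbit-disc 5674 55672
#   start_node rabbit-ram  5673 55673
#   join_as_ram_node rabbit-ram@test-01 rabbit-disc@test-01
```

Giving each node a distinct listener port keeps the plugin available on both nodes instead of relying on it being disabled after the first node starts.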

Related

can't start rabbitmq-server after installation

I'm trying to use RabbitMQ for a Django tutorial, but when I try to start the server I get this error:
~$ sudo rabbitmq-server
Configuring logger redirection
14:49:57.041 [error]
14:49:57.044 [error] BOOT FAILED
BOOT FAILED
14:49:57.044 [error] ===========
===========
14:49:57.044 [error] ERROR: could not bind to distribution port 25672, it is in use by another node: rabbit@wss
ERROR: could not bind to distribution port 25672, it is in use by another node: rabbit@wss
14:49:57.045 [error]
14:49:58.046 [error] Supervisor rabbit_prelaunch_sup had child prelaunch started with rabbit_prelaunch:run_prelaunch_first_phase() at undefined exit with reason {dist_port_already_used,25672,"rabbit","wss"} in context start_error
14:49:58.046 [error] CRASH REPORT Process <0.153.0> with 0 neighbours exited with reason: {{shutdown,{failed_to_start_child,prelaunch,{dist_port_already_used,25672,"rabbit","wss"}}},{rabbit_prelaunch_app,start,[normal,[]]}} in application_master:init/4 line 138
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbitmq_prelaunch,{{shutdown,{failed_to_start_child,prelaunch,{dist_port_already_used,25672,\"rabbit\",\"wss\"}}},{rabbit_prelaunch_app,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbitmq_prelaunch,{{shutdown,{failed_to_start_child,prelaunch,{dist_port_already_used,25672,"rabbit","wss"}}},{rabbit_prelau
Crash dump is being written to: erl_crash.dump...done
I checked whether the port is in use with lsof -i :25672 and got nothing.
I don't know much about these things, so if you need any more information, please tell me.
Try:
sudo lsof -i :25672
sudo kill <PID>
sudo rabbitmq-server
where <PID> is the ID of the process occupying port 25672.
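A slightly safer order is to ask RabbitMQ to stop cleanly before killing anything. The sketch below (the helper name `free_dist_port` is hypothetical) first checks whether a node answers `rabbitmqctl status`; if one does, it stops it with `rabbitmqctl stop`, otherwise it prints the PIDs listening on the port so you can inspect them before using `kill`. Run it with root privileges: without them, lsof may not show processes owned by other users (such as a `rabbitmq` service account), which could explain why `lsof -i :25672` printed nothing in the question.

```shell
#!/bin/sh
# Sketch: free the Erlang distribution port (25672 by default) before
# starting rabbitmq-server again. Run as root (e.g. via sudo).
free_dist_port() {
  local port=${1:-25672}
  if rabbitmqctl status >/dev/null 2>&1; then
    # A node is reachable: stop it cleanly instead of killing it.
    echo "A RabbitMQ node is already running; stopping it."
    rabbitmqctl stop
  else
    # Nothing answers: list the PIDs still listening on the port so they
    # can be checked (e.g. with ps -p PID) before any kill.
    echo "No node answered; PIDs listening on port $port:"
    lsof -t -i :"$port" -sTCP:LISTEN
  fi
}

# Usage, after sourcing this file in a root shell:
#   free_dist_port 25672
```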
I have encountered this issue. It occurs because rabbitmq-server is already running on the machine.
I used the following command to check the status of rabbitmq-server:
rabbitmqctl.bat status
This tells you whether the server is up or down.
If it is up, this could be the reason you are getting the error in your post.
You can issue the following command to stop the server:
rabbitmqctl.bat stop
Now try starting rabbitmq-server by issuing the following command:
rabbitmq-server start
Note that I am using Windows, and I ran these commands from a command prompt in C:\Program Files\RabbitMQ\rabbitmq_server-3.8.14\sbin, since my RabbitMQ installation directory is C:\Program Files\RabbitMQ\rabbitmq_server-3.8.14.
I have encountered this before. Here is what caused it and how I fixed it:
This is one of those commands that require the magic word sudo (i.e. it needs superuser privileges).
If you forget to add sudo to the command, it starts the process but later fails when it hits a superuser-only roadblock, leaving an incomplete process behind. When you then add sudo, it attempts the same thing again but finds that a process started without the right privileges has made a mess or is still running.
The solution is then to cancel whatever the first command started and try again.
sudo lsof -i :25672
This lists details about the process using port 25672.
You will see its PID (process ID), e.g. 1301.
Then stop the process on that port with:
sudo kill <PID>
for example, sudo kill 1301
Make sure you are killing the right process; otherwise you may get into trouble.
Now, retry the command with sudo:
sudo rabbitmq-server
Also:
In most cases, this error occurs because rabbitmq-server keeps running until you deliberately stop it, and (when installed as a service) it starts again after you restart your system.
Another way to stop the RabbitMQ server on Windows: press Windows+R, type "services.msc", find RabbitMQ, select it, and click Stop in the top-left corner.
Then rerun your RabbitMQ server.
Hi guys, I am putting up an answer that can help Googlers run multiple rabbitmq-server instances on the same machine. While trying to do that, I ran into the error reported in the question and solved it by defining:
export RABBITMQ_DIST_PORT=anything_other_than_25672
as stated in the documentation:
https://www.rabbitmq.com/networking.html#epmd-inet-dist-port-range
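By default the distribution port is derived from the AMQP port (RABBITMQ_NODE_PORT + 20000, so 5672 gives 25672), which is why two nodes left on the default ports collide on 25672. The sketch below starts an extra node with its own node name, AMQP port and distribution port; the helper names and example values are illustrative, and if the management plugin is enabled each node will also need its own HTTP port (not shown).

```shell
#!/bin/sh
# Sketch: start an extra RabbitMQ node on the same host. Each node needs a
# unique node name, AMQP port and distribution port.

# Default rule: distribution port = AMQP port + 20000.
dist_port_for() {
  echo $(( $1 + 20000 ))
}

start_extra_node() {
  local name=$1 amqp_port=$2
  RABBITMQ_NODENAME="$name" \
  RABBITMQ_NODE_PORT="$amqp_port" \
  RABBITMQ_DIST_PORT="$(dist_port_for "$amqp_port")" \
    rabbitmq-server -detached
}

# Example: start_extra_node rabbit2 5673   # AMQP 5673, distribution 25673
```

Setting RABBITMQ_DIST_PORT explicitly just makes the default rule visible; any free port works as long as each node gets a different one.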
If you are using Windows, go to Task Manager and stop RabbitMQ from running,
then restart rabbitmq-server.
Others have answered for Linux, but on Windows you should press Ctrl+Alt+Delete, select Task Manager, and end the process that depends on Erlang.
Note that this requires Administrator privileges.
Now enter this command to start rabbitmq-server:
rabbitmq-server start
You have to repeat these steps every time you restart your computer. To avoid that, remove the RabbitMQ service from the startup services.
I went through the same problem on Windows: RabbitMQ is already running as a service after installation,
so just enable the plugins from the RabbitMQ command line by entering:
rabbitmq-plugins enable rabbitmq_management
Then go to localhost:15672 and you're good to go.
This means that your port 25672 is already in use.
Try:
sudo lsof -i :25672
sudo kill <PID>
and now start your rabbitmq server using
sudo rabbitmq-server

RABBITMQ - Applying Plugin Failed

EDIT: After setting the RabbitMQ variables in System Environment Variables and doing another uninstall/reinstall, the issue is resolved.
The following plugins have been enabled: rabbitmq_shovel
Applying plugin configuration to rabbit@MSGTEST01... started 1 plugin.
END EDIT
c:\RabbitMQ\rabbitmq_server-3.6.12\sbin>rabbitmq-plugins enable rabbitmq_shovel rabbitmq_shovel_management
Plugin configuration unchanged.
Applying plugin configuration to rabbit@M... failed. Error:
{enabled_plugins_mismatch,"c:\Users\\AppData\Roaming\RabbitMQ\ENABLE~1",
"c:\RabbitMQ\ENABLE~1"}
I set the following then reinstalled the service:
set RABBITMQ_BASE=c:\RabbitMQ
set RABBITMQ_CONFIG_FILE=c:\RabbitMQ\rabbitmq
set RABBITMQ_LOG_BASE=c:\RabbitMQ\logs
set RABBITMQ_MNESIA_BASE=c:\RabbitMQ\db
set RABBITMQ_ENABLED_PLUGINS_FILE=c:\RabbitMQ\enabled_plugins
Why is it still looking in my roaming profile for anything? Moreover, ENABLE~1 doesn't look like a valid filename.
I've tried blowing away my roaming profile data; RabbitMQ recreates the files.
I've tried copying my C:\RabbitMQ\enabled_plugins to roaming: same thing.
Tried the reverse: same thing.
I've tried uninstalling and reinstalling the service: same thing.
I'm able to enable the management ui after install, but not rabbitmq_shovel and cannot figure out what the issue is.
Again, this works after install:
rabbitmq-plugins enable rabbitmq_management
This fails with the error above:
rabbitmq-plugins enable rabbitmq_shovel
I'm running these commands as Admin in CMD.
Set up the variables in System Environment Variables and perform the following in CMD as Admin:
rabbitmqctl shutdown
rabbitmqctl stop
rabbitmq-service.bat remove
rabbitmq-service.bat install
rabbitmq-service.bat start
rabbitmq-plugins enable rabbitmq_management
rabbitmq-plugins enable rabbitmq_shovel
That worked for me.
I was able to fix the issue without having to uninstall RabbitMQ.
Open the files named "enabled_plugins" in the two directories listed in the error you received. For me, those directories were "C:\Users\UserName\AppData\Roaming\RabbitMQ" and "C:\ProgramData\RabbitMQ".
For me, one file had:
[rabbitmq_management].
while the other was empty. I copied the config snippet above into the empty file and saved it.
Run the commands as ADMIN in Command Prompt:
rabbitmqctl shutdown
rabbitmqctl stop
rabbitmq-service.bat start
rabbitmq-plugins enable rabbitmq_management
I did receive the same "enabled_plugins_mismatch" error; however, when I browsed to my RabbitMQ UI plugin at "http://localhost:15672/#/" the UI showed up and is functioning.

Flink on YARN with HA enabled crashes all RMs on attempt restoration

I am trying to make Flink (1.2.0) work on our Hadoop cluster (CDH 5.10.0) with HA enabled but when I test it out by killing the active RM it brings down the entire cluster.
I have configured Flink's HA in flink-conf.yaml:
high-availability: zookeeper
high-availability.zookeeper.quorum: zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
high-availability.zookeeper.storageDir: hdfs:///tmp/flink/recovery
high-availability.zookeeper.path.root: /flink
high-availability.zookeeper.path.namespace: /cluster1
yarn.application-attempts: 2
I then run a flink session using yarn-session.sh -n 2 -nm "Flink HA test"
When I kill the active RM with kill -9, YARN correctly switches to the standby RM, and I can see applications as ACCEPTED for a minute, but soon the standby RM crashes, throwing the following exception:
2017-03-08 12:29:36,997 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.transferStateFromPreviousAttempt(SchedulerApplicationAttempt.java:601)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:698)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1303)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:702)
at java.lang.Thread.run(Thread.java:745)
When I do not configure Flink's HA the problem disappears.
Any idea what might be causing it?
As discussed in the comments:
This was likely a YARN issue caused by misconfiguration.
Given that the issue only occurred in very old versions (which are already out of support) it is likely not possible to add more detail.

How do I stop the RabbitMQ server on localhost

I installed RabbitMQ server on OS X and started it from the command line. Now it is not obvious how I should stop it. After I ran:
sudo rabbitmq-server -detached
I get:
Activating RabbitMQ plugins ...
0 plugins activated:
That was it. How should I properly shut it down? The documentation mentions using rabbitmqctl(1), but it's not clear to me what that means. Thanks.
Edit: As per comment below, this is what I get for running sudo rabbitmqctl stop:
(project_env)mlstr-1:Package mlstr$ sudo rabbitmqctl stop
Password:
Stopping and halting node rabbit@h002 ...
Error: unable to connect to node rabbit@h002: nodedown
DIAGNOSTICS
===========
nodes in question: [rabbit@h002]
hosts, their running nodes and ports:
- h002: [{rabbit,62428},{rabbitmqctl7069,64735}]
current node details:
- node name: rabbitmqctl7069@h002
- home dir: /opt/local/var/lib/rabbitmq
- cookie hash: q7VU0JjCd0VG7jOEF9Hf/g==
Why is there still a 'current node'? I have not run any client program but only the RabbitMQ server, does that mean a server is still running?
It turns out that it is related to permissions. Somehow my RabbitMQ server was started as the user 'rabbitmq' (which is strange), so I had to run:
sudo -u rabbitmq rabbitmqctl stop
In my dev environment where I keep it running all the time, I use:
launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.rabbitmq.plist
and to start it
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.rabbitmq.plist
Even easier....
brew services stop rabbitmq
brew services start rabbitmq
Use rabbitmqctl stop to stop any node. If you need to specify the node giving you trouble, add the -n rabbit@[hostname] option.
You can also use the shortcut "RabbitMQ Service - stop" if you don't like the commands.
stop
sudo systemctl stop rabbitmq-server
start
sudo systemctl start rabbitmq-server
For Windows, use PowerShell as Admin, then run
.\rabbitmq-service.bat stop
stop Stop the service. The service must be running for this command to have any effect.
https://www.rabbitmq.com/man/rabbitmq-service.8.html
For OP's answer above,
It turns out that it is related to permissions.
I have no knowledge on this.
For Mac users:
To Stop
brew services stop rabbitmq
To Start
brew services start rabbitmq
To Restart
brew services restart rabbitmq
To Know the status of the server
brew services info rabbitmq
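The answers above boil down to one command per platform. As a sketch, a hypothetical helper `stop_rabbitmq` could pick between them, assuming a Homebrew install on macOS and a systemd unit named rabbitmq-server on Linux (the name used by the official Linux packages); on Windows, use `rabbitmq-service.bat stop` as described above.

```shell
#!/bin/sh
# Sketch: stop a local RabbitMQ node using whichever service manager
# installed it. Assumes Homebrew on macOS and a systemd unit named
# rabbitmq-server on Linux; otherwise falls back to rabbitmqctl.
stop_rabbitmq() {
  if [ "$(uname)" = Darwin ] && command -v brew >/dev/null 2>&1; then
    brew services stop rabbitmq
  elif command -v systemctl >/dev/null 2>&1; then
    sudo systemctl stop rabbitmq-server
  else
    # No service manager found: ask the node itself to stop.
    sudo rabbitmqctl stop
  fi
}
```

If the node was started by hand with `rabbitmq-server -detached` rather than as a service, the service manager does not know about it, and `rabbitmqctl stop` (run as the user that owns the node) is the command to use.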

"node with name "rabbit" already running", but also "unable to connect to node 'rabbit'"

Rabbitmq server does not start, saying it's already running:
$: rabbitmq-server
Activating RabbitMQ plugins ...
0 plugins activated:
node with name "rabbit" already running on "android-d1af002161676bee"
diagnostics:
- nodes and their ports on android-d1af002161676bee: [{rabbit,52176},
{rabbitmqprelaunch2254,
59205}]
- current node: 'rabbitmqprelaunch2254@android-d1af002161676bee'
- current node home dir: /Users/Jordan
- current node cookie hash: ZSx3slRJURGK/nHXDTBRqQ==
But, rabbitmqctl seems to think otherwise:
rabbitmqctl -n rabbit status
Status of node 'rabbit@android-d1af002161676bee' ...
Error: unable to connect to node 'rabbit@android-d1af002161676bee': nodedown
diagnostics:
- nodes and their ports on android-d1af002161676bee: [{rabbit,52176},
{rabbitmqctl2462,59256}]
- current node: 'rabbitmqctl2462@android-d1af002161676bee'
- current node home dir: /Users/Jordan
- current node cookie hash: ZSx3slRJURGK/nHXDTBRqQ==
Any takers?
The rabbitmq server was running somewhere but it just couldn't be connected to.
One of the following will mention something about rabbits:
$: ps aux | grep epmd
$: ps aux | grep erl
Kill the process with kill -9 {pid of rabbitmq process}
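The "nodes and their ports" line in the diagnostics reflects what is registered with epmd, the Erlang port mapper, so `epmd -names` shows the same registrations directly. A small sketch (the helper name `registered_nodes` is hypothetical) reduces that output to node names and ports, so you can confirm a stale `rabbit` entry before looking for its beam process:

```shell
#!/bin/sh
# Sketch: list the Erlang nodes registered with epmd.
# epmd -names prints lines such as "name rabbit at port 52176".
registered_nodes() {
  epmd -names | awk '$1 == "name" { print $2, $5 }'
}

# Usage:
#   registered_nodes            # one "name port" pair per line
#   ps aux | grep '[b]eam'      # the Erlang VM process(es) hosting those nodes
```

The `[b]eam` pattern keeps grep from matching its own command line, which is what the `grep -v grep` in the answers below also achieves.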
I was having the same problem, then I realized I was not issuing the right command:
./rabbitmqctl stop
This works every time, although it also takes down the Erlang runtime. Also, mind where your config file is.
I used rabbitmqctl stop and then restarted using rabbitmq-server as root.
This error can have two causes:
RabbitMQ is already running on the server. If that is the case, use the answer you found and kill the currently running process (ps aux | grep rabbit | grep -v grep).
You have changed the IP address of your machine but not updated the /etc/hosts file to reflect the new address.
The first cause is more common, but the second is harder to find, especially if RabbitMQ is also running on the machine that now has your old IP address: your node will look up the old address, see another machine already running RabbitMQ there, and give you the same error. This has caused me grief in the past.
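To check for the second cause, compare the address that this machine's short hostname resolves to (by default the part after @ in the node name) with the machine's current IP. A sketch, assuming Linux with `getent` available; the helper name `check_hostname_resolution` is hypothetical:

```shell
#!/bin/sh
# Sketch: show which address this host's short name resolves to.
# RabbitMQ node names use the short hostname by default (rabbit@<host>),
# so a stale /etc/hosts entry can point the CLI at another machine.
# Assumes Linux with getent available.
check_hostname_resolution() {
  local short addr
  short=$(hostname -s)
  addr=$(getent hosts "$short" | awk '{ print $1; exit }')
  if [ -z "$addr" ]; then
    echo "$short does not resolve; check /etc/hosts" >&2
    return 1
  fi
  # Compare this with the machine's current IP (e.g. from ip addr).
  echo "$short resolves to $addr"
}
```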
I was having this same error on Win 7, but the solutions above did not work for me; what solved it was removing and reinstalling the service. Using a console with admin rights:
rabbitmq-service remove
rabbitmq-service install
I hope this might help someone else too
cd RabbitMQ Server\rabbitmq_server-3.7.8\sbin
rabbitmq-service remove
rabbitmq-service install
Go to Windows Services,
find RabbitMQ, and start it.
After this, enable the plugin:
rabbitmq-plugins enable rabbitmq_management
In my case, under Ubuntu 11.10, it helped to run:
#rabbitmqctl cluster MASTER SLAVE
#rabbitmqctl start_app
Before that, I always got this error message.
Using an admin console on Windows Server 2012 R2 with RabbitMQ 3.5.5, I got it to work by running remove and install, then rabbitmq-server restart,
then Ctrl-C to terminate the job. After that I was able to use the Windows Services console to start the RabbitMQ service.
In my case (Windows):
1. I just stopped the service.
2. Then I started the service.