Zookeeper: It is probably not running

I am trying to start ZooKeeper on a remote virtual machine. I use this setup regularly for my project and normally have no problems starting ZooKeeper, but lately I get an error when starting the server.
When I run ./zkServer.sh start it reports that the ZooKeeper server has started.
When I check the status with ./zkServer.sh status it shows "Error contacting service. It is probably not running."
I am working with 5 virtual machines in total. All of these machines were fine initially. The problem started on machine 1, but recently I have had the same problem on all of my virtual machines. Can someone tell me what the issue is and suggest a way to fix it?

Most probably the ZooKeeper server process has exited.
If you are running it on a Linux box, use the standard Linux commands to check, for example:
ps -ef | grep -i zookeeper
jps
Also, try running it in the foreground:
zkServer.sh start-foreground
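If the process is gone, the client port and the server log usually reveal why. A minimal sketch, assuming the default client port 2181 and that zookeeper.out is written to the directory zkServer.sh was started from:
# a healthy server answers "imok" / prints its stats
echo ruok | nc localhost 2181
echo stat | nc localhost 2181
# inspect the server log for the reason the process exited
tail -n 100 zookeeper.out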

In my case the issue was a $PATH problem.
You will see what the actual issue is by running ZooKeeper in the foreground:
zkServer.sh start-foreground

I encountered the same problem, too. In my case the ZooKeeper server configuration was not the same on every node, so ZooKeeper could not form a quorum and the mismatched nodes could not join the cluster.
Make sure the server definitions are identical on every node.
For example, all nodes must carry the same server definitions, as below (see the sketch after the list for the accompanying myid setup):
server.0=ip0:2888:3888
server.1=ip1:2888:3888
server.2=ip2:2888:3888
server.3=ip3:2888:3888
server.4=ip4:2888:3888
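Besides identical server lines, each node also needs a myid file in its dataDir whose number matches its own server.N entry, and the rest of zoo.cfg should be consistent across nodes. A minimal sketch (the dataDir path and values are examples; adjust them to your installation):
# zoo.cfg (identical on every node)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.0=ip0:2888:3888
server.1=ip1:2888:3888
server.2=ip2:2888:3888
server.3=ip3:2888:3888
server.4=ip4:2888:3888

# on node 0 only; node 1 gets "1", node 2 gets "2", and so on
echo 0 > /var/lib/zookeeper/myid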

In my case the issue was that the clientPort attribute's value was somehow missing on one of the boxes, so the console reported an invalid config path. Investigating with zkServer.sh start-foreground revealed the root cause.


Meld error when setting up a new cluster

I am evaluating DataStax OpsCenter on a virtual machine to start managing/monitoring Cassandra. I am following the online docs to create cluster topology models via OpsCenter LCM, but the error message doesn't provide much information for me to continue. The job status is:
error- MeldError, 400 Client Error: Bad Request for url: http://[ip_address]:8888/api/v1/lcm/internal/nodes/6185c776-9034-45b4-a54f-6eb9511274a2/package_information
Meld failed on name="testnode1" ssh-management-address=[ip_address]" node-id="6185c776-9034-45b4-a54f-6eb9511274a2" node-name="testnode1" job-id="1b792c69-bcca-489f-ad12-a6285ba84d59" stdout=" Meld has started... " stderr=""
My question is: what might be wrong, and is there any hint on how to resolve it?
I am new to the Cassandra and DataStax communities, so please forgive me if this is a silly question!
Q: I used to be a Buildbot user, and the DataStax agent looks like a Buildbot slave. Why don't we need to set up the agent on the remote machine for it to work with OpsCenter? Is the agent's working directory configured in OpsCenter?
The opscenterd.log, https://pastebin.com/TJsvmr6t
According to the tool compatibility matrix at https://docs.datastax.com/en/landing_page/doc/landing_page/compatibility.html#compatibilityDocument__opsc-compatibility , I ended up using OpsCenter v5.2 for monitoring and basic DB operations. After some trial and error with the agent's .yaml and Cassandra 2.2's .conf files, the Dashboard works!
Knowledge gained:
OpsCenter 5.2 actually works with Cassandra 2.2, even though that combination is not listed in the compatibility table.
For a beginner who is not sure where to start, try installing all the components on one machine to get an idea of the minimal viable working setup, and from there configure the actual dev/test/production environment.
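If it helps, the piece that needed the most manual attention for me was the agent's address.yaml on each Cassandra node. A rough sketch (the key names and file location can vary between agent versions, and the IPs are placeholders from my test setup):
# address.yaml on each Cassandra node
stomp_interface: 10.0.0.10    # IP of the machine running opscenterd
local_interface: 10.0.0.11    # IP this node is reachable on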

Unable to connect to local RabbitMQ on Windows 10

I've installed RabbitMQ (the latest version downloadable from the RabbitMQ website) on my Windows 10 machine. It installed with Erlang 19.1.
I'm trying to install RabbitMQ Web UI Management Tools using the following command (using RabbitMQ Command Prompt):
rabbitmq-plugins enable rabbitmq_management
I'm getting the following error:
The directory name is invalid.
The filename, directory name, or volume label syntax is incorrect.
The filename, directory name, or volume label syntax is incorrect.
Plugin configuration unchanged.
Applying plugin configuration to rabbit@[0x7FF9A8527044]... failed.
* Could not contact node rabbit@[0x7FF9A8527044].
Changes will take effect at broker restart.
* Options: --online - fail if broker cannot be contacted.
--offline - do not try to contact broker.
I've looked this up on SO and tried stopping and restarting, and overriding the Erlang cookie, but nothing helps.
I think there's a problem with RabbitMQ itself. The service is marked as started, but if I try to telnet to the default port (5672) it fails (it's not a firewall issue; I've disabled it).
Also, I don't see any log files created for RabbitMQ or any related Event Log messages, so it's hard to diagnose the problem exactly.
I also tried uninstalling and reinstalling both Erlang and RabbitMQ. That still didn't help.
How do I diagnose the problem further?
Found a solution to the problem (downgrading Erlang did not work in my case, but I left it on Erlang 18 anyway in case there were other issues with version 19).
What caught my eye was this line: Applying plugin configuration to rabbit@[0x7FF9A8527044]... failed. It seemed to be trying to connect to a rabbit instance with the wrong machine name.
I then ran rabbitmqctl.bat status, which failed but again showed that it was trying to connect to [0x7FF9A8527044] while the node name was rabbit@my-machine-name. So I started reading the configuration section on the RabbitMQ website, and the solution was simple: set the node name manually.
All I had to do was add an environment variable named RABBITMQ_NODENAME with the value rabbit@localhost. And that's it. Problem solved!
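For reference, a sketch of those steps from an elevated RabbitMQ Command Prompt (rabbit@localhost is simply the value that worked for me; the Windows service has to be reinstalled so it picks up the new node name):
setx RABBITMQ_NODENAME "rabbit@localhost" /M
REM open a new console so the variable is visible, then reinstall the service
rabbitmq-service.bat remove
rabbitmq-service.bat install
rabbitmq-service.bat start
REM verify the node is reachable under the new name
rabbitmqctl.bat status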
You may be running into issues with Erlang 19 incompatibility; there has been some history of Erlang 19 support problems with RMQ. Try installing Erlang 18 instead.
If that fails, I would recommend using Docker for Windows and installing/running RabbitMQ in that. I've moved all my services like RabbitMQ, MongoDB, etc. into Docker containers, and it has made my life as a dev so much simpler.
In my case I had to trash the local account config located at %APPDATA%\RabbitMQ\.
Deleting the entire folder and reinstalling the service did the trick.
RabbitMQ 3.6.14
Erlang 20.1 OTP
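In case it helps anyone, a sketch of those steps from an elevated RabbitMQ Command Prompt (this wipes local node state such as users, vhosts and queue data, so only do it on a box you can afford to reset):
REM stop and remove the Windows service first
rabbitmq-service.bat stop
rabbitmq-service.bat remove
REM delete the per-user node state, then reinstall the service
rmdir /S /Q "%APPDATA%\RabbitMQ"
rabbitmq-service.bat install
rabbitmq-service.bat start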

Mule-EE windows service not starting

I was trying to start the Mule MMC and mule-ee using the start launcher provided in the Mule MMC distribution package. When I ran start.bat it said that the Mule instance was already running.
When I checked the Windows processes I could not see any related Java process running, and in Services I could see mule-ee in the stopped state. When I try to start the mule-ee service it says that the file path cannot be identified.
I even tried to remove the service, which also was not possible.
Could you please help me find what the issue might be?
Regards
Arun
I have resolved the issue temporarily by deleting the mule-ee service.
Below is the command used:
sc delete mule-ee
Regards
Arun
The same issue occurred for me as well. I found the port the Mule instance was running on, killed that process, and ran start.bat again. I am not sure that mule-ee itself is the problem.
My guess is that you did not stop your Mule and MMC instances before shutting down the PC.
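For what it's worth, a sketch of how to find and kill such a process on Windows (port 7777 below is only a placeholder; substitute whatever port your Mule/MMC instance actually uses):
REM find the PID listening on the port
netstat -ano | findstr :7777
REM kill that PID (replace 1234 with the PID reported above)
taskkill /PID 1234 /F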

Enterprise Jenkins HA plugin not working as it should

I've been trying to set up Enterprise Jenkins with the High Availability setup. The current setup consists of two Jenkins masters sharing the same Jenkins home, say master1 and master2, and an installation of the jenkins-ha-monitor-1.1-1.1 rpm on both of these masters, say monitor1 and monitor2. With this setup, according to the documentation at least, the HA plugin should work as expected. The promotion and demotion scripts are similar to the ones in the documentation (only the IP and interface are different, same approach), i.e.
For demotion
ifconfig eth0:2 down
For promotion
ifconfig eth0:2 the.floating.ip
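For reference, a slightly fuller version of what those scripts contain (the interface alias, IP and netmask below are placeholders rather than my real values):
# promotion: bring the alias interface up with the floating IP
ifconfig eth0:2 192.168.1.100 netmask 255.255.255.0 up
# demotion: take the alias interface down again
ifconfig eth0:2 down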
Now, for the nodes to get registered correctly I have to start master1, master2, monitor1 and monitor2 in that order. Tailing the logs on both, I see that when the services are started in that order they are registered correctly by both monitor services as nodes in a cluster, and in the HA status GUI in the Jenkins console.
Now, when master1 is killed by sending it a KILL signal, monitor2 recognizes this and runs the promotion script. But monitor1 keeps throwing:
Oct 24, 2012 3:47:36 PM com.cloudbees.jenkins.ha.singleton.HASingleton$3 suspect
INFO: Suspecting a node failure in a cluster: jenkins-master-1-285
Oct 24, 2012 3:47:39 PM com.cloudbees.jenkins.ha.singleton.HASingleton$3 suspect
INFO: Suspecting a node failure in a cluster: jenkins-master-1-285
continuously, without ever running the demotion script. Now, since master2 has taken over the floating IP via its promotion script, and master1 still has that IP because the demotion script was never run, the setup ends up with two boxes claiming the same IP. Moreover, restarting master1 does not do anything: master1 does not get added to the cluster as a secondary node, monitor1 keeps spitting the above messages into the log, the floating IP keeps returning "Unable to connect", and master2 and monitor2 show the cluster as master2, monitor2 and monitor1. So my question/problem is twofold: why isn't master1 accepted back into the cluster, and why isn't the demotion script run as it should be?
Also, FYI, I have tried doing a
service jenkins stop
and in that case the demotion script runs, but there are again similar issues when
service jenkins start
is run on the master that was stopped earlier, since the promotion script is run regardless of whether a primary Jenkins exists. And in this case the two monitors register different clusters, like so: monitor1: master1, monitor1 and monitor2: master2, monitor2.
Running ifconfig shows that both masters have taken up the floating IP at this point.
Any help is appreciated! Thanks!
Still under investigation with support. The originally reported problem (here) suggests that the two nodes are communicating fine, but promotions/demotions are not run correctly—either a bug in JGroups or in its usage in Jenkins high availability.
But further tests turned up problems with UDP multicast communication, which has been reported for RedHat/CentOS hosts. Work is underway to offer an alternate JGroups stack which does not rely on multicast (or UDP) at all, using the shared $JENKINS_HOME directory to register Jenkins and monitor instances (as TCP address:port records).

How to fix the Zookeeper error for Hbase

The main OS is Windows 7 64-bit. I used VM Player to create two CentOS 5.6 VMs; the network connection is bridged. I installed HBase on both CentOS systems, one as master and the other as slave. I then entered the shell and ran status 'details'.
The error from the master is:
zookeeper.ZKConfig: no valid quorum servers found in zoo.cfg
ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: An error is preventing HBase from connecting to ZooKeeper
And the error from the slave is:
ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
Please give me some suggestions.
Thanks a lot.
Check if these lines are in your .bashrc; if not, add them and restart all HBase services (do not forget to start them manually as well). That did it for me with a pseudo-distributed installation. My problem (and maybe yours as well) was that HBase wasn't detecting its configuration.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HBASE_CONF_DIR=/etc/hbase/conf
I see this very often on my machine. I don't have a failsafe cure, but I end up running stop-all.sh and deleting every place that Hadoop and the DFS (it's a DFS failure) store their temp files. It seems to happen after my computer goes to sleep while the DFS is running.
I am going to experiment with single-user mode to avoid this. I don't need distribution while developing.
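A sketch of that cleanup, assuming the default temp location under /tmp (the paths are examples; check hadoop.tmp.dir and dfs.data.dir in your configuration, and note that reformatting the NameNode wipes HDFS data):
# stop all Hadoop daemons first
stop-all.sh
# remove the temp/data directories Hadoop and the DFS use (default location shown)
rm -rf /tmp/hadoop-${USER}
# the NameNode must be reformatted after its metadata directory is deleted
hadoop namenode -format
start-all.sh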