Datastax OpsCenter not showing nodes - datastax

I installed DataStax Enterprise on my Windows 7 system, but the OpsCenter dashboard does not display any nodes. (I had to reinstall DataStax because of an issue with the previous installation.)
I can see the node details on the command line using the nodetool command, but no node appears in the OpsCenter dashboard.
I think the OpsCenter agent is failing to connect to the node.
Please help.
Thanks,
Subhra

The agent might not be running on your system. On Linux it lives in /usr/share/datastax-agent/bin; run 'install_agent' from there.
Also check that the ports OpsCenter uses are not blocked.

Follow the procedure below:
1) Check that datastax-agent is installed on the nodes and that its service is running.
2) Check that the ports used by datastax-agent are open.
http://docs.datastax.com/en/archived/opscenter/5.1/opsc/reference/opscPorts_r.html
3) Delete the previous cluster configuration in OpsCenter, then reconfigure your existing cluster details.
4) If the issue still exists, check the OpsCenter log file (opscenterd.log).
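For step 2, here is a minimal sketch of a port check in Python. The port numbers are assumptions based on the commonly documented OpsCenter defaults (8888 for the web UI, 61620 for opscenterd, 61621 for the agent); confirm them against the ports page linked above. The host address is a placeholder.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each named port to True/False depending on reachability."""
    return {name: port_open(host, port, timeout) for name, port in ports.items()}


# Assumed default ports; verify them for your OpsCenter version.
OPSCENTER_PORTS = {
    "opscenter web UI": 8888,
    "opscenterd (agent traffic)": 61620,
    "datastax-agent": 61621,
}

if __name__ == "__main__":
    # 127.0.0.1 is a placeholder; use the address of the node you are checking.
    for name, ok in check_ports("127.0.0.1", OPSCENTER_PORTS).items():
        print(f"{name}: {'open' if ok else 'closed or blocked'}")
```

A port reported as closed only means nothing accepted the connection from where the script ran. It could be a firewall, or simply a service that is not started.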

Related

ambari cluster + poor connection between ambari-agent to ambari server

We have an Ambari cluster with 872 data-node machines, running Ambari 2.6.x.
We currently have some network problems.
After a long investigation, we found that the Ambari agent on some machines does not communicate well with the Ambari server.
As a result we see strange behavior, such as 5 dead data nodes in the Ambari dashboard, even though the data-node machines are definitely healthy.
Is it possible to set a more tolerant value in the Ambari agent configuration, so that the acknowledgment between the Ambari agent and the Ambari server is allowed to take a little longer and the network problems are tolerated?
Something like a timeout or connection time between the Ambari agent and the Ambari server.
First of all, you need to find the root cause of why the DataNode is showing as dead.
The Ambari agent runs on every node. It is responsible for sending metrics and heartbeats to the Ambari server, which then publishes them to your Ambari web UI.
The NameNode waits for about 10 minutes before it declares a DataNode dead and copies its blocks to other DataNodes.
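As a rough illustration of where the "about 10 minutes" comes from: HDFS derives the dead-node timeout from two settings, dfs.namenode.heartbeat.recheck-interval and dfs.heartbeat.interval. The sketch below assumes the usual defaults (300000 ms and 3 s); check hdfs-site.xml on your cluster for the actual values. This is the HDFS-side timeout, separate from the Ambari agent's own heartbeat to the Ambari server.

```python
def datanode_dead_timeout_seconds(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Seconds without a heartbeat before the NameNode marks a DataNode dead.

    Formula: 2 * dfs.namenode.heartbeat.recheck-interval
             + 10 * dfs.heartbeat.interval
    The default arguments are the assumed HDFS defaults.
    """
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s


timeout = datanode_dead_timeout_seconds()
print(f"{timeout} s = {timeout / 60} min")  # 630.0 s = 10.5 min
```

Raising either setting makes the NameNode more tolerant of short network outages, at the cost of reacting more slowly to a node that is really down.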
If it shows that a DataNode is dead, check the Ambari agent status on that specific node by running service ambari-agent status. In parallel, check ambari-agent.log on the worker node to find out why the Ambari agent stopped working.
You can configure the HTTP timeouts for service tasks in the Ambari agent configuration:
https://github.com/apache/ambari/blob/trunk/ambari-agent/conf/unix/ambari-agent.ini
There is an HTTP timeout section that you can configure based on your network throughput.
On an agent host, the file should be at /etc/ambari-agent/conf/ambari-agent.ini (ambari.properties is the Ambari server's configuration file).

DataStax - OpsCenter LifeCycle Manager (Failed to Start LSB: DataStax Enterprise)

I am currently trying to create a cluster using LCM, and one of my nodes fails with the error "Failed to start LSB: DataStax Enterprise". I have attempted to edit certain files to solve the issue, but to no avail.
I set up OpsCenter via tarball and configured a Cassandra DC along with graph and search in LCM. When I run the install for the cluster, the job fails at the "/usr/bin/systemctl restart dse" command with the error "Failed to start LSB: DataStax Enterprise". Any advice on fixing this issue is greatly appreciated.

Meld error when setting up a new cluster

I am evaluating DataStax OpsCenter on a virtual machine to start managing and monitoring Cassandra. I am following the online docs to create cluster topology models via OpsCenter LCM, but the error message doesn't give me enough information to continue. The job status is:
error- MeldError, 400 Client Error: Bad Request for url: http://[ip_address]:8888/api/v1/lcm/internal/nodes/6185c776-9034-45b4-a54f-6eb9511274a2/package_information
Meld failed on name="testnode1" ssh-management-address=[ip_address]" node-id="6185c776-9034-45b4-a54f-6eb9511274a2" node-name="testnode1" job-id="1b792c69-bcca-489f-ad12-a6285ba84d59" stdout=" Meld has started... " stderr=""
My question is: what might be wrong, and are there any hints on how to resolve it?
I am new to the Cassandra and DataStax communities, so please forgive me if this is a silly question!
Q: I used to be a Buildbot user, and the DataStax agent looks like a Buildbot slave. Why don't we need to set up the agent on the remote machine to work with OpsCenter? Is the agent's working directory configured in OpsCenter?
The opscenterd.log, https://pastebin.com/TJsvmr6t
Following the tool compatibility table at https://docs.datastax.com/en/landing_page/doc/landing_page/compatibility.html#compatibilityDocument__opsc-compatibility , I ended up using OpsCenter v5.2 for monitoring and basic DB operations. After some trial and error with the agent's .yaml and the .conf of Cassandra 2.2, the dashboard works!
Knowledge gained:
OpsCenter 5.2 actually works with Cassandra 2.2, which is not listed in the compatibility table.
For beginners: if you are not sure where to start, try installing all the components on one machine to get an idea of the minimum viable working setup. From there, configure the actual dev/test/production environment.

Unable to connect to local RabbitMQ on Windows 10

I've installed RabbitMQ (the latest version downloadable from the RabbitMQ website) on my Windows 10 machine. It was installed with Erlang 19.1.
I'm trying to install the RabbitMQ web UI management tools using the following command (from the RabbitMQ Command Prompt):
rabbitmq-plugins enable rabbitmq_management
I'm getting the following error:
The directory name is invalid.
The filename, directory name, or volume label syntax is incorrect.
The filename, directory name, or volume label syntax is incorrect.
Plugin configuration unchanged.
Applying plugin configuration to rabbit@[0x7FF9A8527044]... failed.
* Could not contact node rabbit@[0x7FF9A8527044].
Changes will take effect at broker restart.
* Options: --online - fail if broker cannot be contacted.
--offline - do not try to contact broker.
I've looked up on SO and tried stopping and restarting, overriding erlang cookie, but nothing helps.
I think there's a problem with RabbitMQ itself. The service is marked as started, but if I try to telnet to the default port (5672), it fails (it's not a firewall issue; I've disabled the firewall).
Also, I don't see any log files created for RabbitMQ or any related Event Log messages, so it's hard to diagnose the exact problem.
I also tried uninstalling and re-install both erlang and RabbitMQ. Still didn't help.
How do I further diagnose the problem?
Found a solution to the problem. (Downgrading Erlang did not work in my case, but I left it on Erlang 18 in case there were other issues with version 19.)
What caught my eye was this line: Applying plugin configuration to rabbit@[0x7FF9A8527044]... failed. It seems to be trying to connect to a RabbitMQ instance under the wrong machine name.
I then ran rabbitmqctl.bat status, which failed but again showed that it was trying to connect to [0x7FF9A8527044], while the node name was rabbit@my-machine-name. So I started reading the configuration section on the RabbitMQ website, and the solution was simple: set the node name manually.
All I had to do was add an environment variable named RABBITMQ_NODENAME with the value rabbit@localhost. And that's it. Problem solved!
You may be running into an Erlang 19 incompatibility. There has been some history of Erlang 19 support problems with RabbitMQ. Try installing Erlang 18 instead.
If that fails, I would recommend using Docker for Windows and running RabbitMQ in that. I've moved all my services like RabbitMQ, MongoDB, etc. into Docker containers, and it has made my life as a dev so much simpler.
In my case I had to delete the local account config located at %APPDATA%\RabbitMQ\.
Deleting the entire folder and reinstalling the service did the trick.
RabbitMQ 3.6.14
Erlang 20.1 OTP

Zookeeper: It is probably not running

I am trying to start ZooKeeper on a remote virtual machine. I use it for my project regularly and have not had any problems starting ZooKeeper before. But lately, when I try to start the server, I get an error.
When I run ./zkServer.sh start, it shows that the ZooKeeper server started.
When I check the status using ./zkServer.sh status, it shows "Error contacting service. It is probably not running."
I am working with 5 virtual machines in total. All of these machines were fine initially. I first started getting problems with machine 1, but recently I have had the same problem on all of my virtual machines. Can someone tell me what the issue is and suggest a way to fix it?
Most probably the ZooKeeper server exited.
If you are running it on a Linux box, use the usual Linux commands to check whether the process is alive. Some of them:
ps -ef | grep -i zookeeper
jps
etc.
Also, try running it in the foreground:
zkServer.sh start-foreground
In my case it was a $PATH issue...
You will see what the issue is by running ZooKeeper in the foreground:
zkServer.sh start-foreground
I encountered the same problem, too. In my case the ZooKeeper server list was not configured identically on every node, so ZooKeeper could not form a quorum and the affected nodes could not join the cluster.
Make sure the server definitions are the same on each node.
For example, on all nodes the server definitions must be the same, as below:
server.0=ip0:2888:3888
server.1=ip1:2888:3888
server.2=ip2:2888:3888
server.3=ip3:2888:3888
server.4=ip4:2888:3888
In my case the issue was that the clientPort attribute's value was somehow missing on one of the boxes, so the console showed an invalid config path. I investigated with the command 'zkServer.sh start-foreground' and found the root cause.
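The two configuration problems mentioned in these answers (server definitions that differ between nodes, and a missing clientPort) can be checked mechanically. Below is a minimal sketch in Python, assuming you have already collected the text of zoo.cfg from each node; the node names and the find_problems helper are hypothetical. It assumes a static configuration where clientPort is a separate key, which may not hold for setups that use dynamic reconfiguration.

```python
def parse_zoo_cfg(text):
    """Parse key=value lines from zoo.cfg text, skipping comments and blanks."""
    cfg = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip()
    return cfg


def server_entries(cfg):
    """Return only the server.N definitions from a parsed config."""
    return {k: v for k, v in cfg.items() if k.startswith("server.")}


def find_problems(configs):
    """configs maps a node name to its zoo.cfg text; returns a list of findings."""
    parsed = {node: parse_zoo_cfg(text) for node, text in configs.items()}
    problems = []
    for node, cfg in parsed.items():
        if "clientPort" not in cfg:
            problems.append(f"{node}: clientPort is missing")
    nodes = list(parsed)
    if nodes:
        # Compare every node's server list against the first node's list.
        reference = server_entries(parsed[nodes[0]])
        for node in nodes[1:]:
            if server_entries(parsed[node]) != reference:
                problems.append(f"{node}: server.N entries differ from {nodes[0]}")
    return problems


good = "clientPort=2181\nserver.0=ip0:2888:3888\nserver.1=ip1:2888:3888\n"
bad = "# clientPort lost\nserver.0=ip0:2888:3888\nserver.1=ipX:2888:3888\n"
print(find_problems({"vm1": good, "vm2": bad}))
```

An empty list means the collected files agree with each other. It does not prove that the addresses are reachable or that the myid files match.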