Datastax Agent (Cassandra) Opscenter shutdown issue - datastax

After the datastax-agent starts, it shuts down immediately.
I'm getting this in the log:
DataStax agent ran out of memory! Shutting down!

Edit /etc/datastax-agent/datastax-agent-env.sh on the node and raise the maximum heap size.
By default it is 128 MB, but 256 MB (-Xmx256M) solved a similar problem for me. Note: that's the file path on Debian.
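
As a rough sketch of that change, the snippet below rewrites the -Xmx value in the text of an env file. The helper name set_max_heap and the sample JVM_OPTS line are assumptions made for illustration; the real contents of datastax-agent-env.sh may differ between agent versions, so check the file before editing it.

```python
import re

def set_max_heap(env_text: str, size: str = "256M") -> str:
    """Return env_text with every -Xmx<value> flag replaced by -Xmx<size>."""
    # -Xmx is followed by digits and an optional unit suffix (K, M or G).
    return re.sub(r"-Xmx\d+[kKmMgG]?", f"-Xmx{size}", env_text)

# Hypothetical line; the real file may look different.
sample = 'JVM_OPTS="$JVM_OPTS -Xmx128M"'
print(set_max_heap(sample))
```

The agent has to be restarted afterwards for the new limit to take effect.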

Related

Minikube on Windows and HyperV: Stuck on prompt "minikube login"

I'm "extremely" new to Kubernetes, and I wanted to try it out on my local machine, which is running Windows 10 along with HyperV. I saw that minikube is used for local development, and I was able to find it on Chocolatey, so I installed it using that:
choco install minikube -y
(I think this also installs kubectl)
The problem I have is that I'm not able to start it; I'm running the following command:
minikube start --vm-driver=hyperv
I have an external switch configured in HyperV (I found it as a suggestion somewhere), but when I run the command, it's stuck in Creating VM ...
I thought maybe it would give me a clue if I look at the VM created in HyperV, and when I open that, I see the following:
So, it seems that it's waiting for input, and that's why it's stuck! I tried searching for the problem, but to no avail.
I would appreciate any help
PS: It seems to me that if I wait long enough, the following message appears on the console:
Temporary Error: provisioning: error getting ssh client: Error dialing
tcp via ssh client: ssh: handshake failed: ssh: unable to authenticate,
attempted methods [none publickey], no supported methods remain
So, somehow by chance, I think I found how to resolve the issue.
First, the fact that the VM is displaying that prompt (minikube login) seems to be normal, and it does NOT prevent minikube start from succeeding.
To resolve the issue, this is what I did:
Delete ~/.kube directory
Delete ~/.minikube directory (in case it exists)
The MOST IMPORTANT step: stop/start the Hyper-V Virtual Machine Management Windows service
These steps seem to have solved the issue for me
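
The two directory deletions above can be sketched in Python as follows. The function name remove_minikube_state is an assumption for illustration, and the sketch does not cover the service restart, which still has to be done separately (for example from the Windows Services console).

```python
import shutil
from pathlib import Path

def remove_minikube_state(home: Path) -> list:
    """Delete the .kube and .minikube directories under home, if present.

    Returns the names of the directories that were actually removed.
    """
    removed = []
    for name in (".kube", ".minikube"):
        target = home / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed

# Destructive: this removes the local kubectl config and the minikube cache.
# Typical call, left commented out on purpose:
# remove_minikube_state(Path.home())
```

Taking the home directory as a parameter makes it possible to try the function on a scratch directory before pointing it at the real profile.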
PS: I used this command to start minikube and enable verbose logging:
minikube start --vm-driver hyperv -v 7 --alsologtostderr
Farzad, what resources did you use to set up minikube? Can you please clarify what you mean by "unable to start"? Are the regular kubectl commands working,
for example kubectl get nodes? That is, of course, only if the steps below don't help you.
The screenshot you shared shows a running VM:
Minikube runs a single-node Kubernetes cluster inside a VM on your
laptop for users looking to try out Kubernetes or develop with it
day-to-day.
You mentioned that you've created the vSwitch. You should pass a flag that points minikube at that external vSwitch:
minikube start --vm-driver hyperv --hyperv-virtual-switch "vSwitch name"
You also mentioned choco, did you install kubernetes-cli (as you did not mention it in the question)? It might be the reason why your commands do not work (seems like the new version downloads kubectl with choco install minikube):
kubectl is a command line interface for running commands against
Kubernetes clusters
At this moment I recommend stopping the minikube VM:
minikube stop
Delete the cluster
minikube delete
Sometimes the regular minikube stop and minikube delete do not work, so you might have to manually turn off the minikube VM in Hyper-V. Then I recommend going to c:\users\%username%\ and deleting .kube and .minikube.
Uninstall minikube with cuninst minikube
Restart and install again as specified in the minikube documentation:
choco install minikube
choco install kubernetes-cli
As for the error you mentioned, let's try to run the cluster properly, and if this persists, we will take care of it.
Try this:
kubectl config use-context minikube
I encountered the same issue. The reason was that I chose the wrong disk file to start my VM after creating it in VirtualBox.
This solved my issue.
minikube delete
minikube start --vm-driver hyperv -v 7 --alsologtostderr

yarn not getting nodes

This is in AWS EMR cluster with 2 task nodes and a Master.
I'm trying the hello-samza example, which launches a YARN job. The job gets stuck in the ACCEPTED state. I looked at other posts, and it seems that YARN is not getting any nodes. Any help on why YARN is not seeing the task nodes would be appreciated.
[hadoop@xxx hello-samza]$ deploy/yarn/bin/yarn node -list
17/04/18 23:30:45 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
Total Nodes:0
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
[hadoop@xxx hello-samza]$ deploy/yarn/bin/yarn application -list -appStates ALL
17/04/18 23:26:30 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED]):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1492557889328_0001 wikipedia-parser_1 Samza hadoop default ACCEPTED UNDEFINED 0% N/A
I made a complete answer for a similar case I've been experiencing: have a look at it, it might be this kind of conf issue
It seems like the NodeManagers are not running on either node (either not started at all or exited with an error). Use the jps command to check whether all the daemons associated with YARN are running on the two nodes. Additionally, check both NodeManager logs to see if any exceptions might have killed them.
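
The jps check suggested above could be scripted roughly like this. The function name and the sample jps output are made up for illustration; on a real node you would feed in the actual output of jps.

```python
def missing_yarn_daemons(jps_output: str, expected=("NodeManager",)):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # jps prints "<pid> <main class>"; skip lines without a class name.
        if len(parts) >= 2:
            running.add(parts[1])
    return [name for name in expected if name not in running]

# Hypothetical jps output from a task node where the NodeManager is down.
sample = """4321 DataNode
4400 Jps"""
print(missing_yarn_daemons(sample))
```

On the master, the same function can be called with expected=("ResourceManager",).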

epmd error for host myhost: address (cannot connect to host/port) on windows 10

I am trying to install RabbitMQ. Both Erlang (OTP 18.1) and RabbitMQ installed successfully, but when I try to connect to RabbitMQ, I get the following error:
C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.5.6\sbin>rabbitmq-plugins.bat enable rabbitmq_management
Plugin configuration unchanged.
Applying plugin configuration to rabbit@INLN50899724A... failed.
* Could not contact node rabbit@INLN50899724A.
Changes will take effect at broker restart.
* Options: --online - fail if broker cannot be contacted.
--offline - do not try to contact broker.
C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.5.6\sbin>rabbitmq-server restart
ERROR: epmd error for host INLN50899724A: address (cannot connect to host/port)
I may be replying really late, but I was still facing this issue while installing RabbitMQ 3.6.5, so this may help somebody. To change the node name, open "rabbitmq-env.bat" under "installation dir\sbin" and change RABBITMQ_NODENAME to "rabbit@localhost" (line 90 in RabbitMQ 3.6.5). Make sure you remove the Windows service first, then change the node name, reinstall the service, and start it. This worked for me.
None of the other options marked as the right answer on Stack Overflow worked for me!
Remove the RabbitMQ service. Uninstall RabbitMQ. Kill the epmd.exe process. Delete your c:\users\\AppData\Roaming\RabbitMQ Directory.
Go to Control Panels -> System -> Advanced -> Environment Variables
Add a variable named RABBITMQ_NODENAME and set it to rabbit@localhost
Reinstall RabbitMQ.
Navigate to the RabbitMQ sbin directory (or run the command from the start menu) and run rabbitmqctl status.
You should no longer see the (cannot connect to host/port) error.
And yes, this will fix your Cisco AnyConnect VPN related installation issues.
Open C:\Program Files\RabbitMQ Server\rabbitmq_server-3.7.15\sbin\rabbitmq-server.bat
Add the following as the first line of that file:
set RABBITMQ_NODENAME=rabbit@localhost
For Windows Machine:
Go in C:\Users\<YourUserName>\AppData\Roaming\RabbitMQ
Create a file rabbitmq-env.conf
Add the following:
CONFIG_FILE=C:\Users\<YourUserName>\AppData\Roaming\RabbitMQ\rabbitmq
NODE_IP_ADDRESS=127.0.0.1
NODENAME=rabbit@localhost
The above is my env-config, for this particular issue setting
the nodename will be sufficient.
Turn off your firewall and start RabbitMQ; it will work. After it has run once, it keeps working even if you turn the firewall back on.
This works for me in Windows 10 machine.
In your shell:
$ export RABBITMQ_NODENAME=rabbit@localhost
$ /sbin/rabbitmq-server -detached
Change rabbit@INLN50899724A
to rabbit@localhost and try again.
Or edit your hosts file so that INLN50899724A points to 127.0.0.1
To fix a similar error with RabbitMQ on Windows 10, I ran the following
set RABBITMQ_NODENAME=rabbit@localhost
in the directory where RabbitMQ is installed (for me, C:\Program Files\RabbitMQ Server\rabbitmq_server-3.8.5\sbin)
and then started the server:
.\rabbitmq-server start
I had also edited the hosts file in c:\Windows\System32\Drivers\etc so that my computer name points to the local machine:
127.0.0.1 yourhostnamehere
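
To verify a hosts-file entry like the one above, a small parser along these lines can help. The function name hosts_entry is an assumption for illustration, and the sketch only reads hosts-file text that you pass in; it does not modify anything.

```python
def hosts_entry(hosts_text: str, hostname: str):
    """Return the IP address mapped to hostname in hosts-file text, or None."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Drop comments, then split into "<ip> <name> [<alias> ...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (f.lower() for f in fields[1:]):
            return fields[0]
    return None

# Example with made-up hosts-file contents.
text = "127.0.0.1 localhost\n127.0.0.1 yourhostnamehere\n"
print(hosts_entry(text, "yourhostnamehere"))
```

If the function returns None for your machine name, the epmd error is likely to persist until either the hosts entry or the node name is changed.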

YARN error: TaskAttempt killed because it ran on unusable node ... Container released on a *lost* node

I am using CDH 5.4 with Pig 0.12. I am getting a lot of this error from all nodes:
TaskAttempt killed because it ran on unusable nodename:portnumber Container released on a *lost* node
What does this mean? In particular what does "lost" mean here? It doesn't look like the node is really lost in the cluster. Another question (more important question) is how to resolve this issue. Any help would be appreciated.
This particular case turned out to be a data storage problem. I restarted the datanode manager on the nodes that were reported lost with the message "1/1 local-dirs are bad: /data/hadoop/yarn/local;"
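
One common reason for a local-dir being marked bad is that the disk holding it has filled past the NodeManager's utilization limit, which as far as I know defaults to 90% (yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage). Whether that was the cause here is not stated, so treat the following as a sketch under that assumption; the function names are made up for illustration.

```python
import shutil

def over_disk_threshold(used: int, total: int, max_percent: float = 90.0) -> bool:
    """True if used/total exceeds max_percent (assumed YARN default: 90)."""
    return total > 0 and (used / total) * 100.0 > max_percent

def local_dir_is_too_full(path: str, max_percent: float = 90.0) -> bool:
    """Check the filesystem that holds path against the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return over_disk_threshold(usage.used, usage.total, max_percent)

# Example call on a node, using the directory from the error message:
# local_dir_is_too_full("/data/hadoop/yarn/local")
```

If the check returns True, freeing space on that disk is worth trying before restarting the NodeManager.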

opscenter backup fails while trying to snapshot a non-existing SSTable

I'm running a DSE cluster in AWS: m2.4xlarge instances running Datastax Enterprise 4.6.1, with Cassandra 2.0.12.200 and Opscenter 5.1.0.
When we try to do a backup of a keyspace, we get this:
Snapshot of keyspaces [XXXXXXX] on node XXX.XXX.XXX.XXX failed: javax.management.RuntimeMBeanException: java.lang.RuntimeException: Tried to hard link to file that does not exist /raid0/cassandra/data/XXXXXX/XXXXXX/XXXXXXXXXXXX-jb-1-Index.db
Any ideas?
This is likely the following known issue:
https://issues.apache.org/jira/browse/CASSANDRA-6433
The workaround is a rolling restart; the issue is fixed in Cassandra 2.1. It seems to be triggered when you drop a keyspace and then re-create it.
I had a similar issue with a dropped keyspace and compaction.
Run a "nodetool repair" and check the "system" and "opscenter" keyspaces for traces of the deleted keyspace. You might need to manually remove stale rows.