"nodename nor servname provided" when trying to start a mesos-slave - ssh

I was following this simple guide on installing Mesos locally https://mesosphere.com/2014/07/07/installing-mesos-on-your-mac-with-homebrew/
I was able to start a mesos master and was able to see the master's console perfectly fine at localhost:5050. However when I tried to start a new slave using sudo /usr/local/sbin/mesos-slave --master=127.0.0.1:5050 , it gave me
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0328 16:14:45.329051 2041414416 process.cpp:889] nodename nor servname provided, or not known
* Check failure stack trace: *
Any help will be appreciated, thanks

This can also happen if you working on a Framework and attempting to connect to a Mesos Master. Mesos is attempting to resolve your computer's name via DNS and isn't finding an entry. This can happen if you've changed your Mac's hostname post setup ( as I did, and had this same error ).
To fix, run hostname -f, that will will give you the value of what OS/X thinks it's name is. Then simply ensure you've got 127.0.0.1 VALUE_OF_HOSTNAME_-F in your /etc/hosts file.

Related

AEROSPIKE_ERR_CONNECTION Bad file descriptor, 127.0.0.1:3000; Not able to connect to local node from aql

I have installed aerospike on my mac my following this installation steps
All the validations are working fine. I am able to connect to the cluster using browser chrome. Below is the screen shot.
I have also installed the AQL tools following the instructions here.
But I'm unable to connect to local node from aql.
$ aql
2017-11-21 16:06:09 WARN Failed to connect to seed 127.0.0.1 3000.
AEROSPIKE_ERR_CONNECTION Bad file descriptor, 127.0.0.1:3000
Error -1: Failed to connect
$ asadm
Aerospike Interactive Shell, version 0.1.11
ERROR: Not able to connect any cluster.
Also, I have noticed the Java client is giving error.
AerospikeClient client = new AerospikeClient("localhost", 3000);
when I changed the localhost to actual Ip returned by vagrant ssh -c "ip addr"|grep 'global eth1' it is working fine.
How to connect with aql using customer parameters? I want to pass ip address and port as parameters to aql. Any suggestions.
$ aql --help
https://www.aerospike.com/docs/tools/aql/index.html - discusses all various command line options.
$ aql -h a.b.c.d -p 1234
There is another possibility, you have your owned port instead of the default 3000, so when you try to connect to aerospike, you can try to run command like : aql -p4000
Hope this may help you
Seems like the port is not getting freed even after exiting the vagrant console.
Tried closing all the terminal windows and then starting again. But no luck.
Finally, restarting the system resolved the issue.

Unable to resolve github.com from OpenShift origin pods

I have a basic OpenShift origin cluster started with oc cluster up
Now, in the default 'MyProject' i wanted to build a source from git repo and it's failing with the error
Could not resolve host: github.com; Name or service not known
Even I tried setting up gogs and migrate the public hosted source code on github.com to gogs pod but throwing same error.
Kindly advise if there are any additional network settings required during OpenShift cluster setup in order to access github.com or any other public domains. I can sense it's a network issue but not sure what exactly needs to be configured on the cluster.
I know this is an old ticket, but I came across this issue when looking for a solution for my problem. I had exactly the same problem as described in this issue. For me, the problem lies within the combination between Ubuntu 18.04 and docker. I followed solution B from this answer.
Hopefully this helps someone as I've lost a lot of time trying to resolve this issue by looking for the problem as if it was a problem from openshift/okd while the actual cause lies within the combination between docker and ubuntu (at least for me).
You can edit the config Map of Node in master server ( In order to provide proper information of your nameserver to the pods.)
# oc get cm -n openshift-node
for all compute nodes edit the config map by below command.( Only need to perform in master server)
# oc edit cm node-config-compute -n openshift-node
......
dnsBindAddress: 127.0.0.1:53
dnsDomain: cluster.local
dnsIP: 10.0.80.11
dnsNameservers: null
dnsRecursiveResolvConf: /etc/origin/node/resolv.conf
.......
Edit dnsIP section with your DNS IP. Then restart the service
# systemctl restart atomic-openshift-node.service
The DNS ip will be prepended in all /etc/resolv.conf file of Pods.
Click for detail info
Shutdown the cluster with: oc cluster down
Edit the file: openshift.local.clusterup/node/node-config.yml and set dnsIP: "" to 8.8.8.8
Edit the file openshift.local.clusterup/kubedns/resolv.conf
and add
nameserver 8.8.8.8
nameserver 8.8.4.4
Also make sure you have the DNS options inside the docker config file
Edit /etc/docker/daemon.json and add
"dns": ["8.8.8.8", "8.8.4.4"]
Then start your cluster with
oc cluster up
and now it should work fine.

docker-machine create --driver generic kills ssh on google compute engine

Hi I am still learning docker's wonderful magical world. I use docker on linux with docker-machine I already added 2 already existing Linux servers with the docker-machine create and successfully run my containers on them. Now I try to do the same with an already existing google compute engine based machine which has Linux too. I use the command:
docker-machine create --driver generic --generic-ip-address ipaddress --generic- ssh-key path_To_Key --generic-ssh-user user_Name machine_Name
And I get an error:
Error creating machine: Error checking the host: Error checking and/or
regenerating the certs: There was an error validating certificates for
host "X.X.X.X:2376": dial tcp X.X.X.X:2376: i/o timeout You can
attempt to regenerate them using 'docker-machine regenerate-certs
[name]'.
Then the docker-machine does not know it's ip But I seems to give it a command trought docker-machine ssh
Altough I am not able to log in with ssh anywhere else and I must stop/remove the created machine and restart it.
Anyone has a similar problem?
According to generic driver's page at docker docs, try to edit --generic-ip-address=ip_address with equal sign.

rabbtimqadmin - Could not connect: [Errno -2] Name or service not known

I have RabbitMQ installed on a CentOS 5.x server which I use for message passing between my programs. I've installed rabbitmqadmin following the directions on https://www.rabbitmq.com/management-cli.html and have used it on my servers in the past.
From what I can tell it looks like this particular server is misconfigured. My web-searches have failed me on trying to get more information on how to troubleshoot this issue.
The error:
[root#server ~]# python26 /usr/local/bin/rabbitmqadmin list nodes
*** Could not connect: [Errno -2] Name or service not known
[root#server ~]#
I have tried several different rabbitmqadmin commands and they give the same result. If I run the command without the extra params it displays the normal help dialog. I have this setup and working on several other servers.
Any idea on what the root issue is? If not, anyway to get more details, like verbose?
Update:
I just tried to check the version of rabbitmq and its yielding an error too:
[root#server ~]# rabbitmqctl status
Status of node rabbit#server ...
Error: unable to connect to node rabbit#server: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit#server]
rabbit#server:
* connected to epmd (port 4369) on server
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* suggestion: hostname mismatch?
* suggestion: is the cookie set correctly?
current node details:
- node name: rabbitmqctl25451#server
- home dir: /var/lib/rabbitmq
- cookie hash: WXaeZT7XXm13naagfRX5cg==
[root#server ~]#
I'm going to see if I can find something from this... I find this weird because the server is passing messages fine and can be monitored through the web console.
Erlang version:
[root#server rabbitmq]# erl -eval 'erlang:display(erlang:system_info(otp_release)), halt().' -noshell
"R14B04"
[root#server rabbitmq]#
Rabbitmq Version:
[root#server rabbitmq]# python26 /usr/local/bin/rabbitmqadmin --version
rabbitmqadmin 3.3.5
[root#server rabbitmq]#
After much digging and frustration, I found my problem... I'm posting the solution in case anyone else has a similar experience
Previously, I found that if you setup RabbitMQ on a linux server then change the hostname that it can break some of the rabbit configuration.
The awesome part about this problem is that someone changed the name of the server from all capital letters to lowercase...
I've solve this one of two ways:
Solution 1:
Revert the host name back to the previous name. So that rabbitmq references with the appended server name work again.
Solution 2:
If you want to keep the server name change, then you can create a rabbitmq-env.conf files in /etc/rabbitmq like:
NODENAME=rabbit#OLDHOSTNAME
If you aren't sure what your previous name was, you can reference it by doing an ls in your /var/lib/rabbitmq/mnesia/ folder. You'll then see a folder that matches the nodename you need to specify.
Reference: https://www.rabbitmq.com/man/rabbitmq-env.conf.5.man.html
UPDATE:
Host name is CaSE SeNSiTIve... had someone change a hostname on me and the only difference was the case... so took a while to notice...
Yesterday I've lost a few hours with this same problem and it was in a fresh install, so the problem was that the erlang cookie from my user and root user was different than the one from rabbitmq user.
Find out the HOME for the user rabbitmq:
# cat /etc/passwd | grep rabbitmq
Check if the cookies differs from each other:
# vimdiff /var/lib/rabbitmq/.erlang.cookie ~/.erlang.cookie
If they are different, copy the cookie from rabbitmq for the user that you want to have access to the server:
# cp /var/lib/rabbitmq/.erlang.cookie ~/.erlang.cookie
References:
rabbitmqctl status says "TCP connection succeeded but Erlang distribution failed"
How Nodes (and CLI tools) Authenticate to Each Other: the Erlang Cookie

MPICH2 on multiple machines (HYDU_sock_connect error)

I am trying to execute an MPI program in 2 different PCs. However, when I ran this command in pc1:
mpirun -hosts user#host -n 4 bin/Demo_01.exe
I'm getting this error:
[proxy:0:0#pc2] HYDU_sock_connect (./utils/sock/sock.c:203): unable to connect from "pc2" to "pc1" (Connection refused)
[proxy:0:0#pc2] main (./pm/pmiserv/pmip.c:209): unable to connect to server ubuntu at port 57395 (check for firewalls!)
Although I configured SSH connections as without password and disabled firewalls on each machines, the error is still there. My operating system is Ubuntu 12.04 and mpi is MPICH2.
Is there anyone to help?
the error is caused by the the client not connecting back to server as it doesnt know the ip of the server i.e
..main (./pm/pmiserv/pmip.c:209): unable to connect to server ubuntu at...etc
the fix is to add each of hostname and related ip in the /etc/hosts i.e
172.17.0.2 master
172.17.0.3 node1
172.17.0.4 node2
this should allow for bi-directional communication of the master and the node clients
I had the same error, but the accepted answer did not help me.
For me in the hosts file I had:
localhost:8
CPUX:2
I should of had:
CPUZ:8
CPUX:2
I.e the name of the node instead of localhost. Maybe this might help some one.
Fixed. After I followed these steps, the error disappeared:
Create administrator user accounts in both machines with the same username and password.
Define hostnames by editing the file: /etc/hosts
Make a clean install of ssh in both machines.
Configure ssh for connecting without a password. To do this follow these links:
http://www.thegeekstuff.com/2008/11/3-steps-to-perform-ssh-login-without-password-using-ssh-keygen-ssh-copy-id/ and http://dustymabe.com/2012/08/18/exchanging-ssh-keys-using-ssh-copy-id/
Locate the executable MPI program into the same paths in both machines.
montekristo_07's answer is mostly correct but not minimal; steps #2 and #3 are not strictly necessary.
You do not need to edit all your hosts' /etc/hosts files, and, if your LAN uses DHCP and you have any local DNS service running, you should not edit all your hosts' /etc/hosts files.
Insure that:
only externally-resolvable hostnames are referenced in your mpiexec command line (i.e. not "localhost"), and
the /etc/hosts file on the master (the machine on which you run mpiexec) does not have a line associating the public name of the master with the loopback address (127.0.0.1)
A simple test is to use literal IP addresses in your mpiexec command line. If this fixes your problem, then it's a hostname resolution problem...somewhere.
What is essential is to remember is that what is passed on your mpiexec command line, in particular host names, are going to be sent to and resolved on remote hosts.