I've been trying to create a Dataproc cluster using the Jupyter initialization action from the dataproc-initialization-actions repository.
But when I try to SSH to the master so I can access the Jupyter interface, running this command:
gcloud compute ssh --zone=zone_name \
--ssh-flag="-D 10000" \
--ssh-flag="-N" \
--ssh-flag="-n" "cluster1-m" &
I get the error:
Permission denied (publickey). ERROR: (gcloud.compute.ssh)
[/usr/bin/ssh] exited with return code [255].
I confirmed that all SSH keys were created normally. I then tried this other option:
gcloud compute ssh --zone=zone_name \
--ssh-flag="-D 10000" \
--ssh-flag="-N" \
--ssh-flag="-n" "will#cluster1-m" &
This seems to work, as I can SSH into the instance, but now I get the error:
bind: Cannot assign requested address
channel_setup_fwd_listener_tcpip: cannot listen to port: 10000 Could
not request local forwarding.
For creating the cluster I used:
gcloud dataproc clusters create $CLUSTER_NAME \
--metadata "JUPYTER_PORT=8124,JUPYTER_CONDA_PACKAGES=numpy:pandas:scikit-learn:jinja2:mock:pytest:pytest-cov" \
--initialization-actions \
gs://dataproc-initialization-actions/jupyter/jupyter.sh \
--bucket $BUCKET_NAME
I'm running this in a Docker image based on Debian 8.9 (jessie).
If you need any extra information please let me know.
If you've verified that you can do a normal SSH into the cluster and you're only getting the bind: Cannot assign requested address error, it most likely means another SSH session on your local machine is already using port 10000 for local port forwarding. When you see the bind error, first try a different local port, such as -D 12345. You can also use top or your task manager to check whether a hanging ssh -D command is still running and occupying port 10000.
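For example, a quick way to check is to see what is holding the local port and then retry the tunnel on a free one (a sketch; the zone, cluster name, and port are the ones used in the question and answer above):

# See which process is already listening on local port 10000, if any
lsof -i :10000
# or: pgrep -af "ssh .*-D 10000"

# Retry the SOCKS proxy on a different local port
gcloud compute ssh --zone=zone_name \
    --ssh-flag="-D 12345" \
    --ssh-flag="-N" \
    --ssh-flag="-n" "cluster1-m" &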
Related
I am having no trouble sshing into a Google Cloud compute engine VM, but am unable to ssh into the master node of a Google Cloud Dataproc cluster.
Specifically,
gcloud compute ssh my-vm
works just fine, while
gcloud compute ssh mycluster-m
fails with error message:
admin@IP.ADDRESS: Permission denied (publickey).
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
The Compute Engine VM and the Dataproc cluster are in the same project. I understand from the error message that it is something related to the SSH keys, but I am not sure how to fix it. I checked the SSH keys in the project via the Cloud Console and they are correct, and I tried the usual gcloud auth login to reset the gcloud project login details.
Any hints on how to fix this?
Edit: I am trying to SSH from my machine, not from the Cloud Console. That's a good point, and I will try that to see if it is possible. But in the end I want to use this to connect to a Jupyter notebook from my local computer, so it does not solve the issue of being unable to SSH from my machine to the VM.
Concerning the command to create the Dataproc cluster: I use tools from the Hail dataproc Python library, but these are basically just convenience wrappers around the gcloud compute commands, and that is what is failing. The command I used to create the Dataproc cluster was:
gcloud beta dataproc clusters create \
test \
--image-version=1.4-debian9 \
--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g \
--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.53/init_notebook.py \
--metadata=^|||^WHEEL=gs://hail-common/hailctl/dataproc/0.2.53/hail-0.2.53-py3-none-any.whl|||PKGS=aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|dill>=0.3.1.1,<0.4|gcsfs==0.2.1|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests>=2.21.0,<2.21.1|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.* \
--master-machine-type=n1-highmem-8 \
--master-boot-disk-size=100GB \
--num-master-local-ssds=0 \
--num-preemptible-workers=0 \
--num-worker-local-ssds=0 \
--num-workers=2 \
--preemptible-worker-boot-disk-size=40GB \
--worker-boot-disk-size=40GB \
--worker-machine-type=n1-standard-8 \
--initialization-action-timeout=20m \
--labels=creator=my_name \
--max-idle=10m
It turns out the problem is that the cluster creates a new account called my_username on the cluster master VM, but I am logged into my laptop as a user called 'admin'. So there is a mismatch between the account name and the key at the destination, and the login fails.
It can be fixed by adding the username to the gcloud command:
gcloud compute ssh my_username@mycluster-m
Though I still don't really understand why the SSH keys are different for the Dataproc VM and a Compute Engine VM; I'd be happy if someone could enlighten me.
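As a follow-up sketch (the zone and local port are the placeholders used earlier in this thread), the same username fix can be combined with the SOCKS proxy from the first question to reach the Jupyter notebook from the local machine:

# SSH as the account that exists on the Dataproc master, with a SOCKS proxy on local port 10000
gcloud compute ssh my_username@mycluster-m --zone=zone_name \
    --ssh-flag="-D 10000" \
    --ssh-flag="-N" \
    --ssh-flag="-n" &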
I am trying to ssh into a VM from another VM in Google Cloud using the gcloud compute ssh command. It fails with the below message:
/usr/local/bin/../share/google/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py:9: DeprecationWarning: the sets module is deprecated
import sets
Connection timed out
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
I made sure the SSH keys are in place, but it still doesn't work. What am I missing here?
This assumes that you have already connected to the externally-visible instance with gcloud over SSH.
From your local machine, start ssh-agent with the following command to manage your keys for you:
me@local:~$ eval `ssh-agent`
Call ssh-add to load the gcloud compute keys from your local computer into the agent, so they are used for authentication in all SSH commands:
me@local:~$ ssh-add ~/.ssh/google_compute_engine
Log into an instance with an external IP address while supplying the -A argument to enable authentication agent forwarding.
gcloud compute ssh --ssh-flag="-A" INSTANCE
source: https://cloud.google.com/compute/docs/instances/connecting-to-instance#sshbetweeninstances.
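Once you are on the externally-visible instance with agent forwarding enabled, the forwarded agent supplies the key for the next hop, so nothing has to be copied onto that VM (a sketch; INTERNAL_INSTANCE is a placeholder for the internal-only instance's name):

# Run on the externally-visible instance; authentication comes from the forwarded agent
me@INSTANCE:~$ ssh INTERNAL_INSTANCE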
I am not sure about the 'flags' because it's not working for me, but maybe I have a different OS or gcloud version and it will work for you.
Here are the steps I ran on my Mac to connect to the Google Dataproc master VM and then hop onto a worker VM from the master VM. I ssh'd to the master VM first to get its IP.
$ gcloud compute ssh cluster-for-cameron-m
Warning: Permanently added '104.197.45.35' (ECDSA) to the list of known hosts.
I then exited. I enabled forwarding for that host.
$ nano ~/.ssh/config
Host 104.197.45.35
ForwardAgent yes
I added the gcloud key.
$ ssh-add ~/.ssh/google_compute_engine
I then verified that it was added by listing the key fingerprints with ssh-add -l. I reconnected to the master VM and ran ssh-add -l again to verify that the keys were indeed forwarded. After that, connecting to the worker node worked just fine.
$ ssh cluster-for-cameron-w-0
About using SSH Agent Forwarding...
Because instances are frequently created and destroyed in the cloud, the (recreated) host fingerprint keeps changing. If the new fingerprint doesn't match the one in ~/.ssh/known_hosts, SSH automatically disables agent forwarding. The solution is:
$ ssh -A -o UserKnownHostsFile=/dev/null ...
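If you'd rather not type those options every time, one possible approach (a sketch; the host pattern is an assumption you should narrow to your own instances' address range) is to set them per host in ~/.ssh/config:

# Applies only to hosts matching this pattern
Host 104.197.*
    ForwardAgent yes
    UserKnownHostsFile /dev/null
    StrictHostKeyChecking no

Note that skipping host-key checks trades away man-in-the-middle protection, so keep the Host pattern as narrow as possible.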
This question may have been asked before but I don't understand the concept. Can you please help me here?
Weird issue since this morning: I just pushed my file to Google Cloud Compute, and then it shows the error below. I don't know where to look for this error.
ri@ri-desktop:~$ gcloud compute --project "project" ssh --zone "europe-west1-b" "instance"
Warning: Permanently added '192.xx.xx.xx' (ECDSA) to the list of known hosts.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
This occurs when your compute instance has PermitRootLogin no in its sshd config and you try to log in as root. You can change the login user by adding username@ before the instance name. Here is a complete example:
gcloud compute instances create my-demo-compute \
--zone us-central1-f \
--machine-type f1-micro \
--image-project debian-cloud \
--image-family debian-8 \
--boot-disk-size=10GB
gcloud --quiet compute ssh user@hostname --zone us-central1-f
In the example above, gcloud will set the correct credentials and make sure you can log in. You can add --quiet to skip the SSH password question.
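If you want to confirm that PermitRootLogin really is the cause once you can get onto the VM as a regular user, a quick check (a sketch; sshd -T needs root and prints the effective configuration) would be:

# Print the effective sshd setting for root logins
sudo sshd -T | grep -i permitrootlogin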
One possible cause is that someone else in your project set the per-instance metadata for sshKeys (which overrides the project-wide metadata). When you run gcloud compute instances describe your-instance-name do you see a key called sshKeys in the metadata items?
It would also be helpful to see the contents of the latest log in ~/.config/gcloud/logs/. However, please make sure to scrub it of sensitive information.
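If a per-instance sshKeys entry does turn out to be the override, one way to inspect and clear it (a sketch; the instance name and zone are placeholders, and you should make sure the override really is unwanted before deleting it) would be:

# Show the instance's metadata items, including any per-instance sshKeys
gcloud compute instances describe your-instance-name --zone us-central1-f \
    --format="value(metadata.items)"

# Remove the per-instance override so the project-wide keys apply again
gcloud compute instances remove-metadata your-instance-name --zone us-central1-f \
    --keys sshKeys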
I have a MacBook; after facing the same problem, I re-created my SSH key in this format and it works fine.
Generate your key with:
ssh-keygen -t rsa -C your_username
Copy the key and paste it under the Compute Engine metadata SSH keys:
cat ~/.ssh/id_rsa.pub
It should work fine.
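If you prefer to add the key from the command line instead of pasting it into the Cloud Console, one possible way (a sketch; the file path is arbitrary, and note that this replaces the project-wide ssh-keys metadata, so include any existing entries in the file too) is:

# Prefix the public key with the login username, one "username:key" entry per line
echo "your_username:$(cat ~/.ssh/id_rsa.pub)" > /tmp/gce-ssh-keys.txt

# Upload it as project-wide SSH keys metadata
gcloud compute project-info add-metadata \
    --metadata-from-file ssh-keys=/tmp/gce-ssh-keys.txt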
I have an f1-micro instance, on which I've been testing Docker, created as such:
$ gcloud compute instances create dockerbox \
--image container-vm-v20140731 \
--image-project google-containers \
--zone europe-west1-b \
--machine-type f1-micro
This all works fine.
I'm now in the process of upgrading to a larger Google Compute Engine VM. I've taken a snapshot of the f1-micro dockerbox, then used this as the boot source for the larger n1-standard-8 VM... this seems to create without problems until I try to SSH onto it.
via the command line:
$ gcloud compute --project "secure-electron-631" ssh --zone "europe-west1-b" "me@biggerbox"
ssh: connect to host xx.xx.xx.xx port 22: Connection timed out
ERROR: (gcloud.compute.ssh) Your SSH key has not propagated to your instance yet. Try running this command again.
via the browser SSH connection, I get:
Connection Failed
We are unable to connect to the VM on port 22. Please check that the VM is healthy and the SSH server is running.
I've tried multiple times but get the same result.
I've confirmed that biggerbox is RUNNING, but I'm not sure about sshd.
OK, the problem seemed to stem from not detaching the micro instance from a mounted persistent disk when I took the snapshot. I detached and unmounted the PD volume, snapshotted the micro instance again, and based a new n1-standard-8 on it. It works OK now.
FYI, also handy for those troubleshooting GCE instance ssh:
https://github.com/GoogleCloudPlatform/compute-ssh-diagnostic-sh
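If you run into something similar and aren't sure whether sshd ever came up on the new VM, one quick check (a sketch; the instance name and zone are taken from the question above) is to read the boot log from the serial console:

# Dump the VM's boot output; look for sshd start-up messages or disk/mount errors
gcloud compute instances get-serial-port-output biggerbox \
    --zone europe-west1-b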
Although I've written a fair amount of Chef, I'm fairly new to both AWS/VPC and administering network traffic (especially a bastion host).
Using the knife ec2 plugin, I would like the capability to dynamically create and bootstrap a VM from my developer workstation. The VM should be able to exist in either a public or private subnet of my VPC. I would like to do all of this without the use of an Elastic IP. I would also like my bastion host to be hands-off (i.e. I would like to avoid having to create explicit per-VM listening tunnels on my bastion host).
I have successfully used the knife ec2 plugin to create a VM in the legacy EC2 model (i.e. outside of my VPC). I am now trying to create an instance in my VPC. On the knife command line, I'm specifying a gateway, security groups, subnet, etc. The VM gets created, but knife fails to SSH to it afterward.
Here's my knife command line:
knife ec2 server create \
--flavor t1.micro \
--identity-file <ssh_private_key> \
--image ami-3fec7956 \
--security-group-ids sg-9721e1f8 \
--subnet subnet-e4764d88 \
--ssh-user ubuntu \
--server-connect-attribute private_ip_address \
--ssh-port 22 \
--ssh-gateway <gateway_public_dns_hostname (route 53)> \
--tags isVPC=true,os=ubuntu-12.04,subnet_type=public-build-1c \
--node-name <VM_NAME>
I suspect that my problem has to do with the configuration of my bastion host. After a day of googling, I wasn't able to find a configuration that works. I'm able to ssh to the bastion host, and from there I can ssh to the newly created VM. I cannot get knife to successfully duplicate this using the gateway argument.
I've played around with /etc/ssh/ssh_config. Here is how it looks today:
ForwardAgent yes
#ForwardX11 no
#ForwardX11Trusted yes
#RhostsRSAAuthentication no
#RSAAuthentication yes
#PasswordAuthentication no
#HostbasedAuthentication yes
#GSSAPIAuthentication no
#GSSAPIDelegateCredentials no
#GSSAPIKeyExchange no
#GSSAPITrustDNS no
#BatchMode no
CheckHostIP no
#AddressFamily any
#ConnectTimeout 0
StrictHostKeyChecking no
IdentityFile ~/.ssh/identity
#IdentityFile ~/.ssh/id_rsa
#IdentityFile ~/.ssh/id_dsa
#Port 22
#Protocol 2,1
#Cipher 3des
#Ciphers aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc
#MACs hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-ripemd160
#EscapeChar ~
Tunnel yes
#TunnelDevice any:any
#PermitLocalCommand no
#VisualHostKey no
#ProxyCommand ssh -q -W %h:%p gateway.example.com
SendEnv LANG LC_*
HashKnownHosts yes
GSSAPIAuthentication yes
GSSAPIDelegateCredentials no
GatewayPorts yes
I have also set /home/ubuntu/.ssh/identity to the matching private key of my new instance.
UPDATE:
I notice the following in the bastion host's /var/log/auth.log:
May 9 12:15:47 ip-10-0-224-93 sshd[8455]: Invalid user from <WORKSTATION_IP>
May 9 12:15:47 ip-10-0-224-93 sshd[8455]: input_userauth_request: invalid user [preauth]
I finally resolved this. I was missing the username when specifying my gateway. I originally thought that the --ssh-user argument would be used for both the gateway AND the VM I'm attempting to bootstrap. This was incorrect; the username must be specified for both.
knife ec2 server create \
--flavor t1.micro \
--identity-file <ssh_private_key> \
--image ami-3fec7956 \
--security-group-ids sg-9721e1f8 \
--subnet subnet-e4764d88 \
--ssh-user ubuntu \
--server-connect-attribute private_ip_address \
--ssh-port 22 \
--ssh-gateway ubuntu@<gateway_public_dns_hostname (route 53)> \
--tags isVPC=true,os=ubuntu-12.04,subnet_type=public-build-1c \
--node-name <VM_NAME>
Just the line containing the update (notice the ubuntu@ in front):
--ssh-gateway ubuntu@<gateway_public_dns_hostname (route 53)>
I have now gone back and locked my bastion host back down, including removing /home/ubuntu/.ssh/identity, as storing the private key on the bastion host was really bugging me.
FYI: When setting up a bastion host, the "out of the box" configuration of sshd will work when using the Amazon Linux AMI image. Also, some of the arguments above are optional, such as --ssh-port.
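For reference, the same hop can also be done with plain SSH from the workstation, so the private key never has to live on the bastion at all. A possible ~/.ssh/config sketch (the host names and internal address range are placeholders):

# ~/.ssh/config on the workstation
Host bastion
    HostName <gateway_public_dns_hostname>
    User ubuntu

# Reach VPC-internal instances by tunnelling through the bastion
Host 10.0.*
    User ubuntu
    ProxyCommand ssh -q -W %h:%p bastion

With this in place, an ssh 10.0.x.x from the workstation tunnels through the bastion and authenticates with the key held locally, which is essentially what knife's --ssh-gateway option automates.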