I have an f1-micro instance that I've been testing Docker on, created as follows:
$ gcloud compute instances create dockerbox \
--image container-vm-v20140731 \
--image-project google-containers \
--zone europe-west1-b \
--machine-type f1-micro
This all works fine.
I'm now in the process of upgrading to a larger Google Compute Engine VM. I've taken a snapshot of the f1-micro dockerbox, then used this as the boot source for the larger n1-standard-8 VM... this seems to create without problems until I try to ssh onto it.
via the command line:
$ gcloud compute --project "secure-electron-631" ssh --zone "europe-west1-b" "me@biggerbox"
ssh: connect to host xx.xx.xx.xx port 22: Connection timed out
ERROR: (gcloud.compute.ssh) Your SSH key has not propagated to your instance yet. Try running this command again.
via the browser SSH connection I get:
Connection Failed
We are unable to connect to the VM on port 22. Please check that the VM is healthy and the SSH server is running.
I've tried multiple times but got the same result.
I've confirmed that biggerbox is RUNNING; I'm not sure about sshd.
OK, the problem seemed to stem from not detaching the micro instance from a mounted persistent disk when I took the snapshot. I unmounted and detached the PD volume, snapshotted the micro instance again, and based a new n1-standard-8 on it. It works OK now.
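For anyone doing the same, the detach-then-snapshot sequence I mean looks roughly like this (the data-disk name, mount point and snapshot name are placeholders; the boot disk is usually named after the instance):
# inside the VM: unmount the data volume first
sudo umount /mnt/pd0
# from your workstation: detach the persistent disk, then snapshot only the boot disk
gcloud compute instances detach-disk dockerbox --disk my-data-disk --zone europe-west1-b
gcloud compute disks snapshot dockerbox --snapshot-names dockerbox-snap --zone europe-west1-b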
FYI, also handy for those troubleshooting GCE instance ssh:
https://github.com/GoogleCloudPlatform/compute-ssh-diagnostic-sh
Related
I think I'm missing one step in the script below.
The first time I run it, the VM gets created just fine, but the connection is refused. It continues to be refused even if I wait ten minutes after creating the VM.
However, if I use the GCP console to connect manually ("Open in browser window"), I get the message "Transferring SSH keys...", and the connection works. After this step, the script can connect fine.
What should I add to this script to get it to work without having to manually connect from the console?
#!/bin/bash
MY_INSTANCE="janne"
MY_TEMPLATE="dev-tf-nogpu-template"
HOME_PATH="/XXX/data/celeba/"
# Create instance
gcloud compute instances create $MY_INSTANCE --source-instance-template $MY_TEMPLATE
# Start instance
gcloud compute instances start $MY_INSTANCE
# Copy needed directories & files
gcloud compute scp ${HOME_PATH}src/ $MY_INSTANCE:~ --recurse --compress
gcloud compute scp ${HOME_PATH}save/ $MY_INSTANCE:~ --recurse --compress
gcloud compute scp ${HOME_PATH}pyinstall $MY_INSTANCE:~
gcloud compute scp ${HOME_PATH}gcpstartup.sh $MY_INSTANCE:~
# Execute startup script
gcloud compute ssh --zone us-west1-b $MY_INSTANCE --command "bash gcpstartup.sh"
# Connect over ssh
gcloud compute ssh --project XXX --zone us-west1-b $MY_INSTANCE
The full output of this script is:
(base) xxx@ubu-dt:/XXX/data/celeba$ bash gcpcreate.sh
Created [https://www.googleapis.com/compute/v1/projects/XXX/zones/us-west1-b/instances/janne].
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
janne us-west1-b n1-standard-1 XXX XXX RUNNING
Starting instance(s) janne...done.
Updated [https://compute.googleapis.com/compute/v1/projects/xxx/zones/us-west1-b/instances/janne].
ssh: connect to host 34.83.3.161 port 22: Connection refused
lost connection
ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
ssh: connect to host 34.83.3.161 port 22: Connection refused
lost connection
ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
ssh: connect to host 34.83.3.161 port 22: Connection refused
lost connection
ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
ssh: connect to host 34.83.3.161 port 22: Connection refused
lost connection
ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
ssh: connect to host 34.83.3.161 port 22: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
ssh: connect to host 34.83.3.161 port 22: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
Edit: adding gcloud version info
(base) bjorn@ubu-dt:/media/bjorn/data/celeba$ gcloud version
Google Cloud SDK 269.0.0
alpha 2019.10.25
beta 2019.10.25
bq 2.0.49
core 2019.10.25
gsutil 4.45
kubectl 2019.10.25
The solution I found is this: wait.
For OS login, SSH starts working about 20 seconds after the instance is started.
For non-OS login, it takes about a minute.
So I just added this after gcloud compute instances start $MY_INSTANCE:
sleep 20s
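If the fixed sleep ever turns out to be too short, a bounded retry loop is another option; this is just a sketch using the same instance/zone variables as the script above:
# retry until sshd accepts a trivial command, for at most ~2 minutes
for i in $(seq 1 24); do
    gcloud compute ssh --zone us-west1-b $MY_INSTANCE --command "true" 2>/dev/null && break
    sleep 5
done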
When you connect through the Console, it manages the keys for you.
Your last comment leads me to believe that when you connect from the Console you are generating an SSH key, and that this is what allows you to run the script. I would recommend taking a look at how to manage SSH keys in metadata and creating your own SSH key to access the instance through the SDK.
If you cannot SSH directly through the SDK outside of the script either, then I assume it is for the same reason: the generated key.
Also, please make sure that the service account has the correct permissions when using the SDK.
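For example, something along these lines is what I mean by creating your own key and putting it in metadata (USERNAME, the key path and INSTANCE_NAME are placeholders):
# generate a key locally
ssh-keygen -t rsa -f ~/.ssh/gce_key -C USERNAME
# add the public key to the instance's ssh-keys metadata
gcloud compute instances add-metadata INSTANCE_NAME \
    --metadata ssh-keys="USERNAME:$(cat ~/.ssh/gce_key.pub)"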
Let me know.
I am attempting to connect (via SSH) one GCE VM instance to another GCE VM instance (which will be referred to as Machine 1 and Machine 2 from now on).
So far I have generated (via ssh-keygen -t rsa -f ~/.ssh/ssh_key) a public and private key on Machine 1, and have added the contents of ssh_key.pub to the ~/.ssh/authorized_keys file on Machine 2.
However, whenever I try to connect them via ssh using the following command: gcloud compute ssh --project [PROJECT_ID] --zone [ZONE] [Machine_2_Name] it simply times out (Connection timed out. ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].)
I have double-checked that each VM instance has plenty of disk space, that their firewall settings are permissive, and that OS Login is not enabled. I have read through the answer here but nothing is working.
What am I doing wrong? How do I properly SSH from one GCE VM instance to another?
The problem I was having was that each VM was using a different network/sub-network with different firewall configurations. After recreating one on the same network/sub-network, I was able to easily ssh into one from the other via:
username@machine1:~$ ssh machine2
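In case it helps anyone check for the same mismatch, these commands show which network each instance is on and which firewall rules exist (instance names are placeholders):
gcloud compute instances describe machine1 --format="get(networkInterfaces[0].network)"
gcloud compute instances describe machine2 --format="get(networkInterfaces[0].network)"
gcloud compute firewall-rules list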
I tested the same scenario on my side and got the same result you described. Then I ran this command inside the machine to debug the SSH process and narrow down the issue:
gcloud compute ssh YOUR_INSTANCE_NAME --zone ZONE --ssh-flag="-vvv"
Then I got this result:
debug1: connect to address 35.x.x.x port 22: Connection timed out
ssh: connect to host 35.x.x.x port 22: Connection timed out
So, this means instance 1 is unable to connect to the external IP address of instance 2. I only added a new firewall rule and it worked (see the example below).
After running the above-mentioned command, if you see any permission-denied message, it means you did not copy the public key to the destination machine properly.
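For reference, the kind of rule I mean looks roughly like this (the rule name, network and source range are placeholders; narrow the source range if you only need VM-to-VM SSH):
gcloud compute firewall-rules create allow-ssh \
    --network default \
    --allow tcp:22 \
    --source-ranges 0.0.0.0/0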
I have a small cluster: 1 master, 2 workers. I can access all nodes (master and workers) just fine using the gcloud SDK. However, once I'm on the master node and try to ssh to a worker node, I get a "permission denied (publickey)" error. Note that I can ping the node successfully, but SSH does not work.
Dataproc does not install SSH keys between the master and worker nodes, so that is working as intended.
You may be able to use SSH agent forwarding, with something like:
# Add Compute Engine private key to SSH agent
ssh-add ~/.ssh/google_compute_engine
# Forward key to SSH agent of master
gcloud compute ssh --ssh-flag="-A" [CLUSTER]-m
# SSH into worker
ssh [CLUSTER]-w-0
You could also configure SSH keys using an initialization action or use gcloud ssh from the master node (if you gave the cluster the compute.rw scope).
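For the second option, a rough sketch (cluster name and zone are placeholders; granting a broad scope at cluster creation is one way to let the master call the Compute API):
# create the cluster with a scope that allows Compute API calls
gcloud dataproc clusters create [CLUSTER] --scopes cloud-platform
# then, from the master node
gcloud compute ssh [CLUSTER]-w-0 --zone [ZONE]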
I am trying to ssh into a VM from another VM in Google Cloud using the gcloud compute ssh command. It fails with the below message:
/usr/local/bin/../share/google/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py:9: DeprecationWarning: the sets module is deprecated
import sets
Connection timed out
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]. See https://cloud.google.com/compute/docs/troubleshooting#ssherrors for troubleshooting hints.
I made sure the ssh keys are in place but still it doesn't work. What am I missing here?
This assumes that you have already connected to the externally visible instance with SSH via gcloud beforehand.
From your local machine, start ssh-agent with the following command to manage your keys for you:
me@local:~$ eval `ssh-agent`
Call ssh-add to load the gcloud compute public keys from your local computer into the agent, and use them for all SSH commands for authentication:
me@local:~$ ssh-add ~/.ssh/google_compute_engine
Log into an instance with an external IP address while supplying the -A argument to enable authentication agent forwarding.
gcloud compute ssh --ssh-flag="-A" INSTANCE
source: https://cloud.google.com/compute/docs/instances/connecting-to-instance#sshbetweeninstances.
I am not sure about the 'flags' because it's not working for me, but maybe I have a different OS or gcloud version and it will work for you.
Here are the steps I ran on my Mac to connect to the Google Dataproc master VM and then hop onto a worker VM from the master VM. I ssh'd to the master VM to get the IP.
$ gcloud compute ssh cluster-for-cameron-m
Warning: Permanently added '104.197.45.35' (ECDSA) to the list of known hosts.
I then exited. I enabled forwarding for that host.
$ nano ~/.ssh/config
Host 104.197.45.35
ForwardAgent yes
I added the gcloud key.
$ ssh-add ~/.ssh/google_compute_engine
I then verified that it was added by listing the key fingerprints with ssh-add -l. I reconnected to the master VM and ran ssh-add -l again to verify that the keys were indeed forwarded. After that, connecting to the worker node worked just fine.
ssh cluster-for-cameron-w-0
About using SSH Agent Forwarding...
Because instances are frequently created and destroyed in the cloud, the (recreated) host fingerprint keeps changing. If the new fingerprint doesn't match the one in ~/.ssh/known_hosts, SSH automatically disables agent forwarding. The solution is:
$ ssh -A -o UserKnownHostsFile=/dev/null ...
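If you go through gcloud rather than plain ssh, the same flags can be passed along like this (the instance name is a placeholder):
gcloud compute ssh INSTANCE \
    --ssh-flag="-A" \
    --ssh-flag="-o UserKnownHostsFile=/dev/null" \
    --ssh-flag="-o StrictHostKeyChecking=no"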
I'm having issues using ssh to log in to a VM created from a custom image.
I followed the steps for creating an image from an existing GCE instance.
I have successfully created the image, uploaded it to Google Cloud Storage and added it as an image to my project, yet when I try to connect to the new instance, I get a "Connection refused".
I can see other applications running on other ports on the new instance, so it seems to be just ssh that is affected.
The steps I did are below:
...create an image from an existing GCE instance (one I can log into fine via ssh)... then:
gcutil --project="river-ex-217" addimage example2 http://storage.googleapis.com/example-image/f41aca6887c339afb0.image.tar.gz
gcutil --project="river-ex-217" addinstance --image=example2 --machinetype=n1-standard-1 anothervm
gcutil --service_version="v1" --project="river-ex-217" ssh --zone="europe-west1-a" "anothervm"
Which outputs:
INFO: Running command line: ssh -o UserKnownHostsFile=/dev/null -o CheckHostIP=no -o StrictHostKeyChecking=no -i /Users/mark1/.ssh/google_compute_engine -A -p 22 mark1@23.251.133.2 --
ssh: connect to host 23.251.133.2 port 22: Connection refused
I've tried deleting the sshKeys metadata as suggested in another SO answer and reconnecting, which did this:
INFO: Updated project with new ssh key. It can take several minutes for the instance to pick up the key.
INFO: Waiting 120 seconds before attempting to connect.
INFO: Running command line: ssh -o UserKnownHostsFile=/dev/null -o CheckHostIP=no -o StrictHostKeyChecking=no -i /Users/mark1/.ssh/google_compute_engine -A -p 22 mark1@23.251.133.2 --
ssh: connect to host 23.251.133.2 port 22: Connection refused
I then tried the first instance in another zone; it works fine with the new key:
gcutil --service_version="v1" --project="river-ex-217" ssh --zone="europe-west1-b" "image1"
Both instances are running on the same "default" network with port 22 open, and ssh works for the first instance the image was created from.
I tried the nc command from the other instance and from my local machine; it shows no output:
nc 23.251.133.2 22
...whilst the original VM's IP shows this output:
nc 192.157.29.255 22
SSH-2.0-OpenSSH_6.0p1 Debian-4
I've tried remaking the image again and re-adding the instance, no difference.
I've tried logging in to the first instance, switching user to one on that machine (which should be the same as on the second machine?), and ssh'ing from there.
WARNING: You don't have an ssh key for Google Compute Engine. Creating one now...
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
INFO: Updated project with new ssh key. It can take several minutes for the instance to pick up the key.
INFO: Waiting 300 seconds before attempting to connect.
INFO: Running command line: ssh -o UserKnownHostsFile=/dev/null -o CheckHostIP=no -o StrictHostKeyChecking=no -i /home/mark/.ssh/google_compute_engine -A -p 22 mark@23.251.133.2 -- --zone=europe-west1-a
ssh: connect to host 23.251.133.2 port 22: Connection refused
I'm out of ideas; any help greatly appreciated :) The maddening thing is I can see the new VM is live with the application ready; I just need to add a few files to it and set up some cron jobs. I guess I could do this before making the image, but I would like to be able to log in at a later date and modify it, without needing to take an hour to create images and launch new instances every time.
Yours faithfully,
Mark
This question appears to be about how to debug SSH connectivity problems with images, so here is my answer to that.
It appears that your instance may not be running the SSH server properly. There may be something amiss with the prepared image.
Possibly useful debugging questions to ask yourself:
Did you use gcimagebundle to bundle up the image or did you do it manually? Consider using the tool to make sure there isn't something you missed.
Did you change anything about the ssh server configuration before bundling the image?
When the instance is booting, check its console output for ssh messages (see the example after this list) - it should mention regenerating the keys, starting the sshd daemon and listening on port 22. If it does not, or complains about something related to ssh, you should follow up on that.
You covered these, but for the sake of completeness, these should also be checked:
Can you otherwise reach the VM after it comes up? Does it respond on webserver ports (if any) or respond to ping?
Double-check that the network your VM is on allows SSH (port 22) access from the host you are connecting from.
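For the console-output check above, you don't need SSH access; with the current gcloud tool something like this should work (instance name and zone taken from the question):
gcloud compute instances get-serial-port-output anothervm --zone europe-west1-a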
You can compare your ssh setup to that of a working image:
Create a new disk (disk-mine-1) from your image.
Create a new disk (disk-upstream-1) from any working boot image, for example the debian wheezy one.
Attach both of these to a VM you can access (either from the console or the CLI).
SSH into the VM.
Mount both of the images (sudo mkdir /mnt/{mine,upstream} && sudo mount /dev/sdb1 /mnt/mine && sudo mount /dev/sdc1 /mnt/upstream). Note that whether your image is sdb or sdc depends on the order you attached the images!
Look for differences between the ssh config (diff -waur /mnt/{mine,upstream}/etc/ssh). There should not be any unless you specifically need them.
Also check if your image has proper /mnt/mine/etc/init.d/{ssh,generate-ssh-hostkeys} scripts. They should also be linked from /mnt/mine/etc/rc{S,2}.d (S10generate-ssh-hostkeys and S02ssh respectively).
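A few quick checks for that last point, assuming the mount points from the steps above:
# the init scripts themselves
ls -l /mnt/mine/etc/init.d/ssh /mnt/mine/etc/init.d/generate-ssh-hostkeys
# the rc symlinks (expect S10generate-ssh-hostkeys and S02ssh respectively)
ls -l /mnt/mine/etc/rcS.d/ | grep -i ssh
ls -l /mnt/mine/etc/rc2.d/ | grep -i ssh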