GCP: how to use the CLI to connect over SSH to a newly created VM?

I think I'm missing one step in the script below.
The first time I run it, the VM gets created just fine, but the connection is refused. It continues to be refused even if I wait ten minutes after creating the VM.
However, if I use the GCP console to connect manually ("Open in browser window"), I get the message "Transferring SSH keys..." and the connection works. After this step, the script can connect fine.
What should I add to this script to get it to work without having to manually connect from the console?
#!/bin/bash
MY_INSTANCE="janne"
MY_TEMPLATE="dev-tf-nogpu-template"
HOME_PATH="/XXX/data/celeba/"
# Create instance
gcloud compute instances create $MY_INSTANCE --source-instance-template $MY_TEMPLATE
# Start instance
gcloud compute instances start $MY_INSTANCE
# Copy needed directories & files
gcloud compute scp ${HOME_PATH}src/ $MY_INSTANCE:~ --recurse --compress
gcloud compute scp ${HOME_PATH}save/ $MY_INSTANCE:~ --recurse --compress
gcloud compute scp ${HOME_PATH}pyinstall $MY_INSTANCE:~
gcloud compute scp ${HOME_PATH}gcpstartup.sh $MY_INSTANCE:~
# Execute startup script
gcloud compute ssh --zone us-west1-b $MY_INSTANCE --command "bash gcpstartup.sh"
# Connect over ssh
gcloud compute ssh --project XXX --zone us-west1-b $MY_INSTANCE
The full output of this script is:
(base) xxx@ubu-dt:/XXX/data/celeba$ bash gcpcreate.sh
Created [https://www.googleapis.com/compute/v1/projects/XXX/zones/us-west1-b/instances/janne].
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
janne us-west1-b n1-standard-1 XXX XXX RUNNING
Starting instance(s) janne...done.
Updated [https://compute.googleapis.com/compute/v1/projects/xxx/zones/us-west1-b/instances/janne].
ssh: connect to host 34.83.3.161 port 22: Connection refused
lost connection
ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
ssh: connect to host 34.83.3.161 port 22: Connection refused
lost connection
ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
ssh: connect to host 34.83.3.161 port 22: Connection refused
lost connection
ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
ssh: connect to host 34.83.3.161 port 22: Connection refused
lost connection
ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
ssh: connect to host 34.83.3.161 port 22: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
ssh: connect to host 34.83.3.161 port 22: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
Edit: adding gcloud version info
(base) bjorn@ubu-dt:/media/bjorn/data/celeba$ gcloud version
Google Cloud SDK 269.0.0
alpha 2019.10.25
beta 2019.10.25
bq 2.0.49
core 2019.10.25
gsutil 4.45
kubectl 2019.10.25

The solution I found is simply to wait.
"Connection refused" most likely means the SSH server on the new VM is not accepting connections yet. With OS Login, SSH started working about 20 seconds after the instance was started.
Without OS Login, it took about a minute.
So I just added this after gcloud compute instances start $MY_INSTANCE:
sleep 20s
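A fixed sleep can be too short on a slow boot and needlessly long on a fast one. A minimal sketch of polling instead, assuming a hypothetical helper named retry_until_ok (it is not part of gcloud) and the instance name and zone from the question:

```shell
# Hypothetical helper (not part of gcloud): run a command until it succeeds
# or the attempt budget is used up.
# Usage: retry_until_ok MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_ok() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt ${attempt}/${max_attempts} failed" >&2
    # Only wait if another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Intended use in the script, replacing the fixed sleep (left commented out
# here because it needs a real project and instance):
# retry_until_ok 12 5 gcloud compute ssh "$MY_INSTANCE" --zone us-west1-b --command true
```

The attempt count (12) and delay (5 seconds) are arbitrary values that cover roughly the one minute mentioned above; adjust them to your own boot times.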

When you connect through Console it manages the keys for you.
Your last comment leads me to believe that connecting from the Console generates an SSH key, and that this key is what then allows the script to connect. I would recommend looking at how to manage SSH keys in metadata and creating your own SSH key for access through the SDK.
If you also cannot SSH directly through the SDK outside of the script, I assume it is for the same reason, the generated key.
Also, please make sure that the service account used by the SDK has the correct permissions.
Let me know.

Related

Problems connecting one GCE instance to another via SSH

I am attempting to connect (via SSH) one GCE VM instance to another GCE VM instance (which will be referred to as Machine 1 and Machine 2 from now on).
So far I have generated (via ssh-keygen -t rsa -f ~/.ssh/ssh_key) a public and private key on Machine 1, and have added the contents of ssh_key.pub to the ~/.ssh/authorized_keys file on Machine 2.
However, whenever I try to connect them via ssh using the following command: gcloud compute ssh --project [PROJECT_ID] --zone [ZONE] [Machine_2_Name] it simply times out (Connection timed out. ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].)
I have double-checked that each VM instance has plenty of disk space, that their firewall settings are permissive, and that OS Login is not enabled. I have read through the answer here but nothing is working.
What am I doing wrong? How do I properly SSH from one GCE VM instance to another?
The problem I was having was that each VM was using a different network/sub-network with different firewall configurations. After recreating one of them on the same network/sub-network as the other, I was able to easily ssh into one from the other via
username@machine1:~$ ssh machine2
I tested the same scenario on my side and I got the same result as you said. Then I ran this command inside the machine to debug the SSH process to try to narrow down the issue:
gcloud compute ssh YOUR_INSTANCE_NAME --zone ZONE --ssh-flag="-vvv"
Then I got this result:
debug1: connect to address 35.x.x.x port 22: Connection timed out
ssh: connect to host 35.x.x.x port 22: Connection timed out
So this means that instance 1 is unable to connect to the external IP address of instance 2. I only had to add a new firewall rule, and then it worked.
If you see a permission denied message after running the command above, it means the public key was not copied to the target machine properly.
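As a rough aid for reading the -vvv output, the sketch below maps the three messages that come up in this thread (timed out, refused, permission denied) to the layer worth checking first. classify_ssh_error is a hypothetical helper, and the mapping is a rule of thumb rather than a diagnosis:

```shell
# Hypothetical helper: map one line of ssh error output to the layer that is
# usually worth inspecting first. The categories are a rule of thumb only.
classify_ssh_error() {
  case "$1" in
    *"Connection timed out"*|*"Operation timed out"*)
      echo "network: packets are dropped; check firewall rules, routes, and whether both VMs share a network" ;;
    *"Connection refused"*)
      echo "service: the host answered but nothing accepts port 22; sshd may not be running yet" ;;
    *"Permission denied"*)
      echo "auth: sshd answered but rejected the key or the username" ;;
    *)
      echo "unknown" ;;
  esac
}

classify_ssh_error "ssh: connect to host 35.x.x.x port 22: Connection timed out"
```

For the timeout shown above, this points at the network layer, which matches the firewall rule that fixed it.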

GCE VM cannot SSH to the new GCE VM it has just created in a different project

I'd like to solve the following problem using command line:
I'm trying to run the following PoC script from a GCE VM in project-a.
gcloud config set project project-b
gcloud compute instances create gce-vm-b --zone=us-west1-a
gcloud compute ssh --zone=us-west1-a gce-vm-b -- hostname
The VM is created successfully:
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
gce-vm-b us-west1-a n1-standard-16 10.12.34.56 12.34.56.78 RUNNING
But I get the following error when trying to SSH:
WARNING: The public SSH key file for gcloud does not exist.
WARNING: The private SSH key file for gcloud does not exist.
WARNING: You do not have an SSH key for gcloud.
WARNING: SSH keygen will be executed to generate a key.
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/google_compute_engine.
Your public key has been saved in /root/.ssh/google_compute_engine.pub.
The key fingerprint is:
...
Updating project ssh metadata...
.....................Updated [https://www.googleapis.com/compute/v1/projects/project-b].
.done.
Waiting for SSH key to propagate.
ssh: connect to host 12.34.56.78 port 22: Connection timed out
ERROR: (gcloud.compute.ssh) Could not SSH into the instance. It is possible that your SSH key has not propagated to the instance yet. Try running this command again. If you still cannot connect, verify that the firewall and instance are set to accept ssh traffic.
Running gcloud compute config-ssh hasn't changed anything in the error message. It's still ssh: connect to host 12.34.56.78 port 22: Connection timed out
I've tried adding a firewall rule to the project:
gcloud compute firewall-rules create default-allow-ssh --allow tcp:22
.
Creating firewall...
...........Created [https://www.googleapis.com/compute/v1/projects/project-b/global/firewalls/default-allow-ssh].
done.
NAME NETWORK DIRECTION PRIORITY ALLOW DENY
default-allow-ssh default INGRESS 1000 tcp:22
The error is now Permission denied (publickey).
gcloud compute ssh --zone=us-west1-a gce-vm-b -- hostname
.
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added 'compute.4123124124324242' (ECDSA) to the list of known hosts.
Permission denied (publickey).
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
P.S. The project-a "VM" is a container run by Prow cluster (which is run by GKE).
"Permission denied (publickey)" means it is unable to validate the public key for the username.
You haven't specified the user in your command, so the user from the environment is selected and it may not be allowed into the instance gce-vm-b. Specify a valid user for the instance in your command according to the public SSH key metadata.
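To see which usernames the metadata actually contains, one option is to extract them from the ssh-keys value, where each line has the form USERNAME:KEY_TYPE KEY COMMENT. A minimal sketch, assuming you have that value as text (for example copied from the Metadata page in the Console); list_metadata_users is a hypothetical helper and the keys below are made-up placeholders:

```shell
# Hypothetical helper: read an ssh-keys metadata value on stdin
# (one "USERNAME:KEY_TYPE KEY COMMENT" entry per line) and print
# the distinct usernames.
list_metadata_users() {
  awk -F: 'NF > 1 && $1 != "" { print $1 }' | sort -u
}

# Placeholder data; "alice" and "bob" are illustrative names only.
printf '%s\n' \
  'alice:ssh-rsa AAAA alice@laptop' \
  'bob:ssh-ed25519 BBBB bob@desktop' \
  'alice:ssh-rsa CCCC alice@ci' | list_metadata_users
```

Once you know a valid username, gcloud accepts it in front of the instance name, for example gcloud compute ssh alice@gce-vm-b --zone=us-west1-a -- hostname.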

GCP- SSH connection timed out

I've been using ssh to connect to my Google Cloud Compute instance and it's been working fine. However, I left some code running on my instance and shut down my laptop. After turning it back on, I saw that the connection had dropped with a port 22: Broken pipe error. Since then, I haven't been able to ssh into my instance. I get this error each time:
ssh: connect to host <IP> port 22: Operation timed out
I'm new to SSH (just a data scientist trying to train some models on GCP..) and not sure how to proceed. Would appreciate any pointers. Thanks!
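Check that you have a key pair by listing the ~/.ssh directory (where authorized_keys also lives) with the command ls -la.
If you have a private key, run this: ssh -i [PATH_TO_PRIVATE_KEY] [USERNAME]@[EXTERNAL_IP_ADDRESS]
If not, use ssh-keygen to create a private key.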

gcloud compute ssh refuses connection (return code 255)

I cannot get ssh access into the vm instance created by Google Cloud command line tool (gcloud).
Symptom:
sudo gcloud compute ssh myuser@ubuntu
ssh: connect to host 104.155.16.104 port 22: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
What I did:
1: Verify that firewall is open on port 22
gcloud compute firewall-rules list
returned
NAME NETWORK SRC_RANGES RULES SRC_TAGS TARGET_TAGS
allow-rstudio default 0.0.0.0/0 tcp:8787 allow-rstudio
default-allow-http default 0.0.0.0/0 tcp:80 http-server
default-allow-https default 0.0.0.0/0 tcp:443 https-server
default-allow-icmp default 0.0.0.0/0 icmp
default-allow-internal default 10.128.0.0/9 tcp:0-65535,udp:0-65535,icmp
default-allow-rdp default 0.0.0.0/0 tcp:3389
default-allow-ssh default 0.0.0.0/0 tcp:22
2: Renew public key
ssh-keygen -t rsa -f ~/.ssh/google_compute_engine -C myuser
3: Update metadata with new public key
sudo gcloud compute ssh myuser@ubuntu
Updating project ssh metadata...
Updating project ssh metadata...done.
Waiting for SSH key to propagate.
Then, still the same error message:
ssh: connect to host 35.187.38.82 port 22: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
I should add that I could access ssh until today, and of course, I did authentication before with
gcloud auth login
SSH from the Google Cloud web interface works! What is different there?
Would be grateful for any help!!
After a long search, I finally found the underlying reason for this tricky problem. I hope that this will help some people in desperation...
One reason your ssh connection may be refused is that the default route for traffic to external IP addresses was accidentally deleted. You can check this with:
gcloud compute routes list
If this does not return a list including the following entry:
default-internet default 0.0.0.0/0 default-internet-gateway 1000
Then you must re-create this entry by:
gcloud compute routes create default-internet \
--destination-range 0.0.0.0/0 \
--next-hop-gateway default-internet-gateway
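If you want to script the check, a minimal sketch is to pipe the routes list through a small filter. It assumes the column order shown in the entry above (name, network, destination range, next hop, priority), and has_default_internet_route is a hypothetical helper. It matches on destination range and next hop rather than on the route name, since the name can differ between projects:

```shell
# Hypothetical helper: read `gcloud compute routes list` output on stdin and
# succeed only if some row sends 0.0.0.0/0 to the default internet gateway.
# Assumes column 3 is the destination range and column 4 is the next hop.
has_default_internet_route() {
  awk '$3 == "0.0.0.0/0" && $4 == "default-internet-gateway" { found = 1 }
       END { exit !found }'
}

# Intended use (commented out because it needs a real project):
# gcloud compute routes list | has_default_internet_route || echo "default route is missing"
```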
In my case, the problem appeared after I upgraded the GCP instance (I just added more CPU and memory).
My CircleCI deploy started throwing:
Authentication failed.
Exited with code 255
After a couple of hours trying to figure out what had gone wrong, I found that /etc/ssh/sshd_config had been emptied for no apparent reason.
What fixed my problem was recreating this file and restarting the ssh service.
Note: PasswordAuthentication should be set to:
PasswordAuthentication no
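A quick way to spot this condition is to test whether the file is empty before digging further. The sketch below is only an illustration; check_sshd_config is a hypothetical helper, and it takes the path as an argument so it can be tried on a copy of the file:

```shell
# Hypothetical helper: fail if the sshd config is missing or empty,
# otherwise print any explicit PasswordAuthentication setting.
check_sshd_config() {
  local file="${1:-/etc/ssh/sshd_config}"
  if [ ! -s "$file" ]; then
    echo "EMPTY_OR_MISSING"
    return 1
  fi
  # Commented-out lines start with '#', so they do not match this pattern.
  grep -Ei '^[[:space:]]*PasswordAuthentication[[:space:]]+' "$file" \
    || echo "PasswordAuthentication not set explicitly"
}
```

On the VM itself, sudo sshd -t additionally validates the syntax of the recreated file before you restart the service.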
For the poor, suffering souls who stumble upon this.
The following works for me with consistency:
On your machine, in the gcloud CLI, run gcloud init and go through the prompts.
The end. I hope this helps you my dear, internet fellow-sufferer.
My scenario was that I was running a nohup process on the instance, and all of a sudden the process stopped working.
After spending a lot of time investigating, I found that the instance itself had hung. It is easy to miss small things like that when you are caught up debugging the bigger problem.
Check whether you're able to ping your instance. If not, restart it and SSH in again; it should work fine. This is one possible solution.

Creating Instances from Snapshots

I've an f1-micro instance which I've been testing docker on created as such:
$ gcloud compute instances create dockerbox \
--image container-vm-v20140731 \
--image-project google-containers \
--zone europe-west1-b \
--machine-type f1-micro
This all works fine.
I'm now in the process of upgrading to a larger Google Compute Engine VM. I've taken a snapshot of the f1-micro dockerbox, then used this as the Boot Source for the larger n1-standard-8 VM. It seems to be created without problems until I try to ssh onto it.
via the command line:
$ gcloud compute --project "secure-electron-631" ssh --zone "europe-west1-b" "me@biggerbox"
ssh: connect to host xx.xx.xx.xx port 22: Connection timed out
ERROR: (gcloud.compute.ssh) Your SSH key has not propagated to your instance yet. Try running this command again.
Via the browser SSH connection, I get:
Connection Failed
We are unable to connect to the VM on port 22. Please check that the VM is healthy and the SSH server is running.
I've tried multiple times, with the same result.
I've confirmed that biggerbox is RUNNING, but I'm not sure about sshd.
OK, the problem seemed to stem from not detaching the micro instance from a mounted persistent disk when I took the snapshot. I unmounted and detached the PD volume, snapshotted the micro instance again, and based a new n1-standard-8 on it. It works OK now.
FYI, also handy for those troubleshooting GCE instance ssh:
https://github.com/GoogleCloudPlatform/compute-ssh-diagnostic-sh