How to check logs for - Error: Could not connect, retrying (3/3) - Not able to ssh into Instance - ssh

I have a GCP instance which was running successfully until last week, but suddenly I am not able to ssh into it. I am getting the error:
Could not connect, retrying (3/3)
While trying to find the root cause, I learned about os-login and configured it in the instance metadata.
Although the os-login setup succeeded and the ssh connection now goes through os-login, I still cannot access the instance, because it asks for a username and password and I have no idea what to enter. I read many articles but still could not work out what to provide.
I can easily delete the instance and set it up from scratch, but my worry is that I have not found the root cause.
Can someone tell me how to find the logs or the cause of suddenly not being able to ssh into it?
I tried viewing the instance logs from the options in the console, but they did not have much info.

Related

gcloud compute -- Instance ssh-key metadata ignored?

Starting with a service account JSON key, I attempt to add a throwaway "foo" ssh key to the gcloud instances create metadata and then connect to the instance using vanilla ssh and the throwaway key.
Script
here.
Expected behavior
At boot, the account daemon would create a user account corresponding to the supplied ssh key.
Observed behavior
In the Cloud Console, the instance shows correctly applied ssh metadata.
ssh -i throwaway_private_key foo@${IP} fails.
Logs on the instance show:
Apr 6 16:58:34 sshkey-test-x0rmqgh7 sshd[497]: Invalid user foo from 209.6.197.126 port 39792
How do I correctly trigger the account daemon?
If not through the metadata, then what?
Thanks!
For anyone struggling with a similar issue, there is a HUGE gotcha with os-login that can lead to the problem behavior.
In a nutshell, os-login="TRUE" can be (and is likely to be) set project-wide on GCE. If that's the case, then ssh-key metadata is ignored. I only discovered this by chance from reading other issues in the Google bug tracker.
As soon as I toggled os-login, my issue went away.
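The interaction described above can be sketched as a small shell helper. It assumes, as I understand the Compute Engine metadata rules, that a non-empty instance-level enable-oslogin value overrides the project-level one. The function name effective_oslogin is hypothetical, and the two values have to be looked up by you (for example in the Metadata pages of the Cloud Console); only TRUE in any letter case is treated as enabled here.

```shell
# effective_oslogin INSTANCE_VALUE PROJECT_VALUE
# Prints "enabled" or "disabled" for the enable-oslogin metadata key.
# Assumption: a non-empty instance-level value wins over the project-level one.
# Assumption: only TRUE (any letter case) counts as enabled.
effective_oslogin() {
    _oslogin_value="${1:-${2:-}}"
    case "$(printf '%s' "$_oslogin_value" | tr '[:upper:]' '[:lower:]')" in
        true) echo "enabled" ;;
        *)    echo "disabled" ;;
    esac
}
```

If effective_oslogin "" TRUE prints enabled, ssh keys supplied through instance metadata are ignored, which is consistent with the "Invalid user foo" line in the sshd log above.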

Zeek cluster fails with pcap_error: socket: Operation not permitted (pcap_activate)

I'm trying to set up a Zeek IDS cluster (v.3.2.0-dev.271) on 3 Ubuntu 18.04 LTS hosts, to no avail: running the zeekctl deploy command fails with the following output:
fatal error: problem with interface ens3 (pcap_error: socket: Operation not permitted (pcap_activate))
I have followed the official documentation (which is pretty generic at best) and set up passwordless SSH authentication between the zeek nodes.
I also preemptively created the /usr/local/zeek path on all hosts and gave the zeek user full permissions on that directory. The documentation says: "The Zeek user must be able to either create this directory or, where it already exists, must have write permission inside this directory on all hosts."
The documentation also says that on the worker nodes this user must have access to the target network interface in promiscuous mode.
My zeek user is a sudoer AND a member of netdev group on all 3 nodes. Yet, the cluster deployment fails. Apparently, when zeekctl establishes the SSH connection to the workers it cannot get a hold of the network interfaces and set caps.
Eventually I was able to successfully run the cluster by following this article - however it requires you to set up the entire cluster as root, which I would like to avoid if at all possible.
So my question is, is there anything blatantly obvious that I am missing? To the best of my knowledge this setup should work, otherwise I don't know how to force zeekctl to run 'sudo' in front of every SSH command it is supposed to run on the workers, or how to satisfy this requirement.
Any guidance will be greatly appreciated, thanks!
I was experiencing the same error for my standalone setup. Found this question from googling it. More googling the error brought me to a few blogs including one in which the comments mentioned the same error. The author mentioned giving the binaries permissions using setcap:
$ sudo setcap cap_net_raw,cap_net_admin=eip /usr/local/zeek/bin/zeek
$ sudo setcap cap_net_raw,cap_net_admin=eip /usr/local/zeek/bin/zeekctl
After running them both, my instance of zeek is now running successfully.
Source: https://www.ericooi.com/zeekurity-zen-part-i-how-to-install-zeek-on-centos-8/#comment-1586
So, just in case someone else stumbles upon the same issue - I figured out what was happening.
I streamlined the cluster deployment with Ansible (using 'become' directive at task level) and did not elevate when running the handlers responsible for issuing the zeekctl deploy command.
Once I did, the Zeek Cluster deployment succeeded.

ERROR: (gcloud.compute.ssh) Could not fetch resource: - Insufficient Permission

I am having trouble working through the Compute Engine Quickstart: Build a to-do app with a MongoDB tutorial. (edit: I am running the tutorial from within the compute engine console; i.e. https://console.cloud.google.com/compute/instances?project=&tutorial=compute_quickstart)
I SSH into the backend instance. I enter the "gcloud compute" command as copied from the tutorial. I am prompted to enter a passphrase. The following is returned:
WARNING: The public SSH key file for gcloud does not exist.
WARNING: The private SSH key file for gcloud does not exist.
WARNING: You do not have an SSH key for gcloud.
WARNING: SSH keygen will be executed to generate a key.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in
...
<< Identifying detail omitted >>
...
ERROR: (gcloud.compute.ssh) Could not fetch resource:
- Insufficient Permission
I had run through this stage of the tutorial on a previous occasion with no problems.
I am working from a Windows 10 PC with the google-cloud-sdk installed. I am using google chrome. I have tried in both regular and incognito modes.
Any help or advice gratefully received!
DaveDub
It looks like the attempt to SSH is recognising the instance in your project, but the user doesn't have the required permissions to access the machine.
Have you tried running:
gcloud auth login
and completing the web-based authorization to ensure you are attempting to access the machine as the correct (authenticated) user? This process ensures the Cloud SDK you are running inherits the permissions of the user specified in the web-based authorisation. See here for more information on this.
It's also worth adding the link to the tutorial you are following to your question.
Besides the accepted answer, be sure you are in the correct gcloud project
gcloud projects list
Then
gcloud config set project <your-project>
I just ran into this for yet another reason. Google has always had poor handling of multi-user auth conflicts with their business products. Whatever you sign into a clean chrome session with 'first' gets a 'special', invisible role. I've noticed with gsuite that I get 'forced' into that first user when I try to access the admin panel, and the only way to escape is to make sure that whatever google user I use for the gsuite admin is 'first', or open an incognito window. I've seen this bug for years, can't believe it still exists.
Anyways, I ran into a similar issue. Somehow I was signed in as the wrong Google user, so the command I copy/pasted out of 'connect with gcloud command' was for the wrong Google user. I only noticed later, when I gave up and used the terminal, that I was not my normal user... So, you might look into that.

Openshift connection error

I don't know if I'm in the right place, but if not, sorry in advance!
I have an app on OpenShift running Tomcat, and since a couple of days ago my app gives me "Not found" in the browser. I restarted the app in the web console, but nothing changed.
So I thought that maybe the problem is in Tomcat, and I tried to check the log, but I couldn't connect to the app over ssh. Then I ran the setup again to generate a new keypair, but when the command rhc setup runs it gives me:
Your private SSH key file should be set as readable only to yourself. Please
run 'chmod 600 C:\Users\Artur\.ssh\id_rsa'
An SSH connection could not be established to
standard-projectxserver.rhcloud.com. Your SSH configuration may not be correct,
or the application may not be responding. Authentication failed for user
XXXXX@standard-projectxserver.rhcloud.com
(Net::SSH::AuthenticationFailed)
Checking for a domain ... projectxserver
Checking for applications ... found 1
app http://app-projectxserver.rhcloud.com/
I searched the net and already set the keys manually, but the ssh console gives me an error and I cannot connect:
"Permission denied (publickey, gssapi-keyex,gssapi-with-mic)"
What am I doing wrong? Or even better, what can I do to connect?
(I cannot recreate this app because I no longer have the war I used, so I need this working to save everything again!)
Thanks
Could you change the permissions on the private key file with the change mode command (chmod 600), as shown in the error?
Your private SSH key file should be set as readable only to yourself.
Please run 'chmod 600 C:\Users\Artur\.ssh\id_rsa'
Otherwise, delete the existing public key from the settings panel on the OpenShift web console and create your keys again with the rhc setup command.
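To see whether a key file already satisfies the "readable only to yourself" requirement, a check along these lines may help. It is only a sketch: the function name key_is_private is hypothetical, and it assumes a Unix-like shell where ls reports POSIX permission bits (on Windows the reported modes may not reflect the real file permissions).

```shell
# key_is_private FILE
# Succeeds when FILE grants no permissions to group or others,
# which is the state that `chmod 600` produces.
# Characters 5-10 of the `ls -ld` mode string are the group and other bits.
key_is_private() {
    [ "$(ls -ld "$1" | cut -c5-10)" = "------" ]
}
```

For example, key_is_private ~/.ssh/id_rsa || chmod 600 ~/.ssh/id_rsa tightens the permissions only when needed.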
The problem was after all in the OpenShift DNS!! They fixed that for me.
Thanks for the effort

ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]

I kept getting kicked out of my compute engine instance after a few seconds of idle with the indicated error (255).
I used 'gcloud compute ssh' to log in.
I am using the default firewall setting, which I believe would be good enough for ssh.
But if I am missing something, please so indicate and suggest the fix for this error.
Basically I can't get any efficient work done at this point having to ssh in so many times.
gcloud denies an ssh connection if there was a change in the setup, e.g.
after you changed your default zone or region, or you created another instance.
Then, you must update the ssh keys in your metadata by
sudo gcloud compute config-ssh
If this complains about different entries in your config file where your ssh key entries are stored, ~/.ssh/config, delete this file and execute the above command again.
If you have installed gcloud without sudo, you can omit sudo.
255 is the interactive ssh exit code for ssh failure - otherwise interactive ssh exits with the exit code of the last command executed in the ssh session.
The next time you get exit code 255 from ssh try running with --ssh-flag="-vvv" (more v's => more debugging output) and see if it helps track down connection problems.
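The exit-code rule above can be written down as a tiny helper, for example for use in scripts that wrap ssh. This is a sketch, not part of gcloud: the function name describe_ssh_exit is hypothetical, and it cannot tell a real ssh failure apart from a remote command that itself exits with 255.

```shell
# describe_ssh_exit STATUS
# Interprets the exit status of an interactive ssh run:
# 255 means ssh itself failed; anything else is the status of the
# last command executed in the remote session.
describe_ssh_exit() {
    case "$1" in
        0)   echo "ok" ;;
        255) echo "ssh failed" ;;
        *)   echo "remote command exited with $1" ;;
    esac
}
```

A typical use would be to run ssh, then call describe_ssh_exit $? and re-run with -vvv only when it reports "ssh failed".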
For those who stop by this page. This helped me to solve the problem.
Try the following:
Go to your Google account and remove the SSH key for the server
Go to your Google Cloud console -> Compute Engine -> Metadata -> "SSH keys" tab and click on edit. Here you can delete the SSH keys.
Run the gcloud command again
Click on the "Instances" link on the left side of your Google Cloud console, which lists all the instances on the right side. Under the "Connect" column you will see an "SSH" drop-down; click on "View gcloud command" and this will bring up a new dialog. Copy that command and run it in your PC's terminal. This will let you SSH into the Compute Engine instance.
It seems a feature/issue from Google Cloud Platform itself, we are going to continue checking it.
If the default network was edited, or if not using the default network, you may need to explicitly enable ssh access by adding a firewall-rule:
$ gcloud compute firewall-rules create --network=YOUR_NETWORK \
default-allow-ssh --allow tcp:22
After that, retry the 'gcloud compute ssh' command.
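Before and after adding such a rule, it can be useful to check from your own machine whether port 22 is reachable at all. The following is a minimal sketch under stated assumptions: the function name port_open is hypothetical, and it relies on bash's /dev/tcp pseudo-device and the GNU timeout command, neither of which is available everywhere.

```shell
# port_open HOST [PORT]
# Succeeds if a TCP connection to HOST:PORT (default 22) opens within 5 seconds.
# Returns 2 when HOST is missing; any other non-zero status means the
# connection was refused, timed out, or could not be attempted.
port_open() {
    [ -n "${1:-}" ] || return 2
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "${2:-22}" 2>/dev/null
}
```

Calling port_open with the instance's external IP and getting a failure points at the network path or firewall rather than at ssh keys.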
This is a real problem with very little documentation on dealing with it.
Some time after creating the instance, the gcloud SDK ssh snippet provided via the GCP console stopped working and now continually fails with error 255, so ssh on the instance is only reachable through the browser via the GCP console. This has happened to me on many different instances, some without my touching the default account permissions after initial setup and deployment, which is overly frustrating. For no apparent reason it just stops working... works, then doesn't...
The only thing that worked for me was creating a new user to connect with through the gcloud SDK, be it from Windows/PowerShell or Linux locally, using the following snippet:
gcloud compute ssh newuser-name@instance-name
That all per GCP documentation here: https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-ssh
Everything else checked out per the suggestions in the documentation (port 22 is open and accessible), meaning it has to be a problem with the default user's authorized_keys, and they provide no documentation on how to fix that; at least I could find nothing on fixing it (as opposed to creating or deleting it).
I've tried updating the account and deleting the user and credentials from the instance; nothing appears to work. Using:
gcloud compute --project "project-name" ssh --zone "us-east4-a" "instance-name"
just doesn't work...
I even tried 'gcloud compute config-ssh --force-key-file-overwrite'. NOTHING WORKS...
But creating a new user works every time, and once the user is created you can keep using that user via the gcloud SDK.
It's a workaround, and I hate workarounds for things like this, but for my sanity this works, at least until I can figure out how to reset the default account permissions. If anyone has any ideas there or can point me in a direction, I'd more than appreciate it!
It was my mistake to state that the default firewall would allow all connections into an instance. The contrary turned out to be true: an appropriate firewall rule must be set up to allow connections into an instance.
Anh-
If you have Identity-Aware Proxy (IAP) enabled for your setup, try adding the --tunnel-through-iap option to the gcloud compute ssh command.
$ gcloud compute ssh --zone <zone> --project <project> --tunnel-through-iap <instance-name>
More information for people landing on this page, if you're using preemptible instances to save some compute costs, that could also be the reason for getting kicked out like this. Your instance may have just randomly stopped.
In my case, I had created a bootable disk for the VM without specifying which source image it should use. Because of this, even though the instance appeared to come up all right and the ssh-allow rule was there, the VM was not actually booting.
Once I added the source image to the disk, I was able to ssh into the VM.
Hope this helps someone.
I had the same error. I restarted the VM instance and ssh works fine.
I had the problem where after clicking on the SSH button it would keep trying to establish a connection and fail. After long struggle I resolved it by adding Service Account User role to myself. If your account was created after the VM instance was created, it might result in this situation.
I know this was opened a long time ago, but here is a more recent update on this topic. I had the same trouble connecting via ssh. It was giving the error code 255. Obviously there was a connectivity issue. There was already a firewall rule set under VPC network -> Firewall to allow ssh. However, to fix this problem I had to go to the specific network and create a rule under the network's firewall rules: VPC network details -> FIREWALL RULES, then create an inbound TCP rule for port 22.
If you are having a problem accessing your gcloud VM instance remotely from your computer's terminal and are getting error code 255, the problem may be that the SSH keys or configuration on your computer are wrong or out of date.
In this case, one way to fix it is to go to your home directory (on your computer), show the hidden files and find the folder ".ssh". Delete this folder (keep a copy if it holds keys you use elsewhere) and re-open your bash terminal. Then run your gcloud VM command again.
Example:
you@your_computer:~$ gcloud beta compute ssh --zone "us-central1-a" "your_VM_name" --project "your_project_name"
This time, instead of the error 255, you should get the messages below:
WARNING: The private SSH key file for gcloud does not exist.
WARNING: The public SSH key file for gcloud does not exist.
WARNING: You do not have an SSH key for gcloud.
WARNING: SSH keygen will be executed to generate a key.
This tool needs to create the directory [/home/your_name/.ssh] before being able to
generate SSH keys.
Do you want to continue (Y/n)?
Type "Y" and gcloud will set up new keys by creating a brand new .ssh directory.
After that you should be able to access your VM with your gcloud command without any problem.
That should solve the problem
Cheers
https://blackpearlmatrix.com
I had the exact same symptoms. In my case the reason appeared to be the following: I was using the root user with an ssh key, whereas root login is disabled by default in /etc/ssh/sshd_config (the PermitRootLogin property).
I eventually had to delete my instance and make a new one with the same disk. See https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-ssh#use_your_disk_on_a_new_instance for details.
For me, my other teammates were able to log in to the machine, but not me. So I asked them to create a user with my name and sudo rights, logged in to the serial console, and changed PasswordAuthentication to yes, followed by sudo service ssh restart (for some this could be sudo service sshd restart).
After this I was able to log in with
ssh -o PreferredAuthentications=password username@publicIP -p 22
This trick worked fine for me.
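Settings such as PasswordAuthentication and PermitRootLogin can be read back from the config file before and after editing it. The helper below is a rough sketch with a hypothetical name, sshd_option. It relies on sshd using the first value it finds for a keyword, and it deliberately ignores Match blocks and Include directives, so it is only an approximation of what sshd actually applies.

```shell
# sshd_option FILE KEYWORD
# Prints the first uncommented value of KEYWORD in FILE.
# The keyword comparison is case-insensitive.
# Simplification: Match blocks and Include directives are not evaluated.
sshd_option() {
    awk -v key="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')" \
        'tolower($1) == key { print $2; exit }' "$1"
}
```

For example, sshd_option /etc/ssh/sshd_config PasswordAuthentication prints the configured value, or nothing when the keyword is not set explicitly.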
Reinitializing the gcloud with "gcloud init" and generating new ssh keys resolved the problem for me.
I had the same issue.
I connected to the serial console and checked the logs, and there was an error along the lines of "there is no disk space". I then resized the disk as described in this document.
Now I am able to connect to the instance with ssh.
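Serial console output can be long, so filtering it for disk-full messages may save time. This is a sketch with a hypothetical function name, disk_full_lines, and it assumes the error uses the common wording "No space left on device"; the message quoted above is a paraphrase, so the real text may differ.

```shell
# disk_full_lines
# Reads log text on stdin and prints the lines that report a full disk.
# Assumption: the message contains "No space left on device"
# (matched case-insensitively); other wordings are not detected.
disk_full_lines() {
    grep -i 'no space left on device'
}
```

The serial output itself can be fetched with gcloud compute instances get-serial-port-output and piped into disk_full_lines, using your own instance name and zone.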
Try switching to a different Internet connection
So, I was getting the same error but in my case I was not able to log in to the instance at all.
(base) girish@girish:~$ gcloud beta compute ssh --zone "asia-east1-b" "fp-1" --project "fp-public"
ssh: connect to host 12.345.678.90 port 22: Resource temporarily unavailable
ERROR: (gcloud.beta.compute.ssh) [/usr/bin/ssh] exited with return code [255].
(base) girish@girish:~$ gcloud beta compute ssh --ssh-flag='-vvv' --zone "asia-east1-b" "fp-1" --project "fp-public"
OpenSSH_7.6p1 Ubuntu-4ubuntu0.3, OpenSSL 1.0.2n 7 Dec 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug2: resolving "12.345.678.90" port 22
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to 12.345.678.90 [12.345.678.90] port 22.
debug1: connect to address 12.345.678.90 port 22: Resource temporarily unavailable
ssh: connect to host 12.345.678.90 port 22: Resource temporarily unavailable
ERROR: (gcloud.beta.compute.ssh) [/usr/bin/ssh] exited with return code [255].
What worked for me:
I tried reinstalling lots of things and re-initializing various configs, and then landed on a thread which suggested changing the Internet connection you are using, and it worked!!
It's possible you have a rule that only allows whitelisted IPs to ssh into a gcloud VM. So you may have forgotten to enable your work VPN, or you may be outside your office's IP range.
Try restarting your computer.
I got the same error and tried gcloud compute config-ssh as mentioned previously, to no avail. I then checked that the IDs and roles of the service account and developer had 'editor' permissions, and that was fine. I started a new instance and logged out of all of my other Google accounts and it still threw the error. Then I restarted my computer and did not log back into my other Google accounts. That fixed it.
When using IAP, GCP stores the key in the instance metadata and then propagates it to the ~/.ssh/authorized_keys file.
You might get the error the OP talks about when you remove the key from the ~/.ssh/authorized_keys file while it's still in the instance metadata. The reason being:
GCP checks whether the user/key combo that you are using to ssh is already in the instance metadata.
If it is, it assumes that the key exists in the ~/.ssh/authorized_keys file for that user and doesn't propagate the key.
As the key doesn't exist in the ~/.ssh/authorized_keys file for whatever reason (you deleted it, someone else deleted it, etc.), you get access denied.
If this is the case for you, then the fix is simple: remove the instance metadata entry for that user/key combo (I have attached an image for reference; just click X and remove your faulty key) and try ssh again.
What worked for me was turning my firewall on. (On a Mac, ssh'ing into a gcp instance).
In another instance of the error, my connection worked fine when I was on ethernet, but not when I was on wifi. Switching back to ethernet allowed me to connect again.
In my case the issue was sorted out after restarting the VM.
If you were able to access the VM previously and it is suddenly giving SSH issues, give restarting a try.
Permission-wise, check whether you have the IAP-secured Tunnel User role:
gcloud compute ssh --zone "your_zone" "instance_name" --tunnel-through-iap --project "project_name"
If this does not work, try the GCP built-in SSH client: click "Open in browser window".
Hope this helps!!!