I run a Fedora instance in Amazon EC2, and I can access and work on it perfectly via PuTTY.
I have also set "Seconds between keepalives" to 1 so that the connection is not lost due to inactivity (this is a PuTTY setting).
Nevertheless, if a network or power failure happens on my local computer, the PuTTY connection is shut down, the session logs off, and whatever is executing on the instance stops.
Can anybody help me keep a session alive so that I can connect to and disconnect from it whenever I want?
Use the screen command.
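A minimal example of the usual screen workflow on the instance (the session name long-job and the script name are just placeholders):

# start a named session on the EC2 instance
screen -S long-job

# run your long-running command inside the session
./run-something.sh

# detach with Ctrl-A then d; the command keeps running on the server
# later, from a new PuTTY/SSH session, reattach:
screen -r long-job

# list existing sessions if you forget the name
screen -ls

Anything started inside the session keeps running even if PuTTY or your local machine dies; you simply reattach when you reconnect.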
My Ansible playbook consists of several tasks, and I run it on a virtual machine. I log in to the VM over SSH and run the playbook there. If my SSH window gets closed during the execution of any task (my internet connection is not stable or reliable), the execution of the playbook stops because the SSH session is gone.
The playbook takes around an hour to run, and sometimes losing internet connectivity for even a few seconds is enough for the SSH connection to drop, so the entire playbook stops. Any idea how to make the Ansible run more resilient to this problem?
Thanks in advance!
If you need to run a long-running job on a remote system and it matters that the job completes, it is an extremely bad idea to run that job in the foreground.
It doesn't matter whether the task is Ansible or the connection is SSH. In every case you would "push" the command to the remote host and send it to the background with something like nohup, if available. The problem is, of course, the process tree: your connection creates a process on the remote system, and that process creates the job you want to run. If the connection is lost, all subprocesses are killed automatically by the OS.
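A minimal sketch of the nohup approach, run on the VM itself (the playbook name site.yml and the log paths are just placeholders):

# run the playbook detached from the SSH session, logging to a file
nohup ansible-playbook site.yml > /tmp/playbook.log 2>&1 &

# remember the PID so you can check on the job later
echo $! > /tmp/playbook.pid

# after reconnecting: follow the log, or test whether the job is still alive
tail -f /tmp/playbook.log
kill -0 "$(cat /tmp/playbook.pid)" && echo "still running"

Because nohup ignores the hang-up signal and the job runs in the background, a dropped SSH connection no longer terminates the playbook.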
So, under Windows, you could use RDP to open a session that stays available even after the connection is lost, or use something like Cygwin and nohup via SSH so that the process is detached from the SSH session and survives the hang-up.
Or, if you need to run playbooks against that system regularly, install, for example, an AWX container and use that. There are many options depending on your requirements, resources, and administrative constraints.
GCP VM instance: OS Ubuntu 18.04 (bionic), disk size 10 GB. Later I added another 10 GB disk.
While working on the GCP VM instance, I ran into a 'no disk space left' issue. I then created another 10 GB disk and attached it to the instance as described in https://cloud.google.com/compute/docs/disks/add-persistent-disk?&_ga=2.217520662.-1058595688.1590395241#formatting
I then exited the GCP VM instance and stopped it.
Later, when I restarted the instance, I was unable to connect. I tried the SSH option available in the GCP console, PuTTY, WinSCP, and telnet, but none of them can connect now.
My understanding is that some services might have stopped on the instance. Is there a way to check whether the services are running on a GCP VM instance? If yes, how?
If you think there is some other issue preventing connection to the instance, please let me know.
There may be several reasons:
Firewall rules - check them to be sure nothing blocks SSH traffic to your machine.
Have a look at the serial console output - you can do it via console gui or gcloud compute instances get-serial-port-output instance_name --zone=my_zone.
If your drive is full, you may not be able to log in at all.
Adding another persistent disk won't help if the first one is full.
You can increase its size though - also via the console or gcloud compute disks resize example-disk-1 --size=11GB - this will add 1 GB more, and if disk space is the problem it should allow you to log in.
If you're still not able to log in, try enabling interaction with the serial console (gcloud compute instances add-metadata instance-name --metadata serial-port-enable=TRUE) and connect to it with gcloud compute connect-to-serial-port instance-name, since this is the most foolproof method if everything else fails.
If you're able to connect via serial console check if the SSH service is listening:
sudo service ssh status - if it's not running, start it with sudo service ssh start and watch for any errors.
A similar case was also discussed here.
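Once you can log in (for example via the serial console), a quick way to confirm that disk space is the culprit and to free some room might look like this; these are standard Ubuntu commands and the sizes are only illustrative:

# how full is the root filesystem?
df -h /

# find the largest directories
sudo du -xh / --max-depth=2 2>/dev/null | sort -h | tail -20

# trim old journal logs and the apt package cache
sudo journalctl --vacuum-size=100M
sudo apt-get clean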
I've been using a GCP preemptible VM for a few months without problems, but in the last four weeks my instances have consistently shut off anywhere from 10 to 20 minutes into operation.
I'll be in the middle of training, and my notebook will suddenly disconnect. The terminal will show this error:
jupyter#fastai-instance:~$ Connection to 104.154.142.171 closed by remote host.
Connection to 104.154.142.171 closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
I then check the status of my VM and see that it has shut down.
I searched the terminal traceback and found this thread, which seemed promising: ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]
After I ran sudo gcloud compute config-ssh, my VM ran for much longer than usual before shutting down, yet it shut down in the same way after about an hour. Since then, it's back to the same behavior.
I know preemptible instances can be shut down when the platform needs resources, but my understanding is that this comes with some kind of warning. I've checked the status of GCP's servers after the shutdowns and they appear to be fine. This also happens the same way every time I turn my VM on, which seems too frequent for preemption.
I am not sure where to look for any clues – has anyone else had a problem like this? What's especially puzzling to me is, if it is in fact an SSH problem, why would that cause the VM itself to shut down, rather than just break the connection?
Thanks very much for any help!
Did you try setting a shutdown script that writes something to a file, so you can validate the state of the VM when it goes down?
Try this as a shutdown script:
#!/bin/bash
curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted" -H "Metadata-Flavor: Google" > /tmp/preempted.log
If there is TRUE in the file, it's because the VM has been preempted.
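Assuming the script is saved locally as preempt-check.sh (a name chosen here just for illustration), it can be attached to the instance as its shutdown script, for example:

gcloud compute instances add-metadata INSTANCE_NAME \
    --zone=ZONE \
    --metadata-from-file shutdown-script=preempt-check.sh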
If a VM stops and you have an active SSH connection to that VM (via gcloud compute ssh), then it's normal that you receive an error. Since the VM goes down, all connections are closed, including your SSH connection (you cannot connect to a stopped instance). The VM termination causes the SSH error, not the other way around.
When using preemptible instances, Google can reclaim the instance whenever it's needed. Note that (from the docs about preemptible instance limitations):
Compute Engine might terminate preemptible instances at any time due to system events. The probability that Compute Engine will terminate a preemptible instance for a system event is generally low, but might vary from day to day and from zone to zone depending on current conditions.
It means that one day your instance may run for 24 hours without being terminated, but another day it may be stopped 30 minutes after being started if Compute Engine needs to reclaim some resources.
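If you want to confirm from the outside that a given shutdown really was a preemption, the Compute Engine operations log can be queried for it; a sketch, with the instance name as a placeholder:

gcloud compute operations list \
    --filter="operationType=compute.instances.preempted AND targetLink:instances/INSTANCE_NAME"

An empty result for the time window in question would suggest the shutdown had some other cause.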
A comment on the "continuously shutting down" part:
(I have experienced this as well)
Keep in mind that Google prefers to shut down RECENTLY STARTED preemptible instances, over ones started earlier.
The link below (and supplied earlier) has the statement:
Generally, Compute Engine avoids preempting too many instances from a single customer and preempts new instances over older instances whenever possible.
This would generally mean that, yes, I suppose, if you are preempted and boot up again, it is quite likely that you will be preempted again and again until the load in the zone drops.
I'm surprised that Google doesn't simply prevent you from starting the preemptible VM again for a while (30-60 minutes?). How much CPU is being wasted bouncing VMs up and down while we cross our fingers?
P.S. There is a dirty trick to work around the frustration: have two VMs configured identically except for preemptibility, but with only one underlying boot disk. If you are having a bad day with preemptions, simply 'move' the boot disk to the non-preemptible VM, boot it, and carry on. It's a couple of simple gcloud commands to achieve this (see the sketch below), easily scripted and very fast. Don't tell Google I told ya....
https://cloud.google.com/compute/docs/instances/preemptible#limitations
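A rough sketch of that boot-disk 'move' (names are placeholders; it assumes the preemptible VM is stopped and the non-preemptible VM has no boot disk currently attached):

# detach the shared boot disk from the stopped preemptible VM
gcloud compute instances detach-disk preemptible-vm --disk=shared-boot-disk --zone=ZONE

# attach it to the on-demand VM as its boot disk
gcloud compute instances attach-disk ondemand-vm --disk=shared-boot-disk --boot --zone=ZONE

# start the on-demand VM and carry on where you left off
gcloud compute instances start ondemand-vm --zone=ZONE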
A compute instance I had running stopped working, and I am no longer able to SSH to it from the browser. When I try, it hangs for a long time and eventually I get the error message:
You cannot connect to the VM instance because of an unexpected error.
Wait a few moments and then try again. (#13)
I looked here for common issues. I made a snapshot and tried recreating the instance with a larger disk, in a different region, and with a bigger compute instance, but I was still unable to connect. When other users try to connect, they have the same problem. I'm using a standard container, so I expect the Google daemon should be running.
This instance was collecting tweets and writing output to GCS regularly. Since SSH stopped working, the instance has also stopped writing output.
Does anyone have any idea what could have gone wrong?
I would also suggest checking the Serial Console of the machine to see if there are any messages which provide any clues. For example, if the boot disk has run out of space (which can prevent SSH connectivity), there will be some messages displayed in the Serial Console implying this.
You could also try connecting to the machine via the Serial Console to troubleshoot the issue by following the advice here.
When you try to SSH into the instance, for example from Cloud Shell using the following command, the output should provide some clues as to why you cannot SSH into the machine:
$ gcloud compute ssh INSTANCE_NAME --zone ZONE
If you are on a VPC network, check which network tag the firewall rule that allows SSH applies to and attach that tag to your instance, because it could be the firewall rules that are blocking your instance's SSH connections.
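A hedged sketch of allowing SSH via a tagged firewall rule (rule name, tag, instance name, and zone are placeholders):

# create (or verify) a firewall rule that allows SSH to instances carrying the tag
gcloud compute firewall-rules create allow-ssh-tagged \
    --network=default --direction=INGRESS --action=ALLOW \
    --rules=tcp:22 --source-ranges=0.0.0.0/0 --target-tags=ssh-allowed

# attach the tag to the instance so the rule applies to it
gcloud compute instances add-tags INSTANCE_NAME --tags=ssh-allowed --zone=ZONE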
I've got a VMware ESXi server that I connected to via SSH to run a process. (I was running "vmkfstools --punchzero myVirtualDrive.vmdk" to reclaim space on a virtual disk). The SSH client connection was disconnected when the process was only 70% complete (with several hours of estimated time remaining).
Was the process terminated when the SSH client dropped its connection to the ESXi host?
Is there any way to tell if the process is still running?
I did a quick ps|grep to find it, but didn't see anything that looked like vmkfstools.
On most standard linux machines, I use "screen" to avoid SSH client disconnection issues, but since the ESXi shell is purposely a very stripped down environment, I just ran the command straight away.
Thoughts?
The command was killed when you disconnected.
What you can do to prevent this in the future is to run the command with nohup. This will run your command in the background, continuing even when your shell "hangs up".
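Assuming nohup is available in the ESXi shell (worth checking first, since the shell is deliberately stripped down), a sketch of that approach:

# run the space reclaim detached from the SSH session, logging to a file
nohup vmkfstools --punchzero myVirtualDrive.vmdk > /tmp/punchzero.log 2>&1 &

# after reconnecting, check progress and whether the process is still there
cat /tmp/punchzero.log
ps | grep vmkfstools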
What I usually do when I expect a command to run for a long time:
I ssh to a Linux machine that runs 24/7
start screen
ssh to the ESXi host
start the command
This way I don't have to worry about disconnects: I can just detach screen, go home after work, and reattach the next morning to check the status of the command.
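A minimal sketch of that workflow on the always-on Linux machine (host names are placeholders):

# on the 24/7 Linux box: start a named screen session
screen -S esxi-maintenance

# inside the session: ssh to the ESXi host and start the long command
ssh root@esxi-host
vmkfstools --punchzero myVirtualDrive.vmdk

# detach with Ctrl-A then d; the SSH session to the ESXi host keeps running inside screen
# the next morning, reattach and check progress
screen -r esxi-maintenance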
This was exactly what I was looking for, thanks all. I was trying to unmap unused blocks on LUNs with a VMFS 5 filesystem. I ran this command: esxcli storage vmfs unmap -l [LUN-NAME]. I was wondering what would happen if I closed the remote SSH connection while this command was still running.