I am running a process that takes a long time to finish on a remote server. The process gets killed when my ssh connection to the remote machine is dropped. Is there a way I can continue running the process even after my ssh connection drops?
There is another similar question.
The answer provided there was to use the POSIX command nohup to ignore the ssh termination and allow the process to continue running.
The case:
Multiple servers try to connect to a server with delegate_to and execute a task simultaneously. SSH connections on the target are limited via /etc/security/limits.conf and /etc/ssh/sshd_config, and some of the servers fail due to the limit.
e.g.: Server X allows at most 3 concurrent ssh connections. Servers 1, 2, 3, 4, 5 try to execute a task on Server X at the same time. Some of them get the error: too many logins for <remote_user>.
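For illustration, a login cap like the one described can be set through pam's limits module; a sketch of the relevant line (the user name is a placeholder):

```
# /etc/security/limits.conf: allow remote_user at most 3 concurrent logins
remote_user    hard    maxlogins    3
```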
What I have tried:
Adding ssh retries to both ansible.cfg and the delegate_to vars. This does not work because Ansible only retries ssh on return code 255, while "too many logins" returns 254, so no retry is attempted.
Adding wait_for_connection as a pre-task. This waits for an available connection, but since this task and the next task are independent, their ssh connections are independent too; it still fails if another connection comes in right after wait_for_connection succeeds.
Looking for any possible solution for this.
Notes: Every server runs its own ansible task. The Ansible version on the servers is 2.9.9.
My ansible playbook consists of several tasks, and I run it on a virtual machine. I use ssh to log in to the VM and run the playbook. If my ssh window gets closed during the execution of any task (when the internet connection is unstable and unreliable), the execution of the playbook stops because the ssh window has already closed.
It takes around 1 hour for my playbook to run, and sometimes even if I lose internet connectivity for a few seconds, the ssh terminal loses its connection and the entire playbook stops. Any idea how to make the ansible run more resilient to this problem?
Thanks in advance !!
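One way to make the run survive a dropped ssh session is to start the playbook detached with nohup. A minimal sketch; here `sleep 30` stands in for the long `ansible-playbook` invocation, and the file names are arbitrary:

```shell
# Launch the long job immune to hangups; 'sleep 30' is a stand-in
# for something like: ansible-playbook site.yml
nohup sleep 30 > playbook.log 2>&1 &
echo $! > playbook.pid   # save the PID so a later session can check on it

# From any later ssh session: is it still running?
kill -0 "$(cat playbook.pid)" && echo "playbook still running"
```

Redirecting stdout and stderr to a log file means you can reconnect later and follow progress with `tail -f playbook.log`.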
If you need to run a job on an external system that takes a long time, and it matters that the task completes, it is an extremely bad idea to run that job in the foreground.
It is not important that the task is Ansible or that the connection is SSH. In every case you would just "push" the command to the remote host and send it to the background with something like nohup, if available. The problem is of course the tree of processes: your connection creates a process on the remote system, and that process creates the job you want to run. If the connection gets lost, all subprocesses will be killed automatically by the OS.
So, under Windows, maybe use RDP to open a session that stays available even after the connection is lost, or use something like Cygwin and nohup via SSH to detach your process from the ssh session.
Or, if you need to run a playbook on that system, install for example an AWX container and use that. There are many options, depending on your requirements, resources and administrative constraints.
Any idea why, when I connect remotely (ssh session) to my Google Compute Engine instance, if I run a command (an HTTP API server) and leave, it stops running as well?
./main PORT // Stops when I leave
./main PORT & // Stops when I leave as well..
No matter what, if I disconnect from my current ssh session, my API stops, even though the instance itself seems to keep running fine.
When you disconnect your terminal, all processes started by that terminal are sent a "hangup" signal which, by default, causes the process to terminate. You can trap the hangup signal when you launch a process and cause the signal to be silently ignored. The easiest way to achieve this is with the nohup command. For example:
nohup ./main PORT &
References:
What Is Nohup and How Do You Use It?
Unix Nohup: Run a Command or Shell-Script Even after You Logout
nohup(1) - Linux man page
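To see that it really is the hangup signal at work, you can reproduce what nohup does by hand: ignore SIGHUP before launching the job. A sketch, with `sleep 30` standing in for `./main PORT`:

```shell
# Hand-rolled nohup: ignore SIGHUP in a subshell, then exec the job.
# Ignored signal dispositions survive exec, so the job inherits them.
(trap '' HUP; exec sleep 30 > main.log 2>&1) &
echo $! > main.pid

# Sending SIGHUP now has no effect on the process:
kill -HUP "$(cat main.pid)"
kill -0 "$(cat main.pid)" && echo "survived the hangup"
```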
I've got a VMware ESXi server that I connected to via SSH to run a process. (I was running "vmkfstools --punchzero myVirtualDrive.vmdk" to reclaim space on a virtual disk). The SSH client connection was disconnected when the process was only 70% complete (with several hours of estimated time remaining).
Was the process terminated when the SSH client dropped its connection to the ESXi host?
Is there any way to tell if the process is still running?
I did a quick ps|grep to find it, but didn't see anything that looked like vmkfstools.
On most standard linux machines, I use "screen" to avoid SSH client disconnection issues, but since the ESXi shell is purposely a very stripped down environment, I just ran the command straight away.
Thoughts?
The command was killed when you disconnected.
What you can do to prevent this in the future is to run the command with nohup. This will run your command in the background, continuing even when your shell "hangs up".
What I usually do when I expect a command to run long:
I ssh to a Linux machine that runs 24/7
start screen
ssh to the ESXi host
start the command
This way I don't have to worry about disconnects, I can just detach screen and go home after work and reattach screen the next morning to check the status of the command.
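The steps above as a terminal sketch (host names are placeholders; only the vmkfstools command comes from the question):

```
ssh user@always-on-box        # 1. a Linux machine that runs 24/7
screen -S esxi-job            # 2. start a named screen session
ssh root@esxi-host            # 3. from inside screen, ssh to the ESXi host
vmkfstools --punchzero myVirtualDrive.vmdk    # 4. start the command
# Detach with Ctrl-a d; reattach later with: screen -r esxi-job
```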
This was exactly what I was looking for, thanks all. I attempted to unmap unused blocks on LUNs with a VMFS 5 file system. I ran this cmd: esxcli storage vmfs unmap -l [LUN-NAME]. I wonder what will happen if I close the remote SSH connection while this cmd is still running.
Is there a way to start a process using ssh that doesn't terminate when the ssh session terminates? I want the job to keep running on the computer I'm ssh-ing into without me having to keep the connection open.
You can use nohup (assuming you are SSHing into a *nix server).
You could use the screen utility.
An alternative to screen is dtach. dtach is smaller and more lightweight - in fact it is just the detach part of the screen utility.
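A minimal dtach sketch (the socket path and job name are placeholders); dtach keeps the process attached to a Unix socket instead of your terminal:

```
dtach -A /tmp/job.sock ./long-running-job   # attach, creating the session if needed
# Detach with Ctrl-\ ; reattach later with: dtach -a /tmp/job.sock
```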