I use SSH to connect to my AWS EC2 instances and run code that takes a long time to complete. I find that if my local computer sleeps (or even if I leave it unattended for a bit) the SSH connection hangs up (which is not fatal in itself) but this seems to terminate the code on the EC2 instance that I launched using SSH.
Also, I use SSH to locally monitor the execution of my remote code, so even if there's a way to tell the remote process to stay alive after SSH has gone, I still need a way to locally see the output of the process as it continues to run (without SSH).
How do I keep code running on my AWS EC2 instance after SSH has hung up; how can I monitor the output of such a process?
When you close your tty (the SSH connection closing, in your case), your process receives a SIGHUP, and the default action on SIGHUP is to terminate. To avoid that, you can launch your command with nohup, which makes it ignore SIGHUP, or trap SIGHUP in your own code and ignore it.
There are a bunch of ways to track a background process, but perhaps the easiest is to have it write to a file and read that file from another SSH session. If your process is a command on the command line, you can redirect its standard output and standard error to a file. When such a file keeps getting new content, re-opening it after every change is tedious, which is where the command "tail -f" comes in handy.
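As a minimal sketch of that workflow, assuming the long-running program is the Python script long_running.py used elsewhere on this page and output.log is a log file name chosen for illustration:
nohup python long_running.py > output.log 2>&1 &
# later, possibly from a brand-new SSH session:
tail -f output.log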
Here is how you can configure your SSH connection to stay alive:
vi ~/.ssh/config # on your client side
add this line to enable sending a "null packet" every 120 seconds:
ServerAliveInterval 120
If you control the server side, make a similar change:
vi /etc/ssh/sshd_config
add these lines at the bottom of the config file:
ClientAliveInterval 120
ClientAliveCountMax 720
With these values the server sends a keep-alive probe every 120 seconds and tolerates up to 720 missed replies (roughly 24 hours) before dropping the connection. This is for Linux; YMMV with the equivalent settings on other operating systems.
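After editing sshd_config, reload the SSH daemon so the new settings take effect. A rough sketch; the service name varies by distribution (e.g. ssh on Debian/Ubuntu, sshd on RHEL-style systems):
sudo sshd -t                  # check the config file for syntax errors first
sudo systemctl reload sshd    # or: sudo systemctl reload ssh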
Use screen
local> ssh ...
remote> screen
remote+screen> python long_running.py ...
You can then detach from screen (press Ctrl-A followed by D) and even disconnect from SSH, and when you return by SSHing back in again, you can run
remote> screen -r
to reconnect to your running code.
I have some difficulties understanding how ssh works behind the scenes when I run it with a command string.
Normally, I type ssh rick@1.2.3.4, I am logged into the remote machine, and I run some commands. If I don't use nohup or disown, once I close the session, all running processes started over SSH will stop. That's the ordinary case.
But if I run ssh with a command string, things become different. The process started by ssh won't stop if I close the session.
For example:
Run from the local command line: ssh rick@1.2.3.4 "while true; do echo \"123test\" >> sshtest.txt; done"
Run a remote script: ssh rick@1.2.3.4 './remoteScript_whichDoTheSameAsAbove.sh'
After closing the session with Ctrl+C or kill <pid> on the local machine, I can still see the process running on the remote machine with ps -ef.
So my question is, could someone make a short introduction on how ssh works when I run it with a command string like above?
Also, I get very confused when I see these 2 related questions during searching:
Q1: Getting ssh to execute a command in the background on target machine. What is this question asking for? Isn't ssh rick@1.2.3.4 'some command' already run in a separate shell/pts? I don't understand what "in the background" is for.
Q2: Keep processes running after SSH session disconnects. Doesn't simply running a remote script, ssh rick@1.2.3.4 "./remoteScript.sh", meet his requirement? I get very confused when I see so many "magic" answers under that question.
I want to rsync files from my home computer to a cloud server. I am able to set up a continuous rsync with the following:
#!/bin/bash
while :
do
rsync -rav * --include=*.bz2 --exclude=*.* --exclude=ZIP.sh --exclude=UPLOAD.sh \
    --chmod=a+rwx user@server.com:/home/user/date
sleep 180
done
This of course will run continuously if I set up SSH keys as described here. Instead, I want to enter the password once, when the loop starts, and then have the rsync keep running continuously until I press Ctrl+C. Is there a way to do this?
Yes, using SSH connection sharing:
Add this to the top of your ~/.ssh/config file:
ControlMaster auto
ControlPath /tmp/ssh_%r@%n:%p
ControlPersist 8h
Connection sharing means that all your SSH sessions to the same server ride over a single underlying connection, so you can skip the authentication process for all but the first one. The ControlPersist setting controls how long the connection may sit idle before being closed (8 hours means I can log in in the morning, the connection will still be active at the end of the day, but it will have expired by the next morning).
The ControlPath specifies where the cached sockets will live. They can be anywhere you like and be called anything you like; the /tmp directory will do fine, but the name must be unique for each user, server, and port you connect to, or else you'll get clashes (the %r, %n and %p tokens, i.e. remote user, host, and port, take care of that).
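To illustrate how this plays out with the rsync loop above (treating user@server.com as in the question, and rsync-loop.sh as an illustrative name for the script), the first connection authenticates interactively and everything after it reuses the cached socket:
ssh user@server.com              # first connection: enter the password once
ssh -O check user@server.com     # optional: confirms a master connection is running
./rsync-loop.sh                  # the while loop from the question now connects without prompting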
Incidentally, you should probably check out the lsyncd tool as an alternative to continuous active scanning. It uses kernel notifications to watch the file-system, and launches rsync only when something actually changes.
Ansible seems to be sending SIGHUP signals at the end of (certain?) tasks. This is a problem as the tasks call a bash script which in turn starts a server instance.
Now if the closing of Ansible's SSH session sends a SIGHUP, this will actually shut down the server - the start of which was the key point of the Ansible task in the first place.
So, is there a way I can guarantee that Ansible will use SSH in a way that will not send a SIGHUP signal when closing the task/session?
I could theoretically start the bash script using nohup, but this seems like a dirty workaround, since I know SSH is capable of doing what I want: if I manually compose and run the script command via SSH, like this:
ssh user@server "scriptToStartMyServer.sh params"
...then it works fine and no SIGHUP is sent (thus the server survives and is not shut down immediately after being started).
Edit:
Sadly we cannot avoid using these bash scripts to start servers and we cannot really change them as this is something given to us by the customer.
I am using some EC2 instances to run some large jobs I can not run locally. The issue I am seeing is that after a while (X hours since the process started) my connection on my shell gives me a broken pipe error
ubuntu@ip-10-122-xxx-xxx:~/stratto/ode$ Write failed: Broken pipe
The instance is still there, because I can reconnect with no problems, but how can I reconnect and get back to seeing the logs of the process as they were before the 'Broken pipe'?
Any tip much appreciated,
Thanks!
Run the program with nohup and redirect its output to a file to ensure the disconnect doesn't kill it, then use "tail -f" to monitor the redirected file.
Note: this answer originally said to use "tee", but that won't work; a straight redirect and then tail on the file does.
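A minimal sketch, with run_job.sh and job.log as illustrative names:
nohup ./run_job.sh > job.log 2>&1 &   # survives the SSH session dropping
# after reconnecting later:
tail -n 100 -f job.log                # replay the last 100 lines, then follow new output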
You can use screen to run processes in the cloud even when you are not connected to the server.
sudo apt install screen
To specifically address the issue described in the original post (e.g. connecting to AWS EC2 instances), a basic example of using screen is sketched below.
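This is a minimal sketch, with the session name ec2job and the script long_running.py chosen purely for illustration:
screen -S ec2job            # start a named screen session on the instance
python long_running.py      # run the long job inside it
# press Ctrl-A then D to detach; it is now safe to disconnect from SSH
screen -d -r ec2job         # after SSHing back in: reattach, detaching any stale attachment first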
You can use "screen". Detach from it and ping to google.com. So there ssh session will be active through out the installation.
I'm using SSH to remotely launch Tornado on Amazon Web Service. It works fine when I launch it by:
python startTornado.py
However, after my SSH session times out or is terminated, the Tornado server is also stopped immediately, so I can't access the webpage anymore. I did quite a bit of searching but couldn't find an answer on Google.
How can I keep Tornado and the site running after my SSH session has terminated?
The process will shut down when you log out if it's running in the foreground, or if it tries to write to stdout and the terminal it was writing to no longer exists. Try starting the server with
nohup python startTornado.py &
If you don't redirect output yourself, nohup sends it to a file called nohup.out, and the & at the end runs the command in the background. Alternatively, you can use the screen utility, which lets you detach a terminal and reattach it in a different SSH session (see the screen man page for details).
While all of the above solutions solve the immediate problem, what you might really want in order to run such processes in production and control them (start/restart/stop) is supervisor. It is Python-based and is especially useful when you have to run multiple instances of Tornado behind nginx.
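For reference, a minimal sketch of what a supervisor program section might look like; the script path and log locations are assumptions, and on a typical install the file would go under /etc/supervisor/conf.d/:
[program:tornado]
command=python /home/ubuntu/startTornado.py
directory=/home/ubuntu
autostart=true
autorestart=true
stdout_logfile=/var/log/tornado.out.log
stderr_logfile=/var/log/tornado.err.log
After adding the file, supervisorctl reread followed by supervisorctl update picks it up, and supervisorctl status / supervisorctl restart tornado manage the process from then on.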
In addition to nohup, as Kevin has mentioned, you can also use the disown command if you are using bash:
disown <job-id>
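A short sketch of typical usage, assuming the job was started from the same interactive bash shell (long_running.py is just an illustrative name):
python long_running.py > out.log 2>&1 &   # start the job in the background
jobs                                      # note its job number, e.g. [1]
disown -h %1                              # -h tells bash not to forward SIGHUP to this job on logout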