What happens to a process in an EC2 instance when I get a 'Broken Pipe' error on ssh? - ssh

I am using some EC2 instances to run some large jobs I can not run locally. The issue I am seeing is that after a while (X hours since the process started) my connection on my shell gives me a broken pipe error
ubuntu#ip-10-122-xxx-xxx:~/stratto/ode$ Write failed: Broken pipe
The instance is still there because I can reconnect with no problems, but how can I reconnect and get back at seeing the logs of the process as before the 'Broken Pipe'
Any tip much appreciated,
Thanks!

Redirect your output to a file and then run the program "nohup ..." to ensure the disconnect doesn't kill it. Use "tail -f" to monitor the redirected file.
Note: Originally said to use "tee" but that won't work. I think a straight redirect and then tail on the file works.

You can use screen to run processes in the cloud even when you are not connected to the server.
sudo apt install screen
To specifically address the issue described in the original post (e.g. connecting to AWS EC2 instances) I a basic example and a more advanced example of using screen.

You can use "screen". Detach from it and ping to google.com. So there ssh session will be active through out the installation.

Related

Cannot ssh into Google-Engine, connecting in a loop

I am unable to connect through SSH to my GCE instance. I was connecting without any problem, the only think I changes was my user name through top right corner of the browser then selected Change Linux Username.
When I try to ssh into my google engine via browser, I keep having following message in a endless loop:
When I try to ssh via cloud shell I also get following error message, (serial console output):
Permission denied (publickey).
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
[Q] Is there any way to fix this problem? Since I have no access to the engine now, I don't know what to do.
However you could always get back access through serial console then from there you could internally y troubleshoot user/ssh issue.
1) $ gcloud compute instances add-metadata [INSTANCE_NAME] --metadata=serial-port-enable=1
You can then connect to the instance through the serial port
NOTE:The root password have must been already set in order to use the serial port
2)
$ gcloud compute connect-to-serial-port [INSTANCE_NAME]
If you never set the root password you could set it by adding a startup-script to your instance that will set a password as root by running the below command :
NOTE: the instance must be rebooted in order to run the startup script.
3) $ gcloud compute instances add-metadata [instance name] --metadata startup-script='echo "root:YourPasswdHere" | chpasswd'
Reboot the instance run the command on the step "2)" authenticate your self as root with the password that you set on the startup script in the step "3)" .
I had the same problem, It took me several days to figure out what was happening in my case.
To find out, I created a new instance from scratch and started making all modifications I've done to those that eventually couldn't connect to, one by one, exiting the ssh connection and re entering so as to test it.
I've tried it a couple of times, in both cases, the connection was impossible after uninstalling python (I only needed 3.7 version so I was uninstalling all others and installing that one I needed).
My command for uninstalling it was
sudo apt purge python2.7-minimal
and
sudo apt purge python3.5-minimal
I don't know if it was specifically because of deleting python, or because of using purge (in which case this problem might reproduce if using purge with another program).
I don't even know why would this affect ssh connection.
Could it be that google cloud is somehow using destination python for the ssh web?
In any case, if you are experiencing this problem try to avoid uninstalling anything from the base VM.

Desired port for google cloudSQL connection is not able to be used

I am following the steps here, to setup a CloudSQL DB in Google Cloud Platform. I'm stuck at the step with:
./cloud_sql_proxy -instances="[YOUR_INSTANCE_CONNECTION_NAME]"=tcp:3306
I get the message below:
2018/02/07 19:44:10 listen tcp 127.0.0.1:3306: bind: address already in use
I've tried: lsof -i tcp:3306 but nothing shows up. Alternatively, I am able to start a connection to tcp:3307, but that's not what's required in the tutorial, and may prevent the rest of the tutorial from working. When I do lsof -i tcp:3307 however, I am able to see the PID, and kill the SQL connection.
How is the port address 3306 already in use?? Even rebooted my computer.
My Steps I took to fix
I stop Mysql on my local machine
brew services stop mysql
But I had a problem of giving a directory for
Directory to use for placing Unix sockets representing database instances as seen by the console error
Then I did
sudo mkdir /cloudsql; sudo chmod 777 /cloudsql
My Final Script
./cloud_sql_proxy -instances=MyInstanceConnName=tcp:3306 -projects=myproject -dir=/cloudsql/
UPDATE: After trying to dig through many methods to kill the sql process, find out whats actually running on it, joining the gcloud slack group to ask around, etc etc, I ended up uninstalling mysql, and reinstalling it. Fixed it. :shrug:

AWS process launched from SSH terminate when SSH hangs up

I use SSH to connect to my AWS EC2 instances and run code that takes a long time to complete. I find that if my local computer sleeps (or even if I leave it unattended for a bit) the SSH connection hangs up (which is not fatal in itself) but this seems to terminate the code on the EC2 instance that I launched using SSH.
Also, I use SSH to locally monitor the exception of my remote code, so even if there's a way to tell the remote process to stay alive after SSH has gone, I still need a way to locally see the output of the process as it continues to run (without SSH).
How do I keep code running on my AWS EC2 instance after SSH has hung up; how can I monitor the output of such a process?
When you close your tty (ssh close in your case) your process gets a SIGHUP and the default action on SIGHUP is to terminate. To avoid that you can use the command nohup to trap and not send the SIGHUP to your command, or trap the SIGHUP in your code and ignore it.
There are a bunch of ways to track a background process, but perhaps the easiest is to have it write to a file and in that other ssh you can read that file. If your process is really a command on the command line you can redirect its standard output and standard error to a file. When such a file keeps getting new content, it may be annoying to keep reading it to refresh, in which case the command "tail -f" handy.
Here is how you can config your ssh connection to stay alive :
vi ~/.ssh/config # on your client side
add this line to engage sending a "null packet" every 120 seconds :
ServerAliveInterval 120
If you own the server side do a similar change :
vi /etc/ssh/sshd_config
add these lines at bottom of config file
ClientAliveInterval 120
ClientAliveCountMax 720
this is for linux YMMV on other OS settings
Use screen
local> ssh ...
remote> screen
remote+screen> python long_running.py ...
You can then detach from screen and even disconnect from SSH, and when you return by SSHing back in again, you can
remote> screen -r
to reconnect to your running code.

How can I start go server permanently?

I've got problem with my Go server.
When I'm connected to my NAS via SSH and do ./gogs web, the server is starting. But when I close the SSH connection, the server is stopped.
How I can start my Go server permanently?
You have scripts in gogs allowing you to launch the server as a daemon:
scripts/init/debian/gogs (recently fixed with issue 519)
scripts/init/centos/gogs
That would allow the process to remain while the session is closed.
You have other options in issues 172.
This is not a Go-specifioc problem, what is happening is that the Go program is still attached to your terminal and when you log out, the kernel will trigger a SIGHUP to every binary still connected to that terminal session.
Your best option is probably to use nohup ./gogs web.
Second-best option would be to rewrite main, so that it intercepts and handles SIGHUP, stopping it from killing your program. However, doing so requires handling quite a few things properly (you really should close stdin, stdout and stderr; make sure all your logging is done through the log library, ...)

Tornado stopped running on AWS immediately after I terminate my remote session

I'm using SSH to remotely launch Tornado on Amazon Web Service. It works fine when I launch it by:
python startTornado.py
However, after my SSH session times out or terminated, the Tornado server is also stopped immediately, so I can't access the webpage anymore. I did quite some search but couldn't find an answer on Google.
How can I keep Tornado and the site running after my SSH session terminated?
The process will shut down when you logout if it's running in the foreground or if it tries to write to stdout and the terminal it's outputting to no longer exists. Try starting the server with
nohup python startTornado.py &
The nohup command redirects output to a file, and the & at the end runs the command in the background. Alternatively, you can use the screen utility which allows you to detach a terminal and reattach it in a different ssh session (see the screen man page for details).
While all the above solutions solve the immediate problem, what you might really need to run such processes in production, control them (start/restart/stop) is supervisor. It is python based and its more useful when you have to run multiple instances of tornado behind nginx.
In addition to nohup as Kevin has mentioned, you can also use disown command if you are using bash:
disown <job-id>