autossh quits because ssh (dropbear) can't resolve host - ssh

I run autossh on a system which might have internet connectivity or might not. I don't really know when it has a connection, but when it does I want autossh to establish an ssh tunnel with:
autossh -M 2000 -i /etc/dropbear/id_rsa -R 5022:localhost:22 user@host.name -p 6022 -N
After several seconds it throws:
/usr/bin/ssh: Exited: Error resolving 'host.name' port '6022'. Name or service not known
And that's it. Isn't autossh meant to keep the ssh process running no matter what? Do I really have to check for a connection with ping or the like first?

You need to set the AUTOSSH_GATETIME environment variable to 0. From autossh(1):
Startup behaviour
If the ssh session fails with an exit status of 1 on the very first try, autossh
1. will assume that there is some problem with syntax or the connection setup,
and will exit rather than retrying;
2. There is a "starting gate" time. If the first ssh process fails within the
first few seconds of being started, autossh assumes that it never made it
"out of the starting gate", and exits. This is to handle initial failed
authentication, connection, etc. This time is 30 seconds by default, and can
be adjusted (see the AUTOSSH_GATETIME environment variable below). If
AUTOSSH_GATETIME is set to 0, then both behaviours are disabled: there is no
"starting gate", and autossh will restart even if ssh fails on the first run
with an exit status of 1. The "starting gate" time is also set to 0 when the
-f flag to autossh is used.
AUTOSSH_GATETIME
Specifies how long ssh must be up before we consider it a successful
connection. The default is 30 seconds. Note that if AUTOSSH_GATETIME is set to 0,
then not only is the gatetime behaviour turned off, but autossh also ignores
the first run failure of ssh. This may be useful when running autossh at
boot.
Usage:
AUTOSSH_GATETIME=0 autossh -M 2000 -i /etc/dropbear/id_rsa -R 5022:localhost:22 user@host.name -p 6022 -N
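The man page's remark about boot matches this situation. A hedged sketch of a boot-time wrapper (the function name and the init-script hook are my own assumptions, not part of the answer):

```shell
#!/bin/sh
# Hypothetical boot-time wrapper: with AUTOSSH_GATETIME=0 exported, a
# failed first connection (e.g. DNS not yet reachable) is retried
# instead of being treated as fatal.
start_tunnel() {
    AUTOSSH_GATETIME=0 exec autossh -M 2000 -N -i /etc/dropbear/id_rsa \
        -R 5022:localhost:22 -p 6022 user@host.name
}
# An init script or @reboot cron entry would then call: start_tunnel
```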

Related

Is there a stability advantage using autossh instead of a while true loop calling ssh with ServerAliveInterval and ServerAliveCountMax set?

I want to establish a stable ssh tunnel between two machines. I have been using autossh for this in the past. However, the present setup does not allow local port forwarding (it is disabled in sshd_config on both sides for security reasons). As a consequence, autossh seems to get confused: it cannot set up its double (local and remote) port-forwarding loop to "ping itself", so it appears to reset the ssh tunnel periodically. I am therefore considering a "pure ssh" solution instead, something like:
while true; do
echo "start tunnel..."
ssh -v -o ServerAliveInterval=120 -o ServerAliveCountMax=2 -R remote_port:localhost:local_port user@remote
echo "ssh returned, there was a problem. sleep a bit and retry..."
sleep 15
echo "... ready to retry"
done
My question is: are there any guarantees or stability features that I "used to have" with autossh but will not have with the new solution? Anything I should be aware of? This solution should detect that the server is alive and communicating thanks to the two -o options, and restart the tunnel if needed, right?
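One gap worth noting (my assumption, not stated in the question): without -o ExitOnForwardFailure=yes, ssh can connect successfully yet fail to bind the remote forward, and the loop will never restart it. A sketch with the retry logic factored into a function, and a bounded try count only so the sketch terminates:

```shell
#!/bin/sh
# retry_loop N CMD...: run CMD, and re-run it whenever it exits,
# up to N times, pausing briefly between attempts.
retry_loop() {
    tries=$1; shift
    i=0
    while [ "$i" -lt "$tries" ]; do
        "$@"
        echo "command exited with status $?; retrying" >&2
        sleep 1
        i=$((i + 1))
    done
}
# Intended use (placeholders as in the question):
# retry_loop 9999 ssh -N -o ExitOnForwardFailure=yes \
#     -o ServerAliveInterval=120 -o ServerAliveCountMax=2 \
#     -R remote_port:localhost:local_port user@remote
```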

calling spark-ec2 from within an EC2 instance: ssh connection to host refused

In order to run Amplab's training exercises, I've created a keypair on us-east-1, installed the training scripts (git clone git://github.com/amplab/training-scripts.git -b ampcamp4) and created the env. variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY following the instructions in http://ampcamp.berkeley.edu/big-data-mini-course/launching-a-bdas-cluster-on-ec2.html
Now running
./spark-ec2 -i ~/.ssh/myspark.pem -r us-east-1 -k myspark --copy launch try1
generates the following messages:
johndoe@ip-some-instance:~/projects/spark/training-scripts$ ./spark-ec2 -i ~/.ssh/myspark.pem -r us-east-1 -k myspark --copy launch try1
Setting up security groups...
Searching for existing cluster try1...
Latest Spark AMI: ami-19474270
Launching instances...
Launched 5 slaves in us-east-1b, regid = r-0c5e5ee3
Launched master in us-east-1b, regid = r-316060de
Waiting for instances to start up...
Waiting 120 more seconds...
Copying SSH key /home/johndoe/.ssh/myspark.pem to master...
ssh: connect to host ec2-54-90-57-174.compute-1.amazonaws.com port 22: Connection refused
Error connecting to host Command 'ssh -t -o StrictHostKeyChecking=no -i /home/johndoe/.ssh/myspark.pem root@ec2-54-90-57-174.compute-1.amazonaws.com 'mkdir -p ~/.ssh'' returned non-zero exit status 255, sleeping 30
ssh: connect to host ec2-54-90-57-174.compute-1.amazonaws.com port 22: Connection refused
Error connecting to host Command 'ssh -t -o StrictHostKeyChecking=no -i /home/johndoe/.ssh/myspark.pem root@ec2-54-90-57-174.compute-1.amazonaws.com 'mkdir -p ~/.ssh'' returned non-zero exit status 255, sleeping 30
...
...
subprocess.CalledProcessError: Command 'ssh -t -o StrictHostKeyChecking=no -i /home/johndoe/.ssh/myspark.pem root@ec2-54-90-57-174.compute-1.amazonaws.com '/root/spark/bin/stop-all.sh'' returned non-zero exit status 127
where root@ec2-54-90-57-174.compute-1.amazonaws.com is the user & master instance. I've tried -u ec2-user and increasing -w all the way up to 600, but get the same error.
I can see the master and slave instances in us-east-1 when I log into the AWS console, and I can actually ssh into the Master instance from the 'local' ip-some-instance shell.
My understanding is that the spark-ec2 script takes care of defining the Master/Slave security groups (which ports are listened on and so on), and I shouldn't have to tweak these settings. That said, the master and slaves all listen on port 22 (Port:22, Protocol:tcp, Source:0.0.0.0/0 in the ampcamp3-slaves/masters security groups).
I'm at a loss here, and would appreciate any pointers before I spend all my R&D funds on EC2 instances.... Thanks.
This is most likely caused by SSH taking a long time to start up on the instances, causing the 120 second timeout to expire before the machines could be logged into. You should be able to run
./spark-ec2 -i ~/.ssh/myspark.pem -r us-east-1 -k myspark --copy launch --resume try1
(with the --resume flag) to continue from where things left off without re-launching new instances. This issue will be fixed in Spark 1.2.0, where we have a new mechanism that intelligently checks the SSH status rather than relying on a fixed timeout. We're also addressing the root causes behind the long SSH startup delay by building new AMIs.
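Until that release, the "intelligently checks the SSH status" behaviour can be approximated by hand. A hedged sketch (the function name, retry counts and interval are my own, not from spark-ec2) that polls until sshd accepts a login instead of waiting a fixed 120 seconds:

```shell
#!/bin/sh
# wait_for_ssh HOST [TRIES] [INTERVAL]: return 0 once "ssh root@HOST true"
# succeeds, or 1 after TRIES failed attempts INTERVAL seconds apart.
wait_for_ssh() {
    host=$1; tries=${2:-30}; interval=${3:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
               -i ~/.ssh/myspark.pem "root@$host" true 2>/dev/null; then
            return 0
        fi
        sleep "$interval"
        i=$((i + 1))
    done
    return 1
}
# e.g. wait_for_ssh ec2-54-90-57-174.compute-1.amazonaws.com && ./spark-ec2 ... --resume launch try1
```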

"Connection to localhost closed by remote host." when rsyncing over ssh

I'm trying to set up an automatic rsync backup (using cron) over an ssh tunnel but am getting an error "Connection to localhost closed by remote host.". I'm running Ubuntu 12.04. I've searched for help and tried many solutions such as adding ALL:ALL to /etc/hosts.allow, check for #MaxStartups 10:30:60 in sshd_config, setting UsePrivilegeSeparation no in sshd_config, creating /var/empty/sshd but none have fixed the problem.
I have autossh running to make sure the tunnel is always there:
autossh -M 25 -t -L 2222:destination.address.edu:22 pbeyersdorf@intermediate.address.edu -N -f
This seems to be running fine, and I've been able to use the tunnel for various rsync tasks, and in fact the first time I ran the following rsync task via cron it succeeded:
rsync -av --delete-after /tank/Documents/ peteman@10.0.1.5://Volumes/TowerBackup/tank/Documents/
with the status of each file and the output
sent 7331634 bytes received 88210 bytes 40215.96 bytes/sec
total size is 131944157313 speedup is 17782.61
Ever since that first success, every attempt gives me the following output
building file list ... Connection to localhost closed by remote host.
rsync: connection unexpectedly closed (8 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9]
An rsync operation of a smaller subdirectory works as expected. I'd appreciate any ideas on what could be the problem.
It seems the issue is related to autossh. If I create my tunnel via ssh instead of autossh, it works fine. I suspect I could tweak the environment variables that affect autossh's configuration, but for my purposes I've solved the problem by wrapping the rsync command in a script that first opens a tunnel via ssh, executes the backup, and then kills the ssh tunnel, eliminating the need for the always-open tunnel created by autossh:
#!/bin/sh
#Start SSH tunnel
ssh -t -L 2222:destination.address.edu:22 pbeyersdorf@intermediate.address.edu -N -f
#execute backup commands
rsync -a /tank/Documents/ peteman@localhost://Volumes/TowerBackup/tank/Documents/ -e "ssh -p 2222"
#Kill SSH tunnel
pkill -f "ssh.*destination.address"

Rsync over SSH - timeout in ssh or rsync?

I'm dealing with a crappy ISP that resets my WAN connection at random points while my script is running. I want the transfer to survive this reset and go on. I manually launch this script vs using cron / launchd currently.
I have a fairly basic script as shown below:
rsync -rltv --progress --partial -e "ssh -i <key> -o ConnectTimeout=300" <remotedir> <localdir>
Am I better off putting the timeout in the rsync section instead?
For example:
rsync -rltv --progress --partial --timeout=300 -e "ssh -i <key>" <remotedir> <localdir>
Thanks!
ConnectTimeout only applies while SSH is establishing the connection to the server; it has nothing to do with timeouts during the data transfer. So you need rsync's --timeout option to do what you want.
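Since --timeout makes rsync abort a stalled transfer rather than resume it, an outer retry loop is still needed; a sketch (the helper function is my own, not from the answer) pairing --timeout with --partial so each retry continues where the last one stopped:

```shell
#!/bin/sh
# retry_until DELAY CMD...: re-run CMD until it exits 0,
# sleeping DELAY seconds between attempts.
retry_until() {
    delay=$1; shift
    until "$@"; do
        echo "exit status $?; retrying in ${delay}s" >&2
        sleep "$delay"
    done
}
# Intended use (placeholders as in the question):
# retry_until 30 rsync -rltv --progress --partial --timeout=300 \
#     -e "ssh -i <key>" <remotedir> <localdir>
```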
Try re-running the rsync. Also try it without the ssh option. The job probably failed because you lost your network connection. I have an rsync job copying files between datacenters every 2 hours via cron, and it fails about once per day.

How to use ssh to run a local command after connection and quit after this local command is executed?

I wish to use SSH to establish a temporary port forward, run a local command and then quit, closing the ssh connection.
The command has to be run locally, not on the remote site.
For example, consider a server in a DMZ: you need to allow an application on your machine to connect to its port 8080, but you have only SSH access.
How can this be done?
Assuming you're using OpenSSH from the command line....
SSH can open a connection that will sustain the tunnel and remain active for as long as possible:
ssh -fNT -Llocalport:remotehost:remoteport targetserver
You can alternately have SSH launch something on the server that runs for some period of time. The tunnel will be open for that time. The SSH connection should remain after the remote command exits for as long as the tunnel is still in use. If you'll only use the tunnel once, then specify a short "sleep" to let the tunnel expire after use.
ssh -f -Llocalport:remotehost:remoteport targetserver sleep 10
If you want to be able to kill the tunnel from a script running on the local side, then I recommend you background it in your shell, then record the pid to kill later. Assuming you're using an operating system that includes Bourne shell....
#!/bin/sh
ssh -f -Llocalport:remotehost:remoteport targetserver sleep 300 &
sshpid=$!
# Do your stuff within 300 seconds
kill $sshpid
If backgrounding your ssh using the shell is not to your liking, you can also use advanced ssh features to control a backgrounded process. As described here, the SSH features ControlMaster and ControlPath are how you make this work. For example, add the following to your ~/.ssh/config:
host targetserver
ControlMaster auto
ControlPath ~/.ssh/cm_sockets/%r@%h:%p
Now, your first connection to targetserver will set up a control, so that you can do things like this:
$ ssh -fNT -Llocalport:remoteserver:remoteport targetserver
$ ssh -O check targetserver
Master running (pid=23450)
$ <do your stuff>
$ ssh -O exit targetserver
Exit request sent.
$ ssh -O check targetserver
Control socket connect(/home/sorin/.ssh/cm_socket/sorin@192.0.2.3:22): No such file or directory
Obviously, these commands can be wrapped into your shell script as well.
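For instance, a hedged sketch of such a wrapper (the function names are mine; it assumes the host block above is already in ~/.ssh/config):

```shell
#!/bin/sh
# tunnel_up/tunnel_down: wrap the ControlMaster workflow from the answer.
tunnel_up() {
    ssh -fNT -Llocalport:remotehost:remoteport targetserver &&
        ssh -O check targetserver        # verify the master is running
}
tunnel_down() {
    ssh -O exit targetserver             # tear down master and tunnel
}
# Typical use:
#   tunnel_up || exit 1
#   ... talk to localhost:localport ...
#   tunnel_down
```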
You could use a script similar to this (untested):
#!/bin/bash
coproc ssh -L 8080:localhost:8080 user@server
./run-local-command
echo exit >&${COPROC[1]}
wait