ClientAliveInterval is not closing the idle connection - ssh

I have been given the task of closing ssh connections that have been idle for more than 5 minutes. I have tried setting these values in sshd_config:
TCPKeepAlive no
ClientAliveInterval 300
ClientAliveCountMax 0
But nothing seems to work: the connection remains active and does not get dropped even after 5 minutes of idle time.
Then I came across this thread https://bbs.archlinux.org/viewtopic.php?id=254707 where someone says:
These are not for user-idle circumstances, they are - as that man page
excerpt notes - for unresponsive SSH clients. The client will be
unresponsive if the client program has frozen or the connection has
been broken. The client should not be unresponsive simply because the
human user has stepped away from the keyboard: the ssh client will
still receive packets sent from the server.
I can't even use TMOUT, because some ssh client scripts do not run bash.
How can I achieve this?
OpenSSH version:
OpenSSH_8.2p1 Ubuntu-4ubuntu0.4, OpenSSL 1.1.1f 31 Mar 2020

close the idle ssh connection if they are idle for more than 5 minutes
This task is surprisingly difficult. OpenSSH itself has no functionality to set an idle timeout on shell sessions, probably for a good reason: killing "idle" shells is itself non-trivial:
There are multiple ways to define "idleness": no stdin, no stdout, no I/O activity whatsoever, no CPU consumption, etc.
Even when a process is deemed "idle", it's difficult to kill the process together with all the child processes it may have created.
Given that, it's not surprising that there are only a few solutions for killing idle shell sessions in general. Those that I could find with (limited) research rely on background daemons that check the idle status of all processes running on a system (e.g., doinkd/idled, idleout).
One possible approach is to check whether any of those tools can be adapted to enforce an idle timeout on a specific shell session; a minimal sketch of the underlying idea follows.
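As an illustration of what such a daemon does, here is a hypothetical watchdog sketch. It assumes a Linux system where the access time of a pty device reflects the last keystroke (which is how tools like w compute idle time); treat it as a starting point, not a hardened solution:

    #!/bin/bash
    # Hypothetical watchdog: hang up sessions whose terminal has been
    # idle for more than 5 minutes. Run it periodically, e.g. from cron.
    now=$(date +%s)
    for tty in /dev/pts/[0-9]*; do
        atime=$(stat -c %X "$tty")          # time of last read (keystroke)
        if (( now - atime > 300 )); then
            pkill -HUP -t "${tty#/dev/}"    # signal every process on that tty
        fi
    done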
Another option is to adapt the OpenSSH source code to support your specific requirement. In principle, OpenSSH can easily observe console I/O activity and session duration, so assessing the "idle" property is probably relatively easy. As for killing the shell and all of the children involved, running (and killing) the remote shell in a PID namespace is an effective option on Linux systems; see the sketch below.
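To make the PID-namespace idea concrete: with the util-linux unshare tool (assuming a reasonably recent version and root privileges), the shell can be started as PID 1 of its own namespace, so that killing the wrapper reliably takes down every descendant:

    # Start a shell in its own PID namespace; --kill-child ensures the
    # whole namespace dies together with the unshare process.
    sudo unshare --fork --pid --kill-child --mount-proc bash

    # From elsewhere, killing that unshare process now takes down the
    # shell and every process it spawned (hypothetical pid variable):
    kill "$unshare_pid"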
Both options are relatively complex, so before pursuing them further, I'd check once more whether an existing solution for enforcing an idle timeout on a shell session already exists. Using one under OpenSSH would then be straightforward.

Related

Ansible playbook stops after losing connection (even for a few seconds) with the ssh window of the VM on which it is running?

My ansible playbook consists of several tasks, and I run it on a virtual machine. I use ssh to log in to the VM and run the playbook. If my ssh window gets closed during the execution of any task (when the internet connection is unstable or unreliable), the execution of the ansible playbook stops because the ssh window has already been closed.
My playbook takes around 1 hour to run, and sometimes even if I lose internet connectivity for a few seconds, the ssh terminal loses its connection and the entire playbook stops. Any idea how to make the ansible run more resilient to this problem?
Thanks in advance!
If you need to run a long job on a remote system and it matters that the task completes, it is an extremely bad idea to run that job in the foreground.
It doesn't matter that the task is Ansible or that the connection is SSH. In every such case you would just "push" the command to the remote host and send it to the background with something like nohup, if available. The problem, of course, is the tree of processes: your connection creates a process on the remote system, and that process creates the job you want to run. If the connection gets lost, all subprocesses will be killed automatically by the OS; a sketch of how to avoid this follows.
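As a minimal sketch (the playbook, inventory, and log paths are just examples, and ansible-playbook is assumed to be installed on the remote host), detaching the job from the session looks like this:

    # On the remote host: run the playbook detached from the ssh session.
    # nohup ignores the hangup signal; the trailing & backgrounds the job.
    nohup ansible-playbook -i inventory.ini site.yml > /tmp/playbook.log 2>&1 &

    # Later, from any session, follow the progress:
    tail -f /tmp/playbook.log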
So, under Windows, maybe use RDP to open a session that stays available even after the connection is lost, or use something like Cygwin and nohup via SSH to detach the process from the ssh session so a hangup cannot kill it.
Or, when you need to run a playbook on that system regularly, install for example an AWX container and use that. There are many options, depending on your requirements, resources and administrative constraints.

Run application on server without ssh session

I have an application written in Python that runs on a VPS. It is a small application that reads from and writes to a SQLite database and serves read requests over a TCP socket.
The downside is that the application only runs while the console is open (over the ssh protocol); closing the console, that is, the ssh session, closes the application.
How should this be implemented? The server runs Ubuntu.
nohup should help in your case:
in your ssh session, launch your python app prefixed with nohup, as recommended here
exit your ssh session
The program should continue working even if its parent shell (the ssh session) is terminated.
There are (at least) two solutions:
1- The 'nohup' command; use it as follows: nohup python3 yourappname.py &
This will run your program in the background, so it won't be killed when you terminate the ssh session. It also gives you your prompt back after running the command, so you can continue your work.
2- Another GREAT option is the 'screen' command.
This gives you everything nohup gives you, and it also lets you check the output of your program (if any) in later logins. It may look a little complicated at first sight, but it's SUPER COOL! I highly recommend learning it; you'll enjoy it for the rest of your life!
A good explanation of it is available here
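For a quick taste, a typical screen workflow might look like this (the session and script names are just examples):

    screen -S myapp              # start a named screen session
    python3 yourappname.py       # run the app inside it
    # detach with Ctrl-A then d, and log out; the app keeps running

    # on a later login, reattach and inspect the output:
    screen -r myapp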

GCP VM consistently shutting down without warning

Been using a GCP preemptible VM for a few months without problems, but in the last 4 weeks my instances have consistently shut off anywhere from 10 minutes to 20 minutes into operation.
I'll be in the middle of training, and my notebook will suddenly disconnect. The terminal will show this error:
jupyter@fastai-instance:~$ Connection to 104.154.142.171 closed by remote host.
Connection to 104.154.142.171 closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
I then check the status of my VM, only to see that it has shut down.
I searched the terminal traceback and found this thread, which seemed promising: ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]
When I ran sudo gcloud compute config-ssh, my VM ran for much longer than usual before shutting down, yet shut down in the same way after about an hour. Since then, it's back to the same behavior.
I know preemptible instances can be shut down when the platform needs resources, but my understanding is that this comes with some kind of warning. I've checked the status of GCP's servers after the shutdowns and they appear to be fine. This also happens the same way every time I turn my VM on, which seems too frequent for preempting.
I am not sure where to look for any clues – has anyone else had a problem like this? What's especially puzzling to me is, if it is in fact an SSH problem, why would that cause the VM itself to shutdown, rather than just break the connection?
Thanks very much for any help!
Did you try setting a shutdown script that writes something to a file, to record the state of the VM when it goes down?
Try this as a shutdown script:
#!/bin/bash
curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted" -H "Metadata-Flavor: Google" > /tmp/preempted.log
If there is TRUE in the file, it's because the VM has been preempted.
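If it helps, the script can be attached to the instance as its shutdown script with gcloud; a sketch, where the instance name, zone, and file name are placeholders:

    # Register shutdown.sh as the instance's shutdown script.
    gcloud compute instances add-metadata fastai-instance \
        --zone us-central1-a \
        --metadata-from-file shutdown-script=shutdown.sh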
If a VM stops while you have an active SSH connection to it (via gcloud compute ssh), then it's normal that you receive an error. When the VM goes down, all connections are closed, including your SSH connection (you cannot connect to a stopped instance). The VM termination causes the SSH error, not the other way around.
When using preemptible instances, Google can reclaim the instance whenever it's needed. Note that (from the docs about preemptible instance limitations):
Compute Engine might terminate preemptible instances at any time due to system events. The probability that Compute Engine will terminate a preemptible instance for a system event is generally low, but might vary from day to day and from zone to zone depending on current conditions.
It means that one day your instance may run for 24 hours without being terminated, but another day it may be stopped 30 minutes after being started, if Compute Engine needs to reclaim some resources.
A comment on the "continuously shutting down" part:
(I have experienced this as well)
Keep in mind that Google prefers to shut down RECENTLY STARTED preemptible instances over ones started earlier.
The link below (and supplied earlier) has the statement:
Generally, Compute Engine avoids preempting too many instances from a single customer and preempts new instances over older instances whenever possible.
This would generally mean that, yes, I suppose, if you are preempted and boot up again, it is quite likely that you will be preempted again and again until the load in the zone decreases.
I'm surprised that Google doesn't simply prevent you from starting the preemptible VM for a while (30-60 minutes?). How much CPU is being wasted bouncing VMs up and down while we cross our fingers?
P.S. There is a dirty trick to work around the frustration: have 2 VMs identically configured, except for preemptibility, but with only 1 underlying boot disk. If you are having a bad day with preempts, simply 'move' the boot disk to the non-preemptible VM, boot it, and carry on. It's a couple of simple gcloud commands, easily scripted and very fast. Don't tell Google I told ya....
https://cloud.google.com/compute/docs/instances/preemptible#limitations
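For the boot-disk trick above, the commands might look roughly like this (the VM and disk names are hypothetical, and both instances are assumed to live in the same zone):

    # Stop the preemptible VM and free its boot disk.
    gcloud compute instances stop preempt-vm --zone us-central1-a
    gcloud compute instances detach-disk preempt-vm --disk shared-boot --zone us-central1-a

    # Attach the same disk as the boot disk of the non-preemptible twin and start it.
    gcloud compute instances attach-disk standard-vm --disk shared-boot --boot --zone us-central1-a
    gcloud compute instances start standard-vm --zone us-central1-a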

How can I limit the rate of new outgoing ssh connections when using GNU parallel?

Background: The default setting for MaxStartups in OpenSSH is 10:30:60, and most Linux distributions keep this default. That means there can be only 10 ssh connections at a time that are exchanging keys and authenticating before sshd starts dropping 30% of new incoming connections, and at 60 unauthenticated connections, all new connections will be dropped. Once a connection is set up, it doesn't count against this limit. See e.g. this question.
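For reference, the limit is controlled by the MaxStartups directive in sshd_config; raising it on the frontend (if you administer it) is one server-side mitigation, sketched here with illustrative numbers:

    # /etc/ssh/sshd_config on the frontend: start dropping 30% of new
    # unauthenticated connections at 30, and drop all of them at 100.
    MaxStartups 30:30:100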
Problem: I'm using GNU parallel to run some heavy data processing on a large number of backend nodes. I need to access those nodes through a single frontend machine, and I'm using ssh's ProxyCommand to set up a tunnel that transparently reaches the backends. However, I'm constantly hitting the maximum unauthenticated connection limit, because parallel spawns more ssh connections than the frontend can authenticate at once.
I've tried to use ControlMaster auto to reuse a single connection to the frontend, but no luck.
Question: How can I limit the rate at which new ssh connections are opened? Could I control how many unauthenticated connections there are open at a given time, and delay new connections until another connection has become authenticated?
I think we need a 'spawn at most this many jobs per second per host' option for GNU Parallel. It would probably make sense for the default to work for hosts with MaxStartups = 10:30:60, fast CPUs, but with 500 ms latency.
Can we discuss it on parallel@gnu.org?
Edit:
--sshdelay was implemented in version 20130122.
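With that option available, limiting the connection rate is a one-flag change; a sketch, where the login file and command are placeholders:

    # Wait at least 0.2 s between starting ssh connections, so the
    # frontend never sees a burst of unauthenticated connections.
    parallel --sshdelay 0.2 --sshloginfile hosts.txt 'process_file {}' ::: data/*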
Using ControlMaster auto still sounds like the way to go. It shouldn't hit MaxStartups, since it keeps a single connection open (and opens sessions on that connection). In what way didn't it work for you?
Other relevant settings that might prevent ControlMaster from working, given your ProxyCommand setup, are ControlPath:
ControlPath %r@%h:%p - name the socket {user}@{host}:{port}
and ControlPersist:
ControlPersist yes - persist the initial connection (even if closed) until told to quit (-O exit)
ControlPersist 1h - persist for 1 hour
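Putting those together, a hypothetical ~/.ssh/config stanza for the frontend might look like this (the host alias and socket path are examples):

    Host frontend
        ControlMaster auto
        # %r = remote user, %h = host, %p = port; keep sockets in a private dir
        ControlPath ~/.ssh/cm-%r@%h:%p
        ControlPersist 1h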

using "vim" can lead ssh timeout but "top" not

When I use ssh to log in to a remote server and open vim, the session times out if I don't type anything, and I have to log in again.
But if I run a command like top, the session never times out.
What's the reason?
Note that the behavior you're seeing isn't related to vim or to top. Chances are good that some router along the way is culling "dead" TCP sessions. This is often done by a NAT firewall or a stateful firewall to reduce memory pressure and protect against simple denial of service attacks.
The ServerAliveInterval configuration option can probably keep your idle-looking sessions from being reaped:
ServerAliveInterval
Sets a timeout interval in seconds after which if no
data has been received from the server, ssh(1) will
send a message through the encrypted channel to request
a response from the server. The default is 0,
indicating that these messages will not be sent to the
server, or 300 if the BatchMode option is set. This
option applies to protocol version 2 only.
ProtocolKeepAlives and SetupTimeOut are Debian-specific
compatibility aliases for this option.
Try adding ServerAliveInterval 180 to your ~/.ssh/config file. This will send a keepalive probe every three minutes, which should be shorter than most firewall idle timeouts.
vim will just sit there waiting for input and (unless you've got a clock or something on the terminal screen) will also produce no output. If this continues for very long, most firewalls will see the connection as dead and kill it, since there's no activity.
top, by comparison, updates the screen every few seconds, which is seen as activity, so the connection is kept open: there IS data flowing over it on a regular basis.
There are options you can add to the SSH server's configuration to send timed "null" packets that keep a connection alive even though no actual user data is crossing the link: http://www.howtogeek.com/howto/linux/keep-your-linux-ssh-session-from-disconnecting/
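On the server side, those knobs are the ClientAlive options discussed in the first question; a minimal sketch:

    # /etc/ssh/sshd_config: probe the client every 60 s and allow 3
    # missed replies before the connection is considered dead.
    ClientAliveInterval 60
    ClientAliveCountMax 3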
Because "top" is always returning data through your SSH console, it will remain active.
"vim" will not because it is static and only transmits data according to your key presses.
The lack of transferred data causes the SSH session to time out