using "vim" can lead ssh timeout but "top" not - ssh

When I use ssh to log in to a remote server and open vim, the session times out if I don't type anything, and I have to log in again.
But if I run a command like top, the session never times out.
What's the reason?

Note that the behavior you're seeing isn't related to vim or to top. Chances are good that some router along the way is culling "dead" TCP sessions. This is often done by a NAT firewall or a stateful firewall to reduce memory pressure and to protect against simple denial-of-service attacks.
The ServerAliveInterval configuration option can probably keep your idle-looking sessions from being reaped:
ServerAliveInterval
Sets a timeout interval in seconds after which if no
data has been received from the server, ssh(1) will
send a message through the encrypted channel to request
a response from the server. The default is 0,
indicating that these messages will not be sent to the
server, or 300 if the BatchMode option is set. This
option applies to protocol version 2 only.
ProtocolKeepAlives and SetupTimeOut are Debian-specific
compatibility aliases for this option.
Try adding ServerAliveInterval 180 to your ~/.ssh/config file. This will send a keepalive probe every three minutes, which should be more frequent than most firewall timeouts.
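For example, a minimal ~/.ssh/config entry might look like the following. (Host * applies it to every connection; ServerAliveCountMax 3 is the default, shown here for clarity - the client gives up after three unanswered probes.)
Host *
    ServerAliveInterval 180
    ServerAliveCountMax 3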

vim will just sit there waiting for input, and (unless you've got a clock or something on the terminal screen) will also produce no output. If this continues for very long, most firewalls will see the connection as dead and kill it, since there's no activity.
top, by comparison, updates the screen every few seconds, which is seen as activity, and the connection is kept open, since there IS data flowing over it on a regular basis.
There are options you can add to the SSH server's configuration to send timed "null" packets that keep a connection alive even though no actual user data is going across the link: http://www.howtogeek.com/howto/linux/keep-your-linux-ssh-session-from-disconnecting/
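The server-side equivalents of the client option above are ClientAliveInterval and ClientAliveCountMax in sshd_config; a minimal sketch, assuming you can edit the server's configuration and restart sshd:
# /etc/ssh/sshd_config - server-initiated keepalives
# probe each idle client every 120 seconds over the encrypted channel;
# disconnect after 3 unanswered probes
ClientAliveInterval 120
ClientAliveCountMax 3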

Because "top" is always returning data through your SSH console, it will remain active.
"vim" will not because it is static and only transmits data according to your key presses.
The lack of transferred data causes the SSH session to time out
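You can watch the difference yourself with tcpdump; a sketch, where the interface name is an assumption to adjust for your setup:
# on the client, watch the encrypted SSH traffic to the server:
# with top running you'll see a steady packet stream; with an idle vim, silence
sudo tcpdump -i eth0 -n port 22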

Related

Can SSH be fault-tolerant, or is there a way to overcome RSTs messing up my TCP connections (some kind of retry pipe at both ends)?

I'm trying to use "scp" to copy TB-sized files, which is fine, until whatever router or other issue throws a tantrum and drops my connections (lost packets or unwanted RSTs or whatever).
# scp user@rmt1:/home/user/*z .
user@rmt1's password:
log_backups_2019_02_09_07h44m14.gz    16% 6552MB   6.3MB/s   1:27:46 ETA
client_loop: send disconnect: Broken pipe
lost connection
It occurs to me that (if ssh doesn't already support this) it should be possible for something at each endpoint, and in between the connection, to simply connect with its peer and, when "stuff goes wrong", transparently just bloody handle it (basically, retry indefinitely and reconnect).
Anyone know the solution?
My "normal" way of tunnelling remote machines into a local connection is using ssh of course, catch-22 - that's the thing that's breaking so I can't do that here...
SSH uses TCP, and TCP is generally designed to be relatively fault-tolerant, with retries for dropped packets, acknowledgements, and other techniques to overcome occasional network problems.
If you're seeing dropped connections nevertheless, then either you're seeing excessive network problems, more than any standard protocol can be expected to handle, or you're seeing a malicious attacker intentionally disrupting the connection, which cannot be avoided. No reasonable network protocol can overcome either of those, so you're going to have to deal with the underlying cause. That's true whether you're using SSH or some other protocol.
You could try using SFTP instead of SCP, because SFTP supports resuming transfers (e.g., put -a), but that's about the best that's possible. You can also try a command like lftp, which has more scripting possibilities for copying and retrying (e.g., mirror --continue --loop) and can also use SFTP under the hood.
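As a sketch of the lftp approach - the host, user, and paths here are placeholders taken from the question, not a tested command line:
# mirror the remote directory over SFTP, resuming partial files (--continue)
# and repeating the mirror until it completes without changes (--loop)
lftp -e 'mirror --continue --loop /home/user/ ./backups/; quit' sftp://user@rmt1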
Your best bet is to find out what the network problem is and get it fixed. mtr may be helpful for finding where your packet loss occurs.
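For example, a report-mode run (the hostname is the one from the question):
# send 100 probe cycles and print a report; look for the first hop
# that shows sustained packet loss
mtr --report --report-cycles 100 rmt1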

Why does this connection sometimes close (RESET) with Flags: 0x014 (RST, ACK)? - TCP

We have an issue in the acceptance environment during a handshake. An external service tries to send us data, and during the handshake we sometimes reset the connection after a timeout of around 2 minutes. In the picture below you can see the communication between the two services; our server's IP ends with .11 and the external service's IP ends with .5.
The strange thing is that it works more than 50% of the time, and when it fails, they retry every hour and we reject each attempt. In between, if we send data to them, their next attempt succeeds (picture below). In that case they use the server whose IP ends with .6.
Does anyone have a clue what the problem could be here? We have tried to find something in our logs, but nothing was logged. Any help with additional logging would be appreciated (we tried https://learn.microsoft.com/en-us/dotnet/framework/network-programming/how-to-configure-network-tracing and https://learn.microsoft.com/en-us/dotnet/framework/wcf/diagnostics/tracing/configuring-tracing?redirectedfrom=MSDN). Our backend is written in C# WCF. One additional fact: when we send data to them, we never have an issue; it always works.
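For reference, the System.Net tracing setup described in those Microsoft docs boils down to something like the following in the service's app.config; the listener name and log file name here are illustrative, not required values:
<system.diagnostics>
  <sources>
    <source name="System.Net" tracemode="includehex" maxdatasize="1024">
      <listeners>
        <add name="SocketTrace"/>
      </listeners>
    </source>
  </sources>
  <switches>
    <!-- Verbose captures socket-level events, including connection resets -->
    <add name="System.Net" value="Verbose"/>
  </switches>
  <sharedListeners>
    <add name="SocketTrace"
         type="System.Diagnostics.TextWriterTraceListener"
         initializeData="network.log"/>
  </sharedListeners>
  <trace autoflush="true"/>
</system.diagnostics>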

JMeter and Connect Times for SSL Connections

For a benchmarking test, I have a very basic setup: a single user looping 100 times (loop delay 100 ms), hitting an https endpoint (GET) with the HttpClient4 implementation, and keep-alive turned on.
In the test results, I observed a pattern wherein every 5th or 6th request the connect metric is higher, as if a full SSL handshake were occurring; see the image below. I'm a bit confused by this. Any ideas what's going on here, and why the connect times are higher every nth request?
[UPDATE]
I was able to troubleshoot this a bit further today after turning on access logs on the load balancer (the target of this test), and I can see a pattern wherein JMeter switches the client-side port every few requests - the frequency matches the pattern observed previously in the JMeter test results.
This probably explains the elevated connect times; now the question is why JMeter switches the port.
This could be keep-alive; it certainly was for my issue. First, make sure it's enabled on the sampler. Then there's also this JMeter setting that controls how long to keep connections alive:
httpclient4.time_to_live
I set it to 120000 in jmeter.properties, though according to the docs the user.properties file should be used; I know jmeter.properties with a setting of 120000 worked for me.
I set the value high to see whether an HTTP keep-alive timeout was causing the port switch. Whatever you set it to, make sure the client you are emulating does the same.
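For example, in user.properties (the same line works in jmeter.properties, which is what I used):
# keep pooled HTTP connections alive for 120 seconds before recycling them
httpclient4.time_to_live=120000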
Since you get some quick results, I would guess it is a short timer somewhere rather than the server side disallowing keep-alive entirely. Wireshark can help you pinpoint this, as it could be the server side resetting the connection after a certain time. The above config extends the client-side lifetime, which may get you the information you need; if not, look at the server-side equivalent, which will vary depending on what serves the endpoint.

RabbitMQ: Server heartbeat must fail 3 times before connection drop?

We have an HA RabbitMQ cluster (v3.2.x) with two nodes that sits behind a load balancer. Our clients are configured to use a 300 s heartbeat. Everything works as expected most of the time.
However, if the client's connection drops (say the client's NIC is disconnected), we have noticed (via tcpdump/Wireshark) that the RabbitMQ node attempts 3 heartbeat messages (in our case, nearly 15 minutes) before it closes the connection. Why? Why not close it after one failure?
Is there some way to change this behavior on the RabbitMQ server? Or do we have to shorten our heartbeat to something much smaller, like 5 s or 10 s, to get the connection to close sooner? Thoughts?
Related issue...
Looking at the tcpdump capture (taken on the load balancer), I wonder why the LB doesn't close the connection when it never receives a TCP ACK from the dead client in response to the proxied RabbitMQ heartbeat. In fact, the LB attempts to send the request several times (never receiving a response, of course). Wouldn't it make sense for the LB to assume the connection has been dropped and close the entire session (including the connection to the RabbitMQ node)?
It appears that RabbitMQ is configured to tolerate two missed heartbeats before it terminates the connection. However, it waits until the next heartbeat would need to be sent before it drops the connection, which is what gives it the appearance of requiring 3 missed heartbeats:
Heartbeat1 (no response) -> wait -> Heartbeat2 (no response) -> wait -> Heartbeat3 -> terminate
There is a slight bug in RabbitMQ (it sends a third heartbeat but immediately terminates the connection), but it isn't really affecting anything.
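If you want the broker to suggest a shorter interval to clients, the classic rabbitmq.config format from the 3.x era lets you set it server-side; a sketch, and note that clients can still negotiate their own value at connection time:
%% /etc/rabbitmq/rabbitmq.config
[
  {rabbit, [
    {heartbeat, 10}   %% suggested heartbeat interval, in seconds
  ]}
].
With a 10 s heartbeat, the tolerate-two-missed-beats behavior described above means a dead connection is detected in roughly 30 s instead of nearly 15 minutes.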

How can I limit the rate of new outgoing ssh connections when using GNU parallel?

Background: The default setting for MaxStartups in OpenSSH is 10:30:60, and most Linux distributions keep this default. That means only 10 ssh connections can be exchanging keys and authenticating at a time before sshd starts randomly dropping 30% of new incoming connections, and at 60 unauthenticated connections all new connections are dropped. Once a connection is set up, it no longer counts against this limit. See e.g. this question.
Problem: I'm using GNU parallel to run some heavy data processing on a large number of backend nodes. I need to access those nodes through a single frontend machine, and I'm using ssh's ProxyCommand to set up a tunnel to transparently access the backends. However, I'm constantly hitting the maximum-unauthenticated-connections limit because parallel spawns more ssh connections than the frontend can authenticate at once.
I've tried to use ControlMaster auto to reuse a single connection to the frontend, but no luck.
Question: How can I limit the rate at which new ssh connections are opened? Could I control how many unauthenticated connections are open at a given time, and delay new connections until another connection has been authenticated?
I think we need a 'spawn at most this many jobs per second per host' option for GNU Parallel. It would probably make sense for the default to work for hosts with MaxStartups = 10:30:60, fast CPUs, and 500 ms latency.
Can we discuss it on parallel@gnu.org?
Edit:
--sshdelay was implemented in version 20130122.
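A hedged usage sketch (the sshloginfile name and the job command are placeholders, not from the question):
# wait at least 0.2 s between starting successive ssh connections, keeping
# the number of simultaneously authenticating connections under MaxStartups
parallel --sshdelay 0.2 --sshloginfile nodes.txt 'process {}' ::: data/*.gz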
Using ControlMaster auto still sounds like the way to go. It shouldn't hit MaxStartups, since it keeps a single connection open (and opens sessions over that connection). In what way didn't it work for you?
Other relevant settings that might prevent ControlMaster from working, given your ProxyCommand setup, are ControlPath:
ControlPath %r@%h:%p - name the socket {user}@{host}:{port}
and ControlPersist:
ControlPersist yes - persists the initial connection (even if closed) until told to quit (-O exit)
ControlPersist 1h - persist for 1 hour
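Putting those together, a ~/.ssh/config entry for the frontend might look like this (the host alias and socket location are assumptions to adapt to your setup):
Host frontend
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 1h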