I am doing a customer demo where I need to stress an sshd server with repeated sequential requests, so I wrote a small shell script with a loop in it. The first connection succeeds, but sshd refuses connections immediately after that, so all my subsequent requests fail.
Right now sshd is running in a Docker container and I am running the script from the host, so no external factor such as a network proxy is in the picture here.
So far I have checked the following things:
The sshd config file contains the following line (I bumped up the values):
MaxStartups 100:300:600
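For reference, one way to confirm which value sshd actually loaded is its extended test mode (assuming sshd lives at the usual path and you can run it as root inside the container):
/usr/sbin/sshd -T | grep -i maxstartups    # dumps the effective configuration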
Checked everything here - http://edoceo.com/notabene/ssh-exchange-identification
Have been googling around for what could be the problem (too many links to post here). Any ideas?
OK, so the sshd daemon was being spawned in debug mode, which means it does not fork and exits after handling a single connection. I switched it to regular daemon mode and now the test is flying :)
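For anyone hitting the same thing, a rough sketch of the difference (paths assumed; adjust to your image):
# Debug mode: sshd does not fork and exits after serving a single connection
/usr/sbin/sshd -d
# Foreground daemon mode (suitable as a Docker CMD): stays up and forks a child per connection
/usr/sbin/sshd -D -e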
Related
While trying to solve another issue with connection problems to our servers, I thought I would fix it by setting MaxConnections and MaxStartups in my sshd.conf.
After restarting ssh everything seemed fine, but this morning I found out that our Jenkins server couldn't connect to any of the dev servers. I tried logging in myself, only to find that I can no longer log in to any of our dev servers either.
Looks like I made a mess of the sshd.conf and locked myself out of all the dev servers.
When trying to log in I get a "port 22: Connection refused" error.
Is there any other way to get into the systems without having to attach every disk to another server to adjust the sshd.conf?
There are several options available for recovery in this situation:
Use the interactive serial console. This requires preparation in advance.
Add a startup script to fix the file, and then reboot to trigger the script (see the sketch after this list).
Shut down the instance, attach the disk to a recovery instance, use the recovery instance to mount the disk and fix the file, then detach the disk and recreate the instance using the fixed disk. (You can also do this from a snapshot for added safety.)
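For option 2, a rough sketch of the startup-script route, assuming this is a Google Compute Engine instance named my-dev-server and that a stray MaxConnections line is the culprit (both names are guesses; adjust the sed expression and the service name to your setup):
gcloud compute instances add-metadata my-dev-server --metadata startup-script='#! /bin/bash
# Drop the offending directive, validate the config, then restart sshd
sed -i "/^MaxConnections/d" /etc/ssh/sshd_config
/usr/sbin/sshd -t && service ssh restart'
# Reset the instance so the startup script runs at boot
gcloud compute instances reset my-dev-server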
I have been working on an HTTP server which accepts connections and then, based on the host name, loads the right project from a .so, generates the page the client is asking for, and sends it back.
Now that I have several working projects, I am interested in making them available to others, but here is my problem:
I connect to my dedicated server through ssh and start my daemon from there, but after a while the pages are no longer accessible because my program is no longer running.
I also get kicked by the server after a while. I wonder:
How do I keep my server running? Does the fact that I keep getting kicked out by ssh after a little idle time explain why my daemon is being shut down?
Thanks in advance to whoever can give me some element of an answer.
When your SSH session times out, SIGHUP is sent to the sub-processes forked from the current interactive shell. That's why your processes were terminated (and the server is no longer running).
To avoid an idle SSH connection being kicked by the server, set ServerAliveInterval so the client periodically requests a response from the server (e.g. in ~/.ssh/config):
Host *
    ServerAliveInterval 30
To avoid shell sub-process termination, refer to
https://askubuntu.com/questions/348836/keep-the-running-processes-alive-when-disconneting-the-remote-connection/348921#348921
https://askubuntu.com/questions/349262/run-a-nohup-command-over-ssh-then-disconnect
In short, there are 3 options:
nohup
disown / setsid
start the servers from the CLI inside a tmux or screen session on the server
NOTE: If the server instances are already properly daemonized, try looking at monit or supervisord to keep them running ;-D
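For options 1 and 2, a minimal sketch (the binary name ./my-http-daemon is made up; substitute your own):
# Start the daemon immune to SIGHUP, with output captured in a log file
nohup ./my-http-daemon > my-http-daemon.log 2>&1 &
# Remove the job from the shell's job table so the shell won't HUP it on exit
disown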
I have a google compute engine VM, running ubuntu, and utilising Laravel Forge.
I seem to get blocked by the VM after connecting over SSH a few times (2-4), even though I'm logging in correctly. Restarting the VM unblocks me.
I first noticed the issue when I was having trouble logging in over SSH; after a few attempts the VM would become unreachable. My website hosted on it also wouldn't resolve. After restarting the VM, I could log in over SSH again and my website worked. This happened a couple of times before I figured out how to log in correctly with SSH.
Next, logging in to the database with HeidiSQL, which uses plink, works fine. But it seems to reconnect via SSH every time I do something, and after 2-4 of these reconnects I get the same problem: the VM is unreachable by SSH and the website hosted on it is down.
Using SQLyog, which seems to maintain the one SSH connection, rather than constantly reconnecting like HeidiSQL, I have no problems.
When my website is down, I use those "down for everyone or just me" websites to see if it is down, and apparently it's just down for me, so I must be getting blocked.
So I guess my questions are:
1. Is this normal?
2. Can I unblock myself without restarting the VM?
3. Can I make blocking occur in a less strict way?
4. Why does HeidiSQL keep reconnecting via SSH rather than maintaining the one connection like SQLyog seems to?
You have encountered sshguard, which is enabled by default on the GCE Ubuntu images (at least on the 14.10 image, where I encountered it myself). There is a whitelist file at /etc/sshguard/whitelist.
The sshguard default configuration on my VM has a "dangerousness" threshold of 40. Most "attacks" that sshguard detects incur dangerousness of 10, so getting blocked after 4 reconnects sounds about right.
The attack signatures are listed here: http://www.sshguard.net/docs/reference/attack-signatures/
I would bet that you are connecting from an IP that has an invalid reverse DNS configuration (I was). Four connects like that and the default config blocks you for 20 minutes.
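On question 2: you can usually unblock yourself without restarting the VM by clearing sshguard's firewall rules and whitelisting your address, assuming the default iptables backend with a chain named sshguard (run this from the serial console or another unblocked host; the IP below is just an example):
sudo iptables -L sshguard -n                               # list the currently blocked addresses
sudo iptables -F sshguard                                  # flush the blocks
echo "203.0.113.45" | sudo tee -a /etc/sshguard/whitelist  # add your client IP
sudo service sshguard restart                              # reload the whitelist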
Using OpenShift Enterprise 2.0, I have a simple jbossews (tomcat7) + mysql 5.1 app that uses JSP files connected to a mysql database. The app was created as a non-scaled app (fwiw the same issue happens when scaling is enabled).
Using a JMeter driver with only a single concurrent user and no think time, it will chug along for about 2 minutes (at about 200 req/sec) and then start returning "503 Service Temporarily Unavailable" in batches (a few seconds at a time), on and off for the remainder of the test. Even if I change nothing (I don't restart the app), if I wait a moment and then try again, it does the same thing: first it seems fine, but then the errors start again.
The gear is far from fully-utilized (memory/cpu), and the only log I can find that shows a problem is the /var/log/httpd/error_log, which fills up with these entries:
[Tue Mar 25 15:51:13 2014] [error] (99)Cannot assign requested address: proxy: HTTP: attempt to connect to 127.8.162.129:8080 (*) failed
Looking at the 'top' command on the node host at the time that the errors start to occur, I see several httpd processes surge to the top on and off.
So it looks like I am somehow running out of proxy connections or something similar. However, I'm not sure how that is happening with only a single concurrent user. Any ideas of how to fix this? I couldn't find any similar posts.
The core problem is that the system is running out of ephemeral ports due to connections stuck in TIME_WAIT. Check using:
netstat -pan --tcp | less
or
netstat -pan --tcp | grep -c ".*TIME_WAIT"
to just count the number of connections in time wait state.
These are connections made by the node port proxy (httpd) to the Tomcat backend. There are several ways to change TCP settings in order to lessen the problem. The first thing to try is enabling reuse: append the following to /etc/sysctl.conf:
# allow reuse of time_wait connections
net.ipv4.tcp_tw_reuse=1
This will allow connections in TIME_WAIT state to be reused if there are no ephemeral ports available.
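To apply that without a reboot (standard sysctl usage):
sudo sysctl -p                   # reload /etc/sysctl.conf
sysctl net.ipv4.tcp_tw_reuse     # verify the value is now 1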
However, the underlying problem remains that these connections are not being properly pooled. I do not run into this issue outside of a gear with the same app + driver, meaning that in that setup the connections are properly pooled and don't have to sit in TIME_WAIT state at all. Something in the proxy must be interfering with how connections are closed.
It looks like mod_proxy / mod_rewrite is either not configured for connection pooling/keepalive or is not compatible with it.
If you're hitting this issue, you should first try moving to vhost routing, but tcp_tw_reuse can help if vhost connections are still so high that you run out of ports.
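For illustration, mod_proxy connection pooling/keepalive is normally tuned with ProxyPass worker parameters like the ones below; this is a generic Apache example using the backend address from the log above, not the exact OpenShift node proxy configuration:
ProxyPass / http://127.8.162.129:8080/ keepalive=On ttl=60 max=100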
https://access.redhat.com/articles/1203843 also has a lot of good information on the topic, including this on possible causes of error 503:
Understanding that HAProxy performs health checks on Gear (application) contexts is important, because if these checks fail you can see 502 or 503 errors when trying to access your application: the proxy disables the route to the application (i.e. puts the gear in maintenance mode).
...and...
...if you are seeing 502 or 503 errors when trying to access your application, it could be because the proxy is disabling the routes to the application (i.e. putting the gear in maintenance mode) because it is failing health checks...
I issue commands to my EC2 instances via SSH, and these commands log output that I'm supposed to keep watching for a long time. The bad thing is that the SSH connection is closed after some time due to my inactivity, and I'm no longer able to see what's going on with my instances.
How can I disable/increase timeout in Amazon Linux machines?
The error looks like this:
Read from remote host ec2-50-17-48-222.compute-1.amazonaws.com: Connection reset by peer
You can set a keep-alive option in the ~/.ssh/config file in your home directory on your own computer:
ServerAliveInterval 50
AWS usually drops your connection after only 60 seconds of inactivity, so this option has the client ping the server every 50 seconds and keeps you connected indefinitely.
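If you only want this for your EC2 hosts, a slightly fuller ~/.ssh/config stanza could look like this (the host pattern is just an example):
Host *.compute-1.amazonaws.com
    ServerAliveInterval 50
    ServerAliveCountMax 3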
Assuming your Amazon EC2 instance is running Linux (and the very likely case that you are using SSH-2, not 1), the following should work pretty handily:
Remote into your EC2 instance.
ssh -i <YOUR_PRIVATE_KEY_FILE>.pem <INTERNET_ADDRESS_OF_YOUR_INSTANCE>
Add a "client-alive" directive to the instance's SSH-server configuration file.
echo 'ClientAliveInterval 60' | sudo tee --append /etc/ssh/sshd_config
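Optionally, sanity-check the file before restarting so a typo can't lock you out (sshd's test mode; path may vary by distribution):
sudo /usr/sbin/sshd -t && echo "sshd_config OK"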
Restart or reload the SSH server, for it to recognize the configuration change.
The command for that on Ubuntu Linux would be:
sudo service ssh restart
On most other Linux distributions, though, the following is probably correct:
sudo service sshd restart
Disconnect.
logout
The next time you SSH into that EC2 instance, those super-annoying frequent connection freezes/timeouts/drops should hopefully be gone.
This also helps with Google Compute Engine instances, which come with similarly annoying default settings.
Warning: Do note that TCPKeepAlive settings (which also exist) are subtly, yet distinctly different from the ClientAlive settings that I propose above, and that changing TCPKeepAlive settings from the default may actually hurt your situation rather than help.
More info here: http://man.openbsd.org/?query=sshd_config
Consider using screen or byobu and the problem will likely go away. What's more, even if the connection is lost, you can reconnect and restore access to the same terminal screen you had before, via screen -r or byobu -r.
byobu is an enhancement for screen, and has a wonderful set of options, such as an estimate of EC2 costs.
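A minimal screen workflow for this (the session name watcher is arbitrary):
screen -S watcher     # start a named session and run your long-lived command inside it
# detach with Ctrl-a d; the command keeps running even if the SSH connection drops
screen -r watcher     # reattach later from a new SSH session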
I know that for PuTTY you can use a keepalive setting so it sends an activity packet every so often so the connection doesn't go "idle" or "stale".
http://the.earth.li/~sgtatham/putty/0.55/htmldoc/Chapter4.html#S4.13.4
If you are using another client, let me know.
You can use MobaXterm, a free tabbed SSH terminal, with the following setting:
Settings -> Configuration -> SSH -> SSH keepalive
Remember to restart the MobaXterm app after changing the setting.
I have 10+ custom AMIs, all based on Amazon Linux AMIs, and I've never run into any timeout issues due to inactivity on an SSH connection. I've had connections stay open for more than 24 hours without running a single command. I don't think there are any timeouts built into the Amazon Linux AMIs.