Monit only executing once - monit

Monit is only executing my script once. I can see in the log file that it performs a check every cycle; however, the exec action only runs once, when I reload monit.
check host somehost with address example.com
    # every "* 8-19 * * 1-5"
    if failed
        port 443
        protocol https
        and certificate valid > 1095 days
    then exec "/var/local/bin/mtCert.sh"

Monit is event-based: it essentially only tracks state changes.
So if the monitored state does not change, monit will not trigger the script again by default. See the note on 5.16.0 in the Monit Changelog:
Fixed: The exec action is now executed only once on state change, the same way as the alert action. The new repeat option can be used to repeat the exec action after a given number of cycles if the error persists. Syntax:
if <test> then exec <script> [repeat every [x] cycle(s)]
If you want the old behaviour, use "repeat every cycle". Example:
if failed port 1234 then exec "/usr/bin/myscript.sh" repeat every cycle
So if you in fact need the script to be called multiple times, just add the repeat:
check host somehost with address example.com
    # every "* 8-19 * * 1-5"
    if failed
        port 443
        protocol https
        and certificate valid > 1095 days
    then exec "/var/local/bin/mtCert.sh"
        and repeat every 10 cycles

Related

Nginx not taking into account renewed Let's Encrypt certificates

I have a server running some Node.js apps (Meteor.js to be precise) on internal ports. I use Nginx to proxy_pass requests for public URLs to the apps.
Let's say app_1 is running on localhost:3000; I would proxy_pass app1.domain.com to localhost:3000 and then add a firewall rule to restrict access to port 3000.
Then I add SSL on the incoming connection for app1.domain.com using Let's Encrypt. I generate certs using certbot certonly -w /var/www/app1 -d app1.domain.com and then set the nginx config file to use them.
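A minimal sketch of such a vhost (the server name, port, and certificate paths are illustrative; certbot places certificates under /etc/letsencrypt/live/ by default):
server {
    listen 443 ssl;
    server_name app1.domain.com;

    ssl_certificate     /etc/letsencrypt/live/app1.domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/app1.domain.com/privkey.pem;

    location / {
        # forward public traffic to the app on the internal port
        proxy_pass http://localhost:3000;
    }
}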
Everything works flawlessly until it's time to renew the cert.
To do the renewal, I have the following cron job:
12 6 * * 3 /root/renew.sh
with the following script /root/renew.sh:
certbot renew
service nginx reload
The problem I have is that upon expiration, the nginx webserver is not serving the new certificate!
So I added the following cron job:
30 6 * * 3 service nginx restart
but it still fails to refresh the certificate (which leads to errors in browsers saying the connection is not secure because the certificate has expired). So I need to manually log in and reload nginx.
What is wrong with my setup?
Thanks
You can set everything in one cronjob line (modified basic setup):
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
0 */12 * * * root test -x /usr/bin/certbot -a \! -d /run/systemd/system && perl -e 'sleep int(rand(43200))' && certbot -q renew --deploy-hook "nginx -t && systemctl restart nginx"
This cron job runs twice a day and checks whether the certificate will expire within the next 30 days. It shouldn't cause performance problems.
If the certificate is about to expire, certbot renews it quietly without generating output and restarts NGINX to apply the change. If the certificate is not about to expire, no action is performed.
Be aware that the --deploy-hook argument was added in certbot version 0.17, released in July 2017.
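If you want to verify that renewal itself works without waiting for the schedule, certbot also supports a dry run against the staging environment (note that deploy hooks are skipped during a dry run):
certbot renew --dry-run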
After more testing, here is the answer to this issue:
Set the cron job to point to a bash script:
12 6 * * 3 /root/renew.sh
And set the bash script like this:
certbot renew
sleep 1m
service nginx reload
Note the presence of the sleep command, which waits for the renewal to finish before nginx is reloaded.
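If your certbot version supports it (0.17+, as noted above), the fixed sleep can be avoided altogether by letting a deploy hook do the reload; a sketch of that variant of /root/renew.sh:
#!/bin/sh
# --deploy-hook runs only when a certificate was actually renewed,
# and only after the renewal has completed, so no sleep is needed.
certbot renew --deploy-hook "service nginx reload"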

Google Cloud VM instance SSH connection ~60 seconds timeout with 30 second keepalive

I've been connecting to a Google Cloud VM instance via gcloud ssh from my macOS:
$ gcloud compute ssh [username]@[instance]
Starting about a week ago, the connection just drops after ~60 seconds of idleness and returns:
Connection to [my_external_ip] closed by remote host.
Connection to [my_external_ip] closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
I configured the TCP keepalive time to 30 seconds on both my macbook and the VM. But that did not solve the problem.
Any idea how do I extend the connection duration?
This is unlikely to be an issue with your timeout settings; it is more likely an issue with your firewall rules or routes.
First, I would suggest checking your firewall rules to ensure you have an ingress firewall rule opening port 22. If you do, check the configuration of this rule, in particular (the commands after this list show how to inspect it from the CLI):
Check the IP range in 'Source filters'. Does the range include the IP address of your home computer? For testing purposes, to ensure it does, you could temporarily set this to 0.0.0.0/0 to include all IP addresses.
Check the 'Targets' drop-down. Is this set to apply to 'All instances in the network' or is it set to 'Specified target tags'? If you have set it to 'Specified target tags', make sure that the same tag is added to the 'Network tags' section of the instance, otherwise the firewall rule will not apply to the instance and allow SSH traffic.
Ensure this rule has a higher priority than any other rules that could counteract it (by higher priority I mean a lower number; for example, a rule with a priority of 1000 is a higher priority than a rule with a priority of 20000).
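To inspect these settings from the command line, the gcloud SDK provides (the rule name here is a placeholder):
gcloud compute firewall-rules list
gcloud compute firewall-rules describe my-ssh-rule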
If the above doesn't resolve the issue, run the following command to check the routes:
gcloud compute routes list
Ensure there is an entry which contains the following:
default 0.0.0.0/0 default-internet-gateway
EDIT
If you are able to sometimes SSH into the instance but then the connection drops, there may be some useful information in the logs, or the serial console.
You can access the serial console by clicking on the instance name in the GCP Console, then clicking on "Serial port 1".
When you SSH into the instance, information about the SSH session populates the serial console output (this can be refreshed by hitting the 'Refresh' button at the top of the page). Information about the session ending also populates the serial console. There may be some useful information/clues in this output about why the session ends.
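The same output can also be retrieved without the web console; with the gcloud SDK it looks something like this (instance name and zone are placeholders):
gcloud compute instances get-serial-port-output my-instance --zone us-central1-a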
It might also be worth checking the status of SSH daemon on the instance and giving it a restart to see if that makes a difference:
Check status of sshd:
systemctl status sshd
Restart sshd:
sudo systemctl restart sshd

How do I kill idle Redis clients

I want to timeout and kill idle redis clients. Is there a setting I can set to do this? I seem to remember setting a configuration somewhere but I can't seem to find it again.
I want this to be done automatically, rather than manually calling the client kill command.
Have a look into the Redis configuration file (the one you use to launch Redis).
# Close the connection after a client is idle for N seconds (0 to disable)
timeout 0
Just check that the parameter is not commented out, and set the timeout parameter to a non-zero value in seconds. The instance has to be restarted for this parameter to be taken into account.
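For example, to disconnect clients that have been idle for more than five minutes (300 is just an illustrative value):
timeout 300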
To change this parameter on a running Redis instance, you can use a client command:
> src/redis-cli config set timeout 10
OK
> src/redis-cli config get timeout
1) "timeout"
2) "10"

Delay restart of processes in monit

Can I modify monitrc so that it will not restart the process immediately? The process has to be down for a full cycle before a restart is triggered. This is so I can keep my existing capistrano deploys.
You can use something like:
check process x with pidfile /var/run/x.pid
every y cycles
or
start program = "/etc/init.d/x start" with timeout 90 seconds
I do not think it is currently possible to do that if you're monitoring only the PID file. If, however, you are also monitoring the service by listening on a port, you can add an if failed port 8080 X times within Y cycles then restart clause. Monit will then probe that port every cycle, and when the count of failures reaches X within Y cycles, it will attempt to restart the service.
Keep in mind that this only affects the port monitor. If monit notices the PID file is gone, it will immediately try to restart it.
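A concrete version of that clause might look like this (the process name, port, and thresholds are placeholders):
check process x with pidfile /var/run/x.pid
    start program = "/etc/init.d/x start"
    stop program = "/etc/init.d/x stop"
    if failed port 8080 for 2 times within 3 cycles then restart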
Try
check process x with pidfile /var/run/x.pid
if does not exist for 2 cycles then start
This will wait a minimum of 1 full cycle before restarting the dead process.

Monit on CentOS causes httpd.pid not to be created

The solution was to replace this line:
check process apache with pidfile /var/run/httpd.pid
With this line:
check process httpd with pidfile /var/run/httpd/httpd.pid
And I also removed the 'group apache'.
Original post:
After installing Monit on CentOS, and setting an alert for the Apache (httpd) service, the service no longer creates the /var/run/httpd.pid file.
The httpd service IS running properly.
On top of it, as if that's not enough, Monit reports the status of the service as: Execution failed
Naturally, the only way to restart such a service is by killing it, since the 'restart' script doesn't see any running process.
These are the contents of the /etc/monit.d/monitrc file:
set daemon 10
set logfile syslog facility log_daemon
set mailserver localhost
set mail-format { from: me@server.com }
set alert bugs@server.com
set httpd port 2812 and
    # SSL ENABLE
    # PEMFILE /var/certs/monit.pem
    allow user:password

check process apache with pidfile /var/run/httpd.pid
    group apache
    start program = "/etc/init.d/httpd start"
    stop program = "/etc/init.d/httpd stop"
    if cpu is greater than 180% for 1 cycles then alert
    if totalmem > 1200 MB for 2 cycles then restart
    if children > 250 then restart

check process sshd with pidfile /var/run/sshd.pid
    start program "/etc/init.d/sshd start"
    stop program "/etc/init.d/sshd stop"
    if failed port 22 protocol ssh for 5 cycles then restart
    if 5 restarts within 25 cycles then timeout
Output of "service httpd restart":
Stopping httpd: [FAILED]
Starting httpd: (98)Address already in use: make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
Unable to open logs
[FAILED]
Any help will be greatly appreciated.
Try replacing the stop program with /usr/sbin/httpd -k stop. It works for me.
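In the monitrc above, that would look something like:
stop program = "/usr/sbin/httpd -k stop"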
I had the same problem but /usr/sbin/httpd -k stop didn't seem to help since this still tries to look up the process id from the pid file.
I opted for stop program = "/usr/bin/killall httpd". I don't think this is very elegant (probably kills open requests) but it was the only way I could find to restart apache and have the pid file recreated by monit.
I think that monit is doing a restart as 'stop; start' and is not waiting for 'stop' to finish before starting a new process, and thus is deleting the pid file at an inappropriate time. At least, that's my conclusion after tinkering with all this.
I found a reference to someone who fixed this issue by making monit sleep after the 'stop' statement.
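One way to express that workaround in the monit config (the sleep duration here is arbitrary):
stop program = "/bin/sh -c '/etc/init.d/httpd stop && sleep 5'"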
Personally, I found that replacing 'restart' with 'start' when the http server is down worked just fine.