monit: monitoring certificates - ssl

I am trying to keep watch on my systems' certificates so that I get an alert 30 days before they expire. As an example, I am monitoring the certificate for google.com. I have added this to my monitrc file:
check host google.com with address google.com
    if failed
        port 443
        protocol https
        with ssl options {verify: enable}
        certificate valid > 1095 days
    then alert
When I restart monit, I get the error: There is no service named "google.com".

After adding a new monit config or editing an existing one, reload monit first so that the daemon reinitializes and picks up the change.
Reproduce the issue
# Edit monit config
root@home:~# vim /etc/monit/conf-enabled/test
# Restarting right away will fail
root@home:~# monit restart all
There is no service named "google.com"
# Logs
[CEST Sep 12 19:16:29] info : 'home' trying to restart
[CEST Sep 12 19:16:29] info : 'home' restart action done
[CEST Sep 12 19:17:09] info : 'home' restart on user request
[CEST Sep 12 19:17:09] error : HttpRequest: error -- client [::1]: HTTP/1.0 400 There is no service named "google.com"
[CEST Sep 12 19:17:09] error : There is no service named "google.com"
Solution
# Edit monit config
root@home:~# vim /etc/monit/conf-enabled/test
# Reload to reinitialize monit
root@home:~# monit reload
Reinitializing monit daemon
root@home:~# monit status
Monit 5.25.2 uptime: 24m
Remote Host 'google.com'
  status                 OK
  monitoring status      Monitored
  monitoring mode        active
  on reboot              start
  port response time     878.069 ms to google.com:443 type TCP/IP using TLS (certificate valid for 69 days) protocol HTTP
  data collected         Thu, 12 Sep 2019 19:24:06
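To cross-check the "certificate valid for N days" figure that monit reports, the remaining lifetime can be computed from the certificate's notAfter field. The following is a minimal Python sketch and not part of monit. The function names and the 30-day default threshold are assumptions made for illustration.

```python
import socket
import ssl
import time

SECONDS_PER_DAY = 86400


def days_remaining(not_after, now=None):
    """Return the days left until notAfter (negative if already expired).

    not_after uses the format returned by getpeercert(),
    for example "Jan 01 00:00:00 2030 GMT".
    """
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY


def fetch_not_after(host, port=443, timeout=10):
    """Connect with certificate verification and return the notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def needs_alert(not_after, threshold_days=30, now=None):
    """True when fewer than threshold_days remain (hypothetical policy)."""
    return days_remaining(not_after, now) < threshold_days
```

Calling needs_alert(fetch_not_after("google.com")) performs a live TLS handshake, so it needs network access. The date arithmetic itself can be checked offline by passing a fixed notAfter string and a fixed now value.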

You can also run periodic tests with software such as Overseer and receive notifications using a Notify17 notification template (see the sample recipe).
You could use a test rule like:
https://myurl.com/path must run https
Or
https://myurl.com/path must run ssl
These rules evaluate if a website is reachable over SSL and if the certificate will expire soon (you can see more options in the source code).
P.S. To have an easy start with Overseer, you can check out the Kubernetes deployment example.

Related

When will monit actually start or restart a service

Can someone please tell me on what basis monit decides that it's time to restart an application? For instance, if I want monit to monitor my web application, what information do I need to give monit so that it knows when to restart it?
thanks
Update:
I was able to more or less make it work using the following monit config:
check host altamides with address web.dev1.ams
    if failed port 80 with protocol http
    then alert
However, I was wondering if I can use any absolute URL of my application. Something like http://foo:5453/test/url/1.html/
Can someone help me on that please?
Monit will not restart any service by itself, but you can give it rules that tell it when to do so. For example:
check process couchdb with pidfile /usr/local/var/run/couchdb/couchdb.pid
    start program = "/etc/init.d/couchdb start"
    stop program = "/etc/init.d/couchdb stop"
    if cpu > 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if memory usage > 70% MB for 5 cycles then restart
check host mmonit.com with address mmonit.com
    if failed port 80 protocol http then alert
    if failed port 443 protocol https then alert
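The "for N cycles" clauses in these rules mean the condition has to hold on N consecutive polling cycles before the action fires. Below is a rough Python model of that behaviour. It is an illustration only, not monit's implementation, and the class and function names are hypothetical.

```python
class CycleRule:
    """Fire an action once a value exceeds a threshold for N cycles in a row."""

    def __init__(self, threshold, cycles, action):
        self.threshold = threshold
        self.cycles = cycles
        self.action = action
        self.streak = 0  # consecutive cycles above the threshold so far

    def observe(self, value):
        """Feed one poll result; return the action name if it fires, else None."""
        if value > self.threshold:
            self.streak += 1
        else:
            self.streak = 0  # any good cycle breaks the streak
        if self.streak >= self.cycles:
            self.streak = 0  # simplification: start counting afresh after firing
            return self.action
        return None


def run_cycles(rule, samples):
    """Apply the rule to a sequence of samples, one per polling cycle."""
    return [rule.observe(value) for value in samples]
```

With CycleRule(80, 5, "restart"), a single dip below 80% resets the count, so a brief spike in CPU usage never triggers a restart.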
I figured out the answer from the monit help page:
if failed
    port 80
    protocol http
    request "/data/show?a=b&c=d"
then restart

Trying to install new SSL cert with apache graceful restart returns [OK], but website stays down

I'm having trouble gracefully restarting the Apache service when a new SSL certificate is installed.
I have an AWS Elastic Beanstalk application that serves websites for multiple customers, each with their own domain. To enable SSL for each site, I upload the certificate files to an S3 bucket. I've configured the whole thing so that SSL is terminated in every instance and not in the load balancer. This all works OK.
Now I want to be able to add new certificates dynamically. A client can buy a domain and have it point to their website automatically. When this happens, a new certificate is generated (via the Let's Encrypt API) and uploaded to S3. The final step is a cron job on the server instance that syncs new files from S3 (via aws-cli), loads the new Apache vhost file and restarts Apache. The sync works fine, and the graceful reload seems to run OK, but then the sites are down. My script looks like this:
#!/bin/bash
exec 3>&1 1>>/var/log/httpd/cert-sync.log 2>&1
# start
echo "[$(date)]: ----------- Start Sync-----------"
echo "[$(date)]: Syncing certs..."
aws s3 sync s3://bucket/certs /etc/pki/tls/
echo "[$(date)]: Syncing conf..."
aws s3 sync s3://bucket/conf/ /etc/httpd/conf.d/ssl-vhosts/
echo "[$(date)]: Attempting Apache graceful reload..."
sudo /etc/init.d/httpd graceful
exit 0
The log output is this:
[Mon Jul 1 20:09:01 UTC 2019]: ----------- Start Sync-----------
[Mon Jul 1 20:09:01 UTC 2019]: Syncing certs...
...
[Mon Jul 1 20:09:01 UTC 2019]: Syncing conf...
....
[Mon Jul 1 20:09:02 UTC 2019]: Attempting Apache graceful reload...
Equivalent Upstart operations: start httpd, stop httpd, restart httpd, status httpd
Gracefully restarting httpd
[OK]
This leads me to believe everything is OK, but when I visit the site, the entire application is down (I get a 503). When the cron job runs again, this is the output:
[Mon Jul 1 22:57:01 UTC 2019]: ----------- Start Sync-----------
[Mon Jul 1 22:57:01 UTC 2019]: Syncing certs...
[Mon Jul 1 22:57:02 UTC 2019]: Syncing conf...
[Mon Jul 1 22:57:03 UTC 2019]: Attempting Apache graceful reload...
Equivalent Upstart operations: start httpd, stop httpd, restart httpd, status httpd
Gracefully restarting httpd
/etc/init.d/httpd: line 60: kill: (10478) - No such process
Stopping httpd
stop: Unknown instance:
Starting httpd
httpd start/running, process 14139
[OK]
So now I know the service was down after the graceful restart, even though the log said it was OK. I can't find anything in the Apache logs. I can rule out a problem with the certificate or the configuration file, because if I make an empty commit and deploy using the CLI (eb deploy app-ssl), the websites come back up and the new SSL cert is installed correctly (I can visit the domain over https with no trouble). It is also odd that the second run reports that the service was stopped and started again, yet the sites are still down. A server restart from the AWS Console doesn't fix it either.
What I've tried:
several versions of the graceful command, including:
/etc/init.d/httpd graceful
/etc/init.d/apachectl -k graceful
/etc/init.d/httpd restart
service httpd graceful
running with sudo and without sudo
But nothing seems to make a difference. It only works with a fresh deploy, which I guess causes a new instance to be created, and maybe Apache only starts after the new files have been synced (I have an ebextension file that handles the sync upon instance creation). I need to be able to reload the configuration on an existing instance to fully automate the process.
I've been at this for 3 days. Any idea what might be happening?
EDIT:
The AWS Elastic Beanstalk app is running Amazon Linux 2 and Apache 2.4. Also, apachectl, apache2ctl, and systemctl aren't found. The only way I can even try restarting Apache is with httpd.
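One thing visible in the script above is that it always ends with exit 0 and relies on the init script printing [OK], so a failed reload goes unnoticed by cron. Here is a hedged Python sketch of a stricter sequence: test the configuration, reload only if the test passes, and report the outcome through the exit code. It would not by itself explain the 503, but it makes failures visible. The command lists are assumptions: httpd -t is Apache's configuration syntax check, the reload command is copied from the question, and both may need adjusting for the actual platform.

```python
import subprocess
import sys

CHECK_CMD = ["httpd", "-t"]                     # assumption: httpd is on PATH
RELOAD_CMD = ["/etc/init.d/httpd", "graceful"]  # command used in the question


def run(cmd):
    """Run a command and return (exit_code, combined stdout/stderr)."""
    try:
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True)
    except OSError as exc:  # command missing or not executable
        return 127, str(exc)
    return proc.returncode, proc.stdout


def safe_reload(check_cmd=CHECK_CMD, reload_cmd=RELOAD_CMD):
    """Reload only when the config check passes; return a status string."""
    code, output = run(check_cmd)
    if code != 0:
        return "config-invalid: " + output.strip()
    code, output = run(reload_cmd)
    if code != 0:
        return "reload-failed: " + output.strip()
    return "reloaded"


def main():
    """Print the status and return a non-zero exit code on any failure."""
    status = safe_reload()
    print(status, file=sys.stdout if status == "reloaded" else sys.stderr)
    return 0 if status == "reloaded" else 1
```

A cron wrapper would end with sys.exit(main()). Since the init script in the question printed [OK] while the site was down, an HTTP request against the site after the reload is still worth adding as a final check.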

SSH connection timed out while connecting to ec2 after apache2 installation

I created an EC2 instance in AWS with an Ubuntu AMI and did everything necessary to connect it to the domain: I set up an Elastic IP, a Security Group, and Route 53. Then I got the PEM file and connected over SSH using the private key.
It all worked fine until I installed apache2 and restarted the Apache server.
After that, connections to port 22 (SSH) time out.
Here are the security group inbound rules (screenshot not preserved).
I then checked the instance log and found this at the bottom:
[  OK  ] Started The Apache HTTP Server.
[  OK  ] Started Dispatcher daemon for systemd-networkd.
[  OK  ] Started Snappy daemon.
         Starting Wait until snapd is fully seeded...
[  OK  ] Started Wait until snapd is fully seeded.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
         Starting Apply the settings specified in cloud-config...
[  OK  ] Started Update UTMP about System Runlevel Changes.
[ 13.456104] cloud-init[1033]: Cloud-init v. 18.3-9-g2e62cb8a-0ubuntu1~18.04.2 running 'modules:config' at Wed, 06 Feb 2019 12:07:07 +0000. Up 13.29 seconds.
[  OK  ] Started Apply the settings specified in cloud-config.
         Starting Execute cloud user/final scripts...
[ 14.093385] cloud-init[1060]: Cloud-init v. 18.3-9-g2e62cb8a-0ubuntu1~18.04.2 running 'modules:final' at Wed, 06 Feb 2019 12:07:08 +0000. Up 13.95 seconds.
[ 14.108125] cloud-init[1060]: Cloud-init v. 18.3-9-g2e62cb8a-0ubuntu1~18.04.2 finished at Wed, 06 Feb 2019 12:07:08 +0000. Datasource DataSourceEc2Local. Up 14.08 seconds
[  OK  ] Started Execute cloud user/final scripts.
[  OK  ] Reached target Cloud-init target.
EDIT:
The AMI had some issues. I created a new instance and configured everything again, and now it is working fine.
Maybe you enabled a firewall during the Apache installation and allowed only a few ports through it.
I followed this video and was able to connect to the instance through Session Manager.
Steps to connect to the instance through Session Manager when SSH does not work:
A. Create a role and assign a policy
1. Choose entity type AWS and use case EC2
2. Attach the policy
3. Skip tags
4. Review: add a role name
B. Attach the role to the instance and save.
C. Reboot the instance and try to connect with Session Manager.
After logging in, disable the firewall with sudo ufw disable and check that every inbound rule defined in the security group now works.

Google Cloud Load Balancer - 502 - Unmanaged instance group failing health checks

I currently have an HTTPS Load Balancer set up with a 443 frontend, backend and health check that serves a single-host nginx instance.
When navigating directly to the host via browser, the page loads correctly with valid SSL certs.
When trying to access the site through the load balancer IP, I receive a 502 - Server error message. In the Google logs I see "failed_to_pick_backend" errors at the load balancer. I also notice that it is failing health checks.
Some digging around leads me to these two links: https://cloudplatform.googleblog.com/2015/07/Debugging-Health-Checks-in-Load-Balancing-on-Google-Compute-Engine.html
https://github.com/coreos/bugs/issues/1195
Issue #1 - Not sure if google-address-manager is running on the server
(RHEL 7). I do not see an entry for the HTTPS load balancer IP in the
routes. The Google SDK is installed. This is a Google-provided image
and if I update the IP address in the console, it also gets updated on
the host. How do I check if google-address-manager is running on
RHEL7?
[root@server]# ip route ls table local type local scope host
10.212.2.40 dev eth0 proto kernel src 10.212.2.40
127.0.0.0/8 dev lo proto kernel src 127.0.0.1
127.0.0.1 dev lo proto kernel src 127.0.0.1
Output of all google services
[root@server]# systemctl list-unit-files
google-accounts-daemon.service enabled
google-clock-skew-daemon.service enabled
google-instance-setup.service enabled
google-ip-forwarding-daemon.service enabled
google-network-setup.service enabled
google-shutdown-scripts.service enabled
google-startup-scripts.service enabled
Issue #2: Not receiving a 200 OK response. The certificate is valid
and the same on both the LB and server. When running curl against the
app server I receive this response.
root@server.com curl -I https://app-server.com
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html
Thoughts?
You should add firewall rules for the health check service -
https://cloud.google.com/compute/docs/load-balancing/health-checks#health_check_source_ips_and_firewall_rules and make sure that your backend service listens on the load balancer ip (easiest is bind to 0.0.0.0) - this is definitely true for an internal load balancer, not sure about HTTPS with an external ip.
A couple of updates and lessons learned:
I have found out that "google-address-manager" is now deprecated and has been replaced by "google-ip-forwarding-daemon", which is running.
[root@server ~]# sudo service google-ip-forwarding-daemon status
Redirecting to /bin/systemctl status google-ip-forwarding-daemon.service
google-ip-forwarding-daemon.service - Google Compute Engine IP Forwarding Daemon
Loaded: loaded (/usr/lib/systemd/system/google-ip-forwarding-daemon.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2017-12-22 20:45:27 UTC; 17h ago
Main PID: 1150 (google_ip_forwa)
CGroup: /system.slice/google-ip-forwarding-daemon.service
└─1150 /usr/bin/python /usr/bin/google_ip_forwarding_daemon
There is an active firewall rule allowing IP ranges 130.211.0.0/22 and 35.191.0.0/16 for port 443. The target is also properly set.
Finally, the health check was using the default "/" path. The developers had put authentication in front of the site during development. When I bypassed the SSL cert error, curl returned a 401 Unauthorized. This was the root cause of the issue we were experiencing. To remedy it, we changed the nginx basic authentication configuration to disable authentication on a new route (e.g. /health).
Once the nginx configuration was updated and the health check path was changed to the new /health route, we received valid 200 responses. This allowed the health check to report healthy instances and the LB to pass traffic through.
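The diagnosis above comes down to asking what status code the health-check path returns when no credentials are sent. A small Python sketch of such a probe follows. The function names are placeholders, and treating only 200 as healthy mirrors the behaviour described in this thread rather than any particular load balancer's full rules.

```python
import urllib.error
import urllib.request


def probe_status(url, timeout=5):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except (urllib.error.URLError, OSError):
        return None      # connection refused, timeout, or TLS failure


def is_healthy(status):
    """Only a 200 counts as healthy in this sketch."""
    return status == 200
```

Run against the backend directly, probe_status on "/" would have shown the 401 here, while the dedicated /health route returns 200. A certificate that cannot be verified, like the curl (60) error in the question, shows up as None.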

SSL certificate for mail server on root domain vs wildcard on subdomain

I installed a general RapidSSL certificate on my Ubuntu 12.04 x64 Apache/iRedMail server at Digital Ocean. It verifies fine in the browser and when I use RapidSSL's checker tool.
However, this server is exclusively for mail, and when I set up an account in Mail (Mac OS) or on my iPhone, etc., I have to make a security exception because it gives me the error "This root certificate is not trusted." And it shows the location as being GuangDong, China for some reason.
I spoke with a support person at eNom (where I bought the SSL certificate) and he mentioned that there might be an issue using the root domain instead of a subdomain for mail, and that I may need a wildcard certificate instead. That suggestion makes no logical sense to me.
My general question is: Is there any difference between setting up a mail server with a general SSL certificate on the root domain, as opposed to a mailserver on a subdomain with a wildcard SSL certificate?
Or is something wrong with my Apache configuration, perhaps?
Thanks! :)
Update:
So now I did the following:
in /etc/dovecot/dovecot.conf
changed:
ssl_cert = </etc/ssl/certs/iRedMail_CA.pem
ssl_key = </etc/ssl/private/iRedMail.key
To:
ssl_cert = </etc/ssl/certificate.crt
ssl_key = </etc/ssl/certificate.key
ssl_ca = </etc/ssl/intermediate.crt
Then in /etc/postfix/main.cf
I changed:
smtpd_tls_cert_file = /etc/ssl/certs/iRedMail_CA.pem
smtpd_tls_key_file = /etc/ssl/private/iRedMail.key
To:
smtpd_tls_cert_file = /etc/ssl/certs/certificate.crt
smtpd_tls_key_file = /etc/ssl/certificate.key
smtpd_tls_CAfile = /etc/ssl/intermediate.crt
Then I rebooted the server, and Apache hung. I get this error:
root@host:~# service apache2 status
Apache2 is NOT running.
root@host:~# service apache2 restart
* Restarting web server apache2
(98)Address already in use: make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
Unable to open logs
Action 'start' failed.
The Apache error log may have more information.
...fail!
root@host:~#
So, then I do this:
netstat -ltnp | grep ':80'
result:
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 1464/apache2
Then I do this:
kill -9 1464
And Apache restarts fine after that, but roundcube won't let me log in and I can't connect to IMAP or SMTP at all.
Dovecot log says:
Nov 07 04:31:43 imap-login: Error: SSL private key file is password protected, but password isn't given
Nov 07 04:31:43 imap-login: Fatal: Couldn't parse private ssl_key
Update Again:
Everything in Dovecot is working great now. Had to do the following, since my certificate is encrypted with a password:
killall dovecot
dovecot -p
Then enter my password.
Now my problem is with Postfix, which isn't working at all. I'm assuming it doesn't like the password protected key.
You need to configure Dovecot to use SSL.
You have installed and configured SSL only for Apache, and IMAP/POP/SMTP connections are not handled by Apache.
And no, there is no difference in using a domain and subdomain.
How to setup iRedmail to use SSL