Is there a way to stop monit from sending monitor and unmonitor email messages

I've recently started using Monit to monitor some production machines, and it does this well. But the annoying issue is that part of our routine is to restart the servers once a day on a rotating basis, and each of those restarts generates an unmonitor and a monitor message.
I can't find a specific alert setting to turn this off, so I'm bombarded with correct but unnecessary messages. There does not seem to be a specific event related to this.
Does anyone know of a way to do this? To be clear, I want to tell Monit to unmonitor a server/task, do whatever maintenance is needed, restart the server/task, and then monitor it again. But I only want to hear about failure situations, not the unmonitor and monitor steps.

You can disable alerts for user-initiated actions in general, as the sample monitrc explains:
## Do not alert when Monit starts, stops or performs a user initiated action.
## This filter is recommended to avoid getting alerts for trivial cases.
#
# set alert your-name@your.domain not on { instance, action }
Documentation: https://mmonit.com/monit/documentation/monit.html#Setting-an-event-filter
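For example, to keep failure alerts but suppress the messages generated by Monit's own start/stop (instance) and by user-initiated actions such as monitor/unmonitor/restart (action), you could uncomment and adapt that line; the address below is just a placeholder:
set alert admin@example.com not on { instance, action }
The same "not on { instance, action }" filter can also be attached to an alert statement inside a single check block if you only want to silence these events for the services you restart daily.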

Related

Splunk 7.2.9.1 Universal forwarder on SUSE Linux 12.4 not communicating and forwarding logs to Indexer after a certain period of time

I have noticed that the Splunk 7.2.9.1 Universal forwarder on SUSE Linux 12.4 stops communicating with the deployment server and forwarding logs to the indexer after a certain period of time. The "splunkd" process appears to be running while this issue persists.
I have to restart the universal forwarder for it to resume communicating with the deployment server and forwarding logs, but it stops again after a certain period of time.
I cannot see anything specific in splunkd.log while this issue occurs.
However, I noticed the message below in watchdog.log:
06-16-2020 11:51:09.055 +0200 ERROR Watchdog - No response received from IMonitoredThread=0x7f24365fdcd0 within 8000 ms. Looks like thread name='Shutdown' is busy !? Starting to trace with 8000 ms interval.
Can somebody help me understand what is causing this issue?
This appears to be a Known Issue. From the 7.2.9.1 release notes:
Universal Forwarders stop sending data repeatedly throughout the day
Workaround: In limits.conf, try changing file_tracking_db_threshold_mb
in the [inputproc] stanza to a lower value.
I did not find a version where this is not listed as a known problem.
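If you want to try that workaround, the stanza would look roughly like this in $SPLUNK_HOME/etc/system/local/limits.conf on the forwarder (the value below is only an illustration; pick something lower than your current threshold and restart the forwarder afterwards):
[inputproc]
file_tracking_db_threshold_mb = 250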

Monit cannot start/stop service

Monit cannot start/stop the service. If I stop the service, Monit just stops monitoring it.
I have attached the log and config for reference.
#Monitor vsftpd#
check process vsftpd
matching vsftpd
start program = "/usr/sbin/vsftpd start"
stop program = "/usr/sbin/vsftpd stop"
if failed port 21 protocol ftp then restart
The log states: "stop on user request". The process is stopped and monitoring is disabled, since monitoring a stopped (= non-existing) process makes no sense.
If you restart the service (via the CLI or the web interface) it should log something like "'vsftpd' restart on user request", call the stop program, and then continue with the start program (if no dedicated restart program is provided).
One problem can arise, though: if the stop script fails to produce the expected state (no process matching vsftpd), the start program is not called. So if a process matching vsftpd is still running, Monit will not call the start program. That is why it is always better to monitor via a PID file where possible.
Finally, since I do not know what system/version you are on, an assumption: the vsftpd binary on my system is really only the daemon. It does not support any options; all arguments are treated as configuration files, as stated in the man page. So supplying "start" and "stop" only tries to launch new daemons loading configuration files named start and stop. If this is true for your system as well, the problem described above applies, since your vsftpd is never actually stopped.
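If that is the case, a more robust configuration would delegate start/stop to the service manager instead of the bare binary. This is only a sketch assuming a systemd-based system; the systemctl path and service name are assumptions, and "with pidfile" is preferable to "matching" if you can arrange for a PID file to be written:
check process vsftpd matching vsftpd
  # use "with pidfile /var/run/vsftpd.pid" instead, if your setup writes one
  start program = "/usr/bin/systemctl start vsftpd"
  stop program = "/usr/bin/systemctl stop vsftpd"
  if failed port 21 protocol ftp then restart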

Hosts in Nagios are disappearing

This may belong on ServerFault, but I wanted to approach this community first. If this is not the right place, please move or close this thread and I will re-post it where it belongs.
PROBLEM:
Hosts, along with their associated services, disappear and reappear upon refresh (F5 / Ctrl+F5 / etc).
STEPS TO REPRODUCE:
1. Log into Nagios
2. Click Service Detail
3. See a breakdown of services but you don't see the last one you added.
4. Refresh screen by using F5 / Ctrl+F5 / etc and it doesn't show up still
5. Refresh screen by using F5 / Ctrl+F5 / etc and it doesn't show up still
6. Refresh screen and it will show up.
(!) - Steps 4-6 vary
WHAT I'VE TRIED:
Restarting Nagios service (service Nagios restart)
Restarting HTTPD service (service httpd restart)
Restarting VPS
Refresh browser including "Clear Cache and Hard Reload"
Tried different browsers
Tried different computers
Tried different networks
SCREENSHOTS:
GOOD
https://i.imgur.com/KUW5C6E.png
BAD
https://i.imgur.com/rWFLEaf.png
POSSIBLE CAUSE:
The reason we're in this situation now is that we had an intern add this latest host and its associated service. He added it correctly, and I even checked his work. He did the normal preflight, but instead of issuing the reset command via SSH he issued it on the web interface itself by accessing "Process Info > Restart the Nagios process". It seems like that should work fine, but we've never restarted this way before, and that is the only reason I suspect it is the culprit of the issue we are seeing. Is there something different about this restart compared to the normal SSH restart?
EDIT: To add to all of this, we updated a different file today, unrelated to this host or its services, and Nagios is not picking up the change.
Thanks for helping!
Rich
EXTRA:
Here is a screenshot of the config file:
https://i.imgur.com/2UsYZcw.png
This can happen if you have multiple Nagios services running. There could be a secondary instance of the service which hasn't picked up the new configuration files because it technically hasn't been restarted. I've had this happen once or twice.
First, shut down Nagios
service nagios stop
Next, kill all remaining instances.
killall -9 nagios
Finally, start Nagios back up
service nagios start
That should fix your problem.
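If you want to confirm that a stray instance is the cause before killing anything, you can first list the running nagios processes (pgrep is assumed to be available here); seeing more than one nagios daemon in the output points to this problem:
pgrep -a nagios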

Making Config for Monit to check program started from bash

I'm hoping someone out there is familiar with Monit and can help me.
I'm running a home data server with Ubuntu 13.10.
I have CGMiner set up to start when the PC boots, from a bash script of my own creation. It contains a few tweaks and settings that need to run before it gets going.
But if for some reason my internet goes down, cgminer will close after a short amount of time. If I'm asleep when it closes, that's lost mining time and wasted electricity. So I'm looking into Monit as a way of fixing that.
I'm hoping to have Monit (or something similar, it doesn't have to be Monit) start CGMiner from my script, check every so often that CGMiner is still running, and if not, restart it from my script.
I just can't get my head around the config file for Monit. Help would be awesome.
Yes, you can achieve that with Monit. You only need your start script to write the PID into a pidfile:
check process xyz with pidfile /var/run/xyz.pid
  start program = "/bin/xyz start"
  stop program = "/bin/xyz stop"
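For the cgminer case specifically, a minimal sketch of such a wrapper is shown below. Every path, the cgminer options and the pidfile location are assumptions; adapt them to your existing start script:
#!/bin/bash
# /usr/local/bin/cgminer-wrapper -- start/stop cgminer and record its PID for Monit
PIDFILE=/var/run/cgminer.pid
case "$1" in
  start)
    # apply your own tweaks/settings here, then launch cgminer in the background
    /usr/bin/cgminer --config /etc/cgminer.conf >> /var/log/cgminer.log 2>&1 &
    echo $! > "$PIDFILE"
    ;;
  stop)
    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
    ;;
  *)
    echo "Usage: $0 {start|stop}" >&2
    exit 1
    ;;
esac
The matching Monit entry would then be:
check process cgminer with pidfile /var/run/cgminer.pid
  start program = "/usr/local/bin/cgminer-wrapper start"
  stop program = "/usr/local/bin/cgminer-wrapper stop"
Because the wrapper backgrounds cgminer, writes $! to the pidfile and exits immediately, Monit can use the recorded PID to decide whether cgminer is still alive and restart it when it is not.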

Apache CGI Timeout - how does it kill and/or notify the child?

I have some potentially long lived CGI applications which must clean up their environment regardless of whether they complete normally or if they're killed by Apache because they're taking too long. They're using shared memory so I can't rely on the operating system's normal process cleanup mechanisms.
How does Apache kill its CGI children when they time out? I can't find any documentation or specification for how it's done, nor whether it's possible for the child to intercept that so it can shut down cleanly.
I could not find any official Apache documentation on this, but the following script shows that CGI scripts are sent SIGTERM on timeout, not SIGKILL (at least in my version of Apache, 2.2.15):
#!/usr/bin/perl
use strict;
use warnings;
use sigtrap 'handler' => \&my_handler, 'normal-signals';
use CGI;
sub my_handler {
my ($sig) = @_;
open my $fh, ">", "/var/www/html/signal.log" or die $!;
print $fh "Caught SIG$sig";
close $fh;
}
sleep 10 while 1;
Output:
Caught SIGTERM
No, Apache sends a kill signal, and that signal cannot be caught or handled, so a signal handler does nothing in this case.
It looks like Apache doesn't do anything? I just added a signal handler to one of my Perl CGI scripts that Apache timed out on, and I get nothing :(
Bit of a shame, really.
Note that if these tasks are really taking that long and the client does not actually expect a reply, you could instead start a background process on your server whenever you receive such a request.
This of course means you probably want to make sure you don't start the background process more than a certain number of times (possibly just once), and you can have that process save information in a file or shared memory so the client can check progress.
Not allowing the background process to be started too many times will save your server memory and CPU; otherwise it will become unresponsive.
And that way you do not have to worry too much about Apache killing your long process, since there are no more timeout concerns with it.
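A minimal Perl sketch of that idea, assuming mod_cgi: the request forks a detached worker, so Apache's timeout only applies to the short-lived parent. The worker command, the progress file and their paths are made up for illustration:
#!/usr/bin/perl
# Sketch: hand long-running work to a detached worker so the CGI request returns quickly.
use strict;
use warnings;
use CGI;
use POSIX qw(setsid);

$| = 1;                                   # unbuffer STDOUT so the header goes out before we fork
my $q = CGI->new;
print $q->header('text/plain');

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: start a new session and drop the inherited pipes so Apache
    # does not wait for us or kill us when the request times out.
    setsid() or die "setsid failed: $!";
    open STDIN,  '<',  '/dev/null' or die $!;
    open STDOUT, '>>', '/tmp/worker-progress.log' or die $!;   # hypothetical progress file
    open STDERR, '>&', \*STDOUT or die $!;
    exec '/usr/local/bin/long-task';                           # hypothetical long-running job
    exit 1;                                                    # only reached if exec fails
}

# Parent: answer immediately; the client can poll the progress file later.
print "Job started (pid $pid)\n";
The key design point is that the parent exits right away while the worker, detached via setsid and with its own file handles, keeps running and writes its progress somewhere the client can check on a later request.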