simulate nagios notifications - testing

My normal method of testing the notification and escalation chain is to simulate a failure by causing one, for example blocking a port.
But this is thoroughly unsatisfying. I don't want down time recorded in nagios where there was none. I also don't want to wait.
Does anyone know a way to test a notification chain without causing the outage? For example something like this:
$ ./check_notifications_chain <service|host> <time down>
at <x> minutes notification email sent to group <people>
at <2x> minutes notification email sent to group <people>
at <3x> minutes escalated to group <management>
at <200x> rm -rf; shutdown -h now executed.
Extending this paradigm I might make the notification chain a nagios check in itself, but I'll stop here before my brain explodes.
Anyone?

If you only want to verify that the email alerts are working properly, you could create a simple test service, which generates a warning once a day.
test_alert.sh:
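#!/bin/bash
# Current UTC time as HHMM, e.g. 1915
now=$(date -u +%H%M)
echo "$now"
echo "Nagios test script. Intentionally generates a warning daily."
# Force base 10 so times such as 0830 are not rejected as invalid octal
if (( 10#$now >= 1900 && 10#$now <= 1920 )); then
    exit 1    # WARNING between 19:00 and 19:20 UTC
else
    exit 0    # OK the rest of the day
fi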
commands.cfg:
define command{
command_name test_alert
command_line /bin/bash /usr/local/scripts/test_alert.sh
}
services.cfg:
define service {
host_name localhost
service_description Test Alert
check_command test_alert
use generic-service
}

This is an old post, but maybe my solution can help someone.
I use the "check_dummy" plugin, which is in the Nagios plugins pack.
As its name says, it is a dumb plugin: it simply returns the state you pass to it.
Here are some examples of how it works:
Usage:
check_dummy <integer state> [optional text]
$ ./check_dummy 0
OK
$ ./check_dummy 2
CRITICAL
$ ./check_dummy 3 salut
UNKNOWN: salut
$ ./check_dummy 1 azerty
WARNING: azerty
$ echo $?
1
I create a file which contains the integer state and the optional text:
echo 0 OKAY | sudo tee /usr/local/nagios/libexec/dummy.txt
sudo chown nagios:nagios /usr/local/nagios/libexec/dummy.txt
I define this command:
# Dummy check (notifications tests)
define command {
command_name my_check_dummy
command_line $USER1$/check_dummy $(cat /usr/local/nagios/libexec/dummy.txt)
}
And associate it with this service definition:
define service {
use generic-service
host_name localhost
service_description Dummy check
check_period 24x7
check_interval 1
max_check_attempts 1
retry_interval 1
notifications_enabled 1
notification_options w,u,c,r
notification_interval 0
notification_period 24x7
check_command my_check_dummy
}
So I just change the contents of the file "dummy.txt" to change the service state:
echo "2 Oups" | sudo tee /usr/local/nagios/libexec/dummy.txt
echo "1 AHHHH" | sudo tee /usr/local/nagios/libexec/dummy.txt
echo "0 Parfait !" | sudo tee /usr/local/nagios/libexec/dummy.txt
This allowed me to debug my notification program.
Hope it helps!
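To avoid typos when flipping the state by hand, the file update can be wrapped in a small function. This is only a sketch: the name set_dummy_state and the DUMMY_FILE override are made up for illustration, and it writes the file directly instead of going through sudo tee, so it assumes the calling user may write to the file.

```shell
#!/bin/sh
# Hypothetical helper: write a check_dummy state (0-3) plus optional text
# to the state file read by the my_check_dummy command above.
# DUMMY_FILE is an assumed override; it defaults to the path used above.
set_dummy_state() {
    state=$1
    shift
    file=${DUMMY_FILE:-/usr/local/nagios/libexec/dummy.txt}
    case $state in
        0|1|2|3) ;;   # OK, WARNING, CRITICAL, UNKNOWN
        *)
            echo "state must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)" >&2
            return 1
            ;;
    esac
    # Remaining arguments become the plugin's optional text
    printf '%s %s\n' "$state" "$*" > "$file"
}
```

For example, `set_dummy_state 2 Oups` puts the service into CRITICAL on the next check, and `set_dummy_state 0 Parfait` brings it back to OK.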

micro:bit & /dev/ttyACM* on GNU/Linux systems

I have a micro:bit attached to my laptop, which runs Xubuntu 18.04.4 LTS.
After I attach the micro:bit, an icon appears on the XFCE4 desktop which I can
use to mount the device at
/media/MyUserName/MICROBIT/
This way I can pair the device 'BBC micro:bit CMSIS-DAP' with my laptop
by using https://python.microbit.org/v/2.0 in my Google Chrome browser.
But in mu-editor I can't do this. I can use neither REPL nor Files mode,
because I get this message box:
"Could not find an attached device.
Please make sure the device is plugged into this computer.
It must have a version of MicroPython (or CircuitPython) flashed onto it
before the REPL will work.
Finally, press the device's reset button and wait a few seconds before
trying again."
$ lsusb
ID 0d28:0204 NXP LPC1768
The line above is the attached micro:bit.
$ ls /dev/ | grep tty
The output of the command above contains no /dev/ttyACM0
or any other ACM* device.
Why is there no /dev/ttyACM* device?
I suspect mu-editor does not find the micro:bit because no
/dev/ttyACM* device exists.
How can I solve the problem for mu-editor?
I use Debian Linux. There are two things you may need to do:
1. Update the firmware. I had to update the firmware on the micro:bits recently to be able to continue using mu-editor. The instructions on how to do this are here:
https://microbit.org/get-started/user-guide/firmware/
2. Mount the micro:bit. This can be done by double-clicking on the 'MICROBIT' entry shown in e.g. Nautilus, or from the command line using udisksctl. The bash script below, microbit_mount.sh, uses udisksctl to mount and unmount a micro:bit. To mount a micro:bit, use the command:
microbit_mount.sh mount
To unmount a microbit, use
microbit_mount.sh unmount
I have these commands aliased to mm and md. The micro:bit will appear in /media/MICROBIT. You may need to remount the micro:bit after each flash.
#!/bin/bash
# microbit_mount.sh
# mount and unmount microbit
# modified from https://askubuntu.com/questions/342188/how-to-auto-mount-from-command-line
BASEPATH="/media/$(whoami)/"
MICRO="MICROBIT"
if [ $# -eq 0 ]
then
echo "no argument supplied, use 'mount' or 'unmount'"
exit 1
fi
if [ "$1" == "--help" ]
then
echo "mounts or unmounts a BBC micro:bit"
echo "args: mount - mount the microbit, unmount - unmount the microbit"
exit 0
fi
# how many MICRO found in udisksctl dump
RESULTS=$(udisksctl dump | grep IdLabel | grep -c -i $MICRO)
case "$RESULTS" in
0 ) echo "no $MICRO found in 'udisksctl dump'"
exit 0
;;
1 ) DEVICELABEL=$(udisksctl dump | grep IdLabel | grep -i $MICRO | cut -d ":" -f 2 | sed 's/^[ \t]*//')
DEVICE=$(udisksctl dump | grep -i "IdLabel: \+$DEVICELABEL" -B 12 | grep " Device:" | cut -d ":" -f 2 | sed 's/^[ \t]*//')
DEVICEPATH="$BASEPATH""$DEVICELABEL"
echo "found one $MICRO, device: $DEVICE"
if [[ -z $(mount | grep "$DEVICE") ]]
then
echo "$DEVICELABEL was unmounted"
if [ $1 == "mount" ]
then
udisksctl mount -b "$DEVICE"
exit 0
fi
else
echo "$DEVICELABEL was mounted"
if [ $1 == "unmount" ]
then
udisksctl unmount -b "$DEVICE"
exit 0
fi
fi
;;
* ) echo "more than one $MICRO found"
;;
esac
echo "exiting without doing anything"
I installed Xubuntu 20.04 and on this system mu-editor works in the Files mode and REPL mode with the attached micro:bit.
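To check whether a serial device node exists for the micro:bit (for example after a firmware update or an OS upgrade), a small helper like the one below can list the /dev/ttyACM* nodes. It is only a sketch: the function name list_acm_devices is made up for illustration, and the optional directory argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: print every ttyACM* node found in a directory
# (default /dev). Returns 0 if at least one was found, 1 otherwise.
list_acm_devices() {
    dev_dir=${1:-/dev}
    found=1
    for node in "$dev_dir"/ttyACM*; do
        # If nothing matches, the glob stays literal and does not exist
        [ -e "$node" ] || continue
        echo "$node"
        found=0
    done
    return $found
}
```

Running `list_acm_devices || echo "no ttyACM device found"` before starting mu-editor shows quickly whether the editor has a serial port to talk to.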

Piping the output of ssh sudo

I sometimes need to run commands as root on a remote server and parse the output of the command on my local machine. The remote server does not allow root login by ssh, but has sudo configured in a way that requires a password. A simplified example of what I need to do is
ssh remote sudo echo bar | tr bar foo
(Obviously in this simplified example, there's no good reason to need to run echo on a different machine to tr: this is just a toy example to explain what I'm trying to do.)
If I run the command above, I get an error that sudo has no way to prompt for a password:
richard@local:~$ ssh remote sudo echo bar | tr bar foo
sudo: no tty present and no askpass program specified
One way I can try to fix this is by adding the -t option to ssh. If I do that, sudo does wait for and accept a password, but the output of ssh's pseudo-terminal goes to stdout, meaning the sudo prompt message is piped to tr and not displayed to the user. If the user doesn't know sudo is waiting for a password, they will think the script has hung, and passing the prompt message to the pipe probably breaks further processing:
richard@local:~$ ssh -t remote sudo echo bar | tr bar foo
[sudo] posswood foo oichood:
foo
(This admittedly silly example shows the prompt has been processed by the tr command the output is piped to.)
The other way I can see to try to fix this is by adding the -S option to sudo. If I do that, sudo prompts on stderr for the password, so the prompt is not passed down the pipeline. That's good, but sudo also accepts the password on standard input meaning it's echoed to the terminal where anyone looking over the user's shoulder can read it:
richard@local:~$ ssh remote sudo -S echo bar | tr bar foo
[sudo] password for richard: p8ssw0rd
foo
I've found inelegant ways of working around the problems with these two options, but my workarounds hit a problem if the user gets their password wrong the first time. That in itself is a problem. Examples of this are:
richard@local:~$ echo "[sudo] password for $USER:"; \
ssh -t remote sudo echo bar | tail -n +2 | tr bar foo
richard@local:~$ (read -s password; echo $password; echo >&2) \
| ssh remote sudo -S echo bar | tr bar foo
I'm sure there must be a good solution to this, as it doesn't seem an uncommon thing to want to do. Any ideas?
The best solution I've come up with is to use sudo -S and disable local echo so the password isn't shown as you type it:
$ { stty -echo; ssh remote sudo -S echo hello; stty echo; echo 1>&2; }
[sudo] password for user:
hello
This leaves sudo in charge of the password prompting, so it works properly if the user types the password wrong.
I don't think any solution using ssh -t can ever work properly, since it combines stderr and stdout.
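If this is used inside a script, it may be worth restoring the terminal even when the command is interrupted, since otherwise echo stays off. The sketch below wraps the same stty approach in a function; the name run_without_echo is made up for illustration, and the function falls back to running the command unchanged when standard input is not a terminal.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with terminal echo disabled and
# put the terminal back the way it was afterwards.
run_without_echo() {
    if [ -t 0 ]; then
        saved=$(stty -g)                # remember current terminal settings
        trap 'stty "$saved"' INT TERM   # restore them if interrupted
        stty -echo
        "$@"
        status=$?
        stty "$saved"
        trap - INT TERM
        echo >&2                        # the Enter after the password was not echoed
    else
        # No terminal on stdin, so there is nothing to hide
        "$@"
        status=$?
    fi
    return $status
}
```

It would be used as `run_without_echo ssh remote sudo -S echo bar | tr bar foo`; sudo still does the prompting on stderr, so a mistyped password is handled as before.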

Monit wait for a file to start a process?

Basically, I want monit to start a process "CAD" once a file "product_id" is ready. My config is below:
check file product_id with path /etc/platform/product_id
if does not exist then alert
check process cad with pidfile /var/run/cad.pid
depends on product_id
start = "/bin/sh -c 'cd /home/root/cad/scripts;./run-cad.sh 2>&1 | logger -t CAD'" with timeout 120 seconds
stop = "/bin/sh -c 'cd /home/root/cad/scripts;./stop-cad.sh 2>&1 | logger -t CAD'"
I expected monit to hold off calling "start" until the file is available. Instead, it seems to restart the process (stop and start) every cycle.
Is anything configured wrong here?
I'd appreciate any help.
It restarts every cycle because the product_id file is not ready. Anything that depends on product_id is restarted whenever that check fails.
I would suggest writing a script that checks for the existence of product_id and starts CAD if it's there. You could then run this script from a "check program" block in monit.
This is how I do it:
check program ThisIsMyProgram with path "/home/user/program_check.sh"
every 30 cycles
if status == 1 then alert
This runs the shell script and raises an alert if its exit status is 1.
Shell script:
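#!/bin/bash
FILE=/path/to/file/that/needs/to/exist.json
PID=$(sudo pidof ThisIsMyProgram)
# Only act once the file exists and is non-empty
if [ -s "$FILE" ]; then
    if [ -n "$PID" ]; then
        exit 0    # already running
    else
        # Start the service, discarding both stdout and stderr
        sudo service thisismyprogram start > /dev/null 2>&1
        exit 1    # report that a start was needed
    fi
else
    exit 0    # file not ready yet, nothing to do
fi
The shell script checks whether the file exists and is non-empty. If it does, the script starts the process whenever it is not already running, which keeps it running.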
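Adapted to the CAD setup from the question, the same idea might look like the sketch below. The function name cad_check and its three parameters are made up for illustration; it relies on the pidfile from the question rather than pidof, and it assumes the start script can be called directly by its path.

```shell
#!/bin/sh
# Hypothetical check for a monit "check program" block.
# Returns 0 when there is nothing to report (file not ready, or CAD running)
# and 1 when CAD was down and a start was attempted.
cad_check() {
    ready_file=$1   # e.g. /etc/platform/product_id
    pid_file=$2     # e.g. /var/run/cad.pid
    start_cmd=$3    # e.g. /home/root/cad/scripts/run-cad.sh

    # File missing or empty: not ready yet, so do nothing
    [ -s "$ready_file" ] || return 0

    # pidfile present and that process is alive: nothing to do
    if [ -s "$pid_file" ] && kill -0 "$(cat "$pid_file")" 2>/dev/null; then
        return 0
    fi

    # File is ready but CAD is not running: start it and report it
    "$start_cmd" >/dev/null 2>&1
    return 1
}

# In the real script the last line would be something like:
#   cad_check /etc/platform/product_id /var/run/cad.pid /home/root/cad/scripts/run-cad.sh
```

Because the script itself decides when to start CAD, the process no longer needs a "depends on product_id" line, which is what triggered the restart on every cycle.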

cannot kill process in FreeBSD

I have a script on FreeBSD 10.1-RELEASE. Its purpose is to monitor another process and keep that process alive.
When I try to kill the script itself, it always fails.
I tried killall [name | pid], pkill -9 [name] and service watchtas stop; none of them work.
Below is my script. Please advise a solution.
#!/bin/sh
. /etc/rc.subr
prog="Thin-Agent WatchDog"
TAS_BIN="/etc/supermicro/tas-freebsd.x86_64"
TAS_LOG="/etc/supermicro/tas_system_crush.log"
monitor=1
name="watchtas"
rcvar=${name}_enable
command=/etc/rc.d/${name}
start_cmd="watchdog"
stop_cmd="stop_watching"
load_rc_config $name
recover_tas() {
$TAS_BIN -agent start-service
RETVAL=$?
return $RETVAL
}
stop_watching() {
monitor=0
}
watchdog() {
while [ $monitor == 1 ]
do
tas_count=`ps -x | grep tas-freebsd.x86_64 | grep -v grep | wc -l | sed 's/ *//g'`
if [ $tas_count -eq 0 ]; then
timestamp=`date`
echo "[$timestamp]TAS shutdown unexpectedly, restarting TAS now..." >> $TAS_LOG
echo $?
recover_tas
else
sleep 10
fi
done
}
run_rc_command "$1"
Your start-up script fails in a couple of respects. service watchtas start does not return to the command line because the daemon process does not detach. service watchtas stop does not work as required because the variable monitor is local to the executing script: the stop command runs in a new process and sets its own copy of monitor, which the loop already running never sees.
I would separate the start-up script and the watchdog code into separate files and use daemon(8) to monitor the watchdog.
The /usr/local/etc/rc.d start-up script would look like this:
#!/bin/sh
. /etc/rc.subr
name="watchtas"
rcvar=${name}_enable
pidfile="/var/run/${name}.pid"
command="/usr/sbin/daemon"
command_args="-c -f -P ${pidfile} -r /usr/local/sbin/${name}"
load_rc_config $name
run_rc_command "$1"
The /usr/local/sbin/watchtas watchdog code would look something like this:
#!/bin/sh
TAS_BIN="/etc/supermicro/tas-freebsd.x86_64"
TAS_LOG="/etc/supermicro/tas_system_crush.log"
recover_tas() {
$TAS_BIN -agent start-service
RETVAL=$?
return $RETVAL
}
while true
do
tas_count=`ps -x | grep tas-freebsd.x86_64 | grep -v grep | wc -l | sed 's/ *//g'`
if [ $tas_count -eq 0 ]; then
timestamp=`date`
echo "[$timestamp]TAS shutdown unexpectedly, restarting TAS now..." >> $TAS_LOG
echo $?
recover_tas
else
sleep 10
fi
done
It seems you have a daemon watching a daemon watching a daemon.

glassfish start script fails through crontab

I have created a script to check whether my glassfish server is running (it is installed on a FreeBSD system). If it isn't, the script attempts to kill the java process to ensure it's not hung, and then issues the asadmin start-domain command.
If this script runs from the command line, it is successful 100% of the time. When it is run from the crontab, every line runs except the asadmin start-domain line. That line does not seem to execute, or at least does not complete, i.e. the server is not running after the script runs.
For anyone not familiar with glassfish or the asadmin utility used to start the server, it is my understanding that a forked process is used. Could this be causing a problem via cron?
Again, in all my tests today, the script runs to completion when run from the command line. Once it's executed through cron, it does not complete. What would be different when running this from the crontab?
Thanks in advance for any help... I'm pulling my hair out trying to make this work!
#!/bin/bash
JAVA_HOME=/usr/local/diablo-jdk1.6.0/; export JAVA_HOME
timevar=`date +%d-%m-%Y_%H.%M.%S`
process_name='java'
get_contents=`cat urls.txt`
for i in $get_contents
do
echo checking $i
statuscode=$(curl --connect-timeout 10 --write-out %{http_code} --silent --output /dev/null $i)
case $statuscode in
200)
echo "$timevar $i $statuscode okay" >> /usr/home/user1/logfile.txt
;;
*)
echo "$timevar $i $statuscode bad" >> /usr/home/user1/logfile.txt
echo "Status $statuscode found" | mail -s "Check of $i failed" some.address@gmail.com
process_id=`ps acx | grep -i $process_name | awk {'print $1'}`
if [ -z "$process_id" ]
then
echo "java wasn't found in the process list"
else
echo "Killing java, currently process $process_id"
kill -9 $process_id
fi
/usr/home/user1/glassfish3/bin/asadmin start-domain domain1
;;
esac
done
Also, just for completeness, here is the entry in the crontab:
*/2 * * * * /usr/home/user1/server.check.sh >> /usr/home/user1/cron.log
OK... I found the answer to this on another site, but I thought I'd add it here for future reference.
The problem was the PATH! Even though JAVA_HOME was set, java itself wasn't on the PATH for the cron daemon.
For a quick test of the environment available to your cron jobs, add this line to the crontab:
*/2 * * * * env > /usr/home/user1/env.output
From what I can gather, the PATH initially available to cron is pretty minimal. Since java was in /usr/local/bin, I added that to the PATH right in the crontab and kaboom! It worked!
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
*/2 * * * * /usr/home/user1/server.check.sh >> /usr/home/user1/cron.log
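An alternative that keeps the fix with the script rather than the crontab is to set PATH at the top of the script and fail loudly when a needed command is missing. The following is only a sketch; the function name require_cmds is made up for illustration, and the PATH value is the one from the crontab above.

```shell
#!/bin/sh
# Cron starts jobs with a minimal PATH, so set one explicitly.
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PATH

# Hypothetical helper: report every required command that cannot be
# found on the current PATH. Returns 1 if any is missing.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing from PATH ($PATH): $cmd" >&2
            missing=1
        fi
    done
    return $missing
}
```

Near the top of server.check.sh, a line such as `require_cmds java curl || exit 1` would then leave a clear message on stderr instead of a silent failure. Note that the crontab entry above only redirects stdout to cron.log, so the message would arrive by cron mail unless 2>&1 is appended to the entry.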