monit insists on timing out a program that's running fine

I'm having a problem monitoring a program using monit.
I'm running this on a Raspberry Pi, having built monit 5.11 from source; I tried the version from the repositories, but it was 5.4 and didn't support some of the syntax below that I want.
I'm trying to follow the "Q: I have a program that does not create its own pid file. Since monit requires all programs to have a pid file, what do I do?" entry in the FAQ.
Here's my start_sensors.sh script (which just runs my python program, instead of the java program in the wiki example):
#!/bin/bash
case $1 in
start)
echo $$ > /var/run/start_sensors.pid;
exec 2>&1 /usr/bin/python /home/pi/temperature/post_temps.py 1>/tmp/post_temps.out
;;
stop)
kill `cat /var/run/start_sensors.pid` ;;
*)
echo "usage: start_sensors {start|stop}" ;;
esac
exit 0
Here's my /etc/monit/monitrc entry:
# Run temperature sensor monitor
check process start_sensors.sh with pidfile /var/run/start_sensors.pid
start = "/home/pi/temperature/start_sensors.sh start"
stop = "/home/pi/temperature/start_sensors.sh stop"
The output in the monit log looks like:
[EST Jan 24 14:21:16] info : 'raspberrypi' Monit reloaded
[EST Jan 24 14:21:16] error : 'start_sensors.sh' process is not running
[EST Jan 24 14:21:16] info : 'start_sensors.sh' trying to restart
[EST Jan 24 14:21:16] info : 'start_sensors.sh' start: /home/pi/temperature/start_sensors.sh
[EST Jan 24 14:21:46] error : 'start_sensors.sh' failed to start (exit status -1) -- Program /home/pi/temperature/start_sensors.sh timed out
So as you can see, monit starts up the program, it runs fine, and then monit kills it thirty seconds later due to the "timeout".
My program is running fine, and producing the proper output that I'm sending to the /tmp/post_temps.out file.
I don't understand why monit is timing the program out... it's supposed to be a long-running process!
I've tried changing the start_sensors.sh script so that it puts the program in the background (and has it write its own /var/run/start_sensors.pid file), but then monit starts a new instance up every thirty seconds or so, not stopping the old ones, and writing over the pid file. It's like it's not even looking at the pid file.
THANKS!

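The following works. It runs the program in the background with & and writes $! (the PID of that background job) to the pid file, so the start script returns immediately: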
#!/bin/bash
case $1 in
    start)
        # Run the program in the background so this script returns immediately
        /usr/bin/python /home/pi/temperature/post_temps.py 1>/tmp/post_temps.out &
        # $! is the PID of the background job just started
        echo $! > /var/run/start_sensors.pid
        ;;
    stop)
        kill "$(cat /var/run/start_sensors.pid)"
        ;;
    *)
        echo "usage: start_sensors {start|stop}"
        ;;
esac
exit 0
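The same start/stop pattern can be sketched as two small functions. This is only an illustration: the names start_daemon and stop_daemon and the default PIDFILE path are hypothetical, and output is discarded here rather than written to /tmp/post_temps.out.

```shell
#!/bin/bash
# PIDFILE is a placeholder; the script above uses /var/run/start_sensors.pid.
PIDFILE=${PIDFILE:-/tmp/start_sensors_demo.pid}

start_daemon() {
    # Launch the given command in the background so the caller is not blocked.
    "$@" >/dev/null 2>&1 &
    # $! is the PID of the background job, i.e. the long-running program itself.
    echo $! > "$PIDFILE"
}

stop_daemon() {
    # Nothing to do if the pid file is missing.
    [ -f "$PIDFILE" ] || return 0
    # Ignore the error if the process has already gone away.
    kill "$(cat "$PIDFILE")" 2>/dev/null || true
    rm -f "$PIDFILE"
}
```

It could be used as start_daemon /usr/bin/python /home/pi/temperature/post_temps.py from the start) branch and stop_daemon from the stop) branch.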

Related

Using tail to monitor an active logging file

I'm running multiple 'shred' commands on multiple hard drives in a workstation. The 'shred' commands are all run in the background so that they run concurrently. The output of each 'shred' is redirected to a text file, and I also have the output directed to the terminal. I'm using tail to monitor the log file for errors and halt the script if any are encountered. If there are no errors, the script should simply continue on to its conclusion.
When I test it by forcing a drive failure (disconnecting a drive), it detects the I/O errors and the script halts as expected. The problem I'm having is that when there are NO errors, I cannot get 'tail' to terminate once the 'shred' commands have completed, and the script just hangs at that point.
Since I put the 'tail' command in the 'while' loop below, I would have thought that 'tail' would continue to run as long as the 'shred' processes were running, but would then halt after the 'shred' processes stopped, thus ending the 'while' loop. But that hasn't been the case. The script still hangs even after the 'shred' processes have ended. If I go to another terminal window while the script is "hanging" and kill the 'tail' process, the script continues as normal. Any ideas how to get the 'tail' process to end when the 'shred' processes are gone?
My code:
shred -n 3 -vz /dev/sda 2>&1 | tee -a logfile &
shred -n 3 -vz /dev/sdb 2>&1 | tee -a logfile &
shred -n 3 -vz /dev/sdc 2>&1 | tee -a logfile &
pids=$(pgrep shred)
while kill -0 $pids 2> /dev/null; do
tail -qn0 -f logfile | \
read LINE
echo "$LINE" | grep -q "error"
if [ $? = 0 ]; then
killall shred > /dev/null 2>&1
echo "Error encountered. Halting."
exit
fi
done
wait $pids
There is other code after the 'wait' that does other stuff, but this is where the script is hanging.
Not directly related to the question, but you can use Daggy - Data Aggregation Utility.
In that case, all subprocesses end together with the main daggy process.
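One possible way around the hang, offered as a sketch and not as a tested fix for the script in the question: tail -f never exits on its own, so instead of piping it into read, the log can be polled with grep for as long as any of the watched processes is alive. The function name watch_log and the one-second polling interval are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: poll a log file while any of the given PIDs is alive.
# Returns 1 as soon as the log contains "error",
# and 0 once all PIDs have exited without an error being logged.
watch_log() {
    local logfile=$1
    shift
    local pid alive
    while :; do
        if grep -qi "error" "$logfile" 2>/dev/null; then
            return 1
        fi
        alive=0
        for pid in "$@"; do
            # kill -0 sends no signal; it only tests whether the PID exists.
            if kill -0 "$pid" 2>/dev/null; then
                alive=1
            fi
        done
        [ "$alive" -eq 1 ] || break
        sleep 1
    done
    # Final check for an error written just before the last process exited.
    if grep -qi "error" "$logfile" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

In the script from the question it might be called as watch_log logfile $pids, running killall shred and exiting when it returns non-zero.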

Monit wait for a file to start a process?

Basically, I want monit to start a process "CAD" once a file "product_id" is ready. My config is as below:
check file product_id with path /etc/platform/product_id
if does not exist then alert
check process cad with pidfile /var/run/cad.pid
depends on product_id
start = "/bin/sh -c 'cd /home/root/cad/scripts;./run-cad.sh 2>&1 | logger -t CAD'" with timeout 120 seconds
stop = "/bin/sh -c 'cd /home/root/cad/scripts;./stop-cad.sh 2>&1 | logger -t CAD'"
I expected monit to hold off calling "start" until the file is available, but it seems to restart the process (stop and start) every cycle.
Is there anything configured wrong here?
Appreciate any help.
The reason it's restarting every cycle is because the product_id file is not ready. Anything that depends on product_id will be restarted if the check fails.
I would suggest writing a script that checks for the existence of product_id and starts CAD if it's there. You could then run this script from a "check program" block in monit.
This is how I do it:
check program ThisIsMyProgram with path "/home/user/program_check.sh"
every 30 cycles
if status == 1 then alert
This runs the shell script and raises an alert if its exit status is 1.
Shell script:
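#!/bin/bash
FILE=/path/to/file/that/needs/to/exist.json
PID=$(sudo pidof ThisIsMyProgram)
# Only act once the file exists and is non-empty
if [ -s "$FILE" ]; then
    if [ -n "$PID" ]; then
        # Already running: report success
        exit 0
    else
        # Not running: start it and exit 1 so that monit alerts
        sudo service thisismyprogram start > /dev/null 2>&1
        exit 1
    fi
else
    # File not ready yet: nothing to do
    exit 0
fi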
The shell script checks whether the file exists; if it does, it starts the process when it is not already running, which keeps it running across checks.

Can't terminate a process that is launched at bootup, using the at daemon

I have a fooinit.rt process that is launched at boot (from /etc/init.d/boot.local).
Here is the boot.local file:
...
/bin/fooinit.rt &
...
I schedule an at job to kill fooinit.rt; the job is triggered from C code.
I also wrote a stop script that is meant to run kill -9 on the pid of fooinit.rt.
Here is the stop script:
#!/bin/sh
proc_file="/tmp/gdg_list$$"
ps -ef | grep $USER > $proc_file
echo "Stop script is invoked!!"
suff=".rt"
pid=`fgrep "$suff" $proc_file | awk '{print $2}'`
echo "pid is '$pid'"
rm $proc_file
When the at job timer expires, the 'kill -9 pid' (of fooinit.rt) command cannot terminate the fooinit.rt process!
I checked that the printed pid number and the message "Stop script is invoked!!" are OK.
Here is the "at" job command in the C code (I verified that the stop script is called 1 minute later):
...
case 708: /* There is a trigger signal here*/
{
result = APP_RES_PRG_OK;
system("echo '/sbin/stop' | at now + 1 min");
}
...
On the other hand, it works properly when fooinit.rt is launched manually from a shell as an ordinary command (not from /etc/init.d/boot.local): kill -9 works and terminates the fooinit.rt process.
Do you have any idea why kill -9 cannot terminate the fooinit.rt process when it is launched from /etc/init.d/boot.local?
Your solution is built around a race condition. There is no guarantee it will kill the right process (an unknowable amount of time can pass between the ps call and the attempt to make use of the pid), plus it's also vulnerable to a tmp exploit: someone could create a few thousand symlinks under /tmp called "gdg_list[1-32767]" that point to /etc/shadow and your script would overwrite /etc/shadow if it runs as root.
Another potential problem is the setting of $USER -- have you made sure it's correct? Your at job will be called as the user your C program runs as, which may not be the same user your fooinit.rt runs as.
Also, your script doesn't include a kill command at all.
A much cleaner way of doing this would be to run your fooinit.rt under some process supervisor like runit and use runit to shut it down when it's no longer needed. That avoids the pid bingo as well as the /tmp attack vector.
But even using pkill -u username -f fooinit.rt would be less racy than the script you provided.

Run a php script in background on debian (Apache)

I'm trying to get push notifications working on my Debian VPS (apache2, mysql).
I use a PHP script from this tutorial (http://www.raywenderlich.com/3525/apple-push-notification-services-tutorial-part-2).
Basically, the script runs in an infinite loop that checks a MySQL table for new records every couple of seconds. The tutorial says it should be run as a background process.
// This script should be run as a background process on the server. It checks
// every few seconds for new messages in the database table push_queue and
// sends them to the Apple Push Notification Service.
//
// Usage: php push.php development &
So I have four questions.
1 - How do I start the script from the terminal? What should I type? The script location on the server is:
/var/www/development_folder/scripts/push2/push.php
2 - How can I kill it if I need to (without having to restart Apache)?
3 - Since the push notifications are essential, I need a way to check whether the script is running.
The code (from the tutorial) calls a function if something goes wrong:
function fatalError($message)
{
writeToLog('Exiting with fatal error: ' . $message);
exit;
}
Maybe I can put something in there to restart the script? But it would also be nice to have a cron job or something similar that checks every 5 minutes or so whether the script is running, and starts it if it isn't.
4 - Can I make the script start automatically after an Apache or MySQL restart, for example if the server crashes or something else happens that requires an Apache restart?
Thanks a lot in advance
1. You could run the script with the following command:
nohup php /var/www/development_folder/scripts/push2/push.php > /dev/null &
The nohup means that the command should not quit when you, for example, close your terminal window (it ignores the hangup signal). If you don't care about this, you could just start the process with "php /var/www/development_folder/scripts/push2/push.php &" instead. Note that nohup logs the script output to a file called nohup.out by default; if you do not want this, just add > /dev/null as I've done here. The & at the end means that the process will run in the background.
I would only recommend starting the push script like this while you test your code. The script should be run as a daemon at system-startup instead (see 4.) if it's important that it runs all the time.
2. Just type
ps ax | grep push.php
and you will get the process ID (pid). It will look something like this:
4530 pts/3 S 0:00 php /var/www/development_folder/scripts/push2/push.php
The pid is the first number you'll see. You can then run the following command to kill the script:
kill -9 4530
If you run ps ax | grep push.php again the process should now be gone.
3. I would recommend that you make a cron job that checks whether the PHP script is running and, if not, starts it. You could do this with ps ax and grep checks inside a shell script. Something like this should do it:
if ! ps ax | grep -v grep | grep 'push.php' > /dev/null
then
nohup php /var/www/development_folder/scripts/push2/push.php > /dev/null &
else
echo "push-script is already running"
fi
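The ps/grep check above matches any process whose command line contains push.php, which could also be an editor with the file open. A pid-file variant is sketched below under stated assumptions: the function name ensure_running and the default PIDFILE path are hypothetical, and a stale pid file whose number has since been reused by another process would still fool it.

```shell
#!/bin/bash
# Sketch of a cron-friendly watchdog. PIDFILE is a placeholder path;
# the command to keep alive is passed as arguments.
PIDFILE=${PIDFILE:-/tmp/push_demo.pid}

ensure_running() {
    # If the pid file names a live process, there is nothing to do.
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "already running"
        return 0
    fi
    # Otherwise start the command detached from the terminal and record its PID.
    # nohup replaces itself with the command, so $! is the command's own PID.
    nohup "$@" >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
    echo "started"
}
```

A wrapper script that ends with ensure_running php /var/www/development_folder/scripts/push2/push.php development could then be scheduled with a crontab entry such as */5 * * * * /path/to/check_push.sh, where the script path is only an example.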
4. If you want the script to start up after booting the system, you could create a file in /etc/init.d (e.g. /etc/init.d/mypushscript) with something like this inside:
php /var/www/development_folder/scripts/push2/push.php
(You should probably have a lot more in this file.)
You would also need to run the following commands:
chmod +x /etc/init.d/mypushscript
update-rc.d mypushscript defaults
to make the script start at boot-time. I have not tested this so please do more research before making your own init script!

How to properly manage rabbitmq with supervisord

The current section in my supervisord.conf looks like:
[program:rabbitmq]
command=/usr/sbin/rabbitmq-server
When I try to stop the rabbitmq with supervisord ( supervisorctl stop rabbitmq), the rabbitmq processes simply do not shut down. The rabbitmq documentation also mentions to never use kill but rather use rabbitmqctl stop . I'm guessing supervisord simply kills the processes - hence the poor results with rabbitmq. I couldn't find any options in supervisord to specify a custom stop command.
Do you have any recommendations?
My solution is to write a wrapper script named rabbitmq.sh as follows:
# call "rabbitmqctl stop" when exiting
trap "{ echo Stopping rabbitmq; rabbitmqctl stop; exit 0; }" EXIT
echo Starting rabbitmq
rabbitmq-server
After that, modify supervisord.conf:
[program:rabbitmq]
command=path/to/rabbitmq.sh
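A variation on this wrapper, as a sketch only: supervisord sends SIGTERM by default when it stops a program, so the wrapper can trap TERM (and INT) explicitly instead of EXIT, run the server in the background, and wait on it. The function name run_with_clean_stop and its two arguments are hypothetical; for RabbitMQ they would presumably be rabbitmq-server and a function that calls rabbitmqctl stop.

```shell
#!/bin/bash
# Sketch: run a server in the background and stop it cleanly on SIGTERM/SIGINT.
# $1 is the server command line, $2 is the name of a function or command
# that asks the server to shut down (both are placeholders).
run_with_clean_stop() {
    server_cmd=$1
    stop_hook=$2

    # Unquoted on purpose so that "cmd arg" is split into command and argument.
    $server_cmd &
    server_pid=$!

    # On TERM/INT: call the stop hook, wait for the server to exit, then leave.
    trap '"$stop_hook"; wait "$server_pid" 2>/dev/null; exit 0' TERM INT

    # wait returns early when a trapped signal arrives, so the handler runs promptly.
    wait "$server_pid"
}
```

Running the server in the background matters here because a shell does not run a trap handler until the foreground command it is waiting for has finished.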
You have answered your own question. In normal operation, never use kill on any process unless that is the documented normal way of managing it. In the case of RabbitMQ, the documented process is to use rabbitmqctl stop or to use rabbitmqserver stop.
There is no good reason to manage RabbitMQ with anything more sophisticated than a shell script that makes one attempt to restart via rabbitmqserver start. If that doesn't work right away, then RabbitMQ is down hard due to something like a lack of RAM, a full disk, or a rogue system management tool having deleted some of the RabbitMQ binary components.
In normal operation RabbitMQ has an internal supervisor that will attempt to shutdown and restart RabbitMQ, so if you delete binaries, it will fail to restart. When using tools like chef, puppet, cfengine, don't repeatedly push out binary package files. Just check that everything is there as it should be.
This script starts RabbitMQ as a background process (using '&') which causes a pid file to be updated/created (see 'wait' under http://www.rabbitmq.com/man/rabbitmqctl.1.man.html).
After rabbit has been started, a loop is used to verify that the pid is still running. If rabbit crashes or is manually shut down (outside of supervisord) then the script will exit with 1 and supervisord takes over.
The echo >> ./rmq.txt file is there for debugging purposes and can be commented out in production (I used this to monitor the startup/shutdown/died status).
supervisord is happy because it can see a running process and an EXIT will trigger the stop_rmq function which calls 'rabbitmqctl stop' for a clean shutdown.
#!/bin/bash
# Script to manage RMQ with supervisord
# Shut down rmq
function stop_rmq {
    echo "Stopping RabbitMQ..."
    echo "Stopping RabbitMQ..." >> ./rmq.txt
    rabbitmqctl stop
    echo "RabbitMQ stopped"
    echo "RabbitMQ stopped" >> ./rmq.txt
    #exit 0
}
# Set up the trap
#trap stop_rabbit TERM KILL HUP INT SIGTERM SIGKILL SIGHUP SIGINT
trap stop_rmq exit
# Start rmq
echo "Starting RabbitMQ..."
echo "Starting RabbitMQ..." >> ./rmq.txt
# Start Rabbitmq in the background (causes the pid file to be updated)
# Note that the pid file location can be overridden with the rmq 'RABBITMQ_PID_FILE' variable
/usr/sbin/rabbitmq-server &
rabbitmqctl wait /var/lib/rabbitmq/mnesia/rabbit@$HOSTNAME.pid
echo "RabbitMQ Started"
echo "RabbitMQ Started" >> ./rmq.txt
while true; do
    #ps $(cat /var/lib/rabbitmq/mnesia/rabbit@$HOSTNAME.pid)
    ps -o pid,cmd,etime $(cat /var/lib/rabbitmq/mnesia/rabbit@$HOSTNAME.pid)
    if (($? > 0)); then
        echo "RabbitMQ Died"
        echo "RabbitMQ Died" >> ./rmq.txt
        exit 1
    fi
    #echo "Sleeping..."
    sleep 10
done
Here's the output generated by the script to supervisord:
foo@bar:/# supervisorctl tail rmq
Starting RabbitMQ...
Waiting for rabbit@a2d2c8f9cad2 ...
pid is 45220 ...
RabbitMQ 3.3.5. Copyright (C) 2007-2014 GoPivotal, Inc.
## ## Licensed under the MPL. See http://www.rabbitmq.com/
## ##
########## Logs: /var/log/rabbitmq/rabbit@a2d2c8f9cad2.log
###### ## /var/log/rabbitmq/rabbit@a2d2c8f9cad2-sasl.log
##########
Starting broker... completed with 0 plugins.
...done.
RabbitMQ Started
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 00:05
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 00:15
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 00:25
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 00:35
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 00:45
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 00:55
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 01:05
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 01:15
PID CMD ELAPSED
45220 /usr/lib/erlang/erts-6.1/bi 01:25
I would advise you to use Monit (http://mmonit.com/); it is better suited to daemons such as RabbitMQ and is also feature-rich.
First of all, you must install the Monit package. If you are on Ubuntu/Debian:
sudo apt-get update
sudo apt-get install monit
Afterwards, you must create a configuration file.
Here is a sample to get you running (place it in /etc/monit/conf.d/):
set daemon 1800
set logfile /var/log/monit.log
check process rabbit with pidfile /var/run/rabbitmq/pid
start program = "/etc/init.d/rabbitmq-server start"
stop program = "/etc/init.d/rabbitmq-server stop"
noalert foo@bar
Then just restart monit and you are finished:
sudo /etc/init.d/monit restart