monit call exec on process recovery - monit

What I would like to do, is the following:
if process-x fails (to (re)start) then execute cmd-x
if it recovers then execute cmd-y
For the alerting via E-mail, a notificaton is sent per default on recovery. For the exec method however, I can not find a way to make this work. If I try this in the monitrc:
check process proc_x with pidfile /var/run/proc_x.pid
start program = "/bin/sh -c '/etc/init.d/Sxxproc_x start'"
stop program = "/bin/sh -c '/etc/init.d/Sxxproc_x stop'"
if 3 restarts within 5 cycles then exec "<some error cmd>"
else if succeeded then exec "<some restore cmd>"
this results in a "syntax error 'else'". If I remove the else line, the error command is called as expected. Apparently, the 'else' can not be used for the restarts test. But how can I add to execute a command is program starting succeeds or recovers?

I found a solution thanks to the answer to this topic:
get monit to alert first and restart later
The "if not exist for ..." with corresponding "else" did the trick for me to report the recover. The error report is separate. My monitrc code now:
check process proc_x with pidfile /var/run/proc_x.pid
start program = "/bin/sh -c '/etc/init.d/Sxxproc_x start'"
stop program = "/bin/sh -c '/etc/init.d/Sxxproc_x stop'"
if 1 restart within 1 cycle then exec "<some error cmd>"
repeat every 1 cycle
if not exist for 3 cycles then restart
else if succeeded 2 times within 2 cycles then exec "<some restore cmd>"

Related

Sun Grid Engine resubmit job stuck in 'Rq' state

I have what I hope is a pretty simple question, but I'm not super familiar with Sun Grid, so I've been having trouble finding the answer. I am currently submitting jobs to a grid using a bash submission script that generates a command and then executes it. I have read online that if a sun grid job exits with a code of 99, it gets re-submitted to the grid. I have successfully written my bash script to do this:
[code to generate command, stores in $command]
$command
STATUS=$?
if [[ $STATUS -ne 0 ]]; then
exit 99
fi
exit 0
When I submit this job to the grid with a command that I know has a non-zero exit status, the job does indeed appear to be resubmitted, however the scheduler never sends it to another host, instead it just remains stuck in the queue with the status "Rq":
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
2150015 0.55500 GridJob.sh my_user Rq 04/08/2013 17:49:00 1
I have a feeling that this is something simple in the config options for the queue, but I haven't been able to find anything googling. I've tried submitting this job with the qsub -r y option, but that doesn't seem to change anything.
Thanks!
Rescheduled jobs will only get run in queues that have their rerun attribute (FALSE by default) set to TRUE, so check your queue configuration (qconf -mq myqueue). Without this, your job remains in the rescheduled-pending state indefinitely because it has nowhere to go.
IIRC, submitting jobs with qsub -r yes only qualifies them for automatic rescheduling in the event of an exec node crash, and that exiting with status 99 should trigger a reschedule regardless.

Query remains active while calling a batch file from stored procedure in SQL Server2008

I created the following stored procedure:
CREATE PROCEDURE [dbo].[_StartWebcamStream]
AS
BEGIN
declare #command varchar(200)
set #command = 'C:\startStream.bat'
exec master..xp_cmdshell #command
END
for executing a the batch file startStream.bat. This batch contain the following code:
"C:\Program Files (x86)\VideoLAN\VLC\vlc.exe" -I dummy -vvv rtsp://mycamaddress/live_mpeg4.sdp --network-caching=4096 --sout=#transcode{vcodec=mp4v,fps=15,vb=512,scale=1,height=240,width=320,acodec=mp4a,ab=128,channels=2}:duplicate{dst=http{mux=asf,dst=:11345/},dst=display} :sout-keep}
The batch file is launched correctly, but the query continues to run until the vlc is stopped.
What can I do for stopping the query letting the vlc running?
DISCLAIMER: Don't run these examples in production!
SQL Server watches all processes spawned by xp_cmdshell and waits for them to finish. to see that you can just run this statement:
EXEC xp_cmdshell 'cmd /C "start cmd /K dir"'
This starts a command shell that in-turn starts another command shell to execute the dir command. The first shell terminates right away after spawning the second (/C switch) while the second executes the "dir" and then does not terminate (/K switch). Because of that second lingering process, even though the process that SQL Server started directly is gone, the query won't return. You cannot even cancel it. Even if you close the window in SSMS, the request continues to run. You can check that with
SELECT r.session_id, t.text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
The NO_OUTPUT parameter does not help either:
EXEC xp_cmdshell 'cmd /C "start cmd /K dir"', NO_OUTPUT;
You will see the same "sticky" behavior.
The only way to get rid of these is to restart the computer or manually kill the processes (requires taskmanager to be executed as administrator). Restarting the SQL Server service does not stop the spawned processes.
As a solution you can use the SQL Agent. It has a "Operating System (CmdExec)" step type that you can use to run your program. You can create the job and then start it using sp_start_job.
Either way, your process actually needs to finish at some point. Otherwise you will create a pile of process-"bodies" that will cause performance problems in the long run.

Bad spawn_id while executing expect command

I am writing a script that will copy Valgrind onto whatever shelf that we enter on the command line. The syntax is as follows:
vgrindCopy [shelf number]
For some reason, the files will copy over without any issue, but after the copy completes the follow error is observed:
bad spawn_id (process died earlier?)
while executing
"expect "#""
Here is a copy of the relevant code:
function login_shelf {
expect -c "
set timeout 15
spawn $1
expect \"password:\"
send \"$PW\r\"
expect \"#\"
sleep 1
exit
"
}
# login and make the valgrind directory at /sfs/software/shelf/current
set -- /opt/swe/tools/ext/gnu/valgrind-3.7.0/i686-linux2.6/lib/valgrind/*
login_shelf "/opt/corp/projects/shelftools/bin/app rsync -Lau $* $shelf:/shelf/valgrind"
After playing around with the code, I found that if I remove the line "expect \"#\"", then the program doesn't copy any of the files over anymore. What odd as well is that I'm seeing the issue when I run the script, but a co-worker is not.
Has anyone had a similar issue and determined the cause? Any help would be greatly appreciated as always!
Your code is spawning the rsync and at the expect \"#\" is waiting for rsync to output a #, which it never does, so it exits and expect reports the error.
When you remove the expect \"#\" the expect script exits, terminating the rsync.
Instead of expect \"#\" you should wait for rsync to exit:
expect eof
wait

Expect script does not work under crontab

I have an expect script which I need to run every 3 mins on my management node to collect tx/rx values for each port attached to DCX Brocade SAN Switch using the command #portperfshow#
Each time I try to use crontab to execute the script every 3 mins, the script does not work!
My expect script starts with #!/usr/bin/expect -f and I am calling the script using the following syntax under cron:
3 * * * * /usr/bin/expect -f /root/portsperfDCX1/collect-all.exp sanswitchhostname
However, when I execute the script (not under cron) it works as expected:
root# ./collect-all.exp sanswitchhostname
works just fine.
Please Please can someone help! Thanks.
The script collect-all.exp is:
#!/usr/bin/expect -f
#Time and Date
set day [timestamp -format %d%m%y]
set time [timestamp -format %H%M]
#logging
set LogDir1 "/FPerf/PortsLogs"
et timeout 5
set ipaddr [lrange $argv 0 0]
set passw "XXXXXXX"
if { $ipaddr == "" } {
puts "Usage: <script.exp> <ip address>\n"
exit 1
}
spawn ssh admin#$ipaddr
expect -re "password"
send "$passw\r"
expect -re "admin"
log_file "$LogDir1/$day-portsperfshow-$time"
send "portperfshow -tx -rx -t 10\r"
expect timeout "\n"
send \003
log_file
send -- "exit\r"
close
I had the same issue, except that my script was ending with
interact
Finally I got it working by replacing it with these two lines:
expect eof
exit
Changing interact to expect eof worked for me!
Needed to remove the exit part, because I had more statements in the bash script after the expect line (calling expect inside a bash script).
There are two key differences between a program that is run normally from a shell and a program that is run from cron:
Cron does not populate (many) environment variables. Notably absent are TERM, SHELL and HOME, but that's just a small proportion of the long list that will be not defined.
Cron does not set up a current terminal, so /dev/tty doesn't resolve to anything. (Note, programs spawned by Expect will have a current terminal.)
With high probability, any difficulties will come from these, especially the first. To fix, you need to save all your environment variables in an interactive session and use these in your expect script to repopulate the environment. The easiest way is to use this little expect script:
unset -nocomplain ::env(SSH_AUTH_SOCK) ;# This one is session-bound anyway
puts [list array set ::env [array get ::env]]
That will write out a single very long line which you want to put near the top of your script (or at least before the first spawn). Then see if that works.
Jobs run by cron are not considered login shells, and thus don't source your .bashrc, .bash_profile, etc.
If you want that behavior, you need to add it explicitly to the crontab entry like so:
$ crontab -l
0 13 * * * bash -c '. .bash_profile; etc ...'
$

cron script to act as a queue OR a queue for cron?

I'm betting that someone has already solved this and maybe I'm using the wrong search terms for google to tell me the answer, but here is my situation.
I have a script that I want to run, but I want it to run only when scheduled and only one at a time. (can't run the script simultaneously)
Now the sticky part is that say I have a table called "myhappyschedule" which has the data I need and the scheduled time. This table can have multiple scheduled times even at the same time, each one would run this script. So essentially I need a queue of each time the script fires and they all need to wait for each one before it to finish. (sometimes this can take just a minute for the script to execute sometimes its many many minutes)
What I'm thinking about doing is making a script that checks myhappyschedule every 5 min and gathers up those that are scheduled, puts them into a queue where another script can execute each 'job' or occurrence in the queue in order. Which all of this sounds messy.
To make this longer - I should say that I'm allowing users to schedule things in myhappyschedule and not edit crontab.
What can be done about this? File locks and scripts calling scripts?
add a column exec_status to myhappytable (maybe also time_started and time_finished, see pseudocode)
run the following cron script every x minutes
pseudocode of cron script:
[create/check pid lock (optional, but see "A potential pitfall" below)]
get number of rows from myhappytable where (exec_status == executing_now)
if it is > 0, exit
begin loop
get one row from myhappytable
where (exec_status == not_yet_run) and (scheduled_time <= now)
order by scheduled_time asc
if no such row, exit
set row exec_status to executing_now (maybe set time_started to now)
execute whatever command the row contains
set row exec_status to completed
(maybe also store the command output/return as well, set time_finished to now)
end loop
[delete pid lock file (complementary to the starting pid lock check)]
This way, the script first checks if none of the commands is running, then runs first not-yet run command, until there are no more commands to be run at the given moment. Also, you can see what command is executing by querying the database.
A potential pitfall: if the cron script is killed, a scheduled task will remain in "executing_now" state. That's what the pid lock at beginning and end is for: to see if the cron script terminated properly. pseudocode of create/check pidlock:
if exists pidlockfile then
check if process id given in file exists
if not exists then
update myhappytable set exec_status = error_cronscript_died_while_executing_this
where exec_status == executing_now
delete pidlockfile
else (previous instance still running)
exit
endif
endif
create pidlockfile containing cron script process id
You can use the at(1) command inside your script to schedule its next run. Before it exits, it can check myhappyschedule for the next run time. You don't need cron at all, really.
I came across this question while researching for a solution to the queuing problem. For the benefit of anyone else searching here is my solution.
Combine this with a cron that starts jobs as they are scheduled (even if they are scheduled to run at the same time) and that solves the problem you described as well.
Problem
At most one instance of the script should be running.
We want to cue up requests to process them as fast as possible.
ie. We need a pipeline to the script.
Solution:
Create a pipeline to any script. Done using a small bash script (further down).
The script can be called as
./pipeline "<any command and arguments go here>"
Example:
./pipeline sleep 10 &
./pipeline shabugabu &
./pipeline single_instance_script some arguments &
./pipeline single_instance_script some other_argumnts &
./pipeline "single_instance_script some yet_other_arguments > output.txt" &
..etc
The script creates a new named pipe for each command. So the above will create named pipes: sleep, shabugabu, and single_instance_script
In this case the initial call will start a reader and run single_instance_script with some arguments as arguments. Once the call completes, the reader will grab the next request off the pipe and execute with some other_arguments, complete, grab the next etc...
This script will block requesting processes so call it as a background job (& at the end) or as a detached process with at (at now <<< "./pipeline some_script")
#!/bin/bash -Eue
# Using command name as the pipeline name
pipeline=$(basename $(expr "$1" : '\(^[^[:space:]]*\)')).pipe
is_reader=false
function _pipeline_cleanup {
if $is_reader; then
rm -f $pipeline
fi
rm -f $pipeline.lock
exit
}
trap _pipeline_cleanup INT TERM EXIT
# Dispatch/initialization section, critical
lockfile $pipeline.lock
if [[ -p $pipeline ]]
then
echo "$*" > $pipeline
exit
fi
is_reader=true
mkfifo $pipeline
echo "$*" > $pipeline &
rm -f $pipeline.lock
# Reader section
while read command < $pipeline
do
echo "$(date) - Executing $command"
($command) &> /dev/null
done