Terminate NSTask even if app crashes - objective-c

If my app crashes I don't get a chance to terminate the NSTasks it spawned, so they stay around eating up resources.
Is there any way to launch a task such that it terminates when your app terminates (even if it crashes)?

You'll need to handle application crashes yourself and terminate the spawned processes from the crash handler. For example, see this article: http://cocoawithlove.com/2010/05/handling-unhandled-exceptions-and.html. In the exception/signal handler that runs when the application crashes, send a kill signal to your child processes using kill(pid, SIGKILL). For this you also need to keep the PIDs of the child processes (NSTask's -(int)processIdentifier) somewhere the exception/signal handler can reach them.
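For a rough idea, here is a minimal sketch in C (the PID table, signal list, and function names are my assumptions for illustration, not the article's code):

#include <signal.h>
#include <unistd.h>

#define MAX_CHILDREN 32
static pid_t child_pids[MAX_CHILDREN];   /* fill from -[NSTask processIdentifier] */
static int child_count;

static void crash_handler(int sig)
{
    for (int i = 0; i < child_count; i++)
        kill(child_pids[i], SIGKILL);    /* only async-signal-safe calls in here */
    signal(sig, SIG_DFL);                /* restore the default action... */
    raise(sig);                          /* ...and re-raise so the crash still reports */
}

void install_crash_handler(void)
{
    const int sigs[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT };
    for (unsigned i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        signal(sigs[i], crash_handler);
}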

What I've done in the past is create a pipe in the parent process and pass the write end of that pipe into the child. The parent never closes the read end, and the child watches its write end of the pipe: if that end ever becomes unwritable (EPIPE/POLLERR), the read end must have been closed, which means the parent exited. You'll also need to mark the parent's end of the pipe close-on-exec, so other children you spawn don't inherit it and keep the pipe open.
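Here's a minimal sketch of the child's side. Note it uses the mirrored arrangement — the child inherits the read end and blocks in read() until EOF — which is the same idea with the ends swapped; the fd number 3 is an assumption about how the descriptor gets passed down:

/* Child-side watchdog thread: blocks until the pipe reports EOF, which the
 * kernel delivers when the parent's end is closed -- whether the parent
 * exited cleanly or crashed. */
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

static void *watch_parent(void *arg)
{
    int fd = (int)(intptr_t)arg;   /* pipe end inherited from the parent */
    char c;
    while (read(fd, &c, 1) > 0)
        ;                          /* the parent never writes; ignore noise */
    _exit(0);                      /* EOF: the parent is gone, shut down */
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, watch_parent, (void *)(intptr_t)3 /* assumed fd */);
    /* ... the child's real work happens here ... */
    pause();
    return 0;
}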

I actually wrote a program/script that does just this. Here's the shell script that was the foundation of it. The project actually implements it within Xcode as a single-file executable. Weird that Apple makes this so precarious, IMO.
#!/bin/bash
echo "arg1 is the SubProcess: $1, arg2 is sleepytime: $2, and arg3 is ParentPID, aka $$: $3"
CHILD=$1; SLEEPYTIME=${2:-10}; PARENTPID=${3:-$$}

GoSubProcess () {                            # define functions; the script starts at the very end
    $CHILD &                                 # "&" puts the SubProcess in a background subshell
    CHILDPID=$!                              # what all the fuss is about
    if kill -0 $CHILDPID 2>/dev/null; then   # rock the cradle to make sure it ain't dead
        echo "Child is alive at $CHILDPID"   # glory be to god
    else
        echo "couldn't start child. dying."; exit 2
    fi
    babyRISEfromtheGRAVE                     # keep an eye on the child process
}

babyRISEfromtheGRAVE () {
    echo "PARENT is $PARENTPID"              # remember where you came from, like J.Lo
    while kill -0 $PARENTPID 2>/dev/null; do # is that fount of life, the NSTask parent, alive?
        echo "Parent is alive, $PARENTPID is its PID"
        sleep $SLEEPYTIME                    # you lazy boozehound
        if kill -0 $CHILDPID 2>/dev/null; then # check on baby
            echo "Child is $CHILDPID and is alive."
            sleep $SLEEPYTIME                # naptime!
        else
            echo "Baby, pid $CHILDPID died! Respawn!"
            GoSubProcess                     # restart daemon if it dies
        fi
    done                                     # if this while loop ends, the parent PID crashed
    logger "My Parent Process, aka $PARENTPID died!"
    logger "I'm killing my baby, $CHILDPID, and myself."
    kill -9 $CHILDPID; exit 1                # process table cleaned; all three tasks are dead. long live NSTask.
}

GoSubProcess                                 # this is where we start the script
exit 0                                       # this is where we never get to

You could have your tasks periodically check to see if their parent process still exists.
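For example (a sketch only; the polling interval is arbitrary, and it relies on orphaned processes being reparented to PID 1/launchd so that getppid() changes when the parent dies):

#include <unistd.h>

int main(void)
{
    pid_t original_parent = getppid();
    for (;;) {
        sleep(5);                          /* arbitrary polling interval */
        if (getppid() != original_parent)
            _exit(0);                      /* reparented: our spawner is gone */
        /* ... do a slice of the task's real work here ... */
    }
}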

Related

Why can't I kill an Erlang process?

I am spawning two processes and it seems I cannot kill either of them:
restarter - a process that spawns the worker whenever it goes down
worker - a process that gets messages from the shell, concatenates them, and returns them in the reason of an exit to the restarter, which in turn forwards them to the shell.
The worker process can't be killed, since the restarter would restart it on any trapped exit message. But what keeps the restarter process alive?
-module(mon).
-compile_flags([debug_info]).
-export([worker/1,init/0,restarter/2,clean/1]).

% ctrl+g
init() ->
    Pid = spawn(?MODULE, restarter, [self(), []]),
    register(restarter, Pid),
    Pid.

restarter(Shell, Queue) ->
    process_flag(trap_exit, true),
    Wk = spawn_link(?MODULE, worker, [Queue]),
    register(worker, Wk),
    receive
        {'EXIT', Pid, {Queue, normal}} ->
            Shell ! {Queue, "From res: worker died peacefully, wont restart"};
        {'EXIT', Pid, {Queue, horrible}} ->
            Shell ! {Queue, "Processed so far:"},
            Shell ! "will restart in 5 seconds, select fresh/stale -> 1/0",
            receive
                1 ->
                    Shell ! "Will restart fresh",
                    restarter(Shell, []);
                0 ->
                    Shell ! "Will continue work",
                    restarter(Shell, Queue)
            after 5000 ->
                Shell ! "No response -> started with 666",
                restarter(Shell, [666])
            end;
        {MSG} ->
            Shell ! {"Unknown message...closing", MSG}
    end.

worker(Queue) ->
    receive
        die -> exit({Queue, horrible});
        finish -> exit({Queue, normal});
        MSG -> worker([{time(), MSG} | Queue])
    end.
Usage
mon:init().
regs(). %worker and restarter are working
whereis(worker) ! "msg 1", whereis(worker) ! "msg2".
whereis(worker) ! finish.
flush(). % should get the first clause from restarter
regs(). % worker should be up and running again
exit(whereis(restarter),reason).
regs(). % restarter should be dead
In this scenario, the restarter process is trapping exits, so exit(whereis(restarter), reason) doesn't kill it. The exit signal gets converted to a message, and gets put into the message queue of the process:
> process_info(whereis(restarter), messages).
{messages,[{'EXIT',<0.76.0>,reason}]}
The reason it's still in the message queue is that none of the clauses in the receive expression matches this message. The first two clauses are specific to the exit reasons used by the worker process, and the last clause might look like a catch-all clause but it actually isn't - it matches any message that is a tuple with one element. If it were written MSG instead of {MSG}, it would have received the exit reason message, and sent "Unknown message" to the shell.
If you really want to kill the process, use the kill reason:
exit(whereis(restarter), kill).
A kill exit signal is untrappable, even if the process is trapping exits.
Another thing: the first two receive clauses will only match if the worker's queue is empty. That is because it reuses the variable name Queue, so the queue in {'EXIT',Pid,{Queue,normal}} must be equal to the value passed as an argument to the restarter function. In a situation like this, you'd normally use NewQueue or something as the variable in the receive clauses.
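For instance, here is a sketch of a receive with both fixes applied (wait_for_worker is a hypothetical wrapper, not the original function):

%% Sketch only: fresh variable + brace-less catch-all.
wait_for_worker(Shell) ->
    receive
        {'EXIT', _Pid, {NewQueue, normal}} ->
            %% NewQueue is unbound here, so any queue value matches
            Shell ! {NewQueue, "worker died peacefully, wont restart"};
        MSG ->
            %% no braces: a true catch-all, not just 1-tuples
            Shell ! {"Unknown message...closing", MSG}
    end.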

Monit false alerts

I am monitoring a Java daemon process via its PID. Below is the configuration.
check process SemanticReplication with pidfile "/ngs/app/edwt/opsmonit /monit/scripts/process.pid"
    start = "/ngs/app/edwt/scripts/javadaemon/start_daemon.ksh"
    stop = "/ngs/app/edwt/scripts/javadaemon/stop_daemon.ksh"
Many times, even though the Java daemon process is up and running, I get a false alert saying the process is not running.
In the next monit check cycle (after a minute), another alert triggers saying the process is up and running again.
Can someone help with how to avoid these false alerts?
Your check statement has monit look for the existence of the pid file (whose path looks weird with the space in it, btw). If the pid file isn't there, monit sends an alert by default and then runs the start directive.
I get around this by having a check process ... matching statement like so:
check process app-pass matching 'Passenger RubyApp: \/home\/app\/app-name\/public'
Essentially, "matching" does the equivalent of ps aux | grep ... which does a better job when I can't rely on a pid file existing, like with a child process.

Get output from EXE launched in Tcl and pause further processing until EXE finishes

I'm launching a single EXE from a Tcl script, and would like to get the output from the EXE and display it with a simple puts to provide user feedback. At the moment, I am launching the EXE in a CMD window where the user can see the progress, and waiting for the EXE to create a file. The first script here works whenever the output file LUC.CSV is created.
file delete -force luc.csv
set cmdStatus [open "| cmd.exe /c start /wait uc.exe"]
while {![file exists "luc.csv"]} {
}
# continue after output file is created
However, sometimes the file is not created, so I can't rely on this method.
I've been trying to get my head around the use of fileevent and pipes, and have tried several incarnations of the script below, but I'm obviously either missing the point or just not getting the syntax right.
puts "starting"
set fifo [open "| cmd.exe /c start uc.exe" r]
fconfigure $fifo -blocking 0
proc read_fifo {fifo} {
puts "calling read_fifo"
if {[gets $fifo x] < 0} {
if {[eof $fifo]} {
close $fifo
}
}
puts "x is $x"
}
fileevent $fifo readable [list read_fifo $fifo]
vwait forever
puts"finished"
Any help would be greatly appreciated!
If you just want to launch a subprocess and do nothing else until it finishes, Tcl's exec command is perfect.
exec cmd.exe /c start /wait uc.exe
(Since you're launching a GUI application via start, there won't be any meaningful result unless there's an error in launching. And in that case you'll get a catchable error.) Things only get complicated when you want to do several things at once.
To make your original code work, you need a way to detect that the subprocess has finished. Tcl is just vwaiting forever because your code says to do that; we need to put something in to make the wait finish. A good way is to make the wait be on the fifo variable, which can be unset after the pipe is closed, as it no longer contains anything useful. (vwait becomes eligible to return once the variable it is told about is either written to or destroyed; it uses a variable trace under the covers. It also won't actually return until the event handlers it is currently processing return.)
puts "starting"
# ***Assume*** that this code is in the global scope
set fifo [open "| cmd.exe /c start uc.exe" r]
fconfigure $fifo -blocking 0
proc read_fifo {} {
global fifo
puts "calling read_fifo"
if {[gets $fifo x] < 0} {
if {[eof $fifo]} {
close $fifo
unset fifo
}
}
puts "x is $x"
}
fileevent $fifo readable read_fifo
vwait fifo
puts "finished"
That ought to work. The lines that were changed were the declaration of read_fifo (no variable passed in), the adding of global fifo just below (because we want to work with that instead), the adding of unset fifo just after close $fifo, the setting up of the fileevent (don't pass an extra argument to the callback), and the vwait (because we want to wait for fifo, not forever).

Sun Grid Engine resubmit job stuck in 'Rq' state

I have what I hope is a pretty simple question, but I'm not super familiar with Sun Grid, so I've been having trouble finding the answer. I am currently submitting jobs to a grid using a bash submission script that generates a command and then executes it. I have read online that if a sun grid job exits with a code of 99, it gets re-submitted to the grid. I have successfully written my bash script to do this:
[code to generate the command, stored in $command]
$command
STATUS=$?
if [[ $STATUS -ne 0 ]]; then
exit 99
fi
exit 0
When I submit this job to the grid with a command that I know has a non-zero exit status, the job does indeed appear to be resubmitted; however, the scheduler never sends it to another host. Instead it just remains stuck in the queue with the status "Rq":
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
2150015 0.55500 GridJob.sh my_user Rq 04/08/2013 17:49:00 1
I have a feeling that this is something simple in the config options for the queue, but I haven't been able to find anything googling. I've tried submitting this job with the qsub -r y option, but that doesn't seem to change anything.
Thanks!
Rescheduled jobs will only get run in queues that have their rerun attribute (FALSE by default) set to TRUE, so check your queue configuration (qconf -mq myqueue). Without this, your job remains in the rescheduled-pending state indefinitely because it has nowhere to go.
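For example (a sketch; myqueue is a placeholder for your queue name):

qconf -sq myqueue | grep rerun   # show the current value (FALSE by default)
qconf -mq myqueue                # opens an editor; change the line to "rerun TRUE"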
IIRC, submitting jobs with qsub -r yes only qualifies them for automatic rescheduling in the event of an exec node crash, and exiting with status 99 should trigger a reschedule regardless.

cron script to act as a queue OR a queue for cron?

I'm betting that someone has already solved this, and maybe I'm using the wrong search terms for Google to tell me the answer, but here is my situation.
I have a script that I want to run, but only when scheduled and only one at a time (the script cannot run concurrently with itself).
Now the sticky part: say I have a table called "myhappyschedule" which holds the data I need and the scheduled times. The table can hold multiple scheduled times, even ones that coincide, and each one should trigger a run of the script. So essentially I need a queue of script runs, each waiting for the previous one to finish (sometimes the script takes just a minute to execute, sometimes many, many minutes).
What I'm thinking of doing is making a script that checks myhappyschedule every 5 minutes, gathers up the entries that are due, and puts them into a queue where another script executes each 'job' or occurrence in order. All of which sounds messy.
To make this longer: I should add that I'm letting users schedule things in myhappyschedule, not edit the crontab.
What can be done about this? File locks and scripts calling scripts?
add a column exec_status to myhappytable (maybe also time_started and time_finished, see pseudocode)
run the following cron script every x minutes
pseudocode of cron script:
[create/check pid lock (optional, but see "A potential pitfall" below)]
get number of rows from myhappytable where (exec_status == executing_now)
if it is > 0, exit
begin loop
    get one row from myhappytable
        where (exec_status == not_yet_run) and (scheduled_time <= now)
        order by scheduled_time asc
    if no such row, exit
    set row exec_status to executing_now (maybe set time_started to now)
    execute whatever command the row contains
    set row exec_status to completed
        (maybe also store the command output/return as well; set time_finished to now)
end loop
[delete pid lock file (complementary to the starting pid lock check)]
This way, the script first checks that no command is currently running, then repeatedly runs the oldest not-yet-run command until there are no more commands due at the given moment. Also, you can see which command is executing by querying the database.
A potential pitfall: if the cron script is killed, a scheduled task will remain in the "executing_now" state. That's what the pid lock at the beginning and end is for: to see whether the previous cron script terminated properly. Pseudocode of the create/check pidlock:
if exists pidlockfile then
    check if process id given in file exists
    if not exists then
        update myhappytable set exec_status = error_cronscript_died_while_executing_this
            where exec_status == executing_now
        delete pidlockfile
    else (previous instance still running)
        exit
    endif
endif
create pidlockfile containing cron script process id
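A rough bash rendering of that pid-lock check (a sketch only: the lock path, database file, and sqlite3 storage are my assumptions, not part of the original answer):

LOCK=/var/run/myhappy.pid      # assumed lock location
DB=/var/lib/myhappy.db         # assumed sqlite3 database

if [ -e "$LOCK" ]; then
    if kill -0 "$(cat "$LOCK")" 2>/dev/null; then
        exit 0                 # previous instance still running
    fi
    # previous run died mid-task: flag the orphaned row(s)
    sqlite3 "$DB" "UPDATE myhappytable
                   SET exec_status = 'error_cronscript_died_while_executing_this'
                   WHERE exec_status = 'executing_now';"
    rm -f "$LOCK"
fi
echo $$ > "$LOCK"              # take the lock; remove it again at the end of the script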
You can use the at(1) command inside your script to schedule its next run. Before it exits, it can check myhappyschedule for the next run time. You don't need cron at all, really.
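For example (a sketch; next_run_time is a hypothetical helper that queries myhappyschedule, and the accepted time format depends on your at(1) implementation):

next=$(next_run_time)                    # e.g. "16:05 2013-04-09"
echo "/path/to/myscript.sh" | at "$next" # re-arm before exiting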
I came across this question while researching a solution to the queuing problem. For the benefit of anyone else searching, here is my solution.
Combine this with a cron that starts jobs as they are scheduled (even if they are scheduled to run at the same time), and it solves the problem you described as well.
Problem
At most one instance of the script should be running.
We want to queue up requests to process them as fast as possible.
i.e., we need a pipeline to the script.
Solution:
Create a pipeline to any script. Done using a small bash script (further down).
The script can be called as
./pipeline "<any command and arguments go here>"
Example:
./pipeline sleep 10 &
./pipeline shabugabu &
./pipeline single_instance_script some arguments &
./pipeline single_instance_script some other_arguments &
./pipeline "single_instance_script some yet_other_arguments > output.txt" &
...etc
The script creates a new named pipe for each command. So the above will create the named pipes sleep.pipe, shabugabu.pipe, and single_instance_script.pipe.
In this case the initial call will start a reader and run single_instance_script with some arguments as arguments. Once the call completes, the reader will grab the next request off the pipe and execute with some other_arguments, complete, grab the next etc...
This script will block requesting processes so call it as a background job (& at the end) or as a detached process with at (at now <<< "./pipeline some_script")
#!/bin/bash -Eue

# Using the command name as the pipeline name
pipeline=$(basename $(expr "$1" : '\(^[^[:space:]]*\)')).pipe
is_reader=false

function _pipeline_cleanup {
    if $is_reader; then
        rm -f $pipeline
    fi
    rm -f $pipeline.lock
    exit
}
trap _pipeline_cleanup INT TERM EXIT

# Dispatch/initialization section, critical
lockfile $pipeline.lock
if [[ -p $pipeline ]]
then
    # A reader already exists: just queue this command and leave
    echo "$*" > $pipeline
    exit
fi

# No reader yet: become the reader, and queue our own command first
is_reader=true
mkfifo $pipeline
echo "$*" > $pipeline &
rm -f $pipeline.lock

# Reader section: pull queued commands off the pipe and run them one at a time
while read command < $pipeline
do
    echo "$(date) - Executing $command"
    ($command) &> /dev/null
done