Is there a simple way to make mrjob scripts interruptible? Pretty simple question, but it makes a big difference for debugging. I'm mainly interested in cancelling Python-only test jobs, because that is where most debugging happens.
python my_mr_script.py my-mr-input.txt
You've got a few easy options:
Send SIGQUIT (that's ctrl-\ in your terminal). Most processes don't bother catching SIGQUIT, so it's one strategy to deal with something that has a less-than-awesome SIGINT (aka ctrl-c) handler.
ctrl-z followed by kill -9 %1 - aka the nuclear option. Catching SIGTSTP is even rarer than catching SIGQUIT; that's the signal your shell uses to do job control. Here you're suspending the job and then sending it SIGKILL, which it cannot catch.
N.b. that the %1 above is a bash-ism and it assumes this is the only job running.
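To see why ctrl-\ gets you out even when ctrl-c doesn't, here's a tiny sketch (plain Python, nothing mrjob-specific; the handler and its delay are made up for illustration) of a process with a sluggish SIGINT handler. ctrl-c appears to hang for 30 seconds, while ctrl-\ still ends it immediately because SIGQUIT keeps its default action:
import signal
import time

# Simulate a less-than-awesome ctrl-c handler: SIGINT is caught and handled
# very slowly, but SIGQUIT (ctrl-\) keeps its default action, so it still
# terminates the process immediately (and may dump core).
def slow_sigint_handler(signum, frame):
    print("caught SIGINT, cleaning up very slowly...")
    time.sleep(30)

signal.signal(signal.SIGINT, slow_sigint_handler)

while True:
    time.sleep(1)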
Related
Recently I've found myself several times in situations where I need to run some operation in a background xterm and would like to be notified when my input is requested.
I know how to make it so I'm notified when the command ends, but that doesn't help in the cases where the command is not 100% batch (it puts up a prompt every now and then; a common example would be apt-get) or where the command hangs (because of some network failure, for example).
So I'd like to be notified when there's been no output in the last N minutes. Is there some way to configure xterm to do that for me, or maybe some other tool (screen maybe) that could do it?
xterm doesn't notice whether the application is actually waiting for input or simply doing nothing. An application (or shell) could be modified to do this, but that seems like a lot more work than you'd expect (i.e., many programs would have to be modified).
I also don't know of a way to do it for applications that might be waiting for input. But if you have a batch application that should always output log info within a certain time span, you can run an extra process that sends the notification unless it gets killed within a timeout; that process is killed (and restarted) every time a new line is read. Maybe this will help you, or help someone else adapt it to processes that might wait for input:
i=0
{ while true; do echo $i; ((i++)); sleep $i; done; } |
while read line; do
    if [ $pid ]; then kill $pid; fi
    bash -c 'sleep 5; notify-send boom' &
    pid=$!
    echo $line
done
The part before the pipe is a test producer whose output gets slower and slower; once the gap between lines exceeds the threshold, notify-send fires. If you want to be notified when there has been no output for 3 minutes, use sleep 3m instead of sleep 5.
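If you'd rather not juggle background kill jobs in the shell, the same idea can be written as a small Python filter. This is only a sketch: it assumes notify-send is installed, and the script name watch_idle.py is just for illustration. You would pipe your command through it, e.g. apt-get upgrade | python3 watch_idle.py.
import subprocess
import sys
import threading

TIMEOUT = 180  # seconds of silence before notifying; 180 = "no output for 3 minutes"

def notify():
    # Assumes notify-send is available; swap in whatever alert you prefer.
    subprocess.run(["notify-send", "no output for %d seconds" % TIMEOUT])

timer = None
for line in sys.stdin:
    if timer is not None:
        timer.cancel()                     # output arrived in time, reset the clock
    timer = threading.Timer(TIMEOUT, notify)
    timer.daemon = True
    timer.start()
    sys.stdout.write(line)                 # pass the output through unchanged
if timer is not None:
    timer.cancel()                         # the command finished normally
Note that, just as with the shell version, a program whose stdout is block-buffered when piped may look silent longer than it really is.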
First I run this query to see the running queries:
select * from pg_stat_activity;
Then I run this query to stop them:
SELECT pg_cancel_backend(pid);
But when I run pg_stat_activity again, it still shows all the queries!
Why didn't it kill the queries?
A number of possible explanations:
You're not looking at an active query; the query text is just the last query that ran on a currently-idle backend. In that case pg_cancel_backend will do nothing, since there's nothing to cancel. Check the state field in pg_stat_activity.
The active query is running in extension code that does not CHECK_FOR_INTERRUPTS() during whatever it is doing. This is most typically the case when you're running some extension that does lots of CPU, I/O or network activity using its own libraries, sockets, etc. Particularly things like PL/Perl, PL/Python, etc.
The active query is running in PostgreSQL back-end code that doesn't check for interrupts in a long running loop or similar. This is a bug; if you find such a case, report it.
The backend is stuck waiting on a blocking operating system call, commonly disk I/O or a network socket write. It may be unable to respond to a cancel message until that blocking operation ends; if it receives a SIGTERM, its signal handler can usually (but not always) cause it to bail out.
In general it's safe to use pg_terminate_backend as a "bigger hammer". SIGTERM as sent by pg_terminate_backend() will often, but not always, cause a backend that can't respond to a cancel to exit.
Do not kill -9 (SIGKILL) a PostgreSQL backend (postgres process). It will cause the whole PostgreSQL server to emergency-restart to protect shared memory safety.
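To make the check-then-escalate path concrete, here is a minimal sketch. It assumes the psycopg2 driver and a placeholder connection string; adjust the WHERE clause so you only target the backends you actually mean to stop:
import time
import psycopg2

conn = psycopg2.connect("dbname=mydb")   # placeholder connection details
conn.autocommit = True
cur = conn.cursor()

# Only rows with state = 'active' are actually executing something; for idle
# backends the query column is just the last statement that ran.
cur.execute("""
    SELECT pid FROM pg_stat_activity
    WHERE state = 'active' AND pid <> pg_backend_pid()
""")
pids = [row[0] for row in cur.fetchall()]

for pid in pids:
    cur.execute("SELECT pg_cancel_backend(%s)", (pid,))   # polite cancel first

time.sleep(5)   # give the backends a moment to notice the cancel

# Escalate to SIGTERM only for those still active; never kill -9 a backend.
for pid in pids:
    cur.execute(
        "SELECT pg_terminate_backend(pid) FROM pg_stat_activity "
        "WHERE pid = %s AND state = 'active'", (pid,))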
I should use pg_terminate_backend(pid) instead of pg_cancel_backend(pid).
Hi Spring Batch users,
regarding the documentation http://docs.spring.io/spring-batch/reference/htmlsingle/#d5e1320
"If the process died ("kill -9" or server failure) the job is, of course, not running, but the JobRepository has no way of knowing because no-one told it before the process died."
I'm trying to find and restart the stale job executions using:
Set<JobExecution> jobExecutions = jobExplorer.findRunningJobExecutions(jobName);
for (JobExecution jobExecution : jobExecutions) {
    jobExecution.setStatus(BatchStatus.FAILED);
    jobExecution.setEndTime(new Date());
    jobRepository.update(jobExecution);
    jobOperator.restart(jobExecution.getId());
}
But this seems to be very inconvenient.
1) I have to do this before other (new) jobs can be started.
2) I have to handle multiple running server instances, so findRunningJobExecutions alone will not do the trick.
You can find other questions regarding this topic:
https://jira.spring.io/browse/BATCH-2433?jql=project%20%3D%20BATCH%20AND%20status%20%3D%20Open%20ORDER%20BY%20priority%20DESC
Spring Batch after JVM crash
I would love to see a way to register a "start-up clean jobs listener". Even that would not fix the problems caused by the multi-server environment, because Spring Batch cannot tell whether a JobExecution marked STARTED is actually running on another instance.
Thanks for any advice
Alex
Your job cannot and should not recover "automatically" from a kill -9 scenario. A kill -9 is treated very differently from your application throwing a caught Exception. The reason is that you've effectively pulled the carpet out from under the application without giving it a chance to reach a synchronization point with the database to commit any necessary information to the ExecutionContext or update the job/step status(es). Therefore, the last status touchpoint with the database remains, and the job will still look STARTED.
"OK, fine," you say, "but if I start another execution, I want it to find that STARTED execution and pick up where it left off." The problem is that there is no clean way for the application to distinguish a job that is ACTUALLY RUNNING from one that failed but couldn't update the database. The framework correctly errs on the side of caution here and prevents you from starting a job that already appears to be running, and this is a GOOD thing.
Why? Because let's assume your job really was still running and you restarted it by accident. As coded, the framework will start to spin up, see your running execution, and fail with the message "A job execution for this job is already running". I can't tell you how many times we've been saved by this because someone accidentally launched a job twice!
If you were to implement the listener you suggest, the 2nd execution would instead be allowed to start and you'd have 2 different JVMs repeating the same work, possibly writing to the same files/tables and causing a huge data mess that could be impossible to clean up.
Trust me, in the event the Linux terminal kills your job or your job dies because the connection to the database has been severed, you WANT human eyes on those execution states before you attempt a restart.
Finally, on the off chance you actually wanted to kill your job, you can leverage several other standard patterns for stopping jobs:
Stop via throw Exception
Stop via JobOperator.stop()
I've recently started learning VB.NET and I'm wondering whether there is an easy way of killing off all processes a VB.NET application uses. For example, I've created a form which pings a given IP address; the application starts a cmd.exe process and passes it the ping command, which in turn creates the following processes:
cmd.exe
conhost.exe
ping.exe
If I Kill() the main process it kills off cmd.exe, but not conhost.exe or ping.exe. Do I need to kill these manually as well? Shouldn't killing the main process automatically kill the associated processes? Another thing I don't understand: I tried using Close(), but nothing appears to happen and all the processes keep running. I want a user to be able to close the form and have all associated processes closed/killed.
It is much better to use the System.Net.NetworkInformation.Ping class to perform a ping (as Hans Passant mentioned).
In general, if you use proc = System.Diagnostics.Process.Start(...), you should be able to kill the process and its child processes with proc.Kill(). However, it is possible for a process to launch other processes that will not be immediately terminated by Kill(). It would be a bad idea to terminate leftover processes manually, for a number of reasons.
I want to write a program that is notified by the OS whenever any running process on that OS dies.
I don't want to poll and compare every time to check whether a previously existing process has died. I want my program to be alerted by the OS whenever a process termination happens.
How do I go about it? Some sample code would be very helpful.
PS: Looking for approaches in Java/C++.
Sounds like you want PsSetCreateProcessNotifyRoutine(). See this article to get started:
http://www.codeproject.com/KB/threads/procmon.aspx
Under Unix, you could use the SIGCHLD signal to get notified of the death of the process. This requires, however, that the process being monitored is a child process of the monitoring process.
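As a quick illustration of the SIGCHLD approach (the question asks for Java/C++, where the same pattern is available via sigaction(); this Python sketch just shows the shape of it, and only works when the watched process is a child of the watcher):
import os
import signal
import subprocess

def on_child_exit(signum, frame):
    # Reap the child and report how it exited; no polling involved.
    pid, status = os.waitpid(-1, os.WNOHANG)
    print("process %d exited with status %d" % (pid, os.WEXITSTATUS(status)))

signal.signal(signal.SIGCHLD, on_child_exit)

child = subprocess.Popen(["sleep", "3"])   # the process being monitored
signal.pause()                             # block until a signal (the child's death) arrives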
Under Windows, you might need a valid handle to the process. If you spawn the process yourself using CreateProcess, you get the handle for free; otherwise you must acquire one by other means. It might then be possible to wait for the process to terminate by calling WaitForSingleObject on the handle.
Sorry, I don't have any example code for this. I am not even sure that waiting on the process handle under Windows really awaits termination of the process (as opposed to some other "significant" condition, which causes the process handle to enter "signalled" state or something).
I don't have a code sample ready but one idea – on Linux – might be to find out the ID of the process you'd like to watch when first starting your watcher program (e.g. using $ pgrep) and then using inotify to watch /proc/<PID>/ – which gets deleted when the process dies. In contrast to polling, this doesn't cost any significant CPU resources.
Now, procfs is not completely supported by inotify, so I can't guarantee this approach would actually work but it is certainly worth looking into.