Snakemake explicit handling for Out Of Memory (OOM) failures

A Snakemake workflow can re-attempt a job after any type of failure, scaling its resource request with each restart, including when the error is an Out Of Memory (OOM) one, e.g.
def get_mem_mb(wildcards, attempt):
    return attempt * 100

rule:
    input: ...
    output: ...
    resources:
        mem_mb=get_mem_mb
    shell:
        "..."
Is there any way in Snakemake to deal explicitly with a memory-related error, as Nextflow does, e.g. when the exit status is memory related (137 on an LSF system)?
process foo {
    memory { 2.GB * task.attempt }
    time { 1.hour * task.attempt }
    errorStrategy { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
    maxRetries 3

    script:
    <your job here>
}
I could not find this information anywhere.
Thanks

I am not sure if there is an explicit way for Snakemake to handle out-of-memory errors specifically. However, the memory function you have in your code example is what I've done to handle memory issues with Snakemake.
To make use of the function, run Snakemake with --restart-times, which controls how many times a failed job is retried; each retry increments the attempt value passed to the function. Adding --rerun-incomplete is also useful so that jobs whose output is incomplete are rerun.
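For illustration, here is a minimal sketch of how the attempt-scaled memory function and the retry options fit together (the rule name, files and command are hypothetical placeholders, not from the original question):

def get_mem_mb(wildcards, attempt):
    # 1st attempt: 1000 MB, 2nd attempt: 2000 MB, 3rd attempt: 3000 MB
    return attempt * 1000

rule heavy_step:
    input: "data.txt"
    output: "result.txt"
    resources:
        mem_mb=get_mem_mb
    shell:
        "some_tool {input} > {output}"

Invoked with, for example, snakemake --jobs 10 --restart-times 3 --rerun-incomplete, a job killed for exceeding its memory limit is resubmitted with a larger mem_mb on each attempt.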

Related

`errorStrategy` setting to stop current process but continue pipeline

I have a lot of samples that go through a process which sometimes fails (deterministically). In such a case, I would want the failing process to stop, but all other samples to still get submitted and processed independently.
If I understand correctly, setting errorStrategy 'ignore' will continue the script within the failing process, which is not what I want. And errorStrategy 'finish' would stop submitting new samples, even though there is no reason for the other samples to fail too. And while errorStrategy 'retry' could technically work (by repeating the failing processes while the good ones get through), that doesn't seem like a good solution.
Am I missing something?
If a process can fail deterministically, it might be better to handle this situation somehow. Setting the errorStrategy directive to 'ignore' means any process execution errors are ignored, allowing your workflow to continue. For example, you might get a process execution error if a process exits with a non-zero exit status or if one or more expected output files are missing. The pipeline will continue; however, downstream processes will not be attempted for the failed tasks.
Contents of test.nf:
nextflow.enable.dsl=2

process foo {
    tag { sample }

    input:
    val sample

    output:
    path "${sample}.txt"

    """
    if [ "${sample}" == "s1" ] ; then
        (exit 1)
    fi
    if [ "${sample}" == "s2" ] ; then
        echo "Hello" > "${sample}.txt"
    fi
    """
}

process bar {
    tag { txt }

    input:
    path txt

    output:
    path "${txt}.gz"

    """
    gzip -c "${txt}" > "${txt}.gz"
    """
}

workflow {
    Channel.of('s1', 's2', 's3') | foo | bar
}
Contents of nextflow.config:
process {
    // this is the default task.shell:
    shell = [ '/bin/bash', '-ue' ]

    errorStrategy = 'ignore'
}
Run with:
nextflow run -ansi-log false test.nf
Results:
N E X T F L O W ~ version 20.10.0
Launching `test.nf` [drunk_bartik] - revision: e2103ea23b
[9b/56ce2d] Submitted process > foo (s2)
[43/0d5c9d] Submitted process > foo (s1)
[51/7b6752] Submitted process > foo (s3)
[43/0d5c9d] NOTE: Process `foo (s1)` terminated with an error exit status (1) -- Error is ignored
[51/7b6752] NOTE: Missing output file(s) `s3.txt` expected by process `foo (s3)` -- Error is ignored
[51/267685] Submitted process > bar (s2.txt)

Is it possible to enable exit on error behavior in an interactive Tcl shell?

I need to automate a huge interactive Tcl program using Tcl expect.
As I realized, this territory is really dangerous: I need to extend the already existing mass of code, but I can't rely on errors actually causing the program to fail with a non-zero exit code as I could in a regular script.
This means I have to think about every possible thing that could go wrong and "expect" it.
What I currently do is use a "die" procedure instead of raising an error in my own code, which automatically exits. But this kind of error condition cannot be caught, and it makes errors hard to detect, especially in code not written by me, since ultimately most library routines are error-based.
Since I have access to the program's Tcl shell, is it possible to enable fail-on-error?
EDIT:
I am using Tcl 8.3, which is a severe limitation in terms of available tools.
Examples of errors I'd like to automatically exit on:
% puts $a(2)
can't read "a(2)": no such element in array
while evaluating {puts $a(2)}
%
% blublabla
invalid command name "blublabla"
while evaluating blublabla
%
As well as any other error that makes a normal script terminate.
These can bubble up from 10 levels deep within procedure calls.
I also tried redefining the global error command, but not all errors that can occur in Tcl use it. For instance, the above "command not found" error did not go through my custom error procedure.
Since I have access to the program's Tcl shell, is it possible to
enable fail-on-error?
Let me try to summarize in my words: You want to exit from an interactive Tcl shell upon error, rather than having the prompt offered again?
Update
I am using Tcl 8.3, which is a severe limitation in terms of available
tools [...] only source patches to the C code.
As you seem to be deep down in that rabbit hole, why not add another source patch?
--- tclMain.c	2002-03-26 03:26:58.000000000 +0100
+++ tclMain.c.mrcalvin	2019-10-23 22:49:14.000000000 +0200
@@ -328,6 +328,7 @@
 		Tcl_WriteObj(errChannel, Tcl_GetObjResult(interp));
 		Tcl_WriteChars(errChannel, "\n", 1);
 	    }
+	    Tcl_Exit(1);
 	} else if (tsdPtr->tty) {
 	    resultPtr = Tcl_GetObjResult(interp);
 	    Tcl_GetStringFromObj(resultPtr, &length);
This is untested; the Tcl 8.3.5 sources don't compile for me. But this section of Tcl's internals is comparable to current sources, tested using my Tcl 8.6 source installation.
For the records
With a stock shell (tclsh), this is a little fiddly, I am afraid. The following might work for you (though I can imagine cases where it might fail you). The idea is:
- to intercept writes to stderr (this is where an interactive shell redirects error messages before returning to the prompt);
- to discriminate between arbitrary writes to stderr and actual error cases, using the global variable ::errorInfo as a sentinel.
Step 1: Define a channel interceptor
oo::class create Bouncer {
    method initialize {handle mode} {
        if {$mode ne "write"} {error "can't handle reading"}
        return {finalize initialize write}
    }
    method finalize {handle} {
        # NOOP
    }
    method write {handle bytes} {
        if {[info exists ::errorInfo]} {
            # This is an actual error;
            # 1) Print the message (as usual), but to stdout
            fconfigure stdout -translation binary
            puts stdout $bytes
            # 2) Call on [exit] to quit the Tcl process
            exit 1
        } else {
            # Non-error write to stderr, proceed as usual
            return $bytes
        }
    }
}
Step 2: Register the interceptor for stderr in interactive shells
if {[info exists ::tcl_interactive]} {
    chan push stderr [Bouncer new]
}
Once registered, this will make your interactive shell behave like so:
% puts stderr "Goes, as usual!"
Goes, as usual!
% error "Bye, bye"
Bye, bye
Some remarks
You need to be careful about the Bouncer's write method: the error message has already been massaged for the character encoding (hence the fconfigure call).
You might want to put this into a Tcl package or Tcl module, to load the bouncer using package require.
If your program happens to write to stderr while the errorInfo variable is set (as a left-over), this will trigger an unintended exit.

Snakemake: variable that defines whether process is submitted cluster job or the snakefile

My current architecture is that at the start of my Snakefile I have a long-running function somefunc which helps decide the "input" to rule all. I realized when I was running the workflow with Slurm that somefunc is being executed by each job. Is there some variable I can access that defines whether the code is running in a submitted job or in the main process? Something like:
if not snakemake.submitted_job:
    config['layout'] = somefunc()
    ...
A solution which I don't really recommend is to make somefunc write the list of inputs to a tmp file so that slurm jobs will read this tmp file rather than reconstructing the list from scratch. The tmp file is created by whatever job is executed first so the long-running part is done only once.
At the end of the workflow delete the tmp file so that later executions will start fresh with new input.
Here's a sketch:
import os

def somefunc():
    try:
        all_output = open('tmp.txt').readlines()
        all_output = [x.strip() for x in all_output]
        print('List of input files read from tmp.txt')
    except FileNotFoundError:
        all_output = ['file1.txt', 'file2.txt']  # Long running part
        with open('tmp.txt', 'w') as fout:
            for x in all_output:
                fout.write(x + '\n')
        print('List of input files created and written to tmp.txt')
    return all_output

all_output = somefunc()

rule all:
    input:
        all_output,

rule one:
    output:
        all_output,
    shell:
        r"""
        touch {output}
        """

onsuccess:
    os.remove('tmp.txt')

onerror:
    os.remove('tmp.txt')
Since jobs will be submitted in parallel, you should make sure that only one job writes tmp.txt and the others read it. I think the try/except above will do it, but I'm not 100% sure. (Probably you want to use a better filename than tmp.txt; see the module tempfile. See also the module atexit for exit handlers.)
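As a minimal sketch of those two suggestions (the cache path and handler below are hypothetical, not part of the original answer):

import atexit
import os
import tempfile

# Put the cache in the system temp directory instead of a hard-coded
# 'tmp.txt' in the working directory.
CACHE = os.path.join(tempfile.gettempdir(), 'snakemake_input_cache.txt')

def cleanup():
    # Remove the cache on interpreter exit so later runs start fresh.
    if os.path.exists(CACHE):
        os.remove(CACHE)

atexit.register(cleanup)

Note that in a cluster setting every job process re-executes the Snakefile and would run this handler on exit as well, so the onsuccess/onerror hooks used above may be the safer place for the cleanup.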
As discussed with @dariober, it seems cleanest to check whether the (hidden) .snakemake directory has locks, since they do not seem to be generated until the first rule starts (assuming you are not using the --nolock argument).
import os
locked = len(os.listdir(".snakemake/locks")) > 0
However this results in a problem in my case:
import time
import os

def longfunc():
    time.sleep(10)
    return range(5)

locked = len(os.listdir(".snakemake/locks")) > 0
if not locked:
    info = longfunc()

rule all:
    input:
        expand("test_{sample}", sample=info)

rule test:
    output:
        touch("test_{sample}")
    shell:
        """
        sleep 1
        """
Snakemake lets each submitted job reinterpret the complete Snakefile, with the result that all the jobs complain that 'info' is not defined. For me it was easiest to store the results and load them for each job (pickle.dump and pickle.load).
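A minimal sketch of that pickle-based approach (the cache file name is a hypothetical choice, not from the original post):

import os
import pickle
import time

CACHE = "layout.pkl"

def longfunc():
    time.sleep(10)
    return list(range(5))

locks_dir = ".snakemake/locks"
locked = os.path.isdir(locks_dir) and len(os.listdir(locks_dir)) > 0

if not locked:
    # Main process: compute once and persist the result for the cluster jobs.
    info = longfunc()
    with open(CACHE, "wb") as fh:
        pickle.dump(info, fh)
else:
    # Submitted job: reuse the result computed by the main process.
    with open(CACHE, "rb") as fh:
        info = pickle.load(fh)

rule all:
    input:
        expand("test_{sample}", sample=info)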

Ghostscript for PS integrity test: terminate at EOF, return error unless stack is empty

To test the integrity of PostScript files, I'd like to run Ghostscript in the following way:
Return 1 (or other error code) on error
Return 0 (success) at EOF if stack is empty
Return 1 (or other error code) otherwise
I could run gs in the background, and use a timeout to force termination if gs hangs with items left on the stack. Is there an easier solution?
Ghostscript won't hang if you send files as input (unless you write a program which enters an infinite loop or otherwise fails to reach a halting state). Having items on any of the stacks won't cause it to hang.
On the other hand, it won't give you an error if a PostScript program leaves operands on the operand stack (or dictionaries on the dictionary stack, clips on the clip stack or gstates on the graphics state stack). This is because that's not an error, and since PostScript interpreters normally run in a job server loop it's not a problem either. Terminating the job returns control to the job server loop, which does a save and restore around the total job, thereby clearing up anything left behind.
I'd suggest that if you really want to do this you need to adopt the same approach: write a PostScript program which executes the PostScript program you want to 'test', then checks the operand stack (and other stacks if required) to see if anything is left. Note that you will want to execute the test program in a stopped context, as an error in the course of the program will clearly potentially leave stuff lying around.
Ghostscript returns 0 on a clean exit and a value less than 0 for errors, if I remember correctly. You would need to use signalerror in your test framework in order to raise an error if items are left at the end of a program.
[EDIT]
Anything supplied to Ghostscript on the command line by either -s or -d is defined in systemdict, so if we do -sInputFileName=/test.pdf then we will find in systemdict a key /InputFileName whose value is a string with the contents (/test.pdf). We can use that to pass the filename to our program.
The stopped operator takes an executable array as an argument, and returns either true or false depending on whether an error occurred while executing the array (3rd Edition PLRM, p 697).
So we need to run the program contained in the filename we've been given, and do it in a 'stopped' context. Something like this:
{InputFileName run} stopped
{
    (Error occurred\n) print flush
    %% Potentially check $error for more information.
}{
    (program terminated normally\n) print flush
    %% Here you could check the various stacks
} ifelse
The following, based 90% on KenS's answer, is 99% satisfactory:
Program checkIntegrity.ps:
{Script run} stopped
{
    (\n===> Integrity test failed: ) print Script print ( has error\n\n) print
    handleerror
    (ignore this error which only serves to force a return value of 1) /syntaxerror signalerror
}{
    % script passed, now check the stack
    count dup 0 eq {
        pop (\n===> Integrity test passed: ) print Script print ( terminated normally\n\n) print
    } {
        (\n===> Integrity test failed: ) print Script print ( left ) print
        3 string cvs print ( item(s) on stack\n\n) print
        Script /syntaxerror signalerror
    } ifelse
} ifelse
quit
Execute with
gs -q -sScript=CodeToBeChecked.ps checkIntegrity.ps ; echo $?
For the last 1% of satisfaction I would need a replacement for
(blabla) /syntaxerror signalerror
It forces exit with return code 1, but it is very verbose and distracts from the actual error in the checked script that is reported by handleerror. Therefore a cleaner way to exit(1) would be welcome.

how to see if a process by name is running in tcl

I want to use pidof on a process given by name in Tcl. I have used [exec pidof $proc_name], but it always returns an error: child process exited abnormally.
I read somewhere that exec always treats a non-zero return code as an error, whereas pidof returns the process id number. Does anyone know if there is a workaround? Thanks in advance!
The reason I want to use pidof is that I want to see if the process is running; if not, I will restart it.
The problem is that pidof does strange things with exit codes:
Exit Status
0   At least one program was found with the requested name.
1   No program was found with the requested name.
This interacts badly with exec which treats a non-zero exit code as indicating that it should tell the rest of Tcl that there was an error.
The simplest way of dealing with this is a little extra shell script wrapper. Let's hide it inside a procedure for convenience:
proc pidof {name} {
    exec /bin/bash -c "pidof '$name'; exit \$(( \$? - 1 ))"
}
All that does is subtract 1 from the exit code before it hits back into Tcl.
(You could also fix this using the techniques described in the exec manual but I think it's simpler to fix on the bash side this time.)
I ran into this, and the bash wrapper ended up causing some issues in the old Linux environment I run in (no bash, and exit code handling was a bit different with busybox).
My solution that should work anywhere would be similar to what a few suggested:
proc pidof {name} {
    catch {exec -ignorestderr -- pidof $name} pid
    if {[string is entier -strict $pid]} {
        return $pid
    }
}