Catching snakemake runtime errors with onerror - snakemake

I'm working on a bioinformatics pipeline, and one of the rules performs genome assembly using SPAdes (https://github.com/ablab/spades):
rule perform_de_novo_assembly_using_spades:
    input:
        bam = input.bam
    output:
        spades_contigs = directory(_RESULTS_DIR + "assembled_spades/")
    threads:
        _NBR_CPUS
    run:
        out_dir = _RESULTS_DIR + "assembled_spades/"
        command = "spades.py -t " + str(threads) + " --12 " + input.bam + " -o " + out_dir
        shell(command + " || true")
... More rules
onsuccess: print("Pipeline completed successfully!")
onerror: print("One or more errors occurred. Please refer to log file.")
The problem is that for some problematic input files SPAdes can fail, which terminates my workflow. I therefore appended " || true " to the shell command that runs SPAdes (following this post: What would be an elegant way of preventing snakemake from failing upon shell/R error?) so that the workflow continues despite SPAdes failing. However, my pipeline now runs to completion and still prints the onsuccess message "Pipeline completed successfully!" at the end. Ideally I want it to print the onerror message "One or more errors occurred. Please refer to log file." Is there a way to let my workflow continue to the end despite SPAdes raising a runtime error, while snakemake still catches the error and displays the onerror message at the end?
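One possible approach (a sketch only, untested against the original pipeline): instead of masking the failure with || true, catch the exception that snakemake's shell() raises on a non-zero exit, record the failure, and check that record in the onsuccess handler in place of the one above. The file name spades_failures.log is illustrative, not from the original post:

    rule perform_de_novo_assembly_using_spades:
        input:
            bam = input.bam
        output:
            spades_contigs = directory(_RESULTS_DIR + "assembled_spades/")
        threads:
            _NBR_CPUS
        run:
            out_dir = _RESULTS_DIR + "assembled_spades/"
            try:
                shell("spades.py -t {threads} --12 {input.bam} -o " + out_dir)
            except Exception as err:
                # Keep the workflow alive, but leave a trace of the failure.
                shell("mkdir -p " + out_dir)  # the declared output must still exist
                with open("spades_failures.log", "a") as fh:
                    fh.write("SPAdes failed: {}\n".format(err))

    onsuccess:
        import os
        if os.path.exists("spades_failures.log"):
            print("One or more errors occurred. Please refer to log file.")
        else:
            print("Pipeline completed successfully!")

Because the failure is swallowed inside the run block, snakemake still considers the run successful, so the check has to live in onsuccess rather than onerror. In a real pipeline you would also want to remove any stale flag file at workflow start.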

channel checks as empty even if it has content

I am trying to have a process that is launched only if a combination of conditions is met, but when checking whether a channel has a path to a file, it always comes back empty. I am probably doing something wrong; in that case please correct my code. I tried to follow some of the suggestions in this issue, but with no success.
Consider the following minimal example:
process one {
    output:
    file("test.txt") into _chProcessTwo

    script:
    """
    echo "Hello world" > "test.txt"
    """
}

// making a copy so I check first if something in the channel or not
// avoids raising exception of MultipleInputChannel
_chProcessTwo.into{
    _chProcessTwoView;
    _chProcessTwoCheck;
    _chProcessTwoUse
}

//print contents of channel
println "Channel contents: " + _chProcessTwoView.toList().view()

process two {
    input:
    file(myInput) from _chProcessTwoUse

    when:
    (!_chProcessTwoCheck.toList().isEmpty())

    script:
    def test = _chProcessTwoUse.toList().isEmpty() ? "I'm empty" : "I'm NOT empty"
    println "The outcome is: " + test
}
I want to have process two run if and only if there is a file in the _chProcessTwo channel.
If I run the above code I obtain:
marius@dev:~/pipeline$ ./bin/nextflow run test.nf
N E X T F L O W ~ version 19.09.0-edge
Launching `test.nf` [infallible_gutenberg] - revision: 9f57464dc1
[c8/bf38f5] process > one [100%] 1 of 1 ✔
[- ] process > two -
[/home/marius/pipeline/work/c8/bf38f595d759686a497bb4a49e9778/test.txt]
where the last line is actually the contents of _chProcessTwoView.
If I remove the when directive from the second process I get:
marius@mg-dev:~/pipeline$ ./bin/nextflow run test.nf
N E X T F L O W ~ version 19.09.0-edge
Launching `test.nf` [modest_descartes] - revision: 5b2bbfea6a
[57/1b7b97] process > one [100%] 1 of 1 ✔
[a9/e4b82d] process > two [100%] 1 of 1 ✔
[/home/marius/pipeline/work/57/1b7b979933ca9e936a3c0bb640c37e/test.txt]
with the contents of the second worker's .command.log file being: The outcome is: I'm empty
I also tried without toList().
What am I doing wrong? Thank you in advance.
Update: a workaround would be to check _chProcessTwoUse.view() != "", but that is pretty dirty.
Update 2: as requested by @Steve, I've updated the code to reflect a bit more closely the actual conditions I have in my own pipeline:
def runProcessOne = true

process one {
    when:
    runProcessOne

    output:
    file("inputProcessTwo.txt") into _chProcessTwo optional true
    file("inputProcessThree.txt") into _chProcessThree optional true

    script:
    // this would replace the probability that output is not created
    def outputSomething = false
    """
    if ${outputSomething}; then
        echo "Hello world" > "inputProcessTwo.txt"
        echo "Goodbye world" > "inputProcessThree.txt"
    else
        echo "Sorry. Process one did not write to file."
    fi
    """
}

// making a copy so I check first if something in the channel or not
// avoids raising exception of MultipleInputChannel
_chProcessTwo.into{
    _chProcessTwoView;
    _chProcessTwoCheck;
    _chProcessTwoUse
}

//print contents of channel
println "Channel contents: " + _chProcessTwoView.view()
println _chProcessTwoView.view() ? "Me empty" : "NOT empty"

process two {
    input:
    file(myInput) from _chProcessTwoUse

    when:
    (runProcessOne)

    script:
    """
    echo "The outcome is: ${myInput}"
    """
}

process three {
    input:
    file(defaultInput) from _chUpstreamProcesses
    file(inputFromProcessTwo) from _chProcessThree

    script:
    def extra_parameters = _chProcessThree.isEmpty() ? "" : "--extra-input " + inputFromProcessTwo
    """
    echo "Hooray! We got: ${extra_parameters}"
    """
}
As @Steve mentioned, I should not even have to check whether a channel is empty; Nextflow should know on its own not to initiate the process. But I think with this construct I will have to.
Marius
I think part of the problem here is that process 'one' creates only optional outputs. This makes dealing with the optional inputs in process 'three' a bit tricky. I would try to reconcile this if possible. If this can't be reconciled, then you'll need to deal with the optional inputs in process 'three'. To do this, you'll basically need to create a dummy file, pass it into the channel using the ifEmpty operator, then use the name of the dummy file to check whether or not to prepend the argument's prefix. It's a bit of a hack, but it works pretty well.
The first step is to actually create the dummy file. I like shareable pipelines, so I would just create this in your baseDir, perhaps under a folder called 'assets':
mkdir assets
touch assets/NO_FILE
Then pass in your dummy file if your '_chProcessThree' channel is empty:
params.dummy_file = "${baseDir}/assets/NO_FILE"
dummy_file = file(params.dummy_file)

process three {
    input:
    file(defaultInput) from _chUpstreamProcesses
    file(optfile) from _chProcessThree.ifEmpty(dummy_file)

    script:
    def extra_parameters = optfile.name != 'NO_FILE' ? "--extra-input ${optfile}" : ''
    """
    echo "Hooray! We got: ${extra_parameters}"
    """
}
Also, these lines are problematic:
//print contents of channel
println "Channel contents: " + _chProcessTwoView.view()
println _chProcessTwoView.view() ? "Me empty" : "NOT empty"
Calling view() will emit all values from the channel to stdout. You can ignore whatever value it returns. Unless you enable DSL2, the channel will then be empty. I think what you're looking for here is a closure:
_chProcessTwoView.view { "Found: $it" }
Be sure to append -ansi-log false to your nextflow run command so the output doesn't get clobbered. HTH.

console log using selenium certain messages are getting truncated abruptly

I am trying to validate the browser console messages via the Selenium API. Certain messages are getting truncated abruptly, and all of these messages begin with some icon.
So, for example, the expected message is "xyz-ENa50707a1661440e4a3070d8206797cf4.min.js?cb=1523503349:15 icon Rule "xyz" fired." but what I am getting is "xyz-ENa50707a1661440e4a3070d8206797cf4.min.js?cb=1523503349:15 icon".
My code is like:
LogEntries logEntries = driver.manage().logs().get(LogType.BROWSER);
for (LogEntry entry : logEntries.getAll()) {
    System.out.println(new Date(entry.getTimestamp()) + " " + entry.getLevel() + " " + entry.getMessage());
}
Any suggestions?
Even System.setOut(printStream) is not able to capture the full string in the message.

how to pass a function under snakemake run directive

I am building a workflow in snakemake and would like to reuse one of the rules for two different input sources. The input could be either source1 or source1+source2, and depending on the input, the output directory would also vary. Since this was quite complicated to do in a single rule, and I didn't want to duplicate the full rule, I would like to create two rules with different input/output that run the same command.
Is it possible to make this work? I get the DAG resolved correctly, but the jobs don't go through on the cluster (ERROR: bamcov_cmd not defined).
An example below (both rules use the same command at the end):
This is the command:
def bamcov_cmd():
    return( (deepTools_path+"bamCoverage " +
             "-b {input.bam} " +
             "-o {output} " +
             "--binSize {params.bw_binsize} " +
             "-p {threads} " +
             "--normalizeTo1x {params.genome_size} " +
             "{params.read_extension} " +
             "&> {log}") )
This is the rule:
rule bamCoverage:
    input:
        bam = file1+"/{sample}.bam",
        bai = file1+"/{sample}.bam.bai"
    output:
        "bamCoverage/{sample}.filter.bw"
    params:
        bw_binsize = bw_binsize,
        genome_size = int(genome_size),
        read_extension = "--extendReads"
    log:
        "bamCoverage/logs/bamCoverage.{sample}.log"
    benchmark:
        "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark"
    threads: 16
    run:
        bamcov_cmd()
This is the optional rule 2:
rule bamCoverage2:
    input:
        bam = file2+"/{sample}.filter.bam",
        bai = file2+"/{sample}.filter.bam.bai"
    output:
        "bamCoverage/{sample}.filter.bw"
    params:
        bw_binsize = bw_binsize,
        genome_size = int(genome_size),
        read_extension = "--extendReads"
    log:
        "bamCoverage/logs/bamCoverage.{sample}.log"
    benchmark:
        "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark"
    threads: 16
    run:
        bamcov_cmd()
What you ask is possible in Python.
It depends on whether you have JUST Python code in the file, or Python and Snakemake.
I will answer that first, and then I have a follow-up response, because I want you to set it up differently so you don't have to do it this way.
Just Python:
from fileContainingMyBamCovCmdFunction import bamcov_cmd

rule bamCoverage:
    ...
    run:
        bamcov_cmd()
For a visual example, see how I reference buildHeader and buildSample in this file. These files are called by a Snakefile; it should work the same for you.
https://github.com/LCR-BCCRC/workflow_exploration/blob/master/Snakemake/modules/py_buildFile/buildFile.py
EDIT 2017-07-23 - Updating code segment below to reflect user comment
Snakemake and Python:
include: "fileContainingMyBamCovCmdFunction.suffix"

rule bamCoverage:
    ...
    run:
        shell(bamcov_cmd())
EDIT END
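For clarity, here is a minimal sketch of what the included file could contain (the file and function names are assumed from the example above). The {input.bam}, {output}, etc. placeholders stay unformatted in the returned string, and snakemake's shell() expands them against the calling rule's variables:

    # fileContainingMyBamCovCmdFunction.py -- hypothetical contents
    def bamcov_cmd():
        # shell() in the calling rule formats the {...} placeholders itself
        return ("bamCoverage "  # assumes bamCoverage is on the PATH here
                "-b {input.bam} "
                "-o {output} "
                "--binSize {params.bw_binsize} "
                "-p {threads} "
                "&> {log}")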
If the function is truly specific to the bamCoverage call, you can put it back in the rule if you prefer. That implies it's not being called elsewhere, which may be true.
Be careful when annotating files using '.' notation; I use '_' as I find it easier to prevent creating cyclical dependencies that way.
Also, if you do end up leaving the two rules separate, you will likely end up with ambiguity errors.
http://snakemake.readthedocs.io/en/latest/snakefiles/rules.html?highlight=ruleorder#handling-ambiguous-rules
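If you do keep both rules, here is a one-line sketch of such an explicit ordering (rule names taken from the question):

    # Tell snakemake which rule wins when both can produce the same output file.
    ruleorder: bamCoverage > bamCoverage2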
When possible, it's best practice to have rules generating unique outputs.
As an alternative, consider setting up the code like this:
from subprocess import call

rule all:
    input:
        "path/to/file/mySample.bw"
        #OR
        #"path/to/file/mySample_filtered.bw"

rule bamCoverage:
    input:
        bam = file1+"/{sample}.bam",
        bai = file1+"/{sample}.bam.bai"
    output:
        "bamCoverage/{sample}.bw"
    params:
        bw_binsize = bw_binsize,
        genome_size = int(genome_size),
        read_extension = "--extendReads"
    log:
        "bamCoverage/logs/bamCoverage.{sample}.log"
    benchmark:
        "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark"
    threads: 16
    run:
        callString = deepTools_path + "bamCoverage " \
            + "-b " + str(input.bam) \
            + " -o " + str(output) \
            + " --binSize " + str(params.bw_binsize) \
            + " -p " + str(threads) \
            + " --normalizeTo1x " + str(params.genome_size) \
            + " " + str(params.read_extension) \
            + " &> " + str(log)
        call(callString, shell=True)

rule filterBam:
    input:
        "{pathFB}/{sample}.bam"
    output:
        "{pathFB}/{sample}_filtered.bam"
    run:
        callString = "samtools view -bh -F 512 " + str(input) \
            + " > " + str(output)
        call(callString, shell=True)
Thoughts?

noflo 0.5.13 spreadsheet example broken?

I am new to noflo and am looking at examples in order to explore it. The spreadsheet example looked interesting, but I couldn't make it run. First, it takes some time and manual debugging to identify missing components; not a big deal, and I believe this will be improved in the future. For now, the error message I get is:
return process.component.outPorts[port].attach(socket);
^
TypeError: undefined is not a function
Apparently, before this, isAddressable() was also undefined. I checked against this SO issue, but I don't have noflo 0.4 as a dependency anywhere. I spent some time debugging it but was seemingly stuck, so I decided to post to SO.
The question is, what are the correct steps to run the spreadsheet example?
For reproduction, here is what I have done:
0) Install the following components:
noflo-adapters
noflo-core
noflo-couchdb
noflo-filesystem
noflo-groups
noflo-objects
noflo-packets
noflo-strings
noflo-tika
noflo-xml
i) Edit spreadsheet/parse.fbp, because the first error was:
throw new Error("No outport '" + port + "' defined in process " + proc
^
Error: No outport 'error' defined in process Read (Read() ERROR -> IN Display())
Apparently the couchdb ReadDocument component does not provide an Error outport; therefore I replaced ReadDocument with ReadFile:
18c18
< 'tika-app-0.9.jar' -> TIKA Read(ReadDocument)
---
> 'tika-app-0.9.jar' -> TIKA Read(ReadFile)
ii) At this point, I received the following:
if (process.component.outPorts[port].isAddressable()) {
^
TypeError: undefined is not a function
and improvised a stunt by checking whether isAddressable is defined at this location in the code:
@@ -259,9 +261,11 @@
throw new Error("No outport '" + port + "' defined in process " + process.id + " (" + (socket.getId()) + ")");
return;
}
- if (process.component.outPorts[port].isAddressable()) {
+ if (process.component.outPorts[port].isAddressable && process.component.outPorts[port].isAddressable()) {
return process.component.outPorts[port].attach(socket, index);
}
return process.component.outPorts[port].attach(socket);
};
and it fails either way. Again, the question is: what are the correct steps to run the spreadsheet example?
Thanks in advance.

Which processes can be launched using performTaskWithPathArgumentsTimeout function?

I use UIAutomation to automate an iPad application. I have tried to use
(object) performTaskWithPathArgumentsTimeout(path, args, timeout) to run Safari.app from my script:
var target = UIATarget.localTarget();
var host = target.host();
var result = host.performTaskWithPathArgumentsTimeout("/Applications/Safari.app", ["http://www.google.com"], 30);
UIALogger.logDebug("exitCode: " + result.exitCode);
UIALogger.logDebug("stdout: " + result.stdout);
UIALogger.logDebug("stderr: " + result.stderr);
I got the following results:
exitCode: 5
stdout:
stderr:
I’ve also tried to launch echo:
var target = UIATarget.localTarget();
var host = target.host();
var result = host.performTaskWithPathArgumentsTimeout("/bin/echo", ["Hello World"], 5);
UIALogger.logDebug("exitCode: " + result.exitCode);
UIALogger.logDebug("stdout: " + result.stdout);
UIALogger.logDebug("stderr: " + result.stderr);
Results:
exitCode: 0
stdout: Hello World
stderr:
So, it looks like performTaskWithPathArgumentsTimeout works only for specific applications.
Could you please help me to answer the following questions:
1. What does exitCode = 5 mean?
2. Which processes can be launched using performTaskWithPathArgumentsTimeout function?
1) Exit code 5 is most likely EIO, as defined in sys/errno.h: Input/Output error. You're attempting to execute "/Applications/Safari.app", which to the launching task is a directory and not a binary.
2) You can launch any application with performTaskWithPathArgumentsTimeout() that NSTask can launch. As long as it's a valid executable, it should work.
For your specific example though, Safari won't accept an argument passed on the command line like that as a URL to visit. You need to use open /Applications/Safari.app "http://www.google.com" instead:
var result = host.performTaskWithPathArgumentsTimeout("/usr/bin/open", ["/Applications/Safari.app", "http://www.google.com"], 30);