I am trying to pass in a parameter to Nextflow that I can use to turn a process on or off, to no avail.
Moreover, when I print the parameter in the log file the case always changes, which seems odd to me (i.e., TRUE turns to true). I have tried setting the conditional statement to match "TRUE" or "true", given this behavior, but neither seems to work.
Here is some code to illustrate the issue.
params.force = "FALSE"
params.in = 1
log.info """\
Force: $params.force
"""
.stripIndent()
process tester {
input:
val x from params.in
output:
stdout testerOut
when:
params.force == "TRUE"
script:
"""
echo "foo"
"""
}
testerOut.view()
If this file is saved as testnf and run with the command nextflow run testnf --force "TRUE", the process will not run. The output is:
N E X T F L O W ~ version 21.10.0
Launching testnf [soggy_lorenz] - revision: a7399aad3c
Force: true
[- ] process > tester -
The goal is for users to pass in parameters that turn off or on certain processes. This seems like a common use case, but I am stuck. Cheers for any help!
Pallie is correct:
Nextflow automatically converts the --force TRUE param to a boolean
This is because the Nextflow command-line parameter values TRUE and FALSE (case insensitive) are special and will return Boolean.TRUE and Boolean.FALSE, respectively. When you print (or log) these parameters, you are really just accessing a Boolean that represents the special truth values: true and false.
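As a rough illustration of the coercion described above, here is a toy Python model. It is not Nextflow's actual implementation, and the function name coerce_cli_value is made up for this sketch. It shows why comparing the parsed value against the string "TRUE" can never succeed:

```python
def coerce_cli_value(raw):
    """Toy model of the coercion described above.

    Assumption: only the special values true/false (any case) are modelled;
    every other value is returned unchanged as a string.
    """
    lowered = raw.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    return raw


force = coerce_cli_value("TRUE")
print(type(force).__name__, force)   # bool True
print(force == "TRUE")               # False: a boolean never equals a string
```

The same reasoning applies to a guard such as params.force == "TRUE": once the value is a Boolean, the string comparison is always false.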
Be aware that parameters defined inside your Nextflow script will not be coerced. For example, setting params.force = "FALSE" inside your script will just give you a plain old java.lang.String if --force is not specified on the command line. The problem is that if you do specify --force on the command line, you'll get a different type: a java.lang.Boolean. The solution is to set params.force to a Boolean value inside your script:
params.force = false
println("Force: ${params.force}")
process test {
echo true
when:
params.force
"""
echo "foo"
"""
}
Some tests, with expected results:
$ nextflow run test.nf
N E X T F L O W ~ version 21.04.3
Launching `test.nf` [clever_mccarthy] - revision: e7a5148ea1
Force: false
[- ] process > test -
$ nextflow run test.nf --force
N E X T F L O W ~ version 21.04.3
Launching `test.nf` [distraught_jepsen] - revision: e7a5148ea1
Force: true
executor > local (1)
[d0/c9a3d2] process > test [100%] 1 of 1 ✔
foo
$ nextflow run test.nf --force FALSE
N E X T F L O W ~ version 21.04.3
Launching `test.nf` [astonishing_feynman] - revision: e7a5148ea1
Force: false
[- ] process > test -
$ nextflow run test.nf --force True
N E X T F L O W ~ version 21.04.3
Launching `test.nf` [sleepy_sammet] - revision: e7a5148ea1
Force: true
executor > local (1)
[b5/9fac0f] process > test [100%] 1 of 1 ✔
foo
This last example shows that non-empty strings are coerced to true according to Groovy truth:
$ nextflow run test.nf --force foobar
N E X T F L O W ~ version 21.04.3
Launching `test.nf` [trusting_hilbert] - revision: e7a5148ea1
Force: foobar
executor > local (1)
[d1/c1b190] process > test [100%] 1 of 1 ✔
foo
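Python's truthiness rule for strings matches Groovy's on this point (any non-empty string is truthy, the empty string is falsy), so the pitfall can be sketched in Python. This is only an analogy, not Nextflow code:

```python
# Analogy only: Python, like Groovy, treats every non-empty string as truthy.
candidates = ["foobar", "FALSE", "", True, False]

for value in candidates:
    # bool(value) plays the role of a bare `when: params.force` guard.
    print(repr(value), "->", bool(value))

# Note that the *string* "FALSE" is truthy; only the boolean False (or the
# empty string) would keep the guarded step from running.
```

This is why a string default such as "FALSE" in the script behaves differently from the Boolean false.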
Nextflow automatically converts the --force TRUE param to a boolean, so just change it to:
params.force = "FALSE"
params.in = 1
log.info """\
Force: $params.force
"""
.stripIndent()
process tester {
input:
val x from params.in
output:
stdout testerOut
when:
params.force
script:
"""
echo "foo"
"""
}
testerOut.view()
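Output (note that the string default "FALSE" is itself truthy in Groovy, so the process runs even without --force; set the default to the Boolean false if the process should be off by default):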
$ ~/nextflow run main.nf
N E X T F L O W ~ version 21.10.6
Launching `main.nf` [high_hamilton] - revision: cac46af672
Force: FALSE
executor > local (1)
[1e/5cb38e] process > tester [100%] 1 of 1 ✔
foo
I have files with identical names but in different folders. Nextflow stages these files into the same work directory resulting in name collisions. My question is how to deal with that without renaming the files. Example:
# Example data
mkdir folder1 folder2
echo 1 > folder1/file.txt
echo 2 > folder2/file.txt
# We read from samplesheet
$ cat samplesheet.csv
sample,file
sample1,/home/atpoint/foo/folder1/file.txt
sample1,/home/atpoint/foo/folder2/file.txt
# Nextflow main.nf
#! /usr/bin/env nextflow
nextflow.enable.dsl=2
// Read samplesheet and group files by sample (first column)
samplesheet = Channel
.fromPath(params.samplesheet)
.splitCsv(header:true)
.map {
sample = it['sample']
file = it['file']
tuple(sample, file)
}
ch_samplesheet = samplesheet.groupTuple(by:0)
// That creates a tuple like:
// [sample1, [/home/atpoint/foo/folder1/file.txt, /home/atpoint/foo/folder2/file.txt]]
// Dummy process that stages both files into the same work directory folder
process PRO {
input:
tuple val(samplename), path(files)
output:
path("out.txt")
script:
"""
echo $samplename with files $files > out.txt
"""
}
workflow { PRO(ch_samplesheet) }
# Run it
NXF_VER=21.10.6 nextflow run main.nf --samplesheet $(realpath samplesheet.csv)
...obviously resulting in:
N E X T F L O W ~ version 21.10.6
Launching `main.nf` [adoring_jennings] - revision: 87f26fa90b
[- ] process > PRO -
Error executing process > 'PRO (1)'
Caused by:
Process `PRO` input file name collision -- There are multiple input files for each of the following file names: file.txt
So, what now? The real world application here is sequencing replicates of the same fastq file, which then have the same name, but are in different folders, and I want to feed them into a process that merges them. I am aware of this section in the docs but cannot say that any of it was helpful or that I understand it properly.
You can use the stageAs option of the path input qualifier in your process definition.
#! /usr/bin/env nextflow
nextflow.enable.dsl=2
samplesheet = Channel
.fromPath(params.samplesheet)
.splitCsv(header:true)
.map {
// "def" keeps these variables local to the closure instead of global
def sample = it['sample']
def file = it['file']
tuple(sample, file)
}
.groupTuple()
.set { ch_samplesheet }
// [sample1, [/path/to/folder1/file.txt, /path/to/folder2/file.txt]]
process PRO {
input:
tuple val(samplename), path(files, stageAs: "?/*")
output:
path("out.txt")
script:
def input_str = files instanceof List ? files.join(" ") : files
"""
cat ${input_str} > out.txt
"""
}
workflow { PRO(ch_samplesheet) }
See an example from nf-core and the path input type docs
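To see why a pattern such as "?/*" avoids the collision, here is a minimal Python sketch of the idea: each input file is placed in its own numbered subdirectory, so identical base names no longer clash. It illustrates the concept only and is not Nextflow's staging code; stage_into_numbered_dirs is a hypothetical helper, and it copies files where Nextflow would normally link them.

```python
import shutil
import tempfile
from pathlib import Path


def stage_into_numbered_dirs(files, work_dir):
    """Copy each file to <work_dir>/<index>/<original name>, index starting at 1."""
    staged = []
    for index, source in enumerate(files, start=1):
        target_dir = Path(work_dir) / str(index)
        target_dir.mkdir(parents=True, exist_ok=True)
        staged.append(Path(shutil.copy(source, target_dir)))
    return staged


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Recreate the example data: two files with the same name in different folders.
    inputs = []
    for folder, content in (("folder1", "1\n"), ("folder2", "2\n")):
        (root / folder).mkdir()
        path = root / folder / "file.txt"
        path.write_text(content)
        inputs.append(path)

    staged = stage_into_numbered_dirs(inputs, root / "work")
    relative = [p.relative_to(root / "work").as_posix() for p in staged]
    merged = "".join(p.read_text() for p in staged)
    print(relative)        # ['1/file.txt', '2/file.txt']
    print(repr(merged))    # '1\n2\n'
```

Because the files now live at distinct relative paths, a command such as cat can take all of them as arguments without any renaming.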
How do I allow a process to take an input from either one of two channels that are outputs of processes with mutually exclusive conditions for running? For example, something like:
params.condition = false
process a {
output:
path "a.out" into a_into_c
when:
params.condition == true
"""
touch a.out
"""
}
process b {
output:
path "b.out" into b_into_c
when:
params.condition == false
"""
touch b.out
"""
}
process c {
publishDir baseDir, mode: 'copy'
input:
path foo from a_into_c or b_into_c
output:
path "final.out"
"""
echo $foo > final.out
"""
}
where final.out will contain a.out if params.condition is true (e.g. --condition is given on the command line), and b.out if it is false.
You can use the mix operator for this:
process c {
publishDir baseDir, mode: 'copy'
input:
path foo from a_into_c.mix(b_into_c)
output:
path "final.out"
"""
echo $foo > final.out
"""
}
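Conceptually, mix merges the items of both channels into one, and because only one of the two upstream processes ever runs, the other channel contributes nothing. A small Python sketch of that idea, with plain lists standing in for channels (an analogy only; unlike itertools.chain, mix makes no ordering guarantee):

```python
from itertools import chain


def run_upstream(condition):
    """Lists stand in for the a_into_c and b_into_c channels."""
    a_into_c = ["a.out"] if condition else []      # process a runs only when true
    b_into_c = ["b.out"] if not condition else []  # process b runs only when false
    return a_into_c, b_into_c


for condition in (True, False):
    a_into_c, b_into_c = run_upstream(condition)
    mixed = list(chain(a_into_c, b_into_c))
    print(condition, mixed)
# True ['a.out']
# False ['b.out']
```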
I am attempting to merge x number of bam files produced via performing multiple alignments at once (on batches of y number of fastq files) into one single bam file in Nextflow.
So far I have the following when performing the alignment and sorting/indexing the resulting bam file:
//Run minimap2 on concatenated fastqs
process miniMap2Bam {
publishDir "$params.bamDir"
errorStrategy 'retry'
cache 'deep'
maxRetries 3
maxForks 10
memory { 16.GB * task.attempt }
input:
val dirString from dirStr
val runString from stringRun
each file(batchFastq) from fastqBatch.flatMap()
output:
val runString into stringRun1
file("${batchFastq}.bam") into bamFiles
val dirString into dirStrSam
script:
"""
minimap2 --secondary=no --MD -2 -t 10 -a $params.genome ${batchFastq} | samtools sort -o ${batchFastq}.bam
samtools index ${batchFastq}.bam
"""
}
Where ${batchFastq}.bam is a bam file containing a batch of y number of fastq files.
This pipeline completes just fine. However, when attempting to perform samtools merge on these bam files in another process (samToolsMerge), the process runs once per alignment (in this case, 4 times) instead of once for all collected bam files:
//Run samtools merge
process samToolsMerge {
echo true
publishDir "$dirString/aligned_minimap/", mode: 'copy', overwrite: 'false'
cache 'deep'
errorStrategy 'retry'
maxRetries 3
maxForks 10
memory { 14.GB * task.attempt }
input:
val runString from stringRun1
file bamFile from bamFiles.collect()
val dirString from dirStrSam
output:
file("**")
script:
"""
samtools merge ${runString}.bam ${bamFile}
"""
}
With the output being:
executor > lsf (9)
[49/182ec0] process > catFastqs (1) [100%] 1 of 1 ✔
[- ] process > nanoPlotSummary -
[0e/609a7a] process > miniMap2Bam (1) [100%] 4 of 4 ✔
[42/72469d] process > samToolsMerge (2) [100%] 4 of 4 ✔
Completed at: 04-Mar-2021 14:54:21
Duration : 5m 41s
CPU hours : 0.2
Succeeded : 9
How can I take just the resulting bam files from miniMap2Bam and run them through samToolsMerge a single time, instead of the process running multiple times?
Thanks in advance!
EDIT:
Thanks to Pallie in the comments below, the issue was feeding the runString and dirString values from a prior process into miniMap2Bam and then samToolsMerge, causing the process to repeat itself each time a value was passed on.
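The effect can be sketched with a toy Python model. It rests on a stated assumption rather than on Nextflow internals: queue-like inputs are consumed one item per task, value-like inputs (such as the result of collect()) are reused by every task, and a process whose inputs are all value-like runs exactly once. The helper plan_tasks is hypothetical.

```python
def plan_tasks(queue_inputs, value_inputs):
    """Return the input tuple of every task the toy process would launch.

    Assumption of this model: queue-like inputs are consumed one item per
    task, value-like inputs are reused by every task, and a process whose
    inputs are all value-like runs exactly once.
    """
    shared = tuple(value_inputs)
    if not queue_inputs:
        return [shared]
    return [items + shared for items in zip(*queue_inputs)]


bams = ["1.bam", "2.bam", "3.bam", "4.bam"]
collected = [bams]  # stands in for bamFiles.collect(): one item holding all files

# Before: runString and dirString arrive once per alignment (4 items each).
before = plan_tasks([["run"] * 4, ["dir"] * 4], collected)
# After: only the collected bam list is an input.
after = plan_tasks([], collected)

print(len(before), len(after))   # 4 1
```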
The solution was as simple as removing the vals from miniMap2Bam (as follows):
//Run minimap2 on concatenated fastqs
process miniMap2Bam {
errorStrategy 'retry'
cache 'deep'
maxRetries 3
maxForks 10
memory { 16.GB * task.attempt }
input:
each file(batchFastq) from fastqBatch.flatMap()
output:
file("${batchFastq}.bam") into bamFiles
script:
"""
minimap2 --secondary=no --MD -2 -t 10 -a $params.genome ${batchFastq} | samtools sort -o ${batchFastq}.bam
samtools index ${batchFastq}.bam
"""
}
The simplest fix would probably be to stop passing the static dirString and runString values around via channels:
// Instead of a hardcoded path use a parameter you passed via CLI like you did with bamDir
dirString = file("/path/to/fastqs/")
runString = file("/path/to/fastqs/").getParent()
fastqBatch = Channel.from("/path/to/fastqs/")
//Run minimap2 on concatenated fastqs
process miniMap2Bam {
publishDir "$params.bamDir"
errorStrategy 'retry'
cache 'deep'
maxRetries 3
maxForks 10
memory { 16.GB * task.attempt }
input:
each file(batchFastq) from fastqBatch.flatMap()
output:
file("${batchFastq}.bam") into bamFiles
script:
"""
minimap2 --secondary=no --MD -2 -t 10 -a $params.genome ${batchFastq} | samtools sort -o ${batchFastq}.bam
samtools index ${batchFastq}.bam
"""
}
//Run samtools merge
process samToolsMerge {
echo true
publishDir "$dirString/aligned_minimap/", mode: 'copy', overwrite: 'false'
cache 'deep'
errorStrategy 'retry'
maxRetries 3
maxForks 10
memory { 14.GB * task.attempt }
input:
file bamFile from bamFiles.collect()
output:
file("**")
script:
"""
samtools merge ${runString}.bam ${bamFile}
"""
}
I have a .gitlab-ci.yml file from which I invoke a Python script like this:
- /usr/bin/python3.6 file.py
This file.py script returns either True or False.
In pseudocode I'm trying to do:
- run file.py
- if True: do x
- else do y
How can I achieve this in GitLab CI?
Thanks
How does it return True or False? By writing to stdout?
If so, then:
- RES=$(/usr/bin/python3.6 file.py)
- |
  if [[ "$RES" == "True" ]]; then
    ...
  else
    ...
  fi
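For the $(...) capture to work, file.py has to print its result to stdout rather than merely return it from a function. Below is a minimal sketch of such a script and of the capture, done from Python so the demonstration is self-contained. The check() function and its body are placeholders, not the asker's code:

```python
import subprocess
import sys

# Hypothetical stand-in for file.py: it prints "True" or "False" to stdout.
FILE_PY = """
def check():
    return 2 + 2 == 4  # placeholder for the real decision

print(check())
"""

# Equivalent of RES=$(python file.py): run the script and capture its stdout.
completed = subprocess.run(
    [sys.executable, "-c", FILE_PY],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
res = completed.stdout.strip()

if res == "True":
    branch = "do x"
else:
    branch = "do y"
print(res, "->", branch)   # True -> do x
```

An alternative design is to have the script signal the result through its exit code, which lets the shell branch on it directly without capturing any output.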
How to check response coming from a command while using SSHOperator?
t1 = SSHOperator(ssh_conn_id='conn_box2',
task_id='t1',
command='Rscript /code/demo.R',
do_xcom_push=True,
response_check=lambda response: True if "status:200" in response.text else False,
dag=dag
)
My R script returns status:200 if the execution goes well, and I want to track that. My task t1 should only complete if the status is 200.
If the R script returns status:300, the run has failed. But since the command itself completes without any error, the task turns green in the UI (which I don't want).
The code above is able to capture the response in XCom, but how do I validate it?
Try the following code:
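bash_command = """
# Do not use "set -e" here: it would abort the script as soon as grep
# fails, before the else branch could report the failure.
Rscript /code/demo.R | grep 'status:200' > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo "Task Successful"
else
echo "Task Failed"
exit 1
fi
"""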
t1 = SSHOperator(ssh_conn_id='conn_box2',
task_id='t1',
command=bash_command,
dag=dag)
Alternatively, you can also use the following bash_command:
if Rscript /code/demo.R | grep -q 'status:200'; then
echo "Task Successful"
else
echo "Task Failed"
exit 1
fi
The SSHOperator does not have a response_check parameter.
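The grep-based check can also be expressed in plain Python, for example in a separate task that receives the command's output. The sketch below is framework-agnostic: require_status_200 is a hypothetical helper, and how the output text reaches it (for instance via XCom) is deliberately left out because it depends on the Airflow version.

```python
def require_status_200(output):
    """Raise if the command output does not contain the success marker."""
    if "status:200" not in output:
        raise ValueError("Task Failed: 'status:200' not found in command output")
    return "Task Successful"


print(require_status_200("loading data\nstatus:200\n"))   # Task Successful

try:
    require_status_200("loading data\nstatus:300\n")
except ValueError as err:
    print(err)   # Task Failed: 'status:200' not found in command output
```

Raising an exception is the usual way for a Python callable to make its task fail, which mirrors what exit 1 does in the shell version.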
Airflow is unable to interpret exit command
[2021-09-07 06:36:58,164] {ssh.py:142} INFO - ps_count is 23, There might be some processes are running
[2021-09-07 06:36:58,169] {taskinstance.py:1455} ERROR - SSH operator error: error running cmd:
code:
set -e;
ps_count=jpsexec | grep -v execute | wc -l
if [ $ps_count -ne 0 ]
then
echo "ps_count is $ps_count, There might be some processes are running"
exit 1
else
echo "All processed were stopped..!"
fi