handle input from multiple processes example in nextflow dsl2

handle input from multiple processes example in nextflow dsl2 - nextflow

How do I go about defining a workflow that executes two initial processes in parallel and then handles both outputs of those processes in a third process? The simple examples I was able to find in tutorials always define sequential flows and often use stdin/stdout to transport information.
In order to illustrate what I want to achieve, here is a diagram of the DAG that I imagine:
I imagine the .nf file to look something like the following but cannot fill in the blanks
#!/usr/bin/env nextflow
nextflow.enable.dsl=2
process produceRandomX {
output:
x // what do I put in these places?
"""
print $RANDOM
"""
}
process produceRandomY {
output:
y
"""
print -$RANDOM
"""
}
process calculateSum {
input:
x
y
output:
sum
"""
print $x+$y
"""
}
process printResult {
input:
sum
output:
stdout
"""
print $sum
"""
}
workflow {
// syntax here is broken and just meant to illustrate what I want to achieve
produceRandomX \
| calculateSum | printResult
produceRandomY /
}

Nextflow processes can define one or more input and output channels. The interaction between these, and ultimately the pipeline execution itself, is implicitly defined by these input and output declarations1. In the following example, produceRandomX and produceRandomY could be run in parallel (assuming there are sufficient system resources available) but even if they're not, calculateSum will wait until it receives a complete input configuration (i.e. until it receives a value from each input channel).
process produceRandomX {
output:
stdout
"""
echo "\$RANDOM"
"""
}
process produceRandomY {
output:
stdout
"""
echo "-\$RANDOM"
"""
}
process calculateSum {
input:
val x
val y
output:
stdout
"""
echo \$(( $x + $y ))
"""
}
workflow {
x = produceRandomX()
y = produceRandomY()
sum = calculateSum( x, y )
sum.view()
}
Results:
$ nextflow run -ansi-log false main.nf
N E X T F L O W ~ version 22.10.0
Launching `main.nf` [lonely_torvalds] DSL2 - revision: 3b752ca2cc
[07/8e1473] Submitted process > produceRandomX
[fd/d9a629] Submitted process > produceRandomY
[15/c88aac] Submitted process > calculateSum
-1331

Related

Nextflow DSL2: how to combine outputs (channels) from multiple processes into input of another process by part of filename?

I'm trying to combine outputs from two separate processes A and B, where each of them outputs multiple files, into input of process C. All file names have in common a chromosome number(for example "chr1"). The process A outputs files: /path/chr1_qc.vcf.gz, /path/chr2_qc.vcf.gz and etc (genotype files).
Process B outputs files: /path/chr1.a.bcf, /path/chr1.b.bcf, /path/chr1.c.bcf.../path/chr2.a.bcf, /path/chr2.b.bcf and etc (region files). And the number of both file-sets could vary each time.
Part of the code:
process A {
module "bcftools/1.16"
publishDir "${params.out_dir}", mode: 'copy', overwrite: true
input:
path vcf
path tbi
output:
path ("${(vcf =~ /chr\d{1,2}/)[0]}_qc.vcf.gz")
script:
"""
bcftools view -R ${params.sites_list} -Oz -o ${(vcf =~ /chr\d{1,2}/)[0]}_qc.vcf.gz ${vcf} //generates QC-ed genome files
tabix -f ${(vcf =~ /chr\d{1,2}/)[0]}_qc.vcf.gz //indexing QC-ed genomes
"""
}
process B {
publishDir "${params.out_dir}", mode: 'copy', overwrite: true
input:
path(vcf)
output:
tuple path("${(vcf =~ /chr\d{1,2}/)[0]}.*.bed")
script:
"""
python split_chr.py ${params.chr_lims} ${vcf} //generates region files
"""
}
process C {
publishDir "${params.out_dir}", mode: 'copy', overwrite: true
input:
tuple path(vcf), path(bed)
output:
path "${bed.SimpleName}.vcf.gz"
script:
"""
bcftools view -R ${bed} -Oz -o ${bed.SimpleName}.vcf.gz ${vcf}
"""
}
workflow {
A(someprocess.out)
B(A.out)
C(combined_AB_files)
}
Process B output.view() output:
[/path/chr1.a.bed, /path/chr1.b.bed]
[/path/chr2.a.bed, /path/chr2.b.bed]
How can I get the process C to receive an input as a channel of tuples (A and B outputs combined by chromosome name) like this:
[ /path/chr1_qc.vcf.gz, /path/chr1.a.bcf ]
[ /path/chr1_qc.vcf.gz, /path/chr1.b.bcf ]
...
[ /path/chr2_qc.vcf.gz, /path/chr2.a.bcf ]
...

I think what you want is the second form of the combine operator, which allows you to combine items that share a matching key using the by parameter. If one or more of your channels are missing a shared key in the first element, you can just use the map operator to produce such a key. To get the desired output, use the transpose operator and specify the index (zero based) of the element to be transposed, again using the by parameter. For example:
workflow {
Channel
.fromPath( './data/*.bed' )
.map { tuple( it.simpleName, it ) }
.groupTuple()
.set { bed_files }
Channel
.fromPath( './data/*_qc.vcf.gz' )
.map { tuple( it.simpleName - ~/_qc$/, it ) }
.combine( bed_files, by: 0 )
.transpose( by: 2 )
.map { chrom, vcf, bed -> tuple( vcf, bed ) }
.view()
}
Results:
$ touch ./data/chr{1..3}.{a..c}.bed
$ touch ./data/chr{1..3}_qc.vcf.gz
$ nextflow run main.nf
N E X T F L O W ~ version 22.10.0
Launching `main.nf` [pedantic_woese] DSL2 - revision: 9c5abfca90
[/data/chr1_qc.vcf.gz, /data/chr1.c.bed]
[/data/chr1_qc.vcf.gz, /data/chr1.a.bed]
[/data/chr1_qc.vcf.gz, /data/chr1.b.bed]
[/data/chr2_qc.vcf.gz, /data/chr2.b.bed]
[/data/chr2_qc.vcf.gz, /data/chr2.a.bed]
[/data/chr2_qc.vcf.gz, /data/chr2.c.bed]
[/data/chr3_qc.vcf.gz, /data/chr3.c.bed]
[/data/chr3_qc.vcf.gz, /data/chr3.b.bed]
[/data/chr3_qc.vcf.gz, /data/chr3.a.bed]
Note that when two or more queue channels are declared as process inputs (like in your process A), the process will block until it receives a value from each input channel. As these are run in parallel and asynchronously, there's no guarantee that items will be emitted in the order that they were received. This can result in mix-ups, where for example, you unexpectedly end up with an index file that belongs to another VCF. Most of the time, what you want is one queue channel and one or more value channels. The section in the docs on multiple input channels explains this quite well in my opinion, and is well worth the time reading if you haven't already. Also, joining and combining channels becomes a lot easier when your processes define tuples in their input and output declarations, where the first element is a key, like a sample name/id. I think you want something something like the following:
params.vcf_files = './data/*.vcf.gz{,.tbi}'
params.sites_list = './data/sites.tsv'
params.chr_lims = './data/file.txt'
params.outdir = './results'
process proc_A {
tag "${sample}: ${indexed_vcf.first()}"
publishDir "${params.outdir}/proc_A", mode: 'copy', overwrite: true
module "bcftools/1.16"
input:
tuple val(sample), path(indexed_vcf)
path sites_list
output:
tuple val(sample), path("${sample}_qc.vcf.gz{,.tbi}")
script:
def vcf = indexed_vcf.first()
"""
bcftools view \\
-R "${sites_list}" \\
-Oz \\
-o "${sample}_qc.vcf.gz" \\
"${vcf}"
bcftools index \\
-t \\
"${sample}_qc.vcf.gz"
"""
}
process proc_B {
tag "${sample}: ${indexed_vcf.first()}"
publishDir "${params.outdir}/proc_B", mode: 'copy', overwrite: true
input:
tuple val(sample), path(indexed_vcf)
path chr_lims
output:
tuple val(sample), path("*.bed")
script:
def vcf = indexed_vcf.first()
"""
split_chr.py "${chr_lims}" "${vcf}"
"""
}
process proc_C {
tag "${sample}: ${indexed_vcf.first()}: ${bed.name}"
publishDir "${params.outdir}/proc_C", mode: 'copy', overwrite: true
input:
tuple val(sample), path(indexed_vcf), path(bed)
output:
tuple val(sample), path("${bed.simpleName}.vcf.gz")
script:
def vcf = indexed_vcf.first()
"""
bcftools view \\
-R "${bed}" \\
-Oz \\
-o "${bed.simpleName}.vcf.gz" \\
"${vcf}"
"""
}
workflow {
vcf_files = Channel.fromFilePairs( params.vcf_files )
sites_list = file( params.sites_list )
chr_lims = file( params.chr_lims )
proc_A( vcf_files, sites_list )
proc_B( proc_A.out, chr_lims )
proc_A.out \
| combine( proc_B.out, by: 0 ) \
| map { sample, indexed_vcf, bed_files ->
bed_list = bed_files instanceof Path ? [bed_files] : bed_files
tuple( sample, indexed_vcf, bed_list )
} \
| transpose( by: 2 ) \
| proc_C \
| view()
}
The above should produce results like:
$ nextflow run main.nf
N E X T F L O W ~ version 22.10.0
Launching `main.nf` [mighty_elion] DSL2 - revision: 5ea25ae72c
executor > local (15)
[b4/08df9d] process > proc_A (foo: foo.vcf.gz) [100%] 3 of 3 ✔
[93/55e467] process > proc_B (foo: foo_qc.vcf.gz) [100%] 3 of 3 ✔
[8b/cd7193] process > proc_C (foo: foo_qc.vcf.gz: b.bed) [100%] 9 of 9 ✔
[bar, ./work/90/53b9c6468ca54bb0f4eeb99ca82eda/a.vcf.gz]
[bar, ./work/24/cca839d5f63ee6988ead96dc9fbe1d/b.vcf.gz]
[bar, ./work/6f/61e1587134e68d2e358998f61f6459/c.vcf.gz]
[baz, ./work/f8/1484e94b9187ba6aae81d68f0a18cf/b.vcf.gz]
[baz, ./work/9c/20578262f5a2c13c6c3b566dc7b7d8/c.vcf.gz]
[baz, ./work/f5/3405b54f81f6f500a3ee4a78f5e6df/a.vcf.gz]
[foo, ./work/39/945fb0d3f375260e75afbc9caebc5d/a.vcf.gz]
[foo, ./work/de/cecd94ff39f204e799cb8e4c4ad46f/c.vcf.gz]
[foo, ./work/8b/cd7193107f6be5472d2e29982e3319/b.vcf.gz]
Also note that third party scripts, like your Python script, can be moved to a folder called bin in the root of your project repository (i.e. the same directory as your main.nf). And if you make your script executable, you will be able to invoke "as-is", i.e. without the need for an absolute path to it.

This can be done with channel operators. Check the code below, with some comments:
workflow {
// Let's start by building channels similar to the ones you described
Channel
.of(file('/path/chr1_qc.vcf.gz'), file('/path/chr2_qc.vcf.gz'))
.set { pAoutput}
Channel
.of(file('/path/chr1.a.bcf'), file('/path/chr1.b.bcf'), file('/path/chr1.c.bcf'),
file('/path/chr2.a.bcf'), file('/path/chr2.b.bcf'), file('/path/chr2.c.bcf'))
.set { pBoutput }
// Now, let's create keys to relate the elements in the two channels
pAoutput
.map { filepath -> [filepath.name.tokenize('_')[0], filepath ] }
.set { pAoutput_tuple }
// The channel now looks like this:
// [chr1, /path/chr1_qc.vcf.gz]
// [chr2, /path/chr2_qc.vcf.gz]
pBoutput
.map { filepath -> [filepath.name.tokenize('.')[0], filepath ] }
.set { pBoutput_tuple }
// And:
// [chr1, /path/chr1.a.bcf]
// [chr1, /path/chr1.b.bcf]
// [chr1, /path/chr1.c.bcf]
// [chr2, /path/chr2.a.bcf]
// [chr2, /path/chr2.b.bcf]
// [chr2, /path/chr2.c.bcf]
// Combine the two channels and group by key
pAoutput_tuple
.mix(pBoutput_tuple)
.groupTuple()
.flatMap { chrom, path_list ->
path_list.split {
it.name.endsWith('.vcf.gz')
}.combinations()
}
.view()
}
You can check the output below:
N E X T F L O W ~ version 22.10.4
Launching `ex.nf` [maniac_pike] DSL2 - revision: f87873ef13
[/path/chr1_qc.vcf.gz, /path/chr1.a.bcf]
[/path/chr1_qc.vcf.gz, /path/chr1.b.bcf]
[/path/chr1_qc.vcf.gz, /path/chr1.c.bcf]
[/path/chr2_qc.vcf.gz, /path/chr2.a.bcf]
[/path/chr2_qc.vcf.gz, /path/chr2.b.bcf]
[/path/chr2_qc.vcf.gz, /path/chr2.c.bcf]

nextflow input and output a tuple with keys

I am processing file using Nextflow, that have a sample Id and would like to carry this sampleID across processes, so im using tuples. The relevant snippet of the code is here:
process 'rsem_quant' {
input:
val genome from params.genome
tuple val(sampleId), file(read1), file(read2) from samples_ch
output:
tuple sampleId , path "${sampleId}.genes.results" into rsem_ce
script:
"""
module load RSEM
rsem-calculate-expression --star --keep-intermediate-files \
--sort-bam-by-coordinate --star-output-genome-bam --strandedness reverse \
--star-gzipped-read-file --paired-end $genome \
$read1 $read2 $sampleId
"""
The problem is that when using a tuple as an output, I get the following error:
No such variable: sampleId
If I remove the tuple, and just output either part (sampleId, or the path) it works fine, any help is appreciated

I was unable to reproduce the error with the code supplied. I suspect your output block needs to define the output type val for the 'sampleId' variable:
output:
tuple val(sampleId) , path("${sampleId}.genes.results") into rsem_ce
A minimal example to run RSEM on paired-end reads (using Conda) might look like:
nextflow.enable.dsl=2
params.ref_name = 'GRCh38_GENCODE_v31'
params.ref_fasta = 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/GRCh38.primary_assembly.genome.fa.gz'
params.ref_gtf = 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.primary_assembly.annotation.gtf.gz'
params.strandedness = 'reverse'
include { gunzip as gunzip_fasta } from './gzip.nf'
include { gunzip as gunzip_gtf } from './gzip.nf'
process 'rsem_prepare_ref' {
conda 'rsem star samtools'
input:
val ref_name
path ref_fasta
path ref_gtf
output:
path "${ref_name}"
"""
mkdir "${ref_name}"
rsem-prepare-reference \\
--gtf "${ref_gtf}" \\
--star \\
"${ref_fasta}" \\
"${ref_name}/${ref_name}"
"""
}
process 'rsem_calculate_expression' {
tag { sample }
conda 'rsem star samtools'
input:
tuple val(sample), path(reads)
path ref_name
output:
tuple val(sample), path("${sample}.genes.results")
script:
def (read1, read2) = reads
"""
rsem-calculate-expression \\
--star \\
--sort-bam-by-coordinate \\
--star-output-genome-bam \\
--strandedness "${params.strandedness}" \\
--star-gzipped-read-file \\
--paired-end \\
"${read1}" \\
"${read2}" \\
"${ref_name}/${ref_name}" \\
"${sample}"
"""
}
workflow {
reads = Channel.fromFilePairs( './data/*_{1,2}.fastq.gz' )
ref_fasta = gunzip_fasta( params.ref_fasta )
ref_gtf = gunzip_gtf( params.ref_gtf )
rsem_prepare_ref( params.ref_name, ref_fasta, ref_gtf )
rsem_calculate_expression( reads, rsem_prepare_ref.out )
}
Contents of gzip.nf:
process gunzip {
tag { gzfile.name }
input:
path gzfile
output:
path "${gzfile.getBaseName()}"
when:
gzfile.getExtension() == "gz"
"""
gzip -dc "${gzfile}" > "${gzfile.getBaseName()}"
"""
}
Run using:
nextflow run test.nf -resume -ansi-log false
Results:
N E X T F L O W ~ version 21.04.3
Launching `main.nf` [awesome_poincare] - revision: 51040c89cc
[cf/ffec1a] Cached process > gunzip_fasta (GRCh38.primary_assembly.genome.fa.gz)
[ce/b7a04b] Cached process > gunzip_gtf (gencode.v38.primary_assembly.annotation.gtf.gz)
[f1/bcb8e3] Cached process > rsem_prepare_ref
[de/f7906e] Submitted process > rsem_calculate_expression (HBR_Rep2)
[1e/3984da] Submitted process > rsem_calculate_expression (UHR_Rep1)
[59/907f56] Submitted process > rsem_calculate_expression (UHR_Rep3)
[26/41db23] Submitted process > rsem_calculate_expression (HBR_Rep1)
[e8/2c98fe] Submitted process > rsem_calculate_expression (UHR_Rep2)
[03/bbb42b] Submitted process > rsem_calculate_expression (HBR_Rep3)

Extract sample ids from nextflow fromPath()

I am new to nextflow and here is a practice that I wanted to test for a real job.
#!/usr/bin/env nextflow
params.cns = '/data1/deliver/phase2/CNVkit/*.cns'
cns_ch = Channel.fromPath(params.cns)
cns_ch.view()
The output of this script is:
N E X T F L O W ~ version 21.04.0
Launching `cnvkit_call.nf` [festering_wescoff] - revision: 886ab3cf13
/data1/deliver/phase2/CNVkit/002-002_L4_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/015-002_L4.SSHT89_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/004-005_L1_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/018-008_L1.SSHT31_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/003-002_L3_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/002-004_L6_sorted_dedup.cns
Here 002-002, 015-002, 004-005 etc are sample ids. I am trying to write a simple process to output a file such as ${sample.id}_sorted_dedup.calls.cns but I am not sure how to extract these ids and output it.
process cnvcalls {
input:
file(cns_file) from cns_ch
output:
file("${sample.id}_sorted_dedup.calls.cns") into cnscalls_ch
script:
"""
cnvkit.py call ${cns_file} -o ${sample.id}_sorted_dedup.calls.cns
"""
}
How to revise the process cnvcalls to make it work with sample.id?

There's lots of ways to extract the sample names/ids from filenames. One way could be to split on the underscore and take the first element:
params.cns = '/data1/deliver/phase2/CNVkit/*.cns'
cns_ch = Channel.fromPath(params.cns)
process cnvcalls {
input:
path(cns_file) from cns_ch
output:
path("${sample_id}_sorted_dedup.calls.cns") into cnscalls_ch
script:
sample_id = cns_file.name.split('_')[0]
"""
cnvkit.py call "${cns_file}" -o "${sample_id}_sorted_dedup.calls.cns"
"""
}
Though, my preference would be to input the sample name/id alongside the input file using a tuple:
params.cns = '/data1/deliver/phase2/CNVkit/*.cns'
cns_ch = Channel.fromPath(params.cns).map {
tuple( it.name.split('_')[0], it )
}
process cnvcalls {
input:
tuple val(sample_id), path(cns_file) from cns_ch
output:
path "${sample_id}_sorted_dedup.calls.cns" into cnscalls_ch
"""
cnvkit.py call "${cns_file}" -o "${sample_id}_sorted_dedup.calls.cns"
"""
}

Nextflow join file to tuple

I have problem with nextflow, I have a tuple with 3 elements (id, fastq_File, out_file) and I need join a new file to every tuple element (same file for all tuple element).
Well first I have a fastq, and I split this in chunks, and map with their id, after I have a process (simple process in the example), but this process return the id with other file.
reads = Channel.fromPath( 'data/illumina.fastq' )
.splitFastq(by: 150_000, file:true)
reads.map { it -> [it.name - ~/\.fastq/, it] }
.into{tuple_reads ; tuple_reads2}
process pr1 { /*is an example my real process is more complex*/
echo true
input:
tuple val(id), path(file) from tuple_reads
output:
tuple val(id), file("example${id}.out") into example_test
script:
"""
echo example${id} > example${id}.out
"""
}
readss = tuple_reads2.join(example_test)
I join the channels and I obtain something like this:
[illumina.1, /home/qs/work/../illumina.1.fastq, /home/qs/work/../exampleillumina.1.out]
[illumina.2, /home/qs/work/../illumina.2.fastq, /home/qs/work/../exampleillumina.2.out]
[illumina.3, /home/qs/work/../illumina.3.fastq, /home/qs/work/../exampleillumina.3.out]
Now, I have a channel with my id, the fastq file, and the out from process pr1, perfect for me, but this is the problem now, I need to create other process to run with a static file.
I need that every id run with the static_file but I don't know how do this. I need a new channel with something like this:
[illumina.1, /home/qs/work/../illumina.1.fastq, /home/qs/work/../exampleillumina.1.out,/home/qs/work/../static_file.txt]
[illumina.2, /home/qs/work/../illumina.2.fastq, /home/qs/work/../exampleillumina.2.out,/home/qs/work/../static_file.txt]
[illumina.3, /home/qs/work/../illumina.3.fastq, /home/qs/work/../exampleillumina.3.out,/home/qs/work/../static_file.txt]
or I need a process that repet the static file with every run.
The below code only run with the first element from the tuple :( (I tried with each but doesn't work.
process pr2 {
echo true
input:
tuple val(id), path(fastq_file), path(out_file) from example_test
path(st_file) from static_file
script:
"""
echo ${id} ${st_file}
"""
}
Thanks!!

You just need to make sure your second channel (i.e. the one for your static file) is a value channel. You didn't show how the static_file channel came was created, but you'll get the behaviour you're seeing if it's a regular queue channel, see here: Understand how multiple input channels work. To fix your example, all you need is:
static_file = file(params.static)
process pr2 {
echo true
input:
tuple val(id), path(fastq_file), path(out_file) from example_test
path(st_file) from static_file
script:
"""
echo ${id} ${st_file}
"""
}
Which is the same as:
static_file = file(params.static)
process pr2 {
echo true
input:
tuple val(id), path(fastq_file), path(out_file) from example_test
path static_file
script:
"""
echo ${id} ${static_file}
"""
}

channel checks as empty even if it has content

I am trying to have a process that is launched only if a combination of conditions is met, but when checking if a channel has a path to a file, it always returns it as empty. Probably I am doing something wrong, in that case please correct my code. I tried to follow some of the suggestions in this issue but no success.
Consider the following minimal example:
process one {
output:
file("test.txt") into _chProcessTwo
script:
"""
echo "Hello world" > "test.txt"
"""
}
// making a copy so I check first if something in the channel or not
// avoids raising exception of MultipleInputChannel
_chProcessTwo.into{
_chProcessTwoView;
_chProcessTwoCheck;
_chProcessTwoUse
}
//print contents of channel
println "Channel contents: " + _chProcessTwoView.toList().view()
process two {
input:
file(myInput) from _chProcessTwoUse
when:
(!_chProcessTwoCheck.toList().isEmpty())
script:
def test = _chProcessTwoUse.toList().isEmpty() ? "I'm empty" : "I'm NOT empty"
println "The outcome is: " + test
}
I want to have process two run if and only if there is a file in the _chProcessTwo channel.
If I run the above code I obtain:
marius#dev:~/pipeline$ ./bin/nextflow run test.nf
N E X T F L O W ~ version 19.09.0-edge
Launching `test.nf` [infallible_gutenberg] - revision: 9f57464dc1
[c8/bf38f5] process > one [100%] 1 of 1 ✔
[- ] process > two -
[/home/marius/pipeline/work/c8/bf38f595d759686a497bb4a49e9778/test.txt]
where the last line are actually the contents of _chProcessTwoView
If I remove the when directive from the second process I get:
marius#mg-dev:~/pipeline$ ./bin/nextflow run test.nf
N E X T F L O W ~ version 19.09.0-edge
Launching `test.nf` [modest_descartes] - revision: 5b2bbfea6a
[57/1b7b97] process > one [100%] 1 of 1 ✔
[a9/e4b82d] process > two [100%] 1 of 1 ✔
[/home/marius/pipeline/work/57/1b7b979933ca9e936a3c0bb640c37e/test.txt]
with the contents of the second worker .command.log file being: The outcome is: I'm empty
I tried also without toList()
What am I doing wrong? Thank you in advance
Update: a workaround would be to check _chProcessTwoUse.view() != "" but that is pretty dirty
Update 2 as required by #Steve, I've updated the code to reflect a bit more the actual conditions i have in my own pipeline:
def runProcessOne = true
process one {
when:
runProcessOne
output:
file("inputProcessTwo.txt") into _chProcessTwo optional true
file("inputProcessThree.txt") into _chProcessThree optional true
script:
// this would replace the probability that output is not created
def outputSomething = false
"""
if ${outputSomething}; then
echo "Hello world" > "inputProcessTwo.txt"
echo "Goodbye world" > "inputProcessThree.txt"
else
echo "Sorry. Process one did not write to file."
fi
"""
}
// making a copy so I check first if something in the channel or not
// avoids raising exception of MultipleInputChannel
_chProcessTwo.into{
_chProcessTwoView;
_chProcessTwoCheck;
_chProcessTwoUse
}
//print contents of channel
println "Channel contents: " + _chProcessTwoView.view()
println _chProcessTwoView.view() ? "Me empty" : "NOT empty"
process two {
input:
file(myInput) from _chProcessTwoUse
when:
(runProcessOne)
script:
"""
echo "The outcome is: ${myInput}"
"""
}
process three {
input:
file(defaultInput) from _chUpstreamProcesses
file(inputFromProcessTwo) from _chProcessThree
script:
def extra_parameters = _chProcessThree.isEmpty() ? "" : "--extra-input " + inputFromProcessTwo
"""
echo "Hooray! We got: ${extra_parameters}"
"""
}
As #Steve mentioned, I should not even check if a channel is empty, NextFlow should know better to not initiate the process. But I think in this construct I will have to.
Marius

I think part of the problem here is that process 'one' creates only optional outputs. This makes dealing with the optional inputs in process 'three' a bit tricky. I would try to reconcile this if possible. If this can't be reconciled, then you'll need to deal with the optional inputs in process 'three'. To do this, you'll basically need to create a dummy file, pass it into the channel using the ifEmpty operator, then use the name of the dummy file to check whether or not to prepend the argument's prefix. It's a bit of a hack, but it works pretty well.
The first step is to actually create the dummy file. I like shareable pipelines, so I would just create this in your baseDir, perhaps under a folder called 'assets':
mkdir assets
touch assets/NO_FILE
Then pass in your dummy file if your '_chProcessThree' channel is empty:
params.dummy_file = "${baseDir}/assets/NO_FILE"
dummy_file = file(params.dummy_file)
process three {
input:
file(defaultInput) from _chUpstreamProcesses
file(optfile) from _chProcessThree.ifEmpty(dummy_file)
script:
def extra_parameters = optfile.name != 'NO_FILE' ? "--extra-input ${optfile}" : ''
"""
echo "Hooray! We got: ${extra_parameters}"
"""
}
Also, these lines are problematic:
//print contents of channel
println "Channel contents: " + _chProcessTwoView.view()
println _chProcessTwoView.view() ? "Me empty" : "NOT empty"
Calling view() will emit all values from the channel to stdout. You can ignore whatever value it returns. Unless you enable DSL2, the channel will then be empty. I think what you're looking for here is a closure:
_chProcessTwoView.view { "Found: $it" }
Be sure to append -ansi-log false to your nextflow run command so the output doesn't get clobbered. HTH.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

handle input from multiple processes example in nextflow dsl2 - nextflow

Related

Nextflow DSL2: how to combine outputs (channels) from multiple processes into input of another process by part of filename?

nextflow input and output a tuple with keys

Extract sample ids from nextflow fromPath()

Nextflow join file to tuple

channel checks as empty even if it has content

Categories

Resources