Nextflow adding def function in to script - nextflow

I have got errors like .command.sh: line 2: syntax error near unexpected token `('
/*
* Step 3
*/
chr_length = file(params.chr_length)
process create_bedgraph_and_bigwig {
publishDir "${params.outdir}/bedgraphandbigwig", mode: 'copy'
input:
set val(sample_id), file(vector_log) from vector_log_ch
set val(sample_id), file(target_query_bam) from target_query_bam_ch
file chr_length
output:
set val(sample_id), file("${sample_id}.bedgraph.log.txt") into bed_log_ch
set val(sample_id), file("${sample_id}.bed") into bed_ch
set val(sample_id), file("${sample_id}.clean.bed") into clean_bed_ch
set val(sample_id), file("${sample_id}.fragments.bed") into fragments_bed_ch
set val(sample_id), file("${sample_id}.sorted.fragments.bed") into sorted_fragments_bed_ch
shell:
'''
def fp = file(${vector_log})
def lines = fp.readLines()
def line3 = lines[3].split(' ')[4].toInteger()
def line4 = lines[4].split(' ')[4].toInteger()
def aln_sum = (10000/(line3 + line4)).toString()
bedtools bamtobed -bedpe -i !{target_query_bam} > !{sample_id}.bed 2>!{sample_id}.bedgraph.log.txt
awk '$1==$4 && $6-$2 < 1000 {{print $0}}' !{sample_id}.bed > !{sample_id}.clean.bed 2>!{sample_id}.bedgraph.log.txt
cut -f 1,2,6 !{sample_id}.clean.bed > !{sample_id}.fragments.bed 2>!{sample_id}.bedgraph.log.txt
sort -k 1,1 !{sample_id}.fragments.bed > !{sample_id}.sorted.fragments.bed
'''
}

The simple answer is to avoid using 'def' if the variable needs to be used in a shell definition or template. I couldn't actually find this after a quick search of the documentation, but I did find this note from the author:
Using groovy native string interpolation that would work, but when using the !{..} syntax scripts variable cannot be declared locally using the def keyword.
To summarise:
script/shell variable should be defensively declared in the local scope using the def keyboard
do not use def when:
i. the variable needs to be referenced as a output value
ii. the variable needs to be used in a shell template
https://github.com/nextflow-io/nextflow/issues/678#issuecomment-386206123

Related

Why do I get a `java.nio.file.ProviderMismatchException` when I access `isEmpty()` on a staged file

I am getting a java.nio.file.ProviderMismatchException when I run the following script:
process a {
output:
file _biosample_id optional true into biosample_id
script:
"""
touch _biosample_id
"""
}
process b {
input:
file _biosample_id from biosample_id.ifEmpty{file("_biosample_id")}
script:
def biosample_id_option = _biosample_id.isEmpty() ? '' : "--biosample_id \$(cat _biosample_id)"
"""
echo \$(cat ${_biosample_id})
"""
}
i'm using a slightly modified version of Optional Input pattern.
Any ideas on why I'm getting the java.nio.file.ProviderMismatchException?
In your script block, _biosample_id is actually an instance of the nextflow.processor.TaskPath class. So to check if the file (or directory) is empty you can just call it's .empty() method. For example:
script:
def biosample_id_option = _biosample_id.empty() ? '' : "--biosample_id \$(< _biosample_id)"
I like your solution - I think it's neat. And I think it should be robust (but I haven't tested it). The optional input pattern that is recommended will fail when attempting to stage missing input files to a remote filesystem/object store. There is a solution however, which is to keep an empty file in your $baseDir and point to it in your scripts. For example:
params.inputs = 'prots/*{1,2,3}.fa'
params.filter = "${baseDir}/assets/null/NO_FILE"
prots_ch = Channel.fromPath(params.inputs)
opt_file = file(params.filter)
process foo {
input:
file seq from prots_ch
file opt from opt_file
script:
def filter = opt.name != 'NO_FILE' ? "--filter $opt" : ''
"""
your_commad --input $seq $filter
"""
}

Extract sample ids from nextflow fromPath()

I am new to nextflow and here is a practice that I wanted to test for a real job.
#!/usr/bin/env nextflow
params.cns = '/data1/deliver/phase2/CNVkit/*.cns'
cns_ch = Channel.fromPath(params.cns)
cns_ch.view()
The output of this script is:
N E X T F L O W ~ version 21.04.0
Launching `cnvkit_call.nf` [festering_wescoff] - revision: 886ab3cf13
/data1/deliver/phase2/CNVkit/002-002_L4_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/015-002_L4.SSHT89_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/004-005_L1_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/018-008_L1.SSHT31_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/003-002_L3_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/002-004_L6_sorted_dedup.cns
Here 002-002, 015-002, 004-005 etc are sample ids. I am trying to write a simple process to output a file such as ${sample.id}_sorted_dedup.calls.cns but I am not sure how to extract these ids and output it.
process cnvcalls {
input:
file(cns_file) from cns_ch
output:
file("${sample.id}_sorted_dedup.calls.cns") into cnscalls_ch
script:
"""
cnvkit.py call ${cns_file} -o ${sample.id}_sorted_dedup.calls.cns
"""
}
How to revise the process cnvcalls to make it work with sample.id?
There's lots of ways to extract the sample names/ids from filenames. One way could be to split on the underscore and take the first element:
params.cns = '/data1/deliver/phase2/CNVkit/*.cns'
cns_ch = Channel.fromPath(params.cns)
process cnvcalls {
input:
path(cns_file) from cns_ch
output:
path("${sample_id}_sorted_dedup.calls.cns") into cnscalls_ch
script:
sample_id = cns_file.name.split('_')[0]
"""
cnvkit.py call "${cns_file}" -o "${sample_id}_sorted_dedup.calls.cns"
"""
}
Though, my preference would be to input the sample name/id alongside the input file using a tuple:
params.cns = '/data1/deliver/phase2/CNVkit/*.cns'
cns_ch = Channel.fromPath(params.cns).map {
tuple( it.name.split('_')[0], it )
}
process cnvcalls {
input:
tuple val(sample_id), path(cns_file) from cns_ch
output:
path "${sample_id}_sorted_dedup.calls.cns" into cnscalls_ch
"""
cnvkit.py call "${cns_file}" -o "${sample_id}_sorted_dedup.calls.cns"
"""
}

channel checks as empty even if it has content

I am trying to have a process that is launched only if a combination of conditions is met, but when checking if a channel has a path to a file, it always returns it as empty. Probably I am doing something wrong, in that case please correct my code. I tried to follow some of the suggestions in this issue but no success.
Consider the following minimal example:
process one {
output:
file("test.txt") into _chProcessTwo
script:
"""
echo "Hello world" > "test.txt"
"""
}
// making a copy so I check first if something in the channel or not
// avoids raising exception of MultipleInputChannel
_chProcessTwo.into{
_chProcessTwoView;
_chProcessTwoCheck;
_chProcessTwoUse
}
//print contents of channel
println "Channel contents: " + _chProcessTwoView.toList().view()
process two {
input:
file(myInput) from _chProcessTwoUse
when:
(!_chProcessTwoCheck.toList().isEmpty())
script:
def test = _chProcessTwoUse.toList().isEmpty() ? "I'm empty" : "I'm NOT empty"
println "The outcome is: " + test
}
I want to have process two run if and only if there is a file in the _chProcessTwo channel.
If I run the above code I obtain:
marius#dev:~/pipeline$ ./bin/nextflow run test.nf
N E X T F L O W ~ version 19.09.0-edge
Launching `test.nf` [infallible_gutenberg] - revision: 9f57464dc1
[c8/bf38f5] process > one [100%] 1 of 1 ✔
[- ] process > two -
[/home/marius/pipeline/work/c8/bf38f595d759686a497bb4a49e9778/test.txt]
where the last line are actually the contents of _chProcessTwoView
If I remove the when directive from the second process I get:
marius#mg-dev:~/pipeline$ ./bin/nextflow run test.nf
N E X T F L O W ~ version 19.09.0-edge
Launching `test.nf` [modest_descartes] - revision: 5b2bbfea6a
[57/1b7b97] process > one [100%] 1 of 1 ✔
[a9/e4b82d] process > two [100%] 1 of 1 ✔
[/home/marius/pipeline/work/57/1b7b979933ca9e936a3c0bb640c37e/test.txt]
with the contents of the second worker .command.log file being: The outcome is: I'm empty
I tried also without toList()
What am I doing wrong? Thank you in advance
Update: a workaround would be to check _chProcessTwoUse.view() != "" but that is pretty dirty
Update 2 as required by #Steve, I've updated the code to reflect a bit more the actual conditions i have in my own pipeline:
def runProcessOne = true
process one {
when:
runProcessOne
output:
file("inputProcessTwo.txt") into _chProcessTwo optional true
file("inputProcessThree.txt") into _chProcessThree optional true
script:
// this would replace the probability that output is not created
def outputSomething = false
"""
if ${outputSomething}; then
echo "Hello world" > "inputProcessTwo.txt"
echo "Goodbye world" > "inputProcessThree.txt"
else
echo "Sorry. Process one did not write to file."
fi
"""
}
// making a copy so I check first if something in the channel or not
// avoids raising exception of MultipleInputChannel
_chProcessTwo.into{
_chProcessTwoView;
_chProcessTwoCheck;
_chProcessTwoUse
}
//print contents of channel
println "Channel contents: " + _chProcessTwoView.view()
println _chProcessTwoView.view() ? "Me empty" : "NOT empty"
process two {
input:
file(myInput) from _chProcessTwoUse
when:
(runProcessOne)
script:
"""
echo "The outcome is: ${myInput}"
"""
}
process three {
input:
file(defaultInput) from _chUpstreamProcesses
file(inputFromProcessTwo) from _chProcessThree
script:
def extra_parameters = _chProcessThree.isEmpty() ? "" : "--extra-input " + inputFromProcessTwo
"""
echo "Hooray! We got: ${extra_parameters}"
"""
}
As #Steve mentioned, I should not even check if a channel is empty, NextFlow should know better to not initiate the process. But I think in this construct I will have to.
Marius
I think part of the problem here is that process 'one' creates only optional outputs. This makes dealing with the optional inputs in process 'three' a bit tricky. I would try to reconcile this if possible. If this can't be reconciled, then you'll need to deal with the optional inputs in process 'three'. To do this, you'll basically need to create a dummy file, pass it into the channel using the ifEmpty operator, then use the name of the dummy file to check whether or not to prepend the argument's prefix. It's a bit of a hack, but it works pretty well.
The first step is to actually create the dummy file. I like shareable pipelines, so I would just create this in your baseDir, perhaps under a folder called 'assets':
mkdir assets
touch assets/NO_FILE
Then pass in your dummy file if your '_chProcessThree' channel is empty:
params.dummy_file = "${baseDir}/assets/NO_FILE"
dummy_file = file(params.dummy_file)
process three {
input:
file(defaultInput) from _chUpstreamProcesses
file(optfile) from _chProcessThree.ifEmpty(dummy_file)
script:
def extra_parameters = optfile.name != 'NO_FILE' ? "--extra-input ${optfile}" : ''
"""
echo "Hooray! We got: ${extra_parameters}"
"""
}
Also, these lines are problematic:
//print contents of channel
println "Channel contents: " + _chProcessTwoView.view()
println _chProcessTwoView.view() ? "Me empty" : "NOT empty"
Calling view() will emit all values from the channel to stdout. You can ignore whatever value it returns. Unless you enable DSL2, the channel will then be empty. I think what you're looking for here is a closure:
_chProcessTwoView.view { "Found: $it" }
Be sure to append -ansi-log false to your nextflow run command so the output doesn't get clobbered. HTH.

How to fix UnboundLocalError: local variable 'file_lines' referenced before assignment

So this is my function which is meant to read the lines of text from a file.
This is extracted from a larger program hence some of the comments may seem out of place. Anyways I need to use the functions text and file_lines in numerous other functions but even after declaring them as global I still get the UnboundLocalError: local variable 'file_lines' referenced before assignment error and I don't know what to do.
import sys
text = []
case = ''
file_lines = []
def read_file(file): # function to read a file and split it into lines
global text #sets variable text as a global variable for use in multiple locations
global case #handles case sensitivity.
try: #tests the statement after colon
open(file)
except:
print('oops no file found bearing that name')
else:
while file == '': #if the file name is blank print error, this prevents program from crashing
print ('error')
filename = input('enter a file and its extension ie file.ext\n>>')
with open(file) as f : #opens filewith name
text = f.read() #read all lines of program text.
print ("file successfully read")
print('TEXT SENSITIVITY TURNED ON !!!!!!')
text = text.split('\n')# breaks lines in file into list instead of using ".readlines"
# so that the program doesn't count blank lines
case == True
global file_lines
file_lines = text
a function that tries to use the read_lines variable would be
def find_words(words):
line_num = 0 # to count the line number
number_of_word = 0
if case == False:
words = words.lower()
file_lines = [file_lines.lower() for file_lines in text]
while "" in file_lines:
file_lines.remove('')
for lines in file_lines:
line_num += 1
if words in lines: #checks each for the words being looks for
print(line_num,"...", text[line_num-1])
number_of_word = 1
if number_of_word == 0: #to check if the word is located in the file
print('Words not found')
In your function find_words you forgot to specify that find_lines is global. Try
def find_words(words):
global file_lines
line_num = 0 # to count the line number
The function errors because file_lines is not defined within the scope of find_words otherwise.

Define a variable and set it to a default value if something goes wrong during definition

I have the following code in my build.gradle
Contents in version.properties are:
buildVersion=1.2.3
Value of $v variable during the Gradle build is coming as: 1.2.3
Value of $artifactoryVersion variable in JENKINS build is coming as: 1.2.3.1, 1.2.3.2, 1.2.3.x ... and so on where the 4th digit is Jenkins BUILD_NUMBER available to gradle build script during Jenkins build.
BUT, when I'm running this build.gradle on my desktop where I dont have BUILD_NUMBER variable available or set in my ENVIRONMENT variables, I get an error saying trim() can't work on null. (as there's no BUILD_NUMBER for Desktop/local build).
I'm trying to find a way i.e.
What should I code in my script so that if BUILD_NUMBER is not available, then instead of gradle build processing failing for an error, it'd set jenkinsBuild = "0" (hard coded) otherwise, pick what it gets during Jenkins build.
For ex: in Bash, we set a variable var1=${BUILD_NUMBER:-"0"} which will set var1 to a valid Jenkins BUILD number if it's available and set to a value, otherwise if it's NULL, then var1 = "0".
I DON'T want to have each developer/user set this BUILD_NUMBER in some property file. All I want is, if this variable doesn't exist, then the code should put "0" in jenkinsBuilds variable and doesn't error out during desktop builds. I know during Jenkins build, it's working fine.
// Build Script
def fname = new File( 'version.properties' )
Properties props = new Properties()
props.load( new FileInputStream( fname ) )
def v = props.get( 'buildVersion' )
def env = System.getenv()
def jenkinsBuild = env['BUILD_NUMBER'].trim()
if( jenkinsBuild.length() > 0 ) {
artifactoryVersion = "$v.$jenkinsBuild"
}
All you need is some regular Java/Groovy code:
def jenkinsBuild = System.getenv("BUILD_NUMBER") ?: "0"
The code above uses Groovy's "elvis" operator, and is a shorthand for the following code, which uses Java's ternary operator:
def buildNumber = System.getenv("BUILD_NUMBER")
def jenkinsBuild = buildNumber != null ? buildNumber : "0"
Here's the answer to using a Java plain object (JDK8):
public class Sample {
private String region;
private String fruit;
public Sample() {
region = System.getenv().getOrDefault("REGION", null);
fruit = System.getenv().getOrDefault("FRUIT", "apple");
}
}
With the Env-Inject plugin you can get and set build parameters.
For example, under "Inject environment variables to the build process", add a Groovy script such as:
def paramsMap = [:]
def build = Thread.currentThread().executable
def my_var = build.getEnvVars()["MY_PARAM"]
if (!my_var) paramsMap.put("MY_PARAM", "default value")
// Return parameters map
out.println("Injecting parameters:\n" + paramsMap)
return paramsMap