Nextflow: change part of the script based on a parameter

I have a Nextflow workflow that's like this (reduced):
params.filter_pass = true
// ... more stuff
process concatenate_vcf {
    cpus 6
    input:
    file(vcf_files) from source_vcf.collect()
    file(tabix_files) from source_vcf_tbi.collect()
    output:
    file("assembled.vcf.gz") into decompose_ch
    script:
    """
    echo ${vcf_files} | tr " " "\n" > vcflist
    bcftools merge \
        -l vcflist \
        -m none \
        -f PASS,. \
        --threads ${task.cpus} \
        -O z \
        -o assembled.vcf.gz
    rm -f vcflist
    """
}
Now, I want to include the -f PASS,. option in the bcftools merge call only if params.filter_pass is true.
In other words, the script would be executed like this, if params.filter_pass is true (other lines removed for clarity):
bcftools merge \
-l vcflist \
-m none \
-f PASS,. \
--threads ${task.cpus} \
-O z \
-o assembled.vcf.gz
and like this if params.filter_pass is false:
bcftools merge \
-l vcflist \
-m none \
--threads ${task.cpus} \
-O z \
-o assembled.vcf.gz
I know I can use conditional scripts but that would mean replicating the whole script stanza just to change one parameter.
Is this use case possible with Nextflow?

The general pattern is to use a local variable in the 'script' block and a ternary operator to add the -f PASS,. filter option when params.filter_pass is true:
process concatenate_vcf {
    ...
    script:
    def filter_pass = params.filter_pass ? '-f PASS,.' : ''
    """
    echo "${vcf_files.join('\n')}" > vcf.list
    bcftools merge \\
        -l vcf.list \\
        -m none \\
        ${filter_pass} \\
        --threads ${task.cpus} \\
        -O z \\
        -o assembled.vcf.gz
    """
}
An if/else statement could also be used in place of the ternary operator if preferred.
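The same idea, an optional fragment that is either a flag or nothing, can be sketched outside Nextflow. The Python below is only an illustration of the pattern and not part of the workflow; the function name `merge_command` is hypothetical, and the arguments are copied from the question.

```python
from typing import List


def merge_command(filter_pass: bool, threads: int = 6) -> List[str]:
    """Build the bcftools merge argument list, adding '-f PASS,.' only on request."""
    # Conditional expression: the Python counterpart of the Groovy ternary.
    filter_args = ["-f", "PASS,."] if filter_pass else []
    return [
        "bcftools", "merge",
        "-l", "vcf.list",
        "-m", "none",
        *filter_args,  # expands to nothing when filter_pass is False
        "--threads", str(threads),
        "-O", "z",
        "-o", "assembled.vcf.gz",
    ]


if __name__ == "__main__":
    print(" ".join(merge_command(True)))
    print(" ".join(merge_command(False)))
```

Because the empty case contributes nothing, the rest of the command is identical in both branches, which is what the `${filter_pass}` line achieves in the script block.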

Related

Delete unwanted Snakemake Outputs

I have looked at a few other posts about Snakemake and deleting unneeded data to free up disk space. I have designed a rule called BamRemove whose output is requested by my rule all. However, the workflow manager isn't recognizing it, and I am getting this error:
WildcardError in line 35 of /PATH:
No values given for wildcard 'SampleID'.
I don't see why. Any help getting this to work would be appreciated.
sampleIDs = d.keys()
rule all:
    input:
        expand('bams/{sampleID}_UMI_Concensus_Aligned.sortedByCoord.out.bam', sampleID=sampleIDs),
        expand('bams/{sampleID}_UMI_Concensus_Aligned.sortedByCoord.out.bam.bai', sampleID=sampleIDs),
        expand('logs/{SampleID}_removed.txt', sampleID=sampleIDs) #Line 35
# Some tools require unzipped fastqs
rule AnnotateUMI:
    input: 'bams/{sampleID}_unisamp_L001_001.star_rg_added.sorted.dmark.bam'
    output: 'bams/{sampleID}_L001_001.star_rg_added.sorted.dmark.bam.UMI.bam',
    # Modify each run
    params: '/data/Test/fastqs/{sampleID}_unisamp_L001_UMI.fastq.gz'
    threads: 32
    run:
        # Each user needs to set tool path
        shell('java -Xmx220g -jar /data/Tools/fgbio-2.0.0.jar AnnotateBamWithUmis \
            -i {input} \
            -f {params} \
            -o {output}')
rule SortSam:
    input: rules.AnnotateUMI.output
    output: 'bams/{sampleID}_Qsorted.MarkUMI.bam'
    params:
    threads: 32
    run:
        # Each user needs to set tool path
        shell('java -Xmx110g -jar /data/Tools/picard.jar SortSam \
            INPUT={input} \
            OUTPUT={output} \
            SORT_ORDER=queryname')
rule MItag:
    input: rules.SortSam.output
    output: 'bams/{sampleID}_Qsorted.MarkUMI.MQ.bam'
    params:
    threads: 32
    run:
        # Each user needs to set tool path
        shell('java -Xmx220g -jar /data/Tools/fgbio-2.0.0.jar SetMateInformation \
            -i {input} \
            -o {output}')
rule GroupUMI:
    input: rules.MItag.output
    output: 'bams/{sampleID}_grouped.Qsorted.MarkUMI.MQ.bam'
    params:
    threads: 32
    run:
        # Each user needs to set tool path
        shell('java -Xmx220g -jar /data/Tools/fgbio-2.0.0.jar GroupReadsByUmi \
            -i {input} \
            -s adjacency \
            -e 1 \
            -m 20 \
            -o {output}')
rule ConcensusUMI:
    input: rules.GroupUMI.output
    output: 'bams/{sampleID}_concensus.Qunsorted.MarkUMI.MQ.bam'
    params:
    threads: 32
    run:
        # Each user needs to set tool path
        shell('java -Xmx220g -jar /data/Tools/fgbio-2.0.2.jar CallMolecularConsensusReads \
            --input={input} \
            --min-reads=1 \
            --output={output}')
rule STARmap:
    input: rules.ConcensusUMI.output
    output:
        log = 'bams/{sampleID}_UMI_Concensus_Log.final.out',
        bam = 'bams/{sampleID}_UMI_Concensus_Aligned.sortedByCoord.out.bam'
    params: 'bams/{sampleID}_UMI_Concensus_'
    threads: 32
    run:
        # Each user needs to set genome path
        shell('STAR \
            --runThreadN {threads} \
            --readFilesIn {input} \
            --readFilesType SAM PE \
            --readFilesCommand samtools view -h \
            --genomeDir /data/reference/star/STAR_hg19_v2.7.5c \
            --outSAMtype BAM SortedByCoordinate \
            --outSAMunmapped Within \
            --limitBAMsortRAM 220000000000 \
            --outFileNamePrefix {params}')
rule Index:
    input: rules.STARmap.output.bam
    output: 'bams/{sampleID}_UMI_Concensus_Aligned.sortedByCoord.out.bam.bai'
    params:
    threads: 32
    run:
        shell('samtools index {input}')
rule BamRemove:
    input:
        AnnotateUMI_BAM = rules.AnnotateUMI.output,
        AnnotateUMI_BAI = '{sampleID}_L001_001.star_rg_added.sorted.dmark.bam.UMI.bai',
        SortSam = rules.SortSam.output,
        MItag = rules.MItag.output,
        GroupUMI = rules.GroupUMI.output,
        ConcensusUMI = rules.ConcensusUMI.output
    output: touch('logs/{SampleID}_removed.txt')
    threads: 32
    run:
        shell('rm {input}')
expand('logs/{SampleID}_removed.txt', sampleID=sampleIDs) #Line 35
              ^^^^^^^^               ^^^^^^^^
The error is due to SampleID being different from sampleID. Make them consistent throughout the script, including the output of rule BamRemove, which also uses {SampleID}.
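The mismatch can be reproduced without Snakemake. As a rough analogy (Snakemake's `expand` is not literally `str.format`, so treat this as an illustration only), Python's own string formatting fails in the same way when the placeholder name and the keyword differ in case. The helper name `fill` is hypothetical.

```python
pattern_bad = "logs/{SampleID}_removed.txt"  # capital S, as on line 35
pattern_ok = "logs/{sampleID}_removed.txt"   # matches the keyword used below


def fill(pattern, sample_ids):
    """Substitute each sample ID into the pattern using the keyword 'sampleID'."""
    return [pattern.format(sampleID=s) for s in sample_ids]


print(fill(pattern_ok, ["A", "B"]))
# ['logs/A_removed.txt', 'logs/B_removed.txt']

try:
    fill(pattern_bad, ["A", "B"])
except KeyError as err:
    # No value was supplied for the placeholder named 'SampleID'.
    print("missing placeholder value:", err)
```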

Have Snakemake recognize complete files upon relaunch

I have created this Snakemake workflow. The pipeline works really well; however, if any rule fails and I relaunch, Snakemake isn't recognizing all completed files. For instance, Sample A finishes all the way through and creates all files for rule all, but Sample B fails at rule AnnotateUMI. When I relaunch, Snakemake wants to run all jobs for both A and B, instead of just B. What do I need to do to get this to work?
sampleIDs = ['A', 'B']
rule all:
    input:
        expand('PATH/{sampleID}_UMI_Concensus_Aligned.sortedByCoord.out.bam', sampleID=sampleIDs),
        expand('PATH/bams/{sampleID}_UMI_Concensus_Aligned.sortedByCoord.out.bam.bai', sampleID=sampleIDs),
        expand('/PATH/logfiles/{sampleID}_removed.txt', sampleID=sampleIDs)
# Some tools require unzipped fastqs
rule AnnotateUMI:
    # Modify each run
    input: 'PATH/{sampleID}_unisamp_L001_001.star_rg_added.sorted.dmark.bam'
    # Modify each run
    output: 'PATH/{sampleID}_L001_001.star_rg_added.sorted.dmark.bam.UMI.bam'
    # Modify each run
    params: 'PATH/{sampleID}_unisamp_L001_UMI.fastq.gz'
    threads: 36
    run:
        # Each user needs to set tool path
        shell('java -Xmx220g -jar PATH/fgbio-2.0.0.jar AnnotateBamWithUmis \
            -i {input} \
            -f {params} \
            -o {output}')
rule SortSam:
    input: rules.AnnotateUMI.output
    # Modify each run
    output: 'PATH/{sampleID}_Qsorted.MarkUMI.bam'
    params:
    threads: 32
    run:
        # Each user needs to set tool path
        shell('java -Xmx110g -jar PATH/picard.jar SortSam \
            INPUT={input} \
            OUTPUT={output} \
            SORT_ORDER=queryname')
rule MItag:
    input: rules.SortSam.output
    # Modify each run
    output: 'PATH/{sampleID}_Qsorted.MarkUMI.MQ.bam'
    params:
    threads: 32
    run:
        # Each user needs to set tool path
        shell('java -Xmx220g -jar PATH/fgbio-2.0.0.jar SetMateInformation \
            -i {input} \
            -o {output}')
rule GroupUMI:
    input: rules.MItag.output
    # Modify each run
    output: 'PATH/{sampleID}_grouped.Qsorted.MarkUMI.MQ.bam'
    params:
    threads: 32
    run:
        # Each user needs to set tool path
        shell('java -Xmx220g -jar PATH/fgbio-2.0.0.jar GroupReadsByUmi \
            -i {input} \
            -s adjacency \
            -e 1 \
            -m 20 \
            -o {output}')
rule ConcensusUMI:
    input: rules.GroupUMI.output
    # Modify each run
    output: 'PATH/{sampleID}_concensus.Qunsorted.MarkUMI.MQ.bam'
    params:
    threads: 32
    run:
        # Each user needs to set tool path
        shell('java -Xmx220g -jar PATH/fgbio-2.0.2.jar CallMolecularConsensusReads \
            --input={input} \
            --min-reads=1 \
            --output={output}')
rule STARmap:
    input: rules.ConcensusUMI.output
    # Modify each run
    output:
        log = 'PATH/{sampleID}_UMI_Concensus_Log.final.out',
        bam = 'PATH/{sampleID}_UMI_Concensus_Aligned.sortedByCoord.out.bam'
    # Modify each run
    params: 'PATH/{sampleID}_UMI_Concensus_'
    threads: 32
    run:
        # Each user needs to set genome path
        shell('STAR \
            --runThreadN {threads} \
            --readFilesIn {input} \
            --readFilesType SAM PE \
            --readFilesCommand samtools view -h \
            --genomeDir PATH/STAR_hg19_v2.7.5c \
            --outSAMtype BAM SortedByCoordinate \
            --outSAMunmapped Within \
            --limitBAMsortRAM 220000000000 \
            --outFileNamePrefix {params}')
rule Index:
    input: rules.STARmap.output.bam
    # Modify each run
    output: 'PATH/{sampleID}_UMI_Concensus_Aligned.sortedByCoord.out.bam.bai'
    params:
    threads: 32
    run:
        shell('samtools index {input}')
rule BamRemove:
    input:
        AnnotateUMI_BAM = rules.AnnotateUMI.output,
        # Modify each run and include in future version to delete
        #AnnotateUMI_BAI = 'PATH/{sampleID}_L001_001.star_rg_added.sorted.dmark.bam.UMI.bai',
        SortSam = rules.SortSam.output,
        MItag = rules.MItag.output,
        GroupUMI = rules.GroupUMI.output,
        ConcensusUMI = rules.ConcensusUMI.output,
        STARmap = rules.STARmap.output.bam,
        Index = rules.Index.output
    # Modify each run
    output: touch('PATH/logfiles/{sampleID}_removed.txt')
    threads: 32
    run:
        shell('rm {input.AnnotateUMI_BAM} {input.SortSam} {input.MItag} {input.GroupUMI} {input.ConcensusUMI}')

Use makeindex with pandoc while creating a PDF from a Markdown file

I have a Markdown file that I would like to convert to PDF, and I need an index. For this I found makeindex.
I am using this command to create a PDF from the Markdown file:
pandoc "test.md" \
--from markdown --to latex \
--toc \
--toc-depth=1 \
--include-in-header index.tex \
--include-in-header chapter_break.tex \
--include-in-header inline_code.tex \
--include-in-header bullet_style.tex \
--include-in-header pdf_properties.tex \
--highlight-style pygments.theme \
-V toc-title='Inhaltsverzeichnis' \
-V title='Mein Titel' \
-V author='Ich' \
-V book=true \
-V date="24. Mai 2021" \
-V classoption='oneside' \
--template ./eisvogel.latex \
--listings \
-o "test.pdf"
I found the Eisvogel template on GitHub. I also asked this question in an issue here.
The file index.tex has this content:
\usepackage{makeidx}
\makeindex
In my document I use
Installation\index{Installation}
and at the end
\printindex
Unfortunately, no index is printed.
Am I doing something wrong?
Edit: I found a temporary solution.
First I create a .tex file:
pandoc JoomlaEnPDF.en.md \
-o 1.en.tex \
--from markdown \
--template ./eisvogel.latex \
--listings \
--toc \
--toc-depth=1 \
-V toc-title="Content" \
--top-level-division="part" \
--number-sections
Then I run these commands manually:
pdflatex 1.en.tex
pdflatex 1.en.tex
makeindex 1.en.idx
cat ./1.en.ind >> JoomlaEnPDF.en.md
Then I create the PDF:
pandoc JoomlaEnPDF.en.md \
-o 1.en.pdf \
--from markdown \
--template ./eisvogel.latex \
--listings \
--toc \
--toc-depth=1 \
-V toc-title="Table of Contents" \
--top-level-division="part" \
--number-sections
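
The manual steps of this workaround could be collected into one script. The following is a minimal sketch, assuming pandoc, pdflatex and makeindex are on the PATH; the function names `build_steps` and `run_workaround` are hypothetical, the flags are copied from the commands above, and the `-V toc-title` option is left out for brevity.

```python
import subprocess
from pathlib import Path
from typing import List


def build_steps(markdown: str, stem: str, template: str = "./eisvogel.latex") -> List[List[str]]:
    """Return the external commands of the workaround, in the order they are run."""
    pandoc_common = [
        "--from", "markdown",
        "--template", template,
        "--listings",
        "--toc",
        "--toc-depth=1",
        "--top-level-division=part",
        "--number-sections",
    ]
    tex = f"{stem}.tex"
    return [
        ["pandoc", markdown, "-o", tex, *pandoc_common],            # Markdown -> LaTeX
        ["pdflatex", tex],                                          # first pass writes the .idx file
        ["pdflatex", tex],                                          # second pass, as in the manual steps
        ["makeindex", f"{stem}.idx"],                               # .idx -> .ind
        ["pandoc", markdown, "-o", f"{stem}.pdf", *pandoc_common],  # final PDF
    ]


def run_workaround(markdown: str, stem: str) -> None:
    """Run every step; the .ind file is appended to the Markdown before the last one."""
    steps = build_steps(markdown, stem)
    for cmd in steps[:-1]:
        subprocess.run(cmd, check=True)
    # Same effect as `cat stem.ind >> markdown`: append the generated index.
    with open(markdown, "a", encoding="utf-8") as md:
        md.write(Path(f"{stem}.ind").read_text(encoding="utf-8"))
    subprocess.run(steps[-1], check=True)
```

For the files above this would be called as `run_workaround("JoomlaEnPDF.en.md", "1.en")`. As with the manual version, every run appends the index to the Markdown file again, so it is worth working on a copy of the source.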

How to convert configure options for use with cmake

I have a script for building a project that I need to upgrade from using configure to cmake. The original configure command is
CFLAGS="$SLKCFLAGS" \
CXXFLAGS="$SLKCFLAGS" \
./configure \
--with-clang \
--prefix=$PREFIX \
--libdir=$PREFIX/lib${LIBDIRSUFFIX} \
--incdir=$PREFIX/include \
--mandir=$PREFIX/man/man1 \
--etcdir=$PREFIX/etc/root \
--docdir=/usr/doc/$PRGNAM-$VERSION \
--enable-roofit \
--enable-unuran \
--disable-builtin-freetype \
--disable-builtin-ftgl \
--disable-builtin-glew \
--disable-builtin-pcre \
--disable-builtin-zlib \
--disable-builtin-lzma \
$GSL_FLAGS \
$FFTW_FLAGS \
$QT_FLAGS \
--enable-shared \
--build=$ARCH-slackware-linux
I am not familiar enough with cmake to know how to do the equivalent. I would prefer a command line option but am open to modifying the CMakeLists.txt file as well.

OCLint reports compile errors in the HTML report file, but my project builds successfully. Why?

xcodebuild -workspace ${myworkspace} -scheme ${myscheme} \
-sdk iphonesimulator \
-derivedDataPath ./build/derivedData \
-configuration Debug \
COMPILER_INDEX_STORE_ENABLE=NO \
| xcpretty -r json-compilation-database -o compile_commands.json
I run the command above to build my project, and the build succeeds. But when I run the command below to generate the OCLint HTML report, I get 15 compiler errors.
oclint-json-compilation-database -e Pods -- \
-extra-arg=-Wno-error=everything \
-report-type html \
-rc LONG_LINE=200 \
-rule MultipleUnaryOperator \
-max-priority-1=0 \
-max-priority-2=10 \
-max-priority-3=20 \
-o ./oclint_report.html
Try this:
oclint-json-compilation-database <YOUR OPTIONS HERE> -- -extra-arg=-Wno-everything
For example:
oclint-json-compilation-database -e Pods -- \
-extra-arg=-Wno-error=everything \
-report-type html \
-rc LONG_LINE=200 \
-rule MultipleUnaryOperator \
-max-priority-1=0 \
-max-priority-2=10 \
-max-priority-3=20 \
-extra-arg=-Wno-everything \
-o ./oclint_report.html