snakemake - replacing command line parameters with wildcards by cluster profile

snakemake - replacing command line parameters with wildcards by cluster profile - snakemake

I am writing a snakemake pipeline to eventually identify corona virus variants.
Below is a minimal example with three steps:
LOGDIR = '/path/to/logDir'
barcodes = ['barcode49', 'barcode50', 'barcode51']
rule all:
input:
expand([
# guppyplex
"out/guppyplex/{barcode}/{barcode}.fastq",
# catFasta
"out/catFasta/cat_consensus.fasta",
], barcode = barcodes)
rule guppyplex:
input:
FQ = f"fastq/{{barcode}}" # FASTQ_PATH is parsed from config.yaml
output:
"out/guppyplex/{barcode}/{barcode}.fastq"
shell:
"touch {output}" # variables in CAPITALS are parsed from config.yaml
rule minion:
input:
INFQ = rules.guppyplex.output,
FAST5 = f"fasta/{{barcode}}"
params:
OUTDIR = "out/nanopolish/{barcode}"
output:
"out/nanopolish/{barcode}/{barcode}.consensus.fasta"
shell:
"""
touch {output} && echo {wildcards.barcode} > {output}
"""
rule catFasta:
input:
expand("out/nanopolish/{barcode}/{barcode}.consensus.fasta", barcode = barcodes)
output:
"out/catFasta/cat_consensus.fasta"
shell:
"cat {input} > {output}"
If I run the snakemake locally by calling snakemake -p --cores 1 all everything works. Yet my ultimate goal is to use qsub to run the jobs on a cluster. I also want the stderr and stdout from qsub to have meaningful names, which include wildcards and the rule names for each job.
However, if I call snakemake with
snakemake -p --cluster "qsub -q onlybngs05b -e {LOGDIR} -o {LOGDIR} -j y" -j 5 --jobname "{wildcards.barcode}.{rule}.{jobid}" all
I will get the following error:
AttributeError: 'Wildcards' object has no attribute 'barcode'
I have recently read the snakemake documentation where it appears that I could replace the command line parameters (--cluster "qsub -q onlybngs05b -e {LOGDIR} -o {LOGDIR} -j y" -j 5 --jobname "{wildcards.barcode}.{rule}.{jobid}") by a yaml file. Although the documentation is not all that clear to me.
I have created a config.yaml file at /home/user/.config/snakemake which looks like so:
cluster: 'qsub'
q: 'onlybngs05b'
e: '/home/ngs/tempOutSnakemake'
o: '/home/ngs/tempOutSnakemake'
j: 5
jobname: "{wildcards.barcode}.{rule}.{jobid}
But then it appears that snakemake is not properly parsing the config.yaml. I am getting
snakemake: error: ambiguous option: --o=/home/ngs/tempOutSnakemake could match --omit-from, --output-wait, --overwrite-shellcmd
I also tried to replace o in the config file by stdout (kind of the long version of the parameter (-h vs --help for several programs), though it does not work.
Therefore my question is how I can replace the command line parameters --cluster "qsub -q onlybngs05b -e {LOGDIR} -o {LOGDIR} -j y" -j 5 --jobname "{wildcards.barcode}.{rule}.{jobid}" by a config.yaml file that accepts wildcards?

I think the problem is that rule catFasta doesn't contain the wildcard barcode. If you think about it, what job name would you expect in {wildcards.barcode}.{rule}.{jobid}?
Maybe a solution could be to add to each rule a jobname parameter that could be {barcode} for guppyplex and minion and 'all_barcodes' for catFasta. Then use --jobname "{params.jobname}.{rule}.{jobid}"

Related

Snakemake Megahit output issue

A few days ago I started using Snakemake for the first time. I am having an issue when I am trying to run the megahit rule in my pipeline.
It gives me the following error "Outputs of incorrect type (directories when expecting files or vice versa). Output directories must be flagged with directory(). ......"
So initially it runs and then crashes with the above error. I implemented the solution with the directory() option in my pipeline but I think its not a good practice since, for various reasons, you can loose files without even knowing it.
Is there a way to run the rule without using the directory() ?
I would appreciate any help on the issue!
Thanking you in advance
sra = []
with open("run_ids") as f:
for line in f:
sra.append(line.strip())
rule all:
input:
expand("raw_reads/{sample}/{sample}.fastq", sample=sra),
expand("trimmo/{sample}/{sample}.trimmed.fastq", sample=sra),
expand("megahit/{sample}/final.contigs.fa", sample=sra)
rule download:
output:
"raw_reads/{sample}/{sample}.fastq"
params:
"--split-spot --skip-technical"
log:
"logs/fasterq-dump/{sample}.log"
benchmark:
"benchmarks/fastqdump/{sample}.fasterq-dump.benchmark.txt"
threads: 8
shell:
"""
fasterq-dump {params} --outdir /home/raw_reads/{wildcards.sample} {wildcards.sample} -e {threads}
"""
rule trim:
input:
"raw_reads/{sample}/{sample}.fastq"
output:
"trimmo/{sample}/{sample}.trimmed.fastq"
params:
"HEADCROP:15 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36"
log:
"logs/trimmo/{sample}.log"
benchmark:
"benchmarks/trimmo/{sample}.trimmo.benchmark.txt"
threads: 6
shell:
"""
trimmomatic SE -phred33 -threads {threads} {input} trimmo/{wildcards.sample}/{wildcards.sample}.trimmed.fastq {params}
"""
rule megahit:
input:
"trimmo/{sample}/{sample}.trimmed.fastq"
output:
"megahit/{sample}/final.contigs.fa"
params:
"-m 0.7 -t"
log:
"logs/megahit/{sample}.log"
benchmark:
"benchmarks/megahit/{sample}.megahit.benchmark.txt"
threads: 10
shell:
"""
megahit -r {input} -o {output} -t {threads}
"""

IMHO it is a bad design of the megahit software that it takes a directory as a parameter and outputs into a file in this directory with a hardcoded name. Flagging the filename with directory() doesn't solve the issue, as in this case what you expect to be a file with the .fa extension megahit treats as a directory. The rest of the pipeline is broken in this case.
But this issue can be solved in Snakemake like that:
rule megahit:
input:
"trimmo/{sample}/{sample}.trimmed.fastq"
output:
"megahit/{sample}/final.contigs.fa"
# ...
shell:
"""
megahit -r {input} -o megahit/{wildcards.sample} -t {threads}
"""

A better design of the megahit rule would look as follows:
rule megahit:
input:
"trimmo/{sample}/{sample}.trimmed.fastq"
output:
out_dir = directory("megahit/{sample}/"),
fasta = "megahit/{sample}/final.contigs.fa"
log:
"logs/megahit/{sample}.log"
benchmark:
"benchmarks/megahit/{sample}.megahit.benchmark.txt"
threads:
10
shell:
"megahit -r {input} -f -o {output.out_dir} -t {threads}"
This guarantees that the output directory is removed upon failure, while the -f argument to megahit tells it to ignore the fact that the output folder exists (it is created by Snakemake automatically because one of the outputs is a file inside it: final.contigs.fa).
BTW, the -m (--memory) parameter is best implemented as a resource. The only problem though is that snakemake's default resource, mem_mb is in megabytes. One workaround would be as follows:
resources:
mem_mb = mem_mb_limit_for_megahit # could be a fraction of a global constant
params:
mem_bytes = lambda w, resources: round(resources.mem_mb * 1e6)
shell:
"megahit ... -m {params.mem_bytes}"

snakemake - define input for aggregate rule without wildcards

I am writing a snakemake to produce Sars-Cov-2 variants from Nanopore sequencing. The pipeline that I am writing is based on the artic network, so I am using artic guppyplex and artic minion.
The snakemake that I wrote has the following steps:
zip all the fastq files for all barcodes (rule zipFq)
perform read filtering with guppyplex (rule guppyplex)
call the artic minion pipeline (rule minion)
move the stderr and stdout from qsub to a folder under the working directory (rule mvQsubLogs)
Below is the snakemake that I wrote so far, which works
barcodes = ['barcode49', 'barcode50', 'barcode51']
rule all:
input:
expand([
# zip fq
"zipFastq/{barcode}/{barcode}.zip",
# guppyplex
"guppyplex/{barcode}/{barcode}.fastq",
# nanopolish
"nanopolish/{barcode}",
# directory where the logs will be moved to
"logs/{barcode}"
], barcode = barcodes)
rule zipFq:
input:
FQ = f"{FASTQ_PATH}/{{barcode}}"
output:
"zipFastq/{barcode}/{barcode}.zip"
shell:
"zip {output} {input.FQ}/*"
rule guppyplex:
input:
FQ = f"{FASTQ_PATH}/{{barcode}}" # FASTQ_PATH is parsed from config.yaml
output:
"guppyplex/{barcode}/{barcode}.fastq"
shell:
"/home/ngs/miniconda3/envs/artic-ncov2019/bin/artic guppyplex --skip-quality-check --min-length {MINLENGTHGUPPY} --max-length {MAXLENGTHGUPPY} --directory {input.FQ} --prefix {wildcards.barcode} --output {output}" # variables in CAPITALS are parsed from config.yaml
rule minion:
input:
INFQ = rules.guppyplex.output,
FAST5 = f"{FAST5_PATH}/{{barcode}}"
params:
OUTDIR = "nanopolish/{barcode}"
output:
directory("nanopolish/{barcode}")
shell:
"""
mkdir {params.OUTDIR};
cd {params.OUTDIR};
export PATH=/home/ngs/miniconda3/envs/artic-ncov2019/bin:$PATH;
artic minion --normalise {NANOPOLISH_NORMALISE} --threads {THREADS} --scheme-directory {PRIMERSDIR} --read-file ../../{input.INFQ} --sequencing-summary {Seq_Sum} --fast5-directory {input.FAST5} nCoV-2019/{PRIMERVERSION} {wildcards.barcode} # variables in CAPITALS are parsed from config.yaml
"""
rule mvQsubLogs:
input:
# zipFQ
rules.zipFq.output,
# guppyplex
rules.guppyplex.output,
# nanopolish
rules.minion.output
output:
directory("logs/{barcode}")
shell:
"mkdir -p {output} \n"
"mv {LOGDIR}/{wildcards.barcode}* {output}/"
The above snakemake works and now I am trying to add another rule, but the difference here is that this rule is an aggregate function i.e. it should not be called for every barcode, but only once after all the rules are called for all barcodes
The rule that I am trying to incorporate (catFasta) would cat all {barcode}.consensus.fasta (generated by rule minion) into in a single file, as shown below (incorporated into the snakemake above):
barcodes = ['barcode49', 'barcode50', 'barcode51']
rule all:
input:
expand([
# zip fq
"zipFastq/{barcode}/{barcode}.zip",
# guppyplex
"guppyplex/{barcode}/{barcode}.fastq",
# nanopolish
"nanopolish/{barcode}",
# catFasta
"catFasta/cat_consensus.fasta",
# directory where the logs will be moved to
"logs/{barcode}"
], barcode = barcodes)
rule zipFq:
input:
FQ = f"{FASTQ_PATH}/{{barcode}}"
output:
"zipFastq/{barcode}/{barcode}.zip"
shell:
"zip {output} {input.FQ}/*"
rule guppyplex:
input:
FQ = f"{FASTQ_PATH}/{{barcode}}" # FASTQ_PATH is parsed from config.yaml
output:
"guppyplex/{barcode}/{barcode}.fastq"
shell:
"/home/ngs/miniconda3/envs/artic-ncov2019/bin/artic guppyplex --skip-quality-check --min-length {MINLENGTHGUPPY} --max-length {MAXLENGTHGUPPY} --directory {input.FQ} --prefix {wildcards.barcode} --output {output}" # variables in CAPITALS are parsed from config.yaml
rule minion:
input:
INFQ = rules.guppyplex.output,
FAST5 = f"{FAST5_PATH}/{{barcode}}"
params:
OUTDIR = "nanopolish/{barcode}"
output:
directory("nanopolish/{barcode}")
shell:
"""
mkdir {params.OUTDIR};
cd {params.OUTDIR};
export PATH=/home/ngs/miniconda3/envs/artic-ncov2019/bin:$PATH;
artic minion --normalise {NANOPOLISH_NORMALISE} --threads {THREADS} --scheme-directory {PRIMERSDIR} --read-file ../../{input.INFQ} --sequencing-summary {Seq_Sum} --fast5-directory {input.FAST5} nCoV-2019/{PRIMERVERSION} {wildcards.barcode} # variables in CAPITALS are parsed from config.yaml
"""
rule catFasta:
input:
expand("nanopolish/{barcode}/{barcode}.consensus.fasta", barcode = barcodes)
output:
"catFasta/cat_consensus.fasta"
shell:
"cat {input} > {output}"
rule mvQsubLogs:
input:
# zipFQ
rules.zipFq.output,
# guppyplex
rules.guppyplex.output,
# nanopolish
rules.minion.output,
# catFasta
rules.catFasta.output
output:
directory("logs/{barcode}")
shell:
"mkdir -p {output} \n"
"mv {LOGDIR}/{wildcards.barcode}* {output}/"
However, when I call snakemake with
(artic-ncov2019) ngs#bngs05b:/nexusb/SC2/ONT/scripts/SnakeMake> snakemake -np -s Snakefile_v2 --cluster "qsub -q onlybngs05b -e {LOGDIR} -o {LOGDIR} -j y" -j 5 --jobname "{wildcards.barcode}.{rule}.{jobid}" all # LOGDIR parsed from config.yaml
I get:
Building DAG of jobs...
MissingInputException in line 178 of /nexusb/SC2/ONT/scripts/SnakeMake/Snakefile_v2:
Missing input files for rule guppyplex:
/nexus/Gridion/20210521_Covid7/Covid7/20210521_0926_X1_FAL11796_a5b62ac2/fastq_pass/barcode49/barcode49.consensus.fasta
Which I don't find easy to understand: snakemake is complaining about /nexus/Gridion/20210521_Covid7/Covid7/20210521_0926_X1_FAL11796_a5b62ac2/fastq_pass/barcode49/barcode49.consensus.fasta whereas /nexus/Gridion/20210521_Covid7/Covid7/20210521_0926_X1_FAL11796_a5b62ac2/fastq_pass/ is FASTQ_PATH and I am not defining f"{FASTQ_PATH}/{{barcode}}.consensus.fasta" anywhere
A very same problem is described here, though the strategy in the accepted answer (the input for rule catFasta would be expand("nanopolish/{{barcode}}/{{barcode}}.consensus.fasta")) does not work for me.
Does anyone know how I can circumvent this?

The rule that fails is rule guppyplex, which looks for an input in the form of {FASTQ_PATH}/{{barcode}}.
Looks like the wildcard {barcode} is filled with barcode49/barcode49.consensus.fasta, which happened because of two reasons I think:
First (and most important): The workflow does not find a better way to produce the final output. In rule catFasta, you give an input file which is never described as an output in your workflow. The rule minion has the directory as an output, but not the file, and it is not perfectly clear for the workflow where to produce this input file.
It therefore infers that the {barcode} wildcard somehow has to contain this .consensus.fasta that it has never seen before. This wildcard is then handed over to the top, where the workflow crashes since it cannot find a matching input file.
Second: This initialisation of the wildcard with sth. you don't want is only possible since you did not constrain the wildcard properly. You can for example forbid the wildcard to contain a . (see wildcard_constraints here)
However, the main problem is that catFasta does not find the desired input. I'd suggest changing the output of minion to "nanopolish/{barcode}/{barcode}.consensus.fasta", since the you already take the OUTDIR from the params, that should not hurt your rule here.
Edit: Dummy test example:
barcodes = ['barcode49', 'barcode50', 'barcode51']
rule all:
input:
expand([
# guppyplex
"guppyplex/{barcode}/{barcode}.fastq",
# catFasta
"catFasta/cat_consensus.fasta",
], barcode = barcodes)
rule guppyplex:
input:
FQ = f"fastq/{{barcode}}" # FASTQ_PATH is parsed from config.yaml
output:
"guppyplex/{barcode}/{barcode}.fastq"
shell:
"touch {output}" # variables in CAPITALS are parsed from config.yaml
rule minion:
input:
INFQ = rules.guppyplex.output,
FAST5 = f"fasta/{{barcode}}"
params:
OUTDIR = "nanopolish/{barcode}"
output:
"nanopolish/{barcode}/{barcode}.consensus.fasta"
shell:
"""
touch {output} && echo {wildcards.barcode} > {output}
"""
rule catFasta:
input:
expand("nanopolish/{barcode}/{barcode}.consensus.fasta", barcode = barcodes)
output:
"catFasta/cat_consensus.fasta"
shell:
"cat {input} > {output}"

Does snakemake support none file Input?

I get a MissingInputException when I run the following rule:
configfile: "Configs.yaml"
rule download_data_from_ZFIN:
input:
anatomy_item = config["ZFIN_url"]["anatomy_item"],
xpat_stage_anatomy = config["ZFIN_url"]["xpat_stage_anatomy"],
xpat_fish = config["ZFIN_url"]["xpat_fish"],
anatomy_synonyms = config["ZFIN_url"]["anatomy_synonyms"]
output:
anatomy_item = os.path.join(os.getcwd(), config["download_data_from_ZFIN"]["dir"], "anatomy_item.tsv"),
xpat_stage_anatomy = os.path.join(os.getcwd(), config["download_data_from_ZFIN"]["dir"], "xpat_stage_anatomy.tsv"),
xpat_fish = os.path.join(os.getcwd(), config["download_data_from_ZFIN"]["dir"], "xpat_fish.tsv"),
anatomy_synonyms = os.path.join(os.getcwd(), config["download_data_from_ZFIN"]["dir"], "anatomy_synonyms.tsv")
shell:
"wget -O {output.anatomy_item} {input.anatomy_item};" \
"wget -O {output.anatomy_synonyms} {input.anatomy_synonyms};" \
"wget -O {output.xpat_stage_anatomy} {input.xpat_stage_anatomy};" \
"wget -O {output.xpat_fish} {input.xpat_fish};"
And this is the content of my configs.yaml file:
ZFIN_url:
# Zebrafish Anatomy Term
anatomy_item: "https://zfin.org/downloads/file/anatomy_item.txt"
# Zebrafish Gene Expression by Stage and Anatomy Term
xpat_stage_anatomy: "https://zfin.org/downloads/file/xpat_stage_anatomy.txt"
# ZFIN Genes with Expression Assay Records
xpat_fish: "https://zfin.org/downloads/file/xpat_fish.txt"
# Zebrafish Anatomy Term Synonyms
anatomy_synonyms: "https://zfin.org/downloads/file/anatomy_synonyms.txt"
download_data_from_ZFIN:
dir: ZFIN_data
The error message is:
Building DAG of jobs...
MissingInputException in line 10 of /home/zhangdong/works/NGS/coevolution/snakemake/coevolution.rule:
Missing input files for rule download_data_from_ZFIN:
https://zfin.org/downloads/file/anatomy_item.txt
I want to make sure that if this exception is caused by none file input for the input rule?

Note that you can also use remote files as input so you may avoid rule download_data_from_ZFIN altogether. E.g.:
from snakemake.remote.HTTP import RemoteProvider as HTTPRemoteProvider
HTTP = HTTPRemoteProvider()
rule all:
input:
'output.txt',
rule one:
input:
# Some file from the web
x= HTTP.remote('https://plasmodb.org/common/downloads/release-49/PbergheiANKA/txt/PlasmoDB-49_PbergheiANKA_CodonUsage.txt', keep_local=True)
output:
'output.txt',
shell:
r"""
# Do something with the remote file
head {input.x} > {output}
"""
The remote file will be downloaded and stored locally under plasmodb.org/common/.../PlasmoDB-49_PbergheiANKA_CodonUsage.txt

Many thanks #dariober, I tried the follwing code and it worked,
import os
from snakemake.remote.HTTP import RemoteProvider as HTTPRemoteProvider
configfile: "Configs.yaml"
HTTP = HTTPRemoteProvider()
rule all:
input:
expand(os.path.join(os.getcwd(),config["download_data_from_ZFIN"]["dir"],"{item}.tsv"),
item=list(config["ZFIN_url"].keys()))
rule download_data_from_ZFIN:
input:
lambda wildcards: HTTP.remote(config["ZFIN_url"][wildcards.item], keep_local=True)
output:
os.path.join(os.getcwd(),config["download_data_from_ZFIN"]["dir"],"{item}.tsv")
threads:
1
shell:
"mv {input} > {output}"
Such code is more snakemake-like, but I have two further questions:
Is there a way to specify the output file name for the downloading? Now I use the mv command to achieve that.
Does this remote files function support parallel works? I tried the above code together with --cores 6, but it still download the file one by one.

snakemake running single jobs in parallel from all files in folder

My problem is related to Running parallel instances of a single job/rule on Snakemake but I believe different.
I cannot create a all: rule for it in advance because the folder of input files will be created by a previous rule and depends on the user initial data
pseudocode
rule1: get a big file (OK)
rule2: split the file in parts in Split folder (OK)
rule3: run a program on each file created in Split
I am now at rule3 with Split containing 70 files like
Split/file_001.fq
Split/file_002.fq
..
Split/file_069.fq
Could you please help me creating a rule for pigz to run compress the 70 files in parallel to 70 .gz files
I am running with snakemake -j 24 ZipSplit
config["pigt"] gives 4 threads for each compression job and I give 24 threads to snakemake so I expect 6 parallel compressions but my current rule merges the inputs to one archive in a single job instead of parallelizing !?
Should I build the list of input fully in the rule? how?
# parallel job
files, = glob_wildcards("Split/{x}.fq")
rule ZipSplit:
input: expand("Split/{x}.fq", x=files)
threads: config["pigt"]
shell:
"""
pigz -k -p {threads} {input}
"""
I tried to define input directly with
input: glob_wildcards("Split/{x}.fq")
but syntax error occures
# InSilico_PCR Snakefile
import os
import re
from snakemake.remote.HTTP import RemoteProvider as HTTPRemoteProvider
HTTP = HTTPRemoteProvider()
# source config variables
configfile: "config.yaml"
# single job
rule GetRawData:
input:
HTTP.remote(os.path.join(config["host"], config["infile"]), keep_local=True, allow_redirects=True)
output:
os.path.join("RawData", config["infile"])
run:
shell("cp {input} {output}")
# single job
rule SplitFastq:
input:
os.path.join("RawData", config["infile"])
params:
lines_per_file = config["lines_per_file"]
output:
pfx = os.path.join("Split", config["infile"] + "_")
shell:
"""
zcat {input} | split --numeric-suffixes --additional-suffix=.fq -a 3 -l {params.lines_per_file} - {output.pfx}
"""
# parallel job
files, = glob_wildcards("Split/{x}.fq")
rule ZipSplit:
input: expand("Split/{x}.fq", x=files)
threads: config["pigt"]
shell:
"""
pigz -k -p {threads} {input}
"""

I think the example below should do it, using checkpoints as suggested by #Maarten-vd-Sande.
However, in your particular case of splitting a big file and compress the output on the fly, you may be better off using the --filter option of split as in
split -a 3 -d -l 4 --filter='gzip -c > $FILE.fastq.gz' bigfile.fastq split/
The snakemake solution, assuming your input file is called bigfile.fastq, split and compress output will be in directory splitting./bigfile/
rule all:
input:
expand("{sample}.split.done", sample= ['bigfile']),
checkpoint splitting:
input:
"{sample}.fastq"
output:
directory("splitting/{sample}")
shell:
r"""
mkdir splitting/{wildcards.sample}
split -a 3 -d --additional-suffix .fastq -l 4 {input} splitting/{wildcards.sample}/
"""
rule compress:
input:
"splitting/{sample}/{i}.fastq",
output:
"splitting/{sample}/{i}.fastq.gz",
shell:
r"""
gzip -c {input} > {output}
"""
def aggregate_input(wildcards):
checkpoint_output = checkpoints.splitting.get(**wildcards).output[0]
return expand("splitting/{sample}/{i}.fastq.gz",
sample=wildcards.sample,
i=glob_wildcards(os.path.join(checkpoint_output, "{i}.fastq")).i)
rule all_done:
input:
aggregate_input
output:
touch("{sample}.split.done")

rule not picked up by snakemake

I'm starting with snakemake. I managed to define some rules which I can run indepently, but not in a workflow. Maybe the issue is that they have unrelated inputs and outputs.
My current workflow is like this:
configfile: './config.yaml'
rule all:
input: dynamic("task/{job}/taskOutput.tab")
rule split_input:
input: "input_fasta/snp.fa"
output: dynamic("task/{job}/taskInput.fa")
shell:
"rm -Rf tasktmp task; \
mkdir tasktmp task; \
split -l 200 -d {input} ./tasktmp/; \
ls tasktmp | awk '{{print \"mkdir task/\"$0}}' | sh; \
ls tasktmp | awk '{{print \"mv ./tasktmp/\"$0\" ./task/\"$0\"/taskInput.fa\"}}' | sh"
rule task:
input: "task/{job}/taskInput.fa"
output: "task/{job}/taskOutput.tab"
shell: "cp {input} {output}"
rule make_parameter_file:
output:
"par/parameters.txt
shell:
"rm -Rf par;mkdir par; \
echo \"\
minimumFlankLength=5\n\
maximumFlankLength=200\n\
alignmentLengthDifference=2\
allowedMismatch=4\n\
allowedProxyMismatch=2\n\
allowedIndel=3\n\
ambiguitiesAsMatch=1\n\" \
> par/parameters.txt"
rule build_target:
input:
"./my_target"
output:
touch("build_target.done")
shell:
"build_target -template format_nt -source {input} -target my_target"
If I call this as such:
snakemake -p -s snakefile
The first three rules are being executed, the others not.
I can run the last rule by specifying it as an argument.
snakemake -p -s snakefile build_target
But I don't see how I can run all.
Thanks a lot for any suggestion on how to solve this.

By default snakemake executes only the first rule of a snakefile. Here it is rule all. In order to produce rule all's input dynamic("task/{job}/taskOutput.tab"), it needs to run the following two rules task and split_input, and so it does.
If you want the other rules to be run as well, you should put their output in rule all, eg.:
rule all:
input:
dynamic("task/{job}/taskOutput.tab"),
"par/parameters.txt",
"build_target.done"

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

snakemake - replacing command line parameters with wildcards by cluster profile - snakemake

Related

Snakemake Megahit output issue

snakemake - define input for aggregate rule without wildcards

Does snakemake support none file Input?

snakemake running single jobs in parallel from all files in folder

rule not picked up by snakemake

Categories

Resources