How to write a Snakemake rule with no output? - snakemake

My Snakefile has a print_to_screen rule with no explicit output file. The following is a simplified example:
rule all:
    placeholder_output # What should I put here?
rule create_file:
    output:
        "file.txt"
    shell:
        "echo Hello World! > {output}"
rule print_to_screen: # This rule has no output
    input:
        "file.txt"
    shell:
        "cat {input}"
How can I write the print_to_screen rule so that it triggers other rules, meaning that:
it can be used as an input in other rules, so running snakemake {placeholder_output} also triggers the previous rule create_file?
it can be included in rule all, so running snakemake triggers all rules?

The output section of a rule is optional, but a rule without output only takes effect if it is the first rule in your Snakefile, since Snakemake uses the first rule as its default target:
rule print_to_screen:
    input:
        "file.txt"
    shell:
        "cat {input}"
rule create_file:
    output:
        "file.txt"
    shell:
        "echo Hello World! > {output}"
If you need more flexibility (for example, you have several rules like that), you should use flags, such as touch(), which creates an empty marker file that other rules can list as input.

Related

Use snakemake's localrules in wildcard specific manner

localrules can be used to run specific rules locally instead of submitting them as cluster jobs. Is it also possible to define this in a wildcard-specific manner?
In the example below, rule summer should run locally when it creates short_job.txt and run as a cluster job when it creates long_job.txt.
rule all:
    input:
        "long_job.txt",
        "short_job.txt",
localrules: summer
rule summer:
    output:
        "{sample}.txt"
    shell:
        "touch {output}"
To solve this task I would use two separate rules:
rule all:
    input:
        "long_job.txt",
        "short_job.txt",
rule summer:
    output:
        "{sample}.txt"
    wildcard_constraints:
        sample=".*long.*"
    shell:
        "touch {output}"
localrules: summer_local
rule summer_local:
    output:
        "{sample}.txt"
    wildcard_constraints:
        sample=".*short.*"
    shell:
        "touch {output}"
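The split above only works if the two wildcard_constraints regexes claim disjoint sets of sample names. As a rough illustration in plain Python (this is not Snakemake's own matching code, and the helper name rule_for is hypothetical), the sketch below checks which rule would claim each target, assuming a constraint has to match the whole wildcard value:

```python
import re

# Constraints copied from the two rules above.
CONSTRAINTS = {
    "summer": r".*long.*",         # submitted as a cluster job
    "summer_local": r".*short.*",  # listed in localrules, runs locally
}

def rule_for(target):
    """Return the names of the rules whose output "{sample}.txt" matches target."""
    matches = []
    for rule, constraint in CONSTRAINTS.items():
        # Regex for "{sample}.txt" with the constraint substituted for the wildcard.
        pattern = r"(?P<sample>%s)\.txt" % constraint
        if re.fullmatch(pattern, target):
            matches.append(rule)
    return matches

for target in ["long_job.txt", "short_job.txt"]:
    print(target, "->", rule_for(target))
```

A name that contains both "long" and "short" would be claimed by both rules in this sketch, so the constraints should be chosen to be mutually exclusive.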

snakemake. How to pass target from command line when creating multiple targets

With help from a previous question, this code creates the targets (copies of the file named "practice_phased_reversed.vcf") in each of two directories.
dirs=['k_1','k2_10']
rule all:
    input:
        expand("{f}/practice_phased_reversed.vcf",f=dirs)
rule r1:
    input:
        "practice_phased_reversed.vcf"
    output:
        "{f}/{input}"
    shell:
        "cp {input} {output}"
However, I would like to pass the target file on the snakemake command line.
I tried the code below with the command "snakemake practice_phased_reversed.vcf", but it gave the error "MissingRuleException: No rule to produce practice_phased_reversed.vcf":
dirs=['k_1','k2_10']
rule all:
    input:
        expand("{f}/{{base}}_phased_reversed.vcf",f=dirs)
rule r1:
    input:
        "{base}_phased_reversed.vcf"
    output:
        "{f}/{input}"
    shell:
        "cp {input} {output}"
Thanks for any help
I think you should pass the target file name as a configuration option on the command line and use that option to construct the file names in the Snakefile:
target = config['target']
dirs = ['k_1','k2_10']
rule all:
    input:
        expand("{f}/%s" % target, f=dirs),
rule r1:
    input:
        target,
    output:
        "{f}/%s" % target,
    shell:
        "cp {input} {output}"
To be executed as:
snakemake -C target=practice_phased_reversed.vcf
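To make the two formatting steps in rule all explicit, here is a minimal plain-Python sketch of how the target paths are built. The config dictionary is a stand-in for what Snakemake fills in from the command line, and the list comprehension only approximates what expand() does for a single wildcard:

```python
# Stand-in for the dictionary Snakemake fills from `-C target=...`.
config = {"target": "practice_phased_reversed.vcf"}

target = config["target"]
dirs = ["k_1", "k2_10"]

# Step 1: "%" formatting inserts the target name and leaves {f} untouched.
pattern = "{f}/%s" % target

# Step 2: fill {f} once per directory, roughly as expand(pattern, f=dirs) would.
targets = [pattern.format(f=d) for d in dirs]
print(targets)
```

This assumes the target name itself contains no braces or percent signs, which would interfere with either formatting step.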
Your target file practice_phased_reversed.vcf doesn't satisfy the output pattern of rule r1: it has no value for the wildcard {f}.
With the following rule, a command such as snakemake data/practice_phased_reversed.vcf, where data matches the wildcard f, will work as expected.
Code:
rule r1:
    input:
        "{base}_phased_reversed.vcf"
    output:
        "{f}/{base}_phased_reversed.vcf"
    shell:
        "cp {input} {output}"
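The reason one target works and the other does not can be sketched in plain Python by treating the output pattern as a regular expression. This is an approximation for illustration, not Snakemake's implementation. It assumes every wildcard matches one or more arbitrary characters, and match_wildcards is a hypothetical helper name:

```python
import re

# The output pattern "{f}/{base}_phased_reversed.vcf" written as a regex,
# assuming each wildcard matches one or more arbitrary characters.
OUTPUT_REGEX = re.compile(r"(?P<f>.+)/(?P<base>.+)_phased_reversed\.vcf")

def match_wildcards(target):
    """Return the wildcard values for target, or None if the pattern does not match."""
    m = OUTPUT_REGEX.fullmatch(target)
    return m.groupdict() if m else None

# No directory component, so {f} has nothing to match.
print(match_wildcards("practice_phased_reversed.vcf"))

# "data" is bound to {f} and "practice" to {base}.
print(match_wildcards("data/practice_phased_reversed.vcf"))
```

The bound value of base is then reused to build the input file name, which is why the same wildcard has to appear in the output pattern.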

snakemake --list-code-changes does not propagate into subworkflows

I use
snakemake -R `snakemake --list-code-changes`
to rerun rules in which the code has changed. However, this does not seem to work when using subworkflows, for example:
Main Snakefile:
rule all:
    input: "out"
subworkflow subworkflow:
    workdir: "subworkflow"
rule run_subworkflow:
    input: subworkflow("subworkflow_output")
    output: "out"
    shell: "cat {input} > {output}"
Subworkflow Snakefile:
rule all:
    input: "subworkflow_output"
rule generate_b:
    output: "subworkflow_output"
    shell: "echo 'subworkflow' > {output}"
Changes made in the shell command of the subworkflow are not detected.
Is this expected behaviour?

Using regex in snakemake wildcards

I'm using regex in snakemake wildcards, but I've come across an error that I don't understand.
In this shortened example it works:
rule graphviz:
    input: "{graph}.dot"
    output: "{graph}.{ext,(pdf|png)}"
    shell: "dot -T{wildcards.ext} -o {output} {input}"
In this example, it doesn't:
## This is working
rule fastqc:
    input: "{reads}.fastq"
    output: "{reads}_fastqc/{sample}_fastqc.html"
    shell: "fastqc --format fastq {input}"
## This is not working
rule fastqc:
    input: "{reads}.{ext,(bam|fastq)}"
    output: "{reads}_fastqc/{sample}_fastqc.html"
    shell: "fastqc --format {wildcards.ext} {input}"
I'm attaching a screencap of the error message I'm getting. Thanks for your help.

Snakemake: remove output file

I don't see how to use a Snakemake rule to remove a Snakemake output file that has become useless.
In concrete terms, I have a rule bwa_mem_sam that creates a file named {sample}.sam.
I have this other rule, bwa_mem_bam, that creates a file named {sample}.bam.
As the two files contain the same information in different formats, I'd like to remove the first one, but I cannot manage to do this.
Any help would be very much appreciated.
Ben.
rule bwa_mem_map:
    input:
        sam="{sample}.sam",
        bam="{sample}.bam"
    shell:
        "rm {input.sam}"
# Convert SAM to BAM.
rule bwa_mem_map_bam:
    input:
        rules.sam_to_bam.output
# Use bwa mem to map reads on a reference genome.
rule bwa_mem_map_sam:
    input:
        reference=reference_genome(),
        index=reference_genome_index(),
        fastq=lambda wildcards: config["units"][SAMPLE_TO_UNIT[wildcards.sample]],
    output:
        "mapping/{sample}.sam"
    threads: 12
    log:
        "mapping/{sample}.log"
    shell:
        "{BWA} mem -t {threads} {input.reference} {input.fastq} > {output} 2> {log} "\
        "|| (rc=$?; cat {log}; exit $rc;)"
rule sam_to_bam:
    input:
        "{prefix}.sam"
    output:
        "{prefix}.bam"
    threads: 8
    shell:
        "{SAMTOOLS} view --threads {threads} -b {input} > {output}"
You don't need a rule to remove your sam files. Just mark the output sam file in the "bwa_mem_map_sam" rule as temporary:
rule bwa_mem_map_sam:
    input:
        reference=reference_genome(),
        index=reference_genome_index(),
        fastq=lambda wildcards: config["units"][SAMPLE_TO_UNIT[wildcards.sample]],
    output:
        temp("mapping/{sample}.sam")
    threads: 12
    log:
        "mapping/{sample}.log"
    shell:
        "{BWA} mem -t {threads} {input.reference} {input.fastq} > {output} 2> {log} "\
        "|| (rc=$?; cat {log}; exit $rc;)"
As soon as a temp file is no longer needed (i.e. every job that uses it as input has finished), Snakemake removes it.
EDIT AFTER COMMENT:
If I understand correctly, your statement "if the user asks for a sam..." means the sam file is put in the target rule. If this is the case, then as long as the input of the target rule contains the sam file, the file won't be deleted (I guess). If the bam file is put in the target rule (and not the sam), then the sam file will be deleted.
The other way is this:
rule bwa_mem_map:
    input:
        sam="{sample}.sam",
        bam="{sample}.bam"
    output:
        touch("{sample}_samErased.txt")
    shell:
        "rm {input.sam}"
and ask for "{sample}_samErased.txt" in the target rule.
Based on the comments above, you want to let the user choose between sam and bam output.
You could use this as a config argument:
snakemake --config output_format=sam
Then you use this kind of Snakefile:
samples = ['A','B']
rule all:
    input:
        expand('{sample}.mapped.{output_format}', sample=samples, output_format=config['output_format'])
rule bwa:
    input: '{sample}.fastq'
    output: temp('{sample}.mapped.sam')
    shell:
        """touch {output}"""
rule sam_to_bam:
    input: '{sample}.mapped.sam'
    output: '{sample}.mapped.bam'
    shell:
        """touch {output}"""
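To see which files rule all ends up requesting for each choice, the expand() call can be approximated in plain Python. This is a sketch under assumptions: build_targets is a hypothetical helper, the config dictionary stands in for the value passed with --config, and a single string is treated as a one-element list of formats:

```python
from itertools import product

# Stand-in for the value passed with `--config output_format=sam`.
config = {"output_format": "sam"}
samples = ["A", "B"]

def build_targets(samples, output_format):
    """Approximate expand('{sample}.mapped.{output_format}', ...) for one format."""
    formats = [output_format]  # a single string is treated as one value
    return [
        "{sample}.mapped.{output_format}".format(sample=s, output_format=f)
        for s, f in product(samples, formats)
    ]

print(build_targets(samples, config["output_format"]))
print(build_targets(samples, "bam"))
```

With output_format=sam the .sam files are themselves the requested targets, while with output_format=bam they are only intermediate files produced by rule bwa and marked with temp().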