snakemake --list-code-changes does not propagate into subworkflows

I use
snakemake -R `snakemake --list-code-changes`
to rerun rules in which the code has changed. However, this does not seem to work when using subworkflows, for example:
Main Snakefile:
rule all:
    input: "out"

subworkflow subworkflow:
    workdir: "subworkflow"

rule run_subworkflow:
    input: subworkflow("subworkflow_output")
    output: "out"
    shell: "cat {input} > {output}"
Subworkflow Snakefile:
rule all:
    input: "subworkflow_output"

rule generate_b:
    output: "subworkflow_output"
    shell: "echo 'subworkflow' > {output}"
Changes made in the shell command of the subworkflow are not detected.
Is this expected behaviour?
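If --list-code-changes indeed only inspects the workflow it is invoked on, one possible workaround (an untested shell sketch, assuming the subworkflow lives in ./subworkflow as above) is to collect and rerun code changes inside the subworkflow's working directory as well:

# Rerun changed rules of the subworkflow first, then of the main workflow
(cd subworkflow && snakemake -R `snakemake --list-code-changes`)
snakemake -R `snakemake --list-code-changes`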

Related

Running different snakemake rules in parallel

I show below a pseudocode version of my Snakefile. Rule A creates the input files for rule B2, and I would like to run rules B1 and B2 at the same time, but I am not having success. I can run this Snakefile on very small data without a problem (although rules B1 and B2 do not run in parallel), but once I give it larger data it fails to create the output for rule B1. Rules B1 and B2 use the same program but with different arguments and input files, so I didn't think they should be in the same rule.
rule all:
    input: file_A_out, file_B1_out, file_B2_out, file_C_out

rule A:
    input: file_A_in
    output: file_A_out
    log: file_A_log
    shell: 'progA {input} --output {output}'

rule B1:
    input: file_B1_in
    output: file_B1_out
    group: 'groupB'
    log: file_B1_log
    shell: 'progB {input} -x 100 -o {output}'

rule B2:
    input: file_A_out
    output: file_B2_out
    group: 'groupB'
    log: file_B2_log
    shell: 'progB {input} -x 1 --y -o {output}'

rule C:
    input: file_B1_out, file_B2_out
    output: file_C_out
    log: file_C_log
    shell: 'progC {input[0]} {input[1]} -o {output}'
I thought using group to group the rules would indicate to Snakemake that the two rules can be run at once. To execute Snakemake I run nohup snakemake --cores 16 > log.txt 2>&1 &; however, it only successfully runs rule B2, while the output of rule B1 is deemed corrupted. I have seen solutions for running one rule in parallel, but what about running different rules in parallel?
Error in rule B1:
    jobid: 2
    input: 'file_B1_in'
    output: 'file_B1_out'
    log: 'file_B1_log' (check log file(s) for error details)
    shell:
        'progB {input} -x 100 -o {output}'
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job B1 since they might be corrupted:
file_B1_out
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
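Since Snakemake only reports that one of the commands exited with a non-zero exit code, a first debugging step (hypothetical, using the placeholder names from the question) is to rerun rule B1's command by hand under the same bash strict mode that Snakemake applies, so progB's own error message becomes visible:

# Reproduce rule B1's shell command outside Snakemake, with bash strict mode enabled
bash -c "set -euo pipefail; progB file_B1_in -x 100 -o file_B1_out"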
The snakefile below runs rules A, B1, and B2 in parallel then runs rule C, as expected. Maybe there is something you are not showing us?
# Make dummy input files
touch file_A_in file_B1_in
# Run pipeline
snakemake -p -j 10
The snakefile:
rule all:
    input: 'file_A_out', 'file_B1_out', 'file_B2_out', 'file_C_out'

rule A:
    input: 'file_A_in'
    output: 'file_A_out'
    shell: 'sleep 10; echo {input} > {output}'

rule B1:
    input: 'file_B1_in'
    output: 'file_B1_out'
    shell: 'sleep 10; echo {input} > {output}'

rule B2:
    input: 'file_A_in'
    output: 'file_B2_out'
    shell: 'sleep 10; echo {input} > {output}'

rule C:
    input: 'file_B1_out', 'file_B2_out'
    output: 'file_C_out'
    shell: 'sleep 10; echo {input[0]} {input[1]} > {output}'
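A side note on the group: 'groupB' directive from the question, which the reproduction above does not need: group does not create parallelism. It only allows connected jobs of the same group to be bundled into a single submission when running on a cluster; locally, independent jobs already run in parallel up to the --cores/-j limit. A hypothetical cluster invocation:

# With --cluster, jobs that share a group (and are connected in the DAG) can be
# submitted together as a single cluster job; -j caps the number of concurrent jobs.
snakemake -j 10 --cluster "sbatch"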

How to write a Snakemake rule with no output?

My Snakefile has a print_to_screen rule with no explicit output file. The following is a simplified example:
rule all:
    input:
        placeholder_output # What should I put here?
rule create_file:
    output:
        "file.txt"
    shell:
        "echo Hello World! > {output}"

rule print_to_screen: # This rule has no output
    input:
        "file.txt"
    shell:
        "cat {input}"
How can I write the print_to_screen rule so that it triggers other rules, meaning that:
it can be used as an input in other rules, so running snakemake {placeholder_output} also triggers the previous rule create_file?
it can be included in rule all, so running snakemake triggers all rules?
The output section of a rule is optional, but you need to make it the first rule (define it first in your Snakefile) for this to have any effect:
rule print_to_screen:
    input:
        "file.txt"
    shell:
        "cat {input}"

rule create_file:
    output:
        "file.txt"
    shell:
        "echo Hello World! > {output}"
If you need some flexibility (for example, you have several rules like that), you should use flag files.
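A sketch of the flag-file approach (the flag name print_to_screen.done is arbitrary): Snakemake's touch() marker gives the rule a dummy output that other rules, including all, can depend on.

rule all:
    input:
        "print_to_screen.done"

rule print_to_screen:
    input:
        "file.txt"
    output:
        touch("print_to_screen.done")  # empty flag file created when the rule succeeds
    shell:
        "cat {input}"

rule create_file:
    output:
        "file.txt"
    shell:
        "echo Hello World! > {output}"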

Use snakemake's localrules in wildcard specific manner

localrules can be used to run specific rule(s) locally instead of as cluster jobs. Is it possible, in addition, to define this in a wildcard-specific manner?
For example, in the code below, rule summer should run locally to create the file short_job.txt and as a cluster job to create the file long_job.txt.
rule all:
    input:
        "long_job.txt",
        "short_job.txt",

localrules: summer

rule summer:
    output:
        "{sample}.txt"
    shell:
        "touch {output}"
To solve this task I would use two separate rules:
rule all:
    input:
        "long_job.txt",
        "short_job.txt",

rule summer:
    output:
        "{sample}.txt"
    wildcard_constraints:
        sample=".*long.*"
    shell:
        "touch {output}"

localrules: summer_local

rule summer_local:
    output:
        "{sample}.txt"
    wildcard_constraints:
        sample=".*short.*"
    shell:
        "touch {output}"

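For completeness, a hypothetical cluster invocation (qsub is only an example cluster command): with the two rules above, long_job.txt is submitted through the cluster command, while short_job.txt, produced by the local rule summer_local, runs on the submit host.

# summer jobs go to the cluster; summer_local jobs run locally
snakemake -j 10 --cluster "qsub"
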
snakemake: How to pass a target from the command line when creating multiple targets

With help from a previous question, this code creates the targets (copies of the file named "practice_phased_reversed.vcf" in each of two directories).
dirs = ['k_1', 'k2_10']

rule all:
    input:
        expand("{f}/practice_phased_reversed.vcf", f=dirs)

rule r1:
    input:
        "practice_phased_reversed.vcf"
    output:
        "{f}/{input}"
    shell:
        "cp {input} {output}"
However, I would like to pass the target file on the snakemake command line.
I tried this (below) with the command "snakemake practice_phased_reversed.vcf", but it gave an error: "MissingRuleException: No rule to produce practice_phased_reversed.vcf".
dirs = ['k_1', 'k2_10']

rule all:
    input:
        expand("{f}/{{base}}_phased_reversed.vcf", f=dirs)

rule r1:
    input:
        "{base}_phased_reversed.vcf"
    output:
        "{f}/{input}"
    shell:
        "cp {input} {output}"
Thanks for any help
I think you should pass the target file name as a configuration option on the command line and use that option to construct the file names in the Snakefile:
target = config['target']
dirs = ['k_1', 'k2_10']

rule all:
    input:
        expand("{f}/%s" % target, f=dirs),

rule r1:
    input:
        target,
    output:
        "{f}/%s" % target,
    shell:
        "cp {input} {output}"
To be executed as:
snakemake -C target=practice_phased_reversed.vcf
Your target file practice_phased_reversed.vcf doesn't satisfy the output requirements of rule r1: it is missing a value for the wildcard {f}.
Requesting a path that includes the directory instead, e.g. snakemake data/practice_phased_reversed.vcf, where data matches the wildcard f, will work as expected.
Code:
rule r1:
    input:
        "{base}_phased_reversed.vcf"
    output:
        "{f}/{base}_phased_reversed.vcf"
    shell:
        "cp {input} {output}"

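For example, assuming practice_phased_reversed.vcf exists in the working directory and k_1 and k2_10 are the desired output directories, requesting the full paths works:

snakemake -p k_1/practice_phased_reversed.vcf k2_10/practice_phased_reversed.vcf
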
ChildIOException: error in Snakemake after running flye

I have an issue when I run other programs after running flye in my Snakemake pipeline. This is because the output from flye is a directory. My rules are as follows:
samples, = glob_wildcards("data/samples/{sample}.fastq")

rule all:
    input:
        [f"assembled/" for sample in samples],
        [f"nanopolish/draft.fa" for sample in samples],
        [f"nanopolish/reads.sorted.bam" for sample in samples],
        [f"nanopolish/reads.indexed.sorted.bam" for sample in samples]

rule fly:
    input:
        "unzipped/read.fastq"
    output:
        directory("assembled/")
    conda:
        "envs/flye.yaml"
    shell:
        "flye --nano-corr {input} --genome-size 5m --out-dir {output}"

rule bwa:
    input:
        "assembled/assembly.fasta"
    output:
        "nanopolish/draft.fa"
    conda:
        "nanopolish.yaml"
    shell:
        "bwa index {input} {output}"

rule nanopolish:
    input:
        "nanopolish/draft.fa",
        "zipped/zipped.gz"
    output:
        "nanopolish/reads.sorted.bam"
    conda:
        "nanopolish.yaml"
    shell:
        "bwa mem -x ont2d -t 8 {input} | samtools sort -o {output}"
There are a few steps before this, but they work just fine. When I run this, it gives the following error:
ChildIOException:
File/directory is a child to another output:
/home/fronglesquad/snakemake_poging_1/assembled
/home/fronglesquad/snakemake_poging_1/assembled/assembly.fasta
I have googled the error. All I could find is that it happens because Snakemake doesn't work well with output directories. But this tool needs an output directory to work. Does anyone know how to bypass this?
(I think) the problem lies somewhere else than the directory output itself.
You have defined two rules: the first outputs the directory assembled, and the second refers to assembled/assembly.fasta, which lives inside that directory. Because that file is a child of another rule's output, Snakemake complains. You can solve it by using the directory itself as the input:
rule second:
    input:
        "assembled"
    output:
        ...
    shell:
        "cat {input}/assembly.fasta > {output}"
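Applied to the pipeline above, that could look roughly like the following (a sketch; the copy-then-index commands are only a guess at what the bwa rule is meant to do with assembly.fasta):

rule bwa:
    input:
        "assembled"  # the directory produced by rule fly
    output:
        "nanopolish/draft.fa"
    conda:
        "nanopolish.yaml"
    shell:
        "cp {input}/assembly.fasta {output} && bwa index {output}"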