Subworkflow as an implicit dependency - snakemake

I've designed a subworkflow that works well if it is specified as an explicit input in the rule all:
rule all:
    input: anotherworkflow("data/a.txt")
Now I need to inject this subworkflow as an intermediate rule of my workflow: I need to employ the subworkflow if any other rule needs the file data/a.txt. The subworkflow can be specified only in the input part of the rule, so I design a "bridge" rule:
rule all:
    input: ".work/data/a.txt"
subworkflow anotherworkflow:
    workdir: ".work"
    snakefile: "another.Snakefile"
rule bridge:
    input: anotherworkflow("data/a.txt")
    output: ".work/data/a.txt"
I expect that the bridge rule declares as an output the same file that the subworkflow produces, and that would allow me to run the subworkflow whenever any rule needs the file. But that doesn't work. The exception is MissingOutputException, "Missing files after 5 seconds".
Am I doing something wrong? Should this approach work? Does this work on any other OS (I'm using Windows + MinGW)?

Related

snakemake rule won't run the complete command

I am working on this snakemake pipeline where the last rule looks like this:
rule RunCodeml:
    input:
        '{ProjectFolder}/{Fastas}/codeml.ctl'
    output:
        '{ProjectFolder}/{Fastas}/codeml.out'
    shell:
        'codeml {input}'
This rule does not run and the error seems to be that the program codeml can't find the .ctl file because it looks for an incomplete path: '/work_beegfs/sunam133/Anas_plasmids/Phylo_chromosome/Acinet_only/Klebs_Esc_SCUG/cluster_536/co'
although the command seems correct:
shell:
codeml /work_beegfs/sunam133/Anas_plasmids/Phylo_chromosome/Acinet_only/Klebs_Esc_SCUG/cluster_536/codeml.ctl
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)'
And here is the output from running with the -p option:
error when opening file /work_beegfs/sunam133/Anas_plasmids/Phylo_chromosome/Acinet_only/Klebs_Esc_SCUG/cluster_1083/co
tell me the full path-name of the file? Can't find the file. I give up.
I find this behavior very strange and I can't figure out what is going on. Any help would be appreciated.
Thanks!
D.
OK, so the problem was not Snakemake but the program I am calling (codeml), which limits the length of the path string it accepts for the control file.
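If the path length is indeed the limit, one possible workaround is to change into the directory that holds the control file and pass codeml only the file name. The helper below is a minimal sketch of that idea. The name codeml_command is hypothetical, and the sketch assumes that codeml accepts a relative control-file path and that any paths inside the .ctl file still resolve from that directory.

```python
import os
import shlex

def codeml_command(ctl_path):
    """Build a shell command that runs codeml from the control file's directory.

    Hypothetical helper: only the short file name is passed to codeml, so the
    long absolute path never reaches the program.
    """
    workdir, ctl_name = os.path.split(ctl_path)
    workdir = workdir or "."  # control file sits in the current directory
    return f"cd {shlex.quote(workdir)} && codeml {shlex.quote(ctl_name)}"
```

Inside the rule, the same idea could be written directly in the shell string, for example 'cd $(dirname {input}) && codeml $(basename {input})'.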

Is it possible to dynamically add rules to 'localrules:'? [duplicate]

This question already has an answer here:
snakemake list rules to execute in cluster and local
(1 answer)
Closed 2 years ago.
The example of localrules: given here
https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#local-rules
shows how to explicitly list rules that should be executed on the host node.
Is there a way to add rules at runtime, conditioning them on parameters of the workflow?
e.g. something that would look like the following?
my_local_rules = ['rule1', 'rule_2']
if condition:
    my_local_rules.append('rule3')
localrules: my_local_rules
I've tried the following, which runs with a warning, indicating that the localrules directive is not doing what I'd like.
local_rules_list = ['all']
if True:
    local_rules_list.append('test')
localrules: local_rules_list
rule all:
    input:
        # The first rule should define the default target files
        # Subsequent target rules can be specified below. They should start with all_*.
        "results/test.out"
rule test:
    input:
        "workflow/Snakefile"
    output:
        "results/test.out"
    shell:
        "cp {input} {output}"
Specifically, it fails with the following error:
localrules directive specifies rules that are not present in the Snakefile:
local_rules_list
My specific use case is the software Cell Ranger which has both a local mode and a cluster mode.
In local mode, the cellranger command is submitted as a job on a compute node.
In cluster mode, the cellranger command should run on the head node, as cellranger itself handles the job submission to the compute nodes.
I would like my workflow to let users choose which mode to run Cell Ranger in (e.g. local or sge). With mode: sge, the workflow would then add the rule that runs cellranger to localrules: ....
Is that possible, or can local rules only be hard coded in the Snakefile?
Best,
Kevin
Thanks
The 'hack' suggested in the post linked in @Maarten-vd-Sande's comment works fine for me.
I have adapted the suggested hack as follows, in this toy example.
Namely:
rule all is explicitly set in the localrules directive
rule test1 is added to the set of local rules using the hack, demonstrating the use of a dummy condition, as I illustrated in my question
rule test2 serves as a negative control that is never added to the list of local rules
# The main entry point of your workflow.
# After configuring, running snakemake -n in a clone of this repository should successfully execute a dry-run of the workflow.
report: "report/workflow.rst"
# Allow users to fix the underlying OS via singularity.
singularity: "docker://continuumio/miniconda3"
localrules: all
rule all:
    input:
        # The first rule should define the default target files
        # Subsequent target rules can be specified below. They should start with all_*.
        "results/test1.out",
        "results/test2.out"
rule test1:
    input:
        "workflow/Snakefile"
    output:
        "results/test1.out"
    shell:
        "cp {input} {output}"
rule test2:
    input:
        "workflow/Snakefile"
    output:
        "results/test2.out"
    shell:
        "cp {input} {output}"
include: "rules/common.smk"
include: "rules/other.smk"
# Conditionally add rules to the directive 'localrules'
_localrules = list(workflow._localrules) # get the local rules so far
if True: # set condition here
    _localrules.append('test1') # add rules as required
workflow._localrules = set(_localrules) # set the updated local rules
Thanks!
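For the Cell Ranger use case from the question, the condition could be driven by the workflow configuration. The sketch below keeps the decision in a plain Python function. The config key cellranger_mode and the rule name cellranger_count are hypothetical, and the commented Snakefile line relies on workflow._localrules, a private attribute that may change between Snakemake versions.

```python
def local_rule_names(config, always_local=("all",)):
    """Return the set of rule names that should run on the host node."""
    names = set(always_local)
    # In cluster mode Cell Ranger submits its own jobs, so the rule that
    # launches it should stay on the head node (hypothetical rule name).
    if config.get("cellranger_mode", "local") == "sge":
        names.add("cellranger_count")
    return names

# In the Snakefile, after all rules are defined (same hack as above):
# workflow._localrules = set(workflow._localrules) | local_rule_names(config)
```

Keeping the selection in a function makes it easy to check outside of Snakemake which rules would end up local for a given configuration.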

Using snakemake to copy a file to multiple directories, where a wildcard is used for part of the name of the target

I am trying to use snakemake to copy a file to multiple directories, and I need to be able to use a wildcard for part of the name of the target. Previously I had tried this with 'dirs' specified in the Snakefile (this is an example, the actual application has 15 directories).
dirs=['k_1','k2_10']
rule all:
    input:
        expand("{f}/practice_phased_reversed.vcf",f=dirs)
rule r1:
    input:
        "practice_phased_reversed.vcf"
    output:
        "{f}/{input}"
    shell:
        "cp {input} {output}"
This copies the file as desired. However the filename must be given in rule all. How can I change this so that I can specify a target on the command line using a wildcard for part of the name?
I then tried this (below), with the command "snakemake practice_phased_reversed.vcf", but it gave an error: "MissingRuleException: No rule to produce practice_phased_reversed.vcf"
dirs=['k_1','k2_10']
rule all:
    input:
        expand("{f}/{{base}}_phased_reversed.vcf",f=dirs)
rule r1:
    input:
        "{base}_phased_reversed.vcf"
    output:
        "{f}/{input}"
    shell:
        "cp {input} {output}"
Is there a way to fix this so I can use the command line and a wildcard? Thanks for any help.
I'd suggest a few changes. Your second snakefile won't be able to resolve the rule all since it still includes a wildcard base. You would need to provide that in the config file or via command line.
However, if you just want to express targets by the command line, you don't need to worry about the rule all. In rule r1, you probably want to expand output; I don't think referencing input works and I'm surprised it's not an error...
So:
rule r1:
    input:
        "{base}_phased_reversed.vcf"
    output:
        "{f}/{base}_phased_reversed.vcf"
    shell:
        "cp {input} {output}"
snakemake ./test_phased_reversed.vcf will still be an error because it's trying to make a file as the input and output of the same rule. I agree the error isn't very informative as the input file does exist. Maybe under the hood snakemake eliminates rule r1 from consideration due to the matching inputs/outputs? snakemake test/test_phased_reversed.vcf gives a copy in the subdirectory.
Hope that's clear. I don't quite get what you're trying to accomplish though!
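To make the config route mentioned above concrete: the base name could be passed with --config and the target list built from it. The function below is a plain-Python sketch of what rule all would request. The function name target_paths and the config key base are hypothetical.

```python
def target_paths(dirs, base):
    """List one copy target per directory for the given base name."""
    return [f"{d}/{base}_phased_reversed.vcf" for d in dirs]

dirs = ["k_1", "k2_10"]
print(target_paths(dirs, "practice"))
# prints ['k_1/practice_phased_reversed.vcf', 'k2_10/practice_phased_reversed.vcf']
```

In the Snakefile this would correspond to input: target_paths(dirs, config["base"]) in rule all, invoked as snakemake --config base=practice. Alternatively, a single target can be named directly on the command line, e.g. snakemake k_1/practice_phased_reversed.vcf.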

shell() function in run does not use singularity

EDIT
I have now posted this question as an issue on the Snakemake bitbucket given this seems to be an unknown behavior.
I am using snakemake with the --use-singularity option.
When I use a classic rule of the form:
singularity: mycontainer
rule myrule:
    input:
    output:
    shell:
        "somecommand"
with the somecommand only present in the singularity container, everything goes fine.
However, when I need to use some python code in the run part of the rule, the command is not found.
rule myrule:
    input:
    output:
    run:
        some python code here
        shell("somecommand")
The only workaround I found is to use
shell("singularity exec mycontainer somecommand")
but this is not optimal.
I am either missing something, such as an option, or this is a missing feature in snakemake.
What I would like to obtain is to use the shell() function with the --use-singularity option.
Snakemake doesn't allow using --use-conda with a run block, and the documentation explains why:
The run block of a rule (see Rules) has access to anything defined in the Snakefile, outside of the rule. Hence, it has to share the conda environment with the main Snakemake process. To avoid confusion we therefore disallow the conda directive together with the run block. It is recommended to use the script directive instead (see External scripts).
I bet --use-singularity is not supported with a run block for the same reason.

Snakemake subworkflow result is not found if I specify a target in subfolder

Consider two Snakefiles, one main file and one subworkflow:
./Snakefile:
subworkflow sub:
    workdir: "."
    snakefile: "subworkflow/Snakefile"
rule all:
    input: sub("subresult")
./subworkflow/Snakefile:
rule sub_all:
    output: "subresult"
    shell: "touch {output}"
This code works pretty well. Now let's introduce a small change: substitute "subresult" with "./subresult" in the main file:
output: "./subresult"
That still works, but if I make the same change in the subworkflow, I get the exception:
MissingRuleException:
No rule to produce subresult
The same exception is thrown if I specify any other subfolder in the output of the subworkflow's rule:
subworkflow sub:
    workdir: "."
    snakefile: "subworkflow/Snakefile"
rule all:
    input: sub("ANY_PATH/subresult")
rule sub_all:
    output: "ANY_PATH/subresult"
    shell: "touch {output}"
I guess this is not a normal behavior. Is there anything wrong in my code? Is there a way to specify subworkflow's target in a subfolder?
OS: Windows + MinGW
Python 3.6.5
Snakemake 5.4.5, 5.2
Update:
I tried the example provided by @JeeYem, and even the data subdirectory didn't work on my system. After some investigation I found that this is a platform-specific problem for Windows or the Windows/MinGW combination. Below is the code that works and shows the problem (I left the original code commented out for comparison):
File Snakefile:
subworkflow otherworkflow:
    workdir:
        "."
    snakefile:
        "kingmaker.Snakefile"
rule all:
    input:
        otherworkflow('data/a.txt')
Subworkflow file kingmaker.Snakefile:
rule write_file:
    output:
        #'data/a.txt'
        'data\\a.txt'
    shell:
        #'touch {output}'
        'touch data/a.txt'
Note that I cannot even use the {output} variable in the shell section.
I will submit a ticket to the Snakemake repository.
Based on my testing, my guess is that you are using ./ at the beginning of those paths in input and/or output, which is what is causing the problem. I am not sure of the exact reason, but snakemake itself seems to point to the cause of the problem (see the end of this answer).
In my example scripts as shown below, I can use subdirectory data without any problem in both Snakefiles. However, if I use ./data (ie. ./ in the beginning of subdirectory), snakemake has trouble working properly.
File Snakefile:
subworkflow otherworkflow:
    workdir:
        "."
    snakefile:
        "kingmaker.Snakefile"
rule all:
    input:
        otherworkflow('data/a.txt')
Subworkflow file kingmaker.Snakefile:
rule write_file:
    output:
        'data/a.txt'
    shell:
        'touch {output}'
In fact, if you run the subworkflow by itself (ie. snakemake -s kingmaker.Snakefile) with ./ in the beginning of the output path, snakemake strongly discourages its usage with this warning:
Relative file path './data/a.txt' starts with './'.
This is redundant and strongly discouraged.
It can also lead to inconsistent results of the file-matching approach used by Snakemake.
You can simply omit the './' for relative file paths.
I am using snakemake v5.4.0 on mac.
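If paths are assembled programmatically, they can be normalized before they reach a rule. The helper below is a small sketch (clean_relpath is a hypothetical name) that strips a leading ./ and converts backslashes to forward slashes. It only tidies the path strings and is not a confirmed fix for the Windows behavior described in the question.

```python
import posixpath

def clean_relpath(path):
    """Normalize a relative path to forward slashes without a leading './'."""
    # Treat backslashes as separators so Windows-style input is handled too.
    return posixpath.normpath(path.replace("\\", "/"))
```

For example, both "./data/a.txt" and "data\\a.txt" would be turned into "data/a.txt", the form that snakemake's own warning recommends.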