I am working on this snakemake pipeline where the last rule looks like this:
rule RunCodeml:
    input:
        '{ProjectFolder}/{Fastas}/codeml.ctl'
    output:
        '{ProjectFolder}/{Fastas}/codeml.out'
    shell:
        'codeml {input}'
This rule does not run and the error seems to be that the program codeml can't find the .ctl file because it looks for an incomplete path: '/work_beegfs/sunam133/Anas_plasmids/Phylo_chromosome/Acinet_only/Klebs_Esc_SCUG/cluster_536/co'
although the command seems correct:
shell:
codeml /work_beegfs/sunam133/Anas_plasmids/Phylo_chromosome/Acinet_only/Klebs_Esc_SCUG/cluster_536/codeml.ctl
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)'
And here is the output from running with the -p option:
error when opening file /work_beegfs/sunam133/Anas_plasmids/Phylo_chromosome/Acinet_only/Klebs_Esc_SCUG/cluster_1083/co
tell me the full path-name of the file? Can't find the file. I give up.
I find this behavior very strange and I can't figure out what is going on. Any help would be appreciated.
Thanks!
D.
OK, so the problem was not Snakemake but the program I am calling (codeml), which limits the length of the path it accepts for the control file. That is why the path in the error messages above is cut off after 'co'.
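One way to sidestep such a limit is to run the program from inside the directory that holds the control file and pass only the file name. The sketch below assumes codeml accepts a control-file name relative to the working directory. The helper name short_codeml_call and the example path are hypothetical, and the helper only builds the pieces of the call without running codeml.

```python
import posixpath

def short_codeml_call(ctl_path):
    """Split a long control-file path into (working directory, short command).

    Hypothetical helper: it does not run codeml, it only prepares the call.
    """
    workdir, ctl_name = posixpath.split(ctl_path)
    # An empty directory part means the file is already in the current directory.
    return workdir or ".", ["codeml", ctl_name]

# Made-up path, used only for illustration.
workdir, cmd = short_codeml_call("/work/project/some/long/path/cluster_536/codeml.ctl")
# To actually run it: subprocess.run(cmd, cwd=workdir, check=True)
```

In the rule itself, the same idea would be a shell command along the lines of 'cd $(dirname {input}) && codeml $(basename {input})'. Any relative paths inside the .ctl file would then be resolved from that directory as well.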
I am trying to use snakemake to copy a file to multiple directories, and I need to be able to use a wildcard for part of the name of the target. Previously I had tried this with 'dirs' specified in the Snakefile (this is an example, the actual application has 15 directories).
dirs=['k_1','k2_10']

rule all:
    input:
        expand("{f}/practice_phased_reversed.vcf",f=dirs)

rule r1:
    input:
        "practice_phased_reversed.vcf"
    output:
        "{f}/{input}"
    shell:
        "cp {input} {output}"
This copies the file as desired. However the filename must be given in rule all. How can I change this so that I can specify a target on the command line using a wildcard for part of the name?
I then tried the version below with the command "snakemake practice_phased_reversed.vcf", but it gave the error "MissingRuleException: No rule to produce practice_phased_reversed.vcf":
dirs=['k_1','k2_10']

rule all:
    input:
        expand("{f}/{{base}}_phased_reversed.vcf",f=dirs)

rule r1:
    input:
        "{base}_phased_reversed.vcf"
    output:
        "{f}/{input}"
    shell:
        "cp {input} {output}"
Is there a way to fix this so that I can use the command line and a wildcard? Thanks for any help.
I'd suggest a few changes. Your second snakefile won't be able to resolve the rule all since it still includes a wildcard base. You would need to provide that in the config file or via command line.
However, if you just want to express targets by the command line, you don't need to worry about the rule all. In rule r1, you probably want to expand output; I don't think referencing input works and I'm surprised it's not an error...
So:
rule r1:
    input:
        "{base}_phased_reversed.vcf"
    output:
        "{f}/{base}_phased_reversed.vcf"
    shell:
        "cp {input} {output}"
snakemake ./test_phased_reversed.vcf will still be an error because it's trying to make a file as the input and output of the same rule. I agree the error isn't very informative as the input file does exist. Maybe under the hood snakemake eliminates rule r1 from consideration due to the matching inputs/outputs? snakemake test/test_phased_reversed.vcf gives a copy in the subdirectory.
Hope that's clear. I don't quite get what you're trying to accomplish though!
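To see why the subdirectory target can be built while the bare file name cannot, it may help to sketch how a requested target is matched against an output pattern. The function below is a simplified stand-in, not Snakemake's actual code: it assumes each wildcard matches the regular expression .+ (the documented default) and treats everything else in the pattern literally. The name match_output is hypothetical.

```python
import re

def match_output(pattern, target):
    """Return the wildcard values if target matches pattern, else None.

    Simplified illustration: every {name} becomes a named group matching
    '.+', and the remaining text of the pattern is matched literally.
    """
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += "(?P<%s>.+)" % m.group(1)
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, target)
    return found.groupdict() if found else None

pattern = "{f}/{base}_phased_reversed.vcf"
in_subdir = match_output(pattern, "test/test_phased_reversed.vcf")
top_level = match_output(pattern, "practice_phased_reversed.vcf")
```

Under these assumptions the first target yields values for both f and base, while the second has no "/" for the pattern to match, so no rule claims it.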
EDIT
I have now posted this question as an issue on the Snakemake bitbucket given this seems to be an unknown behavior.
I am using snakemake with the --use-singularity option.
When I use a classic rule of the form:
singularity: mycontainer

rule myrule:
    input:
    output:
    shell:
        "somecommand"
with the somecommand only present in the singularity container, everything goes fine.
However, when I need to use some python code in the run part of the rule, the command is not found.
rule myrule:
    input:
    output:
    run:
        some python code here
        shell("somecommand")
The only workaround I found is to use
shell("singularity exec mycontainer somecommand")
but this is not optimal.
I am either missing something, such as an option, or this is a missing feature in snakemake.
What I would like to obtain is to use the shell() function with the --use-singularity option.
Snakemake doesn't allow using --use-conda with run block and this is why:
The run block of a rule (see Rules) has access to anything defined in the Snakefile, outside of the rule. Hence, it has to share the conda environment with the main Snakemake process. To avoid confusion we therefore disallow the conda directive together with the run block. It is recommended to use the script directive instead (see External scripts).
I bet --use-singularity is not allowed with run block for the same reason.
Let's consider two Snakefiles, one main file and one subworkflow:
./Snakefile:
subworkflow sub:
    workdir: "."
    snakefile: "subworkflow/Snakefile"

rule all:
    input: sub("subresult")
./subworkflow/Snakefile:
rule sub_all:
    output: "subresult"
    shell: "touch {output}"
This code works pretty well. Now let's introduce a small change: substitute "subresult" with "./subresult" in the main file:
output: "./subresult"
That still works, but if I make the same change in the subworkflow, I get the exception:
MissingRuleException:
No rule to produce subresult
The same exception is thrown if I specify any other subfolder in the output of the subworkflow's rule:
subworkflow sub:
    workdir: "."
    snakefile: "subworkflow/Snakefile"

rule all:
    input: sub("ANY_PATH/subresult")

rule sub_all:
    output: "ANY_PATH/subresult"
    shell: "touch {output}"
I assume this is not the intended behavior. Is there anything wrong with my code? Is there a way to specify a subworkflow's target in a subfolder?
OS: Windows + MinGW
Python 3.6.5
Snakemake 5.4.5, 5.2
Update:
I tried the example provided by @JeeYem, and even the data subdirectory didn't work on my system. After some investigation I found that this is a platform-specific problem on Windows or the Windows/MinGW combination. Below is the code that works and shows the problem (I left the original code commented out for comparison):
File Snakefile:
subworkflow otherworkflow:
    workdir:
        "."
    snakefile:
        "kingmaker.Snakefile"

rule all:
    input:
        otherworkflow('data/a.txt')
Subworkflow file kingmaker.Snakefile:
rule write_file:
    output:
        #'data/a.txt'
        'data\\a.txt'
    shell:
        #'touch {output}'
        'touch data/a.txt'
Note that I cannot even use the {output} variable in the shell section.
I will submit a ticket to the Snakemake repository.
Based on my testing, my guess is that you are using ./ at the beginning of those paths in input and/or output, and that this is what causes the problem. I am not sure of the exact reason, but Snakemake itself seems to point to the cause (see the end of this answer).
In my example scripts shown below, I can use the subdirectory data without any problem in both Snakefiles. However, if I use ./data (i.e. ./ at the beginning of the subdirectory), Snakemake has trouble working properly.
File Snakefile:
subworkflow otherworkflow:
    workdir:
        "."
    snakefile:
        "kingmaker.Snakefile"

rule all:
    input:
        otherworkflow('data/a.txt')
Subworkflow file kingmaker.Snakefile:
rule write_file:
    output:
        'data/a.txt'
    shell:
        'touch {output}'
In fact, if you run the subworkflow by itself (i.e. snakemake -s kingmaker.Snakefile) with ./ at the beginning of the output path, Snakemake strongly discourages its usage with this warning:
Relative file path './data/a.txt' starts with './'.
This is redundant and strongly discouraged.
It can also lead to inconsistent results of the file-matching approach used by Snakemake.
You can simply omit the './' for relative file paths.
I am using snakemake v5.4.0 on mac.
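The effect described in the warning, and in the Windows update above, can be illustrated with a small sketch. It assumes that target paths are compared as plain strings, which is a simplification and not a description of Snakemake's internals. The helper normalize_target is hypothetical, and it treats backslashes as Windows separators rather than as part of a file name.

```python
import posixpath

def normalize_target(path):
    """Drop a redundant './' prefix and use forward slashes only."""
    # Assumption for this sketch: backslashes are path separators.
    return posixpath.normpath(path.replace("\\", "/"))

requested = "data/a.txt"
declared = ["./data/a.txt", "data\\a.txt", "data/a.txt"]

# Literal comparison: only the identical spelling matches.
raw_matches = [d == requested for d in declared]
# After normalization, all three spellings refer to the same target.
normalized_matches = [normalize_target(d) == requested for d in declared]
```

All three spellings name the same file, yet only one of them is equal to the requested string before normalization, which is consistent with the "No rule to produce" errors reported above.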
I can't get the output of a script run through singularity.
I have a python script, at the end of which the output is saved with:
...
with open('saveOut.pkl','wb') as myFile:
    pickle.dump(myTable,myFile)
I want to run this script with singularity on a distant machine. Since I am learning singularity, I made a 'sand box' debian image (not compiled into a single 'img' file yet) in the directory /tmp/debian; in this image I copied the python script test.py in /usr/src and I run it with the command:
sudo singularity exec /tmp/debian python3.5 /usr/src/test.py
The problem:
It works well as long as the script only prints results. With the pickle example described above, I don't get any saveOut.pkl file: the file is not written anywhere, and I don't see any error message. I also tried writing an explicit path in the Python script, for instance /usr/src/saveOut.pkl, but the result is the same.
How can I write a result?
What was your expected result, i.e. in which directory did you expect to find the output file?
I expect a file saveOut.pkl anywhere, in the container or not; I don't care about the location. Currently I don't get it at all: neither in the container's current directory, nor in the container's /usr/src/, nor on the host, nor anywhere else.
Did you look for it on the host or in the container?
both, I don't see it anywhere
What's happening here is that your python script is writing the pickle file to its current location (/usr/src/ in the container). Then, since the output from your script is not persistent (due to the sandbox not being writable on execution), it gets deleted at the end of the run.
I believe you could change your script:
with open('/opt/saveOut.pkl','wb') as myFile:
    pickle.dump(myTable,myFile)
and then bind the local directory and get the output you're looking for:
sudo singularity exec -B ./:/opt /tmp/debian python3.5 /usr/src/test.py
This worked for me, anyway.
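A slightly more flexible variant of the same idea is to let the script take its output directory as a parameter instead of hard-coding /opt, so that whichever directory is bound into the container can be used. This is a minimal sketch: the function name save_table and the sample table are hypothetical, and a temporary directory stands in for the bound directory so that the example runs anywhere.

```python
import os
import pickle
import tempfile

def save_table(table, out_dir):
    """Pickle `table` into out_dir/saveOut.pkl and return the file path."""
    out_path = os.path.join(out_dir, "saveOut.pkl")
    with open(out_path, "wb") as my_file:
        pickle.dump(table, my_file)
    return out_path

# Stand-in for the bound directory (/opt in the answer above).
demo_dir = tempfile.mkdtemp()
saved = save_table({"a": [1, 2, 3]}, demo_dir)

# Read the file back to confirm that it was really written.
with open(saved, "rb") as my_file:
    restored = pickle.load(my_file)
```

Inside the container, out_dir would be the container-side path of the bind (for example /opt), and the file then appears in the host directory given on the left-hand side of -B.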
I am attempting to run some picard tools metrics collection in snakemake. A --dryrun works fine with no errors. When I actually run the snake file I receive a MissingOutputException for reasons I do not understand.
First here is my rule
rule CollectAlignmentSummaryMetrics:
    input:
        "bam_input/final/{sample}/{sample}.ready.bam"
    output:
        "bam_input/final/{sample}/metrics/{reference}/alignment_summary.metrics"
    params:
        reference=config['reference']['file'],
        memory="10240m"
    run:
        "java -Xmx{params.memory} -jar $HOME/software/picard/build/libs/picard.jar CollectAlignmentSummaryMetrics R={params.reference} I={input} O={output}"
Now the error.
snakemake --latency-wait 120 -s metrics.snake -p
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    38      CollectAlignmentSummaryMetrics
    1       all
    39
rule CollectAlignmentSummaryMetrics:
input: bam_input/final/TB5173-T14/TB5173-T14.ready.bam
output: bam_input/final/TB5173-T14/metrics/GRCh37/alignment_summary.metrics
jobid: 7
wildcards: reference=GRCh37, sample=TB5173-T14
Error in job CollectAlignmentSummaryMetrics while creating output file bam_input/final/TB5173-T14/metrics/GRCh37/alignment_summary.metrics.
MissingOutputException in line 21 of /home/bwubb/projects/PD1WES/metrics.snake:
Missing files after 5 seconds:
bam_input/final/TB5173-T14/metrics/GRCh37/alignment_summary.metrics
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Exiting because a job execution failed. Look above for error message
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
The --latency-wait option is completely ignored. I have even tried bumping it up to 84600. If I run the intended picard java command directly, it executes without problems. I've made several snakemake pipelines without any mysterious issues, so this is driving me quite mad. Thank you for any insight!
Thanks for reporting.
It is a bug that latency-wait is not propagated when using the run directive. I have fixed that in the master branch.
In your rule, you use the run directive. After run, Snakemake expects plain Python code, but you provide only a string. Python evaluates that string, discards it, and the job ends without ever running the command, so the output file is never created. What you really want here is the shell directive. With the shell directive your current problem will be fixed, and you will not be affected by the bug, so there is no need to modify latency-wait. The fix for the latency-wait bug will be included in the next Snakemake release.
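The difference between a bare string and an executed command can be reproduced in plain Python, outside Snakemake. In this sketch a small Python one-liner stands in for the real java command, and a temporary directory stands in for the metrics folder; both function names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "alignment_summary.metrics")
# Portable stand-in for the real command: a one-liner that creates the file.
create_cmd = [sys.executable, "-c",
              "import sys; open(sys.argv[1], 'w').close()", target]

def body_with_bare_string():
    name = os.path.basename(target)
    # Only a string expression: evaluated, discarded, nothing is executed.
    "touch " + name

def body_with_call():
    # Actually runs the command; in a Snakefile run block this would be shell("...").
    subprocess.run(create_cmd, check=True)

body_with_bare_string()
exists_after_string = os.path.exists(target)
body_with_call()
exists_after_call = os.path.exists(target)
```

The first function returns without creating anything, which mirrors the rule above: the job "finishes" immediately and Snakemake then reports the missing output file.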