Snakemake: --use-conda with --cluster

When I run with --cluster and --use-conda, Snakemake does not appear to set the conda environment before submitting to the cluster, and my jobs fail accordingly. Is there a trick I am missing to set the conda environment before cluster submission?
EDIT:
I get snakemake from a conda environment defined like this:
channels:
- bioconda
- conda-forge
dependencies:
- snakemake-minimal=5.19.3
- xrootd=4.12.2
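(An environment like that can be created and activated with something along these lines; the file name snakemake-env.yml is just a placeholder for wherever the YAML above is saved.)
conda env create -n snakemake -f snakemake-env.yml   # snakemake-env.yml = the YAML above
conda activate snakemake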
Reproducer:
I create a directory with Snakefile, dothing.py, and environment.yml:
Snakefile:
shell.prefix('unset PYTHONPATH; unset LD_LIBRARY_PATH; unset PYTHONHOME; ')

rule dothing:
    conda: 'environment.yml'
    output: 'completed.out'
    log: 'thing.log'
    shell: 'python dothing.py &> {log} && touch {output}'
dothing.py:
import uncertainties
print('it worked!')
environment.yml:
name: testsnakeconda
channels:
- conda-forge
dependencies:
- uncertainties=3.1.4
If I run locally like
snakemake --cores all --use-conda
it runs with no problems:
Building DAG of jobs...
Creating conda environment environment.yml...
Downloading and installing remote packages.
Environment for environment.yml created (location: .snakemake/conda/e0fff47f)
Using shell: /usr/bin/bash
Provided cores: 10
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 dothing
1
[Tue Jun 30 16:19:38 2020]
rule dothing:
output: completed.out
log: thing.log
jobid: 0
Activating conda environment: /path/to/environment.yml
[Tue Jun 30 16:19:39 2020]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /path/to/.snakemake/log/2020-06-30T161824.906217.snakemake.log
If I try to submit using --cluster like
snakemake --cores all --use-conda --cluster 'condor_qsub -V -l procs={threads}' --latency-wait 30 --max-jobs-per-second 100 --jobs 50
there is no message about setting up a conda environment and the job fails with an error:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 50
Job counts:
count jobs
1 dothing
1
[Tue Jun 30 16:20:49 2020]
rule dothing:
output: completed.out
log: thing.log
jobid: 0
Submitted job 0 with external jobid 'Your job 9246856 ("snakejob.dothing.0.sh") has been submitted'.
[Tue Jun 30 16:26:00 2020]
Error in rule dothing:
jobid: 0
output: completed.out
log: thing.log (check log file(s) for error message)
conda-env: /path/to/.snakemake/conda/e0fff47f
shell:
python dothing.py &> thing.log && touch completed.out
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
cluster_jobid: Your job 9246856 ("snakejob.dothing.0.sh") has been submitted
Error executing rule dothing on cluster (jobid: 0, external: Your job 9246856 ("snakejob.dothing.0.sh") has been submitted, jobscript: /path/to/.snakemake/tmp.a7fpixla/snakejob.dothing.0.sh). For error details see the cluster log and the log files of the involved rule(s).
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /path/to/.snakemake/log/2020-06-30T162049.793041.snakemake.log
and I can see that the problem is that the uncertainties package is not available:
$ cat thing.log
Traceback (most recent call last):
File "dothing.py", line 1, in <module>
import uncertainties
ImportError: No module named uncertainties
EDIT:
verbose output without --cluster:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 10
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 dothing
1
Resources before job selection: {'_cores': 10, '_nodes': 9223372036854775807}
Ready jobs (1):
dothing
Selected jobs (1):
dothing
Resources after job selection: {'_cores': 9, '_nodes': 9223372036854775806}
[Thu Jul 2 21:51:18 2020]
rule dothing:
output: completed.out
log: thing.log
jobid: 0
Activating conda environment: /path/to/workingdir/.snakemake/conda/e0fff47f
[Thu Jul 2 21:51:33 2020]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /path/to/workingdir/.snakemake/log/2020-07-02T215117.964474.snakemake.log
unlocking
removing lock
removing lock
removed all locks
verbose output with --cluster:
Building DAG of jobs...
Checking status of 0 jobs.
Using shell: /usr/bin/bash
Provided cluster nodes: 50
Job counts:
count jobs
1 dothing
1
Resources before job selection: {'_cores': 9223372036854775807, '_nodes': 50}
Ready jobs (1):
dothing
Selected jobs (1):
dothing
Resources after job selection: {'_cores': 9223372036854775806, '_nodes': 49}
[Thu Jul 2 21:40:23 2020]
rule dothing:
output: completed.out
log: thing.log
jobid: 0
Jobscript:
#!/bin/sh
# properties = {"type": "single", "rule": "dothing", "local": false, "input": [], "output": ["completed.out"], "wildcards": {}, "params": {}, "log": ["thing.log"], "threads": 1, "resources": {}, "jobid": 0, "cluster": {}}
cd /path/to/workingdir && \
/path/to/miniconda/envs/envname/bin/python3.8 \
-m snakemake dothing --snakefile /path/to/workingdir/Snakefile \
--force -j --keep-target-files --keep-remote \
--wait-for-files /path/to/workingdir/.snakemake/tmp.5n32749i /path/to/workingdir/.snakemake/conda/e0fff47f --latency-wait 30 \
--attempt 1 --force-use-threads \
--wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ \
--allowed-rules dothing --nocolor --notemp --no-hooks --nolock \
--mode 2 --use-conda && touch /path/to/workingdir/.snakemake/tmp.5n32749i/0.jobfinished || (touch /path/to/workingdir/.snakemake/tmp.5n32749i/0.jobfailed; exit 1)
Submitted job 0 with external jobid 'Your job 9253728 ("snakejob.dothing.0.sh") has been submitted'.
Checking status of 1 jobs.
...
Checking status of 1 jobs.
[Thu Jul 2 21:46:23 2020]
Error in rule dothing:
jobid: 0
output: completed.out
log: thing.log (check log file(s) for error message)
conda-env: /path/to/workingdir/.snakemake/conda/e0fff47f
shell:
python dothing.py &> thing.log && touch completed.out
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
cluster_jobid: Your job 9253728 ("snakejob.dothing.0.sh") has been submitted
Error executing rule dothing on cluster (jobid: 0, external: Your job 9253728 ("snakejob.dothing.0.sh") has been submitted, jobscript: /path/to/workingdir/.snakemake/tmp.5n32749i/snakejob.dothing.0.sh). For error details see the cluster log and the log files of the involved rule(s).
Cleanup job metadata.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /path/to/workingdir/.snakemake/log/2020-07-02T214022.614691.snakemake.log
unlocking
removing lock
removing lock
removed all locks

What worked for me is to use the full path to the environment's Python interpreter inside the rule itself...
rule dothing:
    conda: 'environment.yml'
    output: 'completed.out'
    log: 'thing.log'
    shell: '/full_path_to_your_environment/bin/python dothing.py &> {log} && touch {output}'
and the full path to your Python script if it is part of a package installed in that specific environment (which is my case).
rule dothing:
    conda: 'environment.yml'
    output: 'completed.out'
    log: 'thing.log'
    shell: '/full_path_to_your_environment/bin/python /full_path_to_your_environment/package_dir/dothing.py &> {log} && touch {output}'
By /full_path_to_your_environment/ I mean the hash-named directory that conda and Snakemake gave to your environment the first time they installed it (e.g. /path/to/workingdir/.snakemake/conda/e0fff47f); one way to locate it is shown below.
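If you are not sure which hash-named directory belongs to the environment, you can for example list the conda directory that Snakemake keeps inside the working directory:
ls -d .snakemake/conda/*/            # hash-named environment directories
ls .snakemake/conda/*/bin/python     # the interpreters you can point the rule at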
It's a bit ugly but it still did the trick.
Hope it helps.

Related

snakemake returns a non-zero exit status on a simple rule when executing with --cluster

The following snakemake rule fails when I execute it with snakemake -r -p --jobs 40 --cluster "qsub"
rule raven_assembly:
    """
    Assemble reads with Raven v1.5.0
    """
    input:
        "results/01_pooled_reads/eb_flongle_reads_pooled.fastq.gz"
    output:
        assembly="results/eb_raven_assembly.fasta",
    shell:
        """
        zcat {input} | head -n 2 > {output.assembly} 1> out.txt 2> errors.txt
        """
As you can probably tell, the original rule called the software Raven, but I have been simplifying the rule to investigate the source of the job failure.
The corresponding error message:
Error in rule raven_assembly:
jobid: 1
output: results/eb_raven_assembly.fasta
shell:
zcat results/01_pooled_reads/eb_flongle_reads_pooled.fastq.gz | head -n 2 > results/eb_raven_assembly.fasta &> errors.txt
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
cluster_jobid: Your job 183238 ("snakejob.raven_assembly.1.sh") has been submitted
Error executing rule raven_assembly on cluster (jobid: 1, external: Your job 183238 ("snakejob.raven_assembly.1.sh") has been submitted, jobscript: /misc/scratch3/jmartijn/snakemake-test/.snakemake/tmp.95wb9rak/snakejob.raven_assembly.1.sh). For error details see the cluster log and the log files of the involved rule(s).
The out.txt file actually contains the expected zcat output, while errors.txt is an empty file. If I run the zcat command manually, it works fine and returns a 0 exit status.
The jobscript disappears as soon as the snakemake workflow closes, but if I check it while the job is still attempting to run, it looks like this:
#!/bin/sh
# properties = {"type": "single", "rule": "raven_assembly", "local": false, "input": ["results/01_pooled_reads/eb_flongle_reads_pooled.fastq.gz"], "output": ["results/eb_raven_assembly.fasta"], "wildcards": {}, "params": {}, "log": [], "threads": 1, "resources": {"mem_mb": 10903, "disk_mb": 10903, "tmpdir": "/tmp"}, "jobid": 1, "cluster": {}}
cd '/misc/scratch3/jmartijn/snakemake-test' && /scratch2/software/anaconda/envs/proj-ergo/bin/python3.7 -m snakemake --snakefile '/misc/scratch3/jmartijn/snakemake-test/Snakefile' 'results/eb_raven_assembly.fasta' --allowed-rules 'raven_assembly' --cores 'all' --attempt 1 --force-use-threads --wait-for-files '/misc/scratch3/jmartijn/snakemake-test/.snakemake/tmp.ka4jh42u' 'results/01_pooled_reads/eb_flongle_reads_pooled.fastq.gz' --force --keep-target-files --keep-remote --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --skip-script-cleanup --conda-frontend 'mamba' --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --printshellcmds --latency-wait 5 --scheduler 'ilp' --scheduler-solver-path '/scratch2/software/anaconda/envs/proj-ergo/bin' --default-resources 'mem_mb=max(2*input.size_mb, 1000)' 'disk_mb=max(2*input.size_mb, 1000)' 'tmpdir=system_tmpdir' --mode 2 && touch '/misc/scratch3/jmartijn/snakemake-test/.snakemake/tmp.ka4jh42u/1.jobfinished' || (touch '/misc/scratch3/jmartijn/snakemake-test/.snakemake/tmp.ka4jh42u/1.jobfailed'; exit 1)
The compute cluster is running SGE 8.1.9 with Ubuntu 18.04 LTS as OS. The Snakemake version is 7.8.0.
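(For what it's worth, the "bash strict mode" mentioned in the error message means that Snakemake runs shell commands with set -euo pipefail. A minimal, hypothetical illustration of how pipefail alone can make a pipeline like this return non-zero; the input file name is made up:)
# With pipefail, the pipeline's exit status is non-zero if any member fails.
# If head exits after two lines and zcat is then killed by SIGPIPE, the whole
# command fails even though the output file was written correctly.
# (With a very small input the pipeline may still succeed.)
bash -c 'set -o pipefail; zcat some_large_file.fastq.gz | head -n 2 > first_two.txt; echo "exit status: $?"'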

Running external scripts with wildcards in snakemake

I am trying to run a snakemake rule with an external script that contains a wildcard, as noted in the snakemake readthedocs. However, I am running into a KeyError when running snakemake.
For example, if we have the following rule:
SAMPLE = ['test']

rule all:
    input:
        expand("output/{sample}.txt", sample=SAMPLE)

rule NAME:
    input: "workflow/scripts/{sample}.R"
    output: "output/{sample}.txt",
    script: "workflow/scripts/{wildcards.sample}.R"
with the script workflow/scripts/test.R containing the following code
out.path = snakemake@output[[1]]
out = "Hello World"
writeLines(out, out.path)
I get the following error when trying to execute snakemake.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 NAME
1 all
2
[Fri May 21 12:04:55 2021]
rule NAME:
input: workflow/scripts/test.R
output: output/test.txt
jobid: 1
wildcards: sample=test
[Fri May 21 12:04:55 2021]
Error in rule NAME:
jobid: 1
output: output/test.txt
RuleException:
KeyError in line 14 of /sc/arion/projects/LOAD/Projects/sandbox/Snakefile:
'wildcards'
File "/sc/arion/work/andres12/conda/envs/py38/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2231, in run_wrapper
File "/sc/arion/projects/LOAD/Projects/sandbox/Snakefile", line 14, in __rule_NAME
File "/sc/arion/work/andres12/conda/envs/py38/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 560, in _callback
File "/sc/arion/work/andres12/conda/envs/py38/lib/python3.8/concurrent/futures/thread.py", line 57, in run
File "/sc/arion/work/andres12/conda/envs/py38/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 546, in cached_or_run
File "/sc/arion/work/andres12/conda/envs/py38/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2262, in run_wrapper
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /sc/arion/projects/LOAD/Projects/sandbox/.snakemake/log/2021-05-21T120454.713963.snakemake.log
Does anyone know why this is not working correctly?
I agree with Dmitry Kuzminov that having a script depending on a wildcard is odd. Maybe there are better solutions.
Anyway, the example below works for me on snakemake 6.0.0. Note that in your R script snakemake@output[1] should be snakemake@output[[1]], but that is not what causes the problem you report.
SAMPLE = ['test']

rule all:
    input:
        expand("output/{sample}.txt", sample=SAMPLE)

rule make_script:
    output:
        "workflow/scripts/{sample}.R",
    shell:
        r"""
        echo 'out.path = snakemake@output[[1]]' > {output}
        echo 'out = "Hello World"' >> {output}
        echo 'writeLines(out, out.path)' >> {output}
        """

rule NAME:
    input:
        "workflow/scripts/{sample}.R"
    output:
        "output/{sample}.txt",
    script:
        "workflow/scripts/{wildcards.sample}.R"

InputFunctionException: unexpected EOF while parsing

Major EDIT:
Having fixed a couple of issues thanks to comments and written a minimal reproducible example to help my helpers, I've narrowed down the issue to a difference between execution locally and using DRMAA.
Here is a minimal reproducible pipeline that does not require any external file download and can be executed out of the box after cloning the following git repository:
git clone git@github.com:kevinrue/snakemake-issue-all.git
When I run the pipeline using DRMAA I get the following error:
Building DAG of jobs...
Using shell: /bin/bash
Provided cluster nodes: 100
Singularity containers: ignored
Job counts:
count jobs
1 all
2 cat
3
InputFunctionException in line 22 of /ifs/research-groups/sims/kevin/snakemake-issue-all/workflow/Snakefile:
SyntaxError: unexpected EOF while parsing (<string>, line 1)
Wildcards:
sample=A
However, if I run the pipeline locally (--cores 1), it works:
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Singularity containers: ignored
Job counts:
count jobs
1 all
2 cat
3
[Sat Jun 13 08:49:46 2020]
rule cat:
input: data/A1, data/A2
output: results/A/cat
jobid: 1
wildcards: sample=A
[Sat Jun 13 08:49:46 2020]
Finished job 1.
1 of 3 steps (33%) done
[Sat Jun 13 08:49:46 2020]
rule cat:
input: data/B1, data/B2
output: results/B/cat
jobid: 2
wildcards: sample=B
[Sat Jun 13 08:49:46 2020]
Finished job 2.
2 of 3 steps (67%) done
[Sat Jun 13 08:49:46 2020]
localrule all:
input: results/A/cat, results/B/cat
jobid: 0
[Sat Jun 13 08:49:46 2020]
Finished job 0.
3 of 3 steps (100%) done
Complete log: /ifs/research-groups/sims/kevin/snakemake-issue-all/.snakemake/log/2020-06-13T084945.632545.snakemake.log
My DRMAA profile is the following:
jobs: 100
default-resources: 'mem_free=4G'
drmaa: "-V -notify -p -10 -l mem_free={resources.mem_free} -pe dedicated {threads} -v MKL_NUM_THREADS={threads} -v OPENBLAS_NUM_THREADS={threads} -v OMP_NUM_THREADS={threads} -R y -q all.q"
drmaa-log-dir: /ifs/scratch/kevin
use-conda: true
conda-prefix: /ifs/home/kevin/devel/snakemake/envs
printshellcmds: true
reason: true
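(For context: these settings live in a Snakemake profile, i.e. a directory containing a config.yaml with the options above. Assuming the profile directory is called drmaa and sits in a location Snakemake searches, such as ~/.config/snakemake/, it is used like this:)
# e.g. ~/.config/snakemake/drmaa/config.yaml contains the settings shown above
snakemake --profile drmaa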
Briefly, the Snakefile looks like this
# The main entry point of your workflow.
# After configuring, running snakemake -n in a clone of this repository should successfully execute a dry-run of the workflow.

report: "report/workflow.rst"

# Allow users to fix the underlying OS via singularity.
singularity: "docker://continuumio/miniconda3"

include: "rules/common.smk"
include: "rules/other.smk"

rule all:
    input:
        # The first rule should define the default target files
        # Subsequent target rules can be specified below. They should start with all_*.
        expand("results/{sample}/cat", sample=samples['sample'])

rule cat:
    input:
        file1="data/{sample}1",
        file2="data/{sample}2"
    output:
        "results/{sample}/cat"
    shell:
        "cat {input.file1} {input.file2} > {output}"
Running snakemake -np gives me what I expect:
$ snakemake -np
sample condition
sample_id
A A untreated
B B treated
Building DAG of jobs...
Job counts:
count jobs
1 all
2 cat
3
[Sat Jun 13 08:51:19 2020]
rule cat:
input: data/B1, data/B2
output: results/B/cat
jobid: 2
wildcards: sample=B
cat data/B1 data/B2 > results/B/cat
[Sat Jun 13 08:51:19 2020]
rule cat:
input: data/A1, data/A2
output: results/A/cat
jobid: 1
wildcards: sample=A
cat data/A1 data/A2 > results/A/cat
[Sat Jun 13 08:51:19 2020]
localrule all:
input: results/A/cat, results/B/cat
jobid: 0
Job counts:
count jobs
1 all
2 cat
3
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
I'm not sure how to debug it further. I'm happy to provide more information as needed.
Note: I use snakemake version 5.19.2
Thanks in advance!
EDIT
Using the --verbose option, Snakemake seems to trip on the default-resources: 'mem_free=4G' and/or drmaa: "-l mem_free={resources.mem_free}" settings that are defined in my 'drmaa' profile (see above).
$ snakemake --profile drmaa --verbose
Building DAG of jobs...
Using shell: /bin/bash
Provided cluster nodes: 100
Singularity containers: ignored
Job counts:
count jobs
1 all
2 cat
3
Resources before job selection: {'_cores': 9223372036854775807, '_nodes': 100}
Ready jobs (2):
cat
cat
Full Traceback (most recent call last):
File "/ifs/devel/kevin/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/rules.py", line 941, in apply
res, _ = self.apply_input_function(
File "/ifs/devel/kevin/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/rules.py", line 684, in apply_input_function
raise e
File "/ifs/devel/kevin/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/rules.py", line 678, in apply_input_function
value = func(Wildcards(fromdict=wildcards), **_aux_params)
File "/ifs/devel/kevin/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/resources.py", line 10, in callable
value = eval(
File "<string>", line 1
4G
^
SyntaxError: unexpected EOF while parsing
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/ifs/devel/kevin/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/__init__.py", line 626, in snakemake
success = workflow.execute(
File "/ifs/devel/kevin/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/workflow.py", line 951, in execute
success = scheduler.schedule()
File "/ifs/devel/kevin/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/scheduler.py", line 394, in schedule
run = self.job_selector(needrun)
File "/ifs/devel/kevin/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/scheduler.py", line 540, in job_selector
a = list(map(self.job_weight, jobs)) # resource usage of jobs
File "/ifs/devel/kevin/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/scheduler.py", line 613, in job_weight
res = job.resources
File "/ifs/devel/kevin/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/jobs.py", line 267, in resources
self._resources = self.rule.expand_resources(
File "/ifs/devel/kevin/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/rules.py", line 977, in expand_resources
resources[name] = apply(name, res, threads=threads)
File "/ifs/devel/kevin/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/rules.py", line 960, in apply
raise InputFunctionException(e, rule=self, wildcards=wildcards)
snakemake.exceptions.InputFunctionException: SyntaxError: unexpected EOF while parsing (<string>, line 1)
Wildcards:
sample=B
InputFunctionException in line 20 of /ifs/research-groups/sims/kevin/snakemake-issue-all/workflow/Snakefile:
SyntaxError: unexpected EOF while parsing (<string>, line 1)
Wildcards:
sample=B
unlocking
removing lock
removing lock
removed all locks
Thanks to @JohannesKöster I realised that my profile settings were wrong.
--default-resources [NAME=INT [NAME=INT ...]] indicates that only integer values are supported, while I was providing a string (i.e., mem_free=4G), naively hoping it would be supported as well.
I've updated the following settings in my profile, and successfully ran both snakemake --cores 1 and snakemake --profile drmaa.
default-resources: 'mem_free=4'
drmaa: "-V -notify -p -10 -l mem_free={resources.mem_free}G -pe dedicated {threads} -v MKL_NUM_THREADS={threads} -v OPENBLAS_NUM_THREADS={threads} -v OMP_NUM_THREADS={threads} -R y -q all.q"
Note the integer value 4 set as default resources, and how I moved the G to the drmaa: ... -l mem_free=...G setting.
Thanks a lot for the help everyone!

snakemake job fails if --drmaa-log-dir specified

I am using snakemake v. 5.7.0. The pipeline runs correctly when either launched locally or submitted to SLURM via snakemake --drmaa: jobs get submitted, everything works as expected. However, in the latter case, a number of slurm log files is produced in the current directory.
Snakemake invoked with the --drmaa-log-dir option creates the directory specified in the option, but fails to execute the rules. No log files are produced.
Here is a minimal example. First, the Snakefile used:
rule all:
    shell: "sleep 20 & echo SUCCESS!"
Below is the output of snakemake --drmaa
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1
[Fri Apr 10 21:03:50 2020]
rule all:
jobid: 0
Submitted DRMAA job 0 with external jobid 13321.
[Fri Apr 10 21:04:00 2020]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /XXXXX/snakemake_test/.snakemake/log/2020-04-10T210349.984931.snakemake.log
Here is the output of snakemake --drmaa --drmaa-log-dir foobar
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1
[Fri Apr 10 21:06:19 2020]
rule all:
jobid: 0
Submitted DRMAA job 0 with external jobid 13322.
[Fri Apr 10 21:06:29 2020]
Error in rule all:
jobid: 0
shell:
sleep 20 & echo SUCCESS!
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Error executing rule all on cluster (jobid: 0, external: 13322, jobscript: /XXXXXX/snakemake_test/.snakemake/tmp.9l7fqvgg/snakejob.all.0.sh). For error details see the cluster log and the log files of the involved rule(s).
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /XXXXX/snakemake_test/.snakemake/log/2020-04-10T210619.598354.snakemake.log
No log files are produced. The directory foobar has been created, but is empty.
What am I doing wrong?
A problem using --drmaa-log-dir with SLURM was reported before, but unfortunately there has been no known solution so far.

Command not found error in snakemake pipeline despite the package existing in the conda environment

I am getting the following error in the snakemake pipeline:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 16
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 long_read_assembly
1
[Wed Jan 15 11:35:18 2020]
rule long_read_assembly:
input: long_reads/F19FTSEUHT1027.PSU4_ISF1A_long.fastq.gz
output: canu-outputs/F19FTSEUHT1027.PSU4_ISF1A.subreads.contigs.fasta
jobid: 0
wildcards: sample=F19FTSEUHT1027.PSU4_ISF1A
/usr/bin/bash: canu: command not found
[Wed Jan 15 11:35:18 2020]
Error in rule long_read_assembly:
jobid: 0
output: canu-outputs/F19FTSEUHT1027.PSU4_ISF1A.subreads.contigs.fasta
shell:
canu -p F19FTSEUHT1027.PSU4_ISF1A -d canu-outputs genomeSize=8m -pacbio-raw long_reads/F19FTSEUHT1027.PSU4_ISF1A_long.fastq.gz
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
I assume this means that the command canu cannot be found. But the canu package does exist inside the conda environment:
(hybrid_assembly) [lamma#fe1 Assembly]$ conda list | grep canu
canu 1.9 he1b5a44_0 bioconda
The snakefile looks like this:
workdir: config["path_to_files"]
wildcard_constraints:
separator = config["separator"],
sample = '|' .join(config["samples"]),
rule all:
input:
expand("assembly-stats/{sample}_stats.txt", sample = config["samples"])
rule short_reads_QC:
input:
f"short_reads/{{sample}}_short{config['separator']}*.fq.gz"
output:
"fastQC-reports/{sample}.html"
conda:
"/home/lamma/env-export/hybrid_assembly.yaml"
shell:
"""
mkdir fastqc-reports
fastqc -o fastqc-reports {input}
"""
rule quallity_trimming:
input:
forward = f"short_reads/{{sample}}_short{config['separator']}1.fq.gz",
reverse = f"short_reads/{{sample}}_short{config['separator']}2.fq.gz",
output:
forward = "cleaned_short-reads/{sample}_short_1-clean.fastq",
reverse = "cleaned_short-reads/{sample}_short_2-clean.fastq"
conda:
"/home/lamma/env-export/hybrid_assembly.yaml"
shell:
"bbduk.sh -Xmx1g in1={input.forward} in2={input.reverse} out1={output.forward} out2={output.reverse} qtrim=rl trimq=10"
rule long_read_assembly:
input:
"long_reads/{sample}_long.fastq.gz"
output:
"canu-outputs/{sample}.subreads.contigs.fasta"
conda:
"/home/lamma/env-export/hybrid_assembly.yaml"
shell:
"canu -p {wildcards.sample} -d canu-outputs genomeSize=8m -pacbio-raw {input}"
rule short_read_alignment:
input:
short_read_fwd = "cleaned_short-reads/{sample}_short_1-clean.fastq",
short_read_rvs = "cleaned_short-reads/{sample}_short_2-clean.fastq",
reference = "canu-outputs/{sample}.subreads.contigs.fasta"
output:
"bwa-output/{sample}_short.bam"
conda:
"/home/lamma/env-export/hybrid_assembly.yaml"
shell:
"bwa mem {input.reference} {input.short_read_fwd} {input.short_read_rvs} | samtools view -S -b > {output}"
rule indexing_and_sorting:
input:
"bwa-output/{sample}_short.bam"
output:
"bwa-output/{sample}_short_sorted.bam"
conda:
"/home/lamma/env-export/hybrid_assembly.yaml"
shell:
"samtools sort {input} > {output}"
rule polishing:
input:
bam_files = "bwa-output/{sample}_short_sorted.bam",
long_assembly = "canu-outputs/{sample}.subreads.contigs.fasta"
output:
"pilon-output/{sample}-improved.fasta"
conda:
"/home/lamma/env-export/hybrid_assembly.yaml"
shell:
"pilon --genome {input.long_assembly} --frags {input.bam_files} --output {output} --outdir pilon-output"
rule assembly_stats:
input:
"pilon-output/{sample}-improved.fasta"
output:
"assembly-stats/{sample}_stats.txt"
conda:
"/home/lamma/env-export/hybrid_assembly.yaml"
shell:
"stats.sh in={input} gc=assembly-stats/{wildcards.sample}/{wildcards.sample}_gc.csv gchist=assembly-stats/{wildcards.sample}/{wildcards.sample}_gchist.csv shist=assembly-stats/{wildcards.sample}/{wildcards.sample}_shist.csv > assembly-stats/{wildcards.sample}/{wildcards.sample}_stats.txt"
The rule calling canu has the correct syntax as far as I am aware, so I am not sure what is causing this error.
Edit:
Adding the snakemake command
snakemake --latency-wait 60 --rerun-incomplete --keep-going --jobs 99 --cluster-status 'python /home/lamma/faststorage/scripts/slurm-status.py' --cluster 'sbatch -t {cluster.time} --mem={cluster.mem} --cpus-per-task={cluster.c} --error={cluster.error} --job-name={cluster.name} --output={cluster.output} --wait --parsable' --cluster-config bacterial-hybrid-assembly-config.json --configfile yaml-config-files/test_experiment3.yaml --snakefile bacterial-hybrid-assembly.smk
When running a snakemake workflow, if certain rules are to be run within a rule-specific conda environment, the command line call should be of the form
snakemake [... various options ...] --use-conda [--conda-prefix <some-directory>]
If you don't tell snakemake to use conda, all the conda: <some_path> entries in your rules are ignored, and the rules are run in whatever environment is currently activated.
The --conda-prefix <dir> is optional, but tells snakemake where to find/store the installed environments (if you don't specify this, the conda envs will be installed within the .snakemake folder, meaning that the .snakemake folder can get pretty huge and that the .snakemake folders for multiple projects may contain a lot of duplicated conda stuff).
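Applied to the command above, that would look something like this (the --conda-prefix path is only an example location; any persistent directory works):
snakemake --use-conda --conda-prefix /home/lamma/snakemake-conda-envs \
    --latency-wait 60 --rerun-incomplete --keep-going --jobs 99 \
    --cluster-status 'python /home/lamma/faststorage/scripts/slurm-status.py' \
    --cluster 'sbatch -t {cluster.time} --mem={cluster.mem} --cpus-per-task={cluster.c} --error={cluster.error} --job-name={cluster.name} --output={cluster.output} --wait --parsable' \
    --cluster-config bacterial-hybrid-assembly-config.json \
    --configfile yaml-config-files/test_experiment3.yaml \
    --snakefile bacterial-hybrid-assembly.smk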