Snakemake & singularity, making different mounts available to each rule

I have been using singularity with some of my workflows and it works great so far. I have a question about binding directories. I can pass singularity arguments when running the snakemake workflow like:
snakemake --use-singularity --singularity-args "-B /path/outside/container/:/path/inside/container/"
Is there a way to have binds be rule specific?
Thanks.

Related

Containerization of Conda based workflows

I am using the integrated conda package management in snakemake to supply software environments to my rules. Now, I would like to try the same with a container spawned from a docker image.
I automatically generate a Dockerfile from my workflow using (see documentation)
snakemake --containerize > Dockerfile
In the following, I am trying to use this container image in the workflow via the containerized: directive and snakemake --use-singularity. However
containerized: "docker://Dockerfile"
fails when pulling the singularity image. I am not sure about this syntax. Has anybody used this before?

Snakemake shadow rule when program writes to /tmp

I am using Snakemake to run the defense-finder program. This program creates and overwrites generic temporary files in /tmp/defense-finder, i.e. the file names do not contain unique identifiers. When running my rule across separate cores on different input files, Snakemake crashes due to clashes in /tmp/defense-finder.
It appears that Shadow rules can help when different jobs write to the same files within the working directory. Is there a way to use Shadow rules when a program writes to the /tmp directory?

Following @Marmaduke's comment that the file paths are hard-coded, a temporary workaround is to force snakemake to run the defense-finder jobs one at a time while still allowing other jobs to run in parallel. You can do this with the resources directive:
rule defense_finder:
    resources:
        n_defense=1,
    input: ...
    output: ...
    shell: ...
then run with:
snakemake --resources n_defense=1 -j 10 ...

Snakemake : subworkflow not playing well with the main DAG

I have a main Snakefile and several subworkflows running in independent subdirectories (with paths relative to their own directories). I've noticed that if I modify one of the input of a subworkflow, it will rerun correctly but all the following rules that come afterwards are not rerun.
If I understand correctly what is going on, there's a different DAG for the main Snakefile and for each subworkflow. The main DAG is not aware of any modification in a subworkflow and therefore won't trigger a rerun since the output of the subworkflow hasn't been modified yet.
I'd like all the rules that depend on the output of a subworkflow to be rerun if there's a modification in that subworkflow. Isn't that what the default behaviour should be?
I've also tried the other modularisation techniques. Using includes works but is super annoying because I have to modify all the paths to be relative to the main directory (and therefore I can't run snakemake independently in one subdirectory anymore). I've also tried using the new module system coming with snakemake v.6 that is supposed to be replacing subworkflows. Maybe I don't use it correctly, but it doesn't seem to work for my use case. If I import a rule from a subdirectory it complains that there are missing inputs. It doesn't find the scripts because they are in the subdirectory and not in the main directory. So in that sense it works more like an include than a subworkflow.
Do you have any idea how to solve my issue?
Here's a small working example with the module implementation:
MainDirectory
| - Snakefile
      rule all:
          input: "Subdirectory/file.txt"
      module other_workflow:
          snakefile: "Subdirectory/Snakefile"
      use rule * from other_workflow as other_*
| - Subdirectory
| | - Snakefile
        rule rule_a:
            input:
                script = 'code.py'
            output: 'file.txt'
            shell: 'python {input.script}'
| | - code.py
        with open('file.txt', 'w') as f:
            print('This is a test.', file=f)
This doesn't work as the snakefile in the main directory uses all the rules in the same workdir, whereas I would like it to be running the imported rules in their own workdir. I can make it work by modifying all the relative paths in the subdirectory but that's not what I want. I want to be able to run it without modifications.

The issue is that you mix input files with code here. If the script code.py is defined as an input file, Snakemake expects it to be in the workdir. If you use the script directive or the Jupyter notebook directive instead, the path will automatically be relative to the Snakefile. If that is not an option for whatever reason, you can instead build the path relative to the current Snakefile via Path(workflow.snakefile).parent / "code.py".
Note, there is really no reason to register code as input files. If you intend to get a rerun upon changes in the code, it is better to rely on snakemake --list-code-changes. The reason Snakemake does not automatically trigger reruns upon code changes is that they can be just cosmetic (e.g. formatting). Hence, it is up to the dev to trigger the rerun, e.g. via --list-code-changes, or manually.
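The path construction mentioned above can be sketched in plain Python so it can be tried outside Snakemake. The helper name path_next_to_snakefile and the example path are hypothetical; inside a Snakefile you would pass workflow.snakefile as the first argument instead of the stand-in string.

```python
from pathlib import Path


def path_next_to_snakefile(snakefile, name):
    """Return the path of `name` located in the same directory as `snakefile`."""
    return Path(snakefile).parent / name


# Stand-in for workflow.snakefile, which Snakemake provides inside a Snakefile.
# The path itself is a made-up example.
example_snakefile = "/project/Subdirectory/Snakefile"
script = path_next_to_snakefile(example_snakefile, "code.py")
print(script)
```

Because the result depends only on where the Snakefile lives, it stays the same whether the rule is run from the subdirectory or imported into the main workflow.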

snakemake - configure rules to run with local container (Singularity, Docker)

For snakemake v5.27+
Is there a way to run snakemake with the container directive that points to a local image? E.g. if I store the Docker containers on Dockerhub, and I also have a copy locally, when running snakemake, I don't want the rule to pull a singularity image copy from DockerHub if there already exists the exact copy locally. Makes for faster runs.

Sure, just pass a relative or absolute file path to the directive.

Even though the snakemake manual doesn't explicitly state it, it is possible to use a local singularity image using the containerized directive.
So instead of the example in the link above:
containerized: "docker://username/myworkflow:1.0.0"
You can point to the singularity sif file path (which contains the image)
containerized: "/path/to/myimage.sif"
Make sure you use --use-singularity when running snakemake.
How to build the singularity (sif) image:
You can build the sif image in various ways as described here, but as for your question, you can build it from a local docker image.
I.e. you can list your local images by docker images and pick one to build the local sif file like so:
SINGULARITY_NOHTTPS=1 singularity build /path/to/myimage.sif docker-daemon://mydockerimage:latest
Note, it doesn't seem to work straight from local docker container, i.e. I would have expected this to work:
containerized: "docker-daemon://scpipe_docker:latest"
... but it didn't as of snakemake version 6.10.0
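One way to get the behaviour asked for in the question is to choose the image location in Python before handing it to the directive. This is only a sketch: pick_image is a hypothetical helper, not part of Snakemake, and the paths and image name are placeholders taken from the examples above.

```python
import os


def pick_image(local_sif, remote_uri):
    """Prefer a local .sif file when it exists, otherwise fall back to the remote URI."""
    if os.path.isfile(local_sif):
        return os.path.abspath(local_sif)
    return remote_uri


# Placeholder values; adjust to your own image.
image = pick_image("/path/to/myimage.sif", "docker://username/myworkflow:1.0.0")
print(image)
```

Since a Snakefile is Python, such a helper could be defined at the top of the file and its result passed to the directive. Whether that fits your setup depends on your snakemake version, so treat it as an assumption to verify.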

How to get the working directory path?

I'm wondering how to manage paths in my Snakefiles. Say I have this configuration:
current_dir
current_dir/snakefiles
current_dir/configfiles
and I execute my workflows this way:
current_dir$ snakemake -s snakefiles/my_snakefile --configfile configfiles/my_config.yml
I know I can get the path to my Snakefile using the global variable workflow.snakefile, but I would like to get also:
the path to my configfile
the path where I'm executing my snakefile, e.g. current_dir
How to achieve this? Are there other global variables in Snakemake, that I'm not aware of?
Thank you

The working directory is set via Python. You can get it with os.getcwd(). Also please note that there is a canonical way to organize Snakemake workflows: http://snakemake.readthedocs.io/en/latest/project_info/faq.html#what-is-the-recommended-way-to-distribute-a-snakemake-workflow.
While you can of course use something else, following this scheme helps others to understand your workflow. There might of course be cases where this does not fit.
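As a small illustration of the os.getcwd() suggestion, the following sketch reads the working directory and turns a relative config path into an absolute one. The relative path is taken from the command line in the question; how Snakemake itself stores the config file path is not covered here.

```python
import os

# Directory from which the process (e.g. snakemake) was started.
current_dir = os.getcwd()

# Relative path as passed to --configfile in the question. It is resolved
# against the working directory, so this assumes the working directory
# has not been changed earlier in the workflow.
config_path = os.path.abspath(os.path.join("configfiles", "my_config.yml"))

print(current_dir)
print(config_path)
```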
