Containerization of Conda-based workflows - snakemake

I am using the integrated Conda package management in Snakemake to supply software environments to my rules. Now I would like to try the same with a container spawned from a Docker image.
Following the documentation, I automatically generate a Dockerfile from my workflow using:
snakemake --containerize > Dockerfile
Next, I try to use this container image in the workflow via the containerized: directive and snakemake --use-singularity. However,
containerized: "docker://Dockerfile"
fails when pulling the Singularity image. I am not sure about this syntax; has anybody used this before?
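The docker:// prefix tells Snakemake and Singularity to pull an image reference from a container registry; it does not accept the path to a Dockerfile. The Dockerfile first has to be built into an image (for example with docker build) and pushed to a registry under a proper name. The sketch below is a hypothetical helper, parse_docker_uri, that implements only a simplified subset of Docker's naming rules (no registry ports, no digests). It shows why "docker://Dockerfile" cannot resolve: image names must be lowercase.

```python
import re

# Simplified subset of Docker's reference grammar: each "/"-separated path
# component is lowercase alphanumerics joined by ".", "_", "__" or dashes.
# Registry ports ("host:5000/...") and digests ("@sha256:...") are not handled.
_COMPONENT = re.compile(r"^[a-z0-9]+(?:(?:[._]|__|-+)[a-z0-9]+)*$")
_TAG = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$")


def parse_docker_uri(uri: str) -> tuple[str, str]:
    """Split 'docker://name[:tag]' into (name, tag), raising ValueError if invalid."""
    prefix = "docker://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a docker:// URI: {uri!r}")
    ref = uri[len(prefix):]
    # A tag can only appear after the last "/" component.
    if ":" in ref.rsplit("/", 1)[-1]:
        name, tag = ref.rsplit(":", 1)
    else:
        name, tag = ref, "latest"
    if not name or not all(_COMPONENT.match(part) for part in name.split("/")):
        raise ValueError(f"invalid image name {name!r} (names must be lowercase)")
    if not _TAG.match(tag):
        raise ValueError(f"invalid tag {tag!r}")
    return name, tag
```

For example, parse_docker_uri("docker://username/myworkflow:1.0.0") returns ("username/myworkflow", "1.0.0"), while parse_docker_uri("docker://Dockerfile") raises ValueError.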

Related

snakemake - configure rules to run with local container (Singularity, Docker)

For Snakemake v5.27+
Is there a way to run Snakemake with the container directive pointing to a local image? For example, if I store my Docker images on Docker Hub and also have a copy locally, I don't want the rule to pull a Singularity copy of the image from Docker Hub when the exact same image already exists locally. That makes for faster runs.
Sure, just pass a relative or absolute file path to the directive.
Even though the Snakemake manual doesn't explicitly state it, it is possible to use a local Singularity image with the containerized directive.
So instead of the example in the link above:
containerized: "docker://username/myworkflow:1.0.0"
you can point to the path of a Singularity .sif file (which contains the image):
containerized: "/path/to/myimage.sif"
Make sure you use --use-singularity when running snakemake.
How to build the Singularity (.sif) image:
You can build the .sif image in various ways, as described here, but for your question, you can build it from a local Docker image.
That is, list your local images with docker images and pick one to build the local .sif file, like so:
SINGULARITY_NOHTTPS=1 singularity build /path/to/myimage.sif docker-daemon://mydockerimage:latest
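If the goal is to avoid rebuilding or re-pulling, the conversion can be skipped when the .sif file already exists. The sketch below wraps the command above in Python; the function names sif_build_command and ensure_sif are hypothetical, and it assumes the singularity binary is on PATH and can reach the local Docker daemon.

```python
import os
import subprocess
from pathlib import Path


def sif_build_command(docker_image: str, sif_path: str) -> list[str]:
    """argv for converting an image held by the local Docker daemon into a .sif file."""
    return ["singularity", "build", sif_path, f"docker-daemon://{docker_image}"]


def ensure_sif(docker_image: str, sif_path: str) -> Path:
    """Build sif_path from the local Docker image only if the file does not exist yet."""
    path = Path(sif_path)
    if not path.exists():
        # Same environment variable as in the shell command above.
        env = dict(os.environ, SINGULARITY_NOHTTPS="1")
        subprocess.run(sif_build_command(docker_image, sif_path), env=env, check=True)
    return path
```

Calling ensure_sif("mydockerimage:latest", "/path/to/myimage.sif") once before snakemake --use-singularity leaves a file that the containerized directive can point to.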
Note that it doesn't seem to work directly from a local Docker image; I would have expected this to work:
containerized: "docker-daemon://scpipe_docker:latest"
...but it didn't, as of Snakemake version 6.10.0.

How to export a container in Singularity

I would like to move an already built container from one machine to another. What is the proper way to migrate the container from one environment to another?
I found the image.export command here, but it is for an older version of the software. I am using version 3.5.2.
The container I wish to export is a --sandbox container. Is something like that possible?
Singularity allows you to easily convert between a sandbox and a production build.
For example:
singularity build lolcow.sif docker://godlovedc/lolcow # pulls and builds a container
singularity build --sandbox lolcow_sandbox/ lolcow.sif # converts from container to a writable sandbox
singularity build lolcow2 lolcow_sandbox/ # converts from sandbox to container
Once you have a production SIF or SIMG, you can easily transfer the file and convert as necessary.
singularity build generates a file that you can copy between computers just like any other file. The only thing it needs is the Singularity binary installed on the new host.
The difference when using --sandbox is that you get a modifiable directory instead of a single file. It can still be run elsewhere, but you may want to tar it up first so you are only moving a single file. Then you can untar it and run it as normal on the new host.
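The plain tar command does this job; the sketch below shows the same pack/unpack round trip with Python's tarfile module, for cases where it is scripted. The function names are hypothetical. Sandboxes built with sudo contain root-owned files, so pack and unpack with the same privileges that created the sandbox if ownership and permissions must survive.

```python
import tarfile
from pathlib import Path


def pack_sandbox(sandbox_dir: str, archive: str) -> None:
    """Write the whole sandbox directory into one gzipped tar file."""
    src = Path(sandbox_dir)
    with tarfile.open(archive, "w:gz") as tar:
        # arcname makes member paths start at the sandbox directory's own name;
        # symlinks are stored as links, not followed.
        tar.add(src, arcname=src.name)


def unpack_sandbox(archive: str, dest_dir: str) -> Path:
    """Extract the archive into dest_dir and return the restored sandbox path."""
    with tarfile.open(archive, "r:gz") as tar:
        top = tar.getnames()[0].split("/")[0]
        # The archive was produced by pack_sandbox, so it is trusted: the
        # "fully_trusted" filter leaves links and permission bits as stored.
        # Pythons without extraction filters already behave this way.
        kwargs = {"filter": "fully_trusted"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, **kwargs)
    return Path(dest_dir) / top
```

On the new host, the returned directory can be used like any other sandbox, e.g. with singularity shell.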

How to include NCCL 2 from NVIDIA in Dockerfiles?

For NVIDIA Collective Communications Library (NCCL) version 2, NVIDIA asks the user to first register as a developer before getting access to the installation files.
This makes it challenging to install NCCL in containers. For personal use, we can copy the installation file into the container with the Dockerfile ADD instruction. However, this approach does not seem right for a Dockerfile meant to be used by others (or made public).
Any idea?
Thanks!
I had a similar problem with Oracle installation files. The only way I could think of was to ask users to download the files manually, and then use the Dockerfile ONBUILD instruction together with ADD in the Dockerfile. This means every user essentially has to build their own image, but at least the image can be made public without infringing NVIDIA's policies.
Something like this:
FROM example/test:latest
....
ONBUILD ADD /example/nvidia /example/nvidia
....
CMD ["./foo.sh"]
Then each user writes their own Dockerfile that builds on your public image, like so:
FROM myrepo/myimage:nvidia
Provided they have placed the NCCL installation files in the right folder of their build context, they can just run docker build to legally get their own image with NVIDIA's libraries.
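Because ONBUILD triggers run during the downstream user's build, the ADD source /example/nvidia is looked up inside that user's build context, and docker build fails if the files are not there. A small pre-flight check can give users a clearer message. This is a hypothetical helper; REQUIRED_PATHS is an assumption that must match whatever path the ONBUILD ADD line actually uses.

```python
from pathlib import Path

# Hypothetical: paths the parent image's ONBUILD ADD expects, relative to
# the root of the downstream build context.
REQUIRED_PATHS = ("example/nvidia",)


def missing_context_files(context_dir: str, required=REQUIRED_PATHS) -> list[str]:
    """Return the required paths that are absent from the build context."""
    root = Path(context_dir)
    return [rel for rel in required if not (root / rel).exists()]
```

If missing_context_files(".") returns a non-empty list, the user still has to download NCCL from NVIDIA before running docker build.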

What is Chef doing when I use the `s3cmd` recipe?

I am using Chef and this s3cmd cookbook.
As this tutorial says, I use knife to download and extract it. I actually got s3cmd working by following the tutorial instructions, but I don't understand where exactly s3cmd gets installed.
Can anyone explain to me what Chef is doing when using the s3cmd recipe?
When you run chef-solo locally (or chef-client in a server setup), you are telling Chef to compile a set of resources defined in your cookbooks' recipes, which are then applied to the node you are running the command on.
A resource defines the tasks and settings for something you want to setup.
The run_list set for the node defines what cookbook recipes will be compiled.
s3cmd
In your case, your run_list was probably recipe[s3cmd].
This instructs Chef to look in the s3cmd cookbook and as you didn't give a specific recipe, it loads s3cmd/recipes/default.rb.
If you gave a specific recipe, like recipe[s3cmd::other], Chef would load the file s3cmd/recipes/other.rb.
Chef compiles all the resources defined in the recipe(s) into a list and then runs through the list, applying changes to your system as required.
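The mapping from a run_list entry to a recipe file can be sketched as follows. This is a hypothetical helper, not part of Chef, and it handles only the plain recipe[cookbook] and recipe[cookbook::recipe] forms (no roles or version constraints).

```python
import re

# Matches "recipe[cookbook]" and "recipe[cookbook::recipe]" only.
_RECIPE = re.compile(r"^recipe\[([^\]:]+)(?:::([^\]]+))?\]$")


def recipe_file(run_list_item: str) -> str:
    """Map a run_list entry to the recipe file Chef loads from the cookbook."""
    m = _RECIPE.match(run_list_item)
    if m is None:
        raise ValueError(f"unsupported run_list entry: {run_list_item!r}")
    # No "::recipe" part means the cookbook's default recipe.
    cookbook, recipe = m.group(1), m.group(2) or "default"
    return f"{cookbook}/recipes/{recipe}.rb"
```

So recipe_file("recipe[s3cmd]") gives "s3cmd/recipes/default.rb" and recipe_file("recipe[s3cmd::other]") gives "s3cmd/recipes/other.rb".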
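The two phases can be illustrated with a toy model in Python. It is not Chef's API and all names are hypothetical: recipes only declare resources during the compile phase, and the converge phase applies just the resources that are out of date, which is why a second run changes nothing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Resource:
    name: str
    is_current: Callable[[], bool]  # is the system already in the desired state?
    apply: Callable[[], None]       # change the system into the desired state


def compile_recipes(recipes: list) -> list[Resource]:
    """Compile phase: each recipe only appends resources; nothing is changed yet."""
    collection: list[Resource] = []
    for recipe in recipes:
        recipe(collection)
    return collection


def converge(collection: list[Resource]) -> list[str]:
    """Converge phase: walk the list in order and apply only out-of-date resources."""
    changed = []
    for resource in collection:
        if not resource.is_current():
            resource.apply()
            changed.append(resource.name)
    return changed
```

Real Chef resources also carry actions, notifications and guards; the model only keeps the ordering and the "apply only if needed" behaviour.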
What s3cmd::default does
First it installs some packages (via your distribution's package manager):
python, python-setuptools, python-distutils-extra, python-dateutil, s3cmd
Note: this is entirely different from what the README says about how s3cmd is installed! Always check!
Next, it figures out where the config file should go:
if node['s3cmd']['config_dir']
home_folder = node['s3cmd']['config_dir']
else
home_folder = node['etc']['passwd'][node['s3cmd']['user']]['dir']
end
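For readers less familiar with Ruby, the same fallback logic can be written in Python. This is only a comparison sketch with a hypothetical function name; Ohai's node['etc']['passwd'] data roughly corresponds to the system passwd database that Python's pwd module reads on Unix.

```python
import pwd
from typing import Optional


def s3cmd_config_dir(config_dir: Optional[str], user: str) -> str:
    """Mirror the recipe: an explicit config_dir wins, otherwise the user's home."""
    if config_dir:
        return config_dir
    # Roughly node['etc']['passwd'][user]['dir']: the user's home directory
    # from the passwd database (KeyError if the user does not exist).
    return pwd.getpwnam(user).pw_dir
```

Setting node['s3cmd']['config_dir'] therefore overrides the home directory lookup entirely.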
Finally, it creates the .s3cfg config file from a template in the cookbook:
template "#{home_folder}/.s3cfg" do...
What else
Cookbooks can also define their own resources and providers to create more reusable cookbooks. The resource names and attributes are defined in cookbook/resources/blah.rb. The code for each resource action is in cookbook/providers/blah.rb.
Code can be packaged in cookbook/libraries/blah.rb and included in other Ruby files.
Run chef-solo with the -l debug option and step through the output. Try to identify the run-list compilation phase, and then the point where everything is applied.

Singularity: What is the difference between an image, a container, and an instance?

I am starting to learn Singularity for reproducible analysis of scientific pipelines. A colleague explained that an image was used to instantiate a container. However, in reading through the documentation and tutorials, the term instance is also used and the usage of image and container seems somewhat interchangeable. So, I am not sure I precisely understand the difference between an image, container, and instance. I do get that a recipe is a text file for building one of these (I think an image?).
For example, on this page it explains:
Now we can build the definition file into an image! Simply run build and the image will be ready to go:
$ sudo singularity build url-to-pdf-api.img Singularity
Okay, so this uses the recipe Singularity to build an image, with the intuitive extension of .img. However, the help description of the build command states:
$ singularity help build
USAGE: singularity [...] build [build options...]
The build command compiles a container per a recipe (definition file) or based on a URI, location, or archive.
So this seems to indicate we are building a container?
Then, there are image and instance sub-commands.
Are all these terms used interchangeably? It seems sometimes they are and sometimes there is a difference between them.
A container is the general concept of creating a sandboxed run environment and can be used as a general term to refer to either Docker or Singularity images. However it is sometimes used to also refer to the specific files being generated. This is probably not ideal, as it can clearly cause confusion to new users.
image is generally used to refer to the actual files created by singularity build ...
instance refers to a specific way of running Singularity images. Normally, if you singularity run some_image.sif or singularity exec some_image.sif some_command, you can't easily access its environment while it's running. However, if you instead run singularity instance start some_image.sif some_instance1, it creates a persistent service that you can access like a Docker container. The Singularity service/instance documentation has some good examples of how instances are used differently than the basic exec and run commands.
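To make the difference concrete, here is a sketch that drives an instance from Python, assuming Singularity 3.x syntax and a singularity binary on PATH; the wrapper function names are hypothetical. Unlike repeated singularity exec some_image.sif calls, each of which starts a fresh container, the exec calls here all run inside the same running instance.

```python
import subprocess


def instance_start(image: str, name: str) -> list[str]:
    return ["singularity", "instance", "start", image, name]


def instance_exec(name: str, *command: str) -> list[str]:
    # A running instance is addressed with the instance:// URI scheme.
    return ["singularity", "exec", f"instance://{name}", *command]


def instance_stop(name: str) -> list[str]:
    return ["singularity", "instance", "stop", name]


def run_in_instance(image: str, name: str, commands: list[list[str]]) -> None:
    """Start a persistent instance, run several commands in it, then stop it."""
    subprocess.run(instance_start(image, name), check=True)
    try:
        for cmd in commands:
            # Every exec joins the same running instance instead of
            # starting a new container from the image.
            subprocess.run(instance_exec(name, *cmd), check=True)
    finally:
        subprocess.run(instance_stop(name), check=True)
```

For example, run_in_instance("some_image.sif", "some_instance1", [["hostname"], ["ls", "/"]]) starts the instance once, runs both commands in it, and always stops it afterwards.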