Singularity: What is the difference between an image, a container, and an instance? - singularity-container

I am starting to learn Singularity for reproducible analysis of scientific pipelines. A colleague explained that an image was used to instantiate a container. However, in reading through the documentation and tutorials, the term instance is also used and the usage of image and container seems somewhat interchangeable. So, I am not sure I precisely understand the difference between an image, container, and instance. I do get that a recipe is a text file for building one of these (I think an image?).
For example, on this page it explains:
Now we can build the definition file into an image! Simply run build and the image will be ready to go:
$ sudo singularity build url-to-pdf-api.img Singularity
Okay, so this uses the recipe Singularity to build an image, with the intuitive extension of .img. However, the help description of the build command states:
$ singularity help build
USAGE: singularity [...] build [build options...]
The build command compiles a container per a recipe (definition file) or based on a URI, location, or archive.
So this seems to indicate we are building a container?
Then, there are image and instance sub-commands.
Are all these terms used interchangeably? It seems sometimes they are and sometimes there is a difference between them.

A container is the general concept of creating a sandboxed run environment and can be used as a general term to refer to either Docker or Singularity images. However it is sometimes used to also refer to the specific files being generated. This is probably not ideal, as it can clearly cause confusion to new users.
image is generally used to refer to the actual files created by singularity build ...
instance refers to a specific way of running Singularity images. Normally, if you singularity run some_image.sif or singularity exec some_image.sif some_command, you can't easily access its environment while it's running. However, if you instead run singularity instance start some_image.sif some_instance1, it creates a persistent service that you can access like a Docker container. The singularity service/instance documentation has some good examples of how instances are used differently than the basic exec and run commands.
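As a rough sketch of the instance workflow described above: the helper name with_instance is made up for illustration, the image and instance names are the placeholders from this answer, and the subcommands assume a Singularity 3.x CLI.

```shell
# Sketch: run one command inside a named, persistent instance.
# "$1" is the image, "$2" the instance name, and the remaining
# arguments are the command to run inside the instance.
with_instance() {
    image=$1
    name=$2
    shift 2

    # Start the instance in the background.
    singularity instance start "$image" "$name" || return 1

    # Attach to the running instance through the instance:// URI.
    singularity exec "instance://$name" "$@"
    status=$?

    # Stop the instance afterwards so it does not linger.
    singularity instance stop "$name"
    return "$status"
}

# Example (needs a real image):
#   with_instance some_image.sif some_instance1 hostname
```

While the instance is up, singularity instance list shows it, and you can run exec against the same instance:// URI as many times as you like before stopping it.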


Why does `singularity run/exec` automatically bind some specific directories? What is the use case?

I'm familiar with containers, but new to Singularity and I found myself fighting a broken Python installation in a Singularity container tonight. It turns out that this was because $HOME was being mounted into my container without my knowledge.
I guess that I've developed a liking for the idiom "Explicit is better than implicit" from Python. To me, automatically mounting specific directories is unexpected behavior.
Three questions:
Why does Singularity default to mounting $HOME, /tmp, /proc, etc?
So that I can become more comfortable with Singularity, what are some use cases for this behavior?
I see the --no-home flag, but is there a flag to disable all of the default mounts without needing to change the default Singularity configuration?
It's a mixture of design, convenience and technical necessity.
The biggest reason is that, unless you use certain params that say otherwise, Singularity images are read-only filesystems. You need somewhere to write output and any temporary files that get created along the way. Maybe you know to mount in your output dir, but there are all sorts of files that get created / modified / deleted in the background that we don't ever think about. Implicit automounts give reasonable defaults that work in most situations.
Simplistic example: you're doing a big sort and filter operation on some data, but you're printing the results to the console, so you don't bother to mount in anything but the raw data. But even after some manipulation and filtering, the size of the data exceeds available memory, so sort falls back to writing small temporary files in /tmp, which are deleted when the process finishes. And then it crashes because you can't write to /tmp.
You can require a user to manually specify what to mount to /tmp on run, or you can use a sane default like /tmp and also allow that to be overridden by the user (SINGULARITY_TMPDIR, -B $PWD/fake_tmp:/tmp, --contain/--containall). These are all also configurable, so the admins can set sane defaults specific to the running environment.
There are also technical reasons for some of the mounts. e.g., /etc/passwd and /etc/group are needed to match permissions on the host OS. The docs on bind paths and mounts are actually pretty good and have more specifics on the whats and whys, and even the answer to your third question: --no-mount. The --contain/--containall flags will probably also be of interest. If you really want to deep dive, there are also the admin docs and the source code on github.
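A minimal sketch of the override options mentioned above. The function names and the IMAGE default are made up for illustration, and the --no-mount flag only exists in newer 3.x releases as far as I know, so check singularity exec --help on your install before relying on it.

```shell
# Sketch only: IMAGE is a hypothetical image path; flags assume Singularity 3.x.
IMAGE=${IMAGE:-some_image.sif}

# --contain: do not share the host's $HOME and /tmp; the container gets
# minimal, empty stand-ins instead.
run_contained() {
    singularity exec --contain "$IMAGE" "$@"
}

# --no-mount: switch off selected default mounts by name (here home and tmp).
run_without_home_and_tmp() {
    singularity exec --no-mount home,tmp "$IMAGE" "$@"
}

# Bind a scratch directory of your choice over /tmp, as in the
# -B $PWD/fake_tmp:/tmp override mentioned above.
run_with_scratch_tmp() {
    scratch=$1
    shift
    mkdir -p "$scratch"
    singularity exec -B "$scratch:/tmp" "$IMAGE" "$@"
}

# Examples (need a real image):
#   run_contained ls -la /tmp
#   run_with_scratch_tmp "$PWD/fake_tmp" sort big_input.txt
```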
A simple but real singularity use case, with explanation:
singularity exec \
--cleanenv \
-H $PWD:/home \
-B /some/local/data:/data \
multiqc.sif \
multiqc -i $SAMPLE_ID /data
--cleanenv / -e: You've already experienced the fun of unexpected mounts, there's also unexpected environment variables! --cleanenv/-e tells Singularity to not persist the host execution environment in the container. You can still use, e.g., SINGULARITYENV_SOMEVAR=23 to have SOMEVAR=23 inside the container though, as that is explicitly set.
-H $PWD:/home: This mounts the current directory into the container to /home and sets HOME=/home. While using --contain/--containall and explicit mounts is probably a better solution, I am lazy and this ensures several things:
the current directory is mounted into the container. The implicit mounting of the working directory is allowed to fail, and will do so quietly, if the base directory does not exist in the image. e.g., if you're running from /cluster/my-lab/some-project and there is no /cluster inside your image, it will not be mounted in. This is not an issue if using explicit binds directly (-B /cluster/my-lab/some-project) or if an explicit bind shares a path (-B /cluster/data/experiment-123) with the current directory.
the command is executed from the context of the current directory. If $PWD fails to be mounted as described above, singularity uses $HOME as the working directory instead. If both $PWD and $HOME failed to mount, / is used. This can cause problems if you're using relative paths and you aren't where you expected to be. Since it is specific to the path on the host, it can be really annoying when trying to duplicate a problem locally.
the base path inside the container is always the same regardless of the host OS file structure. Consistency is good.
The rest is just the command that's being run, which in this case summarizes the logs from other programs that work with genetic data.
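For comparison, here is a sketch of the less lazy variant with --containall and explicit binds. The function name run_multiqc and the /work mount point are arbitrary choices for this example, and it assumes your installation can create bind points that don't exist in the image (usually the default in 3.x setups).

```shell
# Sketch: the same multiqc call with explicit binds instead of -H.
# "$1" is the sample ID, "$2" the host directory holding the data.
run_multiqc() {
    sample_id=$1
    data_dir=$2
    # --containall: no host $HOME, /tmp, or environment leaks in.
    # --pwd: start in /work, so relative paths behave the same on any host.
    singularity exec \
        --containall \
        -B "$PWD:/work" \
        -B "$data_dir:/data" \
        --pwd /work \
        multiqc.sif \
        multiqc -i "$sample_id" /data
}

# Example (needs multiqc.sif in the current directory):
#   run_multiqc "$SAMPLE_ID" /some/local/data
```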

snakemake - configure rules to run with local container (Singularity, Docker)

For snakemake v5.27+
Is there a way to run snakemake with the container directive that points to a local image? E.g. if I store the Docker containers on Dockerhub, and I also have a copy locally, when running snakemake, I don't want the rule to pull a singularity image copy from DockerHub if there already exists the exact copy locally. Makes for faster runs.
Sure, just pass a relative or absolute file path to the directive.
Even though the snakemake manual doesn't explicitly state it, it is possible to use a local singularity image using the containerized directive.
So instead of the example in the link above:
containerized: "docker://username/myworkflow:1.0.0"
You can point to the singularity sif file path (which contains the image)
containerized: "/path/to/myimage.sif"
Make sure you use --use-singularity when running snakemake.
How to build the singularity (sif) image:
You can build the sif image in various ways as described here, but as for your question, you can build it from a local docker image.
I.e. you can list your local images by docker images and pick one to build the local sif file like so:
SINGULARITY_NOHTTPS=1 singularity build /path/to/myimage.sif docker-daemon://mydockerimage:latest
Note, it doesn't seem to work straight from local docker container, i.e. I would have expected this to work:
containerized: "docker-daemon://scpipe_docker:latest"
... but it didn't as of snakemake version 6.10.0
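To avoid rebuilding on every run, something like the following sketch might help. The helper name ensure_sif is made up, and the paths and image names are the placeholders used above.

```shell
# Sketch: build the SIF only if it is not already there, then reuse it.
# "$1" is the target .sif path, "$2" the source URI to build from.
ensure_sif() {
    sif=$1
    source_uri=$2
    if [ -f "$sif" ]; then
        echo "reusing $sif"
    else
        singularity build "$sif" "$source_uri"
    fi
}

# Example (needs singularity, a local docker image, and snakemake):
#   SINGULARITY_NOHTTPS=1 ensure_sif /path/to/myimage.sif docker-daemon://mydockerimage:latest
#   snakemake --use-singularity --cores 1
```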

Singularity and interior dynamic libraries

I am currently working on getting a bigger (C++) project inside a Singularity container. So far, everything works well, until I try to execute the container image, at which point it can't find a dynamic library file that I previously built inside the container:
./MyProject.img
/<some path>/MyExecutable: error while loading shared libraries: libmongocxx.so._noabi: cannot open shared object file: No such file or directory
My first thought was that maybe the process of building this dependency inside the container did somehow not succeed, therefore I added ls /usr/local/lib/ at the end of the %post section of my recipe to check on that, but everything there is fine:
+ ls /usr/local/lib/
[...]
libmongocxx.so
libmongocxx.so.3.6.0
libmongocxx.so._noabi
[...]
So my next thought was that maybe the basic library folder is for some reason not a part of the environment variables of my container, so I extended the %post section with
export PATH=$PATH:/usr/local/lib/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/
still to no avail.
Is there some property of Singularity containers I am missing here? Do I need to somehow extract the dynamic library file to outside of the container? Or did I make some stupid mistake I just can't see here?
(I tagged the question only with singularity-container for now as I don't think this is anything specific to C++ here, but if somebody thinks otherwise feel free to add. My container uses Bootstrap: docker From: ubuntu:18.04, should that be relevant.)
Edit: I also explicitly gave the dynamic libraries execution rights, just in case, and printed their rights:
lrwxrwxrwx 1 root root 20 Sep 10 10:51 libmongocxx.so._noabi -> libmongocxx.so.3.6.0
Didn't work either.
My first guess is that your local environment is overwriting the variables in the image. You can use singularity run --cleanenv MyProject.img to prevent your current environment from persisting into the container. If there are variables you do want to pass in there, you can export SINGULARITYENV_SOMEVAR=foo to have SOMEVAR=foo set in the container environment.
If that doesn't do it, modify the %runscript to have an env | sort in there so you can see exactly what's set when it's attempting to run your code.
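You can also do the comparison from the outside without touching the recipe. A rough sketch (the helper names are made up, and it only looks at PATH and LD_LIBRARY_PATH since those are the ones in question here):

```shell
# Keep only the two variables of interest from an `env` listing.
filter_paths() {
    sort | grep -E '^(PATH|LD_LIBRARY_PATH)=' || echo "(neither variable is set)"
}

# Sketch: show what the container sees with and without the host environment.
# "$1" is the image to inspect.
show_container_env() {
    image=$1
    echo "== host environment passed through =="
    singularity exec "$image" env | filter_paths
    echo "== with --cleanenv =="
    singularity exec --cleanenv "$image" env | filter_paths
}

# Example (needs the real image):
#   show_container_env MyProject.img
```

If LD_LIBRARY_PATH is missing in both cases, one possible cause is that exports made in %post only apply while the image is being built. Variables that should exist at runtime belong in the %environment section of the recipe.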

How to export a container in Singularity

I would like to move an already built container from one machine to another. What is the proper way to migrate the container from one environment to another?
I can find here the image.export command, but this is for an older version of the software. I am using version 3.5.2.
The container I wish to export is a --sandbox container. Is something like that possible?
Singularity allows you to easily convert between a sandbox and a production build.
For example:
singularity build lolcow.sif docker://godlovedc/lolcow # pulls and builds a container
singularity build --sandbox lolcow_sandbox/ lolcow.sif # converts from container to a writable sandbox
singularity build lolcow2 lolcow_sandbox/ # converts from sandbox to container
Once you have a production SIF or SIMG, you can easily transfer the file and convert as necessary.
singularity build generates a file that you can copy between computers just like any other file. The only thing it needs is the singularity binary installed on the new host server.
The difference when using --sandbox is that you get a modifiable directory instead of single file. It can still be run elsewhere, but you may want to tar it up first so you're only moving a single file. Then you can untar it and run as normal on the new host.
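A minimal sketch of the tar round trip, using plain tar and nothing Singularity-specific. The function names are made up for illustration. If the sandbox was built with sudo, you will probably need to run these as root as well so that file ownership and permissions survive.

```shell
# Pack a sandbox directory into a single compressed archive.
# "$1" is the sandbox directory, "$2" the archive to create.
pack_sandbox() {
    sandbox=${1%/}   # drop a trailing slash, if any
    archive=$2
    tar -czf "$archive" -C "$(dirname "$sandbox")" "$(basename "$sandbox")"
}

# Unpack the archive on the new host.
# "$1" is the archive, "$2" the directory to extract into.
unpack_sandbox() {
    archive=$1
    dest=$2
    mkdir -p "$dest"
    tar -xzf "$archive" -C "$dest"
}

# Example:
#   pack_sandbox lolcow_sandbox/ lolcow_sandbox.tar.gz
#   (copy lolcow_sandbox.tar.gz to the new host)
#   unpack_sandbox lolcow_sandbox.tar.gz .
```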

What space is required to build a Singularity container?

I've got a def file to build a container (within a Vagrant VM). If I build as a sandbox:
sudo singularity build --sandbox mytest/ mytest.def
then the build completes. However, if I build straight to a container:
sudo singularity build mytest.sif mytest.def
then I get an error:
FATAL: While performing build: While creating SIF: while creating container: writing data object for SIF file: copying data object file to SIF file: write mytest.sif: no space left on device
If I try and convert the sandbox to a container:
sudo singularity build mytest.sif mytest/
then I get the same error.
The docs don't give an indication of the amount of space needed for a build vs. a sandbox. I could increase the size of the Vagrant VM, but it would be good to have an idea of how much I should increase it by to ensure that the build is successful.
The size is dependent on the image. If you're building from a docker image, you can look at that to get a general idea based on its size. It's important to know where to put the extra drive space, however.
Singularity uses a tmp dir (default: /tmp) and a cache dir (default: $HOME/.singularity/cache) in addition to the directory you're building in. Note that with sudo singularity build the cache dir is /root/.singularity/cache, not the one under your user's home, because the build runs as root. VMs often have small /, /root, and/or /tmp partitions by default. This has been a gotcha for me in the past and may also be affecting you.
You can use the --tmpdir flag on build to change that to somewhere that has more space if desired (see documentation here).
To change the default cache dir you have to set the environment variable SINGULARITY_CACHEDIR, with details on specifics in the documentation here. You can also set the SINGULARITY_TMPDIR in the same manner instead of using the --tmpdir flag. It is sometimes nice to keep all the environment modifications in one place.
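A sketch of keeping both settings in one place. The function name and the scratch path in the example are hypothetical, so pick whatever location in your VM actually has free space.

```shell
# Sketch: point Singularity's temp and cache dirs at a roomier location
# and report how much space is available there.
# "$1" is a scratch directory on a partition with enough free space.
use_scratch_for_singularity() {
    scratch=$1
    mkdir -p "$scratch/tmp" "$scratch/cache"
    export SINGULARITY_TMPDIR="$scratch/tmp"
    export SINGULARITY_CACHEDIR="$scratch/cache"
    # Free space for the temp dir, the cache dir, and the build directory.
    df -h "$SINGULARITY_TMPDIR" "$SINGULARITY_CACHEDIR" .
}

# Example (the path is a placeholder):
#   use_scratch_for_singularity /path/with/space
#   sudo -E singularity build mytest.sif mytest.def
```

Keep in mind that sudo resets most of the environment by default, so the variables may not reach the build unless you pass -E (if your sudoers policy allows it) or set them on the sudo command line.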