Having host filesystem visible to singularity container - singularity-container

I use singularity images that do not require any binding of the desired host path, i.e.
singularity exec image.simg IMAGE_COMMAND -i $PWD/input_file -o ./output_dir
simply works like any other command on the "input_file" in my host system, also using relative paths as in "-o".
I'm not comfortable enough with Singularity and its jargon to understand how this is made.
Is a configuration done in singularity.conf?
How is this feature called? (is it "MOUNT HOSTFS"?)

By default, both your home and current directories are mounted/bound into the image for you. You can modify this in singularity.conf. Details on the settings are available in the admin documentation.
The MOUNT HOSTFS in the config is a toggle to automatically mount all host filesystems into the image. MOUNT HOME is the corresponding setting for auto-mounting the user's HOME directory.
You can see which files/directories are currently being mounted by using the --verbose option with your singularity command.

Related

Change singularity home directory to a folder within the container

Background
I have a singularity container that was created from a docker image. The docker image has files that are meant to be in the user's home directory (e.g. in $HOME/.files). Because I don't know what the username will be, I put the files in /opt in the container and want to set the user's home to /opt.
I would like to be able to run the container with /opt as the home directory, OR somehow be able to run the container so that the home directory contains the files that already exist within the container
What I have tried:
use the --home flag : This maps a folder on the host as the home directory, rather than a folder in the container.
try overriding the $HOME environment variable with --env HOME=/opt : I get the error Overriding HOME environment variable with SINGULARITYENV_HOME is not permitted
Other questions
this question is related, but interested in mapping the container's home folder to a folder on the host machine
You can use the $HOME shell environment variable when you bind the two directories.
singularity exec -B $HOME:/opt example_container.sif touch /opt/file
Here is the documentation on singularity's bind feature

Why does `singularity run/exec` automatically bind specific some directories? What is the use case?

I'm familiar with containers, but new to Singularity and I found myself fighting a broken Python installation in a Singularity container tonight. It turns out that this was because $HOME was being mounted into my container without my knowledge.
I guess that I've developed a liking for the idiom "Explicit is better than implicit" from Python. To me, automatically mounting specific directories is unexpected behavior.
Three questions:
Why does Singularity default to mounting $HOME, /tmp, /proc, etc?
So that I can become more comfortable with Singularity, what are some use cases for this behavior?
I see the --no-home flag, but is there a flag to disable all of the default mounts without needing to change the default Singularity configuration?
It's a mixture of design, convenience and technical necessity.
The biggest reason is that, unless you use certain params that say otherwise, Singularity images are read-only filesystems. You need somewhere to write output and any temporary files that get created along the way. Maybe you know to mount in your output dir, but there are all sorts of files that get created / modified / deleted in the background that we don't ever think about. Implicit automounts give reasonable defaults that work in most situations.
Simplistic example: you're doing a big sort and filter operation on some data, but you're print the results to console so you don't bother to mount in anything but the raw data. But even after some manipulation and filtering, the size of the data exceeds available memory so sort falls back to using small files in /tmp before being deleted when the process finishes. And then it crashes because you can't write to /tmp.
You can require a user to manually specify a what to mount to /tmp on run, or you can use a sane default like /tmp and also allow that to be overridden by the user (SINGULARITY_TMPDIR, -B $PWD/fake_tmp:/tmp, --contain/--containall). These are all also configurable, so the admins can set sane defaults specific the running environment.
There are also technical reasons for some of the mounts. e.g., /etc/passwd and /etc/group are needed to match permissions on the host OS. The docs on bind paths and mounts are actually pretty good and have more specifics on the whats and whys, and even the answer to your third question: --no-mount. The --contain/--containall flags will probably also be of interest. If you really want to deep dive, there are also the admin docs and the source code on github.
A simple but real singularity use case, with explanation:
singularity exec \
--cleanenv \
-H $PWD:/home \
-B /some/local/data:/data \
multiqc.sif \
multiqc -i $SAMPLE_ID /data
--cleanenv / -e: You've already experienced the fun of unexpected mounts, there's also unexpected environment variables! --cleanenv/-e tells Singularity to not persist the host execution environment in the container. You can still use, e.g., SINGULARITYENV_SOMEVAR=23 to have SOMEVAR=23 inside the container though, as that is explicitly set.
-H $PWD:/home: This mounts the current directory into the container to /home and sets HOME=/home. While using --contain/--containall and explicit mounts is probably a better solution, I am lazy and this ensures several things:
the current directory is mounted into the container. The implicit mounting of the working is allowed to fail, and will do so quietly, if the base directory does not exist in the image. e.g., if you're running from /cluster/my-lab/some-project and there is no /cluster inside your image, it will not be mounted in. This is not an issue if using explicit binds directly (-B /cluster/my-lab/some-project) or if an explicit bind has a shared path (-B /cluster/data/experiment-123) with current directory.
the command is executed from the context of the current directory. If $PWD fails to be mounted as described above, singularity uses $HOME as the working directory instead. If both $PWD and $HOME failed to mount, / is used. This can cause problems if you're using relative paths and you aren't where you expected to be. Since it is specific to the path on the host, it can be really annoying when trying to duplicate a problem locally.
the base path is inside the container is always the same regardless of host OS file structure. Consistency is good.
The rest is just the command that's being run, which in this case summarizes the logs from other programs that work with genetic data.

Singularity sandbox file management

Fed up with struggling with lib install/dependencies problems, I'm starting working with Singularity.
Though, I'm not sure I understand precisely how it works regarding files management in the sandbox mode (not data, programs).
For example, I designed a very simple definition file that is just a "naked" Debian:
Bootstrap: library
From: debian
%post
apt-get update
I create a sandbox with this to add stuff:
sudo singularity build --sandbox Test/ naked_Debian.def
And I try to install a program. But what I don't understand is that I managed to do it, removed the sandbox directory but I think there are still files that were created during the sandbox life (in /dev, /run, /root, etc.). For example, the program that I cloned from git is now in /root of my local (independently of any container).
From what I understood, everything was in the container and should disappear if I remove the sandbox directory. Otherwise, I'm gonna leave a lot of mess with all the tests? And then I can't port the container from system to another.
If I create any new sandbox directory, the program is already there.
Cheers,
Mathieu
By default, singularity mounts $HOME to the container and uses that path as the working directory for singularity shell / exec. Since you're running the sandbox with sudo, /root is being mounted in and that's where any repos you cloned would end up if you didn't cd to a different directory. /tmp is also automatically mounted in, though that is unlikely to cause an issue since it's just temp files.
You have a few options to avoid files ending up in places you don't expect.
Disable automount of home: singularity shell --no-home ...
The default working directory is now / instead of $HOME and files are created directly in sandbox (as opposed to a mounted in directory)
If you want to get files out of the sandbox, you'll either need to copy to /tmp inside the container, and on the host OS from /tmp to the desired location
Set a different location to use as home: singularity shell --home $PWD ...
This mounts in and uses the current directory as $HOME instead of the user's $HOME on the host OS
Simpler to move files between host OS and container, but still creates files in the host OS
Don't mount system directories at all: singularity shell --contain --workdir /some/dir ...
Directories for /tmp and /var/tmp are created inside /some/dir instead of using /tmp and /var/tmp on the host. $HOME has the same path as the host and is used as the working directory, but it is empty and separate from the host OS
Complete separation from host OS, while still allowing some access between container and OS
Additional details on these options can be found in the documentation.

snakemake - configure rules to run with local container (Singularity, Docker)

For snakemakev5.27+
Is there a way to run snakemake with the container directive that points to a local image? E.g. if I store the Docker containers on Dockerhub, and I also have a copy locally, when running snakemake, I don't want the rule to pull a singularity image copy from DockerHub if there already exists the exact copy locally. Makes for faster runs.
Sure, just pass a relative or absolute file path to the directive.
Even though the snakemake manual doesn't explicitly state it, it is possible to use a local singularity image using the containerized directive.
So instead of the example in the link above:
containerized: "docker://username/myworkflow:1.0.0"
You can point to the singularity sif file path (which contains the image)
containerized: "/path/to/myimage.sif"
Make sure you use --use-singularity when running snakemake.
How to build the singularity (sif) image:
You can build the sif image in various ways as described here, bug as for your question, you can build it from a local docker image.
I.e. you can list your local images by docker images and pick one to build the local sif file like so:
SINGULARITY_NOHTTPS=1 singularity build /path/to/myimage.sif docker-daemon://mydockerimage:latest
Note, it doesn't seem to work straight from local docker container, i.e. I would have expected this to work:
containerized: "docker-daemon://scpipe_docker:latest"
... but it didn't as of snakemake version 6.10.0

What space is required to build a Singularity container?

I've got a def file to build a container (within a Vagrant VM). If I build as a sandbox:
sudo singularity build --sandbox mytest/ mytest.def
then the build completes. However, if I build straight to a container:
sudo singularity build mytest.sif mytest.def
then I get an error:
FATAL: While performing build: While creating SIF: while creating container: writing data object for SIF file: copying data object file to SIF file: write mytest.sif: no space left on device
If I try and convert the sandbox to a container:
sudo singularity build mytest.sif mytest/
then I get the same error.
The docs don't give an indication of the amount of space needed for a build vs sandbox. I could increase the size of the Vagrant VM, but it would be good to have an idea how much I should increase it by to ensure that the build is successful
The size is dependent on the image. If you're building from a docker image, you can look at that to get a general idea based on its size. It's important to know where to put the extra drive space, however.
Singularity uses a tmp dir (default: /tmp) and a cache dir (default: $HOME/.singularity/cache) in addition to the directory you're building in. Note that cache dir uses /root/.singularity/cache not your user home on sudo singularity build because of sudo. VMs often have small /, /root, and/or /tmp partitions by default. This has been a gotcha for me in the past and may also be affecting you.
You can use the --tmpdir flag on build to change that to somewhere that has more space if desired (see documentation here).
To change the default cache dir you have to set the environment variable SINGULARITY_CACHEDIR, with details on specifics in the documentation here. You can also set the SINGULARITY_TMPDIR in the same manner instead of using the --tmpdir flag. It is sometimes nice to keep all the environment modifications in one place.