What space is required to build a Singularity container? - singularity-container

I've got a def file to build a container (within a Vagrant VM). If I build as a sandbox:
sudo singularity build --sandbox mytest/ mytest.def
then the build completes. However, if I build straight to a container:
sudo singularity build mytest.sif mytest.def
then I get an error:
FATAL: While performing build: While creating SIF: while creating container: writing data object for SIF file: copying data object file to SIF file: write mytest.sif: no space left on device
If I try and convert the sandbox to a container:
sudo singularity build mytest.sif mytest/
then I get the same error.
The docs don't give an indication of the amount of space needed for a build vs sandbox. I could increase the size of the Vagrant VM, but it would be good to have an idea how much I should increase it by to ensure that the build is successful

The size is dependent on the image. If you're building from a docker image, you can look at that to get a general idea based on its size. It's important to know where to put the extra drive space, however.
Singularity uses a tmp dir (default: /tmp) and a cache dir (default: $HOME/.singularity/cache) in addition to the directory you're building in. Note that cache dir uses /root/.singularity/cache not your user home on sudo singularity build because of sudo. VMs often have small /, /root, and/or /tmp partitions by default. This has been a gotcha for me in the past and may also be affecting you.
You can use the --tmpdir flag on build to change that to somewhere that has more space if desired (see documentation here).
To change the default cache dir you have to set the environment variable SINGULARITY_CACHEDIR, with details on specifics in the documentation here. You can also set the SINGULARITY_TMPDIR in the same manner instead of using the --tmpdir flag. It is sometimes nice to keep all the environment modifications in one place.

Related

Why does `singularity run/exec` automatically bind specific some directories? What is the use case?

I'm familiar with containers, but new to Singularity and I found myself fighting a broken Python installation in a Singularity container tonight. It turns out that this was because $HOME was being mounted into my container without my knowledge.
I guess that I've developed a liking for the idiom "Explicit is better than implicit" from Python. To me, automatically mounting specific directories is unexpected behavior.
Three questions:
Why does Singularity default to mounting $HOME, /tmp, /proc, etc?
So that I can become more comfortable with Singularity, what are some use cases for this behavior?
I see the --no-home flag, but is there a flag to disable all of the default mounts without needing to change the default Singularity configuration?
It's a mixture of design, convenience and technical necessity.
The biggest reason is that, unless you use certain params that say otherwise, Singularity images are read-only filesystems. You need somewhere to write output and any temporary files that get created along the way. Maybe you know to mount in your output dir, but there are all sorts of files that get created / modified / deleted in the background that we don't ever think about. Implicit automounts give reasonable defaults that work in most situations.
Simplistic example: you're doing a big sort and filter operation on some data, but you're print the results to console so you don't bother to mount in anything but the raw data. But even after some manipulation and filtering, the size of the data exceeds available memory so sort falls back to using small files in /tmp before being deleted when the process finishes. And then it crashes because you can't write to /tmp.
You can require a user to manually specify a what to mount to /tmp on run, or you can use a sane default like /tmp and also allow that to be overridden by the user (SINGULARITY_TMPDIR, -B $PWD/fake_tmp:/tmp, --contain/--containall). These are all also configurable, so the admins can set sane defaults specific the running environment.
There are also technical reasons for some of the mounts. e.g., /etc/passwd and /etc/group are needed to match permissions on the host OS. The docs on bind paths and mounts are actually pretty good and have more specifics on the whats and whys, and even the answer to your third question: --no-mount. The --contain/--containall flags will probably also be of interest. If you really want to deep dive, there are also the admin docs and the source code on github.
A simple but real singularity use case, with explanation:
singularity exec \
--cleanenv \
-H $PWD:/home \
-B /some/local/data:/data \
multiqc.sif \
multiqc -i $SAMPLE_ID /data
--cleanenv / -e: You've already experienced the fun of unexpected mounts, there's also unexpected environment variables! --cleanenv/-e tells Singularity to not persist the host execution environment in the container. You can still use, e.g., SINGULARITYENV_SOMEVAR=23 to have SOMEVAR=23 inside the container though, as that is explicitly set.
-H $PWD:/home: This mounts the current directory into the container to /home and sets HOME=/home. While using --contain/--containall and explicit mounts is probably a better solution, I am lazy and this ensures several things:
the current directory is mounted into the container. The implicit mounting of the working is allowed to fail, and will do so quietly, if the base directory does not exist in the image. e.g., if you're running from /cluster/my-lab/some-project and there is no /cluster inside your image, it will not be mounted in. This is not an issue if using explicit binds directly (-B /cluster/my-lab/some-project) or if an explicit bind has a shared path (-B /cluster/data/experiment-123) with current directory.
the command is executed from the context of the current directory. If $PWD fails to be mounted as described above, singularity uses $HOME as the working directory instead. If both $PWD and $HOME failed to mount, / is used. This can cause problems if you're using relative paths and you aren't where you expected to be. Since it is specific to the path on the host, it can be really annoying when trying to duplicate a problem locally.
the base path is inside the container is always the same regardless of host OS file structure. Consistency is good.
The rest is just the command that's being run, which in this case summarizes the logs from other programs that work with genetic data.

snakemake - configure rules to run with local container (Singularity, Docker)

For snakemakev5.27+
Is there a way to run snakemake with the container directive that points to a local image? E.g. if I store the Docker containers on Dockerhub, and I also have a copy locally, when running snakemake, I don't want the rule to pull a singularity image copy from DockerHub if there already exists the exact copy locally. Makes for faster runs.
Sure, just pass a relative or absolute file path to the directive.
Even though the snakemake manual doesn't explicitly state it, it is possible to use a local singularity image using the containerized directive.
So instead of the example in the link above:
containerized: "docker://username/myworkflow:1.0.0"
You can point to the singularity sif file path (which contains the image)
containerized: "/path/to/myimage.sif"
Make sure you use --use-singularity when running snakemake.
How to build the singularity (sif) image:
You can build the sif image in various ways as described here, bug as for your question, you can build it from a local docker image.
I.e. you can list your local images by docker images and pick one to build the local sif file like so:
SINGULARITY_NOHTTPS=1 singularity build /path/to/myimage.sif docker-daemon://mydockerimage:latest
Note, it doesn't seem to work straight from local docker container, i.e. I would have expected this to work:
containerized: "docker-daemon://scpipe_docker:latest"
... but it didn't as of snakemake version 6.10.0

Singularity Recipe: How to access executable within container?

I am a beginner with Singularity.
What I want to achieve in the long run: I have a programming project with a long lists of dependencies, and I want to be able to give the program to other people in my company without there being bugs caused by missing dependencies, or wrong versions of dependencies.
The idea was now to use Singularity in order to easily provide a working environment.
In order to test this, I wrote a Hello World application which I now want to run in a container. I have a folder HelloWorld/ which contains the source code for a C++ Qt project. Then I wrote the following recipe file:
project.recipe
Bootstrap: docker
From: ubuntu:18.04
%setup
cp -R <some_folder>/HelloWorld ${SINGULARITY_ROOTFS}/HelloWorld
%post
apt update
apt-get install -y qt5-default
apt install -y g++
apt-get install -y build-essential
cd HelloWorld
qmake
make
echo "after build:"
ls
%runscript
echo "before execution:"
ls HelloWorld/
./HelloWorld/HelloWorld
where the echos and directory listings are for my current debugging process.
I can sucessfully build an image file using sudo singularity build --writable project.img project.recipe. (My debugging output shows me that the executable was build successfully.)
The problem is now that if I try to run it using ./project.img, or singularity run project.img, it won't find the executable.
Using my debugging output, I found out that the lines in %runscript use the folders outside of the container.
Tutorials like https://sylabs.io/guides/3.1/user-guide/build_a_container.html made it seem to me as if my recipe was the way to go, but apparently it isn't?
My questions:
Is there some way for me to access my executable? Am I calling it wrong?
Is the way I do it the way it is supposed to be done? Or would one normally do something like getting the executable outside of the container and then use the container to call that outside file? Or is there a different best practice?
If the executable is to be copied outside of the container after compilation, how do I do that? How do I access outside folders when I'm within %post?
Is this the best work process for what I want to achieve? Later on, my idea is that the big project is copied likewise in the container, dependencies are either installed or copied, then the project is compiled and finally its source being deleted. I also considered using a repository, but I can't have the project being in an open repository, and I don't want to store any passwords.
Firstly, use %files, don't use %setup. %setup is run as root and can directly modify the host server. You can very easily and accidentally break things without realizing it. You can get the same effect this way:
%files
some_folder/HelloWorld /HelloWorld
You are calling it wrong. In your %setup (and hopefully now in your %files) steps, you are copying the data to /HelloWorld. In your %runscript your are calling ./HelloWorld/HelloWorld which is the equivalent of $PWD/HelloWorld/HelloWorld. Since singularity automatically mounts in $PWD (as well as $HOME and some other directories), you are not calling what you're trying to call.
You don't copy the executable outside of the container, you just need to make sure what you're executing is where you think it is.
There is no access to the host filesystem in %post, you should have everything you need copied in via %files first.
That's a reasonable workflow. Having a local private repo for the code is probably a good idea for tracking your changes, but that's your call.

How to export a container in Singularity

I would like to move an already built container from one machine to another. What is the proper way to migrate the container from one environment to another?
I can find here the image.export command, but this is for an older version of the software. I am using version 3.5.2.
The container I wish to export is a --sandbox container. Is something like that possible?
Singularity allows you to easily convert between a sandbox and a production build.
For example:
singularity build lolcow.sif docker://godlovedc/lolcow # pulls and builds a container
singularity build --sandbox lolcow_sandbox/ lolcow.sif # converts from container to a writable sandbox
singularity build lolcow2 lolcow_sandbox/ # converts from sandbox to container
Once you have a production SIF or SIMG, you can easily transfer the file and convert as necessary.
singularity build generates a file that you can copy between computers just like any other file. The only things it needs is the singularity binary installed on the new host server.
The difference when using --sandbox is that you get a modifiable directory instead of single file. It can still be run elsewhere, but you may want to tar it up first so you're only moving a single file. Then you can untar it and run as normal on the new host.

Singularity: What is the difference between an image, a container, and an instance?

I am starting to learn Singularity for reproducible analysis of scientific pipelines. A colleague explained that an image was used to instantiate a container. However, in reading through the documentation and tutorials, the term instance is also used and the usage of image and container seems somewhat interchangeable. So, I am not sure I precisely understand the difference between an image, container, and instance. I do get that a recipe is a text file for building one of these (I think an image?).
For example, on this page it explains:
Now we can build the definition file into an image! Simply run build
and the image will be ready to go:
$ sudo singularity build url-to-pdf-api.img Singularity
Okay, so this uses the recipe Singularity to build an image, with the intuitive extension of .img. However, the help description of the build command states:
$ singularity help build
USAGE: singularity [...] build [build
options...]
The build command
compiles a container per a recipe (definition file) or based on a URI,
location, or archive.
So this seems to indicate we are building a container?
Then, there are image and instance sub-commands.
Are all these terms used interchangeably? It seems sometimes they are and sometimes there is a difference between them.
A container is the general concept of creating a sandboxed run environment and can be used as a general term to refer to either Docker or Singularity images. However it is sometimes used to also refer to the specific files being generated. This is probably not ideal, as it can clearly cause confusion to new users.
image is generally used to to refer to the actual files created by singularity build ...
instance refers to a specific way of running singularity images. Normally, if you singularity run some_image.sif or singularity some_image.sif some_command you can't easily access its environment while it's running. However, if you instead run singularity instance start some_image.sif some_instance1 it creates a persistent service that you can access like a docker container. The singularity service/instance documentation has some good examples of how instances are used differently than the basic exec and run commands.