Mock filesystem in OCaml - testing

I am writing code that creates a folder/file structure in OCaml, and I want to write some tests for it. I'd like not to have to create and delete files each time the tests are run, since they can be run many times.
What would be the best way to mock the filesystem? I'd be open to having a filesystem in memory or just mocking up the functions.
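One way to avoid touching the disk at all is to abstract the filesystem operations behind a module signature and swap in an in-memory implementation for tests. A minimal sketch, assuming you only need a handful of operations (all module and function names here are illustrative, not from any particular library):

module type FS = sig
  val mkdir : string -> unit
  val write_file : string -> string -> unit
  val exists : string -> bool
end

(* Production implementation: the real filesystem (needs the unix library). *)
module Real_fs : FS = struct
  let mkdir path = Unix.mkdir path 0o755
  let write_file path contents =
    let oc = open_out path in
    output_string oc contents;
    close_out oc
  let exists = Sys.file_exists
end

(* Test implementation: paths recorded in hash tables, nothing touches disk. *)
module Mock_fs : FS = struct
  let files : (string, string) Hashtbl.t = Hashtbl.create 16
  let dirs : (string, unit) Hashtbl.t = Hashtbl.create 16
  let mkdir path = Hashtbl.replace dirs path ()
  let write_file path contents = Hashtbl.replace files path contents
  let exists path = Hashtbl.mem files path || Hashtbl.mem dirs path
end

(* The code under test is a functor over FS, so either implementation plugs in. *)
module Scaffold (Fs : FS) = struct
  let create root =
    Fs.mkdir root;
    Fs.write_file (root ^ "/README") "hello"
end

Tests then instantiate Scaffold (Mock_fs) and assert on the mock's contents, while production code uses Scaffold (Real_fs), so nothing is created or deleted on disk between test runs.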

Maybe you could use a Makefile to help you.
For instance, make test might start by compiling your program, then create the files and folders required for testing, launch your program, and finally clean up the test folder if need be (at that point, you might also want to check that the state of the test folder is as expected).
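A hedged sketch of such a Makefile (the binary name and the dune invocation are assumptions about your build setup; recipe lines must be tab-indented):

build:
	dune build

test: build
	mkdir -p test_sandbox
	./your_program test_sandbox
	# check here that test_sandbox contains the expected structure
	rm -rf test_sandbox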

On Linux:
mount -o size=50m -t tmpfs none ./ramdisk
will create a filesystem in RAM, 50 MB in size, mounted at ./ramdisk. Only root can mount it, but non-root users can then use it. It will show up in df and du. You can clean it up with umount ./ramdisk.
Creation, usage, and removal work just fine; the root requirement may be an obstacle, though.

Related

change the location of the .nextflow folder?

When I run Nextflow, I get a .nextflow folder, but I can't find a way to change its location (i.e., it isn't -work-dir). How can I change the location of the .nextflow folder?
I have looked at launchDir, but it seems that is a read-only implicit variable and cannot be overwritten on the CLI; also, the --launchDir option is only valid for the k8s scope (see the original chat on Gitter).
I'm using Nextflow 20.10.0 build 5430.
Keeping things neat and tidy is admirable. From this comment, it looks like the only way (without doing crazy things...) is to change to the directory where you want your .nextflow cache directory to live and point all other options (i.e., -work-dir, -log, etc.) at a separate directory:
If you want .nextflow in dir A and the pipeline work dir in B:
cd A
nextflow run -w B
The .nextflow folder has to be in the launching directory to properly maintain the history of the executions.
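Concretely, that might look like the following (the paths and the main.nf script name are illustrative):

cd /path/to/A
nextflow -log /path/to/B/nextflow.log run main.nf -work-dir /path/to/B/work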

How to Prevent GitLab Runner From Ever Using /home

I built my runner and it works fine. However, when it initializes, it first clones the project to /home/user/builds/xxxx... I never, ever, ever want GitLab to use /home. Never. Not for anything. I was told that it is impossible to change it to a different location. I find that hard to believe.
When it initializes, it gives me a warning about templates not found in some made-up directory, then clones the entire project under the user's home directory. I don't give it that command, so it must be a default. Is there a way to choose ANY other mount point? The project is several hundred gigabytes and the /home directory is 50k. I cannot control that. So to a different mount point it must go.
I can provide the YAML etc., but this is about the core behavior of the runner itself, not anything I created. I'm hoping there is a simple variable I can set when initializing the runner.
Thank you in advance.
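One hedged suggestion, in case it helps: the runner's config.toml supports builds_dir and cache_dir settings per runner, which should steer clones away from /home (the paths below are illustrative):

[[runners]]
  name = "my-runner"
  builds_dir = "/data/gitlab-runner/builds"
  cache_dir = "/data/gitlab-runner/cache"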

Why does `singularity run/exec` automatically bind some specific directories? What is the use case?

I'm familiar with containers, but new to Singularity and I found myself fighting a broken Python installation in a Singularity container tonight. It turns out that this was because $HOME was being mounted into my container without my knowledge.
I guess that I've developed a liking for the idiom "Explicit is better than implicit" from Python. To me, automatically mounting specific directories is unexpected behavior.
Three questions:
Why does Singularity default to mounting $HOME, /tmp, /proc, etc?
So that I can become more comfortable with Singularity, what are some use cases for this behavior?
I see the --no-home flag, but is there a flag to disable all of the default mounts without needing to change the default Singularity configuration?
It's a mixture of design, convenience and technical necessity.
The biggest reason is that, unless you use certain params that say otherwise, Singularity images are read-only filesystems. You need somewhere to write output and any temporary files that get created along the way. Maybe you know to mount in your output dir, but there are all sorts of files that get created / modified / deleted in the background that we don't ever think about. Implicit automounts give reasonable defaults that work in most situations.
Simplistic example: you're doing a big sort and filter operation on some data, but you're printing the results to the console, so you don't bother to mount in anything but the raw data. But even after some manipulation and filtering, the size of the data exceeds available memory, so sort falls back to using small files in /tmp that are deleted when the process finishes. And then it crashes, because you can't write to /tmp.
You can require a user to manually specify what to mount to /tmp on each run, or you can use a sane default like /tmp and also allow it to be overridden by the user (SINGULARITY_TMPDIR, -B $PWD/fake_tmp:/tmp, --contain/--containall). These are all also configurable, so admins can set sane defaults specific to the running environment.
There are also technical reasons for some of the mounts. e.g., /etc/passwd and /etc/group are needed to match permissions on the host OS. The docs on bind paths and mounts are actually pretty good and have more specifics on the whats and whys, and even the answer to your third question: --no-mount. The --contain/--containall flags will probably also be of interest. If you really want to deep dive, there are also the admin docs and the source code on github.
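For example, a sketch of disabling some of the defaults on a recent Singularity version (image name illustrative):

singularity exec --no-mount home,tmp image.sif ls /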
A simple but real Singularity use case, with explanation:
singularity exec \
--cleanenv \
-H $PWD:/home \
-B /some/local/data:/data \
multiqc.sif \
multiqc -i $SAMPLE_ID /data
--cleanenv / -e: You've already experienced the fun of unexpected mounts; there are also unexpected environment variables! --cleanenv/-e tells Singularity not to persist the host execution environment in the container. You can still use, e.g., SINGULARITYENV_SOMEVAR=23 to have SOMEVAR=23 inside the container, though, as that is explicitly set.
-H $PWD:/home: This mounts the current directory into the container to /home and sets HOME=/home. While using --contain/--containall and explicit mounts is probably a better solution, I am lazy and this ensures several things:
the current directory is mounted into the container. The implicit mounting of the working directory is allowed to fail, and will do so quietly, if the base directory does not exist in the image. e.g., if you're running from /cluster/my-lab/some-project and there is no /cluster inside your image, it will not be mounted in. This is not an issue if you're using explicit binds directly (-B /cluster/my-lab/some-project) or if an explicit bind shares a path (-B /cluster/data/experiment-123) with the current directory.
the command is executed from the context of the current directory. If $PWD fails to be mounted as described above, Singularity uses $HOME as the working directory instead. If both $PWD and $HOME fail to mount, / is used. This can cause problems if you're using relative paths and you aren't where you expected to be. Since it is specific to the path on the host, it can be really annoying when trying to duplicate a problem locally.
the base path inside the container is always the same regardless of the host OS file structure. Consistency is good.
The rest is just the command that's being run, which in this case summarizes the logs from other programs that work with genetic data.

What is the optimal way to store data-files for testing using travis-ci + Docker?

I am trying to set up testing of the repository using travis-ci.org and Docker. However, I couldn't find any documentation about the policy on storage usage.
To perform a set of tests (test.sh) I need a set of input files to run on, which are very big (up to 1 GB, averaging 500 MB).
One idea is to wget them directly in the test.sh script, but downloading the input files again for every test run would be inefficient.
The other idea is to create a separate Docker image containing the test files and mount it as a volume, but pushing such a big image to the public registry would not be nice.
Is there a general recipe for such tests?
Have you considered using Travis File Cache?
You can write your test.sh script so that it only downloads a test file if it is not yet available on the local file system.
In your .travis.yml file, you specify which directories should be cached after a successful build. Travis will automatically restore that directory and the files in it at the beginning of the next build. As your test.sh script will then notice the file already exists, it will simply skip the download and your build should be a little faster.
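A sketch of both pieces, with illustrative directory names and URL. In .travis.yml:

cache:
  directories:
    - test-data

and in test.sh:

mkdir -p test-data
if [ ! -f test-data/input.dat ]; then
  wget -O test-data/input.dat https://example.com/big-input.dat
fi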
Note that the Travis cache works by creating an archive file and putting it on some cloud storage, from which it has to be downloaded again later as well. However, the assumption is that the network traffic will likely stay inside that "cloud", potentially in the same data center. This should still give you some benefit in terms of build time and lower use of resources in your own infrastructure.

/tmp filling up with surefire files

When Jenkins invokes a Maven build, /tmp fills with hundreds of files like surefire839014140451157473tmp; how can I explicitly redirect them to another directory during the build? For Clover builds it fills with hundreds of grover53460334580.jar files. Any idea how to overcome this?
Also, does anybody know the exact steps to create a ramdisk so I could redirect the surefire files into it? Would that save hard-drive write time?
Thanks
Many programs respect the TMPDIR (and sometimes TMP) environment variables. Maybe Jenkins uses APIs that respect them? Try:
TMPDIR=/path/to/bigger/filesystem jenkins
when launching Jenkins. (Or however you start it -- does it run as a daemon and have a shell-script to launch it?)
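Note that the JVM itself typically ignores TMPDIR on Linux; for the surefire/grover files specifically, the JVM's java.io.tmpdir property controls the location, so you may also need something like this in the build's environment (path illustrative):

MAVEN_OPTS="-Djava.io.tmpdir=/path/to/bigger/tmp" mvn test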
There might be some performance benefit to using a RAM-based filesystem: ext3, ext4, and similar journaled filesystems will order writes to disk, and even a quick open(O_CREAT) followed immediately by unlink() will probably require both on-disk journal updates and directory updates. (Homework: test this.) A RAM-based filesystem won't perform the journaling, and might or might not write anything to disk (depending upon which one you pick).
There are two main choices: ramfs is a very simple window into the kernel's caching mechanism. There is no disk-based backing for your files at all, and no memory limits. You can fill all your memory with one of these very quickly, and suffer very horrible consequences. (Almost no programs handle out-of-disk well, and the OOM-killer can't free up any of this memory.) See the Linux kernel file Documentation/filesystems/ramfs-rootfs-initramfs.txt.
tmpfs is a slight modification of ramfs -- you can specify an upper limit on the space it can allocate (-o size) and the page cache can swap the data to the swap partitions or swap files -- which is an excellent bonus, as your memory might be significantly better used elsewhere, such as keeping your compiler, linker, source files, and object files in core. See the Linux kernel file Documentation/filesystems/tmpfs.txt.
Adding this line to your /etc/fstab will change /tmp globally:
tmpfs /tmp tmpfs defaults 0 0
(The default is to allow up to half your RAM to be used on the filesystem. Change the defaults if you need to.)
If you want to mount a tmpfs somewhere else, you can; maybe combine that with the TMPDIR environment variable from above, or learn about the new shared-subtree features in Documentation/filesystems/sharedsubtree.txt (made easy via pam_namespace) to make the mount visible only to your Jenkins and its child processes.
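For instance, the two can be combined like this (mount point illustrative; the mount itself still requires root):

mkdir -p /mnt/jenkins-tmp
mount -t tmpfs -o size=1g tmpfs /mnt/jenkins-tmp
TMPDIR=/mnt/jenkins-tmp jenkins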