/tmp files filling up with surefire files - surefire

When Jenkins invokes a Maven build, /tmp fills up with hundreds of surefire839014140451157473tmp files. How can I explicitly redirect them to another directory during the build? For a Clover build it fills up with hundreds of grover53460334580.jar files. Any idea how to overcome this?
Also, does anybody know the exact steps to create a ramdisk so I could redirect the surefire files into it? Would that save write time to the hard drive?
Thanks

Many programs respect the TMPDIR (and sometimes TMP) environment variables. Maybe Jenkins uses APIs that respect them? Try:
TMPDIR=/path/to/bigger/filesystem jenkins
when launching Jenkins. (Or however you start it -- does it run as a daemon and have a shell-script to launch it?)
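For example, a hedged sketch (the directory is hypothetical, and whether the surefire/grover files actually move depends on the code honoring TMPDIR or java.io.tmpdir):
export TMPDIR=/var/tmp/jenkins                      # hypothetical directory on a bigger filesystem
mkdir -p "$TMPDIR"
java -Djava.io.tmpdir="$TMPDIR" -jar jenkins.war    # or add the property to your service's JAVA_OPTS
MAVEN_OPTS="-Djava.io.tmpdir=$TMPDIR" mvn test      # the Maven JVM (where the surefire*tmp files appear) can be pointed the same way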
There might be some performance benefit to using a RAM-based filesystem -- ext3, ext4, and similar journaled filesystems will order writes to disk, and even a quick fd = open(path, O_CREAT); unlink(path); sequence will probably require both on-disk journal updates and directory updates. (Homework: test this.) A RAM-based filesystem won't perform the journaling, and might or might not write anything to disk (depending upon which one you pick).
There are two main choices: ramfs is a very simple window into the kernel's caching mechanism. There is no disk-based backing for your files at all, and no memory limits. You can fill all your memory with one of these very quickly and suffer dire consequences. (Almost no programs handle out-of-disk-space errors well, and the OOM killer can't free up any of this memory.) See the Linux kernel file Documentation/filesystems/ramfs-rootfs-initramfs.txt.
tmpfs is a slight modification of ramfs -- you can specify an upper limit on the space it can allocate (-o size) and the page cache can swap the data to the swap partitions or swap files -- which is an excellent bonus, as your memory might be significantly better used elsewhere, such as keeping your compiler, linker, source files, and object files in core. See the Linux kernel file Documentation/filesystems/tmpfs.txt.
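A quick way to try both (mount points are hypothetical; run as root):
mkdir -p /mnt/ramscratch /mnt/buildtmp
mount -t ramfs ramfs /mnt/ramscratch              # no size limit, never swapped -- can eat all your RAM
mount -t tmpfs -o size=2g tmpfs /mnt/buildtmp     # capped at 2 GiB, pages may be swapped out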
Adding this line to your /etc/fstab will change /tmp globally:
tmpfs /tmp tmpfs defaults 0 0
(The default is to allow up to half your RAM to be used on the filesystem. Change the defaults if you need to.)
If you want to mount a tmpfs somewhere else, you can; maybe combine that with the TMPDIR environment variable from above, or learn about the new shared-subtree features in Documentation/filesystems/sharedsubtree.txt (made easier via pam_namespace) to make it visible only to your Jenkins and its child processes.
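A hedged sketch of that combination (the directory name is made up):
# /etc/fstab entry for a dedicated, size-capped tmpfs:
tmpfs /var/tmp/jenkins tmpfs size=2g,mode=1777 0 0
# then:
mkdir -p /var/tmp/jenkins
mount /var/tmp/jenkins
TMPDIR=/var/tmp/jenkins jenkins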

Related

Where to store git repo to run swa efficiently using wsl2?

I'm trying to run my static web app using Windows Subsystem for Linux (2), but I can't figure out where on my computer I should store the git repository to be able to run it decently quickly. I have tried storing it under /mnt/c/{workfolder}, but it takes several minutes to start up (using npm run start), and I have to rerun it to see any changes. This is useless when I'm trying to work...
I have also tried storing it in /mnt/wsl/{workfolder}, and in that case it starts up quickly and I can see my changes without rerunning the app. However, it seems to disappear when I restart my computer.
Where should I store the git repository to be able to run the app quickly and see changes without rerunning? I'm assuming there's something I'm not understanding; please help me figure it out if you know.
You'll want it somewhere on the ext4 partition of the WSL distribution. Typically, the best place is going to be under your WSL /home/<username> folder.
I would recommend:
mkdir ~/src
# or
mkdir ~/projects
# or something similar
Then create subdirectories for each project in that directory.
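For example (repository URL and directory names are placeholders):
mkdir -p ~/projects
cd ~/projects
git clone https://github.com/<you>/<your-swa-repo>.git
cd <your-swa-repo>
npm install
npm run start    # startup and file watching are fast on the ext4 home directory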
Why the others don't work:
/mnt/c is the Windows C: drive. That drive is mounted into WSL2 using the 9P network file system, and yes, it is (a) slow, and (b) it does not support inotify, so apps cannot register for notifications of changes to files.
/mnt/wsl is a tmpfs mount. It's really there for holding things that need to be shared between all running WSL instances. The auto-generated resolv.conf that you see there is one of those things. You can also use it for copying a file from one WSL distribution to another -- Simply copy the file to /mnt/wsl, start another WSL distribution, and copy or move the file out.
But yes, all tmpfs mounts are ephemeral, and their contents are lost when the last WSL2 distribution/instance terminates.
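For example, the copy trick mentioned above (file names are placeholders):
# in distribution A
cp ./some-file.tar /mnt/wsl/
# in distribution B, while A (or any other instance) is still running
mv /mnt/wsl/some-file.tar ~/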

Why does `singularity run/exec` automatically bind some specific directories? What is the use case?

I'm familiar with containers, but new to Singularity and I found myself fighting a broken Python installation in a Singularity container tonight. It turns out that this was because $HOME was being mounted into my container without my knowledge.
I guess that I've developed a liking for the idiom "Explicit is better than implicit" from Python. To me, automatically mounting specific directories is unexpected behavior.
Three questions:
Why does Singularity default to mounting $HOME, /tmp, /proc, etc?
So that I can become more comfortable with Singularity, what are some use cases for this behavior?
I see the --no-home flag, but is there a flag to disable all of the default mounts without needing to change the default Singularity configuration?
It's a mixture of design, convenience and technical necessity.
The biggest reason is that, unless you use certain params that say otherwise, Singularity images are read-only filesystems. You need somewhere to write output and any temporary files that get created along the way. Maybe you know to mount in your output dir, but there are all sorts of files that get created / modified / deleted in the background that we don't ever think about. Implicit automounts give reasonable defaults that work in most situations.
Simplistic example: you're doing a big sort and filter operation on some data, but you're printing the results to the console, so you don't bother to mount in anything but the raw data. But even after some manipulation and filtering, the size of the data exceeds available memory, so sort falls back to using small files in /tmp that are deleted when the process finishes. And then it crashes because you can't write to /tmp.
You can require a user to manually specify what to mount to /tmp on each run, or you can use a sane default like /tmp and also allow the user to override it (SINGULARITY_TMPDIR, -B $PWD/fake_tmp:/tmp, --contain/--containall). These are all also configurable, so the admins can set sane defaults specific to the running environment.
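For instance, a hedged sketch of the sort scenario with a writable /tmp bound in (image and file names are made up):
mkdir -p "$PWD/fake_tmp"
singularity exec -B /some/local/data:/data -B "$PWD/fake_tmp:/tmp" \
    tools.sif sort -S 1G /data/big_input.txt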
There are also technical reasons for some of the mounts. e.g., /etc/passwd and /etc/group are needed to match permissions on the host OS. The docs on bind paths and mounts are actually pretty good and have more specifics on the whats and whys, and even the answer to your third question: --no-mount. The --contain/--containall flags will probably also be of interest. If you really want to deep dive, there are also the admin docs and the source code on github.
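A hedged sketch of those flags (the exact values accepted by --no-mount depend on your Singularity/Apptainer version; check singularity exec --help):
singularity exec --no-mount home,tmp my_image.sif env    # drop selected default binds
singularity exec --containall -B /some/local/data:/data my_image.sif ls /data    # start isolated, bind only what you need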
A simple but real singularity use case, with explanation:
singularity exec \
--cleanenv \
-H $PWD:/home \
-B /some/local/data:/data \
multiqc.sif \
multiqc -i $SAMPLE_ID /data
--cleanenv / -e: You've already experienced the fun of unexpected mounts, there's also unexpected environment variables! --cleanenv/-e tells Singularity to not persist the host execution environment in the container. You can still use, e.g., SINGULARITYENV_SOMEVAR=23 to have SOMEVAR=23 inside the container though, as that is explicitly set.
-H $PWD:/home: This mounts the current directory into the container to /home and sets HOME=/home. While using --contain/--containall and explicit mounts is probably a better solution, I am lazy and this ensures several things:
the current directory is mounted into the container. The implicit mounting of the working directory is allowed to fail, and will do so quietly, if the base directory does not exist in the image. e.g., if you're running from /cluster/my-lab/some-project and there is no /cluster inside your image, it will not be mounted in. This is not an issue if using explicit binds directly (-B /cluster/my-lab/some-project) or if an explicit bind has a shared path (-B /cluster/data/experiment-123) with the current directory.
the command is executed from the context of the current directory. If $PWD fails to be mounted as described above, singularity uses $HOME as the working directory instead. If both $PWD and $HOME failed to mount, / is used. This can cause problems if you're using relative paths and you aren't where you expected to be. Since it is specific to the path on the host, it can be really annoying when trying to duplicate a problem locally.
the base path inside the container is always the same regardless of the host OS file structure. Consistency is good.
The rest is just the command that's being run, which in this case summarizes the logs from other programs that work with genetic data.

What is the optimal way to store data-files for testing using travis-ci + Docker?

I am trying to set up testing of the repository using travis-ci.org and Docker. However, I couldn't find any documentation about what the policy on memory usage is.
To perform a set of tests (test.sh) I need a set of input files to run on, which are very big (up to 1 GB, averaging 500 MB).
One idea is to wget the files directly in the test.sh script, but downloading the input files again and again for each test run would not be efficient.
The other idea is to create a separate Docker image containing the test files and mount it as a volume, but pushing such a big image to the public registry would not be nice.
Is there a general prescription for such tests?
Have you considered using Travis File Cache?
You can write your test.sh script in such a way that it only downloads a test file if it is not yet available on the local file system.
In your .travis.yml file, you specify which directories should be cached after a successful build. Travis will automatically restore that directory and files in it at the beginning of the next build. As your test.sh script will then notice the file exists already, it will simply skip the download and your build should be a little faster.
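A minimal sketch (directory, file name, and URL are placeholders):
# in .travis.yml:
#   cache:
#     directories:
#       - test-data
# in test.sh -- only download when the cached copy is missing:
if [ ! -f test-data/input.dat ]; then
    mkdir -p test-data
    wget -O test-data/input.dat https://example.org/datasets/input.dat
fi
./run_tests.sh test-data/input.dat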
Note that the Travis cache works by creating an archive file and putting it on some cloud storage, from which it has to be downloaded again later on. However, the assumption is that the network traffic will likely stay inside that "cloud", potentially even in the same data center. This should still give you some benefits in terms of build time and lower use of resources in your own infrastructure.

Mock filesystem in ocaml

I am writing code that creates a folder/file structure in OCaml, and I want to write some tests for it. I'd like not to have to create and delete files each time the tests are run, since they can be run many times.
What would be the best way to mock the filesystem? I'd be open to having a filesystem in memory or just mocking up the functions.
Maybe you could use a Makefile to help you.
For instance, make test might start by compiling your program, then create the files and folders required for testing, launch your program, and finally clean the test folder if need be (at that point, you might also want to check whether the state of the test folder is as expected).
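As a sketch, here is the same idea as a plain shell wrapper that a make test target could invoke (tool name, build command, and expected layout are all placeholders):
#!/bin/sh
set -e
TESTDIR=_test_tmp
dune build                              # or whatever builds your program
mkdir -p "$TESTDIR"
./_build/default/main.exe "$TESTDIR"    # the program creates its folder/file structure here
test -d "$TESTDIR/expected_subdir"      # check that the resulting state is as expected
rm -rf "$TESTDIR"                       # clean up so repeated runs start fresh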
On linux:
mount -o size=50m -t tmpfs none ./ramdisk
will create a filesystem in RAM, 50 MB in size, mounted at ./ramdisk. Only root can do this, but non-root users can use it. It will show up in df and du. You can remove it with umount ./ramdisk.
Creation, usage, and removal work just fine; the root requirement may be an obstacle, though.

How to use persistent heap images to make loading of theories faster in Isabelle/jEdit?

Let's assume I have a directory isabelle_afp where a lot of theories are stored. This directory is a library and I do not plan to change the files in it. I want to speed up the start-up time of Isabelle/jEdit (by default, all theories in isabelle_afp that my current theory depends on are processed anew).
How can I skip this step? The system manual tells me to build a persistent heap image. What is the easiest way to do so?
And how can I tell Isabelle/jEdit to load this heap image?
Isabelle/jEdit in Isabelle2013 already takes care of building your heap images, by a relatively basic mechanism that uses the isabelle build_dialog tool internally (which has a separate entry in the cited documentation).
You have two main possibilities for doing this without invoking isabelle build_dialog or the isabelle build power-tool manually:
The jEdit dialog "Utilities / Options / Plugin Options / Isabelle / General" provides a choice for "Logic", with a tiny tool tip saying that you have to restart the application after changing it. After doing that, the heap image will be produced on restart.
The command line option -l, e.g. isabelle jedit -l HOL-Word
For AFP sessions you need to tell the system separately about session directories. This can be done on the command line via isabelle jedit -d DIR1 -d DIR2 or in your $ISABELLE_HOME_USER/ROOTS file (list each directory on a separate line).
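For example, the ROOTS file could simply look like this (directory paths are placeholders, one per line):
/home/me/isabelle_afp
/home/me/other_sessions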
A pure command-line solution would look like this:
isabelle jedit -d isabelle_afp -l Simpl
Note that in this example, isabelle_afp is a (relative or absolute) directory name, while Simpl is the logical session name.
First, you need to set up a "session" for your isabelle_afp directory. This is done by creating a file ROOT (inside isabelle_afp) which contains an entry of the following shape (see also isabelle doc system Chapter 3: Isabelle sessions and build management)
session session_name = HOL +
theories
Theory1
Theory2
Theory3
This roughly means that the heap image session_name should be based on the HOL heap image and additionally contain the theories Theory1, Theory2, ...
Now invoke isabelle jedit -d isabelle_afp -l session_name. When done for the first time, this builds the heap image of session session_name. As long as nothing in isabelle_afp changes, any further invocations will directly start Isabelle/jEdit on top of the prebuilt heap image session_name.