valgrind --track-fds=yes exit code 0 even when there are FD leaks

I am trying to set up a CI job that fails if a file descriptor leak is detected.
Here is a simple test:
$ valgrind --quiet --track-fds=yes --error-exitcode=1 ./hello_world
hello world!
$ echo $?
0
$ valgrind --quiet --track-fds=yes --error-exitcode=1 ./hello_world_leak
hello world!
==889092== FILE DESCRIPTORS: 4 open (3 std) at exit.
==889092== Open file descriptor 3: /tmp/vg-test/main.cpp
==889092== at 0x4B968DB: open (open64.c:48)
==889092== by 0x109249: main (in /tmp/vg-test/hello_world_leak)
==889092==
==889092==
$ echo $?
0
(The --quiet option suppresses the report when the 3 standard FDs are the only ones still open at exit, which is fine.)
As you can see, even with --error-exitcode=1 and an FD leak in the program, valgrind exits with code 0.
My next idea was to write the valgrind output to a file, then parse it and check whether it contains the FILE DESCRIPTORS string:
$ valgrind --quiet --track-fds=yes --error-exitcode=1 --log-file=/tmp/valgrind_out.log ./hello_world
hello world!
$ cat /tmp/valgrind_out.log
==891904== FILE DESCRIPTORS: 4 open (3 std) at exit.
==891904== Open file descriptor 3: /tmp/valgrind_out.log
==891904== <inherited from parent>
==891904==
But here comes the next problem: the file valgrind opens for its own output (/tmp/valgrind_out.log) is itself reported as open at exit! The same happens when I tee valgrind's stderr output to a file.
So the only solution I came up with is to parse the output and check that there are exactly 4 open (3 std) FDs at exit.
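A minimal sketch of such a check (my own sketch, not the wrapper from the repo linked below; it assumes the summary line keeps the exact "FILE DESCRIPTORS: N open (3 std) at exit" format shown above, and that the --log-file FD is the only extra one on a clean run):

#!/bin/bash
# Usage: vg-fd-check ./program [args...]   (hypothetical wrapper name)
# Fails if valgrind reports any FD open at exit other than the 3 std ones
# and its own log file.
log=$(mktemp)
valgrind --quiet --track-fds=yes --error-exitcode=1 --log-file="$log" "$@"
status=$?
# With --log-file, a clean run reports "4 open (3 std)": 3 std + the log itself.
if grep -q 'FILE DESCRIPTORS:' "$log" && \
   ! grep -q 'FILE DESCRIPTORS: 4 open (3 std) at exit' "$log"; then
    cat "$log" >&2
    status=1
fi
rm -f "$log"
exit $status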
Are there any less ugly solutions?
EDIT: I've created a wrapper script that does precisely that: github.com/Roman-/valgrind-wrapper. I'm still hoping for a more elegant solution.

Related

Setting argv[0] for a valgrind child process?

Is there any way to override the argv[0] specified by valgrind when it execve's the child process?
Why?
I'm using Valgrind with a tool that examines its argv[0] to determine the location of its executable in order to find related executables relative to itself. It exec()s a lot of children, most of which are not of any interest and should not be traced by Valgrind.
I can intercept invocations of the commands of interest by populating a directory with wrapper scripts that call the next executable of the same name on the PATH under the control of valgrind. But valgrind always sets argv[0] to the concrete name of the executable it invoked. I want it to pass the name of the wrapper executable instead, so the child command looks in my wrapper directory for related commands to run.
The usual workaround would be to create a symlink to the real executable from the wrapper dir, then invoke the real executable via the symlink. But that won't work here because that's where the wrapper scripts must exist.
Ugly workaround
So far the only solution I see is to re-exec my wrapper script under valgrind, then have the script detect that it's already running under valgrind and exec the real target program without wrapping. That'll work, but it's ugly. It requires running valgrind with --trace-children=yes so it can reach the actual target, which is undesirable for my use case. And it's expensive to have those short-lived valgrind commands run each wrapper script a second time.
Things I tried
I've tried exec -a /path/to/wrapper/command valgrind /path/to/real/command (bash). But valgrind doesn't seem to notice or care that argv[0] isn't valgrind, and does not pass that on to the child process.
Sample wrapper script with hacky workaround
if [ "${RUNNING_UNDER_VALGRIND:-0}" -eq 0 ]; then
# Find the real executable that's next on the PATH. But don't run it
# yet; instead put its path in the environment so it's available
# when we re-exec ourselves under valgrind.
export NEXT_EXEC="$(type -pafP $mycmd | awk '{ if (NR == 2) { print; exit; } }')"
# Re-exec this wrapper under valgrind's control. Valgrind ignores
# argv[0] so there's no way to directly run NEXT_EXEC under valgrind
# and set its argv[0] to point to our $0.
#
RUNNING_UNDER_VALGRIND=1 exec valgrind --tool=memcheck --trace-children=yes "$0" "$#"
else
# We're under valgrind, so exec the real executable that's next on the
# PATH with an argv[0] that points to this wrapper, so it looks here for
# peer executables when it wants to exec them. We looked up NEXT_EXEC
# in our previous life and put it in the environment.
#
exec -a "$0" "${NEXT_EXEC}" "$#"
fi
Yes that's gross. It'd be possible to make a C executable that did the same thing a bit quicker, but the same issues apply with having to trace children, getting unwanted extra logs, etc.
Edit:
This works, so long as your target program(s) don't care about the executable name itself, only the directory.
NEXT_EXEC="$(type -pafP $mycmd | awk '{ if (NR == 2) { print; exit; } }')"
if ! [ "${0}.real" -ef "${NEXT_EXEC}" ]; then
rm -f "${0}.real"
ln "${NEXT_EXEC}" "${0}.real"
fi
exec valgrind --trace-children=no "${0}.real" "$#"
Edit 2
Beginnings of a valgrind patch to add support for a --with-argv0 argument. When passed, valgrind core will treat the first argument after the executable name as the argv[0] to supply in the target's command line. Normally it puts the executable name there, and treats the client argument list as starting at argv[1].
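With such a patch, the wrapper could collapse to a single exec. Hypothetical usage only: the flag name and exact syntax are whatever the patch ends up defining, and NEXT_EXEC is the lookup from the wrapper above:

# Hypothetical: with the patch, the first client argument after the
# executable name becomes the target's argv[0] instead of being argv[1].
exec valgrind --with-argv0 "${NEXT_EXEC}" "$0" "$@"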

Gem5, computer architecture

I am trying to run gem5 in FS mode using the command:
"build/ARM/gem5.opt configs/example/fs.py --disk-image=/home/coep/gem5 2/full_system_images/aarch32-ubuntu-natty-headless.img --arm=/home/coep/gem5 2/full_system_images/vmlinux.arm.smp.fb.3.2/vmlinux.arm.smp.fb.3.2"
and I am getting this error:
"Usage: fs.py [options] fs.py: error: option --arm-iset: invalid choice: '/home/coep/gem5' (choose from 'arm', 'thumb', 'aarch64')"
Please help me solve this error.
Thank you.
I assume the --arm=/home/coep/gem5...vmlinux.arm.smp.fb.3.2 argument specifies the path to the guest kernel, in which case it should be --kernel=...:
build/ARM/gem5.opt \
configs/example/fs.py \
--disk-image=/home/coep/gem5\ 2/full_system_images/aarch32-ubuntu-natty-headless.img \
--kernel=/home/coep/gem5\ 2/full_system_images/vmlinux.arm.smp.fb.3.2/vmlinux.arm.smp.fb.3.2
Note that the space in the "gem5 2" directory name is escaped with a backslash; without it the shell splits the argument at the space, and fs.py treats the leftover --arm=/home/coep/gem5 as an abbreviation of its --arm-iset option, which is exactly the error you saw. Arguments and their explanations can be found in configs/common/Options.py.
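You can also list them directly (fs.py uses standard option parsing, so --help should work):

$ ./build/ARM/gem5.opt configs/example/fs.py --help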
There can be multiple reasons why you are getting this error; one of them is an incorrect path to the disk image files.
I have run gem5 in FS mode and booted Linux on top of it on Ubuntu 18.04 LTS.
You can follow the steps below. The first step is to download and install the full-system binary and disk image files.
1. $ mkdir full_system_image
2. $ cd full_system_image/
3. $ wget http://www.m5sim.org/dist/current/arm/aarch-system-2014-10.tar.bz2
4. $ tar jxf aarch-system-2014-10.tar.bz2
5. $ echo "export M5_PATH=/Path to the full_system_image directory/full_system_images/" >> ~/.bashrc
6. $ source ~/.bashrc
7. $ echo $M5_PATH (- check if the path is set correct)
Now that the path is set, the next step is to run gem5 in FS mode.
1. cd into the gem5 base directory
2. $ ./build/ARM/gem5.opt configs/example/fs.py --disk-image=/home/full_system_image/disks/aarch32-ubuntu-natty-headless.img
3. Note: --disk-image= is the path to full_system_image/disks/aarch32-ubuntu-natty-headless.img
4. open a new terminal and connect to port 3456
5. $ telnet localhost 3456
6. here 3456 is the port number of the gem5 terminal
7. booting will take around 30 minutes, depending on machine performance
8. at the end you will see something like this:
input: AT Raw Set 2 keyboard as /devices/smb.14/motherboard.15/iofpga.17/1c060000.kmi/serio0/input/input0
input: touchkitPS/2 eGalax Touchscreen as
/devices/smb.14/motherboard.15/iofpga.17/1c070000.kmi/serio1/input/input2
kjournald starting. Commit interval 5 seconds
EXT3-fs (sda1): using internal journal
EXT3-fs (sda1): mounted filesystem with writeback data mode
VFS: Mounted root (ext3 filesystem) on device 8:1.
Freeing unused kernel memory: 292K (806aa000 - 806f3000)
random: init urandom read with 14 bits of entropy available
Ubuntu 11.04 gem5sim ttySA0
9. log in as root
Voilà, you have run gem5 in FS mode.
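Putting the steps together, a minimal end-to-end sketch (the paths are examples: the disk image follows the layout above and the kernel path follows the question; adjust both to wherever you extracted the archive):

#!/bin/bash
# Sketch: boot the downloaded disk image in gem5 FS mode.
export M5_PATH="$HOME/full_system_image/"
./build/ARM/gem5.opt configs/example/fs.py \
    --disk-image="$M5_PATH/disks/aarch32-ubuntu-natty-headless.img" \
    --kernel="$M5_PATH/vmlinux.arm.smp.fb.3.2/vmlinux.arm.smp.fb.3.2"
# In a second terminal, attach to the simulated console:
#   telnet localhost 3456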

Console output from bash command executed with wsl getting truncated when redirected to a file

I'm attempting to use wsl to execute a bash command from powershell/cmd and capture the output to a file.
When I run wsl -e cat /etc/services the full contents of the file appears correctly in the console.
However, if I run wsl -e cat /etc/services > foo.txt, foo.txt only contains roughly the first 4k characters of the output. If I run the same command inside a wsl bash shell, foo.txt contains the full content I would expect. I've tried this with a number of wsl commands, and the cutoff point always seems to be about 4k characters. I've also tried wsl -- cat /etc/services > foo.txt, with the same results.
Does anyone know why the truncation is happening? More importantly, how do I run a command with wsl and capture the output to a file?
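One workaround worth trying, assuming the truncation happens on the Windows side of the pipe: do the redirection inside WSL itself, so the full stream never crosses the WSL/Windows boundary (the /mnt/c/temp/foo.txt target is just an example path):

wsl -e bash -c "cat /etc/services > /mnt/c/temp/foo.txt"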

Awk losing posix mode under sudo

This started as an obscure problem with RPM scriptlets occasionally failing in awk. I narrowed it down to the following: the scriptlets use a GNU extension, the length(array) construct, which is not supported in POSIX mode. OK so far. What I don't understand is how running awk under sudo changes the POSIX-compliance behavior. Here is a simple awk script that should succeed in GNU mode and fail in POSIX mode.
$ cat ./try
/bin/awk 'BEGIN{x[1]=foo;x[2]=bar;print length(x);}'
$ /bin/awk --version | grep Awk
GNU Awk 4.0.2
$ id
uid=0(root) gid=0(root) groups=0(root)
$ /bin/sh ./try
awk: cmd. line:1: fatal: length: received array argument
$ sudo /bin/sh ./try
2
$
What is the underlying mechanism that changes the awk behavior?
Awk (really gawk under Linux) is controlled by the POSIXLY_CORRECT environment variable, which was occasionally being inherited from the original user's environment. The installation in question must be run by root, but at times the admin would become root with "su", which keeps the caller's environment, thus keeping his POSIXLY_CORRECT and forcing gawk into POSIX mode, where the GNU length(array) extension fails. At other times the admin would use "sudo" or "su -" to become root, start with root's clean environment, and successfully run the extended gawk functionality.
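A quick way to reproduce the difference without sudo at all (a sketch; it relies on gawk honoring POSIXLY_CORRECT as described above, and the failure message is the one from the transcript):

$ POSIXLY_CORRECT=1 /bin/awk 'BEGIN{x[1]="foo"; x[2]="bar"; print length(x);}'
awk: cmd. line:1: fatal: length: received array argument
$ env -u POSIXLY_CORRECT /bin/awk 'BEGIN{x[1]="foo"; x[2]="bar"; print length(x);}'
2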

Redirect stderr through grep -v in LSF batch job

I'm using a library that generates a whole ton of output to stderr, and there is really no way to suppress the output directly in the code (it's ROOT's Minuit2 minimizer, which is known for not having a way to silence it). I'm running batch jobs through the LSF system, and the error output files are so big that they exceed my disk quota. Erk.
When I run locally on a shell, I do:
python main.py 2> >( grep -v Minuit2 2>&1 )
to suppress the output, as is done here.
This works great, but unfortunately I can't seem to get that or any variation of it to work when running on LSF. I think this is due to LSF not spawning the necessary subshell, but it's not clear.
I run on batch by passing LSF a submit script. The relevant line is:
python main.py $INPUT_FILE
which works great, aside from the aforementioned problem of gigantic error files.
When I try changing that line to
python main.py $INPUT_FILE 2> >( grep -v Minuit2 2>&1 )
I end up with
./singleSubmit.sh: line 16: syntax error near unexpected token `>'
./singleSubmit.sh: line 16: `python $MAIN $1 2> >( grep -v Minuit2 2>&1 )'
in the error log file.
Any idea how I could accomplish what I want, or why this is not working?
Thanks a ton!
The syntax you're using (process substitution) works in bash, not in csh/tcsh. Try changing the first line of your submission script to
#!/bin/bash
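For illustration, the relevant part of singleSubmit.sh would then look like this (a sketch built from the line quoted in your error message; note it uses >&2 inside the substitution so the filtered messages stay on stderr, whereas the 2>&1 in your version sends them to stdout):

#!/bin/bash
# Process substitution, the >(...) form below, is a bash feature.
# Drop Minuit2 chatter from stderr; pass everything else through on stderr.
python "$MAIN" "$1" 2> >( grep -v Minuit2 >&2 )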