Valgrind suppression and return code

It looks like valgrind returns a non-zero exit code when it detects memory leaks, even though they are listed in the suppression file.
No errors are displayed, yet the return code is 134. This fails all my builds in Jenkins... Is there a way around this, or am I doing something wrong?

You are very probably doing something wrong (or maybe using a buggy old version of valgrind; the output below is from the just-released 3.12):
valgrind --leak-check=full --errors-for-leak-kinds=all --error-exitcode=33
--suppressions=t.supp ./memcheck/tests/trivialleak
...
==22750== suppressed: 1,000 bytes in 1,000 blocks
...
echo $?
0
while without the suppression file:
valgrind --leak-check=full --errors-for-leak-kinds=all --error-exitcode=33
./memcheck/tests/trivialleak
...
==22760== 1,000 bytes in 1,000 blocks are definitely lost in loss record 1 of 1
...
echo $?
33
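For reference, a suppression entry like the one in t.supp would look along these lines (only a sketch: the suppression name is arbitrary, and the fun: frames have to match the call stack memcheck prints for your leak):
{
   trivialleak-suppression
   Memcheck:Leak
   match-leak-kinds: all
   fun:malloc
   ...
}
The trailing ... is a frame-level wildcard, so this entry matches any leak whose stack starts at malloc.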

Related

Valgrind(memcheck) not showing all contexts

My last context/error I see in my valgrind output file is...
==3030== 1075 errors in context 61 of 540:
==3030== Syscall param ioctl(SIOCETHTOOL,ir) points to uninitialised byte(s)
==3030== at 0x7525248: ioctl (syscall-template.S:84)
==3030== by 0x686A2A7: ??? (in /lib/libpal.so)
==3030== Address 0x96cf958 is on thread 16's stack
==3030== Uninitialised value was created by a stack allocation
==3030== at 0x686A20C: ??? (in /lib/libpal.so)
...but I don't see error contexts 62-540. My first thought was that valgrind crashed while the program was shutting down, but after this context it printed the ERROR SUMMARY:
ERROR SUMMARY: 9733 errors from 540 contexts (suppressed: 0 from 0)
I don't think it's because we came across a frame without debug info, because I can see this exact same issue get hit the first time at the very beginning of my output file. Or maybe the printing of error contexts, specifically, is halted when a stack trace has missing debug info?
Any ideas? Do I need an additional command line argument for valgrind? I know helgrind will quit after seeing 1,000,000 errors (or something like that), but it explicitly tells you what it's doing.
For my version of valgrind I also executed helgrind and saw all contexts (647) as expected. I think the problem above is simply a result of valgrind coming across a frame with no debug symbols and saying, "If there's no debug info, I'm moving on."
All of the logs I'm producing end with this same libpal frame at various context numbers: 100-something, 200-something, etc.

Memory problems with LibGit2 initialization

When I initialize and shutdown LibGit2 I am left with reachable memory and/or errors.
My test systems are Ubuntu 18.04 with libgit2 0.26, where g++ -v gives me gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1), and a FreeBSD 11.3 VM with libgit2 0.28.3 from which, unfortunately, I can't copy & paste. There g++ -v gives gcc version 9.2.0 (FreeBSD Ports Collection).
This is a minimal example:
#include <git2.h>

int main () {
    git_libgit2_init();
    git_libgit2_shutdown();
    return 0;
}
On Ubuntu I run the following:
➜ libelektra git:(libgit_test) ✗ g++ minimal.c -lgit2 && valgrind ./a.out
==1174== Memcheck, a memory error detector
==1174== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1174== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==1174== Command: ./a.out
==1174==
==1174==
==1174== HEAP SUMMARY:
==1174== in use at exit: 192 bytes in 12 blocks
==1174== total heap usage: 1,354 allocs, 1,342 frees, 107,044 bytes allocated
==1174==
==1174== LEAK SUMMARY:
==1174== definitely lost: 0 bytes in 0 blocks
==1174== indirectly lost: 0 bytes in 0 blocks
==1174== possibly lost: 0 bytes in 0 blocks
==1174== still reachable: 192 bytes in 12 blocks
==1174== suppressed: 0 bytes in 0 blocks
==1174== Rerun with --leak-check=full to see details of leaked memory
==1174==
==1174== For counts of detected and suppressed errors, rerun with: -v
==1174== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Why do I have reachable memory when the very first example from the documentation says that git_libgit2_shutdown() should clean everything up?
While the Valgrind documentation says that some reachable memory might be OK, things get quite wild on FreeBSD. I have some screenshots of the VM: One, Two, Three.
How can I avoid this?
One additional remark on different memory handling. My goal is to use the git_merge_file function in this project. It should look something like this:
#include <git2.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main () {
    git_libgit2_init();
    sleep (1);
    git_merge_file_result out = { 0 }; // out.ptr will not receive a terminating null character
    git_merge_file_input libgit_base;
    git_merge_file_input libgit_our;
    git_merge_file_input libgit_their;
    git_merge_file_init_input(&libgit_base, GIT_MERGE_FILE_INPUT_VERSION);
    git_merge_file_init_input(&libgit_our, GIT_MERGE_FILE_INPUT_VERSION);
    git_merge_file_init_input(&libgit_their, GIT_MERGE_FILE_INPUT_VERSION);
    libgit_base.ptr = "A";
    libgit_base.size = strlen("A");
    libgit_our.ptr = "A";
    libgit_our.size = strlen("A");
    libgit_their.ptr = "A";
    libgit_their.size = strlen("A");
    int exitCode = git_merge_file (&out, &libgit_base, &libgit_our, &libgit_their, 0);
    printf("Code is %d\n", exitCode);
    git_merge_file_result_free (&out);
    git_libgit2_shutdown();
    sleep (1);
    return 0;
}
When I remove initialization and/or shutdown I sometimes get 0 bytes still reachable on Ubuntu, but segmentation faults on FreeBSD. Is it worth giving this a closer look, or is such a difference in behavior normal given that I'm ignoring the requirement that libgit2 must be initialized?
In the screenshots of the BSD VM, __pthread_once is visible as a source of problems. This and __pthread_once_slow seem to be involved in all the errors: the 192 bytes on Ubuntu at the beginning, the more advanced example at the bottom on both BSD and Ubuntu, and also my real application.
As far as I can see, there's nothing wrong with your code, or with the Valgrind report by itself, as you've pointed out:
"still reachable" means your program is probably ok -- it didn't free some memory it could have. This is quite common and often reasonable. Don't use --show-reachable=yes if you don't want to see these reports.
Hence, it's likely the 192 bytes aren't really leaked; you've just managed to exit the program before the OS decided to grab back that block of memory, i.e. it kept that block under the process's purview as an optimisation for the next allocation to be made. In this case, the process just exited, so that memory will have to be reclaimed at process termination, and I think that's what "still reachable" means: memory that is fine, and will be reclaimed normally. Hopefully 😉.
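To make that concrete, here is a contrived sketch (not libgit2's actual code; the 192 bytes are just borrowed from your report): a block that a global pointer still references at exit is exactly what memcheck classifies as "still reachable".
#include <stdlib.h>

static char *cache;          /* the global keeps the block reachable at exit */

int main(void) {
    cache = malloc(192);     /* one-time allocation, never freed */
    return 0;                /* memcheck: "still reachable: 192 bytes in 1 blocks" */
}
Something along those lines is presumably what's happening here: a few one-time allocations stay parked behind global state until the process dies.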
The Valgrind errors on FreeBSD aren't allocation problems, but uses of an uninitialized zone of memory. They don't look to be inside libgit2 but in OpenSSL itself, while parsing certificates (?). You can find the underlying OpenSSL initialization starting from here.
Is it worth giving this a closer look, or is such a difference in behavior normal given that I'm ignoring the requirement that libgit2 must be initialized?
I'm tempted to say no, and yes. The code is now prodding a memory location that contains random garbage instead of a stack-allocated pthread_something. Segfaults are bound to happen randomly.
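If you do want to guard against accidentally running without initialization, a minimal sketch is below; it relies only on the documented return value of git_libgit2_init (the number of initializations, or a negative error code).
#include <git2.h>
#include <stdio.h>

int main (void) {
    int rc = git_libgit2_init();    /* number of initializations, or < 0 on error */
    if (rc < 0) {
        fprintf(stderr, "git_libgit2_init failed: %d\n", rc);
        return 1;
    }
    /* ... use libgit2 here ... */
    git_libgit2_shutdown();         /* one shutdown per successful init */
    return 0;
}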
HTH !

Valgrind not returning program's return value using lackey

So say I have a very simple C program like this:
int main() {
    return 1;
}
I compile it into a.out. If I run
valgrind ./a.out
I can get a return value of 1. But if I run
valgrind --tool=lackey ./a.out
I get a return value of 0. So my question is, how can I get the return value of the program while using valgrind with lackey?
Lackey outputs a (confusing/useless) 'valgrind exit code' which, as far as I can see in the valgrind source, is always equal to 0. Of all the valgrind tools, only lackey uses this useless code.
However, the 'real' exit status (i.e. the one seen by the shell) is by default the exit status of your program:
$ valgrind --tool=lackey a.out
...
==7033== Exit code: 0
$ echo $?
1
For tools that report errors (e.g. memcheck), you can change the exit code of the process when the tool detects an error, using the option:
--error-exitcode=<number>   exit code to return if errors found [0=disable]
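For example (illustrative; ./buggy is a stand-in for a program that triggers at least one memcheck error):
$ valgrind --tool=memcheck --error-exitcode=99 ./buggy
...
$ echo $?
99
If memcheck finds no errors, the exit status falls back to the program's own return value, as above.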

ifort -coarray=shared produces incorrect exit status

If I compile and run
! main.f90
print*, 1/0
end program
with ifort then I get a division by zero error with an exit status of 2 (echo $?), as expected. However, if I compile using ifort -coarray=shared then I still get the error, but now the exit status is 0. The problem is that CTest is unable to catch the error. This is my CMakeLists.txt:
cmake_minimum_required(VERSION 2.8)
project(testing Fortran)
enable_testing()
add_executable(main EXCLUDE_FROM_ALL main.f90)
add_test(main main)
add_custom_target(check COMMAND ctest DEPENDS main)
If I run make check then the output is
100% tests passed, 0 tests failed out of 1
even though the test actually failed. If I remove -coarray=shared or use gfortran then I get the correct output
0% tests passed, 1 tests failed out of 1
How can I make CTest report that the test failed? Am I doing something wrong or is this a compiler bug?
I'm testing with ifort 14.0.1 with the example code
program test
implicit none
print *, 1/0
end program
I can't quite replicate your initial return value of 3. In my testing the result of echo $? is 71, which corresponds to the message thrown by the runtime error (note: gfortran won't even compile the example):
forrtl: severe (71): integer divide by zero
I am, however, replicating your return code of 0 from the coarray version. What is happening here is a bit more complicated: ifort implements coarrays through MPI calls, and for -coarray=shared it basically wraps your program in a launcher so that you do not have to call mpirun or mpiexec, or run Intel's mpd, to handle MPI communications. It is clear that the coarray images (MPI ranks) are all returning with error code 3:
application called MPI_Abort(comm=0x84000000, 3) - process 4
and
exit status of rank 1: return code 3
is emitted by each MPI rank, but the executable itself always returns with exit code 0. Whether this is the intended behavior or a bug is not clear to me, but it seems likely that the wrapper code that launches the MPI processes doesn't look at the MPI return codes from each rank. As pointed out in the comments, how would we expect different return values from different MPI ranks to be handled? It doesn't seem that you are doing anything wrong.
Contrast this with a normal MPI example:
program mpitest
    use mpi
    implicit none
    integer :: rank, msize, merror, mstatus(MPI_STATUS_SIZE)
    call MPI_INIT(merror)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, msize, merror)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, merror)
    print *, 1/0
    call MPI_FINALIZE(merror)
end program
compiled with
mpiifort -o testmpi testmpi.f90
and run as (with mpd running):
mpirun -np 4 ./testmpi
This produced a severe runtime error (71, integer divide by zero) for each rank as before, but here the error code is propagated back to the shell:
$ echo $?
71
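Not part of the behavior question itself, but if the immediate goal is simply to make CTest notice the failure despite the wrapper's exit code of 0, one possible workaround (a sketch, untested against your setup) is to fail the test on the runtime error message rather than the exit status, using CTest's FAIL_REGULAR_EXPRESSION property:
# in CMakeLists.txt, after add_test(main main)
set_tests_properties(main PROPERTIES
    FAIL_REGULAR_EXPRESSION "forrtl: severe")
With that property set, ctest marks the test as failed whenever "forrtl: severe" appears in its output, regardless of the exit code the coarray launcher returns.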

What could possibly lead to the following assembly execution result

I was debugging under IBM AIX with dbx. I was seeing the following:
(dbx) print $r4
0x00000001614aa050
(dbx) print *((int64*)0x00000001614aa050)
-1
(dbx) print $r3
0x0000000165e08468
Then I "stepi" my 64bit program which executed the following instruction:
std r3,0x0(r4)
I then immediately checked the content of that memory:
(dbx) print *((int64*)0x00000001614aa050)
-1
Still -1? I was expecting the content of $r3 to be stored to that memory. I then manually assigned the value to that address using my variables:
(dbx) print &bmc._pLong
0x00000001614aa050
(dbx) assign bmc._pLong=(int64 *)0x0000000165e08468
(dbx) print *((int64*)0x00000001614aa050)
6004180072 (which is 0x0000000165e08468)
How could that happen?
I assume this is, somehow, "pilot" error: e.g. you did stepi and then the std instruction was displayed, which means that it is the instruction dbx is about to execute, not the instruction it just executed (at least I think that's right).
I would do some stepi's before and after to make sure I understand what stepi is doing. And, of course, print out the iar, and the instruction at the iar, to verify that dbx is not fibbing on you.
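In other words, an illustrative continuation of your session (the address and value are taken from your transcript; this is what I would expect to see if that reading of stepi is right):
(dbx) stepi
(dbx) print *((int64*)0x00000001614aa050)
6004180072
That is, only after the next stepi has the std r3,0x0(r4) actually been executed and the value from $r3 landed in memory.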