Making valgrind abort on error for heap corruption checking?

I'd like to try using valgrind to do some heap corruption detection. With the following corruption "unit test":
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main()
{
    char * c = (char *) malloc(10) ;
    memset( c, 0xAB, 20 ) ; /* deliberate overflow: 20 bytes into a 10-byte block */
    printf("not aborted\n") ;
    return 0 ;
}
I was surprised to find that valgrind doesn't abort on error, but just produces a message:
valgrind -q --leak-check=no a.out
==11097== Invalid write of size 4
==11097== at 0x40061F: main (in /home/hotellnx94/peeterj/tmp/a.out)
==11097== Address 0x51c6048 is 8 bytes inside a block of size 10 alloc'd
==11097== at 0x4A2058F: malloc (vg_replace_malloc.c:236)
==11097== by 0x400609: main (in /home/hotellnx94/peeterj/tmp/a.out)
...
not aborted
I don't see a valgrind option to abort on error (as glibc's mcheck does; I can't use mcheck because it isn't thread safe). Does anybody know if that is possible? Our code dup2's stdout to /dev/null since it runs as a daemon, so a report isn't useful, and I'd rather catch the culprit in the act or close to it.

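There was no such option in valgrind when this was written. Newer releases document --exit-on-first-error=yes, which makes valgrind exit (not abort) at the first error and has to be combined with a nonzero --error-exitcode; check valgrind --help for your version.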
Consider adding a non-daemon mode (debug mode) into your daemon.
Section 4.6 of the Memcheck manual (http://valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs) describes the client requests a program can make to valgrind+memcheck while it is being debugged, so you can use some of them in your daemon to run checks at fixed code positions.
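As a rough sketch of that idea (not a tested recipe for the daemon in question), the core client request VALGRIND_COUNT_ERRORS from valgrind/valgrind.h can be polled at fixed positions, and the program can abort itself once valgrind has recorded an error. The helper names errors_so_far and checkpoint are hypothetical, and the fallback macro only exists so the file still compiles on machines without the valgrind headers.

```c
#include <stdio.h>
#include <stdlib.h>

/* Use the real client-request header when it is installed. */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif
#ifndef VALGRIND_COUNT_ERRORS
#  define VALGRIND_COUNT_ERRORS 0u /* assumption: no valgrind headers, report 0 */
#endif

/* Number of errors the tool has recorded so far.
   The client request evaluates to 0 when the program is not run under valgrind. */
static unsigned long errors_so_far(void)
{
    return (unsigned long) VALGRIND_COUNT_ERRORS;
}

/* Hypothetical checkpoint: abort as soon as valgrind has seen any error. */
static void checkpoint(const char *where)
{
    unsigned long n = errors_so_far();
    if (n != 0) {
        fprintf(stderr, "valgrind recorded %lu error(s) before %s\n", n, where);
        abort();
    }
}
```

Calling checkpoint("after request handling") at a few places narrows the corruption down to the code between two checkpoints; abort() then leaves a core or stops under a debugger even when the daemon's output is redirected.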

DLL silently ignored even though the library links

I'm using a third-party DLL that I've used successfully for ages. Now the linker links the DLL's lib without complaint, but the exe doesn't load the DLL.
I recently upgraded from the 32 bit to 64 bit cygwin.
I'm doing a mingw cross compile to 32 bits.
I'm trying to use the FTDI USB interface FTD2XX dll.
I have the version 2.04.06 FTD2XX lib, .h, and dll.
I had been using that dll successfully for ages but with older versions of cygwin and mingw.
Recently upgraded to cygwin64.
The app appears to link with the FTD2XX.lib without complaint.
But when I run the app it doesn't seem to look for or load the FTD2XX.dll.
The app runs but crashes as soon as it tries to call something in FTD2XX dll.
I created a simple hello_dll.dll for side by side test. That works.
The app.c does calls on both hello_dll.dll and ftd2xx.dll.
It starts without complaint, successfully calls the function in hello_dll, and then crashes on a call into ftd2xx.dll.
(I renamed the lib to ftd2xx_2.04.06 to distinguish them from other versions I have. Newer versions don't work any better.)
Link with -verbose gives:
i686-w64-mingw32-gcc -Wall -m32 -g -O2 -c -I . -o app.o app.c
i686-w64-mingw32-gcc -Wall -m32 -o app.exe app.o -Wl,-verbose -L. -lhello_dll -lftd2xx_2.04.06
GNU ld (GNU Binutils) 2.34.50.20200227
Supported emulations:
i386pe
using internal linker script:
<snip>
/usr/lib/gcc/i686-w64-mingw32/9.2.0/../../../../i686-w64-mingw32/bin/ld: mode i386pe
attempt to open /usr/i686-w64-mingw32/sys-root/mingw/lib/../lib/crt2.o succeeded
/usr/i686-w64-mingw32/sys-root/mingw/lib/../lib/crt2.o
attempt to open /usr/lib/gcc/i686-w64-mingw32/9.2.0/crtbegin.o succeeded
/usr/lib/gcc/i686-w64-mingw32/9.2.0/crtbegin.o
attempt to open app.o succeeded
app.o
<snip>
attempt to open ./hello_dll.lib succeeded
./hello_dll.lib
(./hello_dll.lib)d000001.o
(./hello_dll.lib)d000000.o
(./hello_dll.lib)d000002.o
<snip>
attempt to open ./ftd2xx_2.04.06.lib succeeded
./ftd2xx_2.04.06.lib
(./ftd2xx_2.04.06.lib)FTD2XX.dll
(./ftd2xx_2.04.06.lib)FTD2XX.dll
(./ftd2xx_2.04.06.lib)FTD2XX.dll
(./ftd2xx_2.04.06.lib)FTD2XX.dll
::::::::::::::::::::::::::::
I obtained a 32 bit compatible version of gdb. When I run gdb:
GNU gdb (GDB) 7.7.50.20140303-cvs
<snip>
This GDB was configured as "i686-pc-mingw32".
<snip>
(gdb) break main
(gdb) Breakpoint 1 at 0x40267b: file app.c, line 28.
(gdb) run
(gdb) Starting program: C:\_d\aaa\pd\src\dll\pathological\app.exe
[New Thread 1428.0x2528]
Breakpoint 1, main (argc=1, argv=0x9b2f70) at app.c:28
28 dostuff();
(gdb) info share
(gdb) From To Syms Read Shared Object Library
0x774e0000 0x77644ccc Yes (*) C:\Windows\SysWOW64\ntdll.dll
0x753d0000 0x754cadec Yes (*) C:\Windows\syswow64\kernel32.dll
0x75ea1000 0x75ee6a3a Yes (*) C:\Windows\syswow64\KernelBase.dll
0x64081000 0x6408a1d8 Yes C:\_d\aaa\pd\src\dll\pathological\hello_dll.dll
0x75041000 0x750eb2c4 Yes (*) C:\Windows\syswow64\msvcrt.dll
(*): Shared library is missing debugging information.
(gdb) A debugging session is active.
(gdb) c
Continuing.
Hello dll. <--- The function in hello_dll.dll prints this.
Program received signal SIGSEGV, Segmentation fault.
0x8000004c in ?? () <----- call to FT_GetLibraryVersion()
(gdb) bt
#0 0x8000004c in ?? ()
#1 0x0040158e in dostuff () at app.c:49
#2 0x00402680 in main (argc=1, argv=0x8e2f70) at app.c:28
(gdb)
It links with the lib without complaint but when I run the exe it (silently) doesn't load the dll.
Anybody have any ideas? Is there some linker control that I am missing? Are there other diagnostic or debug tools to dig into this further?
:::::::::::::::::::::::
edit 7/11/20
I'll post some code. (If I know how. I'm new here.)
FTD2XX.dll should be shown in the "info share" output, but it isn't, as you can see above.
I'm suspecting name decoration. Objdump -x of the .exe shows an entry for FTD2XX.dll in the Import Tables. But it doesn't show any vma or bound name under it. I suspect that at program load the loader sees no vma/name and decides it doesn't really need to load the dll.
There is an import table in .idata at 0x406000
<snip>
The Import Tables (interpreted .idata section contents)
vma: Hint Time Forward DLL First
Table Stamp Chain Name Thunk
00006000 0000607c 00000000 00000000 00006218 0000614c
DLL Name: FTD2XX.dll
vma: Hint/Ord Member-Name Bound-To
<----- empty?
00006014 00006080 00000000 00000000 000064f8 00006150
DLL Name: hello_dll.dll
vma: Hint/Ord Member-Name Bound-To
6224 1 hello_dll
00006028 00006088 00000000 00000000 00006554 00006158
DLL Name: KERNEL32.dll
vma: Hint/Ord Member-Name Bound-To
6230 277 DeleteCriticalSection
6248 310 EnterCriticalSection
<snip>
:::::::::::::::::::::::::::::::::::::::::::::
edit 2, 7/11/20
This is the program that calls functions in the DLLs.
/* app.c
Demonstrates using the function imported from the DLL.
*/
// 200708 pathological case. Based on the simple hello_dll.
//#include <stdlib.h>
// for sleep
#include <unistd.h>
#include <stdio.h>
// for dword
#include <windef.h>
// for lpoverlapped
#include <minwinbase.h>
#include "hello_dll.h"
// My legacy app, and really all others too, use 2.04.06.h
#include "ftd2xx_2.04.06.h"
//#include "ftd2xx_2.02.04.h"
///////////////////////////
void dostuff( void );
void call_ft_listdevices( void );
///////////////////////////
int main(int argc, char** argv)
{
    FT_STATUS status;
    DWORD libver;
    //dostuff();
    printf( "Calling hello_dll():\n" );
    fflush( stdout );
    hello_dll();
    fflush( stdout );
    printf( "Back from hello_dll()\n" );
    fflush( stdout );
    sleep( 1 );
    printf( "Calling FT_GetLibraryVersion().\n" );
    fflush( stdout );
    status = FT_GetLibraryVersion( &libver );
    if( status == FT_OK ){
        printf( "FTD2XX library version 0x%lx\n", libver );
        fflush( stdout );
    }
    else{
        printf( "Error reading FTD2XX library version.\n" );
        fflush( stdout );
    }
    // 200710 Adding call to different ft function did
    // not result in entries in the import table.
    //call_ft_listdevices( );
    return 0;
}
I don't think there is a need to include the code for my hello_dll. It works.
I have three versions of the FTD2XX. I'm pretty careful about tracking versions. Plus, when one is beating one's head against the wall, double checking the versions appeals early on as a way to end the pain.
I found a surprise copy of FTD2XX.dll. It's in c:/Windows/SysWOW64. It is the oldest of the three versions I have. Versions of my app that were compiled before this problem started run correctly using that dll in that place.
Solved.
There's a bug in the 2.34.50.20200227 i686-w64-mingw32-ld.exe. It won't work with ftd2xx.lib, regardless of ftd2xx version as far as I can tell.
2.25.51.20150320 and 2.29.1.20171006 work with ftd2xx.lib. I've reverted back to 2.29 mingw64-i686-binutils. I'm running again.

Memory problems with LibGit2 initialization

When I initialize and shutdown LibGit2 I am left with reachable memory and/or errors.
My test systems are Ubuntu 18.04 with libgit2 0.26, where g++ -v gives me gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1), and a FreeBSD 11.3 VM with libgit2 0.28.3, from which, unfortunately, I can't copy & paste. There g++ -v gives gcc version 9.2.0 (FreeBSD Ports Collection).
This is a minimal example:
#include <git2.h>
int main () {
    git_libgit2_init();
    git_libgit2_shutdown();
    return 0;
}
On Ubuntu I run the following:
➜ libelektra git:(libgit_test) ✗ g++ minimal.c -lgit2 && valgrind ./a.out
==1174== Memcheck, a memory error detector
==1174== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1174== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==1174== Command: ./a.out
==1174==
==1174==
==1174== HEAP SUMMARY:
==1174== in use at exit: 192 bytes in 12 blocks
==1174== total heap usage: 1,354 allocs, 1,342 frees, 107,044 bytes allocated
==1174==
==1174== LEAK SUMMARY:
==1174== definitely lost: 0 bytes in 0 blocks
==1174== indirectly lost: 0 bytes in 0 blocks
==1174== possibly lost: 0 bytes in 0 blocks
==1174== still reachable: 192 bytes in 12 blocks
==1174== suppressed: 0 bytes in 0 blocks
==1174== Rerun with --leak-check=full to see details of leaked memory
==1174==
==1174== For counts of detected and suppressed errors, rerun with: -v
==1174== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Why do I have reachable memory, when the very first example from the documentation says that git_libgit2_shutdown(); should clean everything up?
While the Valgrind documentation says that some reachable memory might be ok, things get quite wild on FreeBSD. I have some screenshots of the VM: One, Two, Three.
How can I avoid this?
One additional remark on different memory handling. My goal is to use the git_merge_file function in this project. It should look something like this:
#include <git2.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
int main () {
    git_libgit2_init();
    sleep (1);
    git_merge_file_result out = { 0 }; // out.ptr will not receive a terminating null character
    git_merge_file_input libgit_base;
    git_merge_file_input libgit_our;
    git_merge_file_input libgit_their;
    git_merge_file_init_input(&libgit_base, GIT_MERGE_FILE_INPUT_VERSION);
    git_merge_file_init_input(&libgit_our, GIT_MERGE_FILE_INPUT_VERSION);
    git_merge_file_init_input(&libgit_their, GIT_MERGE_FILE_INPUT_VERSION);
    libgit_base.ptr = "A";
    libgit_base.size = strlen("A");
    libgit_our.ptr = "A";
    libgit_our.size = strlen("A");
    libgit_their.ptr = "A";
    libgit_their.size = strlen("A");
    int exitCode = git_merge_file (&out, &libgit_base, &libgit_our, &libgit_their, 0);
    printf("Code is %d\n", exitCode);
    git_merge_file_result_free (&out);
    git_libgit2_shutdown();
    sleep (1);
    return 0;
}
When I remove initialization and/or shutdown I sometimes get 0 bytes of still reachable memory on Ubuntu but segmentation faults on FreeBSD. Is it worth giving this a closer look, or is such a difference in behavior normal when ignoring the fact that LibGit2 must be initialized?
In the screenshots of the BSD VM __pthread_once is visible as a source of problems. This and __pthread_once_slow seem to be involved in all the errors: The 192 bytes on Ubuntu in the beginning, the more advanced example at the bottom with BSD and Ubuntu and also my real application.
As far as I can see, there's nothing wrong with your code or with the Valgrind report itself. As you've pointed out:
"still reachable" means your program is probably ok -- it didn't free some memory it could have. This is quite common and often reasonable. Don't use --show-reachable=yes if you don't want to see these reports.
Hence, it's likely the 192 bytes aren't really leaked. "Still reachable" means that at exit the process still held pointers to those blocks, so they were never passed to free(); typically these are one-time allocations kept in global or static state, which would fit the __pthread_once frames you mention. The OS reclaims that memory at process termination, so it is fine. Hopefully 😉.
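For illustration, here is a minimal sketch of the kind of allocation Memcheck reports as "still reachable": a one-time allocation kept in a static pointer and never freed. The names g_table and get_table are hypothetical and say nothing about how libgit2 itself allocates.

```c
#include <stdlib.h>
#include <string.h>

/* The static pointer keeps the block reachable until the process exits. */
static char *g_table = NULL;

/* Lazily allocates the table on first use and returns the same block afterwards.
   Nothing ever frees it, so the block is still pointed to at exit. */
static const char *get_table(void)
{
    if (g_table == NULL) {
        g_table = malloc(16);
        if (g_table != NULL)
            strcpy(g_table, "initialised"); /* 11 characters plus NUL fit in 16 bytes */
    }
    return g_table;
}
```

A program that calls get_table() and then exits should show this 16-byte block under "still reachable" rather than "definitely lost", because g_table still points to the start of the block when the leak check runs.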
The Valgrind errors on FreeBSD aren't allocation problems, but use of an uninitialized zone of memory. They don't look to be inside libgit2 but OpenSSL itself, while parsing certificates (?). You can find the underlying OpenSSL initialization starting from here.
Is it worth giving this a closer look, or is such a difference in behavior normal when ignoring the fact that LibGit2 must be initialized?
I'm tempted to say no, and yes. Without initialization, the code is prodding a memory location that contains random garbage instead of a stack-allocated pthread_something. Segfaults are bound to happen randomly.
HTH !

Why is valgrind complaining about the perfectly fine initialized buffer?

This is the test code "valgrind.c". It initializes an on stack buffer, then does a simple string compare over it.
#include <stdlib.h>
#include <string.h>
int main( void)
{
    char buf[ 6];
    memset( buf, 'X', sizeof( buf));
    if( strncmp( buf, "XXXX", 4))
        abort();
    return( 0);
}
I compile this with cc -O0 -g valgrind.c -o valgrind.
Running on its own, it does fine.
When I run it through valgrind --track-origins=yes ./valgrind though this gives me:
==28182== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==28182== Conditional jump or move depends on uninitialised value(s)
==28182== at 0x4E058CC: ??? (in /lib/x86_64-linux-gnu/libc-2.28.so)
==28182== by 0x4CAA09A: ??? (in /lib/x86_64-linux-gnu/libc-2.28.so)
==28182== Uninitialised value was created by a stack allocation
==28182== at 0x4CA9FBD: ??? (in /lib/x86_64-linux-gnu/libc-2.28.so)
That really makes no sense to me. I am running this on Ubuntu 18.10.
The answer was that the valgrind libraries were buggy. After a complete dist-upgrade, things now work as expected. The version numbers of valgrind and the executable remain the same though (my current dpkg version is 1:3.13.0-2ubuntu6; I forgot to jot down the old one, sorry).
These are the files opened according to strace, with their SHA sums. There is actually a difference in the libraries opened, and you can see that libc, the test executable and /usr/bin/valgrind are unchanged between the two scenarios:
Broken:
41bd206c714bcd2be561b477d756a4104dddd2d3578040cca30ff06d19730d61 /etc/ld.so.cache
b0d9f1bc02b4500cff157d16b2761b9b2420151cc129de37ccdecf6d3005a1e0 /lib64/ld-linux-x86-64.so.2
b0d9f1bc02b4500cff157d16b2761b9b2420151cc129de37ccdecf6d3005a1e0 /lib/x86_64-linux-gnu/ld-2.28.so
701e316140eda639d651efad20b187a0811ea4deac0a52f8bcd322dffbb29d94 /lib/x86_64-linux-gnu/libc-2.28.so
701e316140eda639d651efad20b187a0811ea4deac0a52f8bcd322dffbb29d94 /lib/x86_64-linux-gnu/libc.so.6
38705bdbed45a77c2de28bedf5560d6ca016d57861bf60caa42255ceab8f076a /tmp/valgrind
4652774bd116cb49951ef74115ad4237cad5021b2bd4d80002f09d986ec438b9 /usr/bin/valgrind
0369719ef5fe66d467a385299396bab0937002694ffc78027ede22c09d39abf3 /usr/lib/valgrind/default.supp
16b5f1e6ae25663620edb8f8d4a7f1a392e059d6cf9eb20a270129295548ffb2 /usr/lib/valgrind/memcheck-amd64-linux
6335747b07b2e8a6150fbfa777ade9bd80d56626bba9772d61c7d33328e68bda /usr/lib/valgrind/vgpreload_core-amd64-linux.so
827b4c18aefad7788b6e654b1519d3caa1ab223cf7a6ba58d22d7ad7d383b032 /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so
38705bdbed45a77c2de28bedf5560d6ca016d57861bf60caa42255ceab8f076a ./valgrind
Healthy:
b0d9f1bc02b4500cff157d16b2761b9b2420151cc129de37ccdecf6d3005a1e0 /lib64/ld-linux-x86-64.so.2
b0d9f1bc02b4500cff157d16b2761b9b2420151cc129de37ccdecf6d3005a1e0 /lib/x86_64-linux-gnu/ld-2.28.so
701e316140eda639d651efad20b187a0811ea4deac0a52f8bcd322dffbb29d94 /lib/x86_64-linux-gnu/libc-2.28.so
701e316140eda639d651efad20b187a0811ea4deac0a52f8bcd322dffbb29d94 /lib/x86_64-linux-gnu/libc.so.6
38705bdbed45a77c2de28bedf5560d6ca016d57861bf60caa42255ceab8f076a /tmp/valgrind
4652774bd116cb49951ef74115ad4237cad5021b2bd4d80002f09d986ec438b9 /usr/bin/valgrind
391826262f9dc33565a8ac0b762ba860951267e73b0b4db7d02d1fd62782f8c8 /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.28.so
3ab1f160af6c3198de45f286dd569fad7ae976a89ff1655e955ef0544b8b5d6c /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.28.so
ae4ea44f87787b9b80d19a69ad287195dc7840eea08c08732d36d2ef1e6ecff3 /usr/lib/valgrind/default.supp
ba18f39979d22efc89340b839257f953a505ef5ca774b5bf06edd78ecb6ed86e /usr/lib/valgrind/memcheck-amd64-linux
1649637bba73e84b962222f3756cc810c5413239ed180e0029cd98f069612613 /usr/lib/valgrind/vgpreload_core-amd64-linux.so
ab1501fa569e0185dea7248648255276ca965bbe270803dcbb930a22ea7a59b7 /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so
38705bdbed45a77c2de28bedf5560d6ca016d57861bf60caa42255ceab8f076a ./valgrind
Thanks for the helpful comments, especially from Florian, which put me on the right track.

assertion from valgrind mc_main.c

My valgrind runs report errors like this:
Memcheck: mc_main.c:8292 (mc_pre_clo_init): Assertion 'MAX_PRIMARY_ADDRESS == 0x1FFFFFFFFFULL' failed.
What does this mean? Is it a valgrind internal error or an error from my program?
This is a valgrind internal error. This is very weird, as this failed
assertion is a self check done very early on.
You should file a bug on valgrind bugzilla, reporting all the needed
details (version, platform, ...)
From the Valgrind source code (from git HEAD)
/* Only change this. N_PRIMARY_MAP *must* be a power of 2. */
#if VG_WORDSIZE == 4
/* cover the entire address space */
# define N_PRIMARY_BITS 16
#else
/* Just handle the first 128G fast and the rest via auxiliary
primaries. If you change this, Memcheck will assert at startup.
See the definition of UNALIGNED_OR_HIGH for extensive comments. */
# define N_PRIMARY_BITS 21
#endif
/* Do not change this. */
#define N_PRIMARY_MAP ( ((UWord)1) << N_PRIMARY_BITS)
/* Do not change this. */
#define MAX_PRIMARY_ADDRESS (Addr)((((Addr)65536) * N_PRIMARY_MAP)-1)
...
tl_assert(MAX_PRIMARY_ADDRESS == 0x1FFFFFFFFFULL);
So it looks like something has been changed that shouldn't have been.
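The arithmetic behind the assertion can be checked by hand. Below is a small sketch in plain C; max_primary_address and show are hypothetical helpers (not Valgrind functions) that mirror the MAX_PRIMARY_ADDRESS macro quoted above using 64-bit arithmetic.

```c
#include <stdint.h>
#include <stdio.h>

/* Mirrors MAX_PRIMARY_ADDRESS: 65536 * (1 << n_primary_bits) - 1. */
static uint64_t max_primary_address(unsigned n_primary_bits)
{
    return (UINT64_C(65536) << n_primary_bits) - 1;
}

/* Prints the resulting limit for one setting of N_PRIMARY_BITS. */
static void show(unsigned n_primary_bits)
{
    printf("N_PRIMARY_BITS=%u -> 0x%llX\n", n_primary_bits,
           (unsigned long long) max_primary_address(n_primary_bits));
}
```

With N_PRIMARY_BITS set to 21 the result is 2^37 - 1 = 0x1FFFFFFFFF, the 128G limit mentioned in the source comment. A value of 20 would give 0xFFFFFFFFF instead, which would trip the assertion.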

Valgrind: can possibly lost be treated as definitely lost?

Can I treat the output of a Valgrind memcheck, "possibly lost" as "definitely lost"?
Possibly lost, or “dubious”: A pointer to the interior of the block is found. The pointer might originally have pointed to the start and
have been moved along, or it might be entirely unrelated. Memcheck
deems such a block as “dubious”, because it's unclear whether or not a
pointer to it still exists.
Definitely lost, or “leaked”: The worst outcome is that no pointer to the block can be found. The block is classified as “leaked”,
because the programmer could not possibly have freed it at program
exit, since no pointer to it exists. This is likely a symptom of
having lost the pointer at some earlier point in the program
Yes, I recommend treating possibly lost as seriously as definitely lost. In other words, fix your code until nothing is reported lost at all.
Possibly lost can happen when you traverse an array using the same pointer that holds it. You know that you can reset the pointer by subtracting the index, but valgrind can't tell whether this is a programming error or a deliberate trick. That is why it warns you.
Example
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char** argv) {
    char* s = "string";
    // this will allocate a new array
    char* p = strdup(s);
    // move the pointer into the array
    // we know we can reset the pointer by subtracting
    // but valgrind now only sees a pointer into the middle of the block
    p += 1;
    // crash the program
    abort();
    // reset the pointer to the beginning of the array
    p -= 1;
    // properly free the memory for the array
    free(p);
    return 0;
}
Compile
$ gcc -ggdb foo.c -o foo
Valgrind report
$ valgrind ./foo
...
==31539== Process terminating with default action of signal 6 (SIGABRT): dumping core
==31539== at 0x48BBD7F: raise (in /usr/lib/libc-2.28.so)
==31539== by 0x48A6671: abort (in /usr/lib/libc-2.28.so)
==31539== by 0x10917C: main (foo.c:14)
==31539==
==31539== HEAP SUMMARY:
==31539== in use at exit: 7 bytes in 1 blocks
==31539== total heap usage: 1 allocs, 0 frees, 7 bytes allocated
==31539==
==31539== LEAK SUMMARY:
==31539== definitely lost: 0 bytes in 0 blocks
==31539== indirectly lost: 0 bytes in 0 blocks
==31539== possibly lost: 7 bytes in 1 blocks
==31539== still reachable: 0 bytes in 0 blocks
==31539== suppressed: 0 bytes in 0 blocks
...
If you remove abort() then Valgrind will report no memory lost at all. Without abort, the pointer will return to the beginning of the array and the memory will be freed properly.
This is a trivial example. In sufficiently complicated code it is no longer obvious that the pointer can and will return to the beginning of the memory block. Changes in other parts of the code can turn a possibly lost into a definitely lost. That is why you should care about possibly lost.
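One way to avoid the situation, sketched here under the assumption that the traversal only needs to read the data, is to keep the original pointer untouched and walk the block with a separate cursor. The function count_chars is a hypothetical example, not part of the program above.

```c
#include <stdlib.h>
#include <string.h>

/* Copies s to the heap, walks the copy with a separate cursor and frees it.
   Returns the number of characters, or (size_t)-1 if the allocation fails. */
static size_t count_chars(const char *s)
{
    char *base = malloc(strlen(s) + 1); /* base always points to the block start */
    if (base == NULL)
        return (size_t) -1;
    strcpy(base, s);

    size_t n = 0;
    for (const char *cur = base; *cur != '\0'; ++cur) /* only the cursor moves */
        ++n;

    free(base);
    return n;
}
```

Because base keeps pointing to the start of the block for its whole lifetime, an abort in the middle of the loop would leave a start pointer for Memcheck to find, and the block would be classified as still reachable instead of possibly lost.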