Valgrind and where new/delete/malloc/free are called from? - valgrind

I'm running valgrind-3.15.0 on an embedded ARM platform with switches:
--5484-- --tool=memcheck
--5484-- --track-origins=yes
--5484-- --leak-check=full
--5484-- --show-leak-kinds=all
--5484-- --trace-children=yes
--5484-- --sigill-diagnostics=no
--5484-- --keep-debuginfo=yes
--5484-- --num-callers=500
--5484-- --verbose
--5484-- --verbose
--5484-- --demangle=yes
Everyone agrees that debugging information needs to be built into the binary, and I've seen people say use -g, -ggdb, or -ggdb3. I've tried all 3 with no change in output with regards to the problem. I am compiling with optimization O1 which I've read should be OK.
I wrote a contrived program to allocate memory, free it, and then write past the end of free'd memory to see what I'd get:
==5484== Invalid write of size 1
==5484== at 0x5EBF6: SpecificObject::Process() (SpecificObject.cpp:98)
==5484== by 0x10BA1: SSystem::RunSingleSol(unsigned int) (SSystem.cpp:4162)
==5484== by 0x1162F: SSystem::RunSolution(unsigned int) (SSystem.cpp:3772)
==5484== by 0x159DB: SSystem::RunMain() (SSystem.cpp:2765)
==5484== by 0x1E527: SSystem::Enter() (SSystem.cpp:2557)
==5484== by 0x485F43D: MTask::MTaskEnter(void*) (MTask.cpp:183)
==5484== by 0x4886E35: ThreadCaller(void*) (WinAbstract.cpp:1562)
==5484== by 0x4EF0F0F: start_thread (pthread_create.c:458)
==5484== by 0x4C0AF57: ??? (clone.S:86)
==5484== Address 0x64b9752 is 1 bytes after a block of size 25 free'd
==5484== at 0x4836D18: operator delete[](void*) (in /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)
==5484== Block was alloc'd at
==5484== at 0x483584C: operator new[](unsigned int) (in /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)
In SpecificObject::Process, I have this test code in a switch/case statement (the only other code in this case is a break):
char *Crash = new char[25];
for(i = 0 ; i < 25 ; i++)
Crash[i] = 0xFF;
delete []Crash;
// use after free.
Crash[26] = 10;
Line 98 is the "Crash[26]=10" which makes sense since that is the overwrite and the allocation via new is only a few lines above it.
However no matter what I do, I cannot get new nor delete to show me anything except what's above, I see other people have tracebacks from where exactly new/delete/malloc/free was called from. For example, from some other random example on Stack Overflow, someone said they see:
==9700== Uninitialised value was created by a heap allocation
==9700== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9700== by 0x403D6F: get_all_system_info (kernel.c:118)
==9700== by 0x547DE99: start_thread (pthread_create.c:308)
==9700== by 0x57873FC: clone (clone.S:112)
As you can see, it shows that malloc was called by get_all_system_info(), etc... (again that's just a random example of what I'm trying to see)
How do I obtain this information?

Related

valgrind thinks calloc allocated memory is ununitialized

From the linux manual page on calloc, we learn that:
"The calloc() function allocates memory for an array of nmemb elements of size bytes each and returns a pointer to the allocated memory. The memory is set to zero."
When it is set to zero, it means it is initialized.
Yet, valgrind will report this...
Syscall param writev(vector[...]) points to uninitialised byte(s)
...
Address 0x28805be0 is 32 bytes inside a block of size 16,384 alloc'd
at 0x4849A83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
...on memory that was allocated as calloc(1,16384)
How can calloc-allocated memory ever be considered as uninitialized by Valgrind?
OS: Ubuntu 22.10
Kernel: 5.19.0
Valgrind: 3.18.1
UPDATE: I tried valgrind 3.20 as well: same behaviour.
This happens because after the initial clearing with zeros, it is overwritten with other data that was uninitialized.
To see what other data got put in there, you can use the valgrind flag --track-origins=yes

Valgrind reports "invalid write" at "X bytes below stack pointer"

I'm running some code under Valgrind, compiled with gcc 7.5 targeting an aarch64 (ARM 64 bits) architecture, with optimizations enabled.
I get the following error:
==3580== Invalid write of size 8
==3580== at 0x38865C: ??? (in ...)
==3580== Address 0x1ffeffdb70 is on thread 1's stack
==3580== 16 bytes below stack pointer
This is the assembly dump in the vicinity of the offending code:
388640: a9bd7bfd stp x29, x30, [sp, #-48]!
388644: f9000bfc str x28, [sp, #16]
388648: a9024ff4 stp x20, x19, [sp, #32]
38864c: 910003fd mov x29, sp
388650: d1400bff sub sp, sp, #0x2, lsl #12
388654: 90fff3f4 adrp x20, 204000 <_IO_stdin_used-0x4f0>
388658: 3dc2a280 ldr q0, [x20, #2688]
38865c: 3c9f0fe0 str q0, [sp, #-16]!
I'm trying to ascertain whether this is a possible bug in my code (note that I've thoroughly reviewed my code and I'm fairly confident it's correct), or whether Valgrind will blindly report any writes below the stack pointer as an error.
Assuming the latter, it looks like a Valgrind bug since the offending instruction at 0x38865c uses the pre-decrement addressing mode, so it's not actually writing below the stack pointer.
Furthermore, at address 0x388640 a similar access (and again with pre-decrement addressing mode) is performed, yet this isn't reported by Valgrind; the main difference being the use of an x register at address 0x388640 versus a q register at address 38865c.
I'd also like to draw attention to the large stack pointer subtraction at 0x388650, which may or may not have anything to do with the issue (note this subtraction makes sense, given that the offending C code declares a large array on the stack).
So, will anyone help me make sense of this, and whether I should worry about my code?
The code looks fine, and the write is certainly not below the stack pointer. The message seems to be a valgrind bug, possibly #432552, which is marked as fixed. OP confirms that the message is not produced after upgrading valgrind to 3.17.0.
code declares a large array on the stack
should [I] worry about my code?
I think it depends upon your desire for your code to be more portable.
Take this bit of code that I believe represents at least one important thing you mentioned in your post:
#include <stdio.h>
#include <stdlib.h>
long long foo (long long sz, long long v) {
long long arr[sz]; // allocating a variable on the stack
arr[sz-1] = v;
return arr[sz-1];
}
int main (int argc, char *argv[]) {
long long n = atoll(argv[1]);
long long v = foo(n, n);
printf("v = %lld\n", v);
}
$ uname -mprsv
Darwin 20.5.0 Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64 x86_64 i386
$ gcc test.c
$ a.out 1047934
v = 1047934
$ a.out 1047935
Segmentation fault: 11
$ uname -snrvmp
Linux localhost.localdomain 3.19.8-100.fc20.x86_64 #1 SMP Tue May 12 17:08:50 UTC 2015 x86_64 x86_64
$ gcc test.c
$ ./a.out 2147483647
v = 2147483647
$ ./a.out 2147483648
v = 2147483648
There are at least some minor portability concerns with this code. The amount of allocatable stack memory for these two environments differs significantly. And that's only for two platforms. Haven't tried it on my Windows 10 vm but I don't think I need to because I got bit by this one a long time ago.
Beyond OP issue that was due to a Valgrind bug, the title of this question is bound to attract more people (like me) who are getting "invalid write at X bytes below stack pointer" as a legitimate error.
My piece of advice: check that the address you're writing to is not a local variable of another function (not present in the call stack)!
I stumbled upon this issue while attempting to write into the address returned by yyget_lloc(yyscanner) while outside of function yyparse (the former returns the address of a local variable in the latter).

Memory problems with LibGit2 initialization

When I initialize and shutdown LibGit2 I am left with reachable memory and/or errors.
My test systems are Ubuntu 18.04 with libgit2 0.26 where g++ -v gives me gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1) and a FreeBSD 11.3 VM with libgit 0.28.3 where, unfortunately, I can't copy & paste from. Here g++ -v gives gcc version 9.2.0 (FreeBSD Ports Collection.
This is a minimal example:
#include <git2.h>
int main () {
git_libgit2_init();
git_libgit2_shutdown();
return 0;
}
On Ubuntu I run the following:
➜ libelektra git:(libgit_test) ✗ g++ minimal.c -lgit2 && valgrind ./a.out
==1174== Memcheck, a memory error detector
==1174== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1174== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==1174== Command: ./a.out
==1174==
==1174==
==1174== HEAP SUMMARY:
==1174== in use at exit: 192 bytes in 12 blocks
==1174== total heap usage: 1,354 allocs, 1,342 frees, 107,044 bytes allocated
==1174==
==1174== LEAK SUMMARY:
==1174== definitely lost: 0 bytes in 0 blocks
==1174== indirectly lost: 0 bytes in 0 blocks
==1174== possibly lost: 0 bytes in 0 blocks
==1174== still reachable: 192 bytes in 12 blocks
==1174== suppressed: 0 bytes in 0 blocks
==1174== Rerun with --leak-check=full to see details of leaked memory
==1174==
==1174== For counts of detected and suppressed errors, rerun with: -v
==1174== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Why do I have reachable memory, when the very first example from the documentation says that git_libgit2_shutdown(); should clean everything up?
While the Valgrind documentation says that some reachable memory might be ok, things get quite wild on FreeBSD. I have some screenshots of the VM
One Two Three.
How can I avoid this?
One additional remark on different memory handling. My goal is to use the git_merge_file function in this project. It should look something like this:
#include <git2.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
int main () {
git_libgit2_init();
sleep (1);
git_merge_file_result out = { 0 }; // out.ptr will not receive a terminating null character
git_merge_file_input libgit_base;
git_merge_file_input libgit_our;
git_merge_file_input libgit_their;
git_merge_file_init_input(&libgit_base, GIT_MERGE_FILE_INPUT_VERSION);
git_merge_file_init_input(&libgit_our, GIT_MERGE_FILE_INPUT_VERSION);
git_merge_file_init_input(&libgit_their, GIT_MERGE_FILE_INPUT_VERSION);
libgit_base.ptr = "A";
libgit_base.size = strlen("A");
libgit_our.ptr = "A";
libgit_our.size = strlen("A");
libgit_their.ptr = "A";
libgit_their.size = strlen("A");
int exitCode = git_merge_file (&out, &libgit_base, &libgit_our, &libgit_their, 0);
printf("Code is %d\n", exitCode);
git_merge_file_result_free (&out);
git_libgit2_shutdown();
sleep (1);
return 0;
}
When I remove initialization and/or shutdown I sometimes got 0 still reachable memory on Ubuntu but segmentation faults on FreeBSD. Is it worth giving this a closer look or is such a difference in behavior normal when ignoring the that LibGit must be initialized?
In the screenshots of the BSD VM __pthread_once is visible as a source of problems. This and __pthread_once_slow seem to be involved in all the errors: The 192 bytes on Ubuntu in the beginning, the more advanced example at the bottom with BSD and Ubuntu and also my real application.
As far as I can see, there's nothing wrong with your code, or the Valgrind report by itself, as as you've pointed out:
"still reachable" means your program is probably ok -- it didn't free some memory it could have. This is quite common and often reasonable. Don't use --show-reachable=yes if you don't want to see these reports.
Hence, it's likely the 192 bytes aren't really leaked, you've just managed to exit the program before the OS decided to grab back that block of memory — ie. it kept that block under the process's purview, as a optimisation for the next allocation to be made. In this case, the process just exited, so that memory will have to be reclaimed at process termination, and I think that's what "still reachable" means — memory that is fine, and will be reclaimed normally. Hopefully 😉.
The Valgrind errors on FreeBSD aren't allocation problems, but use of an uninitialized zone of memory. They don't look to be inside libgit2 but OpenSSL itself, while parsing certificates (?). You can find the underlying OpenSSL initialization starting from here.
Is it worth giving this a closer look or is such a difference in behavior normal when ignoring the that LibGit must be initialized?
I'm tempted to say no, and yes. The code is now prodding a memory location that contains random garbage instead of an stack-allocated pthread_something. Segfaults are bound to happen randomly.
HTH !

Valgrind suppression and return code

It looks like valgrind returns non-zero return code when it detects memory-leak even though they are listed in the suppression file.
No-errors are displayed but yet the return code is 134. This fails all my builds in jenkins... Is there a way around this or am I doing something wrong?
You are very probably doing something wrong (or maybe using a buggy old version of valgrind, the below is with the just released 3.12) :
valgrind --leak-check=full --errors-for-leak-kinds=all --error-exitcode=33
--suppressions=t.supp ./memcheck/tests/trivialleak
...
==22750== suppressed: 1,000 bytes in 1,000 blocks
...
echo $?
0
while without suppression file:
valgrind --leak-check=full --errors-for-leak-kinds=all --error-exitcode=33
./memcheck/tests/trivialleak
...
==22760== 1,000 bytes in 1,000 blocks are definitely lost in loss record 1 of 1
...
echo $?
33

Valgrind: can possibly lost be treated as definitely lost?

Can I treat the output of a Valgrind memcheck, "possibly lost" as "definitely lost"?
Possibly lost, or “dubious”: A pointer to the interior of the block is found. The pointer might originally have pointed to the start and
have been moved along, or it might be entirely unrelated. Memcheck
deems such a block as “dubious”, because it's unclear whether or not a
pointer to it still exists.
Definitely lost, or “leaked”: The worst outcome is that no pointer to the block can be found. The block is classified as “leaked”,
because the programmer could not possibly have freed it at program
exit, since no pointer to it exists. This is likely a symptom of
having lost the pointer at some earlier point in the program
Yes, I recommend to treat possibly lost as severe as definitely lost. In other words, fix your code until there are no losts at all.
Possibly lost can happen when you traverse an array using the same pointer that is holding it. You know that you can reset the pointer by subtracting the index. But valgrind can't tell whether it is a programming error or you are being clever doing this deliberately. That is why it warns you.
Example
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char** argv) {
char* s = "string";
// this will allocate a new array
char* p = strdup(s);
// move the pointer into the array
// we know we can reset the pointer by subtracting
// but for valgrind the array is now lost
p += 1;
// crash the program
abort();
// reset the pointer to the beginning of the array
p -= 1;
// properly free the memory for the array
free(p);
return 0;
}
Compile
$ gcc -ggdb foo.c -o foo
Valgrind report
$ valgrind ./foo
...
==31539== Process terminating with default action of signal 6 (SIGABRT): dumping core
==31539== at 0x48BBD7F: raise (in /usr/lib/libc-2.28.so)
==31539== by 0x48A6671: abort (in /usr/lib/libc-2.28.so)
==31539== by 0x10917C: main (foo.c:14)
==31539==
==31539== HEAP SUMMARY:
==31539== in use at exit: 7 bytes in 1 blocks
==31539== total heap usage: 1 allocs, 0 frees, 7 bytes allocated
==31539==
==31539== LEAK SUMMARY:
==31539== definitely lost: 0 bytes in 0 blocks
==31539== indirectly lost: 0 bytes in 0 blocks
==31539== possibly lost: 7 bytes in 1 blocks
==31539== still reachable: 0 bytes in 0 blocks
==31539== suppressed: 0 bytes in 0 blocks
...
If you remove abort() then Valgrind will report no memory lost at all. Without abort, the pointer will return to the beginning of the array and the memory will be freed properly.
This is a trivial example. In sufficiently complicated code it is no longer obvious that the pointer can and will return to the beginning of the memory block. Changes in other part of the code can cause the possibly lost to be a definitely lost. That is why you should care about possibly lost.