More Memcheck context lines - valgrind

At the end of the program run, "leak check full" "show leak kinds all" shows a number of leak incidents in the form of sections like this, just as an example:
[...]
==12522==
==12522== x bytes in 1 blocks are still reachable in loss record y of z
==12522== at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==12522== by 0x42350D: whatever_1 (whatever_1.c:3401)
==12522== by 0x4225B9: whatever_2 (whatever_2.c:2948)
==12522== by 0x41D93E: whatever_3 (whatever_3.c:1185)
==12522== by 0x423A07: whatever_4 (whatever_4.c:1370)
==12522== by 0x4294A4: whatever_5 (whatever_5.y:1413)
==12522== by 0x403998: whatever_6 (whatever_6.c:205)
==12522== by 0x4038AC: whatever_7 (whatever_7.c:180)
==12522== by 0x4036AF: whatever_8 (whatever_8.c:143)
==12522== by 0x413C7F: whatever_test_0 (whatever_9.c:53)
==12522== by 0x413F92: whatever_tests (whatever_10.c:177)
==12522== by 0x45AD69: run_tests (whatever_11.c:153)
==12522==
[...]
The topmost scope is main, and in multiple blocks, like this one, there is enough depth that the call stack doesn't show up to main. In other incidents, there's enough context to show every level.
I've seen that, in this host, only 12 levels are shown here (vg_replace_malloc.c then my 11 user levels).
Can this be configured to show more than 12 levels?
I've read Valgrind arguments, Memcheck arguments, have increased verbosity, with no success.

Use the option --num-callers (that has default value 12):
--num-callers=<number> show <number> callers in stack traces [12]

Related

Valgrind and where new/delete/malloc/free are called from?

I'm running valgrind-3.15.0 on an embedded ARM platform with switches:
--5484-- --tool=memcheck
--5484-- --track-origins=yes
--5484-- --leak-check=full
--5484-- --show-leak-kinds=all
--5484-- --trace-children=yes
--5484-- --sigill-diagnostics=no
--5484-- --keep-debuginfo=yes
--5484-- --num-callers=500
--5484-- --verbose
--5484-- --verbose
--5484-- --demangle=yes
Everyone agrees that debugging information needs to be built into the binary, and I've seen people say use -g, -ggdb, or -ggdb3. I've tried all 3 with no change in output with regards to the problem. I am compiling with optimization O1 which I've read should be OK.
I wrote a contrived program to allocate memory, free it, and then write past the end of free'd memory to see what I'd get:
==5484== Invalid write of size 1
==5484== at 0x5EBF6: SpecificObject::Process() (SpecificObject.cpp:98)
==5484== by 0x10BA1: SSystem::RunSingleSol(unsigned int) (SSystem.cpp:4162)
==5484== by 0x1162F: SSystem::RunSolution(unsigned int) (SSystem.cpp:3772)
==5484== by 0x159DB: SSystem::RunMain() (SSystem.cpp:2765)
==5484== by 0x1E527: SSystem::Enter() (SSystem.cpp:2557)
==5484== by 0x485F43D: MTask::MTaskEnter(void*) (MTask.cpp:183)
==5484== by 0x4886E35: ThreadCaller(void*) (WinAbstract.cpp:1562)
==5484== by 0x4EF0F0F: start_thread (pthread_create.c:458)
==5484== by 0x4C0AF57: ??? (clone.S:86)
==5484== Address 0x64b9752 is 1 bytes after a block of size 25 free'd
==5484== at 0x4836D18: operator delete[](void*) (in /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)
==5484== Block was alloc'd at
==5484== at 0x483584C: operator new[](unsigned int) (in /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)
In SpecificObject::Process, I have this test code in a switch/case statement (the only other code in this case is a break):
char *Crash = new char[25];
for(i = 0 ; i < 25 ; i++)
Crash[i] = 0xFF;
delete []Crash;
// use after free.
Crash[26] = 10;
Line 98 is the "Crash[26]=10" which makes sense since that is the overwrite and the allocation via new is only a few lines above it.
However no matter what I do, I cannot get new nor delete to show me anything except what's above, I see other people have tracebacks from where exactly new/delete/malloc/free was called from. For example, from some other random example on Stack Overflow, someone said they see:
==9700== Uninitialised value was created by a heap allocation
==9700== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9700== by 0x403D6F: get_all_system_info (kernel.c:118)
==9700== by 0x547DE99: start_thread (pthread_create.c:308)
==9700== by 0x57873FC: clone (clone.S:112)
As you can see, it shows that malloc was called by get_all_system_info(), etc... (again that's just a random example of what I'm trying to see)
How do I obtain this information?

valgrind more allocs than frees but no leaks

I have encountered a problem..
When i run the valgrind with my program i've got the following output and it confuse me:
==12919== HEAP SUMMARY:
==12919== in use at exit: 97,820 bytes in 1 blocks
==12919== total heap usage: 17 allocs, 16 frees, 99,388 bytes allocated
==12919==
==12919== LEAK SUMMARY:
==12919== definitely lost: 0 bytes in 0 blocks
==12919== indirectly lost: 0 bytes in 0 blocks
==12919== possibly lost: 0 bytes in 0 blocks
==12919== still reachable: 97,820 bytes in 1 blocks
==12919== suppressed: 0 bytes in 0 blocks
==12919== Reachable blocks (those to which a pointer was found) are not shown.
==12919== To see them, rerun with: --leak-check=full --show-reachable=yes
==12919==
==12919== For counts of detected and suppressed errors, rerun with: -v
==12919== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)
I found that it was caused by compilation with "-pg" flag. Without it all is ok!
From the valgrind faq
5.2. With Memcheck's memory leak detector, what's the difference between "definitely lost", "indirectly lost", "possibly lost", "still reachable", and "suppressed"?
The details are in the Memcheck section of the user manual.
In short:
"definitely lost" means your program is leaking memory -- fix those leaks!
"indirectly lost" means your program is leaking memory in a pointer-based structure. (E.g. if the root node of a binary tree is "definitely lost", all the children will be "indirectly lost".) If you fix the "definitely lost" leaks, the "indirectly lost" leaks should go away.
"possibly lost" means your program is leaking memory, unless you're doing unusual things with pointers that could cause them to point into the middle of an allocated block; see the user manual for some possible causes. Use --show-possibly-lost=no if you don't want to see these reports.
"still reachable" means your program is probably ok -- it didn't free some memory it could have. This is quite common and often reasonable. Don't use --show-reachable=yes if you don't want to see these reports.
"suppressed" means that a leak error has been suppressed. There are some suppressions in the default suppression files. You can ignore suppressed errors.

Can valgrind output partial reports without having to quit the profiled application?

I want to check a long running process for memory leaks with valgrind. I suspect the memory leak I'm after might happen only after several hours of execution. I can run the app under valgrind and get the valgrind log just fine, but doing so means I have to quit the application and start it again anew for a new valgrind session for which I would still have to wait several hours. Is it possible to keep valgrind and the app running and still get valgrind's (partial) data at any point during execution?
You can do that by using the Valgrind gdbserver and GDB.
In short, you start your program with valgrind as usual, but with the --vgdb=yes switch:
$ valgrind --tool=memcheck --vgdb=yes ./a.out
In another session, you start gdb on the same executable, and connect to valgrind. You can then issue valgrind commands:
$ gdb ./a.out
...
(gdb) target remote | vgdb
....
(gdb) monitor leak_check full reachable any
==8677== 32 bytes in 1 blocks are definitely lost in loss record 1 of 2
==8677== at 0x4C28E3D: malloc (vg_replace_malloc.c:263)
==8677== by 0x400591: foo (in /home/me/tmp/a.out)
==8677== by 0x4005A7: main (in /home/me/tmp/a.out)
==8677==
==8677== 32 bytes in 1 blocks are definitely lost in loss record 2 of 2
==8677== at 0x4C28E3D: malloc (vg_replace_malloc.c:263)
==8677== by 0x400591: foo (in /home/me/tmp/a.out)
==8677== by 0x4005AC: main (in /home/me/tmp/a.out)
==8677==
==8677== LEAK SUMMARY:
==8677== definitely lost: 64 bytes in 2 blocks
==8677== indirectly lost: 0 bytes in 0 blocks
==8677== possibly lost: 0 bytes in 0 blocks
==8677== still reachable: 0 bytes in 0 blocks
==8677== suppressed: 0 bytes in 0 blocks
==8677==
(gdb)
See the manual for a list of commands, here for memcheck.

How to print disassembly registers in the Xcode console

I'm looking at some disassembly code and see something like 0x01c8f09b <+0015> mov 0x8(%edx),%edi and I am wondering what the value of %edx or %edi is.
Is there a way to print the value of %edx or other assembly variables? Is there a way to print the value at the memory address that %edx points at (I'm assuming edx is a register containing a pointer to ... something here).
For example, you can print an objet by typing po in the console, so is there a command or syntax for printing registers/variables in the assembly?
Background:
I'm getting EXC_BAD_ACCESS on this line and I would like to debug what is going on. I'm aware this error is related to memory management and I'm looking at figuring out where I may be missing/too-many retain/release/autorelease calls.
Additional Info:
This is on IOS, and my application is running in the iPhone simulator.
You can print a register (e.g, eax) using:
print $eax
Or for short:
p $eax
To print it as hexadecimal:
p/x $eax
To display the value pointed to by a register:
x $eax
Check the gdb help for more details:
help print
help x
Depends up which Xcode compiler/debugger you are using. For gcc/gdb it's
info registers
but for clang/lldb it's
register read
(gdb) info reg
eax 0xe 14
ecx 0x2844e0 2639072
edx 0x285360 2642784
ebx 0x283ff4 2637812
esp 0xbffff350 0xbffff350
ebp 0xbffff368 0xbffff368
esi 0x0 0
edi 0x0 0
eip 0x80483f9 0x80483f9 <main+21>
eflags 0x246 [ PF ZF IF ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
From Debugging with gdb:
You can refer to machine register contents, in expressions, as variables with names
starting with `$'. The names of registers are different for each machine; use info
registers to see the names used on your machine.
info registers
Print the names and values of all registers except floating-point
registers (in the selected stack frame).
info all-registers
Print the names and values of all registers, including floating-point
registers.
info registers regname ...
Print the relativized value of each specified register regname.
regname may be any register name valid on the machine you are using,
with or without the initial `$'.
If you are using LLDB instead of GDB you can use register read
Those are not variables, but registers.
In GDB, you can see the values of standard registers by using the following command:
info registers
Note that a register contains integer values (32bits in your case, as the register name is prefixed by e). What it represent is not known. It can be a pointer, an integer, mostly anything.
If po crashes when you try to print a register's value as a pointer, it's likely that the value is not a pointer (or an invalid one).

Valgrind: can possibly lost be treated as definitely lost?

Can I treat the output of a Valgrind memcheck, "possibly lost" as "definitely lost"?
Possibly lost, or “dubious”: A pointer to the interior of the block is found. The pointer might originally have pointed to the start and
have been moved along, or it might be entirely unrelated. Memcheck
deems such a block as “dubious”, because it's unclear whether or not a
pointer to it still exists.
Definitely lost, or “leaked”: The worst outcome is that no pointer to the block can be found. The block is classified as “leaked”,
because the programmer could not possibly have freed it at program
exit, since no pointer to it exists. This is likely a symptom of
having lost the pointer at some earlier point in the program
Yes, I recommend to treat possibly lost as severe as definitely lost. In other words, fix your code until there are no losts at all.
Possibly lost can happen when you traverse an array using the same pointer that is holding it. You know that you can reset the pointer by subtracting the index. But valgrind can't tell whether it is a programming error or you are being clever doing this deliberately. That is why it warns you.
Example
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char** argv) {
char* s = "string";
// this will allocate a new array
char* p = strdup(s);
// move the pointer into the array
// we know we can reset the pointer by subtracting
// but for valgrind the array is now lost
p += 1;
// crash the program
abort();
// reset the pointer to the beginning of the array
p -= 1;
// properly free the memory for the array
free(p);
return 0;
}
Compile
$ gcc -ggdb foo.c -o foo
Valgrind report
$ valgrind ./foo
...
==31539== Process terminating with default action of signal 6 (SIGABRT): dumping core
==31539== at 0x48BBD7F: raise (in /usr/lib/libc-2.28.so)
==31539== by 0x48A6671: abort (in /usr/lib/libc-2.28.so)
==31539== by 0x10917C: main (foo.c:14)
==31539==
==31539== HEAP SUMMARY:
==31539== in use at exit: 7 bytes in 1 blocks
==31539== total heap usage: 1 allocs, 0 frees, 7 bytes allocated
==31539==
==31539== LEAK SUMMARY:
==31539== definitely lost: 0 bytes in 0 blocks
==31539== indirectly lost: 0 bytes in 0 blocks
==31539== possibly lost: 7 bytes in 1 blocks
==31539== still reachable: 0 bytes in 0 blocks
==31539== suppressed: 0 bytes in 0 blocks
...
If you remove abort() then Valgrind will report no memory lost at all. Without abort, the pointer will return to the beginning of the array and the memory will be freed properly.
This is a trivial example. In sufficiently complicated code it is no longer obvious that the pointer can and will return to the beginning of the memory block. Changes in other part of the code can cause the possibly lost to be a definitely lost. That is why you should care about possibly lost.