Valgrind reports "invalid write" at "X bytes below stack pointer" - valgrind

I'm running some code under Valgrind, compiled with gcc 7.5 targeting an aarch64 (ARM 64 bits) architecture, with optimizations enabled.
I get the following error:
==3580== Invalid write of size 8
==3580== at 0x38865C: ??? (in ...)
==3580== Address 0x1ffeffdb70 is on thread 1's stack
==3580== 16 bytes below stack pointer
This is the assembly dump in the vicinity of the offending code:
388640: a9bd7bfd stp x29, x30, [sp, #-48]!
388644: f9000bfc str x28, [sp, #16]
388648: a9024ff4 stp x20, x19, [sp, #32]
38864c: 910003fd mov x29, sp
388650: d1400bff sub sp, sp, #0x2, lsl #12
388654: 90fff3f4 adrp x20, 204000 <_IO_stdin_used-0x4f0>
388658: 3dc2a280 ldr q0, [x20, #2688]
38865c: 3c9f0fe0 str q0, [sp, #-16]!
I'm trying to ascertain whether this is a possible bug in my code (note that I've thoroughly reviewed my code and I'm fairly confident it's correct), or whether Valgrind will blindly report any writes below the stack pointer as an error.
Assuming the latter, it looks like a Valgrind bug since the offending instruction at 0x38865c uses the pre-decrement addressing mode, so it's not actually writing below the stack pointer.
Furthermore, at address 0x388640 a similar access (again with pre-decrement addressing mode) is performed, yet it isn't reported by Valgrind; the main difference is the use of an x register at address 0x388640 versus a q register at address 0x38865c.
I'd also like to draw attention to the large stack pointer subtraction at 0x388650, which may or may not have anything to do with the issue (note this subtraction makes sense, given that the offending C code declares a large array on the stack).
So, will anyone help me make sense of this, and whether I should worry about my code?

The code looks fine, and the write is certainly not below the stack pointer. The message seems to be a valgrind bug, possibly #432552, which is marked as fixed. OP confirms that the message is not produced after upgrading valgrind to 3.17.0.

code declares a large array on the stack
should [I] worry about my code?
I think it depends upon your desire for your code to be more portable.
Take this bit of code that I believe represents at least one important thing you mentioned in your post:
#include <stdio.h>
#include <stdlib.h>
long long foo (long long sz, long long v) {
    long long arr[sz]; // variable-length array allocated on the stack
    arr[sz-1] = v;
    return arr[sz-1];
}
int main (int argc, char *argv[]) {
    if (argc < 2) return 1; // a size argument is required
    long long n = atoll(argv[1]);
    long long v = foo(n, n);
    printf("v = %lld\n", v);
}
$ uname -mprsv
Darwin 20.5.0 Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64 x86_64 i386
$ gcc test.c
$ a.out 1047934
v = 1047934
$ a.out 1047935
Segmentation fault: 11
$ uname -snrvmp
Linux localhost.localdomain 3.19.8-100.fc20.x86_64 #1 SMP Tue May 12 17:08:50 UTC 2015 x86_64 x86_64
$ gcc test.c
$ ./a.out 2147483647
v = 2147483647
$ ./a.out 2147483648
v = 2147483648
There are at least some minor portability concerns with this code. The amount of allocatable stack memory differs significantly between these two environments, and that's only two platforms. I haven't tried it on my Windows 10 VM, but I don't think I need to, because I got bitten by this one a long time ago.

Beyond the OP's issue, which was due to a Valgrind bug, the title of this question is bound to attract more people (like me) who are getting "invalid write at X bytes below stack pointer" as a legitimate error.
My piece of advice: check that the address you're writing to is not a local variable of another function (not present in the call stack)!
I stumbled upon this issue while attempting to write into the address returned by yyget_lloc(yyscanner) while outside of function yyparse (the former returns the address of a local variable in the latter).

Related

Memory problems with LibGit2 initialization

When I initialize and shutdown LibGit2 I am left with reachable memory and/or errors.
My test systems are Ubuntu 18.04 with libgit2 0.26, where g++ -v gives me gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1), and a FreeBSD 11.3 VM with libgit2 0.28.3, from which, unfortunately, I can't copy & paste. There g++ -v gives gcc version 9.2.0 (FreeBSD Ports Collection).
This is a minimal example:
#include <git2.h>
int main () {
    git_libgit2_init();
    git_libgit2_shutdown();
    return 0;
}
On Ubuntu I run the following:
➜ libelektra git:(libgit_test) ✗ g++ minimal.c -lgit2 && valgrind ./a.out
==1174== Memcheck, a memory error detector
==1174== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1174== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==1174== Command: ./a.out
==1174==
==1174==
==1174== HEAP SUMMARY:
==1174== in use at exit: 192 bytes in 12 blocks
==1174== total heap usage: 1,354 allocs, 1,342 frees, 107,044 bytes allocated
==1174==
==1174== LEAK SUMMARY:
==1174== definitely lost: 0 bytes in 0 blocks
==1174== indirectly lost: 0 bytes in 0 blocks
==1174== possibly lost: 0 bytes in 0 blocks
==1174== still reachable: 192 bytes in 12 blocks
==1174== suppressed: 0 bytes in 0 blocks
==1174== Rerun with --leak-check=full to see details of leaked memory
==1174==
==1174== For counts of detected and suppressed errors, rerun with: -v
==1174== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Why do I have reachable memory, when the very first example from the documentation says that git_libgit2_shutdown(); should clean everything up?
While the Valgrind documentation says that some reachable memory might be ok, things get quite wild on FreeBSD. I have some screenshots of the VM: One, Two, Three.
How can I avoid this?
One additional remark on different memory handling. My goal is to use the git_merge_file function in this project. It should look something like this:
#include <git2.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
int main () {
    git_libgit2_init();
    sleep (1);
    git_merge_file_result out = { 0 }; // out.ptr will not receive a terminating null character
    git_merge_file_input libgit_base;
    git_merge_file_input libgit_our;
    git_merge_file_input libgit_their;
    git_merge_file_init_input(&libgit_base, GIT_MERGE_FILE_INPUT_VERSION);
    git_merge_file_init_input(&libgit_our, GIT_MERGE_FILE_INPUT_VERSION);
    git_merge_file_init_input(&libgit_their, GIT_MERGE_FILE_INPUT_VERSION);
    libgit_base.ptr = "A";
    libgit_base.size = strlen("A");
    libgit_our.ptr = "A";
    libgit_our.size = strlen("A");
    libgit_their.ptr = "A";
    libgit_their.size = strlen("A");
    int exitCode = git_merge_file (&out, &libgit_base, &libgit_our, &libgit_their, 0);
    printf("Code is %d\n", exitCode);
    git_merge_file_result_free (&out);
    git_libgit2_shutdown();
    sleep (1);
    return 0;
}
When I remove initialization and/or shutdown, I sometimes get 0 bytes of still reachable memory on Ubuntu but segmentation faults on FreeBSD. Is it worth giving this a closer look, or is such a difference in behavior normal when ignoring the fact that LibGit2 must be initialized?
In the screenshots of the BSD VM __pthread_once is visible as a source of problems. This and __pthread_once_slow seem to be involved in all the errors: The 192 bytes on Ubuntu in the beginning, the more advanced example at the bottom with BSD and Ubuntu and also my real application.
As far as I can see, there's nothing wrong with your code, or with the Valgrind report by itself, as you've pointed out:
"still reachable" means your program is probably ok -- it didn't free some memory it could have. This is quite common and often reasonable. Don't use --show-reachable=yes if you don't want to see these reports.
Hence, it's likely the 192 bytes aren't really leaked. "Still reachable" means that a pointer to each of those blocks still existed when the process exited, so the program could have freed them but didn't. That is typical of one-time allocations kept in a global or static variable (the __pthread_once frames you mention would fit that pattern), and the OS reclaims such memory at process termination anyway. So that memory is fine. Hopefully 😉.
The Valgrind errors on FreeBSD aren't allocation problems, but use of an uninitialized zone of memory. They don't look to be inside libgit2 but OpenSSL itself, while parsing certificates (?). You can find the underlying OpenSSL initialization starting from here.
Is it worth giving this a closer look, or is such a difference in behavior normal when ignoring the fact that LibGit2 must be initialized?
I'm tempted to say no, and yes. The code is now prodding a memory location that contains random garbage instead of a stack-allocated pthread_something. Segfaults are bound to happen randomly.
HTH !

Why is valgrind complaining about the perfectly fine initialized buffer?

This is the test code "valgrind.c". It initializes an on stack buffer, then does a simple string compare over it.
#include <stdlib.h>
#include <string.h>
int main( void)
{
    char buf[ 6];
    memset( buf, 'X', sizeof( buf));
    if( strncmp( buf, "XXXX", 4))
        abort();
    return( 0);
}
I compile this with cc -O0 -g valgrind.c -o valgrind.
Running on its own, it does fine.
When I run it through valgrind --track-origins=yes ./valgrind though this gives me:
==28182== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==28182== Conditional jump or move depends on uninitialised value(s)
==28182== at 0x4E058CC: ??? (in /lib/x86_64-linux-gnu/libc-2.28.so)
==28182== by 0x4CAA09A: ??? (in /lib/x86_64-linux-gnu/libc-2.28.so)
==28182== Uninitialised value was created by a stack allocation
==28182== at 0x4CA9FBD: ??? (in /lib/x86_64-linux-gnu/libc-2.28.so)
That really makes no sense to me. I am running this on Ubuntu 18.10.
The answer was that the valgrind libraries were buggy. After a complete dist-upgrade, things work now as expected. The version number of valgrind and the executable remain the same though (my current dpkg number is now 1:3.13.0-2ubuntu6, I forgot to jot down the old one, sorry).
These are the libraries that strace showed being opened, with their shasums. There is actually a difference in the libraries opened, and you can see that libc, the test executable and the valgrind executable are unchanged between the two scenarios:
Broken:
41bd206c714bcd2be561b477d756a4104dddd2d3578040cca30ff06d19730d61 /etc/ld.so.cache
b0d9f1bc02b4500cff157d16b2761b9b2420151cc129de37ccdecf6d3005a1e0 /lib64/ld-linux-x86-64.so.2
b0d9f1bc02b4500cff157d16b2761b9b2420151cc129de37ccdecf6d3005a1e0 /lib/x86_64-linux-gnu/ld-2.28.so
701e316140eda639d651efad20b187a0811ea4deac0a52f8bcd322dffbb29d94 /lib/x86_64-linux-gnu/libc-2.28.so
701e316140eda639d651efad20b187a0811ea4deac0a52f8bcd322dffbb29d94 /lib/x86_64-linux-gnu/libc.so.6
38705bdbed45a77c2de28bedf5560d6ca016d57861bf60caa42255ceab8f076a /tmp/valgrind
4652774bd116cb49951ef74115ad4237cad5021b2bd4d80002f09d986ec438b9 /usr/bin/valgrind
0369719ef5fe66d467a385299396bab0937002694ffc78027ede22c09d39abf3 /usr/lib/valgrind/default.supp
16b5f1e6ae25663620edb8f8d4a7f1a392e059d6cf9eb20a270129295548ffb2 /usr/lib/valgrind/memcheck-amd64-linux
6335747b07b2e8a6150fbfa777ade9bd80d56626bba9772d61c7d33328e68bda /usr/lib/valgrind/vgpreload_core-amd64-linux.so
827b4c18aefad7788b6e654b1519d3caa1ab223cf7a6ba58d22d7ad7d383b032 /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so
38705bdbed45a77c2de28bedf5560d6ca016d57861bf60caa42255ceab8f076a ./valgrind
Healthy:
b0d9f1bc02b4500cff157d16b2761b9b2420151cc129de37ccdecf6d3005a1e0 /lib64/ld-linux-x86-64.so.2
b0d9f1bc02b4500cff157d16b2761b9b2420151cc129de37ccdecf6d3005a1e0 /lib/x86_64-linux-gnu/ld-2.28.so
701e316140eda639d651efad20b187a0811ea4deac0a52f8bcd322dffbb29d94 /lib/x86_64-linux-gnu/libc-2.28.so
701e316140eda639d651efad20b187a0811ea4deac0a52f8bcd322dffbb29d94 /lib/x86_64-linux-gnu/libc.so.6
38705bdbed45a77c2de28bedf5560d6ca016d57861bf60caa42255ceab8f076a /tmp/valgrind
4652774bd116cb49951ef74115ad4237cad5021b2bd4d80002f09d986ec438b9 /usr/bin/valgrind
391826262f9dc33565a8ac0b762ba860951267e73b0b4db7d02d1fd62782f8c8 /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.28.so
3ab1f160af6c3198de45f286dd569fad7ae976a89ff1655e955ef0544b8b5d6c /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.28.so
ae4ea44f87787b9b80d19a69ad287195dc7840eea08c08732d36d2ef1e6ecff3 /usr/lib/valgrind/default.supp
ba18f39979d22efc89340b839257f953a505ef5ca774b5bf06edd78ecb6ed86e /usr/lib/valgrind/memcheck-amd64-linux
1649637bba73e84b962222f3756cc810c5413239ed180e0029cd98f069612613 /usr/lib/valgrind/vgpreload_core-amd64-linux.so
ab1501fa569e0185dea7248648255276ca965bbe270803dcbb930a22ea7a59b7 /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so
38705bdbed45a77c2de28bedf5560d6ca016d57861bf60caa42255ceab8f076a ./valgrind
Thanks for the helpful comments, especially from Florian, which put me on the right track.

Statically Defined IDT

I'm working on a project that has tight boot time requirements. The targeted architecture is an IA-32 based processor running in 32 bit protected mode. One of the areas identified that can be improved is that the current system dynamically initializes the processor's IDT (interrupt descriptor table). Since we don't have any plug-and-play devices and the system is relatively static, I want to be able to use a statically built IDT.
However, this is proving to be troublesome for the IA-32 arch, since the 8-byte interrupt gate descriptor splits the ISR address. The low 16 bits of the ISR address appear in the first 2 bytes of the descriptor, some other bits fill the next 4 bytes, and finally the high 16 bits of the ISR address appear in the last 2 bytes.
I wanted to use a const array to define the IDT and then simply point the IDT register at it like so:
typedef struct s_myIdt {
    unsigned short isrLobits;
    unsigned short segSelector;
    unsigned short otherBits;
    unsigned short isrHibits;
} myIdtStruct;
myIdtStruct myIdt[256] = {
    { (unsigned short)myIsr0, 1, 2, (unsigned short)(myIsr0 >> 16)},
    { (unsigned short)myIsr1, 1, 2, (unsigned short)(myIsr1 >> 16)},
etc.
Obviously this won't work as it is illegal to do this in C because myIsr is not constant. Its value is resolved by the linker (which can do only a limited amount of math) and not by the compiler.
Any recommendations or other ideas on how to do this?
You ran into a well-known x86 wart. I don't believe the linker can stuff the addresses of your ISRs into the swizzled form expected by the IDT entry.
If you are feeling ambitious, you could create an IDT builder script that does something like this (Linux based) approach. I haven't tested this scheme and it probably qualifies as a nasty hack anyway, so tread carefully.
Step 1: Write a script to run 'nm' and capture the stdout.
Step 2: In your script, parse the nm output to get the memory address of all your interrupt service routines.
Step 3: Output a binary file, 'idt.bin' that has the IDT bytes all setup and ready for the LIDT instruction. Your script obviously outputs the isr addresses in the correct swizzled form.
Step 4: Convert this raw binary into an elf section with objcopy:
objcopy -I binary -O elf32-i386 idt.bin idt.elf
Step 5: Now idt.elf file has your IDT binary with the symbol something like this:
> nm idt.elf
000000000000000a D _binary_idt_bin_end
000000000000000a A _binary_idt_bin_size
0000000000000000 D _binary_idt_bin_start
Step 6: relink your binary including idt.elf. In your assembly stubs and linker scripts, you can refer to symbol _binary_idt_bin_start as the base of the IDT. For example, your linker script can place the symbol _binary_idt_bin_start at any address you like.
Be careful that relinking with the IDT section doesn't move anything else in your binary, e.g. your ISRs. Manage this in your linker script (.ld file) by putting the IDT into its own dedicated section.
---EDIT---
From comments, there seems to be confusion about the problem. The 32-bit x86 IDT expects the address of the interrupt service routine to be split into two different 16-bit words, like so:
 31           16 15             0
+---------------+---------------+
| Address 31-16 |               |
+---------------+---------------+
|               | Address 15-0  |
+---------------+---------------+
A linker is thus unable to plug in the ISR address as a normal relocation. So, at boot time, software must construct this split format, which adds to boot time.

How to print disassembly registers in the Xcode console

I'm looking at some disassembly code and see something like 0x01c8f09b <+0015> mov 0x8(%edx),%edi and I am wondering what the value of %edx or %edi is.
Is there a way to print the value of %edx or other assembly variables? Is there a way to print the value at the memory address that %edx points at (I'm assuming edx is a register containing a pointer to ... something here).
For example, you can print an object by typing po in the console, so is there a command or syntax for printing registers/variables in the assembly?
Background:
I'm getting EXC_BAD_ACCESS on this line and I would like to debug what is going on. I'm aware this error is related to memory management and I'm looking at figuring out where I may be missing/too-many retain/release/autorelease calls.
Additional Info:
This is on iOS, and my application is running in the iPhone simulator.
You can print a register (e.g, eax) using:
print $eax
Or for short:
p $eax
To print it as hexadecimal:
p/x $eax
To display the value pointed to by a register:
x $eax
Check the gdb help for more details:
help print
help x
Depends upon which Xcode compiler/debugger you are using. For gcc/gdb it's
info registers
but for clang/lldb it's
register read
(gdb) info reg
eax 0xe 14
ecx 0x2844e0 2639072
edx 0x285360 2642784
ebx 0x283ff4 2637812
esp 0xbffff350 0xbffff350
ebp 0xbffff368 0xbffff368
esi 0x0 0
edi 0x0 0
eip 0x80483f9 0x80483f9 <main+21>
eflags 0x246 [ PF ZF IF ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
From Debugging with gdb:
You can refer to machine register contents, in expressions, as variables with names
starting with `$'. The names of registers are different for each machine; use info
registers to see the names used on your machine.
info registers
Print the names and values of all registers except floating-point
registers (in the selected stack frame).
info all-registers
Print the names and values of all registers, including floating-point
registers.
info registers regname ...
Print the relativized value of each specified register regname.
regname may be any register name valid on the machine you are using,
with or without the initial `$'.
If you are using LLDB instead of GDB you can use register read
Those are not variables, but registers.
In GDB, you can see the values of standard registers by using the following command:
info registers
Note that a register contains an integer value (32 bits in your case, as the register name is prefixed by e). What it represents is not known. It can be a pointer, an integer, mostly anything.
If po crashes when you try to print a register's value as a pointer, it's likely that the value is not a pointer (or an invalid one).

making valgrind abort on error for heap corruption checking?

I'd like to try using valgrind to do some heap corruption detection. With the following corruption "unit test":
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main()
{
    char * c = (char *) malloc(10) ;
    memset( c, 0xAB, 20 ) ;
    printf("not aborted\n") ;
    return 0 ;
}
I was surprised to find that valgrind doesn't abort on error, but just produces a message:
valgrind -q --leak-check=no a.out
==11097== Invalid write of size 4
==11097== at 0x40061F: main (in /home/hotellnx94/peeterj/tmp/a.out)
==11097== Address 0x51c6048 is 8 bytes inside a block of size 10 alloc'd
==11097== at 0x4A2058F: malloc (vg_replace_malloc.c:236)
==11097== by 0x400609: main (in /home/hotellnx94/peeterj/tmp/a.out)
...
not aborted
I don't see a valgrind option to abort on error (like gnu-libc's mcheck does, but I can't use mcheck because it isn't thread safe). Does anybody know if that is possible (our code dup2's stdout to /dev/null since it runs as a daemon, so a report isn't useful and I'd rather catch the culprit in the act or closer to it).
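There was no such option in valgrind when this was asked. More recent releases document --exit-on-first-error=yes, which has to be combined with a nonzero --error-exitcode=<n>; check valgrind --help to see whether your version has it.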
Consider adding a non-daemon mode (debug mode) into your daemon.
Section 4.6 of the Memcheck manual (http://valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs) describes client requests that the debugged program can make to valgrind+memcheck, so you can use some of them in your daemon to run checks at fixed code positions.