Is the gcc insane optimisation level (-O3) not insane enough? - optimization

As part of answering another question, I wanted to show that the insane level of optimisation of gcc (-O3) would basically strip out any variables that weren't used in main. The code was:
#include <stdio.h>
int main (void) {
char bing[71];
int x = 7;
bing[0] = 11;
return 0;
}
and the gcc -O3 output was:
.file "qq.c"
.text
.p2align 4,,15
.globl main
.type main, #function
main:
pushl %ebp
xorl %eax, %eax
movl %esp, %ebp
popl %ebp
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",#progbits
Now I can see it's removed the local variables but there's still quite a bit of wastage in there. It seems to me that the entire:
pushl %ebp
xorl %eax, %eax
movl %esp, %ebp
popl %ebp
ret
section could be replaced with the simpler:
xorl %eax, %eax
ret
Does anyone have any idea why gcc does not perform this optimisation? I know that would save very little for main itself but, if this were done with normal functions as well, the effect of unnecessarily adjusting the stack pointer in a massive loop would be considerable.
The command used to generate the assembly was:
gcc -O3 -std=c99 -S qq.c

You can enable that particular optimization with the -fomit-frame-pointer compiler flag. Doing so makes debugging impossible on some machines and substantially more difficult on everything else, which is why it's usually disabled.
Although your GCC documentation may say that -fomit-frame-pointer is enabled at various optimization levels, you'll likely find that that's not the case—you'll almost certainly have to explicitly enable it yourself.

Turning on -fomit-frame-pointer (source) should get rid of the extra stack manipulations.
GCC apparently left those in because they facilitate debugging (getting a stack trace when needed), although the docs note that -fomit-frame-pointer is the default starting with GCC 4.6.

Related

gdb crash when using breakpoints

In every program I try to debug, I am getting the same result, every time I use breakpoints and try to run any program gdb crash. I tried the same thing on different programs and it keeps acting like this.
I will show the result on this simple:
int main(int argc,char* argv[]){
for(int i = 0;i < 200; i++){
printf("%d\n",i);
}
}
gcc main.c -m32 -std=c99 -o test
GNU gdb (Debian 8.3-1) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from test...
(No debugging symbols found in test)
(gdb) disas main
Dump of assembler code for function main:
0x00001199 <+0>: lea 0x4(%esp),%ecx
0x0000119d <+4>: and $0xfffffff0,%esp
0x000011a0 <+7>: pushl -0x4(%ecx)
0x000011a3 <+10>: push %ebp
0x000011a4 <+11>: mov %esp,%ebp
0x000011a6 <+13>: push %ebx
0x000011a7 <+14>: push %ecx
0x000011a8 <+15>: sub $0x10,%esp
0x000011ab <+18>: call 0x10a0 <__x86.get_pc_thunk.bx>
0x000011b0 <+23>: add $0x2e50,%ebx
0x000011b6 <+29>: movl $0x0,-0xc(%ebp)
0x000011bd <+36>: jmp 0x11d8 <main+63>
0x000011bf <+38>: sub $0x8,%esp
0x000011c2 <+41>: pushl -0xc(%ebp)
0x000011c5 <+44>: lea -0x1ff8(%ebx),%eax
0x000011cb <+50>: push %eax
0x000011cc <+51>: call 0x1030 <printf#plt>
0x000011d1 <+56>: add $0x10,%esp
0x000011d4 <+59>: addl $0x1,-0xc(%ebp)
0x000011d8 <+63>: cmpl $0xc7,-0xc(%ebp)
0x000011df <+70>: jle 0x11bf <main+38>
0x000011e1 <+72>: mov $0x0,%eax
0x000011e6 <+77>: lea -0x8(%ebp),%esp
0x000011e9 <+80>: pop %ecx
0x000011ea <+81>: pop %ebx
0x000011eb <+82>: pop %ebp
0x000011ec <+83>: lea -0x4(%ecx),%esp
0x000011ef <+86>: ret
End of assembler dump.
(gdb) break *0x000011ef
Breakpoint 1 at 0x11ef
(gdb) run
Starting program: /root/test
[1]+ Stopped gdb test
I tried to do the same thing in another linux machine, and it works fine. So what could be the problem?
Update: I found a temp solution for the breakpoints issue (so gdb do not crash), You use the command (start) at the beginning and everything will work fine :
GNU gdb (Debian 8.3-1) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from test...
(No debugging symbols found in test)
(gdb) start
Temporary breakpoint 1 at 0x11a8
Starting program: /root/test
Temporary breakpoint 1, 0x565561a8 in main ()
(gdb) enable
(gdb) disas main
Dump of assembler code for function main:
0x56556199 <+0>: lea 0x4(%esp),%ecx
0x5655619d <+4>: and $0xfffffff0,%esp
0x565561a0 <+7>: pushl -0x4(%ecx)
0x565561a3 <+10>: push %ebp
0x565561a4 <+11>: mov %esp,%ebp
0x565561a6 <+13>: push %ebx
0x565561a7 <+14>: push %ecx
=> 0x565561a8 <+15>: sub $0x10,%esp
0x565561ab <+18>: call 0x565560a0 <__x86.get_pc_thunk.bx>
0x565561b0 <+23>: add $0x2e50,%ebx
0x565561b6 <+29>: movl $0x0,-0xc(%ebp)
0x565561bd <+36>: jmp 0x565561d8 <main+63>
0x565561bf <+38>: sub $0x8,%esp
0x565561c2 <+41>: pushl -0xc(%ebp)
0x565561c5 <+44>: lea -0x1ff8(%ebx),%eax
0x565561cb <+50>: push %eax
0x565561cc <+51>: call 0x56556030 <printf#plt>
0x565561d1 <+56>: add $0x10,%esp
0x565561d4 <+59>: addl $0x1,-0xc(%ebp)
0x565561d8 <+63>: cmpl $0xc7,-0xc(%ebp)
0x565561df <+70>: jle 0x565561bf <main+38>
0x565561e1 <+72>: mov $0x0,%eax
0x565561e6 <+77>: lea -0x8(%ebp),%esp
0x565561e9 <+80>: pop %ecx
0x565561ea <+81>: pop %ebx
0x565561eb <+82>: pop %ebp
0x565561ec <+83>: lea -0x4(%ecx),%esp
0x565561ef <+86>: ret
End of assembler dump.
(gdb) break *0x565561df
Breakpoint 2 at 0x565561df
(gdb) info break
Num Type Disp Enb Address What
2 breakpoint keep y 0x565561df <main+70>
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /root/test
Breakpoint 2, 0x565561df in main ()
(gdb) step
Single stepping until exit from function main,
which has no line number information.
0
Breakpoint 2, 0x565561df in main ()
(gdb)
Single stepping until exit from function main,
which has no line number information.
1
Breakpoint 2, 0x565561df in main ()
(gdb)
Single stepping until exit from function main,
which has no line number information.
2
Breakpoint 2, 0x565561df in main ()
(gdb)
Single stepping until exit from function main,
which has no line number information.
3
Breakpoint 2, 0x565561df in main ()
(gdb)
Single stepping until exit from function main,
which has no line number information.
4
Breakpoint 2, 0x565561df in main ()
Unfortunately, This is a temp solution just so you can deal with breakpoints, and it have nothing to do with the crashing problem.
You are most likely trying to set a breakpoint at an invalid address with this command break *0x000011ef. The 0x11ef is the offset of that instruction within the section within the ELF, but the program is going to be relocated when it is loaded / started.
You should instead try start, then disas main, and then place your breakpoint.
GDB stopping like this is a bug which occurs when GDB throws an error while trying to place a breakpoint, it was fixed in upstream GDB with this patch:
https://sourceware.org/ml/gdb-patches/2019-05/msg00361.html
Once you see GDB stopped like this:
[1]+ Stopped gdb soQuestionProgram
you should be dropped back to a shell. Just resume GDB with the fg command and continue your debug session. Once GDB 8.4 is out this bug will be fixed.
it keeps acting like this
First: GDB did not crash. It merely got stopped (by your shell). You can get it back with the shell fg command.
Second: this has nothing to do with GDB, and something to do with your terminal configuration. Using reset may cure this problem.

Why is valgrind complaining about the perfectly fine initialized buffer?

This is the test code "valgrind.c". It initializes an on stack buffer, then does a simple string compare over it.
#include <stdlib.h>
#include <string.h>
int main( void)
{
char buf[ 6];
memset( buf, 'X', sizeof( buf));
if( strncmp( buf, "XXXX", 4))
abort();
return( 0);
}
I compile this with cc -O0 -g valgrind.c -o valgrind.
Running on its own, it does fine.
When I run it through valgrind --track-origins=yes ./valgrind though this gives me:
==28182== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==28182== Conditional jump or move depends on uninitialised value(s)
==28182== at 0x4E058CC: ??? (in /lib/x86_64-linux-gnu/libc-2.28.so)
==28182== by 0x4CAA09A: ??? (in /lib/x86_64-linux-gnu/libc-2.28.so)
==28182== Uninitialised value was created by a stack allocation
==28182== at 0x4CA9FBD: ??? (in /lib/x86_64-linux-gnu/libc-2.28.so)
That really makes no sense to me. I am running this on Ubuntu 18.10.
The answer was that the valgrind libraries were buggy. After a complete dist-upgrade, things work now as expected. The version number of valgrind and the executable remain the same though (my current dpkg number is now 1:3.13.0-2ubuntu6, I forgot to jot down the old one, sorry).
These were the strace opened libraries with their shasums. Thre is actually a difference in libraries opened and you can see that the libc and the actual test and valgrind executable are unchanged in both scenarios:
Broken:
41bd206c714bcd2be561b477d756a4104dddd2d3578040cca30ff06d19730d61 /etc/ld.so.cache
b0d9f1bc02b4500cff157d16b2761b9b2420151cc129de37ccdecf6d3005a1e0 /lib64/ld-linux-x86-64.so.2
b0d9f1bc02b4500cff157d16b2761b9b2420151cc129de37ccdecf6d3005a1e0 /lib/x86_64-linux-gnu/ld-2.28.so
701e316140eda639d651efad20b187a0811ea4deac0a52f8bcd322dffbb29d94 /lib/x86_64-linux-gnu/libc-2.28.so
701e316140eda639d651efad20b187a0811ea4deac0a52f8bcd322dffbb29d94 /lib/x86_64-linux-gnu/libc.so.6
38705bdbed45a77c2de28bedf5560d6ca016d57861bf60caa42255ceab8f076a /tmp/valgrind
4652774bd116cb49951ef74115ad4237cad5021b2bd4d80002f09d986ec438b9 /usr/bin/valgrind
0369719ef5fe66d467a385299396bab0937002694ffc78027ede22c09d39abf3 /usr/lib/valgrind/default.supp
16b5f1e6ae25663620edb8f8d4a7f1a392e059d6cf9eb20a270129295548ffb2 /usr/lib/valgrind/memcheck-amd64-linux
6335747b07b2e8a6150fbfa777ade9bd80d56626bba9772d61c7d33328e68bda /usr/lib/valgrind/vgpreload_core-amd64-linux.so
827b4c18aefad7788b6e654b1519d3caa1ab223cf7a6ba58d22d7ad7d383b032 /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so
38705bdbed45a77c2de28bedf5560d6ca016d57861bf60caa42255ceab8f076a ./valgrind
Healthy:
b0d9f1bc02b4500cff157d16b2761b9b2420151cc129de37ccdecf6d3005a1e0 /lib64/ld-linux-x86-64.so.2
b0d9f1bc02b4500cff157d16b2761b9b2420151cc129de37ccdecf6d3005a1e0 /lib/x86_64-linux-gnu/ld-2.28.so
701e316140eda639d651efad20b187a0811ea4deac0a52f8bcd322dffbb29d94 /lib/x86_64-linux-gnu/libc-2.28.so
701e316140eda639d651efad20b187a0811ea4deac0a52f8bcd322dffbb29d94 /lib/x86_64-linux-gnu/libc.so.6
38705bdbed45a77c2de28bedf5560d6ca016d57861bf60caa42255ceab8f076a /tmp/valgrind
4652774bd116cb49951ef74115ad4237cad5021b2bd4d80002f09d986ec438b9 /usr/bin/valgrind
391826262f9dc33565a8ac0b762ba860951267e73b0b4db7d02d1fd62782f8c8 /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.28.so
3ab1f160af6c3198de45f286dd569fad7ae976a89ff1655e955ef0544b8b5d6c /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.28.so
ae4ea44f87787b9b80d19a69ad287195dc7840eea08c08732d36d2ef1e6ecff3 /usr/lib/valgrind/default.supp
ba18f39979d22efc89340b839257f953a505ef5ca774b5bf06edd78ecb6ed86e /usr/lib/valgrind/memcheck-amd64-linux
1649637bba73e84b962222f3756cc810c5413239ed180e0029cd98f069612613 /usr/lib/valgrind/vgpreload_core-amd64-linux.so
ab1501fa569e0185dea7248648255276ca965bbe270803dcbb930a22ea7a59b7 /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so
38705bdbed45a77c2de28bedf5560d6ca016d57861bf60caa42255ceab8f076a ./valgrind
Thanks for the helpful comments, especially from Florian, which put me on the right track.

sigbrt error (Xcode 4.6)

I've been trying to put together an app (using Xcode 4.6.3) that has many buttons and they have a similar code, but when I try to test it, the debugger comes up. I'm just learning Objective C and don't really know how to fix this problem. I was able to use a few breakpoints to get this set of code in the debugger,
CoreFoundation`-[NSException raise]:
0x1d19fa0: pushl %ebp
0x1d19fa1: movl %esp, %ebp
0x1d19fa3: subl $8, %esp
0x1d19fa6: movl 8(%ebp), %eax
0x1d19fa9: movl %eax, (%esp)
0x1d19fac: calll 0x1d645a4 ; symbol stub for: objc_exception_throw
0x1d19fb1: nopw %cs:(%eax,%eax)
I don't understand what this means. Can someone tell me how I can fix this? If you need more info I'll try to get it to you. Thanks!

How to jump into bootloader in Ada?

I am writing embedded code in Ada. I want to jump into bootloader code which is located at address 0x0E00. I am trying to use following code:
with Interfaces; use Interfaces;
with System;
package AVR.bootloader is
procedure Call;
pragma No_Return(Call);
pragma Import (Assembler,Call);
for Call'Address use System'To_Address (16#0E00#);
end AVR.bootloader;
The problem is this does not work.
Edit: I want to do a following C equivalent:
void (*boot)(void)=0x0E00;
I did a small experiment on this Macbook Pro, and your code seems to do what you meant it to; I modified the code to read
with System;
procedure Bootloader is
procedure Call;
pragma No_Return (Call);
pragma Import (Assembler, Call);
for Call'Address use System'To_Address (16#0E00#);
begin
Call;
end Bootloader;
and when I compile with gnatmake -c -u -f -S bootloader.adb the saved assembler is
.text
.globl __ada_bootloader
__ada_bootloader:
LFB1:
pushq %rbp
LCFI0:
movq %rsp, %rbp
LCFI1:
subq $16, %rsp
LCFI2:
movq $3584, -8(%rbp)
movq -8(%rbp), %rax
call *%rax
leave
LCFI3:
ret
[...]
which looks hopeful, though I’m not familiar enough with asm to know.
Running it under gdb I get (after a lot of chatter)
(gdb) run
Starting program: /Users/simon/tmp/bootloader
Reading symbols for shared libraries ++........................ done
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000e00
0x0000000000000e00 in ?? ()
(gdb) bt
#0 0x0000000000000e00 in ?? ()
Cannot access memory at address 0xe00
#1 0x0000000100000d93 in main (argc=1, argv=140734799805048, envp=140734799805064) at /Users/simon/tmp/b~bootloader.adb:121
#2 0x0000000100000bf4 in start ()
which looks even more hopeful.
Perhaps your AVR compiler isn’t code-generating properly?
Since normally a boot-loader runs on reset, the simplest method is to force a processor reset. A boot-loader may reasonably assume that it is running on an uninitialised system in reset state and may perform initialisation that is not valid on an already initialised system, so forcing a reset is the safest method.
Your processor may have a reset instruction or a reset controller that can perform this directly. Failing that it may have a watchdog timer that can generate a reset. Start the watchdog timer with a suitably short time-out and let it run without servicing it.

Can a partially tail-recursive function still gain the optimization advantages of a fully tail-recursive function?

I realise the answer to this question could be different for different languages, and the language I am most interested in is C++. If the tag needs be changed because this can't be answered in a language-agnostic manner, feel free.
Is it possible to have a function be partially tail-recursive and still get any advantage that being tail-recursive would get you?
As I understand it, tail-recursion is where instead of doing a full function call, the compiler will optimise the function to just change the arguments in place to the new arguments and jump to the beginning of the function.
If you have a function like this:
def example(arg):
if arg == 0:
return 0 # base case
if arg % 2 == 0:
return example(arg - 1) # should be tail recursive
return 3 + example(arg - 1) # isn't tail recursive because 3 is added to the result
When an optimiser encounters something like that (where the function is tail-recursive in some cases and not in others) will it turn the one into a jump and the other into a call, or will some fact of optimisation reality (if I knew it I wouldn't be asking) make it have to turn everything into a call and lose all the efficiency you would have had if the function were tail-recursive?
In Scheme, the first language that comes to mind when I think of tail calls, the second case is guaranteed to be a tail call by the language specification. (Terminology note: it is preferred to refer to such function calls as 'tail calls'.)
The Scheme specification defines exactly what tail calls are in Scheme and mandates that compilers support them specially. You can see the definition in 11.20. Tail calls and tail contexts of R6RS (source).
Note that in Scheme, the specification says nothing about optimization of tail calls. Rather, it says that an implementation must support an unbounded number of active tail calls — a semantic property of the language runtime. They can be implemented as normal calls, but usually aren't.
Example, in C:
Take a C version of your example.
int example(int arg)
{
if (arg == 0)
return 0;
if ((arg % 2) == 0)
return example(arg - 1);
return 3 + example(arg - 1);
}
Compile it using gcc's usual optimization settings (-O2) for i386:
_example:
pushl %ebp
xorl %eax, %eax
movl %esp, %ebp
movl 8(%ebp), %edx
testl %edx, %edx
jne L5
jmp L15
.align 4,0x90
L14:
decl %edx
testl %edx, %edx
je L7
L5:
testb $1, %dl
je L14
decl %edx
addl $3, %eax
testl %edx, %edx
jne L5
L7:
leave
ret
L15:
leave
xorl %eax, %eax
ret
Note that there are no function calls in the assembly code. GCC has not only optimized your tail call into a jump, it optimized the non-tail call into a jump as well.
As far as I understand it, a smart compiler could apply tail recursion to your first call by just jumping to the example entry point instead of setting up a new stack frame. A following return will unwind the stack to the original caller, effectively "ending" both calls in one step, even if it cannot do that for the other call.
And you could optimize your function by moving the adding of 3 inside the call:
def example(arg, add=0):
arg += add
....
return example(arg - 1, 3) # tail now too
Another technique would be to create a second function and have both call each other.
I don't know if python or C++ compilers can handle that though, but you can check assembly output for C++. Strangely I think checking bytecode output for python may be harder.