Why would one ever want to compile with -O2 instead of -O3

We usually compile with -O2 because -O3 would "trigger subtle bugs".
With our GCC version, -O3 enables more aggressive inlining, which would actually reveal bugs that otherwise go unnoticed (e.g. use of uninitialized values from functions taking them as reference arguments, or out-of-bounds array accesses). It seems to me that this aggressive inlining also allows a more expressive coding style with smaller functions, and that -funswitch-loops helps keep variable definitions more local in loops.
Given that bugs in our code are orders of magnitude more likely than compiler bugs, and that we use -Wall -Wextra without any issues, what kind of bugs should we be looking for?
If it matters, we use gcc-4.3.2. Compile time is not a major issue for us.
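A minimal C sketch of the first bug class mentioned in the question: an out-parameter that is not written on every path. The function names are hypothetical. When the callee is inlined, GCC can often see that the caller may read an uninitialized value and report it through the uninitialized-variable warnings that -Wall enables; without inlining, the problem stays hidden behind the call.

```c
#include <stdbool.h>

/* BUGGY shape, shown only as a comment:

   static bool parse_digit_buggy(char c, int *out)
   {
       if (c >= '0' && c <= '9') {
           *out = c - '0';
           return true;
       }
       return false;        // *out is never written on this path
   }

   A caller doing `int v; parse_digit_buggy(c, &v); use(v);`
   reads an uninitialized value whenever c is not a digit.
*/

/* Fixed shape: *out is written on every path, and the return value
   tells the caller whether it holds a parsed digit. */
static bool parse_digit(char c, int *out)
{
    *out = 0;
    if (c >= '0' && c <= '9') {
        *out = c - '0';
        return true;
    }
    return false;
}

/* Caller that checks the result instead of trusting *out blindly. */
static int digit_or_default(char c, int fallback)
{
    int value;
    return parse_digit(c, &value) ? value : fallback;
}
```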

Size. Of course, if size really matters (sometimes it does, e.g. on embedded systems), one would use -Os. But the main difference at -O3 is the inlining you already mentioned. This can increase the generated code size (but the code is usually faster). Maybe you want speed, but not at any (space) cost? Otherwise I see no reason not to use -O3 (unless you know of a GCC bug that only affects your code at -O3; but as long as you don't have an error that you can't reproduce at -O2, I would not care).

Don't kid yourself that compiler bugs aren't lurking out there to make your life hell. Here's a nasty one which cropped up in Debian last year, and where the fix was to fall back to -O2.

Sometimes aggressive optimisation can break code just like you mentioned. If this is a project you are currently working on, then perhaps this is not a problem. However, if the code in question is legacy code that is fragile, poorly written, and not well-understood, then you want to take as few chances as possible.
Also, not all optimisations are formally proven. That means that they may alter the behaviour of programs in undesirable ways.
The best example I can think of is a Java one, but it should illustrate my point about optimisations in general.
It is common to have code like this:
while (keepGoing) {
    doStuff();
}
Then the value of keepGoing is modified by another thread. One optimisation the JVM can perform is to notice that keepGoing is not modified within the body of the loop and "hoist" the check so it is made only once, before the loop, essentially transforming the code into:
if (keepGoing) {
    while (true) {
        doStuff();
    }
}
In a multi-threaded environment this is not the same thing, but in a single-threaded one it is. These are the kinds of things that can break with optimisations, and they are a frequent source of "Heisenbugs".
PS - In Java the proper fix is to make keepGoing volatile, so the JVM cannot assume a cached value and the loop does what you intend.
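C has the same hazard. A rough C11 analogue of the Java fix is an atomic flag rather than volatile (in C and C++, volatile does not make a variable safe to share between threads). A minimal sketch, assuming a C11 compiler with <stdatomic.h> and POSIX threads (compile with -pthread); do_stuff is a hypothetical placeholder for the real work:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Shared flags. A plain bool read in the loop would be a data race
   (undefined behaviour in C11), and the compiler could legally hoist
   the load out of the loop, just as in the Java example. */
static atomic_bool keep_going = true;
static atomic_long progress = 0;

static void do_stuff(void)
{
    /* Hypothetical placeholder for real work. */
}

static void *worker(void *arg)
{
    (void)arg;
    long iterations = 0;
    /* atomic_load re-reads the flag on every iteration. */
    while (atomic_load(&keep_going)) {
        do_stuff();
        iterations++;
        atomic_store(&progress, iterations);
    }
    return NULL;
}

/* Starts the worker, waits until it has completed at least one
   iteration, asks it to stop, and returns how many iterations ran
   (or -1 if the thread could not be created). */
static long run_and_stop(void)
{
    pthread_t tid;
    atomic_store(&keep_going, true);
    atomic_store(&progress, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    while (atomic_load(&progress) < 1)
        ;                              /* spin until the worker has run */
    atomic_store(&keep_going, false);  /* the worker sees this change */
    pthread_join(tid, NULL);
    return atomic_load(&progress);
}
```

If the flag were a plain bool, an optimized build could spin forever; with the atomic flag, run_and_stop returns once the worker observes the store.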

Related

Will Fortran compilers completely remove always false if-blocks when optimizing?

If I have the following declaration in my program:
logical, parameter :: verbose = .false.
will adding a bunch of things such as
if (verbose) write(*,*) "Information here"
affect the performance at all when compiling with "-O3"?
I would hope the compiler would recognize that the blocks are always false and thus remove them completely, so I can feel free to add debug prints all over. Is this the case?
I guess this may be compiler dependent, but I was hoping that there is a single answer for the most common compilers. If not, what is the behavior of gfortran?
Thanks in advance for any help.

Following the good advice of the commenters above, I tested this myself.
It turns out that with gfortran, even optimization level -O0 appears to remove the dead write blocks completely.
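For comparison, a C sketch of the same pattern: a compile-time constant guarding debug prints. Because the condition is a constant expression, one would expect GCC at -O1 and above to drop the branch; a simple way to check is to compile to assembly (gcc -O2 -S) and search the .s file for the message string. The names VERBOSE, DEBUG_PRINT and sum_to are illustrative only.

```c
#include <stdio.h>

/* C analogue of `logical, parameter :: verbose = .false.`:
   a compile-time constant the optimizer can see through. */
enum { VERBOSE = 0 };

/* The do/while(0) wrapper makes the macro behave like one statement. */
#define DEBUG_PRINT(msg)                      \
    do {                                      \
        if (VERBOSE)                          \
            fprintf(stderr, "%s\n", (msg));   \
    } while (0)

static int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        DEBUG_PRINT("Information here");  /* never executed */
        total += i;
    }
    return total;
}
```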

Testing the performance of two pieces of code: which flags should I use with gcc? -O0, -O2, or -g?

When I write a routine to compare the performance of two pieces of code, which optimization flags should I use: -O0, -O2, or -g?

You should test the performance of your code with each of the settings. Ideally, a larger number (-O0, -O1, -O2, -O3) implies better performance because there is more or better optimization, but that is not always the case.
Likewise, depending on how your code is written, some of it may be removed in a way you didn't expect from the language, the compiler, or both. So not only do you need to test the performance of your code, you also need to test the program generated from your code to check that it does what you think it does.
There is definitely no single optimization setting that gives the best performance for every piece of code that compiler can compile. You have to test the settings and the compiler on a particular system to verify that, on that system, the code does indeed run faster. Performance testing is full of traps and other sources of error that can easily lead you to misinterpret the results, so you have to be careful about how you test.
For gcc, people usually say that -O3 is risky to use and that -O2 gives the best balance of performance and safety. For the most part that is true: -O2 is used widely enough that many of its bugs have been flushed out. -O2 does not always produce the fastest code, but it generally produces faster code than -O0 and -O1. Debuggers can defeat the optimization or remove it altogether, so never test performance with a debug build or under a debugger. Test the system the way the user would use it: if the user runs your program under a debugger, test that way; otherwise, don't.
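A minimal C sketch of a timing harness that avoids two of the traps above: each result is written to a volatile sink so the optimizer cannot delete the work, and the input changes on every repetition so the call cannot be computed once outside the timing loop. It assumes a POSIX system (clock_gettime with CLOCK_MONOTONIC), and workload is a hypothetical stand-in for the code under test. Build it once per flag set (e.g. -O0, -O2, -O3) and compare.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Volatile stores cannot be removed, so each workload() result must
   actually be produced. */
static volatile unsigned long sink;

/* Hypothetical workload to be measured. */
static unsigned long workload(unsigned long n)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < n; i++)
        acc += i * i;
    return acc;
}

/* Wall-clock time, in nanoseconds, of `reps` calls to workload().
   The argument varies per repetition, so the call is not loop-invariant. */
static double time_workload_ns(unsigned long n, int reps)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int r = 0; r < reps; r++)
        sink = workload(n + (unsigned long)r);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - start.tv_sec) * 1e9
         + (double)(end.tv_nsec - start.tv_nsec);
}
```

The optimizer may still transform workload itself (for example, vectorize it), which is fine: the point is to time what each flag set actually produces, so it is worth inspecting the generated assembly too.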

In GCC, -O0 disables compiler optimizations. -g adds debugging information to the executable so you can use a debugger.
If you want to enable speed optimizations, use -O1 or -O2. See the gcc(1) man page for more information.
If you want to measure the performance of your code, use a profiler such as valgrind or gprof.

Actually, if you care about performance you should definitely use -O3. Why give away potential optimisations?
And yes, there’s a small but measurable difference between -O2 and -O3.
-g is not an optimisation flag but it can prevent optimisations so it must be disabled for representative benchmarks.

Does compilation with -g result in slower code?

I am using a package that is compiled using gcc -O3 -g.
Since some function calls to that code are the slowest part of my program I am wondering if the -g could be the culprit? Or should it not matter in terms of runtime?

Since -O3 implies aggressive inlining, and -g implies avoiding inlining so that the debugger can have function addresses, those options are somewhat at odds. Nevertheless, in general -O3 wins, and aside from a somewhat larger binary (and the minor speed effects that might come from paging or nonlocality) it should not make much of a difference.

-g will make your code bigger (added space for debug symbols) and will disable some optimizations like inlining, but probably not appreciably slower.

If your real question is "Why is it slow?", there's an easy way to find out.

gfortran optimization causes fortran do-variable loop error during runtime

I have written a Fortran routine that uses some legacy Fortran 77 code for finite elements. However, with a particular mesh, when the -O optimization flag is turned on, an important do-loop iterator is somehow being modified, even though Fortran supposedly prohibits this. I have compiled this code using gfortran 4.5 with -fcheck=do run-time checking enabled, and it has verified what I've noted above. A runtime error occurs only when optimizations are turned on, and it points directly to the do-iterator.
Using gdb on the optimized code (although it behaves erratically, with lines bouncing back and forth) seems to clearly indicate that the do-iterator somehow gets set back to zero, which essentially causes a nice infinite loop.
Any suggestions as to how to hunt down and fix whatever is causing this bug would be greatly appreciated, as I'd like to make sure the whole project can be consistently compiled with the same flags.

You say that you use -fcheck=do; why not go all the way and use -fcheck=all? What you're seeing sounds like a typical case of memory corruption due to an array bounds violation, which -fcheck=all can catch in some cases. Where array bounds checking doesn't work so well is with implicit interfaces and incorrect bounds being passed; a solution here is to put your procedures into modules, allowing the compiler to check interfaces.
And, like Jonathan Dursi said, consider using a tool like valgrind.
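GCC's C front end has no -fcheck=all, but the same idea can be applied by hand: pass the array length together with the pointer and reject out-of-range indices, instead of letting a stray write overwrite neighbouring memory (possibly a loop counter, as suspected above). A minimal sketch with a hypothetical helper; for automatic checking in C, -fsanitize=address (GCC 4.8 and later, and Clang) or valgrind can catch many such writes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical finite-element helper: because the callee knows the
   array length, it can refuse an out-of-range index instead of
   silently corrupting whatever lies next to the array. */
static int set_node_value(double *values, size_t n, size_t index, double v)
{
    if (index >= n) {
        fprintf(stderr, "index %zu out of bounds (size %zu)\n", index, n);
        return -1;
    }
    values[index] = v;
    return 0;
}
```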

Any Macro or Technique for Partial Optimization?

I am working on a lock-free structure with the g++ compiler. It seems that with the -O1 switch, g++ changes the execution order of my code. How can I disable g++'s optimization for a certain part of my code while keeping optimization for the rest? I know I can split it into two files and link them, but that looks ugly.

If you find that gcc changes the order of execution in your code, you should consider using a memory barrier. Just don't assume that volatile variables will protect you from that issue. They only make sure that, within a single thread, the behavior is what the language guarantees, and that variables are always read from their memory location to account for changes that are "invisible" to the executing code (e.g. changes to a variable made by a signal handler).
GCC has supported OpenMP since version 4.2. You can use it to create a memory barrier with a special #pragma directive (#pragma omp flush).
A very good insight into lock-free code is this PDF by Scott Meyers and Andrei Alexandrescu: C++ and the Perils of Double-Checked Locking
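The OpenMP approach above is one option. On a C11 compiler the same ordering can be expressed directly with <stdatomic.h>, which is a different technique from #pragma omp flush; GCC also provides the older full-barrier builtin __sync_synchronize(). A minimal message-passing sketch using release/acquire atomics (payload, ready, publish and try_consume are illustrative names):

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;                   /* ordinary data being published */
static atomic_bool ready = false;     /* flag that publishes it */

/* Producer: the release store keeps the write to payload from being
   reordered after the flag store (by the compiler or the CPU). */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: the acquire load keeps the read of payload from being
   reordered before the flag load, so a true flag implies the payload
   written before the release store is visible. */
static bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

Unlike volatile, this pairing constrains both compiler reordering and hardware reordering, which is what a lock-free structure needs.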

You can use the function attribute __attribute__((optimize("O0"))) to set the optimization level for a single function, or "#pragma GCC optimize" for a block of code. These are only available from GCC 4.4, though, I think (check your GCC manual). If they aren't supported, splitting the source is your only option.
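A minimal sketch of both mechanisms, assuming GCC 4.4 or later (other compilers may ignore them with a warning). GCC's manual describes the optimize attribute as intended for debugging rather than production code, and neither mechanism makes racy code correct. The function names are illustrative.

```c
/* Attribute on the prototype: compile only this function as if
   with -O0. */
static int sum_unoptimized(const int *a, int n) __attribute__((optimize("O0")));

static int sum_unoptimized(const int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Pragma for a region: save the current options, switch to -O0,
   and restore the saved options afterwards. */
#pragma GCC push_options
#pragma GCC optimize ("O0")
static int max_unoptimized(const int *a, int n)   /* assumes n >= 1 */
{
    int best = a[0];
    for (int i = 1; i < n; i++)
        if (a[i] > best)
            best = a[i];
    return best;
}
#pragma GCC pop_options
```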
I would also say, though, that if your code fails with optimization turned on, it is most likely that your code is just wrong, especially as you're trying to do something that is fundamentally very difficult. The processor will potentially perform reordering on your code (within the limits of sequential consistency) so any re-ordering that you're getting with GCC could potentially occur anyway.
