Unable to understand compilers' main optimizations

Unable to understand compilers' main optimizations - optimization

Martinus gives a good example of a code where compiler optimizes the code at run-time by calculating out multiplication:
Martinus code
int x = 0;
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
x += x + x + x + x + x;
}
System.out.println(x);
His code after Constant Folding -compiler's optimization at compile-time (Thanks to Abelenky for pointing that out)
int x = 0;
for (int i = 0; i < 100000000000; ++i) {
x += x + x + x + x + x;
}
System.out.println(x);
This optimization technique seems to be a trivial in my opinion.
I guess that it may one of the techniques Sun started to keep trivial recently.
I am interested in two types of optimizations made by compilers:
optimizations which were omitted in today's compilers as trivial such as in Java's compiler at run-time
optimizations which are used by the majority of today's compilers
Please, put each optimization technique to a separate answer.
Which techniques have compilers used in 90s (1) and today (2)?

Just buy the latest edition of the Dragon Book.

How about loop unrolling?:
for (i = 0; i < 100; i++)
g ();
To:
for (i = 0; i < 100; i += 2)
{
g ();
g ();
}
From http://www.compileroptimizations.com/. They have many more - too many for an answer per technique.
Check out Trace Trees for a cool interpreter/just-in-time optimization.

The optimization shown in your example, of collapsing 100*1000*1000*1000 => 100000000000 is NOT a run-time optimization. It happens at compile-time. (and I wouldn't even call it an optimization)
I don't know of any optimizations that happen at run-time, unless you count VM engines that have JIT (just-in-time) compiling.
Optimizations that happen at compile-time are wide ranging, and frequently not simple to explain. But they can include in-lining small functions, re-arranging instructions for cache-locality, re-arranging instructions for better pipelining or hyperthreading, and many, many other techniques.
EDIT: Some F*ER edited my post... and then down-voted it. My original post clearly indicated that collapsing multiplication happens at COMPILE TIME, not RUN TIME, as the poster suggested. Then I mentioned I don't really consider collapsing constants to be much of an optimization. The pre-processor even does it.
Masi: if you want to answer the question, then answer the question. Do NOT edit other people's answers to put in words they never wrote.

Compiler books should provide a pretty good resource.
If this is obvious, please ignore it, but you're asking about low-level optimizations, the only ones that compilers can do. In non-toy programs, high-level optimizations are far more productive, but only the programmer can do them.

Related

Variable names: How to abbreviate "index"?

I like to keep the names of my variables short but readable.
Still, when, for example, naming the variable that holds the index of an element in some list, I tend to use elemIndex because I don't know a proper (and universally understood) way of abbreviating the word "index".
Is there a canonic way of abbreviating "index"? Or is it best to spell it in full to avoid misunderstandings?

In my experience it depends on the context. Typically I can tell if something is an index from what it is used for, so I am often more interested in knowing what it is an index of.
My rule of thumb goes roughly like this:
If it is just a loop index in a short loop (e.g.: all fits on screen at once) and the context informs the reader what the index is doing, then you can probably get away with something simple like i.
Example: thresholding an image
//For each pixel in the image, threshold it
for (int i = 0; i < height; i++ ) {
for (int j = 0; j < width; j++) {
if (image[i][j] < 128) {
image[i][j] = 0;
} else {
image[i][j] = 255;
}
}
}
If the code section is larger, or you have multiple indeces going on, indicate which list it is an index into:
File[] files_in_dir = ...;
int num_files = files_in_dir.length();
for (int fileIdx = 0; fileIdx < num_files; fileIdx++) { //for each file in dir.
...
}
If, however the index is actually important to the meaning of the code, then specify it fully, for example:
int imageToDeleteIdx = 3; //index of the image to be deleted.
image_list.delete(imageToDeleteIdx);
However code should be considered "write once, read many" and your effort should be allocated as such; i.e.: lots on the writing, so the reading is easy. To this end, as was mentioned by Brad M, never assume the reader understands your abbreviations. If you are going to use abbreviations, at least declare them in the comments.

Stick to established and well known conventions. If you use common conventions, people will have fewer surprises when they read your code.
Programmers are used to using a lot of conventions from mathematics. E.g. in mathematics we typically label indices:
i, j, k
While e.g. coordinates are referred to with letters such as:
x, y, z
This depends of course on context. E.g. using i to denote some global index would be a terrible idea. Use short names for very local variables and longer names for more global functions and variables, is a good rule of thumb.
For me this style was influenced by Rob Pike, who elaborates more on this here. As someone with an interest in user interface design and experience I've also written more extensively about this.

What's the limit of compiler optimization? How smart is it?

People keep telling me instead of writing "shift 1 bit to the left", just write "multiple by 2", because it's a lot more readable, and the compile will be smart enough to do the optimization.
What else would compiles generally do, and developers should not do (for code readability)? I always write string.length == 0 instead of string == "" because I read somewhere 5-6 years ago, saying numeric operations are much faster. Is this still true?
Or, would most compiler be smart enough to convert the following:
int result = 0;
for (int i = 0; i <= 100; i++)
{
result += i;
}
into: int result = 5050;?
What is your favourite "optimization" that you do, because most compiles won't do?

Algorithms: no compiler on the planet so far can choose a better algorithm for you. Too many people hastily jump to the rewrite-in-C part after they benchmark, when they should have really considered replacing the algorithm they're using in the first place.

Reference for relative operation expense

Does anyone know a good reference to help with understanding the relative cost of operations like copying variables, declaring new variables, FileIO, array operations, etc? I've been told to study decompilation and machine code but a quick reference would be nice. For example, something to tell me how much worse
for(int i = 0; i < 100; i++){
new double d = 7.65;
calc(d);
}
is than
double d = 7.65;
for(int i = 0; i < 100; i++){
calc(d);
}

Here is a nice paper by Felix von Leitner on the state of C compiler optimization. I learned of it on this Lambda the Ultimate page.
The performance of operations you mention such as file I/O, memory access, and computation are highly dependent on a computer's architecture. Much of the optimization of software for today's desktop computers is focused on cache memory.
You would gain much from an architecture book or course. Here's a good example from CMU.

What are you favorite low level code optimization tricks? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I know that you should only optimize things when it is deemed necessary. But, if it is deemed necessary, what are your favorite low level (as opposed to algorithmic level) optimization tricks.
For example: loop unrolling.

gcc -O2
Compilers do a lot better job of it than you can.

Picking a power of two for filters, circular buffers, etc.
So very, very convenient.
-Adam

Why, bit twiddling hacks, of course!

One of the most useful in scientific code is to replace pow(x,4) with x*x*x*x. Pow is almost always more expensive than multiplication. This is followed by
for(int i = 0; i < N; i++)
{
z += x/y;
}
to
double denom = 1/y;
for(int i = 0; i < N; i++)
{
z += x*denom;
}
But my favorite low level optimization is to figure out which calculations can be removed from a loop. Its always faster to do the calculation once rather than N times. Depending on your compiler, some of these may be automatically done for you.

Inspect the compiler's output, then try to coerce it to do something faster.

I wouldn't necessarily call it a low level optimization, but I have saved orders of magnitude more cycles through judicious application of caching than I have through all my applications of low level tricks combined. Many of these methods are applications specific.
Having an LRU cache of database queries (or any other IPC based request).
Remembering the last failed database query and returning a failure if re-requested within a certain time frame.
Remembering your location in a large data structure to ensure that if the next request is for the same node, the search is free.
Caching calculation results to prevent duplicate work. In addition to more complex scenarios, this is often found in if or for statements.
CPUs and compilers are constantly changing. Whatever low level code trick that made sense 3 CPU chips ago with a different compiler may actually be slower on the current architecture and there may be a good chance that this trick may confuse whoever is maintaining this code in the future.

++i can be faster than i++, because it avoids creating a temporary.
Whether this still holds for modern C/C++/Java/C# compilers, I don't know. It might well be different for user-defined types with overloaded operators, whereas in the case of simple integers it probably doesn't matter.
But I've come to like the syntax... it reads like "increment i" which is a sensible order.

Using template metaprogramming to calculate things at compile time instead of at run-time.

Years ago with a not-so-smart compilier, I got great mileage from function inlining, walking pointers instead of indexing arrays, and iterating down to zero instead of up to a maximum.
When in doubt, a little knowledge of assembly will let you look at what the compiler is producing and attack the inefficient parts (in your source language, using structures friendlier to your compiler.)

precalculating values.
For instance, instead of sin(a) or cos(a), if your application doesn't necessarily need angles to be very precise, maybe you represent angles in 1/256 of a circle, and create arrays of floats sine[] and cosine[] precalculating the sin and cos of those angles.
And, if you need a vector at some angle of a given length frequently, you might precalculate all those sines and cosines already multiplied by that length.
Or, to put it more generally, trade memory for speed.
Or, even more generally, "All programming is an exercise in caching" -- Terje Mathisen
Some things are less obvious. For instance traversing a two dimensional array, you might do something like
for (x=0;x<maxx;x++)
for (y=0;y<maxy;y++)
do_something(a[x,y]);
You might find the processor cache likes it better if you do:
for (y=0;y<maxy;y++)
for (x=0;x<maxx;x++)
do_something(a[x,y]);
or vice versa.

Don't do loop unrolling. Don't do Duff's device. Make your loops as small as possible, anything else inhibits x86 performance and gcc optimizer performance.
Getting rid of branches can be useful, though - so getting rid of loops completely is good, and those branchless math tricks really do work. Beyond that, try never to go out of the L2 cache - this means a lot of precalculation/caching should also be avoided if it wastes cache space.
And, especially for x86, try to keep the number of variables in use at any one time down. It's hard to tell what compilers will do with that kind of thing, but usually having less loop iteration variables/array indexes will end up with better asm output.
Of course, this is for desktop CPUs; a slow CPU with fast memory access can precalculate a lot more, but in these days that might be an embedded system with little total memory anyway…

I've found that changing from a pointer to indexed access may make a difference; the compiler has different instruction forms and register usages to choose from. Vice versa, too. This is extremely low-level and compiler dependent, though, and only good when you need that last few percent.
E.g.
for (i = 0; i < n; ++i)
*p++ = ...; // some complicated expression
vs.
for (i = 0; i < n; ++i)
p[i] = ...; // some complicated expression

Optimizing cache locality - for example when multiplying two matrices that don't fit into cache.

Allocating with new on a pre-allocated buffer using C++'s placement new.

Counting down a loop. It's cheaper to compare against 0 than N:
for (i = N; --i >= 0; ) ...
Shifting and masking by powers of two is cheaper than division and remainder, / and %
#define WORD_LOG 5
#define SIZE (1 << WORD_LOG)
#define MASK (SIZE - 1)
uint32_t bits[K]
void set_bit(unsigned i)
{
bits[i >> WORD_LOG] |= (1 << (i & MASK))
}
Edit
(i >> WORD_LOG) == (i / SIZE) and
(i & MASK) == (i % SIZE)
because SIZE is 32 or 2^5.

Jon Bentley's Writing Efficient Programs is a great source of low- and high-level techniques -- if you can find a copy.

Eliminating branches (if/elses) by using boolean math:
if(x == 0)
x = 5;
// becomes:
x += (x == 0) * 5;
// if '5' was a base 2 number, let's say 4:
x += (x == 0) << 2;
// divide by 2 if flag is set
sum >>= (blendMode == BLEND);
This REALLY speeds things out especially when those ifs are in a loop or somewhere that is being called a lot.

The one from Assembler:
xor ax, ax
instead of:
mov ax, 0
Classical optimization for program size and performance.

In SQL, if you only need to know whether any data exists or not, don't bother with COUNT(*):
SELECT 1 FROM table WHERE some_primary_key = some_value
If your WHERE clause is likely return multiple rows, add a LIMIT 1 too.
(Remember that databases can't see what your code's doing with their results, so they can't optimise these things away on their own!)

Recycling the frame-pointer all of a sudden
Pascal calling-convention
Rewrite stack-frame tail call optimizarion (although it sometimes messes with the above)
Using vfork() instead of fork() before exec()
And one I am still looking for, an excuse to use: data driven code-generation at runtime

Liberal use of __restrict to eliminate load-hit-store stalls.

Rolling up loops.
Seriously, the last time I needed to do anything like this was in a function that took 80% of the runtime, so it was worth trying to micro-optimize if I could get a noticeable performance increase.
The first thing I did was to roll up the loop. This gave me a very significant speed increase. I believe this was a matter of cache locality.
The next thing I did was add a layer of indirection, and put some more logic into the loop, which allowed me to only loop through the things I needed. This wasn't as much of a speed increase, but it was worth doing.
If you're going to micro-optimize, you need to have a reasonable idea of two things: the architecture you're actually using (which is vastly different from the systems I grew up with, at least for micro-optimization purposes), and what the compiler will do for you.
A lot of the traditional micro-optimizations trade space for time. Nowadays, using more space increases the chances of a cache miss, and there goes your performance. Moreover, a lot of them are now done by modern compilers, and typically better than you're likely to do them.
Currently, you should (a) profile to see if you need to micro-optimize, and then (b) try to trade computation for space, in the hope of keeping as much as possible in cache. Finally, run some tests, so you know if you've improved things or screwed them up. Modern compilers and chips are far too complex for you to keep a good mental model, and the only way you'll know if some optimization works or not is to test.

In addition to Joshua's comment about code generation (a big win), and other good suggestions, ...
I'm not sure if you would call it "low-level", but (and this is downvote-bait) 1) stay away from using any more levels of abstraction than absolutely necessary, and 2) stay away from event-driven notification-style programming, if possible.
If a computer executing a program is like a car running a race, a method call is like a detour. That's not necessarily bad except there's a strong temptation to nest those things, because once you're written a method call, you tend to forget what that call could cost you.
If your're relying on events and notifications, it's because you have multiple data structures that need to be kept in agreement. This is costly, and should only be done if you can't avoid it.
In my experience, the biggest performance killers are too much data structure and too much abstraction.

I was amazed at the speedup I got by replacing a for loop adding numbers together in structs:
const unsigned long SIZE = 100000000;
typedef struct {
int a;
int b;
int result;
} addition;
addition *sum;
void start() {
unsigned int byte_count = SIZE * sizeof(addition);
sum = malloc(byte_count);
unsigned int i = 0;
if (i < SIZE) {
do {
sum[i].a = i;
sum[i].b = i;
i++;
} while (i < SIZE);
}
}
void test_func() {
unsigned int i = 0;
if (i < SIZE) { // this is about 30% faster than the more obvious for loop, even with O3
do {
addition *s1 = &sum[i];
s1->result = s1->b + s1->a;
i++;
} while ( i<SIZE );
}
}
void finish() {
free(sum);
}
Why doesn't gcc optimise for loops into this? Or is there something I missed? Some cache effect?

Is a variable named i unacceptable? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
As far as variable naming conventions go, should iterators be named i or something more semantic like count? If you don't use i, why not? If you feel that i is acceptable, are there cases of iteration where it shouldn't be used?

Depends on the context I suppose. If you where looping through a set of Objects in some
collection then it should be fairly obvious from the context what you are doing.
for(int i = 0; i < 10; i++)
{
// i is well known here to be the index
objectCollection[i].SomeProperty = someValue;
}
However if it is not immediately clear from the context what it is you are doing, or if you are making modifications to the index you should use a variable name that is more indicative of the usage.
for(int currentRow = 0; currentRow < numRows; currentRow++)
{
for(int currentCol = 0; currentCol < numCols; currentCol++)
{
someTable[currentRow][currentCol] = someValue;
}
}

"i" means "loop counter" to a programmer. There's nothing wrong with it.

Here's another example of something that's perfectly okay:
foreach (Product p in ProductList)
{
// Do something with p
}

I tend to use i, j, k for very localized loops (only exist for a short period in terms of number of source lines). For variables that exist over a larger source area, I tend to use more detailed names so I can see what they're for without searching back in the code.
By the way, I think that the naming convention for these came from the early Fortran language where I was the first integer variable (A - H were floats)?

i is acceptable, for certain. However, I learned a tremendous amount one semester from a C++ teacher I had who refused code that did not have a descriptive name for every single variable. The simple act of naming everything descriptively forced me to think harder about my code, and I wrote better programs after that course, not from learning C++, but from learning to name everything. Code Complete has some good words on this same topic.

i is fine, but something like this is not:
for (int i = 0; i < 10; i++)
{
for (int j = 0; j < 10; j++)
{
string s = datarow[i][j].ToString(); // or worse
}
}
Very common for programmers to inadvertently swap the i and the j in the code, especially if they have bad eyesight or their Windows theme is "hotdog". This is always a "code smell" for me - it's kind of rare when this doesn't get screwed up.

i is so common that it is acceptable, even for people that love descriptive variable names.
What is absolutely unacceptable (and a sin in my book) is using i,j, or k in any other context than as an integer index in a loop.... e.g.
foreach(Input i in inputs)
{
Process(i);
}

i is definitely acceptable. Not sure what kind of justification I need to make -- but I do use it all of the time, and other very respected programmers do as well.
Social validation, I guess :)

Yes, in fact it's preferred since any programmer reading your code will understand that it's simply an iterator.

What is the value of using i instead of a more specific variable name? To save 1 second or 10 seconds or maybe, maybe, even 30 seconds of thinking and typing?
What is the cost of using i? Maybe nothing. Maybe the code is so simple that using i is fine. But maybe, maybe, using i will force developers who come to this code in the future to have to think for a moment "what does i mean here?" They will have to think: "is it an index, a count, an offset, a flag?" They will have to think: "is this change safe, is it correct, will I be off by 1?"
Using i saves time and intellectual effort when writing code but may end up costing more intellectual effort in the future, or perhaps even result in the inadvertent introduction of defects due to misunderstanding the code.
Generally speaking, most software development is maintenance and extension, so the amount of time spent reading your code will vastly exceed the amount of time spent writing it.
It's very easy to develop the habit of using meaningful names everywhere, and once you have that habit it takes only a few seconds more to write code with meaningful names, but then you have code which is easier to read, easier to understand, and more obviously correct.

I use i for short loops.
The reason it's OK is that I find it utterly implausible that someone could see a declaration of iterator type, with initializer, and then three lines later claim that it's not clear what the variable represents. They're just pretending, because they've decided that "meaningful variable names" must mean "long variable names".
The reason I actually do it, is that I find that using something unrelated to the specific task at hand, and that I would only ever use in a small scope, saves me worrying that I might use a name that's misleading, or ambiguous, or will some day be useful for something else in the larger scope. The reason it's "i" rather than "q" or "count" is just convention borrowed from mathematics.
I don't use i if:
The loop body is not small, or
the iterator does anything other than advance (or retreat) from the start of a range to the finish of the loop:
i doesn't necessarily have to go in increments of 1 so long as the increment is consistent and clear, and of course might stop before the end of the iterand, but if it ever changes direction, or is unmodified by an iteration of the loop (including the devilish use of iterator.insertAfter() in a forward loop), I try to remember to use something different. This signals "this is not just a trivial loop variable, hence this may not be a trivial loop".

If the "something more semantic" is "iterator" then there is no reason not to use i; it is a well understood idiom.

i think i is completely acceptable in for-loop situations. i have always found this to be pretty standard and never really run into interpretation issues when i is used in this instance. foreach-loops get a little trickier and i think really depends on your situation. i rarely if ever use i in foreach, only in for loops, as i find i to be too un-descriptive in these cases. for foreach i try to use an abbreviation of the object type being looped. e.g:
foreach(DataRow dr in datatable.Rows)
{
//do stuff to/with datarow dr here
}
anyways, just my $0.02.

It helps if you name it something that describes what it is looping through. But I usually just use i.

As long as you are either using i to count loops, or part of an index that goes from 0 (or 1 depending on PL) to n, then I would say i is fine.
Otherwise its probably easy to name i something meaningful it its more than just an index.

I should point out that i and j are also mathematical notation for matrix indices. And usually, you're looping over an array. So it makes sense.

As long as you're using it temporarily inside a simple loop and it's obvious what you're doing, sure. That said, is there no other short word you can use instead?
i is widely known as a loop iterator, so you're actually more likely to confuse maintenance programmers if you use it outside of a loop, but if you use something more descriptive (like filecounter), it makes code nicer.

It depends.
If you're iterating over some particular set of data then I think it makes more sense to use a descriptive name. (eg. filecounter as Dan suggested).
However, if you're performing an arbitrary loop then i is acceptable. As one work mate described it to me - i is a convention that means "this variable is only ever modified by the for loop construct. If that's not true, don't use i"

The use of i, j, k for INTEGER loop counters goes back to the early days of FORTRAN.
Personally I don't have a problem with them so long as they are INTEGER counts.
But then I grew up on FORTRAN!

my feeling is that the concept of using a single letter is fine for "simple" loops, however, i learned to use double-letters a long time ago and it has worked out great.
i asked a similar question last week and the following is part of my own answer:// recommended style ● // "typical" single-letter style
●
for (ii=0; ii<10; ++ii) { ● for (i=0; i<10; ++i) {
for (jj=0; jj<10; ++jj) { ● for (j=0; j<10; ++j) {
mm[ii][jj] = ii * jj; ● m[i][j] = i * j;
} ● }
} ● }
in case the benefit isn't immediately obvious: searching through code for any single letter will find many things that aren't what you're looking for. the letter i occurs quite often in code where it isn't the variable you're looking for.
i've been doing it this way for at least 10 years.
note that plenty of people commented that either/both of the above are "ugly"...

I am going to go against the grain and say no.
For the crowd that says "i is understood as an iterator", that may be true, but to me that is the equivalent of comments like 'Assign the value 5 to variable Y. Variable names like comment should explain the why/what not the how.
To use an example from a previous answer:
for(int i = 0; i < 10; i++)
{
// i is well known here to be the index
objectCollection[i].SomeProperty = someValue;
}
Is it that much harder to just use a meaningful name like so?
for(int objectCollectionIndex = 0; objectCollectionIndex < 10; objectCollectionIndex ++)
{
objectCollection[objectCollectionIndex].SomeProperty = someValue;
}
Granted the (borrowed) variable name objectCollection is pretty badly named too.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas