Is a variable named i unacceptable? [closed] - variables

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
As far as variable naming conventions go, should iterators be named i or something more semantic like count? If you don't use i, why not? If you feel that i is acceptable, are there cases of iteration where it shouldn't be used?

Depends on the context I suppose. If you were looping through a set of objects in some collection then it should be fairly obvious from the context what you are doing.
for(int i = 0; i < 10; i++)
{
// i is well known here to be the index
objectCollection[i].SomeProperty = someValue;
}
However if it is not immediately clear from the context what it is you are doing, or if you are making modifications to the index you should use a variable name that is more indicative of the usage.
for(int currentRow = 0; currentRow < numRows; currentRow++)
{
for(int currentCol = 0; currentCol < numCols; currentCol++)
{
someTable[currentRow][currentCol] = someValue;
}
}

"i" means "loop counter" to a programmer. There's nothing wrong with it.

Here's another example of something that's perfectly okay:
foreach (Product p in ProductList)
{
// Do something with p
}

I tend to use i, j, k for very localized loops (only exist for a short period in terms of number of source lines). For variables that exist over a larger source area, I tend to use more detailed names so I can see what they're for without searching back in the code.
By the way, I think that the naming convention for these came from early Fortran, where I was the first letter whose variables defaulted to INTEGER (names starting with A through H were floats).

i is acceptable, for certain. However, I learned a tremendous amount one semester from a C++ teacher I had who refused code that did not have a descriptive name for every single variable. The simple act of naming everything descriptively forced me to think harder about my code, and I wrote better programs after that course, not from learning C++, but from learning to name everything. Code Complete has some good words on this same topic.

i is fine, but something like this is not:
for (int i = 0; i < 10; i++)
{
for (int j = 0; j < 10; j++)
{
string s = datarow[i][j].ToString(); // or worse
}
}
Very common for programmers to inadvertently swap the i and the j in the code, especially if they have bad eyesight or their Windows theme is "hotdog". This is always a "code smell" for me - it's kind of rare when this doesn't get screwed up.

i is so common that it is acceptable, even for people that love descriptive variable names.
What is absolutely unacceptable (and a sin in my book) is using i, j, or k in any other context than as an integer index in a loop... e.g.
foreach(Input i in inputs)
{
Process(i);
}

i is definitely acceptable. Not sure what kind of justification I need to make -- but I do use it all of the time, and other very respected programmers do as well.
Social validation, I guess :)

Yes, in fact it's preferred since any programmer reading your code will understand that it's simply an iterator.

What is the value of using i instead of a more specific variable name? To save 1 second or 10 seconds or maybe, maybe, even 30 seconds of thinking and typing?
What is the cost of using i? Maybe nothing. Maybe the code is so simple that using i is fine. But maybe, maybe, using i will force developers who come to this code in the future to have to think for a moment "what does i mean here?" They will have to think: "is it an index, a count, an offset, a flag?" They will have to think: "is this change safe, is it correct, will I be off by 1?"
Using i saves time and intellectual effort when writing code but may end up costing more intellectual effort in the future, or perhaps even result in the inadvertent introduction of defects due to misunderstanding the code.
Generally speaking, most software development is maintenance and extension, so the amount of time spent reading your code will vastly exceed the amount of time spent writing it.
It's very easy to develop the habit of using meaningful names everywhere, and once you have that habit it takes only a few seconds more to write code with meaningful names, but then you have code which is easier to read, easier to understand, and more obviously correct.

I use i for short loops.
The reason it's OK is that I find it utterly implausible that someone could see a declaration of iterator type, with initializer, and then three lines later claim that it's not clear what the variable represents. They're just pretending, because they've decided that "meaningful variable names" must mean "long variable names".
The reason I actually do it, is that I find that using something unrelated to the specific task at hand, and that I would only ever use in a small scope, saves me worrying that I might use a name that's misleading, or ambiguous, or will some day be useful for something else in the larger scope. The reason it's "i" rather than "q" or "count" is just convention borrowed from mathematics.
I don't use i if:
The loop body is not small, or
the iterator does anything other than advance (or retreat) from the start of a range to the finish of the loop:
i doesn't necessarily have to go in increments of 1 so long as the increment is consistent and clear, and of course might stop before the end of the iterand, but if it ever changes direction, or is unmodified by an iteration of the loop (including the devilish use of iterator.insertAfter() in a forward loop), I try to remember to use something different. This signals "this is not just a trivial loop variable, hence this may not be a trivial loop".
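To make that last point concrete, here is a hedged C sketch (input, inputLen, and the helper match_token are made-up names) of a loop whose position variable is moved inside the body, so a plain i would understate what it does:
#include <stddef.h>

size_t match_token(const char *s);   /* assumed helper: returns length of the match, 0 if none */

/* Sketch only: the position can jump forward by a whole token rather than by 1,
 * so it gets a name that says what it tracks instead of plain "i". */
void scan_input(const char *input, size_t inputLen)
{
    size_t scanPos = 0;
    while (scanPos < inputLen) {
        size_t tokenLen = match_token(&input[scanPos]);
        if (tokenLen > 0)
            scanPos += tokenLen;   /* skip past the matched token */
        else
            scanPos += 1;          /* no match: advance one character */
    }
}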

If the "something more semantic" is "iterator" then there is no reason not to use i; it is a well understood idiom.

i think i is completely acceptable in for-loop situations. i have always found this to be pretty standard and never really run into interpretation issues when i is used in this instance. foreach-loops get a little trickier and i think really depends on your situation. i rarely if ever use i in foreach, only in for loops, as i find i to be too un-descriptive in these cases. for foreach i try to use an abbreviation of the object type being looped. e.g:
foreach(DataRow dr in datatable.Rows)
{
//do stuff to/with datarow dr here
}
anyways, just my $0.02.

It helps if you name it something that describes what it is looping through. But I usually just use i.

As long as you are either using i to count loops, or part of an index that goes from 0 (or 1 depending on PL) to n, then I would say i is fine.
Otherwise it's probably easy to name i something more meaningful if it's more than just an index.

I should point out that i and j are also mathematical notation for matrix indices. And usually, you're looping over an array. So it makes sense.

As long as you're using it temporarily inside a simple loop and it's obvious what you're doing, sure. That said, is there no other short word you can use instead?
i is widely known as a loop iterator, so you're actually more likely to confuse maintenance programmers if you use it outside of a loop, but if you use something more descriptive (like filecounter), it makes code nicer.

It depends.
If you're iterating over some particular set of data then I think it makes more sense to use a descriptive name. (eg. filecounter as Dan suggested).
However, if you're performing an arbitrary loop then i is acceptable. As one work mate described it to me - i is a convention that means "this variable is only ever modified by the for loop construct. If that's not true, don't use i"

The use of i, j, k for INTEGER loop counters goes back to the early days of FORTRAN.
Personally I don't have a problem with them so long as they are INTEGER counts.
But then I grew up on FORTRAN!

my feeling is that the concept of using a single letter is fine for "simple" loops, however, i learned to use double-letters a long time ago and it has worked out great.
i asked a similar question last week and the following is part of my own answer:
// recommended style
for (ii=0; ii<10; ++ii) {
    for (jj=0; jj<10; ++jj) {
        mm[ii][jj] = ii * jj;
    }
}
// "typical" single-letter style
for (i=0; i<10; ++i) {
    for (j=0; j<10; ++j) {
        m[i][j] = i * j;
    }
}
in case the benefit isn't immediately obvious: searching through code for any single letter will find many things that aren't what you're looking for. the letter i occurs quite often in code where it isn't the variable you're looking for.
i've been doing it this way for at least 10 years.
note that plenty of people commented that either/both of the above are "ugly"...

I am going to go against the grain and say no.
For the crowd that says "i is understood as an iterator", that may be true, but to me that is the equivalent of comments like "Assign the value 5 to variable Y". Variable names, like comments, should explain the why/what, not the how.
To use an example from a previous answer:
for(int i = 0; i < 10; i++)
{
// i is well known here to be the index
objectCollection[i].SomeProperty = someValue;
}
Is it that much harder to just use a meaningful name like so?
for(int objectCollectionIndex = 0; objectCollectionIndex < 10; objectCollectionIndex++)
{
objectCollection[objectCollectionIndex].SomeProperty = someValue;
}
Granted the (borrowed) variable name objectCollection is pretty badly named too.

Related

Variable Declaration Inside For Loop Initialization Statement

I have a simple question with regards to initializing for loops.
Here is my for loop declaration:
for (int i=player.x-xIndex-1; i<=player.x+xIndex+1; i++)
{
for (int j=player.y-yIndex-1; j<=player.y+yIndex+1; j++)
{
}
}
My question is:
Is it bad practice to have the values of the indices i and j be set to non-static integer values at declaration?
Will the code just evaluate the minimum and maximum values of i and j once at the beginning of execution, or will it evaluate those values (i.e. player.x+xIndex+1, etc.) every single time the loop executes?
Any light you guys can shed on my problem would be awesome!
I'm a freakin' amateur, guys. Seriously.
Thanks :D
Not an amateur question at all. The "initialization" expressions are calculated only on the first run through, because of course they're only used that one time.
For the loop's "condition" (the middle expression, which is tested before every iteration), in the worst case it can be evaluated every iteration. Because what if (in this case) player.y actually changes during the loop?
However, most modern compilers will likely not compute that whole thing every loop if they can detect that the end value is provably never changing during the loop.
If you wanted to be double sure and manhandle the path of execution, you can explicitly "hoist" the conditional end expression out of the loop yourself, like:
int maxValue = foo.x + y.bar + 12 + myString.length;
for (int i = 0; i < maxValue; i++) {
    // ...
}
But now the standard style disclaimer: optimizing prematurely can make your code less readable for no provable gains. Unless you're doing real work in that condition expression, or the loop is running bazillions of iterations, some additional computation won't hurt you much, and might be worth keeping so that it's clearer to yourself and others what you're trying to do.

What naming conventions should I use on the second integer on a nested for loop?

I'm pretty new to programming, and I was just wondering in the following case what would be an appropriate name for the second integer I use in this piece of code
for (int i = 0; i < 10; i++)
{
for (int x = 0; x < 10; x++)
{
//stuff
}
}
I usually just name it x but I have a feeling that this could get confusing quickly. Is there a standard name for this kind of thing?
Depending upon what you're iterating over, a name might be easy or obvious by context:
for (struct mail *mail = inbox->start; mail; mail = mail->next) {
    for (struct attachment *att = mail->attachments; att; att = att->next) {
        /* work on all attachments on all mails */
    }
}
For the cases where i makes the most sense for an outer loop variable, convention uses j, k, l, and so on.
But when you start nesting, look harder for meaningful names. You'll thank yourself in six months.
You could opt to reduce the nesting by making a method call. Inside of this method, you would be using a local variable also named i.
for (int i = 0; i < 10; i++)
{
methodCall(array[i], array);
}
I have assumed you need to pass the element at position i in the outer loop as well as the array to be iterated over in the inner loop - this is an assumption as you may actually require different arguments.
As always, you should measure the performance of this - there shouldn't be a massive overhead in making a method call within a loop, but this depends on the language.
Personally I feel that you should give variables meaningful names - here i and x mean nothing and will not help you understand your code in 3 months time, at which point it will appear to you as code written by a dyslexic monkey.
Name variables so that other people can understand what your code is trying to accomplish. You will save yourself time in the long run.
Since you said you are beginning, I'd say it's beneficial to experiment with multiple styles.
For the purposes of your example, my suggestion is simply replace x with j.
There's tons of real code that will use the convention of i, j, and k for single letter nested loop variables.
There's also tons that uses longer more meaningful names.
But there's much less that looks like your example.
So you can consider it a step forward because your code looks more like real world code.

Recycling variable name within single function

I have a function that contains two for loops, and I'm using a variable called count as the counter. I've chosen to recycle the name as the first loop will finish its execution completely before the second one begins, so there is no chance of the counters interfering with each other. The G++ compiler has taken exception to this via the following warning:
error: name lookup of ‘count’ changed for ISO ‘for’ scoping
note: (if you use ‘-fpermissive’ G++ will accept your code)
Is variable recycling considered bad practice in professional software development, or is it a situational concern, and what other implications have I missed here?
Are you doing this?
for(int count = 0; ...)
{
...
}
for(count = 0; ...)
{
...
}
I doubt gcc would like that, as the second count isn't in scope; the declaration only applies to the first for loop, but gcc has options to accept poor code. If you either make the second one int count or move the declaration to the outer scope, gcc should be happy.
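To make that concrete, here is a hedged sketch of both fixes (firstLimit and secondLimit are placeholders); either shape compiles cleanly without -fpermissive:
// Fix 1: declare a fresh counter in each loop's own scope
for (int count = 0; count < firstLimit; count++) {
    // ...
}
for (int count = 0; count < secondLimit; count++) {
    // ...
}

// Fix 2: declare the counter once in the enclosing scope and reuse it
int count;
for (count = 0; count < firstLimit; count++) {
    // ...
}
for (count = 0; count < secondLimit; count++) {
    // ...
}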
This depends on the circumstances, but I generally don't reuse variables. The name of the variable should reflect its purpose, and switching part way through a function can be confusing. Declare what you need, let the compiler take care of the optimizations.
Steve McConnell recommends not reusing local variables in functions in Code Complete.
He's not the definitive voice of practice in professional software development, but he's about as close as you're going to get to a definitive voice.
The argument is that it makes it harder to read the code.
What are you counting? Name the variables after that.
It sounds like you're defining the variable in the for? i.e. "for (int count = 0; count < x; count++)"? If so, that could be problematic, as well as unclear. If you're going to use it in a second for loop, define it outside both loops.
If you're using loop counter variables like this, then it usually doesn't matter.
for (int i ...; ... ; ...) {
...
}
for (int i ...; ... ; ...) {
...
}
however, if you're intending to shadow another variable:
int i ...;
for (int i ...; ... ; ...) {
...
}
that's a red flag.

Is while (true) with break bad programming practice?

I often use this code pattern:
while(true) {
//do something
if(<some condition>) {
break;
}
}
Another programmer told me that this was bad practice and that I should replace it with the more standard:
while(!<some condition>) {
//do something
}
His reasoning was that you could "forget the break" too easily and have an endless loop. I told him that in the second example you could just as easily put in a condition which never returned true and so just as easily have an endless loop, so both are equally valid practices.
Further, I often prefer the former as it makes the code easier to read when you have multiple break points, i.e. multiple conditions which get out of the loop.
Can anyone enrichen this argument by adding evidence for one side or the other?
There is a discrepancy between the two examples. The first will execute the "do something" at least once every time even if the statement is never true. The second will only "do something" when the statement evaluates to true.
I think what you are looking for is a do-while loop. I 100% agree that while (true) is not a good idea because it makes it hard to maintain this code, and the way you are escaping the loop is very goto-esque, which is considered bad practice.
Try:
do {
//do something
} while (!something);
Check your individual language documentation for the exact syntax. But look at this code, it basically does what is in the do, then checks the while portion to see if it should do it again.
To quote that noted developer of days gone by, Wordsworth:
...
In truth the prison, unto which we doom
Ourselves, no prison is; and hence for me,
In sundry moods, 'twas pastime to be bound
Within the Sonnet's scanty plot of ground;
Pleased if some souls (for such their needs must be)
Who have felt the weight of too much liberty,
Should find brief solace there, as I have found.
Wordsworth accepted the strict requirements of the sonnet as a liberating frame, rather than as a straightjacket. I'd suggest that the heart of "structured programming" is about giving up the freedom to build arbitrarily-complex flow graphs in favor of a liberating ease of understanding.
I freely agree that sometimes an early exit is the simplest way to express an action. However, my experience has been that when I force myself to use the simplest possible control structures (and really think about designing within those constraints), I most often find that the result is simpler, clearer code. The drawback with
while (true) {
action0;
if (test0) break;
action1;
}
is that it's easy to let action0 and action1 become larger and larger chunks of code, or to add "just one more" test-break-action sequence, until it becomes difficult to point to a specific line and answer the question, "What conditions do I know hold at this point?" So, without making rules for other programmers, I try to avoid the while (true) {...} idiom in my own code whenever possible.
When you can write your code in the form
while (condition) { ... }
or
while (!condition) { ... }
with no exits (break, continue, or goto) in the body, that form is preferred, because someone can read the code and understand the termination condition just by looking at the header. That's good.
But lots of loops don't fit this model, and the infinite loop with explicit exit(s) in the middle is an honorable model. (Loops with continue are usually harder to understand than loops with break.) If you want some evidence or authority to cite, look no further than Don Knuth's famous paper on Structured Programming with Goto Statements; you will find all the examples, arguments, and explanations you could want.
A minor point of idiom: writing while (true) { ... } brands you as an old Pascal programmer or perhaps these days a Java programmer. If you are writing in C or C++, the preferred idiom is
for (;;) { ... }
There's no good reason for this, but you should write it this way because this is the way C programmers expect to see it.
I prefer
while(!<some condition>) {
//do something
}
but I think it's more a matter of readability, rather than the potential to "forget the break." I think that forgetting the break is a rather weak argument, as that would be a bug and you'd find and fix it right away.
The argument I have against using a break to get out of an endless loop is that you're essentially using the break statement as a goto. I'm not religiously against using goto (if the language supports it, it's fair game), but I do try to replace it if there's a more readable alternative.
In the case of many break points I would replace them with
while( !<some condition> ||
!<some other condition> ||
!<something completely different> ) {
//do something
}
Consolidating all of the stop conditions this way makes it a lot easier to see what's going to end this loop. break statements could be sprinkled around, and that's anything but readable.
while (true) might make sense if you have many statements and you want to stop if any fail
while (true) {
if (!function1() ) return;
if (!function2() ) return;
if (!function3() ) return;
if (!function4() ) return;
}
is better than
while (!fail) {
if (!fail) {
fail = !function1();
}
if (!fail) {
fail = !function2();
}
........
}
Javier made an interesting comment on my earlier answer (the one quoting Wordsworth):
I think while(true){} is a more 'pure' construct than while(condition){}.
and I couldn't respond adequately in 300 characters (sorry!)
In my teaching and mentoring, I've informally defined "complexity" as "How much of the rest of the code I need to have in my head to be able to understand this single line or expression?" The more stuff I have to bear in mind, the more complex the code is. The more the code tells me explicitly, the less complex.
So, with the goal of reducing complexity, let me reply to Javier in terms of completeness and strength rather than purity.
I think of this code fragment:
while (c1) {
// p1
a1;
// p2
...
// pz
az;
}
as expressing two things simultaneously:
the (entire) body will be repeated as long as c1 remains true, and
at point 1, where a1 is performed, c1 is guaranteed to hold.
The difference is one of perspective; the first of these has to do with the outer, dynamic behavior of the entire loop in general, while the second is useful to understanding the inner, static guarantee which I can count on while thinking about a1 in particular. Of course the net effect of a1 may invalidate c1, requiring that I think harder about what I can count on at point 2, etc.
Let's put a specific (tiny) example in place to think about the condition and first action:
while (index < length(someString)) {
// p1
char c = someString.charAt(index++);
// p2
...
}
The "outer" issue is that the loop is clearly doing something within someString that can only be done as long as index is positioned in the someString. This sets up an expectation that we'll be modifying either index or someString within the body (at a location and manner not known until I examine the body) so that termination eventually occurs. That gives me both context and expectation for thinking about the body.
The "inner" issue is that we're guaranteed that the action following point 1 will be legal, so while reading the code at point 2 I can think about what is being done with a char value I know has been legally obtained. (We can't even evaluate the condition if someString is a null ref, but I'm also assuming we've guarded against that in the context around this example!)
In contrast, a loop of the form:
while (true) {
// p1
a1;
// p2
...
}
lets me down on both issues. At the outer level, I am left wondering whether this means that I really should expect this loop to cycle forever (e.g. the main event dispatch loop of an operating system), or whether there's something else going on. This gives me neither an explicit context for reading the body, nor an expectation of what constitutes progress toward (uncertain) termination.
At the inner level, I have absolutely no explicit guarantee about any circumstances that may hold at point 1. The condition true, which is of course true everywhere, is the weakest possible statement about what we can know at any point in the program. Understanding the preconditions of an action is very valuable information when trying to think about what the action accomplishes!
So, I suggest that the while (true) ... idiom is much more incomplete and weak, and therefore more complex, than while (c1) ... according to the logic I've described above.
The problem is that not every algorithm sticks to the "while(cond){action}" model.
The general loop model is like this :
loop_prepare
loop:
action_A
if(cond) exit_loop
action_B
goto loop
after_loop_code
When there is no action_A you can replace it by :
loop_prepare
while(cond)
action_B
after_loop_code
When there is no action_B you can replace it by :
loop_prepare
do action_A
while(cond)
after_loop_code
In the general case, action_A will be executed n times and action_B will be executed (n-1) times.
A real life example is : print all the elements of a table separated by commas.
We want all the n elements with (n-1) commas.
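For concreteness, here is a hedged C sketch of that comma example in the while-true-break shape, where action_A prints an element (n times) and action_B prints the separator (n-1 times):
#include <stdio.h>

void print_csv(const int *items, int n)
{
    if (n == 0)
        return;
    int i = 0;
    while (1) {
        printf("%d", items[i]);   /* action_A: runs n times */
        i++;
        if (i >= n)
            break;                /* cond: stop after the last element */
        printf(", ");             /* action_B: runs n-1 times */
    }
    printf("\n");
}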
You can always do some tricks to stick to the while-loop model, but this will always repeat code, or check the same condition twice (on every loop), or add a new variable. So you will always be less efficient and less readable than the while-true-break loop model.
Example of (bad) "trick" : add variable and condition
loop_prepare
b=true // one more local variable : more complex code
while(b): // one more condition on every loop : less efficient
action_A
if(cond) b=false // the real condition is here
else action_B
after_loop_code
Example of (bad) "trick" : repeat the code. The repeated code must not be forgotten while modifying one of the two sections.
loop_prepare
action_A
while(cond):
action_B
action_A
after_loop_code
Note : in the last example, the programmer can obfuscate (willingly or not) the code by mixing the "loop_prepare" with the first "action_A", and action_B with the second action_A. So he can have the feeling he is not doing this.
The first is OK if there are many ways to break from the loop, or if the break condition cannot be expressed easily at the top of the loop (for example, the content of the loop needs to run halfway but the other half must not run, on the last iteration).
But if you can avoid it, you should, because programming should be about writing very complex things in the most obvious way possible, while also implementing features correctly and performantly. That's why your friend is, in the general case, correct. Your friend's way of writing loop constructs is much more obvious (assuming the conditions described in the preceding paragraph do not obtain).
There's a substantially identical question already in SO at Is WHILE TRUE…BREAK…END WHILE a good design?. @Glomek answered (in an underrated post):
Sometimes it's very good design. See Structured Programing With Goto Statements by Donald Knuth for some examples. I use this basic idea often for loops that run "n and a half times," especially read/process loops. However, I generally try to have only one break statement. This makes it easier to reason about the state of the program after the loop terminates.
Somewhat later, I responded with the related, and also woefully underrated, comment (in part because I didn't notice Glomek's the first time round, I think):
One fascinating article is Knuth's "Structured Programming with go to Statements" from 1974 (available in his book 'Literate Programming', and probably elsewhere too). It discusses, amongst other things, controlled ways of breaking out of loops, and (not using the term) the loop-and-a-half statement.
Ada also provides looping constructs, including
loopname:
loop
...
exit loopname when ...condition...;
...
end loop loopname;
The original question's code is similar to this in intent.
One difference between the referenced SO item and this is the 'final break'; that is a single-shot loop which uses break to exit the loop early. There have been questions on whether that is a good style too - I don't have the cross-reference at hand.
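For readers who haven't met it, the "single-shot loop" looks roughly like this hedged sketch (step_one and friends are made-up helpers): the body runs at most once and break acts as a structured early exit:
int step_one(void), step_two(void);   /* assumed helpers: nonzero on success */
void step_three(void);

void run_steps(void)
{
    do {
        if (!step_one())
            break;            /* skip the remaining steps */
        if (!step_two())
            break;
        step_three();
    } while (0);              /* condition is always false, so it never repeats */
}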
Sometimes you need an infinite loop, for example listening on a port or waiting for a connection.
So while(true)... should not be categorized as good or bad; let the situation decide what to use.
It depends on what you’re trying to do, but in general I prefer putting the conditional in the while.
It’s simpler, since you don't need another test in the code.
It’s easier to read, since you don’t have to go hunting for a break inside the loop.
You’re reinventing the wheel. The whole point of while is to do something as long as a test is true. Why subvert that by putting the break condition somewhere else?
I’d use a while(true) loop if I was writing a daemon or other process that should run until it gets killed.
If there's one (and only one) non-exceptional break condition, putting that condition directly into the control-flow construct (the while) is preferable. Seeing while(true) { ... } makes me as a code-reader think that there's no simple way to enumerate the break conditions, and makes me think "look carefully at this and think carefully about the break conditions (what is set before them in the current loop and what might have been set in the previous loop)".
In short, I'm with your colleague in the simplest case, but while(true){ ... } is not uncommon.
The perfect consultant's answer: it depends. Most cases, the right thing to do is either use a while loop
while (condition is true ) {
// do something
}
or a "repeat until" which is done in a C-like language with
do {
// do something
} while ( condition is true);
If either of these cases works, use them.
Sometimes, like in the inner loop of a server, you really mean that a program should keep going until something external interrupts it. (Consider, eg, an httpd daemon -- it isn't going to stop unless it crashes or it's stopped by a shutdown.)
THEN AND ONLY THEN use a while(1):
while(1) {
accept connection
fork child process
}
Final case is the rare occasion where you want to do some part of the function before terminating. In that case, use:
while(1) { // or for(;;)
// do some stuff
if (condition met) break;
// otherwise do more stuff.
}
I think the benefit of using "while(true)" is probably to make multiple exit conditions easier to write, especially if these exit conditions have to appear in different locations within the code block. However, for me, it could be chaotic when I have to dry-run the code to see how it behaves.
Personally I will try to avoid while(true). The reason is that whenever I look back at the code written previously, I usually find that I need to figure out when it runs/terminates more than what it actually does. Therefore, having to locate the "breaks" first is a bit troublesome for me.
If there is a need for multiple exit condition, I tend to refactor the condition determining logic into a separate function so that the loop block looks clean and easier to understand.
No, that's not bad since you may not always know the exit condition when you set up the loop, or may have multiple exit conditions. However, it does require more care to prevent an infinite loop.
He is probably correct.
Functionally the two can be identical.
However, for readability and understanding program flow, the while(condition) is better. The break smacks more of a goto of sorts. The while (condition) is very clear on the conditions which continue the loop, etc. That doesn't mean break is wrong, just can be less readable.
A few advantages of using the latter construct that come to my mind:
it's easier to understand what the loop is doing without looking for breaks in the loop's code.
if you don't use other breaks in the loop code, there's only one exit point in your loop and that's the while() condition.
generally ends up being less code, which adds to readability.
I prefer the while(!) approach because it more clearly and immediately conveys the intent of the loop.
There has been much talk about readability here, and it's very well constructed, but as with all loops that are not fixed in size (i.e. do-while and while) you run a risk.
His reasoning was that you could "forget the break" too easily and have an endless loop.
With a while(true) loop you are in fact asking for a process that runs indefinitely unless something happens, and if that something does not happen within a certain parameter, you will get exactly what you wanted... an endless loop.
What your friend recommends is different from what you did. Your own code is more akin to
do{
// do something
}while(!<some condition>);
which always run the loop at least once, regardless of the condition.
But there are times breaks are perfectly okay, as mentioned by others. In response to your friend's worry of "forget the break", I often write in the following form:
while(true){
// do something
if(<some condition>) break;
// continue do something
}
With good indentation, the break point is clear to a first-time reader of the code, and it looks as structured as code that breaks at the beginning or bottom of a loop.
It's not so much the while(true) part that's bad, but the fact that you have to break or goto out of it that is the problem. break and goto are not really acceptable methods of flow control.
I also don't really see the point. Even in something that loops through the entire duration of a program, you can at least have a boolean called Quit or something that you set to true to get out of the loop properly, in a loop like while(!Quit)..., rather than just calling break at some arbitrary point and jumping out.
using loops like
while(1) { do stuff }
is necessary in some situations. If you do any embedded systems programming (think microcontrollers like PICs, MSP430, and DSP programming) then almost all your code will be in a while(1) loop. When coding for DSPs sometimes you just need a while(1){} and the rest of the code is an interrupt service routine (ISR).
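As a hedged sketch of that shape (the helpers read_adc_result, configure_hardware, and process_sample are made up; real code is device specific), the main loop spins forever and the ISR just posts a flag:
#include <stdint.h>

extern uint16_t read_adc_result(void);        /* assumed device helpers */
extern void configure_hardware(void);
extern void process_sample(uint16_t sample);

volatile uint8_t sample_ready;                /* set by the ISR, cleared by the main loop */
volatile uint16_t latest_sample;

void adc_isr(void)                            /* hypothetical interrupt service routine */
{
    latest_sample = read_adc_result();
    sample_ready = 1;
}

int main(void)
{
    configure_hardware();
    while (1) {                               /* the "superloop" never exits */
        if (sample_ready) {
            sample_ready = 0;
            process_sample(latest_sample);
        }
        /* optionally sleep until the next interrupt */
    }
}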
If you loop over an external condition (not being changed inside the loop), you use while(t), where t is the condition. However, if the loop stops when the condition changes inside the loop, it's more convenient to have the exit point explicitly marked with break, instead of waiting for it to happen on the next iteration of the loop:
while (true) {
...
a = a + 1;
if (a > 10) break; // right here!
...
}
As was already mentioned in a few other answers, the less code you have to keep in your head while reading a particular line, the better.

What are your favorite low-level code optimization tricks? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion.
Closed 10 years ago.
I know that you should only optimize things when it is deemed necessary. But, if it is deemed necessary, what are your favorite low level (as opposed to algorithmic level) optimization tricks.
For example: loop unrolling.
gcc -O2
Compilers do a lot better job of it than you can.
Picking a power of two for filters, circular buffers, etc.
So very, very convenient.
-Adam
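A hedged sketch of why (the names are made up): with a power-of-two capacity, wrapping an index is a single AND with a mask instead of a divide, modulo, or branch:
#include <stdint.h>

#define RING_SIZE 256                  /* must be a power of two */
#define RING_MASK (RING_SIZE - 1)

static int32_t ring[RING_SIZE];
static unsigned head, tail;            /* free-running counters; 2^32 is a multiple of
                                          RING_SIZE, so they also wrap correctly on overflow */

static void ring_push(int32_t value)
{
    ring[head++ & RING_MASK] = value;  /* wrap with a mask, no % needed */
}

static int32_t ring_pop(void)
{
    return ring[tail++ & RING_MASK];
}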
Why, bit twiddling hacks, of course!
One of the most useful in scientific code is to replace pow(x,4) with x*x*x*x. Pow is almost always more expensive than multiplication. This is followed by
for(int i = 0; i < N; i++)
{
z += x/y;
}
to
double denom = 1/y;
for(int i = 0; i < N; i++)
{
z += x*denom;
}
But my favorite low level optimization is to figure out which calculations can be removed from a loop. It's always faster to do the calculation once rather than N times. Depending on your compiler, some of these may be automatically done for you.
Inspect the compiler's output, then try to coerce it to do something faster.
I wouldn't necessarily call it a low level optimization, but I have saved orders of magnitude more cycles through judicious application of caching than I have through all my applications of low level tricks combined. Many of these methods are application specific.
Having an LRU cache of database queries (or any other IPC based request).
Remembering the last failed database query and returning a failure if re-requested within a certain time frame.
Remembering your location in a large data structure to ensure that if the next request is for the same node, the search is free.
Caching calculation results to prevent duplicate work. In addition to more complex scenarios, this is often found in if or for statements.
CPUs and compilers are constantly changing. Whatever low level code trick that made sense 3 CPU chips ago with a different compiler may actually be slower on the current architecture and there may be a good chance that this trick may confuse whoever is maintaining this code in the future.
++i can be faster than i++, because it avoids creating a temporary.
Whether this still holds for modern C/C++/Java/C# compilers, I don't know. It might well be different for user-defined types with overloaded operators, whereas in the case of simple integers it probably doesn't matter.
But I've come to like the syntax... it reads like "increment i" which is a sensible order.
Using template metaprogramming to calculate things at compile time instead of at run-time.
Years ago with a not-so-smart compiler, I got great mileage from function inlining, walking pointers instead of indexing arrays, and iterating down to zero instead of up to a maximum.
When in doubt, a little knowledge of assembly will let you look at what the compiler is producing and attack the inefficient parts (in your source language, using structures friendlier to your compiler.)
precalculating values.
For instance, instead of sin(a) or cos(a), if your application doesn't necessarily need angles to be very precise, maybe you represent angles in 1/256 of a circle, and create arrays of floats sine[] and cosine[] precalculating the sin and cos of those angles.
And, if you need a vector at some angle of a given length frequently, you might precalculate all those sines and cosines already multiplied by that length.
Or, to put it more generally, trade memory for speed.
Or, even more generally, "All programming is an exercise in caching" -- Terje Mathisen
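A hedged sketch of such tables, assuming the 1/256-of-a-circle angle representation described above (the names are made up):
#include <math.h>

#define ANGLE_STEPS 256                /* 1/256 of a circle per step */

static float sine_table[ANGLE_STEPS];
static float cosine_table[ANGLE_STEPS];

/* Fill the tables once at startup; afterwards sin/cos become array lookups,
 * e.g. sine_table[angle & (ANGLE_STEPS - 1)]. */
static void init_trig_tables(void)
{
    const double two_pi = 6.28318530717958647692;
    for (int a = 0; a < ANGLE_STEPS; a++) {
        sine_table[a]   = (float)sin(a * two_pi / ANGLE_STEPS);
        cosine_table[a] = (float)cos(a * two_pi / ANGLE_STEPS);
    }
}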
Some things are less obvious. For instance traversing a two dimensional array, you might do something like
for (x=0;x<maxx;x++)
for (y=0;y<maxy;y++)
do_something(a[x,y]);
You might find the processor cache likes it better if you do:
for (y=0;y<maxy;y++)
for (x=0;x<maxx;x++)
do_something(a[x,y]);
or vice versa.
Don't do loop unrolling. Don't do Duff's device. Make your loops as small as possible, anything else inhibits x86 performance and gcc optimizer performance.
Getting rid of branches can be useful, though - so getting rid of loops completely is good, and those branchless math tricks really do work. Beyond that, try never to go out of the L2 cache - this means a lot of precalculation/caching should also be avoided if it wastes cache space.
And, especially for x86, try to keep the number of variables in use at any one time down. It's hard to tell what compilers will do with that kind of thing, but usually having less loop iteration variables/array indexes will end up with better asm output.
Of course, this is for desktop CPUs; a slow CPU with fast memory access can precalculate a lot more, but in these days that might be an embedded system with little total memory anyway…
I've found that changing from a pointer to indexed access may make a difference; the compiler has different instruction forms and register usages to choose from. Vice versa, too. This is extremely low-level and compiler dependent, though, and only good when you need that last few percent.
E.g.
for (i = 0; i < n; ++i)
*p++ = ...; // some complicated expression
vs.
for (i = 0; i < n; ++i)
p[i] = ...; // some complicated expression
Optimizing cache locality - for example when multiplying two matrices that don't fit into cache.
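One standard way to do that is loop tiling (blocking). A hedged sketch, assuming square row-major matrices whose dimension is a multiple of a block size you would tune for your cache:
#define N 512
#define BLOCK 64    /* tune so three BLOCK x BLOCK tiles fit in cache */

/* c += a * b; all matrices are N x N, row-major. */
void matmul_blocked(double a[N][N], double b[N][N], double c[N][N])
{
    for (int ii = 0; ii < N; ii += BLOCK)
        for (int kk = 0; kk < N; kk += BLOCK)
            for (int jj = 0; jj < N; jj += BLOCK)
                for (int i = ii; i < ii + BLOCK; i++)
                    for (int k = kk; k < kk + BLOCK; k++)
                        for (int j = jj; j < jj + BLOCK; j++)
                            c[i][j] += a[i][k] * b[k][j];
}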
Allocating with new on a pre-allocated buffer using C++'s placement new.
Counting down a loop. It's cheaper to compare against 0 than N:
for (i = N; --i >= 0; ) ...
Shifting and masking by powers of two is cheaper than division and remainder, / and %
#define WORD_LOG 5
#define SIZE (1 << WORD_LOG)
#define MASK (SIZE - 1)
uint32_t bits[K];

void set_bit(unsigned i)
{
    bits[i >> WORD_LOG] |= (1 << (i & MASK));
}
Edit
(i >> WORD_LOG) == (i / SIZE) and
(i & MASK) == (i % SIZE)
because SIZE is 32 or 2^5.
Jon Bentley's Writing Efficient Programs is a great source of low- and high-level techniques -- if you can find a copy.
Eliminating branches (if/elses) by using boolean math:
if(x == 0)
x = 5;
// becomes:
x += (x == 0) * 5;
// if '5' were a power of two, say 4:
x += (x == 0) << 2;
// divide by 2 if flag is set
sum >>= (blendMode == BLEND);
This REALLY speeds things up, especially when those ifs are in a loop or somewhere that is being called a lot.
The one from Assembler:
xor ax, ax
instead of:
mov ax, 0
Classical optimization for program size and performance.
In SQL, if you only need to know whether any data exists or not, don't bother with COUNT(*):
SELECT 1 FROM table WHERE some_primary_key = some_value
If your WHERE clause is likely to return multiple rows, add a LIMIT 1 too.
(Remember that databases can't see what your code's doing with their results, so they can't optimise these things away on their own!)
Recycling the frame-pointer all of a sudden
Pascal calling-convention
Rewrite stack-frame tail call optimization (although it sometimes messes with the above)
Using vfork() instead of fork() before exec()
And one I am still looking for, an excuse to use: data driven code-generation at runtime
Liberal use of __restrict to eliminate load-hit-store stalls.
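A hedged sketch of the idea in C (C99 spells the keyword restrict; __restrict is the common compiler extension): promising that the pointers never alias lets the compiler keep values in registers instead of reloading them after every store:
/* With restrict, the compiler may assume dst and src do not overlap. */
void scale_add(float *restrict dst, const float *restrict src,
               float factor, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] += src[i] * factor;
}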
Rolling up loops.
Seriously, the last time I needed to do anything like this was in a function that took 80% of the runtime, so it was worth trying to micro-optimize if I could get a noticeable performance increase.
The first thing I did was to roll up the loop. This gave me a very significant speed increase. I believe this was a matter of cache locality.
The next thing I did was add a layer of indirection, and put some more logic into the loop, which allowed me to only loop through the things I needed. This wasn't as much of a speed increase, but it was worth doing.
If you're going to micro-optimize, you need to have a reasonable idea of two things: the architecture you're actually using (which is vastly different from the systems I grew up with, at least for micro-optimization purposes), and what the compiler will do for you.
A lot of the traditional micro-optimizations trade space for time. Nowadays, using more space increases the chances of a cache miss, and there goes your performance. Moreover, a lot of them are now done by modern compilers, and typically better than you're likely to do them.
Currently, you should (a) profile to see if you need to micro-optimize, and then (b) try to trade computation for space, in the hope of keeping as much as possible in cache. Finally, run some tests, so you know if you've improved things or screwed them up. Modern compilers and chips are far too complex for you to keep a good mental model, and the only way you'll know if some optimization works or not is to test.
In addition to Joshua's comment about code generation (a big win), and other good suggestions, ...
I'm not sure if you would call it "low-level", but (and this is downvote-bait) 1) stay away from using any more levels of abstraction than absolutely necessary, and 2) stay away from event-driven notification-style programming, if possible.
If a computer executing a program is like a car running a race, a method call is like a detour. That's not necessarily bad except there's a strong temptation to nest those things, because once you've written a method call, you tend to forget what that call could cost you.
If you're relying on events and notifications, it's because you have multiple data structures that need to be kept in agreement. This is costly, and should only be done if you can't avoid it.
In my experience, the biggest performance killers are too much data structure and too much abstraction.
I was amazed at the speedup I got by replacing a for loop adding numbers together in structs:
#include <stdlib.h>

const unsigned long SIZE = 100000000;
typedef struct {
int a;
int b;
int result;
} addition;
addition *sum;
void start() {
unsigned int byte_count = SIZE * sizeof(addition);
sum = malloc(byte_count);
unsigned int i = 0;
if (i < SIZE) {
do {
sum[i].a = i;
sum[i].b = i;
i++;
} while (i < SIZE);
}
}
void test_func() {
unsigned int i = 0;
if (i < SIZE) { // this is about 30% faster than the more obvious for loop, even with O3
do {
addition *s1 = &sum[i];
s1->result = s1->b + s1->a;
i++;
} while ( i<SIZE );
}
}
void finish() {
free(sum);
}
Why doesn't gcc optimise for loops into this? Or is there something I missed? Some cache effect?