how can I skip the first few lines of a text file using the raku IO.lines.race construct - raku

I am trying to read a text file using raku with the IO.lines.race construct. For example
for $file.IO.lines.race
{
#do something, such as
my ($a,$b)=.split(" ");
}
How can I skip the, say, first three lines of the text file?
Thanks!
Tao

Update: As recommended by Elizabeth Mattijsen it is more efficient use skip instead of tail.
for $file.IO.lines.skip(3).race
There is a tail routine you can use:
for $file.IO.lines.tail(*-3).race
{
#do something, such as
my ($a,$b)=.split(" ");
}

The answer above by Lukas Valle is perfectly fine; you can use skip to skip the lines you don't need.
However, I can't help but indicate that case goes much better with other functional constructs such as map:
$file.IO.lines.skip(3).race.map( .split(" ") );
That way, you can chain several operations together without creating different loops. Of course, in Raku TIMTOWDI, so a for loop (or several) is perfectly fine.
Also, in this case I would really time how much the loop, or map, is going to take. For files that don't have many lines, race is not going to give you much, and it might even be slower, due to overhead. If your intention is to beat the clock a bit, IO::Handle.Supply is probably going to be a bit faster.

Related

Filtering with regex and .contains in Perl 6

I often have to filter elements of an array of strings, containing some substring (e.g. one character). Since it can be done either by matching a regex or with .contains method, I've decided to make a small test to be sure that .contains is faster (and therefore more appropriate).
my #array = "aa" .. "cc";
my constant $substr = 'a';
my $time1 = now;
my #a_array = #array.grep: *.contains($substr);
my $time2 = now;
#a_array = #array.grep: * ~~ /$substr/;
my $time3 = now;
my $time_contains = $time2 - $time1;
my $time_regex = $time3 - $time2;
say "contains: $time_contains sec";
say "regex: $time_regex sec";
Then I change the size of #array and the length of $substr and compare the times which each method took to filter the #array. In most cases (as expected), .contains is much faster than regex, especially if #array is large. But in case of a small #array (as in the code above) regex is slightly faster.
contains: 0.0015010 sec
regex: 0.0008708 sec
Why does this happen?
In an entirely unscientific experiment I just switched the regex version and the contains version around and found that the difference in performance you're measuring is not "regex vs contains" but in fact "first thing versus second thing":
When contains comes first:
contains: 0.001555 sec
regex: 0.0009051 sec
When regex comes first:
regex: 0.002055 sec
contains: 0.000326 sec
Benchmarking properly is a difficult task. It's really easy to accidentally measure something different from what you wanted to figure out.
When I want to compare the performance of multiple things I will usually run each thing in a separate script, or maybe have a shared script but only run one of the tasks at once (for example using a multi sub MAIN("task1") approach). That way any startup work gets shared.
In the #perl6 IRC channel on freenode we have a bot called benchable6 which can do benchmarks for you. Read the section "Comparing Code" on its wiki page to find out how it can compare two pieces of code for you.

If-else statements to cut down processing time

Suppose I have a number of possible inputs from the user of my program listed from most likely to least as input1, input2, input3,...,inputN. Would the following framework cut down on processing time by accessing the most probable If statement needed first and then ignoring the rest (rather than testing the validity of each If statement thereafter)? I assume the least probable inputN will be extra burdensome on the processor, but the limited likelihood of the user giving that input makes it worth it if this structure reduces processing time overall.
If (input1) then (output1)
Else
If (input2) then (output2)
Else
If (input3) then:(output3)
Else
If ...
... Else
OutputN
Thanks!
This is how if-else-if statements work.
if(booleanTest1)
{
//do a thing
}
else if(booleanTest2)
{
//do another thing
}
//...ad infinitum
else
{
//do default behavior
}
If booleanTest1 is true, we execute its code, and then skip past all the other tests.
If you're comparing one variable against many possible values, use a switch statement.
I do not know for sure, but I'd assume, that a switch-case wolud be more efficient during runtime, because of branch prediction. With If-elses you have many branches, that might go wrong, which is not good for the piped commands in the processor que.
If there are really a lot of possibilities.
I usually do ist with a map / dictionary of <Key, Method to call>. As long as they have the same signature, this might work. It may not be as fast as a switch-case, but it will grant you some flexilibity, when you need to react to new inputs.
example:
Dictionary myDic = new Dictionary();
myDic.Add(input1,() => What ever to do when input1 comes);
the call the looks like this:
myDicinput1;

What is the difference between an IF, CASE, and WHILE statement

I just want to know what the difference between all the conditional statements in objective-c and which one is faster and lighter.
One piece of advice: stop worrying about which language constructs are microscopically faster or slower than which others, and instead focus on which ones let you express yourself best.
If and case statements described
While statement described
Since these statements do different things, it is unproductive to debate which is faster.
It's like asking whether a hammer is faster than a screwdriver.
The language-agnostic version (mostly, obviously this doesn't count for declarative languages or other weird ones):
When I was taught programming (quite a while ago, I'll freely admit), a language consisted of three ways of executing instructions:
sequence (doing things in order).
selection (doing one of many things).
iteration (doing something zero or more times).
The if and case statements are both variants on selection. If is used to select one of two different options based on a condition (using pseudo-code):
if condition:
do option 1
else:
do option 2
keeping in mind that the else may not be needed in which case it's effectively else do nothing. Also remember that option 1 or 2 may also consist of any of the statement types, including more if statements (called nesting).
Case is slightly different - it's generally meant for more than two choices like when you want to do different things based on a character:
select ch:
case 'a','e','i','o','u':
print "is a vowel"
case 'y':
print "never quite sure"
default:
print "is a consonant"
Note that you can use case for two options (or even one) but it's a bit like killing a fly with a thermonuclear warhead.
While is not a selection variant but an iteration one. It belongs with the likes of for, repeat, until and a host of other possibilities.
As to which is fastest, it doesn't matter in the vast majority of cases. The compiler writers know far more than we mortal folk how to get the last bit of performance out of their code. You either trust them to do their job right or you hand-code it in assembly yourself (I'd prefer the former).
You'll get far more performance by concentrating on the macro view rather than the minor things. That includes selection of appropriate algorithms, profiling, and targeting of hot spots. It does little good to find something that take five minutes each month and get that running in two minutes. Better to get a smaller improvement in something happening every minute.
The language constructs like if, while, case and so on will already be as fast as they can be since they're used heavily and are relative simple. You should be first writing your code for readability and only worrying about performance when it becomes an issue (see YAGNI).
Even if you found that using if/goto combinations instead of case allowed you to run a bit faster, the resulting morass of source code would be harder to maintain down the track.
while isn't a conditional it is a loop. The difference being that the body of a while-loop can be executed many times, the body of a conditional will only be executed once or not at all.
The difference between if and switch is that if accepts an arbitrary expression as the condition and switch just takes values to compare against. Basically if you have a construct like if(x==0) {} else if(x==1) {} else if(x==2) ..., it can be written much more concisely (and effectively) by using switch.
A case statement could be written as
if (a)
{
// Do something
}
else if (b)
{
// Do something else
}
But the case is much more efficient, since it only evaluates the conditional once and then branches.
while is only useful if you want a condition to be evaluated, and the associated code block executed, multiple times. If you expect a condition to only occur once, then it's equivalent to if. A more apt comparison is that while is a more generalized for.
Each condition statement serves a different purpose and you won't use the same one in every situation. Learn which ones are appropriate for which situation and then write your code. If you profile your code and find there's a bottleneck, then you go ahead and address it. Don't worry about optimizing before there's actually a problem.
Are you asking whether an if structure will execute faster than a switch statement inside of a large loop? If so, I put together a quick test, this code was put into the viewDidLoad method of a new view based project I just created in the latest Xcode and iPhone SDK:
NSLog(#"Begin loop");
NSDate *loopBegin = [NSDate date];
int ctr0, ctr1, ctr2, ctr3, moddedNumber;
ctr0 = 0;
ctr1 = 0;
ctr2 = 0;
ctr3 = 0;
for (int i = 0; i < 10000000; i++) {
moddedNumber = i % 4;
// 3.34, 1.23s in simulator
if (moddedNumber == 0)
{
ctr0++;
}
else if (moddedNumber == 1)
{
ctr1++;
}
else if (moddedNumber == 2)
{
ctr2++;
}
else if (moddedNumber == 3)
{
ctr3++;
}
// 4.11, 1.34s on iPod Touch
/*switch (moddedNumber)
{
case 0:
ctr0++;
break;
case 1:
ctr1++;
break;
case 2:
ctr2++;
break;
case 3:
ctr3++;
break;
}*/
}
NSTimeInterval elapsed = [[NSDate date] timeIntervalSinceDate:loopBegin];
NSLog(#"End loop: %f seconds", elapsed );
This code sample is by no means complete, because as pointed out earlier if you have a situation that comes up more times than the others, you would of course want to put that one up front to reduce the total number of comparisons. It does show that the if structure would execute a bit faster in a situation where the decisions are more or less equally divided among the branches.
Also, keep in mind that the results of this little test varied widely in performance between running it on a device vs. running it in the emulator. The times cited in the code comments are running on an actual device. (The first time shown is the time to run the loop the first time the code was run, and the second number was the time when running the same code again without rebuilding.)
There are conditional statements and conditional loops. (If Wikipedia is to be trusted, then simply referring to "a conditional" in programming doesn't cover conditional loops. But this is a minor terminology issue.)
Shmoopty said "Since these statements do different things, it is nonsensical to debate which is faster."
Well... it may be time poorly spent, but it's not nonsensical. For instance, let's say you have an if statement:
if (cond) {
code
}
You can transform that into a loop that executes at most one time:
while (cond) {
code
break;
}
The latter will be slower in pretty much any language (or the same speed, because the optimizer turned it back into the original if behind the scenes!) Still, there are occasions in computer programming where (due to bizarre circumstances) the convoluted thing runs faster
But those incidents are few and far between. The focus should be on your code--what makes it clearest, and what captures your intent.
loops and branches are hard to explain briefly, to get the best code out of a construct in any c-style language depends on the processor used and the local context of the code. The main objective is to reduce the breaking of the execution pipeline -- primarily by reducing branch mispredictions.
I suggest you go here for all your optimization needs. The manuals are written for the c-style programmer and relatively easy to understand if you know some assembly. These manuals should explain to you the subtleties in modern processors, the strategies used by top compilers, and the best way to structure code to get the most out of it.
I just remembered the most important thing about conditionals and branching code. Order your code as follows
if(x==1); //80% of the time
else if(x==2); // 10% of the time
else if(x==3); //6% of the time
else break;
You must use an else sequence... and in this case the prediction logic in your CPU will predict correctly for x==1 and avoid the breaking of your pipeline for 80% of all execution.
More information from intel. Particularly:
In order to effectively write your code to take advantage of these rules, when writing if-else or switch statements, check the most common cases first and work progressively down to the least common. Loops do not necessarily require any special ordering of code for static branch prediction, as only the condition of the loop iterator is normally used.
By following this rule you are flat-out giving the CPU hints about how to bias its prediction logic towards your chained conditionals.

Is while (true) with break bad programming practice?

I often use this code pattern:
while(true) {
//do something
if(<some condition>) {
break;
}
}
Another programmer told me that this was bad practice and that I should replace it with the more standard:
while(!<some condition>) {
//do something
}
His reasoning was that you could "forget the break" too easily and have an endless loop. I told him that in the second example you could just as easily put in a condition which never returned true and so just as easily have an endless loop, so both are equally valid practices.
Further, I often prefer the former as it makes the code easier to read when you have multiple break points, i.e. multiple conditions which get out of the loop.
Can anyone enrichen this argument by adding evidence for one side or the other?
There is a discrepancy between the two examples. The first will execute the "do something" at least once every time even if the statement is never true. The second will only "do something" when the statement evaluates to true.
I think what you are looking for is a do-while loop. I 100% agree that while (true) is not a good idea because it makes it hard to maintain this code and the way you are escaping the loop is very goto esque which is considered bad practice.
Try:
do {
//do something
} while (!something);
Check your individual language documentation for the exact syntax. But look at this code, it basically does what is in the do, then checks the while portion to see if it should do it again.
To quote that noted developer of days gone by, Wordsworth:
...
In truth the prison, unto which we doom
Ourselves, no prison is; and hence for me,
In sundry moods, 'twas pastime to be bound
Within the Sonnet's scanty plot of ground;
Pleased if some souls (for such their needs must be)
Who have felt the weight of too much liberty,
Should find brief solace there, as I have found.
Wordsworth accepted the strict requirements of the sonnet as a liberating frame, rather than as a straightjacket. I'd suggest that the heart of "structured programming" is about giving up the freedom to build arbitrarily-complex flow graphs in favor of a liberating ease of understanding.
I freely agree that sometimes an early exit is the simplest way to express an action. However, my experience has been that when I force myself to use the simplest possible control structures (and really think about designing within those constraints), I most often find that the result is simpler, clearer code. The drawback with
while (true) {
action0;
if (test0) break;
action1;
}
is that it's easy to let action0 and action1 become larger and larger chunks of code, or to add "just one more" test-break-action sequence, until it becomes difficult to point to a specific line and answer the question, "What conditions do I know hold at this point?" So, without making rules for other programmers, I try to avoid the while (true) {...} idiom in my own code whenever possible.
When you can write your code in the form
while (condition) { ... }
or
while (!condition) { ... }
with no exits (break, continue, or goto) in the body, that form is preferred, because someone can read the code and understand the termination condition just by looking at the header. That's good.
But lots of loops don't fit this model, and the infinite loop with explicit exit(s) in the middle is an honorable model. (Loops with continue are usually harder to understand than loops with break.) If you want some evidence or authority to cite, look no further than Don Knuth's famous paper on Structured Programming with Goto Statements; you will find all the examples, arguments, and explanations you could want.
A minor point of idiom: writing while (true) { ... } brands you as an old Pascal programmer or perhaps these days a Java programmer. If you are writing in C or C++, the preferred idiom is
for (;;) { ... }
There's no good reason for this, but you should write it this way because this is the way C programmers expect to see it.
I prefer
while(!<some condition>) {
//do something
}
but I think it's more a matter of readability, rather than the potential to "forget the break." I think that forgetting the break is a rather weak argument, as that would be a bug and you'd find and fix it right away.
The argument I have against using a break to get out of an endless loop is that you're essentially using the break statement as a goto. I'm not religiously against using goto (if the language supports it, it's fair game), but I do try to replace it if there's a more readable alternative.
In the case of many break points I would replace them with
while( !<some condition> ||
!<some other condition> ||
!<something completely different> ) {
//do something
}
Consolidating all of the stop conditions this way makes it a lot easier to see what's going to end this loop. break statements could be sprinkled around, and that's anything but readable.
while (true) might make sense if you have many statements and you want to stop if any fail
while (true) {
if (!function1() ) return;
if (!function2() ) return;
if (!function3() ) return;
if (!function4() ) return;
}
is better than
while (!fail) {
if (!fail) {
fail = function1()
}
if (!fail) {
fail = function2()
}
........
}
Javier made an interesting comment on my earlier answer (the one quoting Wordsworth):
I think while(true){} is a more 'pure' construct than while(condition){}.
and I couldn't respond adequately in 300 characters (sorry!)
In my teaching and mentoring, I've informally defined "complexity" as "How much of the rest of the code I need to have in my head to be able to understand this single line or expression?" The more stuff I have to bear in mind, the more complex the code is. The more the code tells me explicitly, the less complex.
So, with the goal of reducing complexity, let me reply to Javier in terms of completeness and strength rather than purity.
I think of this code fragment:
while (c1) {
// p1
a1;
// p2
...
// pz
az;
}
as expressing two things simultaneously:
the (entire) body will be repeated as long as c1 remains true, and
at point 1, where a1 is performed, c1 is guaranteed to hold.
The difference is one of perspective; the first of these has to do with the outer, dynamic behavior of the entire loop in general, while the second is useful to understanding the inner, static guarantee which I can count on while thinking about a1 in particular. Of course the net effect of a1 may invalidate c1, requiring that I think harder about what I can count on at point 2, etc.
Let's put a specific (tiny) example in place to think about the condition and first action:
while (index < length(someString)) {
// p1
char c = someString.charAt(index++);
// p2
...
}
The "outer" issue is that the loop is clearly doing something within someString that can only be done as long as index is positioned in the someString. This sets up an expectation that we'll be modifying either index or someString within the body (at a location and manner not known until I examine the body) so that termination eventually occurs. That gives me both context and expectation for thinking about the body.
The "inner" issue is that we're guaranteed that the action following point 1 will be legal, so while reading the code at point 2 I can think about what is being done with a char value I know has been legally obtained. (We can't even evaluate the condition if someString is a null ref, but I'm also assuming we've guarded against that in the context around this example!)
In contrast, a loop of the form:
while (true) {
// p1
a1;
// p2
...
}
lets me down on both issues. At the outer level, I am left wondering whether this means that I really should expect this loop to cycle forever (e.g. the main event dispatch loop of an operating system), or whether there's something else going on. This gives me neither an explicit context for reading the body, nor an expectation of what constitutes progress toward (uncertain) termination.
At the inner level, I have absolutely no explicit guarantee about any circumstances that may hold at point 1. The condition true, which is of course true everywhere, is the weakest possible statement about what we can know at any point in the program. Understanding the preconditions of an action are very valuable information when trying to think about what the action accomplishes!
So, I suggest that the while (true) ... idiom is much more incomplete and weak, and therefore more complex, than while (c1) ... according to the logic I've described above.
The problem is that not every algorithm sticks to the "while(cond){action}" model.
The general loop model is like this :
loop_prepare
loop:
action_A
if(cond) exit_loop
action_B
goto loop
after_loop_code
When there is no action_A you can replace it by :
loop_prepare
while(cond)
action_B
after_loop_code
When there is no action_B you can replace it by :
loop_prepare
do action_A
while(cond)
after_loop_code
In the general case, action_A will be executed n times and action_B will be executed (n-1) times.
A real life example is : print all the elements of a table separated by commas.
We want all the n elements with (n-1) commas.
You always can do some tricks to stick to the while-loop model, but this will always repeat code or check twice the same condition (for every loops) or add a new variable. So you will always be less efficient and less readable than the while-true-break loop model.
Example of (bad) "trick" : add variable and condition
loop_prepare
b=true // one more local variable : more complex code
while(b): // one more condition on every loop : less efficient
action_A
if(cond) b=false // the real condition is here
else action_B
after_loop_code
Example of (bad) "trick" : repeat the code. The repeated code must not be forgotten while modifying one of the two sections.
loop_prepare
action_A
while(cond):
action_B
action_A
after_loop_code
Note : in the last example, the programmer can obfuscate (willingly or not) the code by mixing the "loop_prepare" with the first "action_A", and action_B with the second action_A. So he can have the feeling he is not doing this.
The first is OK if there are many ways to break from the loop, or if the break condition cannot be expressed easily at the top of the loop (for example, the content of the loop needs to run halfway but the other half must not run, on the last iteration).
But if you can avoid it, you should, because programming should be about writing very complex things in the most obvious way possible, while also implementing features correctly and performantly. That's why your friend is, in the general case, correct. Your friend's way of writing loop constructs is much more obvious (assuming the conditions described in the preceding paragraph do not obtain).
There's a substantially identical question already in SO at Is WHILE TRUE…BREAK…END WHILE a good design?. #Glomek answered (in an underrated post):
Sometimes it's very good design. See Structured Programing With Goto Statements by Donald Knuth for some examples. I use this basic idea often for loops that run "n and a half times," especially read/process loops. However, I generally try to have only one break statement. This makes it easier to reason about the state of the program after the loop terminates.
Somewhat later, I responded with the related, and also woefully underrated, comment (in part because I didn't notice Glomek's the first time round, I think):
One fascinating article is Knuth's "Structured Programming with go to Statements" from 1974 (available in his book 'Literate Programming', and probably elsewhere too). It discusses, amongst other things, controlled ways of breaking out of loops, and (not using the term) the loop-and-a-half statement.
Ada also provides looping constructs, including
loopname:
loop
...
exit loopname when ...condition...;
...
end loop loopname;
The original question's code is similar to this in intent.
One difference between the referenced SO item and this is the 'final break'; that is a single-shot loop which uses break to exit the loop early. There have been questions on whether that is a good style too - I don't have the cross-reference at hand.
Sometime you need infinite loop, for example listening on port or waiting for connection.
So while(true)... should not categorized as good or bad, let situation decide what to use
It depends on what you’re trying to do, but in general I prefer putting the conditional in the while.
It’s simpler, since you don't need another test in the code.
It’s easier to read, since you don’t have to go hunting for a break inside the loop.
You’re reinventing the wheel. The whole point of while is to do something as long as a test is true. Why subvert that by putting the break condition somewhere else?
I’d use a while(true) loop if I was writing a daemon or other process that should run until it gets killed.
If there's one (and only one) non-exceptional break condition, putting that condition directly into the control-flow construct (the while) is preferable. Seeing while(true) { ... } makes me as a code-reader think that there's no simple way to enumerate the break conditions and makes me think "look carefully at this and think about carefully about the break conditions (what is set before them in the current loop and what might have been set in the previous loop)"
In short, I'm with your colleague in the simplest case, but while(true){ ... } is not uncommon.
The perfect consultant's answer: it depends. Most cases, the right thing to do is either use a while loop
while (condition is true ) {
// do something
}
or a "repeat until" which is done in a C-like language with
do {
// do something
} while ( condition is true);
If either of these cases works, use them.
Sometimes, like in the inner loop of a server, you really mean that a program should keep going until something external interrupts it. (Consider, eg, an httpd daemon -- it isn't going to stop unless it crashes or it's stopped by a shutdown.)
THEN AND ONLY THEN use a while(1):
while(1) {
accept connection
fork child process
}
Final case is the rare occasion where you want to do some part of the function before terminating. In that case, use:
while(1) { // or for(;;)
// do some stuff
if (condition met) break;
// otherwise do more stuff.
}
I think the benefit of using "while(true)" is probably to let multiple exit condition easier to write especially if these exit condition has to appear in different location within the code block. However, for me, it could be chaotic when I have to dry-run the code to see how the code interacts.
Personally I will try to avoid while(true). The reason is that whenever I look back at the code written previously, I usually find that I need to figure out when it runs/terminates more than what it actually does. Therefore, having to locate the "breaks" first is a bit troublesome for me.
If there is a need for multiple exit condition, I tend to refactor the condition determining logic into a separate function so that the loop block looks clean and easier to understand.
No, that's not bad since you may not always know the exit condition when you setup the loop or may have multiple exit conditions. However it does require more care to prevent an infinite loop.
He is probably correct.
Functionally the two can be identical.
However, for readability and understanding program flow, the while(condition) is better. The break smacks more of a goto of sorts. The while (condition) is very clear on the conditions which continue the loop, etc. That doesn't mean break is wrong, just can be less readable.
A few advantages of using the latter construct that come to my mind:
it's easier to understand what the loop is doing without looking for breaks in the loop's code.
if you don't use other breaks in the loop code, there's only one exit point in your loop and that's the while() condition.
generally ends up being less code, which adds to readability.
I prefer the while(!) approach because it more clearly and immediately conveys the intent of the loop.
There has been much talk about readability here and its very well constructed but as with all loops that are not fixed in size (ie. do while and while) you run at a risk.
His reasoning was that you could "forget the break" too easily and have an endless loop.
Within a while loop you are in fact asking for a process that runs indefinitely unless something happens, and if that something does not happen within a certain parameter, you will get exactly what you wanted... an endless loop.
What your friend recommend is different from what you did. Your own code is more akin to
do{
// do something
}while(!<some condition>);
which always run the loop at least once, regardless of the condition.
But there are times breaks are perfectly okay, as mentioned by others. In response to your friend's worry of "forget the break", I often write in the following form:
while(true){
// do something
if(<some condition>) break;
// continue do something
}
By good indentation, the break point is clear to first time reader of the code, look as structural as codes which break at the beginning or bottom of a loop.
It's not so much the while(true) part that's bad, but the fact that you have to break or goto out of it that is the problem. break and goto are not really acceptable methods of flow control.
I also don't really see the point. Even in something that loops through the entire duration of a program, you can at least have like a boolean called Quit or something that you set to true to get out of the loop properly in a loop like while(!Quit)... Not just calling break at some arbitrary point and jumping out,
using loops like
while(1) { do stuff }
is necessary in some situations. If you do any embedded systems programming (think microcontrollers like PICs, MSP430, and DSP programming) then almost all your code will be in a while(1) loop. When coding for DSPs sometimes you just need a while(1){} and the rest of the code is an interrupt service routine (ISR).
If you loop over an external condition (not being changed inside the loop), you use while(t), where t is the condition. However, if the loop stops when the condition changes inside the loop, it's more convenient to have the exit point explicitly marked with break, instead of waiting for it to happen on the next iteration of the loop:
while (true) {
...
a := a + 1;
if (a > 10) break; // right here!
...
}
As was already mentioned in a few other answers, the less code you have to keep in your head while reading a particular line, the better.

Is a variable named i unacceptable? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
As far as variable naming conventions go, should iterators be named i or something more semantic like count? If you don't use i, why not? If you feel that i is acceptable, are there cases of iteration where it shouldn't be used?
Depends on the context I suppose. If you where looping through a set of Objects in some
collection then it should be fairly obvious from the context what you are doing.
for(int i = 0; i < 10; i++)
{
// i is well known here to be the index
objectCollection[i].SomeProperty = someValue;
}
However if it is not immediately clear from the context what it is you are doing, or if you are making modifications to the index you should use a variable name that is more indicative of the usage.
for(int currentRow = 0; currentRow < numRows; currentRow++)
{
for(int currentCol = 0; currentCol < numCols; currentCol++)
{
someTable[currentRow][currentCol] = someValue;
}
}
"i" means "loop counter" to a programmer. There's nothing wrong with it.
Here's another example of something that's perfectly okay:
foreach (Product p in ProductList)
{
// Do something with p
}
I tend to use i, j, k for very localized loops (only exist for a short period in terms of number of source lines). For variables that exist over a larger source area, I tend to use more detailed names so I can see what they're for without searching back in the code.
By the way, I think that the naming convention for these came from the early Fortran language where I was the first integer variable (A - H were floats)?
i is acceptable, for certain. However, I learned a tremendous amount one semester from a C++ teacher I had who refused code that did not have a descriptive name for every single variable. The simple act of naming everything descriptively forced me to think harder about my code, and I wrote better programs after that course, not from learning C++, but from learning to name everything. Code Complete has some good words on this same topic.
i is fine, but something like this is not:
for (int i = 0; i < 10; i++)
{
for (int j = 0; j < 10; j++)
{
string s = datarow[i][j].ToString(); // or worse
}
}
Very common for programmers to inadvertently swap the i and the j in the code, especially if they have bad eyesight or their Windows theme is "hotdog". This is always a "code smell" for me - it's kind of rare when this doesn't get screwed up.
i is so common that it is acceptable, even for people that love descriptive variable names.
What is absolutely unacceptable (and a sin in my book) is using i,j, or k in any other context than as an integer index in a loop.... e.g.
foreach(Input i in inputs)
{
Process(i);
}
i is definitely acceptable. Not sure what kind of justification I need to make -- but I do use it all of the time, and other very respected programmers do as well.
Social validation, I guess :)
Yes, in fact it's preferred since any programmer reading your code will understand that it's simply an iterator.
What is the value of using i instead of a more specific variable name? To save 1 second or 10 seconds or maybe, maybe, even 30 seconds of thinking and typing?
What is the cost of using i? Maybe nothing. Maybe the code is so simple that using i is fine. But maybe, maybe, using i will force developers who come to this code in the future to have to think for a moment "what does i mean here?" They will have to think: "is it an index, a count, an offset, a flag?" They will have to think: "is this change safe, is it correct, will I be off by 1?"
Using i saves time and intellectual effort when writing code but may end up costing more intellectual effort in the future, or perhaps even result in the inadvertent introduction of defects due to misunderstanding the code.
Generally speaking, most software development is maintenance and extension, so the amount of time spent reading your code will vastly exceed the amount of time spent writing it.
It's very easy to develop the habit of using meaningful names everywhere, and once you have that habit it takes only a few seconds more to write code with meaningful names, but then you have code which is easier to read, easier to understand, and more obviously correct.
I use i for short loops.
The reason it's OK is that I find it utterly implausible that someone could see a declaration of iterator type, with initializer, and then three lines later claim that it's not clear what the variable represents. They're just pretending, because they've decided that "meaningful variable names" must mean "long variable names".
The reason I actually do it, is that I find that using something unrelated to the specific task at hand, and that I would only ever use in a small scope, saves me worrying that I might use a name that's misleading, or ambiguous, or will some day be useful for something else in the larger scope. The reason it's "i" rather than "q" or "count" is just convention borrowed from mathematics.
I don't use i if:
The loop body is not small, or
the iterator does anything other than advance (or retreat) from the start of a range to the finish of the loop:
i doesn't necessarily have to go in increments of 1 so long as the increment is consistent and clear, and of course might stop before the end of the iterand, but if it ever changes direction, or is unmodified by an iteration of the loop (including the devilish use of iterator.insertAfter() in a forward loop), I try to remember to use something different. This signals "this is not just a trivial loop variable, hence this may not be a trivial loop".
If the "something more semantic" is "iterator" then there is no reason not to use i; it is a well understood idiom.
i think i is completely acceptable in for-loop situations. i have always found this to be pretty standard and never really run into interpretation issues when i is used in this instance. foreach-loops get a little trickier and i think really depends on your situation. i rarely if ever use i in foreach, only in for loops, as i find i to be too un-descriptive in these cases. for foreach i try to use an abbreviation of the object type being looped. e.g:
foreach(DataRow dr in datatable.Rows)
{
//do stuff to/with datarow dr here
}
anyways, just my $0.02.
It helps if you name it something that describes what it is looping through. But I usually just use i.
As long as you are either using i to count loops, or part of an index that goes from 0 (or 1 depending on PL) to n, then I would say i is fine.
Otherwise its probably easy to name i something meaningful it its more than just an index.
I should point out that i and j are also mathematical notation for matrix indices. And usually, you're looping over an array. So it makes sense.
As long as you're using it temporarily inside a simple loop and it's obvious what you're doing, sure. That said, is there no other short word you can use instead?
i is widely known as a loop iterator, so you're actually more likely to confuse maintenance programmers if you use it outside of a loop, but if you use something more descriptive (like filecounter), it makes code nicer.
It depends.
If you're iterating over some particular set of data then I think it makes more sense to use a descriptive name. (eg. filecounter as Dan suggested).
However, if you're performing an arbitrary loop then i is acceptable. As one work mate described it to me - i is a convention that means "this variable is only ever modified by the for loop construct. If that's not true, don't use i"
The use of i, j, k for INTEGER loop counters goes back to the early days of FORTRAN.
Personally I don't have a problem with them so long as they are INTEGER counts.
But then I grew up on FORTRAN!
my feeling is that the concept of using a single letter is fine for "simple" loops, however, i learned to use double-letters a long time ago and it has worked out great.
i asked a similar question last week and the following is part of my own answer:// recommended style ● // "typical" single-letter style
●
for (ii=0; ii<10; ++ii) { ● for (i=0; i<10; ++i) {
for (jj=0; jj<10; ++jj) { ● for (j=0; j<10; ++j) {
mm[ii][jj] = ii * jj; ● m[i][j] = i * j;
} ● }
} ● }
in case the benefit isn't immediately obvious: searching through code for any single letter will find many things that aren't what you're looking for. the letter i occurs quite often in code where it isn't the variable you're looking for.
i've been doing it this way for at least 10 years.
note that plenty of people commented that either/both of the above are "ugly"...
I am going to go against the grain and say no.
For the crowd that says "i is understood as an iterator", that may be true, but to me that is the equivalent of comments like 'Assign the value 5 to variable Y. Variable names like comment should explain the why/what not the how.
To use an example from a previous answer:
for(int i = 0; i < 10; i++)
{
// i is well known here to be the index
objectCollection[i].SomeProperty = someValue;
}
Is it that much harder to just use a meaningful name like so?
for(int objectCollectionIndex = 0; objectCollectionIndex < 10; objectCollectionIndex ++)
{
objectCollection[objectCollectionIndex].SomeProperty = someValue;
}
Granted the (borrowed) variable name objectCollection is pretty badly named too.