Is it worth precomputing the conditional in a for loop? - objective-c

Is there a difference between the following two code blocks in terms of the resulting machine code when using the llvm or gcc compilers?
When is this optimization actually worthwhile, if ever?
Not optimized:
for (int i = 0; i < array.count; i++) {
    // do some work
}
Optimized:
int count = array.count;
for (int i = 0; i < count; i++) {
    // do some work
}
EDIT: I should point out that array is immutable and array.count doesn't change during the loop's execution.

You really need to check it yourself. My guess is that there is a difference in the emitted code, but it might depend on compiler and compiler options, and it certainly can depend on the definition of array.
Nearly never, on the assumption that evaluating array.count is nearly always insignificant compared with "some work". The way to measure it, though, is to use a profiler (or equivalent) and observe what proportion of your program's runtime is spent at that line of code. Provided the profiler is accurate, that's the most you could hope to gain by changing it.
Suppose array.count is something really slow, that you happen to know will always return the same result but the compiler doesn't know that. Then it might be worth manually hoisting it. strlen gets used as an example. It's debatable how often strlen is actually slow in practice, but it's easy to manufacture examples likely to run slower than they need to:
char some_function(char a) {
    return (a * 2 + 1) & 0x3F;
}

for (int i = 0; i < strlen(ptr); ++i) { // strlen is re-evaluated every iteration, slower than it needs to be for long strings
    ptr[i] = some_function(ptr[i]);
}
You and I know that some_function never returns 0, and hence the length of the string never changes. The compiler might not see the definition of some_function, and even if it does see the definition might not realize that its non-zero-returningness is important.
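To make the hoisting concrete, here is a self-contained C sketch of both versions of the loop above, one with the length re-evaluated on every iteration and one with it hoisted by hand:

```c
#include <string.h>

static char some_function(char a) {
    return (a * 2 + 1) & 0x3F; /* result is always odd, so never 0: length is stable */
}

/* O(n^2) for long strings: strlen is re-evaluated on every iteration */
static void transform_slow(char *ptr) {
    for (size_t i = 0; i < strlen(ptr); ++i)
        ptr[i] = some_function(ptr[i]);
}

/* O(n): the length is hoisted out of the loop by hand */
static void transform_fast(char *ptr) {
    size_t len = strlen(ptr);
    for (size_t i = 0; i < len; ++i)
        ptr[i] = some_function(ptr[i]);
}
```

Both functions produce identical output; only the hoisted one avoids rescanning the string each time around the loop.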

The Steve Jessop answer is a good one. I just want to add:
Personally, I always use the optimized version. It's just one of my good practices to move every loop-invariant expression out of the loop. It's not much work and it makes the code cleaner. It's not "premature optimization" and it does not introduce any problems or tradeoffs. It makes debugging (stepping) easier. And it could potentially make the code faster. So it's a no-brainer to me.

Related

Why is passing pointer-pointers not motivated in Cocoa development?

When I pass a string the Apple-style way to a function and test it a billion times, it takes ~42.001 seconds:
- (void)test:(NSString *)str {
    NSString *test = str;
    if (test) {
        return;
    }
}

NSString *value = @"Value 1";
NSLog(@"START");
for (int i = 0; i < 1e9; i++) {
    [self test:value];
}
NSLog(@"END");
But passing a pointer to that pointer instead (assuming my test function is read-only), like so:
- (void)test:(NSString **)str {
    NSString *test = *str;
    if (test) {
        return;
    }
}

NSLog(@"START");
for (int i = 0; i < 1e9; i++) {
    [self test:&value];
}
NSLog(@"END");
...only takes ~26.804 seconds.
Why does Apple promote the first example as normal practice, while the latter performs so differently?
I read about the Toll-Free Bridging that Foundation applies, but if the difference is really this big, what's the added value? If a whole application would run more than 100% faster just by upgrading some major function arguments like this, isn't that a considerable flaw in Apple's way of instructing how to build apps in Objective-C?
You wouldn't use the NSString ** syntax, because it suggests that the method you're calling can change what value points to. You should never write that unless it is really what's taking place.
The simple NSString * example may be taking longer because, in the absence of any optimization, the NSString * rendition is probably adding and removing a strong reference to value each time the method is called and returns.
If you turn on optimization, the behavior changes. For example, when I used the -Os "Fastest, Smallest" build setting, the NSString * rendition was actually faster than the NSString ** one. And even if the performance were worse, I wouldn't write code that exposed me to all sorts of problems down the line just because it was 0.0000152 seconds faster per call. I'd find other ways to optimize the code.
To quote Donald Knuth:
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. [Emphasis added]
The goal is always to write code whose functional intent is clear, whose type handling is safest and then, where possible, use the compiler's own internal optimization capabilities to tackle the performance issues. Only sacrifice the code readability and ease of maintenance and debugging when it's absolutely essential.

Threading nested for loops

I've been looking for a while for a similar question, but without any success. I don't know how to optimize some code in Cocoa to use all available CPU cores (I don't want to use the GPU at the moment). Below is a simple sample of the kind of code I mean:
int limA = 1000;
int limB = 1000;
unsigned short tmp;
for (int i = 0; i < 10000; i++) {
    for (int a = 0; a < limA; a++) {
        for (int b = 0; b < limB; b++) {
            tmp = [[array objectAtIndex:(a*b)] unsignedShortValue];
            c_array[a*limB+b] += tmp;
        }
    }
}
assume that array and c_array are properly initialized, etc. But as you can see, with this many iterations (in this case 10^10) it takes some time to execute this code. I thought there might be a simple way to execute this code in a few threads, but how would I synchronize access to c_array? What is the best way to improve the execution time of this kind of code in Objective-C? Maybe it could be done so that iterations 0-2499 of the outermost for loop are executed in thread 1, 2500-4999 in thread 2, etc.? I know this is a silly approach, but I don't need "real-time" performance... any ideas?
A few suggestions:
Do an initial pass over the array to extract all the shorts from their object wrappers:
short *tmp_array = calloc(limA * limB, sizeof(short));
int tmp_idx = 0;
for (NSNumber *num in array) {
    tmp_array[tmp_idx++] = [num unsignedShortValue];
}
This has several benefits. You go from 10^10 method calls to 10^6, your inner loop stops being opaque to the compiler (it can't "see through" method calls), and your inner loop gets smaller and more likely to fit in the instruction cache.
Try to linearize access patterns. Right now you're doing 'strided' access, since the index is being multiplied each time. If you can rearrange the data in tmp_array so that elements that are processed sequentially are also sequential in the array, you should get much better performance (since each access to the array is loading a full cache line, which is 64 bytes on most processors).
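As a generic illustration of that advice (not the asker's exact arrays), here is a C sketch comparing a strided, column-major traversal of a matrix against a linear, row-major one. Both compute the same result, but the linear version uses each loaded cache line fully before moving on:

```c
#define ROWS 256
#define COLS 256

/* Column-major ("strided") traversal: each access jumps COLS ints ahead,
   so it touches a different cache line on almost every access. */
static long sum_strided(int m[ROWS][COLS]) {
    long total = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            total += m[r][c];
    return total;
}

/* Row-major ("linear") traversal: consecutive accesses are adjacent in
   memory, so every byte of each 64-byte cache line gets used. */
static long sum_linear(int m[ROWS][COLS]) {
    long total = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            total += m[r][c];
    return total;
}
```

The two functions are interchangeable in output; the performance gap between them only shows up once the matrix is larger than the cache.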
Getting a benefit out of parallelism is likely to be tricky. You could try replacing the outer loop with:
dispatch_apply(10000, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(size_t i) {
});
and the += in the inner loop with OSAtomicAdd, but my suspicion is that your speed is going to be dominated by memory accesses anyway, and adding more processors to the mix will just lead to them stepping on each other's toes (i.e. processor 0 loads c_array[1500] so that it knows what to add tmp to, which actually loads the cache line covering [1500-1531], then processor 1 writes to c_array[1512], invalidating that entire cache line and forcing it to be re-read). Also, I'm pretty sure you'd need to store 32 bit values in c_array to do that, since you'd be using OSAtomicAdd32 (there's no OSAtomicAdd16).
At the very least, if you're going to parallelize, then you need to figure out how to divide the work into non-overlapping chunks of 32 elements of c_array (i.e. 64 bytes), so that you can avoid contention. Dividing up the ranges of the array should also let you avoid needing to use atomic add operations.
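A sketch of that partitioning in C, with a hypothetical `chunk_range` helper (not part of any real API) that hands each worker a disjoint, cache-line-aligned slice of c_array:

```c
#include <stddef.h>

#define CHUNK 32  /* 32 uint16_t elements = 64 bytes = one cache line */

/* Hypothetical helper: compute the [start, end) range of c_array indices
   that worker `w` of `nworkers` should own. Boundaries land on cache-line
   multiples, so no two workers ever write to the same line and no atomic
   adds are needed. */
static void chunk_range(size_t n, size_t nworkers, size_t w,
                        size_t *start, size_t *end) {
    size_t nchunks = (n + CHUNK - 1) / CHUNK;   /* round up to whole lines */
    size_t per = nchunks / nworkers;
    size_t extra = nchunks % nworkers;          /* first `extra` workers get one more */
    size_t first = w * per + (w < extra ? w : extra);
    size_t count = per + (w < extra ? 1 : 0);
    *start = first * CHUNK;
    *end = (first + count) * CHUNK;
    if (*start > n) *start = n;
    if (*end > n) *end = n;                     /* clamp the last worker's slice */
}
```

Each worker then runs the inner loop only over its own [start, end) slice, which sidesteps both contention and the need for OSAtomicAdd.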
(edit)
Check out an0's answer for some practical suggestions for parallelizing this, rather than this discussion of why the naive parallelization won't work :)
First, follow @Catfish_Man's suggestion, except for the parallelism part.
For the parallelism, here are my ideas:
The outermost loop is meaningless. Just use 10000 * tmp instead of tmp.
Since the segments of the target array written to are strictly disjoint for different a values, the second level of the loop can be easily parallelized. In fact, the same applies to b, but if we also parallelized over b, the unit of work left in the body would be too small for splitting the workload to be useful.
Code:
int limA = 1000;
int limB = 1000;
short *tmp_array = calloc(limA * limB, sizeof(short));
int tmp_idx = 0;
for (NSNumber *num in array) {
    tmp_array[tmp_idx++] = [num unsignedShortValue];
}
dispatch_apply(limA, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(size_t a) {
    for (int b = 0; b < limB; b++) {
        c_array[a*limB+b] += 10000 * tmp_array[a*b];
    }
});
free(tmp_array);
First, follow @Catfish_Man's suggestions. Then follow @an0's suggestions. Then do this as well:
// ...
short *tmp_array = calloc(limA * limB, sizeof(short));
unsigned short (*unsignedShortValueIMP)(id, SEL) =
    (unsigned short (*)(id, SEL))class_getMethodImplementation([NSNumber class], @selector(unsignedShortValue));
void *(*objectAtIndexIMP)(id, SEL, NSUInteger) =
    (void *(*)(id, SEL, NSUInteger))class_getMethodImplementation(array.class, @selector(objectAtIndex:));
NSUInteger n = array.count;
for (NSUInteger i = 0; i < n; ++i) {
    void *obj = objectAtIndexIMP(array, @selector(objectAtIndex:), i);
    tmp_array[i] = unsignedShortValueIMP((__bridge id)obj, @selector(unsignedShortValue));
}
// ...
By lifting the IMPs out of Objective-C, you bypass all the overhead of the message dispatch machinery and allow the compiler to "see through" the calls; while these selectors are part of Foundation and can't be inlined, removing the extra levels of indirection improves the holy heck out of the branch prediction and prefetching machinery in the CPU cores. In addition, by using a raw C for loop instead of Objective-C's array enumeration, AND not forcing the opacity of objc_msgSend() on the compiler, you allow Clang's loop unwinding and vectorization optimizers to work.
@Catfish_Man may be able to tell me this is an outmoded optimization no longer worth doing, but as far as I'm aware, it's still a win for massive repetitions of calling the same methods like this.
Final note: My code assumes ARC, so uses void * and a bridge cast instead of id on the objectAtIndex: IMP to bypass the extra implicit retain/release pair. This is evil shadow hackery, disabling ARC for the file in question is a better solution, and I should be ashamed of myself.

math expression evaluation - very fast - with objective-c

I want to evaluate a math expression like y = 2(x * x) + 2.
But I need it in a loop, where x changes maybe 100,000 times.
I have written code to translate the expression into a parse tree.
Then I have a method to evaluate the parse tree.
- (double)evaluate:(TreeNode *)node variable:(double)x
{
    if ([node _operand] != 0) {
        return [node _operand];
    }
    else if ([node _variable] != NULL) {
        return x;
    }
    else if ([node _operator] != NULL) {
        if ([[node _operator] isEqualToString:@"+"]) {
            return ([self evaluate:[node left] variable:x] + [self evaluate:[node right] variable:x]);
        }
        else if ([[node _operator] isEqualToString:@"-"]) {
            return ([self evaluate:[node left] variable:x] - [self evaluate:[node right] variable:x]);
        }
        else if ([[node _operator] isEqualToString:@"*"]) {
            return ([self evaluate:[node left] variable:x] * [self evaluate:[node right] variable:x]);
        }
        else if ([[node _operator] isEqualToString:@"/"]) {
            return ([self evaluate:[node left] variable:x] / [self evaluate:[node right] variable:x]);
        }
    }
    return 0;
}
Someone said that if I have to go for speed, I could translate the expression into C code, compile and link it into a dll on the fly (takes about a second), and load it. That, plus memoized versions of the math functions, could give me the best performance.
How can I achieve that?
How can I compile the math expression into C code, compile and link it into a dll or so, and then load it on the fly to speed the loop up?
Thanks a lot!
Chris
My advice: Do not write this code yourself. Having written code that does this, there are some things to be aware of:
Parsing mathematical expressions is not a trivial problem, if you're going to do it correctly and fully. You have to consider things like the associativity of each operator: what happens if you find more than one of the same operator in an expression? Which one do you evaluate first? How do you deal with operators whose precedence changes depending on their context? (for example, the negation operator) These are hard questions, and very few implementations get it right.
As was mentioned in a comment on the question, there are some things that can do this for you already:
NSPredicate. Pros: built-in, reasonably fast, decent precision. Cons: the exponent is parsed with incorrect associativity, not extensible, does not support implicit multiplication (i.e., 2(x*x)), does not parse the negation operator correctly.
GCMathParser. Pros: very fast, decent precision. Cons: not extensible, does not support implicit multiplication, does not parse the negation operator correctly.
DDMathParser. Pros: excellent precision, extensible, supports implicit multiplication. Cons: not quite as fast as the other two, due to the parsing engine and high precision math
Obviously, I recommend DDMathParser (I wrote it). In your case, you'd want to do something like this:
NSError *error = nil;
DDExpression *math = [DDExpression expressionFromString:@"2($x * $x) + 2" error:&error];
for (int x = 0; x < 100; ++x) {
    NSNumber *variable = [NSNumber numberWithInt:x];
    NSDictionary *sub = [NSDictionary dictionaryWithObject:variable forKey:@"x"];
    NSNumber *value = [math evaluateWithSubstitutions:sub evaluator:nil error:&error];
    NSLog(@"value: %@", value);
}
DDMathParser is available on GitHub: https://github.com/davedelong/DDMathParser . Please be mindful of its license (free for use with attribution).
However, if you're ok with sacrificing some precision (and a couple of cases of it being incorrect) in exchange for blazing fast speed, I'd recommend using GCMathParser.
If you were to performance analyze that code, you'd [very most likely almost 100% assuredly] find that string comparison is what is killing your performance.
An easy fix is to split parsing from evaluation. That is, parse the expression into an intermediate form (like what jills and Rudy allude to, but simpler) and then evaluate that intermediate form.
That is, you might create a "parse:" method that [recursively] walks your tree of nodes, parses each, and then sets a property to some number representing the operator.
typedef enum {
    PlusOperator,
    SinOperator,
    // ..... etc ....
} OperatorID;

@property(nonatomic) OperatorID operatorID;
Then, your evaluate:variable:'s if/else would be replaced with a switch statement.
switch ([node operatorID]) {
    case PlusOperator:
        ....
        break;
    // ... etc ...
}
Hi, thanks a lot. But I already parsed the expression and have built a tree, which I evaluate with the method above. What I need is a faster evaluation in a loop.
Don't represent the parse tree as strings.
I.e. instead of _operator returning an NSString, make it return an int (or OperatorID, if using the above) then use a switch statement.
@property(nonatomic) OperatorID _operator;
Since you are already parsing the expression, this should be even easier / more straightforward to do.
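For illustration, here is what the enum-plus-switch evaluation might look like in plain C (the Node layout and names here are hypothetical, not the asker's TreeNode class):

```c
#include <stddef.h>

typedef enum { OpConstant, OpVariable, OpAdd, OpSub, OpMul, OpDiv } OpID;

typedef struct Node {
    OpID op;
    double operand;            /* used only when op == OpConstant */
    struct Node *left, *right; /* used only for binary operators */
} Node;

/* Dispatch on a small integer instead of comparing strings: the switch
   compiles to a jump table rather than repeated isEqualToString: calls. */
static double eval(const Node *n, double x) {
    switch (n->op) {
        case OpConstant: return n->operand;
        case OpVariable: return x;
        case OpAdd: return eval(n->left, x) + eval(n->right, x);
        case OpSub: return eval(n->left, x) - eval(n->right, x);
        case OpMul: return eval(n->left, x) * eval(n->right, x);
        case OpDiv: return eval(n->left, x) / eval(n->right, x);
    }
    return 0;
}
```

Building the tree for 2(x * x) + 2 once and then calling eval in a loop does no string work at all per iteration.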
i want to evaluate a math expression like y = 2(x * x) + 2.
But i need it in a loop, where the x changes maybe 100000 times.
You should consider using the TinyExpr math expression evaluation library. It is written in C, and will do exactly what you want. If you would prefer to code it yourself, TinyExpr is only 500 lines of code, so it's probably the simplest complete example you'll find.
Here's how you would solve your problem with x constantly changing:
double x;
te_variable vars[] = {{"x", &x}};
te_expr *expr = te_compile("2*(x*x)+2", vars, 1, 0);
for (x = 0; x < 100000; ++x) {
    double y = te_eval(expr);
    printf("x=%f, y=%f\n", x, y);
}
Note that the expression is automatically reevaluated with the present value of x, even though the expression is only "compiled" once.
If you need to be even faster, you could always generate code at run-time and then invoke an actual compiler. However, the time it takes to run the compiler would likely dwarf the speed savings until you're well into billions of evaluations. The 100,000 evaluation number you gave in your question would likely be evaluated almost instantly by TinyExpr. It's pretty fast.
You could get an existing expression parser. Some of them can "compile" such expressions on the fly to some internal format that would make evaluating the expression faster, and then allow you to provide it with values for the variables. The "compilation" would be done before the loop and the substitution once every loop iteration.
I know such expression parsers/evaluators exist for Delphi, but I don't know any for C, sorry. I assume you can find them online, as C has a far larger worldwide code base than Delphi. Just google (or bing, etc.) for "expression parser" and then look if the ones you found can do such substitutions without having to reparse the expression.
What's wrong with simply using OO design?
@implementation TreeNodeAdd
- (double)evaluateWithVariable:(double)x
{
    return [left evaluateWithVariable:x] + [right evaluateWithVariable:x];
}
@end
...
- (double)evaluate:(TreeNode *)node variable:(double)x
{
    return [node evaluateWithVariable:x];
}
The equivalent in C++ might be a little faster.
You cannot generate and execute machine code on iOS, but you can still do better than walking a parse tree. From the parse tree, generate instructions for a fictitious stack machine (think Forth, x87 machine code, java bytecode, CLR bytecode). While generating, you can determine how much stack space (numbers) you need. Then interpret these instructions for each value of x. This is faster because the instructions are more compact and have better locality than the parse tree and because no C recursion is used.
EDIT: For example, the expression sqrt(x+1) is translated to four instructions: one to push the variable x onto the stack, one to push the constant 1, one to pop the two numbers and push the sum and one to replace the sum with its square root. Any parse tree can easily be translated to such a list of instructions using a recursive function. An instruction could be represented by a struct containing an enum for the type of the instruction and a number for push constant instructions. The "stack" is not the C stack but simply an array of numbers with an integer that indicates how many are currently in use (which starts off as 0 and will end at 1).
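A minimal C sketch of such a stack machine (instruction names and struct layout are illustrative; this one encodes the question's expression 2(x * x) + 2 rather than sqrt(x+1), so it stays self-contained):

```c
/* Instruction set for a tiny expression stack machine. */
typedef enum { PushConst, PushX, Add, Mul } InsType;

typedef struct {
    InsType type;
    double constant; /* used only by PushConst */
} Ins;

/* Interpret a flat instruction list; the "stack" is a plain array of
   doubles plus a count of how many slots are in use, not the C stack. */
static double run(const Ins *code, int n, double x) {
    double stack[16];
    int top = 0;
    for (int i = 0; i < n; i++) {
        switch (code[i].type) {
            case PushConst: stack[top++] = code[i].constant; break;
            case PushX:     stack[top++] = x; break;
            case Add:       top--; stack[top - 1] += stack[top]; break;
            case Mul:       top--; stack[top - 1] *= stack[top]; break;
        }
    }
    return stack[0]; /* exactly one value remains for a well-formed program */
}
```

The instruction list is generated from the parse tree once; each subsequent evaluation is a tight loop over a compact array, with no recursion and no pointer chasing.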

When to use enumerateObjectsUsingBlock vs. for

Besides the obvious differences:
Use enumerateObjectsUsingBlock when you need both the index and the object
Don't use enumerateObjectsUsingBlock when you need to modify local variables (I was wrong about this, see bbum's answer)
Is enumerateObjectsUsingBlock generally considered better or worse when for (id obj in myArray) would also work? What are the advantages/disadvantages (for example is it more or less performant)?
Ultimately, use whichever pattern you want to use and comes more naturally in the context.
While for(... in ...) is quite convenient and syntactically brief, enumerateObjectsUsingBlock: has a number of features that may or may not prove interesting:
enumerateObjectsUsingBlock: will be as fast or faster than fast enumeration (for(... in ...) uses the NSFastEnumeration support to implement enumeration). Fast enumeration requires translation from an internal representation to the representation for fast enumeration. There is overhead therein. Block-based enumeration allows the collection class to enumerate contents as quickly as the fastest traversal of the native storage format. Likely irrelevant for arrays, but it can be a huge difference for dictionaries.
"Don't use enumerateObjectsUsingBlock when you need to modify local variables" - not true; you can declare your locals as __block and they'll be writable in the block.
enumerateObjectsWithOptions:usingBlock: supports either concurrent or reverse enumeration.
with dictionaries, block based enumeration is the only way to retrieve the key and value simultaneously.
Personally, I use enumerateObjectsUsingBlock: more often than for (... in ...), but - again - personal choice.
For simple enumeration, simply using fast enumeration (i.e. a for…in… loop) is the more idiomatic option. The block method might be marginally faster, but that doesn't matter much in most cases — few programs are CPU-bound, and even then it's rare that the loop itself rather than the computation inside will be a bottleneck.
A simple loop also reads more clearly. Here's the boilerplate of the two versions:
for (id x in y) {
}

[y enumerateObjectsUsingBlock:^(id x, NSUInteger index, BOOL *stop) {
}];
Even if you add a variable to track the index, the simple loop is easier to read.
So when you should use enumerateObjectsUsingBlock:? When you're storing a block to execute later or in multiple places. It's good for when you're actually using a block as a first-class function rather than an overwrought replacement for a loop body.
Although this question is old, things have not changed: the accepted answer is incorrect.
The enumerateObjectsUsingBlock API was not meant to supersede for-in, but for a totally different use case:
It allows the application of arbitrary, non-local logic. i.e. you don’t need to know what the block does to use it on an array.
Concurrent enumeration for large collections or heavy computation (using the withOptions: parameter)
Fast Enumeration with for-in is still the idiomatic method of enumerating a collection.
Fast Enumeration benefits from brevity of code, readability, and additional optimizations which make it unnaturally fast. Faster than an old C for-loop!
A quick test concludes that in the year 2014 on iOS 7, enumerateObjectsUsingBlock is consistently 700% slower than for-in (based on 1,000,000 iterations over a 100-item array).
Is performance a real practical concern here?
Definitely not, with rare exception.
The point is to demonstrate that there is little benefit to using enumerateObjectsUsingBlock: over for-in without a really good reason. It doesn't make the code more readable... or faster... or thread-safe. (another common misconception).
The choice comes down to personal preference. For me, the idiomatic and readable option wins. In this case, that is Fast Enumeration using for-in.
Benchmark:
NSMutableArray *arr = [NSMutableArray array];
for (int i = 0; i < 100; i++) {
    arr[i] = [NSString stringWithFormat:@"%d", i];
}

int i;
__block NSUInteger length;

i = 1000 * 1000;
uint64_t a1 = mach_absolute_time();
while (--i > 0) {
    for (NSString *s in arr) {
        length = s.length;
    }
}
NSLog(@"For-in %llu", mach_absolute_time() - a1);

i = 1000 * 1000;
uint64_t b1 = mach_absolute_time();
while (--i > 0) {
    [arr enumerateObjectsUsingBlock:^(NSString *s, NSUInteger idx, BOOL *stop) {
        length = s.length;
    }];
}
NSLog(@"Enum %llu", mach_absolute_time() - b1);
Results:
2014-06-11 14:37:47.717 Test[57483:60b] For-in 1087754062
2014-06-11 14:37:55.492 Test[57483:60b] Enum 7775447746
To answer the question about performance, I made some tests using my performance test project. I wanted to know which of the three options for sending a message to all objects in an array is the fastest.
The options were:
1) makeObjectsPerformSelector
[arr makeObjectsPerformSelector:@selector(_stubMethod)];
2) fast enumeration & regular message send
for (id item in arr)
{
    [item _stubMethod];
}
3) enumerateObjectsUsingBlock & regular message send
[arr enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop)
{
    [obj _stubMethod];
}];
It turns out that makeObjectsPerformSelector was the slowest by far. It took twice as long as fast enumeration. And enumerateObjectsUsingBlock was the fastest: it was around 15-20% faster than fast enumeration.
So if you're very concerned about the best possible performance, use enumerateObjectsUsingBlock. But keep in mind that in some cases the time it takes to enumerate a collection is dwarfed by the time it takes to run whatever code you want each object to execute.
It's fairly useful to use enumerateObjectsUsingBlock as an outer loop when you want to break nested loops.
e.g.
[array1 enumerateObjectsUsingBlock:^(id obj1, NSUInteger idx, BOOL * _Nonnull stop) {
    for (id obj2 in array2) {
        for (id obj3 in array3) {
            if (condition) {
                // break ALL the loops!
                *stop = YES;
                return;
            }
        }
    }
}];
The alternative is using goto statements.
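For comparison, here is what the goto version looks like in plain C (a sketch; the grid search and names here are just a stand-in for "condition"):

```c
/* Search a 2-D grid for the first cell matching `target`; goto is the
   plain-C way to break out of both loops at once. */
static int find2d(const int *grid, int rows, int cols, int target,
                  int *out_r, int *out_c) {
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            if (grid[r * cols + c] == target) {
                *out_r = r;
                *out_c = c;
                goto found; /* break ALL the loops */
            }
        }
    }
    return 0; /* not found */
found:
    return 1;
}
```

The block-based version reads more naturally to many Objective-C programmers, but a single forward goto to just past the loops is perfectly idiomatic C for this case.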
Thanks to @bbum and @Chuck for starting comprehensive comparisons on performance. Glad to know it's trivial. I seem to have gone with:
for (... in ...) - as my default go-to. More intuitive to me, more programming history here than any real preference - cross-language reuse, and less typing for most data structures thanks to IDE auto-complete :P.
enumerateObject... - when access to object and index is needed. And when accessing non-array or dictionary structures (personal preference)
for (int i=idx; i<count; i++) - for arrays, when I need to start on a non-zero index

Which block of code is 'better'?

In order to promote good programming habits and increase the efficiency of my code (Read: "My brother and I are arguing over some code"), I propose this question to experienced programmers:
Which block of code is "better"?
For those who can't be bothered to read the code: is it worth putting a conditional inside a for-loop to reduce redundant code, rather than putting it outside and writing two for-loops? Both pieces of code work; the question is efficiency vs. readability.
- (NSInteger)eliminateGroup {
    NSMutableArray *blocksToKill = [[NSMutableArray arrayWithCapacity:rowCapacity*rowCapacity] retain];
    NSInteger numOfBlocks = (NSInteger)[self countChargeOfGroup:blocksToKill];
    Block *temp;
    NSInteger chargeTotal = 0;
    //Start paying attention here
    if (numOfBlocks > 3) {
        for (NSUInteger i = 0; i < [blocksToKill count]; i++) {
            temp = (Block *)[blocksToKill objectAtIndex:i];
            chargeTotal += temp.charge;
            [temp eliminate];
            temp.beenCounted = NO;
        }
    }
    else {
        for (NSUInteger i = 0; i < [blocksToKill count]; i++) {
            temp = (Block *)[blocksToKill objectAtIndex:i];
            temp.beenCounted = NO;
        }
    }
    [blocksToKill release];
    return chargeTotal;
}
Or...
- (NSInteger)eliminateGroup {
    NSMutableArray *blocksToKill = [[NSMutableArray arrayWithCapacity:rowCapacity*rowCapacity] retain];
    NSInteger numOfBlocks = (NSInteger)[self countChargeOfGroup:blocksToKill];
    Block *temp;
    NSInteger chargeTotal = 0;
    //Start paying attention here
    for (NSUInteger i = 0; i < [blocksToKill count]; i++) {
        temp = (Block *)[blocksToKill objectAtIndex:i];
        if (numOfBlocks > 3) {
            chargeTotal += temp.charge;
            [temp eliminate];
        }
        temp.beenCounted = NO;
    }
    [blocksToKill release];
    return chargeTotal;
}
Keep in mind that this is for a game. The method is called anytime the user double-taps the screen and the for loop normally runs anywhere between 1 and 15 iterations, 64 at maximum. I understand that it really doesn't matter that much, this is mainly for helping me understand exactly how costly conditional statements are. (Read: I just want to know if I'm right.)
The first code block is cleaner and more efficient, because the check numOfBlocks > 3 yields the same result throughout the entire iteration, so it only needs to be evaluated once.
The second code block avoids code duplication and might therefore pose lesser risk. However, it is conceptually more complicated.
The second block can be improved by adding
bool increaseChargeTotal = (numOfBlocks > 3);
before the loop and then using this boolean variable instead of the actual check inside the loop, emphasizing the fact that during the iteration it doesn't change.
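A C sketch of that suggestion, with a hypothetical `total_charge` function standing in for the method in question:

```c
#include <stdbool.h>

/* Sum the "charges" only when there are enough blocks. The condition is
   hoisted into a clearly named bool, making it obvious to the reader that
   it cannot change mid-loop. */
static int total_charge(const int *charges, int n, int num_blocks) {
    bool eliminate = (num_blocks > 3); /* loop-invariant, evaluated once */
    int total = 0;
    for (int i = 0; i < n; i++) {
        if (eliminate) {
            total += charges[i];
        }
        /* per-element bookkeeping that happens either way would go here */
    }
    return total;
}
```

The compiler will usually hoist such a test on its own, so the win here is readability more than speed.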
Personally, in this case I would vote for the first option (duplicated loops) because the loop bodies are small and this shows clearly that the condition is external to the loop; also, it's more efficient and might fit the pattern "make the common case fast".
There is no way to answer this without defining your requirements for "better". Is it runtime efficiency? compiled size? code readability? code maintainability? code portability? code reuseability? algorithmic provability? developer efficiency? (Please leave comments on any popular measurements I've missed.)
Sometimes absolute runtime efficiency is all that matters, but not as often as people generally imagine, as you give a nod towards in your question—but this is at least easy to test! Often it's a mix of all these concerns, and you'll have to make a subjective judgement in the end.
Every answer here is applying a personal mix of these aspects, and people often get into vigorous Holy Wars because everyone's right—in the right circumstance. These approaches are ultimately wrong. The only correct approach is to define what matters to you, and then measure against it.
All other things being equal, having two separate loops will generally be faster, because you do the test once instead of every iteration of the loop. The branch inside the loop each iteration will often slow you down significantly due to pipeline stalls and branch mispredictions; however, since the branch always goes the same way, the CPU will almost certainly predict the branch correctly for every iteration except for the first few, assuming you're using a CPU with branch prediction (I'm not sure if the ARM chip used in the iPhone has a branch predictor unit).
However, another thing to consider is code size: the two loops approach generates a lot more code, especially if the rest of the body of the loop is large. Not only does this increase the size of your program's object code, but it also hurts your instruction cache performance -- you'll get a lot more cache misses.
All things considered, unless the code is a significant bottleneck in your application, I would go with the branch inside the loop, as it leads to clearer code and doesn't violate the don't-repeat-yourself principle. If you make a change to one of the loops and forget to change the other loop in the two-loops version, you're in for a world of hurt.
I would go with the second option. If all of the logic in the loop was completely different, then it would make sense to make 2 for loops, but the case is that some of the logic is the same, and some is additional based upon the conditional. So the second option is cleaner.
The first option would be faster, but marginally so, and I would only use it if I found there to be a bottleneck there.
You would probably waste more time in the pointless and unnecessary [blocksToKill retain]/[blocksToKill release] at the start/end of the method than the time taken to execute a few dozen comparisons. There is no need to retain the array, since you won't need it after you return and it will never be cleaned up before then.
IMHO, code duplication is a leading cause of bugs which should be avoided whenever possible.
Adding Jens' recommendation to use fast enumeration and Antti's recommendation to use a clearly named boolean, you'd get something like:
- (NSInteger)eliminateGroup {
    NSMutableArray *blocksToKill = [NSMutableArray arrayWithCapacity:rowCapacity*rowCapacity];
    NSInteger numOfBlocks = (NSInteger)[self countChargeOfGroup:blocksToKill];
    NSInteger chargeTotal = 0;
    BOOL calculateAndEliminateBlocks = (numOfBlocks > 3);
    for (Block *block in blocksToKill) {
        if (calculateAndEliminateBlocks) {
            chargeTotal += block.charge;
            [block eliminate];
        }
        block.beenCounted = NO;
    }
    return chargeTotal;
}
If you finish your project, and if your program is not running fast enough (two big ifs), then you can profile it, find the hotspots, and determine whether the few microseconds spent on that branch are worth thinking about. They are certainly not worth thinking about at all now, which means the only consideration is which version is more readable and maintainable.
My vote is strongly in favor of the second block.
The second block makes clear what the difference in logic is, and shares the same looping structure. It is both more readable and more maintainable.
The first block is an example of premature optimization.
As for using a bool to "save" all those LTE comparisons--in this case, I don't think it will help, the machine language will likely require exactly the same number and size of instructions.
The overhead of the "if" test is a handful of CPU instructions; way less than a microsecond. Unless you think the loop is going to run hundreds of thousands of times in response to user input, that's just lost in the noise. So I would go with the second solution because the code is both smaller and easier to understand.
In either case, though, I would change the loop to be
for (temp in blocksToKill) { ... }
This is both cleaner-reading and considerably faster than manually getting each element of the array.
Readability (and thus maintainability) can and should be sacrificed in the name of performance, but when, and only when, it's been determined that performance is an issue.
The second block is more readable, and unless/until speed is an issue, it is better (in my opinion). During testing for your app, if you find out this loop is responsible for unacceptable performance, then by all means, seek to make it faster, even if it becomes harder to maintain. But don't do it until you have to.