Threading nested for loops - objective-c

I've been looking for a while for a similar question but without any success. I don't know how to optimize some code in cocoa to use all available cores of CPU (I don't want to use GPU at the moment). Below is simple sample of code with case I mean:
int limA = 1000;
int limB = 1000;
unsigned short tmp;
for (int i = 0; i < 10000; i++) {
for (int a = 0; a < limA; a++) {
for (int b = 0; b < limB; b++) {
tmp = [[array objectAtIndex:(a*b)] unsignedShortValue];
c_array[a*limB+b] += tmp;
}
}
}
assume that array and c_array is properly initialized etc... But as you can see, if we have many iterations (in this case: 10^10) it takes some time to execute this code. I thought that maybe It is simple possibility to execute this code in few threads, but how to synchronize c_array? What is the best way to improve time execution of this kind of code in objective-c? Maybe it could be done this way that iterations 0-2499 of most external for loop would be executed in thread 1 and 2500-4999 thread 2 etc... ? I know that this is silly way but I don't need "real time" performance... any ideas?

A few suggestions:
Do an initial pass over the array to extract all the shorts from their object wrappers:
short *tmp_array = calloc(limA * limB, sizeof(short));
int tmp_idx = 0;
for (NSNumber *num in array) {
tmp_array[tmp_idx++] = [num unsignedShortValue];
}
This has several benefits. You go from 10^10 method calls to 10^6, your inner loop stops being opaque to the compiler (it can't "see through" method calls), and your inner loop gets smaller and more likely to fit in the instruction cache.
Try to linearize access patterns. Right now you're doing 'strided' access, since the index is being multiplied each time. If you can rearrange the data in tmp_array so that elements that are processed sequentially are also sequential in the array, you should get much better performance (since each access to the array is loading a full cache line, which is 64 bytes on most processors).
Getting a benefit out of parallelism is likely to be tricky. You could try replacing the outer loop with:
dispatch_apply(10000, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(size_t i) {
});
and the += in the inner loop with OSAtomicAdd, but my suspicion is that your speed is going to be dominated by memory accesses anyway, and adding more processors to the mix will just lead to them stepping on each other's toes (i.e. processor 0 loads c_array[1500] so that it knows what to add tmp to, which actually loads the cache line covering [1500-1531], then processor 1 writes to c_array[1512], invalidating that entire cache line and forcing it to be re-read). Also, I'm pretty sure you'd need to store 32 bit values in c_array to do that, since you'd be using OSAtomicAdd32 (there's no OSAtomicAdd16).
At the very least, if you're going to parallelize, then you need to figure out how to divide the work into non-overlapping chunks of 32 elements of c_array (i.e. 64 bytes), so that you can avoid contention. Dividing up the ranges of the array should also let you avoid needing to use atomic add operations.
(edit)
Check out an0's answer for some practical suggestions for parallelizing this, rather than this discussion of why the naive parallelization won't work :)

First, follow #Catfish_Man's suggestion, except for the parallelism part.
For the parallelism, here are my ideas:
The outmost loop is meaningless. Just use 10000 * tmp instead of tmp.
Since the segments of target array to be written to are strictly disjoint for different a values, the second level of loop can be easily parallelized. In fact, it also applies to b. But if we also parallelize over b the calculation unit left in the body will be too small for the splitting of the work load to be useful.
Code:
int limA = 1000;
int limB = 1000;
short *tmp_array = calloc(limA * limB, sizeof(short));
int tmp_idx = 0;
for (NSNumber *num in array) {
tmp_array[tmp_idx++] = [num unsignedShortValue];
}
dispatch_apply(limA, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(size_t a) {
for (int b = 0; b < limB; b++) {
tmp = ;
c_array[a*limB+b] += 1000 * tmp_array[a*b];
}
});
free(tmp_array);

First, follow #Catfish_Man's suggestions. Then follow #an0's suggestions. Then do this as well:
// ...
short *tmp_array = calloc(limA * limB, sizeof(short));
unsigned short (*unsignedShortValueIMP)(id, SEL) = class_getMethodImplementation([NSNumber class], #selector(unsignedShortValue));
void * (*objectAtIndexIMP)(id, SEL, NSUInteger) = class_getMethodImplementation(array.class, #selector(objectAtIndex:));
NSUInteger n = array.count;
for (NSUInteger i = 0; i < n; ++i) {
void *obj = objectAtIndexIMP(array, #selector(objectAtIndex:), i);
tmp_array[i] = unsignedShortValueIMP((__bridge id)obj, #selector(unsignedShortValue));
}
// ...
By lifting the IMPs out of Objective-C, you bypass all the overhead of the message dispatch machinery and allow the compiler to "see through" the calls; while these selectors are part of Foundation and can't be inlined, removing the extra levels of indirection improves the holy heck out of the branch prediction and prefetching machinery in the CPU cores. In addition, by using a raw C for loop instead of Objective-C's array enumeration, AND not forcing the opacity of objc_msgSend() on the compiler, you allow Clang's loop unwinding and vectorization optimizers to work.
#Catfish_Man may be able to tell me this is an outmoded optimization no longer worth doing, but as far as I'm aware, it's still a win for massive repetitions of calling the same methods like this.
Final note: My code assumes ARC, so uses void * and a bridge cast instead of id on the objectAtIndex: IMP to bypass the extra implicit retain/release pair. This is evil shadow hackery, disabling ARC for the file in question is a better solution, and I should be ashamed of myself.

Related

Why do I get the error "Array initializer must be an initializer list" when I'm trying to return this array in a function?

I am coming from Java, and I'm very new to Objective C. Anyway, I have this static method which is designed to make a copy of an array (if there's a better way to accomplish this, please let me know, but I'm asking this question more-so to find out why I got this error and how to avoid such an error in the future.) I ran into some problems with it, but just when I thought I had them all sorted out, I got this error that looked like
Here is the method in the interface:
+ (float[]) copyArray: (float[]) array withLength: (int) length;
And here is the method in the implementation:
+ (float[]) copyArray: (float[]) array withLength: (int) length
{
float copiedArray[length];
for (int i = 0; i < length; i++)
{
copiedArray[i] = array[i];
}
return copiedArray;
}
If all you really want is to copy the first n elements from one C array into another already existing array, probably the best way is to simply use memcpy:
memcpy(targetArray, sourceArray, sizeof(sourceArray[0]) * numElements);
The sizeof(sourceArray[0]) calculates the byte-size of the type in your array (in your case, it's equivalent to sizeof(float).
method/function cannot return C array. you should do this
+ (void) copyArrayFrom:(float *)array to:(float *)toArray withLength: (unsigned) length
{
for (int i = 0; i < length; i++)
{
toArray [i] = array[i];
}
}
C arrays are way more tricky than Java arrays. One of the biggest issues is that in a lot of instances, you don't know how large a C array is unless you have saved this information in a different variable, for example. The C FAQ "Arrays and Pointers" lists a lot of traps and they apply to Objective-C as well. You might want to see question 6.5 in particular.
As #lwxted already suggested, try to avoid C arrays unless you really know what you're doing and you have determined that you need them. C arrays are faster than NSArray but unless you have determined that your array really is a performance bottleneck by measuring with a profiler you will most likely not notice any difference.
And I strongly recommend avoiding a C array of Objective-C objects (id objects[]) unless you really, really know very well what you are doing (memory management issues).
In Objective-C, unless for particular needs, a better way to handle this usually is to use the NSArray as opposed to C arrays.
[NSArray arrayWithArray: array];
will copy an array.
Besides, in this case, if you insist on using C arrays, the use of implicitly typed length float[] is advised against. A better way is to use pointers to manipulate arrays.
Also, the stack-allocated array would be invalid after leaving the function, since it's local only in the scope of the copyArray function. You should dynamically allocate memory, if you wish the array to be valid outside the scope.
While I agree with all the points #DarkDust makes, if you're working with a C API such as OpenGL, there may be situations where using NSArray and NSNumber vs. C arrays of type float will have performance impacts. As always, try to use the simpler approach first, and carefully measure performance before deciding to optimize.
In any case, to answer the original question, here's how to correctly return a copy of a C array:
+ (float *)copyOfCArray:(float *)array withLength:(int)length
{
float *copyOfArray = malloc(length * sizeof(float));
for (int i = 0; i < length; i++) {
copyOfArray[i] = array[i];
}
return copyOfArray;
}
Also, there's arguably no need to make the above a method at all. Instead, consider writing it as a C function:
float *CopyArray(float *array, int length)
{
// Implementation would be the same...
}

Strange for loops I'm not familiar with: "for (id * in *)"

I apologize if this question is exceedingly simple, but I've Googled like crazy and am unable to find a suitable explanation for what this is.
for (id line in self.lines){
[linesCopy addObject:[line copyWithZone:zone]];
}
I'm just learning Objective-C, and this is a form of for loop that I've never seen before. I'm familiar with the simple
for (int x = 1, x < 10, x++)
style of for loop.
From Cocoa Core Competencies: Enumeration:
Fast Enumeration
Several Cocoa classes, including the collection classes, adopt the NSFastEnumeration protocol. You use it to retrieve elements held by an instance using a syntax similar to that of a standard C for loop, as illustrated in the following example:
NSArray *anArray = // get an array;
for (id element in anArray) {
/* code that acts on the element */
}
As the name suggests, fast enumeration is more efficient than other forms of enumeration.
In case you didn't know, id is an Objective-C type that basically means “a pointer to any Objective-C object”. Note that the pointer-ness of id is built in to it; you usually do not want to say id *.
If you expect the elements of anArray to be of a specific class, say MyObject, you can use that instead:
for (MyObject *element in anArray) {
/* code that acts on the element */
}
However, neither the compiler nor the runtime will check that the elements are indeed instances of MyObject. If an element of anArray is not an instance of MyObject, you'll probably end up trying to send it a message it doesn't understand, and get a selector-not-recognized exception.
It's the shorthand equivalent of this common form:
for (int i = 0; i < [self.lines count]; i++) {
id line = [self.lines objectAtIndex:i];
// ...
}
It's such a common looping idiom (walking through some collection, array, set, etc. an item at a time), that it's been turned into a shorthand form like this, called "fast enumeration".
In fact, in its internal implementation, it's actually slightly more faster than doing it yourself, so it's preferable both for clarity and performance.
It's a statement that can be used with classes that are conform to NSFastEnumeration protocol. When you have this available, the Objective-C programming guide suggest you to use it. Take a look here. It's a way to iterate over a collection without the traditional for (int i = 0; i < length; ++i) syntax.
Mind that it doesn't usually support deleting and inserting elements while iterating through this way (also by using normal for loops you should take care about indices in any case).
Basically all standard collections supports this way of iteration.
It's called a forin loop, also called fast enumeration. Basically, the syntax is:
for (SomeObjectIAmExpecting *localVariableName in anArrayOfObjects)
{
if (![localVariableName isKindOfClass:[SomeObjectIAmExpecting class]]) continue; //To avoid errors.
//do something to them
}

Is it worth precomputing the conditional in a for loop?

Is there a difference between the following two code blocks in terms of the resulting machine code when using the llvm or gcc compilers?
When is this optimization actually worthwhile, if ever?
Not optimized:
for (int i=0; i<array.count; i++) {
//do some work
}
Optimized:
int count = array.count;
for (int i=0; i<count; i++) {
//do some work
}
EDIT: I should point out that array is immutable and array.count doesn't change during the loop's execution.
You really need to check it yourself. My guess is that there is a difference in the emitted code, but it might depend on compiler and compiler options, and it certainly can depend on the definition of array.
Nearly never, on the assumption that evaluating array.count is nearly always insignificant compared with "some work". The way to measure it, though, is to use a profiler (or equivalent) and observe what proportion of your program's runtime is spent at that line of code. Provided the profiler is accurate, that's the most you could hope to gain by changing it.
Suppose array.count is something really slow, that you happen to know will always return the same result but the compiler doesn't know that. Then it might be worth manually hoisting it. strlen gets used as an example. It's debateable how often strlen is actually slow in practice, but easy to manufacture examples likely to run slower than they need to:
char some_function(char a) {
return (a * 2 + 1) & 0x3F;
}
for (int i = 0; i < strlen(ptr); ++i) {
ptr[i] = some_function(ptr[i]); // faster than strlen for long enough strings.
}
You and I know that some_function never returns 0, and hence the length of the string never changes. The compiler might not see the definition of some_function, and even if it does see the definition might not realize that its non-zero-returningness is important.
The Steve Jessop answer is a good one. I just want to add:
Personally, I always use optimized version. It's just in my set of good practices to remove every constant component out of the loop. It's not much work and it makes code cleaner. It's not "premature optimization" and it does not introduce any problems or tradeoffs. It makes debugging easier (stepping). And it could potentially make the code faster. So it's a no-brainer to me.

When to use enumerateObjectsUsingBlock vs. for

Besides the obvious differences:
Use enumerateObjectsUsingBlock when you need both the index and the object
Don't use enumerateObjectsUsingBlock when you need to modify local variables (I was wrong about this, see bbum's answer)
Is enumerateObjectsUsingBlock generally considered better or worse when for (id obj in myArray) would also work? What are the advantages/disadvantages (for example is it more or less performant)?
Ultimately, use whichever pattern you want to use and comes more naturally in the context.
While for(... in ...) is quite convenient and syntactically brief, enumerateObjectsUsingBlock: has a number of features that may or may not prove interesting:
enumerateObjectsUsingBlock: will be as fast or faster than fast enumeration (for(... in ...) uses the NSFastEnumeration support to implement enumeration). Fast enumeration requires translation from an internal representation to the representation for fast enumeration. There is overhead therein. Block-based enumeration allows the collection class to enumerate contents as quickly as the fastest traversal of the native storage format. Likely irrelevant for arrays, but it can be a huge difference for dictionaries.
"Don't use enumerateObjectsUsingBlock when you need to modify local variables" - not true; you can declare your locals as __block and they'll be writable in the block.
enumerateObjectsWithOptions:usingBlock: supports either concurrent or reverse enumeration.
with dictionaries, block based enumeration is the only way to retrieve the key and value simultaneously.
Personally, I use enumerateObjectsUsingBlock: more often than for (... in ...), but - again - personal choice.
For simple enumeration, simply using fast enumeration (i.e. a for…in… loop) is the more idiomatic option. The block method might be marginally faster, but that doesn't matter much in most cases — few programs are CPU-bound, and even then it's rare that the loop itself rather than the computation inside will be a bottleneck.
A simple loop also reads more clearly. Here's the boilerplate of the two versions:
for (id x in y){
}
[y enumerateObjectsUsingBlock:^(id x, NSUInteger index, BOOL *stop){
}];
Even if you add a variable to track the index, the simple loop is easier to read.
So when you should use enumerateObjectsUsingBlock:? When you're storing a block to execute later or in multiple places. It's good for when you're actually using a block as a first-class function rather than an overwrought replacement for a loop body.
Although this question is old, things have not changed, the accepted answer is incorrect.
The enumerateObjectsUsingBlock API was not meant to supersede for-in, but for a totally different use case:
It allows the application of arbitrary, non-local logic. i.e. you don’t need to know what the block does to use it on an array.
Concurrent enumeration for large collections or heavy computation (using the withOptions: parameter)
Fast Enumeration with for-in is still the idiomatic method of enumerating a collection.
Fast Enumeration benefits from brevity of code, readability and additional optimizations which make it unnaturally fast. Faster than a old C for-loop!
A quick test concludes that in the year 2014 on iOS 7, enumerateObjectsUsingBlock is consistently 700% slower than for-in (based on 1mm iterations of a 100 item array).
Is performance a real practical concern here?
Definitely not, with rare exception.
The point is to demonstrate that there is little benefit to using enumerateObjectsUsingBlock: over for-in without a really good reason. It doesn't make the code more readable... or faster... or thread-safe. (another common misconception).
The choice comes down to personal preference. For me, the idiomatic and readable option wins. In this case, that is Fast Enumeration using for-in.
Benchmark:
NSMutableArray *arr = [NSMutableArray array];
for (int i = 0; i < 100; i++) {
arr[i] = [NSString stringWithFormat:#"%d", i];
}
int i;
__block NSUInteger length;
i = 1000 * 1000;
uint64_t a1 = mach_absolute_time();
while (--i > 0) {
for (NSString *s in arr) {
length = s.length;
}
}
NSLog(#"For-in %llu", mach_absolute_time()-a1);
i = 1000 * 1000;
uint64_t b1 = mach_absolute_time();
while (--i > 0) {
[arr enumerateObjectsUsingBlock:^(NSString *s, NSUInteger idx, BOOL *stop) {
length = s.length;
}];
}
NSLog(#"Enum %llu", mach_absolute_time()-b1);
Results:
2014-06-11 14:37:47.717 Test[57483:60b] For-in 1087754062
2014-06-11 14:37:55.492 Test[57483:60b] Enum 7775447746
To answer the question about performance, I made some tests using my performance test project. I wanted to know which of the three options for sending a message to all objects in an array is the fastest.
The options were:
1) makeObjectsPerformSelector
[arr makeObjectsPerformSelector:#selector(_stubMethod)];
2) fast enumeration & regular message send
for (id item in arr)
{
[item _stubMethod];
}
3) enumerateObjectsUsingBlock & regular message send
[arr enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop)
{
[obj _stubMethod];
}];
It turns out that makeObjectsPerformSelector was the slowest by far. It took twice as long as fast enumeration. And enumerateObjectsUsingBlock was the fastest, it was around 15-20% faster than fast iteration.
So if you're very concerned about the best possible performance, use enumerateObjectsUsingBlock. But keep in mind that in some cases the time it takes to enumerate a collection is dwarfed by the time it takes to run whatever code you want each object to execute.
It's fairly useful to use enumerateObjectsUsingBlock as an outer loop when you want to break nested loops.
e.g.
[array1 enumerateObjectsUsingBlock:^(id obj1, NSUInteger idx, BOOL * _Nonnull stop) {
for(id obj2 in array2) {
for(id obj3 in array3) {
if(condition) {
// break ALL the loops!
*stop = YES;
return;
}
}
}
}];
The alternative is using goto statements.
Thanks to #bbum and #Chuck for starting comprehensive comparisons on performance. Glad to know it's trivial. I seem to have gone with:
for (... in ...) - as my default goto. More intuitive to me, more programming history here than any real preference - cross language reuse, less typing for most data structures due to IDE auto complete :P.
enumerateObject... - when access to object and index is needed. And when accessing non-array or dictionary structures (personal preference)
for (int i=idx; i<count; i++) - for arrays, when I need to start on a non-zero index

Which block of code is 'better'?

In order to promote good programming habits and increase the efficiency of my code (Read: "My brother and I are arguing over some code"), I propose this question to experienced programmers:
Which block of code is "better"?
For those who can't be bothered to read the code, is it worth putting a conditional within a for-loop to decrease the amount of redundant code than to put it outside and make 2 for-loops? Both pieces of code work, the question is efficiency vs. readability.
- (NSInteger)eliminateGroup {
NSMutableArray *blocksToKill = [[NSMutableArray arrayWithCapacity:rowCapacity*rowCapacity] retain];
NSInteger numOfBlocks = (NSInteger)[self countChargeOfGroup:blocksToKill];
Block *temp;
NSInteger chargeTotal = 0;
//Start paying attention here
if (numOfBlocks > 3)
for (NSUInteger i = 0; i < [blocksToKill count]; i++) {
temp = (Block *)[blocksToKill objectAtIndex:i];
chargeTotal += temp.charge;
[temp eliminate];
temp.beenCounted = NO;
}
}
else {
for (NSUInteger i = 0; i < [blocksToKill count]; i++) {
temp = (Block *)[blocksToKill objectAtIndex:i];
temp.beenCounted = NO;
}
}
[blocksToKill release];
return chargeTotal;
}
Or...
- (NSInteger)eliminateGroup {
NSMutableArray *blocksToKill = [[NSMutableArray arrayWithCapacity:rowCapacity*rowCapacity] retain];
NSInteger numOfBlocks = (NSInteger)[self countChargeOfGroup:blocksToKill];
Block *temp;
NSInteger chargeTotal = 0;
//Start paying attention here
for (NSUInteger i = 0; i < [blocksToKill count]; i++) {
temp = (Block *)[blocksToKill objectAtIndex:i];
if (numOfBlocks > 3) {
chargeTotal += temp.charge;
[temp eliminate];
}
temp.beenCounted = NO;
}
[blocksToKill release];
return chargeTotal;
}
Keep in mind that this is for a game. The method is called anytime the user double-taps the screen and the for loop normally runs anywhere between 1 and 15 iterations, 64 at maximum. I understand that it really doesn't matter that much, this is mainly for helping me understand exactly how costly conditional statements are. (Read: I just want to know if I'm right.)
The first code block is cleaner and more efficient because the check numOfBlocks > 3 is either true or false throughout the whole iteration.
The second code block avoids code duplication and might therefore pose lesser risk. However, it is conceptually more complicated.
The second block can be improved by adding
bool increaseChargeTotal = (numOfBlocks > 3)
before the loop and then using this boolean variable instead of the actual check inside the loop, emphasizing the fact that during the iteration it doesn't change.
Personally, in this case I would vote for the first option (duplicated loops) because the loop bodies are small and this shows clearly that the condition is external to the loop; also, it's more efficient and might fit the pattern "make the common case fast".
There is no way to answer this without defining your requirements for "better". Is it runtime efficiency? compiled size? code readability? code maintainability? code portability? code reuseability? algorithmic provability? developer efficiency? (Please leave comments on any popular measurements I've missed.)
Sometimes absolute runtime efficiency is all that matters, but not as often as people generally imagine, as you give a nod towards in your question—but this is at least easy to test! Often it's a mix of all these concerns, and you'll have to make a subjective judgement in the end.
Every answer here is applying a personal mix of these aspects, and people often get into vigorous Holy Wars because everyone's right—in the right circumstance. These approaches are ultimately wrong. The only correct approach is to define what matters to you, and then measure against it.
All other things being equal, having two separate loops will generally be faster, because you do the test once instead of every iteration of the loop. The branch inside the loop each iteration will often slow you down significantly due to pipeline stalls and branch mispredictions; however, since the branch always goes the same way, the CPU will almost certainly predict the branch correctly for every iteration except for the first few, assuming you're using a CPU with branch prediction (I'm not sure if the ARM chip used in the iPhone has a branch predictor unit).
However, another thing to consider is code size: the two loops approach generates a lot more code, especially if the rest of the body of the loop is large. Not only does this increase the size of your program's object code, but it also hurts your instruction cache performance -- you'll get a lot more cache misses.
All things considered, unless the code is a significant bottleneck in your application, I would go with the branch inside of the loop, as it leads to clearer code, and it doesn't violate the don't repeat yourself principle. If you make a change to once of the loops and forget to change the other loop in the two-loops version, you're in for a world of hurt.
I would go with the second option. If all of the logic in the loop was completely different, then it would make sense to make 2 for loops, but the case is that some of the logic is the same, and some is additional based upon the conditional. So the second option is cleaner.
The first option would be faster, but marginally so, and I would only use it if I found there to be a bottleneck there.
You would probably waste more time in the pointless and unnessesary [blocksToKill retain]/[blocksToKill release] at the start/end of the method than the time taken to execute a few dozens comparisons. There is no need to retain the array since you wont need it after you return and it will never be cleaned up before then.
IMHO, code duplication is a leading cause of bugs which should be avoided whenever possible.
Adding Jens recomendation to use fast enumeration and Antti's recomendation to use a clearly named boolean, you'd get something like:
- (NSInteger)eliminateGroup {
NSMutableArray *blocksToKill = [NSMutableArray arrayWithCapacity:rowCapacity*rowCapacity];
NSInteger numOfBlocks = (NSInteger)[self countChargeOfGroup:blocksToKill];
NSInteger chargeTotal = 0;
BOOL calculateAndEliminateBlocks = (numOfBlocks > 3);
for (Block* block in blocksToKill) {
if (calculateAndEliminateBlocks) {
chargeTotal += block.charge;
[block eliminate];
}
block.beenCounted = NO;
}
return chargeTotal;
}
If you finish your project and if your program is not running fast enough (two big ifs), then you can profile it and find the hotspots and then you can determine whether the few microseconds you spend contemplating that branch is worth thinking about — certainly it is not worth thinking about at all now, which means that the only consideration is which is more readable/maintainable.
My vote is strongly in favor of the second block.
The second block makes clear what the difference in logic is, and shares the same looping structure. It is both more readable and more maintainable.
The first block is an example of premature optimization.
As for using a bool to "save" all those LTE comparisons--in this case, I don't think it will help, the machine language will likely require exactly the same number and size of instructions.
The overhead of the "if" test is a handful of CPU instructions; way less than a microsecond. Unless you think the loop is going to run hundreds of thousands of times in response to user input, that's just lost in the noise. So I would go with the second solution because the code is both smaller and easier to understand.
In either case, though, I would change the loop to be
for (temp in blocksToKill) { ... }
This is both cleaner-reading and considerably faster than manually getting each element of the array.
Readability (and thus maintainability) can and should be sacrificed in the name of performance, but when, and only when, it's been determined that performance is an issue.
The second block is more readable, and unless/until speed is an issue, it is better (in my opinion). During testing for your app, if you find out this loop is responsible for unacceptable performance, then by all means, seek to make it faster, even if it becomes harder to maintain. But don't do it until you have to.