When to use enumerateObjectsUsingBlock vs. for - objective-c

Besides the obvious differences:
Use enumerateObjectsUsingBlock when you need both the index and the object
Don't use enumerateObjectsUsingBlock when you need to modify local variables (I was wrong about this, see bbum's answer)
Is enumerateObjectsUsingBlock generally considered better or worse when for (id obj in myArray) would also work? What are the advantages/disadvantages (for example is it more or less performant)?

Ultimately, use whichever pattern you want to use and comes more naturally in the context.
While for(... in ...) is quite convenient and syntactically brief, enumerateObjectsUsingBlock: has a number of features that may or may not prove interesting:
enumerateObjectsUsingBlock: will be as fast as or faster than fast enumeration (for(... in ...) uses the NSFastEnumeration support to implement enumeration). Fast enumeration requires translation from an internal representation to the representation for fast enumeration. There is overhead therein. Block-based enumeration allows the collection class to enumerate contents as quickly as the fastest traversal of the native storage format. Likely irrelevant for arrays, but it can be a huge difference for dictionaries.
"Don't use enumerateObjectsUsingBlock when you need to modify local variables" - not true; you can declare your locals as __block and they'll be writable in the block.
enumerateObjectsWithOptions:usingBlock: supports either concurrent or reverse enumeration.
With dictionaries, block-based enumeration is the only way to retrieve the key and value simultaneously.
Personally, I use enumerateObjectsUsingBlock: more often than for (... in ...), but, again, it's personal choice.

For simple enumeration, fast enumeration (i.e. a for…in… loop) is the more idiomatic option. The block method might be marginally faster, but that doesn't matter much in most cases: few programs are CPU-bound, and even then it's rare that the loop itself, rather than the computation inside it, will be the bottleneck.
A simple loop also reads more clearly. Here's the boilerplate of the two versions:
for (id x in y){
}
[y enumerateObjectsUsingBlock:^(id x, NSUInteger index, BOOL *stop){
}];
Even if you add a variable to track the index, the simple loop is easier to read.
So when should you use enumerateObjectsUsingBlock:? When you're storing a block to execute later or in multiple places. It's good for when you're actually using a block as a first-class function rather than as an overwrought replacement for a loop body.

Although this question is old, things have not changed: the accepted answer is incorrect.
The enumerateObjectsUsingBlock API was not meant to supersede for-in; it targets a different use case:
It allows the application of arbitrary, non-local logic, i.e. you don’t need to know what the block does to use it on an array.
Concurrent enumeration for large collections or heavy computation (using the withOptions: parameter)
Fast Enumeration with for-in is still the idiomatic method of enumerating a collection.
Fast Enumeration benefits from brevity of code, readability and additional optimizations that make it unnaturally fast. Faster than an old C for loop!
A quick test in 2014 on iOS 7 concludes that enumerateObjectsUsingBlock is consistently about 7 times slower than for-in (based on 1 million iterations over a 100-item array).
Is performance a real practical concern here?
Definitely not, with rare exception.
The point is to demonstrate that there is little benefit to using enumerateObjectsUsingBlock: over for-in without a really good reason. It doesn't make the code more readable, or faster, or thread-safe (another common misconception).
The choice comes down to personal preference. For me, the idiomatic and readable option wins. In this case, that is Fast Enumeration using for-in.
Benchmark:
#import <Foundation/Foundation.h>
#include <mach/mach_time.h> // mach_absolute_time()
NSMutableArray *arr = [NSMutableArray array];
for (int i = 0; i < 100; i++) {
    arr[i] = [NSString stringWithFormat:@"%d", i]; // index == count appends
}
int i;
__block NSUInteger length; // __block so the block below can write to it
i = 1000 * 1000;
uint64_t a1 = mach_absolute_time();
while (--i > 0) {
    for (NSString *s in arr) {
        length = s.length;
    }
}
NSLog(@"For-in %llu", mach_absolute_time()-a1); // elapsed Mach time units
i = 1000 * 1000;
uint64_t b1 = mach_absolute_time();
while (--i > 0) {
    [arr enumerateObjectsUsingBlock:^(NSString *s, NSUInteger idx, BOOL *stop) {
        length = s.length;
    }];
}
NSLog(@"Enum %llu", mach_absolute_time()-b1);
Results:
2014-06-11 14:37:47.717 Test[57483:60b] For-in 1087754062
2014-06-11 14:37:55.492 Test[57483:60b] Enum 7775447746

To answer the question about performance, I made some tests using my performance test project. I wanted to know which of the three options for sending a message to all objects in an array is the fastest.
The options were:
1) makeObjectsPerformSelector
[arr makeObjectsPerformSelector:@selector(_stubMethod)];
2) fast enumeration & regular message send
for (id item in arr)
{
[item _stubMethod];
}
3) enumerateObjectsUsingBlock & regular message send
[arr enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop)
{
[obj _stubMethod];
}];
It turns out that makeObjectsPerformSelector was the slowest by far: it took twice as long as fast enumeration. enumerateObjectsUsingBlock was the fastest, around 15-20% faster than fast enumeration.
So if you're very concerned about the best possible performance, use enumerateObjectsUsingBlock. But keep in mind that in some cases the time it takes to enumerate a collection is dwarfed by the time it takes to run whatever code you want each object to execute.

It's fairly useful to use enumerateObjectsUsingBlock as an outer loop when you want to break out of nested loops.
e.g.
[array1 enumerateObjectsUsingBlock:^(id obj1, NSUInteger idx, BOOL * _Nonnull stop) {
for(id obj2 in array2) {
for(id obj3 in array3) {
if(condition) {
// break ALL the loops!
*stop = YES;
return;
}
}
}
}];
The alternative is using goto statements.
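For comparison, the goto alternative can be sketched in plain C, which is also valid Objective-C. This is only an illustration: find_triple and its sum-equals-target test are hypothetical stand-ins for `condition`, and the visited counter exists just to show that no further iterations run once the match is found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical example: look for one element from each array whose sum
   equals target. The goto leaves all three loops at once, like
   "*stop = YES; return;" in the block-based version. *visited counts how
   many combinations were tested. */
bool find_triple(const int *a1, size_t n1, const int *a2, size_t n2,
                 const int *a3, size_t n3, int target,
                 size_t out[3], size_t *visited)
{
    assert(out != NULL && visited != NULL);
    bool found = false;
    *visited = 0;
    for (size_t i = 0; i < n1; i++) {
        for (size_t j = 0; j < n2; j++) {
            for (size_t k = 0; k < n3; k++) {
                (*visited)++;
                if (a1[i] + a2[j] + a3[k] == target) {
                    out[0] = i;
                    out[1] = j;
                    out[2] = k;
                    found = true;
                    goto done; /* break ALL the loops */
                }
            }
        }
    }
done:
    /* Shared post-loop work would go here; it runs whether or not a match was found. */
    return found;
}
```

The goto mainly pays off when there is shared code after the loops; if the function can simply return, a return statement leaves all the loops just as well.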

Thanks to @bbum and @Chuck for starting comprehensive comparisons on performance. Glad to know it's trivial. I seem to have gone with:
for (... in ...) - my default go-to. It's more intuitive to me, though that comes more from programming history than any real preference: cross-language reuse, and less typing for most data structures thanks to IDE autocomplete :P.
enumerateObject... - when I need access to both the object and the index, and when accessing non-array or dictionary structures (personal preference).
for (int i=idx; i<count; i++) - for arrays, when I need to start on a non-zero index

Related

Why is passing pointer-pointers not encouraged in Cocoa development?

When I pass a string the Apple-style way to a function and test it a billion times it takes ~ 42,001 seconds:
- (void)test:(NSString *)str {
NSString *test = str;
if (test) {
return;
}
}
NSString *value = @"Value 1";
NSLog(@"START");
for (int i = 0; i < 1e9; i++) {
[self test:value];
}
NSLog(@"END");
But when I instead pass a pointer to the pointer (assuming my test function only reads through it), like so:
- (void)test:(NSString **)str {
NSString *test = *str;
if (test) {
return;
}
}
NSLog(@"START");
for (int i = 0; i < 1e9; i++) {
[self test:&value];
}
NSLog(@"END");
...it only takes ~26,804 seconds.
Why does Apple promote the first example as normal practice, while the latter seems to perform so differently?
I read about the toll-free bridging that Foundation applies, but if the difference is this big, what's the added value? If a whole application could run much faster just by changing the arguments of some major functions like this, isn't that a considerable flaw in the way Apple instructs developers to build apps in Objective-C?
You wouldn't use the NSString ** syntax, as that suggests that the method you're calling can change what value points to. You would never do that unless this is really what was taking place.
The simple NSString * example may be taking longer because, in the absence of any optimization, the NSString * rendition is probably adding and removing a strong reference to value when the method is called and when it returns.
If you turn on optimization, the behavior changes. For example, when I used the -Os "Fastest, Smallest" build setting, the NSString * rendition was actually faster than the NSString ** one. And even if the performance was worse, I wouldn't write code that exposed me to all sorts of problems down the line just because it was 0.0000152 seconds faster per call. I'd find other ways to optimize the code.
To quote Donald Knuth:
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. [Emphasis added]
The goal is always to write code whose functional intent is clear, whose type handling is safest and then, where possible, use the compiler's own internal optimization capabilities to tackle the performance issues. Only sacrifice the code readability and ease of maintenance and debugging when it's absolutely essential.

Threading nested for loops

I've been looking for a similar question for a while, without any success. I don't know how to optimize some code in Cocoa to use all available CPU cores (I don't want to use the GPU at the moment). Below is a simple sample of the kind of code I mean:
int limA = 1000;
int limB = 1000;
unsigned short tmp;
for (int i = 0; i < 10000; i++) {
for (int a = 0; a < limA; a++) {
for (int b = 0; b < limB; b++) {
tmp = [[array objectAtIndex:(a*b)] unsignedShortValue];
c_array[a*limB+b] += tmp;
}
}
}
Assume that array and c_array are properly initialized, etc. But as you can see, with this many iterations (in this case 10^10) the code takes some time to execute. I thought there might be a simple way to execute this code in a few threads, but how do I synchronize c_array? What is the best way to improve the execution time of this kind of code in Objective-C? Maybe it could be done so that iterations 0-2499 of the outermost for loop are executed in thread 1, 2500-4999 in thread 2, etc.? I know this is a silly approach, but I don't need "real time" performance... any ideas?
A few suggestions:
Do an initial pass over the array to extract all the shorts from their object wrappers:
unsigned short *tmp_array = calloc(limA * limB, sizeof(unsigned short));
int tmp_idx = 0;
for (NSNumber *num in array) {
    tmp_array[tmp_idx++] = [num unsignedShortValue];
}
This has several benefits. You go from 10^10 method calls to 10^6, your inner loop stops being opaque to the compiler (it can't "see through" method calls), and your inner loop gets smaller and more likely to fit in the instruction cache.
Try to linearize access patterns. Right now you're doing 'strided' access, since the index is being multiplied each time. If you can rearrange the data in tmp_array so that elements that are processed sequentially are also sequential in the array, you should get much better performance (since each access to the array is loading a full cache line, which is 64 bytes on most processors).
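A minimal C sketch of that rearrangement, assuming (as in the question) that the hot loop reads tmp_array[a*b] for every a and b, and assuming a 32-bit unsigned int c_array to leave headroom for the sums. The names linearize_strided and accumulate_rows are hypothetical. The strided gather is paid once; every repetition of the hot loop afterwards walks both arrays sequentially.

```c
#include <assert.h>

/* Hypothetical helper: copy values[a*b] into out[a*limB + b], so that for a
   fixed a the inner loop over b reads consecutive memory instead of jumping
   by a elements each step. values must hold at least (limA-1)*(limB-1)+1
   elements; out must hold limA*limB elements. */
void linearize_strided(const unsigned short *values, int limA, int limB,
                       unsigned short *out)
{
    assert(limA > 0 && limB > 0);
    for (int a = 0; a < limA; a++) {
        for (int b = 0; b < limB; b++) {
            out[a * limB + b] = values[a * b];
        }
    }
}

/* The hot loop then becomes a sequential walk: index a*limB + b is simply i. */
void accumulate_rows(const unsigned short *linear, int limA, int limB,
                     unsigned int *c_array)
{
    for (int i = 0; i < limA * limB; i++) {
        c_array[i] += linear[i];
    }
}
```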
Getting a benefit out of parallelism is likely to be tricky. You could try replacing the outer loop with:
dispatch_apply(10000, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(size_t i) {
    // former body of the outer loop goes here
});
and the += in the inner loop with OSAtomicAdd, but my suspicion is that your speed is going to be dominated by memory accesses anyway, and adding more processors to the mix will just lead to them stepping on each other's toes (i.e. processor 0 loads c_array[1500] so that it knows what to add tmp to, which actually loads the cache line covering [1500-1531], then processor 1 writes to c_array[1512], invalidating that entire cache line and forcing it to be re-read). Also, I'm pretty sure you'd need to store 32 bit values in c_array to do that, since you'd be using OSAtomicAdd32 (there's no OSAtomicAdd16).
At the very least, if you're going to parallelize, then you need to figure out how to divide the work into non-overlapping chunks of 32 elements of c_array (i.e. 64 bytes), so that you can avoid contention. Dividing up the ranges of the array should also let you avoid needing to use atomic add operations.
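One way to compute such non-overlapping ranges, sketched in C. The helper chunk_range is hypothetical, the threading itself (dispatch_apply, pthreads, or similar) is left out, and the cache-line reasoning assumes c_array starts on a 64-byte boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compute the half-open range [*begin, *end) of element
   indices owned by worker w (0 <= w < workers). Boundaries are multiples of
   `align` elements (32 sixteen-bit values = one 64-byte cache line, assuming
   the array is 64-byte aligned), so two workers never share a line. The
   ranges are disjoint and together cover [0, total); trailing workers may
   receive an empty range. */
void chunk_range(size_t total, size_t workers, size_t align, size_t w,
                 size_t *begin, size_t *end)
{
    assert(workers > 0 && align > 0 && w < workers);
    size_t blocks = (total + align - 1) / align;          /* aligned blocks, rounded up */
    size_t per_worker = (blocks + workers - 1) / workers; /* blocks per worker, rounded up */
    size_t b = w * per_worker * align;
    size_t e = b + per_worker * align;
    *begin = b < total ? b : total;
    *end = e < total ? e : total;
}
```

Each worker would then loop only over its own [begin, end) slice of c_array, which is what lets it skip atomic adds entirely.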
(edit)
Check out an0's answer for some practical suggestions for parallelizing this, rather than this discussion of why the naive parallelization won't work :)
First, follow @Catfish_Man's suggestion, except for the parallelism part.
For the parallelism, here are my ideas:
The outermost loop is meaningless, since every one of its 10000 iterations adds the same values. Just use 10000 * tmp instead of tmp.
Since the segments of the target array written to are strictly disjoint for different values of a, the second-level loop can easily be parallelized. In fact, the same applies to b, but if we also parallelize over b, the work left in each iteration will be too small for splitting the workload to pay off.
Code:
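int limA = 1000;
int limB = 1000;
// Unbox every value once (10^6 message sends instead of 10^10).
unsigned short *tmp_array = calloc(limA * limB, sizeof(unsigned short));
int tmp_idx = 0;
for (NSNumber *num in array) {
    tmp_array[tmp_idx++] = [num unsignedShortValue];
}
// Iteration a writes only to c_array[a*limB] ... c_array[a*limB + limB - 1],
// so concurrent iterations never write the same element.
dispatch_apply(limA, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(size_t a) {
    for (int b = 0; b < limB; b++) {
        // The 10000-iteration outer loop is folded into one multiplication.
        c_array[a*limB+b] += 10000 * tmp_array[a*b];
    }
});
free(tmp_array);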
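Folding the outermost loop into a multiplication is valid because the value added never depends on i. A quick C check of that equivalence on tiny inputs; the function names are hypothetical, and unsigned int accumulators are an assumption (a 16-bit c_array would overflow much sooner).

```c
#include <assert.h>

/* Hypothetical reference version: repeat the two inner loops `reps` times,
   as the question's outermost loop does. */
void accumulate_naive(const unsigned short *values, int limA, int limB,
                      int reps, unsigned int *c_array)
{
    for (int i = 0; i < reps; i++)
        for (int a = 0; a < limA; a++)
            for (int b = 0; b < limB; b++)
                c_array[a * limB + b] += values[a * b];
}

/* Collapsed version: adding the same value reps times equals adding
   reps * value once. */
void accumulate_collapsed(const unsigned short *values, int limA, int limB,
                          int reps, unsigned int *c_array)
{
    assert(reps >= 0);
    for (int a = 0; a < limA; a++)
        for (int b = 0; b < limB; b++)
            c_array[a * limB + b] += (unsigned int)reps * values[a * b];
}
```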
First, follow @Catfish_Man's suggestions. Then follow @an0's suggestions. Then do this as well:
// ...
unsigned short *tmp_array = calloc(limA * limB, sizeof(unsigned short));
// class_getMethodImplementation() comes from <objc/runtime.h>; the returned IMP
// must be cast to the method's real signature before it can be called.
unsigned short (*unsignedShortValueIMP)(id, SEL) = (unsigned short (*)(id, SEL))class_getMethodImplementation([NSNumber class], @selector(unsignedShortValue));
void * (*objectAtIndexIMP)(id, SEL, NSUInteger) = (void * (*)(id, SEL, NSUInteger))class_getMethodImplementation(array.class, @selector(objectAtIndex:));
NSUInteger n = array.count;
for (NSUInteger i = 0; i < n; ++i) {
    void *obj = objectAtIndexIMP(array, @selector(objectAtIndex:), i);
    tmp_array[i] = unsignedShortValueIMP((__bridge id)obj, @selector(unsignedShortValue));
}
// ...
By lifting the IMPs out of Objective-C, you bypass all the overhead of the message dispatch machinery and allow the compiler to "see through" the calls; while these selectors are part of Foundation and can't be inlined, removing the extra levels of indirection improves the holy heck out of the branch prediction and prefetching machinery in the CPU cores. In addition, by using a raw C for loop instead of Objective-C's array enumeration, AND not forcing the opacity of objc_msgSend() on the compiler, you allow Clang's loop unwinding and vectorization optimizers to work.
@Catfish_Man may be able to tell me this is an outmoded optimization no longer worth doing, but as far as I'm aware, it's still a win for massive repetitions of calling the same methods like this.
Final note: My code assumes ARC, so uses void * and a bridge cast instead of id on the objectAtIndex: IMP to bypass the extra implicit retain/release pair. This is evil shadow hackery, disabling ARC for the file in question is a better solution, and I should be ashamed of myself.

Why do I get the error "Array initializer must be an initializer list" when I'm trying to return this array in a function?

I am coming from Java, and I'm very new to Objective-C. Anyway, I have this static method which is designed to make a copy of an array (if there's a better way to accomplish this, please let me know, but I'm asking this question more so to find out why I got this error and how to avoid such an error in the future). I ran into some problems with it, but just when I thought I had them all sorted out, I got the error "Array initializer must be an initializer list".
Here is the method in the interface:
+ (float[]) copyArray: (float[]) array withLength: (int) length;
And here is the method in the implementation:
+ (float[]) copyArray: (float[]) array withLength: (int) length
{
float copiedArray[length];
for (int i = 0; i < length; i++)
{
copiedArray[i] = array[i];
}
return copiedArray;
}
If all you really want is to copy the first n elements from one C array into another already existing array, probably the best way is to simply use memcpy:
memcpy(targetArray, sourceArray, sizeof(sourceArray[0]) * numElements);
The sizeof(sourceArray[0]) calculates the byte size of the type in your array (in your case, it's equivalent to sizeof(float)).
A method or function cannot return a C array. You should do this instead:
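Combined with malloc, memcpy also gives a copy that outlives the function, which is what the question's copyArray was trying to return. A minimal C sketch; DuplicateFloats is a hypothetical name, and the caller owns (and must free()) the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated copy of the first
   numElements floats of sourceArray, or NULL if allocation fails.
   The caller must free() the returned pointer. */
float *DuplicateFloats(const float *sourceArray, size_t numElements)
{
    assert(sourceArray != NULL);
    float *targetArray = malloc(sizeof(sourceArray[0]) * numElements);
    if (targetArray == NULL) {
        return NULL;
    }
    memcpy(targetArray, sourceArray, sizeof(sourceArray[0]) * numElements);
    return targetArray;
}
```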
+ (void) copyArrayFrom:(float *)array to:(float *)toArray withLength: (unsigned) length
{
    for (unsigned i = 0; i < length; i++)
    {
        toArray[i] = array[i];
    }
}
C arrays are way more tricky than Java arrays. One of the biggest issues is that in a lot of instances, you don't know how large a C array is unless you have saved this information in a different variable, for example. The C FAQ "Arrays and Pointers" lists a lot of traps and they apply to Objective-C as well. You might want to see question 6.5 in particular.
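A small C illustration of that trap (the macro and function names are hypothetical): sizeof only reports the full array size in the scope that declares the array. Once the array is passed to a function it decays to a pointer, so the length has to travel as a separate parameter, as the answers here do with withLength:.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: only correct on a true array in its declaring scope,
   never on a pointer or a function parameter. */
#define ELEMENT_COUNT(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Inside the callee, values is just a pointer, so the caller must say how
   many elements it points to. */
float sum_floats(const float *values, size_t count)
{
    assert(values != NULL || count == 0);
    float total = 0.0f;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```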
As @lwxted already suggested, try to avoid C arrays unless you really know what you're doing and you have determined that you need them. C arrays are faster than NSArray, but unless you have determined, by measuring with a profiler, that your array really is a performance bottleneck, you will most likely not notice any difference.
And I strongly recommend avoiding a C array of Objective-C objects (id objects[]) unless you really, really know very well what you are doing (memory management issues).
In Objective-C, unless you have particular needs, a better way to handle this is usually to use NSArray rather than C arrays.
[NSArray arrayWithArray: array];
will copy an array.
Besides, in this case, if you insist on using C arrays, avoid unsized array types such as float[] for parameters and return values; use pointers to manipulate the arrays instead.
Also, the stack-allocated array is invalid after the method returns, since it is local to the scope of copyArray. If you want the array to remain valid outside that scope, allocate its memory dynamically.
While I agree with all the points @DarkDust makes, if you're working with a C API such as OpenGL, there may be situations where using NSArray and NSNumber vs. C arrays of type float will have performance impacts. As always, try to use the simpler approach first, and carefully measure performance before deciding to optimize.
In any case, to answer the original question, here's how to correctly return a copy of a C array:
+ (float *)copyOfCArray:(float *)array withLength:(int)length
{
    // Heap-allocated, so the copy outlives this method; the caller must free() it.
    float *copyOfArray = malloc(length * sizeof(float));
    for (int i = 0; i < length; i++) {
        copyOfArray[i] = array[i];
    }
    return copyOfArray;
}
Also, there's arguably no need to make the above a method at all. Instead, consider writing it as a C function:
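float *CopyArray(float *array, int length)
{
    // Same implementation as the method above; the caller must free() the result.
    float *copyOfArray = malloc(length * sizeof(float));
    for (int i = 0; i < length; i++) copyOfArray[i] = array[i];
    return copyOfArray;
}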

Performance for reads of NSDictionary vs. NSArray

Continuing off this post: Performance hit incurred using NSMutableDictionary vs. NSMutableArray
I am trying to run a little test to see if the performance gap is that great for reads and writes between NSArray & NSDictionary as well as their mutable counterparts...
However, I am having difficulty finding a "balanced" test... because the dictionary has 2 (or 3, depending on how you see this) objects to loop through to get the value (not the key) sought, while the array has only one...
Any suggestions?
--If you want more details:
What I mean is easier to explain through examples:
For the array:
for (NSString *str in array) { /* do something with the string */ }
For the dictionary:
for (NSString *str in [dictionary allValues]) { /* use str */ }
OR
for (NSString *key in [dictionary allKeys]) { [dictionary valueForKey:key]; }
OR
for (NSString *str in [dictionary allKeys]) { /* use str */ }
OR EVEN
NSArray *valuesOrKeys = [dictionary allKeys]; // or [dictionary allValues]
for (NSString *str in valuesOrKeys) { /* use str */ }
What is the "fairest" test to do for the dictionary?
--EDIT (comment)
As you all pointed out (and asked why I would want this), when a dictionary is used, it's because it fits the model better than an array...
Well, the reason I'm asking is that an app I'm building is painfully slow, so I'm trying to figure out whether using a different data type would change any of that, and I am considering using basic C arrays... I still have the choice at this point, so I am able to change the inner workings to fit whatever type I want...
I'd like to point you at the following article: "Array", by ridiculous_fish, an engineer at Apple. Cocoa arrays are not necessarily well-implemented naïve arrays as you might expect, nor are dictionaries simple hash tables. Their performance is very circumstantial, and depends on the number of objects they hold (as well as their values, etc.). This might not directly affect the answer, but it's something to consider (NSDictionary performance will, of course, vary with the speed and reliability of your hashing function, and so on).
Additionally, if you're looking for a 'balanced' test, you'd have to find a way for both classes to behave as close to each other as possible. You want to rule out accessing values via keys in the dictionary, because that is slower than simply pulling objects from an array (regardless of how fast seek times are for the underlying data structures maintained by NSDictionary): you're performing more operations to do it. Access into an array is O(1); for a hash table it is O(1) at best and O(n) at worst, with typical behavior somewhere in between depending on the implementation.
There are several ways to enumerate both dictionaries and arrays, as you mentioned above. You're going to want to use the methods that are closest to each other in terms of implementation, those being either block-based enumeration (enumerateObjectsUsingBlock: for NSArray and enumerateKeysAndObjectsUsingBlock: for NSDictionary), or fast enumeration (using either allKeys or allValues for the NSDictionary). Because the performance of these algorithms is mainly empirical, I performed several tests to note access times (each with 10000 NSNumber objects):
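To make the "more operations" point concrete, here is a deliberately tiny linear-probing hash table in C. It is a hypothetical illustration only, not how NSDictionary is implemented, and all names and the fixed capacity are assumptions. A lookup must hash the key, probe slots and compare keys, where an array read is a single index; with a hash function that sends every key to the same slot, lookups degrade into a linear scan, the O(n) worst case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 8 /* hypothetical fixed capacity, a power of two */

typedef struct {
    bool used[TABLE_SIZE];
    unsigned keys[TABLE_SIZE];
    int values[TABLE_SIZE];
} HashTable;

typedef size_t (*HashFn)(unsigned key);

/* Multiplicative hash that spreads keys; a constant hash forces collisions. */
size_t good_hash(unsigned key) { return key * 2654435761u; }
size_t bad_hash(unsigned key) { (void)key; return 0; }

/* Insert or overwrite using linear probing; false if the table is full. */
bool table_put(HashTable *t, HashFn h, unsigned key, int value)
{
    size_t start = h(key) & (TABLE_SIZE - 1);
    for (size_t n = 0; n < TABLE_SIZE; n++) {
        size_t slot = (start + n) & (TABLE_SIZE - 1);
        if (!t->used[slot] || t->keys[slot] == key) {
            t->used[slot] = true;
            t->keys[slot] = key;
            t->values[slot] = value;
            return true;
        }
    }
    return false;
}

/* Look up key; *probes reports how many slots were examined. */
bool table_get(const HashTable *t, HashFn h, unsigned key,
               int *value, size_t *probes)
{
    size_t start = h(key) & (TABLE_SIZE - 1);
    for (size_t n = 0; n < TABLE_SIZE; n++) {
        size_t slot = (start + n) & (TABLE_SIZE - 1);
        *probes = n + 1;
        if (!t->used[slot]) return false;
        if (t->keys[slot] == key) {
            *value = t->values[slot];
            return true;
        }
    }
    return false;
}
```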
NSArray, Block Enumeration:
1. 10.5s
2. 9.1s
3. 10.0s
4. 9.8s
5. 9.9s
-----
9.9s Avg
NSArray, Fast Enumeration:
1. 9.7s
2. 9.5s
3. 9.3s
4. 9.1s
5. 10.5s
-----
9.6s Avg
NSDictionary, Block Enumeration
1. 10.5s
2. 10.6s
3. 9.9s
4. 11.1s
5. 11.0s
-----
10.6s Avg
NSDictionary, allKeys -> Fast Enumeration
1. 10.0s
2. 11.2s
3. 10.2s
4. 10.8s
5. 10.8s
-----
10.6s Avg
NSDictionary, allValues -> Fast Enumeration
1. 10.7s
2. 10.3s
3. 10.5s
4. 10.5s
5. 9.7s
-----
10.3s Avg
As you can see from the results of this contrived test, NSDictionary is clearly slower than NSArray (around 7% slower using block enumeration, and 7–10% slower with fast enumeration). However, this comparison is rather pointless, seeing as using the fastest enumeration for NSDictionary simply devolves it into an array anyway.
So the big question is, why would you consider using a dictionary? Arrays and hash tables aren't exactly interchangeable; what kind of model do you have that allows drop-in replacement of NSArray with NSDictionary? Regardless of the times given by contrived examples to prove performance benefits one way or another, you should always implement your models in a way that makes sense; you can optimize later for performance if you have to. I don't see how you would use these data structures interchangeably, but anyway, NSArray is the winner here, especially considering the sequential order in which you're attempting to access values.
Here's your "balanced" test using block enumeration:
[arr enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
// do something with objects
}];
[dict enumerateKeysAndObjectsUsingBlock:^(id key, id obj, BOOL *stop) {
// do something with objects
}];
I am trying to run a little test to see if the performance gap is that great for reads and writes between NSArray & NSDictionary as well as their mutable counterparts...
Why? If it's just to satisfy your curiosity, that's one thing. But usually if you need a dictionary, an array really won't do, and vice versa. So it doesn't matter which one is faster at a given operation; it's not as if one is a good alternative to the other.
However, I am having difficulty finding a "balanced" test... because the dictionary has 2 (or 3, depending on how you see this) objects to loop through to get the value (not the key) sought, while the array has only one...
You're making some assumptions here that aren't likely to be valid. There's probably not a lot of looping involved to access elements of either kind of container.

Should reversing an NSMutableArray be avoided when possible?

Assume I have NSNumbers 1 - 450. I can choose to add them to an NSMutableArray either starting with 1 and ending with 450, or starting with 450 and ending with 1. My code would be a little simpler if I could start with 1 and end with 450, but when the time finally comes for me to enumerate over the array, I will ultimately need to reverse it. In other words, I need the first object in the enumeration to be 450, and the last one to be 1.
Since my code would be simpler if I do it this way (add starting with 1, then reverse it prior to enumeration), it's my preferred method. However, the Apple documentation for - (NSEnumerator *)reverseObjectEnumerator says:
It is more efficient to use the fast enumeration protocol (see NSFastEnumeration). Fast enumeration is available on Mac OS X v10.5 and later and iOS 2.0 and later.
So should I avoid the array reversal and simply write slightly more complicated code so that the NSMutableArray gets created in the desired order in the first place?
You can use fast enumeration with an NSEnumerator instance; the documentation about fast enumeration even uses the reverseObjectEnumerator method as an example:
NSEnumerator *enumerator = [array reverseObjectEnumerator];
for (NSString *element in enumerator) {
//...
}
Besides, your question sounds a lot like premature optimization...
No, you do not need to write the more complicated code to put it in the right order. The reverseObjectEnumerator will work fine; it is only marginally slower. If performance is a big concern, either of the snippets below will work well (the faster of the two being the while loop):
// Assuming 'array' is your NSMutableArray
NSUInteger i = [array count];
while (i--) {
    NSNumber *current = [array objectAtIndex:i];
    // Mess around with current
}
That will start at index 449 (the number 450) and end at index 0 (the number 1). You can also do this with a for loop, though you need to make sure that you either start with index 449, or you do something like
for (NSUInteger i = [array count]; i > 0; i--) {
    NSNumber *current = [array objectAtIndex:i-1];
    // Mess around with current
}
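The while (i--) idiom also works with an unsigned index type such as NSUInteger, because the test sees the old value of i before the decrement: the body runs with count-1 down to 0, and the loop ends before a wrapped-around index is ever used. A plain C sketch with size_t; copy_reversed is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: visit values from last to first and write them to
   out in that order, so out[0] receives the last element. */
void copy_reversed(const int *values, size_t count, int *out)
{
    assert(count == 0 || (values != NULL && out != NULL));
    size_t i = count;
    size_t next = 0;
    while (i--) { /* tests i != 0, then decrements */
        out[next++] = values[i];
    }
}
```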