How can NSArray be this slow? - objective-c

I'm coming from a C++/STL world and I wanted to check how Objective-C containers compare to the STL.
I wanted to compare an array of numbers, but the only way to put a number into an NSArray is to wrap it in an NSNumber, which is utterly slow and ate through my RAM, so I guess I'd need to dealloc them manually. Since I don't want to measure such side effects, I just added [NSNull null] to the array instead.
The results of adding 10k elements to an array, 1k times:
NSArray - 0.923411 seconds
vector<int> - 0.129984 seconds
I thought it might be the allocations and deallocations, so I set the number of arrays (imax in the code) to 1 and the number of additions (jmax) to 10000000, but it was even slower:
NSArray - 2.19859 seconds
vector<int> - 0.223471 seconds
Edit:
As mentioned in the comments, the constantly increasing size of the array might be the problem, so I created the NSArray using arrayWithCapacity: (and the vector with reserve as well), and it was even slower than before(!) (imax = 1, jmax = 10000000).
NSArray - 2.55942
vector<int> - 0.19139
End edit
Why is this so slow?
My code for reference:
#import <Foundation/Foundation.h>
#include <vector>
#include <iostream>
#include <time.h>
using namespace std;
int main (int argc, const char * argv[])
{
int imax = 1000;
int jmax = 10000;
NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
cout << "Vector insertions" << endl;
clock_t start = clock();
for(int i = 0; i < imax; i++)
{
vector<int> *v = new vector<int>();
for(int j = 0; j < jmax; j++)
{
v->push_back(j);
}
delete v;
}
double interval = (clock() - start) / (double)CLOCKS_PER_SEC;
cout << interval << " seconds" << endl;
cout << "NSArray insertions" << endl;
start = clock();
for(int i = 0; i < imax; i++)
{
NSMutableArray *v = [[NSMutableArray alloc] init];
for(int j = 0; j < jmax; j++)
{
[v addObject:[NSNull null]];
}
[v dealloc];
}
interval = (clock() - start) / (double)CLOCKS_PER_SEC;
cout << interval << " seconds" << endl;
[pool drain];
return 0;
}

@JeremyP provides an excellent link and information. Always read the fish. Here's some breakdown of what's eating the time, though, and what you might do about it.
First, there's the many calls to objc_msgSend() for dynamic dispatch. These can be avoided and you'll save some of the time (though not as much as you'd think. objc_msgSend() is crazy optimized). But you'll knock maybe 5% off by skipping it:
// class_getMethodImplementation() requires #import <objc/runtime.h>
IMP addObject = class_getMethodImplementation([NSMutableArray class], @selector(addObject:));
NSNull *null = [NSNull null];
start = clock();
for(int i = 0; i < imax; i++)
{
NSMutableArray *v = [[NSMutableArray alloc] init];
for(int j = 0; j < jmax; j++)
{
((void (*)(id, SEL, id))addObject)(v, @selector(addObject:), null);
}
[v release];
}
A lot of time is eaten up by retain/release. You can avoid that (and store real numbers rather than NSNumber) by using a non-retaining CFMutableArray. This will get the append times to about 2x of vector:
CFArrayCallBacks cb = {0};
for(int i = 0; i < imax; i++)
{
CFMutableArrayRef v = CFArrayCreateMutable(NULL, 0, &cb);
for(int j = 0; j < jmax; j++)
{
CFArrayAppendValue(v, &j);
}
CFRelease(v);
}
The biggest cost of this one is the calls to memmove() (or the collectable version of it on the Mac).
Man, NSMutableArray sure is slow. How could Apple be so stupid, right? I mean, really... wait... I wonder if there's something NSMutableArray does better than vector?
Try swapping out these lines for their obvious counterparts:
v->insert(v->begin(), j);
NSNumber *num = [[NSNumber alloc] initWithInt:j];
[v insertObject:num atIndex:0];
[num release];
(Yes, including creating and releasing the NSNumber, not just using NSNull.)
Oh, and you might try this one too to see just how fast NSMutableArray and CFMutableArray really can be:
CFArrayInsertValueAtIndex(v, 0, &j);
In my tests I get:
Vector insertions
7.83188 seconds
NSArray insertions
2.66572 seconds
Non-retaining
0.310126 seconds

The short answer: Yes, NSArray really is quite a bit slower than C++'s STL collection classes. This has much to do with compile time vs. runtime behaviors, optimization opportunities on the part of the compiler, and numerous implementation details.
(And, as Rob points out, NSMutableArray is optimized for random insertion and performs better than C++ for that...)
The real answer:
Micro-benchmarks are useless for optimizing user facing applications.
Using a micro-benchmark to make implementation decisions is the very definition of premature optimization.
You would be hard pressed to find an Objective-C app targeted to iOS or Mac OS X where CPU profiling would show any significant time spent in code paths related to NSArray, yet the vast majority of those apps use the NS* collection classes pretty much exclusively.
Certainly, there are cases where the performance of the NS* collections isn't viable and, for those, you turn to C++/STL.
None of this is to imply that your question is invalid. Without more context, it is difficult to say if the observed performance difference really matters (however, in my experience, just about every time a developer has asked a question based on a micro-benchmark, it has been misguided).
Oh -- and read this as it gives a bit of insight into the implementation of *Array.

It's a fully fledged Objective-C object, which means there is overhead each time you add an object due to Cocoa's message lookup, which is necessary to implement dynamic binding properly.
There's also the point that NSArrays are not necessarily structured internally as a contiguous block of pointers. For very large arrays, NSArray performs much better than a C++ vector for some operations (i.e. it has much better big-O time complexity for insertion at the front, for example). Have a read of the definitive ridiculousfish blog post on the topic.

At least some of the time is consumed in repeatedly increasing the capacity of the NSArray. It should be faster to initialize the NSArray to the right (or at least a better) capacity initially with:
[NSMutableArray arrayWithCapacity:10000];

#include <stdio.h>
#include <time.h>
int main (int argc, char **argv)
{
int imax = 1000;
int jmax = 10000;
clock_t start = clock();
for(int i = 0; i < imax; i++)
{
int array[jmax];
for(int j = 0; j < jmax; j++)
array[j] = 0;
}
double interval = (clock() - start) / (double)CLOCKS_PER_SEC;
printf("%f\n", interval);
return 0;
}
Output on my 2GHz Core2Duo iMac (compiled with LLVM):
0.000003

Related

Calculating value of K without messages

Question:
Find the value of K in myInterViewArray without any messages/calls
I was given this hint:
The numbers in the array will always be in the range 1-9.
NSArray *myInterViewArray = @[@2, @1, @3, @9, @9, @8, @7];
Example:
If you pass 3, you should return the sum of the 3 biggest values in myInterViewArray. So in the example below, K = 9 + 9 + 8.
--
I was asked this question a while back in an interview and was completely stumped. The first solution that I could think of looked something like this:
Interview Test Array:
[self findingK:myInterViewArray abc:3];
-(int)findingK:(NSArray *)myArray abc:(int)k{ // With Reverse Object Enumerator
myArray = [[[myArray sortedArrayUsingSelector:@selector(compare:)] reverseObjectEnumerator] allObjects];
int tempA = 0;
for (int i = 0; i < k; i++) {
tempA += [[myArray objectAtIndex:i] intValue];
}
k = tempA;
return k;
}
But apparently that was a big no-no. They wanted me to find the value of K without using any messages. That means that I was unable to use sortedArrayUsingSelector and even reverseObjectEnumerator.
Now to the point!
I've been thinking about this for quite a while and I still can't think of an approach without messages. Does anyone have any ideas?
There is only one way to do that, and that is to bridge the array to a CF type and then use plain C, e.g.:
NSArray *array = @[@1, @2, @3];
CFArrayRef cfArray = (__bridge CFArrayRef)(array);
NSLog(@"%@", CFArrayGetValueAtIndex(cfArray, 0));
However, if the value is an NSNumber, you will still need messages to access its numeric value.
Most likely the authors of the question didn't have a very good knowledge of the concept of messages. Maybe they thought that subscripting and property access were not messages or something else.
Using objects in Obj-C without messages is impossible. Every property access, every method call, every method initialization is done using messages.
Rereading the question, they probably wanted you to implement the algorithm without using library functions, e.g. sort (e.g. you could implement a K-heap and use that heap to find the K highest numbers in a for iteration).
I assume what is meant is that you can't mutate the original array. Otherwise, that restriction doesn't make sense.
Here's something that might work:
NSMutableArray *a = [NSMutableArray array]; // kept sorted ascending; holds at most the k largest seen so far
for (NSNumber *num in array) {
NSUInteger i = 0;
while (i < a.count && [a[i] intValue] < [num intValue]) {
i++;
}
[a insertObject:num atIndex:i];
if (a.count > (NSUInteger)k) {
[a removeObjectAtIndex:0]; // drop the smallest
}
}
int result = 0;
for (NSUInteger i = 0; i < a.count; i++) {
result += [a[i] intValue];
}
return result;

Wildly varying hashing performance with CFSet and CFDictionary on OS X

When using CFSet and CFDictionary configured with custom callbacks to use integers as their keys, I've noticed some wildly varying performance of their internal hashing implementation. I'm using 64 bit integers (int64_t) with a range of roughly 1 - 1,000,000.
While profiling my application, I noticed that every so often a certain combination of factors would produce unusually poor performance. Looking at Instruments, CFBasicHash was taking much longer than usual.
After a bunch of investigating, I finally narrowed things down to a set of 400,000 integers that, when added to a CFSet or CFDictionary, cause terrible hashing performance.
The hashing implementation in CFBasicHash.m is beyond my understanding for a problem like this, so I was wondering if anyone had any idea why such a seemingly random set of integers could cause such dreadful performance.
The following test application will output an average iteration time of 37ms for adding sequential integers to a set, but an average run time of 3622ms when adding the same number of integers but from the problematic data set.
(And if you insert the same number of completely random integers, then performance is much closer to 37ms. Likewise, adding these problematic integers to a std::map or std::set produces acceptable performance.)
#import <Foundation/Foundation.h>
extern uint64_t dispatch_benchmark(size_t count, void (^block)(void));
int main(int argc, char *argv[]) {
@autoreleasepool {
NSString *data = [NSString stringWithContentsOfFile:@"Integers.txt" encoding:NSUTF8StringEncoding error:NULL];
NSArray *components = [data componentsSeparatedByString:#","];
NSInteger count = components.count;
int64_t *numbers = (int64_t *)malloc(sizeof(int64_t) * count);
int64_t *sequentialNumbers = (int64_t *)malloc(sizeof(int64_t) * count);
for (NSInteger c = 0; c < count; c++) {
numbers[c] = [components[c] integerValue];
sequentialNumbers[c] = c;
}
NSLog(@"Beginning test with %@ numbers...", @(count));
// Test #1 - Loading sequential integers
uint64_t t1 = dispatch_benchmark(10, ^{
CFMutableSetRef mutableSetRef = CFSetCreateMutable(NULL, 0, NULL);
for (NSInteger c = 0; c < count; c++) {
CFSetAddValue(mutableSetRef, (const void *)sequentialNumbers[c]);
}
NSLog(@"Sequential iteration completed with %@ items in set.", @(CFSetGetCount(mutableSetRef)));
CFRelease(mutableSetRef);
});
NSLog(@"Sequential Numbers Average Runtime: %llu ms", t1 / NSEC_PER_MSEC);
NSLog(@"-----");
// Test #2 - Loading data set
uint64_t t2 = dispatch_benchmark(10, ^{
CFMutableSetRef mutableSetRef = CFSetCreateMutable(NULL, 0, NULL);
for (NSInteger c = 0; c < count; c++) {
CFSetAddValue(mutableSetRef, (const void *)numbers[c]);
}
NSLog(@"Dataset iteration completed with %@ items in set.", @(CFSetGetCount(mutableSetRef)));
CFRelease(mutableSetRef);
});
NSLog(@"Dataset Average Runtime: %llu ms", t2 / NSEC_PER_MSEC);
free(sequentialNumbers);
free(numbers);
}
}
Example output:
Sequential Numbers Average Runtime: 37 ms
Dataset Average Runtime: 3622 ms
The integers are available here:
Gist (Integers.txt) or Dropbox (Integers.txt)
Can anyone help explain what is "special" about the given integers that might cause such a degradation in the hashing implementation used by CFSet and CFDictionary?

Objective-C blocks

Trying to understand how blocks work in Objective-C. I came across the following while reading Apple's docs (link).
Here is an example of how we should not use blocks:
void dontDoThis() {
void (^blockArray[3])(void); // an array of 3 block references
for (int i = 0; i < 3; ++i) {
blockArray[i] = ^{ printf("hello, %d\n", i); };
// WRONG: The block literal scope is the "for" loop.
}
}
But how could we get 3 different blocks that print "hello, 0", "hello, 1" and "hello, 2"? I tried many different ways but every time I got "hello, 2" three times.
A block starts out life on the stack and, thus, a block's lifespan is only as long as the scope it is declared in.
The body of a for() loop -- the body of the loop in the {}s -- is a scope in and of itself. Thus, your code is putting a reference to something on the stack [the block] into a variable in the surrounding scope [the language array].
You need to copy the block to the heap to have it survive:
void dontDoThis() {
void (^blockArray[3])(void); // an array of 3 block references
for (int i = 0; i < 3; ++i) {
blockArray[i] = [^{ printf("hello, %d\n", i); } copy];
}
}
If not using ARC, you would also need to -release the copied blocks at some point.
You might find this weblog post handy (I wrote it shortly after Blocks were made public). This one goes into a few tips, tricks, and gotchas.
Wait -- yeah -- you're correct. There is magic going on in the ARC compiler that is causing the blocks to seemingly be on the heap magically. However, I can't find anything in the LLVM documentation that explicitly documents this behavior. If you turn off ARC, you'll see the output be something like 2,2,2 instead of 0,1,2.
This is somewhat new behavior. I wouldn't rely on this behavior until someone can find the explicit note in the compiler that defines exactly how this is supported.
@autoreleasepool {
void (^blockArray[3])(void); // an array of 3 block references
for (int i = 0; i < 3; ++i) {
void (^block)(void) = ^{ printf("hello, %d\n", i); };
NSLog(@"%p", block);
blockArray[i] = block;
NSLog(@"%p", blockArray[i]);
}
for (int i = 0; i < 3; ++i) blockArray[i]();
}
Outputs:
2012-12-24 16:15:36.752 jkdfjkfdjkdfjk[70708:303] 0x7fff5fbff838
2012-12-24 16:15:36.755 jkdfjkfdjkdfjk[70708:303] 0x100108160
2012-12-24 16:15:36.758 jkdfjkfdjkdfjk[70708:303] 0x7fff5fbff838
2012-12-24 16:15:36.759 jkdfjkfdjkdfjk[70708:303] 0x100108000
2012-12-24 16:15:36.760 jkdfjkfdjkdfjk[70708:303] 0x7fff5fbff838
2012-12-24 16:15:36.760 jkdfjkfdjkdfjk[70708:303] 0x100102e70
hello, 0
hello, 1
hello, 2
Thus, the block is created on the stack and copied to the heap automatically on the assignment outside of the scope of the for() loop.
A similar test also reveals that the block will be copied when passed as an argument to NSArray's addObject:.
If you really wanted to get this to work you could use an NSMutableArray instead of a C array.
NSMutableArray *blocks = [NSMutableArray array];
for (int i = 0; i < 3; i++) {
blocks[i] = ^{ printf("hello %d\n", i); };
}
By adding them to an NSMutableArray they will be copied off of the stack and onto the heap allowing them to outlive the scope of the for loop.
As @bbum points out, the above does not work; I took the idea that blocks just work with ARC too far.
You would need to actively copy the blocks for them to work... so the following should work
NSMutableArray *blocks = [NSMutableArray array];
for (int i = 0; i < 3; i++) {
blocks[i] = [^{ printf("hello %d\n", i); } copy];
}

is it less efficient to reference an instance variable array element multiple times than a local variable?

If I do:
foos[i] = [[Foo alloc] init];
foos[i].prop = @"bar";
[foos[i] baz];
... Is that less efficient than:
Foo *foo = [[Foo alloc] init];
foo.prop = @"bar";
[foo baz];
foos[i] = foo;
or are they equivalent?
They're not equivalent, but they are sufficiently close that an optimizing compiler might generate exactly the same binary code.
Even if it doesn't, you'll struggle to measure the difference (unless foos is a C++ class with an extremely expensive operator[]). Until a profiler says otherwise — optimizing this code is premature.
If your array is a simple C array (Foo *array[11];), then it will not have a significant performance impact.
If your array is an NSMutableArray (or another subscriptable NS type), then each subscript has to go through message dispatch (heavily optimized, but still not free), so repeated element access will introduce some overhead, although some would consider worrying about it a micro-optimization. In this case, the compiler cannot know what the subscript call returns, so it cannot omit the repeated calls.
Here are basic wall clock time results:
MRC:
NSArray: 27 seconds
C Array: 18 seconds
ARC:
NSArray: 31 seconds
C Array: 18 seconds
and the program (to which you can make the obvious ARC changes in order to test ARC):
const int NIter = 10000;
const int UseNSArray = 1; // set to 0 to time the plain C array version instead
__attribute__((noinline)) void fn1() {
@autoreleasepool {
NSMutableArray * foos = [NSMutableArray array];
for (size_t idx = 0; idx < NIter; ++idx) {
NSMutableString * str = [NSMutableString new];
foos[0] = str;
[foos[0] length];
[foos removeAllObjects];
[str release];
}
}
}
__attribute__((noinline)) void fn2() {
@autoreleasepool {
NSMutableString * foos[1];
for (size_t idx = 0; idx < NIter; ++idx) {
foos[0] = [NSMutableString new];
[foos[0] length];
[foos[0] release];
foos[0] = 0;
}
}
}
int main() {
for (size_t idx = 0; idx < NIter; ++idx) {
if (UseNSArray) {
fn1();
}
else {
fn2();
}
}
return 0;
}
Certainly, unless the compiler optimizes it away. Whether it is significantly less efficient is another matter, which depends on what else your code is doing. Worrying about micro-optimizations like this is generally futile unless you already have evidence of an efficiency problem in this vicinity.

Objective C - Matrix Multiplication Slow Performance

I have two 2-D NSMutableArrays and I am trying to do some basic matrix multiplication. I have my generic formula code below, but its performance is exceptionally slow (as expected). I have done lots of googling and have not found any easy-to-understand way to speed this up. Can anyone point me in the direction of a straightforward formula/tutorial/example of how to get better performance than this naive O(n^3) matrix multiplication in Objective-C?
+ (NSMutableArray*)multiply:(NSMutableArray*)a1 withArray:(NSMutableArray*)a2
{
if([[a1 objectAtIndex: 0] count] != [a2 count])
{
NSLog(@"Multiplication error!");
return NULL;
}
int a1_rowNum = [a1 count];
int a2_rowNum = [a2 count];
int a2_colNum = [[a2 objectAtIndex:0] count];
NSMutableArray *result = [NSMutableArray arrayWithCapacity:a1_rowNum];
for (int i = 0; i < a1_rowNum; i++) {
NSMutableArray *tempRow = [NSMutableArray arrayWithCapacity:a2_colNum];
for (int j = 0; j < a2_colNum; j++) {
double tempTotal = 0;
for (int k = 0; k < a2_rowNum; k++) {
double temp1 = [[[a1 objectAtIndex:i] objectAtIndex:k] doubleValue];
double temp2 = [[[a2 objectAtIndex:k] objectAtIndex:j] doubleValue];
tempTotal += temp1 * temp2;
}
//Stored as a string because I upload it to an online database for storage.
[tempRow addObject:[NSString stringWithFormat:@"%f", tempTotal]];
}
[result addObject:tempRow];
}
return result;
}
It will be much faster if you write it in C.
A double[] will be ridiculously fast compared to an NSArray of NSNumbers for this task: you'll have good cache coherency, minimal instructions, no need to go through the runtime or allocate in order to read or write an element, and no reference-count cycling on each element.
You should have a look at Apple's Accelerate framework (iOS 4.0 onwards).
You can do a lot of complex math and matrix manipulation with it, and the framework is optimized to run on any iOS hardware.
Checkout:
https://developer.apple.com/performance/accelerateframework.html