How can I understand the performance tradeoffs in Cocoa library classes? - objective-c

Today I was asked how long an NSMutableDictionary insertion takes, were that dictionary to contain 1,000,000 elements. Not coming from a computer science background, I had absolutely no idea. I was surprised to learn that it completes in (what I now understand to be called) O(n) time. Great. Wonderful.
How could someone know that, definitively?
Obviously, one could just write dozens and dozens of tests against every single Cocoa class and chart out all the time data. I'll be sure to get around to that when I have a few weeks of free time. Barring all of that...
Is this just super obvious to someone with a computer science background?
Does Apple publish documentation that explains this?
Does his knowledge imply that he, being a computer science expert, did his own testing to discover this?

What you are asking about is called the "complexity" of an algorithm. It is language independent; NSDictionary's time complexity is no different from that of any other hash-based associative container, such as C++'s std::unordered_map. (std::map, by contrast, is an ordered container with logarithmic-time operations.) However, that doesn't mean that an NSDictionary filled with some objects is guaranteed to perform insert, search, or delete operations as quickly as its C++ counterpart; all it means is that the time those operations take is linear (O(n)) in the worst case, relative to the number of elements (the n in O(n)). Dictionary insertion would be O(1), which is constant time (the operation takes the same amount of time regardless of the number of elements in the dictionary), if there were no hash collisions.
The "algorithm" employed by an NSDictionary is called a hash table. A hash table inserts by hashing the key (a constant-time operation) and then resolving any collisions, which is an O(n) operation in the worst case. Hopefully you can see that if all of your insert operations collide, every one of them costs O(n).
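To make the collision argument concrete, here is a minimal sketch of a chained hash table in plain C (which also compiles inside an Objective-C source file). It is not how NSDictionary is implemented, since Apple does not document that; the names Table, table_insert, spread_hash, constant_hash, and total_insert_steps are all made up for illustration. The insert routine counts how many existing entries it has to walk past, so you can compare a hash that spreads keys out with one that sends every key to the same bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BUCKET_COUNT 64   /* fixed table size, for illustration only */

typedef struct Node {
    unsigned key;
    struct Node *next;
} Node;

typedef struct {
    Node *buckets[BUCKET_COUNT];
    unsigned (*hash)(unsigned key);   /* pluggable hash function */
} Table;

/* A reasonable hash: multiplying by an odd constant spreads small keys
   across all buckets. */
unsigned spread_hash(unsigned key) { return key * 2654435761u; }

/* A degenerate hash: every key lands in bucket 0, so every insert collides. */
unsigned constant_hash(unsigned key) { (void)key; return 0u; }

/* Inserts key if it is absent. Returns the number of existing nodes that
   had to be compared while resolving collisions in the key's bucket. */
size_t table_insert(Table *t, unsigned key)
{
    assert(t != NULL && t->hash != NULL);
    size_t steps = 0;
    Node **slot = &t->buckets[t->hash(key) % BUCKET_COUNT];
    for (; *slot != NULL; slot = &(*slot)->next) {
        steps++;
        if ((*slot)->key == key)
            return steps;              /* already present */
    }
    Node *node = malloc(sizeof *node);
    assert(node != NULL);
    node->key = key;
    node->next = NULL;
    *slot = node;                      /* append at the end of the chain */
    return steps;
}

/* Inserts keys 0..count-1 and returns the total number of comparisons. */
size_t total_insert_steps(unsigned (*hash)(unsigned), unsigned count)
{
    Table t = { {NULL}, hash };
    size_t total = 0;
    for (unsigned k = 0; k < count; k++)
        total += table_insert(&t, k);
    for (size_t b = 0; b < BUCKET_COUNT; b++) {   /* release all nodes */
        Node *node = t.buckets[b];
        while (node != NULL) {
            Node *next = node->next;
            free(node);
            node = next;
        }
    }
    return total;
}
```

Calling total_insert_steps(constant_hash, n) makes the total work grow roughly with n squared, because each insert walks the whole chain, while spread_hash keeps each insert close to constant as long as the table is not overfull. A real table would also resize as it fills, which this sketch leaves out.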
Tables/hashing algorithms can of course be specialized to reduce collisions within a specific set of data, but NSDictionary just uses the hash method of your key objects, which you can override in your NSObject subclasses if you need some sort of specialization (probably not).
Since it is a general-purpose dictionary and not specialized for a specific set of data, we don't necessarily need to know the implementation details of NSDictionary (Apple's documentation for NSDictionary doesn't mention specifics) to know that these operations will be O(n) in the worst case. Neither do we have to run "tests" to discover the complexity.

Related

When to use mutable objects?

There are tons of articles and blog posts on the internet claiming that mutable objects are bad, that we shouldn't use them, and that we should therefore make all our objects immutable.
I have nothing against this, except that the topic has gone so far that some people might be "tricked" into thinking that mutable objects should never be used at all.
When do we have to resort to using mutable objects? What are the common kinds of problems that are unsolvable without using mutable state?
As to your fear, it's common. Every concept gets taken by some as to mean that nothing else shall ever be done, for any reason.
These are the people who try to make requirements fit their ideology, rather than the other way around (a.k.a. they're not pragmatic).
When to use mutables? Basically when you feel like it, when you think it makes sense.
A prime example is low-memory, high-performance situations, where creating a new instance that is identical to the old one except for one small detail is too expensive in memory, CPU cycles, or both.
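As a rough illustration of that cost difference, here is a sketch in plain C. The function names with_value_at and set_value_at are hypothetical. The first mimics an immutable update by returning a fresh copy with one element changed; the second changes the element in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* "Immutable" style: leaves source untouched and returns a new array with
   one element replaced. Costs O(count) time and memory per update.
   The caller owns the result and must free() it. Returns NULL on failure. */
int *with_value_at(const int *source, size_t count, size_t index, int value)
{
    assert(source != NULL && index < count);
    int *copy = malloc(count * sizeof *copy);
    if (copy == NULL)
        return NULL;
    memcpy(copy, source, count * sizeof *copy);
    copy[index] = value;
    return copy;
}

/* Mutable style: overwrites one element in place. O(1), no allocation. */
void set_value_at(int *target, size_t count, size_t index, int value)
{
    assert(target != NULL && index < count);
    target[index] = value;
}
```

With a handful of elements the difference is irrelevant; with millions of elements updated in a tight loop, the copying version allocates and copies the whole buffer on every change.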

differences between NSArray and CCArray

What are the differences between NSArray and CCArray? Also, in what cases will one be preferred to the other with respect to game programming?
CCArray emulates NSMutableArray. It is a wrapper around a C array (memory buffer). It was developed and is used internally by cocos2d because NSMutableArray was deemed too slow. However the performance improvement is minimal. Any use cases (features) of CCArray that cocos2d itself doesn't use remain a potential source of issues, including weird and hard to debug issues or terrible performance characteristics.
The most important performance-critical aspect is reading the array sequentially. In my latest tests that's an area where CCArray (no longer?) excels. Specifically, with fast enumeration NSMutableArray is around 33 times faster!
CCArray is a perfect example of why one should not reinvent the wheel, specifically when it comes to storage classes for which there is already a stable, proven, and fast solution available (NSMutableArray). Any speed advantage it may have once had is long gone. What remains is runtime behavior you will not want to deal with, including some extremely bad performance characteristics (insertion, fast enumeration).
Long story short: do not use CCArray in your own code! Treat CCArray like an internal, private class not to be used in user code (except where unavoidable, i.e. the children array).
NSMutableArray is THE array reference implementation everyone should be using because it's extremely well tested, documented, and stable (both in terms of runtime behavior and speed).
See this array performance comparison:
http://www.learn-cocos2d.com/2010/09/array-performance-comparison-carray-ccarray-nsarray-nsmutablearray/
CCArray
http://www.cocos2d-x.org/embedded/cocos2d-x/d9/d2e/classcocos2d_1_1_c_c_array.html
In cocos2d-x, CCArray is mutable, i.e. you can add elements to it. To create a CCArray instance without specifying a capacity, you can use the CCArray::array() constructor. CCMutableArray is a template-based container that can store objects of the same type. CCArray stores objects as CCObject instances, so you have to cast them after getting them from a CCArray instance.
The NSArray class contains a number of methods specifically designed to ease the creation and manipulation of arrays within Objective-C programs.

objective-c complexity reference

For the C++ STL, there is a de facto standard location (besides the de jure standard, I mean) to find information about the complexity guarantees of standard container operations.
Is there an analogous, web-accessible document listing complexity guarantees for NSArray, NSDictionary, etc.?
For example, I cannot find a reference that gives the complexity of [NSArray count].
Correct. There isn't one. C++ / the STL (based on my limited understanding) have a significant performance focus. Objective-C / Foundation basically don't.
NSArray, NSDictionary and friends are interfaces. They tell you how to use them, not how they behave. This gives them the freedom to switch implementation under the hood for performance reasons. The point is, you don't need to care, and this won't be specified in the API so you can't even if you want to ;)
For a really good read on this subject, highlighting implementation switches, and with a rough comparison between Foundation classes and STL / C data structures, check out the Ridiculous Fish (by someone on the Apple AppKit team) blog post about "Our arrays, aren't"
Is there an analogous, web-accessible document listing complexity guarantees for NSArray, NSDictionary, etc.?
No. If you understand what the different containers do, you'll have a pretty good idea of how they behave (e.g. dictionary == map -> nearly constant-time lookups). But don't assume that you know exactly how these structures behave, because they may change their behavior based on circumstances. In other words, a class like NSArray may not be (certainly isn't) implemented as an actual array in the sense of a C-style array even though it has that same "ordered sequence of elements" behavior.
You can, of course, analyze the complexity of your own code: your own binary search through an NSArray is always going to take O(log n) operations any way you slice it. Just don't assume that inserting an element into an NSMutableArray is going to require moving all the subsequent elements, because your "array" might really be a linked list or something else.
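For the part you can analyze yourself, here is a minimal binary search sketch in plain C over a sorted int array. The function name binary_search and its comparisons out-parameter are illustrative, not from any library. Counting the loop iterations shows the O(log n) bound directly; the same loop structure would apply when indexing into a sorted NSArray.

```c
#include <assert.h>
#include <stddef.h>

/* Searches sorted[0..count-1] (ascending order) for target.
   Returns the index of a match, or -1 if target is absent.
   If comparisons is non-NULL, it receives the number of loop iterations,
   which never exceeds floor(log2(count)) + 1. */
long binary_search(const int *sorted, size_t count, int target,
                   size_t *comparisons)
{
    assert(sorted != NULL || count == 0);
    size_t lo = 0, hi = count;          /* half-open range [lo, hi) */
    size_t steps = 0;
    long found = -1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */
        steps++;
        if (sorted[mid] == target) {
            found = (long)mid;
            break;
        }
        if (sorted[mid] < target)
            lo = mid + 1;               /* discard the lower half */
        else
            hi = mid;                   /* discard the upper half */
    }
    if (comparisons != NULL)
        *comparisons = steps;
    return found;
}
```

The iteration count depends only on how your code halves the range, not on how the container stores its elements, provided that indexed access itself is cheap.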

Which type of array to use for large amounts of numbers?

I need to store large amounts of unsigned chars and/or ints (potentially 100,000,000 and up) in an array. Mathematical operations will frequently be performed on the numbers in this array, so the array will be modified often, and the length of the array can potentially change often as well.
I can use C or Objective-C (or both). Performance wise, would it be better to use a plain C array and realloc it as necessary, or just go for an NSMutableArray? Or does anyone have any better ideas?
Please note that performance is my main concern, I am willing to write extensive reallocation code if necessary.
Also: Memory usage is a consideration, but not a concern (as long as it doesn't end up using multiple gigabytes).
Using an NSMutableArray means you have the overhead of two Objective-C message sends every time you want to get or set the value of an array element. (One message to get the element as an object, and a second to get its value as a primitive int.) A message send is much slower than a direct array access.
You could use a CFMutableArray instead of an NSMutableArray, and specify callbacks that let you store bare numbers instead of objects. But you would still need to use a function call to get or set each array value.
If you need peak performance, just use a plain C array, or a std::vector if you want to use Objective-C++.
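A plain C array that grows with realloc might look like the sketch below. The ByteBuffer type and its functions are hypothetical names, and doubling the capacity is one common growth policy rather than the only option. The realloc_calls field exists purely to show how rarely reallocation happens under that policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *items;
    size_t count;           /* elements in use */
    size_t capacity;        /* elements allocated */
    size_t realloc_calls;   /* bookkeeping for illustration only */
} ByteBuffer;

/* Appends one value, doubling the capacity whenever the buffer is full.
   Returns 0 on success, -1 if memory could not be allocated. */
int byte_buffer_append(ByteBuffer *buf, unsigned char value)
{
    assert(buf != NULL);
    if (buf->count == buf->capacity) {
        size_t new_capacity = buf->capacity ? buf->capacity * 2 : 16;
        unsigned char *grown = realloc(buf->items, new_capacity);
        if (grown == NULL)
            return -1;      /* the old block is still valid and unchanged */
        buf->items = grown;
        buf->capacity = new_capacity;
        buf->realloc_calls++;
    }
    buf->items[buf->count++] = value;
    return 0;
}

/* Releases the storage and resets the buffer to the empty state. */
void byte_buffer_free(ByteBuffer *buf)
{
    assert(buf != NULL);
    free(buf->items);
    buf->items = NULL;
    buf->count = 0;
    buf->capacity = 0;
}
```

Start from a zero-initialized struct (ByteBuffer buf = {0};). Appending 1,000,000 bytes this way triggers only 17 reallocations, so the amortized cost per append stays constant, and element access remains a direct buf.items[i] with no message send or function call.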
Will your array need to grow much, and if so, by how much?
Using realloc is not very performant.
That's why I would recommend a linked list, such as the GSList you can find in GLib.
The container:
How about C++? Objective-C++ and the STL might be an option; the STL was made by smart people and is actually quite efficient in skilled hands. That said, having potentially 100,000,000 entries or more requires some optimization tricks in any case.
The framework:
You haven't specified the task itself. Could it be suitable to use something like Core Data or maybe SQLite? The math can be done in SQL.
The first one is good if you have some, mmm, data samples - pixels, audio chunks or something like that. The second way is definitely preferred in most other cases.

Cocoa NSArray/NSSet: -makeObjectsPerformSelector: vs. fast enumeration

I want to perform the same action over several objects stored in a NSSet.
My first attempt was using a fast enumeration:
for (id item in mySetOfObjects)
    [item action];
which works pretty fine. Then I thought of:
[mySetOfObjects makeObjectsPerformSelector:@selector(action)];
And now, I don't know what is the best choice. As far as I understand, the two solutions are equivalent. But are there arguments for preferring one solution over the other?
I would argue for using makeObjectsPerformSelector, since it allows the NSSet object to take care of its own indexing, looping and message dispatching. The people who wrote the NSSet code are most likely to know the best way to implement that particular loop.
At worst, they would simply implement the exact same loop, and all you gain is slightly cleaner code (no need for the enclosing loop). At best, they made some internal optimizations and the code will actually run faster.
The topic is briefly mentioned in Apple's Code Speed Performance document, in the section titled "Unrolling Loops".
If you're concerned about performance, the best thing to do is set up a quick program which performs some selector on the objects in a set. Have it run several million times, and time the difference between the two different cases.
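The shape of such a timing program can be sketched in plain C, which you can call from an Objective-C file as is. The names time_repeated, WorkFn, and count_call are hypothetical, and clock() measures CPU time at coarse resolution, so treat the numbers as rough. You would wrap each variant (the fast-enumeration loop and the makeObjectsPerformSelector: call) in its own work function and compare the results.

```c
#include <assert.h>
#include <time.h>

/* A unit of work to be timed; context carries whatever state it needs. */
typedef void (*WorkFn)(void *context);

/* Calls fn(context) the given number of times and returns the elapsed
   CPU time in seconds, as reported by clock(). */
double time_repeated(WorkFn fn, void *context, long iterations)
{
    assert(fn != NULL && iterations >= 0);
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        fn(context);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Example workload: increments a counter so the calls are observable. */
void count_call(void *context)
{
    long *counter = context;
    (*counter)++;
}
```

Run each variant several times and with realistic collection sizes; a single short run is easily dominated by noise, and an optimizing compiler may remove work whose result is never used.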
I too was presented with this question. I find in the Apple docs "Collections Programming Topics" under "Sets: Unordered Collections of Objects" the following:
The NSSet method objectEnumerator lets you traverse elements of the set one by one. And the makeObjectsPerformSelector: and makeObjectsPerformSelector:withObject: methods provide for sending messages to individual objects in the set. In most cases, fast enumeration should be used because it is faster and more flexible than using an NSEnumerator or the makeObjectsPerformSelector: method. For more on enumeration, see “Enumeration: Traversing a Collection’s Elements.”
This leads me to believe that Fast Enumeration is still the most efficient means for this application.
I would not use makeObjectsPerformSelector for the simple reason that it is the kind of call that you don't see all that often. Here is why for example - I need to add debugging code as the array is enumerated, and you really can't do that with makeObjectsPerformSelector unless you change how the code works in Release mode which is a real no no.
for (id item in mySetOfObjects)
{
#if MY_DEBUG_BUILD
    if ([item isAllMessedUp])
        NSLog(@"we found that wily bug that has been haunting us");
#endif
    [item action];
}
--Tom
makeObjectsPerformSelector: might be slightly faster, but I doubt there's going to be any practical difference 99% of the time. It is a bit more concise and readable though, I would use it for that reason.
If pure speed is the only issue (i.e. you're creating some rendering engine where every tiny CPU cycle counts), the fastest possible way to iterate through any of the Foundation collection objects (as of iOS 5.0 ~ 6.0) is the various "enumerateObjectsUsingBlock" methods. I have no idea why this is, but I tested it and this seems to be the case...
I wrote a small test creating collections of hundreds of thousands of objects, each of which has a method that sums a simple array of ints. Each of those collections was forced to perform the various types of iteration (for loop, fast enumeration, makeObjectsPerformSelector, and enumerateObjectsUsingBlock) millions of times, and in almost every case the "enumerateObjectsUsingBlock" methods won handily over the course of the tests.
The only time when this wasn't true was when memory began to fill up (when I began to run it with millions of objects), after which it began to lose to "makeObjectsPerformSelector".
I'm sorry I didn't take a snapshot of the code, but it's a very simple test to run, I highly recommend giving it a try and see for yourself. :)