objective-c complexity reference - objective-c

For the C++ STL, there is a de facto standard location (besides the de jure standard, I mean) to find information about the complexity guarantees of standard container operations.
Is there an analogous, web-accessible document listing complexity guarantees for NSArray, NSDictionary, etc.?
For example, I cannot find a reference that gives the complexity of [NSArray count].

Correct. There isn't one. C++ / the STL (based on my limited understanding) have a significant performance focus. Objective-C / Foundation basically don't.
NSArray, NSDictionary and friends are interfaces. They tell you how to use them, not how they behave. This gives them the freedom to switch implementation under the hood for performance reasons. The point is, you don't need to care, and this won't be specified in the API so you can't even if you want to ;)
For a really good read on this subject, one that highlights implementation switches and gives a rough comparison between Foundation classes and STL / C data structures, check out the Ridiculous Fish blog post (by someone on the Apple AppKit team) about "Our arrays, aren't".

Is there an analogous, web-accessible document listing complexity
guarantees for NSArray, NSDictionary, etc.?
No. If you understand what the different containers do, you'll have a pretty good idea of how they behave (e.g. dictionary == map -> nearly constant-time lookups). But don't assume that you know exactly how these structures behave, because they may change their behavior based on circumstances. In other words, a class like NSArray may not be (certainly isn't) implemented as an actual array in the sense of a C-style array even though it has that same "ordered sequence of elements" behavior.
You can, of course, analyze the complexity of your own code: your own binary search through an NSArray is always going to take O(log n) operations any way you slice it. Just don't assume that inserting an element into an NSMutableArray is going to require moving all the subsequent elements, because your "array" might really be a linked list or something else.
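As a minimal sketch of the binary search mentioned above: rather than a hand-written loop, this uses Foundation's own -indexOfObject:inSortedRange:options:usingComparator:. The helper name FindInSorted is made up for illustration. The search makes a logarithmic number of comparisons; what each element access costs underneath is, as noted, not documented.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): binary search in an array
// that is already sorted in ascending order.
static NSUInteger FindInSorted(NSArray<NSNumber *> *sorted, NSNumber *target) {
    return [sorted indexOfObject:target
                   inSortedRange:NSMakeRange(0, sorted.count)
                         options:NSBinarySearchingFirstEqual
                 usingComparator:^NSComparisonResult(NSNumber *a, NSNumber *b) {
        // Must be the same ordering the array was sorted with.
        return [a compare:b];
    }];
}

int main(void) {
    @autoreleasepool {
        NSArray<NSNumber *> *sorted = @[@1, @3, @5, @7, @9, @11];
        NSUInteger index = FindInSorted(sorted, @7);
        if (index == NSNotFound) {
            NSLog(@"not found");
        } else {
            NSLog(@"found at index %lu", (unsigned long)index);
        }
    }
    return 0;
}
```

If the array is not sorted by the same comparator, the result is undefined, so the sort order is the caller's responsibility.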

Related

How can I understand the performance tradeoffs in Cocoa library classes?

Today I was asked how long an NSMutableDictionary insertion takes, were that dictionary to contain 1,000,000 elements. Not coming from a computer science background, I had absolutely no idea. I was surprised to learn that it completes in (what I now understand to be called) O(n) time. Great. Wonderful.
How could someone know that, definitively?
Obviously, one could just write dozens and dozens of tests against every single Cocoa class and chart out all the time data. I'll be sure to get around to that when I have a few weeks of free time. Barring all of that...
Is this just super obvious to someone with a computer science background?
Does Apple publish documentation that explains this?
Does his knowledge imply that he, being a computer science expert, did his own testing to discover this?
What you are asking about is called the "complexity" of an algorithm. It is language independent; NSDictionary time complexity is no different than that of any other hash-based associative container, such as C++ std::unordered_map. (std::map is a different case: it is tree-based, with logarithmic operations.) However, that doesn't mean that an NSDictionary filled with some objects is guaranteed to perform insert, search, or delete operations as quickly as a std::unordered_map; all it means is that the time it takes to do those operations is linear (O(n)), in the worst case, in relation to the number of elements (the n part of O(n)). Dictionary insertion could be O(1), which is constant time (the operation takes the same amount of time independent of the number of elements in the dictionary), if there were no hash collisions.
The "algorithm" employed by an NSDictionary is called a hash table. A hash table does insertion by hashing the key (normally a constant-time operation), then resolving collisions, which takes O(n) time in the worst case. Hopefully you can see that, in the worst case, every key collides with every other key, and each insert operation is O(n).
Hash tables and hashing algorithms can of course be specialized to reduce collisions within a specific set of data, but NSDictionary just uses the -hash method of your key objects, which you can override in your NSObject subclasses if you need some sort of specialization (you probably don't).
Since it is a general-purpose dictionary and not specialized for a specific set of data, we don't necessarily need to know the implementation details of NSDictionary (Apple's documentation for NSDictionary doesn't mention specifics) to know that these operations will be O(n) in the worst case. Neither do we have to run "tests" to discover the complexity.
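For anyone who still wants an empirical sanity check, here is a rough sketch of the kind of measurement the question describes. AverageInsertSeconds is a hypothetical helper, and the sizes and probe count are arbitrary assumptions. With well-distributed keys like these NSNumbers, the per-insert time should stay roughly flat as the dictionary grows, which is the typical case rather than the worst case discussed above; actual numbers depend on the machine and OS version.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: average seconds per insertion into a dictionary
// that already holds `existingCount` entries.
static double AverageInsertSeconds(NSUInteger existingCount, NSUInteger probes) {
    NSMutableDictionary<NSNumber *, NSNumber *> *dict =
        [NSMutableDictionary dictionaryWithCapacity:existingCount + probes];
    for (NSUInteger i = 0; i < existingCount; i++) {
        dict[@(i)] = @(i);
    }

    // systemUptime is monotonic, so the difference cannot go negative.
    NSTimeInterval start = [NSProcessInfo processInfo].systemUptime;
    for (NSUInteger i = 0; i < probes; i++) {
        dict[@(existingCount + i)] = @(i);
    }
    NSTimeInterval elapsed = [NSProcessInfo processInfo].systemUptime - start;
    return elapsed / (double)probes;
}

int main(void) {
    @autoreleasepool {
        NSUInteger sizes[] = {1000, 100000, 1000000};
        for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
            double seconds = AverageInsertSeconds(sizes[i], 10000);
            NSLog(@"%lu existing entries: %.1f ns per insert",
                  (unsigned long)sizes[i], seconds * 1e9);
        }
    }
    return 0;
}
```

This is a crude timing loop, not a benchmark harness: it ignores warm-up, allocation noise, and the cost of boxing the keys.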

What is the computational complexity of NSDictionary's -allKeys method?

What is the computational complexity of NSDictionary's -allKeys method?
I would assume it to be O(1), as NSDictionary probably keeps the keys internally somewhere, but you never know :). I tried to look in the documentation (NSDictionary and the Collections guide) and couldn't find the answer.
Maybe there is some cheat sheet with computational complexities of Cocoa collections' methods?
EDIT:
As was pointed out by Gwendal Roué my question is a bit vague. So I should probably rephrase it like this:
Did someone make measurements of computational complexity of NSDictionary class for some given set of methods and objects and in particular for -allKeys?
EDIT2:
As was pointed out by Chris Devereux these measurements will be implementation dependent but I think it would be nice if someone could share some tables/measurements just to have some approximate numbers.
Go read this wonderful article that explains how NSArray changes its underlying implementation depending on the number of elements it contains: http://ridiculousfish.com/blog/posts/array.html
After that, you'll understand why Apple doesn't document the complexity of its collection classes, including NSDictionary.

differences between NSArray and CCArray

What are the differences between NSArray and CCArray? Also, in what cases will one be preferred to the other with respect to game programming?
CCArray emulates NSMutableArray. It is a wrapper around a C array (memory buffer). It was developed and is used internally by cocos2d because NSMutableArray was deemed too slow. However the performance improvement is minimal. Any use cases (features) of CCArray that cocos2d itself doesn't use remain a potential source of issues, including weird and hard to debug issues or terrible performance characteristics.
The most important performance-critical aspect is reading the array sequentially. In my latest tests that's an area where CCArray (no longer?) excels. Specifically, with fast enumeration NSMutableArray is around 33 times faster!
CCArray is a perfect example why one should not reinvent the wheel, specifically when it comes to storage classes when there is already a stable, proven, and fast solution available (NSMutableArray). Any speed advantage it may have once had is long gone. What remains is a runtime behavior you will not want to deal with, including some extremely bad performance characteristics (insertion, fast enumeration).
Long story short: do not use CCArray in your own code! Treat CCArray like an internal, private class not to be used in user code (except where unavoidable, i.e. the children array).
NSMutableArray is THE array reference implementation everyone should be using because it's extremely well tested, documented, and stable (both in terms of runtime behavior and speed).
Check it....
http://www.learn-cocos2d.com/2010/09/array-performance-comparison-carray-ccarray-nsarray-nsmutablearray/
Hope this helps.
Enjoy Programming
CCArray
http://www.cocos2d-x.org/embedded/cocos2d-x/d9/d2e/classcocos2d_1_1_c_c_array.html
In cocos2d-x, CCArray is mutable, i.e. you can add elements to it. To create a CCArray instance without a capacity, you can use the CCArray::array() constructor. CCMutableArray is a template-based container that can store objects of the same type. CCArray stores objects as CCObject instances, so you have to cast them after getting them from a CCArray instance.
The NSArray class contains a number of methods specifically designed to ease the creation and manipulation of arrays within Objective-C programs.

NSSet implementation

This question is just out of curiosity but, how is NSSet implemented? What data structure is behind it and what are the access times for adding and removing elements? If I had to guess, I'd say it was some sort of hashtable/dictionary data structure, but in that case why differentiate between NSSet and NSMutableSet?
Well, as Bavarious pointed out in a comment, Apple's actual CoreFoundation source is open and available for your perusal too. NSSet is implemented on top of CFSet, whose code is generated (as is that of CFDictionary) from a hash table template, using CFBasicHash to do the work.
The difference between mutability and immutability seems to be a matter of a flag in the structure (line 91 of CFBasicHash.h), and from my reading so far it just affects calls to functions such as CFBasicHashAddValue; there's a simple check for the mutability. It seems likely, however, that Cobbal is right about the copy/retain behavior between the two (I just haven't read that far yet).
PREVIOUSLY:
I find it interesting and educational occasionally to peruse the GNUstep sources when I'm wondering about implementation details. They are, of course, not at all guaranteed to be implemented the way that Apple did it, but they can be helpful in some cases. Their version of Foundation: http://gnu.ethz.ch/debian/gnustep/gnustep-base-1.20.0/Headers/Foundation/ (I hope that's the most recent version. If not, someone please correct me.)
To answer the second half of your question: one benefit of having a non-mutable version is that it allows for a very fast copy method that simply calls retain.
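To illustrate the "copy is just retain" point, here is a minimal sketch using a made-up immutable class (MYImmutableBag is hypothetical, not a Foundation type), assuming ARC. It shows the general technique only; whether a particular Apple class does this for a given instance is an implementation detail.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical immutable container, for illustration only.
@interface MYImmutableBag : NSObject <NSCopying>
- (instancetype)initWithItems:(NSArray *)items;
@property (nonatomic, readonly) NSUInteger count;
@end

@implementation MYImmutableBag {
    NSArray *_items;
}

- (instancetype)initWithItems:(NSArray *)items {
    if ((self = [super init])) {
        _items = [items copy];  // snapshot, so later changes by the caller don't leak in
    }
    return self;
}

- (NSUInteger)count {
    return _items.count;
}

// Nothing can change after init, so sharing the instance is safe.
// Under ARC, returning self from a copy method hands back a retained reference.
- (id)copyWithZone:(NSZone *)zone {
    return self;
}
@end

int main(void) {
    @autoreleasepool {
        MYImmutableBag *bag = [[MYImmutableBag alloc] initWithItems:@[@"a", @"b"]];
        MYImmutableBag *copy = [bag copy];
        NSLog(@"same instance: %@", copy == bag ? @"YES" : @"NO");  // prints YES
    }
    return 0;
}
```

A mutable counterpart could not take this shortcut, because the "copy" would then change whenever the original did.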
I find this link to be an interesting answer to your question. Apple's data structures (NSArray, NSSet, NSDictionary, etc.) are not implemented in a straightforward and "standard way." In most cases, they perform in the same way any other set would perform, but overall, they optimize automatically for the best performance. So, in truth, it's rather difficult to say. While Apple provides documentation on the efficiency of arrays in CFArray.h (equivalent for NSArrays), it offers no such documentation on the efficiency of sets, though you're free to poke around /System/Library/Frameworks/CoreFoundation.framework/Headers/ to look through other data structure implementations.
In addition, there has to be a distinction between a set and its mutable counterpart, just as there is a distinction between NSString and NSMutableString, NSArray and NSMutableArray, and NSDictionary and NSMutableDictionary (among others). For data structures and strings (and few other classes), Apple offers 'readonly' versions of classes to retain generality, along with standard 'mutable' counterparts for manipulation. It's simply Apple's standard practice.

Implementing -hash / -isEqual: / -isEqualTo...: for Objective-C collections

Note: The following SO questions are related, but neither they nor the linked resources seem to fully answer my questions, particularly in relation to implementing equality tests for collections of objects.
Best practices for overriding -isEqual: and -hash
Techniques for implementing -hash on mutable Cocoa objects
Background
NSObject provides default implementations of -hash (which returns the address of the instance, like (NSUInteger)self) and -isEqual: (which returns NO unless the addresses of the receiver and the parameter are identical). These methods are designed to be overridden as necessary, but the documentation makes it clear that you should provide both or neither. Further, if -isEqual: returns YES for two objects, then the result of -hash for those objects must be the same. If not, problems can ensue when objects that should be the same — such as two string instances for which -compare: returns NSOrderedSame — are added to a Cocoa collection or compared directly.
Context
I develop CHDataStructures.framework, an open-source library of Objective-C data structures. I have implemented a number of collections, and am currently refining and enhancing their functionality. One of the features I want to add is the ability to compare collections for equality with another.
Rather than comparing only memory addresses, these comparisons should consider the objects present in the two collections (including ordering, if applicable). This approach has quite a precedent in Cocoa, and generally uses a separate method, including the following:
-[NSArray isEqualToArray:]
-[NSDate isEqualToDate:]
-[NSDictionary isEqualToDictionary:]
-[NSNumber isEqualToNumber:]
-[NSSet isEqualToSet:]
-[NSString isEqualToString:]
-[NSValue isEqualToValue:]
I want to make my custom collections robust to tests of equality, so they may safely (and predictably) be added to other collections, and allow others (like an NSSet) to determine whether two collections are equal/equivalent/duplicates.
Problems
An -isEqualTo...: method works great on its own, but classes which define these methods usually also override -isEqual: to invoke [self isEqualTo...:] if the parameter is of the same class (or perhaps subclass) as the receiver, or [super isEqual:] otherwise. This means the class must also define -hash such that it will return the same value for disparate instances that have the same contents.
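A minimal sketch of the pattern just described, using a made-up collection class (MYQueue and -isEqualToQueue: are hypothetical names, not part of CHDataStructures or Foundation) and assuming ARC. The -hash shown is deliberately the simplest one that agrees with -isEqual:.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical ordered collection, for illustration only.
@interface MYQueue : NSObject
- (void)addObject:(id)object;
- (BOOL)isEqualToQueue:(MYQueue *)other;
@end

@implementation MYQueue {
    NSMutableArray *_items;
}

- (instancetype)init {
    if ((self = [super init])) {
        _items = [NSMutableArray array];
    }
    return self;
}

- (void)addObject:(id)object {
    [_items addObject:object];
}

// Typed comparison: same elements in the same order.
- (BOOL)isEqualToQueue:(MYQueue *)other {
    if (other == nil) return NO;
    return [_items isEqualToArray:other->_items];
}

// Generic entry point: forward to the typed method when the class matches.
- (BOOL)isEqual:(id)object {
    if (object == self) return YES;
    if ([object isKindOfClass:[MYQueue class]]) {
        return [self isEqualToQueue:object];
    }
    return [super isEqual:object];
}

// Must agree with -isEqual:. Equal queues have equal counts, so this is valid.
- (NSUInteger)hash {
    return _items.count;
}
@end

int main(void) {
    @autoreleasepool {
        MYQueue *a = [MYQueue new];
        MYQueue *b = [MYQueue new];
        for (NSString *s in @[@"x", @"y"]) {
            [a addObject:s];
            [b addObject:s];
        }
        NSSet *set = [NSSet setWithObject:a];
        NSLog(@"equal: %d, set contains b: %d", [a isEqual:b], [set containsObject:b]);
    }
    return 0;
}
```

Because the two distinct instances hash alike and compare equal, the set treats b as already present.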
In addition, Apple's documentation for -hash stipulates the following: (emphasis mine)
"If a mutable object is added to a collection that uses hash values to determine the object's position in the collection, the value returned by the hash method of the object must not change while the object is in the collection. Therefore, either the hash method must not rely on any of the object's internal state information or you must make sure the object's internal state information does not change while the object is in the collection. Thus, for example, a mutable dictionary can be put in a hash table but you must not change it while it is in there. (Note that it can be difficult to know whether or not a given object is in a collection.)"
Edit: I definitely understand why this is necessary and totally agree with the reasoning — I mentioned it here to provide additional context, and skirted the topic of why it's the case for the sake of brevity.
All of my collections are mutable, and the hash will have to consider at least some of the contents, so the only option here is to consider it a programming error to mutate a collection stored in another collection. (My collections all adopt NSCopying, so collections like NSDictionary can successfully make a copy to use as a key, etc.)
It makes sense for me to implement -isEqual: and -hash, since (for example) an indirect user of one of my classes may not know the specific -isEqualTo...: method to call, or even care whether two objects are instances of the same class. They should be able to call -isEqual: or -hash on any variable of type id and get the expected result.
Unlike -isEqual: (which has access to two instances being compared), -hash must return a result "blindly", with access only to the data within a particular instance. Since it can't know what the hash is being used for, the result must be consistent for all possible instances that should be considered equal/identical, and must always agree with -isEqual:. (Edit: This has been debunked by the answers below, and it certainly makes life easier.) Further, writing good hash functions is non-trivial — guaranteeing uniqueness is a challenge, especially when you only have an NSUInteger (32/64 bits) in which to represent it.
Questions
Are there best practices when implementing equality comparisons and -hash for collections?
Are there any peculiarities to plan for in Objective-C and Cocoa-esque collections?
Are there any good approaches for unit testing -hash with a reasonable degree of confidence?
Any suggestions on implementing -hash to agree with -isEqual: for collections containing elements of arbitrary types? What pitfalls should I know about? (Edit: Not as problematic as I first thought. As @kperryua points out, "equal -hash values do not imply -isEqual:".)
Edit: I should have clarified that I'm not confused about how to implement -isEqual: or -isEqualTo...: for collections, that's straightforward. I think my confusion stemmed mainly from (mistakenly) thinking that -hash MUST return a different value if -isEqual: returns NO. Having done cryptography in the past, I was thinking that hashes for different values MUST be different. However, the answers below made me realize that a "good" hash function is really about minimizing bucket collisions and chaining for collections that use -hash. While unique hashes are preferable, they are not a strict requirement.
I think trying to come up with some generally useful hash function that will generate unique hash values for collections is an exercise in futility. U62's suggestion of combining the hashes of all the contents will not scale well, as it makes the hash function O(n). Hash functions should really be O(1) to ensure good performance, otherwise the purpose of the hash is defeated. (Consider the common Cocoa construct of plists, which are dictionaries containing arrays and other dictionaries, potentially ad nauseam. Attempting to take the hash of the top-level dictionary of a large plist would be excruciatingly slow if the collections' hash functions were O(n).)
My suggestion would be not to worry a great deal about a collection's hash. As you stated, -isEqual: implies equal -hash values. On the other hand, equal -hash values do not imply -isEqual:. That fact gives you a lot of leeway to create a simple hash.
If you're really worried about collisions though (and you have proof in concrete measurements of real-world situations that confirm it is something to be worried about), you could still follow U62's advice to some degree. For example, you could take the hash of, say, the first and/or last element in the collection, and combine that with, say, the -count of the collection. That should be enough to provide a decent hash.
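One way that suggestion could look, as a sketch only: CheapCollectionHash is a hypothetical helper, and the multiplier 31 is an arbitrary choice for mixing, not anything Foundation prescribes. It touches at most two elements, so it stays O(1) regardless of collection size.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: O(1) hash built from the count and the two endpoints.
// Equal ordered collections have equal counts and equal first/last elements,
// so equal collections always produce equal hashes.
static NSUInteger CheapCollectionHash(NSArray *items) {
    NSUInteger h = items.count;
    // For an empty array firstObject/lastObject are nil, and [nil hash] is 0.
    h = h * 31u + [items.firstObject hash];
    h = h * 31u + [items.lastObject hash];
    return h;
}

int main(void) {
    @autoreleasepool {
        NSArray *a = @[@"alpha", @"beta", @"gamma"];
        NSMutableArray *b = [a mutableCopy];  // distinct instance, same contents
        NSLog(@"hashes match: %d", CheapCollectionHash(a) == CheapCollectionHash(b));
    }
    return 0;
}
```

Collections that differ only in their middle elements will collide, which is allowed: the cost is extra -isEqual: calls, not incorrect behavior.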
I hope that answers at least one of your questions.
As for No. 1: Implementing -isEqual: is pretty cut and dry. You enumerate the contents, and check isEqual: on each of the elements.
There is one thing to be careful of that may affect what you decide to do for your collections' -hash functions. Clients of your collections must also understand the rules governing -isEqual: and -hash. If you use the contents' -hash in your collection's -hash, your collection will break if the contents' -isEqual: and -hash don't agree. It's the client's fault, of course, but that's another argument against basing your -hash on the collection's contents.
No. 2 is kind of vague. Not sure what you have in mind there.
Two collections should be considered equal if they contain the same elements, and further if the collections are ordered, that the elements are in the same order.
On the subject of hashes for collections, it should be enough to combine the hashes of the elements in some way (XOR them, or add them with wraparound). Note that while the rules state that two objects that are equal according to -isEqual: need to return the same hash, the converse does not hold: although uniqueness of hashes is desirable, it is not necessary for correctness of the solution. Thus the hash of an ordered collection need not take account of the order of the elements.
The excerpt from the Apple documentation is a necessary restriction, by the way. An object could not maintain the same hash value under mutation while also ensuring that objects with the same value have the same hash. That applies to the simplest of objects as well as to collections. Of course, it usually only matters that an object's hash changes when it is inside a container that uses the hash to organise its elements. The upshot of all this is that mutable collections shouldn't mutate when placed inside another container, but then neither should any object that has a true hash function.
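A sketch of the XOR variant, with a hypothetical helper name (XORCollectionHash). Because XOR is commutative, the result ignores element order, which is the property described above. Note that it visits every element, so it is O(n), and that two equal elements cancel each other out.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: order-insensitive hash made by XOR-ing element hashes.
static NSUInteger XORCollectionHash(NSArray *items) {
    NSUInteger h = 0;
    for (id item in items) {
        h ^= [item hash];
    }
    return h;
}

int main(void) {
    @autoreleasepool {
        NSArray *forward = @[@"a", @"b", @"c"];
        NSArray *backward = @[@"c", @"b", @"a"];
        // Same elements in a different order give the same hash...
        NSLog(@"hashes match: %d",
              XORCollectionHash(forward) == XORCollectionHash(backward));
        // ...while -isEqual: still tells the two ordered collections apart.
        NSLog(@"arrays equal: %d", [forward isEqual:backward]);
    }
    return 0;
}
```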
I have done some investigation into the default NSArray and NSMutableArray hash implementation and (unless I have misunderstood something) it seems like Apple does not follow its own rules:
If a mutable object is added to a collection that uses hash values to
determine the object's position in the collection, the value returned
by the hash method of the object must not change while the object is
in the collection. Therefore, either the hash method must not rely on
any of the object's internal state information or you must make sure
the object's internal state information does not change while the
object is in the collection. Thus, for example, a mutable dictionary
can be put in a hash table but you must not change it while it is in
there. (Note that it can be difficult to know whether or not a given
object is in a collection.)
Here is my test code
// Put a mutable array inside a container, then mutate it in place.
NSMutableArray *myMutableArray = [NSMutableArray arrayWithObjects:@"a", @"b", @"c", nil];
NSMutableArray *containerForMutableArray = [NSMutableArray arrayWithObject:myMutableArray];
NSUInteger hashBeforeMutation = [[containerForMutableArray objectAtIndex:0] hash];
[[containerForMutableArray objectAtIndex:0] removeObjectAtIndex:1];
NSUInteger hashAfterMutation = [[containerForMutableArray objectAtIndex:0] hash];
// NSUInteger is 64 bits wide on 64-bit platforms, so cast and print with %lu.
NSLog(@"Hash Before: %lu", (unsigned long)hashBeforeMutation);
NSLog(@"Hash After : %lu", (unsigned long)hashAfterMutation);
The output is:
Hash Before: 3
Hash After : 2
So it seems like the default implementation of the -hash method on both NSArray and NSMutableArray is the count of the array, and it doesn't care whether it's inside a collection or not.