Implementing path compression in a disjoint set data structure? - objective-c

Here is my Objective-C implementation of a disjoint set.
- A non-negative number points to the parent.
- A negative number indicates a root and its children count. (So each element starts disjoint at -1.)
- The index acts as the data I am grouping.
It seems to work OK; I just have a couple of questions.
find: How can I compress the path? Because I am not doing it recursively, do I have to store the path and loop over it again to set the parents after finding the root?
join: I am basing join on children count instead of depth. I guess that is not right. Do I need to do something special during join if the depths are equal?
Thanks.
DisjointSet.h
@interface DisjointSet : NSObject
{
NSMutableArray *_array;
}
- (id)initWithSize:(NSInteger)size;
- (NSInteger)find:(NSInteger)item;
- (void)join:(NSInteger)root1 root2:(NSInteger)root2;
@end
DisjointSet.m
#import "DisjointSet.h"
@implementation DisjointSet
- (id)initWithSize:(NSInteger)size
{
self = [super init];
if (self)
{
_array = [NSMutableArray arrayWithCapacity:size];
for (NSInteger i = 0; i < size; i++)
{
[_array addObject:[NSNumber numberWithInteger:-1]];
}
}
return self;
}
- (NSInteger)find:(NSInteger)item
{
while ([[_array objectAtIndex:item] integerValue] >= 0)
{
item = [[_array objectAtIndex:item] integerValue];
}
return item;
}
- (void)join:(NSInteger)root1 root2:(NSInteger)root2
{
if (root1 == root2) return;
NSInteger data1 = [[_array objectAtIndex:root1] integerValue];
NSInteger data2 = [[_array objectAtIndex:root2] integerValue];
if (data2 < data1)
{
[_array setObject:[NSNumber numberWithInteger:data2 + data1] atIndexedSubscript:root2];
[_array setObject:[NSNumber numberWithInteger:root2] atIndexedSubscript:root1];
}
else
{
[_array setObject:[NSNumber numberWithInteger:data1 + data2] atIndexedSubscript:root1];
[_array setObject:[NSNumber numberWithInteger:root1] atIndexedSubscript:root2];
}
}
@end

For the find operation, there is no need to store the path (separately from your _array) or to use recursion. Either of those approaches requires O(P) storage (P = path length). Instead, you can just traverse the path twice. The first time, you find the root. The second time, you set all of the children to point to the root. This takes O(P) time and O(1) storage.
- (NSInteger)findItem:(NSInteger)item {
NSInteger root;
NSNumber *rootObject = nil;
for (NSInteger i = item; !rootObject; ) {
NSInteger parent = [_array[i] integerValue];
if (parent < 0) {
root = i;
rootObject = @(i);
}
i = parent;
}
for (NSInteger i = item; i != root; ) {
NSInteger parent = [_array[i] integerValue];
_array[i] = rootObject;
i = parent;
}
return root;
}
For the merge operation, you want to store each root's rank (which is an upper bound on its depth), not each root's descendant count. Storing each root's rank allows you to merge the shorter tree into the taller tree, which guarantees O(log N) time for find operations. The rank only increases when the trees to be merged have equal rank.
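- (void)joinItem:(NSInteger)a item:(NSInteger)b {
if (a == b) return; // a and b must be distinct roots, otherwise a would point to itself
NSInteger aRank = -[_array[a] integerValue];
NSInteger bRank = -[_array[b] integerValue];
if (aRank < bRank) {
// make a the root with the higher rank
NSInteger t = a;
a = b;
b = t;
} else if (aRank == bRank) {
// equal ranks: the merged tree is one level taller
_array[a] = @(-aRank - 1);
}
_array[b] = @(a);
}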

You definitely should implement path compression using recursion. I would not even think about trying to do it non-recursively.
Implementing the disjoint-set data structure should be very easy, and can be done in a few lines. It's very easy to translate it from the pseudocode to any programming language. You can find the pseudocode on Wikipedia. (Unfortunately, I can't read Objective-C, so I cannot really judge whether your code is correct or not.)

Yes. To implement highest ancestor compression without recursion you need to maintain your own list. Make one pass up the chain to get pointers to the sets that need their parent pointers changed and also to learn the root. Then make a second pass to update the necessary parent pointers.
The recursive method is doing the same thing. The first pass is the "winding up" of the recursion, which stores the sets needing parent pointer updates on the program stack. The second pass is in reverse as the recursion unwinds.
I differ with those who say the recursive method is always best. In a reasonable number of systems (especially embedded ones), the program stack is of limited size. There are cases where many unions are performed in a row before a find. In such cases, the parent chain can be O(n) in size for n elements. Here collapsing by recursion can blow out the stack. Since you are working in Objective-C, this may be iOS. I do not know the default stack size there, but if you use recursion it's worth looking at. It might be smaller than you think. This article implies 512 KB for secondary threads and 1 MB for the main thread.
Iterative, constant space alternative
Actually, the main reason I'm writing is to point out that you still get O(log^* n) for n amortized operations (just a shade less efficient than collapsing, and still effectively O(1)) if you only do factor-of-two compression: in the find operation, change parent pointers so that they point to the grandparents instead of the root. This can be done with iteration in constant storage. This lecture at Princeton talks about this algorithm and implements it in a loop with 5 lines of C. See slide 29.
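A minimal sketch of that grandparent-pointing find in plain C (which also compiles in an Objective-C file). It assumes the same encoding as the question, a non-negative entry is a parent index and a negative entry marks a root; the name find_halving and the use of a raw long array instead of NSMutableArray are choices made here for illustration.

```c
/* parent[i] >= 0: index of i's parent; parent[i] < 0: i is a root. */
long find_halving(long *parent, long item)
{
    while (parent[item] >= 0) {
        long p = parent[item];
        if (parent[p] >= 0) {
            /* p is not the root: re-point item at its grandparent */
            parent[item] = parent[p];
        }
        /* step to the (possibly updated) parent */
        item = parent[item];
    }
    return item;
}
```

On a chain 4 -> 3 -> 2 -> 1 -> 0, one call for item 4 leaves 4 pointing at 2 and 2 pointing at 0, so every visited node ends up roughly half as far from the root, with no extra storage.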

Related

Improving this recursive algorithm for efficient path finding

I've been trying with limited success to find a more efficient way of discovering all paths along a tree - from multiple roots to multiple leaves. I had it working somewhat using mutable strings and snipping the heads and tails (like a blockchain) but it's buggy and my gut says there has to be a more efficient route (no pun intended).
All relationships between nodes are first defined as parent-child and are held in a dictionary like so:
Parent1 = [child1, child2, child3]
Parent2 = [child1, child2, child4]
Combined with a list of all nodes, the following sets can be calculated:
Elders (nodes without parents but with children)
Younglings (nodes with parents but without children)
Everyone else (children / parents, any generation)
Free floaters (nodes with neither parents nor children)
My thinking is that the elders should be the starting point (root) and all possible paths to the younglings (leaves) could be calculated using breadth-first search (BFS), but I was under the impression that it only allowed for a single root node / elder. I need an algorithm that can accept multiple elders and multiple younglings. My code so far:
NSMutableArray * ma = [NSMutableArray new];
NSArray * from = [NSArray arrayWithArray:[eldersSet allObjects]];
NSArray * to = [NSArray arrayWithArray:[youngSet allObjects]];
for (int n = 0; n < (int)from.count; n++){
NSString * fromString = from[n];
for (int k = 0; k < (int)to.count; k++){
NSString * toString = to[k];
NSString * toFrom = [NSString stringWithFormat:@"%@ -> %@", fromString, toString];
NSArray * array = [self searchTreeFrom:fromString childID:toString runArray:[NSArray new]];
if (array.count == 0){
continue;
}
[ma addObject:@{@"path":toFrom, @"array":array}];
}
}
NSLog(@"ma is %@", ma);
-(NSArray *)searchTreeFrom:(NSString *)parentID childID:(NSString *)childID runArray:(NSArray *)runArray{
if ([parentID isEqualToString:childID]){
NSMutableArray * ma = [NSMutableArray array];
[ma addObjectsFromArray:runArray];
[ma addObject:childID];
return (NSArray *)ma;
}
NSMutableArray * result = [NSMutableArray new];
if ([localR valueForKey:parentID]){
NSArray * localArray = localR[parentID];
for (int n = 0; n < (int)localArray.count; n++){
result = [NSMutableArray arrayWithArray:[self searchTreeFrom:localArray[n] childID:childID runArray:result]];
if (result.count > 0){
[result insertObject:parentID atIndex:0];
return result;
}
}
}
return result;
}
It's mostly working, even where routes overlap in an (X) shape, with two elders, one shared child node in the middle and two younglings - it still calculates all the paths correctly. However a (<>) diamond shape leads to the issue that the returned path only goes along the top route (the first child node).
A (<>) diamond has two paths, but the issue is that only one path is requested, from single elder to single youngling, whereas in the former (X) four are requested (two elders, two younglings).
I'm wondering if I'm going about this all wrong, as the total number of paths in a (<>) diamond style shape cannot be known in advance, and instead to focus on the recursive function, not returning a value where n == 0, but forcing it through the entire list of children.
Has anyone had any experience with multiple-root -> multiple-leaf path generation? Am I doing something stupid? The goal isn't to find the most efficient path or anything, only to discover all paths. I should mention that the localR variable listed above is the parent=[child,child] dictionary. I'll include output and visual below:
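One way to read "forcing it through the entire list of children" is a depth-first walk that never returns early: it records a path each time it reaches a leaf and then carries on with the next child. The sketch below is plain C rather than the NSDictionary-based code above, and everything in it (Graph, add_edge, all_paths, integer node IDs, the MAX_NODES limit) is hypothetical. It also assumes the relationships contain no cycles.

```c
#include <stdio.h>

enum { MAX_NODES = 16 };

typedef struct {
    int child_count[MAX_NODES];
    int children[MAX_NODES][MAX_NODES];
} Graph;

void add_edge(Graph *g, int parent, int child)
{
    g->children[parent][g->child_count[parent]++] = child;
}

/* Visits every child of every node. Each time a leaf is reached the
   current path is complete; it is printed to `out` (if non-NULL) and
   counted. `path` needs room for MAX_NODES entries. */
int all_paths(const Graph *g, int node, int *path, int depth, FILE *out)
{
    path[depth++] = node;
    if (g->child_count[node] == 0) {
        if (out) {
            for (int i = 0; i < depth; i++)
                fprintf(out, "%s%d", i ? " -> " : "", path[i]);
            fputc('\n', out);
        }
        return 1;
    }
    int total = 0;
    for (int i = 0; i < g->child_count[node]; i++)
        total += all_paths(g, g->children[node][i], path, depth, out);
    return total;
}
```

Calling all_paths once per elder covers the multiple-root case without pairing elders and younglings up front. For a diamond (0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3) a single call from node 0 reports both paths.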

How to implement circular buffer in Objective C for high performance

We want to add an array of doubles to a circular buffer in Objective C many times a second.
We are currently using a NSMutableArray nested within another NSMutableArray (2D array). This works fine but is too slow for our needs.
We want to add to the circular buffer many times a second. When we do this and do performance monitoring we see the call to removeObjectAtIndex:0 become a bottleneck (shift n-1 objects or O(n-1)). This is because we have many thousands of entries in our circular buffer.
We have looked at possibly using STL and std::deque. We have also looked at CHDataStructures. As you know, STL is in C++ and can be integrated but is not as straightforward as an Objective-C solution. CHDataStructures is getting dated and is not ARC compliant.
Please suggest how we should implement a circular buffer (for our array of doubles) for high performance with a code sample if possible.
Having read your comment (and thought about it a bit more) I realised use of a regular NSArray would be better, as there are no memory management issues (NSArrays retain their objects naturally). Just define the capacity up front to avoid it having to reallocate memory as it runs. Calling [self resetBuffer] will quickly release all data and start again.
#define BUFFER_SIZE 1000
@implementation ViewController {
NSMutableArray *circularBuffer;
NSUInteger bufferHead;
}
- (instancetype)initWithCoder:(NSCoder *)aDecoder {
if (self = [super initWithCoder:aDecoder]) {
[self resetBuffer];
}
return self;
}
- (void)addArrayToBuffer:(NSMutableArray *)incoming {
if (bufferHead < circularBuffer.count)
[circularBuffer replaceObjectAtIndex:bufferHead withObject:incoming];
else
[circularBuffer addObject:incoming];
bufferHead = (bufferHead + 1) % BUFFER_SIZE;
}
- (NSArray *)bufferContent {
if (circularBuffer.count < BUFFER_SIZE) {
return circularBuffer;
} else {
NSArray *arrHead = [circularBuffer objectsAtIndexes:[NSIndexSet indexSetWithIndexesInRange:NSMakeRange(0, bufferHead)]];
NSArray *arrTail = [circularBuffer objectsAtIndexes:[NSIndexSet indexSetWithIndexesInRange:NSMakeRange(bufferHead, BUFFER_SIZE-bufferHead)]];
return [arrTail arrayByAddingObjectsFromArray:arrHead];
}
}
- (void)resetBuffer {
circularBuffer = [NSMutableArray arrayWithCapacity:BUFFER_SIZE];
bufferHead = 0;
}
@end
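If boxing every sample in NSNumber/NSArray objects is itself too slow, a plain C ring buffer over one contiguous block of doubles avoids both the shifting and the per-object overhead. This is only a sketch under assumptions not stated in the question: every incoming array has the same length (row_len), and the names RingBuffer, ring_init, ring_push, ring_row and ring_free are made up here.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    double *data;     /* capacity * row_len doubles in one block */
    size_t capacity;  /* maximum number of rows kept */
    size_t row_len;   /* doubles per row */
    size_t head;      /* slot the next row is written to */
    size_t count;     /* rows currently stored (<= capacity) */
} RingBuffer;

int ring_init(RingBuffer *rb, size_t capacity, size_t row_len)
{
    rb->data = malloc(capacity * row_len * sizeof(double));
    if (!rb->data) return -1;
    rb->capacity = capacity;
    rb->row_len = row_len;
    rb->head = 0;
    rb->count = 0;
    return 0;
}

/* O(row_len): copies one row, overwriting the oldest row when full. */
void ring_push(RingBuffer *rb, const double *row)
{
    memcpy(rb->data + rb->head * rb->row_len, row,
           rb->row_len * sizeof(double));
    rb->head = (rb->head + 1) % rb->capacity;
    if (rb->count < rb->capacity) rb->count++;
}

/* Row i in age order, where i == 0 is the oldest stored row. */
const double *ring_row(const RingBuffer *rb, size_t i)
{
    size_t oldest = (rb->head + rb->capacity - rb->count) % rb->capacity;
    return rb->data + ((oldest + i) % rb->capacity) * rb->row_len;
}

void ring_free(RingBuffer *rb)
{
    free(rb->data);
    rb->data = NULL;
}
```

Nothing is ever removed or shifted: a push overwrites the slot at head, so its cost depends only on the row length, not on how many rows the buffer holds.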

NSMutableArray was mutated while being enumerated

I have an array in an old objective-C app that I am using to learn more "complicated" coding. It is back from the old days of OS X and was very much broken. I have gotten it to work (mostly)! However, the app has an NSMutableArray of images, 7 in total. I use a random number generator to insert the images on the screen, some code to allow them to fall, and then, using screen bounds, when they reach "0" on the Y axis they are removed from the array.
I initially just had:
if( currentFrame.origin.y+currentFrame.size.height <= 0 )
{
[flakesArray removeObject:myItem];
I have read when removing objects from an array it is best practice to iterate in reverse...so I have this bit of code:
for (NSInteger i = myArray.count - 1; i >= 0; i--)
{ //added for for statement
if( currentFrame.origin.y+currentFrame.size.height <= 0 )
{
[myArray removeObjectAtIndex:i];
}
Sadly both methods result in the same mutated while enumerated error. Am I missing something obvious?
If I add an NSLog statement I can get, I think, the index of the item that needs to be removed:
NSLog (@"Shazam! %ld", (long)i);
2017-01-07 14:39:42.086667 MyApp[45995:7500033] Shazam! 2
I have looked through a lot and tried several different methods including this one, which looks to be the most popular with the same error.
Thank you in advance! I will happily provide any additional information!
Adding more:
Sorry guys I am not explicitly calling NSFastEnumeration but I have this:
- (void) drawRectCocoa:(NSRect)rect
{
NSEnumerator* flakesEnum = [flakesArray objectEnumerator];
then
for( i = 0; i < numberToCreate; i++ )
{
[self newObject:self];
}
while( oneFlake = [flakesEnum nextObject] )
It is here where:
if( currentFrame.origin.y+currentFrame.size.height <= 0 )
{
NSLog (@"Shazam! %i", oneFlake);
[flakesArray removeObject:oneFlake];
}
Thank you all. I am learning a lot from this discussion!
There are two ways to go: (1) collect the objects to remove then remove them with removeObjectsInArray:.
NSMutableArray *removeThese = [NSMutableArray array];
for (id item in myArray) {
if (/* item satisfies some condition for removal */) {
[removeThese addObject:item];
}
}
// the following (and any other method that mutates the array) must be done
// *outside of* the loop that enumerates the array
[myArray removeObjectsInArray:removeThese];
Alternatively, (2) reverseObjectEnumerator tolerates removals during iteration in practice, although the documentation does not promise this...
for (id item in [myArray reverseObjectEnumerator]) {
if (/* item satisfies some condition for removal */) {
[myArray removeObject: item];
}
}
As per the error, you may not mutate any NSMutableArray (or any NSMutable... collection) while it is being enumerated as part of any fast enumeration loop (for (... in ...) { ... }).
@danh's answer works as well, but involves allocating a new array of elements. There are two simpler and more efficient ways to filter an array:
[array filterUsingPredicate:[NSPredicate predicateWithBlock:^(id element, NSDictionary<NSString *,id> *bindings) {
// if element should stay, return YES; if it should be removed, return NO
}];
or
NSMutableIndexSet *indicesToRemove = [NSMutableIndexSet new];
for (NSUInteger i = 0; i < array.count; i += 1) {
if (/* array[i] should be removed */) {
[indicesToRemove addIndex:i];
}
}
[array removeObjectsAtIndexes:indicesToRemove];
filterUsingPredicate: will likely be slightly faster (since it uses fast enumeration itself), but depending on the specific application, removeObjectsAtIndexes: may be more flexible.
No matter what, if you're using your array inside a fast enumeration loop, you will have to perform the modification outside of the loop. You can use filterUsingPredicate: to replace the loop altogether, or you can keep the loop and keep track of the indices of the elements you want to remove for later.

slow performance ai objective-c iOS

My brain is already broken by the AI for a tic-tac-toe-style board game.
The problem is slow AI performance at high levels (even the low levels are not that quick).
The AI uses a recursive method to find the best move among the available moves.
Here is some code:
@implementation AIClass
- (NSMutableArray *)findLegalMoves
{
// Here is code that finds available legal moves
// Loop over board array
}
- (float)scoreOpponentsTurn:(float)min max:(float)max depth:(int)depth
{
moveType it; // moveType is a struct defined in the .h file
// typedef struct { int x, y; } moveType;
NSMutableArray *moves = [self findLegalMoves];
for ( NSValue *val in moves ) {
[val getValue:&it];
float score = [self scoreMove:it min:min max:max depth:depth];
if ( score > min ) {
min = score;
}
if ( score > max ) {
min = 1000000000;
}
}
return min;
}
- (float)scoreMove:(moveType)m min:(float)min max:(float)max depth:(int)depth
{
NSMutableArray *changes = [[NSMutableArray alloc] init];
NSMutableArray *undo = [[NSMutableArray alloc] init];
float score;
[self.board evaluateChangesOnCol:m.x Row:m.y];
if ( [self.board checkForWin:&changes undo:&undo] ) {
score = 1000000000 + [self calcH]; //calcH - evals heuristic like sum of params
} else if ( depth > 0 ) {
score = - [self scoreOpponentsTurn:-1000000000 max:-min depth:depth - 1];
} else {
score = [self calcH];
}
[self.board applyChanges:undo];
return score;
}
- (moveType)findBestMove
{
NSMutableArray *legalMoves = [self findLegalMoves];
NSMutableArray *bestMoves = [[NSMutableArray alloc] init];
int min = -1000000000;
int max = 1000000000;
moveType move;
for ( NSValue *moveIt in legalMoves ) {
[moveIt getValue:&move];
float score = [self scoreMove:move min:min max:max depth:depth];
// Here i have some conditions to decide current move is best or not
}
// and pick random move from best moves and assign it to move variable
return move;
}
@end
And if the number of legal moves is 3 or more (it grows during the recursive search), this algorithm works very slowly.
It's my first objective-c experience.
Here are my guesses about how to improve performance:
Remove recursion (but I don't see another solution)
Use multithreading (how?)
Maybe use some AI library?
Sorry for my english.
Throwing away recursion in an algorithm that is a natural fit for recursion is not a good idea. Rather, you need to memoize your recursive solution. This common trick speeds up recursive solutions with common subproblems by orders of magnitude.
Consider these two sequences of moves:
x:(1,1) o:(2,1) x:(1,0) o:(2,0)
and
x:(1,0) o:(2,0) x:(1,1) o:(2,1)
The sequences are different, but they arrive at the same final state:
| | x | o
------------
| | x | o
Here is the root cause of the slowness: when your program arrives at a repeated state for the second time, it evaluates the position exactly as if it's the first time that it has seen it. This is wasteful: identical positions with three-move look-aheads will be evaluated four times; with four-move look-aheads, they would be evaluated eight times, and so on. This causes slowness proportional to 2^N, where N is the depth of your look-ahead.
Fixing this requires an addition of a lookup table. Given a state of the game, this table would give you the score for you or for the opponent, if such score has been calculated before. Your recursive functions would build a position key, and try a lookup in the score table. If the answer is there, it would be returned immediately. Otherwise, your function would construct the answer, and store it at the position key. Next time the same position occurs through a different series of moves, the answer would be reused.
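As a rough illustration of such a lookup table, here is a sketch in plain C for ordinary 3x3 tic-tac-toe, not for the asker's game. The position key is the board read as a base-3 number, and the table is indexed by the player to move plus that key. All names (memo, encode, has_line, solve, evaluated) are invented for this example, and a larger game would need a hash table instead of a flat array.

```c
enum { CELLS = 9, STATES = 19683 };    /* 3^9 possible board encodings */

static const int LINES[8][3] = {
    {0, 1, 2}, {3, 4, 5}, {6, 7, 8},   /* rows */
    {0, 3, 6}, {1, 4, 7}, {2, 5, 8},   /* columns */
    {0, 4, 8}, {2, 4, 6}               /* diagonals */
};

static signed char memo[2][STATES];    /* score + 2; 0 = not computed yet */
long evaluated = 0;                    /* positions actually searched */

/* Position key: cells are 0 (empty), 1 or 2 (a player's mark). */
int encode(const int *b)
{
    int key = 0;
    for (int i = 0; i < CELLS; i++) key = key * 3 + b[i];
    return key;
}

int has_line(const int *b, int player)
{
    for (int i = 0; i < 8; i++)
        if (b[LINES[i][0]] == player && b[LINES[i][1]] == player &&
            b[LINES[i][2]] == player)
            return 1;
    return 0;
}

/* Score for `player`, who is to move: +1 win, 0 draw, -1 loss. */
int solve(int *b, int player)
{
    signed char *slot = &memo[player - 1][encode(b)];
    if (*slot) return *slot - 2;       /* seen before: reuse the answer */
    evaluated++;
    int opponent = 3 - player;
    int best;
    if (has_line(b, opponent)) {
        best = -1;                     /* opponent's last move won */
    } else {
        best = -2;                     /* sentinel: no move tried yet */
        for (int i = 0; i < CELLS; i++) {
            if (b[i] != 0) continue;
            b[i] = player;
            int score = -solve(b, opponent);
            b[i] = 0;
            if (score > best) best = score;
        }
        if (best == -2) best = 0;      /* board full: draw */
    }
    *slot = (signed char)(best + 2);
    return best;
}
```

Each distinct position is searched once no matter how many move orders lead to it; a second call to solve on the same board returns straight from the table.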
You might want to try alpha-beta pruning. It's possible that your game has a high branching factor, in which case you will want to avoid searching areas that won't affect the outcome.
You can also limit your search to a certain depth. Pick a depth that retrieves competent moves, but doesn't take too long.
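To make the pruning idea concrete, here is a small sketch in plain C over a toy game tree, not the asker's board: a complete binary tree whose leaf scores sit in an array, searched to a fixed depth. The function name, the tree layout and the visited counter are all assumptions made for the example.

```c
#include <limits.h>

/* Minimax with alpha-beta pruning on a complete binary tree.
   The leaves of the subtree rooted at (depth, index) start at
   leaves[index << depth]; `visited` counts the leaves scored. */
int alphabeta(const int *leaves, int depth, int index,
              int alpha, int beta, int maximizing, int *visited)
{
    if (depth == 0) {                  /* depth limit reached: score it */
        ++*visited;
        return leaves[index];
    }
    int best = maximizing ? INT_MIN : INT_MAX;
    for (int child = 0; child < 2; child++) {
        int score = alphabeta(leaves, depth - 1, index * 2 + child,
                              alpha, beta, !maximizing, visited);
        if (maximizing) {
            if (score > best) best = score;
            if (best > alpha) alpha = best;
        } else {
            if (score < best) best = score;
            if (best < beta) beta = best;
        }
        if (alpha >= beta) break;      /* remaining children cannot matter */
    }
    return best;
}
```

With the leaves {3, 5, 6, 9, 1, 2, 0, -1} and depth 3, calling alphabeta(leaves, 3, 0, INT_MIN, INT_MAX, 1, &visited) returns the same value as a full minimax search (5) while scoring fewer than all eight leaves.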

Best practices for overriding isEqual: and hash

How do you properly override isEqual: in Objective-C? The "catch" seems to be that if two objects are equal (as determined by the isEqual: method), they must have the same hash value.
The Introspection section of the Cocoa Fundamentals Guide does have an example on how to override isEqual:, copied as follows, for a class named MyWidget:
- (BOOL)isEqual:(id)other {
if (other == self)
return YES;
if (!other || ![other isKindOfClass:[self class]])
return NO;
return [self isEqualToWidget:other];
}
- (BOOL)isEqualToWidget:(MyWidget *)aWidget {
if (self == aWidget)
return YES;
if (![(id)[self name] isEqual:[aWidget name]])
return NO;
if (![[self data] isEqualToData:[aWidget data]])
return NO;
return YES;
}
It checks pointer equality, then class equality, and finally compares the objects using isEqualToWidget:, which only checks the name and data properties. What the example doesn't show is how to override hash.
Let's assume there are other properties that do not affect equality, say age. Shouldn't the hash method be overridden such that only name and data affect the hash? And if so, how would you do that? Just add the hashes of name and data? For example:
- (NSUInteger)hash {
NSUInteger hash = 0;
hash += [[self name] hash];
hash += [[self data] hash];
return hash;
}
Is that sufficient? Is there a better technique? What if you have primitives, like int? Convert them to NSNumber to get their hash? Or structs like NSRect?
(Brain fart: Originally wrote "bitwise OR" them together with |=. Meant add.)
Start with
NSUInteger prime = 31;
NSUInteger result = 1;
Then for every primitive you do
result = prime * result + var;
For objects you use 0 for nil and otherwise their hashcode.
result = prime * result + [var hash];
For booleans you use two different values
result = prime * result + ((var)?1231:1237);
Explanation and Attribution
This is not tcurdt's work, and comments were asking for more explanation, so I believe an edit for attribution is fair.
This algorithm was popularized in the book "Effective Java", and the relevant chapter can currently be found online here. That book popularized the algorithm, which is now a default in a number of Java applications (including Eclipse). It derived, however, from an even older implementation which is variously attributed to Dan Bernstein or Chris Torek. That older algorithm originally floated around on Usenet, and certain attribution is difficult. For example, there is some interesting commentary in this Apache code (search for their names) that references the original source.
Bottom line is, this is a very old, simple hashing algorithm. It is not the most performant, and it is not even proven mathematically to be a "good" algorithm. But it is simple, and a lot of people have used it for a long time with good results, so it has a lot of historical support.
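For reference, the same multiply-and-add steps written as plain C helpers. The names hash_combine and hash_bool are invented here; the constants 31, 1231 and 1237 are the ones from the answer above.

```c
#include <stddef.h>
#include <stdbool.h>

/* One step of the scheme: fold the next field's hash into the result.
   Unsigned overflow simply wraps, which is fine for a hash. */
size_t hash_combine(size_t result, size_t field_hash)
{
    return 31u * result + field_hash;
}

/* Booleans get two distinct values. */
size_t hash_bool(bool value)
{
    return value ? 1231u : 1237u;
}
```

Starting from 1 and folding in an integer field 7 and then a true flag gives 31 * (31 * 1 + 7) + 1231 = 2409. Unlike a plain sum or XOR of the field hashes, the result depends on the order in which the fields are combined.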
I'm just picking up Objective-C myself, so I can't speak for that language specifically, but in the other languages I use if two instances are "Equal" they must return the same hash - otherwise you are going to have all sorts of problems when trying to use them as keys in a hashtable (or any dictionary-type collections).
On the other hand, if 2 instances are not equal, they may or may not have the same hash - it is best if they don't. This is the difference between an O(1) search on a hash table and an O(N) search - if all your hashes collide you may find that searching your table is no better than searching a list.
In terms of best practices your hash should return a random distribution of values for its input. This means that, for example, if you have a double, but the majority of your values tend to cluster between 0 and 100, you need to make sure that the hashes returned by those values are evenly distributed across the entire range of possible hash values. This will significantly improve your performance.
There are a number of hashing algorithms out there, including several listed here. I try to avoid creating new hash algorithms as it can have large performance implications, so using the existing hash methods and doing a bitwise combination of some sort as you do in your example is a good way to avoid it.
A simple XOR over the hash values of critical properties is sufficient 99% of the time.
For example:
- (NSUInteger)hash
{
return [self.name hash] ^ [self.data hash];
}
Solution found at http://nshipster.com/equality/ by Mattt Thompson (which also referred to this question in his post :~)
I found this thread extremely helpful supplying everything I needed to get my isEqual: and hash methods implemented with one catch. When testing object instance variables in isEqual: the example code uses:
if (![(id)[self name] isEqual:[aWidget name]])
return NO;
This repeatedly failed (i.e., returned NO) without an error, when I knew the objects were identical in my unit testing. The reason was that one of the NSString instance variables was nil, so the above statement was:
if (![nil isEqual: nil])
return NO;
and since nil will respond to any method, this is perfectly legal but
[nil isEqual: nil]
returns nil, which is NO, so when both the object and the one being tested had a nil object they would be considered not equal (i.e., isEqual: would return NO).
This simple fix was to change the if statement to:
if ([self name] != [aWidget name] && ![(id)[self name] isEqual:[aWidget name]])
return NO;
This way, if their addresses are the same it skips the method call no matter if they are both nil or both pointing to the same object but if either is not nil or they point to different objects then the comparator is appropriately called.
I hope this saves someone a few minutes of head scratching.
The hash function should create a semi-unique value that is not likely to collide or match another object's hash value.
Here is the full hash function, which can be adapted to your class's instance variables. It uses NSUIntegers rather than ints for compatibility between 64-bit and 32-bit applications.
If the result becomes 0 for different objects, you run the risk of colliding hashes. Colliding hashes can result in unexpected program behavior when working with some of the collection classes that depend on the hash function. Make sure to test your hash function prior to use.
-(NSUInteger)hash {
NSUInteger result = 1;
NSUInteger prime = 31;
NSUInteger yesPrime = 1231;
NSUInteger noPrime = 1237;
// Add any object that already has a hash function (NSString)
result = prime * result + [self.myObject hash];
// Add primitive variables (int)
result = prime * result + self.primitiveVariable;
// Boolean values (BOOL)
result = prime * result + (self.isSelected ? yesPrime : noPrime);
return result;
}
The easy but inefficient way is to return the same -hash value for every instance. Otherwise, yes, you must implement hash based only on objects which affect equality. This is tricky if you use lax comparisons in -isEqual: (e.g. case-insensitive string comparisons). For ints, you can generally use the int itself, unless you’ll be comparing with NSNumbers.
Don’t use |=, though, it will saturate. Use ^= instead.
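A tiny plain C sketch of what "saturate" means here; the function names and sample values are made up for the illustration. OR can only ever set bits, so combining more fields pushes the result toward all ones, while XOR keeps toggling bits.

```c
#include <stddef.h>

unsigned combine_or(const unsigned *hashes, size_t n)
{
    unsigned result = 0;
    for (size_t i = 0; i < n; i++) result |= hashes[i];   /* bits only get set */
    return result;
}

unsigned combine_xor(const unsigned *hashes, size_t n)
{
    unsigned result = 0;
    for (size_t i = 0; i < n; i++) result ^= hashes[i];   /* bits toggle */
    return result;
}
```

For the field hashes {0x0F, 0xF0} and {0xFF, 0x3C}, OR yields 0xFF for both objects, a collision, whereas XOR yields 0xFF and 0xC3.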
Random fun fact: [[NSNumber numberWithInt:0] isEqual:[NSNumber numberWithBool:NO]], but [[NSNumber numberWithInt:0] hash] != [[NSNumber numberWithBool:NO] hash]. (rdar://4538282, open since 05-May-2006)
Remember that you only need to provide hash that's equal when isEqual is true. When isEqual is false, the hash doesn't have to be unequal though presumably it is. Hence:
Keep hash simple. Pick a member (or few members) variable that is the most distinctive.
For example, for CLPlacemark, the name alone is enough. Yes, there are 2 or 3 distinct CLPlacemarks with the exact same name, but those are rare. Use that hash.
@interface CLPlacemark (equal)
- (BOOL)isEqual:(CLPlacemark*)other;
@end
@implementation CLPlacemark (equal)
...
-(NSUInteger) hash
{
return self.name.hash;
}
@end
Notice I do not bother specifying the city, the country, etc. The name is enough. Perhaps the name and CLLocation.
Hash should be evenly distributed. So you can combine several member variables using the caret ^ (XOR sign).
So it's something like
hash = self.member1.hash ^ self.member2.hash ^ self.member3.hash
That way the hash will be evenly distributed.
Hash must be O(1), and not O(n)
So what to do in array?
Again, simple. You do not have to hash all members of the array. Enough to hash the first element, last element, the count, maybe some middle elements, and that's it.
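A sketch of that sampling idea in plain C, using an int array instead of an NSArray; the name sampled_hash and the choice of first, middle and last element are assumptions made for the example.

```c
#include <stddef.h>

/* Constant-time hash: looks at the count and at most three elements.
   Equal arrays always hash equally; unequal arrays may collide when
   they differ only at positions that are not sampled. */
size_t sampled_hash(const int *items, size_t count)
{
    size_t h = count;
    if (count > 0) {
        h = 31u * h + (size_t)items[0];
        h = 31u * h + (size_t)items[count / 2];
        h = 31u * h + (size_t)items[count - 1];
    }
    return h;
}
```

The trade-off is explicit: {1, 2, 3, 4, 5} and {1, 9, 3, 9, 5} collide because only indices 0, 2 and 4 are read. That is allowed by the hash contract, but it costs extra isEqual: calls if such near-duplicates are common.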
The equals and hash contracts are well specified and thoroughly researched in the Java world (see @mipardi's answer), but all the same considerations should apply to Objective-C.
Eclipse does a reliable job of generating these methods in Java, so here's an Eclipse example ported by hand to Objective-C:
- (BOOL)isEqual:(id)object {
if (self == object)
return true;
if ([self class] != [object class])
return false;
MyWidget *other = (MyWidget *)object;
if (_name == nil) {
if (other->_name != nil)
return false;
}
else if (![_name isEqual:other->_name])
return false;
if (_data == nil) {
if (other->_data != nil)
return false;
}
else if (![_data isEqual:other->_data])
return false;
return true;
}
- (NSUInteger)hash {
const NSUInteger prime = 31;
NSUInteger result = 1;
result = prime * result + [_name hash];
result = prime * result + [_data hash];
return result;
}
And for a subclass YourWidget which adds a property serialNo:
- (BOOL)isEqual:(id)object {
if (self == object)
return true;
if (![super isEqual:object])
return false;
if ([self class] != [object class])
return false;
YourWidget *other = (YourWidget *)object;
if (_serialNo == nil) {
if (other->_serialNo != nil)
return false;
}
else if (![_serialNo isEqual:other->_serialNo])
return false;
return true;
}
- (NSUInteger)hash {
const NSUInteger prime = 31;
NSUInteger result = [super hash];
result = prime * result + [_serialNo hash];
return result;
}
This implementation avoids some subclassing pitfalls in the sample isEqual: from Apple:
Apple's class test other isKindOfClass:[self class] is asymmetric for two different subclasses of MyWidget. Equality needs to be symmetric: a=b if and only if b=a. This could easily be fixed by changing the test to other isKindOfClass:[MyWidget class], then all MyWidget subclasses would be mutually comparable.
Using an isKindOfClass: subclass test prevents subclasses from overriding isEqual: with a refined equality test. This is because equality needs to be transitive: if a=b and a=c then b=c. If a MyWidget instance compares equal to two YourWidget instances, then those YourWidget instances must compare equal to each other, even if their serialNo differs.
The second issue can be fixed by only considering objects to be equal if they belong to the exact same class, hence the [self class] != [object class] test here. For typical application classes, this seems to be the best approach.
However, there certainly are cases where the isKindOfClass: test is preferable. This is more typical of framework classes than application classes. For example, any NSString should compare equal to any other NSString with the same underlying character sequence, regardless of the NSString/NSMutableString distinction, and also regardless of what private classes in the NSString class cluster are involved.
In such cases, isEqual: should have well-defined, well-documented behaviour, and it should be made clear that subclasses can't override this. In Java, the 'no override' restriction can be enforced by flagging the equals and hashcode methods as final, but Objective-C has no equivalent.
Hold on, surely a far easier way to do this is to first override - (NSString *)description and provide a string representation of your object state (you must represent the entire state of your object in this string).
Then, just provide the following implementation of hash:
- (NSUInteger)hash {
return [[self description] hash];
}
This is based on the principle that "if two string objects are equal (as determined by the isEqualToString: method), they must have the same hash value."
Source: NSString Class Reference
This doesn't directly answer your question (at all) but I've used MurmurHash before to generate hashes: murmurhash
Guess I should explain why: murmurhash is bloody fast...
I've found this page to be a helpful guide in override equals- and hash-type methods. It includes a decent algorithm for calculating hash codes. The page is geared towards Java, but it's pretty easy to adapt it to Objective-C/Cocoa.
I'm an Objective C newbie too, but I found an excellent article about identity vs. equality in Objective C here. From my reading it looks like you might be able to just keep the default hash function (which should provide a unique identity) and implement the isEqual method so that it compares data values.
Quinn is just wrong that the reference to the murmur hash is useless here. Quinn is right that you want to understand the theory behind hashing. The murmur distills a lot of that theory into an implementation. Figuring out how to apply that implementation to this particular application is worth exploring.
Some of the key points here:
The example function from tcurdt suggests that '31' is a good multiplier because it is prime. One needs to show that being prime is a necessary and sufficient condition. In fact 31 (and 7) are probably not particularly good primes because 31 ≡ -1 (mod 32), so multiplying by 31 is just a 5-bit left shift followed by a subtraction. An odd multiplier with about half the bits set and half the bits clear is likely to be better. (The murmur hash multiplication constant has that property.)
This type of hash function would likely be stronger if, after multiplying, the result value were adjusted via a shift and xor. The multiplication tends to produce the results of lots of bit interactions at the high end of the register and low interaction results at the bottom end of the register. The shift and xor increases the interactions at the bottom end of the register.
Setting the initial result to a value where about half the bits are zero and about half the bits are one would also tend to be useful.
It may be useful to be careful about the order in which elements are combined. One should probably first process booleans and other elements where the values are not strongly distributed.
It may be useful to add a couple of extra bit scrambling stages at the end of the computation.
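The points above can be sketched in plain C (which also compiles as Objective-C). This is only an illustration under stated assumptions: the names mix_step and combine_hashes, the seed, the multiplier, and the shift amounts are all hypothetical choices made for this example. They are not tuned, and this is not the murmur hash itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for illustration only: a nonzero seed and an odd
   multiplier, each with a mix of set and clear bits. */
#define MIX_SEED       0x9E3779B9u
#define MIX_MULTIPLIER 0x85EBCA6Bu

/* Fold one field's hash into the running result: multiply, add, then
   shift-and-xor so the well-mixed high bits feed back into the low bits. */
static uint32_t mix_step(uint32_t result, uint32_t field_hash)
{
    result = result * MIX_MULTIPLIER + field_hash;
    result ^= result >> 15;
    return result;
}

/* Combine the per-field hashes in order, then run a couple of extra
   bit-scrambling stages at the end of the computation. */
static uint32_t combine_hashes(const uint32_t *field_hashes, size_t count)
{
    uint32_t result = MIX_SEED;
    for (size_t i = 0; i < count; ++i) {
        result = mix_step(result, field_hashes[i]);
    }
    result ^= result >> 16;
    result *= MIX_MULTIPLIER;
    result ^= result >> 13;
    return result;
}
```

In a -hash method, field_hashes would hold the hash of each property, in whatever order you decide to combine them. Whether this actually beats the simple 31-multiplier version for a given class is something to measure rather than assume.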
Whether or not the murmur hash is actually fast for this application is an open question. The murmur hash premixes the bits of each input word. Multiple input words can be processed in parallel, which helps multiple-issue pipelined CPUs.
Combining @tcurdt's answer with @oscar-gomez's answer for getting property names, we can create an easy drop-in solution for both isEqual: and hash:
#import <Foundation/Foundation.h>
#import <objc/runtime.h>

// Returns the names of the properties declared directly on the object's class.
// class_copyPropertyList does not include properties inherited from superclasses.
NSArray *PropertyNamesFromObject(id object)
{
    unsigned int propertyCount = 0;
    objc_property_t *properties = class_copyPropertyList([object class], &propertyCount);
    NSMutableArray *propertyNames = [NSMutableArray arrayWithCapacity:propertyCount];
    for (unsigned int i = 0; i < propertyCount; ++i) {
        const char *name = property_getName(properties[i]);
        [propertyNames addObject:[NSString stringWithUTF8String:name]];
    }
    free(properties); // the runtime mallocs the list; the caller must free it
    return propertyNames;
}

// Compares two objects property by property, using key-value coding.
BOOL IsEqualObjects(id object1, id object2)
{
    if (object1 == object2)
        return YES;
    if (!object1 || ![object2 isKindOfClass:[object1 class]])
        return NO;
    for (NSString *propertyName in PropertyNamesFromObject(object1)) {
        id value1 = [object1 valueForKey:propertyName];
        id value2 = [object2 valueForKey:propertyName];
        // Pointer comparison first, so that two nil values count as equal.
        if (value1 != value2 && ![value1 isEqual:value2])
            return NO;
    }
    return YES;
}

// Combines the hashes of all property values with the 31-multiplier scheme.
NSUInteger MagicHash(id object)
{
    NSUInteger prime = 31;
    NSUInteger result = 1;
    for (NSString *propertyName in PropertyNamesFromObject(object)) {
        id value = [object valueForKey:propertyName];
        result = prime * result + [value hash]; // a nil value contributes 0
    }
    return result;
}
Now, in your custom class you can easily implement isEqual: and hash:
- (NSUInteger)hash
{
    return MagicHash(self);
}
- (BOOL)isEqual:(id)other
{
    return IsEqualObjects(self, other);
}
Note that if you're creating an object that can be mutated after creation, the hash value must not change while the object is in a collection. Practically speaking, this means that the hash value must be fixed from the point of the initial object creation. See Apple's documentation on the NSObject protocol's -hash method for more information:
If a mutable object is added to a collection that uses hash values to determine the object’s position in the collection, the value returned by the hash method of the object must not change while the object is in the collection. Therefore, either the hash method must not rely on any of the object’s internal state information or you must make sure the object’s internal state information does not change while the object is in the collection. Thus, for example, a mutable dictionary can be put in a hash table but you must not change it while it is in there. (Note that it can be difficult to know whether or not a given object is in a collection.)
This sounds like complete whackery to me, since it potentially renders hash lookups far less efficient, but I suppose it's better to err on the side of caution and follow what the documentation says.
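One way to satisfy that rule without giving up a useful hash is to derive the hash only from state that never changes after creation, while equality still looks at everything. A minimal sketch in plain C (which also compiles as Objective-C), assuming a hypothetical Person record whose record_id is immutable; the type, field names and multiplier are made up for this example:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record: record_id is fixed at creation, nickname may change. */
typedef struct {
    uint32_t record_id;
    char     nickname[32];
} Person;

/* The hash depends only on the immutable field, so it cannot change while
   the record sits in a hash-based collection. */
static uint32_t person_hash(const Person *p)
{
    return p->record_id * 2654435761u; /* arbitrary odd multiplier */
}

/* Equality compares all of the state. Equal records necessarily share a
   record_id, so they still produce equal hashes. */
static int person_equal(const Person *a, const Person *b)
{
    return a->record_id == b->record_id
        && strcmp(a->nickname, b->nickname) == 0;
}
```

The trade-off is the one complained about above: the fewer fields the hash uses, the more collisions you get between objects that differ only in their mutable state.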
Sorry if I risk sounding a complete boffin here but...
...nobody bothered mentioning that to follow 'best practices' an equals method should take into account all data owned by your target object. That is, whatever data is aggregated into your object, as opposed to merely associated with it, should be taken into account when implementing equals.
If you don't want to take, say, 'age' into account in a comparison, then you should write a comparator and use that to perform your comparisons instead of isEqual:.
If you define an isEqual: method that performs equality comparison arbitrarily, you incur the risk that this method is misused by another developer, or even yourself, once you've forgotten the 'twist' in your equals interpretation.
Ergo, although this is a great q&a about hashing, you don't normally need to redefine the hashing method; you should probably define an ad-hoc comparator instead.
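To make the distinction concrete, here is a rough sketch in plain C (which also compiles as Objective-C; in Cocoa the same role would be played by a comparator block or selector). The Member type and both function names are hypothetical. Equality looks at every field, while the ad-hoc comparator deliberately ignores age:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical value type used only for this example. */
typedef struct {
    char name[32];
    int  age;
} Member;

/* Full equality: every field owned by the value takes part. */
static int member_equal(const Member *a, const Member *b)
{
    return strcmp(a->name, b->name) == 0 && a->age == b->age;
}

/* Ad-hoc comparator that ignores age. The signature matches what the
   standard qsort() and bsearch() functions expect. */
static int compare_by_name(const void *lhs, const void *rhs)
{
    const Member *a = (const Member *)lhs;
    const Member *b = (const Member *)rhs;
    return strcmp(a->name, b->name);
}
```

Sorting with qsort(team, count, sizeof team[0], compare_by_name) then orders by name only, and nobody reading member_equal is surprised by a hidden 'twist'.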