Best practices for overriding isEqual: and hash - objective-c

How do you properly override isEqual: in Objective-C? The "catch" seems to be that if two objects are equal (as determined by the isEqual: method), they must have the same hash value.
The Introspection section of the Cocoa Fundamentals Guide does have an example on how to override isEqual:, copied as follows, for a class named MyWidget:
- (BOOL)isEqual:(id)other {
if (other == self)
return YES;
if (!other || ![other isKindOfClass:[self class]])
return NO;
return [self isEqualToWidget:other];
}
- (BOOL)isEqualToWidget:(MyWidget *)aWidget {
if (self == aWidget)
return YES;
if (![(id)[self name] isEqual:[aWidget name]])
return NO;
if (![[self data] isEqualToData:[aWidget data]])
return NO;
return YES;
}
It checks pointer equality, then class equality, and finally compares the objects using isEqualToWidget:, which only checks the name and data properties. What the example doesn't show is how to override hash.
Let's assume there are other properties that do not affect equality, say age. Shouldn't the hash method be overridden such that only name and data affect the hash? And if so, how would you do that? Just add the hashes of name and data? For example:
- (NSUInteger)hash {
NSUInteger hash = 0;
hash += [[self name] hash];
hash += [[self data] hash];
return hash;
}
Is that sufficient? Is there a better technique? What if you have primitives, like int? Convert them to NSNumber to get their hash? Or structs like NSRect?
(Brain fart: Originally wrote "bitwise OR" them together with |=. Meant add.)

Start with
NSUInteger prime = 31;
NSUInteger result = 1;
Then for every primitive you do
result = prime * result + var
For objects you use 0 for nil and otherwise their hashcode.
result = prime * result + [var hash];
For booleans you use two different values
result = prime * result + ((var)?1231:1237);
Explanation and Attribution
This is not tcurdt's work, and comments were asking for more explanation, so I believe an edit for attribution is fair.
This algorithm was popularized in the book "Effective Java", and the relevant chapter can currently be found online here. It is now the default in a number of Java tools (including Eclipse). It derives, however, from an even older implementation which is variously attributed to Dan Bernstein or Chris Torek. That older algorithm originally floated around on Usenet, and certain attribution is difficult. For example, there is some interesting commentary in this Apache code (search for their names) that references the original source.
Bottom line is, this is a very old, simple hashing algorithm. It is not the most performant, and it is not even proven mathematically to be a "good" algorithm. But it is simple, and a lot of people have used it for a long time with good results, so it has a lot of historical support.

I'm just picking up Objective-C myself, so I can't speak for that language specifically, but in the other languages I use if two instances are "Equal" they must return the same hash - otherwise you are going to have all sorts of problems when trying to use them as keys in a hashtable (or any dictionary-type collections).
On the other hand, if 2 instances are not equal, they may or may not have the same hash - it is best if they don't. This is the difference between an O(1) search on a hash table and an O(N) search - if all your hashes collide you may find that searching your table is no better than searching a list.
In terms of best practices your hash should return a random distribution of values for its input. This means that, for example, if you have a double, but the majority of your values tend to cluster between 0 and 100, you need to make sure that the hashes returned by those values are evenly distributed across the entire range of possible hash values. This will significantly improve your performance.
There are a number of hashing algorithms out there, including several listed here. I try to avoid creating new hash algorithms as it can have large performance implications, so using the existing hash methods and doing a bitwise combination of some sort as you do in your example is a good way to avoid it.
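For the double case mentioned above, one common starting point is to hash the value's bit pattern rather than its numeric value. The sketch below is plain C (which also compiles as Objective-C); `hash_double` is a hypothetical helper name, and the 64-bit fold mirrors what Java's `Double.hashCode` does. It normalizes negative zero first, because `0.0 == -0.0` and equal values must hash equally. Folding alone does not guarantee an even spread; it only makes sure all 64 bits contribute.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: hash a double by its bit pattern. */
unsigned long hash_double(double value)
{
    uint64_t bits;

    /* 0.0 and -0.0 compare equal but have different bit patterns,
       so map both to +0.0 before hashing. */
    if (value == 0.0) {
        value = 0.0;
    }
    memcpy(&bits, &value, sizeof bits);   /* well-defined way to read the bits */

    /* Fold the high half into the low half so both halves contribute. */
    return (unsigned long)(bits ^ (bits >> 32));
}
```

In an Objective-C `hash` method the result could be fed into whichever combining step you use for the other fields.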

A simple XOR over the hash values of critical properties is sufficient
99% of the time.
For example:
- (NSUInteger)hash
{
return [self.name hash] ^ [self.data hash];
}
Solution found at http://nshipster.com/equality/ by Mattt Thompson (which also referred to this question in his post :~)
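As a hedged illustration of where the remaining 1% comes from: XOR is symmetric and self-cancelling, so swapping two field values, or having two equal field values, produces collisions that the multiply-and-add scheme from the earlier answer avoids. The sketch is plain C (valid Objective-C as well); `combine_xor` and `combine_31` are hypothetical names, and the numeric arguments stand in for `[self.name hash]` and `[self.data hash]`.

```c
/* XOR combination, as in the answer above. */
unsigned long combine_xor(unsigned long name_hash, unsigned long data_hash)
{
    return name_hash ^ data_hash;
}

/* Multiply-and-add combination with the prime 31; sensitive to field order. */
unsigned long combine_31(unsigned long name_hash, unsigned long data_hash)
{
    unsigned long result = 1;
    result = 31 * result + name_hash;
    result = 31 * result + data_hash;
    return result;
}
```

Here `combine_xor(3, 5)` and `combine_xor(5, 3)` are identical and `combine_xor(7, 7)` is 0, while `combine_31(3, 5)` is 1059 and `combine_31(5, 3)` is 1119.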

I found this thread extremely helpful supplying everything I needed to get my isEqual: and hash methods implemented with one catch. When testing object instance variables in isEqual: the example code uses:
if (![(id)[self name] isEqual:[aWidget name]])
return NO;
This repeatedly failed (i.e., returned NO) without an error, when I knew the objects were identical in my unit testing. The reason was, one of the NSString instance variables was nil so the above statement was:
if (![nil isEqual: nil])
return NO;
and since nil will respond to any method, this is perfectly legal but
[nil isEqual: nil]
returns nil, which is NO, so when both the object and the one being tested had a nil object they would be considered not equal (i.e., isEqual: would return NO).
This simple fix was to change the if statement to:
if ([self name] != [aWidget name] && ![(id)[self name] isEqual:[aWidget name]])
return NO;
This way, if their addresses are the same it skips the method call no matter if they are both nil or both pointing to the same object but if either is not nil or they point to different objects then the comparator is appropriately called.
I hope this saves someone a few minutes of head scratching.
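The same three-step pattern (identity first, then a null check, then the deep comparison) is common in plain C as well. A minimal sketch, using C strings in place of NSString and a hypothetical helper name `strings_equal`; in Objective-C the last step would be the `isEqual:` call:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 when both pointers are NULL, or both point to equal strings. */
int strings_equal(const char *a, const char *b)
{
    if (a == b) {
        return 1;               /* same object, or both NULL */
    }
    if (a == NULL || b == NULL) {
        return 0;               /* exactly one side is missing */
    }
    return strcmp(a, b) == 0;   /* both present: compare contents */
}
```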

The hash function should create a semi-unique value that is not likely to collide or match another object's hash value.
Here is the full hash function, which can be adapted to your class's instance variables. It uses NSUIntegers rather than ints for compatibility with both 64-bit and 32-bit applications.
If the result becomes 0 for different objects, you run the risk of colliding hashes. Colliding hashes can result in unexpected program behavior when working with some of the collection classes that depend on the hash function. Make sure to test your hash function prior to use.
-(NSUInteger)hash {
NSUInteger result = 1;
NSUInteger prime = 31;
NSUInteger yesPrime = 1231;
NSUInteger noPrime = 1237;
// Add any object that already has a hash function (NSString)
result = prime * result + [self.myObject hash];
// Add primitive variables (int)
result = prime * result + self.primitiveVariable;
// Boolean values (BOOL)
result = prime * result + (self.isSelected ? yesPrime : noPrime);
return result;
}

The easy but inefficient way is to return the same -hash value for every instance. Otherwise, yes, you must implement hash based only on objects which affect equality. This is tricky if you use lax comparisons in -isEqual: (e.g. case-insensitive string comparisons). For ints, you can generally use the int itself, unless you’ll be comparing with NSNumbers.
Don’t use |=, though, it will saturate. Use ^= instead.
Random fun fact: [[NSNumber numberWithInt:0] isEqual:[NSNumber numberWithBool:NO]], but [[NSNumber numberWithInt:0] hash] != [[NSNumber numberWithBool:NO] hash]. (rdar://4538282, open since 05-May-2006)
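To make the saturation point concrete, here is a small sketch in plain C (also valid Objective-C). OR can only ever set bits, so as more values are combined the result drifts toward all ones and distinct inputs collapse to the same hash. `combine_or` and `combine_xor_all` are hypothetical names, and the arrays stand in for the hashes of an object's fields.

```c
#include <assert.h>
#include <stddef.h>

/* OR-combine: bits are only ever switched on, never off. */
unsigned long combine_or(const unsigned long *hashes, size_t count)
{
    assert(hashes != NULL || count == 0);
    unsigned long result = 0;
    for (size_t i = 0; i < count; i++) {
        result |= hashes[i];
    }
    return result;
}

/* XOR-combine: every input bit can still flip the output bit. */
unsigned long combine_xor_all(const unsigned long *hashes, size_t count)
{
    assert(hashes != NULL || count == 0);
    unsigned long result = 0;
    for (size_t i = 0; i < count; i++) {
        result ^= hashes[i];
    }
    return result;
}
```

With the inputs `{0x0F, 0xF0}` and `{0xFF, 0x3C}`, OR gives `0xFF` for both, while XOR gives `0xFF` and `0xC3`.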

Remember that you only need to provide hash that's equal when isEqual is true. When isEqual is false, the hash doesn't have to be unequal though presumably it is. Hence:
Keep hash simple. Pick the member variable (or the few member variables) that is most distinctive.
For example, for CLPlacemark, the name alone is enough. Yes, there are 2 or 3 distinct CLPlacemarks with the exact same name, but those are rare. Use that hash.
@interface CLPlacemark (equal)
- (BOOL)isEqual:(CLPlacemark*)other;
@end
@implementation CLPlacemark (equal)
...
-(NSUInteger) hash
{
return self.name.hash;
}
@end
Notice I do not bother specifying the city, the country, etc. The name is enough. Perhaps the name and CLLocation.
Hash should be evenly distributed. So you can combine several member variables using the caret ^ (the XOR operator).
So it's something like
hash = self.member1.hash ^ self.member2.hash ^ self.member3.hash
That way the hash will be evenly distributed.
Hash must be O(1), and not O(n)
So what to do in array?
Again, simple. You do not have to hash all members of the array. Enough to hash the first element, last element, the count, maybe some middle elements, and that's it.
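A rough sketch of that sampling idea in plain C (also valid Objective-C), assuming the per-element hashes are already available in an array; `sampled_hash` is a hypothetical name. It mixes in the count plus the first, middle, and last elements, so the cost stays constant no matter how long the array is. The trade-off is that arrays differing only at unsampled positions collide.

```c
#include <assert.h>
#include <stddef.h>

/* Constant-time hash of an array: count, first, middle, and last element. */
unsigned long sampled_hash(const unsigned long *element_hashes, size_t count)
{
    assert(element_hashes != NULL || count == 0);

    const unsigned long prime = 31;
    unsigned long result = 1;

    result = prime * result + (unsigned long)count;
    if (count > 0) {
        result = prime * result + element_hashes[0];
        result = prime * result + element_hashes[count / 2];
        result = prime * result + element_hashes[count - 1];
    }
    return result;
}
```

For example, `{1, 2, 3, 4, 5}` and `{1, 9, 3, 4, 5}` hash identically because index 1 is never sampled, while `{1, 2, 3, 4, 6}` hashes differently.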

The equals and hash contracts are well specified and thoroughly researched in the Java world (see @mipardi's answer), but all the same considerations should apply to Objective-C.
Eclipse does a reliable job of generating these methods in Java, so here's an Eclipse example ported by hand to Objective-C:
- (BOOL)isEqual:(id)object {
if (self == object)
return true;
if ([self class] != [object class])
return false;
MyWidget *other = (MyWidget *)object;
if (_name == nil) {
if (other->_name != nil)
return false;
}
else if (![_name isEqual:other->_name])
return false;
if (_data == nil) {
if (other->_data != nil)
return false;
}
else if (![_data isEqual:other->_data])
return false;
return true;
}
- (NSUInteger)hash {
const NSUInteger prime = 31;
NSUInteger result = 1;
result = prime * result + [_name hash];
result = prime * result + [_data hash];
return result;
}
And for a subclass YourWidget which adds a property serialNo:
- (BOOL)isEqual:(id)object {
if (self == object)
return true;
if (![super isEqual:object])
return false;
if ([self class] != [object class])
return false;
YourWidget *other = (YourWidget *)object;
if (_serialNo == nil) {
if (other->_serialNo != nil)
return false;
}
else if (![_serialNo isEqual:other->_serialNo])
return false;
return true;
}
- (NSUInteger)hash {
const NSUInteger prime = 31;
NSUInteger result = [super hash];
result = prime * result + [_serialNo hash];
return result;
}
This implementation avoids some subclassing pitfalls in the sample isEqual: from Apple:
Apple's class test other isKindOfClass:[self class] is asymmetric for two different subclasses of MyWidget. Equality needs to be symmetric: a=b if and only if b=a. This could easily be fixed by changing the test to other isKindOfClass:[MyWidget class], then all MyWidget subclasses would be mutually comparable.
Using an isKindOfClass: subclass test prevents subclasses from overriding isEqual: with a refined equality test. This is because equality needs to be transitive: if a=b and a=c then b=c. If a MyWidget instance compares equal to two YourWidget instances, then those YourWidget instances must compare equal to each other, even if their serialNo differs.
The second issue can be fixed by only considering objects to be equal if they belong to the exact same class, hence the [self class] != [object class] test here. For typical application classes, this seems to be the best approach.
However, there certainly are cases where the isKindOfClass: test is preferable. This is more typical of framework classes than application classes. For example, any NSString should compare equal to any other NSString with the same underlying character sequence, regardless of the NSString/NSMutableString distinction, and also regardless of what private classes in the NSString class cluster are involved.
In such cases, isEqual: should have well-defined, well-documented behaviour, and it should be made clear that subclasses can't override this. In Java, the 'no override' restriction can be enforced by flagging the equals and hashcode methods as final, but Objective-C has no equivalent.

Hold on, surely a far easier way to do this is to first override - (NSString *)description and provide a string representation of your object state (you must represent the entire state of your object in this string).
Then, just provide the following implementation of hash:
- (NSUInteger)hash {
return [[self description] hash];
}
This is based on the principle that "if two string objects are equal (as determined by the isEqualToString: method), they must have the same hash value."
Source: NSString Class Reference

This doesn't directly answer your question (at all) but I've used MurmurHash before to generate hashes: murmurhash
Guess I should explain why: murmurhash is bloody fast...

I've found this page to be a helpful guide in override equals- and hash-type methods. It includes a decent algorithm for calculating hash codes. The page is geared towards Java, but it's pretty easy to adapt it to Objective-C/Cocoa.

I'm an Objective C newbie too, but I found an excellent article about identity vs. equality in Objective C here. From my reading it looks like you might be able to just keep the default hash function (which should provide a unique identity) and implement the isEqual method so that it compares data values.

Quinn is just wrong that the reference to the murmur hash is useless here. Quinn is right that you want to understand the theory behind hashing. The murmur distills a lot of that theory into an implementation. Figuring out how to apply that implementation to this particular application is worth exploring.
Some of the key points here:
The example function from tcurdt suggests that '31' is a good multiplier because it is prime. One needs to show that being prime is a necessary and sufficient condition. In fact 31 (and 7) are probably not particularly good primes because 31 ≡ -1 (mod 32). An odd multiplier with about half the bits set and half the bits clear is likely to be better. (The murmur hash multiplication constant has that property.)
This type of hash function would likely be stronger if, after multiplying, the result value were adjusted via a shift and xor. The multiplication tends to produce the results of lots of bit interactions at the high end of the register and low interaction results at the bottom end of the register. The shift and xor increases the interactions at the bottom end of the register.
Setting the initial result to a value where about half the bits are zero and about half the bits are one would also tend to be useful.
It may be useful to be careful about the order in which elements are combined. One should probably first process booleans and other elements where the values are not strongly distributed.
It may be useful to add a couple of extra bit scrambling stages at the end of the computation.
Whether or not the murmur hash is actually fast for this application is an open question. The murmur hash premixes the bits of each input word. Multiple input words can be processed in parallel which helps multiple-issue pipelined cpus.
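As a sketch of the "multiply, then shift and xor" suggestion, the block below uses the 64-bit finalizer from MurmurHash3 (its two multipliers are odd with roughly half their bits set) as the scrambling step after the multiply-and-add from tcurdt's answer. It is plain C, which also compiles as Objective-C; `mix64` and `combine_mixed` are hypothetical names, and whether the extra work pays off for a given class is not measured here.

```c
#include <stdint.h>

/* MurmurHash3's 64-bit finalizer: alternating xor-shifts and odd multiplies. */
uint64_t mix64(uint64_t k)
{
    k ^= k >> 33;                   /* pull high bits down into the low end */
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

/* One combining step: multiply-and-add, then scramble the result. */
uint64_t combine_mixed(uint64_t result, uint64_t field_hash)
{
    return mix64(31 * result + field_hash);
}
```

Each step of `mix64` is reversible, so distinct inputs always give distinct outputs; note that it maps 0 to 0, which is one reason to start from a non-zero initial value.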

Combining @tcurdt's answer with @oscar-gomez's answer for getting property names, we can create an easy drop-in solution for both isEqual and hash:
NSArray *PropertyNamesFromObject(id object)
{
unsigned int propertyCount = 0;
objc_property_t * properties = class_copyPropertyList([object class], &propertyCount);
NSMutableArray *propertyNames = [NSMutableArray arrayWithCapacity:propertyCount];
for (unsigned int i = 0; i < propertyCount; ++i) {
objc_property_t property = properties[i];
const char * name = property_getName(property);
NSString *propertyName = [NSString stringWithUTF8String:name];
[propertyNames addObject:propertyName];
}
free(properties);
return propertyNames;
}
BOOL IsEqualObjects(id object1, id object2)
{
if (object1 == object2)
return YES;
if (!object1 || ![object2 isKindOfClass:[object1 class]])
return NO;
NSArray *propertyNames = PropertyNamesFromObject(object1);
for (NSString *propertyName in propertyNames) {
if (([object1 valueForKey:propertyName] != [object2 valueForKey:propertyName])
&& (![[object1 valueForKey:propertyName] isEqual:[object2 valueForKey:propertyName]])) return NO;
}
return YES;
}
NSUInteger MagicHash(id object)
{
NSUInteger prime = 31;
NSUInteger result = 1;
NSArray *propertyNames = PropertyNamesFromObject(object);
for (NSString *propertyName in propertyNames) {
id value = [object valueForKey:propertyName];
result = prime * result + [value hash];
}
return result;
}
Now, in your custom class you can easily implement isEqual: and hash:
- (NSUInteger)hash
{
return MagicHash(self);
}
- (BOOL)isEqual:(id)other
{
return IsEqualObjects(self, other);
}

Note that if you're creating an object that can be mutated after creation, the hash value must not change if the object is inserted into a collection. Practically speaking, this means that the hash value must be fixed from the point of the initial object creation. See Apple's documentation on the NSObject protocol's -hash method for more information:
If a mutable object is added to a collection that uses hash values to determine the object’s position in the collection, the value returned by the hash method of the object must not change while the object is in the collection. Therefore, either the hash method must not rely on any of the object’s internal state information or you must make sure the object’s internal state information does not change while the object is in the collection. Thus, for example, a mutable dictionary can be put in a hash table but you must not change it while it is in there. (Note that it can be difficult to know whether or not a given object is in a collection.)
This sounds like complete whackery to me since it potentially effectively renders hash lookups far less efficient, but I suppose it's better to err on the side of caution and follow what the documentation says.

Sorry if I risk sounding a complete boffin here but...
...nobody bothered mentioning that to follow 'best practices' you should definitely not specify an equals method that would NOT take into account all data owned by your target object, e.g whatever data is aggregated to your object, versus an associate of it, should be taken into account when implementing equals.
If you don't want to take, say 'age' into account in a comparison, then you should write a comparator and use that to perform your comparisons instead of isEqual:.
If you define an isEqual: method that performs equality comparison arbitrarily, you incur the risk that this method is misused by another developer, or even yourself, once you've forgotten the 'twist' in your equals interpretation.
Ergo, although this is a great q&a about hashing, you don't normally need to redefine the hashing method, you should probably define an ad-hoc comparator instead.

Related

How to find if an object of a class with same data already exists in a NSMutableArray?

I apologize for this basic question, but I am 2-month new to obj-c.
Problem:
I am not able to find if an object with same data already exists in the NSMutableArray.
What I am doing?
ScanDigInfoForTable* sfile = [[ScanDigInfoForTable alloc]init];
sfile.data = "myData";
int inde = [_DataList indexOfObject:sfile] ;
if(inde == -1)
[_DataList addObject:sfile];
ScanDigInfoForTable* sfile2 = [[ScanDigInfoForTable alloc]init];
sfile2.data = "myData";
inde = [_DataList indexOfObject:sfile2] ;
if(inde == -1)
[_DataList addObject:sfile2];
Issue:
The _DataList get 2 objects instead of 1. Many thanks in advance for your attention.
S.P: I already know that I may traverse the whole array in a loop in order to check the data already exists. Looking for a better solution as the array may have thousands of records.
Well, comparing two custom objects is not that simple, because there is no predefined way to declare equality. It is up to each developer to define the rules of equality for the objects they create.
In your case, it would be two step process:
Step 1: Implement isEqual: in your ScanDigInfoForTable class. Assuming ScanDigInfoForTable is a model class and that it has three string properties - code, data & itemID (you can have any type).
- (BOOL)isEqual:(ScanDigInfoForTable *)other {
return [self.code isEqualToString:other.code] && [self.data isEqualToString:other.data] && [self.itemID isEqualToString:other.itemID];
}
Step 2: Call containsObject: method on NSMutableArray. This method would internally call isEqual: to give you the results based on the rules you defined.
// If the object does not exist in the list, we add it
if (![_DataList containsObject:sfile2]) {
[_DataList addObject:sfile2];
}
In Objective-C object equality is determined by the methods -isEqual: and -hash.
When testing object membership in a collection the items of the collection are sent isEqual:. The default implementation only compares the addresses of objects, which is why you are seeing duplicates. Your objects do not provide their own implementation of equality based on the data they contain.
To fix this you can override isEqual: to compare objects based on the data they represent. Using your example in your question, this could just be:
- (BOOL) isEqual:(id)object {
BOOL result = NO;
if (object != self){
if ([object isKindOfClass:[self class]]){
result = [[self data] isEqual:[(ScanDigInfoForTable *)object data]];
}
} else {
result = YES;
}
return result;
}
Mike Ash has a great article about implementing equality. In general, if you are implementing a custom class you should make equality a part of that.
You can use filteredArrayUsingPredicate, for example:
NSArray * matches = [_DataList filteredArrayUsingPredicate:
[NSPredicate predicateWithFormat:@"data == %@ ",sfile2.data]];
if(matches.count == 0) {
[_DataList addObject:sfile2];
}
Something like this?
NSMutableSet* set1 = [NSMutableSet setWithArray:array1];
NSMutableSet* set2 = [NSMutableSet setWithArray:array2];
[set1 intersectSet:set2]; //this will give you only the objects that are in both sets
NSArray* result = [set1 allObjects];
This has the benefit of not looking up the objects in the array, while looping through another array, which has N^2 complexity.
and also set2 doesn't have to be mutable, might as well use just
NSSet* set2 = [NSSet setWithArray:array2];

Logically ANDing NSUInteger and String Type?

I've searched Stackoverflow and other sites, but I can't seem to find this answer.
In Apple Text Editor source, they have at least one routine that does some apparently strange logical ANDing between two non-boolean variables. Casting them as Bools CAN be done, but doesn't make much sense. I'm learning Swift and much less familiar with Objective-C, but for the life of me, I can't figure out how they are trying to achieve the goal stated as "Build list of encodings, sorted, and including only those with human readable names."
Here is the code:
/* Return a sorted list of all available string encodings.
*/
+ (NSArray *)allAvailableStringEncodings {
static NSMutableArray *allEncodings = nil;
if (!allEncodings) { // Build list of encodings, sorted, and including only those with human readable names
const CFStringEncoding *cfEncodings = CFStringGetListOfAvailableEncodings();
CFStringEncoding *tmp;
NSInteger cnt, num = 0;
while (cfEncodings[num] != kCFStringEncodingInvalidId) num++; // Count
tmp = malloc(sizeof(CFStringEncoding) * num);
memcpy(tmp, cfEncodings, sizeof(CFStringEncoding) * num); // Copy the list
qsort(tmp, num, sizeof(CFStringEncoding), encodingCompare); // Sort it
allEncodings = [[NSMutableArray alloc] init]; // Now put it in an NSArray
for (cnt = 0; cnt < num; cnt++) {
NSStringEncoding nsEncoding = CFStringConvertEncodingToNSStringEncoding(tmp[cnt]);
if (nsEncoding && [NSString localizedNameOfStringEncoding:nsEncoding]) [allEncodings addObject:[NSNumber numberWithUnsignedInteger:nsEncoding]];
}
free(tmp);
}
return allEncodings;
}
The line in question contains the "&&." Any guidance would be appreciated.
Objective-C is a strict superset of C, so the same rules for logical
operators apply. In contrast to Swift, which is much more strict with
types, the logical operators in C take arbitrary scalar operands.
(The boolean type bool did not even exist in early versions of C,
it was added with the C99 standard.)
The C standard specifies (see e.g. http://port70.net/~nsz/c/c11/n1570.pdf, which is a draft of the C11 standard):
6.5.13 Logical AND operator
Constraints
2 Each of the operands shall have scalar type.
Semantics
3 The && operator shall yield 1 if both of its operands compare
unequal to 0; otherwise, it yields 0. The result has type int.
In your case, in
if (nsEncoding && [NSString localizedNameOfStringEncoding:nsEncoding])
the left operand has type NSUInteger (which can be unsigned long
or unsigned int, depending on the platform), and the right
operand has type NSString *, which is a pointer type. Therefore
the above expression is equivalent to
if (nsEncoding != 0 && [NSString localizedNameOfStringEncoding:nsEncoding] != 0)
where the zero in the right operand is the null pointer constant
which is usually written as NULL, or nil for Objective-C pointers:
if (nsEncoding != 0 && [NSString localizedNameOfStringEncoding:nsEncoding] != nil)
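A small plain-C sketch of that rule, with a hypothetical function `both_nonzero` standing in for the `if` condition: an unsigned integer and a pointer are both scalars, each is compared against zero, and `&&` yields the `int` value 1 or 0.

```c
#include <stddef.h>

/* Mirrors `nsEncoding && name`: the integer is tested against 0,
   the pointer against the null pointer. */
int both_nonzero(unsigned long encoding, const char *name)
{
    return encoding && name;
}
```

So `both_nonzero(4, "UTF-8")` is 1, while `both_nonzero(0, "UTF-8")` and `both_nonzero(4, NULL)` are both 0.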
Some more information how this relates to Swift
Cocoa/Cocoa Touch Objective-C methods which return an object pointer
usually return nil to indicate an error
(compare Handling Error Objects Returned From Methods
in the "Error Handling Programming Guide"). So
[NSString localizedNameOfStringEncoding:nsEncoding] == nil
would mean "no localized name for the encoding could be determined".
The Swift equivalent would be a method returning an optional string,
and you could check the success with
NSString.localizedNameOfStringEncoding(nsEncoding) != nil
However, this does not compile, and here is the reason why: If you option-click on the Objective-C localizedNameOfStringEncoding method
in Xcode to show its declaration then you'll see
+ (NSString * _Nonnull)localizedNameOfStringEncoding:(NSStringEncoding)encoding
Here _Nonnull indicates that the method is not expected to return
nil. This kind of nullability annotations were introduced to
improve the mapping of Objective-C methods to Swift, see for example
"Nullability and Objective-C" in the Swift Blog.
Because of this _Nonnull annotation, the method is imported to Swift
as
public class func localizedNameOfStringEncoding(encoding: UInt) -> String
So testing the return value in Objective-C can be done but makes no
sense because the method always returns a non-nil value.
In Swift the compiler assumes that the return value is never nil
and returns a non-optional String.
The translation of that if-statement to Swift would therefore just be
if nsEncoding != 0 {
// ...
}

Implementing path compression in a disjoint set data structure?

Here is my Objective-C implementation of a disjoint set.
- Positive number point to parent.
- Negative number indicate root & children count. (So they each start disjointed at -1.)
- The index acts as the data I am grouping.
Seems to work ok... just had a couple questions.
find: How can I compress the path? Because I am not doing it recursively, do I have to store the path and loop it again to set after find root?
join: I am basing join on children count instead of depth!? I guess that is not right. Do I need to do something special during join if depths equal?
Thanks.
DisjointSet.h
@interface DisjointSet : NSObject
{
NSMutableArray *_array;
}
- (id)initWithSize:(NSInteger)size;
- (NSInteger)find:(NSInteger)item;
- (void)join:(NSInteger)root1 root2:(NSInteger)root2;
@end
DisjointSet.m
#import "DisjointSet.h"
@implementation DisjointSet
- (id)initWithSize:(NSInteger)size
{
self = [super init];
if (self)
{
_array = [NSMutableArray arrayWithCapacity:size];
for (NSInteger i = 0; i < size; i++)
{
[_array addObject:[NSNumber numberWithInteger:-1]];
}
}
return self;
}
- (NSInteger)find:(NSInteger)item
{
while ([[_array objectAtIndex:item] integerValue] >= 0)
{
item = [[_array objectAtIndex:item] integerValue];
}
return item;
}
- (void)join:(NSInteger)root1 root2:(NSInteger)root2
{
if (root1 == root2) return;
NSInteger data1 = [[_array objectAtIndex:root1] integerValue];
NSInteger data2 = [[_array objectAtIndex:root2] integerValue];
if (data2 < data1)
{
[_array setObject:[NSNumber numberWithInteger:data2 + data1] atIndexedSubscript:root2];
[_array setObject:[NSNumber numberWithInteger:root2] atIndexedSubscript:root1];
}
else
{
[_array setObject:[NSNumber numberWithInteger:data1 + data2] atIndexedSubscript:root1];
[_array setObject:[NSNumber numberWithInteger:root1] atIndexedSubscript:root2];
}
}
@end
For the find operation, there is no need to store the path (separately from your _array) or to use recursion. Either of those approaches requires O(P) storage (P = path length). Instead, you can just traverse the path twice. The first time, you find the root. The second time, you set all of the children to point to the root. This takes O(P) time and O(1) storage.
- (NSInteger)findItem:(NSInteger)item {
NSInteger root;
NSNumber *rootObject = nil;
for (NSInteger i = item; !rootObject; ) {
NSInteger parent = [_array[i] integerValue];
if (parent < 0) {
root = i;
rootObject = @(i);
}
i = parent;
}
for (NSInteger i = item; i != root; ) {
NSInteger parent = [_array[i] integerValue];
_array[i] = rootObject;
i = parent;
}
return root;
}
For the merge operation, you want to store each root's rank (which is an upper bound on its depth), not each root's descendant count. Storing each root's rank allows you to merge the shorter tree into the taller tree, which guarantees O(log N) time for find operations. The rank only increases when the trees to be merged have equal rank.
- (void)joinItem:(NSInteger)a item:(NSInteger)b {
NSInteger aRank = -[_array[a] integerValue];
NSInteger bRank = -[_array[b] integerValue];
if (aRank < bRank) {
NSInteger t = a;
a = b;
b = t;
} else if (aRank == bRank) {
_array[a] = @(-aRank - 1);
}
_array[b] = @(a);
}
You definitely should implement path compression using recursion. I would not even think about trying to do it non-recursively.
Implementing the disjoint-set data structure should be very easy and can be done in a few lines. It's very easy to translate the pseudocode into any programming language. You can find the pseudocode on Wikipedia. (Unfortunately, I can't read Objective-C, so I cannot really judge whether your code is correct or not.)
Yes. To implement highest ancestor compression without recursion you need to maintain your own list. Make one pass up the chain to get pointers to the sets that need their parent pointers changed and also to learn the root. Then make a second pass to update the necessary parent pointers.
The recursive method is doing the same thing. The first pass is the "winding up" of the recursion, which stores the sets needing parent pointer updates on the program stack. The second pass is in reverse as the recursion unwinds.
I differ with those who say the recursive method is always best. In a reasonable number of systems (especially embedded ones), the program stack is of limited size. There are cases where many unions are performed in a row before a find. In such cases, the parent chain can be O(n) in size for n elements. Here collapsing by recursion can blow out the stack. Since you are working in Objective C, this may be iOS. I do not know the default stack size there, but if you use recursion it's worth looking at. It might be smaller than you think. This article implies 512K for secondary threads and 1Mb for the main thread.
Iterative, constant space alternative
Actually the main reason I'm writing is to point out that you still get O(log^* n) for n amortized operations -- just a shade less efficient than collapsing, and still effectively O(1) -- if you only do factor-of-two compression: in the find operation, change parent pointers so that they point to the grandparents instead of the root. This can be done with iteration in constant storage. This lecture at Princeton talks about this algorithm and implements it in a loop with 5 lines of C. See slide 29.
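A minimal sketch of that grandparent-pointer variant (often called path halving), written in plain C, which also compiles as Objective-C. It is not the code from the lecture: `find_halving` is a hypothetical name, a plain `long` array replaces the NSMutableArray of NSNumbers, and the convention is borrowed from the question (a negative entry marks a root).

```c
#include <assert.h>

/* Find the root of `item`, re-pointing each visited node at its grandparent.
   parent[i] >= 0 is the parent index; parent[i] < 0 marks a root. */
long find_halving(long *parent, long item)
{
    assert(parent != 0 && item >= 0);

    while (parent[item] >= 0) {
        long p = parent[item];
        if (parent[p] >= 0) {
            parent[item] = parent[p];   /* skip one level: point at grandparent */
        }
        item = parent[item];            /* move up (one or two levels) */
    }
    return item;
}
```

Starting from the chain 3 -> 2 -> 1 -> 0 (`parent = {-1, 0, 1, 2}`), `find_halving(parent, 3)` returns 0 and leaves `parent` as `{-1, 0, 1, 1}`, so the path from 3 is now one step shorter. No extra storage or recursion is needed.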

Recursivity in methods, Algorithm and NSValue issue

-(BOOL)isInArray:(CGPoint)point{
    if ([valid count]==0) {
        return NO;
    }
    for (NSValue *value in valid) {
        CGPoint er=[value CGPointValue];
        if( CGPointEqualToPoint(point,er)) return NO;
    }
    return YES;
}
-(void)check:(CGPoint)next{
    if (!next.y==0) {
        int ics=(int) next.x;
        int igrec=(int)next.y;
        if (mat[ics][igrec]==mat[ics-1][igrec]){
            if (![self isInArray:next]) {
                [valid addObject:[NSValue valueWithCGPoint:next]];
                NSLog(@"valid y!=0 : %@",valid);
                [self check:CGPointMake(ics-1, igrec)];
            }
        }
    }
}
y is the column and x is the row; mat is a C matrix.
What I'm trying to do here is this: I get a point, next, in a matrix, mat (I'll use a struct eventually, but for testing I use CGPoint; it's basically the same thing). For that point I check whether it's on the first row, and if it's not, I check whether its value is equal to the value in the row above. If it is, I add the coordinates of the point to an array and move to the value above (recursively). I have ifs for left, right, and below too, but the idea is the same.
My issues:
For some reason it doesn't work as it should, even with a mat full of 1 values.
The NSMutableArray I use to store the points is always null (note that the NSLog gets called, so it should already have added an object).
Does recursion work with methods?
If you have a better idea how to do this, I'm listening.
The "valid" array is nil because you haven't allocated it. (You can send an addObject: message, or any message, to a nil pointer--it just doesn't do anything.) Make sure you've got
valid = [[NSMutableArray alloc] init];
somewhere before you're calling this code.
Also, "!next.y==0" is questionable. Because ! binds more tightly than ==, it parses as "(!next.y) == 0". That happens to give the same result as "next.y != 0", but the latter says what you mean. That's all I spot for now, without really grokking what this code is trying to do.
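To make the precedence point concrete, here is a small plain-C sketch. The function names are invented for this example; it only spells out the two readings side by side so they can be compared.

```c
/* How the compiler reads "!y == 0": the ! is applied first. */
int as_written(double y)
{
    return (!y) == 0;
}

/* The clearer spelling of the same test. */
int as_intended(double y)
{
    return y != 0;
}

/* Returns 1 when both spellings give the same answer for y. */
int spellings_agree(double y)
{
    return as_written(y) == as_intended(y);
}
```

The two agree because `!y` is 1 only when `y` is zero, so comparing it with 0 is true exactly when `y` is nonzero.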
Oh, another quick note: Instead of writing your own isInArray, just use NSArray's containsObject:. The inner part of the check method (second indent) is then
NSValue* pointVal = [NSValue valueWithCGPoint:next];
if ( ![valid containsObject:pointVal] )
{
    // addObject: needs an object, so add the boxed NSValue rather than the raw CGPoint.
    [valid addObject:pointVal];
    [self check:CGPointMake(ics-1, igrec)];
}
Or, if you don't care about the order of the points in the valid array you could use an NSMutableSet instead and not worry about checking if the point is already in the collection.
And yes, recursion in methods is fine. They're really the same as C functions, just with a couple of hidden arguments (the self pointer and the selector, _cmd) and called through a dispatch function.

Objective-C switch using objects?

I'm doing some Objective-C programming that involves parsing an NSXMLDocument and populating an object's properties from the result.
First version looked like this:
if([elementName compare:@"companyName"] == 0)
    [character setCorporationName:currentElementText];
else if([elementName compare:@"corporationID"] == 0)
    [character setCorporationID:currentElementText];
else if([elementName compare:@"name"] == 0)
...
But I don't like the if-else-if-else pattern this produces. Looking at the switch statement, I see that it can only handle ints, chars, etc., and not objects. So is there a better implementation pattern I'm not aware of?
BTW I did actually come up with a better solution for setting the object's properties, but I want to know specifically about the if-else vs switch pattern in Objective-C
You should take advantage of Key-Value Coding:
[character setValue:currentElementText forKey:elementName];
If the data is untrusted, you might want to check that the key is valid:
if (![validKeysCollection containsObject:elementName])
// Exception or error
I hope you'll all forgive me for going out on a limb here, but I would like to address the more general question of parsing XML documents in Cocoa without the need of if-else statements. The question as originally stated assigns the current element text to an instance variable of the character object. As jmah pointed out, this can be solved using key-value coding. However, in a more complex XML document this might not be possible. Consider for example the following.
<xmlroot>
<corporationID>
<stockSymbol>EXAM</stockSymbol>
<uuid>31337</uuid>
</corporationID>
<companyName>Example Inc.</companyName>
</xmlroot>
There are multiple approaches to dealing with this. Off of the top of my head, I can think of two using NSXMLDocument. The first uses NSXMLElement. It is fairly straightforward and does not involve the if-else issue at all. You simply get the root element and go through its named elements one by one.
NSXMLElement* root = [xmlDocument rootElement];
// Assuming that we only have one of each element.
[character setCorporationName:[[[root elementsForName:@"companyName"] objectAtIndex:0] stringValue]];
// elementsForName: returns an NSArray, so take the first match.
NSXMLElement* corporationId = [[root elementsForName:@"corporationID"] objectAtIndex:0];
[character setCorporationStockSymbol:[[[corporationId elementsForName:@"stockSymbol"] objectAtIndex:0] stringValue]];
[character setCorporationUUID:[[[corporationId elementsForName:@"uuid"] objectAtIndex:0] stringValue]];
The next one uses the more general NSXMLNode, walks through the tree, and directly uses the if-else structure.
// The first line is the same as in the last example, because NSXMLElement inherits from NSXMLNode.
NSXMLNode* aNode = [xmlDocument rootElement];
while((aNode = [aNode nextNode])){
    if([[aNode name] isEqualToString:@"companyName"]){
        [character setCorporationName:[aNode stringValue]];
    }else if([[aNode name] isEqualToString:@"corporationID"]){
        NSXMLNode* correctParent = aNode;
        // Visit every descendant of corporationID. nextNode also returns text nodes,
        // so compare nesting levels rather than direct parents.
        while((aNode = [aNode nextNode]) != nil && [aNode level] > [correctParent level]){
            if([[aNode name] isEqualToString:@"stockSymbol"]){
                [character setCorporationStockSymbol:[aNode stringValue]];
            }else if([[aNode name] isEqualToString:@"uuid"]){
                [character setCorporationUUID:[aNode stringValue]];
            }
        }
        // Step back so the outer loop does not skip the node that ended the inner loop.
        aNode = [aNode previousNode];
    }
}
This is a good candidate for eliminating the if-else structure, but like the original problem, we can't simply use switch-case here. However, we can still eliminate if-else by using performSelector. The first step is to define a method for each element.
- (NSXMLNode*)parse_companyName:(NSXMLNode*)aNode
{
    [character setCorporationName:[aNode stringValue]];
    return aNode;
}
- (NSXMLNode*)parse_corporationID:(NSXMLNode*)aNode
{
    NSXMLNode* correctParent = aNode;
    // Visit every descendant of corporationID. nextNode also returns text nodes,
    // so compare nesting levels rather than direct parents.
    while((aNode = [aNode nextNode]) != nil && [aNode level] > [correctParent level]){
        [self invokeMethodForNode:aNode prefix:@"parse_corporationID_"];
    }
    return [aNode previousNode];
}
- (NSXMLNode*)parse_corporationID_stockSymbol:(NSXMLNode*)aNode
{
    [character setCorporationStockSymbol:[aNode stringValue]];
    return aNode;
}
- (NSXMLNode*)parse_corporationID_uuid:(NSXMLNode*)aNode
{
    [character setCorporationUUID:[aNode stringValue]];
    return aNode;
}
The magic happens in the invokeMethodForNode:prefix: method. We generate the selector based on the name of the element, and perform that selector with aNode as the only parameter. Presto bango, we've eliminated the need for an if-else statement. Here's the code for that method.
- (NSXMLNode*)invokeMethodForNode:(NSXMLNode*)aNode prefix:(NSString*)aPrefix
{
    // Default to the node itself so that nodes without a handler (for example
    // text nodes) do not hand nil back to the caller's loop.
    NSXMLNode* ret = aNode;
    NSString* methodName = [NSString stringWithFormat:@"%@%@:", aPrefix, [aNode name]];
    SEL selector = NSSelectorFromString(methodName);
    if([self respondsToSelector:selector])
        ret = [self performSelector:selector withObject:aNode];
    return ret;
}
Now, instead of our larger if-else statement (the one that differentiated between companyName and corporationID), we can simply write one line of code
NSXMLNode* aNode = [xmlDocument rootElement];
while((aNode = [aNode nextNode])){
    aNode = [self invokeMethodForNode:aNode prefix:@"parse_"];
}
Now I apologize if I got any of this wrong, it's been a while since I've written anything with NSXMLDocument, it's late at night and I didn't actually test this code. So if you see anything wrong, please leave a comment or edit this answer.
However, I believe I have just shown how properly-named selectors can be used in Cocoa to completely eliminate if-else statements in cases like this. There are a few gotchas and corner cases. The performSelector: family of methods only takes 0, 1, or 2 argument methods whose arguments and return types are objects, so if the types of the arguments and return type are not objects, or if there are more than two arguments, then you would have to use an NSInvocation to invoke it. You have to make sure that the method names you generate aren't going to call other methods, especially if the target of the call is another object, and this particular method naming scheme won't work on elements with non-alphanumeric characters. You could get around that by escaping the XML element names in your method names somehow, or by building an NSDictionary using the method names as the keys and the selectors as the values. This can get pretty memory intensive and end up taking a longer time. performSelector dispatch like I described is pretty fast. For very large if-else statements, this method may even be faster than an if-else statement.
If you want to use as little code as possible, and your element names and setters are all named so that if elementName is @"foo" then the setter is setFoo:, you could do something like:
SEL selector = NSSelectorFromString([NSString stringWithFormat:@"set%@:", [elementName capitalizedString]]);
[character performSelector:selector withObject:currentElementText];
or possibly even:
[character setValue:currentElementText forKey:elementName]; // KVC-style
Though these will of course be a bit slower than using a bunch of if statements.
[Edit: The second option was already mentioned by someone; oops!]
Dare I suggest using a macro?
#define TEST( _name, _method ) \
    if ([elementName isEqualToString:@ _name] ) \
        [character _method:currentElementText]; else
#define ENDTEST { /* empty */ }
TEST( "companyName", setCorporationName )
TEST( "corporationID", setCorporationID )
TEST( "name", setName )
:
:
ENDTEST
One way I've done this with NSStrings is by using an NSDictionary and enums. It may not be the most elegant, but I think it makes the code a little more readable. The following pseudocode is extracted from one of my projects:
typedef enum { UNKNOWNRESIDUE, DEOXYADENINE, DEOXYCYTOSINE, DEOXYGUANINE, DEOXYTHYMINE } SLSResidueType;
static NSDictionary *pdbResidueLookupTable;
...
if (pdbResidueLookupTable == nil)
{
    pdbResidueLookupTable = [[NSDictionary alloc] initWithObjectsAndKeys:
        [NSNumber numberWithInteger:DEOXYADENINE], @"DA",
        [NSNumber numberWithInteger:DEOXYCYTOSINE], @"DC",
        [NSNumber numberWithInteger:DEOXYGUANINE], @"DG",
        [NSNumber numberWithInteger:DEOXYTHYMINE], @"DT",
        nil];
}
SLSResidueType residueIdentifier = [[pdbResidueLookupTable objectForKey:residueType] intValue];
switch (residueIdentifier)
{
    case DEOXYADENINE: do something; break;
    case DEOXYCYTOSINE: do something; break;
    case DEOXYGUANINE: do something; break;
    case DEOXYTHYMINE: do something; break;
}
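The same map-then-switch shape can be sketched in plain C (which also compiles as Objective-C), with a static table standing in for the NSDictionary. All names below (`residue_type`, `residue_lookup`, `residue_base_name`) are invented for this sketch and are not from the project mentioned above. Note that in the Objective-C version a missing key yields nil, and sending intValue to nil gives 0, which is why the "unknown" member comes first in the enum; the sketch keeps that convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    RESIDUE_UNKNOWN = 0, /* also the result for keys not in the table */
    RESIDUE_DA,
    RESIDUE_DC,
    RESIDUE_DG,
    RESIDUE_DT
} residue_type;

/* String-to-enum table, playing the role of the dictionary. */
static const struct {
    const char  *code;
    residue_type type;
} residue_table[] = {
    { "DA", RESIDUE_DA },
    { "DC", RESIDUE_DC },
    { "DG", RESIDUE_DG },
    { "DT", RESIDUE_DT },
};

residue_type residue_lookup(const char *code)
{
    assert(code != NULL);
    for (size_t i = 0; i < sizeof residue_table / sizeof residue_table[0]; i++) {
        if (strcmp(residue_table[i].code, code) == 0)
            return residue_table[i].type;
    }
    return RESIDUE_UNKNOWN;
}

/* The switch now works on the enum instead of the string. */
const char *residue_base_name(const char *code)
{
    switch (residue_lookup(code)) {
    case RESIDUE_DA:      return "adenine";
    case RESIDUE_DC:      return "cytosine";
    case RESIDUE_DG:      return "guanine";
    case RESIDUE_DT:      return "thymine";
    case RESIDUE_UNKNOWN: break;
    }
    return "unknown";
}
```

Because every enum member has a case and there is no default label, a compiler with switch warnings enabled can flag a member that was added to the enum but forgotten in the switch.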
The if-else implementation you have is the right way to do this, since switch won't work with objects. Apart from maybe being a bit harder to read (which is subjective), there is no real downside in using if-else statements this way.
Although there's not necessarily a better way to do something like this for one-time use, why use compare: when you can use isEqualToString:? It would seem to be more performant, since the comparison can halt at the first non-matching character rather than going through the whole string to calculate a valid comparison result (though, come to think of it, the ordering might be clear at the same point). It also looks a little cleaner, because that call returns a BOOL.
if([elementName isEqualToString:@"companyName"] )
    [character setCorporationName:currentElementText];
else if([elementName isEqualToString:@"corporationID"] )
    [character setCorporationID:currentElementText];
else if([elementName isEqualToString:@"name"] )
There is actually a fairly simple way to deal with cascading if-else statements in a language like Objective-C. Yes, you can use subclassing and overriding, creating a group of subclasses that implement the same method differently, invoking the correct implementation at runtime using a common message. This works well if you wish to choose one of a few implementations, but it can result in a needless proliferation of subclasses if you have many small, slightly different implementations like you tend to have in long if-else or switch statements.
Instead, factor out the body of each if/else-if clause into its own method, all in the same class. Name the messages that invoke them in a similar fashion. Now create an NSArray containing the selectors of those messages (obtained using @selector()). Coerce the string you were testing in the conditionals into a selector using NSSelectorFromString() (you may need to concatenate additional words or colons to it first depending on how you named those messages, and whether or not they take arguments). Now have self perform the selector using performSelector:.
This approach has the downside that it can clutter-up the class with many new messages, but it's probably better to clutter-up a single class than the entire class hierarchy with new subclasses.
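The shape of that refactoring (one small handler per branch, selected by name) can be sketched in plain C, which also compiles as Objective-C. Here function pointers stand in for selectors and a static table stands in for the NSSelectorFromString() lookup, so this is an analogy rather than the Cocoa mechanism itself. The struct, the handler names, and `handle_element` are all invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the "character" object in the question. */
typedef struct {
    char corporation_name[64];
    char corporation_id[64];
    char name[64];
} character_t;

typedef void (*element_handler)(character_t *, const char *);

/* One small handler per former if/else-if branch. */
static void set_corporation_name(character_t *c, const char *text)
{
    snprintf(c->corporation_name, sizeof c->corporation_name, "%s", text);
}

static void set_corporation_id(character_t *c, const char *text)
{
    snprintf(c->corporation_id, sizeof c->corporation_id, "%s", text);
}

static void set_name(character_t *c, const char *text)
{
    snprintf(c->name, sizeof c->name, "%s", text);
}

/* Element name -> handler, replacing the cascade of comparisons. */
static const struct {
    const char     *element;
    element_handler handler;
} handlers[] = {
    { "companyName",   set_corporation_name },
    { "corporationID", set_corporation_id },
    { "name",          set_name },
};

/* Returns 1 if a handler ran, 0 if the element name is unknown. */
int handle_element(character_t *c, const char *element, const char *text)
{
    assert(c != NULL && element != NULL && text != NULL);
    for (size_t i = 0; i < sizeof handlers / sizeof handlers[0]; i++) {
        if (strcmp(handlers[i].element, element) == 0) {
            handlers[i].handler(c, text);
            return 1;
        }
    }
    return 0;
}
```

The return value plays the role of the respondsToSelector: check: an unknown element name is reported to the caller instead of being dispatched blindly.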
Posting this as a response to Wevah's answer above -- I would've edited, but I don't have high enough reputation yet:
Unfortunately, the first method breaks for fields with more than one word in them, like xPosition. capitalizedString will convert that to Xposition, which when combined with the format gives you setXposition:. Definitely not what was wanted here. Here is what I'm using in my code:
NSString *capName = [elementName stringByReplacingCharactersInRange:NSMakeRange(0, 1) withString:[[elementName substringToIndex:1] uppercaseString]];
SEL selector = NSSelectorFromString([NSString stringWithFormat:@"set%@:", capName]);
Not as pretty as the first method, but it works.
I have come up with a solution that uses blocks to create a switch-like structure for objects. There it goes:
BOOL switch_object(id aObject, ...)
{
va_list args;
va_start(args, aObject);
id value = nil;
BOOL matchFound = NO;
while ( (value = va_arg(args,id)) )
{
void (^block)(void) = va_arg(args,id);
if ( [aObject isEqual:value] )
{
block();
matchFound = YES;
break;
}
}
va_end(args);
return matchFound;
}
As you can see, this is an oldschool C function with variable argument list. I pass the object to be tested in the first argument, followed by the case_value-case_block pairs. (Recall that Objective-C blocks are just objects.) The while loop keeps extracting these pairs until the object value is matched or there are no cases left (see notes below).
Usage:
NSString* str = @"stuff";
switch_object(str,
    @"blah", ^{
        NSLog(@"blah");
    },
    @"foobar", ^{
        NSLog(@"foobar");
    },
    @"stuff", ^{
        NSLog(@"stuff");
    },
    @"poing", ^{
        NSLog(@"poing");
    },
    nil); // <-- sentinel
// will print "stuff"
Notes:
this is a first approximation without any error checking
the fact that the case handlers are blocks, requires additional care when it comes to visibility, scope and memory management of variables referenced from within
if you forget the sentinel, you are doomed :P
you can use the boolean return value to trigger a "default" case when none of the cases have been matched
The most common refactoring suggested for eliminating if-else or switch statements is introducing polymorphism (see http://www.refactoring.com/catalog/replaceConditionalWithPolymorphism.html). Eliminating such conditionals is most important when they are duplicated. In the case of XML parsing like your sample you are essentially moving the data to a more natural structure so that you won't have to duplicate the conditional elsewhere. In this case the if-else or switch statement is probably good enough.
In this case, I'm not sure if you can easily refactor the class to introduce polymorphism as Bradley suggests, since it's a Cocoa-native class. Instead, the Objective-C way to do it is to use a class category to add an elementNameCode method to NSString:
typedef enum {
    companyName = 0,
    companyID,
    ...,
    Unknown
} ElementCode;
@interface NSString (ElementNameCodeAdditions)
- (ElementCode)elementNameCode;
@end
@implementation NSString (ElementNameCodeAdditions)
- (ElementCode)elementNameCode {
    if([self compare:@"companyName"]==0) {
        return companyName;
    } else if([self compare:@"companyID"]==0) {
        return companyID;
    } ... {
    }
    return Unknown;
}
@end
In your code, you could now use a switch on [elementName elementNameCode] (and gain the associated compiler warnings if you forget to test for one of the enum members etc.).
As Bradley points out, this may not be worth it if the logic is only used in one place.
What we've done in our projects where we need to do this sort of thing over and over is to set up a static CFDictionary mapping the strings/objects to check against to a simple integer value. It leads to code that looks like this:
static CFDictionaryRef map = NULL;
// A compile-time constant, so the arrays below can take initializers.
enum { count = 3 };
// Under ARC the NSString keys would need __bridge casts.
const void *keys[count] = { @"key1", @"key2", @"key3" };
const void *values[count] = { (const void *)1, (const void *)2, (const void *)3 };
if (map == NULL)
    map = CFDictionaryCreate(NULL,keys,values,count,&kCFTypeDictionaryKeyCallBacks,NULL);
switch((uintptr_t)CFDictionaryGetValue(map,[node name]))
{
    case 1:
        // do something
        break;
    case 2:
        // do something else
        break;
    case 3:
        // this other thing too
        break;
}
If you're targeting Leopard only, you could use an NSMapTable instead of a CFDictionary.
Similar to Lvsti I am using blocks to perform a switching pattern on objects.
I wrote a very simple filter block based chain, that takes n filter blocks and performs each filter on the object.
Each filter can alter the object, but must return it. No matter what.
NSObject+Functional.h
#import <Foundation/Foundation.h>
typedef id(^FilterBlock)(id element, NSUInteger idx, BOOL *stop);
@interface NSObject (Functional)
-(id)processByPerformingFilterBlocks:(NSArray *)filterBlocks;
@end
NSObject+Functional.m
@implementation NSObject (Functional)
-(id)processByPerformingFilterBlocks:(NSArray *)filterBlocks
{
    __block id blockSelf = self;
    [filterBlocks enumerateObjectsUsingBlock:^( id (^block)(id,NSUInteger idx, BOOL*) , NSUInteger idx, BOOL *stop) {
        blockSelf = block(blockSelf, idx, stop);
    }];
    return blockSelf;
}
@end
Now we can set up n FilterBlocks to test for the different cases.
FilterBlock caseYES = ^id(id element, NSUInteger idx, BOOL *breakAfter){
    if ([element isEqualToString:@"YES"]) {
        NSLog(@"You did it");
        *breakAfter = YES;
    }
    return element;
};
FilterBlock caseNO = ^id(id element, NSUInteger idx, BOOL *breakAfter){
    if ([element isEqualToString:@"NO"] ) {
        NSLog(@"Nope");
        *breakAfter = YES;
    }
    return element;
};
Now we put the blocks we want to test into an array as a filter chain:
NSArray *filters = @[caseYES, caseNO];
and can perform it on an object
id obj1 = @"YES";
id obj2 = @"NO";
[obj1 processByPerformingFilterBlocks:filters];
[obj2 processByPerformingFilterBlocks:filters];
This approach can be used for switching but also for any (conditional) filter chain application, as the blocks can edit the element and pass it on.