Say I have an object called Person which has the property socialSecurityNumber, and this class overrides the isEqual: method to return true when the social security number properties are equal. And say I've put a bunch of instances of Person into an NSDictionary.
If I now instantiate a newPerson object which happens to have the same social security number as one already in the dictionary, and I do [myDictionary objectForKey:newPerson], will it use isEqual: and find the match, or will it compare pointers and find nothing?
I know I can write a simple test to find out, but I want to understand how exactly objectForKey: finds a match in a dictionary, and generally how consistent this is across Cocoa (i.e. does NSArray's indexOfObject: work the same?)
NSDictionary works like a hash table, so it uses both -hash and -isEqual: to find the entry in the dictionary corresponding to the given key.
So to answer your question for NSDictionary: it uses isEqual:, not pointer comparison. But you must also implement hash in addition to isEqual: on your Person class for this to work.
From the NSDictionary Class Reference documentation:
A key-value pair within a dictionary is called an entry. Each entry consists of one object that represents the key and a second object that is that key’s value. Within a dictionary, the keys are unique. That is, no two keys in a single dictionary are equal (as determined by isEqual:).
From the isEqual: method documentation:
If two objects are equal, they must have the same hash value. This last point is particularly important if you define isEqual: in a subclass and intend to put instances of that subclass into a collection. Make sure you also define hash in your subclass.
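As a minimal sketch of what that means for the Person class from the question (assuming ARC and an immutable object; everything other than socialSecurityNumber is a hypothetical name chosen for illustration), both methods can be derived from the same property so that they always agree:

```objc
#import <Foundation/Foundation.h>

// Hypothetical Person class; only socialSecurityNumber comes from the question.
@interface Person : NSObject <NSCopying>
@property (nonatomic, copy, readonly) NSString *socialSecurityNumber;
- (instancetype)initWithSocialSecurityNumber:(NSString *)ssn;
@end

@implementation Person

- (instancetype)initWithSocialSecurityNumber:(NSString *)ssn {
    if ((self = [super init])) {
        _socialSecurityNumber = [ssn copy];
    }
    return self;
}

// Dictionary keys must conform to NSCopying. The object is immutable,
// so returning self is sufficient under ARC.
- (id)copyWithZone:(NSZone *)zone {
    return self;
}

// Two people are equal when their social security numbers match.
- (BOOL)isEqual:(id)object {
    if (object == self) return YES;
    if (![object isKindOfClass:[Person class]]) return NO;
    return [self.socialSecurityNumber
            isEqualToString:[(Person *)object socialSecurityNumber]];
}

// Derived from the same property as -isEqual:, so equal objects
// always return equal hashes.
- (NSUInteger)hash {
    return [self.socialSecurityNumber hash];
}

@end

// Usage: a distinct instance with the same number finds the stored entry.
static NSString *LookupDemo(void) {
    Person *stored = [[Person alloc] initWithSocialSecurityNumber:@"123-45-6789"];
    Person *probe = [[Person alloc] initWithSocialSecurityNumber:@"123-45-6789"];
    NSDictionary *directory = @{ stored : @"Alice" };
    return [directory objectForKey:probe];
}
```

With both overrides in place, the same probe object should also be found by indexOfObject: and containsObject: on an array holding the stored instance.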
This behavior is consistent across the various container classes in Cocoa. For example, from the documentation for NSArray's indexOfObject: method:
Starting at index 0, each element of the array is sent an isEqual: message until a match is found or the end of the array is reached. This method passes the anObject parameter to each isEqual: message. Objects are considered equal if isEqual: (declared in the NSObject protocol) returns YES.
You should always read the documentation: as the extracts quoted above show, these kinds of details are often explained in the "Discussion" or "Special Considerations" sections of the method documentation or in the "Overview" section of the class documentation itself.
how consistent this is across Cocoa (i.e. does NSArray's indexOfObject: work the same?)
It is consistent and at the same time it isn't. What I mean is that there are two methods that could be used: isEqual: and hash. You should not be too concerned about which is used when. What you should focus on instead is respecting the NSObject protocol requirements and making sure that if two objects are equal according to isEqual: they also have the same hash.
From the isEqual documentation in the NSObject Protocol Reference
If two objects are equal, they must have the same hash value. This
last point is particularly important if you define isEqual: in a
subclass and intend to put instances of that subclass into a
collection. Make sure you also define hash in your subclass.
Related
I'm trying to implement a class for use as a key in an NSDictionary. The docs say that in order to be used as a key the object needs to implement the NSCopying protocol, which I've done.
I'm seeing some very strange behaviour, where values seem to mysteriously become nil even though I can see the objects being stored correctly in the dictionary.
I've implemented copyWithZone: and isEqual: correctly as far as I can see but it's still not working.
What the documentation does not make clear is that to use an object as the key in an NSDictionary it must override BOTH the isEqual: and hash methods, as well as implementing NSCopying.
The contract for isEqual: and hash is that if isEqual: returns YES for 2 objects then their hash methods MUST return the same value. It's okay for 2 objects that are NOT equal to have the same hash but if they ARE equal then they MUST have the same hash.
Failing to correctly override hash will lead to all sorts of hard to debug issues when you try reading and writing from the dictionary.
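A minimal sketch of a key class that satisfies all three requirements might look like this. GridKey and its row/column fields are hypothetical names invented for the example, and the multiplier 31 is just one common way to mix two fields into a hash:

```objc
#import <Foundation/Foundation.h>

// Hypothetical two-field key class, for illustration only.
@interface GridKey : NSObject <NSCopying>
@property (nonatomic, readonly) NSInteger row;
@property (nonatomic, readonly) NSInteger column;
- (instancetype)initWithRow:(NSInteger)row column:(NSInteger)column;
@end

@implementation GridKey

- (instancetype)initWithRow:(NSInteger)row column:(NSInteger)column {
    if ((self = [super init])) {
        _row = row;
        _column = column;
    }
    return self;
}

// NSCopying: the dictionary stores a copy of every key.
- (id)copyWithZone:(NSZone *)zone {
    return [[[self class] allocWithZone:zone] initWithRow:self.row
                                                   column:self.column];
}

// Equality is defined by the two fields.
- (BOOL)isEqual:(id)object {
    if (object == self) return YES;
    if (![object isKindOfClass:[GridKey class]]) return NO;
    GridKey *other = object;
    return self.row == other.row && self.column == other.column;
}

// Uses exactly the fields that -isEqual: compares, so equal keys hash alike.
- (NSUInteger)hash {
    return (NSUInteger)self.row * 31u + (NSUInteger)self.column;
}

@end
```

Because the dictionary looks up the copy it stored, the copy must compare equal to the original and return the same hash; deriving both from row and column guarantees that here.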
The Apple guide for isEqual says:
Returns a Boolean value that indicates whether the receiver and a
given object are equal. (required)
This method defines what it means for instances to be equal. For
example, a container object might define two containers as equal if
their corresponding objects all respond YES to an isEqual: request.
See the NSData, NSDictionary, NSArray, and NSString class
specifications for examples of the use of this method.
If two objects are equal, they must have the same hash value. This
last point is particularly important if you define isEqual: in a
subclass and intend to put instances of that subclass into a
collection. Make sure you also define hash in your subclass.
So my question is: if I want to compare two UIButtons or two UILabels (two UIViews) using isEqual:, having first checked that they are instances of the same class, what is actually being checked? Are the properties, values, action messages and target objects compared?
Thanks
The isEqual: method of NSObject checks whether the two objects are the same instance, i.e. it compares their addresses. The default hash is likewise derived from the address of the instance if it isn't overridden. However, simple data container classes override isEqual:; for example, the isEqual: method of NSString invokes isEqualToString: after checking that the object being compared to is an NSString instance. The same applies to NSData, NSNumber, NSDate, NSArray and NSDictionary. UIView (and all its superclasses), on the other hand, doesn't override isEqual:, as there's no obvious way to decide whether two views should be considered equal. You'd do better to compare another, more significant property of the views being examined.
No, isEqual: does a simple check of the pointers' memory addresses to see whether they are the same object. You'd have to use some other method to check whether two separate button instances have the same title.
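The difference between the inherited behaviour and an overriding class can be illustrated with plain Foundation objects. This is only a sketch: the two helper functions are hypothetical names, and NSObject stands in for a class such as UIView that does not override isEqual:.

```objc
#import <Foundation/Foundation.h>

// A class that inherits -isEqual: from NSObject only matches itself.
static BOOL DefaultEqualityIsIdentity(void) {
    NSObject *first = [[NSObject alloc] init];
    NSObject *second = [[NSObject alloc] init];
    return [first isEqual:first] && ![first isEqual:second];
}

// NSString overrides -isEqual:, so two separate instances with the
// same characters compare equal even though the pointers differ.
static BOOL StringEqualityComparesContents(void) {
    NSString *a = [NSMutableString stringWithString:@"OK"];
    NSString *b = [NSMutableString stringWithString:@"OK"];
    return a != b && [a isEqual:b];
}
```

For two buttons, the equivalent of the second case has to be written by hand, for example by comparing their titles with isEqualToString:.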
Let's say variable A and B hold instances of managed objects in the same managed object context. I need to make sure that they are associated with the same "record" in the persistent store. The section on Faulting and Uniquing in the Core Data Programming Guide says that:
Core Data ensures that—in a given managed object context—an entry in a persistent store is associated with only one managed object.
From this, it seems that a pointer comparison is sufficient for my purpose. Or does it ever make sense to use isEqual: to compare managed objects in the same context?
Use == to determine whether two pointers point to the same object. Use -isEqual: to determine whether two objects are "equal", where the notion of equality depends on the objects being compared. -isEqual: has to agree with -hash: objects that compare equal must return the same hash value. I wrote previously that it seemed possible that -isEqual: might return true if two managed objects contain the same values. That's clearly not right. There are some caveats in the docs about making sure that the hash value for a mutable object doesn't change while it's in a collection, and that knowing whether a given object is in a collection can be difficult. It seems certain that the hash for a managed object doesn't depend on the data that the object contains, and much more likely that it's connected to something immutable about the object; the object's -objectID value seems a likely candidate.
Given all that, I'm changing my opinion ;-). Each record is only represented once in a given context, so == is probably safe, but -isEqual: seems to better express your intention.
Pointer comparison is fine for objects retrieved from a single managed object context, the documentation on uniquing you quote promises as much.
ObjectID should be used for testing object equality across managed object contexts.
isEqual does not do attribute tests, because it is documented to not fault the object. In fact, looking at the disassembled function it is definitely just a pointer compare.
So the semantics of the equality test for managed objects are simply "points to the same object (record) in the managed object context" and will compare false for objects in different contexts.
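The two rules above can be sketched as a pair of helper functions. The function names are hypothetical, and the second one assumes both objects already have permanent object IDs (i.e. they have been saved):

```objc
#import <CoreData/CoreData.h>

// Within a single context, uniquing guarantees at most one managed object
// per record, so comparing pointers is enough.
static BOOL SameRecordInOneContext(NSManagedObject *a, NSManagedObject *b) {
    return a == b;
}

// Across contexts the instances differ, so compare the object IDs instead.
// Assumes neither object still carries a temporary ID.
static BOOL SameRecordAcrossContexts(NSManagedObject *a, NSManagedObject *b) {
    return [a.objectID isEqual:b.objectID];
}
```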
Warning: Since NSManagedObject isEqual compares objectIDs, a comparison can fail if one instance is using the temporary objectID and the other instance is using the permanent objectID.
Background: When an NSManagedObject is created, it is assigned a temporary objectID. It is converted into a permanent objectID when the NSManagedObject is actually persisted into the store. You can see the difference if you print the objectID:
x-coredata:///MyEntity/t03BF9735-A005-4ED9-96BA-462BD65FA25F118 (temporary ID)
x-coredata://EB8922D9-DC06-4256-A21B-DFFD47D7E6DA/MyEntity/p3 (permanent ID)
When an objectID is converted to permanent, instances of the NSManagedObject in other threads and collections are not updated. So if you put an NSManagedObject into an NSArray while it has a temporary objectID, methods like containsObject: will fail if you try to find the object with the permanent objectID. Remember that containsObject: uses isEqual:.
Finally, a couple of useful methods are -[NSManagedObjectID isTemporaryID] and -[NSManagedObjectContext obtainPermanentIDsForObjects:error:].
I tried to figure out this code referencing: Cocoa: Dictionary with enum keys?
    + (NSValue *)valueWithReference:(id)target
    {
        return [NSValue valueWithBytes:&target objCType:@encode(id*)];
    }
And,
[table setObject:anObject forKey:[NSValue valueWithReference:keyObject]];
But something about it doesn't feel right. Any recommendations?
You're absolutely right, it's not good.
For one, you're encoding the wrong type (it should be @encode(id), not @encode(id*)), but in most cases this shouldn't cause a big problem.
The bigger problem is that this completely ignores memory management. The object won't be retained or copied. If some other code releases it, it could just disappear, and then your dictionary key will be a boxed pointer to garbage or even a completely different object. This is basically the world's most advanced dangling pointer.
You have two good options:
1. You could either add NSCopying to the class or create a copyable subclass.
   - This option will only work for objects that can meaningfully be copied. This is most classes, but not necessarily all (e.g. it might be bad to have multiple objects representing the same input stream).
   - Implementing copying can be a pain even for classes where it makes sense: not difficult, per se, but kind of annoying.
2. You could instead create the dictionary with the CFDictionary API. Since Core Foundation types don't have a generic copy function, CFDictionary just retains its keys by default (though you can customize its behavior however you like). But CFDictionary is also toll-free bridged with NSDictionary, which means that you can just cast a CFDictionaryRef to an NSDictionary* (or NSMutableDictionary*) and then treat it like any other NSDictionary.
   - This means that the object you're using as a key must not change (at least not in a way that affects its hash value) while it's in the dictionary. Ensuring this doesn't happen is why NSDictionary normally wants to copy its keys.
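The CFDictionary option might be sketched as follows, assuming ARC. The two function names are hypothetical; the sketch inserts through CFDictionarySetValue so that the retain-only key callbacks are the ones in effect, rather than relying on how setObject:forKey: behaves on a bridged dictionary:

```objc
#import <Foundation/Foundation.h>

// Creates a mutable dictionary whose keys are retained rather than copied.
// kCFTypeDictionaryKeyCallBacks retains keys and compares them with
// CFEqual/CFHash, which call -isEqual: and -hash on Objective-C objects.
static NSMutableDictionary *MakeRetainingKeyDictionary(void) {
    CFMutableDictionaryRef table =
        CFDictionaryCreateMutable(kCFAllocatorDefault, 0,
                                  &kCFTypeDictionaryKeyCallBacks,
                                  &kCFTypeDictionaryValueCallBacks);
    return CFBridgingRelease(table); // hand ownership over to ARC
}

// Stores a value under a key that does not need to conform to NSCopying.
static void SetObjectForRetainedKey(NSMutableDictionary *table, id key, id object) {
    CFDictionarySetValue((__bridge CFMutableDictionaryRef)table,
                         (__bridge const void *)key,
                         (__bridge const void *)object);
}
```

Reading back with objectForKey: works as for any other dictionary, since lookups never copy the key.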
For future reference.
Now I know that there are some more options.
1. Implement the NSCopying protocol and return self instead of making a copy (retain it if you are not using ARC). Also make sure the object always returns the same value from its -hash method.
2. Make a simple copyable container class that holds a strong reference to the original key object. The container is copyable, but it just passes along the original key when it is copied. Override the equality and hash methods as well to match these semantics. Even an NSArray instance containing only the key object works well.
Method #1 looks pretty safe, but I'm not actually sure that it is, because I don't know the internal behavior of NSDictionary. So I usually use method #2, which is completely safe under Cocoa conventions.
Update
NSHashTable and NSMapTable are now also available on iOS, since version 6.0.
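As a rough sketch of the NSMapTable route (the function name is hypothetical), a table can be configured to hold its keys strongly without copying them and to match them by pointer identity instead of isEqual: and hash:

```objc
#import <Foundation/Foundation.h>

// Keys are held strongly (not copied) and compared by address, so any
// object can serve as a key, whether or not it conforms to NSCopying.
static NSMapTable *MakeIdentityKeyedTable(void) {
    NSPointerFunctionsOptions keyOptions =
        NSPointerFunctionsStrongMemory | NSPointerFunctionsObjectPointerPersonality;
    return [NSMapTable mapTableWithKeyOptions:keyOptions
                                 valueOptions:NSPointerFunctionsStrongMemory];
}
```

If value-based matching is wanted instead, [NSMapTable strongToStrongObjectsMapTable] keeps the no-copy behaviour but uses isEqual: and hash on the keys.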
I'm not 100% sure about the correctness of this solution, but I'm posting it just in case.
If you do not want to use a CFDictionary, maybe you could use this simple category:
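    @implementation NSMutableDictionary (NonCopyableKeys)

    // Keys are boxed as raw pointers: they are neither retained nor copied,
    // and they are matched by address rather than by -isEqual:.
    - (void)setObject:(id)anObject forNonCopyableKey:(id)aKey {
        [self setObject:anObject forKey:[NSValue valueWithPointer:(__bridge const void *)aKey]];
    }

    - (id)objectForNonCopyableKey:(id)aKey {
        return [self objectForKey:[NSValue valueWithPointer:(__bridge const void *)aKey]];
    }

    - (void)removeObjectForNonCopyableKey:(id)aKey {
        [self removeObjectForKey:[NSValue valueWithPointer:(__bridge const void *)aKey]];
    }

    @end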
This is a generalization of a similar method I saw online (can't find the original source) for using an NSMutableDictionary that can store objects with UITouch keys.
The same restriction as in Chuck's answer applies: the object you're using as a key must not change in a way that affects its hash value and must not be freed while it's in the dictionary.
Also make sure you don't mix -(void)setObject:(id)anObject forNonCopyableKey:(id)aKey and - (id)objectForKey:(id)aKey methods, as it won't work (the latter will return nil).
This seems to work fine, but there might be some unwanted side effects that I am not thinking of. If anybody finds out that this solution has any additional problems or caveats, please comment.
Note: The following SO questions are related, but neither they nor the linked resources seem to fully answer my questions, particularly in relation to implementing equality tests for collections of objects.
Best practices for overriding -isEqual: and -hash
Techniques for implementing -hash on mutable Cocoa objects
Background
NSObject provides default implementations of -hash (which returns the address of the instance, like (NSUInteger)self) and -isEqual: (which returns NO unless the addresses of the receiver and the parameter are identical). These methods are designed to be overridden as necessary, but the documentation makes it clear that you should provide both or neither. Further, if -isEqual: returns YES for two objects, then the result of -hash for those objects must be the same. If not, problems can ensue when objects that should be the same — such as two string instances for which -compare: returns NSOrderedSame — are added to a Cocoa collection or compared directly.
Context
I develop CHDataStructures.framework, an open-source library of Objective-C data structures. I have implemented a number of collections, and am currently refining and enhancing their functionality. One of the features I want to add is the ability to compare collections for equality with another.
Rather than comparing only memory addresses, these comparisons should consider the objects present in the two collections (including ordering, if applicable). This approach has quite a precedent in Cocoa, and generally uses a separate method, including the following:
-[NSArray isEqualToArray:]
-[NSDate isEqualToDate:]
-[NSDictionary isEqualToDictionary:]
-[NSNumber isEqualToNumber:]
-[NSSet isEqualToSet:]
-[NSString isEqualToString:]
-[NSValue isEqualToValue:]
I want to make my custom collections robust to tests of equality, so they may safely (and predictably) be added to other collections, and allow others (like an NSSet) to determine whether two collections are equal/equivalent/duplicates.
Problems
An -isEqualTo...: method works great on its own, but classes which define these methods usually also override -isEqual: to invoke [self isEqualTo...:] if the parameter is of the same class (or perhaps subclass) as the receiver, or [super isEqual:] otherwise. This means the class must also define -hash such that it will return the same value for disparate instances that have the same contents.
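The pattern described above might look roughly like this. SimpleStack is a hypothetical class invented for the illustration (it is not part of CHDataStructures), and its -hash simply returns the element count, which is enough to stay consistent with -isEqual::

```objc
#import <Foundation/Foundation.h>

// Hypothetical collection class used only to illustrate the pattern.
@interface SimpleStack : NSObject
- (void)push:(id)object;
- (BOOL)isEqualToStack:(SimpleStack *)other;
@end

@implementation SimpleStack {
    NSMutableArray *_items;
}

- (instancetype)init {
    if ((self = [super init])) {
        _items = [NSMutableArray array];
    }
    return self;
}

- (void)push:(id)object {
    [_items addObject:object];
}

// Typed comparison: same elements in the same order.
- (BOOL)isEqualToStack:(SimpleStack *)other {
    if (other == nil) return NO;
    return [_items isEqualToArray:other->_items];
}

// Generic comparison: forward to the typed method for stacks,
// fall back to the inherited identity check otherwise.
- (BOOL)isEqual:(id)object {
    if ([object isKindOfClass:[SimpleStack class]]) {
        return [self isEqualToStack:object];
    }
    return [super isEqual:object];
}

// Equal stacks have equal counts, so this agrees with -isEqual:.
- (NSUInteger)hash {
    return [_items count];
}

@end
```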
In addition, Apple's documentation for -hash stipulates the following: (emphasis mine)
"If a mutable object is added to a collection that uses hash values to determine the object's position in the collection, the value returned by the hash method of the object must not change while the object is in the collection. Therefore, either the hash method must not rely on any of the object's internal state information or you must make sure the object's internal state information does not change while the object is in the collection. Thus, for example, a mutable dictionary can be put in a hash table but you must not change it while it is in there. (Note that it can be difficult to know whether or not a given object is in a collection.)"
Edit: I definitely understand why this is necessary and totally agree with the reasoning — I mentioned it here to provide additional context, and skirted the topic of why it's the case for the sake of brevity.
All of my collections are mutable, and the hash will have to consider at least some of the contents, so the only option here is to consider it a programming error to mutate a collection stored in another collection. (My collections all adopt NSCopying, so collections like NSDictionary can successfully make a copy to use as a key, etc.)
It makes sense for me to implement -isEqual: and -hash, since (for example) an indirect user of one of my classes may not know the specific -isEqualTo...: method to call, or even care whether two objects are instances of the same class. They should be able to call -isEqual: or -hash on any variable of type id and get the expected result.
Unlike -isEqual: (which has access to two instances being compared), -hash must return a result "blindly", with access only to the data within a particular instance. Since it can't know what the hash is being used for, the result must be consistent for all possible instances that should be considered equal/identical, and must always agree with -isEqual:. (Edit: This has been debunked by the answers below, and it certainly makes life easier.) Further, writing good hash functions is non-trivial — guaranteeing uniqueness is a challenge, especially when you only have an NSUInteger (32/64 bits) in which to represent it.
Questions
Are there best practices when implementing equality comparisons and -hash for collections?
Are there any peculiarities to plan for in Objective-C and Cocoa-esque collections?
Are there any good approaches for unit testing -hash with a reasonable degree of confidence?
Any suggestions on implementing -hash to agree with -isEqual: for collections containing elements of arbitrary types? What pitfalls should I know about? (Edit: Not as problematic as I first thought; as @kperryua points out, "equal -hash values do not imply -isEqual:".)
Edit: I should have clarified that I'm not confused about how to implement -isEqual: or -isEqualTo...: for collections, that's straightforward. I think my confusion stemmed mainly from (mistakenly) thinking that -hash MUST return a different value if -isEqual: returns NO. Having done cryptography in the past, I was thinking that hashes for different values MUST be different. However, the answers below made me realize that a "good" hash function is really about minimizing bucket collisions and chaining for collections that use -hash. While unique hashes are preferable, they are not a strict requirement.
I think trying to come up with some generally useful hash function that will generate unique hash values for collections is an exercise in futility. U62's suggestion of combining the hashes of all the contents will not scale well, as it makes the hash function O(n). Hash functions should really be O(1) to ensure good performance, otherwise the purpose of the hash is defeated. (Consider the common Cocoa construct of plists, which are dictionaries containing arrays and other dictionaries, potentially ad nauseam. Attempting to take the hash of the top-level dictionary of a large plist would be excruciatingly slow if the collections' hash functions were O(n).)
My suggestion would be not to worry a great deal about a collection's hash. As you stated, -isEqual: implies equal -hash values. On the other hand, equal -hash values do not imply -isEqual:. That fact gives you a lot of leeway to create a simple hash.
If you're really worried about collisions though (and you have proof in concrete measurements of real-world situations that confirm it is something to be worried about), you could still follow U62's advice to some degree. For example, you could take the hash of, say, the first and/or last element in the collection, and combine that with, say, the -count of the collection. That should be enough to provide a decent hash.
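A constant-time hash along those lines could be sketched like this for an ordered collection backed by an NSArray. The function name and the multiplier 31 are assumptions made for the example, not anything prescribed by Cocoa:

```objc
#import <Foundation/Foundation.h>

// O(1) hash for an ordered collection: mixes the count with the hashes of
// the first and last elements. Messaging nil returns 0, so an empty
// collection hashes to 0.
static NSUInteger CheapCollectionHash(NSArray *items) {
    NSUInteger hash = [items count];
    hash = hash * 31u + [[items firstObject] hash];
    hash = hash * 31u + [[items lastObject] hash];
    return hash;
}
```

Two collections that are equal element by element necessarily have the same count and the same first and last elements, so this stays consistent with -isEqual: as long as the elements themselves follow the rules.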
I hope that answers at least one of your questions.
As for No. 1: Implementing -isEqual: is pretty cut and dried. You enumerate the contents, and check isEqual: on each of the elements.
There is one thing to be careful of that may affect what you decide to do for your collections' -hash functions. Clients of your collections must also understand the rules governing -isEqual: and -hash. If you use the contents' -hash in your collection's -hash, your collection will break if the contents' isEqual: and -hash don't agree. It's the client's fault, of course, but that's another argument against basing your -hash off of the collection's contents.
No. 2 is kind of vague. Not sure what you have in mind there.
Two collections should be considered equal if they contain the same elements, and further if the collections are ordered, that the elements are in the same order.
On the subject of hashes for collections, it should be enough to combine the hashes of the elements in some way (XOR them or add them modulo some value). Note that while the rules state that two objects that are equal according to -isEqual: need to return the same hash, the opposite does not hold: although uniqueness of hashes is desirable, it is not necessary for correctness of the solution. Thus the hash of an ordered collection need not take account of the order of the elements.
The excerpt from the Apple documentation is a necessary restriction, by the way. An object could not maintain the same hash value under mutation while also ensuring that objects with the same value have the same hash. That applies to the simplest of objects as well as to collections. Of course, it usually only matters that an object's hash changes when it is inside a container that uses the hash to organise its elements. The upshot of all this is that mutable collections shouldn't mutate when placed inside another container, but then neither should any object that has a true hash function.
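The XOR variant can be sketched as a small helper; the function name is hypothetical. Because XOR is commutative, the result does not depend on element order, so it is valid for ordered and unordered collections alike:

```objc
#import <Foundation/Foundation.h>

// Folds the hashes of all elements together with XOR. This is O(n) in the
// size of the collection and ignores the order of the elements.
static NSUInteger OrderIndependentHash(id<NSFastEnumeration> elements) {
    NSUInteger hash = 0;
    for (id element in elements) {
        hash ^= [element hash];
    }
    return hash;
}
```

One trade-off of plain XOR is that pairs of equal elements cancel each other out, so collections with many duplicates will collide more often; that affects performance only, not correctness.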
I have done some investigation into the default hash implementation of NSArray and NSMutableArray, and (unless I have misunderstood something) it seems that Apple does not follow its own rules:
If a mutable object is added to a collection that uses hash values to
determine the object's position in the collection, the value returned
by the hash method of the object must not change while the object is
in the collection. Therefore, either the hash method must not rely on
any of the object's internal state information or you must make sure
the object's internal state information does not change while the
object is in the collection. Thus, for example, a mutable dictionary
can be put in a hash table but you must not change it while it is in
there. (Note that it can be difficult to know whether or not a given
object is in a collection.)
Here is my test code
    NSMutableArray *myMutableArray = [NSMutableArray arrayWithObjects:@"a", @"b", @"c", nil];
    NSMutableArray *containerForMutableArray = [NSMutableArray arrayWithObject:myMutableArray];
    // Hash of the inner array before and after removing one element.
    NSUInteger hashBeforeMutation = [[containerForMutableArray objectAtIndex:0] hash];
    [[containerForMutableArray objectAtIndex:0] removeObjectAtIndex:1];
    NSUInteger hashAfterMutation = [[containerForMutableArray objectAtIndex:0] hash];
    NSLog(@"Hash Before: %lu", (unsigned long)hashBeforeMutation);
    NSLog(@"Hash After : %lu", (unsigned long)hashAfterMutation);
The output is:
Hash Before: 3
Hash After : 2
So it seems that the default implementation of the hash method on both NSArray and NSMutableArray is the count of the array, and it doesn't care whether it's inside a collection or not.