Why don't NSSet/NSMutableSet/NSCountedSet force immutable objects as entries? - objective-c

NSDictionary keys are id<NSCopying> but the value for a set is just id, and the docs indicate their values are retained. According to the Set Fundamentals of the Collection Programming Topics docs:
You can, however, modify individual objects themselves (if they support modification).
If you modify an object, this could affect the hash value of the object, which would affect lookups. I assumed that an NSSet is meant for fast lookup?
Here's an example that shows how things break if you mutate objects:
NSMutableString *str = [NSMutableString stringWithString: @"AWESOME"];
NSCountedSet *countedSet = [[NSCountedSet alloc] init];
[countedSet addObject: str];
[countedSet addObject: str];
NSLog(@"%@", @([countedSet countForObject: @"AWESOME"]));
[str appendString: @" NOT AWESOME"];
NSLog(@"%@", @([countedSet countForObject: @"AWESOME NOT AWESOME"]));
NSLog(@"%@", @([countedSet countForObject: @"AWESOME"]));
NSLog(@"%@", @([countedSet countForObject: str]));
for(NSString *s in countedSet) {
    NSLog(@"%@ - %@", str, @([countedSet countForObject: s]));
}
NSSet *set = [NSSet setWithArray: @[ str ]];
NSLog(@"Set Contains string, %@", @([set containsObject: str]));
[str appendString: @"asdf"];
NSLog(@"Set Contains string, %@", @([set containsObject: str]));
NSLog(@"%@", set);
And output with my interpretation:
[64844:303] 2 // Count is 2
[64844:303] 0 // Count should be 2 - if it looks for the literal string
[64844:303] 0 // Count should be 0, but can't find original object either
[64844:303] 0 // Count should be 2 - asking for actual object that's in there
[64844:303] AWESOME NOT AWESOME - 0 // Should be 2 - asking for actual object that it just retrieved
[64844:303] Set Contains string, 1 // Correct, pre-mutation
[64844:303] Set Contains string, 0 // Should be true, object is in there
[65070:303] {(
"AWESOME NOT AWESOMEasdf" // see? It's in there
)}
My take:
The set likely buckets its entries by hash value. When an object's hash changes behind the set's back, the set keeps looking in the old bucket, so lookups break. The documentation is lacking in this area.
My question restated:
Docs say you can mutate objects, which is not intuitive.
Mutating objects breaks sets.
WTF?

That line from the docs is confusing. However, note that three paragraphs down it goes on to say:
If mutable objects are stored in a set, either the hash method of the
objects shouldn’t depend on the internal state of the mutable objects
or the mutable objects shouldn’t be modified while they’re in the set.
For example, a mutable dictionary can be put in a set, but you must
not change it while it is in there. (Note that it can be difficult to
know whether or not a given object is in a collection).
What your code is demonstrating is a known property of the hash-based collection classes. It can affect dictionaries, too, if a key object is implemented such that copying returns the original, which is inherently mutable.
There's no real way to test if an object is mutable. So, it can't force immutability.
Also, as alluded to in the quote above, it's possible to make a mutable class whose hash and equality are not affected by mutations.
Finally, it would too severely limit the utility of those collection classes if they could only be used with copyable classes and made copies of the elements (like dictionaries make copies of their keys). The collections are used to represent relationships, among other things, and it wouldn't do if you tried to establish a relationship between objects but instead established a relationship to a separate copy.
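To make the point about hash and equality that ignore mutable state concrete, here is a minimal sketch. The BankAccount class, its properties, and the helper function are hypothetical names made up for illustration, and the code assumes ARC. Equality and hash depend only on an identifier fixed at creation, so changing the balance cannot move the object to a different bucket.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class: the identifier never changes, the balance can.
@interface BankAccount : NSObject
@property (nonatomic, copy, readonly) NSString *identifier;
@property (nonatomic) NSInteger balance;
- (instancetype)initWithIdentifier:(NSString *)identifier;
@end

@implementation BankAccount
- (instancetype)initWithIdentifier:(NSString *)identifier {
    if ((self = [super init])) {
        _identifier = [identifier copy];
    }
    return self;
}

// Equality and hash use only the immutable identifier.
- (BOOL)isEqual:(id)other {
    if (other == self) return YES;
    if (![other isKindOfClass:[BankAccount class]]) return NO;
    return [self.identifier isEqualToString:((BankAccount *)other).identifier];
}

- (NSUInteger)hash {
    return self.identifier.hash;
}
@end

// Returns YES if the set still finds the account after it was mutated.
static BOOL AccountSurvivesMutation(void) {
    BankAccount *account = [[BankAccount alloc] initWithIdentifier:@"acct-1"];
    NSSet *set = [NSSet setWithObject:account];
    account.balance = 100;  // does not affect -hash or -isEqual:
    return [set containsObject:account];
}
```

Calling AccountSurvivesMutation() should return YES, because the mutation never touches the state that the set relies on.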

Since the only reliable way of ensuring an object's immutability in Objective-C is to make a copy, Cocoa designers had two choices:
Make NSSet copy the objects - That would be safe, but it would severely restrict the use of NSSet due to increased memory usage.
Use retained objects - That would keep memory usage to a bare minimum, but it would give users a way to shoot themselves in the foot by mutating an object inside an NSSet.
The designers picked the second approach over the first, because the danger it introduces can be avoided by proper coding technique. In contrast, selecting the first approach would be "binding" on everybody, in the sense that inserting a new object would always make a copy.
Currently, users have a choice of inserting copies of objects that they create manually, thus emulating the first approach. However, an implementation that forces a copy cannot emulate an implementation that retains objects, making it a less flexible choice.
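As a rough sketch of emulating the copying approach by hand, the question's counted-set example can insert copies instead of the mutable original. The function name is hypothetical and the code assumes ARC (under manual reference counting each copy would also need a release).

```objc
#import <Foundation/Foundation.h>

// Inserts immutable snapshots of the string, then mutates the original.
// Returns the count the set reports for the original value.
static NSUInteger CountAfterMutatingOriginal(void) {
    NSMutableString *str = [NSMutableString stringWithString:@"AWESOME"];
    NSCountedSet *countedSet = [[NSCountedSet alloc] init];
    [countedSet addObject:[str copy]];
    [countedSet addObject:[str copy]];
    [str appendString:@" NOT AWESOME"];  // the set's entries are unaffected
    return [countedSet countForObject:@"AWESOME"];
}
```

Because the set only ever holds the copies, the later appendString: call cannot change the hash of anything stored in it, and the count for @"AWESOME" stays at 2.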

Related

Converting NSArray to NSSet, custom class instances transfer inconsistently

Ran into an interesting little problem. I was writing a method to filter an array to the unique objects:
- (NSArray*)distinctObjectsByAddress {
NSSet* uniqueSet = [NSSet setWithArray:self];
NSArray* retArray = [uniqueSet allObjects];
return retArray;
}
and wrote a unit test to check:
- (void)testDistinctObjectsByAddress5 {
    Person* adam1 = [[Person alloc] initWithFirstName:@"adam" lastName:@"adam" andParent:nil];
    Person* adam2 = [[Person alloc] initWithFirstName:@"adam" lastName:@"adam" andParent:nil];
    testPersonArray = [NSArray arrayWithObjects:adam1,adam2, nil];
    NSArray* checkArray = [testPersonArray distinctObjectsByAddress];
    STAssertEquals([checkArray count], [testPersonArray count], @"Array %@ counts should match %@ %@",checkArray,adam1,adam2);
}
Pretty simple. The interesting part is that about 80-90% of the time the test passes, and every so often it fails because the distinctObjectsByAddress method returns only one object. I've been able to trace it to the [NSSet setWithArray:self] call, but I've also been able to verify that the two Person objects are two different objects (at least they have different addresses). I'm assuming that setWithArray: is just doing a basic address compare, but I don't understand why it sometimes produces two objects like it should and sometimes only one.
Something I just tried was changing adam2 so that the first and last name were not exactly the same as adam1. This seems to fix the error. Does this point to some sort of compiler optimization when the objects are logically the same?
I'm assuming that setWithArray is just doing a basic address compare
That's incorrect. NSSet uses the -isEqual: and -hash methods on the objects that are added to it. It depends on how those are implemented in Person or its superclasses.
If [person1 isEqual:person2] then you would expect the set to contain one object. If not, then the set should contain two objects.
My guess is that Person does not follow the rules in its -isEqual: and -hash methods. Most likely, the two objects are equal, but their hashes are not equal like they should be. (Except for the 10-20% of the time that you're getting lucky.)
Does this point to some sort of compiler optimization when the objects are logically the same?
No, there is no compiler optimization that would merge the two objects into one.
Most likely you did not implement hash for Person, and sometimes the identical Person object hashes into two different buckets.
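For reference, a consistent -isEqual: / -hash pair might look roughly like the sketch below. This Person class is a simplified stand-in, not the asker's actual class: the two-argument initializer, the properties, and the helper function are assumptions for illustration, names are assumed to be non-nil, and the code assumes ARC. The rule it follows is that objects that compare equal must return the same hash.

```objc
#import <Foundation/Foundation.h>

// Simplified, hypothetical Person with value-based equality.
@interface Person : NSObject
@property (nonatomic, copy, readonly) NSString *firstName;
@property (nonatomic, copy, readonly) NSString *lastName;
- (instancetype)initWithFirstName:(NSString *)firstName
                         lastName:(NSString *)lastName;
@end

@implementation Person
- (instancetype)initWithFirstName:(NSString *)firstName
                         lastName:(NSString *)lastName {
    if ((self = [super init])) {
        _firstName = [firstName copy];
        _lastName = [lastName copy];
    }
    return self;
}

// Two people are equal when both names match (names assumed non-nil).
- (BOOL)isEqual:(id)other {
    if (other == self) return YES;
    if (![other isKindOfClass:[Person class]]) return NO;
    Person *person = (Person *)other;
    return [self.firstName isEqualToString:person.firstName]
        && [self.lastName isEqualToString:person.lastName];
}

// The hash is built from exactly the fields that -isEqual: compares.
- (NSUInteger)hash {
    return self.firstName.hash * 31 + self.lastName.hash;
}
@end

// Builds a set from two equal but distinct instances and returns its count.
static NSUInteger DistinctCountForEqualPeople(void) {
    Person *adam1 = [[Person alloc] initWithFirstName:@"adam" lastName:@"adam"];
    Person *adam2 = [[Person alloc] initWithFirstName:@"adam" lastName:@"adam"];
    return [[NSSet setWithArray:@[adam1, adam2]] count];
}
```

With this pair the result is stable: the two equal instances always collapse to a single set entry. If distinctness by address is what is wanted, the class should leave -isEqual: and -hash at their NSObject defaults instead.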

NSMutableArray vs NSArray which is better

This is a bit of a silly question, but if I want to add an object to an array I can do it with both NSMutableArray and NSArray, which should I use?
NSMutableArray * array1;
[array1 addObject:obj];
NSArray * array2;
array2 = [array2 arrayByAddingObject:obj];
Use NSMutableArray; that is what it is there for. If I were looking at code and saw NSArray, I would expect its collection to stay constant forever, whereas if I see NSMutableArray I know that the collection is destined to change.
It might not sound like much right now, but as your project grows and as you spend more time on it you will see the value of this eventually.
NSMutableArray is not threadsafe, while NSArray is. This could be a huge problem if you're multithreading.
NSMutableArray and NSArray are both built on CFArray, so performance and complexity should be the same. The access time for a value in the array is guaranteed to be at
worst O(lg N) for any implementation, current and future, but will
often be O(1) (constant time). Linear search operations similarly
have a worst case complexity of O(N*lg N), though typically the
bounds will be tighter, and so on. Insertion or deletion operations
will typically be linear in the number of values in the array, but
may be O(N*lg N) clearly in the worst case in some implementations.
When deciding which is best to use:
NSMutableArray is primarily used for when you are building collections and you want to modify them. Think of it as dynamic.
NSArray is used for read-only information and is either:
used to populate an NSMutableArray, to perform modifications
used to temporarily store data that is not meant to be edited
What you are actually doing here:
NSArray * array2;
array2 = [array2 arrayByAddingObject:obj];
is creating a new NSArray and changing the pointer to refer to the new array; the original array is not modified.
Under manual reference counting this leaks memory if you own the old array, because nothing releases it before the pointer is reassigned. (Under ARC the compiler inserts the release for you.)
If you still want to do this under manual reference counting, you will need to clean up like the following:
NSArray *oldArray;
NSArray *newArray;
newArray = [oldArray arrayByAddingObject:obj];
[oldArray release];
But the best practice is to do the following:
NSMutableArray *mutableArray;
// Initialisation etc
[mutableArray addObject:obj];
An NSArray object manages an immutable array—that is, after you have created the array, you cannot add, remove, or replace objects. You can, however, modify individual elements themselves (if they support modification). The mutability of the collection does not affect the mutability of the objects inside the collection. You should use an immutable array if the array rarely changes, or changes wholesale.
An NSMutableArray object manages a mutable array, which allows the addition and deletion of entries, allocating memory as needed. For example, given an NSMutableArray object that contains just a single dog object, you can add another dog, or a cat, or any other object. You can also, as with an NSArray object, change the dog’s name—and in general, anything that you can do with an NSArray object you can do with an NSMutableArray object. You should use a mutable array if the array changes incrementally or is very large—as large collections take more time to initialize.
Even though the Q and the answers are very old, someone has to correct them.
What does "better" mean? Better at what? Your Q lacks information about what the problem is, and it is highly opinion-based. However, it is not closed.
If you are talking about performance, you can measure it yourself. But remember Donald Knuth: "Premature optimization is the root of all evil".
If I take your Q seriously, "better" can mean runtime performance, memory footprint, or architecture. For the first two topics it is easy to check yourself. So no answer is needed.
From an architectural point of view, things become more complicated.
First of all, I have to mention that having an instance of NSArray does not mean that it is immutable. This is because in Cocoa the mutable variants of collections are subclasses of the immutable variants. Therefore an instance of NSMutableArray is an instance of NSArray, but obviously mutable.
One can say that this was not a good idea, especially when thinking about Barbara and Jeannette (Liskov and Wing, of the substitution principle), and there is a relation to the circle-ellipse problem, which is not easy to solve. However, it is as it is.
So only the docs can give you the information, whether a returned instance is immutable or not. Or you do a runtime check. For this reason, some people always do a -copy on every mutable collection.
However, mutability is another root of all evil. Therefore: If it is possible, always create an instance of NSArray as final result. Write that in your docs, if you return that instance from a method (esp. getter) or not, so everyone can rely on immutability or not. This prevents unexpected changes "behind the scene". This is important, not 0.000000000003 sec runtime or 130 bytes of memory.
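As one possible way to follow that advice, a class can keep its mutable array private and hand out an immutable copy from the getter. The Playlist class and its methods below are hypothetical, and the sketch assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class that mutates internally but only exposes NSArray.
@interface Playlist : NSObject
- (void)addTrack:(NSString *)track;
- (NSArray<NSString *> *)tracks;  // documented as an immutable snapshot
@end

@implementation Playlist {
    NSMutableArray<NSString *> *_tracks;
}

- (instancetype)init {
    if ((self = [super init])) {
        _tracks = [NSMutableArray array];
    }
    return self;
}

- (void)addTrack:(NSString *)track {
    [_tracks addObject:track];
}

// Returns a copy, so later changes to _tracks are invisible to the caller.
- (NSArray<NSString *> *)tracks {
    return [_tracks copy];
}
@end

// Takes a snapshot, mutates the playlist afterwards, and returns the
// number of elements the snapshot still reports.
static NSUInteger SnapshotCountAfterLaterAdd(void) {
    Playlist *playlist = [[Playlist alloc] init];
    [playlist addTrack:@"one"];
    NSArray<NSString *> *snapshot = [playlist tracks];
    [playlist addTrack:@"two"];
    return snapshot.count;
}
```

The snapshot keeps its single element even though the playlist grew afterwards, which is the "no changes behind the scenes" guarantee the getter's documentation can then promise. The cost is one copy per call.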
This test gives the best answer:
Method 1:
NSTimeInterval start = [NSDate timeIntervalSinceReferenceDate];
NSMutableArray *mutableItems = [[NSMutableArray alloc] initWithCapacity:1000];
for (int i = 0; i < 10000; i++) {
    [mutableItems addObject:[NSDate date]];
}
NSTimeInterval end = [NSDate timeIntervalSinceReferenceDate];
NSLog(@"elapsed time = %g", (end - start) * 1000.0);
Method 2:
...
NSArray *items = [[[NSArray alloc] init] autorelease];
for (int i = 0; i < 10000; i++) {
    items = [items arrayByAddingObject:[NSDate date]];
}
...
Output:
Method 1: elapsed time = 0.011135 seconds.
Method 2: elapsed time = 9.712520 seconds.

When does -copy return a mutable object?

I read in Cocoa and Objective C: Up and Running that -copy will always return an immutable object and -mutableCopy will always return a mutable object:
It’s important to know that calling -copy on a mutable object returns an immutable
version. If you want to copy a mutable object and maintain mutability in the new version,
you must call -mutableCopy on the original. This is useful, though, because if you want
to “freeze” a mutable object, you can just call -copy on it.
So I have something like this:
NSMutableURLRequest *req = [[NSMutableURLRequest alloc] init];
NSLog( @"%@", [req className] ); // NSMutableURLRequest
NSLog( @"%@", [[req copy] className] ); // NSMutableURLRequest
NSLog( @"%@", [[req mutableCopy] className] ); // NSMutableURLRequest
According to this previous answer:
You cannot depend on the result of copy to be mutable! Copying an NSMutableArray may
return an NSMutableArray, since that's the original class, but copying any arbitrary
NSArray instance would not.
This seems to be somewhat isolated to NSURLRequest, since NSArray acts as intended:
NSArray *arr = [[NSMutableArray alloc] init];
NSLog( @"%@", [arr className] ); // __NSArrayM
NSLog( @"%@", [[arr copy] className] ); // __NSArrayI
NSLog( @"%@", [[arr mutableCopy] className] ); // __NSArrayM
So...
When does -copy return an immutable object (as expected) and when does it return a mutable object?
How do I achieve the intended effect of getting a "frozen" copy of a mutable object that refuses to be "frozen"?
I think you've uncovered a great rift between documentation and reality.
The NSCopying protocol documentation claims:
The copy returned is immutable if the consideration “immutable vs. mutable” applies to the receiving object; otherwise the exact nature of the copy is determined by the class.
But this is clearly wrong in some cases, as you've shown in your examples (and I've sent feedback to them about this via that documentation page).
But(#2) in my opinion, it doesn't actually matter and you shouldn't care.
The point of -copy is that it will return an object you can use with the guarantee that it will behave independently of the original. This means if you have a mutable object, -copy it, and change the original object, the copy will not see the effect. (In some cases, I think this means that -copy can be optimized to do nothing, because if the object is immutable it can't be changed in the first place. I may be wrong about this. (I'm now wondering what the implications are for dictionary keys because of this, but that's a separate topic...))
As you've seen, in some cases the new object may actually be of a mutable class (even if the documentation tells us it won't). But as long as you don't rely on it being mutable (why would you?), it doesn't matter.
What should you do? Always treat the result of -copy as immutable, simple as that.
1) When does -copy return an immutable object (as expected) and when does it return a mutable object?
you should always treat it as the immutable variant. the mutable interface of the returned type should not be used. apart from optimizations, the answer should not matter and should be considered an implementation detail unless documented.
the obvious case: for a number of reasons, objc class clusters and class designs can be complex. returning a mutable copy could simply be for convenience.
2) How do I achieve the intended effect of getting a "frozen" copy of a mutable object that refuses to be "frozen"?
using the copy constructor of the immutable class is a good way (similar to St3fan's answer). like copy, it's not a guarantee.
the only reason i can think of as to why you would want to enforce this behaviour is for performance or to enforce a restricted interface (unless it's academic). if you want performance or a restricted interface, then you can simply encapsulate an instance of the type which copies on creation and exposes only the immutable interface. then you implement copy via retain (if that's your intent).
alternatively, you can write your own subclass and implement your own variant of copy.
final resort: many of the cocoa mutable/immutable classes are purely interface - you could write your own subclass if you need to ensure a particular behaviour -- but that's quite unusual.
perhaps a better description of why this should be enforced would be good - the existing implementations work just fine for the vast majority of developers/uses.
Bear in mind that there is not one copy implementation -- each class implements its own. And, as we all know, the implementation of the Objective C runtime is a little "loosey goosey" in spots. So I think we can say that mostly copy returns an immutable version, but some exceptions exist.
(BTW, what does this do:
NSArray *arr = [[NSMutable array] init];
?)
The best way to turn an object into a mutable one is to use the mutable 'constructor'. For example:
NSArray* array = ...;
NSMutableArray* mutableArray = [NSMutableArray arrayWithArray: array];
Copy is used to make a copy of an object, not to change its mutability.

Detect changes in NSArray in ObjC

I need to detect changes in an NSArray object, that is, whether some object was added to or removed from the NSArray, or was just edited in place. Are there built-in NSArray hash functions for this task, or do I need to write my own hashing function for NSArray? Maybe someone has a different solution? Any ideas?
All objects have a -hash method but not all objects have a good implementation.
NSArray's documentation doesn't define its result, but testing reveals it returns the length of the array, which is not very useful:
NSLog(@"%lu", @[@"foo"].hash); // output: 1
NSLog(@"%lu", @[@"foo", @"bar"].hash); // output: 2
NSLog(@"%lu", @[@"hello", @"world"].hash); // output: 2
If performance isn't critical, and if the array contains <NSCoding> objects then you can simply serialise the array to NSData which has a good -hash implementation:
[NSArchiver archivedDataWithRootObject:@[@"foo"]].hash // 194519622
[NSArchiver archivedDataWithRootObject:@[@"foo", @"bar"]].hash // 123459814
[NSArchiver archivedDataWithRootObject:@[@"hello", @"world"]].hash // 215474591
For better performance there should be an answer somewhere explaining how to write your own -hash method. Basically call -hash on every object in the array (assuming the array contains objects that can be hashed reliably) and combine each together mixed in with some simple randomising math.
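The "combine each element's hash" idea could be sketched roughly as follows. The function name and the multiplier 31 are arbitrary choices for illustration, not a documented Cocoa recipe. It only detects in-place edits if each element's own -hash reflects its contents, and equal results never prove that nothing changed, since different arrays can collide.

```objc
#import <Foundation/Foundation.h>

// Folds the hash of every element into one order-sensitive value.
// NSUInteger arithmetic wraps on overflow, which is fine for hashing.
static NSUInteger CombinedArrayHash(NSArray *array) {
    NSUInteger result = array.count;
    for (id object in array) {
        result = result * 31 + [object hash];
    }
    return result;
}
```

A caller would store the value returned for the array at one point in time and compare it with a freshly computed value later; a different value means the contents changed.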
You could use an NSArrayController, which is Key-Value-Observing compliant. Unfortunately NSArray is only KVC compliant. This way you can easily monitor the array controller's arrangedObjects property. This should solve your problem.
Also, see this question: Key-Value-Observing a to-many relationship in Cocoa
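Without AppKit, a similar effect can be approximated with plain Key-Value Observing on a to-many property, as long as every mutation goes through the KVC proxy returned by mutableArrayValueForKey:. The Inventory and ChangeCounter classes below are hypothetical, and the sketch assumes ARC. Changes made directly to the underlying array, or to the state of the objects inside it, are not reported this way.

```objc
#import <Foundation/Foundation.h>

// Hypothetical model object that owns the array being watched.
@interface Inventory : NSObject
@property (nonatomic, strong) NSMutableArray *items;
@end

@implementation Inventory
- (instancetype)init {
    if ((self = [super init])) {
        _items = [NSMutableArray array];
    }
    return self;
}
@end

// Hypothetical observer that just counts the notifications it receives.
@interface ChangeCounter : NSObject
@property (nonatomic) NSUInteger changeCount;
@end

@implementation ChangeCounter
- (void)observeValueForKeyPath:(NSString *)keyPath
                      ofObject:(id)object
                        change:(NSDictionary *)change
                       context:(void *)context {
    self.changeCount += 1;
}
@end

// Mutates the array through the KVC proxy and returns how many
// change notifications the observer saw.
static NSUInteger CountObservedChanges(void) {
    Inventory *inventory = [[Inventory alloc] init];
    ChangeCounter *counter = [[ChangeCounter alloc] init];
    [inventory addObserver:counter forKeyPath:@"items" options:0 context:NULL];

    NSMutableArray *proxy = [inventory mutableArrayValueForKey:@"items"];
    [proxy addObject:@"a"];
    [proxy removeObjectAtIndex:0];

    [inventory removeObserver:counter forKeyPath:@"items"];
    return counter.changeCount;
}
```

The observer must be removed before either object is deallocated, which is why the function unregisters it before returning.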

Cocoa: Testing to find if an NSString is immutable or mutable?

This produces an immutable string object:
NSString* myStringA = @"A"; //CORRECTED FROM: NSMutableString* myStringA = @"A";
This produces a mutable string object:
NSMutableString* myStringB = [NSMutableString stringWithString:@"B"];
But both objects are reported as the same kind of object, "NSCFString":
NSLog(@"myStringA is type: %@, myStringB is type: %@",
      [myStringA class], [myStringB class]);
So what is distinguishing these objects internally, and how do I test for that, so that I can easily determine if a mystery string variable is immutable or mutable before doing something evil to it?
The docs include a fairly long explanation on why Apple doesn't want you to do this and why they explicitly do not support it in Receiving Mutable Objects. The summary is:
So don’t make a decision on object
mutability based on what introspection
tells you about an object. Treat
objects as mutable or not based on
what you are handed at the API
boundaries (that is, based on the
return type). If you need to
unambiguously mark an object as
mutable or immutable when you pass it
to clients, pass that information as a
flag along with the object.
I find their NSView example the easiest to understand, and it illustrates a basic Cocoa problem. You have an NSMutableArray called "elements" that you want to expose as an array, but don't want callers to mess with. You have several options:
1) Expose your NSMutableArray as an NSArray.
2) Always make a non-mutable copy when requested.
3) Store elements as an NSArray and create a new array every time it mutates.
I've done all of these at various points. #1 is by far the simplest and fastest solution. It's also dangerous, since the array might mutate behind the caller's back. But Apple indicates it's what they do in some cases (note the warning for -subviews in NSView). I can confirm that while #2 and #3 are much safer, they can create major performance problems, which is probably why Apple has chosen not to use them on oft-accessed members like -subviews.
The upshot of all of this is that if you use #1, then introspection will mislead you. You have an NSMutableArray cast as an NSArray, and introspection will indicate that it's mutable (introspection has no way to know otherwise). But you must not mutate it. Only the compile-time type check can tell you that, and so it's the only thing you can trust.
The fix for this would be some kind of fast copy-on-write immutable version of a mutable data structure. That way #2 could possibly be done with decent performance. I can imagine changes to the NSArray cluster that would allow this, but it doesn't exist in Cocoa today (and could impact NSArray performance in the normal case, making it a non-starter). Even if we had it, there's probably too much code out there that relies on the current behavior to ever allow mutability introspection to be trusted.
There's no (documented) way to determine if a string is mutable at runtime or not.
You would expect one of the following would work, but none of them work:
[[s class] isKindOfClass:[NSMutableString class]]; // always returns false
[s isMemberOfClass:[NSMutableString class]]; // always returns false
[s respondsToSelector:@selector(appendString:)]; // always returns true
More info here, although it doesn't help you with the problem:
http://www.cocoabuilder.com/archive/cocoa/111173-mutability.html
If you want to check for debugging purposes, the following code should work. It relies on the fact that copying an immutable object typically returns the object itself, while copying a mutable object produces a true copy. Note that since it calls copy it's slow, but that should be fine for debugging. If you'd like to check for any reason other than debugging, see Rob's answer (and forget about it).
BOOL isMutable(id object)
{
id copy = [object copy];
BOOL copyIsADifferentObject = (copy != object);
[copy release];
return copyIsADifferentObject;
}
Disclaimer: of course there is no guarantee that copy is equivalent to retain for immutable types. You can be sure that if isMutable returns NO then the object is not mutable, so the function should probably be named canBeMutable. In the real world, however, it's a pretty safe assumption that immutable types (NSString, NSArray) will implement this optimization. There is a lot of code out there, including basic things like NSDictionary, that expects fast copies from immutable types.