Converting NSArray to NSSet, custom class instances transfer inconsistently - objective-c

Ran into an interesting little problem. I was writing a method to filter an array to the unique objects:
- (NSArray*)distinctObjectsByAddress {
NSSet* uniqueSet = [NSSet setWithArray:self];
NSArray* retArray = [uniqueSet allObjects];
return retArray;
}
and wrote a unit test to check:
- (void)testDistinctObjectsByAddress5 {
Person* adam1 = [[Person alloc] initWithFirstName:@"adam" lastName:@"adam" andParent:nil];
Person* adam2 = [[Person alloc] initWithFirstName:@"adam" lastName:@"adam" andParent:nil];
testPersonArray = [NSArray arrayWithObjects:adam1,adam2, nil];
NSArray* checkArray = [testPersonArray distinctObjectsByAddress];
STAssertEquals([checkArray count], [testPersonArray count], @"Array %@ counts should match %@ %@",checkArray,adam1,adam2);
}
Pretty simple. The interesting part is that about 80-90% of the time the test passes, and every so often it fails because the distinctObjectsByAddress method returns only one object. I've been able to trace it to the [NSSet setWithArray:self] call, but I've also been able to verify that the two Person objects are two different objects (at least, they have different addresses). I'm assuming that setWithArray: is just doing a basic address compare, but I don't understand why it sometimes produces two objects like it should and sometimes only one.
Something I just tried was changing adam2 so that the first and last name were not exactly the same as adam1. This seems to fix the error. Does this point to some sort of compiler optimization when the objects are logically the same?

I'm assuming that setWithArray is just doing a basic address compare
That's incorrect. NSSet uses the -isEqual: and -hash methods on the objects that are added to it. It depends on how those are implemented in Person or its superclasses.
If [person1 isEqual:person2] then you would expect the set to contain one object. If not, then the set should contain two objects.
My guess is that Person does not follow the rules in its -isEqual: and -hash methods. Most likely, the two objects are equal, but their hashes are not equal like they should be. (Except for the 10-20% of the time that you're getting lucky.)
Does this point to some sort of compiler optimization when the objects are logically the same?
No, there is no compiler optimization that would merge the two objects into one.

Most likely you did not implement -hash for Person. Two Person objects that compare equal then usually hash into different buckets, and only occasionally land in the same one, where the set treats them as duplicates.
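To illustrate the rule the answers describe, here is a minimal sketch of a Person class whose -isEqual: and -hash are derived from the same fields. The class is a hypothetical stand-in for the one in the question (the real one is not shown, and the andParent: argument is left out), and it assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical, simplified stand-in for the Person class in the question.
@interface Person : NSObject
@property (nonatomic, copy, readonly) NSString *firstName;
@property (nonatomic, copy, readonly) NSString *lastName;
- (instancetype)initWithFirstName:(NSString *)firstName lastName:(NSString *)lastName;
@end

@implementation Person

- (instancetype)initWithFirstName:(NSString *)firstName lastName:(NSString *)lastName {
    if ((self = [super init])) {
        _firstName = [firstName copy];
        _lastName = [lastName copy];
    }
    return self;
}

// Two people are equal when both names match.
- (BOOL)isEqual:(id)other {
    if (other == self) return YES;
    if (![other isKindOfClass:[Person class]]) return NO;
    Person *p = other;
    return [self.firstName isEqualToString:p.firstName]
        && [self.lastName isEqualToString:p.lastName];
}

// Built from the same fields as -isEqual:, so equal objects always share a hash.
- (NSUInteger)hash {
    return self.firstName.hash ^ (self.lastName.hash * 31);
}

@end
```

With this pairing, a set built from two "adam adam" objects consistently contains one element. If the two objects are meant to count as distinct, the alternative is to override neither method, since NSObject's defaults compare by identity.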

Related

Return a key : value pair from a method for use in NSDictionary

I understand I can return an NSDictionary by doing
- (NSDictionary *)keyWithValue {
return @{@"key" : @"value"};
}
but how can I return that without the enclosing @{} dictionary?
There are no tuples in Objective-C, unlike in Swift, Python, etc. So the common way to return two different objects is to return an array or a dictionary.
You also can try something like:
- (NSString *)keyWithValue:(NSString**)value {
*value = @"value";
return @"key";
}
It should be used in the following way:
NSString *v;
NSString *k = [self keyWithValue:&v];
// now v contains @"value"
Objective-C, like C before it, doesn't allow the return of multiple values from a method. (Essentially, although a method or function can accept any number of arguments as input, it can only have a single return value.) There are historical and implementation reasons for this design but it can be frustrating when you simply have a pair/tuple to return.
If you have a method that has two distinct "results" that you need to return to the caller, you have a few choices. The very simplest in your case is to do something like what you are doing here and "wrapping" the values in a dictionary. You could similarly wrap them in a two-value array (which is a little less good since it relies on an implicit contract between caller and callee that there will be exactly two items in the array).
However, a clean and fairly standard approach here would be to create a small class with only two properties on it, and create, fill in, and return that instance with your pair of values. This arguably uses less runtime overhead than a collection object, and has the nice benefit of being semantically explicit and easy to understand for anyone else looking at it.
(There is yet another way, which involves passing pointers as arguments that are "outparams", but that's only idiomatic in rare circumstances in ObjC and I wouldn't recommend it here.)
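The small-class approach described above could look roughly like the following sketch. KeyValuePair, pairWithKey:value:, and MakeKeyWithValue are hypothetical names chosen for illustration, and the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical value class holding exactly one key and one value.
@interface KeyValuePair : NSObject
@property (nonatomic, copy, readonly) NSString *key;
@property (nonatomic, strong, readonly) id value;
+ (instancetype)pairWithKey:(NSString *)key value:(id)value;
@end

@implementation KeyValuePair
+ (instancetype)pairWithKey:(NSString *)key value:(id)value {
    KeyValuePair *pair = [[self alloc] init];
    pair->_key = [key copy];   // copy so a mutable string passed in cannot change later
    pair->_value = value;
    return pair;
}
@end

// Stand-in for the question's method, written as a plain function for brevity.
static KeyValuePair *MakeKeyWithValue(void) {
    return [KeyValuePair pairWithKey:@"key" value:@"value"];
}
```

The caller can then read pair.key and pair.value directly, or store the pair in an existing mutable dictionary with dict[pair.key] = pair.value.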
There is no way to return a key value pair without a dictionary because that is the definition of the dictionary data structure. From apple docs:
The NSDictionary class declares the programmatic interface to objects that manage immutable associations of keys and values
You access the value with
[myDictionary objectForKey:@"myKey"];
If you want to use the returned key-value pair in another dictionary:
NSMutableDictionary *otherDict = [[NSMutableDictionary alloc] init];
[otherDict setObject:[myDictionary objectForKey:@"myKey"] forKey:@"myKey"];

Why don't NSSet/NSMutableSet/NSCountedSet force immutable objects as entries?

NSDictionary keys are id<NSCopying> but the value for a set is just id, and the docs indicate their values are retained. According to the Set Fundamentals of the Collection Programming Topics docs:
You can, however, modify individual objects themselves (if they support modification).
If you modify an object, this could change its hash value, which would affect lookups. I assumed that an NSSet is a fast lookup?
Here's an example that shows how things break if you mutate objects:
NSMutableString *str = [NSMutableString stringWithString: @"AWESOME"];
NSCountedSet *countedSet = [[NSCountedSet alloc] init];
[countedSet addObject: str];
[countedSet addObject: str];
NSLog(@"%@", @([countedSet countForObject: @"AWESOME"]));
[str appendString: @" NOT AWESOME"];
NSLog(@"%@", @([countedSet countForObject: @"AWESOME NOT AWESOME"]));
NSLog(@"%@", @([countedSet countForObject: @"AWESOME"]));
NSLog(@"%@", @([countedSet countForObject: str]));
for(NSString *s in countedSet) {
NSLog(@"%@ - %@", str, @([countedSet countForObject: s]));
}
NSSet *set = [NSSet setWithArray: @[ str ]];
NSLog(@"Set Contains string, %@", @([set containsObject: str]));
[str appendString: @"asdf"];
NSLog(@"Set Contains string, %@", @([set containsObject: str]));
NSLog(@"%@", set);
And output with my interpretation:
[64844:303] 2 // Count is 2
[64844:303] 0 // Count should be 2 - if it looks for the literal string
[64844:303] 0 // Count should be 0, but can't find original object either
[64844:303] 0 // Count should be 2 - asking for actual object that's in there
[64844:303] AWESOME NOT AWESOME - 0 // Should be 2 - asking for actual object that it just retrieved
[64844:303] Set Contains string, 1 // Correct, pre-mutation
[64844:303] Set Contains string, 0 // Should be true, object is in there
[65070:303] {(
"AWESOME NOT AWESOMEasdf" // see? It's in there
)}
My take:
The set likely buckets based on hash value; when the hash changes behind the set's back, it doesn't know what to do and lookups break. The documentation is lacking in this area.
My question restated:
Docs say you can mutate objects, which is not intuitive.
Mutating objects breaks sets.
WTF?
That line from the docs is confusing. However, note that three paragraphs down it goes on to say:
If mutable objects are stored in a set, either the hash method of the
objects shouldn’t depend on the internal state of the mutable objects
or the mutable objects shouldn’t be modified while they’re in the set.
For example, a mutable dictionary can be put in a set, but you must
not change it while it is in there. (Note that it can be difficult to
know whether or not a given object is in a collection).
What your code is demonstrating is a known property of the hash-based collection classes. It can affect dictionaries, too, if a key object is implemented such that copying returns the original, which is inherently mutable.
There's no real way to test if an object is mutable. So, it can't force immutability.
Also, as alluded to in the quote above, it's possible to make a mutable class whose hash and equality are not affected by mutations.
Finally, it would too severely limit the utility of those collection classes if they could only be used with copyable classes and made copies of the elements (like dictionaries make copies of their keys). The collections are used to represent relationships, among other things, and it wouldn't do if you tried to establish a relationship between objects but instead established a relationship to a separate copy.
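As a sketch of the point that a mutable class can keep its hash and equality independent of its mutable state, consider the hypothetical Tag class below (assuming ARC). Its identity comes from an identifier that never changes, so changing displayName while the object sits in a set does not disturb lookups.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class: equality and hash depend only on the immutable identifier.
@interface Tag : NSObject
@property (nonatomic, copy, readonly) NSString *identifier;
@property (nonatomic, copy) NSString *displayName;  // safe to mutate inside a set
- (instancetype)initWithIdentifier:(NSString *)identifier;
@end

@implementation Tag

- (instancetype)initWithIdentifier:(NSString *)identifier {
    if ((self = [super init])) {
        _identifier = [identifier copy];
    }
    return self;
}

- (BOOL)isEqual:(id)other {
    if (![other isKindOfClass:[Tag class]]) return NO;
    return [self.identifier isEqualToString:((Tag *)other).identifier];
}

- (NSUInteger)hash {
    return self.identifier.hash;
}

@end
```

The trade-off is that two Tag objects with the same identifier but different display names count as the same set member, which may or may not be what a given design wants.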
Since the only reliable way of ensuring an object's immutability in Objective-C is to make a copy, Cocoa designers had two choices:
Make NSSet copy the objects - That would be safe, but it would severely restrict the use of NSSet due to increased memory usage.
Use retained objects - That would keep memory usage to a bare minimum, but it would give users a way to shoot themselves in the foot by mutating an object inside NSSet.
Designers picked the second approach over the first one, because the danger it introduces can be avoided by proper coding technique. In contrast, selecting the first approach would be "binding" on everybody, in the sense that inserting a new object would always make a copy.
Currently, users have a choice of inserting copies of objects that they create manually, thus emulating the first approach. However, an implementation that forces a copy cannot emulate an implementation that retains objects, making it a less flexible choice.
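Emulating the copying approach by hand might look like the sketch below. AddSnapshot is a hypothetical helper name; the only behavior relied on is that -copy on an NSMutableString returns an immutable string.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: stores an immutable snapshot of `string` in `set`,
// so later changes to `string` cannot alter the hash of the stored member.
static void AddSnapshot(NSMutableSet *set, NSMutableString *string) {
    [set addObject:[string copy]];
}
```

After AddSnapshot(set, str), appending to str leaves the set's member untouched, so the set can still find the original text. The cost is one extra object per insertion, which is the memory trade-off described above.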

Why does NSMutableOrderedSet add objects whose data is the same?

I'm creating objects and adding them to a set using -[NSMutableOrderedSet addObject:], but I discovered that only duplicates of the objects themselves are checked for (presumably by the object pointer's address), and that it's possible to add multiple objects that have identical content.
For example:
SomeObject* object = [[SomeObject alloc] initWithStuff:stuff];
SomeObject* object2 = [[SomeObject alloc] initWithStuff:stuff];
[set addObject:object];
[set addObject:object];
[set addObject:object];
[set addObject:object2];
The count will be 2.
This makes me wonder what the point of these classes is? Under what circumstances might one have an object and not know if the object itself had already been added to a collection, rather than the data contained within the object?
What's the easiest way (or what class should I use) to ensure the set only contains one of each object based on content?
You are looking in the right place, but you are forgetting a small detail: how could the NSMutableOrderedSet class know which instances of SomeObject contain the same values?
The answer is simple: you must provide your own implementations of
- (BOOL)isEqual:(id)anObject
- (NSUInteger)hash
so that isEqual: returns YES for instances with the same internal values, and two instances with the same data have the same hash.
Apart from this sets are rather useful because they give you better complexity on checking if an instance is contained in a set or not, and you can quickly do many logical operations on them, like intersection, union, difference and whatever.
If it is a custom object you have, you'd have to implement your own isEqual: and hash method to check for equality and prevent duplicates in the set.

Comparing NSSets by a single property

I'm trying to determine if two NSSets are "equal" but not in the sense of isEqualToSet. Items in the two sets are the same class but are not the same object, or even references to the same object. They will have one property that is the same though - let's call it 'name'.
Is my best bet in comparing these two sets to do a simple set count test, then a more complex objectsPassingTest: on each item in one set, making sure an item with the same name is in the other set? I'm hoping that something simpler exists to handle this case.
I had the same problem, but I needed to compare multiple properties at the same time (class User with properties Name and Id).
I resolved this by adding a method returning an NSDictionary with the properties needed to the class:
- (NSDictionary *)itemProperties
{
NSMutableDictionary *dict = [[NSMutableDictionary alloc] init];
[dict setObject:self.name forKey:@"name"];
[dict setObject:self.id forKey:@"id"];
return dict;
}
and then using valueForKey: as Kevin Ballard mentioned:
BOOL userSetsEqual = [[userSet1 valueForKey:@"itemProperties"]
isEqualToSet:[userSet2 valueForKey:@"itemProperties"]];
... where userSet1 and userSet2 were the NSSets that contained User objects.
You could just call valueForKey: on both sets and compare the results.
if ([[set1 valueForKey:@"name"] isEqualToSet:[set2 valueForKey:@"name"]]) {
// the sets match your criteria
}
Looking through the documentation, it seems that there is no way to really handle this special case of yours. You're going to have to write some custom code to handle this. Personally, I would recommend using -sortedArrayUsingDescriptors: and then comparing the arrays, but that's just me. You could also go enumerate through one set, then narrow down the other using -filteredSetUsingPredicate: and get its count.
Whichever method you use, consider the fact that it's probably not going to be super efficient. This might be unavoidable, but there are probably ways to go about it that are better than others. Food for thought.
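Combining the count check from the question with the valueForKey: comparison suggested above gives a helper along these lines. SetsMatchByKey is a hypothetical name, and the sketch assumes every member responds to the given key.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when both sets have the same number of members
// and produce the same set of values for `key`.
static BOOL SetsMatchByKey(NSSet *a, NSSet *b, NSString *key) {
    if (a.count != b.count) return NO;
    // -[NSSet valueForKey:] returns a set of the members' values, so
    // members that share a value collapse into a single entry.
    NSSet *valuesA = [a valueForKey:key];
    NSSet *valuesB = [b valueForKey:key];
    return [valuesA isEqualToSet:valuesB];
}
```

Because duplicate values collapse, this check is only exact when the property is unique within each set. If two members of one set can share a name, an NSCountedSet of the values would be a stricter comparison.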

Creating an NSArray initialized with count N, all of the same object

I want to create an NSArray with objects of the same value (say NSNumber all initialized to 1) but the count is based on another variable. There doesn't seem to be a way to do this with any of the initializers for NSArray except for the one that takes a C-style array.
Any idea if there is a short way to do this?
This is what I am looking for:
NSArray *array = [[NSArray alloc] initWithObject:[NSNumber numberWithInt:0]
count:anIntVariable];
NSNumber is just one example here, it could essentially be any NSObject.
The tightest code I've been able to write for this is:
id numbers[n];
for (int x = 0; x < n; ++x)
numbers[x] = [NSNumber numberWithInt:0];
id array = [NSArray arrayWithObjects:numbers count:n];
This works because you can create runtime length determined C-arrays with C99 which Xcode uses by default.
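If they are all the same object, it is tempting to fill the buffer with memset, but memset writes a single byte value, not pointer-sized values, so it cannot store object pointers. Create the object once and assign it in the loop instead:
id zero = [NSNumber numberWithInt:0];
id numbers[n];
for (int x = 0; x < n; ++x)
numbers[x] = zero;
id array = [NSArray arrayWithObjects:numbers count:n];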
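If you know how many objects you need at compile time, a compound literal should work, though I haven't tested it. Every element has to be listed, because omitted elements are nil and arrayWithObjects:count: raises an exception on nil:
id zero = [NSNumber numberWithInt:0];
id array = [NSArray arrayWithObjects:(id[5]){zero, zero, zero, zero, zero} count:5];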
I can't see any reason why this structure in a non-mutable format would be useful, but I am certain that you have your reasons.
I don't think that you have any choice but to use an NSMutableArray, build it with a for loop, and if it's really important that the result not be mutable, construct an NSArray from it with arrayWithArray:.
I agree with @mmc, make sure you have a valid reason to have such a structure (instead of just using the same object N times), but I'll assume you do.
There is another way to construct an immutable array which would be slightly faster, but it requires creating a C array of objects and passing it to NSArray's +arrayWithObjects:count: method (which returns an autoreleased array, mind you) as follows:
id anObject = [NSNumber numberWithInt:0];
id* buffer = (id*) malloc(sizeof(id) * anIntVariable);
for (int i = 0; i < anIntVariable; i++)
buffer[i] = anObject;
NSArray* array = [NSArray arrayWithObjects:buffer count:anIntVariable];
free(buffer);
You could accomplish the same thing with even trickier pointer math, but the gains are fairly trivial. Comment if you're interested anyway.
Probably the reason there is no such method on NSArray is that the semantics are not well defined. For your case, with an immutable NSNumber, then all the different semantics are equivalent, but imagine if the object you were adding was a mutable object, like NSMutableString for example.
There are three different semantics:
retain — You'd end up with ten pointers to the same mutable string, and changing any one would change all ten.
copy — You'd end up with ten pointers to the same immutable string, or possibly ten different pointers to immutable strings with the same value, but either way you'd not be able to change any of them.
mutableCopy — You'd end up with ten different mutable string objects, any of which you could change independently.
So Apple could write three variants of the method, or have some sort of parameter to control the semantics, both of which are ugly, so instead they left it to you to write the code. If you want, you can add it as an NSArray category method, just be sure you understand the semantic options and make it clear.
The method:
-(id)initWithArray:(NSArray *)array copyItems:(BOOL)flag
has this same issue.
Quinn's solution using arrayWithObjects:count: is a reasonably good one, probably about the best you can get for the general case. Put it in an NSArray category and that's about as good as it is going to get.
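A category method along the lines suggested above might look like this sketch. The category name, the ex_ prefix, and the method name are hypothetical, and for simplicity it builds the result through an NSMutableArray instead of the C buffer used in Quinn's answer. It deliberately implements the "retain" semantics only.

```objc
#import <Foundation/Foundation.h>

// Hypothetical category; the "ex_" prefix is there to avoid clashing with
// any method Apple might add to NSArray.
@interface NSArray (EXRepeatedObject)
+ (NSArray *)ex_arrayWithObject:(id)object count:(NSUInteger)count;
@end

@implementation NSArray (EXRepeatedObject)

// Retain semantics: every slot references the same object; nothing is copied.
+ (NSArray *)ex_arrayWithObject:(id)object count:(NSUInteger)count {
    NSMutableArray *buffer = [NSMutableArray arrayWithCapacity:count];
    for (NSUInteger i = 0; i < count; i++) {
        [buffer addObject:object];  // raises if object is nil
    }
    return [buffer copy];  // immutable snapshot of the buffer
}

@end
```

A call such as [NSArray ex_arrayWithObject:@0 count:anIntVariable] then covers the case in the question. For a mutable element such as NSMutableString, every slot aliases the same instance, so a copying variant would need its own clearly named method.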