Performance for reads of NSDictionary vs NSArray - Objective-C

Continuing on from this post: Performance hit incurred using NSMutableDictionary vs. NSMutableArray
I am trying to run a little test to see if the performance gap is that great for reads and writes between NSArray & NSDictionary as well as their mutable counterparts...
However, I am having difficulty finding a "balanced" test, because the dictionary has 2 (or 3, depending on how you see it) collections to loop through to get the value (not the key) being sought, while the array has only one...
Any suggestions?
--If you want more details:
What I mean is easier to explain through examples.
For the array:
for (NSString *str in array) { /* do something with the string */ }
For the dictionary:
for (NSString *str in [dictionary allValues]) { /* use the string */ }
OR
for (NSString *key in [dictionary allKeys]) { [dictionary valueForKey:key]; }
OR
for (NSString *str in [dictionary allKeys]) { /* use the key as the string */ }
OR EVEN
NSArray *valuesOrKeys = [dictionary allKeys]; // or allValues
for (NSString *str in valuesOrKeys) { /* use the string */ }
What is the "fairest" test to do for the dictionary?
--EDIT (comment)
As you all pointed out (and asked why I would want this), when a dictionary is used it's usually because it fits the model better than an array...
Well, the reason for my asking is that an app I'm building is painfully slow, so I'm trying to figure out whether a different data type would change any of that, and I am even considering basic C arrays... I still have the choice at this point, so I am able to change the inner workings to fit whatever type I want...

I'd like to point you at the following article: "Array", by ridiculous_fish, an engineer at Apple. Cocoa arrays are not necessarily the naïve arrays you might expect, nor are dictionaries simple hash tables. Their performance is very circumstantial and depends on the number of objects they hold (as well as their values, etc.). This might not directly affect the answer, but it's something to consider (NSDictionary performance will, of course, vary with the speed and reliability of your hashing function, and so on).
Additionally, if you're looking for a 'balanced' test, you'd have to look for a way for both classes to behave as close to each other as possible. You want to rule out accessing values via keys in the dictionary, because that — regardless of how fast seek times are for the underlying data structures maintained by NSDictionary — is slower than simply pulling objects from an array because you're performing more operations to do it. Access from an array is O(1), for a hash table, O(1) at best and O(n) at worst (depending on the implementation, somewhere in the middle).
There are several ways to enumerate both dictionaries and arrays, as you mentioned above. You're going to want to use the methods that are closest to each other in terms of implementation, those being either block-based enumeration (enumerateObjectsUsingBlock: for NSArray and enumerateKeysAndObjectsUsingBlock: for NSDictionary) or fast enumeration (using either allKeys or allValues for the NSDictionary). Because the performance of these collections is best judged empirically, I performed several tests to note access times (each with 10000 NSNumber objects):
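For reference, a timing harness along these lines might look roughly like the following (a sketch only, not the exact code behind the numbers below; the object count matches, but the timing approach and names are assumptions):

#import <Foundation/Foundation.h>

int main(int argc, char *argv[]) {
    @autoreleasepool {
        // Build test collections of 10000 NSNumber objects.
        NSMutableArray *array = [NSMutableArray array];
        NSMutableDictionary *dictionary = [NSMutableDictionary dictionary];
        for (NSUInteger i = 0; i < 10000; i++) {
            NSNumber *number = @(i);
            [array addObject:number];
            dictionary[number] = number;
        }

        // Time block enumeration over the array.
        NSDate *start = [NSDate date];
        [array enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
            (void)[obj integerValue]; // touch the object
        }];
        NSLog(@"NSArray block enumeration: %f s", -[start timeIntervalSinceNow]);

        // Time block enumeration over the dictionary.
        start = [NSDate date];
        [dictionary enumerateKeysAndObjectsUsingBlock:^(id key, id obj, BOOL *stop) {
            (void)[obj integerValue]; // touch the object
        }];
        NSLog(@"NSDictionary block enumeration: %f s", -[start timeIntervalSinceNow]);
    }
    return 0;
}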
NSArray, Block Enumeration:
1. 10.5s
2. 9.1s
3. 10.0s
4. 9.8s
5. 9.9s
-----
9.9s Avg
NSArray, Fast Enumeration:
1. 9.7s
2. 9.5s
3. 9.3s
4. 9.1s
5. 10.5s
-----
9.6s Avg
NSDictionary, Block Enumeration
1. 10.5s
2. 10.6s
3. 9.9s
4. 11.1s
5. 11.0s
-----
10.6s Avg
NSDictionary, allKeys -> Fast Enumeration
1. 10.0s
2. 11.2s
3. 10.2s
4. 10.8s
5. 10.8s
-----
10.6s Avg
NSDictionary, allValues -> Fast Enumeration
1. 10.7s
2. 10.3s
3. 10.5s
4. 10.5s
5. 9.7s
-----
10.3s Avg
As you can see from the results of this contrived test, NSDictionary is clearly slower than NSArray (around 7% slower using block enumeration, and 7–10% slower with fast enumeration). However, this comparison is rather pointless, seeing as using the fastest enumeration for NSDictionary simply devolves it into an array anyway.
So the big question is, why would you consider using a dictionary? Arrays and hash tables aren't exactly interchangeable; what kind of model do you have that allows drop-in replacement of NSArray with NSDictionary? Regardless of the times given by contrived examples to prove performance benefits one way or another, you should always implement your models in a way that makes sense — you can optimize later for performance if you have to. I don't see how you would use these data structures interchangeably, but anyway, NSArray is the winner here, especially considering the sequential order in which you're attempting to access values.

Here's your "balanced" test using block enumeration:
[arr enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
    // do something with the object
}];
[dict enumerateKeysAndObjectsUsingBlock:^(id key, id obj, BOOL *stop) {
    // do something with the key and object
}];
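For completeness, the fast-enumeration variants timed above would look roughly like this (the loop bodies are placeholders):

// NSArray, fast enumeration
for (NSNumber *number in arr) {
    // do something with the object
}

// NSDictionary, allKeys -> fast enumeration
for (NSNumber *key in [dict allKeys]) {
    // do something with the key (or look the object up)
}

// NSDictionary, allValues -> fast enumeration
for (NSNumber *value in [dict allValues]) {
    // do something with the object
}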

I am trying to run a little test to see if the performance gap is that
great for reads and writes between NSArray & NSDictionary as well as
their mutable counterparts...
Why? If it's just to satisfy your curiosity, that's one thing. But usually if you need a dictionary, an array really won't do, and vice versa. So it doesn't matter which one is faster at a given operation -- it's not like one is a good alternative for the other.
However, I am having difficulty finding a "balanced" test, because
the dictionary has 2 (or 3, depending on how you see it) collections to
loop through to get the value (not the key) being sought, while the array
has only one...
You're making some assumptions here that aren't likely to be valid. There's probably not a lot of looping involved to access elements of either kind of container.

Related

What is different in enumerating an NSDictionary directly compared to enumerating its allValues?

I am trying to understand/learn Objective-C. I am lost as to what the difference is between enumerating
for (TypeOfValuesInDictionary *item in dictionary) {}
and enumerating
for (TypeOfValuesInDictionary *item in [dictionary allValues]) {}
?
The former seems to return fewer values from the dictionary than the latter, but I failed to find/understand why. Neither the documentation nor a web search gave me the necessary insight/answer.
The first one iterates over the keys, not the values.
Follow-up: What's the difference compared to iterating over the array of keys?
A: Iterating over allKeys is less efficient; it builds a separate array of the keys first.
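To make the difference concrete, here is a small sketch (with made-up dictionary contents) showing that direct enumeration yields the keys while allValues yields the values:

NSDictionary *dictionary = @{ @"a" : @1, @"b" : @2, @"c" : @3 };

// Enumerating the dictionary directly iterates over its KEYS.
for (NSString *key in dictionary) {
    NSLog(@"key = %@, value = %@", key, dictionary[key]);
}

// Enumerating allValues iterates over the VALUES (in no guaranteed order).
for (NSNumber *value in [dictionary allValues]) {
    NSLog(@"value = %@", value);
}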

What is the space complexity of iterating over keys in NSMutableDictionary?

Trying to determine the run-time and space complexity of a hash-table (NSMutableDictionary) based solution to the following question:
Implement a TwoSum interface that has 2 methods: Store and Test. Store adds an integer to an internal data store and Test checks if an integer passed to Test is the sum of any two integers in the internal data store.
There seem to be many answers with varying store and test complexities. One promising one is
Have a hash table called StoreDict, i.e. NSMutableDictionary<NSNumber *, NSNumber *> *StoreDict, as the internal data structure.
Store(N) is implemented as
Check if StoreDict[@(N)] exists. If yes, increment its count, i.e. StoreDict[@(N)] = @([StoreDict[@(N)] intValue] + 1)
Test(N) implemented as
Iterate through all keys. For each key, K,
If [K intValue] != N/2, check if StoreDict[@(N - [K intValue])] exists
If [K intValue] == N/2, check if [StoreDict[K] intValue] >= 2
Looks like Store is O(1) and Test is O(N) run-time complexity. The only question is what the space complexity of iterating through all the keys is: is it O(1) or O(N)? I.e. are the keys given one by one (O(1) space), or are all N put into some temporary array (O(N) space)?
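Sketched out in code, the scheme above might look roughly like this (an illustration only, not a vetted implementation; the class and method names are assumptions):

@interface TwoSum : NSObject
- (void)store:(NSInteger)n;
- (BOOL)test:(NSInteger)n;
@end

@implementation TwoSum {
    NSMutableDictionary<NSNumber *, NSNumber *> *_storeDict; // value -> count
}

- (instancetype)init {
    if ((self = [super init])) {
        _storeDict = [NSMutableDictionary dictionary];
    }
    return self;
}

// Store: O(1) -- bump the count for n ([nil integerValue] is 0, so this also handles the first insert).
- (void)store:(NSInteger)n {
    _storeDict[@(n)] = @([_storeDict[@(n)] integerValue] + 1);
}

// Test: O(N) over the stored keys -- look for a complementary pair.
- (BOOL)test:(NSInteger)n {
    for (NSNumber *key in _storeDict) {          // fast enumeration over the keys
        NSInteger k = [key integerValue];
        NSInteger complement = n - k;
        if (complement == k) {
            if ([_storeDict[key] integerValue] >= 2) return YES;
        } else if (_storeDict[@(complement)] != nil) {
            return YES;
        }
    }
    return NO;
}

@end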
EDIT
I am referring to using the following code to get the keys:
NSEnumerator *enumerator = [StoreDict keyEnumerator];
id key;
while ((key = [enumerator nextObject])) {
    /* code that uses the returned key */
}
Or is the keyEnumerator similar to using [StoreDict allKeys], which is O(N) space complexity? If it is O(N) space complexity, is there any better solution that gets O(1) time and space complexity for this question?
NS(Mutable)Dictionary is bridged to CFDictionary, and it is reasonable to hypothesise that methods of the former are implemented in terms of functions on the latter. You can test this hypothesis by placing breakpoints on CFDictionary functions and then calling NS(Mutable)Dictionary methods.
Where will this get you? Well, the source of CFDictionary is available from Apple, and with that you should be able to figure out the complexities you are after.
HTH

Objective C NSMutableArray work with nil

I want to create an NSMutableArray to represent a game board that holds grid elements (assume I have a Chess class). In this case, I already know the size of the board, so I want to create an array with initWithCapacity: and then initialize every slot as nil. During the game, I may insert or remove chess objects into/from this array based on the game. I sometimes need to check whether a cell is nil.
Clearly initWithCapacity: only allocates memory, but it gives an empty array whose elements are not accessible. I think inserting [NSNull null] one by one is inefficient (I could use 0 to represent nil instead, but that is still not what I want).
Is there any type/structure in Objective-C like a C/C++ array for my purpose? Or is it wise to use a C/C++ array here (e.g. Chess *myArray[size])?
It is by design that NSMutableArrays and other collections do not permit nil and [NSNull null] is used as a placeholder for the concept of "nothing" instead.
I think inserting [NSNull null] one by one is inefficient
[NSNull null] is a singleton, so won't allocate lots of memory unnecessarily and is not inefficient.
This answer has more information.
Another efficiency concern I have about [NSNull null] is when I first create the array, if this array is kind of large (say 100 entries):
100 elements isn't a problem. Your device will be able to iterate 100 elements very quickly. Time it if you like.
is enumeration the only way I can assign a nil to each of these 100 entries? (e.g. for (i = 0; i < size; i++) { [myArray addObject:[NSNull null]]; })
It's the first way I would think of doing it.
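For what it's worth, a minimal sketch of that approach (boardSize is a placeholder, and Chess is the asker's hypothetical class):

NSUInteger boardSize = 100; // e.g. a 10 x 10 board
NSMutableArray *board = [NSMutableArray arrayWithCapacity:boardSize];

// Fill every cell with the NSNull singleton to mean "empty".
for (NSUInteger i = 0; i < boardSize; i++) {
    [board addObject:[NSNull null]];
}

// Place a piece (Chess is the hypothetical piece class from the question).
board[42] = [[Chess alloc] init];

// Check whether a cell is empty; pointer comparison works because NSNull is a singleton.
if (board[42] == [NSNull null]) {
    // the cell is empty
}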
"Premature optimisation is the root of all evil"
Donald Knuth
You seem to be concentrating on optimisation too early. You should favour readability over efficiency at this point.
When your app seems to slow down, or near the end of a release, profile the application and find out what, if any problems exist. Otherwise you may find yourself spending a day "optimising" a for loop to find that you've saved 0.000001s per update.
Moreover, readable code is easier to:
debug
update
maintain
share
Micro-optimised code takes longer to produce, is prone to bugs, difficult to debug and maintain and often impossible to share as another developer may not know how to interpret your optimisations.
That's not to say "don't optimise", rather concentrate on optimising the biggest problems.
You can use something like this:
NSMutableArray *array = [NSMutableArray arrayWithArray:@[@"First", @"Second", @"Third"]];
[array addObject:[NSNull null]];
[array enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
    if (![obj isKindOfClass:[NSNull class]]) {
        // Do something
    }
}];

NSSet -member to check equality of NSValue

I have a NSSet containing many thousands of NSValue objects (wrapping CGPoints). I would like to very quickly find if a given CGPoint value exists in the NSSet. It seems to me that the member: method of an NSSet might do the job here, except that it checks for equality using isEqual:. NSValue objects use isEqualToValue:, and so when I execute the code:
[mySet member:valueToCheck];
it actually causes Xcode to crash.
1) Is there some way to use a custom equality check to make this work for NSValue objects?
2) Is this even the best approach (i.e. is member: quick enough in the first place)? The scenario is that I have a NSSet containing a large number of points representing pixels on the screen (iPad). Later on I need to bombard that set with many thousands of points per second to see if they exist in the set. My approach seems crude for achieving this. I thought about creating something like a huge 2-dimensional bit array, with each index representing a pixel on screen. Once I know the point I'm testing for, I can just jump straight to that point in the array and check for a 1 or 0... does this sound better or worse?
Thanks
Can you get this to a simple reproducible case? For example, I just tried:
NSValue *v = [NSValue valueWithCGPoint:CGPointMake(1, 1)];
NSSet *s = [NSSet setWithObject:v];
NSLog(@"%@", [s member:[NSValue valueWithCGPoint:CGPointMake(1, 1)]]);
But it works just fine.
edit
-isEqual: is not the problem:
NSValue *v1 = [NSValue valueWithPoint:NSMakePoint(1, 1)];
NSValue *v2 = [NSValue valueWithPoint:NSMakePoint(1, 1)];
NSLog(@"%d", [v1 isEqual:v2]); // logs "1"
-hash is not the problem:
NSLog(@"%d", ([v1 hash] == [v2 hash])); // logs "1"
They are different objects:
NSLog(@"%d", (v1 != v2)); // logs "1"
The problem is in your code. Try cleaning and rebuilding.
To answer no. 2:
I don't know how NSSet is implemented internally, but considering that you know you are storing points (with X and Y), I think you would be better off implementing your own partitioning algorithm. Personally I would choose my own implementation over NSSet if you say you have thousands of points.
Storing huge 2-dimensional arrays for each pixel, would probably be the fastest way, but it will kill you in terms of memory consumption. You need something fast, but also lightweight.
There are a lot of algorithms out there and you can find them by searching "spatial partitioning algorithms" on wikipedia or google. It also depends on your programming skills, and how much time you are willing to invest in this.
For example, a pretty simple one would be to implement a quadtree, where you start by dividing your screen (or area) into 4 equal parts. Then, if and where needed, you divide a specific cell into 4 parts as well. You do this until each cell contains a small enough number of points that you can brute-force test all of them.
You can find a very good description on wiki: http://en.wikipedia.org/wiki/Quadtree
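To give a rough idea, a minimal point quadtree along those lines could look like this (a sketch only; the names are made up, and for simplicity points are not redistributed when a cell subdivides, they just stay in the cell that first stored them):

#import <UIKit/UIKit.h>

@interface QuadTreeNode : NSObject
- (instancetype)initWithBounds:(CGRect)bounds capacity:(NSUInteger)capacity;
- (void)insertPoint:(CGPoint)point;
- (BOOL)containsPoint:(CGPoint)point;
@end

@implementation QuadTreeNode {
    CGRect _bounds;
    NSUInteger _capacity;
    NSMutableArray<NSValue *> *_points;          // points stored directly in this cell
    NSMutableArray<QuadTreeNode *> *_children;   // nil until the cell subdivides
}

- (instancetype)initWithBounds:(CGRect)bounds capacity:(NSUInteger)capacity {
    if ((self = [super init])) {
        _bounds = bounds;
        _capacity = capacity;
        _points = [NSMutableArray array];
    }
    return self;
}

// Split this cell into 4 equal quadrants.
- (void)subdivide {
    CGFloat w = _bounds.size.width / 2.0, h = _bounds.size.height / 2.0;
    CGFloat x = _bounds.origin.x, y = _bounds.origin.y;
    CGRect quadrants[4] = { CGRectMake(x, y, w, h), CGRectMake(x + w, y, w, h),
                            CGRectMake(x, y + h, w, h), CGRectMake(x + w, y + h, w, h) };
    _children = [NSMutableArray array];
    for (int i = 0; i < 4; i++) {
        [_children addObject:[[QuadTreeNode alloc] initWithBounds:quadrants[i] capacity:_capacity]];
    }
}

- (void)insertPoint:(CGPoint)point {
    if (!CGRectContainsPoint(_bounds, point)) return;
    if (_children == nil && _points.count < _capacity) {
        [_points addObject:[NSValue valueWithCGPoint:point]];
        return;
    }
    if (_children == nil) [self subdivide];
    for (QuadTreeNode *child in _children) [child insertPoint:point];
}

- (BOOL)containsPoint:(CGPoint)point {
    if (!CGRectContainsPoint(_bounds, point)) return NO;
    for (NSValue *stored in _points) {
        if (CGPointEqualToPoint([stored CGPointValue], point)) return YES;
    }
    for (QuadTreeNode *child in _children) {
        if ([child containsPoint:point]) return YES;
    }
    return NO;
}

@end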
Hope this helps,
[mySet member:valueToCheck] should not be crashing. NSValue's isEqual: works fine when I try it here, and in fact probably calls isEqualToValue: when given another NSValue to compare to. Is valueToCheck really an NSValue, or is it a CGPoint?
There is no way to override the default hash and comparison methods for NSSet. But NSSet is toll-free bridged with CFSetRef, and you can easily specify custom hashing and comparison methods there:
CFSetCallBacks callbacks = kCFTypeSetCallBacks;
callbacks.equal = customEqualFunction;
callbacks.hash = customHashFunction;
NSMutableSet *set = (NSMutableSet *)CFSetCreateMutable(NULL, 0, &callbacks);
The constraints on these functions are presumably the same as on NSObject's hash and isEqual: methods: anything that is equal must have the same hash. The C-style prototypes for customEqualFunction and customHashFunction are described in the CFSetEqualCallBack and CFSetHashCallBack documentation.
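A rough sketch of what such callbacks could look like for NSValue-wrapped CGPoints (the function names match the snippet above, but the bodies are assumptions and presume the set only ever contains such values; under ARC the casts would need __bridge):

// Equality callback: compare the wrapped CGPoints directly.
static Boolean customEqualFunction(const void *value1, const void *value2) {
    NSValue *v1 = (NSValue *)value1;
    NSValue *v2 = (NSValue *)value2;
    return CGPointEqualToPoint([v1 CGPointValue], [v2 CGPointValue]);
}

// Hash callback: combine both coordinates into one hash value.
static CFHashCode customHashFunction(const void *value) {
    CGPoint p = [(NSValue *)value CGPointValue];
    return (CFHashCode)p.x * 31 + (CFHashCode)p.y;
}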
One solution would be to subclass NSSet and override member: to do your own comparison. Your own comparison could then simply call isEqualToValue:. Have a look at the subclassing notes in the NSSet documentation.
Another approach would be to add a category to NSValue that implements isEqual:. In this case I'd prefer subclassing because it's a more constrained solution.
It's not just a problem with -isEqual:; you may also have an issue with the -hash method. If you want to use an NSSet, you should probably create a custom class that wraps the CGPoint. -isEqual: is then trivial, and -hash could be implemented by some method of combining the bits of both coordinates and then treating them as an NSUInteger.
You'll also want to implement the NSCopying protocol which is also trivial if your points are immutable (just retain and return self in -copyWithZone:).
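For illustration, a minimal wrapper along those lines might look like this (a sketch; the class name and the hash mixing are arbitrary choices):

@interface PixelPoint : NSObject <NSCopying>
@property (nonatomic, readonly) CGPoint point;
- (instancetype)initWithPoint:(CGPoint)point;
@end

@implementation PixelPoint

- (instancetype)initWithPoint:(CGPoint)point {
    if ((self = [super init])) {
        _point = point;
    }
    return self;
}

- (BOOL)isEqual:(id)object {
    if (![object isKindOfClass:[PixelPoint class]]) return NO;
    return CGPointEqualToPoint(_point, [(PixelPoint *)object point]);
}

// Combine the bits of both coordinates; equal points must produce equal hashes.
- (NSUInteger)hash {
    return (NSUInteger)_point.x * 31u + (NSUInteger)_point.y;
}

// Immutable, so "copying" can just return self (retain and return self under MRC).
- (id)copyWithZone:(NSZone *)zone {
    return self;
}

@end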

Finding objects in Core Data by array attribute, performantly in >10k elements

Short:
I need to find Core Data objects by a key that holds a unique immutable array (fixed length, but chosen at runtime) of arbitrary objects (for which not only element membership but also element order determines uniqueness). NSManagedObject, however, forbids overriding -isEqual:. Now what?
Long:
I have an entity (see diagram image for entity "…Link") in my Core Data model for which I have to guarantee uniqueness based on an attribute key ("tuple"). So far so good.
The entity's unique attribute however has to be an NSArray.
And to make things a bit more difficult, I don't know the class type of the tuple's elements.
Nor do I know the tuple's element count. Well, actually the count is the same for every tuple (per core data context at least), but not known before the app runs.
There must only ever be one instance of my link entity with a given tuple.
And for obvious reason only ever one tuple instance with a given array of arbitrary objects.
Two tuples are to be considered equal if [tuple_1 isEqual:tuple_n] returns YES. NSManagedObject forbids the overriding of -isEqual: and -hash though; otherwise things would be pretty much a piece of cake.
"…Tuple" objects are created together with their array of tokens (via a convenience method) and are immutable (and so is each "…Token" and its data attribute). (think of "…Tuple" as a "…Link"'s dictionary key.)
"…Tuple" implements "- (NSArray *)tokens;", which returnes a neatly ordered array of tokens, based on the "order" keys of "…TokenOrder". (Tuples are expected to contain at most 5 elements.)
I however expect to have tens of thousands (potentially even more in some edge cases) of "…Link" objects, which I have to (frequently) find based on their "tuple" attribute.
Sadly I couldn't find any article (let alone a solution) for such a scenario in any literature or on the web.
Any ideas?
A possible solution I've come up with so far would be:
Narrow the amount of elements to compare by tuple by adding another attribute to "…Tuple" called "tupleHash", which is pre-calculated on object creation via Snippet 1.
Query with NSPredicate for objects with a matching tupleHash (narrowing down the list of candidates quite a bit).
Find the "…Link" featuring the given tuple in the narrowed candidate list via Snippet 2.
Snippet 1:
NSUInteger tupleHash = [[self class] hash];
for (NSManagedObject *token in self.tokens) {
    tupleHash ^= [[token valueForKey:@"data"] hash]; // the token's immutable data attribute
}
Snippet 2:
NSArray *tupleTokens = someTokens;
NSArray *filteredEntries = [narrowedCandidates filteredArrayUsingPredicate:
    [NSPredicate predicateWithBlock:^BOOL(id evaluatedObject, NSDictionary *bindings) {
        return [[evaluatedObject valueForKeyPath:@"tuple.tokens"] isEqualToArray:tupleTokens];
    }]];
(Sorry, markdown appears to oppose mixing of lists with code snippets.)
Good idea, or just insane?
Thanks in advance!
I strongly suggest that you calculate a hash for your objects and store it in your database.
Your second snippet will seriously hurt performance, that's for sure.
Update:
You don't need to use the hash method of NSArray.
To calculate the hash, you can perform a SHA1 or MD5 on the array values, concatenated. There are many algorithms for hashing, these are just two.
You can create a category on NSArray, say myHash, to make the code reusable.
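A rough sketch of such a category (myHash as suggested above; the order-sensitive mixing here is an arbitrary choice, and a SHA-1 or MD5 digest of the concatenated values would do just as well):

@interface NSArray (MyHash)
- (NSUInteger)myHash;
@end

@implementation NSArray (MyHash)

// Combine the element hashes in order, so that element order affects the result.
- (NSUInteger)myHash {
    NSUInteger result = 17;
    for (id element in self) {
        result = result * 31 + [element hash];
    }
    return result;
}

@end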
As recommended in a comment by Joe Blow, I'm just going to go with SQLite. Core Data simply appears to be the wrong tool here.
Benefits:
Fast thanks to SQL's column indexing
No object allocation/initialization on SELECT prior to returning the results (which Core Data would require for attribute checks).
Easy use of SQLite's JOIN, GROUP BY, ORDER BY, etc. to query link tuples.
Little to no wrapper code thanks to EGODatabase (FMDB-inspired SQLite Objective-C wrapper)