Trying to determine the run-time and space complexity of a hash table (NSMutableDictionary) based solution to the following question:
Implement a TwoSum interface that has 2 methods: Store and Test. Store adds an integer to an internal data store and Test checks if an integer passed to Test is the sum of any two integers in the internal data store.
There seem to be many answers with varying store and test complexities. One promising one is
Have a Hashtable called StoreDict. I.e. NSMutableDictionary <NSNumber *, NSNumber *> *StoreDict as the internal data structure
Store(N) is implemented as
Check if StoreDict[@(N)] exists. If yes, increment its count, i.e. StoreDict[@(N)] = @([StoreDict[@(N)] intValue] + 1); if not, set StoreDict[@(N)] = @1.
Test(N) implemented as
Iterate through all keys. For each key, K,
If [K intValue] != N - [K intValue], check if StoreDict[@(N - [K intValue])] exists
If [K intValue] == N - [K intValue] (K is exactly half of N), check if [StoreDict[K] intValue] >= 2
Looks like Store is O(1) and Test is O(N) run-time complexity. The only question is the space complexity of iterating through all the keys: is it O(1) or O(N)? I.e., are the keys handed out one by one (O(1) space), or are all N put into some temporary array (O(N) space)?
EDIT
I am referring to using the following code to get the keys:
NSEnumerator *enumerator = [StoreDict keyEnumerator];
id key;
while ((key = [enumerator nextObject])) {
/* code that uses the returned key */
}
Or is keyEnumerator similar to using [StoreDict allKeys], which is O(N) space complexity?
If it is O(N) space complexity, is there a better solution that gets O(1) time and space complexity for this question?
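For comparison, here is a minimal sketch of the Store/Test scheme described above, written in plain C (which also compiles inside an Objective-C file). It is not how NSMutableDictionary works internally and says nothing about keyEnumerator; the fixed-capacity open-addressing table and the names TwoSum, ts_store and ts_test are assumptions made for illustration. It only shows that the algorithm itself can walk the stored keys without building a temporary array, so Test needs O(1) extra space when the container hands out keys one at a time.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define TS_CAPACITY 1024u   /* hypothetical fixed size, must be a power of two */

typedef struct {
    int key;
    int count;              /* 0 marks an empty slot */
} TSSlot;

typedef struct {
    TSSlot slots[TS_CAPACITY];
    size_t used;            /* number of distinct keys stored */
} TwoSum;

/* Linear probing: returns the slot holding key, or the empty slot where it would go. */
static TSSlot *ts_find(TwoSum *ts, int key) {
    size_t i = ((unsigned int)key * 2654435761u) & (TS_CAPACITY - 1);
    while (ts->slots[i].count != 0 && ts->slots[i].key != key)
        i = (i + 1) & (TS_CAPACITY - 1);
    return &ts->slots[i];
}

/* Store: O(1) expected time. Returns false when the table is full. */
bool ts_store(TwoSum *ts, int n) {
    assert(ts != NULL);
    TSSlot *slot = ts_find(ts, n);
    if (slot->count == 0) {
        if (ts->used == TS_CAPACITY - 1)
            return false;   /* keep one slot empty so probing always terminates */
        slot->key = n;
        ts->used++;
    }
    slot->count++;
    return true;
}

/* Test: one pass over the slots, O(1) extra space (no temporary key array). */
bool ts_test(TwoSum *ts, int target) {
    assert(ts != NULL);
    for (size_t i = 0; i < TS_CAPACITY; i++) {
        const TSSlot *slot = &ts->slots[i];
        if (slot->count == 0)
            continue;
        long long other = (long long)target - slot->key;
        if (other < INT_MIN || other > INT_MAX)
            continue;       /* the complement cannot be a stored int */
        if (other == slot->key) {
            if (slot->count >= 2)
                return true; /* the same value was stored at least twice */
        } else if (ts_find(ts, (int)other)->count > 0) {
            return true;
        }
    }
    return false;
}
```

Comparing the key with target minus key, rather than with target / 2, avoids a false positive for odd targets where integer division rounds down.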
NS(Mutable)Dictionary is bridged to CFDictionary, and it is reasonable to hypothesise that methods of the former are implemented in terms of functions on the latter. You can test this hypothesis by placing breakpoints on CFDictionary functions and then calling NS(Mutable)Dictionary methods.
Where will this get you? Well, the source of CFDictionary is available from Apple, and with that you should be able to figure out the complexities you are after.
HTH
Related
Continuing off this post: Performance hit incurred using NSMutableDictionary vs. NSMutableArray
I am trying to run a little test to see if the performance gap is that great for read and writes between NSArray & NSDictionary as well as their mutable counterparts...
However, I am having difficulties finding a "balanced" test... because the dictionary has 2 (or 3 depending on how you see this) objects to loop through to get the value (not the key) sought, while the array has only one...
Any suggestions?
--If you want more details:
What I mean is easier to explain through examples;
For the array:
for (NSString *str in array) { /* do something with str */ }
For the dictionary
for (NSString *str in [dictionary allValues]) { /* use str */ }
OR
for (NSString *key in [dictionary allKeys]) { [dictionary valueForKey:key]; }
OR
for (NSString *str in [dictionary allKeys]) { /* use str */ }
OR EVEN
NSArray *valuesOrKeys = [dictionary allKeys]; // or [dictionary allValues]
for (NSString *str in valuesOrKeys) { /* use str */ }
What is the "fairest" test to do for the dictionary?
--EDIT (comment)
As you all pointed out (and asked why I would want that), when a dictionary is used, it's because it fits the model better than an array...
Well, the reason for my asking is that an app I'm building is painfully slow, so I'm trying to figure out if the use of a different data type would change any of that, and I am considering using basic C arrays... I have the choice at this point, so I am able to change the inner workings to fit whatever type I want...
I'd like to point you at the following article: "Array", by ridiculous_fish, an engineer at Apple. Cocoa arrays are not necessarily well-implemented naïve arrays as you might expect, nor are dictionaries simple hash tables. Their performance is very circumstantial, and depends on the number of objects they hold (as well as their values, etc.). This might not directly affect the answer, but it's something to consider (NSDictionary performance will, of course, vary with the speed and reliability of your hashing function, and so on).
Additionally, if you're looking for a 'balanced' test, you'd have to look for a way for both classes to behave as close to each other as possible. You want to rule out accessing values via keys in the dictionary, because that — regardless of how fast seek times are for the underlying data structures maintained by NSDictionary — is slower than simply pulling objects from an array because you're performing more operations to do it. Access from an array is O(1), for a hash table, O(1) at best and O(n) at worst (depending on the implementation, somewhere in the middle).
There are several ways to enumerate both dictionaries and arrays, as you mentioned above. You're going to want to use the methods that are closest to each other in terms of implementation, those being either block-based enumeration (enumerateObjectsUsingBlock: for NSArray and enumerateKeysAndObjectsUsingBlock: for NSDictionary), or fast enumeration (using either allKeys or allValues for the NSDictionary). Because the performance of these algorithms is mainly empirical, I performed several tests to note access times (each with 10000 NSNumber objects):
NSArray, Block Enumeration:
1. 10.5s
2. 9.1s
3. 10.0s
4. 9.8s
5. 9.9s
-----
9.9s Avg
NSArray, Fast Enumeration:
1. 9.7s
2. 9.5s
3. 9.3s
4. 9.1s
5. 10.5s
-----
9.6s Avg
NSDictionary, Block Enumeration
1. 10.5s
2. 10.6s
3. 9.9s
4. 11.1s
5. 11.0s
-----
10.6s Avg
NSDictionary, allKeys -> Fast Enumeration
1. 10.0s
2. 11.2s
3. 10.2s
4. 10.8s
5. 10.8s
-----
10.6s Avg
NSDictionary, allValues -> Fast Enumeration
1. 10.7s
2. 10.3s
3. 10.5s
4. 10.5s
5. 9.7s
-----
10.3s Avg
As you can see from the results of this contrived test, NSDictionary is clearly slower than NSArray (around 7% slower using block enumeration, and 7–10% slower with fast enumeration). However, this comparison is rather pointless, seeing as using the fastest enumeration for NSDictionary simply devolves it into an array anyway.
So the big question is, why would you consider using a dictionary? Arrays and hash tables aren't exactly interchangeable; what kind of model do you have that allows drop-in replacement of NSArray with NSDictionary? Regardless of the times given by contrived examples to prove performance benefits one way or another, you should always implement your models in a way that makes sense — you can optimize later for performance if you have to. I don't see how you would use these data structures interchangeably, but anyway, NSArray is the winner here, especially considering the sequential order in which you're attempting to access values.
Here's your "balanced" test using block-based enumeration:
[arr enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
    // do something with objects
}];
[dict enumerateKeysAndObjectsUsingBlock:^(id key, id obj, BOOL *stop) {
    // do something with objects
}];
I am trying to run a little test to see if the performance gap is that
great for read and writes between NSArray & NSDictionary as well as
their mutable counterparts...
Why? If it's just to satisfy your curiosity, that's one thing. But usually if you need a dictionary, an array really won't do, and vice versa. So it doesn't matter which one is faster at a given operation -- it's not like one is a good alternative for the other.
However, I am having difficulties finding a "balanced" test... because
the dictionary has 2 (or 3 depending on how you see this) objects to
loop through to get the value (not the key) sought, while the array
has only one...
You're making some assumptions here that aren't likely to be valid. There's probably not a lot of looping involved to access elements of either kind of container.
I'm sorry if this is a bit of a C-noob question: I know I need to swot up on my pointers. Unfortunately I'm on a deadline so don't have time to work through a whole book chapter, so I'm hoping for a bit more targeted advice.
I want to store some Objective-C objects in a C array. I'm using ARC. If I were on the Mac I'd be able to use NSPointerArray instead, but I'm on iOS and that's not available.
I'll be storing a three-dimensional C array: conceptually my dimensions are day, height, and cacheNumber. Each element will either be a pointer to an Objective-C object, or NULL.
The number of caches (i.e. the size of the cacheNumber dimension) is known at compile time, but the other two are not known. Also, the array could be very large, so I need to dynamically allocate memory for it.
Regarding ownership semantics, I need strong references to the objects.
I would like the whole three-dimensional array to be an instance variable on an Objective-C object.
I plan to have a method that is - tableForCacheNumber:(int)num days:(int*)days height:(int*)height. That method should return a two-dimensional array, that is one specific cache number. (It also passes back by reference the size of the array it is returning.)
My questions:
What order should I put my dimensions so that I can easily return a pointer to the subarray for one specific cache number? (I think it should be first, but I'm not 100%.)
What should the return type of my method be, so that ARC doesn't complain? I don't mind if the returned array has an increased reference count or not, as long as I know which it's doing.
What type should my instance variable that holds the three dimensional array be? I think it should just be a pointer, since that ivar just represents the pointer to the first item that's in my array. Correct? If so, how do I specify that?
When I create the three-dimensional array (for my ivar), I guess I do something like calloc(X * Y * Z, sizeof(id)), and cast the result to the type for my ivar?
When accessing items from the three-dimensional array in the ivar, I believe I have to dereference the pointer each time, with something like (*myArray)[4][7][2]. Correct?
Will the two-dimensional array I return from the method be similarly accessed?
Do I need to tag the returned two-dimensional array with objc_returns_inner_pointer?
I'm sorry once again that this is a bit of a bad Stack Overflow question (it's too long and with too many parts). I hope the SO citizens will forgive me. To improve my interweb karma, maybe I'll write it up as a blog post when this project has shipped.
First off: while you don't have NSPointerArray, you do have CFMutableArrayRef and you can pass any callbacks you want for retain/release/description, including NULL. It may be easier (and performance is something you can measure later) to try that first.
Taking your points in order:
You should define your dimensions as [cacheNumber][days][height], as you expect. Then cache[cacheNumber] is a two-dimensional array of type id *[][]. As you've said performance is important, be aware that the fastest way to iterate this beast is:
for (/* cacheNumber loop */) {
    for (/* days loop */) {
        for (/* height loop */) {
            //...
        }
    }
}
It should be of type __strong id ***: that's a pointer to a pointer to a pointer to id, which is the same as array of (array of (pointer to id)).
Your ivar needs to be __strong id **** (!), because it's an array of the above things.
You guess incorrectly regarding allocating the array. If you're using a multidimensional array, you need to do this (one dimension elided for brevity):
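- (__strong id * * *)someArray {
    // x row pointers; calloc so that every slot starts out NULL
    __strong id * * *cache = (__strong id * * *)calloc(x, sizeof(id * *));
    id hello = @"Hello";
    cache[0] = (__strong id * *)calloc(y, sizeof(id *)); // same for cache[1..x-1]
    // each element is a pointer to an id, so give it heap storage instead of
    // the address of the local variable, which would dangle after return
    cache[0][0] = (__strong id *)calloc(1, sizeof(id)); // for all cache[i][j]
    *cache[0][0] = hello;
    return cache;
}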
Correct, that is how you use such a pointer.
Yeah, the two-D array works in the same way, sans the first dimension.
I don't think so, you're handing out __strong object pointers so you should be grand. That said, we're at about the limit of my ability with this stuff now so I could well be wrong.
Answering my own question because this web page gave me the missing bit of info I needed. I've also upvoted Graham's answer, since he was very helpful in getting my head round some of the syntax.
The trick I was missing is that if I want to refer to items in the array via the array[1][5][2] syntax, and I don't know the sizes of my array at compile time, I can't just calloc() a single block of data for it.
The easiest to read (although least efficient) method of doing that is just with a loop:
__strong Item ****cacheItems;
cacheItems = (__strong Item ****)calloc(kMaxZooms, sizeof(Item ***));
for (int k = 0; k < kMaxZooms; k++)
{
    cacheItems[k] = (__strong Item ***)calloc((size_t)daysOnTimeline, sizeof(Item **));
    for (int j = 0; j < daysOnTimeline; j++)
    {
        cacheItems[k][j] = (__strong Item **)calloc((size_t)kMaxHeight, sizeof(Item *));
    }
}
I'm allocating a three-dimensional array of Item *s, Item being an Objective-C class. (I have of course left out the error handling code in this snippet.)
Once I've done that, I can refer to my array using the square brackets syntax:
cacheItems[zoom][day][heightToUse] = item;
The web page I linked to above also describes a second method for performing the memory allocations, that uses only one call to calloc() per dimension. I haven't tried that method yet, as the one I've just described is working well enough at the moment.
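For readers curious what "one calloc() per dimension" can look like, here is a rough sketch in plain C. It may differ from what the linked page does. The names Cell, alloc3d and free3d are made up for illustration, and the elements are plain void * values, so under ARC they would not retain the objects they point to (the Item * version above would need the same __strong casts as in the loop). One block holds all elements, and two smaller blocks hold the pointers that make the a[i][j][k] syntax work.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *Cell;   /* stand-in for the element type (Item * in the answer) */

/* Three allocations in total, independent of the array's size. */
Cell ***alloc3d(size_t nx, size_t ny, size_t nz) {
    assert(nx > 0 && ny > 0 && nz > 0);
    Cell ***planes = calloc(nx, sizeof *planes);          /* nx pointers to row tables */
    Cell **rows    = calloc(nx * ny, sizeof *rows);       /* nx*ny pointers to rows */
    Cell *cells    = calloc(nx * ny * nz, sizeof *cells); /* all elements, zeroed */
    if (!planes || !rows || !cells) {
        free(planes);
        free(rows);
        free(cells);
        return NULL;
    }
    for (size_t i = 0; i < nx; i++) {
        planes[i] = rows + i * ny;                    /* row table of plane i */
        for (size_t j = 0; j < ny; j++)
            planes[i][j] = cells + (i * ny + j) * nz; /* start of row (i, j) */
    }
    return planes;
}

void free3d(Cell ***a) {
    if (a == NULL)
        return;
    free(a[0][0]);  /* the element block */
    free(a[0]);     /* the row-pointer block */
    free(a);        /* the plane-pointer block */
}
```

After `Cell ***a = alloc3d(zooms, days, heights);` an element is reached as `a[zoom][day][height]`, and `a[zoom]` is the two-dimensional table for one zoom level.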
I would think of a different implementation. Unless it is a demonstrable (i.e. you have measured and quantified it) performance issue, trying to store Objective-C objects in plain C arrays is often a code smell.
It seems to me that you need an intermediate container object which we will call a Cache for now. One instance will exist for each cache number, and your object will hold an NS(Mutable)Array of them. Cache objects will have properties for the maximum days and height.
The Cache object would most easily be implemented with an NSArray of the objects in it, using simple arithmetic to simulate two dimensions. Your cache object would have a method -objectAtDay:Height: to access the object by its coordinates.
This way, there is no need at all to worry about memory management, ARC does it for you.
Edit
Given that performance is an issue, I would use a 1D array and roll my own arithmetic to calculate offsets. The type of your instance variable would be:
__strong id* myArray;
You can only use C multilevel subscripts (array[i][j][k]) if you know the range of all the dimensions (except the first one). This is because the actual offset is calculated as
(i * (max_j * max_k) + j * max_k + k) * sizeof(element type)
If the compiler doesn't know max_j and max_k, it can't do it. That's precisely the situation you are in.
Given that you have to use a 1D array and calculate the offsets manually, the Apple example will work fine for you.
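A minimal sketch of that offset arithmetic in plain C follows. The names Grid3D, grid_init, grid_offset and grid_table are hypothetical, and the slots are void * rather than __strong id, so this shows only the indexing, not ARC ownership. Pointer arithmetic already scales by the element size, so the sizeof factor from the formula does not appear explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    size_t caches, days, heights;  /* extent of each dimension */
    void **items;                  /* caches * days * heights slots, NULL = empty */
} Grid3D;

/* Allocates one zeroed block; returns false on allocation failure. */
bool grid_init(Grid3D *g, size_t caches, size_t days, size_t heights) {
    g->caches = caches;
    g->days = days;
    g->heights = heights;
    g->items = calloc(caches * days * heights, sizeof *g->items);
    return g->items != NULL;
}

/* i * (max_j * max_k) + j * max_k + k, with the cache number outermost. */
size_t grid_offset(const Grid3D *g, size_t cache, size_t day, size_t height) {
    assert(cache < g->caches && day < g->days && height < g->heights);
    return cache * (g->days * g->heights) + day * g->heights + height;
}

/* The 2D table for one cache number is a contiguous run inside the block. */
void **grid_table(const Grid3D *g, size_t cache) {
    assert(cache < g->caches);
    return g->items + cache * (g->days * g->heights);
}

void grid_free(Grid3D *g) {
    free(g->items);
    g->items = NULL;
}
```

Putting the cache number first is what lets grid_table return a sub-array without copying: within that table, the element for a given day and height sits at index day * heights + height.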
Currently, if I want to check whether an object is in an (unsorted) NSArray, I have to loop over the array and check each object until I find one that matches.
I have doubts about its performance, even if the check is only an if statement.
Is there any provided solution to improve the search performance?
Or can I only sort the array somehow and then use something like binary search?
Sorting and then searching will take more time than a loop iterating over each element of the array: comparison-based sorting takes at best O(n * log(n)) time, whereas iterating over the array takes O(n) time for n elements.
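To make the two costs concrete, here is a small sketch in plain C, with ints standing in for the objects and function names invented for the example. The linear scan touches up to n elements per lookup. The sorted variant pays for the sort once and is only valid afterwards, so it can only come out ahead if the same array is searched many times. The complexities in the comments are what typical qsort and bsearch implementations deliver, not something the C standard guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Three-way comparison without the overflow risk of x - y. */
static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* O(n): check each element until one matches. */
bool contains_linear(const int *values, size_t n, int key) {
    assert(values != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (values[i] == key)
            return true;
    return false;
}

/* Typically O(n log n), paid once before any sorted lookup. */
void sort_values(int *values, size_t n) {
    qsort(values, n, sizeof values[0], compare_ints);
}

/* Typically O(log n) per lookup; the array must already be sorted. */
bool contains_sorted(const int *values, size_t n, int key) {
    return bsearch(&key, values, n, sizeof values[0], compare_ints) != NULL;
}
```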
A pragmatic solution would be to use -[NSArray containsObject:] if you already know the object.
Otherwise you have to implement your own comparison strategy and step through the array.
You can use fast enumeration, e.g.:
for (NSNumber *setObject in set) {
    [gids appendFormat:@"%d", [setObject intValue]];
}
I have a NSMutableDictionary instance and the keys I'm using are NSNumber* type.
In the scenario I have, I'm trying to use 'objectForKey' to retrieve an object in my dictionary that I know is present. But I keep getting nil for the result unless I convert the key from NSNumber to NSString.
NSNumber *itemId = [NSNumber numberWithInt:5];
id existingItem = [forRemovalLookup objectForKey:itemId];
if (existingItem == nil)
{
    // expected: shouldn't be nil
    NSLog(@"!!!!!Not expecting this to be nil");
}
Is there another operation I should use to test for the presence of a specific key in a dictionary?
It would work, but only if [itemId hash] was equal to the key's hash, and if [itemId isEqual:] returned true when compared against the key in question. I think an NSNumber's hash is simply the number it holds, but the hash of a string would be completely different even if it was just a string representation of the same number, which would explain why the lookup only succeeds with an NSString key if that is how the keys were stored. From memory, the hash of a string is calculated by combining each character value with an accumulator that is multiplied by a certain amount at each step.
There might be something else I'm missing, but there was a discussion on the Cocoa mailing list about class behaviour inside collection objects and the general consensus was that if a class was to hold well in a collection it should correctly return decent values for -hash and -isEqual:.
I know this answer doesn't really help you in this situation, but it may shed some light on how dictionary collections work in Cocoa.
I have an undetermined size for a dataset based on unique integer keys.
I would like to use an NSMutableArray for fast lookup since all my keys are integer based.
I want to do this.
NSMutableArray* data = [NSMutableArray array]; // just create with 0 size
then later people will start throwing data at me with integer indexes (all unique) so I just want to do something like this...
if ([data count] < index)
[data resize:index]; // ? how do you resize
and have the array resized so that I can then do...
[data insertObject:obj atIndex:index];
with all the slots between last size and new size being zero which will eventually be filled in later.
So my question is how do I resize an existing NSMutableArray?
Thanks,
Roman
Use an NSPointerArray.
http://developer.apple.com/mac/library/documentation/Cocoa/Reference/Foundation/Classes/NSPointerArray_Class/Introduction/Introduction.html
NSPointerArray is a mutable collection modeled after NSArray but it can also hold NULL values, which can be inserted or extracted (and which contribute to the object’s count). Moreover, unlike traditional arrays, you can set the count of the array directly. In a garbage collected environment, if you specify a zeroing weak memory configuration, if an element is collected it is replaced by a NULL value.
If you were to use a dictionary-like solution, use NSMapTable. It allows integer keys. The recommended NSMutableDictionary-based solution has a tremendous amount of overhead related to all of the boxing and unboxing of integer keys.
It sounds like your needs would be better met with an NSMutableDictionary. You will need to wrap the ints into NSNumber objects as follows:
-(void)addItem:(int)key value:(id)obj
{
    [data setObject:obj forKey:[NSNumber numberWithInt:key]];
}
-(id)getItem:(int)key
{
    return [data objectForKey:[NSNumber numberWithInt:key]];
}
There's no easy way to enlarge the size of an NSMutableArray, since you cannot have nil objects in the in-between slots. You can, however, use [NSNull null] as a 'filler' to create the appearance of a sparse array.
As in Jason's answer, an NSMutableDictionary seems to be the best approach. It adds the overhead of converting the index values to and from NSNumbers, but this is a classic space/time trade off.
In my implementation I also included an NSIndexSet to make traversing the sparse array much simpler.
See https://github.com/LavaSlider/DSSparseArray
I have to disagree with bbum's answer on this. A NSPointerArray is an array, not a sparse array, and there are important differences between the two.
I strongly recommend that bbum's solution not be used.
The documentation for NSPointerArray is available here.
Cocoa already has an array object as defined by the NSArray class. NSPointerArray inherits from NSObject, so it is not a direct subclass of NSArray. However, the NSPointerArray documentation defines the class as such:
NSPointerArray is a mutable collection modeled after NSArray but it can also hold NULL values
I will make the axiomatic assumption that this definition from the documentation asserts that this is a "logical" subclass of NSArray.
Definitions-
A "general" array is: a collection of items, each of which has a unique index number associated with it.
An array, without qualifications, is: A "general" array where the indexes of the items have the following properties: Indexes for items in the array begin at 0 and increase sequentially. Every item in the array has an index number less than the number of items in the array. Adding an item to an array must be at index + 1 of the last item in the array, or an item can be inserted in between two existing item index numbers, which causes the index number of all subsequent items to be incremented by one. An item at an existing index number can be replaced by another item, and this operation does not change the index numbers of the existing items. Therefore, insert and replace are two distinct operations.
A sparse array is: A "general" array where the index number of the first item can begin at any number and the index number of subsequent items added to the array has no relation to or restrictions based on other items in the array. Inserting an item into a sparse array does not affect the index number of other items in the array. Inserting an item and replacing an item are typically synonymous in most implementations. The count of the number of items in the sparse array has no relationship to the index numbers of the items in the sparse array.
These definitions make certain predictions about the behavior of a "black box" array that are testable. For simplicity, we'll focus on the following relationship:
In an array, the index number of all the items in the array is less than the count of the number of items in the array. While this may be true of a sparse array, it is not a requirement.
In a comment to bbum, I stated the following:
a NSPointerArray is not a sparse array, nor does it behave like one. You still have to fill all the unused indexes with NULL pointers. Output from [pointerArray insertPointer:@"test" atIndex:17]; on a freshly instantiated NSPointerArray:
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[NSConcretePointerArray insertPointer:atIndex:]: attempt to insert pointer at index 17 beyond bounds 0'
It is stated, without proof, that the behavior of NSPointerArray above violates the very definition of a sparse array. This part of the error message is revealing: 'attempt to insert pointer at index 17 beyond bounds 0', in particular the part about having to add the first new item at index 0.
bbum then comments:
That is incorrect. You failed to call -setCount: to set the capacity to a sufficient size.
It is nonsensical to "set the count" of the number of items in a sparse array. If NSPointerArray were a sparse array, one would expect that after adding the first item at index 17, the count of the number of items in the NSPointerArray would be one. However, following bbum's advice, the number of items in the NSPointerArray after adding the first item is 18, not 1.
QED- It is shown that a NSPointerArray is in fact an array, and for the purposes of this discussion, a NSArray.
Additionally, bbum makes the following additional comments:
NSPointerArray most certainly does support holes.
This is provably false. An array requires all items contained in it to contain something, even if that something is 'nothing'. This is not true of a sparse array. This is the very definition of a 'hole' for the purposes of this discussion. A NSPointerArray does not contain holes in the sparse array sense of the term.
That was one of the whole points of writing the class. You have to set the count first.
It is provably nonsensical to "set the count" of a sparse array.
Whether the internal implementation is a sparse array or a hash or, etc, is an implementation detail.
This is true. However, the documentation for NSPointerArray does not make any reference to how it implements or manages its array of items. Furthermore, it does not state anywhere that a NSPointerArray "efficiently manages an array of NULL pointers."
QED- bbum is depending on the undocumented behavior that a NSPointerArray efficiently handles NULL pointers via a sparse array internally. Being undocumented behavior, this behavior can change at any time, or may not even apply to all uses of the NSPointerArray. A change in this behavior would be catastrophic if the highest index number stored in it is sufficiently large (~ 2^26).
And, in fact, it is not implemented as one big hunk of memory...
Again, this is a private implementation detail that is undocumented. It is extremely poor programming practice to depend on this type of behavior.