How to keep an array sorted - objective-c

I'm refactoring a project that involves passing around a lot of arrays. Currently, each method that returns an array sorts it right before returning it. This isn't ideal for a few reasons: there's lots of duplicated code, it's inefficient to sort an array two or three times, and it's too easy to write a new function but forget to sort the array before returning it.
I'm looking for a way to guarantee that the array is always kept in alphabetical order. My current thought is to subclass NSMutableArray and/or NSArray to create an alphabetized array class. I would need to override all of the methods that create or modify the array to call super and then sort itself.
Does this sound reasonable, or is there a better approach?
EDIT:
Since performance issues have been mentioned, I'll include the relevant information from my project. Speed is not an important concern. The whole process only takes a few seconds, and the tool is only used every so often. So simplicity and obvious correctness are more important.
Also, the use case for arrays is specific. When an array is returned, the caller always accesses every element in the array at least once.
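A minimal sketch of a composition-based alternative to subclassing, written in Java for illustration (the name SortedList and its methods are hypothetical, not an existing Cocoa or Java class). Instead of overriding every mutator, the wrapper exposes only an add() that inserts each element at its sorted position via binary search, plus read-only access, so a caller can never see the contents out of order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical wrapper: the only mutator is add(), which inserts each element
// at its sorted position, so callers can never observe an unsorted state.
public class SortedList<T> {
    private final List<T> items = new ArrayList<>();
    private final Comparator<? super T> order;

    public SortedList(Comparator<? super T> order) {
        this.order = order;
    }

    public void add(T item) {
        // binarySearch returns (-(insertion point) - 1) when the item is absent.
        int i = Collections.binarySearch(items, item, order);
        items.add(i >= 0 ? i : -i - 1, item);
    }

    public T get(int index) {
        return items.get(index);
    }

    public int size() {
        return items.size();
    }

    // Read-only view so callers cannot reorder the backing list.
    public List<T> asList() {
        return Collections.unmodifiableList(items);
    }

    public static void main(String[] args) {
        SortedList<String> names = new SortedList<>(String.CASE_INSENSITIVE_ORDER);
        for (String s : new String[] {"pear", "Apple", "fig", "banana"}) {
            names.add(s);
        }
        System.out.println(names.asList()); // [Apple, banana, fig, pear]
    }
}
```

Because reads go through asList(), which is unmodifiable, the order can only be broken by code inside the class itself. Each insert costs O(log n) to find the slot plus O(n) to shift elements, which should be acceptable when speed isn't a concern.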

A balanced binary tree is the standard, efficient way to keep items sorted. Almost any scheme that inserts items at arbitrary positions in a plain array will be slow, since every insertion has to shift the elements after it. A skip list is also efficient, and you may be able to add the functionality to the array class.
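For illustration, here is a Java sketch (Java only as a stand-in, since the question is about Cocoa) using two standard-library sets that stay sorted as items are added: TreeSet, which is backed by a red-black tree (a balanced binary search tree), and ConcurrentSkipListSet, which is backed by a skip list. The helper name fill is hypothetical.

```java
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

public class SelfSortingSets {
    // Adds the same words to a set and returns its iteration order as a string.
    static String fill(NavigableSet<String> set) {
        for (String w : new String[] {"delta", "alpha", "charlie", "bravo", "alpha"}) {
            // O(log n): worst case for the tree, expected for the skip list.
            // Duplicates are ignored because these are sets.
            set.add(w);
        }
        return set.toString();
    }

    public static void main(String[] args) {
        // TreeSet: red-black tree.
        System.out.println(fill(new TreeSet<>()));               // [alpha, bravo, charlie, delta]
        // ConcurrentSkipListSet: skip list.
        System.out.println(fill(new ConcurrentSkipListSet<>())); // [alpha, bravo, charlie, delta]
    }
}
```

Both are sets, so the duplicate "alpha" is stored once; if duplicates matter, a sorted list with binary-search insertion or a map from item to count would be needed instead.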

Check out CHDataStructures. It's a framework that has a lot of self-sorting data structures, like balanced binary trees and whatnot.

Related

Which is better, NSSet’s containsObject or fast enum?

I need to determine whether an object is included in a Core Data to-many relationship (which is an NSSet), and I’m trying to decide which of two solutions is better:
Solution 1)
if ([managedObject.items containsObject:itemOfInterest])
return …
Solution 2)
for (NSManagedObject *item in managedObject.items)
if (item == itemOfInterest)
return …
Solution 1 is more concise, but the NSSet Class Ref says fast enumeration performs better than NSSet’s objectEnumerator. Does it also perform better than containsObject?
Neither. You should use an NSFetchRequest with a predicate. Your patterns can accidentally fault the entire relationship, which is very expensive and not needed just to check for whether it contains one object. There are ways to be careful and not fault the entire relationship, but it's fragile (small changes to your search lead to huge changes in performance) and so it's better to be in the habit of using NSFetchRequest rather than the collection for searching. I like to set my fetchLimit to 1 in these cases so once it finds it, it stops looking.
For convenience, you may want to create a -containsFoo: method on your managed object so you don't have to write the fetch logic all over the place.
Your two solutions above are subtly different. The first one tests whether there is an object in the collection that isEqual: to itemOfInterest. Your second solution tests whether there is an object in the collection at the same memory location as itemOfInterest. For objects with custom isEqual: logic, these can return different results. This means that solution 2 might be slightly faster for non-Core Data collections, but that's because you're actually testing a different thing, not because of object enumeration. (In reality, this is only true for small collections; see below.)
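The difference between the two solutions can be shown with a Java analogue, where equals() plays the role of isEqual: and == compares references the way the Objective-C pointer comparison does. This is a sketch with hypothetical method names and an ordinary HashSet, not Core Data.

```java
import java.util.HashSet;
import java.util.Set;

public class EqualityVsIdentity {
    // Analogue of Solution 1: contains() compares with equals()/hashCode().
    static boolean containsEqual(Set<String> set, String target) {
        return set.contains(target);
    }

    // Analogue of Solution 2: the loop compares references with ==.
    static boolean containsSameInstance(Set<String> set, String target) {
        for (String item : set) {
            if (item == target) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        set.add("widget");
        // new String(...) builds a distinct object that is equal but not identical.
        String probe = new String("widget");
        System.out.println(containsEqual(set, probe));        // true
        System.out.println(containsSameInstance(set, probe)); // false
    }
}
```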
Why do you believe that solution 1 uses -objectEnumerator?
As @James Raybould points out, you generally should not try to rewrite the built-in methods for performance reasons. If an isEqual: version of Solution 2 were faster than Solution 1, wouldn't you think Apple would have implemented -containsObject: using the code in solution 2?
In reality, the underlying CFSet is implemented as a hash table, so checking for containment takes roughly constant time on average rather than linear time. Generally speaking, for large sets with reasonable hash functions, solution 1 will be faster. See the code for it in CFSet.c; look for CFSetContainsValue(). CFSet's implementation isn't guaranteed to stay the same, of course, but it's useful for understanding how performance concerns are generally addressed within Cocoa.
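A rough Java illustration of why a hash-based set avoids the linear scan: Java's HashSet is also hash-based, and the hypothetical Key class below counts its equals() calls. This is not a benchmark of CFSet, only a count of comparisons.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContainsCost {
    static int equalsCalls = 0;

    // Hypothetical key type that counts how often equals() is invoked.
    static final class Key {
        final int id;
        Key(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int hashCode() { return id; }
    }

    // Linear scan with equals(): what a hand-written loop does.
    static int linearCalls(List<Key> keys, Key target) {
        equalsCalls = 0;
        for (Key k : keys) {
            if (k.equals(target)) break;
        }
        return equalsCalls;
    }

    // Hash lookup: only entries whose hash matches are compared with equals().
    static int hashedCalls(Set<Key> keys, Key target) {
        equalsCalls = 0;
        keys.contains(target);
        return equalsCalls;
    }

    public static void main(String[] args) {
        int n = 10_000;
        List<Key> list = new ArrayList<>();
        for (int i = 0; i < n; i++) list.add(new Key(i));
        Set<Key> set = new HashSet<>(list);
        Key target = new Key(n - 1); // equal to the last element, not the same object
        System.out.println("linear scan: " + linearCalls(list, target) + " equals() calls");
        System.out.println("hash lookup: " + hashedCalls(set, target) + " equals() calls");
    }
}
```

With distinct hash codes, the hash lookup only calls equals() on entries whose hash matches, while the loop calls it once per element until it finds a match.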
I'd always go for option 1.
It's more concise, I can tell exactly what you're trying to do with the code, and chances are that containsObject: contains some pretty nifty optimisations.

Manipulating Objects in Methods instead of returning new Objects?

Let’s say I have a method that populates a list with some kind of objects. What are the advantages and disadvantages of the following method designs?
void populate (ArrayList<String> list, other parameters ...)
ArrayList<String> populate(other parameters ...)
Which one I should prefer?
This looks like a general issue about method design, but I couldn't find a satisfying answer on Google, probably because I wasn't using the right keywords.
The second one seems more functional and thread safe to me. I'd prefer it in most cases. (Like every rule, there are exceptions.)
The owner of the populate method could return an immutable List (why ArrayList?).
It's also thread safe if no state is modified in the populate method. Only passed-in parameters are used, and these can also be immutable.
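A minimal Java sketch of the second design along these lines: the method builds its own list, declares List rather than ArrayList as the return type, and hands back an unmodifiable view. The parameters prefix and count are hypothetical stand-ins for "other parameters ...". Since the method touches no shared state, concurrent calls each get their own list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PopulateExample {
    // The method builds and owns its result; callers get a read-only view.
    static List<String> populate(String prefix, int count) {
        List<String> result = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            result.add(prefix + i);
        }
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args) {
        List<String> items = populate("item-", 3);
        System.out.println(items); // [item-1, item-2, item-3]
        try {
            items.add("item-4");
        } catch (UnsupportedOperationException e) {
            System.out.println("caller cannot modify the returned list");
        }
    }
}
```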
Besides what @duffymo mentioned, the second one is easier to understand, and thus to use: it is obvious what its input and output are.
Advantages to the in-out parameter:
You don't have to create as many objects. In languages like C or C++, where allocation and deallocation can be expensive, that can be a plus. In Java/C#, not so much -- GC makes allocation cheap and deallocation all but invisible, so creating objects isn't as big a deal. (You still shouldn't create them willy-nilly, but if you need one, the overhead isn't as bad as in some manual-allocation languages.)
You get to specify the type of the list. Potential plus if you later need to pass that list to some other code you don't control.
Disadvantages:
Readability issues.
In almost all languages that support function arguments, the first case is assumed to mean "do something with the entries in this list". Modifying args violates the Principle of Least Astonishment. The second is assumed to mean "give me a list of stuff", which is what you're after.
Every time you say "ArrayList", or even "List", you take away a bit of flexibility. You add some overhead to your API. What if I don't want to create an ArrayList before calling your method? I shouldn't have to, if the method's whole purpose in life is to return me some entries. That's the API's job.
Encapsulation issues:
The method being passed a list to fill can't assume anything about that list (even that it's a list at all; it could be null).
The method passing the list can't guarantee anything about what the method does with it. If it's working correctly, sure, the API docs can say "this method won't destroy existing entries". But considering the chance of bugs, that may not be worth trusting. At least if the method returns its own list, the caller doesn't have to worry about what was in it before. And it doesn't have to worry about a bug from a thousand miles away corrupting data it should never have affected.
Thread safety issues.
The list could be locked by another thread, meaning that if we try to lock on it now, we could potentially lock up the app.
Or, if it's not locked, it could still be modified by another thread, in which case we're no less screwed, unless you're going to write extra code to handle concurrent-modification exceptions everywhere.
Returning a new list means every call to the method can have its own list. No thread can mess with another thread's return value, unless the class is very badly designed.
Side point: Being able to specify the type of the list often leads to dependencies on the type of the list. Notice how you're passing ArrayLists around everywhere. You're painting yourself into corners by saying "This is an ArrayList" when you don't need to, but when you're passing it to a dozen methods, that's a dozen methods you'll have to change. (Not entirely related, but only slightly tangential. You could change the types to List rather than ArrayList and get rid of this. But the more you're passing that list around, the more places you'll need to change.)
Short version: Unless you have a damn good reason, use the first syntax only if you're using the existing contents of the list in your method, i.e., if you're modifying it or doing something with the existing values. If you intend to return a list of entries, then return a List of entries.
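To illustrate this rule of thumb in Java: a method that works on the existing contents takes the list as a parameter, while a method that only produces entries returns a new list. Both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParameterStyles {
    // Works on the caller's existing contents, so taking the list as a
    // parameter (and modifying it) is the honest signature.
    static void removeBlankEntries(List<String> list) {
        list.removeIf(s -> s.trim().isEmpty());
    }

    // Only produces new entries, so it returns a fresh list instead of
    // asking the caller to supply one.
    static List<String> splitWords(String text) {
        return new ArrayList<>(Arrays.asList(text.trim().split("\\s+")));
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>(Arrays.asList("a", " ", "b", ""));
        removeBlankEntries(lines);
        System.out.println(lines);                          // [a, b]
        System.out.println(splitWords("  to be  or not ")); // [to, be, or, not]
    }
}
```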
The second method is the preferred way for many reasons.
Primarily, the function signature is clearer and shows what its intentions are.
It is actually recommended that you NEVER change the value of a parameter that is passed in to a function unless you explicitly mark it as an "out" parameter.
It will also be easier to use in expressions, and easier to change in the future, including moving to a more functional approach (for threading, etc.) if you want to.

Using ArrayList or HashMap

Hi, I have a question about whether to use an ArrayList or a HashMap.
I am trying to build a Paint program.
Each drawn object will be assigned a unique object ID.
If I want a fast retrieval speed when I click on an object, should I be using an arraylist or hashmap?
In general hashmap has O(1) while arraylist has O(n) retrieval speed.
However, I think that in my case, since clicking on an object gives me its ID, and hence its index in the array, I can do something like ArraylistObject.get(ithElement). So in this case, wouldn't this also be an O(1) retrieval?
Any input?
Thanks!
If objects have an ID that can be mapped 1-to-1 to an array index, then that will be O(1) access as well, and in practice will be slightly faster than a hashmap lookup (you don't have to compute the hash).
However, the issue is what happens when you delete an object: you will be left with a hole in the list. When creating new objects, you can either keep appending to the list and let it slowly become more fragmented, or try to find a spare slot, in which case you'll be doing an O(n) search for free space.
In short - a hashmap is probably more appropriate.
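A minimal Java sketch of the HashMap approach for the paint program; Shape, ShapeRegistry, and the method names are hypothetical. IDs come from a counter and are never reused, so deleting a shape just removes its entry without leaving a hole or shifting other shapes.

```java
import java.util.HashMap;
import java.util.Map;

public class ShapeRegistry {
    // Hypothetical shape type with only the fields the example needs.
    static final class Shape {
        final int id;
        final String kind;
        Shape(int id, String kind) { this.id = id; this.kind = kind; }
    }

    private final Map<Integer, Shape> byId = new HashMap<>();
    private int nextId = 0;

    Shape create(String kind) {
        Shape s = new Shape(nextId++, kind);
        byId.put(s.id, s);
        return s;
    }

    // Expected O(1): hash the ID and look in one bucket.
    Shape find(int id) {
        return byId.get(id);
    }

    // Removes the entry; no other shape moves.
    void delete(int id) {
        byId.remove(id);
    }

    int size() {
        return byId.size();
    }

    public static void main(String[] args) {
        ShapeRegistry r = new ShapeRegistry();
        Shape a = r.create("line");
        Shape b = r.create("circle");
        r.delete(a.id);
        System.out.println(r.find(b.id).kind); // circle
        System.out.println(r.find(a.id));      // null
    }
}
```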
On the plus side, you might be able to squeeze out a little extra performance by doing ArrayLists just right.
But deleting objects is going to be a royal pain - as Paolo and Anurag said, you'll either have to put in an empty placeholder (null?) or renumber some other object to fill the gap.
This is likely to result in performance bugs and plain old bugs.
HashMaps, on the other hand, are simple to use, with decent performance guaranteed (unless you allocate your IDs really badly).
And retrieving objects by id might not turn out to be your application's bottleneck at all. As the saying goes, premature optimization is the root of all evil.
If you can guarantee that the IDs will be in a relatively small numerical range, then you should use a plain array (with the size preinitialized to the maximum ID), rather than an ArrayList. That ensures that you don't accidentally remove entries and shift everything else to fill the gap, with everything ending up at a wrong index. A plain array will also be a bit faster than an ArrayList.
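The pitfall described here shows up in a small Java sketch, assuming (hypothetically) that IDs lie in 0..MAX_ID-1. Deleting from the plain array leaves a null slot, so every other object stays at the index equal to its ID, whereas ArrayList.remove(int) shifts the later elements left. The method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IdIndexedArray {
    // Assumption for this sketch: every ID is known to lie in 0..MAX_ID-1.
    static final int MAX_ID = 4;

    // Plain array sized to MAX_ID: deleting sets the slot to null,
    // and every other object stays at the index equal to its ID.
    static String lookupAfterArrayDelete(int deleteId, int lookupId) {
        String[] byId = new String[MAX_ID];
        byId[0] = "line";
        byId[1] = "circle";
        byId[2] = "rect";
        byId[3] = "text";
        byId[deleteId] = null;
        return byId[lookupId];
    }

    // ArrayList.remove(int) shifts later elements left, so after a removal
    // the element at a given index is no longer the object with that ID.
    static String lookupAfterListRemove(int deleteId, int lookupId) {
        List<String> byIndex = new ArrayList<>(Arrays.asList("line", "circle", "rect", "text"));
        byIndex.remove(deleteId);
        return byIndex.get(lookupId);
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterArrayDelete(1, 2)); // rect
        System.out.println(lookupAfterListRemove(1, 2));  // text (ID 3, not ID 2)
    }
}
```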
If you can't make such a guarantee, use a HashMap. It's very unlikely that the speed difference would be noticeable, and it will be easier to maintain.

Naming a dictionary structure that stores keys in a predictable order?

Note: Although my particular context is Objective-C, my question actually transcends programming language choice. Also, I tagged it as "subjective" since someone is bound to complain otherwise, but I personally think it's almost entirely objective. Also, I'm aware of this related SO question, but since this was a bigger issue, I thought it better to make this a separate question. Please don't criticize the question without reading and understanding it fully. Thanks!
Most of us are familiar with the dictionary abstract data type that stores key-value associations, whether we call it a map, dictionary, associative array, hash, etc. depending on our language of choice. A simple definition of a dictionary can be summarized by three properties:
Values are accessed by key (as opposed to by index, like an array).
Each key is associated with a value.
Each key must be unique.
Any other properties are arguably conveniences or specializations for a particular purpose. For example, some languages (especially scripting languages such as PHP and Python) blur the line between dictionaries and arrays and do provide ordering for dictionaries. As useful as this can be, such additions are not fundamental characteristics of a dictionary. In a pure sense, the actual implementation details of a dictionary are irrelevant.
For my question, the most important observation is that the order in which keys are enumerated is not defined — a dictionary may provide keys in whatever order it finds most convenient, and it is up to the client to organize them as desired.
I've created custom dictionaries that impose specific key orderings, including natural sorted order (based on object comparisons) and insertion order. It's obvious to name the former some variant on SortedDictionary (which I've actually already implemented), but the latter is more problematic. I've seen LinkedHashMap and LinkedMap (Java), OrderedDictionary (.NET), OrderedDictionary (Flash), OrderedDict (Python), and OrderedDictionary (Objective-C). Some of these are more mature, some are more proof-of-concept.
LinkedHashMap is named according to implementation, in the tradition of Java collections: "linked" because it uses a doubly-linked list to track insertion order, and "hash" because it subclasses HashMap. Besides the fact that users shouldn't need to worry about that, the class name doesn't really even indicate what it does. Using ordered seems like the consensus among existing code, but web searches on this topic also revealed understandable confusion between "ordered" and "sorted", and I feel the same. The .NET implementation even has a comment about the apparent misnomer, and suggests that it should be "IndexedDictionary" instead, owing to the fact that you can retrieve and insert objects at a specific point in the ordering.
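For reference, the two standard Java orderings side by side in a short sketch (the helper keysOf is hypothetical): TreeMap keeps keys in sorted order, and LinkedHashMap keeps them in insertion order. A plain HashMap makes no ordering promise at all, which is why it is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class KeyOrders {
    // Inserts the same keys in the same order and reports the iteration order.
    static String keysOf(Map<String, Integer> map) {
        map.put("pear", 3);
        map.put("apple", 1);
        map.put("fig", 2);
        return map.keySet().toString();
    }

    public static void main(String[] args) {
        // Sorted order (the SortedDictionary case).
        System.out.println(keysOf(new TreeMap<>()));       // [apple, fig, pear]
        // Insertion order, tracked with a doubly-linked list of entries.
        System.out.println(keysOf(new LinkedHashMap<>())); // [pear, apple, fig]
    }
}
```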
I'm designing a framework and APIs and I want to name the class as intelligently as possible. From my standpoint, indexed would probably work (depending on how people interpret it, and based on the advertised functionality of the dictionary), ordered is imprecise and has too much potential for confusion, and linked "is right out" (apologies to Monty Python). ;-)
As a user, what name would make the most sense to you? Is there a particular name that says exactly what the class does? (I'm not averse to using slightly longer names like InsertionOrderDictionary if appropriate.)
Edit: Another strong possibility (discussed in my answer below) is IndexedDictionary. I don't really like "insertion order" because it doesn't make sense if you allow the user to insert keys at a specific index, reorder the keys, etc.
I vote OrderedDictionary, for the following reasons:
"Indexed" is used in Cocoa classes in only one place; otherwise "Index" always appears as a noun (NSIndexSet, NSIndexPath, objectAtIndex:, etc.). The only instance where "Index" appears as a verb is NSPropertyDescription's "indexed" property: isIndexed and setIndexed:. NSPropertyDescription is roughly analogous to a table column in a database, where "indexing" means optimizing to speed up searches. Since NSPropertyDescription is part of the Core Data framework, it makes sense that "isIndexed" and "setIndexed:" correspond to an index in a SQL database. Calling it "IndexedDictionary" would therefore seem redundant: indices in databases are created to speed up lookups, but a dictionary already has O(1) (average) lookup time. On the other hand, "IndexDictionary" would also be a misnomer, since an "index" in Cocoa refers to position, not order. The two are semantically different.
I understand your concern over "OrderedDictionary", but the precedent has already been set in Cocoa. When users want to maintain a specific sequence, they use "ordered": -[NSApplication orderedDocuments], -[NSWindow orderedIndex], -[NSApplication orderedWindows], etc. So, John Pirie has mostly the right idea.
However, you don't want to make insertion into the dictionary a burden on your users. They'll want to create a dictionary once and then have it maintain an appropriate order. They won't even want to request objects in a specific order. Order specification should be done during initialization.
Therefore, I recommend making OrderedDictionary a class cluster, with private subclasses InsertionOrderDictionary, NaturalOrderDictionary, and CustomOrderDictionary. Then, the user simply creates an OrderedDictionary like so:
OrderedDictionary * dict = [[OrderedDictionary alloc] initWithOrder:kInsertionOrder];
//or kNaturalOrder, etc
For a CustomOrderDictionary, you could have them give you a comparison selector, or even (if they're running 10.6) a block. I think this would provide the most flexibility for future expansion while still maintaining an appropriate name.
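Java has no class clusters, but a static factory gives a rough analogue of this design: one public entry point that picks a concrete map based on the requested order. Order, create, and fill are hypothetical names; insertion order maps to LinkedHashMap, natural order to TreeMap, and a custom order to a TreeMap with a comparator (the counterpart of a comparison selector or block).

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class OrderedMaps {
    // Hypothetical counterparts of kInsertionOrder and kNaturalOrder.
    enum Order { INSERTION, NATURAL }

    // One public entry point; the concrete class is chosen from the
    // requested order, similar in spirit to a class cluster.
    static <K extends Comparable<? super K>, V> Map<K, V> create(Order order) {
        switch (order) {
            case INSERTION: return new LinkedHashMap<>();
            case NATURAL:   return new TreeMap<>();
            default: throw new IllegalArgumentException("unknown order: " + order);
        }
    }

    // Custom order supplied as a comparator.
    static <K, V> Map<K, V> create(Comparator<? super K> comparator) {
        return new TreeMap<>(comparator);
    }

    static String fill(Map<String, Integer> map) {
        map.put("kiwi", 1);
        map.put("fig", 2);
        map.put("banana", 3);
        return map.keySet().toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> insertion = create(Order.INSERTION);
        Map<String, Integer> natural = create(Order.NATURAL);
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        Map<String, Integer> custom = create(byLength);
        System.out.println(fill(insertion)); // [kiwi, fig, banana]
        System.out.println(fill(natural));   // [banana, fig, kiwi]
        System.out.println(fill(custom));    // [fig, kiwi, banana]
    }
}
```

One caveat with the comparator route: TreeMap treats keys that compare as equal as the same key, so a "by length" comparator would merge "fig" and "ant". A custom order probably needs a tie-breaker.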
I vote for InsertionOrderDictionary. You nailed it.
Strong vote for OrderedDictionary.
The word "ordered" means exactly what you are advertising: that in iterating through a list of items, there is a defined order to selection of those items. "Indexed" is an implementation word -- it talks more to how the ordering is achieved. Index, linked list, tree... the user doesn't care; that aspect of the data structure should be hidden. "Ordered" is the exact word for the additional feature you are offering, regardless of how you get it done.
Further, it seems like the choice of ordering could be at the user's option. Any reason why you couldn't create methods on your datatype that allow the user to switch from, say, alphabetical ordering to insertion-time ordering? In the default case, a user would choose a particular ordering and stick with it, in which case implementation would be no less efficient than if you created specialized subclasses for each ordering method. And in some less-used cases, the developer might actually wish to use any of a number of different orderings for the same data, depending on app context. (I can think of specific projects I've worked on where I would have loved to have such a data structure available.)
Call it OrderedDictionary, because that's precisely what it is. (Frankly, I have more of a problem with the use of the word "Dictionary", because that word heavily implies ordering, whereas popular implementations of such don't provide it, but that's my pet peeve. You really should just be able to say "Dictionary" and know that the ordering is alphabetical -- because that's what a dictionary IS -- but that argument is too late for existing implementations in the popular languages.) And allow users to access the entries in whatever order they choose.
Since posting this question, I'm starting to lean towards something like IndexedDictionary or IndexableDictionary. While it is useful to be able to maintain arbitrary key ordering, limiting that to insertion ordering only seems like a needless restriction. Plus, my class already supports indexOfKey: and keyAtIndex:, which are (purposefully) analogous to NSArray's indexOfObject: and objectAtIndex:. I'm strongly considering adding insertObject:forKey:atIndex:, which matches up with NSMutableArray's insertObject:atIndex:.
Everyone knows that inserting in the middle of an array is inefficient, but that doesn't mean we shouldn't be allowed to on the rare occasions that it's truly useful. (Besides, the implementation could secretly use a doubly-linked list or any other suitable structure for tracking the ordering if needed...)
The big question: is "indexed" or "indexable" as vague or potentially confusing as "ordered"? Would people think of database indexes, or book indexes, etc.? Would it be detrimental if they assumed it was implemented with an array, or might that simplify user understanding of the functionality?
Edit: This name makes even more sense given the fact that I'm considering adding methods that work with an NSIndexSet in the future. (NSArray has -objectsAtIndexes: as well as methods for adding/removing observers for objects at given indexes.)
What about KeyedArray?
As you said in your last paragraph, I think that InsertionOrder(ed)Dict(ionary) is pretty unambiguous; I don't see how it could be interpreted in any way other than that the keys would be returned in the order they were inserted.
By decoupling the indexed order from the insertion order, doesn't this simply boil down to keeping an array and Dictionary in a single object? I guess my vote for this type of object is IndexedKeyDictionary
In C#:
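using System;
using System.Collections.Generic;
public class IndexedKeyDictionary<TKey, TValue> {
    // Keys in their user-visible order; the dictionary handles lookup by key.
    private readonly List<TKey> _keys = new List<TKey>();
    private readonly Dictionary<TKey, TValue> _dictionary = new Dictionary<TKey, TValue>();
    // ...
    public TValue GetValueAtIndex(int index) {
        return _dictionary[_keys[index]];
    }
    public void Insert(TKey key, TValue val, int index) {
        if (index < 0 || index > _keys.Count)
            throw new ArgumentOutOfRangeException(nameof(index));
        _dictionary.Add(key, val); // throws if the key is already present
        _keys.Insert(index, key);  // shifts later keys one position to the right
    }
    public void SwapKeyIndexes(TKey k1, TKey k2) {
        // Swap the positions of k1 and k2; both must already be present.
        int i1 = _keys.IndexOf(k1);
        int i2 = _keys.IndexOf(k2);
        if (i1 < 0 || i2 < 0)
            throw new KeyNotFoundException();
        _keys[i1] = k2;
        _keys[i2] = k1;
    }
}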
What would be really cool is indexed values, so that we have a way to sort the values and get the new key order. For example, if the values were graph coordinates, we could read the keys (bin names) as we move up or down along the coordinate plane. What would you call that data structure? An IndexedValueDictionary?
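A simple way to get keys ordered by value, sketched in Java: sort the entries by value on demand and read off the keys. The method name keysSortedByValue is hypothetical; it re-sorts on every call (O(n log n)) rather than maintaining the order, and keys with equal values come out in no guaranteed order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class KeysByValue {
    // Returns the keys ordered by their associated values (ascending).
    static <K, V extends Comparable<? super V>> List<K> keysSortedByValue(Map<K, V> map) {
        return map.entrySet().stream()
                .sorted(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Double> bins = new HashMap<>();
        bins.put("high", 9.5);
        bins.put("low", 0.5);
        bins.put("mid", 4.0);
        System.out.println(keysSortedByValue(bins)); // [low, mid, high]
    }
}
```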
At first glance I'm with the first reply (InsertionOrderDictionary), though it's a bit ambiguous what "InsertionOrder" means.
What you're describing sounds to me almost exactly like a C++ STL map. From what I understand, a map is a dictionary that has additional rules, including ordering. The STL simply calls it "map", which I think is fairly apt. The trick with map is you can't really give the inheritance a nod without making it redundant -- i.e. "MapDictionary". That's just too redundant. "Map" is a bit too basic and leaves a lot of room for misinterpretation.
Though "CHMap" might not be a bad choice after looking at your documentation link.
Maybe "CHMappedDictionary"? =)
Best of luck.
Edit: Thanks for the clarification, you learn something new every day. =)
Is the only difference that allKeys returns keys in a specific order? If so, I would simply add allKeysSorted and allKeysOrderedByInsertion methods to the standard NSDictionary API.
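A Java sketch of that idea: keep entries in insertion order with a LinkedHashMap and sort only when asked. DualOrderMap and its methods are hypothetical names modeled on the suggestion. Note that re-putting an existing key does not change its insertion position in a LinkedHashMap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper: stores entries in insertion order and sorts keys on request.
public class DualOrderMap<K extends Comparable<? super K>, V> {
    private final Map<K, V> entries = new LinkedHashMap<>();

    public void put(K key, V value) { entries.put(key, value); }

    public V get(K key) { return entries.get(key); }

    public List<K> allKeysOrderedByInsertion() {
        return new ArrayList<>(entries.keySet());
    }

    public List<K> allKeysSorted() {
        List<K> keys = new ArrayList<>(entries.keySet());
        Collections.sort(keys); // natural ordering of the keys
        return keys;
    }

    public static void main(String[] args) {
        DualOrderMap<String, Integer> m = new DualOrderMap<>();
        m.put("pear", 3);
        m.put("apple", 1);
        m.put("fig", 2);
        System.out.println(m.allKeysOrderedByInsertion()); // [pear, apple, fig]
        System.out.println(m.allKeysSorted());             // [apple, fig, pear]
    }
}
```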
What is the goal of this insertion order dictionary? What benefits does it give the programmer vs. an array?

What sort function is used in NSArray?

This is really a three-part question, but I've answered the first question myself:
I'm working on the iPhone, with many objects (up to 200) on the screen. Each object needs to check whether it's overlapping other objects, and act accordingly. My original naive implementation was for each object to run through the list of every other object to check their bounding boxes (using CGRectIntersectsRect).
So my question (and answer) is what's a better method? My new implementation is to sort the array every frame using an insertion sort (since the data will be mostly sorted already) on the y-position of each object, then check only the nearest object on either side of the one searching, to see if it's in range vertically, then check horizontally.
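A Java sketch of the sort-then-sweep idea, with one adjustment stated plainly: checking only the nearest object on each side can miss overlaps when several objects overlap vertically (for example, three stacked boxes), so this version keeps scanning forward while the next box can still overlap in y. Box and the method names are hypothetical, and intersects() uses strict inequalities, so touching edges do not count as overlapping.

```java
public class SortAndSweep {
    // Hypothetical axis-aligned box: (x, y) is the top-left corner.
    static final class Box {
        final double x, y, w, h;
        Box(double x, double y, double w, double h) {
            this.x = x; this.y = y; this.w = w; this.h = h;
        }
        boolean intersects(Box o) {
            return x < o.x + o.w && o.x < x + w && y < o.y + o.h && o.y < y + h;
        }
    }

    // Insertion sort by y: close to linear time when the array is nearly
    // sorted, because each element only moves a few slots.
    static void insertionSortByY(Box[] boxes) {
        for (int i = 1; i < boxes.length; i++) {
            Box cur = boxes[i];
            int j = i - 1;
            while (j >= 0 && boxes[j].y > cur.y) {
                boxes[j + 1] = boxes[j];
                j--;
            }
            boxes[j + 1] = cur;
        }
    }

    // After sorting, compare each box only with the following boxes whose
    // top edge is still above its bottom edge; later boxes cannot overlap it.
    static int countOverlaps(Box[] boxes) {
        insertionSortByY(boxes);
        int count = 0;
        for (int i = 0; i < boxes.length; i++) {
            for (int j = i + 1; j < boxes.length && boxes[j].y < boxes[i].y + boxes[i].h; j++) {
                if (boxes[i].intersects(boxes[j])) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Box[] stacked = { new Box(0, 10, 10, 10), new Box(0, 0, 10, 30), new Box(0, 5, 10, 10) };
        System.out.println(countOverlaps(stacked)); // 3
    }
}
```

For comparison, Java's own object sort (Arrays.sort / List.sort) uses TimSort, which also runs quickly on nearly sorted input; a hand-written insertion sort mainly buys control and predictability.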
First question: Is insertion sort the method I want to use for an array of objects that tend to move around randomly but only to a small extent so they mostly stay in order based on the last frame? Also: what sort algorithm does the NSArray use when I call
- sortedArrayUsingSelector:
I would sort of assume that it uses a quick sort, since it's the most useful in the general case. Does anybody know if I'm wrong? Does anybody know if I can change the sort method or if I will have to write my own sorting function?
Second question: is there a function for retrieving items from a sorted array using a binary search, rather than the naive approach that I assume is used by
- indexOfObject:
or would I have to write my own?
NSArray uses many different data structures internally depending on how many objects are in the array. See a Peter Ammon blog entry for more information. But basically this means you can't expect a certain kind of sort to happen. Sometimes it is worth it to write your own array implementation using C arrays so you can control the sorts yourself.
There are definitely much faster ways to implement collision detection. Look into spatial data structures such as bounding volume hierarchies or k-d trees.
As far as I know indexOfObject: is the only way, but it's potentially not as dumb as you think. Everything is hashable for NSDictionary so they can use some of those smarts in NSArray.
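Writing your own binary search is short. Here is a Java sketch (indexOfSorted is a hypothetical name) that takes the same comparator used to sort the list, alongside Java's built-in Collections.binarySearch for comparison. On the Cocoa side, later Foundation releases added -indexOfObject:inSortedRange:options:usingComparator:, which performs a binary search on an NSArray.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortedLookup {
    // Binary search over a list sorted by `order`.
    // Returns the index of an element equal to `key` under `order`, or -1.
    static <T> int indexOfSorted(List<T> sorted, T key, Comparator<? super T> order) {
        int lo = 0, hi = sorted.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            int c = order.compare(sorted.get(mid), key);
            if (c < 0) lo = mid + 1;
            else if (c > 0) hi = mid - 1;
            else return mid;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ant", "bee", "cat", "dog", "eel");
        Comparator<String> natural = Comparator.naturalOrder();
        System.out.println(indexOfSorted(words, "dog", natural)); // 3
        System.out.println(indexOfSorted(words, "fox", natural)); // -1
        // Built-in equivalent; for a missing key it returns a negative
        // insertion point rather than -1.
        System.out.println(Collections.binarySearch(words, "dog")); // 3
    }
}
```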