When do I keep a map<Identifier, Object> vs. a Collection<Object with identifier as field>?

There is one question that I often ask myself while designing a program, and I am never quite sure how to answer it.
Let's say I have an object with multiple fields, one of which serves as that object's identifier. Let's also say that I need to keep track of a List of such objects somewhere else.
I see at least three options for how to go about it:
1. Have my object contain its own identifier along with all its other fields, and use a simple array (or any simple list collection) of these objects wherever I need it. To find one specific object, I loop through the list and check for identifier equality.
Pros: 1. "Clarity" for each object instance. 2. ?
Cons: Manipulating a collection of these objects gets annoying.
2. Have my object contain all fields except its identifier, and use a Map with the identifier as key and the object as value. To find one specific object, I just look up the identifier in the map.
Pros: easy lookups and insertions, ?
Cons: the object instance itself doesn't know what it is, ?
3. A combination of both: use a map with the identifier as key and, as value, an object that also holds its own identifier as a field.
Pros: those mentioned above.
Cons: looks redundant to me.
What situations call for which option? Take the standard hello-world example of networking, a chat server: how would I handle the multiple "groups/channels" people are in?
What about other applications?

Your question is very broad and actually contains two questions.
The first is “Which data structure is better, dictionary or list?”. The answer: it depends on the performance you need for insertion and search. Basically, if you only need to iterate through the collection, a list is fine; if you need fast lookup by key, a dictionary is better. A dictionary has more memory overhead than a list.
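A minimal Python sketch of that trade-off, assuming a hypothetical `Channel` record (from the chat-server example) that carries its own `id`: scanning a list costs O(n) per lookup, while a dict built once over the same objects gives average O(1) lookup at the cost of extra memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Channel:
    """Hypothetical record: the object carries its own identifier."""
    id: str
    topic: str


def find_in_list(channels: list[Channel], channel_id: str) -> Optional[Channel]:
    # Option 1: scan the list and compare identifiers, O(n) per lookup.
    for channel in channels:
        if channel.id == channel_id:
            return channel
    return None


def find_in_dict(channels: dict[str, Channel], channel_id: str) -> Optional[Channel]:
    # Map-based options: hash lookup by key, O(1) on average.
    return channels.get(channel_id)


channels = [Channel("general", "Anything goes"), Channel("dev", "Code talk")]
by_id = {c.id: c for c in channels}  # build the index once

print(find_in_list(channels, "dev").topic)  # Code talk
print(find_in_dict(by_id, "dev").topic)     # Code talk
```

Both functions return the very same object; the dict is only an extra index over it.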
The second is “Do I need an Id field inside an entity, or can I use the built-in hash code?”. The answer: it depends on how you will use your object. If you want the Id only to store the object in a dictionary, then most likely you can go with the hash code. There is nothing wrong with storing an entity's Id inside that entity. Whether you use an Id or a hash code, you need to be sure that the entity is uniquely identified by it; that's the main concern. (Bear in mind that hash codes in general are not guaranteed to be unique, which is why dictionaries also compare keys for equality.)
You can override the GetHashCode method to return your entity's Id (together with Equals, so the two stay consistent). You sometimes see such an implementation when a hash code is required for a collection and an Id is required for the database.
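The C# advice above has a close Python analogue: define `__eq__` and `__hash__` from the id, so two instances with the same id count as the same key. This is a sketch with a hypothetical `User` class; as with Equals and GetHashCode in C#, the two methods must agree.

```python
class User:
    """Hypothetical entity whose identity is defined solely by its id."""

    def __init__(self, user_id: int, name: str) -> None:
        self.id = user_id
        self.name = name

    def __eq__(self, other: object) -> bool:
        # Two User objects represent the same entity if their ids match.
        return isinstance(other, User) and self.id == other.id

    def __hash__(self) -> int:
        # Must be consistent with __eq__: equal objects, equal hashes.
        return hash(self.id)


members = {User(1, "alice"), User(2, "bob")}
print(User(1, "renamed alice") in members)  # True: membership is by id, not name
```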
So in the end it doesn't really matter which you choose, as long as both approaches work for you right now.

A hash-based map<Identifier, Object> offers O(1) average-case performance when retrieving an object by its identifier. There certainly are situations where you want that.
However, in other cases it might be redundant to use this approach. It all depends on the situation at hand.

Two guidelines may answer this question:
A use case that calls for a lookup where there is an expectation of a 1:1 relationship between the key and value implies a Map structure.
OOP implies that a key so closely related to an object that it is used to perform lookups should be encapsulated within that object.
Regarding the question of redundancy, consider that the key in a map is nothing but an index. Indexes are as common in data as they are in books.
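Following the "key is just an index" view, here is a sketch of option 3 in which the object keeps its own id and the map key is always derived from it, so the key and the field cannot drift apart. `ChannelRegistry` and `Channel` are hypothetical names chosen for the chat-server example in the question.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    id: str
    members: set[str] = field(default_factory=set)


class ChannelRegistry:
    """Index over Channel objects, keyed by each channel's own id."""

    def __init__(self) -> None:
        self._by_id: dict[str, Channel] = {}

    def add(self, channel: Channel) -> None:
        # The key is taken from the object, never passed in separately.
        if channel.id in self._by_id:
            raise KeyError(f"duplicate channel id: {channel.id}")
        self._by_id[channel.id] = channel

    def get(self, channel_id: str) -> Channel:
        return self._by_id[channel_id]

    def channels_of(self, user: str) -> list[str]:
        # The values can still be iterated like a plain collection.
        return [c.id for c in self._by_id.values() if user in c.members]


registry = ChannelRegistry()
registry.add(Channel("general", {"alice", "bob"}))
registry.add(Channel("dev", {"alice"}))
print(registry.get("dev").id)         # dev
print(registry.channels_of("alice"))  # ['general', 'dev']
```

Because `add` takes only the object, callers cannot store a channel under the wrong key, which is what keeps the apparent redundancy harmless.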

Related

Use structures to group individual attributes or not?

I'm unsure how to get the best out of ABAP structures and class attributes.
Let's say that I have the object Operation with 4 fields: operation id, type, description and date.
Now I can create a class with these 4 attributes, but then, if I want a constructor, I need either 4 individual parameters or a structure that has to be mapped to each attribute. The same happens if I want to get all of this object's data in one structure, for instance to return it via RFC: a method get_operation_details( ) would need to map the attributes one by one.
If I use a structure type ty_operation_details as a single class attribute, then adding a field to the structure keeps the constructor valid, and the get_operation_details( ) method would always be OK as well. However, it seems wrong to write something like Operation->get_details( )-operationID instead of operation->operation_ID, as I could if the attribute were directly in the public section with READ-ONLY. I guess the first approach is more correct in the OO world, but we lose some of ABAP's benefits.
What do you recommend? One thing that would allow the first option while still using structures would be a CORRESPONDING statement able to map class attributes to a flat structure, but I don't think this is possible.
Like most things, your design should follow your usage. If you primarily use a set of attributes together, consider grouping them in a structure. If you primarily use them individually, or in varying recombinations, keep them separate.
Some considerations:
Grouping makes calls shorter if you always create/update/delete a set of attributes together. You already identified this advantage.
Grouping reveals logical relations between fields, that are not clear when keeping the fields separate. For example, this could reveal that one part of your parameters is mandatory, while the rest forms several optional sets.
Grouping simplifies features that operate on state, such as the Memento or Flyweight pattern, because it lets you extract, store, and restore the object's state as a single structure.
Also, as with many other things, there may be a benefit in turning this either-or question into an "I'll simply use both". For example, if your class has four individual attributes, why not also offer a method that sets or gets them as a structure? Of course, this adds some mapping, but the mapping remains encapsulated within your own class, while consumers get an easy-to-consume interface.
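The "use both" suggestion can be sketched in Python to show the design only (ABAP syntax is different): individual attributes for callers who want `operation.operation_id`, plus a method that returns the same state as one flat structure and a constructor that accepts it. `OperationDetails` stands in for the question's `ty_operation_details`, and `op_type` replaces `type` to avoid shadowing a Python builtin; both are illustrative assumptions.

```python
import datetime
from typing import NamedTuple


class OperationDetails(NamedTuple):
    """Flat structure, analogous to ty_operation_details."""
    operation_id: int
    op_type: str
    description: str
    date: datetime.date


class Operation:
    def __init__(self, operation_id: int, op_type: str, description: str,
                 date: datetime.date) -> None:
        # Individual attributes, read directly as operation.operation_id.
        self.operation_id = operation_id
        self.op_type = op_type
        self.description = description
        self.date = date

    @classmethod
    def from_details(cls, details: OperationDetails) -> "Operation":
        # Structure -> attributes mapping, kept inside the class.
        return cls(*details)

    def get_details(self) -> OperationDetails:
        # Attributes -> structure mapping, e.g. to return via RFC
        # or to keep as a Memento-style snapshot.
        return OperationDetails(self.operation_id, self.op_type,
                                self.description, self.date)


op = Operation(42, "TRANSFER", "Monthly rent", datetime.date(2024, 1, 31))
snapshot = op.get_details()
restored = Operation.from_details(snapshot)
print(restored.operation_id == op.operation_id)  # True
```

When a field is added, only `OperationDetails`, `__init__`, and `get_details` change, and all three live in the same place.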

domain modeling naming problem

There are some simple entities in an application (e.g. containing only an id and a title) which rarely change and are referenced by the more complex entities of the application. These are usually entities such as Country, City, Language, etc.
What are these called? I've used the following names for them in the past, but I'm not sure which is best:
reference data
lookup values
dictionaries
thanks
You tagged with "ddd", so assuming that you are looking for a more Domain-Driven Design approach, drop the identifier on these objects and treat them like Value Objects.
The reason you might consider dropping the identifier is that it adds unneeded complexity to the problem domain. For example, I assume you have a "Country" table in your implementation? You would still have it, but it wouldn't be a referential lookup; you would use it purely as "reference data". Load it up front for scenarios where it needs to be referenced, for example when your UI binds it to a dropdown list.
When the entity is saved or updated, you store the value of the object, hence the "value" "object". If the user changes it to another value, no problem: just update the value. That is one less associative lookup to make during CRUD operations, which makes the overall model less complex.
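A minimal sketch of the value-object approach, assuming hypothetical `Country` and `Customer` types: `Country` is an immutable frozen dataclass compared by value, and the entity stores the value itself rather than a foreign-key id. The reference list is hard-coded here in place of loading it up front.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Country:
    """Value object: no id, equality is by value, instances are immutable."""
    code: str
    name: str


# Reference data loaded up front (hard-coded here for illustration).
COUNTRIES = [Country("FR", "France"), Country("JP", "Japan")]


@dataclass
class Customer:
    """Entity: has an identity and holds a Country value, not a country id."""
    customer_id: int
    country: Country


customer = Customer(1, COUNTRIES[0])
# Changing the country just replaces the value; no lookup table involved.
customer.country = Country("JP", "Japan")
print(customer.country == COUNTRIES[1])  # True: equal by value
```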
I would say Reference Data

DDD: Should everything fit into either Entity or Value Object?

I'm trying to follow DDD, or at least my limited understanding of it.
I'm having trouble fitting a few things into the DDD boxes though.
An example: I have a User Entity. This user Entity has a reference to a UserPreferencesInfo object - this is just a class which contains a bunch of properties regarding user preferences. These properties are fairly unrelated, other than the fact that they are all user preferences (unlike say an Address VO, where all the properties form a meaningful whole).
Question is - what is this UserPreferencesInfo object?
1) Obviously it's not an Entity (I'm just storing it as a 'component' in Fluent NHibernate speak, i.e. in the same DB table as the User entity).
2) A VO? I understand that Value Objects are supposed to be immutable (so you can't change them, just new them up). This makes complete sense when the object is an address, for instance (the address properties form a meaningful 'whole'). But in the case of UserPreferencesInfo I don't think it makes sense. There could be 100 properties (realistically, maybe 20) on this object; why would I want to discard and recreate the object whenever I need to change one property?
I feel like I need to break the rules here to get what I need, but I don't really like the idea of that (it's a slippery slope!). Am I missing something here?
Thanks
Answer 1 (the practical one)
I'm a huge proponent of DDD, but don't force it. You've already recognised that immutable VOs add more work than is required. DDD is designed to harness complexity, but in this case there is very little complexity to manage.
I would simply treat UserPreferencesInfo as an Entity, and reference it from the User aggregate. Whether you store it as a Component or in a separate table is your choice.
IMHO, the whole Entity vs VO debate can be rendered moot. It's highly unlikely that in 6 months time, another developer will look at your code and say "WTF! He's not using immutable VOs! What the heck was he thinking!!".
Answer 2 (the DDD purist)
Is UserPreferencesInfo actually part of the business domain? Others have mentioned dissecting this object. But if you stick to pure DDD, you might need to determine which preferences belong to which Bounded Context.
This in turn could lead to adding Service Layers, and before you know it, you've over-engineered the solution for a very simple problem...
Here's my two cents. Short answer: UserPreferenceInfo is a value object because it describes the characteristics of an object. It's not an entity because there's no need to track an object instance over time.
Longer answer: an object with 100+ properties which are not related is not very DDD-ish. Try to group related properties together to form new VOs or you might discover new entities as well.
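To illustrate grouping related properties into smaller value objects, here is a sketch with hypothetical preference groups (the names and defaults are assumptions, not from the question). The value objects are frozen, so changing one property means creating a new value, but `dataclasses.replace` reduces that to a single expression, which speaks to the question's worry about discarding and recreating the object.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class NotificationPrefs:
    email_enabled: bool = True
    digest_frequency: str = "weekly"


@dataclass(frozen=True)
class DisplayPrefs:
    theme: str = "light"
    language: str = "en"


@dataclass(frozen=True)
class UserPreferences:
    """Composed of small, cohesive value objects instead of 20+ flat fields."""
    notifications: NotificationPrefs = field(default_factory=NotificationPrefs)
    display: DisplayPrefs = field(default_factory=DisplayPrefs)


class User:
    def __init__(self, user_id: int) -> None:
        self.id = user_id
        self.preferences = UserPreferences()

    def switch_to_dark_theme(self) -> None:
        # Build a new value; every other field is copied over unchanged.
        self.preferences = replace(
            self.preferences,
            display=replace(self.preferences.display, theme="dark"),
        )


user = User(7)
user.switch_to_dark_theme()
print(user.preferences.display.theme)     # dark
print(user.preferences.display.language)  # en
```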
Another DDD smell is having a lot of settable properties in the first place. Try to capture the essence of the action instead of only setting the value. Example:
// not ddd
employee.Salary = newSalary;
// more ddd
employee.GiveRaise(newSalary);
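A slightly fuller sketch of the same idea in Python, with a hypothetical business rule added for illustration: the intention-revealing method is the one place that can enforce that a raise actually increases the salary, something a bare setter cannot express.

```python
from decimal import Decimal


class Employee:
    def __init__(self, name: str, salary: Decimal) -> None:
        self.name = name
        self._salary = salary

    @property
    def salary(self) -> Decimal:
        # Read-only from the outside: there is no salary setter.
        return self._salary

    def give_raise(self, new_salary: Decimal) -> None:
        # Hypothetical invariant: a raise must increase the salary.
        if new_salary <= self._salary:
            raise ValueError("a raise must increase the salary")
        self._salary = new_salary


employee = Employee("Ada", Decimal("5000"))
employee.give_raise(Decimal("5500"))
print(employee.salary)  # 5500
```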
On the other hand, you may very well have legitimate reasons for a bunch of properties that are no more than getters and setters. But then there are probably simpler methods than DDD to solve the problem. There's nothing wrong with taking the best patterns and ideas from DDD while relaxing a little on all the "rules", especially for simpler domains.
I'd say a UserPreferenceInfo is actually a part of the User aggregate root. It should be the responsibility of the UserRepository to persist the User Aggregate Root.
Value objects only need to be newed up (in your object model) when their values are shared. A sample scenario would be checking for an existing, equal UserPreferenceInfo and associating the User with it instead of inserting a new one every time. Sharing value objects makes sense if value object tables would get too large and raise speed/storage concerns. The price for sharing is paid on insert.
It is reasonable to abstract this procedure in the DAL.
If you are not sharing value objects, there is nothing wrong with updating them.
As far as I understand, UserPreferenceInfo is a part of User entity. Ergo User entity is an Aggregate root which is retrieved or saved using UserRepository as a whole, along with UserPreferenceInfo and other objects.
Personally, I think that UserPreferenceInfo is entity type, since it has identity - it can be changed, saved and retrieved from repository and still be regarded as the same object (i.e. has identity). But it depends on your usage of it.
IMHO it doesn't matter how the object is represented in the DAL, whether it is stored in a separate table or as part of another table. One of the benefits of DDD is persistence ignorance, which is usually a good thing.
Of course, I may be wrong, I am new to DDD too.
Question is - what is this UserPreferencesInfo object?
I don't know how NHibernate supports this case, but some ORMs have a special concept for it. For example, DataObjects.Net includes a Structures concept. It seems you need something like this in NH.
First time ever posting on a blog. Hope I do it right.
Anyway, since you haven't shown us the UserPreferencesInfo object, I'm not sure how it's constructed such that it can hold a variable number of things.
If it were me, I'd make a single class called UserPreference, with id, userid, key, value, displaytype, and whatever other fields you need. This is an entity: it has an id and is tied to a certain user.
Then, in your User entity (the root, I'm assuming), have an ISet<UserPreference>.
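A sketch of that key/value design, with field names adapted from the answer (`id`, `user_id`, `key`, `value`, `display_type`) and a Python `set` standing in for NHibernate's ISet; the `get_preference` helper is a hypothetical convenience, not part of the answer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class UserPreference:
    """Entity: identity is the id, so equality and hashing use the id only."""
    id: int
    user_id: int
    key: str
    value: str
    display_type: str = "text"

    def __eq__(self, other: object) -> bool:
        return isinstance(other, UserPreference) and self.id == other.id

    def __hash__(self) -> int:
        return hash(self.id)


@dataclass
class User:
    """Aggregate root holding its preferences as a set of entities."""
    id: int
    preferences: set[UserPreference] = field(default_factory=set)

    def get_preference(self, key: str) -> Optional[str]:
        for pref in self.preferences:
            if pref.key == key:
                return pref.value
        return None


user = User(1)
user.preferences.add(UserPreference(10, 1, "theme", "dark"))
user.preferences.add(UserPreference(11, 1, "language", "en"))
print(user.get_preference("theme"))  # dark
```

A new preference is just a new row, so the User class never grows a property per setting.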
100 properties sounds like a lot.
Try breaking UserPreferenceInfo up into smaller (more cohesive) types, which likely/hopefully are manageable as VOs.

Reading a pointer from XML without being sure the relevant Obj-C instance exists

I have a "parent" Obj-C object containing (in a collection) a bunch of objects whose instance variables point to one another, possibly circularly (fear not, no retaining going on between these "siblings"). I write the parent object to XML, which of course involves (among other things) writing out its "children", in no particular order, and due to the possible circularity, I replace these references between the children with unique IDs that each child has.
The problem is reading this XML back in... as I create one "child", I come across an ID, but there's no guarantee the object it refers to has been created yet. Since the references are possibly circular, there isn't even an order in which to read them that solves this problem.
What do I do? My current solution is to replace (in the actual instance variables) the references with strings containing the unique IDs. This is nasty, though, because to use these instance variables, instead of something like [oneObject aSibling] I now have to do something like [theParent childWithID:[oneObject aSiblingID]]. I suppose I could create an aSibling method to simplify things, but it feels like there's a cleaner way than all this. Is there?
This sounds an awful lot like you are reinventing NSCoding, which already handles circular references, etc. Now, there might be a good reason to reinvent that wheel; only you can answer that question.
In any case, it sounds like you want a two-pass unarchival process.
Pass 1: Grab all the objects out of the backing store and reconstitute. As each object comes out, shove it in a dictionary or map with the UID as the key. Whenever an object contains a UID, register the object as needing to be fixed up; add it to a set or array that you keep around during unarchival.
Pass 2: Walk the set or array of objects that need to be fixed up and fix 'em up, replacing the UIDs with objects from the map you built in pass #1.
I hit a bit of parse error on that last paragraph. Assuming your classes are sensibly declared, they ought to be able to repair themselves on the fly.
(All things considered, this is exactly the kind of data structure that is much easier to implement in a GC'd environment. If you are targeting Mac OS X, not the iPhone, turning on GC is going to make your life easier, most likely)
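The two-pass process described above can be sketched in Python, assuming a hypothetical archived format in which each child is a dict with a `uid` and a `sibling` field holding another child's uid (or None). Pass 1 builds every object and records which ones need fixing up; pass 2 swaps the uids for real references, so circular links pose no problem.

```python
from typing import Optional


class Child:
    def __init__(self, uid: str) -> None:
        self.uid = uid
        self.sibling: Optional["Child"] = None  # real reference after pass 2


def unarchive(records: list[dict]) -> dict[str, Child]:
    by_uid: dict[str, Child] = {}
    needs_fixup: list[tuple[Child, str]] = []

    # Pass 1: reconstitute every object and remember pending references.
    for record in records:
        child = Child(record["uid"])
        by_uid[child.uid] = child
        if record["sibling"] is not None:
            needs_fixup.append((child, record["sibling"]))

    # Pass 2: every object now exists, so each uid can be resolved.
    for child, sibling_uid in needs_fixup:
        child.sibling = by_uid[sibling_uid]

    return by_uid


# "a" and "b" point at each other: a cycle no reading order could satisfy.
children = unarchive([
    {"uid": "a", "sibling": "b"},
    {"uid": "b", "sibling": "a"},
    {"uid": "c", "sibling": None},
])
print(children["a"].sibling.sibling is children["a"])  # True
```

After unarchiving, callers use `child.sibling` directly, so no `childWithID:`-style indirection remains in the object model.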
Java's serialization process does much the same thing. Every object it writes out, it puts in a 'previously seen objects' table. When it comes to writing out a subsequent reference to an object it has seen before, it writes out a code indicating that it's a previously seen object from the table. On reading, the reverse happens: whenever it sees such a reference, it replaces it on the fly with the instance read earlier.
This approach means you don't have to use the map for all instances; the substitution happens only for objects you see a second time. However, you still need to be able to uniquely reference the first instance you've written; whether that's done with some pointer into the data structure or otherwise depends on what you're writing.
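The back-reference technique can be sketched as follows, using a hypothetical tuple-based stream rather than Java's actual wire format: the writer numbers each object the first time it emits it and writes a `("ref", n)` handle for later occurrences; the reader keeps a parallel table and resolves handles on the fly. Each object is registered before its fields are processed, which is what makes cycles work. The sketch is recursive, so very long chains would hit Python's recursion limit.

```python
from typing import Optional


class Node:
    def __init__(self, name: str, next_node: Optional["Node"] = None) -> None:
        self.name = name
        self.next = next_node


def serialize(root: Optional[Node]) -> list[tuple]:
    out: list[tuple] = []
    seen: dict[int, int] = {}  # id(object) -> handle number

    def write(node: Optional[Node]) -> None:
        if node is None:
            out.append(("null",))
        elif id(node) in seen:
            out.append(("ref", seen[id(node)]))  # previously seen object
        else:
            seen[id(node)] = len(seen)           # register before its fields
            out.append(("new", node.name))
            write(node.next)

    write(root)
    return out


def deserialize(stream: list[tuple]) -> Optional[Node]:
    tokens = iter(stream)
    table: list[Node] = []  # handle number -> object

    def read() -> Optional[Node]:
        token = next(tokens)
        if token[0] == "null":
            return None
        if token[0] == "ref":
            return table[token[1]]               # substitute on the fly
        node = Node(token[1])
        table.append(node)                       # register before reading fields
        node.next = read()
        return node

    return read()


a = Node("a")
b = Node("b", a)
a.next = b  # cycle: a -> b -> a
stream = serialize(a)
print(stream)  # [('new', 'a'), ('new', 'b'), ('ref', 0)]
copy = deserialize(stream)
print(copy.next.next is copy)  # True
```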

Naming a dictionary structure that stores keys in a predictable order?

Note: Although my particular context is Objective-C, my question actually transcends programming language choice. Also, I tagged it as "subjective" since someone is bound to complain otherwise, but I personally think it's almost entirely objective. Also, I'm aware of this related SO question, but since this was a bigger issue, I thought it better to make this a separate question. Please don't criticize the question without reading and understanding it fully. Thanks!
Most of us are familiar with the dictionary abstract data type that stores key-value associations, whether we call it a map, dictionary, associative array, hash, etc. depending on our language of choice. A simple definition of a dictionary can be summarized by three properties:
Values are accessed by key (as opposed to by index, like an array).
Each key is associated with a value.
Each key must be unique.
Any other properties are arguably conveniences or specializations for a particular purpose. For example, some languages (especially scripting languages such as PHP and Python) blur the line between dictionaries and arrays and do provide ordering for dictionaries. As useful as this can be, such additions are not fundamental characteristics of a dictionary. In a pure sense, the actual implementation details of a dictionary are irrelevant.
For my question, the most important observation is that the order in which keys are enumerated is not defined — a dictionary may provide keys in whatever order it finds most convenient, and it is up to the client to organize them as desired.
I've created custom dictionaries that impose specific key orderings, including natural sorted order (based on object comparisons) and insertion order. It's obvious to name the former some variant on SortedDictionary (which I've actually already implemented), but the latter is more problematic. I've seen LinkedHashMap and LinkedMap (Java), OrderedDictionary (.NET), OrderedDictionary (Flash), OrderedDict (Python), and OrderedDictionary (Objective-C). Some of these are more mature, some are more proof-of-concept.
LinkedHashMap is named according to its implementation, in the tradition of Java collections: "linked" because it uses a doubly-linked list to track insertion order, and "hash" because it subclasses HashMap. Besides the fact that the user shouldn't need to worry about that, the class name doesn't really indicate what it does. Using "ordered" seems to be the consensus among existing code, but web searches on this topic also revealed understandable confusion between "ordered" and "sorted", and I feel the same. The .NET implementation even has a comment about the apparent misnomer, suggesting that it should be "IndexedDictionary" instead, owing to the fact that you can retrieve and insert objects at a specific point in the ordering.
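To make the "linked" + "hash" naming concrete, here is a simplified Python sketch of that idea (not Java's actual source): a hash table maps each key to a node of a doubly-linked list with a sentinel, so lookup and removal stay O(1) while iteration follows insertion order. Python's built-in dict already preserves insertion order (since 3.7), so this only illustrates the structure; the sketch never relies on the dict's own ordering. As in Java's LinkedHashMap, re-inserting an existing key keeps its original position.

```python
from typing import Any, Iterator, Optional


class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key: Any = None, value: Any = None) -> None:
        self.key = key
        self.value = value
        self.prev: Optional["_Node"] = None
        self.next: Optional["_Node"] = None


class LinkedHashMapSketch:
    """Hash index for O(1) lookup plus a doubly-linked list for insertion order."""

    def __init__(self) -> None:
        self._index: dict = {}  # key -> node; used only as a hash table here
        self._head = _Node()    # sentinel: head.next is the oldest entry
        self._head.prev = self._head.next = self._head

    def __setitem__(self, key: Any, value: Any) -> None:
        node = self._index.get(key)
        if node is not None:
            node.value = value  # re-insertion keeps the original position
            return
        node = _Node(key, value)
        last = self._head.prev  # append just before the sentinel
        last.next = node
        node.prev = last
        node.next = self._head
        self._head.prev = node
        self._index[key] = node

    def __getitem__(self, key: Any) -> Any:
        return self._index[key].value

    def __delitem__(self, key: Any) -> None:
        node = self._index.pop(key)  # O(1): unlink without scanning
        node.prev.next = node.next
        node.next.prev = node.prev

    def __len__(self) -> int:
        return len(self._index)

    def __iter__(self) -> Iterator[Any]:
        node = self._head.next
        while node is not self._head:
            yield node.key
            node = node.next


m = LinkedHashMapSketch()
m["b"] = 2
m["a"] = 1
m["c"] = 3
m["b"] = 20   # update, position unchanged
del m["a"]
print(list(m), m["b"])  # ['b', 'c'] 20
```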
I'm designing a framework and APIs and I want to name the class as intelligently as possible. From my standpoint, indexed would probably work (depending on how people interpret it, and based on the advertised functionality of the dictionary), ordered is imprecise and has too much potential for confusion, and linked "is right out" (apologies to Monty Python). ;-)
As a user, what name would make the most sense to you? Is there a particular name that says exactly what the class does? (I'm not averse to using slightly longer names like InsertionOrderDictionary if appropriate.)
Edit: Another strong possibility (discussed in my answer below) is IndexedDictionary. I don't really like "insertion order" because it doesn't make sense if you allow the user to insert keys at a specific index, reorder the keys, etc.
I vote OrderedDictionary, for the following reasons:
"Indexed" is never used in Cocoa classes, except in one instance. It always appears as a noun (NSIndexSet, NSIndexPath, objectAtIndex:, etc). There is only one instance when "Index" appears as a verb, which is on NSPropertyDescription's "indexed" property: isIndexed and setIndexed. NSPropertyDescription is roughly analogous to a table column in a database, where "indexing" refers to optimizing to speed up search times. It would therefore make sense that with NSPropertyDescription being part of the Core Data framework, that "isIndexed" and "setIndexed" would be equivalent to an index in a SQL database. Therefore, to call it "IndexedDictionary" would seem redundant, since indices in databases are created to speed up lookup time, but a dictionary already has O(1) lookup time. However, to call it "IndexDictionary" would also be a misnomer, since an "index" in Cocoa refers to position, not order. The two are semantically different.
I understand your concern over "OrderedDictionary", but the precedent has already been set in Cocoa. When users want to maintain a specific sequence, they use "ordered": -[NSApplication orderedDocuments], -[NSWindow orderedIndex], -[NSApplication orderedWindows], etc. So, John Pirie has mostly the right idea.
However, you don't want to make insertion into the dictionary a burden on your users. They'll want to create a dictionary once and then have it maintain an appropriate order. They won't even want to request objects in a specific order. Order specification should be done during initialization.
Therefore, I recommend making OrderedDictionary a class cluster, with private subclasses InsertionOrderDictionary, NaturalOrderDictionary, and CustomOrderDictionary. Then the user simply creates an OrderedDictionary like so:
OrderedDictionary * dict = [[OrderedDictionary alloc] initWithOrder:kInsertionOrder];
//or kNaturalOrder, etc
For a CustomOrderDictionary, you could have them give you a comparison selector, or even (if they're running 10.6) a block. I think this would provide the most flexibility for future expansion while still maintaining an appropriate name.
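The class-cluster idea can be sketched in Python by overriding `__new__`, so that constructing the public `OrderedDictionary` quietly returns a private subclass chosen by the ordering policy. The string policy names and the `key` function (standing in for a comparison selector or block) are assumptions for illustration; this sketch also sorts on every `keys()` call rather than maintaining order incrementally.

```python
from typing import Any, Callable, Optional


class OrderedDictionary:
    """Public class; the ordering policy is fixed at construction time."""

    def __new__(cls, order: str = "insertion",
                key: Optional[Callable[[Any], Any]] = None):
        # Class-cluster style: hand back a private subclass for the policy.
        target = _SUBCLASSES[order] if cls is OrderedDictionary else cls
        return object.__new__(target)

    def __init__(self, order: str = "insertion",
                 key: Optional[Callable[[Any], Any]] = None) -> None:
        self._data: dict = {}
        self._sort_key = key

    def __setitem__(self, k: Any, v: Any) -> None:
        self._data[k] = v

    def __getitem__(self, k: Any) -> Any:
        return self._data[k]

    def keys(self) -> list:
        raise NotImplementedError


class _InsertionOrderDictionary(OrderedDictionary):
    def keys(self) -> list:
        return list(self._data)  # Python dicts preserve insertion order


class _NaturalOrderDictionary(OrderedDictionary):
    def keys(self) -> list:
        return sorted(self._data)


class _CustomOrderDictionary(OrderedDictionary):
    def keys(self) -> list:
        return sorted(self._data, key=self._sort_key)


_SUBCLASSES = {
    "insertion": _InsertionOrderDictionary,
    "natural": _NaturalOrderDictionary,
    "custom": _CustomOrderDictionary,
}

d = OrderedDictionary("natural")
by_length = OrderedDictionary("custom", key=len)
for word in ["pear", "apple", "fig"]:
    d[word] = by_length[word] = len(word)
print(type(d).__name__, d.keys())  # _NaturalOrderDictionary ['apple', 'fig', 'pear']
print(by_length.keys())            # ['fig', 'pear', 'apple']
```

Callers only ever name `OrderedDictionary`; which subclass they get is an implementation detail, as in a Cocoa class cluster.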
I vote for InsertionOrderDictionary. You nailed it.
Strong vote for OrderedDictionary.
The word "ordered" means exactly what you are advertising: that in iterating through a list of items, there is a defined order to selection of those items. "Indexed" is an implementation word -- it talks more to how the ordering is achieved. Index, linked list, tree... the user doesn't care; that aspect of the data structure should be hidden. "Ordered" is the exact word for the additional feature you are offering, regardless of how you get it done.
Further, it seems like the choice of ordering could be at the user's option. Any reason why you couldn't create methods on your datatype that allow the user to switch from, say, alphabetical ordering to insertion-time ordering? In the default case, a user would choose a particular ordering and stick with it, in which case implementation would be no less efficient than if you created specialized subclasses for each ordering method. And in some less-used cases, the developer might actually wish to use any of a number of different orderings for the same data, depending on app context. (I can think of specific projects I've worked on where I would have loved to have such a data structure available.)
Call it OrderedDictionary, because that's precisely what it is. (Frankly, I have more of a problem with the use of the word "Dictionary", because that word heavily implies ordering, where popular implementations of such don't provide it, but that's my pet peeve. You really should just be able to say "Dictionary" and know that the ordering is alphabetical -- because that's what a dictionary IS -- but that argument is too late for existing implementations in the popular languages.) And allow the user to access in what order he chooses.
Since posting this question, I'm starting to lean towards something like IndexedDictionary or IndexableDictionary. While it is useful to be able to maintain arbitrary key ordering, limiting that to insertion ordering only seems like a needless restriction. Plus, my class already supports indexOfKey: and keyAtIndex:, which are (purposefully) analogous to NSArray's indexOfObject: and objectAtIndex:. I'm strongly considering adding insertObject:forKey:atIndex:, which matches up with NSMutableArray's insertObject:atIndex:.
Everyone knows that inserting in the middle of an array is inefficient, but that doesn't mean we shouldn't be allowed to on the rare occasions that it's truly useful. (Besides, the implementation could secretly use a doubly-linked list or any other suitable structure for tracking the ordering if needed...)
The big question: is "indexed" or "indexable" as vague or potentially confusing as "ordered"? Would people think of database indexes, or book indexes, etc.? Would it be detrimental if they assumed it was implemented with an array, or might that simplify user understanding of the functionality?
Edit: This name makes even more sense given the fact that I'm considering adding methods that work with an NSIndexSet in the future. (NSArray has -objectsAtIndexes: as well as methods for adding/removing observers for objects at given indexes.)
What about KeyedArray?
As you said in your last paragraph, I think that InsertionOrder(ed)Dict(ionary) is pretty unambiguous; I don't see how it could be interpreted in any way other than that the keys would be returned in the order they were inserted.
By decoupling the indexed order from the insertion order, doesn't this simply boil down to keeping an array and Dictionary in a single object? I guess my vote for this type of object is IndexedKeyDictionary
In C#:
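using System;
using System.Collections.Generic;

public class IndexedKeyDictionary<TKey, TValue> {
    // Keys in their user-defined order; the dictionary provides fast lookup by key.
    private readonly List<TKey> _keys = new List<TKey>();
    private readonly Dictionary<TKey, TValue> _dictionary = new Dictionary<TKey, TValue>();
    // ...
    public TValue GetValueAtIndex(int index) {
        return _dictionary[_keys[index]];
    }
    public void Insert(TKey key, TValue val, int index) {
        if (_dictionary.ContainsKey(key))
            throw new ArgumentException("An element with the same key already exists.", nameof(key));
        // List<T>.Insert shifts later keys to the right; it throws for an invalid
        // index before the dictionary is touched, so both collections stay in sync.
        _keys.Insert(index, key);
        _dictionary.Add(key, val);
    }
    public void SwapKeyIndexes(TKey k1, TKey k2) {
        // Swap the positions of k1 and k2, assuming both exist in _keys.
        int i = _keys.IndexOf(k1);
        int j = _keys.IndexOf(k2);
        _keys[i] = k2;
        _keys[j] = k1;
    }
}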
What would be really cool is indexed values...so we have a way to sort the values and get the new key order. Like if the values were graph coordinates, and we could read the keys (bin names) as we move up/down along the coordinate plane. What would you call that data structure? An IndexedValueDictionary?
Initially I'm with the first reply, InsertionOrderDictionary, though what "InsertionOrder" means is a bit ambiguous at first glance.
What you're describing sounds to me almost exactly like a C++ STL map. From what I understand, a map is a dictionary that has additional rules, including ordering. The STL simply calls it "map", which I think is fairly apt. The trick with map is you can't really give the inheritance a nod without making it redundant -- i.e. "MapDictionary". That's just too redundant. "Map" is a bit too basic and leaves a lot of room for misinterpretation.
Though "CHMap" might not be a bad choice after looking at your documentation link.
Maybe "CHMappedDictionary"? =)
Best of luck.
Edit: Thanks for the clarification, you learn something new every day. =)
Is the only difference that allKeys returns keys in a specific order? If so, I would simply add allKeysSorted and allKeysOrderedByInsertion methods to the standard NSDictionary API.
What is the goal of this insertion order dictionary? What benefits does it give the programmer vs. an array?