Variable Naming Conventions For Maps/Lists in Dynamically-Typed languages - variables

I am getting into Groovy language, which has dynamic typing (as well as optional static typing). It also has native support for Lists, Maps, and Ranges, so I find myself using lists and maps a lot, especially lists of lists, lists of maps, maps of lists, etc.
In static languages (esp with Generics) you always have an idea of what your type is. I am fairly new to dynamic languages, and it's getting a bit difficult to keep track of what my variable is supposed to be, so I was wondering if other people use some kind of variable naming conventions to keep these straight.
For example, suppose I have a map of dates as key and integers as values. Or List of integers, or List of Maps that contain strings as keys and account objects as values.
It seems like creating a clear convention behind variable names will help me keep track of what data type structure I am dealing with without having to look it up.
Any tips?

This is a common beginner's lament. You could use a naming convention, but odds are you'll drop it before too long and focus on what the variable represents (its meaning in relation to the rest of the code) rather than worrying about how it's represented (its "type").

The name of your variable should explain to someone reading the code what it is supposed to be, what it stands for. If you have a map of dates to integers, does it represent, for example (suggested variable names are in brackets):
a number of payments due on that date (paymentsDue)
a number of days between mapped date and some other point in time (daysPassed)
a number of messages posted on that date on Stack Overflow (numberOfPostedMessages)
In languages where variable type is not readily available, you might want to append a prefix or suffix, such as paymentsDueMap. I would, however, advise against encoding any additional type information inside a variable name, such as datesToInts - that routinely does more harm than good.
Finally, if you have a complex data structure, such as a list of maps between strings and accounts, the best thing would be to encapsulate that into a separate class, and name it according to its intent.
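A rough sketch of that last suggestion, written in Python rather than Groovy. The names AccountDirectory, Account, add_group and find are made up for illustration; the point is only that callers see an intent-revealing class instead of a bare list of maps.

```python
from dataclasses import dataclass


@dataclass
class Account:
    owner: str
    balance: int


class AccountDirectory:
    """Wraps a list of maps from string keys to accounts (hypothetical example)."""

    def __init__(self):
        self._groups = []  # internally: a list of dicts mapping str -> Account

    def add_group(self, accounts_by_name):
        self._groups.append(dict(accounts_by_name))

    def find(self, name):
        """Return the first account stored under `name`, or None."""
        for group in self._groups:
            if name in group:
                return group[name]
        return None


directory = AccountDirectory()
directory.add_group({"alice": Account("Alice", 120)})
directory.add_group({"bob": Account("Bob", 75)})
```

Code that uses directory.find("bob") never needs to know how the accounts are stored, so the variable name can stay about meaning.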

In static languages (esp with Generics) you always have an idea of what your type is.
After a while of programming in dynamic languages, you learn that using types this way is a crutch. Two pieces of advice:
Use good variable naming. For instance, if you have a map of dates to ints, you can name it something like BirthdateToTotalLookup.
Learn what visual clues to look for. It may seem obvious, but it took me a while to get in the habit of looking for clues like this:
sum += x['10-16-92']
From the piece of code above, I can tell that x is a map that has a date as a key and returns a number of some kind.

If the names can be kept short, then I tend to name maps something like "nounToNoun". So using your example of dates mapping to integers, I would name that "dateToCount" (if the integers are counters for something). That way it's obvious that it is a map, and it's obvious what is being mapped to what. The problem is that sometimes it is difficult to keep these sorts of names short and readable. For example, "userToLoginHistory" starts getting a little unwieldy.
For lists I generally use a plural for the variable name. So "user" would be a single user, and "users" would be a list of users.
To be honest, I am not sure what a good name would be for a list of maps.
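For what it's worth, here is how those two habits might look in Python (snake_case instead of camelCase). The name for the list of maps is only one possible choice, not an established convention, and the dates and counts are invented.

```python
from datetime import date

# Map: "key_to_value" naming says what is mapped to what.
date_to_count = {date(2009, 10, 16): 3, date(2009, 10, 17): 5}

# List: a plural noun; the singular form names one element.
users = ["ann", "raj"]
first_user = users[0]

# List of maps: pluralizing the map name is one option (an assumption,
# not a recognized convention).
date_to_count_maps = [date_to_count, {date(2009, 10, 18): 1}]

# Add up every count across every map in the list.
total = sum(sum(counts.values()) for counts in date_to_count_maps)
```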

One of the benefits of dynamic languages is that even if you're using an object as a Map - it doesn't HAVE to be a map. All it has to do is support whatever messages are sent to it. In Groovy, if I know that a given method expects a map so it can look up things by a String key - I can give it the full map, a stripped-down map, an Expando with a property named the same thing as the key, or any other object that has a property named the same thing as the key. This is because someObject["keyname"] and someObject.keyname are the same thing. (Of course if the code calls someObject.get("keyname") I've got to wire that method up somehow.)
The point is, in a dynamic language like Groovy you think less about TYPES and more about SUPPORTED MESSAGES. If it's conceptually a map, fine - naming it birthdateToTotal would make sense (though I prefer to call it 'totals', because totals[birthdate] looks better than birthdateToTotal[birthdate]) - but if it doesn't have to be specified, don't specify it. You leave yourself flexibility later.
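The same idea can be sketched in Python, with one caveat: unlike Groovy, Python does not treat obj["key"] and obj.key as the same thing, so this example relies only on the [] lookup. ComputedTotals and total_for are hypothetical names.

```python
def total_for(totals, birthdate):
    """Look up a total; `totals` only needs to support totals[birthdate]."""
    return totals[birthdate]


class ComputedTotals:
    """Not a dict, but it answers the same [] lookup (hypothetical stand-in)."""

    def __getitem__(self, birthdate):
        # Placeholder calculation for the sketch: the length of the key.
        return len(birthdate)


plain_totals = {"1992-10-16": 42}
computed_totals = ComputedTotals()
```

Both plain_totals and computed_totals can be passed to total_for, because the function never asks what type it was given.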

This is something you'll outgrow over time. Not to say I don't know a 20-year programmer still using Hungarian, but he's coding in a static-typed language, so it's almost understandable.
Consider this. That variable you're naming might be a HashMap, so what type do you add to the name? Map? This is a middle-of-the-road answer. Why not Collection? Since that way if you decide to change the WAY the data is stored, you don't have to change the variable name. Why not HashMap, if you really want to let the reader know what's going on.
As you may suspect, none of these are necessary. The point of a dynamic language (and even of polymorphism) is that you don't need to know the exact type of the variable being presented, only the data itself is important. While you might like a hint as to how to interface to that data, you'll soon find you already know in most cases, or can easily put that info in the variable without specifying types: addressesByZipCode, totalByBirthdate, etc.
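A small Python illustration of why a name like addressesByZipCode ages better than one with the type baked in. The addresses are invented; the point is that the storage changes but the name stays accurate.

```python
from collections import defaultdict

# The name says how to look things up, not which container is used.
addresses_by_zip_code = {"94103": ["1 Main St"], "10001": ["9 Elm Ave"]}

# Later the storage changes to a defaultdict; the name is still correct,
# whereas something like "addresses_dict" would now be misleading.
addresses_by_zip_code = defaultdict(list, addresses_by_zip_code)
addresses_by_zip_code["60601"].append("5 Lake Dr")
```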


What's the purpose of using method overloading?

I want to know the exact reason why the method overloading is done in OOP without using different method names to every variation as it was asked at an interview. Please help me to understand this concept.
Without using any fancy terms, let's say you're building an API, and there's a method called crush which crushes or destroys whatever parameter is given to it. If you follow your way, you'll need at least three different methods, one each for an int, a float and a char (I'm using the most general types as an example). The more types there are, the more methods you'll have to create, each with a different name. The developer using your API then has to remember many different names for something as simple as a method that destroys its parameter. Besides being harder to remember, it's also much less readable, because there are too many names for a single function (function as in job).
Method overloading isn't used for everything. It's intended for methods or functions that might take different types of data at different points, but internally follow the same procedure or do a single thing no matter what type of data they're passed.
You won't be writing one version of print that takes an int as a parameter, and returns the modulus of that, and another version of print that takes a string as an argument, and prints that to stdout. You can, but that's not how it's meant to be used.
It is mainly so as to be able to follow a relatively well-known software design principle called "Syntactic Consistency" from the book "Principles of Programming Languages" by Bruce J. MacLennan, which says the following:
Similar things should look similar, whereas
different things should look different.
When you see two functions with different names, you might be tempted to believe that they do different things. If they do in fact do different things, it is okay, but what if they do the same thing? In that case, it would be nice if the functions have the exact same name, so as to indicate that they do, in fact, do the same thing.
Of course you can misuse overloading. If you go around writing functions that do different things, taking advantage of overloading to give them the same name, then you are shooting yourself in the foot.
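Python has no compile-time overloading, so the sketch below swaps in functools.singledispatch, which picks an implementation from the runtime type of the first argument. It is an analogue rather than overloading proper, but it shows the same benefit: one name, crush, for one job. The returned strings are placeholders.

```python
from functools import singledispatch


@singledispatch
def crush(value):
    # Fallback for types that have no registered implementation.
    raise TypeError(f"cannot crush {type(value).__name__}")


@crush.register
def _(value: int):
    return f"crushed int {value}"


@crush.register
def _(value: float):
    return f"crushed float {value}"


@crush.register
def _(value: str):
    return f"crushed text {value!r}"
```

The caller only ever remembers crush; without this, the API would need something like crush_int, crush_float and crush_text.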

Pascal-style arrays, built-in len() function vs .length? What are the pros/cons

What are the differences between these two? Why would you pick one over the other, is it just personal preference, or is there an actual reason behind why you would use either a built-in function or whatever .length is.
I think using *.length over *.length() or len(*) is kind of a historical artifact, which was probably done to make getting the length of an array as fast as possible. Arrays after all, are a very basic data structure in many languages, and getting the length of one is an extremely common operation. And accessing a property is much faster than calling a method.
Nowadays a compiler could probably optimize that kind of thing out, but back then I think there was a pull towards ease-of-implementation which guided many languages to simply have *.length as a property.
However, in any OOP language at least, it's more consistent to have *.length(), because while arrays have immutable lengths, and can afford to have *.length exposed as a constant value, other data structures which you can add or remove values would not be able to do this.
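Python is one example of the built-in function style: len(x) delegates to a method the object defines, so growable containers can compute their length on demand. A minimal sketch, with Playlist as a made-up class:

```python
class Playlist:
    """A growable container, so its length has to be computed on demand."""

    def __init__(self):
        self._tracks = []

    def add(self, track):
        self._tracks.append(track)

    def __len__(self):
        # The built-in len(playlist) calls this method.
        return len(self._tracks)


playlist = Playlist()
playlist.add("intro")
playlist.add("outro")
```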

Some questions on enum and streamwriters vb

Some questions I had regarding vb.net functions:
How do you differentiate between an enumeration and a record? As far as I'm aware an enumerated type is simply one constant with multiple identifiers and that a structure contains different data types?
When declared, does a variable of a structure type need to use all its fields or can some be omitted?
Am I correct in saying sets don't exist in vb.net and the closest thing is an arraylist?
Is there much of a difference in streamreaders/writers and binaryreaders/writers when referring to reading and writing to text/binary files in terms of being called and used? (That is, is the only difference the data being read? Two- or three-line examples would help.)
I'm a bit confused about transformation variables; I know that they gain their value from the fixed calculation of another variable, but I can't seem to gain an understanding of it.
How do you differentiate between an enumeration and a record?
In what context? Basically an Enum is a list of constants and can be used pretty much anywhere. Records mainly have to do with Databases and Datasets, which means a record could be made up of any data types.
As far as I'm aware an enumerated type is simply one constant with multiple
identifiers and that a structure contains different data types?
A structure is basically a way of organizing a certain set of variables.
When declared, does a variable of a structure type need to use all its
fields or can some be omitted?
Every field is part of the structure when it is declared.
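To make the enumeration versus structure distinction concrete, here is a rough Python analogue (not VB.NET). Color and Pixel are invented names; the dataclass stands in for a structure, and every instance carries every field even when the caller leaves one at its default.

```python
from dataclasses import dataclass
from enum import Enum


class Color(Enum):
    # An enumeration: a fixed set of named constants of one type.
    RED = 1
    GREEN = 2


@dataclass
class Pixel:
    # A structure-like record: several fields, possibly of different types.
    x: int
    y: int
    color: Color = Color.RED  # a default, so callers may leave it unset


p = Pixel(3, 4)
```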
Am I correct in saying sets don't exist in vb.net and the closest
thing is an arraylist?
Not sure what you mean by set. .net contains classes for several different types of collections, of which arraylist is just one.
Is there much of a difference in streamreaders/writers and
binaryreaders/writers when referring to reading and writing to
text/binary files in terms of being called and used?
Basically the main difference is that because binary doesn't normally include line breaks, that method uses buffers, whereas text readers use strings and recognize line breaks.
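The same split exists in most languages. As a sketch in Python rather than VB.NET: text mode reads and writes strings line by line, while binary mode reads and writes raw bytes whose layout both sides must agree on. The file names and the layout of two 32-bit integers are assumptions made for the example.

```python
import os
import struct
import tempfile

folder = tempfile.mkdtemp()
text_path = os.path.join(folder, "notes.txt")
binary_path = os.path.join(folder, "numbers.bin")

# Text: write and read strings, line by line.
with open(text_path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\n")
with open(text_path, encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

# Binary: write and read raw bytes with an agreed layout
# (here: two little-endian 32-bit integers, 8 bytes in total).
with open(binary_path, "wb") as f:
    f.write(struct.pack("<ii", 7, 1024))
with open(binary_path, "rb") as f:
    numbers = struct.unpack("<ii", f.read(8))
```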
I'm a bit confused about transformation variables; I know that they
gain their value from the fixed calculation of another variable, but I
can't seem to gain an understanding of it.
Not really sure what you're getting at here. I suspect that that has to do with higher math functions and not really .net specific.

How to model a relationship that NHibernate (or Hibernate) doesn’t easily support

I have a situation in which the ideal relationship, I believe, would involve Value Object Inheritance. This is unfortunately not supported in NHibernate so any solution I come up with will be less than perfect.
Let’s say that:
“Item” entities have a “Location” that can be in one of multiple different formats.
These formats are completely different with no overlapping fields.
We will deal with each Location in the format that is provided in the data with no attempt to convert from one format to another.
Each Item has exactly one Location.
“SpecialItem” is a subtype of Item, however, that is unique in that it has exactly two Locations.
“Group” entities aggregate Items.
“LocationGroup” is a subtype of Group.
LocationGroup also has a single Location that can be in any of the formats as described above.
Although I’m interested in Items by Group, I’m also interested in being able to find all items with the same Location, regardless of which group they are in.
I apologize for the number of stipulations listed above, but I’m afraid that simplifying it any further wouldn’t really reflect the difficulties of the situation. Here is how the above could be diagrammed:
Mapping Dilemma Diagram http://www.freeimagehosting.net/uploads/592ad48b1a.jpg
Analyzing the above, I make the following observations:
I treat Locations polymorphically, referring to the supertype rather than the subtype.
Logically, Locations should be “Value Objects” rather than entities since it is meaningless to differentiate between two Location objects that have all the same values. Thus equality between Locations should be based on field comparisons, not identifiers. Also, value objects should be immutable and shared references should not be allowed.
Using NHibernate (or Hibernate) one would typically map value objects using the “component” keyword which would cause the fields of the class to be mapped directly into the database table that represents the containing class. Put another way, there would not be a separate “Locations” table in the database (and Locations would therefore have no identifiers).
NHibernate (or Hibernate) do not currently support inheritance for value objects.
My choices as I see them are:
Ignore the fact that Locations should be value objects and map them as entities. This would take care of the inheritance mapping issues since NHibernate supports entity inheritance. The downside is that I then have to deal with aliasing issues. (Meaning that if multiple objects share a reference to the same Location, then changing values for one object’s Location would cause the location to change for other objects that share the reference to the same Location record.) I want to avoid this if possible. Another downside is that entities are typically compared by their IDs. This would mean that two Location objects would be considered not equal even if the values of all their fields are the same. This would be invalid and unacceptable from the business perspective.
Flatten Locations into a single class so that there are no longer inheritance relationships for Locations. This would allow Locations to be treated as value objects which could easily be handled by using “component” mapping in NHibernate. The downside in this case would be that the domain model becomes weaker, more fragile and less maintainable.
Do some “creative” mapping in the hbm files in order to force Location fields to be mapped into the containing entities’ tables without using the “component” keyword. This approach is described by Colin Jack here. My situation is more complicated than the one he describes due to the fact that SpecialItem has a second Location and the fact that a different entity, LocationGroup, also has a Location. I could probably get it to work, but the mappings would be non-intuitive and therefore hard for other developers to understand and maintain in the future. Also, I suspect that these tricky mappings would likely not be possible using Fluent NHibernate, so I would lose the advantages of using that tool, at least in that situation.
Surely others out there have run into similar situations. I’m hoping someone who has “been there, done that” can share some wisdom. :-)
So here’s the question… Which approach should be preferred in this situation? Why? Is there a better option that I haven't considered?
Just a few observations / questions...
if the different location formats have no overlapping fields, what is the commonality in them which would make them candidates for a subclass hierarchy? Can you actually define a common interface for the base class Location?
is a TypeALocation comparable with a TypeBLocation?
are the two locations in SpecialItem of the same type, or can they be mixed?
can an item change its location to a different type at runtime?
As you state above, value objects can't be polymorphic. Based on what you describe, I don't see how you can treat locations polymorphically.
Update: If you can't define a common base interface for your location types, it is very awkward to try and treat them polymorphically, regardless of whether there is an ORM or not. Taking your example below, even to access any information about the actual location where I live, you needed to downcast it to either a street address or a lat/long coordinate. Polymorphism is meant exactly to avoid the need for such downcasts (and switches on type fields, etc.)!
Looking at the options you describe above, with all this taken into account:
Option 1 (map Locations as entities): Just like you, I don't like it either (hardly surprising).
Option 2 (flatten Locations into a single class): Can be a viable option if there aren't many location types and you can be reasonably sure that you have implemented all the types ever needed. In this case the domain class would practically be the analog of a C union, with a type field. It is a bit awkward to use, but the polymorphic attempt would be even more awkward IMHO.
Option 3 ("creative" mapping): It is definitely an interesting idea which I will probably experiment with in a pet project sometime, but I am not quite sure I would like such tricks in my production code. I guess it could also be done with a custom mapping type which would map your component to a specific subclass. But then again, we're back trying to fit these incompatible types into a type hierarchy... the only good reason to try this path is if there are many location types and/or new types may appear in the future.
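The flattened, union-like Location class mentioned above might look roughly like this. It is sketched in Python rather than C#, and no NHibernate mapping is shown. The two formats ("street" and "latlong") and their field names are assumptions; the frozen dataclass gives the value-object behaviour the question asks for, namely immutability and equality by field values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    """Flattened value object: `kind` says which fields are meaningful."""
    kind: str  # "street" or "latlong" (hypothetical formats)
    street: Optional[str] = None
    city: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None


home = Location(kind="street", street="1 Main St", city="Springfield")
same_home = Location(kind="street", street="1 Main St", city="Springfield")
peak = Location(kind="latlong", latitude=46.85, longitude=-121.76)
```

Because equality and hashing come from the fields, home and same_home compare equal and collapse to one entry in a set, which is what finding "all items with the same Location" needs. The cost is the awkwardness already noted: callers must check kind before reading fields.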

Naming a dictionary structure that stores keys in a predictable order?

Note: Although my particular context is Objective-C, my question actually transcends programming language choice. Also, I tagged it as "subjective" since someone is bound to complain otherwise, but I personally think it's almost entirely objective. Also, I'm aware of this related SO question, but since this was a bigger issue, I thought it better to make this a separate question. Please don't criticize the question without reading and understanding it fully. Thanks!
Most of us are familiar with the dictionary abstract data type that stores key-value associations, whether we call it a map, dictionary, associative array, hash, etc. depending on our language of choice. A simple definition of a dictionary can be summarized by three properties:
Values are accessed by key (as opposed to by index, like an array).
Each key is associated with a value.
Each key must be unique.
Any other properties are arguably conveniences or specializations for a particular purpose. For example, some languages (especially scripting languages such as PHP and Python) blur the line between dictionaries and arrays and do provide ordering for dictionaries. As useful as this can be, such additions are not fundamental characteristics of a dictionary. In a pure sense, the actual implementation details of a dictionary are irrelevant.
For my question, the most important observation is that the order in which keys are enumerated is not defined — a dictionary may provide keys in whatever order it finds most convenient, and it is up to the client to organize them as desired.
I've created custom dictionaries that impose specific key orderings, including natural sorted order (based on object comparisons) and insertion order. It's obvious to name the former some variant on SortedDictionary (which I've actually already implemented), but the latter is more problematic. I've seen LinkedHashMap and LinkedMap (Java), OrderedDictionary (.NET), OrderedDictionary (Flash), OrderedDict (Python), and OrderedDictionary (Objective-C). Some of these are more mature, some are more proof-of-concept.
LinkedHashMap is named according to implementation in the tradition of Java collections — "linked" because it uses a doubly-linked list to track insertion order, and "hash" because it subclasses HashMap. Besides the fact that user shouldn't need to worry about that, the class name doesn't really even indicate what it does. Using ordered seems like the consensus among existing code, but web searches on this topic also revealed understandable confusion between "ordered" and "sorted", and I feel the same. The .NET implementation even has a comment about the apparent misnomer, and suggests that it should be "IndexedDictionary" instead, owing to the fact that you can retrieve and insert objects at a specific point in the ordering.
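The "ordered" versus "sorted" confusion is easy to show with Python's built-in dict, which preserves insertion order (guaranteed since Python 3.7). The keys and values here are arbitrary.

```python
scores = {}
scores["pear"] = 3
scores["apple"] = 7
scores["fig"] = 5

insertion_order = list(scores)   # the order in which keys were added
sorted_order = sorted(scores)    # natural (alphabetical) order of the keys
```

insertion_order is pear, apple, fig, while sorted_order is apple, fig, pear: both are "ordered" in the everyday sense, which is exactly why the word alone is ambiguous.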
I'm designing a framework and APIs and I want to name the class as intelligently as possible. From my standpoint, indexed would probably work (depending on how people interpret it, and based on the advertised functionality of the dictionary), ordered is imprecise and has too much potential for confusion, and linked "is right out" (apologies to Monty Python). ;-)
As a user, what name would make the most sense to you? Is there a particular name that says exactly what the class does? (I'm not averse to using slightly longer names like InsertionOrderDictionary if appropriate.)
Edit: Another strong possibility (discussed in my answer below) is IndexedDictionary. I don't really like "insertion order" because it doesn't make sense if you allow the user to insert keys at a specific index, reorder the keys, etc.
I vote OrderedDictionary, for the following reasons:
"Indexed" is never used in Cocoa classes, except in one instance. It always appears as a noun (NSIndexSet, NSIndexPath, objectAtIndex:, etc). There is only one instance when "Index" appears as a verb, which is on NSPropertyDescription's "indexed" property: isIndexed and setIndexed. NSPropertyDescription is roughly analogous to a table column in a database, where "indexing" refers to optimizing to speed up search times. It would therefore make sense that with NSPropertyDescription being part of the Core Data framework, that "isIndexed" and "setIndexed" would be equivalent to an index in a SQL database. Therefore, to call it "IndexedDictionary" would seem redundant, since indices in databases are created to speed up lookup time, but a dictionary already has O(1) lookup time. However, to call it "IndexDictionary" would also be a misnomer, since an "index" in Cocoa refers to position, not order. The two are semantically different.
I understand your concern over "OrderedDictionary", but the precedent has already been set in Cocoa. When users want to maintain a specific sequence, they use "ordered": -[NSApplication orderedDocuments], -[NSWindow orderedIndex], -[NSApplication orderedWindows], etc. So, John Pirie has mostly the right idea.
However, you don't want to make insertion into the dictionary a burden on your users. They'll want to create a dictionary once and then have it maintain an appropriate order. They won't even want to request objects in a specific order. Order specification should be done during initialization.
Therefore, I recommend making OrderedDictionary a class cluster, with private subclasses of InsertionOrderDictionary and NaturalOrderDictionary and CustomOrderDictionary. Then, the user simply creates an OrderedDictionary like so:
OrderedDictionary * dict = [[OrderedDictionary alloc] initWithOrder:kInsertionOrder];
//or kNaturalOrder, etc
For a CustomOrderDictionary, you could have them give you a comparison selector, or even (if they're running 10.6) a block. I think this would provide the most flexibility for future expansion while still maintaining an appropriate name.
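A loose Python sketch of the "choose the order at initialization" idea. It uses a single class with an order argument rather than a true class cluster with private subclasses, and a key function stands in for the comparison selector or block. All names besides OrderedDictionary are assumptions, and keys() re-sorts on every call, which a real implementation would likely avoid.

```python
class OrderedDictionary:
    """Sketch: the key-order policy is fixed when the dictionary is created."""

    def __init__(self, order="insertion", key=None):
        if order not in ("insertion", "natural", "custom"):
            raise ValueError(f"unknown order: {order}")
        if order == "custom" and key is None:
            raise ValueError("custom order needs a key function")
        self._order = order
        self._key = key
        self._items = {}  # Python dicts already keep insertion order

    def __setitem__(self, k, v):
        self._items[k] = v

    def __getitem__(self, k):
        return self._items[k]

    def keys(self):
        if self._order == "insertion":
            return list(self._items)
        if self._order == "natural":
            return sorted(self._items)
        return sorted(self._items, key=self._key)


by_insertion = OrderedDictionary("insertion")
by_name = OrderedDictionary("natural")
by_length = OrderedDictionary("custom", key=len)
for d in (by_insertion, by_name, by_length):
    d["pear"] = 1
    d["fig"] = 2
    d["banana"] = 3
```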
I vote for InsertionOrderDictionary. You nailed it.
Strong vote for OrderedDictionary.
The word "ordered" means exactly what you are advertising: that in iterating through a list of items, there is a defined order to selection of those items. "Indexed" is an implementation word -- it talks more to how the ordering is achieved. Index, linked list, tree... the user doesn't care; that aspect of the data structure should be hidden. "Ordered" is the exact word for the additional feature you are offering, regardless of how you get it done.
Further, it seems like the choice of ordering could be at the user's option. Any reason why you couldn't create methods on your datatype that allow the user to switch from, say, alphabetical ordering to insertion-time ordering? In the default case, a user would choose a particular ordering and stick with it, in which case implementation would be no less efficient than if you created specialized subclasses for each ordering method. And in some less-used cases, the developer might actually wish to use any of a number of different orderings for the same data, depending on app context. (I can think of specific projects I've worked on where I would have loved to have such a data structure available.)
Call it OrderedDictionary, because that's precisely what it is. (Frankly, I have more of a problem with the use of the word "Dictionary", because that word heavily implies ordering, where popular implementations of such don't provide it, but that's my pet peeve. You really should just be able to say "Dictionary" and know that the ordering is alphabetical -- because that's what a dictionary IS -- but that argument is too late for existing implementations in the popular languages.) And allow the user to access in what order he chooses.
Since posting this question, I'm starting to lean towards something like IndexedDictionary or IndexableDictionary. While it is useful to be able to maintain arbitrary key ordering, limiting that to insertion ordering only seems like a needless restriction. Plus, my class already supports indexOfKey: and keyAtIndex:, which are (purposefully) analogous to NSArray's indexOfObject: and objectAtIndex:. I'm strongly considering adding insertObject:forKey:atIndex: which matches up with NSMutableArray's insertObject:atIndex:.
Everyone knows that inserting in the middle of an array is inefficient, but that doesn't mean we shouldn't be allowed to on the rare occasions that it's truly useful. (Besides, the implementation could secretly use a doubly-linked list or any other suitable structure for tracking the ordering if needed...)
The big question: is "indexed" or "indexable" as vague or potentially confusing as "ordered"? Would people think of database indexes, or book indexes, etc.? Would it be detrimental if they assumed it was implemented with an array, or might that simplify user understanding of the functionality?
Edit: This name makes even more sense given the fact that I'm considering adding methods that work with an NSIndexSet in the future. (NSArray has -objectsAtIndexes: as well as methods for adding/removing observers for objects at given indexes.)
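For readers who don't know Cocoa, the index-oriented operations described here can be sketched in Python. The method names mirror indexOfKey:, keyAtIndex: and insertObject:forKey:atIndex: but are otherwise made up, and the list-plus-dict storage is just one possible implementation.

```python
class IndexedDictionary:
    """Sketch: a dictionary whose keys also have array-like positions."""

    def __init__(self):
        self._keys = []    # keys in index order
        self._values = {}  # key -> value lookup

    def insert(self, index, key, value):
        if key in self._values:
            raise KeyError(f"duplicate key: {key!r}")
        self._keys.insert(index, key)  # shifts later keys, like an array insert
        self._values[key] = value

    def key_at_index(self, index):
        return self._keys[index]

    def index_of_key(self, key):
        return self._keys.index(key)

    def __getitem__(self, key):
        return self._values[key]


d = IndexedDictionary()
d.insert(0, "b", 2)
d.insert(0, "a", 1)
d.insert(1, "m", 9)  # lands between "a" and "b"
```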
What about KeyedArray?
As you said in your last paragraph, I think that InsertionOrder(ed)Dict(ionary) is pretty unambiguous; I don't see how it could be interpreted in any way other than that the keys would be returned in the order they were inserted.
By decoupling the indexed order from the insertion order, doesn't this simply boil down to keeping an array and Dictionary in a single object? I guess my vote for this type of object is IndexedKeyDictionary
In C#:
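using System.Collections.Generic;

public class IndexedKeyDictionary<TKey, TValue> {
    // Keys in index order; the dictionary holds the values.
    private readonly List<TKey> _keys = new List<TKey>();
    private readonly Dictionary<TKey, TValue> _dictionary = new Dictionary<TKey, TValue>();
    // ...
    public TValue GetValueAtIndex(int index) {
        return _dictionary[_keys[index]];
    }
    public void Insert(TKey key, TValue val, int index) {
        _dictionary.Add(key, val);   // throws if the key already exists
        _keys.Insert(index, key);    // shifts later keys to make room for the new one
    }
    public void SwapKeyIndexes(TKey k1, TKey k2) {
        // swap the indexes of k1 and k2, assuming they exist in _keys
        int i1 = _keys.IndexOf(k1);
        int i2 = _keys.IndexOf(k2);
        _keys[i1] = k2;
        _keys[i2] = k1;
    }
}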
What would be really cool is indexed values...so we have a way to sort the values and get the new key order. Like if the values were graph coordinates, and we could read the keys (bin names) as we move up/down along the coordinate plane. What would you call that data structure? An IndexedValueDictionary?
At first glance I'm with the first reply -- InsertionOrderDictionary, though what "InsertionOrder" means is a bit ambiguous.
What you're describing sounds to me almost exactly like a C++ STL map. From what I understand, a map is a dictionary that has additional rules, including ordering. The STL simply calls it "map", which I think is fairly apt. The trick with map is you can't really give the inheritance a nod without making it redundant -- i.e. "MapDictionary". That's just too redundant. "Map" is a bit too basic and leaves a lot of room for misinterpretation.
Though "CHMap" might not be a bad choice after looking at your documentation link.
Maybe "CHMappedDictionary"? =)
Best of luck.
Edit: Thanks for the clarification, you learn something new every day. =)
Is the only difference that allKeys returns keys in a specific order? If so, I would simply add allKeysSorted and allKeysOrderedByInsertion methods to the standard NSDictionary API.
What is the goal of this insertion order dictionary? What benefits does it give the programmer vs. an array?