I'm a relatively new programmer, so hopefully I'm using the correct terminology; if something isn't clear, please do let me know. In a very large legacy application, we have a users database:
users
id  type  name
1   3     Jeff
2   2     Holly
3   4     Tyler
Throughout most of the application, a user's info is retrieved as an associative array. We happen to use PHP, but I would think this problem applies to any language.
The Problem
Throughout the app there are hundreds upon hundreds of places that get a row from users as an associative array, read $user['type'], and use it to do some business logic. Now we want to change how type works, and it is extremely difficult to even FIND all of the places this field is used, because the word "type" also appears in comments, other variables, etc.
The Question
Is this one of the reasons for OOP? Is OOP the primary way to fix this problem and to stop us from making it worse, or is there some other way? It seems that if, instead of an associative array, we had a user object and were doing $type = $user->getType(); it would be much easier for us to use our IDE to find all usages of ->getType().
Abstraction solves this type of problem. For example, imagine that a user is defined as having a username and an email address, both of data type 'string'. If these two properties were accidentally switched in their assignments, the compiler would not reject this, and the resulting bug would only surface when the application is running.
However, if username and email address are abstracted as 'Username' and 'EmailAddress', respectively, one would have to assign values of the correct types in order for the compiler to accept the code.
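To make that concrete, here is a minimal Java sketch. The class names Username, EmailAddress and Account are hypothetical, chosen only for illustration, and the "@" check is a placeholder rather than real email validation:

```java
// Hypothetical example: Account only accepts the two wrapper types below.
final class Account {
    private final Username username;
    private final EmailAddress email;

    Account(Username username, EmailAddress email) {
        this.username = username;
        this.email = email;
    }

    Username username() { return username; }
    EmailAddress email() { return email; }

    public static void main(String[] args) {
        Account ok = new Account(new Username("jeff"), new EmailAddress("jeff@example.com"));
        System.out.println(ok.username().value() + " <" + ok.email().value() + ">");

        // Swapping the arguments is now a compile-time error:
        // new Account(new EmailAddress("jeff@example.com"), new Username("jeff"));
    }
}

// Each wrapper holds a plain string but is a distinct type to the compiler.
final class Username {
    private final String value;

    Username(String value) {
        this.value = value;
    }

    String value() { return value; }
}

final class EmailAddress {
    private final String value;

    EmailAddress(String value) {
        // Illustrative check only; real email validation is more involved.
        if (!value.contains("@")) {
            throw new IllegalArgumentException("not an email address: " + value);
        }
        this.value = value;
    }

    String value() { return value; }
}
```

With plain strings, the swapped call would compile and fail only at runtime, if at all.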
Abstraction can be done in programming paradigms other than OO, but typically OO lends itself better to it because, well, abstracting concepts is basically thinking in terms of objects.
As for your particular case, it is also solved with abstraction. Instead of working with an 'array of values' representing a user, you would instead be working with an abstraction of a user, implemented as a class 'User' with an 'id' property, a 'type' property, and a 'name' property. When you pass that abstraction (or object, when implemented in an OOP language), there is no ambiguity about what it represents and what its physical definition is (that is, 'id' of type int, 'type' of type int, 'name' of type string).
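A rough sketch of such a class, written in Java rather than the question's PHP. The field names follow the users table from the question; the getter names are my assumption:

```java
// Hypothetical User class mirroring the users table (id, type, name).
final class User {
    private final int id;
    private final int type;
    private final String name;

    User(int id, int type, String name) {
        this.id = id;
        this.type = type;
        this.name = name;
    }

    int getId() { return id; }

    // Every read of the type column goes through this one method, so an IDE's
    // "find usages" lists the call sites without matching comments or
    // unrelated variables that happen to contain the word "type".
    int getType() { return type; }

    String getName() { return name; }

    public static void main(String[] args) {
        User holly = new User(2, 2, "Holly");
        System.out.println(holly.getName() + " has type " + holly.getType());
    }
}
```

If the meaning of type changes later, the change can start inside getType() instead of at hundreds of array lookups.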
Some people say that strong typing solves this problem, but that is not necessarily true if the abstraction is not enforced. Think back to the example of inverting username and email address: even if the compiler enforces the check that username is a string and email address is a string, it still cannot detect that you have switched the two when assigning their values.
Some also say that only OO languages solve this problem. Again, this is not true: as long as the language supports abstraction and the programmer puts in the work, the problem can be avoided. For example, a struct in C can do this, which is also an argument for the claim that one can do OOP in a non-OOP language.
Having said all this, in practice, you would weigh the benefit of how much abstraction you need versus the effort required to do so. For example, one would not necessarily abstract username and email address for the example above because modern IDEs provide enough hints to help reduce such potential mistakes.
Some object-oriented languages (e.g. Smalltalk) do not allow access to the fields of any object other than the current receiver. For example, expressions like this.good or this.like := false would be legal, but expressions like x.like or this.like.good are illegal.
What I don't understand is: why??
What is the rationale for such a restriction?
This is one of the core ideas of OOP, called encapsulation. No one except the object itself knows about its internal state.
This provides better isolation: internal state can change over time, and if you are accessing it directly, you're screwed. Also, if someone can mess with your object's state directly, you never know whether something will change at runtime when you don't expect it.
In general it's not hard to define accessors, and in the end you end up with x like, x like: false in Smalltalk and x.like(), x.setLike(false) in C-like languages. Ruby and Scala let you define accessor methods that are invoked without parentheses, so they look just like field access: x.like, x.like = false. Being forced to write accessors is not much overhead, but if you allow programmers to do anything they want with an object's state, you get chaos in your code, and that is actually a big problem.
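As a small illustration in Java, here is the x.like() / x.setLike(false) style from above. The class name Item and the changeCount bookkeeping are made up for the example; the point is only that the object can keep its own state consistent when every change goes through a method:

```java
// Hypothetical class: all access to the "like" state goes through methods.
final class Item {
    private boolean like;     // internal state, invisible to callers
    private int changeCount;  // stays consistent because only setLike touches it

    boolean like() { return like; }

    void setLike(boolean newValue) {
        // Only real changes are counted; callers cannot bypass this rule.
        if (newValue != like) {
            like = newValue;
            changeCount++;
        }
    }

    int changeCount() { return changeCount; }

    public static void main(String[] args) {
        Item x = new Item();
        x.setLike(true);
        x.setLike(true);   // no change, so not counted
        x.setLike(false);
        System.out.println(x.like() + " after " + x.changeCount() + " changes");
    }
}
```

If like were a public field, any code could flip it without updating changeCount, and the two would silently drift apart.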
It takes some time to understand all the bad things that can happen without encapsulation. When you begin to develop, you don't yet understand what can happen if you keep fields public. That's why C++ is a bad language to start with: for beginners it's easier to deal with direct field access.
Also, if you access fields directly, the whole idea of OOP is broken, because you can use any data you have, as in procedural languages, and classes then merely group functions or define data structures.
You can read more about encapsulation on Wikipedia.
Also, there is a very interesting post called "What is Object Oriented Programming: A Critical Approach".
Adele Goldberg put it rather graphically with the aphorism "ask, don't touch".
Reading the Wikipedia entry about God Objects, it says that a class is a god object when it knows too much or does too much.
I see the logic behind this, but if it's true, then how do you couple every different class? Don't you always use a master class for connecting window management, DB connections, etc?
The main function/method may know about the existence of the windows, databases, and other objects. It may perform over-arching tasks like introducing the model to the controller.
But that doesn't mean it manages all the little details. It probably doesn't know anything about how the database or windows are implemented.
If it did, it could be accused of being a God object.
A god object is an object that contains references, directly or indirectly, to most if not all objects within an application. As the question observes, it is almost impossible to avoid having a god object in an application. Some object must hold references to the various subsystems: UI, database, communications, business logic, etc. Note that the god object need not be application-defined. Many frameworks have built-in god objects with names like "application context", "application environment", "session", "activator", etc.
The issue is not whether a god object exists, but rather how it is used. I will illustrate with an extreme example...
Let's say that in my application I want to standardize how many decimal places of precision to show when displaying numbers. However, I want the precision to be configurable. I create a class whose responsibility is to convert numbers to strings:
class NumberFormatter {
    ...
    String format(double value) {
        int decimalPlaces = getConfiguredPrecision();
        return formatDouble(value, decimalPlaces);
    }
    int getConfiguredPrecision() {
        return /* what ??? */;
    }
}
The question is, how does getConfiguredPrecision figure out what to return? One way would be to give NumberFormatter a reference to the global application context which it stores in a member field called _appContext. Then we could write:
return _appContext.getPreferenceManager().getNumericPreferences().getDecimalPlaces();
By doing this, we have just made NumberFormatter into a god object as well! Why? Because now we can (indirectly) reference virtually any object in the application through its _appContext field. Is this bad? Yes, it is.
I'm going to write a unit test for NumberFormatter. Let's set up the parameters... it needs an application context?! WTF, that has 57 methods I need to mock. Oh, it only needs the pref manager... WTF, I have to mock 14 methods! Numeric prefs!?! Screw it, the class is simple enough, I don't need to test it...
Let's say that the application context had another method, getDatabaseManager(). Last week we were using SQL, so the method returned an SQL database object. But this week, we've decided to change to a NoSQL database and the method now returns a new type. Is NumberFormatter affected by the change? Hmmm, I can't remember... yeah, it might be, I see it takes an application context in the constructor... let me open the source and take a look... nope, we're in luck: it only accesses getPreferenceManager()... now let's check the other 93 classes that take an application context as a parameter...
This same scenario occurs if a change is made to the preferences manager, or the numeric preferences object. The moral of the story is that an object should only hold references to the things that it needs to perform its job, and only those things. In the case of NumberFormatter, all it needs to know is a single integer -- the number of decimal places. It could be created directly by the application god object who knows the magic number (or the pref manager or better still, numeric prefs), without turning the formatter into a god object itself. Furthermore, any components that need to format numbers could be given a formatter instead of the god object. Wins all around.
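For illustration, a minimal sketch of the formatter that only receives the integer it needs. The constructor signature and the use of String.format are my assumptions, not the only way to write it:

```java
import java.util.Locale;

// Sketch: the formatter is handed the one value it needs, not the app context.
final class NumberFormatter {
    private final int decimalPlaces;

    NumberFormatter(int decimalPlaces) {
        if (decimalPlaces < 0) {
            throw new IllegalArgumentException("decimalPlaces must be >= 0");
        }
        this.decimalPlaces = decimalPlaces;
    }

    String format(double value) {
        // Locale.ROOT keeps '.' as the decimal separator on every machine.
        return String.format(Locale.ROOT, "%." + decimalPlaces + "f", value);
    }

    public static void main(String[] args) {
        // In the application, whoever owns the preferences would supply the 2.
        NumberFormatter formatter = new NumberFormatter(2);
        System.out.println(formatter.format(3.14159)); // prints 3.14
    }
}
```

A unit test now needs nothing but an int: no application context, preference manager, or numeric preferences to mock.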
So, to summarize, the problem is not the existence of a god object but rather the act of conferring god-like status to other objects willy-nilly.
Incidentally, the design principle that tackles this problem head-on has become known as the Law of Demeter. Or "when paying at a restaurant, give the server your money not your wallet."
In my experience this most often occurs when you're dealing with code that is the product of "develop as you go" project management (or lack thereof): the project is not thought through and planned, and object responsibilities are loose and not delegated properly. In these scenarios you find a "god-object" being the catchall for code that doesn't have any obvious organization or delegation.
It is not the interconnectedness or coupling of the different classes that is the problem with god-objects. It's the fact that a god-object can often accomplish most if not all of the responsibilities of its derived children, and that its defined responsibilities are fairly unpredictable to anyone other than the developer.
Simply knowing about "multiple" classes doesn't make one a God; knowing about multiple classes in order to solve a problem that should be split into several sub-problems does make one a God.
I think the focus should be on whether a problem should be split into several sub-problems, not on the number of classes a given object knows about (as you pointed out, sometimes knowing about several classes is necessary).
Gods are over-hyped.
I'm trying to follow DDD, or at least my limited understanding of it.
I'm having trouble fitting a few things into the DDD boxes though.
An example: I have a User Entity. This user Entity has a reference to a UserPreferencesInfo object - this is just a class which contains a bunch of properties regarding user preferences. These properties are fairly unrelated, other than the fact that they are all user preferences (unlike say an Address VO, where all the properties form a meaningful whole).
Question is - what is this UserPreferencesInfo object?
1) Obviously it's not an Entity. I'm just storing it as a 'component' in Fluent NHibernate speak (i.e. in the same DB table as the User entity).
2) VO? I understand that Value Objects are supposed to be immutable (so you can't change them, just new them up). This makes complete sense when the object is an address, for instance (the address properties form a meaningful 'whole'). But in the case of UserPreferencesInfo I don't think it makes sense. There could be 100 properties (realistically, maybe 20) on this object. Why would I want to discard and recreate the object whenever I needed to change one property?
I feel like I need to break the rules here to get what I need, but I don't really like the idea of that (it's a slippery slope!). Am I missing something here?
Thanks
Answer 1 (the practical one)
I'm a huge proponent of DDD, but don't force it. You've already recognised that immutable VOs add more work than is required. DDD is designed to harness complexity, but in this case there is very little complexity to manage.
I would simply treat UserPreferencesInfo as an Entity, and reference it from the User aggregate. Whether you store it as a Component or in a separate table is your choice.
IMHO, the whole Entity vs VO debate can be rendered moot. It's highly unlikely that in 6 months' time, another developer will look at your code and say "WTF! He's not using immutable VOs! What the heck was he thinking!!".
Answer 2 (the DDD purist)
Is UserPreferencesInfo actually part of the business domain? Others have mentioned dissecting this object. But if you stick to pure DDD, you might need to determine which preferences belong to which Bounded Context.
This in turn could lead to adding Service Layers, and before you know it, you've over-engineered the solution for a very simple problem...
Here's my two cents. Short answer: UserPreferenceInfo is a value object because it describes the characteristics of an object. It's not an entity because there's no need to track an object instance over time.
Longer answer: an object with 100+ properties which are not related is not very DDD-ish. Try to group related properties together to form new VOs or you might discover new entities as well.
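As a rough sketch of what one such smaller VO could look like, here it is in Java. The NotificationPreferences name and its two properties are invented for the example, since the real preferences were not shown:

```java
import java.util.Objects;

// Hypothetical small value object grouping two related preferences.
final class NotificationPreferences {
    private final boolean emailEnabled;
    private final int digestIntervalDays;

    NotificationPreferences(boolean emailEnabled, int digestIntervalDays) {
        this.emailEnabled = emailEnabled;
        this.digestIntervalDays = digestIntervalDays;
    }

    boolean emailEnabled() { return emailEnabled; }
    int digestIntervalDays() { return digestIntervalDays; }

    // "Changing" one property returns a new value; the original is untouched.
    NotificationPreferences withEmailEnabled(boolean enabled) {
        return new NotificationPreferences(enabled, digestIntervalDays);
    }

    // Value objects compare by their contents, not by identity.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof NotificationPreferences)) {
            return false;
        }
        NotificationPreferences other = (NotificationPreferences) o;
        return emailEnabled == other.emailEnabled
                && digestIntervalDays == other.digestIntervalDays;
    }

    @Override
    public int hashCode() {
        return Objects.hash(emailEnabled, digestIntervalDays);
    }

    public static void main(String[] args) {
        NotificationPreferences before = new NotificationPreferences(true, 7);
        NotificationPreferences after = before.withEmailEnabled(false);
        System.out.println(before.emailEnabled() + " -> " + after.emailEnabled());
    }
}
```

With a handful of cohesive properties per VO, recreating the object on change costs one short method per property rather than a 100-argument constructor call.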
Another DDD smell is to have a lot of set properties in the first place. Try to find the essence of the action instead of only setting the value. Example:
// not ddd
employee.Salary = newSalary;
// more ddd
employee.GiveRaise(newSalary);
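To show what the method form buys you, here is a slightly fuller sketch in Java. The rule that a raise must increase the salary is an assumed example of a business rule, not something from the question:

```java
// Hypothetical Employee: the salary can only change through a named action.
final class Employee {
    private long salary;

    Employee(long salary) {
        this.salary = salary;
    }

    long salary() { return salary; }

    // Named after the business action, so the rule lives in one place.
    void giveRaise(long newSalary) {
        if (newSalary <= salary) {
            throw new IllegalArgumentException("a raise must increase the salary");
        }
        salary = newSalary;
    }

    public static void main(String[] args) {
        Employee employee = new Employee(50_000);
        employee.giveRaise(55_000);
        System.out.println(employee.salary());
    }
}
```

A bare setter would accept any value; the named method gives the domain rule an obvious home.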
On the other hand you may very well have legitimate reasons to have a bunch of properties that are no more than getters and setters. But then there's probably simpler methods than DDD to solve the problem. There's nothing wrong with taking the best patterns and ideas from DDD but relax a little of all the "rules", especially for simpler domains.
I'd say a UserPreferenceInfo is actually a part of the User aggregate root. It should be the responsibility of the UserRepository to persist the User Aggregate Root.
Value objects only need to be newed up (in your object model) when their values are shared. A sample scenario would be checking for an existing, identical UserPreferenceInfo and associating the User with that instead of inserting a new one every time. Sharing value objects makes sense if the value object tables would get too large and raise speed or storage concerns. The price for sharing is paid on insert.
It is reasonable to abstract this procedure in the DAL.
If you are not sharing value objects, there is nothing against updating.
As far as I understand, UserPreferenceInfo is a part of the User entity. Ergo the User entity is an Aggregate root which is retrieved or saved using UserRepository as a whole, along with UserPreferenceInfo and other objects.
Personally, I think that UserPreferenceInfo is an entity type, since it has identity: it can be changed, saved and retrieved from the repository and still be regarded as the same object. But it depends on your usage of it.
It doesn't matter IMHO how the object is represented in the DAL, whether it is stored in a separate table or as part of another table. One of the benefits of DDD is persistence ignorance, which is usually a good thing.
Of course, I may be wrong, I am new to DDD too.
Question is - what is this UserPreferencesInfo object?
I don't know how this case is supported by NHibernate, but some ORMs support special concepts for it. For example, DataObjects.Net includes a Structures concept. It seems that you need something like this in NH.
First time ever posting on a blog. Hope I do it right.
Anyway, since you haven't shown us the UserPreferencesInfo object, I am not sure how it's constructed such that you can have a variable number of things in it.
If it were me, I'd make a single class called UserPreference, with id, userid, key, value, displaytype, and whatever other fields you may need in it. This is an entity: it has an id and is tied to a certain user.
Then in your user entity (the root, I am assuming), have an ISet<UserPreference>.
100 properties sounds like a lot.
Try breaking UserPreferenceInfo up into smaller (more cohesive) types, which likely/hopefully are manageable as VOs.
I'm wondering what techniques I should use to choose the right objects in OOP.
Is there any must-read book about OOP in terms of how to choose objects?
Just write something that gets the job done, even if it's ugly, then refactor continuously:
eliminate duplicate code (don't repeat yourself)
increase cohesion
reduce coupling
But:
don't over-engineer; keep it simple
don't write stuff you ain't gonna need
It's not a precise recipe, just some general guidelines. Keep practicing.
P.S.
Code objects are not related to tangible real-life objects; they are just constructs that hold related information together.
Don't believe what the Java books/schools teach about objects; they're lying.
You probably mean "the right class", rather than "the right object". :-)
There are a few techniques, such as text analysis (a.k.a. underlining the nouns) and Class Responsibility Collaborator (CRC).
With "underlining the nouns", you basically start with a written, natural language (i.e. plain English) description of the problem you want to solve and underline the nouns. That gives you a list of candidate classes. You will need to perform several passes to refine it into a list of classes to implement.
For CRC, check out Wikipedia.
I suggest The OPEN Toolbox of Techniques for full reference.
Hope it helps.
I am assuming an understanding of what a struct, type, class, set, state, alphabet, scalar, vector, and relationship are.
An object is a noun; a method is a verb. Object members can represent identity, state, or a scalar value per field. Relationships between objects are usually represented with references, where the references are members of objects. When relationships are complex, are multidirectional, have arity greater than 2, or represent some sort of grouping or containment, the relationships themselves can be expressed as objects.
For other, broader technical reasons objects are most likely the only way to represent any form of information in OOP languages.
I am adding a second answer due to demian's comment:
Sometimes the class is so obvious because it's tangible, but other times the concept of the object is too abstract, like a db connector.
That is true. My preferred approach is to perform a behavioural analysis of the system (using use cases, for example), and then derive system operations. Once you have a stable list of system operations (such as PrintDocument, SaveDocument, SpellCheck, MergeMail, etc. for a word processor) you need to assign each of them to a class. If you have developed a list of candidate classes with some of the techniques that I mentioned earlier, you will be able to allocate some of the operations. But some will remain unallocated. These will signal the need of more abstract or unintuitive classes, which you will need to make up, using your good judgment.
The whole method is documented in a white paper at www.openmetis.com.
You should check out Domain-Driven Design, by Eric Evans. It provides very useful concepts for thinking about the objects in your model, what their functions are in the domain, and how they could be organized to work together. It's not a cookbook, and probably not a beginner book, but then, I read it at different stages of my career, and every time I found something valuable in it...
I am getting into Groovy language, which has dynamic typing (as well as optional static typing). It also has native support for Lists, Maps, and Ranges, so I find myself using lists and maps a lot, especially lists of lists, lists of maps, maps of lists, etc.
In static languages (esp with Generics) you always have an idea of what your type is. I am fairly new to dynamic languages, and it's getting a bit difficult to keep track of what my variable is supposed to be, so I was wondering if other people use some kind of variable naming conventions to keep these straight.
For example, suppose I have a map of dates as key and integers as values. Or List of integers, or List of Maps that contain strings as keys and account objects as values.
It seems like creating a clear convention behind variable names will help me keep track of what data type structure I am dealing with without having to look it up.
Any tips?
This is a common beginner's lament. You could use a naming convention, but odds are you'll drop it before too long and focus on what the variable represents (its meaning in relation to the rest of the code) rather than worrying about how it's represented (its "type").
The name of your variable should explain to someone reading the code what it is supposed to be, what it stands for. If you have a map of dates to integers, does it represent, for example (suggested variable names are in brackets):
a number of payments due on that date (paymentsDue)
a number of days between mapped date and some other point in time (daysPassed)
a number of messages posted on that date on Stack Overflow (numberOfPostedMessages)
In languages where variable type is not readily available, you might want to append a prefix or suffix, such as paymentsDueMap. I would, however, advise against encoding any additional type information inside a variable name, such as datesToInts: that routinely does more harm than good.
Finally, if you have a complex data structure, such as a list of maps between strings and accounts, the best thing would be to encapsulate that into a separate class, and name it according to its intent.
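A sketch of what that could look like, in Java rather than Groovy. The intent assumed here (one map of owner name to account per branch) and the names BranchAccounts and Account are hypothetical, since the question did not say what the list of maps stands for:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical wrapper: callers see intent-named methods, not a list of maps.
final class BranchAccounts {
    private final List<Map<String, Account>> accountsByOwnerPerBranch = new ArrayList<>();

    // Adds a new, empty branch and returns its index.
    int addBranch() {
        accountsByOwnerPerBranch.add(new HashMap<>());
        return accountsByOwnerPerBranch.size() - 1;
    }

    void open(int branch, String owner, Account account) {
        accountsByOwnerPerBranch.get(branch).put(owner, account);
    }

    Optional<Account> find(int branch, String owner) {
        return Optional.ofNullable(accountsByOwnerPerBranch.get(branch).get(owner));
    }

    public static void main(String[] args) {
        BranchAccounts accounts = new BranchAccounts();
        int downtown = accounts.addBranch();
        accounts.open(downtown, "holly", new Account(250));
        System.out.println(accounts.find(downtown, "holly").map(Account::balance).orElse(0L));
    }
}

// Minimal stand-in for the "account objects" mentioned in the question.
final class Account {
    private final long balance;

    Account(long balance) {
        this.balance = balance;
    }

    long balance() { return balance; }
}
```

The variable holding this can then simply be called accounts, and nobody needs to remember the nesting order of the underlying structure.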
In static languages (esp with Generics) you always have an idea of what your type is.
After a while of programming in dynamic languages, you learn that using types this way is a crutch. Two pieces of advice:
Use good variable naming. For instance, if you have a map of dates to ints, you can name it something like BirthdateToTotalLookup.
Learn what visual clues to look for. It may seem obvious, but it took me a while to get in the habit of looking for clues like this:
sum += x['10-16-92']
From the piece of code above, I can tell that x is a map that has a date as a key and returns a number of some kind.
If the names can be kept short, then I tend to name maps something like "nounToNoun". So using your example of dates mapping to integers, I would name that "dateToCount" (if the integers are counters for something). That way it's obvious that it is a map, and it's obvious what is being mapped to what. The problem is that sometimes it is difficult to keep these sorts of names short and readable. For example, "userToLoginHistory" starts getting a little unwieldy.
For lists I generally use a plural for the variable name. So "user" would be a single user, and "users" would be a list of users.
To be honest, I am not sure what a good name would be for a list of maps.
One of the benefits of dynamic languages is that even if you're using an object as a Map - it doesn't HAVE to be a map. All it has to do is support whatever messages are sent to it. In Groovy, if I know that a given method expects a map so it can look up things by a String key - I can give it the full map, a stripped-down map, an Expando with a property named the same thing as the key, or any other object that has a property named the same thing as the key. This is because someObject["keyname"] and someObject.keyname are the same thing. (Of course if the code calls someObject.get("keyname") I've got to wire that method up somehow.)
The point is, in a dynamic language like Groovy you think less about TYPES and more about SUPPORTED MESSAGES. If it's conceptually a map, fine - naming it birthdateToTotal would make sense (though I prefer to call it 'totals', because totals[birthdate] looks better than birthdateToTotal[birthdate]) - but if it doesn't have to be specified, don't specify it. You leave yourself flexibility later.
This is something you'll outgrow over time. Not to say I don't know a 20-year programmer still using Hungarian, but he's coding in a static-typed language, so it's almost understandable.
Consider this. That variable you're naming might be a HashMap, so what type do you add to the name? Map? This is a middle-of-the-road answer. Why not Collection? That way, if you decide to change the WAY the data is stored, you don't have to change the variable name. Why not HashMap, if you really want to let the reader know what's going on?
As you may suspect, none of these are necessary. The point of a dynamic language (and even of polymorphism) is that you don't need to know the exact type of the variable being presented; only the data itself is important. While you might like a hint as to how to interface with that data, you'll soon find you already know in most cases, or can easily put that info in the variable name without specifying types: addressesByZipCode, totalByBirthdate, etc.