How do you fight growing parameter list in class hierarchy? - oop

I have a strong feeling that I do not know what pattern or particular language technique use in this situation.
So, the question itself is how to manage the growing parameter list in class hierarchy in language that has OOP support? I mean if for root class in the hierarchy you have, let's say 3 or 4 parameters, then in it's derived class you need to call base constructor and pass additional parameters for derived part of the object, and so forth... Parameter lists become enormous even if you have depth of inheritance more than two.
I`m pretty sure that many of SOwers faced this problem. And I am interested in ways how to solve it. Many thanks in advance.

Constructors with long parameter lists is an indication that your class is trying to do too much. One approach to resolving that problem is to break it apart, and use a "coordinator" class to manage the pieces. Subclasses that have constructor parameter lists that differ significantly from their superclass is another example of a class doing too much. If a subclass truly is-a superclass, then it shouldn't require significantly more data to do its job.
That said, there are occasional cases where a class needs to work on a large number of related objects. In this situation, I would create a new object to hold the related parameters.

Alternatives:
Use setter injection instead of constructor injection
Encapsulate the parameters in a separate container class, and pass that between constructors instead.

Don't use constructors to initialize the whole object at once. Only have it initialize those things which (1) are absolutely required for the existence of the object and (2) which must be done immediately at its creation. This will dramatically reduce the number of parameters you have to pass (likely to zero).
For a typical hierarchy like SalariedEmployee >> Employee >> Person you will have getters and setters to retrieve and change the various properties of the object.

Seeing the code would help me suggest a solution..
However long parameter lists are a code-smell, so I'd take a careful look at the design which requires this. The suggested refactorings to counter this are
Introduce Parameter Object
Preserve Whole Object
However if you find that you absolutely need this and a long inheritance chain, consider using a hash / property bag like object as the sole parameter
public MyClass(PropertyBag configSettings)
{
// each class extracts properties it needs and applies them
m_Setting1 = configSettings["Setting1"];
}

Possibilities:
Perhaps your class(es) are doing too much if they require so much state to be provided up-front? Aim to adhere to the Single Responsibility Principle.
Perhaps some of these parameters should logically exist in a value object of their own that is itself passed in as a parameter?
For classes whose construction really is complex, consider using the builder or factory pattern to instantiate these objects in a readable way - unlike method names, constructor parameters lack the ability to self document.

Another tip: Keep your class hierarchy shallow and prefer composition to inheritence. That way your constructor parameter list will remain short.

Related

Flaw: Constructor does Real Work

I have a class which represents a set of numbers. The constructor takes three arguments: startValue, endValue and stepSize.
The class is responsible for holding a list containing all values between start and end value taking the stepSize into consideration.
Example: startValue: 3, endValue: 1, stepSize = -1, Collection = { 3,2,1 }
I am currently creating the collection and some info strings about the object in the constructor. The public members are read only info strings and the collection.
My constructor does three things at the moment:
Checks the arguments; this could throw an exception from the constructor
Fills values into the collection
Generates the information strings
I can see that my constructor does real work but how can I fix this, or, should I fix this? If I move the "methods" out of the constructor it is like having init function and leaving me with an not fully initialized object. Is the existence of my object doubtful? Or is it not that bad to have some work done in the constructor because it is still possible to test the constructor because no object references are created.
For me it looks wrong but it seems that I just can't find a solution. I also have taken a builder into account but I am not sure if that's right because you can't choose between different types of creations. However single unit tests would have less responsibility.
I am writing my code in C# but I would prefer a general solution, that's why the text contains no code.
EDIT: Thanks for editing my poor text (: I changed the title back because it represents my opinion and the edited title did not. I am not asking if real work is a flaw or not. For me, it is. Take a look at this reference.
http://misko.hevery.com/code-reviewers-guide/flaw-constructor-does-real-work/
The blog states the problems quite well. Still I can't find a solution.
Concepts that urge you to keep your constructors light weight:
Inversion of control (Dependency Injection)
Single responsibility principle (as applied to the constructor rather than a class)
Lazy initialization
Testing
K.I.S.S.
D.R.Y.
Links to arguments of why:
How much work should be done in a constructor?
What (not) to do in a constructor
Should a C++ constructor do real work?
http://misko.hevery.com/code-reviewers-guide/flaw-constructor-does-real-work/
If you check the arguments in the constructor that validation code can't be shared if those arguments come in from any other source (setter, constructor, parameter object)
If you fill values into the collection or generate the information strings in the constructor that code can't be shared with other constructors you may need to add later.
In addition to not being able to be shared there is also being delayed until really needed (lazy init). There is also overriding thru inheritance that offers more options with many methods that just do one thing rather then one do everything constructor.
Your constructor only needs to put your class into a usable state. It does NOT have to be fully initialized. But it is perfectly free to use other methods to do the real work. That just doesn't take advantage of the "lazy init" idea. Sometimes you need it, sometimes you don't.
Just keep in mind anything that the constructor does or calls is being shoved down the users / testers throat.
EDIT:
You still haven't accepted an answer and I've had some sleep so I'll take a stab at a design. A good design is flexible so I'm going to assume it's OK that I'm not sure what the information strings are, or whether our object is required to represent a set of numbers by being a collection (and so provides iterators, size(), add(), remove(), etc) or is merely backed by a collection and provides some narrow specialized access to those numbers (such as being immutable).
This little guy is the Parameter Object pattern
/** Throws exception if sign of endValue - startValue != stepSize */
ListDefinition(T startValue, T endValue, T stepSize);
T can be int or long or short or char. Have fun but be consistent.
/** An interface, independent from any one collection implementation */
ListFactory(ListDefinition ld){
/** Make as many as you like */
List<T> build();
}
If we don't need to narrow access to the collection, we're done. If we do, wrap it in a facade before exposing it.
/** Provides read access only. Immutable if List l kept private. */
ImmutableFacade(List l);
Oh wait, requirements change, forgot about 'information strings'. :)
/** Build list of info strings */
InformationStrings(String infoFilePath) {
List<String> read();
}
Have no idea if this is what you had in mind but if you want the power to count line numbers by twos you now have it. :)
/** Assuming information strings have a 1 to 1 relationship with our numbers */
MapFactory(List l, List infoStrings){
/** Make as many as you like */
Map<T, String> build();
}
So, yes I'd use the builder pattern to wire all that together. Or you could try to use one object to do all that. Up to you. But I think you'll find few of these constructors doing much of anything.
EDIT2
I know this answer's already been accepted but I've realized there's room for improvement and I can't resist. The ListDefinition above works by exposing it's contents with getters, ick. There is a "Tell, don't ask" design principle that is being violated here for no good reason.
ListDefinition(T startValue, T endValue, T stepSize) {
List<T> buildList(List<T> l);
}
This let's us build any kind of list implementation and have it initialized according to the definition. Now we don't need ListFactory. buildList is something I call a shunt. It returns the same reference it accepted after having done something with it. It simply allows you to skip giving the new ArrayList a name. Making a list now looks like this:
ListDefinition<int> ld = new ListDefinition<int>(3, 1, -1);
List<int> l = new ImmutableFacade<int>( ld.buildList( new ArrayList<int>() ) );
Which works fine. Bit hard to read. So why not add a static factory method:
List<int> l = ImmutableRangeOfNumbers.over(3, 1, -1);
This doesn't accept dependency injections but it's built on classes that do. It's effectively a dependency injection container. This makes it a nice shorthand for popular combinations and configurations of the underlying classes. You don't have to make one for every combination. The point of doing this with many classes is now you can put together whatever combination you need.
Well, that's my 2 cents. I'm gonna find something else to obsess on. Feedback welcome.
As far as cohesion is concerned, there's no "real work", only work that's in line (or not) with the class/method's responsibility.
A constructor's responsibility is to create an instance of a class. And a valid instance for that matter. I'm a big fan of keeping the validation part as intrinsic as possible, so that you can see the invariants every time you look at the class. In other words, that the class "contains its own definition".
However, there are cases when an object is a complex assemblage of multiple other objects, with conditional logic, non-trivial validation or other creation sub-tasks involved. This is when I'd delegate the object creation to another class (Factory or Builder pattern) and restrain the accessibility scope of the constructor, but I think twice before doing it.
In your case, I see no conditionals (except argument checking), no composition or inspection of complex objects. The work done by your constructor is cohesive with the class because it essentially only populates its internals. While you may (and should) of course extract atomic, well identified construction steps into private methods inside the same class, I don't see the need for a separate builder class.
The constructor is a special member function, in a way that it constructor, but after all - it is a member function. As such, it is allowed to do things.
Consider for example c++ std::fstream. It opens a file in the constructor. Can throw an exception, but doesn't have to.
As long as you can test the class, it is all good.
It's true, a constructur should do minimum of work oriented to a single aim - successful creaation of the valid object. Whatever it takes is ok. But not more.
In your example, creating this collection in the constructor is perfectly valid, as object of your class represent a set of numbers (your words). If an object is set of numbers, you should clearly create it in the constructor! On the contrary - the constructur does not perform what it is made for - a fresh, valid object construction.
These info strings call my attention. What is their purpose? What exactly do you do? This sounds like something periferic, something that can be left for later and exposed through a method, like
String getInfo()
or similar.
If you want to use Microsoft's .NET Framework was an example here, it is perfectly valid both semantically and in terms of common practice, for a constructor to do some real work.
An example of where Microsoft does this is in their implementation of System.IO.FileStream. This class performs string processing on path names, opens new file handles, opens threads, binds all sorts of things, and invokes many system functions. The constructor is actually, in effect, about 1,200 lines of code.
I believe your example, where you are creating a list, is absolutely fine and valid. I would just make sure that you fail as often as possible. Say if you the minimum size higher than the maximum size, you could get stuck in an infinite loop with a poorly written loop condition, thus exhausting all available memory.
The takeaway is "it depends" and you should use your best judgement. If all you wanted was a second opinion, then I say you're fine.
It's not a good practice to do "real work" in the constructor: you can initialize class members, but you shouldn't call other methods or do more "heavy lifting" in the constructor.
If you need to do some initialization which requires a big amount of code running, a good practice will be to do it in an init() method which will be called after the object was constructed.
The reasoning for not doing heavy lifting inside the constructor is: in case something bad happens, and fails silently, you'll end up having a messed up object and it'll be a nightmare to debug and realize where the issues are coming from.
In the case you describe above I would only do the assignments in the constructor and then, in two separate methods, I would implement the validations and generate the string-information.
Implementing it this way also conforms with SRP: "Single Responsibility Principle" which suggests that any method/function should do one thing, and one thing only.

Worker vs data class

I have a data class which encapsulates relevant data items in it. Those data items are set and get by users one by one when needed.
My confusion about the design has to do with which object should be responsible for handling the update of multiple properties of that data object. Sometimes an update operation will be performed which affects many properties at once.
So, which class should have the update() method?. Is it the data class itself or another manager class ? The update() method requires data exchange with many different objects, so I don't want to make it a member of the data class because I believe it should know nothing about the other objects required for update. I want the data class to be only a data-structure. Am I thinking wrong? What would be the right approach?
My code:
class RefData
{
Matrix mX;
Vector mV;
int mA;
bool mB;
getX();
setB();
update(); // which affects almost any member attributes in the class, but requires many relations with many different classes, which makes this class dependant on them.
}
or,
class RefDataUpdater
{
update(RefData*); // something like this ?
}
There is this really great section in the book Clean Code, by Robert C. Martin, that speaks directly to this issue.
And the answer is it depends. It depends on what you are trying to accomplish in your design--and
if you might have more than one data-object that exhibit similar behaviors.
First, your data class could be considered a Data Transfer Object (DTO). As such, its ideal form is simply a class without any public methods--only public properties -- basically a data structure. It will not encapsulate any behavior, it simply groups together related data. Since other objects manipulate these data objects, if you were to add a property to the data object, you'd need to change all the other objects that have functions that now need to access that new property. However, on the flip side, if you added a new function to a manager class, you need to make zero changes to the data object class.
So, I think often you want to think about how many data objects might have an update function that relates directly to the properties of that class. If you have 5 classes that contain 3-4 properties but all have an update function, then I'd lean toward having the update function be part of the "data-class" (which is more of an OO-design). But, if you have one data-class in which it is likely to have properties added to it in the future, then I'd lean toward the DTO design (object as a data structure)--which is more procedural (requiring other functions to manipulate it) but still can be part of an otherwise Object Oriented architecture.
All this being said, as Robert Martin points out in the book:
There are ways around this that are well known to experienced
object-oriented designers: VISITOR, or dual-dispatch, for example.
But these techniques carry costs of their own and generally return the
structure to that of a procedural program.
Now, in the code you show, you have properties with types of Vector, and Matrix, which are probably more complex types than a simple DTO would contain, so you may want to think about what those represent and whether they could be moved to separate classes--with different functions to manipulate--as you typically would not expose a Matrix or a Vector directly as a property, but encapsulate them.
As already written, it depends, but I'd probably go with an external support class that handles the update.
For once, I'd like to know why you'd use such a method? I believe it's safe to assume that the class doesn't only call setter methods for a list of parameters it receives, but I'll consider this case as well
1) the trivial updater method
In this case I mean something like this:
public update(a, b, c)
{
setA(a);
setB(b);
setC(c);
}
In this case I'd probably not use such a method at all, I'd either define a macro for it or I'd call the setter themselves. But if it must be a method, then I'd place it inside the data class.
2) the complex updater method
The method in this case doesn't only contain calls to setters, but it also contains logic. If the logic is some sort of simple property update logic I'd try to put that logic inside the setters (that's what they are for in the first place), but if the logic involves multiple properties I'd put this logic inside an external supporting class (or a business logic class if any appropriate already there) since it's not a great idea having logic reside inside data classes.
Developing clear code that can be easily understood is very important and it's my belief that by putting logic of any kind (except for say setter logic) inside data classes won't help you achieving that.
Edit
I just though I'd add something else. Where to put such methods also depend upon your class and what purpose it fulfills. If we're talking for instance about Business/Domain Object classes, and we're not using an Anemic Domain Model these classes are allowed (and should contain) behavior/logic.
On the other hand, if this data class is say an Entity (persistence objects) which is not used in the Domain Model as well (complex Domain Model) I would strongly advice against placing logic inside them. The same goes for data classes which "feel" like pure data objects (more like structs), don't pollute them, keep the logic outside.
I guess like everywhere in software, there's no silver bullet and the right answer is: it depends (upon the classes, what this update method is doing, what's the architecture behind the application and other application specific considerations).

Should ecapsulated objects be public or private?

I'm a little unclear as to how far to take the idea in making all members within a class private and make public methods to handle mutations. Primitive types are not the issue, it's encapsulated object that I am unclear about. The benefit of making object members private is the ability to hide methods that do not apply to the context of class being built. The downside is that you have to provide public methods to pass parameters to the underlying object (more methods, more work). On the otherside, if you want to have all methods and properties exposed for the underlying object, couldn't you just make the object public? What are the dangers in having objects exposed this way?
For example, I would find it useful to have everything from a vector, or Array List, exposed. The only downside I can think of is that public members could potentially assigned a type that its not via implicit casting (or something to that affect). Would a volitile designation reduce the potential for problems?
Just a side note: I understand that true enapsulation implies that members are private.
What are the dangers in having objects exposed this way?
Changing the type of those objects would require changing the interface to the class. With private objects + public getters/setters, you'd only have to modify the code in the getters and setters, assuming you want to keep the things being returned the same.
Note that this is why properties are useful in languages such as Python, which technically doesn't have private class members, only obscured ones at most.
The problem with making instance variables public is that you can never change your mind later, and make them private, without breaking existing code that relies on directly public access to those instance vars. Some examples:
You decide to later make your class thread-safe by synchronizing all access to instance vars, or maybe by using a ThreadLocal to create a new copy of the value for each thread. Can't do it if any thread can directly access the variables.
Using your example of a vector or array list - at some point, you realize that there is a security flaw in your code because those classes are mutable, so somebody else can replace the contents of the list. If this were only available via an accessor method, you could easily solve the problem by making an immutable copy of the list upon request, but you can't do that with a public variable.
You realize later that one of your instance vars is redundant and can be derived based on other variables. Once again, easy if you're using accessors, impossible with public variables.
I think that it boils down to a practical point - if you know that you're the only one who will be using this code, and it pains you to write accessors (every IDE will do it for you automatically), and you don't mind changing your own code later if you decide to break the API, then go for it. But if other people will be using your class, or if you would like to make it easier to refactor later for your own use, stick with accessors.
Object oriented design is just a guideline. Think about it from the perspective of the person who will be using your class. Balance OOD with making it intuitive and easy to use.
You could run into issues depending on the language you are using and how it treats return statements or assignment operators. In some cases it may give you a reference, or values in other cases.
For example, say you have a PrimeCalculator class that figures out prime numbers, then you have another class that does something with those prime numbers.
public PrimeCalculator calculatorObject = new PrimeCalculator();
Vector<int> primeNumbers = calculatorObject.PrimeNumbersVector;
/* do something complicated here */
primeNumbers.clear(); // free up some memory
When you use this stuff later, possibly in another class, you don't want the overhead of calculating the numbers again so you use the same calculatorObject.
Vector<int> primes = calculatorObject.PrimeNumbersVector;
int tenthPrime = primes.elementAt(9);
It may not exactly be clear at this point whether primes and primeNumbers reference the same Vector. If they do, trying to get the tenth prime from primes would throw an error.
You can do it this way if you're careful and understand what exactly is happening in your situation, but you have a smaller margin of error using functions to return a value rather than assigning the variable directly.
Well you can check the post :
first this
then this
This should solve your confusion . It solved mine ! Thanks to Nicol Bolas.
Also read the comments below the accepted answer (also notice the link given in the second last comment by me ( in the first post) )
Also visit the wikipedia post

OOP Design: Where to put object specific "compare" method?

I have some measurement object instances from a series of test runs stored in a test collection object. I also have some logic that can compare two test result object instances and tell me if they are "close enough".
Where should this logic be placed?
On the object as a method? Like: instance.approximately_equal(other)
On the object's class as a class/static method? class.approximately_equal(a,b)
On the collection object as a method? collection.approximately_equal(a,b)
What is the correct OO design for this?
(I ask, since although #1 would seem the correct solution, I'd never be asking if some one instance is approximately_equal to a different instance. Only if "some group of objects" are equal to each other. It got me thinking...)
Thanks
The object oriented design books I have read suggest putting cross class functionality into service provider objects. This will decouple the two objects and reduce complexity, but may be overkill if your project is small.
I would use option 1 (instance method) since that enables you to refine the comparison logic in derived classes (if needed).
I've found #3 is the least obtrusive and leads to less bloated code, because it tends to force you to make those methods as flexible/reusable as possible. For example, in C++, you'd potentially just use operator overloading to handle it; if you have a utility class (or, if you plan on extending a native data type), the net effect is the same, just with a different presentation.

Is it poor design to create objects that only execute code during the constructor?

In my design I am using objects that evaluate a data record. The constructor is called with the data record and type of evaluation as parameters and then the constructor calls all of the object's code necessary to evaluate the record. This includes using the type of evaluation to find additional parameter-like data in a text file.
There are in the neighborhood of 250 unique evaluation types that use the same or similar code and unique parameters coming from the text file.
Some of these evaluations use different code so I benefit a lot from this model because I can use inheritance and polymorphism.
Once the object is created there isn't any need to execute additional code on the object (at least for now) and it is used more like a struct; its kept on a list and 3 properties are used later.
I think this design is the easiest to understand, code, and read.
A logical alternative I guess would be using functions that return score structs, but you can't inherit from methods so it would make it kind of sloppy imo.
I am using vb.net and these classes will be used in an asp.net web app as well as in a distributed app.
thanks for your input
Executing code in a constructor is OK; but having only properties with no methods might be a violation of the tell don't ask principle: perhaps instead those properties should be private, and the code which uses ("asks") those properties should become methods of the class (which you can invoke or "tell").
In general, putting code that does anything significant in the constructor a not such a good idea, because you'll eventually get hamstrung on the rigid constructor execution order when you subclass.
Constructors are best used for getting your object to a consistent state. "Real" work is best handled in instance methods. With the work implemented as a method, you gain:
separation of what you want to evaluate from when you want to evaluate it.
polymorphism (if using virtual methods)
the option to split up the work into logical pieces, implementing each piece as a concrete template method. These template methods can be overridden in subclasses, which provides for "do it mostly like my superclass, but do this bit differently".
In short, I'd use methods to implement the main computation. If you're concerned that an object will be created without it's evaluation method being called, you can use a factory to create the objects, which calls the evaluate method after construction. You get the safety of constructors, with the execution order flexibility of methods.