Is it poor design to create objects that only execute code during the constructor? - oop

In my design I am using objects that evaluate a data record. The constructor is called with the data record and type of evaluation as parameters and then the constructor calls all of the object's code necessary to evaluate the record. This includes using the type of evaluation to find additional parameter-like data in a text file.
There are in the neighborhood of 250 unique evaluation types that use the same or similar code and unique parameters coming from the text file.
Some of these evaluations use different code so I benefit a lot from this model because I can use inheritance and polymorphism.
Once the object is created there isn't any need to execute additional code on the object (at least for now) and it is used more like a struct; it's kept on a list and 3 properties are used later.
I think this design is the easiest to understand, code, and read.
A logical alternative I guess would be using functions that return score structs, but you can't inherit from methods so it would make it kind of sloppy imo.
I am using vb.net and these classes will be used in an asp.net web app as well as in a distributed app.
thanks for your input

Executing code in a constructor is OK; but having only properties with no methods might be a violation of the tell don't ask principle: perhaps instead those properties should be private, and the code which uses ("asks") those properties should become methods of the class (which you can invoke or "tell").
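As a rough illustration of "tell, don't ask", here is a minimal Python sketch. The class name ScoredEvaluation, its two fields, and the pass/fail rule are all hypothetical; the question doesn't say what the 3 properties are or how they are used later.

```python
class ScoredEvaluation:
    """Hypothetical evaluation result; the fields are kept private."""

    def __init__(self, score, threshold):
        self._score = score
        self._threshold = threshold

    def passed(self):
        # "Tell": the decision lives with the data it depends on.
        return self._score >= self._threshold


# The "ask" style would read the fields from outside instead:
#   if evaluation.score >= evaluation.threshold: ...
print(ScoredEvaluation(80, 70).passed())  # True
```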

In general, putting code that does anything significant in the constructor is not such a good idea, because you'll eventually get hamstrung by the rigid constructor execution order when you subclass.
Constructors are best used for getting your object to a consistent state. "Real" work is best handled in instance methods. With the work implemented as a method, you gain:
separation of what you want to evaluate from when you want to evaluate it.
polymorphism (if using virtual methods)
the option to split up the work into logical pieces, implementing each piece as a concrete template method. These template methods can be overridden in subclasses, which provides for "do it mostly like my superclass, but do this bit differently".
In short, I'd use methods to implement the main computation. If you're concerned that an object will be created without its evaluation method being called, you can use a factory to create the objects, which calls the evaluate method after construction. You get the safety of constructors, with the execution order flexibility of methods.
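A minimal sketch of that factory-plus-template-method idea, in Python rather than VB.NET. Everything named here (Evaluation, SumEvaluation, WeightedSumEvaluation, EVALUATION_TYPES, create_evaluation, the "weight" parameter) is made up for illustration and is not taken from the question.

```python
from abc import ABC, abstractmethod


class Evaluation(ABC):
    """Holds a record; the real work lives in evaluate(), not __init__."""

    def __init__(self, record, params):
        # The constructor only stores state, so subclasses can add their
        # own fields before any evaluation logic runs.
        self.record = record
        self.params = params
        self.score = None

    def evaluate(self):
        # Template method: fixed overall flow, overridable pieces.
        raw = self.compute_raw()
        self.score = self.adjust(raw)
        return self

    @abstractmethod
    def compute_raw(self):
        ...

    def adjust(self, raw):
        return raw


class SumEvaluation(Evaluation):
    def compute_raw(self):
        return sum(self.record)


class WeightedSumEvaluation(SumEvaluation):
    # "Do it mostly like my superclass, but do this bit differently."
    def adjust(self, raw):
        return raw * self.params["weight"]


# Hypothetical registry standing in for the ~250 evaluation types.
EVALUATION_TYPES = {"sum": SumEvaluation, "weighted": WeightedSumEvaluation}


def create_evaluation(kind, record, params):
    """Factory: construct, then evaluate, so callers never see a half-done object."""
    return EVALUATION_TYPES[kind](record, params).evaluate()


result = create_evaluation("weighted", [1, 2, 3], {"weight": 2})
print(result.score)  # 12
```

Callers still get a fully evaluated object from a single call, but subclasses are no longer tied to the constructor execution order.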

Related

Flaw: Constructor does Real Work

I have a class which represents a set of numbers. The constructor takes three arguments: startValue, endValue and stepSize.
The class is responsible for holding a list containing all values between start and end value taking the stepSize into consideration.
Example: startValue: 3, endValue: 1, stepSize = -1, Collection = { 3,2,1 }
I am currently creating the collection and some info strings about the object in the constructor. The public members are read only info strings and the collection.
My constructor does three things at the moment:
Checks the arguments; this could throw an exception from the constructor
Fills values into the collection
Generates the information strings
I can see that my constructor does real work but how can I fix this, or, should I fix this? If I move the "methods" out of the constructor it is like having an init function and leaving me with a not fully initialized object. Is the existence of my object doubtful? Or is it not that bad to have some work done in the constructor because it is still possible to test the constructor because no object references are created.
For me it looks wrong but it seems that I just can't find a solution. I also have taken a builder into account but I am not sure if that's right because you can't choose between different types of creations. However single unit tests would have less responsibility.
I am writing my code in C# but I would prefer a general solution, that's why the text contains no code.
EDIT: Thanks for editing my poor text (: I changed the title back because it represents my opinion and the edited title did not. I am not asking if real work is a flaw or not. For me, it is. Take a look at this reference.
http://misko.hevery.com/code-reviewers-guide/flaw-constructor-does-real-work/
The blog states the problems quite well. Still I can't find a solution.
Concepts that urge you to keep your constructors light weight:
Inversion of control (Dependency Injection)
Single responsibility principle (as applied to the constructor rather than a class)
Lazy initialization
Testing
K.I.S.S.
D.R.Y.
Links to arguments of why:
How much work should be done in a constructor?
What (not) to do in a constructor
Should a C++ constructor do real work?
http://misko.hevery.com/code-reviewers-guide/flaw-constructor-does-real-work/
If you check the arguments in the constructor that validation code can't be shared if those arguments come in from any other source (setter, constructor, parameter object)
If you fill values into the collection or generate the information strings in the constructor that code can't be shared with other constructors you may need to add later.
Besides not being shareable, work done in the constructor can't be delayed until it is really needed (lazy init). Inheritance also gives you more options for overriding when you have many methods that each do one thing rather than one do-everything constructor.
Your constructor only needs to put your class into a usable state. It does NOT have to be fully initialized. But it is perfectly free to use other methods to do the real work. That just doesn't take advantage of the "lazy init" idea. Sometimes you need it, sometimes you don't.
Just keep in mind that anything the constructor does or calls is being shoved down the user's and the tester's throat.
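To make the "usable state, but not fully initialized" idea concrete, here is a small Python sketch. It assumes the range is inclusive of both ends (as in the asker's { 3,2,1 } example) and invents the names NumberRange, values and info; the contents of the info string are a guess, since the question never says what the strings hold.

```python
from functools import cached_property


class NumberRange:
    def __init__(self, start, end, step):
        # Cheap validation keeps the object in a usable state from the start.
        if step == 0 or (end - start) * step < 0:
            raise ValueError("step must be non-zero and point from start toward end")
        self.start, self.end, self.step = start, end, step

    @cached_property
    def values(self):
        # Built on first access only, then cached on the instance.
        stop = self.end + (1 if self.step > 0 else -1)
        return list(range(self.start, stop, self.step))

    @cached_property
    def info(self):
        # Hypothetical info string; only generated if somebody asks for it.
        return f"{len(self.values)} values from {self.start} to {self.end}"


r = NumberRange(3, 1, -1)
print(r.values)  # [3, 2, 1]
```

The constructor can still throw on bad arguments, but a test that only cares about validation never pays for building the list or the strings.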
EDIT:
You still haven't accepted an answer and I've had some sleep so I'll take a stab at a design. A good design is flexible so I'm going to assume it's OK that I'm not sure what the information strings are, or whether our object is required to represent a set of numbers by being a collection (and so provides iterators, size(), add(), remove(), etc) or is merely backed by a collection and provides some narrow specialized access to those numbers (such as being immutable).
This little guy is the Parameter Object pattern
/** Throws exception if sign of endValue - startValue != stepSize */
ListDefinition(T startValue, T endValue, T stepSize);
T can be int or long or short or char. Have fun but be consistent.
/** An interface, independent from any one collection implementation */
ListFactory(ListDefinition ld){
/** Make as many as you like */
List<T> build();
}
If we don't need to narrow access to the collection, we're done. If we do, wrap it in a facade before exposing it.
/** Provides read access only. Immutable if List l kept private. */
ImmutableFacade(List l);
Oh wait, requirements change, forgot about 'information strings'. :)
/** Build list of info strings */
InformationStrings(String infoFilePath) {
List<String> read();
}
Have no idea if this is what you had in mind but if you want the power to count line numbers by twos you now have it. :)
/** Assuming information strings have a 1 to 1 relationship with our numbers */
MapFactory(List l, List infoStrings){
/** Make as many as you like */
Map<T, String> build();
}
So, yes I'd use the builder pattern to wire all that together. Or you could try to use one object to do all that. Up to you. But I think you'll find few of these constructors doing much of anything.
EDIT2
I know this answer's already been accepted but I've realized there's room for improvement and I can't resist. The ListDefinition above works by exposing its contents with getters, ick. There is a "Tell, don't ask" design principle that is being violated here for no good reason.
ListDefinition(T startValue, T endValue, T stepSize) {
List<T> buildList(List<T> l);
}
This lets us build any kind of list implementation and have it initialized according to the definition. Now we don't need ListFactory. buildList is something I call a shunt. It returns the same reference it accepted after having done something with it. It simply allows you to skip giving the new ArrayList a name. Making a list now looks like this:
ListDefinition<int> ld = new ListDefinition<int>(3, 1, -1);
List<int> l = new ImmutableFacade<int>( ld.buildList( new ArrayList<int>() ) );
Which works fine. Bit hard to read. So why not add a static factory method:
List<int> l = ImmutableRangeOfNumbers.over(3, 1, -1);
This doesn't accept dependency injections but it's built on classes that do. It's effectively a dependency injection container. This makes it a nice shorthand for popular combinations and configurations of the underlying classes. You don't have to make one for every combination. The point of doing this with many classes is now you can put together whatever combination you need.
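For anyone who wants to run something, here is a loose Python translation of the shunt and the static factory. It is only a sketch: build_list and immutable_range_over are my renderings of buildList and ImmutableRangeOfNumbers.over, a tuple stands in for ImmutableFacade, and the validation rule is my reading of the "sign of endValue - startValue" comment.

```python
class ListDefinition:
    """Parameter object: validates the range once, then fills any list it is told to."""

    def __init__(self, start, end, step):
        if step == 0 or (end - start) * step < 0:
            raise ValueError("sign of end - start must match sign of step")
        self._start, self._end, self._step = start, end, step

    def build_list(self, target):
        # The "shunt": fill the list passed in and return the same reference.
        value = self._start
        while (self._end - value) * self._step >= 0:
            target.append(value)
            value += self._step
        return target


def immutable_range_over(start, end, step):
    """Shorthand for a popular combination; a tuple plays the read-only facade."""
    return tuple(ListDefinition(start, end, step).build_list([]))


print(immutable_range_over(3, 1, -1))  # (3, 2, 1)
```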
Well, that's my 2 cents. I'm gonna find something else to obsess on. Feedback welcome.
As far as cohesion is concerned, there's no "real work", only work that's in line (or not) with the class/method's responsibility.
A constructor's responsibility is to create an instance of a class. And a valid instance for that matter. I'm a big fan of keeping the validation part as intrinsic as possible, so that you can see the invariants every time you look at the class. In other words, that the class "contains its own definition".
However, there are cases when an object is a complex assemblage of multiple other objects, with conditional logic, non-trivial validation or other creation sub-tasks involved. This is when I'd delegate the object creation to another class (Factory or Builder pattern) and restrain the accessibility scope of the constructor, but I think twice before doing it.
In your case, I see no conditionals (except argument checking), no composition or inspection of complex objects. The work done by your constructor is cohesive with the class because it essentially only populates its internals. While you may (and should) of course extract atomic, well identified construction steps into private methods inside the same class, I don't see the need for a separate builder class.
The constructor is a special member function in that it constructs the object, but after all, it is a member function. As such, it is allowed to do things.
Consider for example c++ std::fstream. It opens a file in the constructor. Can throw an exception, but doesn't have to.
As long as you can test the class, it is all good.
It's true, a constructor should do the minimum of work, oriented to a single aim: successful creation of a valid object. Whatever it takes is OK. But not more.
In your example, creating this collection in the constructor is perfectly valid, as an object of your class represents a set of numbers (your words). If an object is a set of numbers, you should clearly create it in the constructor! Otherwise the constructor does not do what it is made for: constructing a fresh, valid object.
These info strings catch my attention. What is their purpose? What exactly do you do with them? This sounds like something peripheral, something that can be left for later and exposed through a method, like
String getInfo()
or similar.
If you want to use Microsoft's .NET Framework as an example here, it is perfectly valid, both semantically and in terms of common practice, for a constructor to do some real work.
An example of where Microsoft does this is in their implementation of System.IO.FileStream. This class performs string processing on path names, opens new file handles, opens threads, binds all sorts of things, and invokes many system functions. The constructor is actually, in effect, about 1,200 lines of code.
I believe your example, where you are creating a list, is absolutely fine and valid. I would just make sure that you fail as early as possible. Say you pass a minimum size higher than the maximum size: with a poorly written loop condition you could get stuck in an infinite loop, thus exhausting all available memory.
The takeaway is "it depends" and you should use your best judgement. If all you wanted was a second opinion, then I say you're fine.
It's not a good practice to do "real work" in the constructor: you can initialize class members, but you shouldn't call other methods or do more "heavy lifting" in the constructor.
If you need to do some initialization which requires a big amount of code running, a good practice will be to do it in an init() method which will be called after the object was constructed.
The reasoning for not doing heavy lifting inside the constructor is: in case something bad happens, and fails silently, you'll end up having a messed up object and it'll be a nightmare to debug and realize where the issues are coming from.
In the case you describe above I would only do the assignments in the constructor and then, in two separate methods, I would implement the validations and generate the string-information.
Implementing it this way also conforms with SRP: "Single Responsibility Principle" which suggests that any method/function should do one thing, and one thing only.

can overriding of a method be prevented by downcasting to a superclass?

I'm trying to understand whether the answer to the following question is the same in all major OOP languages; and if not, then how do those languages differ.
Suppose I have class A that defines methods act and jump; method act calls method jump. A's subclass B overrides method jump (i.e., the appropriate syntax is used to ensure that whenever jump is called, the implementation in class B is used).
I have object b of class B. I want it to behave exactly as if it was of class A. In other words, I want the jump to be performed using the implementation in A. What are my options in different languages?
For example, can I achieve this with some form of downcasting? Or perhaps by creating a proxy object that knows which methods to call?
I would want to avoid creating a brand new object of class A and carefully setting up the sharing of internal state between a and b because that's obviously not future-proof, and complicated. I would also want to avoid copying the state of b into a brand new object of class A because there might be a lot of data to copy.
UPDATE
I asked this question specifically about Python; it seems this is impossible to achieve in Python, although technically it can be done... kinda.
It appears that apart from technical feasibility, there's a strong argument against doing this from a design perspective. I'm asking about that in a separate question.
The comments reiterated: Prefer composition over inheritance.
Inheritance works well when your subclasses have well defined behavioural differences from their superclass, but you'll frequently hit a point where that model gets awkward or stops making sense. At that point, you need to reconsider your design.
Composition is usually the better solution. Delegating your object's varying behaviour to a different object (or objects) may reduce or eliminate your need for subclassing.
In your case, the behavioural differences between class A and class B could be encapsulated in the Strategy pattern. You could then change the behaviour of class A (and class B, if still required) at the instance level, simply by assigning a new strategy.
The Strategy pattern may require more code in the short run, but it's clean and maintainable. Method swizzling, monkey patching, and all those cool things that allow us to poke around in our specific language implementation are fun, but the potential for unexpected side effects is high and the code tends to be difficult to maintain.
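A minimal Python sketch of what that could look like. The names Actor, GroundJump and RocketJump are invented for illustration; Actor plays the role of class A, and the two strategies stand in for the jump implementations of A and B.

```python
class GroundJump:
    def jump(self):
        return "small hop"


class RocketJump:
    def jump(self):
        return "rocket jump"


class Actor:
    """Plays the role of class A; jumping is delegated rather than overridden."""

    def __init__(self, jump_strategy):
        self.jump_strategy = jump_strategy

    def jump(self):
        return self.jump_strategy.jump()

    def act(self):
        return f"acts, then does a {self.jump()}"


b = Actor(RocketJump())
print(b.act())  # acts, then does a rocket jump

# Same object, same state; only the behaviour is swapped.
b.jump_strategy = GroundJump()
print(b.act())  # acts, then does a small hop
```

No new object is created and no state is copied, which were the two things the question wanted to avoid.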
What you are asking is completely unrelated to, and unsupported by, OOP.
If you subclass a class A with class B and override its methods, then when a concrete instance of B is created, all the overridden/new implementations of the base methods are associated with it (whether we talk about Java or C++ with virtual tables etc).
You have instantiated object B.
Why would you expect that you could/would/should be able to call the method of the superclass if you have overridden that method?
You could call it explicitly of course, e.g. by calling super inside the method, but you cannot do it automatically, and casting will not help you do that either.
I can't imagine why you would want to do that.
If you need to use class A then use class A.
If you need to override its functionality then use its subclass B.
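Since the asker mentions Python, here is a short sketch of the point about explicit calls, using the class and method names from the question. Python has no casts, so calling through the class A directly is the closest equivalent; that substitution is my assumption, not something this answer states.

```python
class A:
    def act(self):
        return f"act -> {self.jump()}"

    def jump(self):
        return "A.jump"


class B(A):
    def jump(self):
        return "B.jump"


b = B()
print(b.jump())   # B.jump
print(A.jump(b))  # A.jump  (explicit call to the superclass implementation)
print(A.act(b))   # act -> B.jump  (self.jump() inside act still dispatches to B)
```

The last line is the crux: you can reach A's jump by naming it, but any call made through self inside A's own methods still lands in B.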
Most programming languages go to some trouble to support dynamic dispatch of virtual functions (the case of calling the overridden method jump in a subclass instead of the parent class's implementation) -- to the degree that working around it or avoiding it is difficult. In general, specialization/polymorphism is a desirable feature -- arguably a goal of OOP in the first place.
Take a look at the Wikipedia article on Virtual Functions, which gives a useful overview of the support for virtual functions in many programming languages. It will give you a place to start when considering a specific language, as well as the trade-offs to weigh when looking at a language where the programmer can control how dispatch behaves (see the section on C++, for example).
So loosely, the answer to your question is, "No, the behavior is not the same in all programming languages." Furthermore, there is no language independent solution. C++ may be your best bet if you need the behavior.
You can actually do this with Python (sort of), with some awful hacks. It requires that you implement something like the wrappers we were discussing in your first Python-specific question, but as a subclass of B. You then need to implement write-proxying as well (the wrapper object shouldn't contain any of the state normally associated with the class hierarchy; it should redirect all attribute access to the underlying instance of B).
But rather than redirecting method lookup to A and then calling the method with the wrapped instance, you'd call the method passing the wrapper object as self. This is legal because the wrapper class is a subclass of B, so the wrapper instance is an instance of the classes whose methods you're calling.
This would be very strange code, requiring you to dynamically generate classes using both IS-A and HAS-A relationships at the same time. It would probably also end up fairly fragile and have bizarre results in a lot of corner cases (you generally can't write 100% perfect wrapper classes in Python exactly because this sort of strange thing is possible).
I'm completely leaving aside whether this is a good idea or not.

OOP Design: Where to put object specific "compare" method?

I have some measurement object instances from a series of test runs stored in a test collection object. I also have some logic that can compare two test result object instances and tell me if they are "close enough".
Where should this logic be placed?
1. On the object as a method? Like: instance.approximately_equal(other)
2. On the object's class as a class/static method? class.approximately_equal(a,b)
3. On the collection object as a method? collection.approximately_equal(a,b)
What is the correct OO design for this?
(I ask, since although #1 would seem the correct solution, I'd never be asking if some one instance is approximately_equal to a different instance. Only if "some group of objects" are equal to each other. It got me thinking...)
Thanks
The object oriented design books I have read suggest putting cross class functionality into service provider objects. This will decouple the two objects and reduce complexity, but may be overkill if your project is small.
I would use option 1 (instance method) since that enables you to refine the comparison logic in derived classes (if needed).
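A rough Python sketch of option 1, including the asker's actual use case of checking a whole group. The names Measurement, CoarseMeasurement and RunCollection, and both tolerance values, are hypothetical; the question gives no details about what "close enough" means.

```python
import math
from itertools import combinations


class Measurement:
    tolerance = 0.01  # hypothetical default; the question gives no value

    def __init__(self, value):
        self.value = value

    def approximately_equal(self, other):
        return math.isclose(self.value, other.value, abs_tol=self.tolerance)


class CoarseMeasurement(Measurement):
    tolerance = 0.5  # a derived class refines the comparison


class RunCollection:
    def __init__(self, measurements):
        self.measurements = list(measurements)

    def all_approximately_equal(self):
        # Group-level question answered with the pairwise instance method.
        return all(a.approximately_equal(b)
                   for a, b in combinations(self.measurements, 2))


runs = RunCollection([Measurement(1.0), Measurement(1.005)])
print(runs.all_approximately_equal())  # True
```

The collection asks the group-level question, while the comparison rule stays on the object where a subclass can change it.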
I've found #3 is the least obtrusive and leads to less bloated code, because it tends to force you to make those methods as flexible/reusable as possible. For example, in C++, you'd potentially just use operator overloading to handle it; if you have a utility class (or, if you plan on extending a native data type), the net effect is the same, just with a different presentation.

Is it good convention for a class to perform functions on itself?

I've always been taught that if you are doing something to an object, that should be an external thing, so one would Save(Class) rather than having the object save itself: Class.Save().
I've noticed that in the .Net libraries, it is common to have a class modify itself as with String.Format() or sort itself as with List.Sort().
My question is, in strict OOP is it appropriate to have a class which performs functions on itself when called to do so, or should such functions be external and called on an object of the class' type?
Great question. I have just recently reflected on a very similar issue and was eventually going to ask much the same thing here on SO.
In OOP textbooks, you sometimes see examples such as Dog.Bark(), or Person.SayHello(). I have come to the conclusion that those are bad examples. When you call those methods, you make a dog bark, or a person say hello. However, in the real world, you couldn't do this; a dog decides for itself when it's going to bark. A person decides for themselves when to say hello to someone. Therefore, these methods would more appropriately be modelled as events (where supported by the programming language).
You would e.g. have a function Attack(Dog), PlayWith(Dog), or Greet(Person) which would trigger the appropriate events.
Attack(dog) // triggers the Dog.Bark event
Greet(johnDoe) // triggers the Person.SaysHello event
As soon as you have more than one parameter, it won't be so easy deciding how to best write the code. Let's say I want to store a new item, say an integer, into a collection. There's many ways to formulate this; for example:
StoreInto(1, collection) // the "classic" procedural approach
1.StoreInto(collection) // possible in .NET with extension methods
Store(1).Into(collection) // possible by using state-keeping temporary objects
According to the thinking laid out above, the last variant would be the preferred one, because it doesn't force an object (the 1) to do something to itself. However, if you follow that programming style, it will soon become clear that this fluent interface-like code is quite verbose, and while it's easy to read, it can be tiring to write or even hard to remember the exact syntax.
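The third variant, with its state-keeping temporary object, might be sketched like this in Python. The helper class _PendingStore and the lowercase names store/into are made up for the example.

```python
class _PendingStore:
    """Temporary object that remembers the item until a destination is named."""

    def __init__(self, item):
        self._item = item

    def into(self, collection):
        collection.append(self._item)
        return collection


def store(item):
    return _PendingStore(item)


numbers = []
store(1).into(numbers)
print(numbers)  # [1]
```

It also shows the cost mentioned above: an extra class and an extra allocation per call, just to get the sentence-like syntax.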
P.S.: Concerning global functions: In the case of .NET (which you mentioned in your question), you don't have much choice, since the .NET languages do not provide for global functions. I think these would be technically possible with the CLI, but the languages disallow that feature. F# has global functions, but they can only be used from C# or VB.NET when they are packed into a module. I believe Java also doesn't have global functions.
I have come across scenarios where this lack is a pity (e.g. with fluent interface implementations). But generally, we're probably better off without global functions, as some developers might always fall back into old habits, and leave a procedural codebase for an OOP developer to maintain. Yikes.
Btw., in VB.NET, however, you can mimic global functions by using modules. Example:
Globals.vb:
Module Globals
    Public Sub Save(ByVal obj As SomeClass)
        ...
    End Sub
End Module
Demo.vb:
Imports Globals
...
Dim obj As SomeClass = ...
Save(obj)
I guess the answer is "It Depends"... for Persistence of an object I would side with having that behavior defined within a separate repository object. So with your Save() example I might have this:
repository.Save(class)
However with an Airplane object you may want the class to know how to fly with a method like so:
airplane.Fly()
This is one of the examples I've seen from Fowler about an anemic data model. I don't think in this case you would want to have a separate service like this:
new airplaneService().Fly(airplane)
With static methods and extension methods it makes a ton of sense, like in your List.Sort() example. So it depends on your usage patterns. You wouldn't want to have to new up an instance of a ListSorter class just to be able to sort a list like this:
new listSorter().Sort(list)
In strict OOP (Smalltalk or Ruby), all methods belong to an instance object or a class object. In "real" OOP (like C++ or C#), you will have static methods that essentially stand completely on their own.
Going back to strict OOP, I'm more familiar with Ruby, and Ruby has several "pairs" of methods that either return a modified copy or modify the object in place -- a method ending with a ! indicates that the message modifies its receiver. For instance:
>> s = 'hello'
=> "hello"
>> s.reverse
=> "olleh"
>> s
=> "hello"
>> s.reverse!
=> "olleh"
>> s
=> "olleh"
The key is to find some middle ground between pure OOP and pure procedural that works for what you need to do. A Class should do only one thing (and do it well). Most of the time, that won't include saving itself to disk, but that doesn't mean Class shouldn't know how to serialize itself to a stream, for instance.
I'm not sure what distinction you seem to be drawing when you say "doing something to an object". In many if not most cases, the class itself is the best place to define its operations, as under "strict OOP" it is the only code that has access to internal state on which those operations depend (information hiding, encapsulation, ...).
That said, if you have an operation which applies to several otherwise unrelated types, then it might make sense for each type to expose an interface which lets the operation do most of the work in a more or less standard way. To tie it in to your example, several classes might implement an interface ISaveable which exposes a Save method on each. Individual Save methods take advantage of their access to internal class state, but given a collection of ISaveable instances, some external code could define an operation for saving them to a custom store of some kind without having to know the messy details.
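Here is a small Python sketch of that arrangement. Saveable plays the role of ISaveable; the Invoice and Customer classes, the save_all function, and the use of a plain dict as the "custom store" are all invented for illustration.

```python
from typing import Protocol


class Saveable(Protocol):
    def save(self, store: dict) -> None: ...


class Invoice:
    def __init__(self, number, total):
        self._number = number
        self._total = total

    def save(self, store):
        # Only the class itself touches its internal state.
        store[f"invoice:{self._number}"] = {"total": self._total}


class Customer:
    def __init__(self, name):
        self._name = name

    def save(self, store):
        store[f"customer:{self._name}"] = {"name": self._name}


def save_all(items, store):
    """External code: knows the interface, not the messy details."""
    for item in items:
        item.save(store)
    return store


print(save_all([Invoice(7, 99.5), Customer("Ada")], {}))
```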
It depends on what information is needed to do the work. If the work is unrelated to the class (mostly equivalently, can be made to work on virtually any class with a common interface), for example, std::sort, then make it a free function. If it must know the internals, make it a member function.
Edit: Another important consideration is performance. In-place sorting, for example, can be miles faster than returning a new, sorted copy. This is one reason quicksort is faster than merge sort in the vast majority of cases, even though merge sort has the better worst-case bound: quicksort can be performed in place, whereas I've never heard of an in-place merge sort. Just because it's technically possible to perform an operation within the class's public interface doesn't mean that you actually should.
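Python's standard library draws the same line between the two styles, which makes for a quick illustration (this shows only the difference in behaviour, not a benchmark):

```python
data = [3, 1, 2]

copy = sorted(data)   # free function: returns a new sorted list, data is untouched
print(data, copy)     # [3, 1, 2] [1, 2, 3]

result = data.sort()  # method: sorts in place and returns None
print(data, result)   # [1, 2, 3] None
```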

How do you fight growing parameter list in class hierarchy?

I have a strong feeling that I do not know what pattern or particular language technique to use in this situation.
So, the question itself is how to manage the growing parameter list in a class hierarchy in a language that has OOP support. I mean, if the root class in the hierarchy has, let's say, 3 or 4 parameters, then in its derived class you need to call the base constructor and pass additional parameters for the derived part of the object, and so forth... Parameter lists become enormous once the depth of inheritance is more than two.
I'm pretty sure that many SOwers have faced this problem, and I am interested in ways to solve it. Many thanks in advance.
Constructors with long parameter lists are an indication that your class is trying to do too much. One approach to resolving that problem is to break it apart, and use a "coordinator" class to manage the pieces. Subclasses whose constructor parameter lists differ significantly from their superclass's are another example of a class doing too much. If a subclass truly is-a superclass, then it shouldn't require significantly more data to do its job.
That said, there are occasional cases where a class needs to work on a large number of related objects. In this situation, I would create a new object to hold the related parameters.
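A minimal sketch of such a parameter-holding object, in Python. The PersonInfo and EmploymentInfo groupings and their fields are hypothetical; how the parameters are best grouped depends on the actual hierarchy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonInfo:
    name: str
    birth_year: int


@dataclass(frozen=True)
class EmploymentInfo:
    employer: str
    salary: int


class Person:
    def __init__(self, info: PersonInfo):
        self.info = info


class Employee(Person):
    # Two parameters regardless of how many fields each group grows to.
    def __init__(self, info: PersonInfo, employment: EmploymentInfo):
        super().__init__(info)
        self.employment = employment


e = Employee(PersonInfo("Ada", 1990), EmploymentInfo("Acme", 50000))
print(e.info.name, e.employment.salary)  # Ada 50000
```

Adding a field to PersonInfo later changes no constructor signature anywhere in the hierarchy.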
Alternatives:
Use setter injection instead of constructor injection
Encapsulate the parameters in a separate container class, and pass that between constructors instead.
Don't use constructors to initialize the whole object at once. Only have it initialize those things which (1) are absolutely required for the existence of the object and (2) which must be done immediately at its creation. This will dramatically reduce the number of parameters you have to pass (likely to zero).
For a typical hierarchy like SalariedEmployee >> Employee >> Person you will have getters and setters to retrieve and change the various properties of the object.
Seeing the code would help me suggest a solution..
However long parameter lists are a code-smell, so I'd take a careful look at the design which requires this. The suggested refactorings to counter this are
Introduce Parameter Object
Preserve Whole Object
However if you find that you absolutely need this and a long inheritance chain, consider using a hash / property bag like object as the sole parameter
public MyClass(PropertyBag configSettings)
{
    // each class extracts properties it needs and applies them
    m_Setting1 = configSettings["Setting1"];
}
Possibilities:
Perhaps your class(es) are doing too much if they require so much state to be provided up-front? Aim to adhere to the Single Responsibility Principle.
Perhaps some of these parameters should logically exist in a value object of their own that is itself passed in as a parameter?
For classes whose construction really is complex, consider using the builder or factory pattern to instantiate these objects in a readable way - unlike method names, constructor parameters lack the ability to self document.
Another tip: Keep your class hierarchy shallow and prefer composition to inheritance. That way your constructor parameter list will remain short.