Should a long method used only once be in its own class or in a function? - oop

A lot of times in code on the internet or code from my co-workers I see them creating an Object with just one method which only gets used once in the whole application. Like this:
class iOnlyHaveOneMethod{
public function theOneMethod(){
//loads and loads of code, say 100's of lines
// but it only gets used once in the whole application
}
}
if($foo){
$bar = new iOnlyHaveOneMethod;
$bar->theOneMethod();
}
Is that really better than:
if($foo){
//loads and loads of code which only gets used here and nowhere else
}
?
For readability it makes sense to move the loads and loads of code away, but shouldn't it just be in a function?
function loadsAndLoadsOfCode(){
//Loads and loads of code
}
if($foo){ loadsAndLoadsOfCode(); }
Is moving the code to a new object really better than just creating a function or putting the code in there directly?
To me the function part makes more sense and seems more readable than creating an object which is hardly of any use since it just holds one method.

The problem is not whether it's in a function or an object.
The problem is that you have hundreds of lines in one blob. Whether that mass of code is in a method of an object or just a function seems more or less irrelevant to me; it is just minor syntactic sugar.
What are those hundreds of lines doing? That's the place to look to implement object oriented best practice.
If your other developers really think using an object instead of a function makes it significantly more "object oriented" but having a several-hundred line function/method isn't seen as a code smell, then I think organisationally you have some education to do.

Well, if there really is "loads and loads" of code in the method, then it should be broken down into several protected methods in that class, in which case the use of a class scope is justified.
Perhaps that code isn't reusable because it hasn't been factored well into several distinct methods. By moving it into a class and breaking it down, you might find it could be better reused elsewhere. At least it would be much more maintainable.
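To make that concrete, here is a minimal sketch in Java (the question uses PHP; the class name, the method names and the "report" task are all hypothetical). The single public method only names the steps, and each step lives in its own small protected method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: a once-used "report" job split into named steps.
class ReportJob {
    // The single public entry point now reads like a table of contents.
    public String run(List<Integer> rawValues) {
        List<Integer> valid = dropNegatives(rawValues);
        int total = sum(valid);
        return format(valid.size(), total);
    }

    protected List<Integer> dropNegatives(List<Integer> values) {
        List<Integer> kept = new ArrayList<>();
        for (int v : values) {
            if (v >= 0) {
                kept.add(v);
            }
        }
        return kept;
    }

    protected int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    protected String format(int count, int total) {
        return count + " values, total " + total;
    }

    public static void main(String[] args) {
        // prints "2 values, total 4"
        System.out.println(new ReportJob().run(List.of(1, -2, 3)));
    }
}
```

Each step can now be read, overridden or reused on its own, which is the maintainability gain described above.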

Whilst the function with hundreds of lines of code clearly indicates a problem (as others have already pointed out), placing it in a separate instance class rather than a static function does have advantages, which you can exploit by rejigging your example a fraction:
// let's instead assume that $bar was set earlier using a setter
if($foo){
$bar = getMyBar();
$bar->theOneMethod();
}
This gives you a couple of advantages now:
This is a simple example of the Strategy Pattern. If $bar implements an interface that provides theOneMethod(), then you can dynamically switch implementations of that method;
Testing your class independently of $bar->theOneMethod() is dramatically easier, as you can replace $bar with a mock at testing time.
Neither of these advantages is available if you just use a static function.
I would argue that, whilst simple static functions have their place, non-trivial methods (as this clearly is by the 'hundreds of lines' comment) deserve their own instance anyway:
to separate concerns;
to aid testing;
to aid refactoring and reimplementation.
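A rough Java sketch of both advantages, assuming a hypothetical OneMethod interface, a RealWork implementation and a Caller class (none of these names come from the question): the caller receives its collaborator from outside, so a test can pass in a stub instead of the real hundreds of lines.

```java
// Hypothetical names throughout; a Java rendering of the PHP idea above.
class Caller {
    private final OneMethod bar; // injected, so the implementation can be swapped

    Caller(OneMethod bar) {
        this.bar = bar;
    }

    // Mirrors `if ($foo) { $bar->theOneMethod(); }`
    String handle(boolean foo) {
        return foo ? bar.theOneMethod() : "skipped";
    }

    public static void main(String[] args) {
        Caller real = new Caller(new RealWork());
        Caller tested = new Caller(() -> "stub"); // a lambda serves as a test double
        System.out.println(real.handle(true));   // prints "real result"
        System.out.println(tested.handle(true)); // prints "stub"
    }
}

// The strategy interface: one method, many possible implementations.
interface OneMethod {
    String theOneMethod();
}

class RealWork implements OneMethod {
    @Override
    public String theOneMethod() {
        // Imagine the long, expensive work here.
        return "real result";
    }
}
```

With a static function, the call site would be hard-wired to one implementation and the stub could not be substituted.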

You are really asking two questions here:
Is just declaring a function better than creating an object to hold only this function?
Should any function contain "loads of code"?
The first part: If you want to be able to dynamically switch functions, you may need the explicit object encapsulation as a workaround in languages that cannot handle functions this way. Of course, having to allocate a new object, assign it to a variable, then call the function from that variable is a bit dumb when all you want to do is call a function.
The second part: Ideally not, but there is no clear definition of "loads", and it may be the appropriate thing to do in certain cases.

Yes, the presence of loads and loads of code is a Code Smell.

I'd say you almost never want to have either a block or a method with loads of code in it -- doesn't matter if it's in its own class or not.
Moving it to an object might be a first step in refactoring, though, so it might make sense in that way. First move it to its own class and later split it into several smaller methods.

Well, I'd say it depends on how tightly coupled the block of code is with the calling section of code.
If it's so tightly coupled, that I can't imagine it being used anywhere else, I'd prefer sticking it in a private method of the calling class. That way it won't be visible to other parts of your system, guaranteeing it won't be misused by others.
On the other hand, if the block of code is generic enough (email validation, for example) to possibly be interesting in other parts of the system, I'd have no problem extracting that part into its own class, and then consider that to be a utility class. Even if it means it will be a single-method class.
If your question was more along the lines of "what to do with hundreds and hundreds of lines of code", then you really need to be doing some refactoring.

A single method with lots of code is a code smell, but my first thought was to at least make the method static. There is no data in the class, so there is no need to create an object.

I think I would rephrase the question you are asking. I think you want to ask: does my class support the single responsibility principle? Is there any way to decompose your class into separate, smaller pieces that might change independently of each other (data access, parsing, etc.)? Can you unit test your class easily?
If you can say yes to the above items, I wouldn't worry about method versus new class, as the whole point here is that you have readable, maintainable code.
In my team we raise a red flag if a class gets long (over x lines), but that is just a heuristic: if your class has 2000 lines of code, it can probably be broken down and is probably not supporting SRP.

For testability, it is definitely better to break it out into a separate class with separate method(s). It is a whole lot easier to write unit tests for single methods than as part of an inline if statement in a code-behind file or whatnot.
That being said, I agree with everyone else that the method should be broken out into single responsibility methods instead of hundreds of lines of code. This too will make it more readable and easier to test. And hopefully, you might get some reuse out of some of the logic contained in that big mess of code.

Related

Flaw: Constructor does Real Work

I have a class which represents a set of numbers. The constructor takes three arguments: startValue, endValue and stepSize.
The class is responsible for holding a list containing all values between start and end value taking the stepSize into consideration.
Example: startValue: 3, endValue: 1, stepSize = -1, Collection = { 3,2,1 }
I am currently creating the collection and some info strings about the object in the constructor. The public members are read only info strings and the collection.
My constructor does three things at the moment:
Checks the arguments; this could throw an exception from the constructor
Fills values into the collection
Generates the information strings
I can see that my constructor does real work, but how can I fix this, or should I fix this? If I move the "methods" out of the constructor, it is like having an init function and leaves me with a not fully initialized object. Is the existence of my object doubtful? Or is it not that bad to have some work done in the constructor, because it is still possible to test the constructor since no object references are created?
For me it looks wrong but it seems that I just can't find a solution. I also have taken a builder into account but I am not sure if that's right because you can't choose between different types of creations. However single unit tests would have less responsibility.
I am writing my code in C# but I would prefer a general solution, that's why the text contains no code.
EDIT: Thanks for editing my poor text (: I changed the title back because it represents my opinion and the edited title did not. I am not asking if real work is a flaw or not. For me, it is. Take a look at this reference.
http://misko.hevery.com/code-reviewers-guide/flaw-constructor-does-real-work/
The blog states the problems quite well. Still I can't find a solution.
Concepts that urge you to keep your constructors light weight:
Inversion of control (Dependency Injection)
Single responsibility principle (as applied to the constructor rather than a class)
Lazy initialization
Testing
K.I.S.S.
D.R.Y.
Links to arguments of why:
How much work should be done in a constructor?
What (not) to do in a constructor
Should a C++ constructor do real work?
http://misko.hevery.com/code-reviewers-guide/flaw-constructor-does-real-work/
If you check the arguments in the constructor, that validation code can't be shared if those arguments come in from any other source (setter, constructor, parameter object).
If you fill values into the collection or generate the information strings in the constructor, that code can't be shared with other constructors you may need to add later.
Besides not being shareable, the work also can't be delayed until it is really needed (lazy init). There is also overriding through inheritance, which offers more options with many methods that each do just one thing rather than one do-everything constructor.
Your constructor only needs to put your class into a usable state. It does NOT have to be fully initialized. But it is perfectly free to use other methods to do the real work. That just doesn't take advantage of the "lazy init" idea. Sometimes you need it, sometimes you don't.
Just keep in mind that anything the constructor does or calls is being shoved down the users' and testers' throats.
EDIT:
You still haven't accepted an answer and I've had some sleep so I'll take a stab at a design. A good design is flexible so I'm going to assume it's OK that I'm not sure what the information strings are, or whether our object is required to represent a set of numbers by being a collection (and so provides iterators, size(), add(), remove(), etc) or is merely backed by a collection and provides some narrow specialized access to those numbers (such as being immutable).
This little guy is the Parameter Object pattern
/** Throws exception if sign of (endValue - startValue) != sign of stepSize */
ListDefinition(T startValue, T endValue, T stepSize);
T can be int or long or short or char. Have fun but be consistent.
/** An interface, independent from any one collection implementation */
ListFactory(ListDefinition ld){
/** Make as many as you like */
List<T> build();
}
If we don't need to narrow access to the collection, we're done. If we do, wrap it in a facade before exposing it.
/** Provides read access only. Immutable if List l kept private. */
ImmutableFacade(List l);
Oh wait, requirements change, forgot about 'information strings'. :)
/** Build list of info strings */
InformationStrings(String infoFilePath) {
List<String> read();
}
Have no idea if this is what you had in mind but if you want the power to count line numbers by twos you now have it. :)
/** Assuming information strings have a 1 to 1 relationship with our numbers */
MapFactory(List l, List infoStrings){
/** Make as many as you like */
Map<T, String> build();
}
So, yes I'd use the builder pattern to wire all that together. Or you could try to use one object to do all that. Up to you. But I think you'll find few of these constructors doing much of anything.
EDIT2
I know this answer's already been accepted but I've realized there's room for improvement and I can't resist. The ListDefinition above works by exposing its contents with getters, ick. There is a "Tell, don't ask" design principle that is being violated here for no good reason.
ListDefinition(T startValue, T endValue, T stepSize) {
List<T> buildList(List<T> l);
}
This lets us build any kind of list implementation and have it initialized according to the definition. Now we don't need ListFactory. buildList is something I call a shunt. It returns the same reference it accepted after having done something with it. It simply allows you to skip giving the new ArrayList a name. Making a list now looks like this:
ListDefinition<Integer> ld = new ListDefinition<Integer>(3, 1, -1);
List<Integer> l = new ImmutableFacade<Integer>( ld.buildList( new ArrayList<Integer>() ) );
Which works fine. Bit hard to read. So why not add a static factory method:
List<Integer> l = ImmutableRangeOfNumbers.over(3, 1, -1);
This doesn't accept dependency injections but it's built on classes that do. It's effectively a dependency injection container. This makes it a nice shorthand for popular combinations and configurations of the underlying classes. You don't have to make one for every combination. The point of doing this with many classes is now you can put together whatever combination you need.
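For what it's worth, here is one way those pieces might fit together as compilable Java. This is only a sketch under several assumptions: it is fixed to Integer rather than a generic T, Collections.unmodifiableList stands in for the ImmutableFacade, and the exact validation rule (non-zero step that points from start toward end) is my reading of the comment on ListDefinition, not something the answer spells out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Static factory from the answer; the body is a hypothetical sketch.
final class ImmutableRangeOfNumbers {
    private ImmutableRangeOfNumbers() {}

    static List<Integer> over(int start, int end, int step) {
        ListDefinition ld = new ListDefinition(start, end, step);
        // Collections.unmodifiableList stands in for the answer's ImmutableFacade.
        return Collections.unmodifiableList(ld.buildList(new ArrayList<>()));
    }

    public static void main(String[] args) {
        System.out.println(over(3, 1, -1)); // prints [3, 2, 1]
    }
}

// Parameter object: holds the definition and validates it once.
final class ListDefinition {
    private final int start;
    private final int end;
    private final int step;

    ListDefinition(int start, int end, int step) {
        // Assumption: the step must be non-zero and point from start toward end.
        if (step == 0 || Integer.signum(end - start) * Integer.signum(step) < 0) {
            throw new IllegalArgumentException("step does not lead from start to end");
        }
        this.start = start;
        this.end = end;
        this.step = step;
    }

    // The "shunt": fills the list it was given and returns the same reference.
    // Integer overflow near the int limits is ignored for brevity.
    List<Integer> buildList(List<Integer> target) {
        for (int v = start; step > 0 ? v <= end : v >= end; v += step) {
            target.add(v);
        }
        return target;
    }
}
```

Note that neither constructor does more than check and store its arguments; the list is only filled when buildList is called.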
Well, that's my 2 cents. I'm gonna find something else to obsess on. Feedback welcome.
As far as cohesion is concerned, there's no "real work", only work that's in line (or not) with the class/method's responsibility.
A constructor's responsibility is to create an instance of a class. And a valid instance for that matter. I'm a big fan of keeping the validation part as intrinsic as possible, so that you can see the invariants every time you look at the class. In other words, that the class "contains its own definition".
However, there are cases when an object is a complex assemblage of multiple other objects, with conditional logic, non-trivial validation or other creation sub-tasks involved. This is when I'd delegate the object creation to another class (Factory or Builder pattern) and restrain the accessibility scope of the constructor, but I think twice before doing it.
In your case, I see no conditionals (except argument checking), no composition or inspection of complex objects. The work done by your constructor is cohesive with the class because it essentially only populates its internals. While you may (and should) of course extract atomic, well identified construction steps into private methods inside the same class, I don't see the need for a separate builder class.
The constructor is a special member function in that it constructs the object, but after all, it is a member function. As such, it is allowed to do things.
Consider, for example, C++ std::fstream. It opens a file in the constructor. Can throw an exception, but doesn't have to.
As long as you can test the class, it is all good.
It's true, a constructor should do the minimum of work, oriented to a single aim: successful creation of a valid object. Whatever it takes is OK. But not more.
In your example, creating this collection in the constructor is perfectly valid, as an object of your class represents a set of numbers (your words). If an object is a set of numbers, you should clearly create it in the constructor! Otherwise, the constructor does not do what it is made for: constructing a fresh, valid object.
These info strings call my attention. What is their purpose? What exactly do you do? This sounds like something peripheral, something that can be left for later and exposed through a method, like
String getInfo()
or similar.
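A small Java sketch of that split, assuming a hypothetical NumberSet class and an invented format for the info text (the question never says what the info strings contain): the constructor validates and fills the collection, while the info string is only built the first time getInfo() is called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; the info text format is invented for illustration.
final class NumberSet {
    private final List<Integer> values = new ArrayList<>();
    private String info; // built on first request, then cached (not thread-safe)

    NumberSet(int start, int end, int step) {
        // Assumption: a zero or wrong-direction step is rejected up front,
        // so the fill loop below is guaranteed to terminate.
        if (step == 0 || (end > start && step < 0) || (end < start && step > 0)) {
            throw new IllegalArgumentException("step " + step + " never reaches " + end);
        }
        for (int v = start; step > 0 ? v <= end : v >= end; v += step) {
            values.add(v);
        }
    }

    List<Integer> values() {
        return List.copyOf(values); // read-only copy
    }

    String getInfo() {
        if (info == null) {
            info = values.size() + " values from " + values.get(0)
                    + " to " + values.get(values.size() - 1);
        }
        return info;
    }

    public static void main(String[] args) {
        NumberSet set = new NumberSet(3, 1, -1);
        System.out.println(set.values());  // prints [3, 2, 1]
        System.out.println(set.getInfo()); // prints "3 values from 3 to 1"
    }
}
```

The object is valid as soon as the constructor returns, and callers who never ask for the info string never pay for it.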
If you want to use Microsoft's .NET Framework as an example here, it is perfectly valid, both semantically and in terms of common practice, for a constructor to do some real work.
An example of where Microsoft does this is in their implementation of System.IO.FileStream. This class performs string processing on path names, opens new file handles, opens threads, binds all sorts of things, and invokes many system functions. The constructor is actually, in effect, about 1,200 lines of code.
I believe your example, where you are creating a list, is absolutely fine and valid. I would just make sure that you fail as early as possible. Say, if you pass a minimum value higher than the maximum value, you could get stuck in an infinite loop with a poorly written loop condition, thus exhausting all available memory.
The takeaway is "it depends" and you should use your best judgement. If all you wanted was a second opinion, then I say you're fine.
It's not a good practice to do "real work" in the constructor: you can initialize class members, but you shouldn't call other methods or do more "heavy lifting" in the constructor.
If you need to do some initialization which requires a big amount of code running, a good practice will be to do it in an init() method which will be called after the object was constructed.
The reasoning for not doing heavy lifting inside the constructor is: in case something bad happens, and fails silently, you'll end up having a messed up object and it'll be a nightmare to debug and realize where the issues are coming from.
In the case you describe above I would only do the assignments in the constructor and then, in two separate methods, I would implement the validations and generate the string-information.
Implementing it this way also conforms with SRP: "Single Responsibility Principle" which suggests that any method/function should do one thing, and one thing only.

When should I extract blocks of code to private methods

When I'm writing a method I try to extract code blocks within that method out to private methods.
For example, should I need to transform one of the input parameters, I create a private method that accepts the parameter value and returns the transformed value. I call this private method from the body of the 'main' method - in essence I try to encapsulate whatever the transform operation is within the private method and name the method appropriately.
I'm really looking for answers on whether folks think this general approach is a good idea. I've had mixed feedback from other devs, some of whom favor keeping all the code within the one method. I argue that small private methods encapsulate these single tasks; they argue that the class is kept cleaner if the code is kept in the one method.
It would be great to get some answers from the community on which approach you feel reflects better design or is more in line with OOP principles.
I generally do the same for many reasons:
It helps reuse.
It makes the methods have a single responsibility. This in turn makes them easy to communicate their purpose. I think that the SRP not only applies to classes but also to methods.
It makes methods easy to read and understand. My methods generally don't have more than 6 or 7 lines.
It makes it easy to later refactor them (e.g. in case you need to decouple some behavior into another object, which is very common as a system evolves).
I use, as a general rule of thumb, that having to put comments in your method body to explain what is going on is a smell and means that it can be refactored into smaller pieces.
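As a tiny illustration in Java (the Greeter class and its methods are hypothetical): the public method reads as a list of steps, and each private method does the one thing its name says, so no explanatory comments are needed in the body.

```java
import java.util.Locale;

// Hypothetical example: each private method names one step of the public one.
final class Greeter {
    public String greet(String rawName) {
        String name = normalize(rawName);
        return buildGreeting(name);
    }

    // Transforms the input parameter, as described in the question.
    private String normalize(String rawName) {
        String trimmed = rawName.trim();
        if (trimmed.isEmpty()) {
            return "stranger";
        }
        return trimmed.substring(0, 1).toUpperCase(Locale.ROOT)
                + trimmed.substring(1).toLowerCase(Locale.ROOT);
    }

    private String buildGreeting(String name) {
        return "Hello, " + name + "!";
    }

    public static void main(String[] args) {
        System.out.println(new Greeter().greet("  aLICE ")); // prints "Hello, Alice!"
    }
}
```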
HTH
In brief it's generally a good idea.
For more info, take a look at Neal Ford's Composed Method article from DeveloperWorks. In this article Neal illustrates how to refactor to private methods and thus isolate areas of code suitable for reuse.
The really important benefit of this exercise is the ability to
harvest reusable code. When you look at the code in Listing 1, you
don't see reusable assets; you just see a pile of code. By pulling the
olio method apart, I discover reusable assets. But the advantages go
beyond reuse. I've also created the foundation for a simple framework
to handle persistence in my application. When it comes time to create
another simple boundary class to harvest some entity from a database,
I already have code to help me do that. This is the essence of
extracting frameworks rather than building them in an ivory tower.
Well, in general you should put code that you need in more than one place into methods (so that you do not repeat yourself).
It can also make sense to split a method up into a number of smaller methods if it would otherwise be very long.

Is it OK to create an object inside a function

I work on a class in VBA, that encapsulates downloading stuff with MSXML2.XmlHttp.
There are three possibilities for the return value: Text, XML and Stream.
Should I create a function for each:
aText=myDownloader.TextSynchronous(URL,formData,dlPost,....)
aXml.load myDownloader.XmlSynchronous(URL,formData,dlPost,....)
Or can I just return the XmlHttpObject I created inside the class and then have
aText=myDownloader.Synchronous(URL,formData,dlPost,.....).ResponseText
aXML=myDownloader.Synchronous(URL,formData,dlPost,.....).ResponseXML
In the former case I can set the obj to nothing in the class but have to write several functions that are more or less the same.
In the latter case, I rely on the "garbage collector" but have a leaner class.
Both should work, but which one is better coding style?
In my opinion, the first way is better because you don't expose low level details to a high level of the abstraction.
I did something similar with a web crawler in Java, so I have a class only to manipulate the URL connection getting all the needed data (low level) and a high level class using the low level class that return an object called Page.
You can have a third method that only executes myDownloader.Synchronous(URL,formData,dlPost,.....) and stores the returned object in a private variable, while the other methods only manipulate this object. This way, you will only open the connection one time.
After much seeking around in the web (triggered by the comment by EmmadKareem) I found this:
First of all, don't do localObject=Nothing at the end of a method - the variable goes out of scope anyway and is discarded. See this older but enlightening post on MSDN.
VBA uses reference counting and, apart from some older bugs in ADO, this seems to work quite well and (as I understand) immediately discards resources that are not used anymore. So from a performance/memory usage point of view this seems not to be a problem.
As to the coding style: I think the uncomfortable feeling I had when I designed this could go away by simply renaming the function to myDownloader.getSyncDLObj(...) or some such.
There seem to be two camps on code style. One promotes clean code, which is easy to read but takes five lines every time you use it. Its most important principle is "every function should do one thing and one thing only". Their approach would probably look something like
myDownloader.URL="..."
myDownloader.method=dlSync
myDownloader.download
aText=myDownloader.getXmlHttpObj.ResponseText
myDownloader.freeResources
and one is OK with the more cluttered, but less line-consuming
aText=myDownloader.getSyncObj(...).ResponseText
Both have their merits, and neither is wrong, dangerous or frowned upon. As this is a helper class and I use it to remove the inner workings of the xmlhttp from the main code, I am more comfortable with the second approach here. (One line for one goal ;)
I would be very interested in anyone's take on that matter.

Why is the java.util.Scanner class declared 'final'?

I use the Scanner class for reading multiple similar files. I would like to extend it to make sure they all use the same delimiter, and so that I can add methods like skipUntilYouFind(String thisHere) that are valid for them all.
I can make a utility class that contains them, or embed the Scanner class as a variable inside another class, but this is more cumbersome.
I have found some reasons to declare a class final, but why is it done here?
Probably because extending it and overriding some of its methods would probably break it. And making it easier to override methods would expose too much of the inner workings, so if in the future they decide to change those (for performance or some other reasons), it would be harder for them to change the class without breaking all the classes that extend it.
For example, consider the following method in the class:
public boolean nextBoolean() {
clearCaches();
return Boolean.parseBoolean(next(boolPattern()));
}
Say you want to override this because you want to make 'awesome' evaluate to a 'true' boolean (for whatever reason). If you override it, you can't call super.nextBoolean(), since that would consume the next token using the default logic. But if you don't call super.nextBoolean(), clearCaches() won't be called, possibly breaking the other methods you did not override. You can't call clearCaches() because it's private. If they made it protected, but then realized that it's causing a performance problem and wanted a new implementation that doesn't clear caches anymore, then they might break your overriding implementation, which would still be calling it.
So basically it's so they can easily change the hidden parts inside the class, which are quite complex, and to protect you from making a broken child class (or a class that could easily be broken).
I suppose it is due to security reasons. This class reads user input, so someone with bad intentions could extend it, modify its behavior and you'd be screwed. If it is final, it is not that easy for the bad guy, because if he makes his own type of Scanner (not java.util.Scanner), the principles of Polymorphism would be broken. See, the bad guy can be smart enough to write a bot/script which does this automatically on remote servers... He can even do it by dynamic classloading in a compiled application.
I think that the link you provided explains it all.
In your case it seems like you should prefer composition instead of inheritance anyway. You are creating a utility that has some predefined behavior, and that can hide some (or all) of the details of the Scanner class.
I've seen many implementations that used inheritance in order to change a behavior. The end result was usually a monolithic design, and in some cases, a broken contract, and/or broken behavior.
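A minimal sketch of the composition approach, assuming a hypothetical DelimitedReader wrapper and an arbitrary ";" delimiter (the question does not say which delimiter the files use). It reads from a String to stay self-contained; Scanner also has constructors that take a File or an InputStream.

```java
import java.util.Scanner;

// Hypothetical wrapper: owns a Scanner instead of extending the final class.
final class DelimitedReader {
    private final Scanner scanner;

    DelimitedReader(String source) {
        // Every reader gets the same delimiter; ";" is an arbitrary choice here.
        this.scanner = new Scanner(source).useDelimiter(";");
    }

    // Consumes tokens up to and including the target; false if it never appears.
    boolean skipUntilYouFind(String thisHere) {
        while (scanner.hasNext()) {
            if (scanner.next().equals(thisHere)) {
                return true;
            }
        }
        return false;
    }

    boolean hasNext() {
        return scanner.hasNext();
    }

    String next() {
        return scanner.next();
    }

    public static void main(String[] args) {
        DelimitedReader reader = new DelimitedReader("header;junk;DATA;42;43");
        if (reader.skipUntilYouFind("DATA")) {
            System.out.println(reader.next()); // prints 42
        }
    }
}
```

The wrapper exposes only the operations the files need, so the rest of Scanner's surface stays hidden from callers.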

Is it good convention for a class to perform functions on itself?

I've always been taught that if you are doing something to an object, that should be an external thing, so one would Save(Class) rather than having the object save itself: Class.Save().
I've noticed that in the .Net libraries, it is common to have a class modify itself as with String.Format() or sort itself as with List.Sort().
My question is, in strict OOP is it appropriate to have a class which performs functions on itself when called to do so, or should such functions be external and called on an object of the class' type?
Great question. I have just recently reflected on a very similar issue and was eventually going to ask much the same thing here on SO.
In OOP textbooks, you sometimes see examples such as Dog.Bark(), or Person.SayHello(). I have come to the conclusion that those are bad examples. When you call those methods, you make a dog bark, or a person say hello. However, in the real world, you couldn't do this; a dog decides for itself when it is going to bark. A person decides for themselves when to say hello to someone. Therefore, these methods would more appropriately be modelled as events (where supported by the programming language).
You would e.g. have a function Attack(Dog), PlayWith(Dog), or Greet(Person) which would trigger the appropriate events.
Attack(dog) // triggers the Dog.Bark event
Greet(johnDoe) // triggers the Person.SaysHello event
As soon as you have more than one parameter, it won't be so easy deciding how to best write the code. Let's say I want to store a new item, say an integer, into a collection. There are many ways to formulate this; for example:
StoreInto(1, collection) // the "classic" procedural approach
1.StoreInto(collection) // possible in .NET with extension methods
Store(1).Into(collection) // possible by using state-keeping temporary objects
According to the thinking laid out above, the last variant would be the preferred one, because it doesn't force an object (the 1) to do something to itself. However, if you follow that programming style, it will soon become clear that this fluent interface-like code is quite verbose, and while it's easy to read, it can be tiring to write or even hard to remember the exact syntax.
P.S.: Concerning global functions: In the case of .NET (which you mentioned in your question), you don't have much choice, since the .NET languages do not provide for global functions. I think these would be technically possible with the CLI, but the languages disallow that feature. F# has global functions, but they can only be used from C# or VB.NET when they are packed into a module. I believe Java also doesn't have global functions.
I have come across scenarios where this lack is a pity (e.g. with fluent interface implementations). But generally, we're probably better off without global functions, as some developers might always fall back into old habits, and leave a procedural codebase for an OOP developer to maintain. Yikes.
Btw., in VB.NET, however, you can mimic global functions by using modules. Example:
Globals.vb:
Module Globals
Public Sub Save(ByVal obj As SomeClass)
...
End Sub
End Module
Demo.vb:
Imports Globals
...
Dim obj As SomeClass = ...
Save(obj)
I guess the answer is "It Depends"... for Persistence of an object I would side with having that behavior defined within a separate repository object. So with your Save() example I might have this:
repository.Save(class)
However with an Airplane object you may want the class to know how to fly with a method like so:
airplane.Fly()
This is one of the examples I've seen from Fowler about an anemic domain model. I don't think in this case you would want to have a separate service like this:
new airplaneService().Fly(airplane)
With static methods and extension methods it makes a ton of sense, like in your List.Sort() example. So it depends on your usage patterns. You wouldn't want to have to new up an instance of a ListSorter class just to be able to sort a list like this:
new listSorter().Sort(list)
In strict OOP (Smalltalk or Ruby), all methods belong to an instance object or a class object. In "real" OOP (like C++ or C#), you will have static methods that essentially stand completely on their own.
Going back to strict OOP, I'm more familiar with Ruby, and Ruby has several "pairs" of methods that either return a modified copy or modify the object in place -- a method ending with a ! indicates that the message modifies its receiver. For instance:
>> s = 'hello'
=> "hello"
>> s.reverse
=> "olleh"
>> s
=> "hello"
>> s.reverse!
=> "olleh"
>> s
=> "olleh"
The key is to find some middle ground between pure OOP and pure procedural that works for what you need to do. A Class should do only one thing (and do it well). Most of the time, that won't include saving itself to disk, but that doesn't mean Class shouldn't know how to serialize itself to a stream, for instance.
I'm not sure what distinction you seem to be drawing when you say "doing something to an object". In many if not most cases, the class itself is the best place to define its operations, as under "strict OOP" it is the only code that has access to internal state on which those operations depend (information hiding, encapsulation, ...).
That said, if you have an operation which applies to several otherwise unrelated types, then it might make sense for each type to expose an interface which lets the operation do most of the work in a more or less standard way. To tie it in to your example, several classes might implement an interface ISaveable which exposes a Save method on each. Individual Save methods take advantage of their access to internal class state, but given a collection of ISaveable instances, some external code could define an operation for saving them to a custom store of some kind without having to know the messy details.
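Sketched in Java, with hypothetical Customer and Invoice types and an in-memory list standing in for the "custom store" (the interface is called Saveable here, following Java naming, rather than ISaveable): each class decides how its own private state is written, while the external saveAll operation only knows the interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types; "saving" just appends a line to an in-memory store.
final class SaveDemo {
    // External operation: works on any Saveable without knowing its internals.
    static List<String> saveAll(List<? extends Saveable> items) {
        List<String> store = new ArrayList<>();
        for (Saveable item : items) {
            item.save(store);
        }
        return store;
    }

    public static void main(String[] args) {
        List<Saveable> items = List.of(new Customer("Ada"), new Invoice(120));
        System.out.println(saveAll(items)); // prints [customer:Ada, invoice:120]
    }
}

interface Saveable {
    void save(List<String> store);
}

final class Customer implements Saveable {
    private final String name; // private state only this class can read

    Customer(String name) {
        this.name = name;
    }

    @Override
    public void save(List<String> store) {
        store.add("customer:" + name);
    }
}

final class Invoice implements Saveable {
    private final int amount;

    Invoice(int amount) {
        this.amount = amount;
    }

    @Override
    public void save(List<String> store) {
        store.add("invoice:" + amount);
    }
}
```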
It depends on what information is needed to do the work. If the work is unrelated to the class (mostly equivalently, can be made to work on virtually any class with a common interface), for example, std::sort, then make it a free function. If it must know the internals, make it a member function.
Edit: Another important consideration is performance. In-place sorting, for example, can be miles faster than returning a new, sorted copy. This is why quicksort is faster than merge sort in the vast majority of cases, even though merge sort has the better worst-case bound: quicksort can be performed in-place, whereas I've never heard of an in-place merge sort. Just because it's technically possible to perform an operation within the class's public interface doesn't mean that you actually should.
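To show the two API shapes side by side, here is a small Java sketch (SortStyles and sortedCopy are hypothetical names). It only demonstrates "mutate the receiver" versus "return a new object"; it does not demonstrate the performance claim, since Java's own list sort uses temporary storage internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortStyles {
    // External-function style: returns a new sorted list, argument untouched.
    static List<Integer> sortedCopy(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input); // extra list allocated here
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(List.of(3, 1, 2));
        List<Integer> copy = sortedCopy(numbers);
        System.out.println(numbers); // prints [3, 1, 2]
        System.out.println(copy);    // prints [1, 2, 3]

        // Member style: the list reorders itself, like List.Sort() in .NET.
        numbers.sort(Comparator.naturalOrder());
        System.out.println(numbers); // prints [1, 2, 3]
    }
}
```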