Passing object references needlessly through a middleman - oop

I often find myself needing a reference to an object that is several objects away, or so it seems. The options I see are passing a reference through a middle-man or just making something available statically. I understand the danger of global scope, but passing a reference through an object that does nothing with it feels ridiculous. I'm okay with a little bit of passing around, I suppose. I suspect there's a line to be drawn somewhere.
Does anyone have insight on where to draw this line?
Or a good way to deal with the problem of distributing references amongst dependent objects?

Use the Law of Demeter (with moderation and good taste, not dogmatically). If you're coding a.b.c.d.e, something IS wrong -- you've nailed forevermore the implementation of a to have a b which has a c which... EEP!-) One or at the most two dots is the maximum you should be using. But the alternative is NOT to plump things into globals (and ensure thread-unsafe, buggy, hard-to-maintain code!), it is to have each object "surface" those characteristics it is designed to maintain as part of its interface to clients going forward, instead of just letting poor clients go through such unending chains of nested refs!
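A minimal sketch of what "surfacing" might look like in Java. The Order, Customer and Address classes and the shippingCity() method are hypothetical names invented for this illustration, not anything from the question.

```java
// Hypothetical domain classes, invented for illustration only.
final class Address {
    private final String city;
    Address(String city) { this.city = city; }
    String city() { return city; }
}

final class Customer {
    private final Address address;
    Customer(Address address) { this.address = address; }

    // Customer surfaces the one fact clients need instead of exposing Address.
    String city() { return address.city(); }
}

final class Order {
    private final Customer customer;
    Order(Customer customer) { this.customer = customer; }

    // Callers write order.shippingCity() rather than
    // order.getCustomer().getAddress().getCity().
    String shippingCity() { return customer.city(); }
}
```

Each class talks only to its immediate collaborator, so the way a Customer stores its address can change without touching code that uses Order.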

This smells of an abstraction that may need some improvement. You seem to be violating the Law of Demeter.

In some cases a global isn't too bad.
Consider, you're probably programming against an operating system's API. That's full of globals, you can probably access a file or the registry, write to the console. Look up a window handle. You can do loads of stuff to access state that is global across the whole computer, or even across the internet... and you don't have to pass a single reference to your class to access it. All this stuff is global if you access the OS's API.
So, when you consider the number of global things that often exist, a global in your own program probably isn't as bad as many people try and make out and scream about.
However, if you want to have very nice OO code that is all unit testable, I suppose you should be writing wrapper classes around any access to globals, whether they come from the OS or are declared by you, to encapsulate them. This means your class that uses this global state can get references to the wrappers, and they could be replaced with fakes.
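As a rough sketch of that wrapper idea, here is one piece of global state (the system clock) hidden behind an interface so a test can swap in a fake. TimeSource, SystemTimeSource, FakeTimeSource and SessionTimer are all hypothetical names chosen for this example.

```java
// Hypothetical wrapper around one piece of global state: the system clock.
interface TimeSource {
    long currentTimeMillis();
}

// Production implementation delegates to the real global.
final class SystemTimeSource implements TimeSource {
    @Override
    public long currentTimeMillis() { return System.currentTimeMillis(); }
}

// Test fake: the test decides what time it is and when it advances.
final class FakeTimeSource implements TimeSource {
    private long nowMillis;
    FakeTimeSource(long startMillis) { this.nowMillis = startMillis; }
    void advance(long millis) { nowMillis += millis; }
    @Override
    public long currentTimeMillis() { return nowMillis; }
}

// The class under test sees only the wrapper, never the global itself.
final class SessionTimer {
    private final TimeSource time;
    private final long startedAtMillis;

    SessionTimer(TimeSource time) {
        this.time = time;
        this.startedAtMillis = time.currentTimeMillis();
    }

    long elapsedMillis() { return time.currentTimeMillis() - startedAtMillis; }
}
```

Production code would construct SessionTimer with a SystemTimeSource; a unit test passes a FakeTimeSource and advances it by hand.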
Hmm, anyway. I'm not quite sure what advice I'm trying to give here, other than say, structuring code is all a balance! And, how to do it for your particular problem depends on your preferences, preferences of people who will use the code, how you're feeling on the day on the academic to pragmatic scale, how big the code base is, how safety critical the system is and how far off the deadline for completion is.

I believe your question is revealing something about your classes. Maybe the responsibilities could be improved? Maybe moving some code would solve problems?
Tell, don't ask.
That's how it was explained to me. There is a natural tendency to call classes to obtain some data. Taken too far, asking too much typically leads to heavy "getter sequences". But there is another way. I must admit it is not easy to find, but it improves gradually, both in a specific piece of code and in the coder's habits.
Class A wants to perform a calculation, and asks B's data. Sometimes, it is appropriate that A tells B to do the job, possibly passing some parameters. This could replace B's "getName()", used by A to check the validity of the name, by an "isValid()" method on B.
"Asking" has been replaced by "telling" (calling a method that executes the computation).
For me, this is the question I ask myself when I find too many getter calls. Gradually, the methods find their place in the correct object, and everything gets a bit simpler: I have fewer getters and fewer calls to them. I have less code, and it carries more meaning, with a better alignment with the functional requirement.
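The getName()/isValid() swap described above might look roughly like this. The class names Account (playing the role of B) and Registration (playing A) are hypothetical, and the "non-blank name" validity rule is an assumption made only so the sketch has something to check.

```java
// Hypothetical class B from the text, here called Account for readability.
final class Account {
    private final String name;
    Account(String name) { this.name = name; }

    // "Asking" style: exposes the data so callers run the check themselves.
    String getName() { return name; }

    // "Telling" style: the object that owns the data performs the check.
    // The rule itself (name must be non-blank) is an assumption for this sketch.
    boolean isValid() { return name != null && !name.trim().isEmpty(); }
}

// Hypothetical class A: it no longer needs to know what a valid name looks like.
final class Registration {
    boolean canRegister(Account account) { return account.isValid(); }
}
```

If the validity rule changes later, only Account changes; Registration and any other caller stay as they are.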
Move the data around
There are other cases where I move some data. For example, if a field moves two objects up, the length of the "getter chain" is reduced by two.
I believe nobody can find the correct model at first.
I first think about it (using hand-written diagrams is quick and a big help), then code it, then think again facing the real thing... Then I code the rest, and any smells I feel in the code, I think again...
Split and merge objects
If a method on A needs data from C, with B as a middle man, I can check whether A and C have something in common. Possibly, A or a part of A could become C (possible splitting of A, merging of A and C) ...
However, there are cases where I keep the getters of course.
But it's less likely a long chain will be created.
A long chain will probably get broken by one of the techniques above.

I have three patterns for this:
Pass the necessary reference to the object's constructor -- the reference can then be stored as a data member of the object, and doesn't need to be passed again; this implies that the object's factory has the necessary reference. For example, when I'm creating a DOM, I pass the element name to the DOM node when I construct the DOM node.
Let things remember their parent, and get references to properties via their parent; this implies that the parent or ancestor has the necessary property. For example, when I'm creating a DOM, there are various things which are stored as properties of the top-level DomDocument ancestor, and its child nodes can access those properties via the reference which each one has to its parent.
Put all the different things which are passed around as references into a single class, and then pass around just that one class instance as the only thing that's passed around. For example, there are many properties required to render a DOM (e.g. the GDI graphics handle, the viewport coordinates, callback events, etc.) ... I put all of these things into a single 'Context' instance which is passed as the only parameter to the methods of the DOM nodes to be rendered, and each method can get whichever properties it needs out of that context parameter.
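A small sketch of the first and third patterns together. RenderContext and TextNode are hypothetical names, and a StringBuilder stands in for the GDI graphics handle so that the example stays self-contained.

```java
// Hypothetical rendering context bundling everything a node might need.
// A StringBuilder stands in for a real graphics handle in this sketch.
final class RenderContext {
    private final StringBuilder output = new StringBuilder();
    private final int viewportWidth;

    RenderContext(int viewportWidth) { this.viewportWidth = viewportWidth; }

    int viewportWidth() { return viewportWidth; }
    void draw(String text) { output.append(text); }
    String output() { return output.toString(); }
}

// The node receives its text once, via the constructor (first pattern), and
// takes the context as its only render parameter (third pattern).
final class TextNode {
    private final String text;
    TextNode(String text) { this.text = text; }

    void render(RenderContext context) {
        // Clip the text to the viewport width; the node pulls only what it needs.
        int end = Math.min(text.length(), context.viewportWidth());
        context.draw(text.substring(0, end));
    }
}
```

Adding a new rendering property later means adding a method to RenderContext, not changing the signature of every render method.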

Related

What are the drawbacks of encapsulating arguments for different cases in one object?

I'll give you an example about path finding. When you want to find a path, you can pick a final destination and an initial position and find the fastest way between the two, or you can just define the first position and let the algorithm show every path you can finish, or you may want to mock this for a test and just say the final destination and assume you "teleport" to there, and so on. It's clear that the function is the same: finding a path. But the arguments may vary between implementations. I've searched a lot and found a lot of solutions: getting rid of the interface, putting all the arguments as fields in the implementation, using the visitor pattern...
But I'd like to know from you guys what is the drawback of putting every possible argument (not state) in one object (let's call it MovePreferences) and letting every implementation take what it needs. Sure, should you need another implementation that takes an argument that you didn't expect, you will need to change MovePreferences, but it doesn't sound too bad, since you will only add methods to it, not refactor any existing method. Even though this MovePreferences is not an object of my domain, I'm still tempted to do it. What do you think?
(If you have a better solution to this problem, feel free to add it to your answer.)
The question you are asking is really why have interfaces at all, no, why have any concept of context short of 'whatever I need?' I think the answers to that are pretty straightforward: programming with shared global state is easy for you, the programmer, and quickly turns into a vortex for everyone else once they have to coalesce different features, for different customers, render enhancements, etc.
Now the far other end of the spectrum is the DbC argument: every single interface must be a highly constrained contract that not only keeps the knowledge exchanged to an absolute minimum, but makes the possibility of mayhem minimal.
Frankly, this is one of the reasons why dependency injection can quickly turn into a mess: as soon as design issues like this come up, people just start injecting more 'objects,' often to get access to just one property, whose scope might not be the same as the scope of the present operation. [Different kind of nightmare.]
Unfortunately, there's almost no information in your question. Do I think it would be possible to correctly model the notion of a Route? Sure. That doesn't sound very challenging. Here are a few ideas:
Make a class called Route that has starting and ending points. Then a collection of Traversals. The idea here would be that a Route could completely ignore the notion of how someone got from point a to point b, where traversal could contain information about roads, traffic, closures, whatever. Then your mocked case could just have no Traversals inside.
Another option would be to make Route a Composite so that each trip is then seen as the stringing together of various segments. That's the way routes are usually presented: go 2 miles on 2 South, exit, go 3 miles east on Santa Monica Boulevard, etc. In this scenario, you could just have Routes that have no children.
Finally, you will probably need a creational pattern. Perhaps a Builder. That simplifies mocking things too because you can just make a mock builder and have it construct Routes that consist of whatever you need.
The other advantage of combining the Composite and Builder is that you could make a builder that can build a new Route from an existing one by trying to improve only the troubling subsegments, e.g. it got traffic information that the 2S was slow, it could just replace that one segment and present its new route.
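To make the Composite-plus-Builder suggestion a little more concrete, here is one possible shape for it. The fields, the totalMiles() method and the RouteBuilder API are all assumptions made for illustration; the question does not specify any of them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical Composite: a Route is either a leaf or a sequence of sub-routes.
final class Route {
    private final String description;
    private final double miles;
    private final List<Route> segments;

    Route(String description, double miles, List<Route> segments) {
        this.description = description;
        this.miles = miles;
        // Defensive copy so later changes to the caller's list cannot leak in.
        this.segments = Collections.unmodifiableList(new ArrayList<>(segments));
    }

    String description() { return description; }
    List<Route> segments() { return segments; }

    // A leaf reports its own length; a composite sums its children.
    double totalMiles() {
        if (segments.isEmpty()) {
            return miles;
        }
        double total = 0;
        for (Route segment : segments) {
            total += segment.totalMiles();
        }
        return total;
    }
}

// Hypothetical Builder that strings segments together into one Route.
final class RouteBuilder {
    private final List<Route> segments = new ArrayList<>();

    RouteBuilder segment(String description, double miles) {
        segments.add(new Route(description, miles, Collections.<Route>emptyList()));
        return this;
    }

    Route build(String description) {
        return new Route(description, 0, segments);
    }
}
```

The mocked "teleport" case is then just a Route built with no segments, and a smarter builder could copy an existing Route while replacing a single slow segment.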
Consider an example,
Say 5 arguments are encapsulated in an object and passed on to 3 methods.
If the object undergoes a change in structure, then we need to rerun the test cases for all 3 methods. If instead each method accepts only the arguments it needs, methods that do not use the changed argument need not be retested.
The only problem I see with this is an increase in testing effort.
Secondly, you will naturally violate the Single Responsibility Principle (SRP) if you pass more arguments than the method actually needs.
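The difference can be sketched with two versions of the same method. MovePreferences comes from the question, but its fields here, and the Distance class, are hypothetical and reduced to one dimension to keep the example short.

```java
// Hypothetical parameter object holding every argument any path finder might want.
final class MovePreferences {
    final int startX;
    final int destinationX;
    final boolean avoidObstacles; // not used by the methods below

    MovePreferences(int startX, int destinationX, boolean avoidObstacles) {
        this.startX = startX;
        this.destinationX = destinationX;
        this.avoidObstacles = avoidObstacles;
    }
}

final class Distance {
    // Coupled to the whole MovePreferences shape, although it reads two fields.
    static int between(MovePreferences preferences) {
        return Math.abs(preferences.destinationX - preferences.startX);
    }

    // Coupled only to what it uses; changes to MovePreferences cannot affect it.
    static int between(int startX, int destinationX) {
        return Math.abs(destinationX - startX);
    }
}
```

Both overloads compute the same result; what differs is how much a caller (or a test) has to construct, and which changes elsewhere force a retest.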

Is a class that manages multiple classes a "god object"?

Reading the wikipedia entry about God Objects, it says that a class is a god object when it knows too much or does too much.
I see the logic behind this, but if it's true, then how do you couple every different class? Don't you always use a master class for connecting window management, DB connections, etc?
The main function/method may know about the existence of the windows, databases, and other objects. It may perform over-arching tasks like introduce the model to the controller.
But that doesn't mean it manages all the little details. It probably doesn't know anything about how the database or windows are implemented.
If it did, it could be accused of being a God object.
A god object is an object that contains references, directly or indirectly, to most if not all objects within an application. As the question observes, it is almost impossible to avoid having a god object in an application. Some object must hold references to the various subsystems: UI, database, communications, business logic, etc. Note that the god object need not be application-defined. Many frameworks have built-in god objects with names like "application context", "application environment", "session", "activator", etc.
The issue is not whether a god object exists, but rather how it is used. I will illustrate with an extreme example...
Let's say that in my application I want to standardize how many decimal places of precision to show when displaying numbers. However, I want the precision to be configurable. I create a class whose responsibility is to convert numbers to strings:
class NumberFormatter {
    ...
    String format(double value) {
        int decimalPlaces = getConfiguredPrecision();
        return formatDouble(value, decimalPlaces);
    }
    int getConfiguredPrecision() {
        return /* what ??? */;
    }
}
The question is, how does getConfiguredPrecision figure out what to return? One way would be to give NumberFormatter a reference to the global application context which it stores in a member field called _appContext. Then we could write:
return _appContext.getPreferenceManager().getNumericPreferences().getDecimalPlaces();
By doing this, we have just made NumberFormatter into a god object as well! Why? Because now we can (indirectly) reference virtually any object in the application through its _appContext field. Is this bad? Yes, it is.
I'm going to write a unit test for NumberFormatter. Let's set up the parameters... it needs an application context?! WTF, that has 57 methods I need to mock. Oh, it only needs the pref manager... WTF, I have to mock 14 methods! Numeric prefs!?! Screw it, the class is simple enough, I don't need to test it...
Let's say that the application context had another method, getDatabaseManager(). Last week we were using SQL, so the method returned an SQL database object. But this week, we've decided to change to a NoSQL database and the method now returns a new type. Is NumberFormatter affected by the change? Hmmm, I can't remember... yeah, it might be, I see it takes an application context in the constructor... let me open the source and take a look... nope, we're in luck: it only accesses getPreferenceManager()... now let's check the other 93 classes that take an application context as a parameter...
This same scenario occurs if a change is made to the preferences manager, or the numeric preferences object. The moral of the story is that an object should only hold references to the things that it needs to perform its job, and only those things. In the case of NumberFormatter, all it needs to know is a single integer -- the number of decimal places. It could be created directly by the application god object who knows the magic number (or the pref manager or better still, numeric prefs), without turning the formatter into a god object itself. Furthermore, any components that need to format numbers could be given a formatter instead of the god object. Wins all around.
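One way the slimmed-down formatter might look, assuming the precision is simply handed to the constructor. The use of String.format with Locale.ROOT in place of the unspecified formatDouble helper is my assumption, not part of the original example.

```java
import java.util.Locale;

// Sketch of a NumberFormatter that holds only the one value it needs.
final class NumberFormatter {
    private final int decimalPlaces;

    NumberFormatter(int decimalPlaces) {
        if (decimalPlaces < 0) {
            throw new IllegalArgumentException("decimalPlaces must be >= 0");
        }
        this.decimalPlaces = decimalPlaces;
    }

    String format(double value) {
        // Assumption: String.format stands in for the formatDouble helper.
        return String.format(Locale.ROOT, "%." + decimalPlaces + "f", value);
    }
}
```

A unit test now needs nothing but an integer: no application context, preference manager or numeric preferences to mock.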
So, to summarize, the problem is not the existence of a god object but rather the act of conferring god-like status to other objects willy-nilly.
Incidentally, the design principle that tackles this problem head-on has become known as the Law of Demeter. Or "when paying at a restaurant, give the server your money not your wallet."
In my experience this most often occurs when you're dealing with code that is the product of "Develop as you go" project management (or lack thereof), when a project is not thought through and planned, and object responsibilities are loose and not delegated properly. In these scenarios you find a "god-object" being the catchall for code that doesn't have any obvious organization or delegation.
It is not the interconnectedness or coupling of the different classes that is the problem with god-objects; it's the fact that a god-object many times can accomplish most if not all responsibilities of its derived children, and that it is fairly unpredictable (to anyone other than the developer) what its defined responsibilities are.
Simply knowing about "multiple" classes doesn't make one a God; knowing about multiple classes in order to solve a problem that should be split into several sub-problems does make one a God.
I think the focus should be on whether a problem should be split into several sub-problems, not on the number of classes a given object knows about (as you pointed out, sometimes knowing about several classes is necessary).
Gods are over-hyped.

Is a function an example of encapsulation?

By putting functionality into a function, does that alone constitute an example of encapsulation or do you need to use objects to have encapsulation?
I'm trying to understand the concept of encapsulation. What I thought was if I go from something like this:
n = n + 1
which is executed out in the wild as part of a big body of code and then I take that, and put it in a function such as this one, then I have encapsulated that addition logic in a method:
addOne(n)
    n = n + 1
    return n
Or is it more the case that it is only encapsulation if I am hiding the details of addOne from the outside world - like if it is an object method and I use an access modifier of private/protected?
I will be the first to disagree with what seems to be the answer trend. Yes, a function encapsulates some amount of implementation. You don't need an object (which I think you use to mean a class).
See Meyers too.
Perhaps you are confusing abstraction with encapsulation, which is understood in the broader context of object orientation.
Encapsulation properly includes all three of the following:
Abstraction
Implementation Hiding
Division of Responsibility
Abstraction is only one component of encapsulation. In your example you have abstracted the adding functionality from the main body of code in which it once resided. You do this by identifying some commonality in the code - recognizing a concept (addition) over a specific case (adding the number one to the variable n). Because of this ability, abstraction makes an encapsulated component - a method or an object - reusable.
Equally important to the notion of encapsulation is the idea of implementation hiding. This is why encapsulation is discussed in the arena of object orientation. Implementation hiding protects an object from its users and vice versa. In OO, you do this by presenting an interface of public methods to the users of your object, while the implementation of the object takes place inside private methods.
This serves two benefits. First, by limiting access to your object, you avoid a situation where users of the object can leave the object in an invalid state. Second, from the user's perspective, when they use your object they are only loosely coupled to it - if you change your implementation later on, they are not impacted.
Finally, division of responsibility - in the broader context of an OO design - is something that must be considered to address encapsulation properly. It's no use encapsulating a random collection of functions - responsibility needs to be cleanly and logically defined so that there is as little overlap or ambiguity as possible. For example, if we have a Toilet object we will want to wall off its domain of responsibilities from our Kitchen object.
In a limited sense, though, you are correct that a function, let's say, 'modularizes' some functionality by abstracting it. But, as I've said, 'encapsulation' as a term is understood in the broader context of object orientation to apply to a form of modularization that meets the three criteria listed above.
Sure it is.
For example, a method that operates only on its parameters would be considered "better encapsulated" than a method that operates on global static data.
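A tiny illustration of that contrast; the Counters class and both methods are hypothetical.

```java
final class Counters {
    // Global static data: any code anywhere may read or change it.
    static int total = 0;

    // Depends on and mutates hidden global state, so the result of a call
    // depends on every earlier call made anywhere in the program.
    static int addToTotal(int amount) {
        total += amount;
        return total;
    }

    // Operates only on its parameters: same inputs, same result, every time.
    static int add(int a, int b) {
        return a + b;
    }
}
```

Calling add(2, 3) always yields 5, whereas two identical calls to addToTotal(2) return different values.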
Encapsulation has been around long before OOP :)
A method is no more an example of encapsulation than a car is an example of good driving. Encapsulation isn't about the syntax, it is a logical design issue. Both objects and methods can exhibit good and bad encapsulation.
The simplest way to think about it is whether the code hides/abstracts the details from other parts of the code that don't have a need to know/care about the implementation.
Going back to the car example:
Automatic transmission offers good encapsulation: As a driver you care about forward/back and speed.
Manual Transmission is bad encapsulation: From the driver's perspective the specific gear required for low/high speeds is generally irrelevant to the intent of the driver.
No, objects aren't required for encapsulation. In the very broadest sense, "encapsulation" just means "hiding the details from view" and in that regard a method is encapsulating its implementation details.
That doesn't really mean you can go out and say your code is well-designed just because you divided it up into methods, though. A program consisting of 500 public methods isn't much better than that same program implemented in one 1000-line method.
In building a program, regardless of whether you're using object oriented techniques or not, you need to think about encapsulation at many different places: hiding the implementation details of a method, hiding data from code that doesn't need to know about it, simplifying interfaces to modules, etc.
Update: To answer your updated question, both "putting code in a method" and "using an access modifier" are different ways of encapsulating logic, but each one acts at a different level.
Putting code in a method hides the individual lines of code that make up that method so that callers don't need to care about what those lines are; they only worry about the signature of the method.
Flagging a method on a class as (say) "private" hides that method so that a consumer of the class doesn't need to worry about it; they only worry about the public methods (or properties) of your class.
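Both levels can be seen in one small class. Temperature and its methods are hypothetical names used only for this sketch.

```java
final class Temperature {
    private final double celsius;

    Temperature(double celsius) { this.celsius = celsius; }

    // Public method: the only thing consumers of the class need to know about.
    // Its body is hidden at the method level; callers see just the signature.
    public String describe() {
        return isFreezing() ? "freezing" : "above freezing";
    }

    // Private helper: hidden at the class level by the access modifier,
    // so code outside Temperature cannot call or depend on it.
    private boolean isFreezing() {
        return celsius <= 0.0;
    }
}
```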
The abstract concept of encapsulation means that you hide implementation details. Object-orientation is but one example of the use of encapsulation. Another example is the language called Modula-2, which uses (or used) implementation modules and definition modules. The definition modules hid the actual implementation and therefore provided encapsulation.
Encapsulation is used when you can consider something a black box. Objects are a black box. You know the methods they provide, but not how they are implemented.
[EDIT]
As for the example in the updated question: it depends on how narrowly or broadly you define encapsulation. Your AddOne example does not hide anything, I believe. It would be information hiding/encapsulation if your variable were an array index and you called your method moveNext, and maybe had other functions setValue and getValue. This would allow people (together maybe with some other functions) to navigate your structure, setting and getting values, without being aware that you are using an array. If your programming language supported other or richer concepts, you could change the implementation of moveNext, setValue and getValue without changing the meaning and the interface. To me that is encapsulation.
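A sketch of the moveNext/getValue/setValue idea, where the increment from the question ends up hidden behind an interface. The Cursor class name and the choice of an int array are assumptions for illustration.

```java
// Hypothetical cursor: callers navigate with moveNext/getValue/setValue and
// never see that an int[] and an index sit underneath.
final class Cursor {
    private final int[] values;
    private int position = 0;

    Cursor(int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        this.values = new int[size];
    }

    // Advances to the next slot; returns false when already at the last one.
    boolean moveNext() {
        if (position + 1 >= values.length) {
            return false;
        }
        position = position + 1; // the "n = n + 1" from the question, now hidden
        return true;
    }

    int getValue() { return values[position]; }

    void setValue(int value) { values[position] = value; }
}
```

The backing array could later be swapped for a linked structure without any caller noticing, which is the property the answer is pointing at.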
It's a component-level thing
Check this out:
In computer science, Encapsulation is the hiding of the internal mechanisms and data structures of a software component behind a defined interface, in such a way that users of the component (other pieces of software) only need to know what the component does, and cannot make themselves dependent on the details of how it does it. The purpose is to achieve potential for change: the internal mechanisms of the component can be improved without impact on other components, or the component can be replaced with a different one that supports the same public interface.
(I don't quite understand your question, let me know if that link doesn't cover your doubts)
Let's simplify this somewhat with an analogy: you turn the key of your car and it starts up. You know that there's more to it than just the key, but you don't have to know what is going on in there. To you, key turn = motor start. The interface of the key (that is, e.g., the function call) hides the implementation of the starter motor spinning the engine, etc... (the implementation). That's encapsulation. You're spared from having to know what's going on under the hood, and you're happy for it.
If you created an artificial hand, say, to turn the key for you, that's not encapsulation. You're turning the key with additional middleman cruft without hiding anything. That's what your example reminds me of - it's not encapsulating implementation details, even though both are accomplished through function calls. In this example, anyone picking up your code will not thank you for it. They will, in fact, be more likely to club you with your artificial hand.
Any method you can think of to hide information (classes, functions, dynamic libraries, macros) can be used for encapsulation.
Encapsulation is a process in which the attributes (data members) and behavior (member functions) of an object are combined together into a single entity referred to as a class.
The Reference Model of Open Distributed Processing - written by the International Organisation for Standardization - defines the following concepts:
Entity: Any concrete or abstract thing of interest.
Object: A model of an entity. An object is characterised by its behaviour and, dually, by its state.
Behaviour (of an object): A collection of actions with a set of constraints on when they may occur.
Interface: An abstraction of the behaviour of an object that consists of a subset of the interactions of that object together with a set of constraints on when they may occur.
Encapsulation: the property that the information contained in an object is accessible only through interactions at the interfaces supported by the object.
These, you will appreciate, are quite broad. Let us see, however, whether putting functionality within a function can logically be considered to constitute encapsulation in these terms.
Firstly, a function is clearly a model of a, 'Thing of interest,' in that it represents an algorithm you (presumably) desire executed and that algorithm pertains to some problem you are trying to solve (and thus is a model of it).
Does a function have behaviour? It certainly does: it contains a collection of actions (which could be any number of executable statements) that are executed under the constraint that the function must be called from somewhere before it can execute. A function may not spontaneously be called at any time, without causal factor. Sounds like legalese? You betcha. But let's plough on, nonetheless.
Does a function have an interface? It certainly does: it has a name and a collection of formal parameters, which in turn map to the executable statements contained in the function in that, once a function is called, the name and parameter list are understood to uniquely identify the collection of executable statements to be run without the calling party's specifying those actual statements.
Does a function have the property that the information contained in the function is accessible only through interactions at the interfaces supported by the object? Hmm, well, it can.
As some information is accessible via its interface, some information must be hidden and inaccessible within the function. (The property such information exhibits is called information hiding, which Parnas defined by arguing that modules should be designed to hide both difficult decisions and decisions that are likely to change.) So what information is hidden within a function?
To see this, we should first consider scale. It's easy to claim that, for example, Java classes can be encapsulated within a package: some of the classes will be public (and hence be the package's interface) and some will be package-private (and hence information-hidden within the package). In encapsulation theory, the classes form nodes and the packages form encapsulated regions, with the entirety forming an encapsulated graph; the graph of classes and packages is called the third graph.
It's also easy to claim that functions (or methods) themselves are encapsulated within classes. Again, some functions will be public (and hence be part of the class's interface) and some will be private (and hence information-hidden within the class). The graph of functions and classes is called the second graph.
Now we come to functions. If functions are to be a means of encapsulation themselves, then they should contain some information public to other functions and some information that's information-hidden within the function. What could this information be?
One candidate is given to us by McCabe. In his landmark paper on cyclomatic complexity, Thomas McCabe describes source code where, 'Each node in the graph corresponds to a block of code in the program where the flow is sequential and the arcs correspond to branches taken in the program.'
Let us take the McCabian block of sequential execution as the unit of information that may be encapsulated within a function. As the first block within the function is always the first and only guaranteed block to be executed, we can consider the first block to be public, in that it may be called by other functions. All the other blocks within the function, however, cannot be called by other functions (except in languages that allow jumping into functions mid-flow) and so these blocks may be considered information-hidden within the function.
Taking these (perhaps slightly tenuous) definitions, then we may say yes: putting functionality within a function does constitute encapsulation. The encapsulation of blocks within functions is the first graph.
There is a caveat, however. Would you consider a package whose every class was public to be encapsulated? According to the definitions above, it does pass the test, as you can say that the interface to the package (i.e., all the public classes) does indeed offer a subset of the package's behaviour to other packages. But the subset in this case is the entire package's behaviour, as no classes are information-hidden. So despite rigorously satisfying the above definitions, we feel that it does not satisfy the spirit of the definitions, as surely something must be information-hidden for true encapsulation to be claimed.
The same is true for the example you give. We can certainly consider n = n + 1 to be a single McCabian block, as it (and the return statement) are a single, sequential flow of executions. But the function into which you put this thus contains only one block, and that block is the only public block of the function, and therefore there are no information-hidden blocks within your proposed function. So it may satisfy the definition of encapsulation, but I would say that it does not satisfy the spirit.
All this, of course, is academic unless you can prove a benefit of such encapsulation.
There are two forces that motivate encapsulation: the semantic and the logical.
Semantic encapsulation merely means encapsulation based on the meaning of the nodes (to use the general term) encapsulated. So if I tell you that I have two packages, one called, 'animal,' and one called 'mineral,' and then give you three classes Dog, Cat and Goat and ask into which packages these classes should be encapsulated, then, given no other information, you would be perfectly right to claim that the semantics of the system would suggest that the three classes be encapsulated within the, 'animal,' package, rather than the, 'mineral.'
The other motivation for encapsulation, however, is logic.
The configuration of a system is the precise and exhaustive identification of each node of the system and the encapsulated region in which it resides; a particular configuration of a Java system is - at the third graph - to identify all the classes of the system and specify the package in which each class resides.
To logically encapsulate a system means to identify some mathematical property of the system that depends on its configuration and then to configure that system so that the property is mathematically minimised.
Encapsulation theory proposes that all encapsulated graphs express a maximum potential number of edges (MPE). In a Java system of classes and packages, for example, the MPE is the maximum potential number of source code dependencies that can exist between all the classes of that system. Two classes within the same package cannot be information-hidden from one another and so both may potentially form dependencies on one another. Two package-private classes in separate packages, however, may not form dependencies on one another.
Encapsulation theory tells us how many packages we should have for a given number of classes so that the MPE is minimised. This can be useful because the weak form of the Principle of Burden states that the maximum potential burden of transforming a collection of entities is a function of the maximum potential number of entities transformed - in other words, the more potential source code dependencies you have between your classes, the greater the potential cost of doing any particular update. Minimising the MPE thus minimises the maximum potential cost of updates.
Given n classes and a requirement of p public classes per package, encapsulation theory shows that the number of packages, r, we should have to minimise the MPE is given by the equation: r = sqrt(n/p).
This also applies to the number of functions you should have, given the total number, n, of McCabian blocks in your system. Functions always have just one public block, as we mentioned above, and so the equation for the number of functions, r, to have in your system simplifies to: r = sqrt(n).
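As a rough illustration of the formula above, here is a small Python sketch that brute-forces the package count with the lowest MPE and compares it with sqrt(n/p). The counting model inside `mpe` is an assumption made for this sketch, not something stated above: classes are spread evenly over packages, every class may depend on every other class in its own package, and on the p public classes of every other package. The function names are hypothetical too.

```python
import math


def ideal_package_count(n: int, p: int) -> float:
    """Package count r = sqrt(n / p), as quoted in the text above."""
    return math.sqrt(n / p)


def mpe(n: int, r: int, p: int) -> float:
    """Maximum potential edges under the ASSUMED counting model.

    n classes are spread evenly over r packages, each package exposing
    p public classes.
    """
    per_package = n / r
    internal = n * (per_package - 1)  # each class -> the others in its package
    external = n * (r - 1) * p        # each class -> public classes elsewhere
    return internal + external


n, p = 400, 4
# Only consider package counts that leave room for p public classes each.
candidates = range(1, n // p + 1)
best_r = min(candidates, key=lambda r: mpe(n, r, p))

print(best_r, ideal_package_count(n, p))
```

Under that model the brute-force minimum and the square-root formula agree for these numbers; a different counting model could shift the result.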
Admittedly, few consider the total number of blocks in their system when practicing encapsulation, but it's readily done at the class/package level. And besides, minimising MPE is almost entirely intuitive: it's done by minimising the number of public classes and trying to distribute classes uniformly over packages (or at least avoiding a layout where most packages have, say, 30 classes and one monster package has 500, in which case the internal MPE of the latter can easily overwhelm the MPE of all the others).
Encapsulation thus involves striking a balance between the semantic and the logical.
All great fun.
in strict object-oriented terminology, one might be tempted to say no, a "mere" function is not sufficiently powerful to be called encapsulation...but in the real world the obvious answer is "yes, a function encapsulates some code".
for the OO purists who bristle at this blasphemy, consider a static anonymous class with no state and a single method; if the AddOne() function is not encapsulation, then neither is this class!
and just to be pedantic, encapsulation is a form of abstraction, not vice-versa. ;-)
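To make the comparison above concrete, here is a minimal Python sketch of the two forms side by side. The class name `Incrementer` is made up for illustration, and `add_one` is just the AddOne() function from above in Python naming.

```python
def add_one(x: int) -> int:
    """A plain function: callers see the name and signature, not the body."""
    return x + 1


class Incrementer:
    """A stateless class with a single method; it hides exactly the same thing."""

    @staticmethod
    def add_one(x: int) -> int:
        return x + 1


print(add_one(41), Incrementer.add_one(41))
```

Whatever one decides to call the first form, the second does not encapsulate anything more.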
It's not normally very meaningful to speak of encapsulation without reference to properties rather than solely methods -- you can put access controls on methods, certainly, but it's difficult to see how that's going to be other than nonsensical without any data scoped to the encapsulated method. Probably you could make some argument validating it, but I suspect it would be tortuous.
So no, you're most likely not using encapsulation just because you put a method in a class rather than having it as a global function.

How do you define a Single Responsibility?

I know about "class having a single reason to change". Now, what is that exactly? Are there some smells/signs that could tell that class does not have a single responsibility? Or could the real answer hide in YAGNI and only refactor to a single responsibility the first time your class changes?
The Single Responsibility Principle
There are many obvious cases, e.g. CoffeeAndSoupFactory. Coffee and soup in the same appliance can lead to quite distasteful results. In this example, the appliance might be broken into a HotWaterGenerator and some kind of Stirrer. Then a new CoffeeFactory and SoupFactory can be built from those components and any accidental mixing can be avoided.
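A minimal sketch of that split, in Python. Only the class names come from the example above; the method names, quantities and string outputs are invented purely for illustration.

```python
class HotWaterGenerator:
    """Only knows how to heat water."""

    def heat(self, millilitres: int) -> str:
        return f"{millilitres} ml of hot water"


class Stirrer:
    """Only knows how to stir whatever it is given."""

    def stir(self, contents: str) -> str:
        return f"stirred {contents}"


class CoffeeFactory:
    """Builds coffee from the shared components; knows nothing about soup."""

    def __init__(self, water: HotWaterGenerator, stirrer: Stirrer) -> None:
        self._water = water
        self._stirrer = stirrer

    def make(self) -> str:
        return self._stirrer.stir(f"coffee in {self._water.heat(200)}")


class SoupFactory:
    """Builds soup from the same components; knows nothing about coffee."""

    def __init__(self, water: HotWaterGenerator, stirrer: Stirrer) -> None:
        self._water = water
        self._stirrer = stirrer

    def make(self) -> str:
        return self._stirrer.stir(f"soup in {self._water.heat(300)}")


print(CoffeeFactory(HotWaterGenerator(), Stirrer()).make())
print(SoupFactory(HotWaterGenerator(), Stirrer()).make())
```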
Among the more subtle cases, the tension between data access objects (DAOs) and data transfer objects (DTOs) is very common. DAOs talk to the database, DTOs are serializable for transfer between processes and machines. Usually DAOs need a reference to your database framework, therefore they are unusable on your rich clients which neither have the database drivers installed nor have the necessary privileges to access the DB.
Code Smells
The methods in a class start to be grouped by areas of functionality ("these are the Coffee methods and these are the Soup methods").
Implementing many interfaces.
Write a brief, but accurate description of what the class does.
If the description contains the word "and" then it needs to be split.
Well, this principle is to be used with some salt... to avoid class explosion.
A single responsibility does not translate to single method classes. It means a single reason for existence... a service that the object provides for its clients.
A nice way to stay on the road... Use the object as person metaphor... If the object were a person, who would I ask to do this? Assign that responsibility to the corresponding class. However you wouldn't ask the same person to manage your files, compute salaries, issue paychecks, and verify financial records... Why would you want a single object to do all these? (It's okay if a class takes on multiple responsibilities as long as they are all related and coherent.)
If you employ a CRC card, it's a nice subtle guideline. If you're having trouble getting all the responsibilities of that object on a CRC card, it's probably doing too much... a max of 7 would do as a good marker.
Another code smell from the refactoring book would be HUGE classes. Shotgun surgery would be another... making a change to one area in a class causes bugs in unrelated areas of the same class...
Finding that you are making changes to the same class for unrelated bug-fixes again and again is another indication that the class is doing too much.
A simple and practical way to check for single responsibility (not only for classes but also for their methods) is the choice of name. When you design a class, if you can easily find a name that specifies exactly what it defines, you're on the right track.
Difficulty in choosing a name is nearly always a symptom of bad design.
the methods in your class should be cohesive...they should work together and make use of the same data structures internally. If you find you have too many methods that don't seem entirely well related, or seem to operate on different things, then quite likely you don't have a good single responsibility.
Often it's hard to initially find responsibilities, and sometimes you need to use the class in several different contexts and then refactor the class into two classes as you start to see the distinctions. Sometimes you find that it's because you are mixing an abstract and concrete concept together. They tend to be harder to see, and, again, use in different contexts will help clarify.
The obvious sign is when your class ends up looking like a Big Ball of Mud, which is really the opposite of SRP (single responsibility principle).
Basically, all the object's services should be focused on carrying out a single responsibility, meaning every time your class changes and adds a service which does not respect that, you know you're "deviating" from the "right" path ;)
The cause is usually some quick fixes hastily added to the class to repair defects. So the reason why you are changing the class is usually the best criterion for detecting whether you are about to break the SRP.
Martin's Agile Principles, Patterns, and Practices in C# helped me a lot to grasp SRP. He defines SRP as:
A class should have only one reason to change.
So what is driving change?
Martin's answer is:
[...] each responsibility is an axis of change. (p. 116)
and further:
In the context of the SRP, we define a responsibility to be a reason for change. If you can think of more than one motive for changing a class, that class has more than one responsibility (p. 117)
In fact SRP is encapsulating change. If change happens, it should just have a local impact.
Where is YAGNI?
YAGNI can be nicely combined with SRP: When you apply YAGNI, you wait until some change is actually happening. If this happens you should be able to clearly see the responsibilities which are inferred from the reason(s) for change.
This also means that responsibilities can evolve with each new requirement and change. Thinking further SRP and YAGNI will provide you the means to think in flexible designs and architectures.
Perhaps a little more technical than other smells:
If you find you need several "friend" classes or functions, that's usually a good smell of bad SRP - because the required functionality is not actually exposed publicly by your class.
If you end up with an excessively "deep" hierarchy (a long list of derived classes until you get to leaf classes) or "broad" hierarchy (many, many classes derived shallowly from a single parent class), it's usually a sign that the parent class does either too much or too little. Doing nothing is the limit of that, and yes, I have seen that in practice, with an "empty" parent class definition just to group together a bunch of unrelated classes in a single hierarchy.
I also find that refactoring to single responsibility is hard. By the time you finally get around to it, the different responsibilities of the class will have become entwined in the client code making it hard to factor one thing out without breaking the other thing. I'd rather err on the side of "too little" than "too much" myself.
Here are some things that help me figure out if my class is violating SRP:
Fill out the XML doc comments on a class. If you use words like if, and, but, except, when, etc., your class is probably doing too much.
If your class is a domain service, it should have a verb in the name. Many times you have classes like "OrderService", which should probably be broken up into "GetOrderService", "SaveOrderService", "SubmitOrderService", etc.
If you end up with MethodA that uses MemberA and MethodB that uses MemberB and it is not part of some concurrency or versioning scheme, you might be violating SRP.
If you notice that you have a class that just delegates calls to a lot of other classes, you might be stuck in proxy class hell. This is especially true if you end up instantiating the proxy class everywhere when you could just use the specific classes directly. I have seen a lot of this. Think ProgramNameBL and ProgramNameDAL classes as a substitute for using a Repository pattern.
I've also been trying to get my head around the SOLID principles of OOD, specifically the single responsibility principle, aka SRP (as a side note the podcast with Jeff Atwood, Joel Spolsky and "Uncle Bob" is worth a listen). The big question for me is: What problems is SOLID trying to address?
OOP is all about modeling. The main purpose of modeling is to present a problem in a way that allows us to understand it and solve it. Modeling forces us to focus on the important details. At the same time we can use encapsulation to hide the "unimportant" details so that we only have to deal with them when absolutely necessary.
I guess you should ask yourself: What problem is your class trying to solve? Has the important information you need to solve this problem risen to the surface? Are the unimportant details tucked away so that you only have to think about them when absolutely necessary?
Thinking about these things results in programs that are easier to understand, maintain and extend. I think this is at the heart of OOD and the SOLID principles, including SRP.
Another rule of thumb I'd like to throw in is the following:
If you feel the need to either write some sort of cartesian product of cases in your test cases, or if you want to mock certain private methods of the class, Single Responsibility is violated.
I recently had this in the following way:
I had a certain abstract syntax tree of a coroutine which will be generated into C later. For now, think of the nodes as Sequence, Iteration and Action. Sequence chains two coroutines, Iteration repeats a coroutine until a user-defined condition is true and Action performs a certain user-defined action. Furthermore, it is possible to annotate Actions and Iterations with codeblocks, which define the actions and conditions to evaluate as the coroutine walks ahead.
It was necessary to apply a certain transformation to all of these code blocks (for those interested: I needed to replace the conceptual user variables with actual implementation variables in order to prevent variable clashes. Those who know lisp macros can think of gensym in action :) ). Thus, the simplest thing that would work was a visitor which knows the operation internally and just calls them on the annotated code block of the Action and Iteration on visit and traverses all the syntax tree nodes. However, in this case, I'd have had to duplicate the assertion "transformation is applied" in my testcode for the visitAction-Method and the visitIteration-Method. In other words, I had to check the product test cases of the responsibilities Traversion (== {traverse iteration, traverse action, traverse sequence}) x Transformation (well, codeblock transformed, which blew up into iteration transformed and action transformed). Thus, I was tempted to use powermock to remove the transformation-Method and replace it with some 'return "I was transformed!";'-Stub.
However, according to the rule of thumb, I split the class into a class TreeModifier which contains a NodeModifier-instance, which provides methods modifyIteration, modifySequence, modifyCodeblock and so on. Thus, I could easily test the responsibility of traversing, calling the NodeModifier and reconstructing the tree and test the actual modification of the code blocks separately, thus removing the need for the product tests, because the responsibilities were separated now (into traversing and reconstructing and the concrete modification).
It also is interesting to notice that later on, I could heavily reuse the TreeModifier in various other transformations. :)
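A rough Python sketch of the split described above, to show how the two responsibilities become separately testable. This is not the original code: the node fields are assumptions, NodeModifier is reduced to a single hypothetical `modify_codeblock` method (the original also had modifyIteration, modifySequence and so on), and the string replacement merely stands in for the real variable renaming.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Action:
    codeblock: str


@dataclass(frozen=True)
class Iteration:
    condition: str  # code block deciding when the repetition stops
    body: "Node"


@dataclass(frozen=True)
class Sequence:
    first: "Node"
    second: "Node"


Node = Union[Action, Iteration, Sequence]


class NodeModifier:
    """Responsibility 1: how a single code block is rewritten."""

    def modify_codeblock(self, codeblock: str) -> str:
        # Stand-in transformation, purely illustrative.
        return codeblock.replace("user_", "impl_")


class TreeModifier:
    """Responsibility 2: traverse the tree and rebuild it."""

    def __init__(self, node_modifier: NodeModifier) -> None:
        self._modifier = node_modifier

    def modify(self, node: Node) -> Node:
        if isinstance(node, Action):
            return Action(self._modifier.modify_codeblock(node.codeblock))
        if isinstance(node, Iteration):
            return Iteration(
                self._modifier.modify_codeblock(node.condition),
                self.modify(node.body),
            )
        if isinstance(node, Sequence):
            return Sequence(self.modify(node.first), self.modify(node.second))
        raise TypeError(f"unknown node type: {type(node).__name__}")


tree = Sequence(Action("user_a()"), Iteration("user_done", Action("user_b()")))
print(TreeModifier(NodeModifier()).modify(tree))
```

The traversal can now be tested with any stub NodeModifier, and the code block rewriting can be tested with plain strings, so the product of test cases goes away.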
If you're having trouble extending the functionality of the class without being afraid that you might break something else, or you cannot use the class without tweaking tons of options that modify its behavior, it smells like your class is doing too much.
Once I was working with a legacy class which had a method "ZipAndClean", which was obviously zipping and cleaning the specified folder...

Dealing with "global" data structures in an object-oriented world

This is a question with many answers - I am interested in knowing what others consider to be "best practice".
Consider the following situation: you have an object-oriented program that contains one or more data structures that are needed by many different classes. How do you make these data structures accessible?
1. You can explicitly pass references around, for example, in the constructors. This is the "proper" solution, but it means duplicating parameters and instance variables all over the program. This makes changes or additions to the global data difficult.
2. You can put all of the data structures inside of a single object, and pass around references to this object. This can either be an object created just for this purpose, or it could be the "main" object of your program. This simplifies the problems of (1), but the data structures may or may not have anything to do with one another, and collecting them together in a single object is pretty arbitrary.
3. You can make the data structures "static". This lets you reference them directly from other classes, without having to pass around references. This entirely avoids the disadvantages of (1), but is clearly not OO. This also means that there can only ever be a single instance of the program.
When there are a lot of data structures, all required by a lot of classes, I tend to use (2). This is a compromise between OO-purity and practicality. What do other folks do? (For what it's worth, I mostly come from the Java world, but this discussion is applicable to any OO language.)
Global data isn't as bad as many OO purists claim!
After all, when implementing OO classes you're usually using an API to your OS. What the heck is this if it isn't a huge pile of global data and services!
If you use some global stuff in your program, you're merely extending this huge environment, which your class implementation can already see through the OS, with a bit of data that is domain-specific to your app.
Passing pointers/references everywhere is often taught in OO courses and books; academically it sounds nice. Pragmatically, it is often the thing to do, but it is misguided to follow this rule blindly and absolutely. For a decent-sized program, you can end up with a pile of references being passed all over the place, and it can result in completely unnecessary drudgery.
Globally accessible services/data providers (abstracted away behind a nice interface obviously) are pretty much a must in a decent sized app.
I must really really discourage you from using option 3 - making the data static. I've worked on several projects where the early developers made some core data static, only to later realise they did need to run two copies of the program - and incurred a huge amount of work making the data non-static and carefully putting in references into everything.
So in my experience, if you do 3), you will eventually end up doing 1) at twice the cost.
Go for 1, and be fine-grained about what data structures you reference from each object. Don't use "context objects", just pass in precisely the data needed. Yes, it makes the code more complicated, but on the plus side, it makes it clearer - the fact that a FwurzleDigestionListener is holding a reference to both a Fwurzle and a DigestionTract immediately gives the reader an idea about its purpose.
And by definition, if the data format changes, so will the classes that operate on it, so you have to change them anyway.
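A minimal Python sketch of what that fine-grained passing might look like. The three class names come from the answer above; everything inside them (fields, the `on_digested` method) is invented for illustration.

```python
class Fwurzle:
    def __init__(self, name: str) -> None:
        self.name = name


class DigestionTract:
    def __init__(self) -> None:
        self.contents = []


class FwurzleDigestionListener:
    """Receives exactly the two collaborators it needs, and nothing else."""

    def __init__(self, fwurzle: Fwurzle, tract: DigestionTract) -> None:
        self._fwurzle = fwurzle
        self._tract = tract

    def on_digested(self) -> None:
        # Hypothetical behaviour: record the fwurzle in the tract.
        self._tract.contents.append(self._fwurzle.name)


tract = DigestionTract()
listener = FwurzleDigestionListener(Fwurzle("bob"), tract)
listener.on_digested()
print(tract.contents)
```

The constructor signature alone tells the reader what the listener depends on, which a catch-all context object would hide.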
You might want to think about altering the requirement that lots of objects need to know about the same data structures. One reason there does not seem to be a clean OO way of sharing data is that sharing data is not very object-oriented.
You will need to look at the specifics of your application but the general idea is to have one object responsible for the shared data which provides services to the other objects based on the data encapsulated in it. However these services should not involve giving other objects the data structures - merely giving other objects the pieces of information they need to meet their responsibilities and performing mutations on the data structures internally.
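As a sketch of that idea in Python: the `Inventory` class and its stock dictionary below are a made-up domain, chosen only to show an object that answers questions and performs mutations itself rather than handing out its data structure.

```python
class Inventory:
    """Owns the shared data; clients never see the underlying dict."""

    def __init__(self, stock: dict) -> None:
        self._stock = dict(stock)  # private copy, not shared with the caller

    def is_available(self, item: str, quantity: int) -> bool:
        """Give out a piece of information, not the structure."""
        return self._stock.get(item, 0) >= quantity

    def reserve(self, item: str, quantity: int) -> bool:
        """Perform the mutation internally on behalf of the client."""
        if not self.is_available(item, quantity):
            return False
        self._stock[item] -= quantity
        return True


inventory = Inventory({"widget": 3})
print(inventory.reserve("widget", 2), inventory.reserve("widget", 2))
```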
I tend to use 3) and be very careful about the synchronisation and locking across threads. I agree it is less OO, but then you confess to having global data, which is very un-OO in the first place.
Don't get too hung up on whether you are sticking purely to one programming methodology or another, find a solution which fits your problem. I think there are perfectly valid contexts for singletons (Logging for instance).
I use a combination of having one global object and passing interfaces in via constructors.
From the one main global object (usually named after what your program is called or does) you can start up other globals (maybe that have their own threads). This lets you control the setting up of program objects in the main object's constructor and tearing them down again in the right order when the application stops in this main object's destructor. Using static classes directly makes it tricky to initialize/uninitialize any resources these classes use in a controlled manner. This main global object also has properties for getting at the interfaces of different sub-systems of your application that various objects may want to get hold of to do their work.
I also pass references to relevant data-structures into constructors of some objects where I feel it is useful to isolate those objects from the rest of the world within the program when they only need to be concerned with a small part of it.
Whether an object grabs the global object and navigates its properties to get the interfaces it wants or gets passed the interfaces it uses via its constructor is a matter of taste and intuition. Any object you're implementing that you think might be reused in some other project should definitely be passed the data structures it uses via its constructor. Objects that grab the global object should be more to do with the infrastructure of your application.
Objects that receive interfaces they use via the constructor are probably easier to unit-test because you can feed them a mock interface, and tickle their methods to make sure they return the right results or interact with mock interfaces correctly. To test objects that access the main global object, you have to mock up the main global object so that when they request interfaces (I often call these services) from it they get appropriate mock objects and can be tested against them.
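A small Python sketch contrasting the two testing situations described above. All names here (`Clock`, `Greeter`, `App`, `GlobalGreeter`, `FixedClock`) are hypothetical and exist only for this illustration.

```python
class Clock:
    """Hypothetical service interface."""

    def hour(self) -> int:
        raise NotImplementedError


class Greeter:
    """Gets the interface it uses via its constructor."""

    def __init__(self, clock: Clock) -> None:
        self._clock = clock

    def greeting(self) -> str:
        return "Good morning" if self._clock.hour() < 12 else "Good afternoon"


class App:
    """Hypothetical main global object exposing sub-system interfaces."""

    instance = None

    def __init__(self, clock: Clock) -> None:
        self.clock = clock


class GlobalGreeter:
    """Grabs the global object and navigates to the service it wants."""

    def greeting(self) -> str:
        clock = App.instance.clock
        return "Good morning" if clock.hour() < 12 else "Good afternoon"


class FixedClock(Clock):
    """Mock service that always reports the same hour."""

    def __init__(self, hour: int) -> None:
        self._hour = hour

    def hour(self) -> int:
        return self._hour


# Constructor injection: hand the mock straight to the object under test.
assert Greeter(FixedClock(9)).greeting() == "Good morning"

# Global object: the whole App has to be mocked up before the test can run.
App.instance = App(FixedClock(15))
assert GlobalGreeter().greeting() == "Good afternoon"
```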
I prefer using the singleton pattern as described in the GoF book for these situations. A singleton is not the same as any of the three options described in the question. The constructor is private (or protected) so that it cannot be used just anywhere. You use a get() function (or whatever you prefer to call it) to obtain an instance. However, the architecture of the singleton class guarantees that each call to get() returns the same instance.
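For readers who haven't seen it, a minimal Python sketch of that shape. Python has no private constructors, so the raising `__init__` below is only an approximation of one; the `Registry` name and its `settings` attribute are made up, and the sketch is not thread-safe.

```python
class Registry:
    """Singleton sketch: one shared instance, obtained only through get()."""

    _instance = None

    def __init__(self) -> None:
        # Approximates a private constructor: direct construction is refused.
        raise RuntimeError("use Registry.get() instead")

    @classmethod
    def get(cls) -> "Registry":
        if cls._instance is None:
            # Bypass __init__ deliberately; not safe under concurrent calls.
            instance = cls.__new__(cls)
            instance.settings = {}
            cls._instance = instance
        return cls._instance


Registry.get().settings["colour"] = "blue"
print(Registry.get().settings)
```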
We should take care not to confuse Object Oriented Design with Object Oriented Implementation. All too often, the term OO Design is used to judge an implementation, just as, imho, it is here.
Design
If in your design you see a lot of objects having a reference to exactly the same object, that means a lot of arrows. The designer should feel an itch here. He should verify whether this object is just commonly used, or if it is really a utility (e.g. a COM factory, a registry of some kind, ...).
From the project's requirements, he can see if it really needs to be a singleton (e.g. 'The Internet'), or if the object is shared because it's too general or too expensive or whatever.
Implementation
When you are asked to implement an OO Design in an OO language, you face a lot of decisions, like the one you mentioned: how should I implement all the arrows to the oft used object in the design?
That's the point where questions are addressed about 'static member', 'global variable' , 'god class' and 'a-lot-of-function-arguments'.
The Design phase should have clarified if the object needs to be a singleton or not. The implementation phase will decide on how this singleness will be represented in the program.
Option 3), while not purist OO, tends to be the most reasonable solution. But I would not make your class a singleton; instead, use some other object as a static 'dictionary' to manage those shared resources.
I don't like any of your proposed solutions:
1. You are passing around a bunch of "context" objects - the things that use them don't specify what fields or pieces of data they are really interested in
2. See here for a description of the God Object pattern. This is the worst of all worlds
3. Simply do not ever use Singleton objects for anything. You seem to have identified a few of the potential problems yourself