When is it okay to depend on concrete classes? - oop

Today I was asked a question I could not find an answer for so here I am, asking for your help!
The Dependency Inversion Principle states that both concrete classes and abstractions should depend on abstractions right?
Regardless of that we still depend on framework classes like Integer and String. Is there a good answer as to why that is okay?
I know we should not reinvent the wheel just because it may change ever so slightly, and these particular classes I mentioned will most likely never change in a way their users will notice (their interfaces won't change).

As an additional point, note that the elements you are citing are actually in java.lang, i.e. there is no import statement, and one could say therefore no dependency to anything induced by using these types, besides the fact you use Java.
As soon as you step out of java.lang, I believe you are usually better off applying DIP, i.e. always prefer using a List<T> over using a ArrayList<T>.
The problem is to limit dependencies across borders of "components" (or module/package...) to only functional dependencies (i.e. abstractions, interfaces in Java). At some point you do need concrete implementations, that are necessarily built using at least some data structures, even if it is only arrays of basic int type. This does not violate DIP.
The use of Plain Old Data as suggested in #jaco0646 's answer, that are not violating DIP is kind of borderline ; in most cases you could use a signature that explicitly passes the fields of the struct you are considering instead of packing them into a single object ; this approach is indeed more general, e.g. you can implement it without having that POD class, maybe relying on some relational DBMS, you can interact with code written in any language etc...
However in practice, it can make sense to use POD in signatures, so that if I add a field to a POD, this will automatically propagate to all signatures that use the struct. Some of these functions may not use the new field, so we are now giving them too much information (we are leaking, with respect to strict "need to know"). Still, it can be a pragmatic answer in many cases to opt for this approach.
If we look at e.g. webservices, there is general tendency to consider POD are not a problem in service signatures, and using them helps keep clients compatible even if some new fields appear in the struct.

In OOP, an object is the encapsulated combination of data and behavior. The data is hidden; the behavior is exposed. It is these objects to which the DIP applies. Ideally, these objects should be instantiated in a Composition Root, which is the only component that depends on the objects' concrete classes.
Obviously something has to depend on the concrete classes in order to instantiate them. This is typically your DI container. The idea is that the container's sole concern is instantiating concrete classes, so everything else can obey the DIP.
On the other hand, opposite to objects, we have primitive data structures. These classes are not (necessarily) encapsulated, expose their data, and have little or no behavior. It is fine to depend on the concrete classes of data structures. These are not "objects" in the OOP sense. The DIP does not apply to data structures. Dependencies on concrete data structures should be local; however, and not exposed outside of the object that owns those dependencies.
Note that you will often see "hybrids" in code: classes that behave as both objects and data structures. They expose both their data and behavior. Hybrids are the worst of both worlds, and whether you apply the DIP to them or not, the bigger problem is that they are trying to serve two opposing purposes and violating encapsulation.

Related

"Many functions operating upon few abstractions" principle vs OOP

The creator of the Clojure language claims that "open, and large, set of functions operate upon an open, and small, set of extensible abstractions is the key to algorithmic reuse and library interoperability". Obviously it contradicts the typical OOP approach where you create a lot of abstractions (classes) and a relatively small set of functions operating on them. Please suggest a book, a chapter in a book, an article, or your personal experience that elaborate on the topics:
motivating examples of problems that appear in OOP and how using "many functions upon few abstractions" would address them
how to effectively do MFUFA* design
how to refactor OOP code towards MFUFA
how OOP languages' syntax gets in the way of MFUFA
*MFUFA: "many functions upon few abstractions"
There are two main notions of "abstraction" in programming:
parameterisation ("polymorphism", genericity).
encapsulation (data hiding),
[Edit: These two are duals. The first is client-side abstraction, the second implementer-side abstraction (and in case you care about these things: in terms of formal logic or type theory, they correspond to universal and existential quantification, respectively).]
In OO, the class is the kitchen sink feature for achieving both kinds of abstraction.
Ad (1), for almost every "pattern" you need to define a custom class (or several). In functional programming on the other hand, you often have more lightweight and direct methods to achieve the same goals, in particular, functions and tuples. It is often pointed out that most of the "design patterns" from the GoF are redundant in FP, for example.
Ad (2), encapsulation is needed a little bit less often if you don't have mutable state lingering around everywhere that you need to keep in check. You still build ADTs in FP, but they tend to be simpler and more generic, and hence you need fewer of them.
When you write program in object-oriented style, you make emphasis on expressing domain area in terms of data types. And at first glance this looks like a good idea - if we work with users, why not to have a class User? And if users sell and buy cars, why not to have class Car? This way we can easily maintain data and control flow - it just reflects order of events in the real world. While this is quite convenient for domain objects, for many internal objects (i.e. objects that do not reflect anything from real world, but occur only in program logic) it is not so good. Maybe the best example is a number of collection types in Java. In Java (and many other OOP languages) there are both arrays, Lists. In JDBC there's ResultSet which is also kind of collection, but doesn't implement Collection interface. For input you will often use InputStream that provides interface for sequential access to the data - just like linked list! However it doesn't implement any kind of collection interface as well. Thus, if your code works with database and uses ResultSet it will be harder to refactor it for text files and InputStream.
MFUFA principle teaches us to pay less attention to type definition and more to common abstractions. For this reason Clojure introduces single abstraction for all mentioned types - sequence. Any iterable is automatically coerced to sequence, streams are just lazy lists and result set may be transformed to one of previous types easily.
Another example is using PersistentMap interface for structs and records. With such common interfaces it becomes very easy to create resusable subroutines and do not spend lots of time to refactoring.
To summarize and answer your questions:
One simple example of an issue that appears in OOP frequently: reading data from many different sources (e.g. DB, file, network, etc.) and processing it in the same way.
To make good MFUFA design try to make abstractions as common as possible and avoid ad-hoc implementations. E.g. avoid types a-la UserList - List<User> is good enough in most cases.
Follow suggestions from point 2. In addition, try to add as much interfaces to your data types (classes) as it possible. For example, if you really need to have UserList (e.g. when it should have a lot of additional functionality), add both List and Iterable interfaces to its definition.
OOP (at least in Java and C#) is not very well suited for this principle, because they try to encapsulate the whole object's behavior during initial design, so it becomes hard add more functions to them. In most cases you can extend class in question and put methods you need into new object, but 1) if somebody else implements their own derived class, it will not be compatible with yours; 2) sometimes classes are final or all fields are made private, so derived classes don't have access to them (e.g. to add new functions to class String one should implement additional classStringUtils). Nevertheless, rules I described above make it much easier to use MFUFA in OOP-code. And best example here is Clojure itself, which is gracefully implemented in OO-style but still follows MFUFA principle.
UPD. I remember another description of difference between object oriented and functional styles, that maybe summarizes better all I said above: designing program in OO style is thinking in terms of data types (nouns), while designing in functional style is thinking in terms of operations (verbs). You may forget that some nouns are similar (e.g. forget about inheritance), but you should always remember that many verbs in practice do the same thing (e.g. have same or similar interfaces).
A much earlier version of the quote:
"The simple structure and natural applicability of lists are reflected in functions that are amazingly nonidiosyncratic. In Pascal the plethora of declarable data structures induces a specialization within functions that inhibits and penalizes casual cooperation. It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures."
...comes from the foreword to the famous SICP book. I believe this book has a lot of applicable material on this topic.
I think you're not getting that there's a difference between libraries and programmes.
OO libraries which work well usually generate a small number of abstractions, which programmes use to build the abstractions for their domain. Larger OO libraries (and programmes) use inheritance to create different versions of methods and introduce new methods.
So, yes, the same principle applies to OO libraries.

When do you need to create abstractions in the form of interfaces?

When do you encourage programming against an interface and not directly to a concrete class?
A guideline that I follow is to create abstractions whenever code requires to cross a logical/physical boundary, most especially when infrastructure-related concerns are involved.
Another checkpoint would be if a dependency will likely change in the future, due to possible additional concerns code (such as caching, transactional awareness, invoking a webservice instead of in-process execution) or if such dependencies have direct references to infrastructure integration points.
If code depends on something that does not require control to cross a logical/physical boundary, I more or less don't create abstractions to interact with those.
Am I missing anything?
Also, use interfaces when
Multiple objects will need to be acted upon in a particular fashion, but are not fundamentally related. Perhaps many of your business objects access a particular utility object, and when they do they need to give a reference of themselves to that utility object so the utility object can call a particular method. Have that method in an interface and pass that interface to that utility object.
Passing around interfaces as parameters can be very helpful in unit testing. Even if you have just one type of object that sports a particular interface, and hence don't really need a defined interface, you might define/implement an interface solely to "fake" that object in unit tests.
related to the first 2 bullets, check out the Observer pattern and the Dependency Injection. I'm not saying to implement these patterns, but they illustrate types of places where interfaces are really helpful.
Another twist on this is for implementing a couple of the SOLID Principals, Open Closed principal and the Interface Segregation principle. Like the previous bullet, don't get stressed about strictly implementing these principals everywhere (right away at least), but use these concepts to help move your thinking away from just what objects go where to thinking more about contracts and dependency
In the end, let's not make it too complicated: we're in a strongly typed world in .NET. If you need to call a method or set a property but the object you're passing/using could be fundamentally different, use an interface.
I would add that if your code is not going to be referenced by another library (for a while at least), then the decision of whether to use an interface in a particular situation is one that you can responsibly put off. The "extract interface" refactoring is easy to do these days. In my current project, I've got an object being passed around that I'm thinking maybe I should switch to an interface; I'm not stressing about it.
Interfaces abstraction are convenient when doing unit test. It helps for mocking test objects. It very useful in TDD for developing without actually using data from your database.
If you don't need any features of the class that aren't found in the Interface...then why not always prefer the Interface implementation?
It will make your code easier to modify in the future and easier to test (mocking).
you have the right idea, already. i would only add a couple of notes to this...
first, abstraction does not mean 'interface'. for example, a "connection string" is an abstraction, even though it's just a string... it's not about the 'type' of the thing in question, it's about the intention of use for that thing.
and secondly, if you are doing test automation of any kind, look for the pain and friction that are exposed by writing the tests. if you find yourself having to set up too many external conditions for a test, it's a sign that you need a better abstraction between the thing your testing and the things it interacts with.
I think you've said it pretty well. Much of this will be a stylistic thing. There are open source projects I've looked at where everything has an interface and an implementation, and it's kind of frustrating, but it might make iterative development a little easier, since any objects implementation can break but dummies will still work. But honestly, I can dummy any class that doesn't overuse the final keyword by inheritance.
I would add to your list this: anything which can be thought of as a black box should be abstracted. This includes some of the things you've mentioned, but it also includes hairy algorithms, which are likely to have multiple useful implementations with different advantages for different situation.
Additionally, interfaces come in handy very often with composite objects. That's the only way something like java's swing library gets anything done, but it can also be useful for more mundane objects. (I personally like having an interface like ValidityChecker with ways to and-compose or or-compose subordinate ValidityCheckers.)
Most of the useful things that come with the Interface passing have been already said. However I would add:
implementing an interface to an object, or later multiple objects, FORCES all the implementers to follow an IDENTICAL pattern to implement contract with the object. This can be useful in case you have not so OOP-experienced-programmers actually writing the implementation code.
in some languages you can add attributes on the interface itself, which can be different from the actual object implementation attribute as sense and intent

Is Inheritance really needed?

I must confess I'm somewhat of an OOP skeptic. Bad pedagogical and laboral experiences with object orientation didn't help. So I converted into a fervent believer in Visual Basic (the classic one!).
Then one day I found out C++ had changed and now had the STL and templates. I really liked that! Made the language useful. Then another day MS decided to apply facial surgery to VB, and I really hated the end result for the gratuitous changes (using "end while" instead of "wend" will make me into a better developer? Why not drop "next" for "end for", too? Why force the getter alongside the setter? Etc.) plus so much Java features which I found useless (inheritance, for instance, and the concept of a hierarchical framework).
And now, several years afterwards, I find myself asking this philosophical question: Is inheritance really needed?
The gang-of-four say we should favor object composition over inheritance. And after thinking of it, I cannot find something you can do with inheritance you cannot do with object aggregation plus interfaces. So I'm wondering, why do we even have it in the first place?
Any ideas? I'd love to see an example of where inheritance would be definitely needed, or where using inheritance instead of composition+interfaces can lead to a simpler and easier to modify design. In former jobs I've found if you need to change the base class, you need to modify also almost all the derived classes for they depended on the behaviour of parent. And if you make the base class' methods virtual... then not much code sharing takes place :(
Else, when I finally create my own programming language (a long unfulfilled desire I've found most developers share), I'd see no point in adding inheritance to it...
Really really short answer: No. Inheritance is not needed because only byte code is truly needed. But obviously, byte code or assemble is not a practically way to write your program. OOP is not the only paradigm for programming. But, I digress.
I went to college for computer science in the early 2000s when inheritance (is a), compositions (has a), and interfaces (does a) were taught on an equal footing. Because of this, I use very little inheritance because it is often suited better by composition. This was stressed because many of the professors had seen bad code (along with what you have described) because of abuse of inheritance.
Regardless of creating a language with or without inheritances, can you create a programming language which prevents bad habits and bad design decisions?
I think asking for situations where inheritance is really needed is missing the point a bit. You can fake inheritance by using an interface and some composition. This doesnt mean inheritance is useless. You can do anything you did in VB6 in assembly code with some extra typing, that doesn't mean VB6 was useless.
I usually just start using an interface. Sometimes I notice I actually want to inherit behaviour. That usually means I need a base class. It's that simple.
Inheritance defines an "Is-A" relationship.
class Point( object ):
# some set of features: attributes, methods, etc.
class PointWithMass( Point ):
# An additional feature: mass.
Above, I've used inheritance to formally declare that PointWithMass is a Point.
There are several ways to handle object P1 being a PointWithMass as well as Point. Here are two.
Have a reference from PointWithMass object p1 to some Point object p1-friend. The p1-friend has the Point attributes. When p1 needs to engage in Point-like behavior, it needs to delegate the work to its friend.
Rely on language inheritance to assure that all features of Point are also applicable to my PointWithMass object, p1. When p1 needs to engage in Point-like behavior, it already is a Point object and can just do what needs to be done.
I'd rather not manage the extra objects floating around to assure that all superclass features are part of a subclass object. I'd rather have inheritance to be sure that each subclass is an instance of it's own class, plus is an instance of all superclasses, too.
Edit.
For statically-typed languages, there's a bonus. When I rely on the language to handle this, a PointWithMass can be used anywhere a Point was expected.
For really obscure abuse of inheritance, read about C++'s strange "composition through private inheritance" quagmire. See Any sensible examples of creating inheritance without creating subtyping relations? for some further discussion on this. It conflates inheritance and composition; it doesn't seem to add clarity or precision to the resulting code; it only applies to C++.
The GoF (and many others) recommend that you only favor composition over inheritance. If you have a class with a very large API, and you only want to add a very small number of methods to it, leaving the base implementation alone, I would find it inappropriate to use composition. You'd have to re-implement all of the public methods of the encapsulated class to just return their value. This is a waste of time (programmer and CPU) when you can just inherit all of this behavior, and spend your time concentrating on new methods.
So, to answer your question, no you don't absolutely need inheritance. There are, however, many situations where it's the right design choice.
The problem with inheritance is that it conflates the issue of sub-typing (asserting an is-a relationship) and code reuse (e.g., private inheritance is for reuse only).
So, no it's an overloaded word that we don't need. I'd prefer sub-typing (using the 'implements' keyword) and import (kinda like Ruby does it in class definitions)
Inheritance lets me push off a whole bunch of bookkeeping onto the compiler because it gives me polymorphic behavior for object hierarchies that I would otherwise have to create and maintain myself. Regardless of how good a silver bullet OOP is, there will always be instances where you want to employ a certain type of behavior because it just makes sense to do. And ultimately, that's the point of OOP: it makes a certain class of problems much easier to solve.
The downsides of composition is that it may disguise the relatedness of elements and it may be harder for others to understand. With,say, a 2D Point class and the desire to extend it to higher dimensions, you would presumably have to add (at least) Z getter/setter, modify getDistance(), and maybe add a getVolume() method. So you have the Objects 101 elements: related state and behavior.
A developer with a compositional mindset would presumably have defined a getDistance(x, y) -> double method and would now define a getDistance(x, y, z) -> double method. Or, thinking generally, they might define a getDistance(lambdaGeneratingACoordinateForEveryAxis()) -> double method. Then they would probably write createTwoDimensionalPoint() and createThreeDimensionalPoint() factory methods (or perhaps createNDimensionalPoint(n) ) that would stitch together the various state and behavior.
A developer with an OO mindset would use inheritance. Same amount of complexity in the implementation of domain characteristics, less complexity in terms of initializing the object (constructor takes care of it vs. a Factory method), but not as flexible in terms of what can be initialized.
Now think about it from a comprehensibility / readability standpoint. To understand the composition, one has a large number of functions that are composed programmatically inside another function. So there's little in terms of static code 'structure' (files and keywords and so forth) that makes the relatedness of Z and distance() jump out. In the OO world, you have a great big flashing red light telling you the hierarchy. Additionally, you have an essentially universal vocabulary to discuss structure, widely known graphical notations, a natural hierarchy (at least for single inheritance), etc.
Now, on the other hand, a well-named and constructed Factory method will often make explicit more of the sometimes-obscure relationships between state and behavior, since a compositional mindset facilitates functional code (that is, code that passes state via parameters, not via this ).
In a professional environment with experienced developers, the flexibility of composition generally trumps its more abstract nature. However, one should never discount the importance of comprehensibility, especially in teams that have varying degrees of experience and/or high levels of turnover.
Inheritance is an implementation decision. Interfaces almost always represent a better design, and should usually be used in an external API.
Why write a lot of boilerplate code forwarding method calls to a composed member object when the compiler will do it for you with inheritance?
This answer to another question summarises my thinking pretty well.
Does anyone else remember all of the OO-purists going ballistic over the COM implementation of "containment" instead of "inheritance?" It achieved essentially the same thing, but with a different kind of implementation. This reminds me of your question.
I strictly try to avoid religious wars in software development. ("vi" OR "emacs" ... when everybody knows its "vi"!) I think they are a sign of small minds. Comp Sci Professors can afford to sit around and debate these things. I'm working in the real world and could care less. All of this stuff are simply attempts at giving useful solutions to real problems. If they work, people will use them. The fact that OO languages and tools have been commercially available on a wide scale for going on 20 years is a pretty good bet that they are useful to a lot of people.
There are a lot of features in a programming language that are not really needed. But they are there for a variety of reasons that all basically boil down to reusability and maintainability.
All a business cares about is producing (quality of course) cheaply and quickly.
As a developer you help do this is by becoming more efficient and productive. So you need to make sure the code you write is easily reusable and maintainable.
And, among other things, this is what inheritance gives you - the ability to reuse without reinventing the wheel, as well as the ability to easily maintain your base object without having to perform maintenance on all similar objects.
There's lots of useful usages of inheritance, and probably just as many which are less useful. One of the useful ones is the stream class.
You have a method that should be able stream data. By using the stream base class as input to the method you ensure that your method can be used to write to many kinds of streams without change. To the file system, over the network, with compression, etc.
No.
for me, OOP is mostly about encapsulation of state and behavior and polymorphism.
and that is. but if you want static type checking, you'll need some way to group different types, so the compiler can check while still allowing you to use new types in place of another, related type. creating a hierarchy of types lets you use the same concept (classes) for types and for groups of types, so it's the most widely used form.
but there are other ways, i think the most general would be duck typing, and closely related, prototype-based OOP (which isn't inheritance in fact, but it's usually called prototype-based inheritance).
Depends on your definition of "needed". No, there is nothing that is impossible to do without inheritance, although the alternative may require more verbose code, or a major rewrite of your application.
But there are definitely cases where inheritance is useful. As you say, composition plus interfaces together cover almost all cases, but what if I want to supply a default behavior? An interface can't do that. A base class can. Sometimes, what you want to do is really just override individual methods. Not reimplement the class from scratch (as with an interface), but just change one aspect of it. or you may not want all members of the class to be overridable. Perhaps you have only one or two member methods you want the user to override, and the rest, which calls these (and performs validation and other important tasks before and after the user-overridden methods) are specified once and for all in the base class, and can not be overridden.
Inheritance is often used as a crutch by people who are too obsessed with Java's narrow definition of (and obsession with) OOP though, and in most cases I agree, it's the wrong solution, as if the deeper your class hierarchy, the better your software.
Inheritance is a good thing when the subclass really is the same kind of object as the superclass. E.g. if you're implementing the Active Record pattern, you're attempting to map a class to a table in the database, and instances of the class to a row in the database. Consequently, it is highly likely that your Active Record classes will share a common interface and implementation of methods like: what is the primary key, whether the current instance is persisted, saving the current instance, validating the current instance, executing callbacks upon validation and/or saving, deleting the current instance, running a SQL query, returning the name of the table that the class maps to, etc.
It also seems from how you phrase your question that you're assuming that inheritance is single but not multiple. If we need multiple inheritance, then we have to use interfaces plus composition to pull off the job. To put a fine point about it, Java assumes that implementation inheritance is singular and interface inheritance can be multiple. One need not go this route. E.g. C++ and Ruby permit multiple inheritance for your implementation and your interface. That said, one should use multiple inheritance with caution (i.e. keep your abstract classes virtual and/or stateless).
That said, as you note, there are too many real-life class hierarchies where the subclasses inherit from the superclass out of convenience rather than bearing a true is-a relationship. So it's unsurprising that a change in the superclass will have side-effects on the subclasses.
Not needed, but usefull.
Each language has got its own methods to write less code. OOP sometimes gets convoluted, but I think that is the responsability of the developers, the OOP platform is usefull and sharp when it is well used.
I agree with everyone else about the necessary/useful distinction.
The reason I like OOP is because it lets me write code that's cleaner and more logically organized. One of the biggest benefits comes from the ability to "factor-up" logic that's common to a number of classes. I could give you concrete examples where OOP has seriously reduced the complexity of my code, but that would be boring for you.
Suffice it to say, I heart OOP.
Absolutely needed? no,
But think of lamps. You can create a new lamp from scratch each time you make one, or you can take properties from the original lamp and make all sorts of new styles of lamp that have the same properties as the original, each with their own style.
Or you can make a new lamp from scratch or tell people to look at it a certain way to see the light, or , or, or
Not required, but nice :)
Thanks to all for your answers. I maintain my position that, strictly speaking, inheritance isn't needed, though I believe I found a new appreciation for this feature.
Something else: In my job experience, I have found inheritance leads to simpler, clearer designs when it's brought in late in the project, after it's noticed a lot of the classes have much commonality and you create a base class. In projects where a grand-schema was created from the very beginning, with a lot of classes in an inheritance hierarchy, refactoring is usually painful and dificult.
Seeing some answers mentioning something similar makes me wonder if this might not be exactly how inheritance's supposed to be used: ex post facto. Reminds me of Stepanov's quote: "you don't start with axioms, you end up with axioms after you have a bunch of related proofs". He's a mathematician, so he ought to know something.
The biggest problem with interfaces is that they cannot be changed. Make an interface public, then change it (add a new method to it) and break million applications all around the world, because they have implemented your interface, but not the new method. The app may not even start, a VM may refuse to load it.
Use a base class (not abstract) other programmers can inherit from (and override methods as needed); then add a method to it. Every app using your class will still work, this method just won't be overridden by anyone, but since you provide a base implementation, this one will be used and it may work just fine for all subclasses of your class... it may also cause strange behavior because sometimes overriding it would have been necessary, okay, might be the case, but at least all those million apps in the world will still start up!
I rather have my Java application still running after updating the JDK from 1.6 to 1.7 with some minor bugs (that can be fixed over time) than not having it running it at all (forcing an immediate fix or it will be useless to people).
//I found this QA very useful. Many have answered this right. But i wanted to add...
1: Ability to define abstract interface - E.g., for plugin developers. Of course, you can use function pointers, but this is better and simpler.
2: Inheritance helps model types very close to their actual relationships. Sometimes a lot of errors get caught at compile time, because you have the right type hierarchy. For instance, shape <-- triangle (lets say there is a lot of code to be reused). You might want to compose triangle with a shape object, but shape is an incomplete type. Inserting dummy implementations like double getArea() {return -1;} will do, but you are opening up room for error. That return -1 can get executed some day!
3: void func(B* b); ... func(new D()); Implicit type conversion gives a great notational convenience since Derived is Base. I remember having read Straustrup saying that he wanted to make classes first class citizens just like fundamental data types (hence overloading operators etc). Implicit conversion from Derived to Base, behaves just like an implicit conversion from a data type to broader compatible one (short to int).
Inheritance and Composition have their own pros and cons.
Refer to this related SE question on pros of inheritance and cons of composition.
Prefer composition over inheritance?
Have a look at the example in this documentation link:
The example shows different use cases of overriding by using inheritance as a mean to achieve polymorphism.
In the following, inheritance is used to present a particular property for all of several specific incarnations of the same type thing. In this case, the GeneralPresenation has a properties that are relevant to all "presentation" (the data passed to an MVC view). The Master Page is the only thing using it and expects a GeneralPresentation, though the specific views expect more info, tailored to their needs.
public abstract class GeneralPresentation
{
public GeneralPresentation()
{
MenuPages = new List<Page>();
}
public IEnumerable<Page> MenuPages { get; set; }
public string Title { get; set; }
}
public class IndexPresentation : GeneralPresentation
{
public IndexPresentation() { IndexPage = new Page(); }
public Page IndexPage { get; set; }
}
public class InsertPresentation : GeneralPresentation
{
public InsertPresentation() {
InsertPage = new Page();
ValidationInfo = new PageValidationInfo();
}
public PageValidationInfo ValidationInfo { get; set; }
public Page InsertPage { get; set; }
}

Is a function an example of encapsulation?

By putting functionality into a function, does that alone constitute an example of encapsulation or do you need to use objects to have encapsulation?
I'm trying to understand the concept of encapsulation. What I thought was if I go from something like this:
n = n + 1
which is executed out in the wild as part of a big body of code and then I take that, and put it in a function such as this one, then I have encapsulated that addition logic in a method:
addOne(n)
n = n + 1
return n
Or is it more the case that it is only encapsulation if I am hiding the details of addOne from the outside world - like if it is an object method and I use an access modifier of private/protected?
I will be the first to disagree with what seems to be the answer trend. Yes, a function encapsulates some amount of implementation. You don't need an object (which I think you use to mean a class).
See Meyers too.
Perhaps you are confusing abstraction with encapsulation, which is understood in the broader context of object orientation.
Encapsulation properly includes all three of the following:
Abstraction
Implementation Hiding
Division of Responsibility
Abstraction is only one component of encapsulation. In your example you have abstracted the adding functionality from the main body of code in which it once resided. You do this by identifying some commonality in the code - recognizing a concept (addition) over a specific case (adding the number one to the variable n). Because of this ability, abstraction makes an encapsulated component - a method or an object - reusable.
Equally important to the notion of encapsulation is the idea of implementation hiding. This is why encapsulation is discussed in the arena of object orientation. Implementation hiding protects an object from its users and vice versa. In OO, you do this by presenting an interface of public methods to the users of your object, while the implementation of the object takes place inside private methods.
This serves two benefits. First, by limiting access to your object, you avoid a situation where users of the object can leave the object in an invalid state. Second, from the user's perspective, when they use your object they are only loosely coupled to it - if you change your implementation later on, they are not impacted.
Finally, division of responsility - in the broader context of an OO design - is something that must be considered to address encapsulation properly. It's no use encapsulating a random collection of functions - responsibility needs to be cleanly and logically defined so that there is as little overlap or ambiguity as possible. For example, if we have a Toilet object we will want to wall off its domain of responsibilities from our Kitchen object.
In a limited sense, though, you are correct that a function, let's say, 'modularizes' some functionality by abstracting it. But, as I've said, 'encapsulation' as a term is understood in the broader context of object orientation to apply to a form of modularization that meets the three criteria listed above.
Sure it is.
For example, a method that operates only on its parameters would be considered "better encapsulated" than a method that operates on global static data.
Encapsulation has been around long before OOP :)
A method is no more an example of encapsulation than a car is an example of good driving. Encapsulation isn't about the synax, it is a logical design issue. Both objects and methods can exhibit good and bad encapsulation.
The simplest way to think about it is whether the code hides/abstracts the details from other parts of the code that don't have a need to know/care about the implementation.
Going back to the car example:
Automatic transmission offers good encapsulation: As a driver you care about forward/back and speed.
Manual Transmission is bad encapsulation: From the driver's perspective the specific gear required for low/high speeds is generally irrelevant to the intent of the driver.
No, objects aren't required for encapsulation. In the very broadest sense, "encapsulation" just means "hiding the details from view" and in that regard a method is encapsulating its implementation details.
That doesn't really mean you can go out and say your code is well-designed just because you divided it up into methods, though. A program consisting of 500 public methods isn't much better than that same program implemented in one 1000-line method.
In building a program, regardless of whether you're using object oriented techniques or not, you need to think about encapsulation at many different places: hiding the implementation details of a method, hiding data from code that doesn't need to know about it, simplifying interfaces to modules, etc.
Update: To answer your updated question, both "putting code in a method" and "using an access modifier" are different ways of encapsulating logic, but each one acts at a different level.
Putting code in a method hides the individual lines of code that make up that method so that callers don't need to care about what those lines are; they only worry about the signature of the method.
Flagging a method on a class as (say) "private" hides that method so that a consumer of the class doesn't need to worry about it; they only worry about the public methods (or properties) of your class.
The abstract concept of encapsulation means that you hide implementation details. Object-orientation is but one example of the use of ecnapsulation. Another example is the language called module-2 that uses (or used) implementation modules and definition modules. The definition modules hid the actual implementation and therefore provided encapsulation.
Encapsulation is used when you can consider something a black box. Objects are a black box. You know the methods they provide, but not how they are implemented.
[EDIT]
As for the example in the updated question: it depends on how narrow or broad you define encapsulation. Your AddOne example does not hide anything I believe. It would be information hiding/encapsulation if your variable would be an array index and you would call your method moveNext and maybe have another function setValue and getValue. This would allow people (together maybe with some other functions) to navigate your structure and setting and getting variables with them being aware of you using an array. If you programming language would support other or richer concepts you could change the implementation of moveNext, setValue and getValue with changing the meaning and the interface. To me that is encapsulation.
It's a component-level thing
Check this out:
In computer science, Encapsulation is the hiding of the internal mechanisms and data structures of a software component behind a defined interface, in such a way that users of the component (other pieces of software) only need to know what the component does, and cannot make themselves dependent on the details of how it does it. The purpose is to achieve potential for change: the internal mechanisms of the component can be improved without impact on other components, or the component can be replaced with a different one that supports the same public interface.
(I don't quite understand your question, let me know if that link doesn't cover your doubts)
Let's simplify this somewhat with an analogy: you turn the key of your car and it starts up. You know that there's more to it than just the key, but you don't have to know what is going on in there. To you, key turn = motor start. The interface of the key (that is, e.g., the function call) hides the implementation of the starter motor spinning the engine, etc... (the implementation). That's encapsulation. You're spared from having to know what's going on under the hood, and you're happy for it.
If you created an artificial hand, say, to turn the key for you, that's not encapsulation. You're turning the key with additional middleman cruft without hiding anything. That's what your example reminds me of - it's not encapsulating implementation details, even though both are accomplished through function calls. In this example, anyone picking up your code will not thank you for it. They will, in fact, be more likely to club you with your artificial hand.
Any method you can think of to hide information (classes, functions, dynamic libraries, macros) can be used for encapsulation.
Encapsulation is a process in which attributes(data member) and behavior(member function) of a objects in combined together as a single entity refer as class.
The Reference Model of Open Distributed Processing - written by the International Organisation for Standardization - defines the following concepts:
Entity: Any concrete or abstract thing of interest.
Object: A model of an entity. An object is characterised by its behaviour and, dually, by its state.
Behaviour (of an object): A collection of actions with a set of constraints on when they may occur.
Interface: An abstraction of the behaviour of an object that consists of a subset of the interactions of that object together with a set of constraints on when they may occur.
Encapsulation: the property that the information contained in an object is accessible only through interactions at the interfaces supported by the object.
These, you will appreciate, are quite broad. Let us see, however, whether putting functionality within a function can logically be considered to constitute towards encapsulation in these terms.
Firstly, a function is clearly a model of a, 'Thing of interest,' in that it represents an algorithm you (presumably) desire executed and that algorithm pertains to some problem you are trying to solve (and thus is a model of it).
Does a function have behaviour? It certainly does: it contains a collection of actions (which could be any number of executable statements) that are executed under the constraint that the function must be called from somewhere before it can execute. A function may not spontaneously be called at any time, without causal factor. Sounds like legalese? You betcha. But let's plough on, nonetheless.
Does a function have an interface? It certainly does: it has a name and a collection of formal parameters, which in turn map to the executable statements contained in the function in that, once a function is called, the name and parameter list are understood to uniquely identify the collection of executable statements to be run without the calling party's specifying those actual statements.
Does a function have the property that the information contained in the function is accessible only through interactions at the interfaces supported by the object? Hmm, well, it can.
As some information is accessible via its interface, some information must be hidden and inaccessible within the function. (The property such information exhibits is called information hiding, which Parnas defined by arguing that modules should be designed to hide both difficult decisions and decisions that are likely to change.) So what information is hidden within a function?
To see this, we should first consider scale. It's easy to claim that, for example, Java classes can be encapsulated within a package: some of the classes will be public (and hence be the package's interface) and some will be package-private (and hence information-hidden within the package). In encapsulation theory, the classes form nodes and the packages form encapsulated regions, with the entirety forming an encapsulated graph; the graph of classes and packages is called the third graph.
It's also easy to claim that functions (or methods) themselves are encapsulated within classes. Again, some functions will be public (and hence be part of the class's interface) and some will be private (and hence information-hidden within the class). The graph of functions and classes is called the second graph.
Now we come to functions. If functions are to be a means of encapsulation themselves they they should contain some information public to other functions and some information that's information-hidden within the function. What could this information be?
One candidate is given to us by McCabe. In his landmark paper on cyclomatic complexity, Thomas McCabe describes source code where, 'Each node in the graph corresponds to a block of code in the program where the flow is sequential and the arcs correspond to branches taken in the program.'
Let us take the McCabian block of sequential execution as the unit of information that may be encapsulated within a function. As the first block within the function is always the first and only guaranteed block to be executed, we can consider the first block to be public, in that it may be called by other functions. All the other blocks within the function, however, cannot be called by other functions (except in languages that allow jumping into functions mid-flow) and so these blocks may be considered information-hidden within the function.
Taking these (perhaps slightly tenuous) definitions, then we may say yes: putting functionality within a function does constitute to encapsulation. The encapsulation of blocks within functions is the first graph.
There is a caveate, however. Would you consider a package whose every class was public to be encapsulated? According to the definitions above, it does pass the test, as you can say that the interface to the package (i.e., all the public classes) do indeed offer a subset of the package's behaviour to other packages. But the subset in this case is the entire package's behaviour, as no classes are information-hidden. So despite regorously satisfying the above definitions, we feel that it does not satisfy the spirit of the definitions, as surely something must be information-hidden for true encapsulation to be claimed.
The same is true for the exampe you give. We can certainly consider n = n + 1 to be a single McCabian block, as it (and the return statement) are a single, sequential flow of executions. But the function into which you put this thus contains only one block, and that block is the only public block of the function, and therefore there are no information-hidden blocks within your proposed function. So it may satisfy the definition of encapsulation, but I would say that it does not satisfy the spirit.
All this, of course, is academic unless you can prove a benefit such encapsulation.
There are two forces that motivate encapsulation: the semantic and the logical.
Semantic encapsulation merely means encapsulation based on the meaning of the nodes (to use the general term) encapsulated. So if I tell you that I have two packages, one called, 'animal,' and one called 'mineral,' and then give you three classes Dog, Cat and Goat and ask into which packages these classes should be encapsulated, then, given no other information, you would be perfectly right to claim that the semantics of the system would suggest that the three classes be encapsulated within the, 'animal,' package, rather than the, 'mineral.'
The other motivation for encapsulation, however, is logic.
The configuration of a system is the precise and exhaustive identification of each node of the system and the encapsulated region in which it resides; a particular configuration of a Java system is - at the third graph - to identify all the classes of the system and specify the package in which each class resides.
To logically encapsulate a system means to identify some mathematical property of the system that depends on its configuration and then to configure that system so that the property is mathematically minimised.
Encapsulation theory proposes that all encapsulated graphs express a maximum potential number of edges (MPE). In a Java system of classes and packages, for example, the MPE is the maximum potential number of source code dependencies that can exist between all the classes of that system. Two classes within the same package cannot be information-hidden from one another and so both may potentially form depdencies on one another. Two package-private classes in separate packages, however, may not form dependencies on one another.
Encapsulation theory tells us how many packages we should have for a given number of classes so that the MPE is minimised. This can be useful because the weak form of the Principle of Burden states that the maximum potential burden of transforming a collection of entities is a function of the maximum potential number of entities transformed - in other words, the more potential source code dependencies you have between your classes, the greater the potential cost of doing any particular update. Minimising the MPE thus minimises the maximum potential cost of updates.
Given n classes and a requirement of p public classes per package, encapsulation theory shows that the number of packages, r, we should have to minimise the MPE is given by the equation: r = sqrt(n/p).
This also applies to the number of functions you should have, given the total number, n, of McCabian blocks in your system. Functions always have just one public block, as we mentioned above, and so the equation for the number of functions, r, to have in your system simplifies to: r = sqrt(n).
Admittedly, few considered the total number of blocks in their system when practicing encapsulation, but it's readily done at the class/package level. And besides, minimising MPE is almost entirely entuitive: it's done by minimising the number of public classes and trying to uniformly distribute classes over packages (or at least avoid have most packages with, say, 30 classes, and one monster pacakge with 500 classes, in which case the internal MPE of the latter can easily overwhelm the MPE of all the others).
Encapsulation thus involves striking a balance between the semantic and the logical.
All great fun.
in strict object-oriented terminology, one might be tempted to say no, a "mere" function is not sufficiently powerful to be called encapsulation...but in the real world the obvious answer is "yes, a function encapsulates some code".
for the OO purists who bristle at this blasphemy, consider a static anonymous class with no state and a single method; if the AddOne() function is not encapsulation, then neither is this class!
and just to be pedantic, encapsulation is a form of abstraction, not vice-versa. ;-)
It's not normally very meaningful to speak of encapsulation without reference to properties rather than solely methods -- you can put access controls on methods, certainly, but it's difficult to see how that's going to be other than nonsensical without any data scoped to the encapsulated method. Probably you could make some argument validating it, but I suspect it would be tortuous.
So no, you're most likely not using encapsulation just because you put a method in a class rather than having it as a global function.

Dealing with "global" data structures in an object-oriented world

This is a question with many answers - I am interested in knowing what others consider to be "best practice".
Consider the following situation: you have an object-oriented program that contains one or more data structures that are needed by many different classes. How do you make these data structures accessible?
You can explicitly pass references around, for example, in the constructors. This is the "proper" solution, but it means duplicating parameters and instance variables all over the program. This makes changes or additions to the global data difficult.
You can put all of the data structures inside of a single object, and pass around references to this object. This can either be an object created just for this purpose, or it could be the "main" object of your program. This simplifies the problems of (1), but the data structures may or may not have anything to do with one another, and collecting them together in a single object is pretty arbitrary.
You can make the data structures "static". This lets you reference them directly from other classes, without having to pass around references. This entirely avoids the disadvantages of (1), but is clearly not OO. This also means that there can only ever be a single instance of the program.
When there are a lot of data structures, all required by a lot of classes, I tend to use (2). This is a compromise between OO-purity and practicality. What do other folks do? (For what it's worth, I mostly come from the Java world, but this discussion is applicable to any OO language.)
Global data isn't as bad as many OO purists claim!
After all, when implementing OO classes you've usually using an API to your OS. What the heck is this if it isn't a huge pile of global data and services!
If you use some global stuff in your program, you're merely extending this huge environment your class implementation can already see of the OS with a bit of data that is domain specific to your app.
Passing pointers/references everywhere is often taught in OO courses and books, academically it sounds nice. Pragmatically, it is often the thing to do, but it is misguided to follow this rule blindly and absolutely. For a decent sized program, you can end up with a pile of references being passed all over the place and it can result in completely unnecessary drudgery work.
Globally accessible services/data providers (abstracted away behind a nice interface obviously) are pretty much a must in a decent sized app.
I must really really discourage you from using option 3 - making the data static. I've worked on several projects where the early developers made some core data static, only to later realise they did need to run two copies of the program - and incurred a huge amount of work making the data non-static and carefully putting in references into everything.
So in my experience, if you do 3), you will eventually end up doing 1) at twice the cost.
Go for 1, and be fine-grained about what data structures you reference from each object. Don't use "context objects", just pass in precisely the data needed. Yes, it makes the code more complicated, but on the plus side, it makes it clearer - the fact that a FwurzleDigestionListener is holding a reference to both a Fwurzle and a DigestionTract immediately gives the reader an idea about its purpose.
And by definition, if the data format changes, so will the classes that operate on it, so you have to change them anyway.
You might want to think about altering the requirement that lots of objects need to know about the same data structures. One reason there does not seem to be a clean OO way of sharing data is that sharing data is not very object-oriented.
You will need to look at the specifics of your application but the general idea is to have one object responsible for the shared data which provides services to the other objects based on the data encapsulated in it. However these services should not involve giving other objects the data structures - merely giving other objects the pieces of information they need to meet their responsibilites and performing mutations on the data structures internally.
I tend to use 3) and be very careful about the synchronisation and locking across threads. I agree it is less OO, but then you confess to having global data, which is very un-OO in the first place.
Don't get too hung up on whether you are sticking purely to one programming methodology or another, find a solution which fits your problem. I think there are perfectly valid contexts for singletons (Logging for instance).
I use a combination of having one global object and passing interfaces in via constructors.
From the one main global object (usually named after what your program is called or does) you can start up other globals (maybe that have their own threads). This lets you control the setting up of program objects in the main objects constructor and tearing them down again in the right order when the application stops in this main objects destructor. Using static classes directly makes it tricky to initialize/uninitialize any resources these classes use in a controlled manner. This main global object also has properties for getting at the interfaces of different sub-systems of your application that various objects may want to get hold of to do their work.
I also pass references to relevant data-structures into constructors of some objects where I feel it is useful to isolate those objects from the rest of the world within the program when they only need to be concerned with a small part of it.
Whether an object grabs the global object and navigates its properties to get the interfaces it wants or gets passed the interfaces it uses via its constructor is a matter of taste and intuition. Any object you're implementing that you think might be reused in some other project should definately be passed data structures it should use via its constructor. Objects that grab the global object should be more to do with the infrastructure of your application.
Objects that receive interfaces they use via the constructor are probably easier to unit-test because you can feed them a mock interface, and tickle their methods to make sure they return the right arguments or interact with mock interfaces correctly. To test objects that access the main global object, you have to mock up the main global object so that when they request interfaces (I often call these services) from it they get appropriate mock objects and can be tested against them.
I prefer using the singleton pattern as described in the GoF book for these situations. A singleton is not the same as either of the three options described in the question. The constructor is private (or protected) so that it cannot be used just anywhere. You use a get() function (or whatever you prefer to call it) to obtain an instance. However, the architecture of the singleton class guarantees that each call to get() returns the same instance.
We should take care not to confuse Object Oriented Design with Object Oriented Implementation. Al too often, the term OO Design is used to judge an implementation, just as, imho, it is here.
Design
If in your design you see a lot of objects having a reference to exactly the same object, that means a lot of arrows. The designer should feel an itch here. He should verify whether this object is just commonly used, or if it is really a utility (e.g. a COM factory, a registry of some kind, ...).
From the project's requirements, he can see if it really needs to be a singleton (e.g. 'The Internet'), or if the object is shared because it's too general or too expensive or whatsoever.
Implementation
When you are asked to implement an OO Design in an OO language, you face a lot of decisions, like the one you mentioned: how should I implement all the arrows to the oft used object in the design?
That's the point where questions are addressed about 'static member', 'global variable' , 'god class' and 'a-lot-of-function-arguments'.
The Design phase should have clarified if the object needs to be a singleton or not. The implementation phase will decide on how this singleness will be represented in the program.
Option 3) while not purist OO, tends to be the most reasonable solution. But I would not make your class a singleton; and use some other object as a static 'dictionary' to manage those shared resources.
I don't like any of your proposed solutions:
You are passing around a bunch of "context" objects - the things that use them don't specify what fields or pieces of data they are really interested in
See here for a description of the God Object pattern. This is the worst of all worlds
Simply do not ever use Singleton objects for anything. You seem to have identified a few of the potential problems yourself