How do you find a needle in a haystack? - oop

When implementing a needle search of a haystack in an object-oriented way, you essentially have three alternatives:
1. needle.find(haystack)
2. haystack.find(needle)
3. searcher.find(needle, haystack)
Which do you prefer, and why?
I know some people prefer the second alternative because it avoids introducing a third object. However, I can't help feeling that the third approach is more conceptually "correct", at least if your goal is to model "the real world".
In which cases do you think it is justified to introduce helper objects, such as the searcher in this example, and when should they be avoided?

Of the three, I prefer option #3.
The Single Responsibility Principle makes me not want to put searching capabilities on my DTOs or models. Their responsibility is to be data, not to find themselves, nor should needles need to know about haystacks, nor haystacks know about needles.
For what it's worth, I think it takes most OO practitioners a LONG time to understand why #3 is the best choice. I did OO for a decade, probably, before I really grokked it.
#wilhelmtell, C++ is one of the very few languages with template specialization that make such a system actually work. For most languages, a general purpose "find" method would be a HORRIBLE idea.

Usually actions should be applied to what you are doing the action on... in this case the haystack, so I think option 2 is the most appropriate.
You also have a fourth alternative that I think would be better than alternative 3:
haystack.find(needle, searcher)
In this case, it allows you to provide the manner in which you want to search as part of the action, and so you can keep the action with the object that is being operated on.

There is another alternative, which is the approach utilized by the STL of C++:
find(haystack.begin(), haystack.end(), needle)
I think it's a great example of C++ shouting "in your face!" to OOP. The idea is that OOP is not a silver bullet of any kind; sometimes things are best described in terms of actions, sometimes in terms of objects, sometimes neither and sometimes both.
Bjarne Stroustrup said in TC++PL that when you design a system you should strive to reflect reality under the constraints of effective and efficient code. For me, this means you should never follow anything blindly. Think about the things at hand (haystack, needle) and the context we're in (searching, that's what the expression is about).
If the emphasis is about the searching, then using an algorithm (action) that emphasizes searching (i.e. is flexibly to fit haystacks, oceans, deserts, linked lists). If the emphasis is about the haystack, encapsulate the find method inside the haystack object, and so on.
That said, sometimes you're in doubt and have hard times making a choice. In this case, be object oriented. If you change your mind later, I think it is easier to extract an action from an object then to split an action to objects and classes.
Follow these guidelines, and your code will be clearer and, well, more beautiful.

I would say that option 1 is completely out. The code should read in a way that tells you what it does. Option 1 makes me think that this needle is going to go find me a haystack.
Option 2 looks good if a haystack is meant to contain needles. ListCollections are always going to contain ListItems, so doing collection.find(item) is natural and expressive.
I think the introduction of a helper object is approproiate when:
You don't control the implementation of the objects in question
IE: search.find(ObsecureOSObject, file)
There isn't a regular or sensible relationship between the objects
IE: nameMatcher.find(houses,trees.name)

I am with Brad on this one. The more I work on immensely complex systems, the more I see the need to truly decouple objects. He's right. It's obvious that a needle shouldn't know anything about haystack, so 1 is definitely out. But, a haystack should know nothing about a needle.
If I were modeling a haystack, I might implement it as a collection -- but as a collection of hay or straw -- not a collection of needles! However, I would take into consideration that stuff does get lost in a haystack, but I know nothing about what exactly that stuff. I think it's better to not make the haystack look for items in itself (how smart is a haystack anyway). The right approach to me is to have the haystack present a collection of things that are in it, but are not straw or hay or whatever gives a haystack its essence.
class Haystack : ISearchableThingsOnAFarm {
ICollection<Hay> myHay;
ICollection<IStuffSmallEnoughToBeLostInAHaystack> stuffLostInMe;
public ICollection<Hay> Hay {
get {
return myHay;
}
}
public ICollection<IStuffSmallEnoughToBeLostInAHayStack> LostAndFound {
get {
return stuffLostInMe;
}
}
}
class Needle : IStuffSmallEnoughToBeLostInAHaystack {
}
class Farmer {
Search(Haystack haystack,
IStuffSmallEnoughToBeLostInAHaystack itemToFind)
}
There's actually more I was going to type and abstract into interfaces and then I realized how crazy I was getting. Felt like I was in a CS class in college... :P
You get the idea. I think going as loosely coupled as possible is a good thing, but maybe I was getting a bit carried away! :)

If both Needle and Haystack are DAOs, then options 1 and 2 are out of the question.
The reason for this is that DAOs should only be responsible for holding properties of the real world objects they are modeling, and only have getter and setter methods (or just direct property access). This makes serializing the DAOs to a file, or creating methods for a generic compare / generic copy easier to write, as the code wouldn't contain a whole bunch of "if" statements to skip these helper methods.
This just leaves option 3, which most would agree to be correct behaviour.
Option 3 has a few advantages, with the biggest advantage being unit testing. This is because both Needle and Haystack objects can be easily mocked up now, whereas if option 1 or 2 were used, the internal state of either Needle or Haystack would have to be modified before a search could be performed.
Secondly, with the searcher now in a separate class, all search code can be held in one place, including common search code. Whereas if the search code was put into the DAO, common search code would either be stored in a complicated class hierarchy, or with a Searcher Helper class anyway.

This entirely depends on what varies and what stays the same.
For example, I am working on a (non-OOP) framework where the find algorithm is different depending both on the type of the needle and the haystack. Apart from the fact that this would require double-dispatch in an object-oriented environment, it also means that it isn't meaningful to write either needle.find(haystack) or to write haystack.find(needle).
On the other hand, your application could happily delegate finding to either of both classes, or stick with one algorithm altogether in which case the decision is arbitrary. In that case, I would prefer the haystack.find(needle) way because it seems more logical to apply the finding to the haystack.

When implementing a needle search of a
haystack in an object-oriented way,
you essentially have three
alternatives:
needle.find(haystack)
haystack.find(needle)
searcher.find(needle, haystack)
Which do you prefer, and why?
Correct me if I'm wrong, but in all three examples you already have a reference to the needle you're looking for, so isn't this kinda like looking for your glasses when they're sitting on your nose? :p
Pun aside, I think it really depends on what you consider the responsibility of the haystack to be within the given domain. Do we just care about it in the sense of being a thing which contains needles (a collection, essentially)? Then haystack.find(needlePredicate) is fine. Otherwise, farmBoy.find(predicate, haystack) might be more appropriate.

To quote the great authors of SICP,
Programs must be written for people to read, and only incidentally for machines to execute
I prefer to have both methods 1 and 2 at hand. Using ruby as an example, it comes with .include? which is used like this
haystack.include? needle
=> returns true if the haystack includes the needle
Sometimes though, purely for readability reasons, I want to flip it round. Ruby doesn't come with an in? method, but it's a one-liner, so I often do this:
needle.in? haystack
=> exactly the same as above
If it's "more important" to emphasise the haystack, or the operation of searching, I prefer to write include?.
Often though, neither the haystack or the search is really what you care about, just that the object is present - in this case I find in? better conveys the meaning of the program.

It depends on your requirements.
For instance, if you don't care about the searcher's properties (e.g. searcher strength, vision, etc.), then I would say haystack.find(needle) would be the cleanest solution.
But, if you do care about the searcher's properties (or any other properties for that matter), I would inject an ISearcher interface into either the haystack constructor or the function to facilitate that. This supports both object oriented design (a haystack has needles) and inversion of control / dependency injection (makes it easier to unit test the "find" function).

I can think of situations where either of the first two flavours makes sense:
If the needle needs pre-processing, like in the Knuth-Morris-Pratt algorithm, needle.findIn(haystack) (or pattern.findIn(text))makes sense, because the needle object holds the intermediate tables created for the algorithm to work effectively
If the haystack needs pre-processing, like say in a trie, the haystack.find(needle) (or words.hasAWordWithPrefix(prefix)) works better.
In both the above cases, one of needle or haystack is aware of the search. Also, they both are aware of each other. If you want the needle and haystack not to be aware of each other or of the search, searcher.find(needle, haystack) would be appropriate.

Easy: Burn the haystack! Afterward, only the needle will remain. Also, you could try magnets.
A harder question: How do you find one particular needle in a pool of needles?
Answer: thread each one and attach the other end of each strand of thread to a sorted index (i.e. pointers)

A mix of 2 and 3, really.
Some haystacks don't have a specialized search strategy; an example of this is an array. The only way to find something is to start at the beginning and test each item until you find the one you want.
For this kind of thing, a free function is probably best (like C++).
Some haystacks can have a specialized search strategy imposed on them. An array can be sorted, allowing you to use binary searching, for example. A free function (or pair of free functions, e.g. sort and binary_search) is probably the best option.
Some haystacks have an integrated search strategy as part of their implementation; associative containers (hashed or ordered sets and maps) all do, for instance. In this case, finding is probably an essential lookup method, so it should probably be a method, i.e. haystack.find(needle).

The answer to this question should actually depend on the domain the solution is implemented for.
If you happen to simulate a physical search in a physical haystack, you might have the classes
Space
Straw
Needle
Seeker
Space
knows which objects are located at which coordinates
implements the laws of nature (converts energy, detects collisions, etc.)
Needle, Straw
are located in Space
react to forces
Seeker
interacts with space:
moves hand, applies magnetic field, burns hay, applies x-rays, looks for needle...
Thus seeker.find(needle, space) or seeker.find(needle, space, strategy)
The haystack just happens to be in the space where you are looking for the needle. When you abstract away space as a kind of virtual machine (think of: the matrix) you could get the above with haystack instead of space (solution 3/3b):
seeker.find(needle, haystack) or seeker.find(needle, haystack, strategy)
But the matrix was the Domain, which should only be replaced by haystack, if your needle couldn't be anywhere else.
And then again, it was just an anology. Interestingly this opens the mind for totally new directions:
1. Why did you loose the needle in the first place? Can you change the process, so you wouldn't loose it?
2. Do you have to find the lost needle or can you simply get another one and forget about the first? (Then it would be nice, if the needle dissolved after a while)
3. If you loose your needles regularly and you need to find them again then you might want to
make needles that are able to find themselves, e.g. they regularly ask themselves: Am I lost? If the answer is yes, they send their GPS-calculated position to somebody or start beeping or whatever:
needle.find(space) or needle.find(haystack) (solution 1)
install a haystack with a camera on each straw, afterwards you can ask the haystack hive mind if it saw the needle lately:
haystack.find(needle) (solution 2)
attach RFID tags to your needles, so you can easily triangulate them
That all just to say that in your implementation you made the needle and the haystack and most of the time the matrix on some kind of level.
So decide according to your domain:
Is it the purpose of the haystack to contain needles? Then go for solution 2.
Is it natural that the needle gets lost just anywhere? Then go for solution 1.
Does the needle get lost in the haystack by accident? Then go for solution 3. (or consider another recovering strategy)

haystack.find(needle), but there should be a searcher field.
I know that dependency injection is all the rage, so it doesn't surprise me that #Mike Stone's haystack.find(needle, searcher) has been accepted. But I disagree: the choice of what searcher is used seems to me a decision for the haystack.
Consider two ISearchers: MagneticSearcher iterates over the volume, moving and stepping the magnet in a manner consistent with the magnet's strength. QuickSortSearcher divides the stack in two until the needle is evident in one of the subpiles. The proper choice of searcher may depend upon how large the haystack is (relative to the magnetic field, for instance), how the needle got into the haystack (i.e., is the needle's position truly random or it it biased?), etc.
If you have haystack.find(needle, searcher), you're saying "the choice of which is the best search strategy is best done outside the context of the haystack." I don't think that's likely to be correct. I think it's more likely that "haystacks know how best to search themselves." Add a setter and you can still manually inject the searcher if you need to override or for testing.

Personally, I like the second method. My reasoning is because the major APIs I have worked with use this approach, and I find it makes the most sense.
If you have a list of things (haystack) you would search for (find()) the needle.

#Peter Meyer
You get the idea. I think going as loosely coupled as possible is a good thing, but maybe I was getting a bit carried away! :)
Errr... yeah... I think the IStuffSmallEnoughToBeLostInAHaystack kind of is a red flag :-)

You also have a fourth alternative that I think would be better than alternative 3:
haystack.find(needle, searcher)
I see your point, but what if searcher implements a searching interface that allows for searching other types of objects than haystacks, and finding other things than needles in them?
The interface could also be implemented with different algorithms, for example:
binary_searcher.find(needle, haystack)
vision_searcher.find(pitchfork, haystack)
brute_force_searcher.find(money, wallet)
But, as others have already pointed out, I also think this is only helpful if you actually have multiple search algorithms or multiple searchable or findable classes. If not, I agree haystack.find(needle) is better because of its simplicity, so I am willing to sacrifice some "correctness" for it.

class Haystack {//whatever
};
class Needle {//whatever
}:
class Searcher {
virtual void find() = 0;
};
class HaystackSearcher::public Searcher {
public:
HaystackSearcher(Haystack, object)
virtual void find();
};
Haystack H;
Needle N;
HaystackSearcher HS(H, N);
HS.find();

haystack can contain stuffs
one type of stuff is needle
finder is something that is responsible for searching of stuff
finder can accept a pile of stuffs as the source of where to find thing
finder can also accept a stuff description of thing that it need to find
so, preferably, for a flexible solution you would do something like:
IStuff interface
Haystack = IList<IStuff>
Needle : IStuff
Finder
.Find(IStuff stuffToLookFor, IList<IStuff> stuffsToLookIn)
In this case, your solution will not get tied to just needle and haystack but it is usable for any type that implement the interface
so if you want to find a Fish in the Ocean, you can.
var results = Finder.Find(fish, ocean)

If you have a reference to needle object why do you search for it? :)
The problem domain and use-cases tell you that you do not need exact position of needle in a haystack (like what you could get from list.indexOf(element)), you just need a needle. And you do not have it yet. So my answer is something like this
Needle needle = (Needle)haystack.searchByName("needle");
or
Needle needle = (Needle)haystack.searchWithFilter(new Filter(){
public boolean isWhatYouNeed(Object obj)
{
return obj instanceof Needle;
}
});
or
Needle needle = (Needle)haystack.searchByPattern(Size.SMALL,
Sharpness.SHARP,
Material.METAL);
I agree that there are more possible solutions which are based on different search strategies, so they introduce searcher. There were anough comments on this, so I do not pay attentiont to it here. My point is solutions above forget about use-cases - what is the point to search for something if you already have reference to it?
In the most natural use-case you do not have a needle yet, so you do not use variable needle.

Brad Wilson points out that objects should have a single responsibility. In the extreme case, an object has one responsibility and no state. Then it can become... a function.
needle = findNeedleIn(haystack);
Or you could write it like this:
SynchronizedHaystackSearcherProxyFactory proxyFactory =
SynchronizedHaystackSearcherProxyFactory.getInstance();
StrategyBasedHaystackSearcher searcher =
new BasicStrategyBasedHaystackSearcher(
NeedleSeekingStrategies.getMethodicalInstance());
SynchronizedHaystackSearcherProxy proxy =
proxyFactory.createSynchronizedHaystackSearcherProxy(searcher);
SearchableHaystackAdapter searchableHaystack =
new SearchableHaystackAdapter(haystack);
FindableSearchResultObject foundObject = null;
while (!HaystackSearcherUtil.isNeedleObject(foundObject)) {
try {
foundObject = proxy.find(searchableHaystack);
} catch (GruesomeInjuryException exc) {
returnPitchforkToShed(); // sigh, i hate it when this happens
HaystackSearcherUtil.cleanUp(hay); // XXX fixme not really thread-safe,
// but we can't just leave this mess
HaystackSearcherUtil.cleanup(exc.getGruesomeMess()); // bug 510000884
throw exc; // caller will catch this and get us to a hospital,
// if it's not already too late
}
}
return (Needle) BarnyardObjectProtocolUtil.createSynchronizedFindableSearchResultObjectProxyAdapterUnwrapperForToolInterfaceName(SimpleToolInterfaces.NEEDLE_INTERFACE_NAME).adapt(foundObject.getAdaptable());

The haystack shouldn't know about the needle, and the needle shouldn't know about the haystack. The searcher needs to know about both, but whether or not the haystack should know how to search itself is the real point in contention.
So I'd go with a mix of 2 and 3; the haystack should be able to tell someone else how to search it, and the searcher should be able to use that information to search the haystack.

haystack.magnet().filter(needle);

Is the code trying to find a specific needle or just any needle? It sounds like a stupid question, but it changes the problem.
Looking for a specific needle the code in the question makes sense. Looking for any needle it would be more like
needle = haystack.findNeedle()
or
needle = searcher.findNeedle(haystack)
Either way, I prefer having a searcher that class. A haystack doesn't know how to search. From a CS perspective it is just a data store with LOTS of crap that you don't want.

Definitely the third, IMHO.
The question of a needle in a haystack is an example of an attempt to find one object in a large collection of others, which indicates it will need a complex search algorithm (possibly involving magnets or (more likely) child processes) and it doesn't make much sense for a haystack to be expected to do thread management or implement complex searches.
A searching object, however, is dedicated to searching and can be expected to know how to manage child threads for a fast binary search, or use properties of the searched-for element to narrow the area (ie: a magnet to find ferrous items).

Another possible way can be to create two interfaces for Searchable object e.g. haystack and ToBeSearched object e.g. needle.
So, it can be done in this way
public Interface IToBeSearched
{}
public Interface ISearchable
{
public void Find(IToBeSearched a);
}
Class Needle Implements IToBeSearched
{}
Class Haystack Implements ISearchable
{
public void Find(IToBeSearched needle)
{
//Here goes the original coding of find function
}
}

haystack.iterator.findFirst(/* pass here a predicate returning
true if its argument is a needle that we want */)
iterator can be interface to whatever immutable collection, with collections having common findFirst(fun: T => Boolean) method doing the job. As long as the haystack is immutable, no need to hide any useful data from "outside".
And, of course, it's not good to tie together implementation of a custom non-trivial collection and some other stuff that does have haystack. Divide and conquer, okay?

In most cases I prefer to be able to perform simple helper operations like this on the core object, but depending on the language, the object in question may not have a sufficient or sensible method available.
Even in languages like JavaScript) that allow you to augment/extend built-in objects, I find it can be both convenient and problematic (e.g. if a future version of the language introduces a more efficient method that gets overridden by a custom one).
This article does a good job of outlining such scenarios.

Related

Polymorphism versus switch case tradeoffs

I haven't found any clear articles on this, but I was wondering about why polymorphism is the recommended design pattern over exhaustive switch case / pattern matching. I ask this because I've gotten a lot of heat from experienced developers for not using polymorphic classes, and it's been troubling me. I've personally had a terrible time with polymorphism and a wonderful time with switch cases, the reduction in abstractions and indirection makes readability of the code so much easier in my opinion. This is in direct contrast with books like "clean code" which are typically seen as industry standards.
Note: I use TypeScript, so the following examples may not apply in other languages, but I think the principle generally applies as long as you have exhaustive pattern matching / switch cases.
List the options
If you want to know what the possible values of an action, with an enum, switch case, this is trivial. For classes this requires some reflection magic
// definitely two actions here, I could even loop over them programmatically with basic primitives
enum Action {
A = 'a',
B = 'b',
}
Following the code
Dependency injection and abstract classes mean that jump to definition will never go where you want
function doLetterThing(myEnum: Action) {
switch (myEnum) {
case Action.A:
return;
case Action.B;
return;
default:
exhaustiveCheck(myEnum);
}
}
versus
function doLetterThing(action: BaseAction) {
action.doAction();
}
If I jump to definition for BaseAction or doAction I will end up on the abstract class, which doesn't help me debug the function or the implementation. If you have a dependency injection pattern with only a single class, this means that you can "guess" by going to the main class / function and looking for how "BaseAction" is instantiated and following that type to the place and scrolling to find the implementation. This seems generally like a bad UX for a developer though.
(small note about whether dependency injection is good, traits seem to do a good enough job in cases where they are necessary (though either done prematurely as a rule rather than as a necessity seems to lead to more difficult to follow code))
Write less code
This depends, but if have to define an extra abstract class for your base type, plus override all the function types, how is that less code than single line switch cases? With good types here if you add an option to the enum, your type checker will flag all the places you need to handle this which will usually involve adding 1 line each for the case and 1+ line for implementation. Compare this with polymorphic classes which you need to define a new class, which needs the new function syntax with the correct params and the opening and closing parens. In most cases, switch cases have less code and less lines.
Colocation
Everything for a type is in one place which is nice, but generally whenever I implement a function like this is I look for a similarly implemented function. With a switch case, it's extremely adjacent, with a derived class I would need to find and locate in another file or directory.
If I implemented a feature change such as trimming spaces off the ends of a string for one type, I would need to open all the class files to make sure if they implement something similar that it is implemented correctly in all of them. And if I forget, I might have different behaviour for different types without knowing. With a switch the co location makes this extremely obvious (though not foolproof)
Conclusion
Am I missing something? It doesn't make sense that we have these clear design principles that I basically can only find affirmative articles about but don't see any clear benefits, and serious downsides compared to some basic pattern matching style development
Consider the solid-principles, in particular OCP and DI.
To extend a switch case or enum and add new functionality in the future, you must modify the existing code. Modifying legacy code is risky and expensive. Risky because you may inadvertently introduce regression. Expensive because you have to learn (or re-learn) implementation details, and then re-test the legacy code (which presumably was working before you modified it).
Dependency on concrete implementations creates tight coupling and inhibits modularity. This makes code rigid and fragile, because a change in one place affects many dependents.
In addition, consider scalability. An abstraction supports any number of implementations, many of which are potentially unknown at the time the abstraction is created. A developer needn't understand or care about additional implementations. How many cases can a developer juggle in one switch, 10? 100?
Note this does not mean polymorphism (or OOP) is suitable for every class or application. For example, there are counterpoints in, Should every class implement an interface? When considering extensibility and scalability, there is an assumption that a code base will grow over time. If you're working with a few thousand lines of code, "enterprise-level" standards are going to feel very heavy. Likewise, coupling a few classes together when you only have a few classes won't be very noticeable.
Benefits of good design are realized years down the road when code is able to evolve in new directions.
I think you are missing the point. The main purpose of having a clean code is not to make your life easier while implementing the current feature, rather it makes your life easier in future when you are extending or maintaining the code.
In your example, you may feel implementing your two actions using switch case. But what happens if you need to add more actions in future? Using the abstract class, you can easily create a new action type and the caller doesn't need to be modified. But if you keep using switch case it will be lot more messier, especially for complex cases.
Also, following a better design pattern (DI in this case) will make the code easier to test. When you consider only easy cases, you may not find the usefulness of using proper design patterns. But if you think broader aspect, it really pays off.
"Base class" is against the Clean Code. There should not be a "Base class", not just for bad naming, also for composition over inheritance rule. So from now on, I will assume it is an interface in which other classes implement it, not extend (which is important for my example). First of all, I would like to see your concerns:
Answer for Concerns
This depends, but if have to define an extra abstract class for your
base type, plus override all the function types, how is that less code
than single line switch cases
I think "write less code" should not be character count. Then Ruby or GoLang or even Python beats the Java, obviously does not it? So I would not count the lines, parenthesis etc. instead code that you should test/maintain.
Everything for a type is in one place which is nice, but generally
whenever I implement a function like this is I look for a similarly
implemented function.
If "look for a similarly" means, having implementation together makes copy some parts from the similar function then we also have some clue here for refactoring. Having Implementation class differently has its own reason; their implementation is completely different. They may follow some pattern, lets see from Communication perspective; If we have Letter and Phone implementations, we should not need to look their implementation to implement one of them. So your assumption is wrong here, if you look to their code to implement new feature then your interface does not guide you for the new feature. Let's be more specific;
interface Communication {
sendMessage()
}
Letter implements Communication {
sendMessage() {
// get receiver
// get sender
// set message
// send message
}
}
Now we need Phone, so if we go to Letter implementation to get and idea to how to implement Phone then our interface does not enough for us to guide our implementation. Technically Phone and Letter is different to send a message. Then we need a Design pattern here, maybe Template Pattern? Let's see;
interface Communication {
default sendMessage() {
getMessageFactory().sendMessage(getSender(), getReceiver(), getBody())
}
getSender()
getReceiver()
getBody()
}
Letter implements Communication {
getSender() { returns sender }
getReceiver() {returns receiver }
getBody() {returns body}
getMessageFactory {returns LetterMessageFactory}
}
Now when we need to implement Phone we don't need to look the details of other implementations. We exactly now what we need to return and also our Communication interface's default method handles how to send the message.
If I implemented a feature change such as trimming spaces off the ends
of a string for one type, I would need to open all the class files to
make sure if they implement something similar that it is implemented
correctly in all of them...
So if there is a "feature change" it should be only its implemented class, not in all classes. You should not change all of the implementations. Or if it is same implementation in all of them, then why each implements it differently? It should be kept as the default method in their interface. Then if feature change required, only default method is changed and you should update your implementation and test in one place.
These are the main points that I wanted to answer your concerns. But I think the main point is you don't get the benefit. I was also struggling before I work on a big project that other teams need to extend my features. I will divide benefits to topics with extreme examples which may be more helpful to understand:
Easy to read
Normally when you see a function, you should not feel to go its implementation to understand what is happening there. It should be self-explanatory. Based on this fact; action.doAction(); -> or lets say communication.sendMessage() if they implement Communicate interface. I don't need to go for its base class, search for implementations etc. for debugging. Even implementing class is "Letter" or "Phone" I know that they send message, I don't need their implementation details. So I don't want to see all implemented classes like in your example "switch Letter; Phone.." etc. In your example doLetterThing responsible for one thing (doAction), since all of them do same thing, then why you are showing your developer all these cases?. They are just making the code harder to read.
Easy to extend
Imagine that you are extending a big project where you don't have an access to their source(I want to give extreme example to show its benefit easier). In the java world, I can say you are implementing SPI (Service Provider Interface). I can show you 2 example for this, https://github.com/apereo/cas and https://github.com/keycloak/keycloak where you can see that interface and implementations are separated and you just implement new behavior when it is required, no need to touch the original source. Why this is important? Imagine the following scenario again;
Let's suppose that Keycloak calls communication.sendMessage(). They don't know implementations in build time. If you extend Keycloak in this case, you can have your own class that implements Communication interface, let's say "Computer". Know if you have your SPI in the classpath, Keycloak reads it and calls your computer.sendMessage(). We did not touch the source code but extended the capabilities of Message Handler class. We can't achieve this if we coded against switch cases without touching the source.

Public vs. Private?

I don't really understand why it's generally good practice to make member variables and member functions private.
Is it for the sake of preventing people from screwing with things/more of an organizational tool?
Basically, yes, it's to prevent people from screwing with things.
Encapsulation (information hiding) is the term you're looking for.
By only publishing the bare minimum of information to the outside world, you're free to change the internals as much as you want.
For example, let's say you implement your phone book as an array of entries and don't hide that fact.
Someone then comes along and writes code which searches or manipulates your array without going through your "normal" interface. That means that, when you want to start using a linked list or some other more efficient data structure, their code will break, because it's used that information.
And that's your fault for publishing that information, not theirs for using it :-)
Classic examples are the setters and getters. You might think that you could just expose the temperature variable itself in a class so that a user could just do:
Location here = new Location();
int currTemp = here.temp;
But, what if you wanted to later have it actually web-scrape information from the Bureau of Meteorology whenever you asked for the temperature. If you'd encapsulated the information in the first place, the caller would just be doing:
int currTemp = here.getTemp();
and you could change the implementation of that method as much as you want. The only thing you have to preserve is the API (function name, arguments, return type and so on).
Interestingly, it's not just in code. Certain large companies will pepper their documentation with phrases like:
This technical information is for instructional purposes only and may change in future releases.
That allows them to deliver what the customer wants (the extra information) but doesn't lock them in to supporting it for all eternity.
The main reason is that you, the library developer, have insurance that nobody will be using parts of your code that you don't want to have to maintain.
Every public piece of your code can, and inevitably will get used by your customers. If you later discover that your design was actually terrible, and that version 2.0 should be written much better, then you realise that your paying customers actually want you to preserve all existing functionality, and you're locked in to maintaining backwards compatibility at the price of making better software.
By making as much of your code as possible private, you are unreservedly declaring that this code is nobody's business and that you can and will be able to rewrite it at any time.
It's to prevent people from screwing with things - but not from a security perspective.
Instead, it's intended to allow users of your class to only care about the public sections, leaving you (the author) free to modify the implementation (private) without worrying about breaking someone else's code.
For instance, most programming languages seem to store Strings as a char[] (an array of characters). If for some reason it was discovered that a linked list of nodes (each containing a single character) performed better, the internal implementation using the array could be switched, without (theoretically) breaking any code using the String class.
It's to present a clear code contract to anyone (you, someone else) who is using your object... separate "how to use it" from "how it works". This is known as Encapsulation.
On a side note, at least on .NET (probably on other platforms as well), it's not very hard for someone who really wants access to get to private portions of an object (in .NET, using reflection).
take the typical example of a counter. the thing the bodyguard at your night club is holding in his hands to make his punch harder and to count the people entering and leaving the club.
now the thing is defined like this:
public class Counter{
private int count = 0;
public void increment()
{
count++;
}
public void decrement()
{
count--;
}
}
As you can see, there are no setters/getters for count, because we don't want users (programmers) of this class, to be able to call myCounter.setCount(100), or even worse myCounter.Count -= 10; because that's not what this thing does, it goes up one for everyone entering and down for everyone leaving.
There is a scope for a lot of debate on this.
For example ... If a lot of .Net Framework was private, then this would prevent developers from screwing things up but at the same time it prevents devs from using the funcionality.
In my personal opinion, I would give preference to making methods public. But I would suggest to make use of the Facade pattern. In simple terms, you have a class that encapsulates complex functionality. For example, in the .net framework, the WebClient is a Facade that hides the complex http request/response logic.
Also ... Keep classes simple ... and you should have few public methods. That is a better abstraction than having large classes with lots of private methods
It is useful to know how an object s 'put together' have a look at this video on YouTube
http://www.youtube.com/watch?v=RcZAkBVNYTA&list=PL3FEE93A664B3B2E7&index=11&feature=plpp_video

What is an acid test for "the same level of abstraction" when writing composed functions / methods?

Context: I need to explain "composed methods" to a group of mixed-experience.
I think I heard about it first while reading Beck's Smalltalk Best practices. I've personally not had too many issues writing such methods - however in the local code-wilderness, I've seen quite a few instances where the lack of composed methods had created indecipherable Blobs... and I was in the minority. So I'm taking them through CleanCode - where this one popped up again.
The premise is quite simple.
"Functions should be short, do one
thing well and have an
intention-revealing name. Each step in
the body of the method should be at
the same level of abstraction."
What I'm struggling with a check for the "same level of abstraction".. viz.. forgive the pun a bit abstract for beginners.
My current explanation would be similar to SICP's "wishful thinking". (Imagine the ideal set of steps and then worry about implementation/making it happen.").
Does anyone have a better set of rules / an acid test to evaluate your decisions while writing composed methods ?
Same level of abstraction - examples:
void DailyChores()
{
Dust();
Hoover();
MopKitchenFloor();
AddDirtyClothesToWashingMachine();
PlaceDetergentInWashingMachine();
CloseWashingMachineDoor();
StartWashingMachine();
Relax();
}
Hopefully it should be clear that the WashingMachine saga would be better extracted into a separate method entited WashDirtyLaundry();
Arguably the MopKitchenFloor should also be in a separate method entitled CleanKitchen() as quite likely you would want to extend this in the future to include WashPots(), DefrostFridge() etc.
So a better way would be to write as follows:
void DailyChores()
{
Dust();
Hoover();
CleanKitchen(CleaningLevel.Daily);
WashDirtyClothes();
Relax();
}
void WashDirtyClothes()
{
AddDirtyClothesToWashingMachine();
PlaceDetergentInWashingMachine();
CloseWashingMachineDoor();
StartWashingMachine();
}
void CleanKitchen(CleaningLevel level)
{
MopKitchenFloor();
WashPots();
if(level == CleaningLevel.Monthly)
{
DefrostFridge();
}
}
enum CleaningLevel
{
Daily,
Weekly,
Monthly
}
In terms of "rules" to apply to code not following this principle:
1) Can you describe what the method does in a single sentence without any conjunctions (e.g. "and")? If not split it until you can. e.g. In the example I have AddDirtyClothesToWashingMachine() and PlaceDetergentInWashingMachine() as separate methods - this is correct - to have the code for these 2 separate tasks inside one method would be wrong - however see rule 2.
2) Can you group calls to similar methods together into a higher level method which can be described in a single sentence. In the example, all of the methods relating to washing clothes are grouped into the single method WashDirtyClothes(). Or in consideration of rule 1, the methods AddDirtyClothesToWashingMachine() and PlaceDetergentInWashingMachine() could be called from a single method AddStuffToWashingMachine():
void AddStuffToWashingMachine()
{
AddDirthClothesToWashingMachine();
PlaceDetergentInWashingMachine();
}
3) Do you have any loops with more than a simple statement inside the loop? Any looping behaviour should be a separate method. Ditto for switch statements, or if, then else statements.
Hope this helps
I take "acid test" to mean you would like some concrete rules that help embody the abstract concept in question. As in "You might have mixed levels of abstraction if..."
You might have mixed levels of abstraction if...
you have a variable in the function that is only used for part of the function.
you have multiple loops in the function.
you have multiple if statements who's conditions are independent of each other.
I hope others will add to the above...
EDIT: Warning - long answer from self-taught guy ahead . . .
In my very humble opinion (I am TOTALLY learning here as well) it seems BonyT and Daniel T. are on the right track. The piece which might be missing here is the design part. While it is safe to say that refactoring/composing will always be a necessity, might it also be as safe to say that (ESPECIALLY with beginners!) proper design up front would be the first, second, and third steps to properly composed code?
While I get what you are asking (a test/set of tests for code composition), I think BonyT is applying the earliest of those tests during the "pseudocode" part of the design process, no?
Obviously, in the early stages of project planning, strong design and experienced coders will sniff out the obvious places ripe for composition. As the work progresses, and the team is beginning to fill these initial code stubs with body, some slightly more obtuse exaples are bound to crop up. BonyT's example presents these first two steps quite well.
I think there comes a point at which experience and a finely-tuned "nose" for code smell comes in - in other words, the part you may not be able to teach directly. However, that is where Daniel T's answer comes in - while one may not be able to develop concrete ACID-type "tests" for proper composition, one CAN employ techniques such as Daniel proposes to seek out potential smelly code. Detect "hints" if you will, that should at least prompt further investigation.
If one is not certain whether things are composed at the proper level, it might make sense to work through a particular function and attempt to describe, step-by-step, what is going on in simple, single sentences without conjuctions. This is probably the most basic ACID-type test that could be performed. Not to mention, this process would by default end up correctly documenting the code . . .
In response to BonyT you imply that his pseudocode/method stubs make the next step obvious - I am betting that if one walks through a function and describes things step by step, one will often find that indeed, the next step either obviously follows at the same level of detail, or belongs elsewhere. While there will obviously be cases (many, with complex code) where things are not so neat, I propose that this is where nothing but experience (and possibly genetics!) come in - again, things you can't teach directly. At that point, one must examine the problem domain, and determine a solution which best fits the domain (and also be prepared to change it down the road . . .). Once again, properly documenting the "in-between" cases with short, simple statements (and narrative describing decisions in gray areas) will help the poor maintenence guy down the road.
I realize that I have proposed nothing new here, but what I had to say was longer than a comment would allow!

Why stick to get-set and not car.speed() and car.speed(55) respectively?

Apart from unambiguous clarity, why should we stick to:
car.getSpeed() and car.setSpeed(55)
when this could be used as well :
car.speed() and car.speed(55)
I know that get() and set() are useful to keep any changes to the data member manageable by keeping everything in one place.
Also, obviously, I understand that car.speed() and car.speed(55) are the same function, which makes this wrong, but then in PHP and also in Zend Framework, the same action is used for GET, POST, postbacks.
In VB and C# there are "properties", and are used by many, much to the disgust of purists I've heard, and there are things in Ruby like 5.times and .each, .to_i etc.
And you have operator overloading, multiple inheritance, virtual functions in C++, certain combinations of which could drive anyone nuts.
I mean to say that there are so many paradigms and ways in which things are done that it seems odd that nobody has tried the particular combination that I mentioned.
As for me, my reason is that it is short and cleaner to read the code.
Am I very wrong, slightly wrong, is this just odd and so not used, or what else?
If I still decide to stay correct, I could use car.speed() and car.setSpeed(55).
Is that wrong in any way (just omitting the "get" )?
Thanks for any explanations.
If I called car.speed(), I might think I am telling the car to speed, in other words to increase speed and break the speed limit. It is not clearly a getter.
Some languages allow you to declare const objects, and then restrict you to only calling functions that do not modify the data of the object. So it is necessary to have seperate functions for modification and read operations. While you could use overloads on paramaters to have two functions, I think it would be confusing.
Also, when you say it is clearer to read, I can argue that I have to do a look ahead to understand how to read it:
car.speed()
I read "car speed..." and then I see there is no number so I revise and think "get car speed".
car.getSpeed()
I read "for this car, get speed"
car.setSpeed(55)
I read "for this car, set speed to 55"
It seems you have basically cited other features of the language as being confusing, and then used that as a defense for making getters/setters more confusing? It almost sounds like are admitting that what you have proposed is more confusing. These features are sometimes confusing because of how general purpose they are. Sometimes abstractions can be more confusing, but in the end they often serve the purpose of being more reusable. I think if you wanted to argue in favor of speed() and speed(55), you'd want to show how that can enable new possibilities for the programmer.
On the other hand, C# does have something like what you describe, since properties behave differently as a getter or setter depending on the context in what they are used:
Console.WriteLine(car.Speed); //getter
car.Speed = 55 //setter
But while it is a single property, there are two seperate sections of code for implementing the getting and setting, and it is clear that this is a getter/setter and not a function speed, because they omit the () for properties. So car.speed() is clearly a function, and car.speed is clearly a property getter.
IMHO the C# style of having properties as syntatic sugar for get and set methods is the most expressive.
I prefer active objects which encapsulate operations rather than getters and setters, so you get a semantically richer objects.
For example, although an ADT rather than a business object, even the vector in C++ has paired functions:
size_type capacity() const // how many elements space is reserved for in the vector
void reserve(size_type n) // ensure space is reserved for at least n elements
and
void push_back ( const T& ) // inserts an element at the end
size_type size () const // the number of elements in the vector
If you drive a car, you can set the accelerator, clutch, brakes and gear selection, but you don't set the speed. You can read the speed off the speedometer. It's relatively rare to want both a setter and a getter on an object with behaviour.
FYI, Objective-C uses car.speed() and car.setSpeed(55) (except in a different syntax, [car speed] and [car setSpeed:55].
It's all about convention.
There is no right answer, it's a matter of style, and ultimately it does not matter. Spend your brain cycles elsewhere.
FWIW I prefer the class.noun() for the getter, and class.verb() for the setter. Sometimes the verb is just setNoun(), but other times not. It depends on the noun. For example:
my_vector.size()
returns the size, and
my_vector.resize(some_size)
changes the size.
The groovy approach to properties is quite excellent IMHO, http://groovy.codehaus.org/Groovy+Beans
The final benchmarks of your code should be this:
Does it work correctly?
Is it easy to fix if it breaks?
Is it easy to add new features in the future?
Is it easy for someone else to come in and fix/enhance it?
If those 4 points are covered, I can't imagine why anybody would have a problem with it. Most of the "Best Practices" are generally geared towards achieving those 4 points.
Use whichever style works for you, just be consistent about it, and you should be fine.
This is just a matter of convention. In Smalltalk, it's done the way you suggest and I don't recall ever hearing anybody complain about it. Getting the car's speed is car speed, and setting the car's speed to 55 is car speed:55.
If I were to venture a guess, I would say the reason this style didn't catch on is because of the two lines down which object-oriented programming have come to us: C++ and Objective-C. In C++ (even more so early in its history), methods are very closely related to C functions, and C functions are conventionally named along the lines of setWhatever() and do not have overloading for different numbers of arguments, so that general style of naming was kept. Objective-C was largely preserved by NeXT (which later became Apple), and NeXT tended to favor verbosity in their APIs and especially to distinguish between different kinds of methods — if you're doing anything but just accessing a property, NeXT wanted a verb to make it clear. So that became the convention in Cocoa, which is the de facto standard library for Objective-C these days.
It's convention Java has a convention of getters and setters C# has properties, python has public fields and JavaScript frameworks tend to use field() to get and field(value) to set
Apart from unambiguous clarity, why should we stick to:
car.getSpeed() and car.setSpeed(55)
when this could be used as well : car.speed() and car.speed(55)
Because in all languages I've encountered, car.speed() and car.speed(55) are the same in terms of syntax. Just looking at them like that, both could return a value, which isn't true for the latter if it was meant to be a setter.
What if you intend to call the setter but forget to put in the argument? The code is valid, so the compiler doesn't complain, and it doesn't throw an immediate runtime error; it's a silent bug.
.() means it's a verb.
no () means it's a noun.
car.Speed = 50;
x = car.Speed
car.Speed.set(30)
car.setProperty("Speed",30)
but
car.Speed()
implies command to exceed speed limit.

Is Inheritance really needed?

I must confess I'm somewhat of an OOP skeptic. Bad pedagogical and laboral experiences with object orientation didn't help. So I converted into a fervent believer in Visual Basic (the classic one!).
Then one day I found out C++ had changed and now had the STL and templates. I really liked that! Made the language useful. Then another day MS decided to apply facial surgery to VB, and I really hated the end result for the gratuitous changes (using "end while" instead of "wend" will make me into a better developer? Why not drop "next" for "end for", too? Why force the getter alongside the setter? Etc.) plus so much Java features which I found useless (inheritance, for instance, and the concept of a hierarchical framework).
And now, several years afterwards, I find myself asking this philosophical question: Is inheritance really needed?
The gang-of-four say we should favor object composition over inheritance. And after thinking of it, I cannot find something you can do with inheritance you cannot do with object aggregation plus interfaces. So I'm wondering, why do we even have it in the first place?
Any ideas? I'd love to see an example of where inheritance would be definitely needed, or where using inheritance instead of composition+interfaces can lead to a simpler and easier to modify design. In former jobs I've found if you need to change the base class, you need to modify also almost all the derived classes for they depended on the behaviour of parent. And if you make the base class' methods virtual... then not much code sharing takes place :(
Else, when I finally create my own programming language (a long unfulfilled desire I've found most developers share), I'd see no point in adding inheritance to it...
Really really short answer: No. Inheritance is not needed because only byte code is truly needed. But obviously, byte code or assemble is not a practically way to write your program. OOP is not the only paradigm for programming. But, I digress.
I went to college for computer science in the early 2000s when inheritance (is a), compositions (has a), and interfaces (does a) were taught on an equal footing. Because of this, I use very little inheritance because it is often suited better by composition. This was stressed because many of the professors had seen bad code (along with what you have described) because of abuse of inheritance.
Regardless of creating a language with or without inheritances, can you create a programming language which prevents bad habits and bad design decisions?
I think asking for situations where inheritance is really needed is missing the point a bit. You can fake inheritance by using an interface and some composition. This doesnt mean inheritance is useless. You can do anything you did in VB6 in assembly code with some extra typing, that doesn't mean VB6 was useless.
I usually just start using an interface. Sometimes I notice I actually want to inherit behaviour. That usually means I need a base class. It's that simple.
Inheritance defines an "Is-A" relationship.
class Point( object ):
# some set of features: attributes, methods, etc.
class PointWithMass( Point ):
# An additional feature: mass.
Above, I've used inheritance to formally declare that PointWithMass is a Point.
There are several ways to handle object P1 being a PointWithMass as well as Point. Here are two.
Have a reference from PointWithMass object p1 to some Point object p1-friend. The p1-friend has the Point attributes. When p1 needs to engage in Point-like behavior, it needs to delegate the work to its friend.
Rely on language inheritance to assure that all features of Point are also applicable to my PointWithMass object, p1. When p1 needs to engage in Point-like behavior, it already is a Point object and can just do what needs to be done.
I'd rather not manage the extra objects floating around to assure that all superclass features are part of a subclass object. I'd rather have inheritance to be sure that each subclass is an instance of it's own class, plus is an instance of all superclasses, too.
Edit.
For statically-typed languages, there's a bonus. When I rely on the language to handle this, a PointWithMass can be used anywhere a Point was expected.
For really obscure abuse of inheritance, read about C++'s strange "composition through private inheritance" quagmire. See Any sensible examples of creating inheritance without creating subtyping relations? for some further discussion on this. It conflates inheritance and composition; it doesn't seem to add clarity or precision to the resulting code; it only applies to C++.
The GoF (and many others) recommend that you only favor composition over inheritance. If you have a class with a very large API, and you only want to add a very small number of methods to it, leaving the base implementation alone, I would find it inappropriate to use composition. You'd have to re-implement all of the public methods of the encapsulated class to just return their value. This is a waste of time (programmer and CPU) when you can just inherit all of this behavior, and spend your time concentrating on new methods.
So, to answer your question, no you don't absolutely need inheritance. There are, however, many situations where it's the right design choice.
The problem with inheritance is that it conflates the issue of sub-typing (asserting an is-a relationship) and code reuse (e.g., private inheritance is for reuse only).
So, no it's an overloaded word that we don't need. I'd prefer sub-typing (using the 'implements' keyword) and import (kinda like Ruby does it in class definitions)
Inheritance lets me push off a whole bunch of bookkeeping onto the compiler because it gives me polymorphic behavior for object hierarchies that I would otherwise have to create and maintain myself. Regardless of how good a silver bullet OOP is, there will always be instances where you want to employ a certain type of behavior because it just makes sense to do. And ultimately, that's the point of OOP: it makes a certain class of problems much easier to solve.
The downsides of composition is that it may disguise the relatedness of elements and it may be harder for others to understand. With,say, a 2D Point class and the desire to extend it to higher dimensions, you would presumably have to add (at least) Z getter/setter, modify getDistance(), and maybe add a getVolume() method. So you have the Objects 101 elements: related state and behavior.
A developer with a compositional mindset would presumably have defined a getDistance(x, y) -> double method and would now define a getDistance(x, y, z) -> double method. Or, thinking generally, they might define a getDistance(lambdaGeneratingACoordinateForEveryAxis()) -> double method. Then they would probably write createTwoDimensionalPoint() and createThreeDimensionalPoint() factory methods (or perhaps createNDimensionalPoint(n) ) that would stitch together the various state and behavior.
A developer with an OO mindset would use inheritance. Same amount of complexity in the implementation of domain characteristics, less complexity in terms of initializing the object (constructor takes care of it vs. a Factory method), but not as flexible in terms of what can be initialized.
Now think about it from a comprehensibility / readability standpoint. To understand the composition, one has a large number of functions that are composed programmatically inside another function. So there's little in terms of static code 'structure' (files and keywords and so forth) that makes the relatedness of Z and distance() jump out. In the OO world, you have a great big flashing red light telling you the hierarchy. Additionally, you have an essentially universal vocabulary to discuss structure, widely known graphical notations, a natural hierarchy (at least for single inheritance), etc.
Now, on the other hand, a well-named and constructed Factory method will often make explicit more of the sometimes-obscure relationships between state and behavior, since a compositional mindset facilitates functional code (that is, code that passes state via parameters, not via this ).
In a professional environment with experienced developers, the flexibility of composition generally trumps its more abstract nature. However, one should never discount the importance of comprehensibility, especially in teams that have varying degrees of experience and/or high levels of turnover.
Inheritance is an implementation decision. Interfaces almost always represent a better design, and should usually be used in an external API.
Why write a lot of boilerplate code forwarding method calls to a composed member object when the compiler will do it for you with inheritance?
This answer to another question summarises my thinking pretty well.
Does anyone else remember all of the OO-purists going ballistic over the COM implementation of "containment" instead of "inheritance?" It achieved essentially the same thing, but with a different kind of implementation. This reminds me of your question.
I strictly try to avoid religious wars in software development. ("vi" OR "emacs" ... when everybody knows its "vi"!) I think they are a sign of small minds. Comp Sci Professors can afford to sit around and debate these things. I'm working in the real world and could care less. All of this stuff are simply attempts at giving useful solutions to real problems. If they work, people will use them. The fact that OO languages and tools have been commercially available on a wide scale for going on 20 years is a pretty good bet that they are useful to a lot of people.
There are a lot of features in a programming language that are not really needed. But they are there for a variety of reasons that all basically boil down to reusability and maintainability.
All a business cares about is producing (quality of course) cheaply and quickly.
As a developer you help do this is by becoming more efficient and productive. So you need to make sure the code you write is easily reusable and maintainable.
And, among other things, this is what inheritance gives you - the ability to reuse without reinventing the wheel, as well as the ability to easily maintain your base object without having to perform maintenance on all similar objects.
There's lots of useful usages of inheritance, and probably just as many which are less useful. One of the useful ones is the stream class.
You have a method that should be able stream data. By using the stream base class as input to the method you ensure that your method can be used to write to many kinds of streams without change. To the file system, over the network, with compression, etc.
No.
for me, OOP is mostly about encapsulation of state and behavior and polymorphism.
and that is. but if you want static type checking, you'll need some way to group different types, so the compiler can check while still allowing you to use new types in place of another, related type. creating a hierarchy of types lets you use the same concept (classes) for types and for groups of types, so it's the most widely used form.
but there are other ways, i think the most general would be duck typing, and closely related, prototype-based OOP (which isn't inheritance in fact, but it's usually called prototype-based inheritance).
Depends on your definition of "needed". No, there is nothing that is impossible to do without inheritance, although the alternative may require more verbose code, or a major rewrite of your application.
But there are definitely cases where inheritance is useful. As you say, composition plus interfaces together cover almost all cases, but what if I want to supply a default behavior? An interface can't do that. A base class can. Sometimes, what you want to do is really just override individual methods. Not reimplement the class from scratch (as with an interface), but just change one aspect of it. or you may not want all members of the class to be overridable. Perhaps you have only one or two member methods you want the user to override, and the rest, which calls these (and performs validation and other important tasks before and after the user-overridden methods) are specified once and for all in the base class, and can not be overridden.
Inheritance is often used as a crutch by people who are too obsessed with Java's narrow definition of (and obsession with) OOP though, and in most cases I agree, it's the wrong solution, as if the deeper your class hierarchy, the better your software.
Inheritance is a good thing when the subclass really is the same kind of object as the superclass. E.g. if you're implementing the Active Record pattern, you're attempting to map a class to a table in the database, and instances of the class to a row in the database. Consequently, it is highly likely that your Active Record classes will share a common interface and implementation of methods like: what is the primary key, whether the current instance is persisted, saving the current instance, validating the current instance, executing callbacks upon validation and/or saving, deleting the current instance, running a SQL query, returning the name of the table that the class maps to, etc.
It also seems from how you phrase your question that you're assuming that inheritance is single but not multiple. If we need multiple inheritance, then we have to use interfaces plus composition to pull off the job. To put a fine point about it, Java assumes that implementation inheritance is singular and interface inheritance can be multiple. One need not go this route. E.g. C++ and Ruby permit multiple inheritance for your implementation and your interface. That said, one should use multiple inheritance with caution (i.e. keep your abstract classes virtual and/or stateless).
That said, as you note, there are too many real-life class hierarchies where the subclasses inherit from the superclass out of convenience rather than bearing a true is-a relationship. So it's unsurprising that a change in the superclass will have side-effects on the subclasses.
Not needed, but usefull.
Each language has got its own methods to write less code. OOP sometimes gets convoluted, but I think that is the responsability of the developers, the OOP platform is usefull and sharp when it is well used.
I agree with everyone else about the necessary/useful distinction.
The reason I like OOP is because it lets me write code that's cleaner and more logically organized. One of the biggest benefits comes from the ability to "factor-up" logic that's common to a number of classes. I could give you concrete examples where OOP has seriously reduced the complexity of my code, but that would be boring for you.
Suffice it to say, I heart OOP.
Absolutely needed? no,
But think of lamps. You can create a new lamp from scratch each time you make one, or you can take properties from the original lamp and make all sorts of new styles of lamp that have the same properties as the original, each with their own style.
Or you can make a new lamp from scratch or tell people to look at it a certain way to see the light, or , or, or
Not required, but nice :)
Thanks to all for your answers. I maintain my position that, strictly speaking, inheritance isn't needed, though I believe I found a new appreciation for this feature.
Something else: In my job experience, I have found inheritance leads to simpler, clearer designs when it's brought in late in the project, after it's noticed a lot of the classes have much commonality and you create a base class. In projects where a grand-schema was created from the very beginning, with a lot of classes in an inheritance hierarchy, refactoring is usually painful and dificult.
Seeing some answers mentioning something similar makes me wonder if this might not be exactly how inheritance's supposed to be used: ex post facto. Reminds me of Stepanov's quote: "you don't start with axioms, you end up with axioms after you have a bunch of related proofs". He's a mathematician, so he ought to know something.
The biggest problem with interfaces is that they cannot be changed. Make an interface public, then change it (add a new method to it) and break million applications all around the world, because they have implemented your interface, but not the new method. The app may not even start, a VM may refuse to load it.
Use a base class (not abstract) other programmers can inherit from (and override methods as needed); then add a method to it. Every app using your class will still work, this method just won't be overridden by anyone, but since you provide a base implementation, this one will be used and it may work just fine for all subclasses of your class... it may also cause strange behavior because sometimes overriding it would have been necessary, okay, might be the case, but at least all those million apps in the world will still start up!
I rather have my Java application still running after updating the JDK from 1.6 to 1.7 with some minor bugs (that can be fixed over time) than not having it running it at all (forcing an immediate fix or it will be useless to people).
//I found this QA very useful. Many have answered this right. But i wanted to add...
1: Ability to define abstract interface - E.g., for plugin developers. Of course, you can use function pointers, but this is better and simpler.
2: Inheritance helps model types very close to their actual relationships. Sometimes a lot of errors get caught at compile time, because you have the right type hierarchy. For instance, shape <-- triangle (lets say there is a lot of code to be reused). You might want to compose triangle with a shape object, but shape is an incomplete type. Inserting dummy implementations like double getArea() {return -1;} will do, but you are opening up room for error. That return -1 can get executed some day!
3: void func(B* b); ... func(new D()); Implicit type conversion gives a great notational convenience since Derived is Base. I remember having read Straustrup saying that he wanted to make classes first class citizens just like fundamental data types (hence overloading operators etc). Implicit conversion from Derived to Base, behaves just like an implicit conversion from a data type to broader compatible one (short to int).
Inheritance and Composition have their own pros and cons.
Refer to this related SE question on pros of inheritance and cons of composition.
Prefer composition over inheritance?
Have a look at the example in this documentation link:
The example shows different use cases of overriding by using inheritance as a mean to achieve polymorphism.
In the following, inheritance is used to present a particular property for all of several specific incarnations of the same type thing. In this case, the GeneralPresenation has a properties that are relevant to all "presentation" (the data passed to an MVC view). The Master Page is the only thing using it and expects a GeneralPresentation, though the specific views expect more info, tailored to their needs.
public abstract class GeneralPresentation
{
public GeneralPresentation()
{
MenuPages = new List<Page>();
}
public IEnumerable<Page> MenuPages { get; set; }
public string Title { get; set; }
}
public class IndexPresentation : GeneralPresentation
{
public IndexPresentation() { IndexPage = new Page(); }
public Page IndexPage { get; set; }
}
public class InsertPresentation : GeneralPresentation
{
public InsertPresentation() {
InsertPage = new Page();
ValidationInfo = new PageValidationInfo();
}
public PageValidationInfo ValidationInfo { get; set; }
public Page InsertPage { get; set; }
}