Is my understanding of abstraction correct?

I've read the other posts discussing abstraction and encapsulation, but I'm not confident I understand them; or maybe I understand them but feel unsatisfied with the clarity of their content. Here are my understandings of abstraction and encapsulation. In what regards are they accurate/inaccurate/complete/incomplete?
"Abstractions are data types created by programmers to extend a language when primitive data types are insufficient. Like primitive data types, abstractions have specifications which list the inputs they require and the outputs they return, but the specifications do not overwhelm programmers with the methods, functions, and variables used to operate on the inputs. A class is an example of an abstraction. An API is another example of an abstraction."
"Encapsulation is the state of having abstract data types — i.e. classes — isolated from each other so their methods, functions, and variables do not conflict with each other, and so programmers can easily reuse an existing class in other programs without being concerned that doing so would interfere with the rest of the program (presuming the programmer correctly provides the required inputs and correctly handles the data that get returns)."

I prefer Robert C. Martin's definition in APPP:
Abstraction is the elimination of the irrelevant and the amplification of the essential.

I'd say your understanding is correct ... so much so that I hesitate to comment more specifically.
However, if I were to comment, I might say that "Data types can be used to implement abstractions ...", rather than "Abstractions are data types ...", since abstractions can exist outside of software (it hurt me to say that :-).
But that's just nitpicking. I think you understand. I hope I do, after 36 years of coding ... mostly in languages that support reasonable levels of abstraction (PL/1, Pascal, C, C++, Java).
There are a lot of nice intelligent people in industry, though, who have no concept of abstraction in software, and consider it pretentiously high brow.
Personally, I think that good clear misnomer-free abstraction is a key technical ingredient of solid software engineering.

I've never come across that definition of encapsulation before. That definition sounds more like what namespaces are for. I've always read about encapsulation being purely about the ability to restrict access to certain components of your code, such as access modifiers in OOP languages. However, there seem to be two definitions of encapsulation on Wikipedia, which is news to me:
Encapsulation is the packing of data and functions into a single component. The features of encapsulation are supported using classes in most object-oriented programming languages, although other alternatives also exist. It allows selective hiding of properties and methods in an object by building an impenetrable wall to protect the code from accidental corruption.

In programming languages, encapsulation is used to refer to one of two related but distinct notions, and sometimes to the combination thereof:

1. A language mechanism for restricting access to some of the object's components.
2. A language construct that facilitates the bundling of data with the methods (or other functions) operating on that data.

Some programming language researchers and academics use the first meaning alone or in combination with the second as a distinguishing feature of object-oriented programming, while other programming languages which provide lexical closures view encapsulation as a feature of the language orthogonal to object orientation.

The second definition is motivated by the fact that in many OOP languages hiding of components is not automatic or can be overridden; thus, information hiding is defined as a separate notion by those who prefer the second definition.

(source: Wikipedia)
So, I guess I've always defined encapsulation in terms of notion #1, but it looks like some people define it as the ability to bundle methods and data together, and call notion #1 "information hiding".
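To make the two notions concrete, here is a minimal sketch in Scala (the Counter class is purely illustrative): the class bundles data with the methods operating on it (notion #2), while private restricts access to the data (notion #1, information hiding).

// Notion #2: bundling data with the methods that operate on it.
class Counter {
  // Notion #1: restricting access (information hiding).
  private var count: Int = 0

  def increment(): Unit = { count += 1 }
  def current: Int = count
}

// Usage (e.g. in the REPL):
val c = new Counter
c.increment()
c.current   // 1
// c.count  // does not compile: count is private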


Multimethods vs Interfaces

Are there languages that idiomatically use both notions at the same time? When would that be necessary, if ever? What are the pros and cons of each approach?
Background to the question:
I am a novice (with some python knowledge) trying to build a better picture of how multimethods and interfaces are meant to be used (in general).
I assume that they are not meant to be mixed: Either one declares available logic in terms of interfaces (and implements it as methods of the class) or one does it in terms of multimethods. Is this correct?
Does it make sense to speak of a spectrum of OOP notions where:
one starts with naive subclassing (data, logic (methods), and logic implementation (methods) are tightly coupled),
then passes through interfaces (logic is in the interface; data and logic implementation are in the class),
and ends at multimethods (logic is in the signature of the multimethod, logic implementation is scattered, and data is in the class, which is only a data structure with nice handles)?
This answer, to begin, largely derives from my primary experience developing in Common Lisp and Clojure.
Yes, multimethods do carry some performance cost, but they offer almost unlimited flexibility in the ability to craft a dispatch mechanism that precisely models whatever you might look to accomplish by their specialization.
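As a toy illustration of dispatching on more than one argument (not Clojure's actual defmulti, just an approximation in Scala using pattern matching on a pair; note that a real multimethod is open for extension, while a match expression is closed):

sealed trait Body
case class Asteroid(size: Double) extends Body
case class Ship(shield: Double) extends Body

// Dispatch on the runtime types of *both* arguments at once.
def collide(a: Body, b: Body): String = (a, b) match {
  case (_: Asteroid, _: Asteroid) => "asteroids merge"
  case (_: Asteroid, _: Ship)     => "ship takes damage"
  case (_: Ship, _: Asteroid)     => collide(b, a) // symmetric case
  case (_: Ship, _: Ship)         => "ships bounce"
}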
Protocols and interfaces, on the other hand, are also involved with some of these same matters of specialization and dispatch, but they work and are used in a very different manner. These are facilities that follow a convention wherein single dispatch provides only a straightforward mapping of one specialized implementation for a given class.

The power of protocols and interfaces is in their typical use to define some group of abstract capabilities that, when taken together, fully specify the API for that concept. For example, a "pointer" interface might contain the 3 or 4 concepts that represent the notion of what a pointer is. So the general interface of a pointer might look like REFERENCE, DEREFERENCE, ALLOCATE, and DISPOSE. Thus the power of an interface comes from its composition of a group of related definitions that, together, express a complete abstraction: when implementing an interface in a specific situation, it is normally an all-or-nothing endeavor. Either all four of those functions are present, or whatever this thing is does not represent our definition of a pointer.
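That pointer abstraction might be sketched in Scala as a trait (the names and the cell-based implementation are my own invention, not a standard API); leaving out any one of the four methods is a compile error, which is exactly the all-or-nothing contract described above:

trait Pointer[A] {
  def reference(value: A): Unit
  def dereference: A
  def allocate(): Unit
  def dispose(): Unit
}

// One illustrative implementation backed by a mutable cell.
class CellPointer[A](initial: A) extends Pointer[A] {
  private var cell: Option[A] = None
  def allocate(): Unit = { cell = Some(initial) }
  def reference(value: A): Unit = { cell = Some(value) }
  def dereference: A = cell.getOrElse(sys.error("dereference before allocate"))
  def dispose(): Unit = { cell = None }
}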
Hope this helped a little.
Dan Lentz

"Many functions operating upon few abstractions" principle vs OOP

The creator of the Clojure language claims that "open, and large, set of functions operate upon an open, and small, set of extensible abstractions is the key to algorithmic reuse and library interoperability". Obviously this contradicts the typical OOP approach, where you create a lot of abstractions (classes) and a relatively small set of functions operating on them. Please suggest a book, a chapter in a book, an article, or your personal experience that elaborates on the following topics:
motivating examples of problems that appear in OOP and how using "many functions upon few abstractions" would address them
how to effectively do MFUFA* design
how to refactor OOP code towards MFUFA
how OOP languages' syntax gets in the way of MFUFA
*MFUFA: "many functions upon few abstractions"
There are two main notions of "abstraction" in programming:
1. parameterisation ("polymorphism", genericity);
2. encapsulation (data hiding).
[Edit: These two are duals. The first is client-side abstraction, the second implementer-side abstraction (and in case you care about these things: in terms of formal logic or type theory, they correspond to universal and existential quantification, respectively).]
In OO, the class is the kitchen sink feature for achieving both kinds of abstraction.
Ad (1): for almost every "pattern" you need to define a custom class (or several). In functional programming, on the other hand, you often have more lightweight and direct methods to achieve the same goals, in particular functions and tuples. It is often pointed out that most of the "design patterns" from the GoF are redundant in FP, for example.
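For example, the GoF Strategy pattern collapses to a plain function value (a hypothetical sketch; an OO version would need a Strategy interface plus one class per strategy):

// The "strategy" is just a function passed as an argument.
def sortUsing[A](items: List[A], less: (A, A) => Boolean): List[A] =
  items.sortWith(less)

// In the REPL:
val byLength = (a: String, b: String) => a.length < b.length
sortUsing(List("pear", "fig", "banana"), byLength)
// List("fig", "pear", "banana")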
Ad (2): encapsulation is needed a little bit less often if you don't have mutable state lingering around everywhere that you need to keep in check. You still build ADTs in FP, but they tend to be simpler and more generic, and hence you need fewer of them.
When you write a program in object-oriented style, you place the emphasis on expressing the domain in terms of data types. At first glance this looks like a good idea: if we work with users, why not have a class User? And if users sell and buy cars, why not have a class Car? This way we can easily maintain data and control flow, since it just reflects the order of events in the real world. While this is quite convenient for domain objects, for many internal objects (i.e. objects that do not reflect anything from the real world, but occur only in program logic) it is not so good. Maybe the best example is the number of collection types in Java. In Java (and many other OOP languages) there are both arrays and Lists. In JDBC there's ResultSet, which is also a kind of collection, but it doesn't implement the Collection interface. For input you will often use an InputStream, which provides an interface for sequential access to the data, just like a linked list! However, it doesn't implement any kind of collection interface either. Thus, if your code works with a database and uses ResultSet, it will be harder to refactor it for text files and InputStream.
The MFUFA principle teaches us to pay less attention to type definitions and more to common abstractions. For this reason Clojure introduces a single abstraction for all the mentioned types: the sequence. Any iterable is automatically coerced to a sequence, streams are just lazy lists, and a result set may be transformed to one of the previous types easily.
Another example is using the PersistentMap interface for structs and records. With such common interfaces it becomes very easy to create reusable subroutines without spending lots of time on refactoring.
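The same idea can be sketched in Scala, where one generic function over Iterable replaces a bespoke routine per data source (summarize is an illustrative name, not a library function):

// Many functions over one abstraction, instead of one type per source.
def summarize(xs: Iterable[Int]): (Int, Int, Double) =
  (xs.min, xs.max, xs.sum.toDouble / xs.size)

// In the REPL:
summarize(List(1, 2, 3))   // works on lists...
summarize(Set(4, 5, 6))    // ...on sets...
summarize((1 to 100).view) // ...and on lazy views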
To summarize and answer your questions:
One simple example of an issue that appears in OOP frequently: reading data from many different sources (e.g. DB, file, network, etc.) and processing it in the same way.
To make a good MFUFA design, try to make abstractions as common as possible and avoid ad-hoc implementations. E.g. avoid types à la UserList: List<User> is good enough in most cases (see the sketch after this list).
Follow the suggestions from point 2. In addition, try to add as many interfaces to your data types (classes) as possible. For example, if you really need to have UserList (e.g. when it should have a lot of additional functionality), add both the List and Iterable interfaces to its definition.
OOP (at least in Java and C#) is not very well suited for this principle, because these languages try to encapsulate the whole object's behavior during initial design, so it becomes hard to add more functions later. In most cases you can extend the class in question and put the methods you need into a new object, but 1) if somebody else implements their own derived class, it will not be compatible with yours; 2) sometimes classes are final or all fields are made private, so derived classes don't have access to them (e.g. to add new functions to class String one has to implement an additional class StringUtils). Nevertheless, the rules I described above make it much easier to use MFUFA in OOP code. And the best example here is Clojure itself, which is gracefully implemented in OO style but still follows the MFUFA principle.
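A small sketch of point 2 in Scala terms (User is a made-up example type): prefer the generic type over an ad-hoc wrapper, and every existing collection function is immediately available.

case class User(name: String)

// Instead of a bespoke UserList class...
val users: List[User] = List(User("ann"), User("bo"))

// ...all List/Iterable operations already work:
val names  = users.map(_.name)
val sorted = users.sortBy(_.name)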
UPD. I remember another description of the difference between the object-oriented and functional styles that perhaps summarizes everything I said above better: designing a program in OO style is thinking in terms of data types (nouns), while designing in functional style is thinking in terms of operations (verbs). You may forget that some nouns are similar (e.g. forget about inheritance), but you should always remember that many verbs in practice do the same thing (e.g. have the same or similar interfaces).
A much earlier version of the quote:
"The simple structure and natural applicability of lists are reflected in functions that are amazingly nonidiosyncratic. In Pascal the plethora of declarable data structures induces a specialization within functions that inhibits and penalizes casual cooperation. It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures."
...comes from the foreword to the famous SICP book. I believe this book has a lot of applicable material on this topic.
I think you're not getting that there's a difference between libraries and programmes.
OO libraries which work well usually generate a small number of abstractions, which programmes use to build the abstractions for their domain. Larger OO libraries (and programmes) use inheritance to create different versions of methods and introduce new methods.
So, yes, the same principle applies to OO libraries.

Why avoid subtyping?

I have seen many people in the Scala community advise avoiding subtyping "like a plague". What are the various reasons against the use of subtyping? What are the alternatives?
Types determine the granularity of composition, i.e. of extensibility.
For example, consider an interface such as Comparable that combines (and thus conflates) equality and relational operators. It is then impossible to compose against just the equality interface or just the relational interface.
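A hedged sketch of the alternative in Scala (Equal and Order are illustrative names, not standard library traits): splitting the conflated interface lets code depend on exactly the capability it needs.

trait Equal[A] { def eqv(a: A, b: A): Boolean }
trait Order[A] extends Equal[A] { def lt(a: A, b: A): Boolean }

// Depends only on equality, not on the full ordering.
def contains[A](xs: List[A], x: A)(implicit e: Equal[A]): Boolean =
  xs.exists(e.eqv(_, x))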
In general, the substitution principle of inheritance is undecidable. Russell's paradox implies that any set that is extensible (i.e. does not enumerate the type of every possible member or subtype) can include itself, i.e. is a subtype of itself. But in order to decide what is and is not a subtype of itself, the invariants of itself must be completely enumerated, and then it is no longer extensible. This is the paradox: subtyped extensibility makes inheritance undecidable. This paradox must exist, else knowledge would be static and knowledge formation wouldn't exist.
Function composition is the surjective substitution of subtyping, because the input of a function can be substituted for its output, i.e. anywhere the output type is expected, the input type can be substituted by wrapping it in the function call. But composition does not make the bijective contract of subtyping: accessing the interface of the output of a function does not access the input instance of the function.
Thus composition does not have to maintain the future (i.e. unbounded) invariants and thus can be both extensible and decidable. Subtyping can be MUCH more powerful where it is provably decidable, because it maintains this bijective contract: e.g. a function that sorts an immutable list of the supertype can operate on an immutable list of the subtype.
So the conclusion is to enumerate all the invariants of each type (i.e. of its interfaces), make these types orthogonal (maximize granularity of composition), and then use function composition to accomplish extension where those invariants would not be orthogonal. Thus a subtype is appropriate only where it provably models the invariants of the supertype interface, and the additional interface(s) of the subtype are provably orthogonal to the invariants of the supertype interface. Thus the invariants of interfaces should be orthogonal.
Category theory provides rules for the model of the invariants of each subtype, i.e. of Functor, Applicative, and Monad, which preserve function composition on lifted types; see the aforementioned example of the power of subtyping for lists.
One reason is that equals() is very hard to get right when subtyping is involved. See How to Write an Equality Method in Java, specifically "Pitfall #4: Failing to define equals as an equivalence relation". In essence: to get equality right under subtyping, you need double dispatch.
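A sketch of the remedy that article recommends, rendered here in Scala (the canEqual pattern; Point and ColoredPoint echo the article's running example, reconstructed from memory): each class declares which classes it may be compared with, keeping equals symmetric under subtyping.

class Point(val x: Int, val y: Int) {
  def canEqual(other: Any): Boolean = other.isInstanceOf[Point]
  override def equals(other: Any): Boolean = other match {
    case p: Point => p.canEqual(this) && x == p.x && y == p.y
    case _        => false
  }
  override def hashCode: Int = (x, y).##
}

class ColoredPoint(x: Int, y: Int, val color: String) extends Point(x, y) {
  override def canEqual(other: Any): Boolean = other.isInstanceOf[ColoredPoint]
  override def equals(other: Any): Boolean = other match {
    case cp: ColoredPoint =>
      cp.canEqual(this) && super.equals(cp) && color == cp.color
    case _ => false
  }
  override def hashCode: Int = (super.hashCode, color).##
}

// new Point(1, 2) == new ColoredPoint(1, 2, "red") is false in *both*
// directions: equality stays an equivalence relation.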
I think the general context is for the language to be as "pure" as possible (i.e. using pure functions as much as possible), and comes from the comparison with Haskell.
From "Ruminations of a Programmer"
Scala, being a hybrid OO-FP language, has to take care of issues like subtyping (which Haskell does not have).
As mentioned in this PSE answer:
there is no way to restrict a subtype so that it can't do more than the type it inherits from.
For example, if the base class is immutable and defines a pure method foo(...), derived classes must not be mutable or override foo() with a function that is not pure.
But the actual recommendation would be to use the best solution adapted to the program you are currently developing.
Focusing on subtyping, and ignoring the issues related to classes, inheritance, OOP, etc.: the idea is that subtyping represents an is-a relation between types. For example, types A and B have different operations, but if A is-a B, then we can use any of B's operations on an A.
OTOH, using another traditional relation, if C has-a B then we can reuse any of B's operations on a C. Usually languages let you write the former with a nicer syntax: a.opOnB, instead of the c.b.opOnB you would write in the case of composition.
The problem is that in many cases there's more than one way to relate two types. For example, Real can be embedded in Complex by assuming 0 for the imaginary part, but Complex can be embedded in Real by ignoring the imaginary part, so each can be seen as a subtype of the other, and subtyping forces one relation to be viewed as preferred. Also, there are more possible relations (e.g. view Complex as a Real using the theta component of the polar representation).
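Plain functions make those competing embeddings explicit instead of baking one of them into a subtype relation (a sketch; Complex is a made-up record here):

case class Complex(re: Double, im: Double)

// Embed a real into the complex plane (imaginary part 0)...
val realToComplex: Double => Complex = r => Complex(r, 0.0)

// ...versus two different ways back: drop the imaginary part,
// or take the theta component of the polar representation.
val complexToReal: Complex => Double  = c => c.re
val complexToTheta: Complex => Double = c => math.atan2(c.im, c.re)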
In formal terminology we usually call such relations between types morphisms, and there are special kinds of morphisms for relations with different properties (e.g. isomorphism, homomorphism).
In a language with subtyping there's usually much more sugar for is-a relations, and given many possible embeddings we tend to see unnecessary friction whenever we're using the unpreferred relation. If we bring inheritance, classes and OOP into the mix, the problem becomes much more visible and messy.
My answer does not answer why it is avoided but tries to give another hint at why it can be avoided.
Using "type classes" you can add an abstraction over existing types/classes without modifying them. Inheritance is used to express that some classes are specializations of a more abstract class. But with type classes you can take any existing classes and express that they all share a common property, for example they are Comparable. And as long as you are not concerned with them being Comparable you don't even notice it. The classes don't inherit any methods from some abstract Comparable type as long as you don't use them. It's a bit like programming in dynamic languages.
Further reading:
http://blog.tmorris.net/the-power-of-type-classes-with-scala-implicit-defs/
http://debasishg.blogspot.com/2010/07/refactoring-into-scala-type-classes.html
I don't know Scala, but I think the mantra 'prefer composition over inheritance' applies to Scala exactly the way it does to every other OO programming language (and subtyping is often used with the same meaning as 'inheritance'). In the question "Prefer composition over inheritance?" you will find some more information.
I think lots of Scala programmers are former Java programmers. They are used to thinking in terms of object-oriented subtyping, and they can easily find an OO-like solution for most problems. But functional programming is a new paradigm to discover, so people ask for a different kind of solution.
This is the best paper I have found on the subject. A motivating quote from the paper –
We argue that while some of the simpler aspects of object-oriented languages are compatible with ML, adding a full-fledged class-based object system to ML leads to an excessively complex type system and relatively little expressive gain.

What are the advantages or features of object oriented programming?

What made everyone move from sequential languages to object languages?
According to Wikipedia, the features of object-oriented programming are data abstraction, encapsulation, messaging, modularity, polymorphism, and inheritance. For me, data abstraction, encapsulation, messaging, and modularity also exist in sequential languages. Only polymorphism and inheritance are specific to object-oriented programming. Is this correct?
Many non-OOP languages can certainly provide those features. Just looking at C vs. C++, you can provide encapsulation in C by using opaque pointers, with a suite of functions that take/return these opaque objects and an internal set of functions that are all file-static. You can even do polymorphism and inheritance with function pointers and encapsulated objects.
Then again, we could also all still be programming in assembly or machine language. The reason to bring any feature into a language is to make it easier to use that feature.
Again, looking at C vs. C++, dealing with opaque pointers and the like is annoying, repetitive, and semi-difficult. With C++, you can achieve the same effect with much less code. It's obvious to everyone what is going on. It's a lot more difficult to break (though not impossible). Plus, you make it easy to break encapsulation when you need to, since you can use language constructs like friend that provide exceptions where necessary.
And then there are those things that are really hard to implement without direct language support. Operator overloading is of course impossible in C, and function overloading is really, really hard to do without language support.
Most important of all, if it's in the language, then everyone does it the same way. There are multiple ways of implementing inheritance and polymorphism in C. All of them are incompatible with one another. And while C++ users could do any of those methods, they opt to use the actual language feature 99.9% of the time. This means it's much easier to read someone else's code and know what's going on. You don't have to guess what is opaque and what isn't. You don't have to guess at what is derived from what. You know it, since everyone does it the same way.
In any case, most of the OOP-lite languages (C++, Java, C#) can be used more or less like procedural ones if you want. You just ignore the objects. So in many ways, they get the best of both worlds.
The advantage can be summarized this way:
OOP can represent the real world more directly and precisely than previous paradigms, so the program becomes simpler and easier to understand.
And about this:
For me, data abstraction, encapsulation, messaging, and modularity also exist in sequential languages. Only polymorphism and inheritance are specific to object-oriented programming.
Most human-readable languages can provide data abstraction, encapsulation, messaging and modularity (otherwise they would be machine languages), but OOP supports these concepts better. For example, to set the text of a widget in C, you would do something like this:
HANDLE myEditBox = CreateEditBox(hParent, ...);
SetText(myEditBox, "Hello!");
Notice you have a handle to an object, not an actual object. Now in C++ (OOP) you can write this:
EditBox myEditBox(...);
myEditBox.SetText("Hello!");
The difference is subtle, but important. The C style SetText(handle, "Hello!") does not make any distinction between the handle and the other parameters. You don't even know that there's a message to the object. Now the C++ style object.SetText("Hello!") is like telling the object explicitly: hey, object, set your text to "Hello!". Here, the notion of message and receiver (the object) is explicit.
C++ can also destroy objects automatically if they are not declared as pointers, which eliminates calls such as DestroyObject(myEditBox).
Also, without OOP you have very poor encapsulation, because most things are implemented with structures which contain only public members. So you can't hide data from users, which means someone might try to change things in an unexpected way, and that may cause bugs. This is quite common in large programs.

Object-oriented programming in a purely functional programming context?

Are there any advantages to using object-oriented programming (OOP) in a functional programming (FP) context?
I have been using F# for some time now, and I noticed that the more my functions are stateless, the less I need to have them as methods of objects. In particular, there are advantages to relying on type inference to have them usable in as wide a range of situations as possible.
This does not preclude the need for namespaces of some form, which is orthogonal to being OOP. Nor is the use of data structures discouraged. In fact, real use of FP languages depends heavily on data structures. If you look at the F# stack implemented in F Sharp Programming/Advanced Data Structures, you will find that it is not object-oriented.
In my mind, OOP is heavily associated with having methods that act on the state of the object mostly to mutate the object. In a pure FP context that is not needed nor desired.
A practical reason may be to be able to interact with OOP code, in much the same way F# works with .NET. Other than that however, are there any reasons? And what is the experience in the Haskell world, where programming is more pure FP?
I would appreciate any references to papers, or real-world counterexamples, on the issue.
The disconnect you see is not of FP vs. OOP. It's mostly about immutability and mathematical formalisms vs. mutability and informal approaches.
First, let's dispense with the mutability issue: you can have FP with mutability and OOP with immutability just fine. Even more-functional-than-thou Haskell lets you play with mutable data all you want, you just have to be explicit about what is mutable and the order in which things happen; and efficiency concerns aside, almost any mutable object could construct and return a new, "updated" instance instead of changing its own internal state.
The bigger issue here is mathematical formalisms, in particular heavy use of algebraic data types in a language little removed from lambda calculus. You've tagged this with Haskell and F#, but realize that's only half of the functional programming universe; the Lisp family has a very different, much more freewheeling character compared to ML-style languages. Most OO systems in wide use today are very informal in nature--formalisms do exist for OO but they're not called out explicitly the way FP formalisms are in ML-style languages.
Many of the apparent conflicts simply disappear if you remove the formalism mismatch. Want to build a flexible, dynamic, ad-hoc OO system on top of a Lisp? Go ahead, it'll work just fine. Want to add a formalized, immutable OO system to an ML-style language? No problem, just don't expect it to play nicely with .NET or Java.
Now, you may be wondering, what is an appropriate formalism for OOP? Well, here's the punch line: In many ways, it's more function-centric than ML-style FP! I'll refer back to one of my favorite papers for what seems to be the key distinction: structured data like algebraic data types in ML-style languages provide a concrete representation of the data and the ability to define operations on it; objects provide a black-box abstraction over behavior and the ability to easily replace components.
There's a duality here that goes deeper than just FP vs. OOP: It's closely related to what some programming language theorists call the Expression Problem: With concrete data, you can easily add new operations that work with it, but changing the data's structure is more difficult. With objects you can easily add new data (e.g., new subclasses) but adding new operations is difficult (think adding a new abstract method to a base class with many descendants).
The reason why I say that OOP is more function-centric is that functions themselves represent a form of behavioral abstraction. In fact, you can simulate OO-style structure in something like Haskell by using records holding a bunch of functions as objects, letting the record type be an "interface" or "abstract base class" of sorts, and having functions that create records replace class constructors. So in that sense, OO languages use higher-order functions far, far more often than, say, Haskell would.
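Here is that encoding sketched in Scala rather than Haskell (the names are invented for illustration): the "class" is just a function returning a record of functions that close over their arguments, which play the role of private state.

// An "interface" as a record of functions.
case class Shape(draw: () => String, translate: Double => Shape)

// A "constructor": the closures capture center and radius.
def circle(center: Double, radius: Double): Shape = Shape(
  draw = () => s"circle at $center with radius $radius",
  translate = offset => circle(center + offset, radius)
)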
For an example of something like this type of design actually put to very nice use in Haskell, read the source for the graphics-drawingcombinators package, in particular the way that it uses an opaque record type containing functions and combines things only in terms of their behavior.
EDIT: A few final things I forgot to mention above.
If OO indeed makes extensive use of higher-order functions, it might at first seem that it should fit very naturally into a functional language such as Haskell. Unfortunately this isn't quite the case. It is true that objects as I described them (cf. the paper mentioned in the LtU link) fit just fine. In fact, the result is a purer OO style than in most OO languages, because "private members" are represented by values hidden by the closure used to construct the "object" and are inaccessible to anything other than the one specific instance itself. You don't get much more private than that!
What doesn't work very well in Haskell is subtyping. And, although I think inheritance and subtyping are all too often misused in OO languages, some form of subtyping is quite useful for being able to combine objects in flexible ways. Haskell lacks an inherent notion of subtyping, and hand-rolled replacements tend to be exceedingly clumsy to work with.
As an aside, most OO languages with static type systems make a complete hash of subtyping as well by being too lax with substitutability and not providing proper support for variance in method signatures. In fact, I think the only full-blown OO language that hasn't screwed it up completely, at least that I know of, is Scala (F# seemed to make too many concessions to .NET, though at least I don't think it makes any new mistakes). I have limited experience with many such languages, though, so I could definitely be wrong here.
On a Haskell-specific note, its "type classes" often look tempting to OO programmers, to which I say: Don't go there. Trying to implement OOP that way will only end in tears. Think of type classes as a replacement for overloaded functions/operators, not OOP.
As for Haskell, classes are less useful there because some OO features are more easily achieved in other ways.
Encapsulation or "data hiding" is frequently done through function closures or existential types, rather than private members. For example, here is a data type of random number generator with encapsulated state. The RNG contains a method to generate values and a seed value. Because the type 'seed' is encapsulated, the only thing you can do with it is pass it to the method.
data RNG a where
  RNG :: (seed -> (a, seed)) -> seed -> RNG a
Dynamic method dispatch in the context of parametric polymorphism or "generic programming" is provided by type classes (which are not OO classes). A type class is like an OO class's virtual method table. However, there's no data hiding. Type classes do not "belong" to a data type the way that class methods do.
data Coordinate = C Int Int

instance Eq Coordinate where
  C a b == C d e = a == d && b == e
Dynamic method dispatch in the context of subtyping polymorphism or "subclassing" is almost a translation of the class pattern in Haskell using records and functions.
-- An "abstract base class" with two "virtual methods"
data Object =
Object
{ draw :: Image -> IO ()
, translate :: Coord -> Object
}
-- A "subclass constructor"
circle center radius = Object draw_circle translate_circle
where
-- the "subclass methods"
translate_circle center radius offset = circle (center + offset) radius
draw_circle center radius image = ...
I think that there are several ways of understanding what OOP means. For me, it is not about encapsulating mutable state, but more about organizing and structuring programs. This aspect of OOP can be used perfectly fine in conjunction with FP concepts.
I believe that mixing the two concepts in F# is a very useful approach - you can associate immutable state with operations working on that state. You'll get the nice 'dot' completion for identifiers, the ability to easily use F# code from C#, etc., but you can still make your code perfectly functional. For example, you can write something like:
type GameWorld(characters) =
    let calculateSomething character =
        // ... (the real update logic is elided)
        character
    member x.Tick() =
        let newCharacters = characters |> Seq.map calculateSomething
        GameWorld(newCharacters)
In the beginning, people don't usually declare types in F# - you can start just by writing functions and later evolve your code to use types (when you understand the domain better and know the best way to structure the code). The above example:
is still purely functional (the state is a list of characters and it is not mutated);
is object-oriented - the only unusual thing is that all methods return a new instance of "the world".