"Many functions operating upon few abstractions" principle vs OOP - oop

The creator of the Clojure language claims that "open, and large, set of functions operate upon an open, and small, set of extensible abstractions is the key to algorithmic reuse and library interoperability". Obviously it contradicts the typical OOP approach where you create a lot of abstractions (classes) and a relatively small set of functions operating on them. Please suggest a book, a chapter in a book, an article, or your personal experience that elaborate on the topics:
motivating examples of problems that appear in OOP and how using "many functions upon few abstractions" would address them
how to effectively do MFUFA* design
how to refactor OOP code towards MFUFA
how OOP languages' syntax gets in the way of MFUFA
*MFUFA: "many functions upon few abstractions"

There are two main notions of "abstraction" in programming:
parameterisation ("polymorphism", genericity).
encapsulation (data hiding),
[Edit: These two are duals. The first is client-side abstraction, the second implementer-side abstraction (and in case you care about these things: in terms of formal logic or type theory, they correspond to universal and existential quantification, respectively).]
In OO, the class is the kitchen sink feature for achieving both kinds of abstraction.
Ad (1), for almost every "pattern" you need to define a custom class (or several). In functional programming on the other hand, you often have more lightweight and direct methods to achieve the same goals, in particular, functions and tuples. It is often pointed out that most of the "design patterns" from the GoF are redundant in FP, for example.
Ad (2), encapsulation is needed a little bit less often if you don't have mutable state lingering around everywhere that you need to keep in check. You still build ADTs in FP, but they tend to be simpler and more generic, and hence you need fewer of them.

When you write program in object-oriented style, you make emphasis on expressing domain area in terms of data types. And at first glance this looks like a good idea - if we work with users, why not to have a class User? And if users sell and buy cars, why not to have class Car? This way we can easily maintain data and control flow - it just reflects order of events in the real world. While this is quite convenient for domain objects, for many internal objects (i.e. objects that do not reflect anything from real world, but occur only in program logic) it is not so good. Maybe the best example is a number of collection types in Java. In Java (and many other OOP languages) there are both arrays, Lists. In JDBC there's ResultSet which is also kind of collection, but doesn't implement Collection interface. For input you will often use InputStream that provides interface for sequential access to the data - just like linked list! However it doesn't implement any kind of collection interface as well. Thus, if your code works with database and uses ResultSet it will be harder to refactor it for text files and InputStream.
MFUFA principle teaches us to pay less attention to type definition and more to common abstractions. For this reason Clojure introduces single abstraction for all mentioned types - sequence. Any iterable is automatically coerced to sequence, streams are just lazy lists and result set may be transformed to one of previous types easily.
Another example is using PersistentMap interface for structs and records. With such common interfaces it becomes very easy to create resusable subroutines and do not spend lots of time to refactoring.
To summarize and answer your questions:
One simple example of an issue that appears in OOP frequently: reading data from many different sources (e.g. DB, file, network, etc.) and processing it in the same way.
To make good MFUFA design try to make abstractions as common as possible and avoid ad-hoc implementations. E.g. avoid types a-la UserList - List<User> is good enough in most cases.
Follow suggestions from point 2. In addition, try to add as much interfaces to your data types (classes) as it possible. For example, if you really need to have UserList (e.g. when it should have a lot of additional functionality), add both List and Iterable interfaces to its definition.
OOP (at least in Java and C#) is not very well suited for this principle, because they try to encapsulate the whole object's behavior during initial design, so it becomes hard add more functions to them. In most cases you can extend class in question and put methods you need into new object, but 1) if somebody else implements their own derived class, it will not be compatible with yours; 2) sometimes classes are final or all fields are made private, so derived classes don't have access to them (e.g. to add new functions to class String one should implement additional classStringUtils). Nevertheless, rules I described above make it much easier to use MFUFA in OOP-code. And best example here is Clojure itself, which is gracefully implemented in OO-style but still follows MFUFA principle.
UPD. I remember another description of difference between object oriented and functional styles, that maybe summarizes better all I said above: designing program in OO style is thinking in terms of data types (nouns), while designing in functional style is thinking in terms of operations (verbs). You may forget that some nouns are similar (e.g. forget about inheritance), but you should always remember that many verbs in practice do the same thing (e.g. have same or similar interfaces).

A much earlier version of the quote:
"The simple structure and natural applicability of lists are reflected in functions that are amazingly nonidiosyncratic. In Pascal the plethora of declarable data structures induces a specialization within functions that inhibits and penalizes casual cooperation. It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures."
...comes from the foreword to the famous SICP book. I believe this book has a lot of applicable material on this topic.

I think you're not getting that there's a difference between libraries and programmes.
OO libraries which work well usually generate a small number of abstractions, which programmes use to build the abstractions for their domain. Larger OO libraries (and programmes) use inheritance to create different versions of methods and introduce new methods.
So, yes, the same principle applies to OO libraries.

Related

Is my understanding of abstraction correct?

I've read the other posts discussing abstraction and encapsulation, but I'm not confident I understand them; or maybe I understand them but feel unsatisfied with the clarity of their content. Here are my understandings of abstraction and encapsulation. In what regards are they accurate/inaccurate/complete/incomplete?
"Abstractions are data types created by programmers to extend a language when primitive data types are insufficient. Like primitive data types, abstractions have specifications which list the inputs they require and the outputs they return, but the specifications do not overwhelm programmers with the methods, functions, and variables used to operate on the inputs. A class is an example of an abstraction. An API is another example of an abstraction."
"Encapsulation is the state of having abstract data types — i.e. classes — isolated from each other so their methods, functions, and variables do not conflict with each other, and so programmers can easily reuse an existing class in other programs without being concerned that doing so would interfere with the rest of the program (presuming the programmer correctly provides the required inputs and correctly handles the data that get returns)."
I prefer Robert C. Martin's definition in APPP:
Abstraction is the elimination of the irrelevant and the amplification of the essential.
I'd say your understanding is correct ... so much, so, that I hesitate to comment more specifically.
However, if I were to comment, I might say that "Data types can be used to implement abstractions ...", rather than "Abstractions are data types ...", since abstractions can exist outside of software (it hurt me to say that :-).
But that's just nitpicking. I think you understand. I hope I do, after 36 years of coding ... mostly in languages that support reasonable levels of abstraction (PL/1, Pascal, C, C++, Java).
There are a lot of nice intelligent people in industry, though, who have no concept of abstraction in software, and consider it pretentiously high brow.
Personally, I think that good clear misnomer-free abstraction is a key technical ingredient of solid software engineering.
I've never come across that definition of encapsulation before. That definition sounds more like what namespaces are for. I've always read about encapsulation being purely about the ability to restrict access to certain components of your code, such as access modifiers in OOP languages. However, there seems to be a two definitions of encapsulation on wikipedia, which is news to me:
Encapsulation is the packing of data and functions into a single
component. The features of encapsulation are supported using classes
in most object-oriented programming languages, although other
alternatives also exist. It allows selective hiding of properties and
methods in an object by building an impenetrable wall to protect the
code from accidental corruption.
In programming languages, encapsulation is used to refer to one of two
related but distinct notions, and sometimes to the combination
thereof:
A language mechanism for restricting access to some of the object's
components.
A language construct that facilitates the bundling
of data with the methods (or other functions) operating on that
data.
Some programming language researchers and academics use
the first meaning alone or in combination with the second as a
distinguishing feature of object-oriented programming, while other
programming languages which provide lexical closures view
encapsulation as a feature of the language orthogonal to object
orientation.
The second definition is motivated by the fact that in many OOP
languages hiding of components is not automatic or can be overridden;
thus, information hiding is defined as a separate notion by those who
prefer the second definition.
source
So, I guess I've always defined encapsulation in terms of point #1, but it looks like some people define it as the ability to bundle methods and data together, and term #1 "information hiding".

Multimethods vs Interfaces

Are there languages that idiomatically use both notions at the same time? When will that be necessary if ever? What are the pros and cons of each approach?
Background to the question:
I am a novice (with some python knowledge) trying to build a better picture of how multimethods and interfaces are meant to be used (in general).
I assume that they are not meant to be mixed: Either one declares available logic in terms of interfaces (and implements it as methods of the class) or one does it in terms of multimethods. Is this correct?
Does it make sense to speak of a spectrum of OOP notions where:
one starts with naive subclassing (data and logic(methods) and logic implementation(methods) are tightly coupled)
then passes through interfaces (logic is in the interface, data and logic implementation is in the class)
and ends at multimethods (logic is in the signature of the multimethod, logic implementation is scattered, data is in the class(which is only a datastructure with nice handles))?
This answer, to begin, largely derives from my primary experience developing in common-lisp and clojure.
Yes, multimethods do carry some penalty in cost, but offer almost unlimited flexibility in the ability to craft a dispatch mechanism that precisely models whatever you might look to accomplish by their specialization.
Protocols and Interfaces, on one hand, are also involved with sone of these same matters of specializations and dispatch, but they work and are used in a very different manner. These are facilities that follow a convention wherein single dispatch provides only a straightforward mapping of one specialized implementation for a given class. The power of protocols and interfaces is in their typical use to define some group of abstract capabilities that, when taken together, fully specify the API for thus concept. For example, a "pointer" interface might contain the 3 or 4 concepts that represent the notion of what a pointer is. So the general interface of a pointer might look like REFERENCE, DEREFERENCE, ALLOCATE, and DISPOSE. Thus the power of an interface comes from its composition of a group of related definitions that, together, express a compete abstraction -- when implementing an interface in a specific situation, it is normally an all-or-nothing endeavor. Either all four of those functions are present, or whatever this thing us does not represent our definition of pointer.
Hope this helped a little.
Dan Lentz

Achieving polymorphism in functional programming

I'm currently enjoying the transition from an object oriented language to a functional language. It's a breath of fresh air, and I'm finding myself much more productive than before.
However - there is one aspect of OOP that I've not yet seen a satisfactory answer for on the FP side, and that is polymorphism. i.e. I have a large collection of data items, which need to be processed in quite different ways when they are passed into certain functions. For the sake of argument, let's say that there are multiple factors driving polymorphic behaviour so potentially exponentially many different behaviour combinations.
In OOP that can be handled relatively well using polymorphism: either through composition+inheritance or a prototype-based approach.
In FP I'm a bit stuck between:
Writing or composing pure functions that effectively implement polymorphic behaviours by branching on the value of each data item - feels rather like assembling a huge conditional or even simulating a virtual method table!
Putting functions inside pure data structures in a prototype-like fashion - this seems like it works but doesn't it also violate the idea of defining pure functions separately from data?
What are the recommended functional approaches for this kind of situation? Are there other good alternatives?
Putting functions inside pure data structures in a prototype-like fashion - this seems like it works but doesn't it also violate the idea of defining pure functions separately from data?
If virtual method dispatch is the way you want to approach the problem, this is a perfectly reasonable approach. As for separating functions from data, that is a distinctly non-functional notion to begin with. I consider the fundamental principle of functional programming to be that functions ARE data. And as for your feeling that you're simulating a virtual function, I would argue that it's not a simulation at all. It IS a virtual function table, and that's perfectly OK.
Just because the language doesn't have OOP support built in doesn't mean it's not reasonable to apply the same design principles - it just means you'll have to write more of the machinery that other languages provide built-in, because you're fighting against the natural spirit of the language you're using. Modern typed functional languages do have very deep support for polymorphism, but it's a very different approach to polymorphism.
Polymorphism in OOP is a lot like "existential quantification" in logic - a polymorphic value has SOME run-time type but you don't know what it is. In many functional programming languages, polymorphism is more like "universal quantification" - a polymorphic value can be instantiated to ANY compatible type its user wants. They're two sides of the exact same coin (in particular, they swap places depending on whether you're looking at a function from the "inside" or the "outside"), but it turns out to be extremely hard when designing a language to "make the coin fair", especially in the presence of other language features such as subtyping or higher-kinded polymorphism (polymorphism over polymorphic types).
If it helps, you may want to think of polymorphism in functional languages as something very much like "generics" in C# or Java, because that's exactly the type of polymorphism that, e.g., ML and Haskell, favor.
Well, in Haskell you can always make a type-class to achieve a kind of polymorphism. Basically, it is defining functions that are processed for different types. Examples are the classes Eq and Show:
data Foo = Bar | Baz
instance Show Foo where
show Bar = 'bar'
show Baz = 'baz'
main = putStrLn $ show Bar
The function show :: (Show a) => a -> String is defined for every data type that instances the typeclass Show. The compiler finds the correct function for you, depending on the type.
This allows to define functions more generally, for example:
compare a b = a < b
will work with any type of the typeclass Ord. This is not exactly like OOP, but you even may inherit typeclasses like so:
class (Show a) => Combinator a where
combine :: a -> a -> String
It is up to the instance to define the actual function, you only define the type - similar to virtual functions.
This is not complete, and as far as I know, many FP languages do not feature type classes. OCaml does not, it pushes that over to its OOP part. And Scheme does not have any types. But in Haskell it is a powerful way to achieve a kind of polymorphism, within limits.
To go even further, newer extensions of the 2010 standard allow type families and suchlike.
Hope this helped you a bit.
Who said
defining pure functions separately from data
is best practice?
If you want polymorphic objects, you need objects. In a functional language, objects can be constructed by glueing together a set of "pure data" with a set of "pure functions" operating on that data. This works even without the concept of a class. In this sense, a class is nothing but a piece of code that constructs objects with the same set of associated "pure functions".
And polymorphic objects are constructed by replacing some of those functions of an object by different functions with the same signature.
If you want to learn more about how to implement objects in a functional language (like Scheme), have a look into this book:
Abelson / Sussman: "Structure and Interpration of Computer programs"
Mike, both your approaches are perfectly acceptable, and the pros and cons of each are discussed, as Doc Brown says, in Chapter 2 of SICP. The first suffers from having a big type table somewhere, which needs to be maintained. The second is just traditional single-dispatch polymorphism/virtual function tables.
The reason that scheme doesn't have a built-in system is that using the wrong object system for the problem leads to all sorts of trouble, so if you're the language designer, which to choose? Single despatch single inheritance won't deal well with 'multiple factors driving polymorphic behaviour so potentially exponentially many different behaviour combinations.'
To synopsize, there are many ways of constructing objects, and scheme, the language discussed in SICP, just gives you a basic toolkit from which you can construct the one you need.
In a real scheme program, you'd build your object system by hand and then hide the associated boilerplate with macros.
In clojure you actually have a prebuilt object/dispatch system built in with multimethods, and one of its advantages over the traditional approach is that it can dispatch on the types of all arguments. You can (apparently) also use the heirarchy system to give you inheritance-like features, although I've never used it, so you should take that cum grano salis.
But if you need something different from the object scheme chosen by the language designer, you can just make one (or several) that suits.
That's effectively what you're proposing above.
Build what you need, get it all working, hide the details with macros.
The argument between FP and OO is not about whether data abstraction is bad, it's about whether the data abstraction system is the place to stuff all the separate concerns of the program.
"I believe that a programming language should allow one to define new data types. I do not believe that a program should consist solely of definitions of new data types."
http://www.haskell.org/haskellwiki/OOP_vs_type_classes#Everything_is_an_object.3F nicely discusses some solutions.

Object Oriented Programming beyond just methods?

I have a very limited understanding of OOP.
I've been programming in .Net for a year or so, but I'm completely self taught so some of the uses of the finer points of OOP are lost on me.
Encapsulation, inheritance, abstraction, etc. I know what they mean (superficially), but what are their uses?
I've only ever used OOP for putting reusable code into methods, but I know I am missing out on a lot of functionality.
Even classes -- I've only made an actual class two or three times. Rather, I typically just include all of my methods with the MainForm.
OOP is way too involved to explain in a StackOverflow answer, but the main thrust is as follows:
Procedural programming is about writing code that performs actions on data. Object-oriented programming is about creating data that performs actions on itself.
In procedural programming, you have functions and you have data. The data is structured but passive and you write functions that perform actions on the data and resources.
In object-oriented programming, data and resources are represented by objects that have properties and methods. Here, the data is no longer passive: method is a means of instructing the data or resource to perform some action on itself.
The reason that this distinction matters is that in procedural programming, any data can be inspected or modified in any arbitrary way by any part of the program. You have to watch out for unexpected interactions between different functions that touch the same data, and you have to modify a whole lot of code if you choose to change how the data is stored or organized.
But in object-oriented programming, when encapsulation is used properly, no code except that inside the object needs to know (and thus won't become dependent on) how the data object stores its properties or mutates itself. This helps greatly to modularize your code because each object now has a well-defined interface, and so long as it continues to support that interface and other objects and free functions use it through that interface, the internal workings can be modified without risk.
Additionally, the concepts of objects, along with the use of inheritance and composition, allow you to model your data structurally in your code. If you need to have data that represents an employee, you create an Employee class. If you need to work with a printer resource, you create a Printer class. If you need to draw pushbuttons on a dialog, you create a Button class. This way, not only do you achieve greater modularization, but your modules reflect a useful model of whatever real-world things your program is supposed to be working with.
You can try this: http://homepage.mac.com/s_lott/books/oodesign.html It might help you see how to design objects.
You must go though this I can't create a clear picture of implementing OOP concepts, though I understand most of the OOP concepts. Why?
I had same scenario and I too is a self taught. I followed those steps and now I started getting a knowledge of implementation of OOP. I make my code in a more modular way better structured.
OOP can be used to model things in the real world that your application deals with. For example, a video game will probably have classes for the player, the badguys, NPCs, weapons, ammo, etc... anything that the system wants to deal with as a distinct entity.
Some links I just found that are intros to OOD:
http://accu.informika.ru/acornsig/public/articles/ood_intro.html
http://www.fincher.org/tips/General/SoftwareEngineering/ObjectOrientedDesign.shtml
http://www.softwaredesign.com/objects.html
Keeping it very brief: instead of doing operations on data a bunch of different places, you ask the object to do its thing, without caring how it does it.
Polymorphism: different objects can do different things but give them the same name, so that you can just ask any object (of a particular supertype) to do its thing by asking any object of that type to do that named operation.
I learned OOP using Turbo Pascal and found it immediately useful when I tried to model physical objects. Typical examples include a Circle object with fields for location and radius and methods for drawing, checking if a point is inside or outside, and other actions. I guess, you start thinking of classes as objects, and methods as verbs and actions. Procedural programming is like writing a script. It is often linear and it follows step by step what needs to be done. In OOP world you build an available repetoire of actions and tasks (like lego pieces), and use them to do what you want to do.
Inheritance is used common code should/can be used on multiple objects. You can easily go the other way and create way too many classes for what you need. If I am dealing with shapes do I really need two different classes for rectangles and squares, or can I use a common class with different values (fields).
Mastery comes with experience and practice. Once you start scratching your head on how to solve particular problems (especially when it comes to making your code usable again in the future), slowly you will gain the confidence to start including more and more OOP features into your code.
Good luck.

OOP - How to choose a possible object candidate?

I 'm concern about what techniques should I use to choose the right object in OOP
Is there any must-read book about OOP in terms of how to choose objects?
Best,
Just write something that gets the job done, even if it's ugly, then refactor continuously:
eliminate duplicate code (don't repeat yourself)
increase cohesion
reduce coupling
But:
don't over-engineer; keep it simple
don't write stuff you ain't gonna need
It's not a precise recipe, just some general guidelines. Keep practicing.
P.S.
Code objects are not related to tangible real-life objects; they are just constructs that hold related information together.
Don't believe what the Java books/schools teach about objects; they're lying.
You probably mean "the right class", rather than "the right object". :-)
There are a few techniques, such as text analysis (a.k.a. underlining the nouns) and Class Responsibility Collaborator (CRC).
With "underlining the nouns", you basically start with a written, natural language (i.e. plain English) description of the problem you want to solve and underline the nouns. That gives you a list of candidate classes. You will need to perform several passes to refine it into a list of classes to implement.
For CRC, check out the Wikipedia.
I suggest The OPEN Toolbox of Techniques for full reference.
Hope it helps.
I am assuming that there is understanding of what is sctruct, type, class, set, state, alphabet, scalar and vector and relationship.
Object is a noun, method is a verb. Object members can represent identity, state or scalar value per field. Relationships between objects usually are represented with references, where references are members of objects. In cases, when relationships are complex, multidirectional, have arity greater than 2, represent some sort of grouping or containment, then relationships can be expressed as objects.
For other, broader technical reasons objects are most likely the only way to represent any form of information in OOP languages.
I am adding a second answer due to demian's comment:
Sometimes the class is so obvious
because it's tangible, but other times
the concept of object it's to abstract
like a db connector.
That is true. My preferred approach is to perform a behavioural analysis of the system (using use cases, for example), and then derive system operations. Once you have a stable list of system operations (such as PrintDocument, SaveDocument, SpellCheck, MergeMail, etc. for a word processor) you need to assign each of them to a class. If you have developed a list of candidate classes with some of the techniques that I mentioned earlier, you will be able to allocate some of the operations. But some will remain unallocated. These will signal the need of more abstract or unintuitive classes, which you will need to make up, using your good judgment.
The whole method is documented in a white paper at www.openmetis.com.
You should check out Domain-Driven Design, by Eric Evans. It provides very useful concepts in thinking about the objects in your model, what their function are in the domain, and how they could be organized to work together. It's not a cookbook, and probably not a beginner book - but then, I read it at different stages of my career, and every time I found something valuable in it...
(source: domaindrivendesign.org)