Object oriented programming in one way or another is very much possible in R. However, unlike for example Python, there are many ways to achieve object orientation:
The R.oo package
S3 and S4 classes
Reference classes
the proto package
My question is:
What major differences distinguish these ways of OO programming in R?
Ideally the answers here will serve as a reference for R programmers trying to decide which OO programming method best suits their needs.
As such, I am asking for detail, presented in an objective manner, based on experience, and backed with facts and reference. Bonus points for clarifying how these methods map to standard OO practices.
S3 classes
Not really objects, more of a naming convention
Based around the generic.class naming syntax: e.g. print dispatches to print.lm, print.anova, etc., and falls back to print.default if no matching method is found
S4 classes
Can dispatch on multiple arguments
More complicated to implement than S3
Reference classes
Primarily useful to avoid making copies of large objects (pass by reference)
Description of reasons to use RefClasses
proto
ggplot2 was originally written in proto, but will eventually be rewritten using S3.
Neat concept (prototypes, not classes), but seems tricky in practice
Next version of ggplot2 seems to be moving away from it
Description of the concept and implementation
R6 classes
By-reference
Does not depend on S4 classes
"Creating an R6 class is similar to the reference class, except that there’s no need to separate the fields and methods, and you can’t specify the types of the fields."
Edit on 3/8/12: The answer below responds to a piece of the originally posted question which has since been removed. I've copied it below, to provide context for my answer:
How do the different OO methods map to the more standard OO methods used in e.g. Java or Python?
My contribution relates to your second question, about how R's OO methods map to more standard OO methods. As I've thought about this in the past, I've returned again and again to two passages, one by Friedrich Leisch, and the other by John Chambers. Both do a good job of articulating why OO-like programming in R has a different flavor than in many other languages.
First, Friedrich Leisch, from "Creating R Packages: A Tutorial" (warning: PDF):
S is rare because it is both interactive and has a system for object-orientation. Designing classes clearly is programming, yet to make S useful as an interactive data analysis environment, it makes sense that it is a functional language. In "real" object-oriented programming (OOP) languages like C++ or Java class and method definitions are tightly bound together, methods are part of classes (and hence objects). We want incremental and interactive additions like user-defined methods for pre-defined classes. These additions can be made at any point in time, even on the fly at the command line prompt while we analyze a data set. S tries to make a compromise between object orientation and interactive use, and although compromises are never optimal with respect to all goals they try to reach, they often work surprisingly well in practice.
The other passage comes from John Chambers' superb book "Software for Data Analysis". (Link to quoted passage):
The OOP programming model differs from the S language in all but the first
point, even though S and some other functional languages support classes
and methods. Method definitions in an OOP system are local to the class;
there is no requirement that the same name for a method means the same
thing for an unrelated class. In contrast, method definitions in R do not
reside in a class definition; conceptually, they are associated with the generic
function. Class definitions enter in determining method selection, directly
or through inheritance. Programmers used to the OOP model are sometimes
frustrated or confused that their programming does not transfer to R directly,
but it cannot. The functional use of methods is more complicated but also
more attuned to having meaningful functions, and can't be reduced to the
OOP version.
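Since the question also asked how this maps to Python: a rough analogue of "methods belong to the generic function, not to the class" is functools.singledispatch from the standard library. This is only a sketch of the idea, not how R implements S3 or S4; the function name describe is made up for illustration.

```python
from functools import singledispatch

# The generic function. Its body is the fallback, playing a role
# similar to print.default in S3.
@singledispatch
def describe(obj):
    return "default: " + repr(obj)

# Methods are registered on the generic, not written inside a class,
# so they can be added later for classes you do not own.
@describe.register(int)
def _(obj):
    return "an integer: %d" % obj

@describe.register(list)
def _(obj):
    return "a list of %d items" % len(obj)

print(describe(3))
print(describe([1, 2]))
print(describe("text"))  # no str method registered, falls back to the default
```

Like S3, this dispatches on the first argument only; multiple dispatch in the S4 sense has no direct counterpart here.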
S3 and S4 seem to be the official (i.e. built-in) approaches for OO programming. I have begun using a combination of S3 with functions embedded in the constructor function. My goal was to have an object$method() type syntax so that I have semi-private fields. I say semi-private because there is no way of really hiding them (as far as I know). Here is a simple example that doesn't actually do anything:
#' Constructor
EmailClass <- function(name, email) {
  nc = list(
    name = name,
    email = email,
    get = function(x) nc[[x]],
    set = function(x, value) nc[[x]] <<- value,
    props = list(),
    history = list(),
    getHistory = function() return(nc$history),
    getNumMessagesSent = function() return(length(nc$history))
  )
  #Add a few more methods
  nc$sendMail = function(to) {
    cat(paste("Sending mail to", to, 'from', nc$email))
    h <- nc$history
    h[[(length(h)+1)]] <- list(to=to, timestamp=Sys.time())
    assign('history', h, envir=nc)
  }
  nc$addProp = function(name, value) {
    p <- nc$props
    p[[name]] <- value
    assign('props', p, envir=nc)
  }
  nc <- list2env(nc)
  class(nc) <- "EmailClass"
  return(nc)
}
#' Define an S3 method for the print generic.
print.EmailClass <- function(x, ...) {
  if (!inherits(x, "EmailClass")) stop("x must be an EmailClass object")
  cat(paste(x$get("name"), "'s email address is ", x$get("email"), "\n", sep=''))
  invisible(x)
}
And some test code:
test <- EmailClass(name="Jason", "jason@bryer.org")
test$addProp('hello', 'world')
test$props
test
class(test)
str(test)
test$get("name")
test$get("email")
test$set("name", "Heather")
test$get("name")
test
test$sendMail("jbryer@excelsior.edu")
test$getHistory()
test$sendMail("test@domain.edu")
test$getNumMessagesSent()
test2 <- EmailClass("Nobody", "dontemailme@nowhere.com")
test2
test2$props
test2$getHistory()
test2$sendMail('nobody@exclesior.edu')
Here is a link to a blog post I wrote about this approach: http://bryer.org/2012/object-oriented-programming-in-r I would welcome comments, criticisms, and suggestions on this approach, as I am not convinced myself that it is the best one. However, for the problem I was trying to solve it has worked great. Specifically, for the makeR package (http://jbryer.github.com/makeR) I did not want users to change data fields directly because I needed to ensure that an XML file that represented my object's state would stay in sync. This worked perfectly as long as the users adhere to the rules I outline in the documentation.
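For readers coming from Python, the same idea (state kept in the constructor's scope and changed only through the functions it returns) might look roughly like the sketch below. The names make_email_account, send_mail and the example.org addresses are invented for illustration; this is not a port of the R code above.

```python
from types import SimpleNamespace

def make_email_account(name, email):
    # State lives in the enclosing scope; callers only get the functions below.
    state = {"name": name, "email": email, "history": []}

    def get(field):
        return state[field]

    def send_mail(to):
        # Record the message so the history stays in sync with what was sent.
        state["history"].append(to)
        return "Sending mail to %s from %s" % (to, state["email"])

    def num_messages_sent():
        return len(state["history"])

    # Only the public interface is returned, giving account.method() syntax.
    return SimpleNamespace(get=get, send_mail=send_mail,
                           num_messages_sent=num_messages_sent)

account = make_email_account("Jason", "jason@example.org")
account.send_mail("someone@example.org")
print(account.get("name"), account.num_messages_sent())
```

As in the R version, the fields are only semi-private: a determined caller can still reach the closure's variables through introspection.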
I visited many sites about the differences between procedural programming and object-oriented programming, but I did not find a practical answer.
Everyone gives a theoretical answer.
Can anyone give a practical explanation?
Procedural programming expresses a program as a sequence of instructions, usually grouped into functions, that tell the computer what to do step by step.
A classic example of a procedural language is C.
For example, here is some procedural Python code (any code that does not use classes):
x = int(input('enter a number: '))
def even_odd(x):
    if x%2 == 0:
        print('even')
    else:
        print('odd')
even_odd(x)
Object-oriented programming is a style of programming that uses classes and objects to bundle data together with the code that operates on it, which helps you write less code and keep related logic in one place.
Most modern mainstream languages support OOP.
For example:
class test:
    # your code here along with variables and functions
    x = 'something'  # some code

    def test_func(self):  # some function
        # your function code here
        pass

obj = test()  # this is the object created for the above class which will be used to access the data inside a class
theoretically, as a real world example I think that even god also uses object oriented programming, maybe he first created a parent class called living things which contains exact same properties like exact 2 eyes, 2 hands, one mouth etc, and then he inherited more subclasses like human being, tiger, rat from the same parent class ;)
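To make the parent class idea a bit more practical, here is a small sketch. The class names LivingThing, Human and Tiger and their attributes are just made up to mirror the analogy.

```python
class LivingThing:
    # Properties every subclass inherits unchanged.
    eyes = 2
    mouths = 1

    def __init__(self, name):
        self.name = name

    def sound(self):
        # Each subclass is expected to provide its own sound.
        raise NotImplementedError

    def describe(self):
        # Shared behaviour written once, reused by all subclasses.
        return "%s has %d eyes and says %r" % (self.name, self.eyes, self.sound())

class Human(LivingThing):
    def sound(self):
        return "hello"

class Tiger(LivingThing):
    def sound(self):
        return "roar"

for creature in [Human("Ann"), Tiger("Raja")]:
    print(creature.describe())
```

The loop does not care which subclass it is dealing with; that is the practical payoff compared to writing a separate describe function for every kind of creature.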
I have a lot to learn in the way of OO patterns and this is a problem I've come across over the years. I end up in situations where my classes' sole purpose is procedural, just basically wrapping a procedure up in a class. It doesn't seem like the right OO way to do things, and I wonder if someone is experienced with this problem enough to help me consider it in a different way. My specific example in the current application follows.
In my application I'm taking a set of points from engineering survey equipment and normalizing them to be used elsewhere in the program. By "normalize" I mean a set of transformations of the full data set until a destination orientation is reached.
Each transformation procedure will take the input of an array of points (i.e. of the form class point { float x; float y; float z; }) and return an array of the same length but with different values. For example, a transformation like point[] RotateXY(point[] inList, float angle). The other kind of procedure would be of the analysis type, used to supplement the normalization process and decide what transformation to do next. This type of procedure takes in the same points as a parameter but returns a different kind of dataset.
My question is, what is a good pattern to use in this situation? The one I was about to code in was a Normalization class which inherits class types of RotationXY for instance. But RotationXY's sole purpose is to rotate the points, so it would basically be implementing a single function. This doesn't seem very nice, though, for the reasons I mentioned in the first paragraph.
Thanks in advance!
The most common/natural approach for finding candidate classes in your problem domain is to look for nouns and then scan for the verbs/actions associated with those nouns to find the behavior that each class should implement. While this is generally good advice, it doesn't mean that your objects must only represent concrete elements. When processes (which are generally modeled as methods) start to grow and become complex, it is good practice to model them as objects. So, if your transformation carries weight of its own, it is OK to model it as an object and do something like:
class RotateXY
{
    public function apply(point p)
    {
        //Apply the transformation
    }
}
t = new RotateXY();
newPoint = t->apply(oldPoint);
In case you have many transformations, you can create a polymorphic hierarchy and even chain one transformation after another. If you want to dig a bit deeper, you can also take a look at the Command design pattern, which closely relates to this.
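A minimal sketch of such a hierarchy with chaining, written in Python for concreteness. Translate, Chain and apply_to_all are names I made up; only RotateXY comes from the question, and I am assuming it means a rotation about the Z axis.

```python
import math
from collections import namedtuple

Point = namedtuple("Point", "x y z")

class RotateXY:
    """Rotate a point in the XY plane by `angle` radians."""
    def __init__(self, angle):
        self.angle = angle

    def apply(self, p):
        c, s = math.cos(self.angle), math.sin(self.angle)
        return Point(p.x * c - p.y * s, p.x * s + p.y * c, p.z)

class Translate:
    """Shift a point by a fixed offset."""
    def __init__(self, dx, dy, dz):
        self.dx, self.dy, self.dz = dx, dy, dz

    def apply(self, p):
        return Point(p.x + self.dx, p.y + self.dy, p.z + self.dz)

class Chain:
    """Applies several transformations in order; itself a transformation."""
    def __init__(self, *steps):
        self.steps = steps

    def apply(self, p):
        for step in self.steps:
            p = step.apply(p)
        return p

def apply_to_all(transformation, points):
    # The transformation is defined per point and mapped over the collection.
    return [transformation.apply(p) for p in points]

normalize = Chain(RotateXY(math.pi / 2), Translate(1.0, 0.0, 0.0))
result = apply_to_all(normalize, [Point(1.0, 0.0, 0.0)])
print(result)
```

Because Chain exposes the same apply method as the individual steps, chains can be nested or swapped in wherever a single transformation is expected.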
Some final comments:
If it fits your case, it is a good idea to model the transformation at the point level and then apply it to a collection of points. That way you can properly isolate the transformation concept, and it is also easier to write test cases. You can later even create a Composite of transformations if you need.
I generally don't like the Utils (or similar) classes with a bunch of static methods, since in most of the cases it means that your model is missing the abstraction that should carry that behavior.
HTH
Typically, when it comes to classes that contain only static methods, I name them Util, e.g. DbUtil for facading DB access, FileUtil for file I/O etc. So find some term that all your methods have in common and name the class after it with a Util suffix. Maybe in your case GeometryUtil or something along those lines.
Since the particulars of the transformations you apply seem ad-hoc for the problem and possibly prone to change in the future you could code them in a configuration file.
The point's client would read from the file and know what to do. As for the rotation or any other transformation method, they could go well as part of the Point class.
I see nothing particularly wrong with classes/interfaces having just essentially one member.
In your case the member is an "Operation with some arguments of one type that returns same type" - common for some math/functional problems. You may find it convenient to have an interface/base class and helper methods that combine multiple transformation classes together into a more complex transformation.
Alternative approach: if your language supports it, just go functional style altogether (similar to LINQ in C#).
On the functional style suggestion: I'd start with the following basic functions (you can probably just find them in the standard libraries for your language)
collection = map(collection, perItemFunction) to transform all items in a collection (Select in C#)
item = reduce(collection, aggregateFunction) to reduce all items into a single entity (Aggregate in C#)
combine 2 functions on item funcOnItem = combine(funcFirst, funcSecond). Can be expressed as lambda in C# Func<T,T> combined = x => second(first(x)).
"bind"/curry - fix one of arguments of a function functionOfOneArg = curry(funcOfArgs, fixedFirstArg). Can be expressed in C# as lambda Func<T,T> curried = x => funcOfTwoArg(fixedFirstArg, x).
This list will let you do something like "rotate all points in the collection about the X axis by 10 and shift Y by 15": map(points, combine(curry(rotateX, 10), curry(shiftY, 15))).
The syntax will depend on the language. E.g. in JavaScript you just pass functions (and map/reduce are already part of the language); in C#, lambdas and the Func classes (such as Func<T,R> for a one-argument function) are an option. In some languages you have to explicitly use a class/interface to represent a "function" object.
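As one concrete rendering of the list above, here is a sketch in Python. combine is written by hand, functools.partial stands in for "bind"/curry (it fixes leading arguments rather than doing true currying), and rotate_x / shift_y are hypothetical helpers operating on (x, y, z) tuples with angles in degrees.

```python
import math
from functools import partial, reduce

def rotate_x(degrees, point):
    # Rotate a point about the X axis.
    x, y, z = point
    a = math.radians(degrees)
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))

def shift_y(amount, point):
    x, y, z = point
    return (x, y + amount, z)

def combine(first, second):
    # Function composition: apply `first`, then `second`.
    return lambda item: second(first(item))

points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# "Rotate about X by 10 and shift Y by 15", built from small pieces.
transform = combine(partial(rotate_x, 10), partial(shift_y, 15))
moved = list(map(transform, points))

# reduce folds the collection into a single value, here the sum of Y values.
total_y = reduce(lambda acc, p: acc + p[1], moved, 0.0)
print(moved, total_y)
```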
Alternative approach: If you are actually dealing with points and transformations, another traditional approach is to use matrices to represent all linear operations (if your language supports custom operators, you get very natural-looking code).
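A bare-bones sketch of the matrix idea using plain nested lists (a real project would more likely use a numerics library). The helper names matmul, apply, rotation_xy and scaling are mine. Note this covers linear operations only; translations would need homogeneous coordinates.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(matrix, point):
    """Multiply a matrix by a point given as a tuple."""
    return tuple(sum(row[k] * point[k] for k in range(len(point)))
                 for row in matrix)

def rotation_xy(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def scaling(factor):
    return [[factor, 0.0, 0.0], [0.0, factor, 0.0], [0.0, 0.0, factor]]

# Composing transformations is just matrix multiplication:
# the right-hand matrix is applied first (rotate, then scale).
combined = matmul(scaling(2.0), rotation_xy(math.pi / 2))
print(apply(combined, (1.0, 0.0, 0.0)))
```

The benefit is that a whole normalization pipeline collapses into one matrix, which is then applied to every point once.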
I need to deal with two objects of a class in a way that will return a third object of the same class, and I am trying to determine whether it is better to do this as an independent function that receives two objects and returns the third or as a method which would take one other object and return the third.
For a simple example, would this:
from collections import namedtuple
class Point(namedtuple('Point', 'x y')):
    __slots__ = ()
    #Attached to class
    def midpoint(self, otherpoint):
        mx = (self.x + otherpoint.x) / 2.0
        my = (self.y + otherpoint.y) / 2.0
        return Point(mx, my)
a = Point(1.0, 2.0)
b = Point(2.0, 3.0)
print(a.midpoint(b))
#Point(x=1.5, y=2.5)
Or this:
from collections import namedtuple
class Point(namedtuple('Point', 'x y')):
    __slots__ = ()
#not attached to class
#takes two point objects
def midpoint(p1, p2):
    mx = (p1.x + p2.x) / 2.0
    my = (p1.y + p2.y) / 2.0
    return Point(mx, my)
a = Point(1.0, 2.0)
b = Point(2.0, 3.0)
print(midpoint(a, b))
#Point(x=1.5, y=2.5)
and why would one be preferred over the other?
This seems far less clear cut than I had expected when I asked the question.
In summary, it seems that something like a.midpoint(b) is not preferred since it seems to give a special place to one point or another in what is really a symmetric function that returns a completely new point instance. But it seems to be largely a matter of taste and style between something like a freestanding module function or a function attached to the class, but not meant to be called by the instance, such as Point.midpoint(a, b).
I think, personally, I stylistically lean towards free-standing module functions, but it may depend on the circumstances. In cases where the function is definitely tightly bound to the class and there is any risk of namespace pollution or potential confusion, then making a class function probably makes more sense.
Also, a couple of people mentioned making the function more general, perhaps by implementing additional features of the class to support this. In this particular case dealing with points and midpoints, that is probably the overall best approach. It supports polymorphism and code reuse and is highly readable. In a lot of cases though, that would not work (the project that inspired me to ask this for instance), but points and midpoints seemed like a concise and understandable example to illustrate the question.
Thank you all, it was enlightening.
The first approach is reasonable and isn't conceptually different from what set.union and set.intersection do. Any func(Point, Point) --> Point is clearly related to the Point class, so there is no question about interfering with the unity or cohesion of the class.
It would be a tougher choice if different classes were involved: draw_perpendicular(line, point) --> line. To resolve the choice of classes, you would pick the one that has the most related logic. For example, str.join needs a string delimiter and a list of strings. It could have been a standalone function (as it was in the old days with the string module), or it could be a method on lists (but it only works for lists of strings), or a method on strings. The latter was chosen because joining is more about strings than it is about lists. This choice was made even though it led to the arguably awkward expression delimiter.join(things_to_join).
I disagree with the other respondent who recommended using a classmethod. Those are often used for alternate constructor signatures but not for transformations on instances of the class. For example, datetime.fromordinal is a classmethod for constructing a date from something other than an instance of the class (in this case, from an int). This contrasts with datetime.replace which is a regular method for making a new datetime instance based on an existing instance. This should steer you away from using classmethod for the midpoint computation.
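To illustrate the distinction, here is a small sketch. The two datetime calls are the standard-library ones mentioned above; Point.from_polar is a hypothetical alternate constructor I added to show where a classmethod would fit on the Point class.

```python
import math
from collections import namedtuple
from datetime import datetime

# classmethod: builds an instance from something that is not an instance.
first_day = datetime.fromordinal(1)

# regular method: builds a new instance from an existing one.
later = first_day.replace(year=2000)

class Point(namedtuple("Point", "x y")):
    __slots__ = ()

    @classmethod
    def from_polar(cls, r, theta):
        # Alternate constructor signature (hypothetical example).
        return cls(r * math.cos(theta), r * math.sin(theta))

    def midpoint(self, other):
        # Transformation on instances: a regular method.
        return Point((self.x + other.x) / 2.0, (self.y + other.y) / 2.0)

p = Point.from_polar(2.0, 0.0)
print(first_day, later, p.midpoint(Point(0.0, 0.0)))
```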
One other thought: if you keep midpoint() with the Point() class, it makes it possible to create other classes that have the same Point API but a different internal representation (i.e. polar coordinates may be more convenient for some types of work than Cartesian coordinates). If midpoint() is a separate function you start to lose the benefits of encapsulation and of a coherent interface.
I would choose the second option because, in my opinion, it is clearer than the first. You are performing the midpoint operation between two points; not the midpoint operation with respect to a point. Similarly, a natural extension of this interface could be to define dot, cross, magnitude, average, median, etc. Some of those functions will operate on pairs of Points and others may operate on lists. Making it a function makes them all have consistent interfaces.
Defining it as a function also allows it to be used with any pair of objects that present a .x .y interface, while making it a method requires that at least one of the two is a Point.
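A quick sketch of that duck-typing point: the free function below only needs .x and .y attributes. The pixel object is a made-up stand-in for "some other class with the same interface".

```python
from collections import namedtuple
from types import SimpleNamespace

Point = namedtuple("Point", "x y")

def midpoint(p1, p2):
    # Relies only on the .x/.y interface, not on the argument types.
    return Point((p1.x + p2.x) / 2.0, (p1.y + p2.y) / 2.0)

# Hypothetical non-Point object that happens to expose x and y.
pixel = SimpleNamespace(x=4.0, y=0.0)

print(midpoint(Point(0.0, 2.0), pixel))
print(midpoint(pixel, pixel))
```

With the method version, the second call would not be possible without first wrapping pixel in a Point.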
Lastly, to address the location of the function, I believe it makes sense to co-locate it in the same package as the Point class. This places it in the same namespace, which clearly indicates its relationship with Point and, in my opinion, is more pythonic than a static or class method.
Update:
Further reading on the Pythonicness of @staticmethod vs package/module:
In both Thomas Wouters's answer to the question What is the difference between staticmethod and classmethod in Python and Mike Steder's answer to init and arguments in Python, the authors indicated that a package or module of related functions is perhaps a better solution. Thomas Wouters has this to say:
[staticmethod] is basically useless in Python -- you can just use a module function instead of a staticmethod.
While Mike Steder comments:
If you find yourself creating objects that consist of nothing but staticmethods the more pythonic thing to do would be to create a new module of related functions.
However, codeape rightly points out below that a calling convention of Point.midpoint(a,b) will co-locate the functionality with the type. The BDFL also seems to value @staticmethod as the __new__ method is a staticmethod.
My personal preference would be to use a function for the reasons cited above, but it appears that the choice between @staticmethod and a stand-alone function is largely in the eye of the beholder.
In this case you can use operator overloading:
from collections import namedtuple
class Point(namedtuple('Point', 'x y')):
    __slots__ = ()
    #Attached to class
    def __add__(self, otherpoint):
        mx = (self.x + otherpoint.x)
        my = (self.y + otherpoint.y)
        return Point(mx, my)
    def __truediv__(self, scalar):
        return Point(self.x/scalar, self.y/scalar)
    __div__ = __truediv__  # Python 2 name for the same operator
a = Point(1.0, 2.0)
b = Point(2.0, 3.0)
def mid(a,b): # general function
    return (a+b)/2
print(mid(a,b))
I think the decision mostly depends on how general and abstract the function is. If you can write the function in a way that works on all objects that implement a small set of clean interfaces, then you can turn it into a separate function. The more interfaces your function depends on and the more specific they are, the more it makes sense to put it on the class (as instances of this class will most likely be the only objects the function will work with anyways).
Another option is to use a @classmethod. It is probably what I would prefer in this case.
class Point(...):
    @classmethod
    def midpoint(cls, p1, p2):
        mx = (p1.x + p2.x) / 2.0
        my = (p1.y + p2.y) / 2.0
        return cls(mx, my)
# ...
print(Point.midpoint(a, b))
I would choose version one, because this way all functionality for points is stored in the point class, i.e. grouping related functionality. Additionally, point objects know best about the meaning and inner workings of their data, so it's the right place to implement your function. An external function, for example in C++, would have to be a friend, which smells like a hack.
A different way of doing this is to access x and y through the namedtuple's subscript interface. You can then completely generalize the midpoint function to n dimensions.
from collections import namedtuple
class Point(namedtuple('Point', 'x y')):
    __slots__ = ()
def midpoint(left, right):
    return tuple([sum(a)/2. for a in zip(left, right)])
This design works for Point classes, n-tuples, lists of length n, etc. For example:
>>> midpoint(Point(0,0), Point(1,1))
(0.5, 0.5)
>>> midpoint(Point(5,1), (3, 2))
(4.0, 1.5)
>>> midpoint((1,2,3), (4,5,6))
(2.5, 3.5, 4.5)
Using online dictionary tools doesn't really help. I think the way encapsulate is used in computer science doesn't exactly match its meaning in plain English.
What is the antonym of computer science's version of encapsulate? More specifically, what is an antonym for encapsulate that would work as a function name?
Why should I care? Here's my motivation:
#include <cassert>

// A class with a private member variable;
class Private
{
public:
    // Test will be able to access Private's private members;
    class Test;
private:
    int i = 42;
};
// Make Test exactly like Private
class Private::Test : public Private
{
public:
    // Make Private's copy of i available publicly in Test
    using Private::i;
};
// A convenience function to quickly break encapsulation on a class to be tested.
// I don't have good name for what it does
Private::Test& foo( Private& p )
{ return *reinterpret_cast<Private::Test*>(&p); } // power cast
void unit_test()
{
    Private p;
    // using the function quickly grab access to p's internals.
    // obviously it would be evil to use this anywhere except in unit tests.
    assert( foo(p).i == 42 );
}
The antonym is "C".
Ok, just kidding. (Sort of.)
The best terms I can come up with are "expose" and "violate".
The purpose behind encapsulation is to hide/cover/protect. The antonym would be reveal/expose/make public.
How about Decapsulation..
It isn't a computer science term, but in medical science it means the surgical removal of a capsule or enveloping membrane. Check it out here.
"Removing/Breaking encapsulation" is about the closest thing I've seen, honestly.
If you think of the word in the English sense, to encapsulate means to enclose within something. But in the CS sense, there's this concept of protection levels and it looks like you want to imply circumventing the access levels as well, so something like "extraction" doesn't really convey the meaning you're looking for.
But if you just think of it in terms of what the access levels are, it looks like you're making something public so, how about "publicizing"?
This is not such a simple question - Scott Meyers had an interesting article to demonstrate some of the nuances around encapsulation here.
I'll start with the punchline: If
you're writing a function that can be
implemented as either a member or as a
non-friend non-member, you should
prefer to implement it as a non-member
function. That decision increases
class encapsulation. When you think
encapsulation, you should think
non-member functions.
How about "Bad Idea"?
The true antonym of "Encapsulation" is "Global State".
The general opposite of encapsulation is coupling and we often talk about systems that are tightly coupled or loosely coupled.
The reason you'd want components to be encapsulated is because it makes it easier to reason about how they work.
Take the analogy of trains: the consequence of coupling the railcars is that the driver must consider the characteristics (inertia, length) of the entire train.
Obviously, though, we couple systems because we need them to work together.
Inverted encapsulation and data structures
There's another term that I've been digging for, which is how I came across this question, that refers to a non-standard style of data structures.
The standard style of encapsulation is exemplified by Java's LinkedList; the actual nodes of the list are designed to be inaccessible to the consumer. The theory is that this is an implementation detail and can change to improve performance, while existing code will continue to run.
Another style is the classic functional cons-list. This is a singly linked list, and the idea is that it's so simple that there's nothing to improve about the data structure, e.g.
data [a] = [] | a : [a] deriving (Eq, Ord)
-- Haskellers then work directly with the list
-- There's nothing to hide because it's so simple
typicalHaskell :: [a] -> b
typicalHaskell [] = emptyValue
typicalHaskell (h : t) = h `doAThing` (typicalHaskell t)
That's the definition from Haskell's standard prelude though the report notes that isn't valid Haskell syntax, and in practice [a] is defined in the guts of the compiler.
Then there's what I'm calling an "inverted" data structure, but I'm still looking for the correct term. This is, I think, really the opposite of encapsulation.
A good example of this is Python's heapq module. The data structure here is a binary heap, but there isn't a Heap class. Rather, you get a collection of functions that operate on generic Python lists and you're responsible for using those functions correctly to ensure the heap invariants are maintained.
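A short sketch of what that looks like in practice. Only standard heapq functions are used; the variable names are mine.

```python
import heapq

# The "heap" is just a list; heapq functions maintain the invariant.
heap = []
for value in [5, 1, 4, 2]:
    heapq.heappush(heap, value)

smallest = heapq.heappop(heap)  # removes and returns the minimum

# Nothing stops a caller from bypassing heapq and breaking the invariant:
heap.append(0)          # plain list method, no sifting happens
broken_root = heap[0]   # no longer the minimum of the list

# The caller is also the one responsible for repairing it.
heapq.heapify(heap)
print(smallest, broken_root, heap[0])
```

With an encapsulated Heap class, the append call would simply not be available; here the discipline lives entirely with the consumer.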
How about "spaghetti"?
I am a C# developer. Coming from the OO side of the world, I start by thinking in terms of interfaces, classes and type hierarchies. Because of the lack of OO in Haskell, I sometimes find myself stuck and cannot think of a way to model certain problems with Haskell.
How to model, in Haskell, real world situations involving class hierarchies such as the one shown here: http://www.braindelay.com/danielbray/endangered-object-oriented-programming/isHierarchy-4.gif
First of all: Standard OO design is not going to work nicely in Haskell. You can fight the language and try to make something similar, but it will be an exercise in frustration. So step one is look for Haskell-style solutions to your problem instead of looking for ways to write an OOP-style solution in Haskell.
But that's easier said than done! Where to even start?
So, let's disassemble the gritty details of what OOP does for us, and think about how those might look in Haskell.
Objects: Roughly speaking, an object is the combination of some data with methods operating on that data. In Haskell, data is normally structured using algebraic data types; methods can be thought of as functions taking the object's data as an initial, implicit argument.
Encapsulation: However, the ability to inspect an object's data is usually limited to its own methods. In Haskell, there are various ways to hide a piece of data, two examples are:
Define the data type in a separate module that doesn't export the type's constructors. Only functions in that module can inspect or create values of that type. This is somewhat comparable to protected or internal members.
Use partial application. Consider the function map with its arguments flipped. If you apply it to a list of Ints, you'll get a function of type (Int -> b) -> [b]. The list you gave it is still "there", in a sense, but nothing else can use it except through the function. This is comparable to private members, and the original function that's being partially applied is comparable to an OOP-style constructor.
"Ad-hoc" polymorphism: Often, in OO programming we only care that something implements a method; when we call it, the specific method called is determined based on the actual type. Haskell provides type classes for compile-time function overloading, which are in many ways more flexible than what's found in OOP languages.
Code reuse: Honestly, my opinion is that code reuse via inheritance was and is a mistake. Mix-ins as found in something like Ruby strike me as a better OO solution. At any rate, in any functional language, the standard approach is to factor out common behavior using higher-order functions, then specialize the general-purpose form. A classic example here is the family of fold functions, which generalize almost all iterative loops, list transformations, and linearly recursive functions.
Interfaces: Depending on how you're using an interface, there are different options:
To decouple implementation: Polymorphic functions with type class constraints are what you want here. For example, the function sort has type (Ord a) => [a] -> [a]; it's completely decoupled from the details of the type you give it other than it must be a list of some type implementing Ord.
Working with multiple types with a shared interface: For this you need either a language extension for existential types, or to keep it simple, use some variation on partial application as above--instead of values and functions you can apply to them, apply the functions ahead of time and work with the results.
Subtyping, a.k.a. the "is-a" relationship: This is where you're mostly out of luck. But--speaking from experience, having been a professional C# developer for years--cases where you really need subtyping aren't terribly common. Instead, think about the above, and what behavior you're trying to capture with the subtyping relationship.
You might also find this blog post helpful; it gives a quick summary of what you'd use in Haskell to solve the same problems that some standard Design Patterns are often used for in OOP.
As a final addendum, as a C# programmer, you might find it interesting to research the connections between it and Haskell. Quite a few people responsible for C# are also Haskell programmers, and some recent additions to C# were heavily influenced by Haskell. Most notable is probably the monadic structure underlying LINQ, with IEnumerable being essentially the list monad.
Let's assume the following operations: Humans can speak, Dogs can bark, and all members of a species can mate with members of the same species if they have opposite gender. I would define this in Haskell like this:
data Gender = Male | Female deriving Eq
class Species s where
    gender :: s -> Gender
-- Returns true if s1 and s2 can conceive offspring
matable :: Species a => a -> a -> Bool
matable s1 s2 = gender s1 /= gender s2
data Human = Man | Woman
data Canine = Dog | Bitch
instance Species Human where
    gender Man = Male
    gender Woman = Female
instance Species Canine where
    gender Dog = Male
    gender Bitch = Female
bark Dog = "woof"
bark Bitch = "wow"
speak Man s = "The man says " ++ s
speak Woman s = "The woman says " ++ s
Now the operation matable has type Species s => s -> s -> Bool, bark has type Canine -> String and speak has type Human -> String -> String.
I don't know whether this helps, but given the rather abstract nature of the question, that's the best I could come up with.
Edit: In response to Daniel's comment:
A simple hierarchy for collections could look like this (ignoring already existing classes like Foldable and Functor):
-- Hide the Prelude's own Foldable so the name below is unambiguous
import Prelude hiding (Foldable)
class Foldable f where
    fold :: (a -> b -> a) -> a -> f b -> a
class Foldable m => Collection m where
    cmap :: (a -> b) -> m a -> m b
    cfilter :: (a -> Bool) -> m a -> m a
class Indexable i where
    atIndex :: i a -> Int -> a
instance Foldable [] where
    fold = foldl
instance Collection [] where
    cmap = map
    cfilter = filter
instance Indexable [] where
    atIndex = (!!)
sumOfEvenElements :: (Integral a, Collection c) => c a -> a
sumOfEvenElements c = fold (+) 0 (cfilter even c)
Now sumOfEvenElements takes any kind of collection of integrals and returns the sum of all even elements of that collection.
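As a rough illustration of "any kind of collection", the sketch below repeats the two relevant classes and adds a second instance. Treating Maybe as a collection of at most one element is my own assumption for the example, and the import line hides the Prelude's Foldable so that the class name does not clash.

```haskell
import Prelude hiding (Foldable)

class Foldable f where
    fold :: (a -> b -> a) -> a -> f b -> a

class Foldable m => Collection m where
    cmap    :: (a -> b) -> m a -> m b
    cfilter :: (a -> Bool) -> m a -> m a

instance Foldable [] where
    fold = foldl

instance Collection [] where
    cmap    = map
    cfilter = filter

-- Assumption for illustration: Maybe viewed as a zero-or-one element collection.
instance Foldable Maybe where
    fold _ z Nothing  = z
    fold f z (Just x) = f z x

instance Collection Maybe where
    cmap = fmap
    cfilter p (Just x) | p x = Just x
    cfilter _ _              = Nothing

-- Unchanged from the answer: written once, works for every Collection instance.
sumOfEvenElements :: (Integral a, Collection c) => c a -> a
sumOfEvenElements c = fold (+) 0 (cfilter even c)
```

With these instances, sumOfEvenElements [1 .. 10] sums 2, 4, 6, 8 and 10, while sumOfEvenElements (Just 3) filters out its only element and returns 0.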
Instead of classes and objects, Haskell uses abstract data types. These are really two compatible views on the problem of organizing ways of constructing and observing information. The best help I know of on this subject is William Cook's essay Object-Oriented Programming Versus Abstract Data Types. He has some very clear explanations to the effect that
In a class-based system, code is organized around different ways of constructing abstractions. Generally each different way of constructing an abstraction is assigned its own class. The methods know how to observe properties of that construction only.
In an ADT-based system (like Haskell), code is organized around different ways of observing abstractions. Generally each different way of observing an abstraction is assigned its own function. The function knows all the ways the abstraction could be constructed, and it knows how to observe a single property, but of any construction.
Cook's paper will show you a nice matrix layout of abstractions and teach you how to organize any class as an ADT or vice versa.
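A minimal sketch of the two organizations, written entirely in Haskell. The shapes and every name here (Shape, ShapeObj, circleObj, rectObj) are hypothetical and not taken from Cook's essay. Think of the matrix as constructors down the side and observations across the top: the ADT style fills it in column by column, the object style row by row.

```haskell
-- ADT style: one closed data type; each observation is a function that
-- pattern-matches on every way the value could have been constructed.
data Shape
    = Circle Double
    | Rect Double Double

area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h

perimeter :: Shape -> Double
perimeter (Circle r) = 2 * pi * r
perimeter (Rect w h) = 2 * (w + h)

-- Object style: a shape is just its observations; each way of constructing
-- one is a separate function that knows only about its own construction.
data ShapeObj = ShapeObj
    { objArea      :: Double
    , objPerimeter :: Double
    }

circleObj :: Double -> ShapeObj
circleObj r = ShapeObj { objArea = pi * r * r, objPerimeter = 2 * pi * r }

rectObj :: Double -> Double -> ShapeObj
rectObj w h = ShapeObj { objArea = w * h, objPerimeter = 2 * (w + h) }
```

Adding a new observation is a single new function in the first style but touches every constructor function in the second. Adding a new kind of shape is the other way round.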
Class hierarchies involve one more element: the reuse of implementations through inheritance. In Haskell, such reuse is achieved through first-class functions instead: a function in a Primate abstraction is a value and an implementation of the Human abstraction can reuse any functions of the Primate abstraction, can wrap them to modify their results, and so on.
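Here is one way that reuse could look, as a sketch only. The Primate record and its fields name, greet and climb are invented for the example. Because the fields are ordinary function values, Human can keep one as it is and wrap another.

```haskell
-- A hypothetical Primate abstraction: a record whose fields are functions.
data Primate = Primate
    { name  :: String
    , greet :: String -> String
    , climb :: Int -> String
    }

basePrimate :: Primate
basePrimate = Primate
    { name  = "primate"
    , greet = \other -> "grunts at " ++ other
    , climb = \metres -> "climbs " ++ show metres ++ " m"
    }

-- Human reuses climb unchanged and wraps greet to modify its result.
human :: Primate
human = basePrimate
    { name  = "human"
    , greet = \other -> greet basePrimate other ++ ", then says hello"
    }
```

Record update syntax plays the role that overriding plays in a class hierarchy: fields you do not mention are inherited, and fields you do mention can still call the original.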
There is not an exact fit between design with class hierarchies and design with abstract data types. If you try to transliterate from one to the other, you will wind up with something awkward and not idiomatic—kind of like a FORTRAN program written in Java.
But if you understand the principles of class hierarchies and the principles of abstract data types, you can take a solution to a problem in one style and craft a reasonably idiomatic solution to the same problem in the other style. It does take practice.
Addendum: It's also possible to use Haskell's type-class system to try to emulate class hierarchies, but that's a different kettle of fish. Type classes are similar enough to ordinary classes that a number of standard examples work, but they are different enough that there can also be some very big surprises and misfits. While type classes are an invaluable tool for a Haskell programmer, I would recommend that anyone learning Haskell learn to design programs using abstract data types.
Haskell, my favorite language, is a pure functional language.
Its functions have no side effects, and there is no assignment.
If you find the transition to this language too hard, F# may be a better place to start with functional programming, since F# is not pure.
Objects encapsulate state. There is a way to achieve this in Haskell, but it is one of the topics that takes longer to learn, because you must learn some category theory concepts to understand monads deeply. There is syntactic sugar that lets you read monadic code as if it were non-destructive assignment, but in my opinion it is better to spend more time on the basics of category theory (the notion of a category) to get a better understanding.
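For a taste of what encapsulated state can look like, here is a minimal sketch that uses a mutable reference from Data.IORef inside the IO monad. This is only one of several possible approaches, and Counter, increment, current and newCounter are hypothetical names.

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)

-- A hypothetical counter "object": its whole interface is these two actions.
data Counter = Counter
    { increment :: IO ()
    , current   :: IO Int
    }

-- The IORef is created here and captured by the two closures, so callers
-- can never reach the underlying state directly.
newCounter :: IO Counter
newCounter = do
    ref <- newIORef 0
    pure Counter
        { increment = modifyIORef' ref (+ 1)
        , current   = readIORef ref
        }
```

Each call to newCounter yields an independent counter, much like calling a constructor, and the do block is the syntactic sugar mentioned above.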
Before trying to program in an OO style in Haskell, ask yourself whether you really use the object-oriented style in C#. Many programmers use OO languages but write their programs in a structured style.
The data declaration lets you define data structures that combine products (equivalent to a struct in C) and unions (equivalent to a union in C). The deriving part of the declaration lets the type inherit default methods.
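A small sketch of both kinds of declaration, with made-up types Point and Shape. The deriving clause asks the compiler to generate the default Show and Eq methods.

```haskell
-- Product: a value holds every field at once (like a C struct).
data Point = Point Double Double
    deriving (Show, Eq)

-- Sum: a value is exactly one of the alternatives (like a tagged C union).
data Shape
    = Circle Point Double
    | Segment Point Point
    deriving (Show, Eq)

origin :: Point
origin = Point 0 0
```

With this in place, show origin and origin == Point 0 0 both work without any hand-written instance.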
A data type (data structure) belongs to a class if it has an implementation of the set of methods in the class.
For example, if you can define a show :: a -> String method for your data type, then it belongs to the class Show, and you can declare your data type an instance of the Show class.
This differs from the use of class in some OO languages, where it is a way to define structures plus methods.
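A minimal sketch of making a type a member of Show by hand. The Color type and the describe function are hypothetical, chosen only to show the mechanics.

```haskell
data Color = Red | Green | Blue

-- Supplying show is what makes Color a member of the Show class.
instance Show Color where
    show Red   = "red"
    show Green = "green"
    show Blue  = "blue"

-- Any function constrained by Show now accepts Color as well.
describe :: Show a => a -> String
describe x = "value: " ++ show x
```

Note that the data declaration and the instance are separate: the type exists without the class, and the instance can be added later, even in another module.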
A data type is abstract if it is independent of its implementation. You create, mutate, and destroy the object through an abstract interface; you do not need to know how it is implemented.
Abstraction is supported in Haskell, and it is very easy to declare.
For example, this code from the Haskell site:
data Tree a = Nil
            | Node { left  :: Tree a,
                     value :: a,
                     right :: Tree a }
declares the selectors left, value, and right.
The constructors may be defined as follows if you want to add them to the export list in the module declaration:
node = Node
nil = Nil
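To show the selectors in use, here is a sketch that treats the tree as a binary search tree. The insert and toList functions are my own additions, not from the Haskell site.

```haskell
data Tree a = Nil
            | Node { left  :: Tree a,
                     value :: a,
                     right :: Tree a }

-- Insert into a binary search tree, reading fields through the selectors
-- and rebuilding the node with record update syntax.
insert :: Ord a => a -> Tree a -> Tree a
insert x Nil = Node Nil x Nil
insert x t
    | x < value t = t { left  = insert x (left t) }
    | x > value t = t { right = insert x (right t) }
    | otherwise   = t

-- In-order traversal: yields the elements in ascending order.
toList :: Tree a -> [a]
toList Nil = []
toList t   = toList (left t) ++ [value t] ++ toList (right t)
```

If Node and Nil were hidden behind node and nil in the export list, client code could still use left, value and right, provided those selectors are exported too.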
Modules are built in a similar way as in Modula. Here is another example from the same site:
module Stack (Stack, empty, isEmpty, push, top, pop) where
empty :: Stack a
isEmpty :: Stack a -> Bool
push :: a -> Stack a -> Stack a
top :: Stack a -> a
pop :: Stack a -> (a,Stack a)
newtype Stack a = StackImpl [a] -- opaque!
empty = StackImpl []
isEmpty (StackImpl s) = null s
push x (StackImpl s) = StackImpl (x:s)
top (StackImpl s) = head s
pop (StackImpl (s:ss)) = (s,StackImpl ss)
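One caveat about the module above: top and pop fail at run time on an empty stack. As a sketch, here is a total variant. popMaybe and drain are hypothetical names, and the module header is left out so the snippet stands alone. In a real module you would export popMaybe and keep StackImpl hidden.

```haskell
newtype Stack a = StackImpl [a]

empty :: Stack a
empty = StackImpl []

push :: a -> Stack a -> Stack a
push x (StackImpl s) = StackImpl (x : s)

-- Hypothetical total variant of pop: Nothing instead of a crash when empty.
popMaybe :: Stack a -> Maybe (a, Stack a)
popMaybe (StackImpl [])       = Nothing
popMaybe (StackImpl (x : xs)) = Just (x, StackImpl xs)

-- Client-style code: written against the interface only, so it keeps
-- working if the list inside StackImpl is replaced by something else.
drain :: Stack a -> [a]
drain s = case popMaybe s of
    Nothing      -> []
    Just (x, s') -> x : drain s'
```

Here drain never mentions StackImpl, which is exactly what the opaque export list in the module version enforces.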
There is more to say about this subject, but I hope this comment helps!