Restructuring an OOP datatype into Haskell types - oop

Coming from an OOP background, Haskell's type system and the way data constructors and typeclasses interact is difficult to conceptualize. I can understand how each are used for simple examples, but some more complication examples of data structures that are very well-suited for an OOP style are proving non-trivial to translate into similarly elegant and understandable types.
In particular, I have a problem with organizing a hierarchy of data such as the following.
This is a deeply nested hierarchical inheritance structure, and the lack of support for subtyping makes it unclear how to turn this structure into a natural-feeling alternative in Haskell. It may be fine to replace something like Polygon with a sum data type, declaring it like
data Polygon
= Quad Point Point
| Triangle Point Point Point
| RegularNGon Int Radius
| ...
But this loses some of the structure, and can only really satisfactorily be done for one level of the hierarchy. Typeclasses can be used to implement a form of inheritance and substructure in that a Polygon typeclass could be a subclass of a Shape, and so maybe all Polygon instances have implementations for centroid :: Point and also vertices :: [Point], but this seems unsatisfactory. What would be a good way of capturing the structure of the picture in Haskell?

You can use sum types to represent the entire hierarchy, without losing structure. Something like this would do it:
data Shape = IsPoint Point
| IsLine Line
| IsPolygon Polygon
data Point = Point { x :: Int, y :: Int }
data Line = Line { a :: Point, b :: Point }
data Polygon = IsTriangle Triangle
| IsQuad Quad
| ...
And so on. The basic pattern is you translate each OO abstract class into a Haskell sum type, with each of its immediate OO subclasses (that may themselves be abstract) as variants in the sum type. The concrete classes are product/record types with the actual data members in them.1
The thing you lose compared to the OOP you're used to by modeling things this way isn't the ability to represent your hierarchy, but the ability to extend it without touching existing code. Sum types are "closed", where OO inheritance is "open". If you later decide that you want a Circle option for Shape, you have to add it to Shape and then add cases for it everywhere you pattern match on a Shape.
However, this kind of hierarchy probably requires fairly liberal downcasting in OO. For example, if you want a function that can tell if two shapes intersect that's probably an abstract method on Shape like Shape.intersects(Shape other), so each sub-type gets to write its own implementation. But when I'm writing Rectangle.intersects(Shape other) it's basically impossible generically, without knowing what other subclasses of Shape are out there. I'll have to be using isinstance checks to see what other actually is. But that actually means that I probably can't just add my new Circle subclass without revisiting existing code; an OO hierarchy where isinstance checks are needed is de-facto just as "closed" as the Haskell sum type hierarchy is. Basically pattern matching on one of the sum-types generated by applying this pattern is the equivalent of isinstancing and downcasting in the OO version. Only because the sum types are exhaustively known to the compiler (only possible because they're closed), if I do add a Circle case to Shape the compiler is able to tell me about all the places that I need to revisit to handle that case.2
If you have a hierarchy that doesn't need a lot of downcasting, it means that the various base classes have substantial and useful interfaces that they guarantee to be available, and you usually use things through that interface rather than switching on what it could possibly be, then you can probably use type classes. You still need all the "leaf" data types (the product types with the actual data fields), only instead of adding sum type wrappers to group them up you add type classes for the common interface. If you can use this style of translation, then you can add new cases more easily (just add the new Circle data type, and an instance to say how it implements the Shape type class; all the places that are polymorphic in any type in the Shape class will now handle Circles as well). But if you're doing that in OO you always have downcasts available as an escape hatch when it turns out you can't handle shapes generically; with this design in Haskell it's impossible.3
But my "real" answer to "how do I represent OO type hierarchies in Haskell" is unfortunately the trite one: I don't. I design differently in Haskell than I do in OO languages4, and in practice it's just not a huge problem. But to say how I'd design this case differently, I'd have to know more about what you're using them for. For example you could do something like represent a shape as a Point -> Bool function (that tells you whether any given point is inside the shape), and having things like circle :: Point -> Int -> (Point -> Bool) for generating such functions corresponding to normal shapes; that representation is awesome for forming composite intersection/union shapes without knowing anything about them (intersect shapeA shapeB = \point -> shapeA point && shapeB point), but terrible for calculating things like areas and circumferences.
1 If you have abstract classes with data members, or you have concrete classes that also have further subclasses you can manually push the data members down into the "leaves", factor out the inherited data members into a shared record and make all of the "leaves" contain one of those, split a layer so that you have a product type containing the inherited data members and a sum type (where that sum type then "splits" into the options for the subclasses), stuff like that.
2 If you use catch-all patterns then the warning might not be exhaustive, so it's not always bullet proof, but how bullet proof it is is up to how you code.
3 Unless you opt into runtime type information with a solution like Typeable, but that's not an invisible change; your callers have to opt into it as well.
4 Actually I probably wouldn't design a hierarchy like this even in OO languages. I find it doesn't turn out to be as useful as you'd think in real programs, hence the "favour composition over inheritance" advice.

You may be looking for a Haskell equivalent of dynamic dispatch, such that you could store a heterogeneous list of values supporting distinct implementations of a common Shape interface.
Haskell's existential types support this kind of usage. It's fairly rare for a Haskell program to actually need existential types -- as Ben's answer demonstrates, sum types can handle this kind of problem. However, existential types are appropriate for a large, open-ended collection of cases:
{-# LANGUAGE ExistentialQuantification #-}
...
class Shape a where
bounds :: a -> AABB
draw :: a -> IO ()
data AnyShape = forall a. Shape a => AnyShape a
This lets you declare instances in an open-ended style:
data Line = Line Point Point
instance Shape Line where ...
data Circle= Circle {center :: Point, radius :: Double}
instance Shape Circle where ...
...
Then, you can build your heterogeneous list:
shapes = [AnyShape(Line a b),
AnyShape(Circle a 3.0),
AnyShape(Circle b 1.8)]
and use it in a uniform way:
drawIn box xs = sequence_ [draw s | AnyShape s <- xs, bounds s `hits` box]
Note that you need to unwrap your AnyShape in order to use the class Shape interface functions. Also note that you must use the class functions to access your heterogeneous data -- there is no other way to "downcast" the unwrapped existential value s! Its type only makes sense within the local scope, so the compiler will not let it escape.
If you are trying to use existential types, yet find yourself needing to "downcast" them, sum types might be a better fit.

Related

Standard ML: Datatype vs. Structure

I'm reading through Paulson's ML For the Working Programmer and am a bit confused about the distinction between datatypes and structures.
On p. 142, he defines a type for binary trees as follows:
datatype 'a tree = Lf
| Br of 'a * 'a tree * 'a tree;
This seems to be a recursive definition where 'a denotes some fixed type. So any time I see 'a, it must refer to the same type throughout.
On p. 148, he discusses a structure for binary trees:
"...we have been following an imaginary ML session in which we typed in the tree functions one at a time. Now we ought to collect the most important of those functions into a structure, called Tree. We really must do so, because one of our functions (size) clashes with a built-in function. One reason for using structures is to prevent such name clashes.
We shall, however, leave the datatype declaration of tree outside of the structure. If it were inside, we should be forced to refer to the constructors by Tree.Lf and Tree.Br, which would make our patters unreadable. Thus, in the sequel, imagine that we have made the following declarations:
datatype 'a tree = Lf
| Br of 'a * 'a tree * 'a tree;
structure Tree =
struct
fun size Lf = 0
| size (Br( v, t1, t2)) = 1 + size t1 + size t2;
fun depth...
etc...
end;
I'm a little confused.
1) What is the relationship between a datatype and a structure?
2) What is the role of "struct" within the structure definition?
3) Later on, Paulson discusses a structure for dictionaries as binary search trees. He does the following:
structure Dict : DICTIONARY =
struct
type key = string;
type 'a t = (key * 'a) tree;
val empty = Lf;
<a bunch of functions for dictionaries>
This makes me think struct specifies the different primitive or compound types involved int he definition of a Dict.
That's a really fuzzy definition though. Anyone like to clarify?
Thanks for the help,
bclayman
A structure is a module. Everything between the struct and end keywords forms the body of this module. Similarly, you can view a signature as the description of an abstract module interface. Ascribing a signature to a structure (like the : DICTIONARY syntax does in your example) limits the exports of the module to what is specified in that signature (by default, everything would be accessible). That allows you to hide implementation details of a module.
However, ML modules are much richer than that. They can be arbitrarily nested. There are also functors, which are effectively functions from modules to modules ("parameterised modules", if you want). Altogether, the module language in ML forms a full functional language on its own, with structures as the basic entities, functors over them, and signatures describing the "types" of such modules. This little language is a layer on top of the so-called core language, where ordinary values and types live.
So, to answer your individual questions:
1) There is no specific relationship between the datatype and the structure. The latter simply uses the former.
2) struct-end is simply a keyword pair to delimit the structure body (languages in C tradition would probably use curly braces there).
3) As explained above, a structure is a basic module. It can contain (and export) arbitrary other language entities, including other modules. By grouping definitions together, and potentially hiding some of them through a signature ascription, you can express namespacing and encapsulation (in particular, abstract data types).
I should also note that Paulson's book is outdated regarding its description of modules, as it predates the current language version. In particular, it does not describe how to express abstract data types through modules, but instead introduces the obsolete abstype declaration which nobody has been using in almost 20 years. A more extensive and up-to-date introduction to modular programming in ML can be found in Harper's Programming in Standard ML.
In this example, the datatype 'a tree is describing a binary tree (https://en.wikipedia.org/wiki/Binary_tree) that is capable of storing any value of a single type. The 'a in the definition is a variant type which will later be constrained down to a concrete type wherever tree is used with a different type. This allows you to define the structure of a tree once and then use it with any type later on.
The Tree structure is separate from the datatype definition. It is being used to group functions together that operate on the 'a tree datatype. It is being used right now as a way to modularize the code and, as it points out, to prevent namespace clashes.
struct is just an identifier keyword to let the compiler know where your structure definition starts while the end keyword is used to let the compiler know where the definition ends.
The dictionary structure is defining a dictionary (a key -> value data structure) that uses a tree as the internal data structure. Once again, the structure is a collection of functions that will be used to create and operate on dictionaries. The types within the dictionary structure compose the type of the internal data structure that makes up the dictionary. The following functions define the public interface that you're exposing to allow clients to work with dictionaries.

Why is type inference impractical for object oriented languages?

I'm currently researching ideas for a new programming language where ideally I would like the language to mix some functional and procedural (object oriented) concepts.
One of the things that I'm really fascinated about with languages like Haskell is that it's statically typed, but you do not have to annotate types (magic thanks to Hindley-Milner!).
I would really like this for my language, however after reading up on the subject it seems that most agree that type inference is impractical/impossible with subtyping/object orientation, however I have not understood why this is. I do not know F#, but I understand that it uses Hindley-Milner AND is object-oriented.
I would really like an explanation for this and preferably examples on scenarios where type inference is impossible for object oriented languages.
To add to seppk's response: with structural object types the problem he describes actually goes away (f could be given a polymorphic type like ∀A &leq; {x : Int, y : Int}. A → Int, or even just {x : Int, y : Int} → Int). However, type inference is still problematic.
The fundamental reason is this: in a language without subtyping, the typing rules impose equality constraints on types. These are very nice to deal with during type checking, because they can usually be simplified immediately using unification on the types. With subtyping, however, these constraints are generalised to inequality constraints. You cannot use unification any more, which has at least three unpleasant consequences:
The number and complexity of constraints explodes combinatorially.
The information you have to display to the user in case of errors is incomprehensible.
Certain forms of quantification can quickly become undecidable.
Thus, type inference for subtyping is not impossible (there have been many papers on the subject in the 90s), but it is not very practical.
A much simpler alternative is employed by OCaml, which uses so-called row polymorphism in place of subtyping. That is actually tractable.
When using nominal typing (that is a typing system where two classes whose members have the same name and the same types are not interchangeable), there would be many possible types for a method like the following:
let f(obj) =
obj.x + obj.y
Any class that has both a member x and a member y (of types that support the + operator) would qualify as a possible type for obj and the type inference algorithm would have no way of knowing which one is the one you want.
In F# the above code would need a type annotation. So F# has object orientation and type inference, but not at the same time (with the exception of local type inference (let myVar = expressionWhoseTypeIKNow), which always works).

What should I name a class whose sole purpose is procedural?

I have a lot to learn in the way of OO patterns and this is a problem I've come across over the years. I end up in situations where my classes' sole purpose is procedural, just basically wrapping a procedure up in a class. It doesn't seem like the right OO way to do things, and I wonder if someone is experienced with this problem enough to help me consider it in a different way. My specific example in the current application follows.
In my application I'm taking a set of points from engineering survey equipment and normalizing them to be used elsewhere in the program. By "normalize" I mean a set of transformations of the full data set until a destination orientation is reached.
Each transformation procedure will take the input of an array of points (i.e. of the form class point { float x; float y; float z; }) and return an array of the same length but with different values. For example, a transformation like point[] RotateXY(point[] inList, float angle). The other kind of procedure wold be of the analysis type, used to supplement the normalization process and decide what transformation to do next. This type of procedure takes in the same points as a parameter but returns a different kind of dataset.
My question is, what is a good pattern to use in this situation? The one I was about to code in was a Normalization class which inherits class types of RotationXY for instance. But RotationXY's sole purpose is to rotate the points, so it would basically be implementing a single function. This doesn't seem very nice, though, for the reasons I mentioned in the first paragraph.
Thanks in advance!
The most common/natural approach for finding candidate classes in your problem domain is to look for nouns and then scan for the verbs/actions associated with those nouns to find the behavior that each class should implement. While this is generally a good advise, it doesn't mean that your objects must only represent concrete elements. When processes (which are generally modeled as methods) start to grow and become complex, it is a good practice to model them as objects. So, if your transformation has a weight on its own, it is ok to model it as an object and do something like:
class RotateXY
{
public function apply(point p)
{
//Apply the transformation
}
}
t = new RotateXY();
newPoint = t->apply(oldPoint);
in case you have many transformations you can create a polymorphic hierarchy and even chain one transformation after another. If you want to dig a bit deeper you can also take a look at the Command design pattern, which closely relates to this.
Some final comments:
If it fits your case, it is a good idea to model the transformation at the point level and then apply it to a collection of points. In that way you can properly isolate the transformation concept and is also easier to write test cases. You can later even create a Composite of transformations if you need.
I generally don't like the Utils (or similar) classes with a bunch of static methods, since in most of the cases it means that your model is missing the abstraction that should carry that behavior.
HTH
Typically, when it comes to classes that contain only static methods, I name them Util, e.g. DbUtil for facading DB access, FileUtil for file I/O etc. So find some term that all your methods have in common and name it that Util. Maybe in your case GeometryUtil or something along those lines.
Since the particulars of the transformations you apply seem ad-hoc for the problem and possibly prone to change in the future you could code them in a configuration file.
The point's client would read from the file and know what to do. As for the rotation or any other transformation method, they could go well as part of the Point class.
I see nothing particularly wrong with classes/interfaces having just essentially one member.
In your case the member is an "Operation with some arguments of one type that returns same type" - common for some math/functional problems. You may find convenient to have interface/base class and helper methods that combine multiple transformation classes together into more complex transformation.
Alternative approach: if you language support it is just go functional style altogether (similar to LINQ in C#).
On functional style suggestion: I's start with following basic functions (probably just find them in standard libraries for the language)
collection = map(collection, perItemFunction) to transform all items in a collection (Select in C#)
item = reduce (collection, agregateFunction) to reduce all items into single entity (Aggregate in C#)
combine 2 functions on item funcOnItem = combine(funcFirst, funcSecond). Can be expressed as lambda in C# Func<T,T> combined = x => second(first(x)).
"bind"/curry - fix one of arguments of a function functionOfOneArg = curry(funcOfArgs, fixedFirstArg). Can be expressed in C# as lambda Func<T,T> curried = x => funcOfTwoArg(fixedFirstArg, x).
This list will let you do something like "turn all points in collection on a over X axis by 10 and shift Y by 15": map(points, combine(curry(rotateX, 10), curry(shiftY(15))).
The syntax will depend on language. I.e. in JavaScript you just pass functions (and map/reduce are part of language already), C# - lambda and Func classes (like on argument function - Func<T,R>) are an option. In some languages you have to explicitly use class/interface to represent a "function" object.
Alternative approach: If you actually dealing with points and transformation another traditional approach is to use Matrix to represent all linear operations (if your language supports custom operators you get very natural looking code).

Efficient way to define a class with multiple, optionally-empty slots in S4 of R?

I am building a package to handle data that arrives with up to 4 different types. Each of these types is a legitimate class in the form of a matrix, data.frame or tree. Depending on the way the data is processed and other experimental factors, some of these data components may be missing, but it is still extremely useful to be able to store this information as an instance of a special class and have methods that recognize the different component data.
Approach 1:
I have experimented with an incremental inheritance structure that looks like a nested tree, where each combination of data types has its own class explicitly defined. This seems difficult to extend for additional data types in the future, and is also challenging for new developers to learn all the class names, however well-organized those names might be.
Approach 2:
A second approach is to create a single "master-class" that includes a slot for all 4 data types. In order to allow the slots to be NULL for the instances of missing data, it appears necessary to first define a virtual class union between the NULL class and the new data type class, and then use the virtual class union as the expected class for the relevant slot in the master-class. Here is an example (assuming each data type class is already defined):
################################################################################
# Use setClassUnion to define the unholy NULL-data union as a virtual class.
################################################################################
setClassUnion("dataClass1OrNULL", c("dataClass1", "NULL"))
setClassUnion("dataClass2OrNULL", c("dataClass2", "NULL"))
setClassUnion("dataClass3OrNULL", c("dataClass3", "NULL"))
setClassUnion("dataClass4OrNULL", c("dataClass4", "NULL"))
################################################################################
# Now define the master class with all 4 slots, and
# also the possibility of empty (NULL) slots and an explicity prototype for
# slots to be set to NULL if they are not provided at instantiation.
################################################################################
setClass(Class="theMasterClass",
representation=representation(
slot1="dataClass1OrNULL",
slot2="dataClass2OrNULL",
slot3="dataClass3OrNULL",
slot4="dataClass4OrNULL"),
prototype=prototype(slot1=NULL, slot2=NULL, slot3=NULL, slot4=NULL)
)
################################################################################
So the question might be rephrased as:
Are there more efficient and/or flexible alternatives to either of these approaches?
This example is modified from an answer to a SO question about setting the default value of slot to NULL. This question differs in that I am interested in knowing the best options in R for creating classes with slots that can be empty if needed, despite requiring a specific complex class in all other non-empty cases.
In my opinion...
Approach 2
It sort of defeats the purpose to adopt a formal class system, and then to create a class that contains ill-defined slots ('A' or NULL). At a minimum I would try to make DataClass1 have a 'NULL'-like default. As a simple example, the default here is a zero-length numeric vector.
setClass("DataClass1", representation=representation(x="numeric"))
DataClass1 <- function(x=numeric(), ...) {
new("DataClass1", x=x, ...)
}
Then
setClass("MasterClass1", representation=representation(dataClass1="DataClass1"))
MasterClass1 <- function(dataClass1=DataClass1(), ...) {
new("MasterClass1", dataClass1=dataClass1, ...)
}
One benefit of this is that methods don't have to test whether the instance in the slot is NULL or 'DataClass1'
setMethod(length, "DataClass1", function(x) length(x#x))
setMethod(length, "MasterClass1", function(x) length(x#dataClass1))
> length(MasterClass1())
[1] 0
> length(MasterClass1(DataClass1(1:5)))
[1] 5
In response to your comment about warning users when they access 'empty' slots, and remembering that users usually want functions to do something rather than tell them they're doing something wrong, I'd probably return the empty object DataClass1() which accurately reflects the state of the object. Maybe a show method would provide an overview that reinforced the status of the slot -- DataClass1: none. This seems particularly appropriate if MasterClass1 represents a way of coordinating several different analyses, of which the user may do only some.
A limitation of this approach (or your Approach 2) is that you don't get method dispatch -- you can't write methods that are appropriate only for an instance with DataClass1 instances that have non-zero length, and are forced to do some sort of manual dispatch (e.g., with if or switch). This might seem like a limitation for the developer, but it also applies to the user -- the user doesn't get a sense of which operations are uniquely appropriate to instances of MasterClass1 that have non-zero length DataClass1 instances.
Approach 1
When you say that the names of the classes in the hierarchy are going to be confusing to your user, it seems like this is maybe pointing to a more fundamental issue -- you're trying too hard to make a comprehensive representation of data types; a user will never be able to keep track of ClassWithMatrixDataFrameAndTree because it doesn't represent the way they view the data. This is maybe an opportunity to scale back your ambitions to really tackle only the most prominent parts of the area you're investigating. Or perhaps an opportunity to re-think how the user might think of and interact with the data they've collected, and to use the separation of interface (what the user sees) from implementation (how you've chosen to represent the data in classes) provided by class systems to more effectively encapsulate what the user is likely to do.
Putting the naming and number of classes aside, when you say "difficult to extend for additional data types in the future" it makes me wonder if perhaps some of the nuances of S4 classes are tripping you up? The short solution is to avoid writing your own initialize methods, and rely on the constructors to do the tricky work, along the lines of
setClass("A", representation(x="numeric"))
setClass("B", representation(y="numeric"), contains="A")
A <- function(x = numeric(), ...) new("A", x=x, ...)
B <- function(a = A(), y = numeric(), ...) new("B", a, y=y, ...)
and then
> B(A(1:5), 10)
An object of class "B"
Slot "y":
[1] 10
Slot "x":
[1] 1 2 3 4 5

How to model class hierarchies in Haskell?

I am a C# developer. Coming from OO side of the world, I start with thinking in terms of interfaces, classes and type hierarchies. Because of lack of OO in Haskell, sometimes I find myself stuck and I cannot think of a way to model certain problems with Haskell.
How to model, in Haskell, real world situations involving class hierarchies such as the one shown here: http://www.braindelay.com/danielbray/endangered-object-oriented-programming/isHierarchy-4.gif
First of all: Standard OO design is not going to work nicely in Haskell. You can fight the language and try to make something similar, but it will be an exercise in frustration. So step one is look for Haskell-style solutions to your problem instead of looking for ways to write an OOP-style solution in Haskell.
But that's easier said than done! Where to even start?
So, let's disassemble the gritty details of what OOP does for us, and think about how those might look in Haskell.
Objects: Roughly speaking, an object is the combination of some data with methods operating on that data. In Haskell, data is normally structured using algebraic data types; methods can be thought of as functions taking the object's data as an initial, implicit argument.
Encapsulation: However, the ability to inspect an object's data is usually limited to its own methods. In Haskell, there are various ways to hide a piece of data, two examples are:
Define the data type in a separate module that doesn't export the type's constructors. Only functions in that module can inspect or create values of that type. This is somewhat comparable to protected or internal members.
Use partial application. Consider the function map with its arguments flipped. If you apply it to a list of Ints, you'll get a function of type (Int -> b) -> [b]. The list you gave it is still "there", in a sense, but nothing else can use it except through the function. This is comparable to private members, and the original function that's being partially applied is comparable to an OOP-style constructor.
"Ad-hoc" polymorphism: Often, in OO programming we only care that something implements a method; when we call it, the specific method called is determined based on the actual type. Haskell provides type classes for compile-time function overloading, which are in many ways more flexible than what's found in OOP languages.
Code reuse: Honestly, my opinion is that code reuse via inheritance was and is a mistake. Mix-ins as found in something like Ruby strike me as a better OO solution. At any rate, in any functional language, the standard approach is to factor out common behavior using higher-order functions, then specialize the general-purpose form. A classic example here are fold functions, which generalize almost all iterative loops, list transformations, and linearly recursive functions.
Interfaces: Depending on how you're using an interface, there are different options:
To decouple implementation: Polymorphic functions with type class constraints are what you want here. For example, the function sort has type (Ord a) => [a] -> [a]; it's completely decoupled from the details of the type you give it other than it must be a list of some type implementing Ord.
Working with multiple types with a shared interface: For this you need either a language extension for existential types, or to keep it simple, use some variation on partial application as above--instead of values and functions you can apply to them, apply the functions ahead of time and work with the results.
Subtyping, a.k.a. the "is-a" relationship: This is where you're mostly out of luck. But--speaking from experience, having been a professional C# developer for years--cases where you really need subtyping aren't terribly common. Instead, think about the above, and what behavior you're trying to capture with the subtyping relationship.
You might also find this blog post helpful; it gives a quick summary of what you'd use in Haskell to solve the same problems that some standard Design Patterns are often used for in OOP.
As a final addendum, as a C# programmer, you might find it interesting to research the connections between it and Haskell. Quite a few people responsible for C# are also Haskell programmers, and some recent additions to C# were heavily influenced by Haskell. Most notable is probably the monadic structure underlying LINQ, with IEnumerable being essentially the list monad.
Let's assume the following operations: Humans can speak, Dogs can bark, and all members of a species can mate with members of the same species if they have opposite gender. I would define this in haskell like this:
data Gender = Male | Female deriving Eq
class Species s where
gender :: s -> Gender
-- Returns true if s1 and s2 can conceive offspring
matable :: Species a => a -> a -> Bool
matable s1 s2 = gender s1 /= gender s2
data Human = Man | Woman
data Canine = Dog | Bitch
instance Species Human where
gender Man = Male
gender Woman = Female
instance Species Canine where
gender Dog = Male
gender Bitch = Female
bark Dog = "woof"
bark Bitch = "wow"
speak Man s = "The man says " ++ s
speak Woman s = "The woman says " ++ s
Now the operation matable has type Species s => s -> s -> Bool, bark has type Canine -> String and speak has type Human -> String -> String.
I don't know whether this helps, but given the rather abstract nature of the question, that's the best I could come up with.
Edit: In response to Daniel's comment:
A simple hierarchy for collections could look like this (ignoring already existing classes like Foldable and Functor):
class Foldable f where
fold :: (a -> b -> a) -> a -> f b -> a
class Foldable m => Collection m where
cmap :: (a -> b) -> m a -> m b
cfilter :: (a -> Bool) -> m a -> m a
class Indexable i where
atIndex :: i a -> Int -> a
instance Foldable [] where
fold = foldl
instance Collection [] where
cmap = map
cfilter = filter
instance Indexable [] where
atIndex = (!!)
sumOfEvenElements :: (Integral a, Collection c) => c a -> a
sumOfEvenElements c = fold (+) 0 (cfilter even c)
Now sumOfEvenElements takes any kind of collection of integrals and returns the sum of all even elements of that collection.
Instead of classes and objects, Haskell uses abstract data types. These are really two compatible views on the problem of organizing ways of constructing and observing information. The best help I know of on this subject is William Cook's essay Object-Oriented Programming Versus Abstract Data Types. He has some very clear explanations to the effect that
In a class-based system, code is organized around different ways of constructing abstractions. Generally each different way of constructing an abstraction is assigned its own class. The methods know how to observe properties of that construction only.
In an ADT-based system (like Haskell), code is organized around different ways of observing abstractions. Generally each different way of observing an abstraction is assigned its own function. The function knows all the ways the abstraction could be constructed, and it knows how to observe a single property, but of any construction.
Cook's paper will show you a nice matrix layout of abstractions and teach you how to organize any class as an ADY or vice versa.
Class hierarchies involve one more element: the reuse of implementations through inheritance. In Haskell, such reuse is achieved through first-class functions instead: a function in a Primate abstraction is a value and an implementation of the Human abstraction can reuse any functions of the Primate abstraction, can wrap them to modify their results, and so on.
There is not an exact fit between design with class hierarchies and design with abstract data types. If you try to transliterate from one to the other, you will wind up with something awkward and not idiomatic—kind of like a FORTRAN program written in Java.
But if you understand the principles of class hierarchies and the principles of abstract data types, you can take a solution to a problem in one style and craft a reasonably idiomatic solution to the same problem in the other style. It does take practice.
Addendum: It's also possible to use Haskell's type-class system to try to emulate class hierarchies, but that's a different kettle of fish. Type classes are similar enough to ordinary classes that a number of standard examples work, but they are different enough that there can also be some very big surprises and misfits. While type classes are an invaluable tool for a Haskell programmer, I would recommend that anyone learning Haskell learn to design programs using abstract data types.
Haskell is my favorite language, is a pure functional language.
It does not have side effects, there is no assignment.
If you find to hard the transition to this language, maybe F# is a better place to start with functional programming. F# is not pure.
Objects encapsulate states, there is a way to achieve this in Haskell, but this is one of the issues that takes more time to learn because you must learn some category theory concepts to deeply understand monads. There is syntactic sugar that lets you see monads like non destructive assignment, but in my opinion it is better to spend more time understanding the basis of category theory (the notion of category) to get a better understanding.
Before trying to program in OO style in Haskell, you should ask yourself if you really use the object oriented style in C#, many programmers use OO languages, but their programs are written in the structured style.
The data declaration allows you to define data structures combining products (equivalent to structure in C language) and unions (equivalent to union in C), the deriving part o the declaration allows to inherit default methods.
A data type (data structure) belongs to a class if has an implementation of the set of methods in the class.
For example, if you can define a show :: a -> String method for your data type, then it belong to the class Show, you can define your data type as an instance of the Show class.
This is different of the use of class in some OO languages where it is used as a way to define structures + methods.
A data type is abstract if it is independent of it's implementation. You create, mutate, and destroy the object by an abstract interface, you do not need to know how it is implemented.
Abstraction is supported in Haskell, it is very easy to declare.
For example this code from the Haskell site:
data Tree a = Nil
| Node { left :: Tree a,
value :: a,
right :: Tree a }
declares the selectors left, value, right.
the constructors may be defined as follows if you want to add them to the export list in the module declaration:
node = Node
nil = Nil
Modules are build in a similar way as in Modula. Here is another example from the same site:
module Stack (Stack, empty, isEmpty, push, top, pop) where
empty :: Stack a
isEmpty :: Stack a -> Bool
push :: a -> Stack a -> Stack a
top :: Stack a -> a
pop :: Stack a -> (a,Stack a)
newtype Stack a = StackImpl [a] -- opaque!
empty = StackImpl []
isEmpty (StackImpl s) = null s
push x (StackImpl s) = StackImpl (x:s)
top (StackImpl s) = head s
pop (StackImpl (s:ss)) = (s,StackImpl ss)
There is more to say about this subject, I hope this comment helps!